BlitzBASIC 2 PlayBASIC Conversion Tool WIP - Episode 04 - Parsing Variable Detection

a guest
Mar 3rd, 2022
https://www.youtube.com/watch?v=Ie1LPSVwPS0&t=9s&ab_channel=PlayBasic
https://www.underwaredesign.com/forums/index.php?topic=4625.msg30567#msg30567
---------------------------------------------------------------------------------------------

Hello, welcome back. Here we are looking at another one of these Blitz to PlayBASIC translation videos. Today I was working on the identification of variables, the implicit ones in particular. It'll pick up Dim statements, and it'll pick up Global and Local statements, but it also has to pick up variables used throughout the code.

I'll just pull up this pretty rudimentary example. We've got our Global statements, and we can parse those. I don't think we've got list support at the moment, but I think we're picking up the keyword and just storing it for later. Locals are the same. We want arrays as well, but I'm not working on those at the moment; I've just been working on variables.

So I've got global scope, which is this initial part of the program; everything there will be tagged as global scope. Then there's a function here with a bit of code inside it. We've got some array stuff there too; we'll just pull it out so we don't confuse ourselves.

Now, the things to notice here are the usages of our variables. I've got locals declared here, MyFloat and MyString. MyFloat has the hash symbol on it. For MyString it's the first usage, so it has the dollar sign at the end of it. And the loop variable here has the integer tag, which is the percent symbol.

But as we move through, you'll notice that I've tagged a as an integer, and I've used loop without its postfix, its suffix. b is the same; these are all used without them. Down here we've got a string, and then we're using MyString without its dollar sign, and the same goes for the float. Previously I was talking about wanting the converter to pick these up and do those translations for us.

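To make the idea concrete, here is a minimal Python sketch (not the PlayBASIC source of the converter) of carrying a suffix from a variable's first tagged use over to its later bare uses. The function names, the type labels and the case-insensitive matching are assumptions made for illustration.

```python
# Suffix characters as described in the video; the type names are illustrative labels.
SUFFIX_TYPES = {"%": "integer", "#": "float", "$": "string"}
TYPE_SUFFIXES = {name: char for char, name in SUFFIX_TYPES.items()}


def split_suffix(token):
    """Split 'MyFloat#' into ('MyFloat', 'float'); untagged tokens get type None."""
    if len(token) > 1 and token[-1] in SUFFIX_TYPES:
        return token[:-1], SUFFIX_TYPES[token[-1]]
    return token, None


def apply_known_suffixes(tokens):
    """Give later bare uses of a variable the suffix seen on its first tagged use."""
    known = {}  # lower-cased name -> type (assumes names are case-insensitive)
    result = []
    for token in tokens:
        name, type_name = split_suffix(token)
        key = name.lower()
        if type_name is not None:
            known.setdefault(key, type_name)
        if key in known:
            result.append(name + TYPE_SUFFIXES[known[key]])
        else:
            result.append(token)
    return result
```

Given the tokens `MyFloat#`, `=`, `1.5`, `Print`, `myfloat`, the last token comes back as `myfloat#` and the others are left alone.
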
That's something you really can't do with just a search and replace. Take things like the percent sign here: if you pull a bunch of code into Notepad or whatever and do a raw replacement, that would probably get you a long way towards fixing those problems, but you'll have collisions as well, inside strings and comments etc. For the most part, though, no one would care.

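The collision problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `;` starts a comment, strings use double quotes with no escape characters, and `add_suffix` is a made-up helper name, not part of the converter.

```python
import re

# One alternative per thing we must recognise: a string literal, a comment, or a word.
# Assumptions: ';' starts a comment, strings use double quotes and have no escapes.
TOKEN = re.compile(r'"[^"\n]*"?|;.*|[A-Za-z_][A-Za-z0-9_]*[%#$]?')


def add_suffix(line, name, suffix):
    """Append suffix to bare uses of name, leaving strings and comments alone."""
    def swap(match):
        text = match.group(0)
        if text[0] in '";':          # string literal or comment: copy through
            return text
        if text.lower() == name.lower():
            return text + suffix
        return text                  # other words, or already-tagged uses
    return TOKEN.sub(swap, line)
```

A raw text replacement on `width = 5 ; width of box` would also rewrite the word inside the comment, whereas this version only touches the first `width`.
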
Here's another bit of code, which I actually dragged across from Facebook just before, to demonstrate this on a broader scale. In that language all variables were inherently floats, so here I've just added a Local statement with the variables in it, but left all of the occurrences throughout the program without the hash symbol on them. I've done that intentionally, because I want the converter to pick them up and do that work for us. I'll just hit save on that and close it.

Back to PlayBASIC now. This is the example we're loading and running. I'm running in debug so we can see the output; in debug it's taking 91 milliseconds, and most of that is just text output to the console. Here's the function that's output for this last piece. I'll just zoom in on that a little bit (you can use the scroll function with text boxes too).

You'll notice that all of the variables that follow the declaration up here, which in the original code were just width etc., are now pretty much ready to run in PB, which is pretty good. That saves a lot of messing around. I won't say it's going to be perfect. There's probably some situation where it will do the wrong thing; at the moment that's more likely than not. But it will give us code that will almost run in PB as it stands. Almost.

This is actually using radians rather than degrees, so we'd have to convert these too. That might be something we need to do for Blitz as well; I'm not quite sure whether we need to wrap the Cos and Sin functions or not, I haven't tried. This wasn't Blitz code, by the way, it was from something else, but I used it as a good example of how we can get the converter to add the postfix symbols to our variables for us en masse.

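As a rough illustration of the wrapping idea, the Python sketch below assumes the target language's Cos() takes degrees while the source code passes radians. Both function names are hypothetical, and `cos_degrees` only stands in for the target's built-in.

```python
import math


def cos_degrees(angle):
    """Stand-in for a target-language Cos() that takes degrees (an assumption here)."""
    return math.cos(math.radians(angle))


def cos_from_radians(angle):
    """Wrapper the converter could emit for source code written in radians."""
    return cos_degrees(math.degrees(angle))
```

The converter would then only need to rename calls to the wrapper rather than rewrite every argument expression.
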
This is the converted code. It probably won't compile in PB, since we've probably got some functions in there that we're not handling, but we should come pretty close. Up here is the list of scopes we've found. We've found what we're calling global scope, which is the entire program that's not inside a function, from top to bottom. And this is MyFunction; the source position there is wrong, but these are the variables that are known to be inside it.

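One way to picture the scope list is the Python sketch below, which groups lines into a global scope plus one scope per function. It assumes functions are written as `Function Name(...)` and closed with `End Function`; the `<global>` label and the helper name are invented for the example.

```python
def split_scopes(lines):
    """Group source lines by scope: '<global>' plus one entry per function."""
    scopes = {"<global>": []}
    current = "<global>"
    for line in lines:
        words = line.split()
        lowered = [w.lower() for w in words]
        if lowered[:1] == ["function"] and len(words) > 1:
            current = words[1].split("(")[0]      # 'MyFunction()' -> 'MyFunction'
            scopes[current] = []
        elif lowered[:2] == ["end", "function"]:
            current = "<global>"                  # back to the main program
        else:
            scopes[current].append(line.strip())
    return scopes
```

Code before and after a function both land in the same global list, which matches the "entire program that's not inside a function" description.
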
Some of these are incorrect; it's picking up some common commands that it shouldn't. We might do a secondary search on those to check whether they're Blitz keywords. They should already be classified, but I think that classification's probably not quite working.

Anyway, in the function that we dragged across before, we've got all of these variables, width etc., and they're all tagged as being type 2, which is a float. It also gives us an indication of how frequently the variables are used. That's handy, because we might be able to use it to give hints: in certain scopes there are variables that are only used once, so maybe it's a typo, maybe it's something else. We could give warnings, possibly. You could insert a bit of a header in front of a function, let's say, with a comment that says this variable is not used more than once, or something like that. Possibly even remove it.

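The usage counts and the warning header could look something like this Python sketch. The wording of the warning, the `;` comment prefix and the function names are all assumptions for illustration.

```python
from collections import Counter


def usage_counts(variable_tokens):
    """Count uses per variable; the declaration counts as one use."""
    return Counter(token.lower() for token in variable_tokens)


def single_use_warnings(scope_name, variable_tokens):
    """Build comment lines for variables that appear only once in a scope."""
    counts = usage_counts(variable_tokens)
    return [
        f"; warning: '{name}' is only used once in {scope_name}"
        for name, count in counts.items()
        if count == 1
    ]
```

The returned lines could be written out just above the converted function as a header.
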
The problem with that, of course, would be if we've misclassified something and it turns out to be a constant somewhere else, or something like that. Come to think of it, we don't actually support constants at the moment, so we'll have to add support for them, and they'll have precedence over variables as well. Variables are kind of the lowest thing in the pecking order.

But our output here just gives me a bit of an idea of what variables it's found inside each scope. They all seem to be okay. If you look down the bottom, in this scope here, it's picked up things like Swap, Do and Ellipse. It thinks they're variables; of course they're not. But you know, we're getting somewhere with this.

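The pecking order can be sketched in Python as a classifier that tries the higher-priority categories first and only falls back to "variable" at the end. The keyword set below is a tiny illustrative list built from the words mentioned in the video, not a real command table, and `constants` is expected to hold lower-cased names.

```python
# Tiny illustrative word lists; a real tool would load the full command set.
KEYWORDS = {"swap", "do", "ellipse", "for", "next", "print"}


def classify(word, constants=frozenset()):
    """Classify a word, checking the higher-priority categories first."""
    lowered = word.lower()
    if lowered in KEYWORDS:
        return "keyword"
    if lowered in constants:
        return "constant"
    return "variable"          # lowest in the pecking order
```

With a check like this in front of the variable scan, words such as Swap would never reach the variable list.
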
At the top here we pick up our scopes. I'm not sure if our scope searching is working as per Blitz, but I can update that later if it's not quite correct. When we're inside a function, it has to work out which version of a variable you're using. If the thing's declared as Global above it, then we need to go: okay, use the global occurrence of this. So if the global was declared as, let's say, a floating point variable, and later on you're using it without the hash symbol, it must update that as well, rather than think it's a new variable inside the scope.

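A minimal Python sketch of that lookup order, assuming each scope is a dictionary of lower-cased names to type labels (a made-up data layout, not the converter's own):

```python
def resolve(name, local_types, global_types):
    """Return (scope, type) for a bare name used inside a function."""
    key = name.lower()
    if key in local_types:
        return "local", local_types[key]
    if key in global_types:
        return "global", global_types[key]   # reuse the global's declared type
    local_types[key] = "integer"             # new untagged name: assume integer
    return "local", "integer"
```

So a bare `Speed` inside a function resolves to the float global declared above it, while an unknown bare name becomes a new local integer.
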
But I think, even though we're not doing a full compile and are just picking through with this logic, we should be able to get a lot of that stuff done. You'd be horrified if we ran it on the large code base like this; it takes about 10 seconds. Sorry, I can compile and run it straight, with no debug. This is the 9,206 line block of code, and it's been converted in about 900 milliseconds, just under a second. That's pretty good. I'm happy about that; I wasn't expecting it to do that well. I thought it might take five or ten seconds or something for large code bases.

Run in debug, this takes a while, because it's dumping all of that crap to the console, and the console is just a buffered window, so the thing looks like it's died. As you can see, it's a lot of code. Now I can pick through the scopes and see what we've uncovered. This GUI skin function here has these variables in it. Some of these might not be variables, because at the moment it doesn't understand types and the field separators in types, so it might be picking up fields as variables. Here we've got some references to these things, so let's look for that in the code base.

Let's find the global scope. All right, that's global scope there. We've got a bunch of GUI variables at the front, like the GUI screen width. See, that's surprising: it's saying the screen width is only used once. The skin variables are used more; that one's used 10 times, and that includes its initial occurrence, its declaration. I wonder if that's true. GUI screen width... hmm, I don't believe that for a second. So let's see where the GUI screen width occurs. Yeah, that's more like it. I was going to say, that can't be right. So clearly we're skipping over something there; we're not searching and finding those correctly. Good. What have we learnt? Well, it's kind of working and kind of not working, which is pretty much what we'd expect to find, isn't it?

Actually, I might try the array one here, in function scopes. So, a dimension: Dim, and I'll call it strings, a hundred. I'll use the example I did the other day, so a For loop from naught to 100. I just want to see if it picks that up. Actually, sorry: Print strings, with no dollar sign, then the end of the loop. Let's go. And it didn't pick that up. Oh yeah, I know why: there's no trapping of the bracket. If it sees this keyword and then a bracket following it, it might not know what to do and just assumes it's an integer, and strings without the suffix will just be an integer. So in that scope, MyFunction1, there should be a thing called... where are we? Yes. Hmm, there are some more occurrences there, but we're not getting both of those; we should have two for strings if that's how it was seeing it.

I had some really weird problems with this this afternoon, actually. Weird is putting it mildly. I was at the point where I thought, oh no, I've found another weird bug in PB. It was right at that moment that I realised: oh, hang on, it's not a bug in PB, I just haven't initialized something. Okay.

Here's the logic. As we scan through, we find a word, which is just any group of characters that's been classified as one. So we'll see a block of characters like that, with no white space in it. The block of characters is alphanumeric, so you can have lowercase, uppercase and numbers, and the underscore character as well, but I think it has to start with an underscore or a letter.

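The word rule described here maps naturally onto a regular expression. The Python sketch below is one way to write it; the lookbehind that stops a match from starting in the middle of a number is an addition for the example, and `find_words` is a hypothetical helper.

```python
import re

# A letter or underscore, then any mix of letters, digits and underscores.
# The lookbehind keeps a match from starting inside something like '9lives'.
WORD = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")


def find_words(line):
    """Return every word in the line together with its start position."""
    return [(m.group(0), m.start()) for m in WORD.finditer(line)]
```

Each position can then be used to look at whatever character follows the word.
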
Anyway, we're skimming through and we find our word. We go: okay, grab the token, that's our variable name, and we'll need it to search for the variable in a second. Then we grab whatever's after it and check if this is an open bracket. Just here, for example, with the actual code again (let's zoom out a bit), we're looking at this keyword here, and it'll grab the token next to it, which is going to be the bracket token. Now, what you can actually do and get away with is have white space between these things. I think PB generally doesn't like the white space being there, but a lot of languages are pre-tokenized and then translated, which means the white space can just be thrown away, so it doesn't matter if there's white space between things. Which is kind of weird. I don't think PB can really do that; well, I'm not sure off the top of my head.

Anyway, we see this keyword here, this strings keyword, and since the next token is an unknown stop character (it's not a hash or one of the other suffixes), it doesn't know what to do, so it's presenting as an integer. That's why it's failing. What I'll do here sounds pretty crazy: just jump to the check for this variable. That's how lazy I am. And there we go, we've fixed one little shortcoming. strings has been picked up as being the array, because it's seeing that the next character is an open bracket, or at least that's the assumption it's going to make. It's a pretty good assumption. Then we look for it in the current scope, and it's found it, so it's happy to promote it. Here we've got our strings array, and we've found two occurrences of it. That's what we wanted. Happy days.

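The best-guess step can be sketched in Python as a look at the first non-blank character after the word. This is an illustration only: the type labels are invented, a word followed by a bracket is simply assumed to be an array (as in the video), and a suffix followed by a bracket is not handled.

```python
def classify_word(line, end):
    """Guess what the word ending at index `end` is from the next character."""
    pos = end
    while pos < len(line) and line[pos] in " \t":
        pos += 1                         # white space between tokens is ignored
    following = line[pos] if pos < len(line) else ""
    if following == "(":
        return "array"                   # word( ... is assumed to be an array
    if following == "%":
        return "integer"
    if following == "#":
        return "float"
    if following == "$":
        return "string"
    return "integer"                     # no suffix: fall back to integer
```

Because the white space is skipped first, `strings (lp)` and `strings(lp)` are treated the same way.
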
Isn't that terrific? So it's pretty much just bumbling logic that falls through and goes: okay, this is our best guess for what this keyword is. If we were building a full-on compiler, as I was saying before, then as you found each keyword you'd dissect what the keyword actually is. Is it a structural statement, like a For statement or a Goto, or a compare? They all have a certain pattern. After the For keyword we have a declaration, then an assignment and an expression, then a To and another expression. That's the pattern it has. An If statement is just an If and an expression.

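For comparison with the best-guess approach, here is a small Python sketch of matching the For pattern as a whole statement. A real compiler would parse the two expressions properly; this regular expression just captures them as text, ignores any Step clause, and the names used are assumptions.

```python
import re

# For <variable> = <expression> To <expression>; a Step clause is not handled here.
FOR_STATEMENT = re.compile(
    r"^\s*for\s+(?P<var>[A-Za-z_][A-Za-z0-9_]*[%#$]?)\s*=\s*"
    r"(?P<start>.+?)\s+to\s+(?P<end>.+?)\s*$",
    re.IGNORECASE,
)


def parse_for(line):
    """Return the loop variable and both expressions, or None if no match."""
    match = FOR_STATEMENT.match(line)
    return match.groupdict() if match else None
```

Knowing which part of the statement a word sits in is what would let a fuller translator reason about its type.
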
What we'd have to do is break the code down. If we did that, we could at least be type aware. This is not type aware. What I mean by that is it's not aware if you go: hey, here's my string, and I add an integer value to it. PB is going to go, what the hell are you doing? It won't have a clue. But if we wrote this as a true compiler, we could run through here and promote this to be the correct array (we wouldn't even need to do that, actually). Then, once we've worked through the expression, we'd end up with a result like string plus integer, and we'd have a bit of logic there going: okay, since it's a string mixed with integers, we're going to want to treat the string like a number, so take the value of the string. You could do that stuff in a translator, and life would go on. But in our version here we can't do that, because I don't know what this is, or what the parameters in here are. I have no idea. I'm not matching types, I'm not matching the patterns of expressions, nothing.

So we can get away with a very preliminary, comparative sort of replacement, but if your input code makes no sense, then your output code won't make any sense either. And in places the output code won't make sense in PB anyway. I noticed there's a thing in Blitz where you can do type conversions like a equals a literal string, and have it converted to an integer value for you. It just saves calling a function, Val for example. It's a shortcut they've added.

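If the translator did know the types on both sides of an assignment, making that conversion explicit would be a small step. The Python sketch below assumes the types have already been worked out elsewhere and only shows the wrapping; `convert_assignment` and the type labels are hypothetical.

```python
def convert_assignment(target_type, expression, expression_type):
    """Wrap a string expression in Val() when it is assigned to a number."""
    if target_type in ("integer", "float") and expression_type == "string":
        return f"Val({expression})"
    return expression
```

For an integer target and the literal `"123"`, this produces `Val("123")`; anything else passes through unchanged.
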
Anyway, this is the basic logic we're picking through here. It's just a Select Case: if the character after our current word is this, then it's most likely this thing. I have an example for each case, e.g. this word followed by a bracket. For the percent, it would be this word followed by a percent, then hash, dollar sign, etc.

There should be another case here that handles the trapping of fields within types, but that's not something it even looks at at the moment. We'll try and get this to work pretty well, and then we'll worry about stepping up to types, and making sure it can pick up the Blitz commands correctly and output them in a wrapped function format. Then we'll try and make some simple programs in Blitz and get them to run. Isn't that fun? Anyway, thanks for hanging around and listening to me ramble on, and I'll see you next time. Bye.