Guest User

Untitled

a guest
Sep 19th, 2024
23
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 14.62 KB | None | 0 0
  1. (*
  2. Parzek Parser Combinator Literate Program Source
  3. By Chubak Bidpaa
  4. *)
  5.  
  6.  
  7. (*p
  8. % Preamble
  9. \DocumentMetadata{}
  10. \documentclass[a4paper]{article}
  11. \usepackage{ocamlweb}
  12. \usepackage{amsmath,amsthm,amssymb}
  13. \usepackage{filecontents}
  14. \usepackage{imakeidx}
  15. \usepackage{epigraph}
  16. \usepackage[backend=biber]{biblatex}
  17. *)
  18.  
  19. (*
  20. % Settings and preliminary commands
  21.  
  22. \makeindex
  23. \addbibresource{references.bib}
  24. *)
  25.  
  26. (*
  27. % Macros and commands
  28. *)
  29.  
  30. (*
  31. % Begin document
  32.  
  33. \begin{document}
  34.  
  35. *)
  36.  
  37. (*
  38. % Introduction
  39. The program before you is \parzek, a \index{Parser combinator@\textit{Parser combinator}} library for the \ocaml\ language. This is not a textbook, nor is it a tutorial or an academic paper, this is merely a \index{Literate program@\textit{Literate program}}.
  40.  
  41.  
  42. There exist many different implementations of the basic \parsec: some use basic functions, whereas some use \index{Monadic formulation}\parencite{swiestra2001combinator}. In this \index{Literate program}, we present \parzek, a \parsez\ written in \ocaml; which is based more on the former, rather than the latter, as discussed in \parencite{fokker1995functional} --- and several other literature, which shall be discussed later.
  43. *)
  44.  
  45. (*
  46. % Parsec history
  47.  
  48. \section{A brief, painless history of \index{Parser combinator@Parser combinators}}
  49. \label{sec:parsec_history}
  50.  
  51. \epigraph{A parser may be viewed as a function from a string of symbols to a result value.}{Graham Hutton}
  52.  
  53. There's a clear distinction between \index{parse@\textit{parsing}} and \index{lexical scan@\textit{lexically scanning}}, first pointed out by \parencite{chomsky1959certain}. A \index{Formal language@\textit{Formal language}} is defined by its \index{grammar@\textit{grammar}}, which generates that language. These grammars are grouped in four tiers of contraint:
  54.  
  55. \noindent
  56. \formallanglevel{Zero}{Context-free};\\
  57. \formallanglevel{One}{Context-sensitive};\\
  58. \formallanglevel{Two}{Context-free};\\
  59. \formallanglevel{Three}{Regular};\\
  60.  
  61. \index{lexical analysis} operates on a \index{Regular grammar}-level, whereas \index{Syntactic analysis} operates on a \index{Context-free grammar}.
  62.  
  63. Construction of \index{parser@\textit{parsers}} which attempt \index{Syntactic analysis} date back far as implementation of \index{FORTRAN}\parencite{backus1957fortran}. With the successor to FORTRAN, the \index{ALGOL 60} report introduced \index{BNF@\texttt{BNF}} notation, or the \textit{Backus-Naur form} --- which \index{parser generator@parser generators} such as \index{Yacc}\parencite{johnson1975yacc} leveraged to denote their grammars. \index{Regular Expression@\textit{Regular Expressions}} were used by tools such as \index{Lex}\parencite{lesk1975lex} to denote the \index{lexeme@lexemes}.
  64.  
  65. With the introduction of \index{Functional language@Functional languages} such as \index{ML}\parencite{milner1982ml} and \index{Miranda}\parencite{turner1986overview} --- which in turn were influenced by the seminal \index{ISWIM}\parencite{landin1966next}; methods of parsing using these \index{High-level language@\textit{High-level languages}} emerged which leveraged the fluency of \index{Functional language@Functional languages and their adjacency to \index{Combinatory logic} to construct \index{language recognizer@language recognizers} which made obsolete the need for pre-generation.
  66.  
  67. These methods existed in the \ae{}ther of \index{functional programming} for nearly a decade, being partially discussed in \parencite{wadler1985replace} --- however, it was \parencite{hutton1990parsing} which brought \index{Parser combinator@Parser combinators} into the forefront.
  68.  
  69. These \index{combinator@combinator-based} parsers had historically faced two main disadvantages\parencite{swierstra2001combinator}:
  70.  
  71. \begin{enumerate}
  72. \item As the \index{grammar@grammars} got large, both the \index{Run-time complexity} and \index{Space-time complexity} of the program increased substantially;
  73. \item These parsers had poor facilities for dealing with \index{user error@user errors};
  74. \end{enumerate}
  75.  
  76. Both these shortcomings were dealt with over-time. \parencite{leijen2001fast} introduced a much more performant \index{Parser combinator} and \index{swierstra1996deterministic} introduced methods of dealing with \index{user error@user errors}.
  77.  
  78. The aim of \parzek\ is to \textit{Stand on the shoulder} of the proverbial giants. \ref{appendix:some_parsecs} introduces several \index{Parser combinator@Parser combinators} written over the years by people like me, and some extra historical notes are provided.
  79. *)
  80.  
  81. (*
  82. % Parsec explaination
  83.  
  84. \section{Let's explain just \textbf{what} a \index{Parser combinator} is\ldots?}
  85. \label{sec:explaining_parsecs}
  86.  
  87. \epigraph{Combinatory logic is a kind of 'variable-free' logic which aims to capture the essence of mathematical functions without the need for bound variables.}{Haskell Curry}
  88.  
  89.  
  90.  
  91.  
  92. *)
  93.  
  94.  
  95. (*
  96. % References
  97.  
  98. \begin{filecontents}{references.bib}
  99. @book{bimbo2011combinatory,
  100. title={Combinatory logic: Pure, applied and typed},
  101. author={Bimb{\'o}, Katalin},
  102. year={2011},
  103. publisher={CRC Press}
  104. }
  105. @article{landin1966next,
  106. title={The next 700 programming languages},
  107. author={Landin, Peter J},
  108. journal={Communications of the ACM},
  109. volume={9},
  110. number={3},
  111. pages={157--166},
  112. year={1966},
  113. publisher={ACM New York, NY, USA}
  114. }
  115. @article{turner1986overview,
  116. title={An overview of Miranda},
  117. author={Turner, David},
  118. journal={ACM Sigplan Notices},
  119. volume={21},
  120. number={12},
  121. pages={158--166},
  122. year={1986},
  123. publisher={ACM New York, NY, USA}
  124. }
  125. @article{milner1982ml,
  126. title={How ML evlolved},
  127. author={Milner, Robin},
  128. journal={ML/Hope/LCF Newsletter},
  129. volume={1},
  130. number={1},
  131. pages={25--34},
  132. year={1982}
  133. }
  134. @book{lesk1975lex,
  135. title={Lex: A lexical analyzer generator},
  136. author={Lesk, Michael E and Schmidt, Eric},
  137. volume={39},
  138. year={1975},
  139. publisher={Bell Laboratories Murray Hill, NJ}
  140. }
  141. @book{johnson1975yacc,
  142. title={Yacc: Yet another compiler-compiler},
  143. author={Johnson, Stephen C and others},
  144. volume={32},
  145. year={1975},
  146. publisher={Bell Laboratories Murray Hill, NJ}
  147. }
  148. @book{mitchell2003concepts,
  149. title={Concepts in programming languages},
  150. author={Mitchell, John C},
  151. year={2003},
  152. publisher={Cambridge University Press}
  153. }
  154. @inproceedings{backus1957fortran,
  155. title={The FORTRAN automatic coding system},
  156. author={Backus, John W and Beeber, Robert J and Best, Sheldon and Goldberg, Richard and Haibt, Lois M and Herrick, Harlan L and Nelson, Robert A and Sayre, David and Sheridan, Peter B and Stern, Harold and others},
  157. booktitle={Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability},
  158. pages={188--198},
  159. year={1957}
  160. }
  161. @article{backus1960report,
  162. title={Report on the algorithmic language ALGOL 60},
  163. author={Backus, John W and Bauer, Friedrich L and Green, Julien and Katz, Charles and McCarthy, John and Perlis, Alan J and Rutishauser, Heinz and Samelson, Klaus and Vauquois, Bernard and Wegstein, Joseph Henry and others},
  164. journal={Communications of the ACM},
  165. volume={3},
  166. number={5},
  167. pages={299--311},
  168. year={1960},
  169. publisher={ACM New York, NY, USA}
  170. }
  171. @article{grune2008introduction,
  172. title={Introduction to Parsing},
  173. author={Grune, Dick and Jacobs, Ceriel JH and Grune, Dick and Jacobs, Ceriel JH},
  174. journal={Parsing Techniques: A Practical Guide},
  175. pages={61--102},
  176. year={2008},
  177. publisher={Springer}
  178. }
  179. @inproceedings{king1960functions,
  180. title={Functions required of a translation system},
  181. author={King, Gilbert},
  182. booktitle={Proceedings of the National Symposium on Machine Translation},
  183. year={1960}
  184. }
  185. @article{chomsky1959certain,
  186. title={On certain formal properties of grammars},
  187. author={Chomsky, Noam},
  188. journal={Information and control},
  189. volume={2},
  190. number={2},
  191. pages={137--167},
  192. year={1959},
  193. publisher={Elsevier}
  194. }
  195. @inproceedings{wadler1985replace,
  196. title={How to replace failure by a list of successes a method for exception handling, backtracking, and pattern matching in lazy functional languages},
  197. author={Wadler, Philip},
  198. booktitle={Conference on Functional Programming Languages and Computer Architecture},
  199. pages={113--128},
  200. year={1985},
  201. organization={Springer}
  202. }
  203. @inproceedings{hutton1990parsing,
  204. title={Parsing using combinators},
  205. author={Hutton, Graham},
  206. booktitle={Functional Programming: Proceedings of the 1989 Glasgow Workshop 21--23 August 1989, Fraserburgh, Scotland},
  207. pages={353--370},
  208. year={1990},
  209. organization={Springer}
  210. }
  211. @misc{hutton1996monadic,
  212. title={Monadic parser combinators},
  213. author={Hutton, Graham and Meijer, Erik},
  214. year={1996}
  215. }
  216. @inproceedings{izmaylova2016practical,
  217. title={Practical, general parser combinators},
  218. author={Izmaylova, Anastasia and Afroozeh, Ali and Storm, Tijs van der},
  219. booktitle={Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program manipulation},
  220. pages={1--12},
  221. year={2016}
  222. }
  223. @incollection{swierstra2008combinator,
  224. title={Combinator parsing: A short tutorial},
  225. author={Swierstra, S Doaitse},
  226. booktitle={International LerNet ALFA Summer School on Language Engineering and Rigorous Software Development},
  227. pages={252--300},
  228. year={2008},
  229. publisher={Springer}
  230. }
  231. @inproceedings{koopman1998efficient,
  232. title={Efficient combinator parsers},
  233. author={Koopman, Pieter and Plasmeijer, Rinus},
  234. booktitle={Symposium on Implementation and Application of Functional Languages},
  235. pages={120--136},
  236. year={1998},
  237. organization={Springer}
  238. }
  239. @article{leijen2001fast,
  240. title={Parsec, a fast combinator parser},
  241. author={Leijen, Daan},
  242. year={2001}
  243. }
  244. @inproceedings{kurvs2016optimizing,
  245. title={Optimizing parser combinators},
  246. author={Kur{\v{s}}, Jan and Vran{\`y}, Jan and Ghafari, Mohammad and Lungu, Mircea and Nierstrasz, Oscar},
  247. booktitle={Proceedings of the 11th edition of the International Workshop on Smalltalk Technologies},
  248. pages={1--13},
  249. year={2016}
  250. }
  251. @article{leijen2001direct,
  252. title={Parsec: Direct style monadic parser combinators for the real world},
  253. author={Leijen, Daan and Meijer, Erik},
  254. year={2001}
  255. }
  256. @inproceedings{koopman2019new,
  257. title={A new view on parser combinators},
  258. author={Koopman, Pieter and Plasmeijer, Rinus},
  259. booktitle={Proceedings of the 31st Symposium on Implementation and Application of Functional Languages},
  260. pages={1--11},
  261. year={2019}
  262. }
  263. @article{jonnalagedda2014staged,
  264. title={Staged parser combinators for efficient data processing},
  265. author={Jonnalagedda, Manohar and Coppey, Thierry and Stucki, Sandro and Rompf, Tiark and Odersky, Martin},
  266. journal={Acm Sigplan Notices},
  267. volume={49},
  268. number={10},
  269. pages={637--653},
  270. year={2014},
  271. publisher={ACM New York, NY, USA}
  272. }
  273. @article{kurvs2018efficient,
  274. title={Efficient parsing with parser combinators},
  275. author={Kur{\v{s}}, Jan and Vran{\`y}, Jan and Ghafari, Mohammad and Lungu, Mircea and Nierstrasz, Oscar},
  276. journal={Science of computer programming},
  277. volume={161},
  278. pages={57--88},
  279. year={2018},
  280. publisher={Elsevier}
  281. }
  282. @article{willis2020staged,
  283. title={Staged selective parser combinators},
  284. author={Willis, Jamie and Wu, Nicolas and Pickering, Matthew},
  285. journal={Proceedings of the ACM on Programming Languages},
  286. volume={4},
  287. number={ICFP},
  288. pages={1--30},
  289. year={2020},
  290. publisher={ACM New York, NY, USA}
  291. }
  292. @article{hill1996combinators,
  293. title={Combinators for parsing expressions},
  294. author={Hill, Steve},
  295. journal={Journal of Functional Programming},
  296. volume={6},
  297. number={3},
  298. pages={445--464},
  299. year={1996},
  300. publisher={Cambridge University Press}
  301. }
  302. @incollection{swierstra1996deterministic,
  303. title={Deterministic, error-correcting combinator parsers},
  304. author={Swierstra, S Doaitse and Duponcheel, Luc},
  305. booktitle={International School on Advanced Functional Programming},
  306. pages={184--207},
  307. year={1996},
  308. publisher={Springer}
  309. }
  310. @inproceedings{beguet2014accelerating,
  311. title={Accelerating parser combinators with macros},
  312. author={B{\'e}guet, Eric and Jonnalagedda, Manohar},
  313. booktitle={Proceedings of the Fifth Annual Scala Workshop},
  314. pages={7--17},
  315. year={2014}
  316. }
  317. @inproceedings{van2018gll,
  318. title={GLL parsing with flexible combinators},
  319. author={van Binsbergen, L Thomas and Scott, Elizabeth and Johnstone, Adrian},
  320. booktitle={Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering},
  321. pages={16--28},
  322. year={2018}
  323. }
  324. @article{leijen2001parsec,
  325. title={Parsec: A practical parser library},
  326. author={Leijen, Daan and Meijer, Erik},
  327. journal={Electronic Notes in Theoretical Computer Science},
  328. volume={41},
  329. number={1},
  330. pages={1--20},
  331. year={2001}
  332. }
  333. @inproceedings{mazanek2008graph,
  334. title={Graph parser combinators},
  335. author={Mazanek, Steffen and Minas, Mark},
  336. booktitle={Implementation and Application of Functional Languages: 19th International Workshop, IFL 2007, Freiburg, Germany, September 27-29, 2007. Revised Selected Papers 19},
  337. pages={1--18},
  338. year={2008},
  339. organization={Springer}
  340. }
  341. @article{swierstra2001combinator,
  342. title={Combinator parsers: From toys to tools},
  343. author={Swierstra, S Doaitse},
  344. journal={Electronic Notes in Theoretical Computer Science},
  345. volume={41},
  346. number={1},
  347. pages={38--59},
  348. year={2001},
  349. publisher={Elsevier}
  350. }
  351. @inproceedings{fokker1995functional,
  352. title={Functional parsers},
  353. author={Fokker, Jeroen},
  354. booktitle={Advanced Functional Programming: First International Spring School on Advanced Functional Programming Techniques B{\aa}stad, Sweden, May 24--30, 1995 Tutorial Text 1},
  355. pages={1--23},
  356. year={1995},
  357. organization={Springer}
  358. }
  359. @techreport{jones1993composing,
  360. title={Composing monads},
  361. author={Jones, Mark P and Duponcheel, Luc},
  362. year={1993},
  363. institution={Technical Report YALEU/DCS/RR-1004, Department of Computer Science. Yale~…}
  364. }
  365. @inproceedings{willis2018garnishing,
  366. title={Garnishing parsec with parsley},
  367. author={Willis, Jamie and Wu, Nicolas},
  368. booktitle={Proceedings of the 9th ACM SIGPLAN International Symposium on Scala},
  369. pages={24--34},
  370. year={2018}
  371. }
  372. @inproceedings{hudak1984combinator,
  373. title={A combinator-based compiler for a functional language},
  374. author={Hudak, Paul and Kranz, David},
  375. booktitle={Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages},
  376. pages={122--132},
  377. year={1984}
  378. }
  379. @inproceedings{wand1983loops,
  380. title={Loops in combinator-based compilers},
  381. author={Wand, Mitchell},
  382. booktitle={Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages},
  383. pages={190--196},
  384. year={1983}
  385. }
  386.  
  387. \end{filecontents}
  388.  
  389. *)
  390.  
Add Comment
Please, Sign In to add comment