Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- (*
- Parzek Parser Combinator Literate Program Source
- By Chubak Bidpaa
- *)
- (*p
- % Preamble
- \DocumentMetadata{}
- \documentclass[a4paper]{article}
- \usepackage{ocamlweb}
- \usepackage{amsmath,amsthm,amssymb}
- \usepackage{filecontents}
- \usepackage{imakeidx}
- \usepackage{epigraph}
- \usepackage[backend=biber]{biblatex}
- *)
- (*
- % Settings and preliminary commands
- \makeindex
- \addbibresource{references.bib}
- *)
- (*
- % Macros and commands
- *)
- (*
- % Begin document
- \begin{document}
- *)
- (*
- % Introduction
- The program before you is \parzek, a \index{Parser combinator@\textit{Parser combinator}} library for the \ocaml\ language. This is not a textbook, nor is it a tutorial or an academic paper, this is merely a \index{Literate program@\textit{Literate program}}.
- There exist many different implementations of the basic \parsec: some use basic functions, whereas some use \index{Monadic formulation}\parencite{swiestra2001combinator}. In this \index{Literate program}, we present \parzek, a \parsez\ written in \ocaml; which is based more on the former, rather than the latter, as discussed in \parencite{fokker1995functional} --- and several other literature, which shall be discussed later.
- *)
- (*
- % Parsec history
- \section{A brief, painless history of \index{Parser combinator@Parser combinators}}
- \label{sec:parsec_history}
- \epigraph{A parser may be viewed as a function from a string of symbols to a result value.}{Graham Hutton}
- There's a clear distinction between \index{parse@\textit{parsing}} and \index{lexical scan@\textit{lexically scanning}}, first pointed out by \parencite{chomsky1959certain}. A \index{Formal language@\textit{Formal language}} is defined by its \index{grammar@\textit{grammar}}, which generates that language. These grammars are grouped in four tiers of contraint:
- \noindent
- \formallanglevel{Zero}{Context-free};\\
- \formallanglevel{One}{Context-sensitive};\\
- \formallanglevel{Two}{Context-free};\\
- \formallanglevel{Three}{Regular};\\
- \index{lexical analysis} operates on a \index{Regular grammar}-level, whereas \index{Syntactic analysis} operates on a \index{Context-free grammar}.
- Construction of \index{parser@\textit{parsers}} which attempt \index{Syntactic analysis} date back far as implementation of \index{FORTRAN}\parencite{backus1957fortran}. With the successor to FORTRAN, the \index{ALGOL 60} report introduced \index{BNF@\texttt{BNF}} notation, or the \textit{Backus-Naur form} --- which \index{parser generator@parser generators} such as \index{Yacc}\parencite{johnson1975yacc} leveraged to denote their grammars. \index{Regular Expression@\textit{Regular Expressions}} were used by tools such as \index{Lex}\parencite{lesk1975lex} to denote the \index{lexeme@lexemes}.
- With the introduction of \index{Functional language@Functional languages} such as \index{ML}\parencite{milner1982ml} and \index{Miranda}\parencite{turner1986overview} --- which in turn were influenced by the seminal \index{ISWIM}\parencite{landin1966next}; methods of parsing using these \index{High-level language@\textit{High-level languages}} emerged which leveraged the fluency of \index{Functional language@Functional languages and their adjacency to \index{Combinatory logic} to construct \index{language recognizer@language recognizers} which made obsolete the need for pre-generation.
- These methods existed in the \ae{}ther of \index{functional programming} for nearly a decade, being partially discussed in \parencite{wadler1985replace} --- however, it was \parencite{hutton1990parsing} which brought \index{Parser combinator@Parser combinators} into the forefront.
- These \index{combinator@combinator-based} parsers had historically faced two main disadvantages\parencite{swierstra2001combinator}:
- \begin{enumerate}
- \item As the \index{grammar@grammars} got large, both the \index{Run-time complexity} and \index{Space-time complexity} of the program increased substantially;
- \item These parsers had poor facilities for dealing with \index{user error@user errors};
- \end{enumerate}
- Both these shortcomings were dealt with over-time. \parencite{leijen2001fast} introduced a much more performant \index{Parser combinator} and \index{swierstra1996deterministic} introduced methods of dealing with \index{user error@user errors}.
- The aim of \parzek\ is to \textit{Stand on the shoulder} of the proverbial giants. \ref{appendix:some_parsecs} introduces several \index{Parser combinator@Parser combinators} written over the years by people like me, and some extra historical notes are provided.
- *)
- (*
- % Parsec explaination
- \section{Let's explain just \textbf{what} a \index{Parser combinator} is\ldots?}
- \label{sec:explaining_parsecs}
- \epigraph{Combinatory logic is a kind of 'variable-free' logic which aims to capture the essence of mathematical functions without the need for bound variables.}{Haskell Curry}
- *)
- (*
- % References
- \begin{filecontents}{references.bib}
- @book{bimbo2011combinatory,
- title={Combinatory logic: Pure, applied and typed},
- author={Bimb{\'o}, Katalin},
- year={2011},
- publisher={CRC Press}
- }
- @article{landin1966next,
- title={The next 700 programming languages},
- author={Landin, Peter J},
- journal={Communications of the ACM},
- volume={9},
- number={3},
- pages={157--166},
- year={1966},
- publisher={ACM New York, NY, USA}
- }
- @article{turner1986overview,
- title={An overview of Miranda},
- author={Turner, David},
- journal={ACM Sigplan Notices},
- volume={21},
- number={12},
- pages={158--166},
- year={1986},
- publisher={ACM New York, NY, USA}
- }
- @article{milner1982ml,
- title={How ML evlolved},
- author={Milner, Robin},
- journal={ML/Hope/LCF Newsletter},
- volume={1},
- number={1},
- pages={25--34},
- year={1982}
- }
- @book{lesk1975lex,
- title={Lex: A lexical analyzer generator},
- author={Lesk, Michael E and Schmidt, Eric},
- volume={39},
- year={1975},
- publisher={Bell Laboratories Murray Hill, NJ}
- }
- @book{johnson1975yacc,
- title={Yacc: Yet another compiler-compiler},
- author={Johnson, Stephen C and others},
- volume={32},
- year={1975},
- publisher={Bell Laboratories Murray Hill, NJ}
- }
- @book{mitchell2003concepts,
- title={Concepts in programming languages},
- author={Mitchell, John C},
- year={2003},
- publisher={Cambridge University Press}
- }
- @inproceedings{backus1957fortran,
- title={The FORTRAN automatic coding system},
- author={Backus, John W and Beeber, Robert J and Best, Sheldon and Goldberg, Richard and Haibt, Lois M and Herrick, Harlan L and Nelson, Robert A and Sayre, David and Sheridan, Peter B and Stern, Harold and others},
- booktitle={Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability},
- pages={188--198},
- year={1957}
- }
- @article{backus1960report,
- title={Report on the algorithmic language ALGOL 60},
- author={Backus, John W and Bauer, Friedrich L and Green, Julien and Katz, Charles and McCarthy, John and Perlis, Alan J and Rutishauser, Heinz and Samelson, Klaus and Vauquois, Bernard and Wegstein, Joseph Henry and others},
- journal={Communications of the ACM},
- volume={3},
- number={5},
- pages={299--311},
- year={1960},
- publisher={ACM New York, NY, USA}
- }
- @article{grune2008introduction,
- title={Introduction to Parsing},
- author={Grune, Dick and Jacobs, Ceriel JH and Grune, Dick and Jacobs, Ceriel JH},
- journal={Parsing Techniques: A Practical Guide},
- pages={61--102},
- year={2008},
- publisher={Springer}
- }
- @inproceedings{king1960functions,
- title={Functions required of a translation system},
- author={King, Gilbert},
- booktitle={Proceedings of the National Symposium on Machine Translation},
- year={1960}
- }
- @article{chomsky1959certain,
- title={On certain formal properties of grammars},
- author={Chomsky, Noam},
- journal={Information and control},
- volume={2},
- number={2},
- pages={137--167},
- year={1959},
- publisher={Elsevier}
- }
- @inproceedings{wadler1985replace,
- title={How to replace failure by a list of successes a method for exception handling, backtracking, and pattern matching in lazy functional languages},
- author={Wadler, Philip},
- booktitle={Conference on Functional Programming Languages and Computer Architecture},
- pages={113--128},
- year={1985},
- organization={Springer}
- }
- @inproceedings{hutton1990parsing,
- title={Parsing using combinators},
- author={Hutton, Graham},
- booktitle={Functional Programming: Proceedings of the 1989 Glasgow Workshop 21--23 August 1989, Fraserburgh, Scotland},
- pages={353--370},
- year={1990},
- organization={Springer}
- }
- @misc{hutton1996monadic,
- title={Monadic parser combinators},
- author={Hutton, Graham and Meijer, Erik},
- year={1996}
- }
- @inproceedings{izmaylova2016practical,
- title={Practical, general parser combinators},
- author={Izmaylova, Anastasia and Afroozeh, Ali and Storm, Tijs van der},
- booktitle={Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program manipulation},
- pages={1--12},
- year={2016}
- }
- @incollection{swierstra2008combinator,
- title={Combinator parsing: A short tutorial},
- author={Swierstra, S Doaitse},
- booktitle={International LerNet ALFA Summer School on Language Engineering and Rigorous Software Development},
- pages={252--300},
- year={2008},
- publisher={Springer}
- }
- @inproceedings{koopman1998efficient,
- title={Efficient combinator parsers},
- author={Koopman, Pieter and Plasmeijer, Rinus},
- booktitle={Symposium on Implementation and Application of Functional Languages},
- pages={120--136},
- year={1998},
- organization={Springer}
- }
- @article{leijen2001fast,
- title={Parsec, a fast combinator parser},
- author={Leijen, Daan},
- year={2001}
- }
- @inproceedings{kurvs2016optimizing,
- title={Optimizing parser combinators},
- author={Kur{\v{s}}, Jan and Vran{\`y}, Jan and Ghafari, Mohammad and Lungu, Mircea and Nierstrasz, Oscar},
- booktitle={Proceedings of the 11th edition of the International Workshop on Smalltalk Technologies},
- pages={1--13},
- year={2016}
- }
- @article{leijen2001direct,
- title={Parsec: Direct style monadic parser combinators for the real world},
- author={Leijen, Daan and Meijer, Erik},
- year={2001}
- }
- @inproceedings{koopman2019new,
- title={A new view on parser combinators},
- author={Koopman, Pieter and Plasmeijer, Rinus},
- booktitle={Proceedings of the 31st Symposium on Implementation and Application of Functional Languages},
- pages={1--11},
- year={2019}
- }
- @article{jonnalagedda2014staged,
- title={Staged parser combinators for efficient data processing},
- author={Jonnalagedda, Manohar and Coppey, Thierry and Stucki, Sandro and Rompf, Tiark and Odersky, Martin},
- journal={Acm Sigplan Notices},
- volume={49},
- number={10},
- pages={637--653},
- year={2014},
- publisher={ACM New York, NY, USA}
- }
- @article{kurvs2018efficient,
- title={Efficient parsing with parser combinators},
- author={Kur{\v{s}}, Jan and Vran{\`y}, Jan and Ghafari, Mohammad and Lungu, Mircea and Nierstrasz, Oscar},
- journal={Science of computer programming},
- volume={161},
- pages={57--88},
- year={2018},
- publisher={Elsevier}
- }
- @article{willis2020staged,
- title={Staged selective parser combinators},
- author={Willis, Jamie and Wu, Nicolas and Pickering, Matthew},
- journal={Proceedings of the ACM on Programming Languages},
- volume={4},
- number={ICFP},
- pages={1--30},
- year={2020},
- publisher={ACM New York, NY, USA}
- }
- @article{hill1996combinators,
- title={Combinators for parsing expressions},
- author={Hill, Steve},
- journal={Journal of Functional Programming},
- volume={6},
- number={3},
- pages={445--464},
- year={1996},
- publisher={Cambridge University Press}
- }
- @incollection{swierstra1996deterministic,
- title={Deterministic, error-correcting combinator parsers},
- author={Swierstra, S Doaitse and Duponcheel, Luc},
- booktitle={International School on Advanced Functional Programming},
- pages={184--207},
- year={1996},
- publisher={Springer}
- }
- @inproceedings{beguet2014accelerating,
- title={Accelerating parser combinators with macros},
- author={B{\'e}guet, Eric and Jonnalagedda, Manohar},
- booktitle={Proceedings of the Fifth Annual Scala Workshop},
- pages={7--17},
- year={2014}
- }
- @inproceedings{van2018gll,
- title={GLL parsing with flexible combinators},
- author={van Binsbergen, L Thomas and Scott, Elizabeth and Johnstone, Adrian},
- booktitle={Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering},
- pages={16--28},
- year={2018}
- }
- @article{leijen2001parsec,
- title={Parsec: A practical parser library},
- author={Leijen, Daan and Meijer, Erik},
- journal={Electronic Notes in Theoretical Computer Science},
- volume={41},
- number={1},
- pages={1--20},
- year={2001}
- }
- @inproceedings{mazanek2008graph,
- title={Graph parser combinators},
- author={Mazanek, Steffen and Minas, Mark},
- booktitle={Implementation and Application of Functional Languages: 19th International Workshop, IFL 2007, Freiburg, Germany, September 27-29, 2007. Revised Selected Papers 19},
- pages={1--18},
- year={2008},
- organization={Springer}
- }
- @article{swierstra2001combinator,
- title={Combinator parsers: From toys to tools},
- author={Swierstra, S Doaitse},
- journal={Electronic Notes in Theoretical Computer Science},
- volume={41},
- number={1},
- pages={38--59},
- year={2001},
- publisher={Elsevier}
- }
- @inproceedings{fokker1995functional,
- title={Functional parsers},
- author={Fokker, Jeroen},
- booktitle={Advanced Functional Programming: First International Spring School on Advanced Functional Programming Techniques B{\aa}stad, Sweden, May 24--30, 1995 Tutorial Text 1},
- pages={1--23},
- year={1995},
- organization={Springer}
- }
- @techreport{jones1993composing,
- title={Composing monads},
- author={Jones, Mark P and Duponcheel, Luc},
- year={1993},
- institution={Technical Report YALEU/DCS/RR-1004, Department of Computer Science. Yale~…}
- }
- @inproceedings{willis2018garnishing,
- title={Garnishing parsec with parsley},
- author={Willis, Jamie and Wu, Nicolas},
- booktitle={Proceedings of the 9th ACM SIGPLAN International Symposium on Scala},
- pages={24--34},
- year={2018}
- }
- @inproceedings{hudak1984combinator,
- title={A combinator-based compiler for a functional language},
- author={Hudak, Paul and Kranz, David},
- booktitle={Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages},
- pages={122--132},
- year={1984}
- }
- @inproceedings{wand1983loops,
- title={Loops in combinator-based compilers},
- author={Wand, Mitchell},
- booktitle={Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages},
- pages={190--196},
- year={1983}
- }
- \end{filecontents}
- *)
Add Comment
Please, Sign In to add comment