Untitled

(*
**** Hacheme ****
*)

(*p
\documentclass[a4paper]{report}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{ocamlweb}
\usepackage{makedix,glossaries,epigraph}
\usepackage{filecontents}
\usepackage{simpletrs,formal-grammar}
\usepackage{expl3}
\usepackage[backend=biber,style=authoryear]{biblatex}

\addbibresource{references.bib}
\includeonly{hacheme-command-definitions}
\includeonly{hacheme-glossary-definitions}
\includeonly{hacheme-index-definitions}
\makeindex
\makeglossaries

\begin{document}

The booklet before you presents the source code for \hacheme, an implementation of the \hachemestd, in a literate manner. This is not a textbook, nor is it a tutorial or a work of academia. This is merely a \litprogram. This also allows us to use \formeth{}s to, albeit partially, \verif\ this program --- and furthermore, delve into the history of \scheme\ and \lisp.


\section*{Map to the program}\label{sec:prog_map}


\section{A brief history of \lisp}\label{sec:lisp_hist}

\epigraph{"Ut therefore seems to be desirable to attempt to construct an artificial language which a computer can be programmed to use on problems and self~-reference. It should correspond to English in the sense that short English statements about the given subject matter should have short correspondents in the language and so should short arguments or consjectural arguments."}{J. McCarthy}

In the mid-1950s, \jmccarthy, an assisstant professor at \dartmouth, was presented the language \ipl; to which he showed no excitement due to its low level and rough shape\parencite{stoyan1984early}. In a surviving archival note, he prescribes \textit{the problem of programming a computer} into two bullet-points; the first being that "there does not exist an adequate language for human beings to describes procedures to each other", in a language which he felt like must have been \textit{explicit, univeral and concise} -- and the second issue he found was that computers at the time needed extra "engineering consideration" to be able t ounderstand such language\parencite{stoyan1984early}.

In his visit to \ibm, he was impressed by the success at engineering of the \fortran\ language by a team led by \jbackus. During September and October of that year, \jmccarthy\ worte some memos about the new language he was brewing in his mind\parencite{stoyan1979lisp}. McCarthy claimed he "did not understand Church's paper" so \lambdacalc\ did not play much of a role in early construction of \lisp\footnote{However, as we see later, \lambdacal\ plays a huge role in evolution of \scheme}.

% TODO: Complete history of lisp

\section{History of \scheme}


% TODO: Complete history of scheme


*)

(*

\section{\moduledef{Syntax}}\label{sec:mod_syntax}

\subsection{About the \absyn\ of \scheme}

The \modsty{Syntax} module defines the \syntax\ for \hacheme. As mentioned in \ref{sec:about_scheme}, the language uses fully-parenthesized \index{prefix} notation, or in other words, \sexpr. The \modsty{Syntax} module reflects this, but also, encodes several \textit{peculiarities} of the \scheme-flavored \sexp.

Where we to just use a \textit{basic} syntax for an \sexp\ language, we would have sufficed to \sexpatom{}s and \sexplist{}s. But our aim is to encode the syntax of \scheme, not just a run-off-the-mill \sexp\ syntax!

That is why \modsty{Syntax} module represents the \absyn\ for \scheme, and adds extra \hacheme-specific elements to it. In reality, the best way to convey the \absyn\ is using \sexp (\parencite{turbak2008design}\refextsection{2.3}). So languages like \scheme\ which use fully-parenthesized syntax are not only easier to parse, but each consturct conveys more info \textit{symbolically}.

The \scheme\ syntax is made up of \datum{}s which, in turn, form an \expr.

% TODO: Complete this section

*)
(*
\moduledefbegin{Syntax}
*)
module Syntax = struct
(*
\marktype{Syntax}{ProductType}{t}{The base type for \modsty{Syntax}}
*)
   type t =
       { contents : expr list
       ; location : Location.t
       }

(*
\marktype{Syntax}{SumType}{datum}{The \datum{}s, please see \ref{subsec:syntax_datum}}
*)
   and datum =
      | DATUM_Boolean of bool
      | DATUM_String of string
      | DATUM_Char of char
      | DATUM_Label of label * datum
      | DATUM_List of datum list
      | DATUM_Symbol of label
      | DATUM_Vector of datum array
      | DATUM_Pair of datum * datum

(*
\marktype{Syntax}{SumType}{quote}{The \quote{}s, please see \ref{subsec:syntax_quote}}
*)
   and quote =
      | QUOTE_Quote of expr
      | QUOTE_Quasiquote of expr
      | QUOTE_Unquote of expr
      | QUOTE_Splice of expr

(*
\marktype{Syntax}{SumType}{formals}{The \formal{}s, please see \ref{subsec:syntax_formals}}
*)
   and formals =
      | FORMALS_IdentList of ident list
      | FORMALS_SoloIdent of ident
      | FORMALS_IdentPair of ident list * ident


(*
\marktype{Syntax}{SumType}{expr}{The \expr{}s, please see \ref{subsec:syntax_expr}}
*)

   and expr =
      | EXPR_Ident of ident
      | EXPR_Thunk of datum
      | EXPR_Call of expr * expr list
      | EXPR_Lambda of ident option * formals * body
      | EXPR_Cond of Conditional.t
      | EXPR_Assign of ident * expr
      | EXPR_Macro of Macro.t
      | EXPR_Include of Include.t
      | EXPR_Quote of quote
      | EXPR_Derived of DerivedExpr.t

(*
\markauxtype{Syntax}{ident}
*)
   and ident = string

(*
\markauxtype{Syntax}{label}
*)

   and label = string

(*
\markauxtype{Syntax}{body}
*)

   and body = expr list
end

(*
\moduledefend{Syntax}
*)

(*

\section{\moduledef{Parser}}\label{sec:mod_parser}

\subsection{A theory for \sexpr}\label{subsec:theory_sexpr}

A theory for \sexpr\ was proposed in \parencite{sato1983theory} and subsequently in \parencite{sato1985theory}.

% TODO: Discuss Sato's Sexp theory

\subsection{The challenges in parsing \scheme}\label{subsec:challenges_parse_scheme}

Parsing \sexpr{}s are pretty simple, however; \scheme\ brandishes them with its own flavor of \textit{extensibility}, meaning that, an \scheme\ program is a living thing. It's not what is written down in a file, it gets expanded and expanded during runtime. This makes a native-code \index{compiler} for \scheme\ to be quite difficult to make. \parencite{dybvig1987models} discusses three models for implementing \scheme, and seminal as the dissertation is (I have made good use of it as well, this literate program is peppered with citations of this dissertation), all three models are what we today consider \textit{interpretive}. Since \hacheme\ is a compiler and not an interpreter, it must make use of \partialeval\ to first get all the expressions into a \normalform. The \modsty{Parser} must have facilities that will be used repeatedly.

*)

(*
\moduledefbegin{Parser}
*)

module Parser = struct
ZZ
(*
\marktype{Parser}{ProductType}{t}{The base type for \modsty{Parser} module}
*)
    type t =
       { input  : string
       ; tokens : token list
       }

(*
\marktype{Parser}{SumType}{token}{The type which represents \scheme's various \token{}s}
*)

   and token =
       | TOKEN_Numeric of lexeme
       | TOKEN_Ident of lexeme
       | TOKEN_String of lexeme
       | TOKEN_Char of lexeme
       | TOKEN_Bool of lexeme
       | TOKEN_Delim of lexeme
       | TOKEN_Punct of lexeme

(*
\markmodfn{Parser}{consume_until}{char list -> (char -> bool) -> (lexeme * char list)}{The \index{utility function} to \consume\ characters \textit{until} a condition becomes true, inverse of \funsty{consume_while}}
*)

     let consume_until chlist cond =
        let buffer = Buffer.create 32 in

	let rec aux ch chlist' =
	  if (cond ch) then
	    (Buffer.content buffer, chlist')
	  else
	    Buffer.add_char ch;
	    aux (List.hd chlist') (List.tl chlist')
	in
        aux (List.hd chlist) (List.tl chlist)

(*
\markmodfn{Parser}{consume_while}{char list -> (char -> bool) -> (lexeme * char list)}{The \index{utility function} to \consume\ characters \textit{whilst} a condition is true, inverse of \funsty{consume_until}}
*)

     let consume_while chlist cond =
       let buffer = Buffer.create 32 in
       (* \markauxfn *)
       let rec aux ch chlist' =
         if (cond ch) then
	   Buffer.add_char ch;
	   aux (List.hd chlist') (List.tl chlist')
	 else
	   (Buffer.content buffer, chlist')
	in
	aux (List.hd chlist) (List.tl chlist)

     let scan_hacheme input =
	 (* TODO *)

(*
\markauxty{Parser}{lexeme}
\aside{A lexeme is just a raw token\ldots!}
*)
    type lexeme = string
end

(*
\moduledefend{Parser}
*)

(*

\newpage
\printindex

\newpage
\printglossaries

\newpage
\printbibliography

\begin{filecontents}{references.bib}
@phdthesis{dybvig1987models,
  author="R. Kent Dybvig",
  title="Three Implementation Models for Scheme",
  year=1987,
  school="University of North Carolina at Chapel Hill",
  address="Chapel Hill, NC, USA",
  url="https://dl.acm.org/doi/10.5555/91071",
  note="Ph.D. thesis"
}

@techreport{scheme2013r7rs,
  title={Revised^7 Report on the Algorithmic Language Scheme},
  editor={R. Kent Dybvig and Michael Sperber and William D. Clinger and others},
  year={2013},
  journal={Scheme Language Steering Committee, Rep. R7RS},
  url={https://small.r7rs.org/attachment/r7rs.pdf},
}
@article{sato1983theory,
  title={Theory of symbolic expressions, I},
  author={Sato, Masahiko},
  journal={Theoretical Computer Science},
  volume={22},
  number={1-2},
  pages={19--55},
  year={1983},
  publisher={Elsevier}
}
@article{sato1985theory,
  title={Theory of symbolic expressions, ii},
  author={Sato, Masahiko},
  journal={Publications of the Research Institute for Mathematical Sciences},
  volume={21},
  number={3},
  pages={455--540},
  year={1985}
}
@inproceedings{stoyan1984early,
  title={Early LISP history (1956-1959)},
  author={Stoyan, Herbert},
  booktitle={Proceedings of the 1984 ACM Symposium on LISP and functional programming},
  pages={299--310},
  year={1984}
}
@article{stoyan1979lisp,
  title={LISP history},
  author={Stoyan, Herbert},
  journal={ACM Lisp Bulletin},
  number={3},
  pages={42--53},
  year={1979},
  publisher={ACM New York, NY, USA}
}
@inproceedings{padget1986desiderata,
  title={Desiderata for the standardization of LISP},
  author={Padget, Julian and Chailloux, J{\'e}r{\^o}me and Christaller, Thomas and DeMantaras, Ramon and Dalton, Jeff and Devin, Matthieu and Fitch, John and Krumnack, Timm and Neidl, Eugen and Papon, Eric and others},
  booktitle={Proceedings of the 1986 ACM conference on LISP and functional programming},
  pages={54--66},
  year={1986}
}
@incollection{steele1996evolution,
  title={The evolution of Lisp},
  author={Steele, Guy L and Gabriel, Richard P},
  booktitle={History of programming languages---II},
  pages={233--330},
  year={1996}
}
@book{turbak2008design,
  title={Design concepts in programming languages},
  author={Turbak, Franklyn and Gifford, David},
  year={2008},
  publisher={MIT press}
}
\end{filecontents}

\begin{filecontents}{hacheme-command-definitions.tex}

\newcommand\hacheme{}

\newcounter{modcnt}
\setcounter{modcnt}{0}
\newcommand\moduledef[1]{%
\stepcounter{modcnt}\textsc{Module} \#\arabic{modcnt} -> #1
}


\end{filecontents}

\begin{filecontents}{hacheme-glossary-definitions.tex}
\newglossaryentry{Term-rewriting}{
   name=Term-rewriting,
   description={}
}
\end{filecontents}

\begin{filecontents}{hacheme-index-definitions.tex}

\newcommand\scheme{\index{Scheme}}

\newcommand\hachemestd{\index{R7RS@$R^{7}RS$}}

\newcommand\litprogram{\index{literate program@\textit{literate program}}}

\newcommand\lisp{\index{Lisp}}

\newcommand\formeth{\index{formal method}}

\newcommand\verif{\index{verify@\textit{formally verify}}}

\end{filecontents}

\begin{filecontents}{simpletrs.sty}
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{simpletrs}[2024/9/15 A simple package for typesetting Term-rewriting systems]

\RequirePackage{amsmath}
\RequirePackage{xparse}


% Simple command for basic term-rewriting rules
\NewDocumentCommand{\trule}{m m}{
  \[ #1 \to #2 \]
}

% For conditional term-rewriting rules
\NewDocumentCommand{\truleif}{m m m}{
  \[ #1 \to #2 \quad \text{if} \quad #3 \]
}

% For contextual term-rewriting rules
\NewDocumentCommand{\ctrule}{m m m}{
  \[ #1[#2] \to #1[#3] \]
}

% For named term-rewriting rules
\NewDocumentCommand{\ntrule}{m m m}{
  \[ #1 \colon \quad #2 \to #3 \]
}

\newcommand{\relarrow}{\to}
\end{filecontents}

\end{document}

*)