View difference between Paste ID: 4x4TAtE3 and zjhA5Mr1
SHOW: | | - or go back to the newest paste.
1
/* The top level of a --C program a series of 1 or more function definitions.
2
 * Not having a function named "main" should be considered an error, though
3
 * "main" may appear anywhere.
4
 * Comments can also appear on the top level, but they should be tokenised and
5
 * then dropped (https://www.youtube.com/watch?v=YLdCKDyxQeI) during the
6
 * lexical analysis.
7
 */
8
9
// I'm going to use regular yacc syntax in this document because.
10
11
/*     function_definition : identifier identifier LBRACKET param_list RBRACKET block
12
 *                         ;
13
14
type_identifier identifier
15
16
 * (names for stuff to be decided)
17
 *
18-
 * say that a function doesn't return anything. We may want to just use an
18+
19
 * call.)
20
 * There are only three types (int, function, void) and one of them is only to
21
 * say that a function doesn't return anything
22
23
I'm thinking no void and default to returning 0 in int functions and raising an error in function returning functions.
24
25
. We may want to just use an
26
 * IDENTIFIER token for both of them, but an identifier rule would be useful so
27
 * we can have it do t[0] = t[1].value for all the other times we need to use
28
 * such identifiers.
29
 *     def p_identifier(t):
30
 *         """identifier : IDENTIFIER"""
31
 *         t[0] = t[1].value
32
 * (side note: yacc case sensitivity?)
33
 * (of course we may want it to output something like ("ID", t[1].value) so the
34
 * execution part can easily tell what kind of eldrich horror it has on its
35
 * virtual hands)
36
 *
37
 * We might also want to separate type identifiers and variable identifiers so
38
 * that we can get yacc to deal with the fact that there are only three possible
39
 * types at parse time.
40
 *     type : TYPE_INT
41
 *          | TYPE_FUNCTION
42
 *          ;
43
 * If we were feeling particularly enterprising/insane we could even make a
44
 * separate function_type rule for dealing with functions.
45
 *     type_void : TYPE_VOID
46
 *               ;
47
 *     function_type : type
48
 *                   | type_void
49
 *                   ;
50
 * (using "type_void" instead of "TYPE_VOID" works better)
51
 */
52
function negative_double_multiplier(int m) {
53
	int m2 = -2 * m;
54
	int f(int n) {
55
	    return n * m2;
56
	}
57
	return f;
58
}
59
60
/*     block : LCURLY declare_list statement_list RCURLY
61
 *           ;
62
63
We'll want block : { statement_list } | { declare_list statement_list }
64
So we don't have to include an empty statement_list
65
66
 * A block is the body of a function, an if statement, a while loop etc.
67
 * I'm not sure about this, but would a function definition such as maybe_this
68
 * below be valid?
69
 */
70
int maybe_this(int a, int b)
71
	return a + b;
72
/* If so, then the "block" in function_definition above could be replaced with
73
 * "statement".
74
 * (side note: statements are executed, expressions are evaluated(?))
75
 *     statement : block
76
 *               | expression SEMICOLON
77
 *               | assign_statement SEMICOLON
78
 *               | function_definition
79
 *               | while_statement
80
 *               | if_statement
81
 *               | return_statement SEMICOLON
82
 *               | iter_control SEMICOLON
83
 *               ;
84
85
Your side note doesn't make sense as evaluation should be the same as execution. Possibly you can split them based on execution affecting environment.
86
87
Either way, I'd think:
88
	statement : block | expression | control_statement
89
90
control_statement them subsumes while, if, return, iter (we don't need iteration as far as I know, just while)
91
expression subsumes assignment as (a = b) should evaluate to b in general C and it's easy enough to do
92
function_definition should only be permitted in declare_list
93
94
 * The only difference between a function's block and an if statement's one is
95
 * that the function also has the calling arguments in the local environment.
96
97
Not a difference at all really, since the calling arguments should behave like a parent environment or be initially set for the current environment
98
99
 * (also it can't contain iter_control statements (break, continue) as top-level
100
 * statements, but when not inside a while loop neither can if statements, so
101
 * we will also have to some logic for that somewhere (while possible,
102
 * making an alternate statement definition, if_statement definition etc might
103
 * not be the best (but it would allow for yacc to detect wrongly placed
104
 * iter_control statements))).
105
106
I don't think you can elegantly do this in the grammar, however you can maybe set a loop_counter that gets incremented when inside a loop, decerementing when leaving, raising an error when trying to leave l_c=0.
107
108
In the interpreter, detecting this is trivial since control statements should just bubble up the call chain until reaching a loop object or function object, raising an error if it reaches a function object (break, continue) or the "parent of root" (outside of function return)
109
110
 *
111
 * Declaration statements (while they are statements) aren't here because they
112
 * aren't allowed in statement_list, only in declare_list.
113
 *
114
 * One other point: do embedded function definitions have to be at the top of
115
 * the block (that is to say in the declare_list)? If so then obviously
116
 * function_definition shouldn't be here.
117
 *
118
 *     declare_start : type identifier
119
 *                   ;
120
 * I separate this from declare because it can also be used in param_list.
121
 * (unrelated: (lambda x: x(x))(lambda y: y(y)))
122
 *     declare : declare_start SEMICOLON
123
 *             | declare_start ASSIGN expression SEMICOLON
124
 *             ;
125
 * ASSIGN is "=", I used the name because it is less ambiguous. Equality is
126
 * "EQ".
127
 *
128
 *     expression : function_call
129
 *                | unary_expression
130
 *                | infix_expression
131
 *                | bracket_expression
132
 *                | number
133
 *                | identifier
134
 *                ;
135
136
I massively suspect reading about expression grammars is important here to make sure stuff like precedence is inherent in the grammar where feasible.
137
138
 * I may have forgotten one or two things here.
139
 */
140
141
int main() {
142
	int x;
143
	function multi = negative_double_multiplier(7);
144
	x = multi(multi(5));
145
	return maybe_this(2, x);
146
}