The F04 language will be similar to standard algorithmic languages, but will not follow any one in particular. You will not have an extensive library for input, output, system calls, or memory allocation.
The lexigraphical structure of the language is defined as follows:
- Ignore white space (blanks, tabs, new lines, and form feeds).
- An identifier is a string made up of letters, digits, and underscores that does not begin with a digit. The maximum number of characters you should pay attention to are the first 32. Ignore everything after that. (Token name is IDENTIFIER.)
- A character constant is enclosed in single quotes. (Token name is CCONSTANT.)
- A string constant is enclosed in double quotes. (Token name is SCONSTANT.)
- Special characters should be treated as C does using a backslash.
- An integer is a sequence of digits 0, ..., 9. (Token name is ICONSTANT.)
- A floating point number has 64 bits (i.e., a C or C++ double) and the
- Mantissa only: 123., 123.3, 0.3, .3
- Mantissa and exponent: 0.123e42, 1.23e-3, 0.001e+10
- Key words should processed specially. Unless noted, capitalize the
key word as its token (e.g., token name CHAR for char).
- begin (Token name BBEGIN.)
- Operators (and punctuation) that need to be recognized are the following:
Token name = ASSIGN : COLON , COMMA // COMMENT && DAND / DIVIDE || DOR == DEQ >= GEQ > GT [ LBRACKET <= LEQ ( LPAREN < LT - MINUS % MOD * MULTIPLY . PERIOD ~= NE ~ NOT . PERIOD + PLUS ] RBRACKET ) RPAREN ; SEMI
Tokens do not cross new lines. If lex cannot produce a string or number, you do not either. An error message is appropriate if you truncate something, however. An underflow (a floating point number that rounds off to 0 instead of whatever it really is in infinite precision arithmetic), however, is not an error, and should be treated as 0.
Defining It by Example
The easiest way to see what the language really is, is to study an example. Our example solves the heat equation on a patch. With a little work, it could even be a set of patches that were chosen using an adaptive mesh refinement method and could be turned into a parallel programming code. Oh, no... Who let a partial differential equation solver into a compiler course!!!
// This is the Fall 2004 target code for CS 441G. // Your compiler should be able to lex, parse, and generate code for the // entire file by the end of the course. // Watch out for inconsistencies in this file and report them. begin program AMRsolve; begin class Grid; // Variable declarations for class int imin, imax; // first and last indices in dimension i int jmin, jmax; // first and last indices in dimension j double data[imin:imax, jmin:jmax]; // actual data begin procedure Make( int i0, i1, j0, j1; ); // Local variable int j; // Check and set bounds if i0 <= 0 || i1 < i0 || j0 <=0 || j1< j0 then printf("Error in array bounds in call to Grid.Make()\n"); return; else imin = i0; imax = i1; jmin = j0; jmax = j1; end if // Zero data begin do int i = i0 : i <= i1 : i = i+1; begin do j = j0 : j <= j1; data[i,j] = 0.; j = j + 1; end do; end do; end procedure; // Make begin procedure SetInterior( double val; ) int i, j; // loop variables i = imin+1; begin do i = imin + 1 : i < imax : i = i + 1; begin do j = jmin + 1 : j < jmax : j = j + 1; data[i,j] = val; end do; end do; end procedure; // SetInterior begin function int Iterate( int iters; double rnorm; ); double nrm; // Norm of data int i, j; // Utility variables nrm = rnorm + 1.0e0; // Loop at most iters times while the error norm is greater than rnorm begin do int itr = 1 : i <= iters : i = i + 1 while nrm > rnrm; begin do i = imin : i <= imax : i = i + 1; begin do j = jmin : j <= jmax : j = j + 1; data[i,j] = 2.5e-1 * ( data[i-1,j] + data[i+1,j] + data[i,j-1] + data[i,j+1] ); nrm = nrm + data[i,j] * data[i,j]; end do; end do; end do; return itr; end function; // Iterate begin procedure Destroy(); imin = 0; imax = 0; jmin = 0; jmax = 0; // Never hurts end procedure; // Destroy end class; // Grid // Main program follows: Grid u; // Declare int i; // Number of iterations run u.Make( 0, 127, 32, 255); u.SetInterior( 2.0 ); i = u.Iterate( 100, 1.0e-3 ); printf( "%d iterations\n", i ); // Pass this straight through to C end program; // AMRSolve
- Declarations can occur right after a begin statement. The declarations are only valid within the block. After the block end, all of the variables declared in that block become undefined and memory should be freed automatically.
- Not all of the key words have been used (e.g., scanf and some of the logical operators).
- scanf and printf are special cases that will be discussed in class.
- A comment is the text folloing the // to the end of the line, as in C++. The text should be thrown away by the lexer unless you want to save it for some reason (e.g., complete program reconstruction from your parse tree).
The class put together a number of simple 5-10 line examples:
arnaoudova.f04 berry.f04 brumm.f04 chao.f04 delgoda.f04 drerup.f04 funken.f04 gilbert.f04 greathouse.f04 hays.f04 kaneko.f04 meredith.f04 pang.f04 porter.f04 quinn.f04 yang.f04
The Symbol Table
The symbol table will be a pain in the neck all semester. Just when you think that you have it just right, you will have an inspiration and need a modification or addition. So be really flexible and defensive in designing your symbol table.
Start by entering data such as an identifier name, constant value, the scope, and the memory location (if appropriate). Keep all attributes of a name in a separate record. Also, think of a fast way to access the symbol table. My favorite is a combination of hashing the symbols and then using a linked list to get to the right symbol table entry.
Your lexer can help. For many tokens, there is the token value (e.g., ICONSTANT) and a token subvalue (e.g., the binary value of the string in yytext) that can be stored in the global variable yylval. You should take advantage of this as often as possible in the early stages of compiling.
Finally, our language allows for variable to be redefined inside of blocks. Watch out for the nested loop variables and other equally unsavory variables. Your symbol table must resolve entries correctly so that the right instance of a variable is always used in the right place.