Homework and the Project

This page contains all of the homework and the compiler project. During the semester this page will grow quite long. You need to check it at once a week, preferably twice a week.

Homework

Reading Assignments

There is one traditional textbook, which you are expected to read cover to cover and other suggested reading that is optional.

Abbreviation What Details
L Textbook Kenneth C. Louden
Compiler Construction: Principles and Practice
PWS Publishing Co., 1997.
ISBN 0534939724
LMB Suggested reading John R. Levine, Tony Mason, and Doug Brown
Lex & Yacc
O'Reilly & Associates, 1992.
ISBN 1565920007
D Online notes C. C. Douglas, class notes

A good idea is to first sit down for a set period of time (e.g., 60 minutes) and read the books like a comic book. Do not really read anything in detail, just look at every single page (in order): skim a little, look a little. Your mind will subconsciously remember things later when you have to read the books in detail.

The table below lists what you are required to read by what date:

Read by Book What Comments
8/29/2006 L All Spend no more than 60 minutes on this. Just skim it and see what is in the book (and where). Do problems 1.2, 1.3, and 1.10.
8/31/2006 L Chapter 1 Skim sections 1.6 - 1.8. I will assign some exercises from the end of the chapter.
9/5/2006 L Chapter 2 Lexing (or scanning)
9/7/2006 LMB Chapters 2 and 6 lex (this is not required, but might be helpful)
9/14/2006 L Chapter 3 Context free grammars
9/21/2006 LMB Chapters 3 and 7 yacc (this is not required, but might be helpful)
9/28/2006 L Chapter 6.3 Symbol table philosophy
10/5/2006 L Chapter 4 Top down parsing
10/12/2006 L Chapter 5 Bottom up parsing
10/26/2006 L Chapter 6 Semantic analysis
11/9/2006 L Chapter 7 Memory organization at runtime
11/9/2006   Code Generator Stack based, inefficient code generator
11/16/2006 L Chapter 8 Code generation

Unless specified, problems from the textbook should be done individually, not as part of a group.

Resources of Interest

Lex, flex, yacc, byacc, and bison can easily be found on Linux, FreeBSD, and Mac OS X systems. Some of these tools can work with C++, not just C. Tools like jlex and jyacc work with Java. Searching on the web will find them for various operating systems (e.g., Cygwin/X for Windows).

Class Project

During this semester, students will be paired off into groups. Each group will write a complete compiler fairly quickly. Then we will embark on a code optimization trip. Your compiler will be stored in a cvs repository on one of my computers. You will be expected to master cvs (see the cvs Primer). No part of the project will be accepted by email.

Notes:

  1. This is going to be a heavy programming course. Swapping codes with other groups is expressly forbidden unless authorized by me in advance. If in doubt, contact me first. Otherwise you might find yourself in deep, deep trouble legally.
  2. The ultimate goal is to compile, run the class example, and get the correct output. Doing so means your group gets an A immediately.
  3. During the semester, we will work with many tiny_examples as well. I do not expect very many of you to get the correct output of the class example, but I expect all of you to be able to do the tiny_examples.
  4. If you are rusty on how to program tree data structures, please review this technique immediately. You will use trees extensively in the class project.

Due dates will be on this page in this section as the semester progresses. You are under no obligation to turn your project in at the last minute. Working ahead of the schedule is a good idea. Looking ahead to see where the project is going is an even better idea.

Due Date What is Due
9/19/2006 Lexer
10/10/2006 Simple Parser
10/24/2006 Symbol Table
11/9/2006 Trivial Code Generator
11/28/2006 Complete Parser
12/16/2006 Simple Code Generator
12/16/2006 Optimized Code Generator

Warnings:

  1. The due dates are initially approximate will probably may slip a few days. Watch this page for the exact due dates.
  2. More details will be added to later parts of the project during the semester. Re-read the specific part web page often when working on a part of the project. In particular, tiny examples will be added to later parts of the project.
  3. Do not get behind on the weekly sections of the project. I stage the assignments to nudge you through the project milestones. Getting behind early in the course is a disaster that you might not recover from. Remember, your whole group gets into trouble, not just you.
  4. The code generator takes a good bit of time to do. It is much trickier than you think it is. You had better get right on it once your parser is completed or you will never finish it by the due date. If you wait until December 1 to start, you are really sunk.
  5. I do not plan on giving any Incompletes. On the other hand, if you want to work on your compiler after the course is over, I will be happy to offer advice.
  6. The goal in this project is get as low a point total as you can. The more points you receive, the lower your grade. You will lose points by missing CSLab days. You will lose points by not following explicit directions (e.g., missing directories or files or mislabeled files in the cvs repository that you will use). You will lose points if your compiler fails to work correctly. You will lose points by not following the directions on the intermediate steps or if those steps fail in my testing. See Points for details.
  7. A programming project of the size of your compiler requires discipline to make it work correctly. Not following directions will cause problems that can become much worse over time.

Groups

This is a group oriented course. You will work with a partner throughout the compiler project. The programming/project groups are as follows:

Group Name Language Members
g1   Michael Leonard, Steven Walter
g2   Jeremy Patrick, Nate Rhodes
g3   Victor Ivanov, Bill Mullins
g4   Chris Keen, James Philips
g5   Joshua Stogsdill, Rick Wicker
See or contact me immediately if there is a problem with your group.

What to Put Into each Directory in your cvs Repository

You should not email me anything. Put everything into your group's cvs repository. When making a new directory in your repository, always include the following:

  1. A file called Organize (written in plain text, Word, or rtf format) that documents who is doing what part of the compiler. Be specific. Explains what is in the directory, what works, what does not, and any notes you want to keep about your ongoing term project. Update this file weekly and date the changes in the file (similar to a Changelog file commonly used in software projects).
  2. Makefile, which I can just type make and have your compiler made and run on a sample input that you create.
  3. Your compiler files.
  4. Your sample input files that are different from the class example file.
  5. A target in your Makefile so that I can type make clean and get a perfectly cleaned up directory with only the bare minimum of files left in it.

A common mistake that groups make is to put everything into cvs, add a new directory, but forget to add and commit files (and the new directory). Please checkout a copy of your project into a new disk area, change to the new directory and try make on your new copy. If it fails oddly, fix the problems and retry checking out the project again and testing from scratch. Keep iterating on adding and committing until you get a complete copy out of cvs.

Lexer

You will work with your group on this assignment in the CS Lab. Begin by checking out your cs441 group project from your cvs repository. Instructions will be handed out on 8/31/2006 in the CS Lab. Next create a new directory called lexer from your cvs repository. Working from the F06 language web page, you will write an entire lexer in some flavor of lex. All you have to do at this point is recognize constants, identifiers, key words, and operators. You do not have to save anything since the symbold table comes over the following two weeks.

Use lex, flex, or something similar. The final product of this week should be checked on one of the CS Lab Apple Mac OS X systems before turning the assignment in. I will be grading your lexers on one of my Macs. You may use Linux, Windows, Cygwin, etc. to do your actual initial work, but do the final checks on the CS Lab machines. Do not write a lexer by hand using the table driven technique discussed in class or something similar. Use an automatic generator.

You can easily find a solution to this part of the class project on the web. Do not do this since you will learn exactly zero by doing so. It is straightforward and will help in the parser part of the class project, which is nontrivial and not findable on the web.

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your lexer directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Your lexer file, which should be just a single input file to one of the lex-like programs.
  2. Start your Organize file.

Simple Parser

You will work with your group on this assignment in the CS Lab. Create a new directory called parser-simple from your cvs repository. Copy your files from your lexer directory. Consider the files in the lexer directory frozen. Do not modify them in that directory again. You will modify and add to the files in your parser-simple directory. You will create a parser for a small subset of the F06 language.

You will write your simple parser using yacc (or one of its cousins). The output will be a parse tree. Each node of your parse tree should have a unique identification number (I am fond of the natural numbers myself for the numbers). Do not just use pointers to nodes in your internal data structure for the parse tree. You will need to print each node number as part of this part of the class project.

You will use yacc and lex to parse the following two programs:

program tiny_example_1
{
	function integer main()
	{
		integer i;
		i := 441;
		print_integer( i );
	}
}
and
program tiny_example_2
{
	function integer main()
	{
		print_string( "Hello\n" );	
	}
}

Your group should work out as few yacc rules as you can to handle both examples and nothing more. Note that in the Symbol Table part of the class project you will have an example with a procedure and the main function as part of the program. Think ahead.

You can be quite creative in what your output looks like. However, being concise will make it far easier for you to debug your compiler as the term progresses and it gets more and more complicated. Print the node id and what is in the node. Suppose you have yacc grammar rules like

expression: expression PLUS expression { action goes here }
| expression MINUS expression { action goes here }
If I were compiling the snippet a+1-b, I would generate the following output:

** Node 101: Reduced: expression: expression PLUS expression
**** expression -> IDENTIFIER a
**** terminal symbol PLUS
**** expression -> ICONSTANT 1

... whatever else you are printing out

** Node 106: Reduced: expression: expression MINUS expression
**** expression -> parse tree node
**** terminal symbol MINUS
**** expression -> IDENTIFIER b

... whatever...

Once you have completed reading and processing all of the input, you will walk the parse tree and clearly print out what is in each node. Clearly mark that you are starting to print out your parse tree so that I can find it. I would print a banner like

++++++++++++++++++++++++++++++++++++++++++++++++
+ Walking through the Parse Tree Begins Here
++++++++++++++++++++++++++++++++++++++++++++++++
to get a readers' attention.

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your parser-simpledirectory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Your parser and lexer files.
  2. The output from using tiny_examples 1 and 2.
  3. Update your Organize file.

Symbol Table

You will work with your group on this assignment in the CS Lab. Begin by creating a new directory called symtab. Copy your files from your parser-simple directory into the new one. Save the new directory with all of its files in your cvs repository and verify that you can retrieve them. Then fix any remaining problems you had with your simple parser in the new directory.

Determine the scope of all variables. Write code that makes a complete pass through your parse parse tree. Variables can only be defined once per procedure or function. In theory, variables can be defined anywhere in the routines. You may enforce a no use before being declared policy if you want to. You should capture the following information about each instance of a variable:

A general class discussion about any of these issues is acceptable. You may use the class mailing list or just talk in groups. It will be considered polite if you provide a summary of any discussions on the class mailing list, however.

You will produce a tree oriented symbol table data structure. Each program, procedure, or function should have its own local symbol table. Each local symbol table should be a leaf in the overall symbol table tree. Hence you will have the scope of each member of the symbol table guaranteed for later use in the class project.

When your lexer identifies an IDENTIFIER, ICONSTANT, DCONSTANT, or SCONSTANT, store the value in your symbol table. The constants should only be in the symbol table once. Different representations of the same number should be stored only once. So 4, 04, and 004 should only be in the symbol table as the ICONSTANT 4. However, 4.0 would be a separate entry since it is a DCONSTANT. Take care designing how you deal with numbers versus IDENTIFIERS in your symbol table.

Above all, be flexible in your designs. This probably will not be the last version of your symbol table.

Write a routine that dumps all of the symbols from the table after yylex returns to your main program. Make the output look something like

Token type Symbol anything else you are storing now

Try to be neat in your output so that others can easily read it and determine what you are storing. You should also print the symbol table (before or after the parse tree is OK), but clearly mark it with something like
++++++++++++++++++++++++++++++++++++++++++++++++
+ Symbol Table Begins Here
++++++++++++++++++++++++++++++++++++++++++++++++
You can print a banner indicating the end of either of the parsing and symbol table printing areas if you want to explicitly indicate what is in each output section.

You might want to use symbols.c to generate a random collection of random length symbols. If you do use it, remember to follow the advice in the file and duplicate parts of the output so as to test whether or not you are getting unique symbols in your symbol table.

You will use yacc and lex to parse the following two programs:

program tiny_example_3
{
	procedure pr_int( integer i )
	{
		string title;
		title := "printing integer ";
		print_string(title);
		print_integer(i);
		print_string("\n");
	}

	function integer main()
	{
		integer i;
		i := 441;
		pr_int( i );
	}
}

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your symtab directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. The symbol table access, lexer, and simple parser files.
  2. The output from using tiny_example_3.
  3. Your random symbol file generated by symbols.c and the output from using that as input.
  4. Update your Organize file.

Trivial Code Generator

You will work with your group on this assignment in the CS Lab. Begin by creating a new directory called codegen-trivial. Copy your files from your symtab directory into the new one. Save the new directory with all of its files in your cvs repository and verify that you can retrieve them. Then fix any remaining problems you had with your simple parser in the new directory.

Each member of your group will produce working code with the F06 virtual machine for tiny_example_1 and tiny_example_2 in the Symbol Table part of the class project. Clearly document who did which example. You should work cooperatively, but each of you needs to know how to generate very simple code. Remember that the output of your compiler is a file called your-main.h. You will be able to compile and execute the resulting code with a command like gcc f06.c ; a.out on the Macs in CS Lab.

Since tiny_example_1 and tiny_example_2 work, it should be a small step to make tiny_example_3 work as well. You just have to agree on how to pass an integer to a procedure.

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your codegen-trivial directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. The symbol table access, lexer, simple parser, and code generator files.
  2. Your Makefile can make, run, and clean up afterwards tiny_example 1, 2, and 3.
  3. Output from using tiny_examples 1, 2, and 3.
  4. Update your Organize file.

Complete Parser

You will work with your group on this assignment in the CS Lab. Begin by creating a new directory called parser. Copy your files from your codegen-trivial directory into the new one. Save the new directory with all of its files in your cvs repository and verify that you can retrieve them. Then fix all of the problems you had with your compiler in the new directory.

You will complete your parser using yacc (or one of its cousins). The output will continue to be a parse tree. Each node of your parse tree should still have a unique identification number (I am fond of the natural numbers myself for the numbers). Do not just use pointers to nodes in your internal data structure for the parse tree. You will need to print each node number as part of this part of the class project.

Extend your trivial parser this week as fast as possible. Part will be done as a group, part individually. The first set of things you need to do are the following:

Be able to parse programs in the class language. After parsing, walk through your parse tree and print it out. Include the node identification number with each node. Finish the Organize file.

You may want to keep your parser split into separate files and have a make rule that creates a complete input file for yacc, e.g., in your Makefile you might two lines like

parser.y: parser-init.y parser-name1.y paerser-name2.y parser-final.y
cat parser-init.y parser-name1.y paerser-name2.y parser-final.y > parser.y

and work on the parser separately as well as cooperatively. You can base your splitting on what you decided on 2/21 or some adjustment to it. Always be flexible when writing a compiler.

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your hw-02-21 directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. The compiler to date.
  2. name1.f06 and name2.f06 that are short, simple, legal input for the f06 language that you make up by yourself. I will post all of these samples through the language web page.
  3. The output from using the class example program.
  4. Update your Organize file. This should be a blueprint of how you have organized everything to date and where you are heading in code generation.

Make certain that parser is completely in your cvs repository and can be retrieved by your partner.

Simple Code Generator

You will work with your group on this assignment in the CS Lab. Begin by creating a new directory called codegen. Copy your files from your parser directory into the new one. Save the new directory with all of its files in your cvs repository and verify that you can retrieve them. Then fix all of the problems you had with your compiler in the new directory.

You are going to write a code generator over the next several weeks. Code optimization will be done later. Concentrate on getting something robust. Speed will come later. Remember: It is far better to get the right answer slowly than the wrong answer quickly.

You should decide which parts of the F06 language you can and cannot generate code for. Do a subset well instead of trying to do everything and not having a compiler that compiles or works. Split the work in two. Document who is doing which parts of the code generator. An even split is for one person to do all of the associated arithmetic and data movement and the other do the rest.

In your codegen directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Make certain I can just type make and your compiler will compile (from scratch), link, and run an example. Check that I can type make clean and have a completely clean directory without anything made by lex, yacc, or your compiler.
  2. Update your Organize file. Keep track of what works, what almost works, what does not work, and what features of the language will be skipped. Track who is doing what part of the code generator. There should be an even split in work.

Check out a fresh copy of your project and make certain that everything is there that you expect (in all directories).

Optimized Code Generator

This is strictly extra credit. You can do this part while having implemented only a fraction of the F06 language. Implement optimizations as you will make up lost points from earlier in the course.

In your codegen directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Write a seprate Optimize file. Include exactly what features of optimization you have put into your compiler. Be very specific and include examples that showcase your optimizer. Modify your Makefile so that I can type make opt and see your results.
  2. Your compiler should have a pass through the parse tree that does all of the things that are listed in 1.

Points

During the term you might accumulate some points. Points are bad... really bad. You want to avoid points whenever possible for they will reduce your grade in the project. The course points lead to a course curve for grades. So, if all of you amass 100,000 points, it is still an A.

This is how you accumulate points based on each portion of the project.

Part Points Explanation
Lexer 20 Nothing turned in
average=4.8 10 Does not compile and link
median=2 2 Does not identify what type of operator or constant
2 No Makefile
1 Makefile not so I can just type make and have your own example run
1 No Organize file
0 Molehills that could grow into mountains
     
Simple Parser 20 Nothing turned in
average=4.6 10 Does not compile and link
median=2 5 Cannot parse sample codes
2 Directory parser-simple not present
2 No Organize file
2 Make does not build an executable and run an example
2 make clean does not work
2 >0 rules never reduced
0 Molehills that could grow into mountains
     
Symbol Table 20 Nothing turned in
average=8.0 10 Does not compile and link
median=10 10 Cannot determine if symbol table created
  5 Cannot process sample codes
  3 No parse tree walk
3 No Organize file
  2 Organize file does not contain required information (e.g., division of work)
2 Directory symtab not present
2 Make does not build an executable and run an example
2 make clean does not work
0 Molehills that could grow into mountains
     
Trivial Code Generator 20 Nothing turned in
average=5.0 10 Does not compile and link
median=0 5 Cannot process sample example
  4 No Organize file
  2 Directory codegen-trivial not present
  2 Make does not build an executable and run an example
  2 make clean does not work
  0 Molehills that could grow into mountains
     
Complete Parser 20 Nothing turned in
average=5.6 10 Does not compile and link
median=8 10 Does not provide enough information to determine if it is working or not
  5 Cannot parse simple example(s)
  3 Cannot parse mg.f06
  0 Molehills that could grow into mountains
     
Simple Code Generator 50 Nothing turned in
average=14.4 30 Does not compile and link
median=10 25 Does not produce any code in yourmain.h other than a call to F06_Exit
  10 Cannot compile sample example(s)
  10 Cannot compile mg.f06
  5 Simple examples compile, but do not run correctly
  5 mg.f06 compiles, but does not run correctly (or close)
  2 No separate codegen directory
  0 Molehills that could grow into mountains
     
Optimized Code Generator