Homework

Reading Assignments

There is one traditional textbook, which you are expected to read cover to cover and other suggested reading that is optional.

Abbreviation What Details
ASU Textbook Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman
Compilers: Theory and Practice
Addison-Wesley, 1990.
ISBN: 0201100886
LMB Suggested reading John R. Levine, Tony Mason, and Doug Brown
Lex & Yacc
O'Reilly & Associates, 1992.
ISBN 1565920007
D Online notes C.raig C. Douglas, class notes

A good idea is to first sit down for a set period of time (e.g., 60 minutes) and read the textbook like a comic book. Do not really read anything, but look at every single page (in order): skim a little, look a little. Your mind will subconsciously remember things later when you have to read the books in detail.

The table below lists what you are required to read by what date:

Read by Book What Comments
1/19/2006 LMB All Read like a comic book
1/23/2006 LMB Chapters 2 and 6 Read in detail
2/9/2006 ASU Sections 7.6 and 8.2 Read in detail about symbol tables
2/21/2006 LMB Chapters 3, 7, and 8 Read in detail
2/23/2006 ASU Chapter 4, sections 9 and 1-3 Read in detail about parsing
2/28/2006 ASU Chapter 4, sections 4-8 Read in detail about parsing
3/7/2006 ASU Chapter 5 Read in detail about parsing
3/21/2006 ASU Chapter 6 Read in detail about type checking
4/1/2006 ASU Chapter 9 Read in detail about code generation
4/18/2006 ASU Chapter 10 Read in detail about optimization

Homework Assignments (10%)

Homework will typically be done individually when textbook assignments are given and by your group when I give a programming exercise. I might post one of your assignments as the solution. I am not writing a compiler myself this term.

Due Date How Link/Description Solution
1/19/2006 Individual hw1, part 1 Groups
1/24/2006 Group hw1, part 2
5/2/2006, after 7pm Individual Email me what parts of the compiler you wrote.

Resources of Interest

Lex, flex, yacc, byacc, and bison can easily be found on Linux, FreeBSD, and Mac OS X systems. Some of these tools can work with C++, not just C. Tools like jlex and jyacc work with Java. Searching on the web will find them for various operating systems (e.g., Cygwin for Windows).

Groups

This is a group oriented course. You will work with a partner throughout the compiler project. The programming/project groups are as follows:

Group Name Language Members
g1 C++ Cindy Burklow, Derrick Spell
g2 C Jianzhong Wang, Li Wang
g3 Python, Java Pete Carey, Mark Maynard
g4 C++ Eli Arnaudova, Aarthi Jayaram
g5 C Nathan Liang, Wei Li
g6 C Soham Chakraborty, Divya Bansal
See or contact me immediately if there is a problem with your group.

Class Project (90%)

During this semester, students will be paired off into groups. Each group will write a complete compiler fairly quickly. Then we will embark on a code optimization trip.

This is going to be a heavy programming course. Swapping codes with other groups is expressly forbidden unless authorized by me in advance. If in doubt, contact me first. Otherwise you might find yourself in deep, deep trouble legally.

Due dates will be on this page in this section as the semester progresses. You are under no obligation to turn your project in at the last minute. Warning: The due dates are currently approximate. Watch this page for the exact dates.

Due Date What is Due
2/2/2006* Lexer
2/25/2006* Symbol Table
3/28/2006* Parser
4/13/2006** Simple Code Generator
5/2/2006, 6:15pm* Optimized Code Generator

**Firm date, **Tentative date

Warnings:

  1. Do not get behind on the weekly sections of the project. I stage the assignments to nudge you through the project milestones. Getting behind early in the course is a disaster that you might not recover from. Remember, your whole group gets into trouble, not just you.
  2. The code generator takes a good bit of time to do. It is much trickier than you think it is. You had better get right on it once your parser is done or you will never finish it by the due date. If you wait until December to start, you are really sunk.
  3. I do not plan on giving any Incompletes. On the other hand, if you want to work on your compiler after the course is over, I will be happy to offer advice.
  4. The goal in this project is get as low a point total as you can. The more points you receive, the lower your grade. You will lose points by missing CSLab days. You will lose points by not following explicit directions (e.g., missing directories or files or mislabeled files in the cvs repository that you will use). You will lose points if your compiler fails to work correctly. You will lose points by not following the directions on the intermediate steps or if those steps fail in my testing. See Points for details.
  5. Programming projects of the size of your compiler require discipline to make work. Not following directions will cause problems that can become much worse over time.

What to Put Into each Directory in your cvs Repository

You should not email me anything. Put everything into your group's cvs repository. When making a new directory in your repository, always include the following:

  1. Readme.txt, which explains what is in the directory, what works, what does not, and any notes you want to keep about your ongoing term project.
  2. Makefile, which I can just type make and have your compiler made and run on a sample input that you create.
  3. Your compiler files.
  4. Your sample input files that re different from the class example file.
  5. A target in your Makefile so that I can type make clean and get a perfectly cleaned up directory with only the bare minimum of files left in it.

A common mistake that groups make is to put everything into cvs, add a new directory, but forget to add and commit files (and the new directory). Please checkout a copy of your project into a new disk area, change to the new directory and try make on your new copy. If it fails oddly, fix the problems and retry checking out the project again and testing from scratch. Keep iterating on adding and committing until you get a complete copy out of cvs.

Lexer

You will work with your group on this assignment in the CS Lab. Begin by checking out your cs541 group project from your cvs repository. Instructions will be handed out on 1/26/2006 in the CS Lab. Next create a new directory called lexer from your cvs repository. Working from the S06 language web page, you will write an entire lexer in some flavor of lex. All you have to do at this point is recognize constants, identifiers, key words, and operators. You do not have to save anything since the symbold table comes over the following two weeks.

Use lex, flex, or something similar. The final product of this week should be checked on one of the CS Lab Apple Mac OS X systems before turning the assignment in. I will be grading your lexers on one of my Macs. You may use Linux, Windows, Cygwin, etc. to do your actual initial work, but do the final checks on the CS Lab machines. Do not write a lexer by hand using the table driven technique discussed in class or something similar. Use an automatic generator.

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your lexer directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Your lexer file, which should be just a single input file to one of the lex-like programs.

Symbol Table

You will work with your group on this assignment in the CS Lab. Begin by creating a new directory called hw-02-09. Copy your files from your lexer directory into the new one. Save the new directory with all of its files in your cvs repository and verify that you can retrieve them. Then fix all of the problems you had with your lexer in the new directory.

You are going to explore different ways of organizing your symbol table. You will design an extensible record for each symbol next week. This week you will design and implement the basic access methods. When your lexer identifies an IDENTIFIER, ICONSTANT, DCONSTANT, or SCONSTANT, store the value in your symbol table. Remember that you do not have a parser. You cannot identify any scope whatsoever at this point. That will come later. You just want to be able to uniquely store symbols in your symbol table(s).

The constants should only be in the symbol table once. Different representations of the same number should be stored only once. So 4, 04, and 004 should only be in the symbol table as the ICONSTANT 4. However, 4.0 would be a separate entry since it is a DCONSTANT. Take care designing how you deal with numbers versus IDENTIFIERS in your symbol table. Consider looking for advice in the textbook ASU (sections 7.6 and 8.2).

Since you cannot determine the scope of a variable at this point, each IDENTIFIER should be uniquely stored in the symbol table. Note that once you write your parser, you will be able to easily add scoping by global definitions versus inside a specific function or procedure. Plan on being able to store nonunique IDENTIFIERs in the symbol table if you are going to use a stack or tree based symbol table based on scope. (Remember, for now, no scoping.)

Your symbol table should be based on a (key,reference) system. The key represents the symbol's value. The reference is the equivalent of a pointer to one of more records for the symbol. In separate files, organize symbol table routines as

You want the calling sequences to be identical so that you can trivially substitute one access method for any other. This will allow you to analyze which is more efficient in different circumstances.

Above all, be flexible in your designs.

What to Have Done on 2/9 in CSLab

Fix your lexer from 2/2/2002. This might be easier for one of you to do instead of both of you at once. You should make certain that your include mechanism works. I have placed two sample inputs in the Notes web page that you can test your lexer on. The lexer-in1.s06 example is self contained. Any inlcude statements in it do not have corresponding files, so you need a graceful way of handling missing include files that does not involve quitting the compiling process. The lexer-in2.s06 has three more files associated with it that you should download, too (lexer-in2a.s06, lexer-in2a1.h, and lexer2a1.hh). You can actually use lexer-in2a.s06 independently of lexer2.s06.

Find one or more hashing codes and decide on using one.

Have at least one of the symbol table access methods working. Choose a simple one to implement first. You can split the other two up if necessary.

What to Have Done on 2/16 in CSLab

Write a routine that dumps all of the symbols from the table after yylex returns to your main program. Make the output look something like

Token type Symbol anything else you are storing now

You should write the routine assuming that it will be extended to cover individual records for actual declared variables once you have a parser to start finding declaration. Try to be neat in your output so that others can easily read it and determine what you are storing.

You might want to use symbols.c to generate a random collection of random length symbols. If you do use it, remember to follow the advice in the file and duplicate parts of the output so as to test whether or not you are getting unique symbols in your symbol table.

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your hw-02-09 directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. The three symbol table access files.
  2. An Analysisfile (text, Word, rtf, etc. format) that explains how the three access methods compare to each other, which one you intend to use, and why.

You should create another directory, symboltable, that contains your lexer and symbol table combination (with just one of the access methods). Always include a Makefile, Readme.txt, and the usual things that I require.

Make certain that both hw-02-09 and symboltable are completely in your cvs repository and can be retrieved by your partner.

Parser

You will work with your group on this assignment in the CS Lab. Begin by creating a new directory called hw-02-21. Copy your files from your symboltable directory into the new one. Save the new directory with all of its files in your cvs repository and verify that you can retrieve them. Then fix all of the problems you had with your compiler in the new directory.

You will write your parser using yacc (or one of its cousins). The output will be a directed acyclic graph (called a dag). If you do not know what a dag is, look it up in ASU. Each node of your dag should have a unique identification number (I am fond of the natural numbers myself for the numbers). Do not just use pointers to nodes in your internal data structure for the dag. You will need to print each node number as part of your assignment.

What to Have Done during 2/21-3/2 in CSLab

You will start writing a grammar for your language this week. This is something that all of you should have to do part of individually. You should program the assignment in yacc since it will help you debug the grammar and you have to have it done real soon now anyway.

As a group, decide in CSLab the following:

What to Have Done on 3/9 in CSLab

Be able to parse programs in the class language. After parsing, walk through your dag and print it out. Include the node identification number with each node. Finish the Organize file.

You may want to keep your parser split into separate files and have a make rule that creates a complete input file for yacc, e.g., in your Makefile you might two lines like

parser.y: parser-init.y parser-name1.y paerser-name2.y parser-final.y
cat parser-init.y parser-name1.y paerser-name2.y parser-final.y > parser.y

and work on the parser separately as well as cooperatively. You can base your splitting on what you decided on 2/21 or some adjustment to it. Always be flexible when writing a compiler.

Remember that I want you to have a dag in the end. What are you likely to have at the end of the part of the project? Probably a tree, which is also a dag, but an uninteresting one at first glance. You want to be careful about constructing a dag. Making a dag out of the entire sample input file is not useful in this course. However, making a dag inside of each procedure.

Note that I have not specified anything yet on extending your symbol table. Keep that in mind, too.

You can be quite creative in what your output looks like. However, being concise will make it far easier for you to debug your compiler as the term progresses and it gets more and more complicated. Print the node id and what is in the node. Suppose you have yacc grammar rules like

expression: expression PLUS expression { action goes here }
| expression MINUS expression { action goes here }
If I were compiling the snippet a+1-b, I would generate the following output:

** Node 101: Reduced: expression: expression PLUS expression
**** expression -> IDENTIFIER a
**** terminal symbol PLUS
**** expression -> ICONSTANT 1

... whatever else you are printing out

** Node 106: Reduced: expression: expression MINUS expression
**** expression -> dag node
**** terminal symbol MINUS
**** expression -> IDENTIFIER b

... whatever...

Once you have completed reading and processing all of the input, you will walk the parse tree and clearly print out what is in each node. Clearly mark that you are starting to print out your parse tree so that I can find it. I would print a banner like

++++++++++++++++++++++++++++++++++++++++++++++++
+ Walking through the Parse Tree Begins Here
++++++++++++++++++++++++++++++++++++++++++++++++
to get a readers' attention. You should also print the symbol table (before or after the parse tree is OK), but clearly mark it with something like
++++++++++++++++++++++++++++++++++++++++++++++++
+ Symbol Table Begins Here
++++++++++++++++++++++++++++++++++++++++++++++++
You can print a banner indicating the end of either of the above printing areas if you want to explicitly indicate what is in each output section.

What to Turn In

You should not email me anything. Put everything into your group's cvs repository. In your hw-02-21 directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. The compiler to date.
  2. name1.s06 and name2.s06 (replace name1/2 with your names) that are 5-10 line simple, legal input for the s06 language that you make up by yourself. I will post all of these samples through the language web page.
  3. An Organize file (text, Word, rtf, PowerPoint, PDF, etc. format) that explains how your compiler is constructed. This should be a blueprint of how you have organized everything to date and where you are heading in code generation.

You should create another directory, parser, that contains your lexer, symbol table, and parser combination. Always include a Makefile, Readme.txt, and the usual things that I require.

Make certain that both hw-02-21 and parser are completely in your cvs repository and can be retrieved by your partner.

Simple Code Generator

You are going to write a code generator over the next several weeks. Code optimization will be done later. Concentrate on getting something robust. Speed will come later. Remember: It is far better to get the right answer slowly than the wrong answer quickly.

What to Have Done on 3/23 in CSLab

Determine the scope of all variables. Write code that makes a complete pass through your parse dag (tree). Variables can only be defined once per procedure or function. In theory, variables can be defined anywhere in the routines. You may enforce a no use before being declared policy if you want to. You should capture the following information about each instance of a variable:

A general class discussion about any of these issues is acceptable. You may use the class mailing list or just talk in groups. It will be considered polite if you provide a summary of any discussions on the class mailing list, however.

You should not email me anything. Put everything into your group's cvs repository. In your hw-03-29 (in preparation fo the the next steps) directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Continue writing your Organize file. Include your strategy for identifying variables (scope, type, shape, etc.).

What to Have Done on 3/28 in CSLab

Start generating working, simple code for the following in whatever order makes sense to you:

You should not email me anything. Put everything into your group's cvs repository. In your hw-03-29 (in preparation fo the the next steps) directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Continue writing your Organize file. Include your strategy for generating code for simple objects. It is better to first write a strategy, then revise it to what actually works.
  2. Your code generator.

What to Have Done by 4/27 in CSLab

Complete your simple code generator. You should decide which parts of the S06 language you can and cannot generate code for. Do a subset well instead of trying to do everything and not having a compiler that compiles or works.

In your codegen directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Finish writing your Organize file. Include your strategy for identifying variables (scope, type, shape, etc.). Include who do what on the code generator. Be very specific.
  2. Make certain I can just type make and your compiler will compile (from scratch), link, and run an example. Check that I can type make clean and have a completely clean directory without anything made by lex, yacc, or your compiler.
  3. Check out a fresh copy of your project and make certain that everything is there that you expect (in all directories).

Optimized Code Generator

This is strictly extra credit. You can do this part while having implemented only a fraction of the S06 language.

Implement optimizations as you will make up lost points from earlier in the course. It is due by 6:15 (Eastern Time) on Tuesday, May 2. I will disable your cvs accounts around 6:16pm (by my watch and http://www.time.gov). so get your final results in by 6:00 pm to be safe. Note that I will be reloading the actual computer from scratch before I go home on May 2. If I do not get your updates, oh, well.

In your codegen directory, please include the following in addition to the regular files that should always be in one of your directories:

  1. Write an Optimize file. Include exactly what features of optimization you have put into your compiler. Be very specific and include examples that showcase your optimizer. Modify your Makefile so that I can type make opt and see your results.
  2. Your compiler should have a pass through the parse tree (dag) that does all of the things that are listed in 1.

Points

During the term you might accumulate some points. Points are bad... really bad. You want to avoid points whenever possible for they will reduce your grade in the project. The course points lead to a course curve for grades. So, if all of you amass 100,000 points, it is still an A.

This is how you accumulate points based on each portion of the project.

Part Points Explanation
Lexer 20 Nothing turned in on time
average=3.167 3 Include does not get input from file specified
median=3 2 Does not identify what type of operator or constant
2 Include does not redirect input back to the original file
1 Makefile not so I can just type make and have your own example run
1 No example or used mine (or part of mine)
1 No Readme.txt file
0 Molehills that could grow into mountains
Symbol Table 25 Nothing turned in on time
average=3.667 10 No Analysis file
median=0 5 symtable directory not present (all work in hw-02-09)
5 Analysis file inadequate
5 One method (vector, linked list, or hashing) not implemented
1 make clean does not work
1 make does not work
1 No Readme.txt file
0 Molehills that could grow into mountains
Parser 50 Nothing turned in on time
average=11.333 20 Cannot compile
median=18 10 Cannot parse simple codes
10 Directory parser not present (e.g., all work in hw-xx-xx)
10 cannot parse mg.s06 (including mg-s06.h) -- very simple codes ok
8 No Readme.txt file
8 Make does not build an executable and run an example
8 make clean does not work
2 >0 rules never reduced
0 Molehills that could grow into mountains
Code Generator 105 Nothing turned in on time
average=79.166 20 Symbol allocation/scoping does not work or is not used
median=75 20 Basic arithmetic expressions do not work
15 No timing
10 No strings
10 No classes
10 No functions or procedures
10 No Makefile/makefile
10 No control statements (if, do, ...)
10 make or make clean does not correctly
10 No 1D arrays
5 No 2D arrays
5 No unary -, ++, or --
5 Control statements (if, do, ...) generate high level C code
Code Optimizer 0 Nothing turned in
average=3.667 -5 Automatically typecast inside built in functions
median=0 -5 Arithmetic expressions
-5 Register allocation
-5 Simple code elimination
-1000 Global optimization for the full language across multiple files