CSE 504 Compiler Design

Fall 2016

Blackboard Introduction Course Description Grading Schedule Lecture Notes
Prerequisites Manuals Instructor Texts Special needs Survival Tips

Introduction

Theory and practice come together in Compilers in a way that is hard to replicate. Students learn about formal languages, algorithms for their efficient recognition, and practical tools that embody these algorithms and greatly simplify the development of compilers.

Compilers also offers a perspective on how research results find their way into practice. Much of the course material originates from research papers written in the 50s, 60s, 70s, and the 80s that have since been put into practice over the past 50 years.

Through programming projects, compilers courses also provide a "no-nonsense" approach for teaching software engineering practice. Modern compilers are among the most complex pieces of software around. Students learn how to break down this complexity --- how to modularize their design, how to make the modules flexible and extensible, how to decouple design and implementation, and finally, how to build efficient implementations.

It is important for a computer scientist to understand the process by which programs written in high-level languages are translated and executed. It will help them write code that provides good performance without compromising on readability and maintainability, portability and security.

An important side benefit of this course is to provide significant experience in OO-programming with C++. Modern C++ is an excellent language for writing high-performance code. While retaining some of the key benefits of C-language such as the ability to write low-level code, it enables programmers to avoid most of C's drawbacks such as lack of type-safety, lack of standardized libraries for data structures, and lack of modern features such as objects, templates, type inference, closures, exceptions, and so on.

What is new this year?

The programming assignments are being updated to use C++11 and C++14 features. In addition, the emphasis of the course has been shifting over the years to emphasize code generation, optimization and runtime systems, while reducing the emphasis on the compiler front-end.

The programming projects will be based on an event-processing language we call E--. This language was designed to provide a simple vehicle to teach all aspects of compiling (syntax and semantics analysis, code generation, optimization and runtime systems) without burdening students with an impossible amount of work.

In the first assignment, you will learn to use the Flex lexical analyzer generator to implement a lexical analyzer for E--. In the second and third assignments, you will learn to use the Bison parser generator to parse E-- programs and construct an abstract syntax tree (AST). The fourth and fifth assignments will involve type-checking and code generation for E-- and optimization. The first three programming assignments will be individual projects, whereas the fourth and fifth assignments will be group projects.

Course Description

In the class, we will discuss the theory and principles of developing compilers and interpreters. In the projects, you will then apply this theory (directly and indirectly) to develop a complete compiler for a high-level language. The problem of language translation is traditionally decomposed into many phases. Most common are:

All phases use a symbol table to keep track of the properties (types, number of parameters etc.) of each symbol (variables, procedures etc.). In the end, code is generated, which will be executed either directly by a machine (the so-called compiled-code approach) or by a software program that implements an "abstract machine" (the interpreter approach). Some languages (e.g., C) prefer the compiled-code approach due to its increased efficiency. Others (e.g., Java) prefer the interpreted approach due to its portability and flexibility.

The compiler will be written in C++. We don't expect that students already know a lot of C++ --- in fact, most students view this course as an opportunity to learn C++ at some level of depth. The structure and nature of a compiler makes it a natural candidate for learning OO-programming and appreciating its benefits.

We will use the LLVM clang and GNU g++ compiler on Linux. If you are unfamiliar with working in an UNIX environment, including the use of Makefiles and so on, then you should plan on spending some time in the first couple of weeks of the course to catch up. The files may work on cygwin as well, but it is not an officially supported environment.


Schedule:

Lecture time/location: Official meeting time for the course is Mon/Wed at 2:30pm to 3:50pm at Old CS 2311 (Wireless Seminar Room).

Programming Assignments: (Tentative)

Approximate Dates for Exams:

Mid-term I: Oct 3

Mid-term II: Nov 16

Last lecture: Dec 7, Wednesday

Final Exam: Dec 13, Tuesday, 5:30pm to 8:00pm


Lecture Notes:

These are notes from the previous offering of this course. They will be updated on an as-needed basis, as the material is covered in class.

Instructor:

R. Sekar
Office: 364 New CS Building
Office Hours: Mon/Wed 3:50pm to 4:50pm


Texts:

We will use the latest edition of the "Dragon book". Compilers: Principles, Techniques, and Tools, Second Edition by Aho, Sethi, Ullman, and Lam. Addison-Wesley


Grading

The weighting for programming assignments, midterm and final exams will be roughly as follows. Note that these weightings are preliminary, and are subject to some changes in the future. However, your final grade cannot be more than 1-point above the minimum of the grades you receive for homeworks or exams. For instance, if your homework grade is a C+, then your final grade can at most be a B+, even if you got 100% in all your exams.

You are advised to start working on the projects at the earliest possible time even if the deadlines are far away. The date the projects are due will be clearly specified. Projects are due by 11:59pm on the specified date. Late projects will be penalized at the rate of 5% for the first day, 15% the second day and 25% per day after that. So, there is no point submitting a project more than 4 days late! No further extensions will be allowed.

Each student gets one late pass that allows you to submit one assignment up to 4 days late. Use the late pass wisely. I will not accept more than one late submission for any reason, including emergencies.

Any form of copying in the course, and all forms of academic dishonesty, are considered serious offenses, and will be prosecuted to the maximum extent permitted by university policies.The minimum penalty for any form of copying, whether from your friends, the Internet or outside sources, or from previous offerings of the course, will be an F-grade. There will be no exceptions.

Note that we make use of automated software for catching copies, and this software is quite sophisticated and can see through typical evasion attempts. In addition, we manually check for unusual similarities among programs. These enhanced may not be performed on every submission, but if we do identify copying, then it is a situation of "one strike and you are out," i.e., you end up with an F-grade regardless of how well you have done in the course until then.


Prerequisites

Official prerequisites are CSE 304 and 307. I am usually willing to allow students that do not have one of these prerequisites, as long as I can be satisfied about their programming background. If you have only one of these prerequisites, please do come an talk to me so that I can sign any permission slips that you may need.

Regardless of whether you have taken the required prerequisites, I require that you be familiar with the material for CSE 307, specifically, the lecture notes posted here. We will provide a refresher for some of the most essential material covered in CSE 307, but it is up to the students to read up what they don't know.


Manuals for Compiler Tools

You are welcome to look up resources on the Internet, such as stack overflow. However, do not copy any material that solves a substantive part of your programming assignments. Moreover, don't copy any material without understanding it.


Special Needs

If you have a physical, psychological, medical or learning disability that may impact on your ability to carry out assigned course work, I would urge that you contact the staff in the Disabled Student Services office (DSS), in the ECC building (where the Computer Store used to be), 632-6748v/TDD. DSS will review your concerns and determine, with you, what accommodations are necessary and appropriate. All information and documentation of disability is confidential.


Survival Tips