• Course Introduction

        Because we have compiler programs, software developers often take the process of compilation for granted. However, as a software developer, you should cultivate a solid understanding of how compilers work in order to develop the strongest code possible and fully understand its underlying language. In addition, the compilation process comprises techniques that are applicable to the development of many software applications. As such, this course will introduce you to the compilation process, present foundational topics on formal languages and outline each of the essential compiler steps: scanning, parsing, translation and semantic analysis, code generation, and optimization. By the end of the class, you will have a strong understanding of what it means to compile a program, what happens in the process of translating a higher-level language into a lower-level language, and the applicability of the steps of the compilation process to other applications.

      • Unit 1: Introduction to Compilers

        The compilation process is one of the steps in executing a program. Understanding how compilers work and what goes on "behind the scenes" will help you get better at developing software. This unit will first provide you with an introduction to the compiler, its history, compiler structure and design, and the types of compilers. By the end of this unit, you will be able to describe the steps of the compilation process.

        Completing this unit should take you approximately 8 hours.

      • Unit 2: Formal Languages and Formal Grammar

        Formal languages and formal grammars are the theoretical foundation for computer languages and the compilation process. Formal languages are defined by their grammars, which specify the syntax of the languages.

        Completing this unit should take you approximately 6 hours.

      • Unit 3: Finite State Machines

        Finite state machines (FSM), also called finite state automata (FSA), are conceptual models for recognizing, parsing, and generating strings in a formal language. An FSM can be used to recognize (i.e., determine) whether a string adheres to the syntax of a language. Moreover, an FSM can be used to build a syntax tree, which shows the derivation (i.e., how the string was constructed) of the string. This unit introduces (or reviews) FSMs, which are covered in detail in other courses (for example, CS202: Discrete Structures).

        Completing this unit should take you approximately 6 hours.

      • Unit 4: Scanning and Lexical Analysis

        Lexical analysis is performed by a scanner, one of the front-end components of a compiler. The foundation for lexical analysis is provided by regular grammars and finite state automata. This unit studies scanners and lexical analysis in terms of development process products: requirements, functions, design, construction, and test. The verification of a scanner is done through testing. Validation is based on the programming language specifications, and operation of the scanner as a component of the compiler or application system that uses it.

        Completing this unit should take you approximately 12 hours.

      • Unit 5: Parsing and Syntax Analysis

        The next step of the compilation process is parsing. This step also has a foundation in formal languages and automata. Parsing takes input from the Lexical Analysis step and builds a parse tree, which will be used in future steps to develop the machine code. In this unit, we will define parsing and identify its uses. We will also discuss two parsing strategies, Top-Down Parsing and Bottom-Up Parsing, examining what it means to approach parsing from each standpoint and taking a look at an example of each. By the end of the unit, you will understand parsing techniques with regards to compilers, and be able to discuss each of the two main approaches.

        Completing this unit should take you approximately 28 hours.

      • Unit 6: Syntax Directed Translation and Semantic Analysis

        Semantic Analysis takes input from the parsing process and prepares the code for the code-generation step. In this unit, we will discuss this process in detail, learning about scope and type-checking, type expression, type equations, and type inference.

        Completing this unit should take you approximately 33 hours.

      • Unit 7: Runtime Environment

        Runtime environment considerations include organization of the compiled program, storage, building blocks of a compiled program, and different runtime configurations. This unit has some overlap with Unit 8. The emphasis in this unit is on representation of needed data structures and techniques for objects and inheritance. The emphasis of Unit 8 is on instruction generation.

        Completing this unit should take you approximately 14 hours.

      • Unit 8: Code Generation

        This unit is closely related to Unit 7, where the emphasis was on representation of data structures needed for run-time. While there will be some overlap, the emphasis in this unit is on instruction-level intermediate code generation and from intermediate code to target code.

        The last phase (or next to the last phase if there is a code optimization phase) of the compilation process is code generation, where the output from the previous steps is finally translated into machine code, ready to execute on the target platform. In this unit, we will start with a discussion of code generation in general before moving on to a more detailed description of the code generation process. This will include an in-depth discussion of three main areas: Instruction Selection, Instruction Scheduling, and Register Allocation. By the end of this unit, you will have a firm understanding of the code generation process.

        Completing this unit should take you approximately 17 hours.

      • Unit 9: Code Optimization

        Simply compiling and executing a program is not enough to get the most out of your code. It is the optimization process that allows your code to run as effectively and efficiently as possible. In this unit, we will first take a look at optimization, learning what it is and why we are interested in it. Next, we will review different optimization categories, including Peephole, Local, Loop, Language Dependent, and Machine Dependent. We will conclude with a discussion of different optimization techniques. By the end of this unit, you will have a basic understanding of a wide range of optimization techniques and how they improve the effectiveness of your program.

        Completing this unit should take you approximately 22 hours.

      • Unit 10: Compiler Verification and Validation

        The verification and validation (V&V) of a compiler consists of the V&V of its parts: scanner or lexical analyzer, syntax analyzer, semantic analyzer, Intermediate Code Generator, Intermediate Code Optimizer, Code Generator, and Code Optimizer. See the V&V notes for each of these. V&V is based on a careful plan, a well-defined compiler process (including version control or more extensive configuration control, and efficient and effective documentation), peer reviews, measurements, (including defect analysis), component tests, formal proofs, and integration tests. It can utilize known correct tools, such as parser generators, code generation templates, and other compilers and compiler parts. In addition, a formal certification of the compiler can be done by an independent organization.

        Remember that verification pertains to correctness, which means satisfaction of specifications; and validation pertains to user needs, which means satisfaction of user operational requirements. Verification addresses, for example, correctness of results of scanning, semantic analysis, code generation, and code optimization. Validation includes meeting operational, functional, performance, and physical requirements for processing time and memory space, and also, the "ilities" -- reliability, availability, maintainability, and usability.

        V&V depends on the complications of the source and target programming languages, the intended use of the compiler, number of front ends to back ends, available tools, run-time environments and machine architecture, and the system that it will interface with.

        Finally, certification of a compiler -- i.e., compliance to a standard -- may be necessary for safety or security of critical programs.

        Completing this unit should take you approximately 1 hour.

      • Unit 11: Compiler Summary and Next Steps

        This unit concludes our course study of compilers. We summarize the topics we have studied in the course and look ahead to further study to build upon what we have learned in this course. The topics covered may at first seem limited in their application to the development of compilers. However, their application is much broader, and helps us write better programs. The techniques, abstractions, and data structures are applicable to many other applications, including safety, security, high-performance applications, data and control flow analysis, internal or intermediate representation and encoding of structures, and optimization for external dependencies.

        Completing this unit should take you approximately 3 hours.