dotLang: Writing a Compiler using LLVM - Part 1
So, I have been working on “dotLang” for more than two years now, and it has been a great journey so far. I have mostly been working on the language spec (memory model, type system, function call dispatch, operators, …), but it’s time to move on to the next step: writing the compiler. I will try to update this series regularly with the latest status of the compiler.
I chose to write the compiler in C because of its simplicity. Simplicity is my main goal in creating dotLang, so C made sense to me. I will also use LLVM to generate native code, because I am not going to reinvent the wheel!
The first step is writing the grammar of the language. This grammar will definitely change during the implementation of the compiler, but the current version can be viewed here.
I have used a modified EBNF notation to describe the grammar. At first I thought the grammar was not a really important part, but I soon found out that it acts like a map of uncharted territory: it helps you keep the big picture in mind while working on the smallest details. Another important advantage is that it makes you think carefully about the notation and semantics of your language, which helps you find any inconsistencies or ambiguities in its rules.
So the basic building block of the language is a Module. A module can contain three different elements:
- Named type definitions: Assigning a name to a specific type (for example, “MyInt := int”).
- Imports: Including definitions from another module (for example, “_ := @{"module"}”; here the underscore acts as a filter on the output of the import command and means we want to import everything).
- Bindings: Defining a function or a value to be used or invoked later. Bindings are immutable data definitions, e.g. “PI := 3.1415” or “inc := (x:int) -> x+1”.
A binding in dotLang can be any immutable value. Note that even functions are considered values in dotLang.
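The module structure above maps naturally onto a small tagged data type. Here is a minimal sketch in C, assuming a hypothetical representation (ElementKind, Element, Module, module_add and module_free are illustrative names, not code from the dotLang compiler) in which every element is a “lhs := rhs” definition and, for simplicity, the right-hand side is kept as raw text.

```c
#include <stdlib.h>
#include <string.h>

/* The three kinds of top-level element a module can contain. */
typedef enum { ELEM_NAMED_TYPE, ELEM_IMPORT, ELEM_BINDING } ElementKind;

/* One top-level "lhs := rhs" definition. The right-hand side is stored as
   raw text here; a real compiler would store a type or expression node. */
typedef struct {
    ElementKind kind;
    char *lhs;
    char *rhs;
} Element;

/* A module is an ordered, growable list of elements. */
typedef struct {
    Element *items;
    size_t count;
    size_t capacity;
} Module;

/* Portable string copy (strdup is not part of C99/C11). */
static char *copy_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Appends an element; returns 0 on success, -1 on allocation failure. */
int module_add(Module *m, ElementKind kind, const char *lhs, const char *rhs) {
    if (m->count == m->capacity) {
        size_t cap = m->capacity ? m->capacity * 2 : 4;
        Element *grown = realloc(m->items, cap * sizeof *grown);
        if (!grown) return -1;
        m->items = grown;
        m->capacity = cap;
    }
    char *l = copy_str(lhs);
    char *r = copy_str(rhs);
    if (!l || !r) { free(l); free(r); return -1; }
    m->items[m->count++] = (Element){ kind, l, r };
    return 0;
}

/* Releases all memory owned by the module and resets it to empty. */
void module_free(Module *m) {
    for (size_t i = 0; i < m->count; i++) {
        free(m->items[i].lhs);
        free(m->items[i].rhs);
    }
    free(m->items);
    m->items = NULL;
    m->count = m->capacity = 0;
}
```

Starting from an empty “Module m = {0};”, a parser would call module_add once per definition it recognizes, e.g. module_add(&m, ELEM_BINDING, "PI", "3.1415").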
I will start by writing the lexer and parser by hand; the parser will be a recursive descent parser. I tried Flex/Bison before, but I did not like the huge volume of code they generated, and it was difficult to track down errors in the code being compiled.
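As a rough illustration of what a hand-written lexer and recursive descent parser look like, here is a minimal sketch in C for a tiny, hypothetical subset of dotLang: a single binding whose right-hand side is an arithmetic expression over numbers. The grammar in the header comment, the token kinds and parse_binding are assumptions made for this example, not dotLang's actual grammar. Each grammar rule becomes one C function that consumes tokens through a one-token lookahead, and the expression is evaluated on the fly instead of building an AST, to keep the sketch short.

```c
/* Hand-written lexer + recursive descent parser for a tiny, hypothetical
 * subset of dotLang bindings. One C function per grammar rule (EBNF):
 *   binding = IDENT ":=" expr ;
 *   expr    = term { ("+" | "-") term } ;
 *   term    = factor { ("*" | "/") factor } ;
 *   factor  = NUMBER | "(" expr ")" ;                                   */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_ASSIGN, TOK_OP,
               TOK_LPAREN, TOK_RPAREN, TOK_END, TOK_ERROR } TokKind;
/* A token points into the source; number is set for TOK_NUMBER only. */
typedef struct { TokKind kind; const char *start; size_t len; double number; } Token;
/* Parser state: cursor, one token of lookahead, sticky error flag. */
typedef struct { const char *p; Token tok; int error; } Parser;

/* Lexer: skip whitespace and scan the next token into ps->tok. */
static void next(Parser *ps) {
    const char *s = ps->p;
    while (isspace((unsigned char)*s)) s++;
    Token t = { TOK_END, s, 0, 0.0 };          /* '\0' stays TOK_END */
    if (isalpha((unsigned char)*s) || *s == '_') {
        const char *e = s;
        while (isalnum((unsigned char)*e) || *e == '_') e++;
        t.kind = TOK_IDENT; t.len = (size_t)(e - s);
    } else if (isdigit((unsigned char)*s)) {
        char *e;
        t.number = strtod(s, &e);
        t.kind = TOK_NUMBER; t.len = (size_t)(e - s);
    } else if (s[0] == ':' && s[1] == '=') {
        t.kind = TOK_ASSIGN; t.len = 2;
    } else if (*s != '\0' && strchr("+-*/()", *s)) {
        t.kind = *s == '(' ? TOK_LPAREN : *s == ')' ? TOK_RPAREN : TOK_OP;
        t.len = 1;
    } else if (*s != '\0') {
        t.kind = TOK_ERROR; t.len = 1;         /* unknown character */
    }
    ps->tok = t;
    ps->p = s + t.len;
}

static int is_op(const Parser *ps, char op) {  /* lookahead is operator op? */
    return ps->tok.kind == TOK_OP && *ps->tok.start == op;
}

static double expr(Parser *ps);  /* forward declaration: rules are mutually recursive */
static double factor(Parser *ps) {
    double v = 0.0;
    if (ps->tok.kind == TOK_NUMBER) {
        v = ps->tok.number;
        next(ps);
    } else if (ps->tok.kind == TOK_LPAREN) {
        next(ps);
        v = expr(ps);
        if (ps->tok.kind == TOK_RPAREN) next(ps); else ps->error = 1;
    } else {
        ps->error = 1;                         /* unexpected token */
    }
    return v;
}

static double term(Parser *ps) {
    double v = factor(ps);
    while (!ps->error && (is_op(ps, '*') || is_op(ps, '/'))) {
        char op = *ps->tok.start;
        next(ps);
        double rhs = factor(ps);
        v = (op == '*') ? v * rhs : v / rhs;
    }
    return v;
}

static double expr(Parser *ps) {
    double v = term(ps);
    while (!ps->error && (is_op(ps, '+') || is_op(ps, '-'))) {
        char op = *ps->tok.start;
        next(ps);
        double rhs = term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Parse "IDENT := expr"; returns 0 on success, -1 on a syntax error. */
int parse_binding(const char *src, char *name, size_t cap, double *value) {
    Parser ps = { src, { TOK_END, src, 0, 0.0 }, 0 };
    next(&ps);
    if (ps.tok.kind != TOK_IDENT || ps.tok.len >= cap) return -1;
    memcpy(name, ps.tok.start, ps.tok.len);
    name[ps.tok.len] = '\0';
    next(&ps);
    if (ps.tok.kind != TOK_ASSIGN) return -1;
    next(&ps);
    *value = expr(&ps);
    return (ps.error || ps.tok.kind != TOK_END) ? -1 : 0;
}
```

For example, parse_binding("X := 1 + 2 * 3", name, sizeof name, &v) stores "X" in name and 7 in v, because term binds tighter than expr. Function literals such as “(x:int) -> x+1”, unary minus and identifiers on the right-hand side are deliberately left out; each would need its own rule and function.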
One thing about LLVM that really shocked me was how scarce the documentation and reference material for the C bindings is. All I got were bare reference entries that do not say much about how each function works or what its inputs and outputs are. There are also plenty of blog posts about writing a tiny compiler with the LLVM C bindings, but they don’t compile with the current version of LLVM. I guess the LLVM developers are so busy adding new features and removing or deprecating old ones that they do not have enough time to document their code.
However, I managed to find some documentation that has proved very useful while implementing the compiler.
Here is the list:
- LLVM Programmer’s Manual
- LLVM Language Reference Manual
- Summus: Simple compiler frontend using LLVM as backend
In the next post, I will start to explain the process of writing the compiler.