This is the first in a series of posts intended to help C# developers get ANTLR up and running, and to help me remember the steps required to set things up.
What is ANTLR?
Take a look at the ANTLR website. ANTLR is a tool for developers who want to create languages that conform to a context-free grammar. From such a grammar, ANTLR can generate a lexer, a parser that builds an abstract syntax tree (AST), and even a corresponding tree walker.
You need some or all of those components to validate, process or transform text written in your custom language.
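To make these terms a little more concrete, here is a small hand-written Java sketch. It is not ANTLR output and does not use any ANTLR API; Java is used only because it is the language ANTLR itself runs on. It assumes a hypothetical toy language of integer sums such as 1+20+3: the lexer turns characters into tokens, the parser checks the token order and builds an abstract syntax tree, and a tree walker visits that tree to compute a result. The class TinyPipeline and all of its members are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written pipeline (not generated by ANTLR): lexer -> parser/AST -> tree walker.
public class TinyPipeline {
    // AST node types: an integer literal or the addition of two subtrees.
    public interface Node {}

    public static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
    }

    public static final class Add implements Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Lexer: characters -> tokens. A run of digits is one integer token, '+' is its own token.
    public static List<String> lex(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                int start = i;
                while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else if (c == '+') {
                tokens.add("+");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    // Parser for the rule  sum : INT ('+' INT)* ;  building a left-leaning AST.
    public static Node parse(List<String> tokens) {
        int pos = 0;
        Node tree = number(tokens, pos++);
        while (pos < tokens.size()) {
            if (!tokens.get(pos).equals("+")) {
                throw new IllegalArgumentException("expected '+' at token " + pos);
            }
            pos++;
            tree = new Add(tree, number(tokens, pos++));
        }
        return tree;
    }

    private static Num number(List<String> tokens, int pos) {
        if (pos >= tokens.size() || tokens.get(pos).equals("+")) {
            throw new IllegalArgumentException("expected an integer at token " + pos);
        }
        return new Num(Integer.parseInt(tokens.get(pos)));
    }

    // Tree walker: visits the AST recursively and computes the value of the sum.
    public static int evaluate(Node node) {
        if (node instanceof Num) {
            return ((Num) node).value;
        }
        Add add = (Add) node;
        return evaluate(add.left) + evaluate(add.right);
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "1+20+3";
        System.out.println(evaluate(parse(lex(input)))); // prints 24 for the default input
    }
}
```

Input the sketch cannot handle, such as 1+ or 1 + 2 (there is no whitespace rule), raises an IllegalArgumentException.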
ANTLR is written in Java. Nevertheless, it can generate code for a range of target languages, which means the generated lexers, parsers and so forth can be expressed in, for example, C#, which is the generation target of interest in this post.
While the generated code is C#, the tool that processes the original grammar definitions and produces the desired C# code is a Java tool. This means you need a Java runtime installed. If you don’t have one, get the JRE from Sun’s Java site.
Once you have a JRE installed, we can finally get our hands on ANTLR.
Authoring environment ANTLRWorks
Defining a grammar in a text editor and then generating code for it with ANTLR works just fine. But there is an amazing authoring environment available, called ANTLRWorks. It features syntax highlighting for grammars, syntax tree visualization for rules, on-the-fly evaluation of expressions, debugging and much more. It is the easiest way to get started with ANTLR.
Download the latest ANTLRWorks JAR file from the ANTLR website. Depending on your system settings, you can either double-click the JAR file or enter
java -jar antlrworks-1.2.2.jar
at a console prompt. (Note that by the time you read this, version numbers may have changed.)
The application should start, and you can begin entering a grammar. Just copy the following dummy grammar into the editor:
grammar MyGrammar;
options
{
language = CSharp2;
}
start: LET+;
LET: 'a'..'z';
Save the file as MyGrammar.g and select Generate from the Generate menu. It should just work and create two C# files in an output folder next to the grammar file.
So we know it works.
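To see what this dummy grammar actually accepts, here is a hand-written Java sketch that mirrors its two rules: the lexer rule LET turns each lowercase letter into a token, and the parser rule start requires one or more of those tokens (the + in LET+). It only illustrates the rules and is not the code ANTLR generated above; the class MyGrammarSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written mirror of MyGrammar.g (not ANTLR-generated code).
public class MyGrammarSketch {
    // Lexer rule  LET : 'a'..'z' ;  each lowercase ASCII letter becomes one LET token.
    public static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 'a' || c > 'z') {
                // The grammar has no rule for spaces, digits, capitals, etc.
                throw new IllegalArgumentException("no lexer rule matches '" + c + "' at index " + i);
            }
            tokens.add("LET");
        }
        return tokens;
    }

    // Parser rule  start : LET+ ;  consumes one or more LET tokens and returns how many.
    public static int start(List<String> tokens) {
        int matched = 0;
        while (matched < tokens.size() && tokens.get(matched).equals("LET")) {
            matched++;
        }
        if (matched == 0) {
            throw new IllegalArgumentException("start: expected at least one LET");
        }
        return matched;
    }

    // True if the input lexes completely and the token list matches start.
    public static boolean accepts(String input) {
        try {
            start(lex(input));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "", "ab c", "Abc"}) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

Running main prints true only for "abc": the empty string contains no LET at all, and neither the space nor the capital A is covered by the LET rule. The code ANTLR generates handles bad input differently from this sketch: by default it reports the error and attempts to recover instead of simply giving up.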
ANTLR runtime
There are various ways to get what you need, but the easiest I found was to download what the ANTLR download site currently calls the ANTLR 3.1.1 source distribution, which comes as a file called antlr-3.1.1.tar.gz.
Once you unzip and untar the archive, you will find a collection of files and folders. The most interesting ones for our purposes are:
- lib: Contains JAR files required to run ANTLR from the command line.
- runtime\CSharp\dist: Contains a ZIP file with DLL files required to use the generated lexers and parsers in a C# application.
- src: Contains the Java source code for ANTLR.
Let’s start with the JARs in lib: if you only plan to generate C# source code from within ANTLRWorks, you will not need these at all. But if you want to run the generation from the command line or include it in build scripts, you need to add them to the Java classpath. On my system, this looks like the following at a Windows command prompt:
set CLASSPATH=%CLASSPATH%;D:\Antlr\antlr-3.1.1\lib\antlr-3.1.1.jar
After setting the classpath, run the following to generate code from the grammar file saved above:
java org.antlr.Tool MyGrammar.g
This again should work fine.
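If you want to run this step from a build script, the same command can be launched from a small program. Since a JRE is installed anyway, here is a minimal Java sketch that puts every JAR from the lib folder on the classpath and runs org.antlr.Tool on a grammar file. It assumes that the JARs in lib together are everything the tool needs, and it passes them with java's -cp option instead of the CLASSPATH variable shown above. The class GenerateGrammar and its helper methods are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical build helper: runs  java -cp <jars> org.antlr.Tool <grammar>
public class GenerateGrammar {
    // Builds the command line; File.pathSeparator is ";" on Windows and ":" elsewhere.
    public static List<String> antlrCommand(List<String> jars, String grammarFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(File.pathSeparator, jars));
        cmd.add("org.antlr.Tool");
        cmd.add(grammarFile);
        return cmd;
    }

    // Assumption: all JARs in the distribution's lib folder are what org.antlr.Tool needs.
    public static List<String> jarsIn(File libDir) {
        List<String> jars = new ArrayList<>();
        File[] files = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (files != null) { // null if libDir does not exist or is not a directory
            Arrays.sort(files);
            for (File f : files) {
                jars.add(f.getAbsolutePath());
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: GenerateGrammar <antlr lib dir> <grammar file>");
            System.exit(2);
        }
        List<String> jars = jarsIn(new File(args[0]));
        if (jars.isEmpty()) {
            System.err.println("no JAR files found in " + args[0]);
            System.exit(2);
        }
        // Run the tool with its output shown on this console and pass on its exit code.
        Process p = new ProcessBuilder(antlrCommand(jars, args[1])).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

After compiling it with javac, a call such as java GenerateGrammar D:\Antlr\antlr-3.1.1\lib MyGrammar.g should do the same job as the two console commands above, returning the exit code of the java process it starts.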
Creating a C# project
Finally, we want to compile the generated sources to check that the generated code is valid. We won’t do anything useful with it; we just finish our path through the tool chain.
Create a C# console application project in Visual Studio. I am using VS2008, but earlier versions should work as well. Now unzip the archive found in runtime\CSharp\dist, mentioned in the previous section. You’ll get a bunch of assemblies. Add references to the ones mentioned here.
In your code, instantiate the two generated classes, MyGrammarLexer and MyGrammarParser, and compile the application. If it builds, your tool chain for using ANTLR in a C# environment is in place.