Lexers! I’ve sort of written lexers before, although not of this generality, so this material is mostly familiar to me.
The chapter provides a simple lexer (below), in Java. Old Java (pre-generics, so, what, 1.4 or someting?). I’m doing the exercises here in C#. I’ve provided the lexer from the source; my github will provide a C# implementation of the original lexer and the answers to the exercises, which are all extensions of the lexer.
package lexer; public class Token { public final int tag; public Token(int t) { tag = t; } } ===== package lexer; public class Num extends Token { public final int value; public Num(int v) { super(Tag.NUM); value = v; } } ===== package lexer; public class Word extends Token { public final String lexeme; public Word(int t, String s) { super(t); lexeme = new String(s); } } ===== package lexer; public class Tag { public final static int NUM = 256, ID = 257, TRUE = 258, FALSE = 259; } ===== package lexer; import java.io.*; import java.util.*; public class Lexer { public int line = 1; private char peek = ' '; private Hashtable words = new Hashtable(); void reserve(Word t) { words.put(t.lexeme, t); } public Lexer() { reserve( new Word(Tag.TRUE, "true") ); reserve( new Word(Tag.FALSE, "false") ); } public Token scan() throws IOException { for ( ; ; peek = (char) System.in.read() ) { if ( peek == ' ' || peek == '\t' ) continue; else if ( peek == '\n' ) line = line + 1; else break; } if ( Character.isDigit(peek) ) { int v = 0; do { v = 10 * v + Character.digit(peek, 10); peek = (char) System.in.read(); } while ( Character.isDigit(peek) ); return new Num(v); } if ( Character.isLetter(peek) ) { StringBuffer b = new StringBuffer(); do { b.append(peek); peek = (char) System.in.read(); } while ( Character.isLetterOrDigit(peek) ); String s = b.toString(); Word w = (Word) words.get(s); if ( w!= null ) return w; w = new Word(Tag.ID, s); words.put(s, w); return w; } Token t = new Token(peek); peek = ' '; return t; } }
Note: I’m not a big fan of this code. The big if tree with non-obvious early returns is annoying and should probably be turned into a bunch of subroutines and a switch table or something like it. There’s also no error handling or anything. Too many side effects – especially IO side effects – come on! There’s an almost-exact copy online now, and I’ll either just refactor that or add a new class that handles some of that soon.