Lexical Analysis (2.6)

Lexers! I’ve sort of written lexers before, although not of this generality, so this material is mostly familiar to me.
The chapter provides a simple lexer (below), in Java. Old Java (pre-generics, so, what, 1.4 or someting?). I’m doing the exercises here in C#. I’ve provided the lexer from the source; my github will provide a C# implementation of the original lexer and the answers to the exercises, which are all extensions of the lexer.

package lexer;
public class Token {
    public final int tag;
    public Token(int t) { tag = t; }
}
=====
package lexer;
public class Num extends Token {
    public final int value;
    public Num(int v) { super(Tag.NUM); value = v; }
}
=====
package lexer;
public class Word extends Token {
    public final String lexeme;
    public Word(int t, String s) {
        super(t); lexeme = new String(s);
    }
}
=====
package lexer;
public class Tag {
    public final static int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
}
=====
package lexer;
import java.io.*; import java.util.*;
public class Lexer {
  public int line = 1;
  private char peek = ' ';
  private Hashtable words = new Hashtable();
  void reserve(Word t) { words.put(t.lexeme, t); }
  public Lexer() {
    reserve( new Word(Tag.TRUE, "true") );
    reserve( new Word(Tag.FALSE, "false") );
  }

  public Token scan() throws IOException {
    for ( ; ; peek = (char) System.in.read() ) { 
      if ( peek == ' ' || peek == '\t' ) continue;
      else if ( peek == '\n' ) line = line + 1;
      else break;
    }
    if ( Character.isDigit(peek) ) {
      int v = 0;
      do {
        v = 10 * v + Character.digit(peek, 10);
        peek = (char) System.in.read();
      } while ( Character.isDigit(peek) );
      return new Num(v);
    }
    if ( Character.isLetter(peek) ) {
      StringBuffer b = new StringBuffer();
      do {
        b.append(peek);
        peek = (char) System.in.read();
      } while ( Character.isLetterOrDigit(peek) );
      String s = b.toString();
      Word w = (Word) words.get(s);
      if ( w!= null ) return w;
      w = new Word(Tag.ID, s);
      words.put(s, w);
      return w;
    }
    Token t = new Token(peek);
    peek = ' ';
    return t;
  }
}

Note: I’m not a big fan of this code. The big if tree with non-obvious early returns is annoying and should probably be turned into a bunch of subroutines and a switch table or something like it. There’s also no error handling or anything. Too many side effects – especially IO side effects – come on! There’s an almost-exact copy online now, and I’ll either just refactor that or add a new class that handles some of that soon.

Advertisement
This entry was posted in computer science, Dragon book. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s