Command Line Parsing using JFlex
What started out as a small set of commands for a tool I’m writing is slowly growing unwieldy: there is now enough repetitive code for parsing the command line manually, and enough lines of if/else or switch statements to wade through, that doing it by hand no longer pays off. (Don’t you preach to me about the virtues of using the Command design pattern, for it is still unwieldy: it does not handle the parsing of arguments, even if a hash saves you from having long branching segments of code, which I don’t mind anyway. In my opinion, one long file is visually easier to navigate using folds than a fragmented set of one command per file.)
Rather than dealing with the unwieldy mess of buggy, manual code built from an ad-hoc mixture of Regular Expressions and StringTokenizers, I decided to start using a lexical analyzer instead. The one I’m using is called JFlex, which is probably the most popular one around for Java.
Once you are past the initial learning curve, having a lexical analyzer certainly makes life much easier: it automatically breaks the command string down into tokens, without my having to handle whitespace, separators and such by hand. A simple example of a lexical analyzer that breaks up commands and arguments looks something like this:
/** The lexer for scanning command tokens. */
%%
%class CommandLexer
Parameter = [:jletterdigit:]+
WhiteSpace = [ \r\n\t\f]
%%
/* Rule 1: integers */
[:digit:]+ { return new Yytoken(Integer.parseInt(yytext())); }
/* Rule 2: parameters */
{Parameter} { return new Yytoken(yytext()); }
/* Rule 3: whitespace */
{WhiteSpace} { /* Ignore Whitespace */ }
/* Rules 4 and 5: separators */
"-" { return new Yytoken('-'); }
"," { return new Yytoken(','); }
The example should hopefully be simple enough not to cause a ‘cringe factor’ or the need to refer to the Dragon Book.
There are three sections in a JFlex specification file, separated by '%%' lines. The first section is straightforward: it is user code that JFlex copies verbatim to the top of the generated file (here, just a comment). The second section is a list of options and macro definitions that tell JFlex what to do. In this case, the %class directive tells JFlex to name the generated class CommandLexer, so the output is written to 'CommandLexer.java'. The next two lines define the macros 'Parameter' and 'WhiteSpace', which the rules can then refer to as {Parameter} and {WhiteSpace}.
The last section contains the lexical rules, which tell the generated scanner what counts as a token and, in my case, what type of token it is. In my example, rule 1, '[:digit:]+', matches one or more digits and turns them into an integer token. Rule 2 matches what I call a parameter: one or more letters or digits (strictly speaking, Java identifier characters, so '_' and '$' count too). Because a run made only of digits is already claimed by rule 1, in practice a parameter contains at least one non-digit. Rule 3 tells the scanner to ignore all WhiteSpace characters, while rules 4 and 5 match what I define as separators, in my case the characters '-' and ','.
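To make the character classes a little more concrete, here is a small Java sketch that mimics them by hand. It assumes, following my reading of the JFlex manual, that [:digit:] corresponds to Character.isDigit and [:jletterdigit:] to Character.isJavaIdentifierPart; the CharClasses class and its ruleFor method are hypothetical helpers for illustration, not anything JFlex generates.

```java
/**
 * Hypothetical helper mirroring the character classes used in CommandLexer.
 * Assumption: [:digit:] ~ Character.isDigit, [:jletterdigit:] ~ Character.isJavaIdentifierPart.
 */
public class CharClasses {

    /** Approximates [:digit:]. */
    static boolean isDigitClass(char c) {
        return Character.isDigit(c);
    }

    /** Approximates [:jletterdigit:]; note that '_' and '$' are included. */
    static boolean isParameterClass(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    /**
     * Names the rule that would claim a whole word. An all-digit word is matched
     * equally well by rules 1 and 2, so the earlier rule (1) wins the tie.
     */
    static String ruleFor(String word) {
        if (word.isEmpty()) {
            return "neither rule 1 nor 2";
        }
        boolean allDigits = true;
        for (char c : word.toCharArray()) {
            if (!isParameterClass(c)) {
                return "neither rule 1 nor 2";
            }
            if (!isDigitClass(c)) {
                allDigits = false;
            }
        }
        return allDigits ? "number (rule 1)" : "parameter (rule 2)";
    }

    public static void main(String[] args) {
        for (String word : new String[] {"42", "abc", "a1", "file_name", "$x", "-"}) {
            System.out.println(word + " -> " + ruleFor(word));
        }
    }
}
```

For example, file_name and $x both come out as parameters, which is worth knowing if you expected [:jletterdigit:] to mean strictly letters and digits.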
It must be noted that ordering is important. When two rules match the same longest piece of input, JFlex picks the rule that appears first. Every string matched by [:digit:]+ is also matched by {Parameter}, so if I swapped rules 1 and 2, numbers would always be claimed by {Parameter} and the [:digit:]+ rule would never match. JFlex warns you when that is the case:
Reading "commandlexer.jflex"
Constructing NFA : 16 states in NFA
Converting NFA to DFA :
.....
Warning in file "commandlexer.jflex" (line 13):
Rule can never be matched:
[:digit:]+ { return new Yytoken(Integer.parseInt(yytext())); }
7 states before minimization, 5 states in minimized DFA
Old file "CommandLexer.java" saved as "CommandLexer.java~"
Writing code to "CommandLexer.java"
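The rule-ordering behaviour can be reproduced with a toy lexer built on java.util.regex. This is a sketch under simplifying assumptions, not how JFlex works internally (JFlex compiles all rules into a single DFA): at each position it tries every rule, keeps the longest match, and on a tie keeps the rule listed first. The patterns are ASCII approximations of the spec's classes, and RulePriorityDemo and its rule names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of rule priority: longest match wins, ties go to the earlier rule. */
public class RulePriorityDemo {

    /** A lexical rule: a token name plus the pattern it matches. */
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    // ASCII approximations of the spec's rules (hypothetical names).
    static final Rule NUMBER = new Rule("Int", "[0-9]+");
    static final Rule PARAMETER = new Rule("Parameter", "[A-Za-z0-9_$]+");
    static final Rule WHITESPACE = new Rule("WhiteSpace", "[ \\r\\n\\t\\f]");
    static final Rule DASH = new Rule("Separator", "-");
    static final Rule COMMA = new Rule("Separator", ",");

    static final List<Rule> SPEC_ORDER = List.of(NUMBER, PARAMETER, WHITESPACE, DASH, COMMA);
    static final List<Rule> SWAPPED_ORDER = List.of(PARAMETER, NUMBER, WHITESPACE, DASH, COMMA);

    /** Splits input into "Name(text)" strings, skipping whitespace. */
    static List<String> lex(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                // '>' rather than '>=' so that, on a tie, the earlier rule keeps the match.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            if (best != WHITESPACE) {
                tokens.add(best.name + "(" + input.substring(pos, bestEnd) + ")");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("1-5,abc", SPEC_ORDER));
        System.out.println(lex("1-5,abc", SWAPPED_ORDER));
    }
}
```

With the spec's order this prints [Int(1), Separator(-), Int(5), Separator(,), Parameter(abc)]; with rules 1 and 2 swapped every number becomes a Parameter token, which is exactly the situation JFlex flags as "Rule can never be matched". Longest match still wins over order, so an input like 12ab is a single parameter either way.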
The next thing to do is to create the actual token class, which is called Yytoken by default. A typical Yytoken.java file looks somewhat like this:
/** A single scanner token. */
public class Yytoken {
    public boolean is_separator = false; // true for '-' and ','
    public boolean is_int = false;       // true for numeric tokens
    public boolean is_token = false;     // true for parameter tokens
    public char separator;
    public String token = null;
    public int value = 0;

    /** Separator token, such as '-' (range) or ','. */
    public Yytoken(char c) {
        is_separator = true;
        separator = c;
    }

    /** Integer token, produced by the [:digit:]+ rule. */
    public Yytoken(int value) {
        is_int = true;
        this.value = value;
    }

    /** Parameter token, produced by the {Parameter} rule. */
    public Yytoken(String token) {
        is_token = true;
        this.token = token;
    }

    @Override
    public String toString() {
        if (is_separator) return "Separator Token(" + separator + ")";
        else if (is_int) return "Int Token(" + value + ")";
        else return "Token(" + token + ")";
    }
}
To test it, you can write a simple harness that reads from stdin:
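import java.io.InputStreamReader;

/** Test class to try out the command lexer. */
public class UseCommandLexer {
    public static void main(String[] args) throws Exception {
        // Wrap System.in in a Reader: recent JFlex versions may generate only a
        // Reader constructor, and this form also works with older ones.
        CommandLexer commandLexer = new CommandLexer(new InputStreamReader(System.in));
        Yytoken token;
        // Assumes yylex() returns null at end of input, as the original loop did.
        while ((token = commandLexer.yylex()) != null) {
            System.out.println("token = " + token);
        }
    }
}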
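The scanner only splits the input; the command handler still has to make sense of the token sequence. As one hypothetical example of that next step, the sketch below expands a range list such as 1-5,7 (which the lexer would return as an int, '-', int, ',', int token sequence) into the individual numbers. The grammar (item (',' item)*, where an item is INT or INT '-' INT), the RangeParser class and its expand method are my own assumptions for illustration; it carries a trimmed copy of the Yytoken class above so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical consumer of scanner tokens: expands "1-5,7" into [1, 2, 3, 4, 5, 7]. */
public class RangeParser {

    /** Trimmed copy of the Yytoken class above (separator and integer tokens only). */
    static class Yytoken {
        boolean is_separator = false;
        boolean is_int = false;
        char separator;
        int value = 0;

        Yytoken(char c) { is_separator = true; separator = c; }
        Yytoken(int value) { is_int = true; this.value = value; }
    }

    /** Expands tokens matching: item (',' item)*, where item is INT or INT '-' INT. */
    static List<Integer> expand(List<Yytoken> tokens) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (true) {
            int start = expectInt(tokens, i++);
            int end = start;
            if (i < tokens.size() && isSeparator(tokens.get(i), '-')) {
                end = expectInt(tokens, i + 1); // the number after '-'
                i += 2;
            }
            if (end < start) {
                throw new IllegalArgumentException("Descending range: " + start + "-" + end);
            }
            for (int n = start; n <= end; n++) {
                result.add(n);
            }
            if (i == tokens.size()) {
                return result; // every token consumed
            }
            if (!isSeparator(tokens.get(i), ',')) {
                throw new IllegalArgumentException("Expected ',' at token " + i);
            }
            i++; // skip ','; the next iteration then requires another number
        }
    }

    private static int expectInt(List<Yytoken> tokens, int i) {
        if (i >= tokens.size() || !tokens.get(i).is_int) {
            throw new IllegalArgumentException("Expected a number at token " + i);
        }
        return tokens.get(i).value;
    }

    private static boolean isSeparator(Yytoken t, char c) {
        return t.is_separator && t.separator == c;
    }

    public static void main(String[] args) {
        List<Yytoken> tokens = List.of(
                new Yytoken(1), new Yytoken('-'), new Yytoken(5),
                new Yytoken(','), new Yytoken(7));
        System.out.println(expand(tokens)); // prints [1, 2, 3, 4, 5, 7]
    }
}
```

A hand-written recursive-descent style loop like this is usually enough for small command grammars; a trailing comma or a missing number raises an IllegalArgumentException instead of being silently ignored.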
That’s a really basic tutorial on using JFlex; learning the rest of it means more RTFM, but in the meantime, have fun processing your command line!