As I mentioned back in this post, the initial phase of work needed to allow Sotue to recognize data in input streams is to build a state machine that input characters can move through as they are read. If the state machine ends up in what is called an accepting state, then the input characters match a pattern. To review, Sotue’s process for building these state machines is as follows: Construct a non-deterministic finite automaton (NFA) from a regular expression. Convert the NFA into a deterministic ......
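To make the idea of characters moving through a state machine concrete, here is a minimal sketch in Python. The transition table is hand-built for "one or more digits" and the names `TRANSITIONS`, `ACCEPTING`, and `matches` are hypothetical; this is not how Sotue builds or stores its machines, only an illustration of what ending up in an accepting state means.

```python
from typing import Dict, Set, Tuple

# Hypothetical hand-built deterministic machine for "one or more digits".
# State 0 is the start state; state 1 is the only accepting state.
START = 0
ACCEPTING: Set[int] = {1}

# (current state, input character) -> next state
TRANSITIONS: Dict[Tuple[int, str], int] = {}
for digit in "0123456789":
    TRANSITIONS[(0, digit)] = 1  # first digit moves into the accepting state
    TRANSITIONS[(1, digit)] = 1  # further digits stay in the accepting state


def matches(text: str) -> bool:
    """Feed each character through the machine; succeed only if we end in an accepting state."""
    state = START
    for ch in text:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False  # no transition for this character: the input is rejected
        state = TRANSITIONS[key]
    return state in ACCEPTING


if __name__ == "__main__":
    for sample in ["2024", "", "12a"]:
        print(repr(sample), matches(sample))
```

An empty input is rejected here because the machine never leaves the start state, which is not accepting.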
In my post on adding closure operator support for regular expressions input to Sotue, I showed the unoptimized NFA generated by Sotue for a regular expression built to match a number. Since the regular expression used only the OR operator, the generated NFA contained some 40 states. While the state machine was perfectly valid, asking a developer to write a regular expression such as (0|1|2|3|4|5|6|7|8|9)+ seems like a bit too much to ask. The de facto standard for specifying ranges of characters ......
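As a rough illustration of why a range shorthand saves typing, the sketch below expands a pair of endpoint characters into the equivalent OR-only expression. The function name `expand_range` is hypothetical, and treating a range as sugar for an alternation is an assumption made for this example; Sotue may represent ranges quite differently internally.

```python
def expand_range(first: str, last: str) -> str:
    """Expand an inclusive character range into an OR-only group.

    Hypothetical helper for illustration; not part of Sotue.
    """
    if len(first) != 1 or len(last) != 1:
        raise ValueError("range endpoints must be single characters")
    if ord(first) > ord(last):
        raise ValueError("range start must not come after range end")
    # Walk the code points from first to last, inclusive.
    chars = [chr(code) for code in range(ord(first), ord(last) + 1)]
    return "(" + "|".join(chars) + ")"


if __name__ == "__main__":
    # Rebuilds the long-hand digit pattern from two endpoint characters.
    print(expand_range("0", "9") + "+")
```

With the endpoints `0` and `9`, the helper reproduces the long-hand digit expression quoted above once the closure operator is appended.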