Source Code #
The source code is publicly available on my own git server
Why? #
I was inspired to try and make my own automaton engine after learning about them in a course offered by the ANU, COMP1600.
Prior to COMP1600, I had already learned a bit about regular grammars and regular expressions in a compiler construction course, but that course didn’t go in much depth about how they might be practically implemented.
COMP1600 showed me that it is actually incredibly simple to make a regular expression engine that will match strings in linear time complexity with respect to the string length thanks to a concept known as deterministic finite automata (DFAs).
What is this? #
If you don’t know what regular expressions are, leave now and return when you do.
This project is currently a collection of implementations of finite automata, along with a regular expression compiler. I will primarily be highlighting its ability to compile regular expressions into easily computable structures in memory known as automaton.
Essentially, you give a regular expression, and it spits out an automaton that can recognise the regular language described by that expression. We can think of the regex string as a kind of “source code” for a machine that accepts or rejects other strings; this “machine” is called an automaton. Specifically for regular expressions and languages, these automaton can be described with a finite memory known ahead of time, and are hence called finite state automaton (FSA).
This project utilises 3 different forms of FSA in the compilation process from regular expression to the final deterministic finite automaton (DFA). See the compilation pipeline
Automaton Included #
Currently, only finite state automaton are included, no turing automaton. (yet!)
Automaton type  Primary data structure  struct 
Typical use 

\(\epsilon \operatorname{NFA}^*\)  Graph (using the petgraph library) 
GraphENFA 
Programmatically Building Automaton 
\(\epsilon \operatorname{NFA}\)  HashMap 
ENFA 
Easy to build automaton 
\(\operatorname{NFA}\)  HashMap 
NFA 
Less problematic to convert to DFA 
\(\operatorname{DFA}\)  HashMap 
DFA 
Complete control over a deterministic automaton 
Regex Compilation Pipeline #
The typical compilation pipeline for a regular expression \(R_L\) defining the
regular langauge \(L\) in the automaton
library is the following:
$$
R_L
\mapsto \epsilon \operatorname{NFA}^*_L
\mapsto \epsilon \operatorname{NFA}_L
\mapsto \operatorname{NFA}_L
\mapsto \operatorname{DFA}_L
$$
See the automaton included for info on these data structures.
This is exactly what the regex
binary does.
The mapping \(R_L \mapsto \epsilon\text{NFA}^*_L\) is not necessarily trivial, the language of regular expressions is unfortunately not regular (it would have been really nice to describe the language of regular expressions exactly with a regular expression). I implemented an LL(1) predictive recursive descent parser^{1} to parse the regular expression into an AST, which is then converted into an \(\epsilon\text{NFA}^*_L\) through a set of algorithms described in the COMP1600 lectures.
Conversion #
Conversion between various finite state automaton is achieved through the algorithms described in the lectures, with not much care for the memory complexity of the resulting conversion. Conversions are also not perfectly reversible, converting an \(\text{NFA}\) to a \(\text{DFA}\) and then back will just yield a \(\text{DFA}\) in the struct of an \(\text{NFA}\).
Demo #
See the demo I have hosted here:

In all honesty I’m not sure if it is really LL(1), I think it may technically be nondeterministic given it has some notion of context. As of writing, it has been months since I actually implemented it and did any formal theory on grammars so I’m a bit vague. ↩︎