CST generation needs to be faster; it is currently the slowest stage. For example, for main_example_1.st:
Tokens done in 0.80ms
Tokens_print done in 1.60ms
CST done in 7.25ms
CST_print done in 4.40ms
AST done in 0.84ms
AST_print done in 4.35ms
I'm not concerned about the printing (*_print) performance, as that will not be a common code path except for debugging.
Both tokenization and AST performance are acceptable (0.80ms and 0.84ms on a 70-line file).
But CST generation is ~9x slower (7.25ms)! CstGen.matches() accounts for ~40% of that time, so one way to improve this would be to make CstGen.matches() faster (or avoid calling it altogether).
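
Before optimizing, it may be worth re-checking the share of time spent in the hot function after each change. The sketch below shows one way to do that with Python's standard cProfile module. It assumes the project is written in Python, which these notes do not state. profile_call and build_cst_stand_in are hypothetical names used only for illustration; the stand-in would be replaced by the real CST entry point.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Render the statistics into a string instead of printing to stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def build_cst_stand_in(tokens):
    # Hypothetical stand-in for the real CST generator (assumption):
    # it only upper-cases each token so the example is self-contained.
    return [tok.upper() for tok in tokens]


if __name__ == "__main__":
    cst, report = profile_call(build_cst_stand_in, ["if", "x", "then"])
    print(report)
```

Sorting by cumulative time puts functions that dominate the run, such as a frequently called matcher, near the top of the report, which makes it easy to see whether a change actually reduced their share.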