I would say that parsing integer and floating-point values is the most complex part of the tokenizer.


The simplest way to parse would be similar to what I have done before: a small state machine that tracks which kinds of literal are still possible. As each character is read, it rules out whatever the literal can no longer be. The only literal that is unambiguous from the start is a binary one, which begins with 0b. Anything else could still turn out to be a floating-point value. Once a decimal point appears, though, the literal is known to be a float, so parsing is mostly a matter of eliminating candidates along the way.
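
Here is a rough sketch of that elimination idea in Python. It is not the tokenizer's actual code: the function name `classify_literal` and the candidate labels are made up for illustration, and it assumes only binary, decimal integer, and float literals exist. Underscores, exponents, and signs are deliberately left out.

```python
def classify_literal(text: str) -> str:
    """Classify a numeric literal by ruling out the kinds it cannot be.

    Hypothetical helper: returns "binary", "integer", or "float",
    and raises ValueError when no candidate survives.
    """
    candidates = {"binary", "integer", "float"}
    body = text

    if text[:2] in ("0b", "0B"):
        # The 0b prefix is unambiguous: only a binary literal starts this way.
        candidates = {"binary"}
        body = text[2:]
    else:
        candidates.discard("binary")

    if not body:
        raise ValueError(f"empty literal: {text!r}")

    seen_point = False
    seen_digit = False
    for ch in body:
        if ch == ".":
            # A second point, or a point in a binary literal, is invalid.
            if seen_point or "float" not in candidates:
                raise ValueError(f"unexpected '.' in {text!r}")
            seen_point = True
            candidates.discard("integer")  # a decimal point means float
        elif ch in "01":
            seen_digit = True  # valid in every kind of literal
        elif ch in "23456789":
            seen_digit = True
            candidates.discard("binary")  # not a binary digit
        else:
            raise ValueError(f"unexpected character {ch!r} in {text!r}")

        if not candidates:
            raise ValueError(f"{text!r} is not a valid literal")

    if not seen_digit:
        raise ValueError(f"no digits in {text!r}")

    # With no decimal point seen, an undecided literal is an integer.
    if "integer" in candidates:
        return "integer"
    return next(iter(candidates))


print(classify_literal("0b1010"))
print(classify_literal("42"))
print(classify_literal("3.14"))
```

Keeping the candidates in a set means each character only ever removes possibilities, so an invalid literal shows up as soon as the set becomes empty.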


Literal parsing will definitely be quite complex, though, especially once underscores and the like are involved.