SquirrelJME: 2016/03/23

00:03

For a single pass system with no allocations for temporary byte code that is read, one problem I will encounter during translation is exceptions. One thing I do know about exceptions though, is that the stack is wiped away yet the locals are kept the same. The exception handler can never be the first operation. On entry of the method, there are just the locals and no stack. Exception handlers get a single stack element containing the exception which was thrown. Also, each instruction can only have a single stack state also. Thus if some byte code starts modifying an object on the stack, then it is likely an exception handler. The only thing to determine though are future goto operations going to these locations with matching stacks.

00:09

So with the first instruction never being an exception is a rule. I would just flow normally after that and make sure elements are valid and such. There can also be a heuristic to guess code which is likely an exception handler. If it is alone for example. If code is reached which is never reachable by normal flow of execution then it is likely an exception handler. The only case of it not being so would be if a goto were just tossed to a future address, which then jumps back to the given point. So for unknown states I will have to allocate some buffers to store byte code temporarily and defer it to until the exception handler table is read. If a method is very simple then this will never occur. Unknown states would only be for code outside the starting program flow.

10:54

Ok, so the goal with byte code parsing is that it needs to be light and simple. There are many operations and a large number of them are calculated similarly such as when stack operations are performed. So what I am thinking of is having the operations stored in a simple text file with their stack opertions and such. However, alternatively I could just make a gigantic switch statement. The switch would be more inline so to speak. I could use reflection and cache the classes, then each byte code operation would be in its own class and such. I can do this because I can call Class.forName() and call newInstance(). However, they would need to be interfaces and I would also need a bridging object. The opcode handlers would be single instance, however they would also be a lookup table that can be static so to speak. Doing it that way I do not have to clutter the class with details, or have gigantic enumerations which are pre-exist in memory. The only time I need an operation is when I am handling it.

12:52

I should set references to null when I am done with them to indicate that I no longer care for them and they can be freed as such. Although the compiler can determine this and this is likely a micro premature optmization. On second thought, in my code generation if a local or stack element ever stops being a references, I can just null it in the code.

13:11

However, thinking about it, having 255 classes would be a bit messy, I could just instead have 255 methods instead. However, since I cannot have function pointers essentially with method handles, I need a giant switch statement. The tableswitch instruction can be turned into a binary tree since it is sorted. That would reduce the total number of comparisons needed. That however is if code cannot internally use a table. If the target architecture can for example use pointers, then it just essentially would become a local table lookup followed by a jump based on an offset. Thinking about it however, the NARF code would complicate handling of tableswitch due to how the instructions are laid out. So perhaps instead I should have a linear generation pass directly to native code. Essentially it would be close to the Java byte code except be in native form. It would not have the best optimization however due to translation between a stack machine and a register one. However for that I would need to load in the byte code, read the exception table, and then verify stack map locations. But using the known rules I know about earlier I could ignore some things. I could do a basic linear parsing, with notes so to speak.

13:24

So what the engine needs is a way to be told if it is interpreting or if it is performing native compilation. Basically a code operation handler of sorts.

13:26

Well, I would want an interpreter which is optimized as interpreting Java byte code can be really slow due to potentially wasted stack operations and needing the constant pool to work correctly. So the interpreter engine should be the core of the code generation engine since it already has all of the code and logic to do such things.

14:33

Considering that the code parser if it gets all the methods, the file will essentially be probably a 5000 line behemoth. So I suppose the seprated class would be a good thing because all that code would be elsewhere instead.

15:40

__ChunkedInputStream__ must change to provide byte offsets and then it can be used for the code parser to determine the address of the current operation being handled.

18:17

I need a class which can handle the state that things are on the locals and stack. The basic requirement is that the types exist (can use bit fields for that). The more advanced requirement for any optimization is I need to know how new or stale a value is, or if it gets mutated. Basically after a value is set, how many times has it been updated before it was erased.

18:34

If I use the register types, I only need a single bit that I can use for something else. That is if I keep the alignment of types at 4 bits. From the looks of it the NARF package is going to go away and will likely be merged into the interpreter code.

18:37

One thing I can actually do for low memory systems is have hollow arrays. So for example if you only have 32KiB of RAM and you allocate an integer array with 2 billion elements, the allocation will succeed. Once you start writing enough into it (where you cannot fit any more) there will be an out of memory error thrown during the write. This does have a speed cost though, so it should only be used when it is needed. There can also be a cost of extra memory also. But the complexity cost of it might not be worth it. I am definitely going to use handles so that I can perform compaction (move everything down) without worrying too much about pointer reassignment. With any classes and their data structures in memory, I will need to be efficient with the stuff which I use in the heap. One idea I can do is actual merging of values. For example if a class defines 8 non-final boolean fields, they can be combined into a single byte. However if one is volatile then they all become volatile. This would work, however there would need to be some kind of lock because if multiple threads are changing the values then other booleans could lose their state. So if an architecture has an atomic clear and set of a bit it could be used. Otherwise I would fall back to wasting 7-bits. However, I can perform this optimization on final fields no problem.

23:22

Something I can have in the stack data, would be to have a unique value of sorts. That unique value can be synced between the local variables and the stack and could be used for optimization and such. So if I load a local variable onto the stack, they both have the same unique ID. That ID though will just have to be managed by the program so things are not out of sync when lots of code changes them.

23:27

I can also use a kind of directly loaded setup and register allocation kind of thing too. For example instead of generating code for a push to the stack from a local, I can have an alias which is indicated to by the unique ID and another marker that says that it is currently allocated to a given register.