SquirrelJME: 2017/04/02

00:01

There is the stack map table too. Right now when I decode a class I handle that and just do the code variables in the description stream, but those could actually be removed. So the question is then, is it worth it to fold the class decoder into the JIT? It would probably be faster, easier to implement, and more finely tuned. But it would be rigid because it would only work with the JIT. However, in pretty much every case it will only really be used by the JIT, because the JIT is the only thing that uses the class decoder.

00:03

At least by folding it in, I can get rid of the code variable, as all of that information and the used slots would go directly into the faces. So basically as the operation decoder runs, it will end up performing the output register allocations similar to before, but the overhead of bridging between the code will be gone. Although in that case I get the same thing as before. So what if I load the entire byte code into a set of cards and faces? All of that info is known, but what I could do for each instruction is basically just replace its card with a bunch of other cards. It would run through the entire method doing this. There would only be a single card. So basically the cards for the byte code will have the ActiveCacheState allocations and such, be given a bunch of native registers or stack positions to use, but refer to them in a pseudo-native/bytecode fashion. Once I know which variables are used I can actually determine that in the parsing stage, because register allocation and such is rather fixed. It would not really be true stack caching, but that could be figured out.

00:08

The next thing is exception handlers, which are basically magic. During the byte code translation, after some exception throwing operations there could just be checks for an exception, and if there is one, just jump to some other card. Basically, get rid of the exception handlers and just treat it as normal byte code with a special return value after some operations. Then the native code generator (which turns this pseudo byte code into machine code) will never have to worry about exceptions, because an exception is just a secondary return value and a comparison via stored values. Because to be honest, exceptions are a pain and I really do not like the way they exist because they are considered after the fact. It would be much easier to consider them during the fact.

00:13

The thing I worried about the most with exceptions is the transition of state whenever stuff happened. But with exceptions handled this way, this is basically handled naturally by the JIT.

00:15

So the major thing would be doing this with the exceptions along with the stack map types and allocations for the target every step of the way. State transition from jump targets could be generated by the code also. Then the class decoder can handle the stack caching automatically. So basically I will merge the JIT and class decoder into one, simplify a few things, and make some other things more complicated for better usage.

00:39

Then also for any instructions which cannot possibly cause exceptions (like adding two integers), no exception check has to be performed at all. So basically before there would have been exception checks regardless, but they are not needed at all. So effectively this makes generation a bit easier because not every instruction causes an exception. So say a bunch of instructions are wrapped in a handler and all they do is perform non-exceptional operations; there would never have to be an exception handler placed for that. So this is definitely far superior to all of my previous solutions. I never thought of it this way. Then with the argument register and how it is also used for return values, I can just have a special treatment for it and treat it as the exception register. I would potentially have to handle calling convention cases though, but I can locally violate ABIs as long as I handle it when I bridge to system code.
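The idea of checking for exceptions only after operations that can throw could be sketched as below. This is a minimal illustration, not the actual JIT: the `PseudoOp` names, the `canThrow` flag, and the `IF_EXCEPTION_GOTO` pseudo-instruction are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pseudo-operations; the set and names are illustrative only. */
enum PseudoOp
{
	IADD(false),
	LOAD_FIELD(true),
	INVOKE(true),
	RETURN(false);

	/** Whether this operation may set the exception register. */
	final boolean canThrow;

	PseudoOp(boolean canThrow)
	{
		this.canThrow = canThrow;
	}
}

final class ExceptionCheckSketch
{
	/**
	 * Translates operations to pseudo-instructions, adding an exception
	 * check only after operations which are able to throw.
	 */
	static List<String> translate(List<PseudoOp> ops, String handlerLabel)
	{
		List<String> out = new ArrayList<>();
		for (PseudoOp op : ops)
		{
			out.add(op.name());

			// Non-throwing operations (such as IADD) get no check at all
			if (op.canThrow)
				out.add("IF_EXCEPTION_GOTO " + handlerLabel);
		}
		return out;
	}
}
```

With this shape, a handler range that only contains non-throwing operations produces no checks and no jumps to the handler.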

08:19

I could however, instead of loading in all the byte codes, just directly generate the machine code instructions from the parsed byte code. I do not truly need to handle jump targets because those can be checked later. I can have insertion cards which start at every instruction and act as a temporary barrier. This would actually reduce the amount of code required in the JIT. I can also just stack cache everything, even locals generated from stack entries, whereas before I only did local variables. The catch is that when there is a jump to an exception handler, some states will have to be stored. For example, if a local aliases a stack entry, the local will have to be made real because exception handlers only take care of stack entries. But that would be complex to handle. It would be easier to implement if locals were never aliased. That way I can just drop all of the stack entries on the jump and only have to worry about local variables being used. Then every register that is used can get saved at the start of the method in the prolog. So basically there would be a prolog at the start, where the stack is initialized and where the register values would be saved. Then there would be an exit area in the method where any saved values are restored and the method returns. There only has to be one return area though. And to simplify things, anything that does return will restore the saved registers and then just return with whatever value was set in the return address.
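The prolog and single exit area could look roughly like the following sketch. The pseudo-instruction strings (`INIT_STACK`, `SAVE`, `RESTORE`, `LABEL exit`) and the register names are assumptions for illustration, and restoring in reverse order is also an assumption (it matches push/pop style saving, but nothing above requires it).

```java
import java.util.ArrayList;
import java.util.List;

final class PrologSketch
{
	/** Emits the method entry: stack setup, then one save per used register. */
	static List<String> prolog(List<String> usedSaved)
	{
		List<String> out = new ArrayList<>();
		out.add("INIT_STACK");
		for (String reg : usedSaved)
			out.add("SAVE " + reg);
		return out;
	}

	/** Emits the single exit area which every return jumps to. */
	static List<String> exitArea(List<String> usedSaved)
	{
		List<String> out = new ArrayList<>();
		out.add("LABEL exit");

		// Restore in reverse order of saving (an assumption of this sketch)
		for (int i = usedSaved.size() - 1; i >= 0; i--)
			out.add("RESTORE " + usedSaved.get(i));

		out.add("RETURN");
		return out;
	}
}
```

Since the set of used registers is only known once the method body has been generated, the prolog would likely have to be filled in after the fact, which the insertion cards would allow.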

08:30

I could on entry move the argument registers to the saved registers (so they do not have to be saved all the time), but that would make methods larger and introduce inefficiencies. At least if this never changes, it would never require moving at all.

08:32

It would probably be the cleanest route if I were to create a new package with all of this code so it does not collide with the old jit.

08:33

I can also have a unified set of exceptions that indicate JIT failure, even if the JIT fails to parse the byte code properly or there is some other class issue.

08:36

Also I only have to handle local variable changes. I can essentially have a system where the state of local sources and the target are used for compatibility. So this means that if half the method has the same state transition for local variables, they can use the same transition when jumping to the exception handler (since some locals could be in the wrong registers). Naturally if no state has to be changed at all, it could jump directly to the exception handler. And with the way I now want to do exceptions and the parsing, I do not have to do this moving around at all. Why? Because a number of the exception targets will be jumped to following what is being parsed. So this means that I will not be generating many transition states at all, because what is currently being parsed will define most of the state of the exception target. So what I would need then is to slightly uncouple the active cache state so that it can detect whether the state changed for exceptions (for the local variables). If the state did change then it will create a new copy of the cache state. With individually cached treads I can reduce the amount of objects that are created so the local variable allocations remain the same. Then whenever a local variable is allocated, it will be assigned to saved registers so that temporary ones are used for the stack. This way they only need to be saved by the callee method. So then for allocating stack variables using temporaries, the first priority would be to use non-argument temporary registers, then follow that with argument temporary registers. But for clarity, never use the first two or three argument registers. That is so the return value and exceptional return value do not get bumped around. However, since I am going to have an exceptional return value, if an argument is taken and fills the exception value register then I can just move it out of the way and clear it at the start.
This way it is always zero and I can more easily detect if it gets changed and such. This would make it easier to handle. Then also it can be treated as an immutable virtual slot where when an exception does occur, that the stack entry points to this virtual immutable slot. If that slot is ever written to (the first stack entry) even if it is of the same type (an object) it will be replaced with a new allocation so the register is always clear for usage when exceptions are handled.
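The allocation priority for stack entries described above (non-argument temporaries first, then argument registers, never the first few argument registers) can be sketched as a simple ordering. The register names and the `reserved` parameter are hypothetical; the real count of reserved argument registers would depend on the target.

```java
import java.util.ArrayList;
import java.util.List;

final class RegisterOrderSketch
{
	/**
	 * Builds the preference order of temporary registers for stack entries:
	 * all non-argument temporaries, then the argument registers with the
	 * first {@code reserved} of them left out.
	 */
	static List<String> stackOrder(List<String> nonArgTemps,
		List<String> argRegs, int reserved)
	{
		List<String> order = new ArrayList<>(nonArgTemps);

		// Skip the registers kept for the return and exception values
		for (int i = Math.max(0, reserved); i < argRegs.size(); i++)
			order.add(argRegs.get(i));

		return order;
	}
}
```

For example, with temporaries `t0, t1`, argument registers `a0` through `a3`, and two reserved, stack entries would be tried against `t0, t1, a2, a3` in that order, while locals would come from the saved registers instead.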

08:50

For long, float, and double comparison that will take up a bunch of instructions to generate every single case. But really for long, that result is actually just: v = a - b; v = (v & 1) | (v >>= 63), and that is the result of the comparison. I do not have to branch at all or do those conditionals. Actually that is not correct. I basically have to compact all of those bits down to a lower value. So it might be simplest to just branch, until I figure out a better algorithm.

09:01

I can also definitely make some things easier too. Take the former JITConfig, which has to be extended. Instead of having an engine provider which takes a config, the JITConfig class will have a method which initializes the JIT with the required parameters. That way there only has to be a single class used.
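As a rough sketch of that shape, the configuration class itself would carry the factory method, so a target only extends one class. Everything here is hypothetical (`JITConfigSketch`, `JITEngineSketch`, `createEngine`, and the `arch` option are invented for the example and are not the real JITConfig API).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for whatever object drives the JIT. */
interface JITEngineSketch
{
	String describe();
}

/** Hypothetical configuration base class; a target extends only this. */
abstract class JITConfigSketch
{
	private final Map<String, String> options;

	JITConfigSketch(Map<String, String> options)
	{
		// Defensive copy so the configuration stays immutable
		this.options = Collections.unmodifiableMap(new HashMap<>(options));
	}

	final String option(String key)
	{
		return this.options.get(key);
	}

	/** Initializes the JIT, taking the place of a separate engine provider. */
	abstract JITEngineSketch createEngine();
}

/** Example target configuration. */
final class ExampleConfigSketch extends JITConfigSketch
{
	ExampleConfigSketch(Map<String, String> options)
	{
		super(options);
	}

	@Override
	JITEngineSketch createEngine()
	{
		final String arch = option("arch");
		return () -> "engine for " + arch;
	}
}
```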

09:03

And the long comparison is just:

v = a - b;
if (v == 0)
	v = 0;
else
	v = (v >> 63) | 1;

That handles the zero case, and then just drags down the sign bit and forces a 1 to be ORed in, so positive values get 1 and negative values get -1. Of course that is simplified to this:

v = a - b;
if (v != 0)
	v = (v >> 63) | 1;

Because I do not have to set zero to it if it is already zero.
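The same comparison as a small Java method, for checking. The method name is made up for the sketch. One caveat worth noting: `a - b` wraps around on overflow, so for operands that are far apart (for example `Long.MIN_VALUE` and `1`) the sign of the difference is wrong and the result will not match what `lcmp` should give. The shortcut is only safe when the subtraction cannot overflow.

```java
final class LongCompareSketch
{
	/**
	 * Compares two longs using the subtract and sign-drag approach.
	 * Returns 0 if equal, 1 if a is greater, -1 if a is lesser, provided
	 * that {@code a - b} does not overflow.
	 */
	static int compareBySubtraction(long a, long b)
	{
		long v = a - b;

		// Arithmetic shift drags the sign bit down (0 or -1), OR forces 1
		if (v != 0)
			v = (v >> 63) | 1;

		return (int)v;
	}
}
```

A branching comparison (or `Long.compare(a, b)` when running on a host JVM) avoids the overflow problem at the cost of the extra conditionals.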

11:08

So now I am going to do the idea of the interpreter having different execution engines so I can test each architecture in an interpreter.

11:19

I believe for simplicity I am going to remove the deterministic recording interpreter because it complicates things a bit. If someone wants a deterministic run of SquirrelJME there are probably emulators and other such things for that. Also Java has undefined cycle counts, so keeping compatibility across SquirrelJME versions would be painful.

14:12

This means that I could potentially allow for alternative register sets in the dictionary based on the key values. I would need a method that returns every available register. Then maybe for some targets I would need to use an alternative stack register or otherwise.

14:39

So when the class file is parsed, the imports and exports will be placed directly in the output executable rather than just being kept around. It will just build imports and exports for the most part. Then on adding I can detect whether there are duplicate exports defined in a class.
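Duplicate export detection on add could be as simple as the sketch below. The class name, the use of plain strings as symbols, and the choice of `IllegalStateException` are assumptions for illustration; the actual executable format and exception type are not specified here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class LinkTableSketch
{
	/** Symbols this class needs from elsewhere, in insertion order. */
	private final Set<String> imports = new LinkedHashSet<>();

	/** Symbols this class provides, in insertion order. */
	private final Set<String> exports = new LinkedHashSet<>();

	/** Adds an import; importing the same symbol twice is harmless. */
	void addImport(String symbol)
	{
		this.imports.add(symbol);
	}

	/** Adds an export, failing if the class already exported it. */
	void addExport(String symbol)
	{
		if (!this.exports.add(symbol))
			throw new IllegalStateException("Duplicate export: " + symbol);
	}

	int importCount()
	{
		return this.imports.size();
	}

	int exportCount()
	{
		return this.exports.size();
	}
}
```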

15:08

The flags for a class will just be part of the exported class definition.