DISCLAIMER: These notes are from the defunct k8 project which precedes SquirrelJME. The notes for SquirrelJME start on 2016/02/26! The k8 project was effectively a Java SE 8 operating system and as such all of the notes are in the context of that scope. That project is no longer my goal as SquirrelJME is the spiritual successor to it.
Fixed a bunch of stack trouble with the SMT. Now I have to implement the LCMP instruction. I will do this by just adding another math operation which does the -1, 0, and 1 result.
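As a rough illustration of treating LCMP as just another two-operand math operation, here is a minimal sketch. The class and method names are hypothetical and not taken from the k8 sources. It follows the instruction's specified behaviour: 1 if the first value is greater, 0 if equal, -1 if less.

```java
// Hypothetical sketch: LCMP modelled as one more two-operand math operation.
final class LongCompareSketch {
    /** Compares two longs the way LCMP does, yielding -1, 0, or 1. */
    static int lcmp(long a, long b) {
        // Explicit comparisons are used instead of (a - b), since the
        // subtraction can overflow and report the wrong sign.
        if (a > b)
            return 1;
        if (a < b)
            return -1;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(lcmp(5L, 7L));
        System.out.println(lcmp(7L, 7L));
        System.out.println(lcmp(Long.MAX_VALUE, Long.MIN_VALUE));
    }
}
```

The comparison form matters mostly for extreme inputs, where subtracting the two values would wrap around.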
Only 8 opcodes remain now.
    [FINER] No handler method runDUP_X2.
    [FINER] No handler method runDUP2.
    [FINER] No handler method runDUP2_X1.
    [FINER] No handler method runDUP2_X2.
    [FINER] No handler method runSWAP.
    [FINER] No handler method runMULTIANEWARRAY.
    [FINER] No handler method runJSR_W.
    [FINER] No handler method runWIDE_RET.
My current hit is a stack mismatch: one side has a LONG TOP while the target lacks the TOP. So the SMT is still a bit messed up. The current SMT trace is SF~8rrjjjiii~0S1rArjjSS1j. So the same 1-frame item stuff will have to have the top bit set when the single stack item is wide, which is rather simple to do.
And now with that, I have hit DUP2 which is where stack operations get a bit crazy.
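To show why DUP2 gets awkward, here is a minimal sketch of it acting on a typed stack model where a long or double occupies two slots (the value followed by TOP). The slot names and the class are assumptions made for illustration, not the k8 code. In this model DUP2 always copies the top two slots, which covers both two narrow values and one wide value, but it must refuse to cut a wide value in half.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of DUP2 on a slot-typed stack (e.g. "INT", "LONG", "TOP").
final class Dup2Sketch {
    /** Returns a new stack with the top two slots duplicated. */
    static List<String> dup2(List<String> stack) {
        int n = stack.size();
        if (n < 2)
            throw new IllegalStateException("DUP2 needs two slots");
        String lower = stack.get(n - 2);
        String upper = stack.get(n - 1);
        // If the lower slot is a TOP, the pair would be the tail of one wide
        // value plus a narrow value, which is not a legal DUP2.
        if (lower.equals("TOP"))
            throw new IllegalStateException("DUP2 would split a wide value");
        List<String> out = new ArrayList<>(stack);
        out.add(lower);
        out.add(upper);
        return out;
    }
}
```

The DUP2_X1 and DUP2_X2 forms would need the same kind of check on the slots they insert beneath.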
With DUP2 out of the way, I now have to do MULTIANEWARRAY, which is a rather complex multi-dimensional instruction. That means 7 are left now.
JSR, JSR_W, RET, and WIDE RET were removed in Java class version 51, which equates to Java 7. So only 5 opcodes remain now. I am almost done, except for the few remaining stack operations and MULTIANEWARRAY. I am thinking of not creating a new instruction for that and instead using something similar to a loop or a virtualized method call. I could really just call a magical method of that name, but then I would need to allocate an array and store the values there. It should not be too hard to translate it into unrolled code, as it should be a simple loop and lots of moving around.
Running what I can do now on my PowerPC desktop using JamVM's stack caching interpreter.
    openjdk version "1.8.0_40-internal"
    OpenJDK Runtime Environment (build 1.8.0_40-internal-b04)
    JamVM (build 2.0.0, inline-threaded interpreter with stack-caching)
    two of these:
    cpu : 7447A, altivec supported
    clock : 1599.999996MHz
    real 3m17.346s
    user 3m7.490s
    sys 0m5.023s
Vs my Core i3 laptop:
    openjdk version "1.8.0_40-internal"
    OpenJDK Runtime Environment (build 1.8.0_40-internal-b04)
    OpenJDK 64-Bit Server VM (build 25.40-b08, mixed mode)
    two of these:
    model name : Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz
    real 0m18.004s
    user 0m48.596s
    sys 0m0.868s
Although not really a comparison, Debian's JVM on my i3 setup is about 11 times faster than the stack-caching interpreter on the PowerPC. If I can force JamVM to be used on the i3 then I could gauge it better.
    openjdk version "1.8.0_40-internal"
    OpenJDK Runtime Environment (build 1.8.0_40-internal-b04)
    JamVM (build 2.0.0, inline-threaded interpreter)
    real 0m59.459s
    user 0m58.804s
    sys 0m0.656s
Then against the Zero VM:
    openjdk version "1.8.0_40-internal"
    OpenJDK Runtime Environment (build 1.8.0_40-internal-b04)
    OpenJDK 64-Bit Zero VM (build 25.40-b08, interpreted mode)
    real 2m49.970s
    user 2m49.524s
    sys 0m0.640s
And against Oracle's with -Xint, assuming the flag even does anything or that setting it even worked.
    real 1m41.200s
    user 1m41.088s
    sys 0m0.660s
So putting all the values together (NOTE THIS IS NOT A TRUE BENCHMARK):
    From / Against ->        | PPC JamVM | EM64T Oracle JIT | EM64T Oracle INT | EM64T JamVM | EM64T Zero
    PPC JamVM (197s)         | ---       | 10.94            | 1.95             | 3.34        | 1.17
    EM64T Oracle JIT (18s)   | 0.09      | ---              | 0.18             | 0.31        | 0.11
    EM64T Oracle INT (101s)  | 0.51      | 5.6              | ---              | 1.71        | 0.60
    EM64T JamVM (59s)        | 0.30      | 3.28             | 0.58             | ---         | 0.35
    EM64T Zero (169s)        | 0.86      | 9.39             | 1.67             | 2.86        | ---
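Each cell in the table is the row's wall-clock time divided by the column's wall-clock time, using the rounded seconds shown in the row labels. A small sketch that recomputes the table from those five numbers follows. The class and method names are made up for illustration, and like the table itself this is not a true benchmark.

```java
import java.util.Locale;

// Hypothetical sketch: rebuilds the ratio table from the rounded times.
final class RatioTableSketch {
    static final String[] NAMES = {
        "PPC JamVM", "EM64T Oracle JIT", "EM64T Oracle INT",
        "EM64T JamVM", "EM64T Zero"
    };

    // Rounded "real" times in seconds, as given in the row labels.
    static final int[] SECONDS = {197, 18, 101, 59, 169};

    /** Ratio of the "from" row's time to the "against" column's time. */
    static double ratio(int from, int against) {
        return (double) SECONDS[from] / SECONDS[against];
    }

    public static void main(String[] args) {
        for (int from = 0; from < NAMES.length; from++) {
            StringBuilder row = new StringBuilder(NAMES[from]);
            for (int against = 0; against < NAMES.length; against++) {
                row.append(" | ");
                // The diagonal compares a VM against itself, so it is skipped.
                row.append(from == against ? "---"
                    : String.format(Locale.ROOT, "%.2f", ratio(from, against)));
            }
            System.out.println(row);
        }
    }
}
```

A value above 1 means the row took longer than the column, so the 10.94 in the first row is the "11 times faster" figure read from the slow side.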
So assuming that I do a bad job writing a dynamic recompiler and end up with badly optimized code (which I hopefully will not do), I would gather that running a native system at this current rate (revision 3ee130db9664ec573f61ce04dee71e10626173f4 or so) would take 50-70 seconds with no ahead-of-time caching work, while once the code is cached it could probably do the step in 30 seconds. So those times range from pessimistic to optimistic. When I actually get a running, workable, and self-hosted system I can run this revision where I type stuff to see how well I guessed the result would run, at least on this dual PowerPC 1.6GHz desktop. Hopefully it is still around when I get to that point (it should be, since it is a rather reliable machine, except for the random memory corruption).
When it comes to MULTIANEWARRAY, I use a helper method to simplify things, since writing the code by hand could be rather error prone.
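To give an idea of what such a helper might look like, here is a minimal sketch written against plain Java reflection. It is not the helper from the k8 sources. The class name, method names, and the simplification of passing the innermost element type directly are all assumptions. It shows the "simple loop" nature of the instruction: allocate one level, then fill every slot with the next level down.

```java
import java.lang.reflect.Array;

// Hypothetical sketch of a MULTIANEWARRAY helper, not the actual k8 code.
final class MultiANewArraySketch {
    /** Allocates a multi-dimensional array with the given dimension counts. */
    static Object multiANewArray(Class<?> elementType, int... dims) {
        // All counts are checked before anything is allocated.
        for (int d : dims)
            if (d < 0)
                throw new NegativeArraySizeException(Integer.toString(d));
        return allocate(elementType, dims, 0);
    }

    private static Object allocate(Class<?> elementType, int[] dims, int depth) {
        // Component type at this level: the element type wrapped in one
        // array dimension for each remaining inner level.
        Class<?> component = elementType;
        for (int i = depth + 1; i < dims.length; i++)
            component = Array.newInstance(component, 0).getClass();
        Object array = Array.newInstance(component, dims[depth]);
        // Fill each slot with a freshly allocated sub-array, if any remain.
        if (depth + 1 < dims.length)
            for (int i = 0; i < dims[depth]; i++)
                Array.set(array, i, allocate(elementType, dims, depth + 1));
        return array;
    }
}
```

For example, `multiANewArray(int.class, 2, 3)` yields an `int[2][3]`. A zero count at some level simply leaves the inner levels unallocated, because the fill loop never runs.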
Now the Java runtime starts to get decoded.
It would seem that the initial pushes of long/double values get treated incorrectly.
And now the entire library can be decoded, so the next step is compiling to native code and producing binaries that can be used to boot a system.
I am going to need a dedicated register for the exception handler object, and then detect whether an entry point operation is an exception handler, because if it is then that register will need to be copied to the Java stack. There are also only 10 potential security warning messages that print, which is not too bad.
Since the Java stack could jump to an exception handler outside of the cause of an exception I will need secondary exception jump handling. This means the exception handler will point to elsewhere in code that copies the exception register to the first item in the Java stack, then it will jump without condition to the handler. This would be the best way without complicating normal and exceptional jumps.
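The secondary jump described above can be pictured as a tiny stub emitted per handler. The sketch below only builds a textual listing. The mnemonics, the label suffix, and the class name are invented for illustration and do not correspond to any real instruction set or to the k8 code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emits the stub that exception dispatch would target.
final class HandlerTrampolineSketch {
    /** Builds the pseudo-instruction stub for one handler entry label. */
    static List<String> trampolineFor(String handlerLabel) {
        List<String> ops = new ArrayList<>();
        // The exception table would point here, not at the handler itself.
        ops.add(handlerLabel + "_exceptional:");
        // Copy the dedicated exception register into Java stack slot 0.
        ops.add("    store_exception_register_to_java_stack 0");
        // Unconditionally continue at the handler, which normal jumps
        // can still enter directly without passing through the stub.
        ops.add("    jump " + handlerLabel);
        return ops;
    }

    public static void main(String[] args) {
        for (String op : trampolineFor("L42"))
            System.out.println(op);
    }
}
```

Keeping the copy in the stub means the handler body itself is compiled the same way no matter how control reaches it.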
I am going to change the PowerPC 32 to just PowerPC and have it handle both 32-bit and 64-bit PowerPC variants to simplify things.
I need a better way to specify platforms along with all of their needed information. Perhaps JSON files named in reverse dot notation and located somewhere, perhaps in the dynarec, each identifying a machine to use.
In fact I could probably just nuke the boot packages altogether and have all the bootloaders munged together. I would still have the dynarecs, which are separate, so those stay clean. Anything that is not included could just be stripped, and if a new boot loader is to be supported by a user then it can be patched in. So both can exist in the kernel package: one in a bootloader subpackage while the big kernel is in its own package.