08:57
So for the single pass compilation, I will also need an existing run-time state so that final variables may potentially be optimized globally and so that I actually get global optimizations. I can have the interpreter be similar. When compiling the memory space will be similar to an actual running system which essentially sees everything as a bunch of integers. The SSA operations would essentially be the required operations so that it is easily translated to the target system.
09:16
I suppose for native systems, the SSA will need personalities. For example if I target a 16-bit CPU it will not have 32-bit add operations so they will need to be split up if specific values need to be calculated. That is, I will need to keep a potential value store. If an if is done and a value is say below 100 but greater than zero and a value of 7 is added, I can just keep it as a 8-bit add instead of requiring a full 32-bits. This would be better for 8-bit and 16-bit systems which I do plan to target. Otherwise even simple math would take lots of memory and far too many cycles to complete. The personality will provide details such as the native integer size, if it can do Java compatible floating point, the memory model (flat, segmented, split data/code, and banked).
09:40
With all of the stack operations and such, and copies I can have a basic high
speed to SSA first which is rather ugly. Then once all of it is loaded in then
optimization can be performed to generate a new program which is faster and
potentially smaller. For the program state, instead of having what I have
before with a set of variables with values I can have instead a global register
file which operations perform work on. So instead of operations having state
such as "variable 2 is an int with value 42." the operations would at on
unique variables given by their IDs (so for example add $17 = $14 + $12
).
Then in the register file there would be slots for 17, 14, and 12. Another
possible kind of SSA thing would be to only care about side effects. The
register file will contain operations and their derived values from other
registers. Then if a method returns variable 12 then it goes backwards and
performs the required operations to compute the given value. However this would
be easily done if there were no method calls for example. A bunch of methods
could be called which change the state of a global field which to the method
itself it never set in that method, but the one that is called. One major
hurdle is getting the SSA representation to be clean and the ability for
optimization to update variable data and such. At least with a register file
all of the used variables would be in a single location. Each variable could
have direct dependencies. If an operation adds 14 and 12 and puts it into 17
then the variable 17 will have a dependency on that operation.
09:50
However, even with copy operations in the SSA code, the native code generation pass which takes input SSA could determine if any actual work was performed.
10:14
However, a very limited system such as ancient 16-bit systems with about only
64K of RAM will be unlikely to load the entire program into SSA form, optimize
it, then generate native code very fast. So, I propose instead a straight
generation pass from Java byte code to machine code with stack caching in a
linear sense. However, in specific cases I cannot cache some items because they
could for example be the start of a for loop. However, if I do not scan the
method before hand for jumps going back I can do a rewind. Since everything is
done in a single pass so to speak, if there is a goto into the back and it goes
for example to the start of a loop then I can just drop everything I performed
and flag a specific state. This would only be handled for a single operation
however. Once its jump back point has been calculated it is dropped and not
done again. However any jump backs will require recalculation. The next thing
to consider is jump forwards (such as tableswitch
and such). For future
operations I can just set a marker to say that there is a given state. I will
need to store the variable types so that verification is possible and the
stack map can be used. I will do the same thing as before by using a linear
passing of the stack map. However, jump backs which revert state will also
force the stack map to go back also so that it has to be handled again. So if
a jump back occurs, then all known state following the target instruction is
dropped as if it never occured except for that flagging bit. Then for basic
optimization of values instead of having value ranges I can have a sign
extended mask. That is the highest bit of the mask represents the sign bit so
a value of 127
will be 0xFF
along with -127
using the same even though
127
is 0x7F
, the extra bit set will represent the sign bit to extend. This
would only work for integers however. Then to remove multiple null checks I
can have a null flagging bit which says that something is known to be null or
not null. For arrays I can have a specifier which gives the array length. For
example, an unknown array would have length n
. If a request is made to access
index 8 in the array, then a check is made against the size. Following that
access or a similar check against the length of an array will result in the
array having the indices up 8 values be unchecked. The important optimization
would be removing array bounds checks and null pointer checks since those can
be costly. So an operation which performs lots of work on a single array would
eventually get all of its checks removed depending on how the method works.
10:30
The next thing would be exceptions. I will do as I thought of before, have the exceptions be ran first so that the tables are generated at the start. I can also drop some temporaries because when it comes to exceptions anything that is on the stack is destroyed. The only thing on the stack is the exception which is to be handled. So as before, a dedicated register and an index which specifies the exception table entry which should be handled. With that, there would only be a need to have a single lookup for exception handling. One thing however I need to determine when exceptions and gotos exist is block barriers. Say an exception handler is between a jump forward and another kind of block and there is no potential jump in. A jump back should not remove the exception handler code at all. However, it is possible for an exception handler to jump back into the normal program control flow (an exception handler in a for loop for example). So exceptions complicate this. So if an exception performs a goto or similar then it will perform a jump back and the exception potentially rehandled since it could modify the state of the program. Then as before, jump forwards would set a future flag that an instruction later is a jump target. So the most important thing is reading the exception table first to determine the potential future jumps since past jumps would require recalculation.
11:34
Took a walk, have some ideas. Instead of CFMethod
returning an InputStream
of a byte array containing the code attribute it instead has a
CFCodeAttribute
which can provide an InputStream
for CPProgram
usage.
It would also handle exceptions. This way when I write the processor I do not
have to have another copy of the all of the data such as exceptions, the byte
code, and the attributes. The class will just provide a view of the class.
Then something similar could be done with the deflate processor. I can have the
sliding window provide the InputStream
data and it would support limited
mark and such. However, it could only be done once. So instead of an output
queue, it would just use the window directly. I also thought about having a
system property which can set the maximum size of the sliding window. So for
example on very limited systems with barely any memory, a sliding window of
32KiB can be very large and not all of it might be needed at all.
15:56
So I will need an output program which contains the parsing state which is set by an input program. The input program parser will read and handle the byte codes. Actually one property of the byte code is that all operations must have the same variable type state. So jump backs, if their value states differ then they have to be calculated. However types which are removed do nothing at all.
16:05
For simplicity the stack caching idea would have a fast code generator although
it would not produce the fastest code. So basically only local variables would
count for the most part. The stack operations would really do nothing. The
locals stay the same during exception handlers, so they are assigned actual
registers. The stack entries would just be virtual temporaries which are
assigned no registers. So essentially local variables will be permanent. When
a local variable is written to in the byte code the stack is consumed and the
code to be generated is executed. Other points such as returning values and
loading and storing from fields and arrays, and would trigger events to occur.
So essentially there would be synchronization events. Stack entries would be
pure virtual. If there is an event where a method is called, then its return
value would be placed in a real register (since it has to go somewhere). That
variable will be like a local except that it is dropped during exception
handlers. So I would flow through the program quite normally, and in the case
of these synchronizations, depending on the operations I would not need to
redo operations if I jump back unless a synchronization occurs. I do however
have to consider that there could be an entry on the stack which has a
different value when it is jumped to. So jump targets which are not exception
handlers would have to have all their stack entries on a barrier of sorts. So
that the stack entries become temporary real registers. One of the good things
is that I never have to worry about jsr
and ret
since no code will ever
have those instructions.
16:16
So I may have to go through all the instructions to determine jump targets. If an instruction has stack entries then I will do an implicit generate of the entries and real register allocation on them. There would just be real registers contianing the actual values and the calculations dropped so that they just become values. Then normal virtual stack registers would be used while the temporary real ones will just contain some values. Then when that jump back point is reached, all the input values will be calculated between the slices so that it can jump back. Doing it this way, although not as optimized will be faster and simpler to implement. So I only perform stack caching when an instruction is not a target of a jump. If I jump to an instruction in the future, it would be the same. There would be temporary register sets. Thus, the basic blocks of a method will start with the instruction which is the start of an exception handler, the method, or the target of a jump. The end of basic blocks would be return values, method calls, load/store fields/arrays, jumps to a target, or instructions which have jumps to it. Each basic block has its own set of stack temporaries if required. At the end of a basic block, all values are synchronized into these temporaries. During the transition of one block to another, temporaries may be moved and shuffled as needed so the values fit.
I should note that as before, locals are always up to date regardless of whether they are just set, and never used again.
16:25
For basic blocks though, should I end a block at remote call or should it end
on the instruction before it? Before a method is called however, registers need
to be saved and the stack and registers setup so that arguments can be passed
to the given method. If I end before it, I would have all of the stack items
saved accordingly since they are all popped from the stack. That would be good
because then a method call or similar would start a basic block and it would
have none of the stack temporaries used as method arguments in it. Then the
same thing can be done for field get and set, throwing, and returning of
values (and some others). So operations such as getfield
will be the start
of a basic block. The same would be done for all the synchronizing operations.
16:31
This would not produce the best code, however it would be simple to perform and stuff such as method invocation would be simplified because methods being at the start would mean that all stack temporaries are synced. So after setting up the exception jump table, I determine the bounds of basic blocks. If there are two method calls in a row or similar then they would be in their own basic block since there can only be a method call at the start of the block.
17:46
To start the virtual machine faster, there can be an initial program state which has already been calculated in memory. For example, all of the static variables and instances would for the most part already exist. This could be a bonus on systems with a small amount of memory where initialization and loading of classes could be quite filling.
17:58
If I can get the run-time to run in under 8KiB then it could run on an NES. This would be an achievement. Including the 8KiB save chip, the NES comes with 2KiB of space. So there is about 10KiB of RAM available. With the right ROM size and enough bank switching, it would be possible to have a system on a large ROM. However I would want it to be as close as possible to 2KiB so that it does not require "pirate" ROM carts.
18:18
One thing though, are conditionals at the bounds of a basic block?
18:39
For JIT simplicity, the entire program on start will either be compiled
completely at the start and potentially cached or just compiled all at once.
This would permit for some whole program optimization regarding fields. If
say a final field with an int contains the value 42
, a getstatic
or
getfield
of that field would just result in 42
with no actual memory
operation required.
18:45
The compiler's class library lookup will need a separated detail. I will
require this so that I do need to actually have to depend on
javame-class-file
since for example the JIT when cross compiling will use
class files, however on the target system when it is running all of the classes
in the main library are built into the executable. Thus, they cannot be in
class file (they can but that would waste much space).
19:01
Having a bridge over the class file code will permit as stated before. Although there would be some duplication it would be better than hacking the class file code to support built in classes, including the class files in the ROM, or including the CLDC JARs somewhere.