SquirrelJME: 2016/07/02

11:53

This weeken is holiday weekend, with Monday being the 4th of July.

12:03

I should rework the documentation a bit and have a completely standalone port section of sorts. Then I can have user and developer bits in their specific sections.

14:28

One thing I need when it comes to the class file decoder is remembering and storing the class flags for potential usage later. I will need to document the blob format in a way where it allows blobs to be output without needing future details. So as such this means that the table of contents in a blob will be last.

14:36

I should have a class which can be given a byte buffer or some other class which is used to read from say an int[] or byte[] array to access the details within a blob.

15:36

As alternative to a table of contents kind of thing, I can have a linked list of sorts through the executable. However, backlinks would not operate at all. I would say that for simplicity, the blob can be directly memory mapped and have its structure accessed directly. Also it may be reasonable to have a case where there are two binaries, one which contains the raw data and another which contains the table of contents. If the table of contents remains apart from the executable code they can be linked into the binary using different means. Alternatively, instead of an OutputStream passed into the SSJIT, there is instead a result of a compilation. So this "smarter" class would have stuff such as "create new field" and other such things. Then an implementation of the given writer can be used. Then this way the SSJIT is not locked to a single output format but one which could be wrapped using multiple means.

15:46

Then if the output format is not that nice, it can completely be replaced with a better one without changing any code. So I could literally have an output which writes ELF binaries.

15:51

Essentially what could happen for example when it comes to Linux, is that classes could be compiled to ELFs, then when they need to be opened they can be linked in as such. Although, I am not sure if Linux supports dynamic linking of libraries that exist only in memory. Looking into it, it does not. So when it comes to generation, I will need to use blobs and such.

15:58

When it comes to the output, I should support multiple classes being output into a single SSJITOutput. This way when working with the initial SquirrelJME binary which the user would use, all classes exist and are pre-merged into it. The native code would have to be generated in a way where all code acts together as a single unit. So in this case, entire JARs will be merged into a single fragment. However, I will need to devise a means where there can be multiple namespaces within the output (for multi-JAR support).

16:01

For example, for the ELF format all symbols and such will be generated and placed in an output ELF file. Although doing this within an ELF may be complex because there may be some things which are unknown (such as how large a given section is).

16:06

So I suppose for simplicity, keep with the blobs and instead of a container. Ther container would be the ELF which loads and initializes the classes stored within the blob. But still allow for multiple JARs and resources to be namespaced in a single blob.

17:03

Thinking about it, when it comes to generating the bootstrap code, it will be very difficult to have a split apart JIT when it comes to building the initial binary which would generate the machine code. Another consideration is the number of objects that will need to be initialized for every class. So I suppose I need a separated native code generator, which there is a single instance for (with functions as I currently have it) where it is then attached and associated with a state tracker. This way there is a single instance, if multiple functions need to interact with the same state, they can use the passed state tracker instead.

17:07

So SSJIT is given a NCGManager. That NCGManager will then create any associated output with a given state and singular set of instances for functions.

17:11

Also going to place anything related to the JIT in net.multiphasicapps.squirreljme.jit, then branch from that for example. The code generators will be a bit higher level. I would suppose below the code generator there would be the assembler.

17:14

Then this way, when it comes to generating native code I can have similar means of generating. I will have variants, but instead of operating system modifications of functions, I instead will have modifications of the code generator. The assembler, code generator, and JIT should at best be of a single state which will enable only a single assembler or other instance to exist at a time. This would reduce the allocation count. I will suppose that for the assembler, it will be given an output stream. The only consideration is that there may be the potential for sub-variants of variants (such as big endian and such).

17:19

So what I will need is a means where I can easily specify all of the flags which may change how an assembler operates, such as which features are available and such. I suppose what I can do is have a standard setup of sorts. There would be the variant, but that would define a standard set of flags which are supported by a system. So powerpc32+g4 will turn into powerpc32+g4,altivec,big. A variant will by default give a set of flags and otherwise which are set to on or off. So the big flag for big endian would be set by default, so having a suppose ~ before it so it is ~big would disable it, even if it were on by default with a given variant. The generic variant would just choose a set of flags to use. So I suppose for simplicity, a given CPU could have their variants mapped to enumeration entries potentially as a kind of additive set of extra variants and such.

17:28

I would suppose instead of this, that I should have an assembler configuration which is used to initialize the assembler.

17:48

Actually, having it where the assembler has add operations with types and such will be a bit complex. What I could have instead is a kind of pipeline of sorts similar to Java 8's streams. So basically there will be register selection, essentially something such as source(RegisterType.INT, "r1") which will act as the source register. Then there will be the same for the destination. Then following that, there will be an execution which performs the operation (such as an add). The native part of the assembler can check if the given operation is valid between two given registers.

17:55

I will also need some kind of native assembly provider functions, ones that could be given a single instance, where they are given the Assembler where options can be read from (such as the selected source/dest registers), the configuration, and the output stream. However the constant handling of this would be a bit ugly in a way. So I suppose the assembler becomes abstract. However a problem with this is for each architecture I could only ever have a single assembler. The assembler would never be patchable and third parties would never be able to add support for an unsupported CPU.

17:59

So I suppose something similar as I have previously talked about. Basically I will have an ASMGenlet which would be an interface. Multiple genlets could be created and have a given priority. The first argument to the genlet would be the assembler itself. However, that would create much complexity. There would be multiple instances of genlets created for every method. This definitely would not be fast. Then genlets in every case would have to check which CPU was selected for code generation. So when it comes to code generation, it will definitely not be small, nor will it be fast, nor will it be simple. I suppose I am thinking of far too complex of a solution. So instead, just have an Assembler with the source and destination as I have thought up of. If done correctly, I can even have it where a single assembler could be created and shared across all JIT instances (there would need to be a state reset). Since some operations might not be supported by a given CPU, I would need to have failure cases that throw an unsupported assembly operation.

18:08

I can then layer the code generator on top which performs virtualized operations in the event the assembler does not support it (so if hardware float is not supported, then it will be handled using software instead by replacing the code with method calls potentially). However, the layering of the JIT goes deep through all levels. Having three levels will keep things complicated. So instead of three things, the JIT should be merged into one. When it comes to support for native architectures I can have a generator class and a set of registers which are supported by a given CPU. The operating system variant could change how registers are used and such. Although it is still complex.

18:14

One thing I do not want to have is a few thousand different implementations of the native JIT for a given CPU. I can have an architecture specific one for a given operating system however. But generally I only want a single for each architecture, so the PowerPC one would target only PowerPC and there would be no other implementation of a PowerPC assembler.

18:16

Ultimately I could support only a single CPU variant and potentially if it has floating point or not. I could ignore vectorization (that is a whole complex another subject anyway). So generated code for PowerPC would quite literally target the G1 (603) processor for the most part (the original). However some CPU architecture have better instructions in later generations or have removed older ones.

18:19

So right now I am going in circles. I want a small and fast JIT, but one that is simple to write and does not consist of overly complex code. So basically something similar to SSJIT except there is one instance of it and it is given an input class to transform (instead of one instance per class). It would not be multi-threaded capable. I would also say that support for a given CPU and its variant can be used. Support for a given architecture would extend the actual JIT class to provide the native functionality. To keep internal details hidden, package private will be used in many places. This way I can have the class decoder be external and call into the JIT.

18:39

Actually a single instance JIT would be very ugly.

18:55

I can have a slight JIT modifier be used which if found will be passed a JIT. It can then see which instance the JIT is and possibly perform register banning or other minor tweaks.