2016/03/08

00:14

Perhaps later today I can figure out ZIP files. I suppose the next thing I will do is a bulk dump of the central directory tables, to see if there is any sane way to determine how large a local file is along with its header.

11:09

To limit the massive number of offset constants that I have in the ZIP code, perhaps I can use a type-safe, enumeration-like structure reference based on an input pointer. That way all of the fields are defined in one enumeration which also handles the type of each field. Then I could perform a read on those fields and use long/int/short/byte as required. That would be easier to debug because I could just read and print them without giant print statements full of constants. It would also probably be less prone to bugs because the field being requested would always be known.

11:20

Actually, a realization; I suppose a stressful day yesterday was messing with my thinking. Anyway, I can determine the size of the ZIP by using the local file offsets. Note that these can appear in any order, so I read them all first and keep track of which value belongs to which entry. I then sort all of them to determine their logical placement in the file. Using that, I can determine how large the actual ZIP is. One thing I would have to handle though is polyglots, and situations where local file headers appear inside the contents of stored entries. So along with this I am going to move to the structured enum form so I do not have to rely on adding constants together. Then I am going to fix my code around that, and then perform the required calculation to determine the actual ZIP size. The good thing with the ZIP format is that it provides the offset to the start of the central directory, which will be useful. Also, currently I use the central directory size to determine where it starts relative to the end. Doing both of these will end up with clean code and should result in working ZIP code. I have been working on this ZIP code for 6 days, and hopefully after this I just need to get the decompression algorithms implemented and then I can continue with loading classes.
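A rough sketch of the sorting idea above, in Java. The class and method names are made up for illustration. It assumes entries are packed back to back with the central directory right after the last one, so each span also covers any data descriptor, and it does nothing about polyglots or header look-alikes inside stored data.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of any existing ZIP code. */
final class LocalHeaderSpans {
    /**
     * Takes the local header offsets in central directory order and the
     * offset of the central directory, and returns how many bytes each
     * entry occupies (header plus data), in the same order as the input.
     */
    static long[] spans(long[] localOffsets, long centralDirOffset) {
        int n = localOffsets.length;

        // Sort indices rather than values so each span can be mapped
        // back to the entry it belongs to.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++)
            order[i] = i;
        Arrays.sort(order,
            (a, b) -> Long.compare(localOffsets[a], localOffsets[b]));

        long[] result = new long[n];
        for (int k = 0; k < n; k++) {
            long start = localOffsets[order[k]];
            // An entry ends where the next one (in file order) begins;
            // the last entry ends where the central directory begins.
            long end = (k + 1 < n)
                ? localOffsets[order[k + 1]] : centralDirOffset;
            result[order[k]] = end - start;
        }
        return result;
    }
}
```

For example, offsets {100, 0, 40} with a central directory at 150 give spans {50, 40, 60}.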

19:17

Was a busy day out in real life today. However, I do know what I want to do. First I will write the structure code so I have a safer way to access the raw ZIP data without potentially falling into bugs.

19:35

I can also put the magic number information in the structure data. Given the way it works, using an enumeration would be a bit silly, so I really could just use a base class for these things.

20:44

Structured reading will be a bit slower, however it will be safer. Plus, it may be possible to optimize it later on when things are finished.

20:49

I can also have bounds checking on the reads to make sure things are working properly.
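A minimal sketch of what the structure idea from the last few entries could look like: a base class that checks the magic number, field descriptors in place of loose offset constants, and bounds-checked little-endian reads. All class and field names here are hypothetical. The offsets are those of the standard end of central directory record (22 bytes before the comment, signature 0x06054B50).

```java
/** Hypothetical field descriptor: name, offset in the structure, byte width. */
final class ZipField {
    final String name;
    final int offset;
    final int size;

    ZipField(String name, int offset, int size) {
        this.name = name;
        this.offset = offset;
        this.size = size;
    }
}

/** Hypothetical base class for a structure sitting inside a byte array. */
abstract class ZipStructure {
    static final ZipField MAGIC = new ZipField("magic", 0, 4);

    private final byte[] data;
    private final int base;
    private final int length;

    ZipStructure(byte[] data, int base, int length, int magic) {
        // The whole structure must fit inside the backing array.
        if (base < 0 || length < 4 || base > data.length - length)
            throw new IndexOutOfBoundsException("Structure does not fit");
        this.data = data;
        this.base = base;
        this.length = length;
        if (read(MAGIC) != (magic & 0xFFFFFFFFL))
            throw new IllegalArgumentException("Bad magic number");
    }

    /** Reads a little-endian field, refusing reads outside the structure. */
    final long read(ZipField f) {
        if (f.offset < 0 || f.size < 1 || f.size > 8
            || f.offset + f.size > length)
            throw new IndexOutOfBoundsException("Field " + f.name);
        long v = 0;
        // The byte at the highest address is the most significant one.
        for (int i = f.size - 1; i >= 0; i--)
            v = (v << 8) | (data[base + f.offset + i] & 0xFF);
        return v;
    }
}

/** End of central directory record, without the trailing comment. */
final class EndDirectory extends ZipStructure {
    static final int MAGIC_NUMBER = 0x06054B50;
    static final int BASE_SIZE = 22;
    static final ZipField CENTRAL_DIR_SIZE =
        new ZipField("centralDirSize", 12, 4);
    static final ZipField CENTRAL_DIR_OFFSET =
        new ZipField("centralDirOffset", 16, 4);
    static final ZipField COMMENT_LENGTH =
        new ZipField("commentLength", 20, 2);

    EndDirectory(byte[] data, int base) {
        super(data, base, BASE_SIZE, MAGIC_NUMBER);
    }
}
```

With this, a debug dump becomes something like `end.read(EndDirectory.CENTRAL_DIR_OFFSET)` instead of arithmetic on raw constants. The cost is an extra object and a loop per read, which is the slowdown mentioned above.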

20:52

Well, moving part of my code to the structured system was quite easy and was no trouble at all. If I need it again in the future, I could always put it in its own package and use it from there.

22:30

All of this structure stuff can use up a bunch of memory, however I will likely have an aggressive garbage collector which can collect entire classes if none of their objects are externally referenced. So even if a public static final holding an object of a class exists, if nothing outside points to any instance of that class, and say the objects only reference each other, then the class and all of its finals can be removed. So say there is not much memory and a ZIP file is being read, and the system runs out of memory reading the central directory. If the end directory structure class is no longer referenced except by itself, then it can be collected along with its static finals. Then if the ZIP loading code hits the class again, it will be loaded again. Although this would reduce speed, memory is very important. However, if a class is statically compiled into the ROM then it may be possible for ROM based instances to exist. For example, if all code paths for a static final are well known, then the object would always exist and be within the ROM data. I can do this and get away with it because there is no reflection, and so no way to access fields using reflection. So if and when Java ME adds reflection, that will be the day this potential for optimizing size and memory goes away.

22:39

So this means that my static compiler and the class compiler can use the interpreter to determine whether a prebuilt static final is viable as a constant. Of course I would need to remove this optimization in the run-time system when loading JARs, because otherwise I would need to keep the JAR and byte code around. Writing a disassembler to evaluate this would be a large mess. However, I could still potentially include it if desired. Everything would be done ahead of time and cached before being used. So for example, if I were to target the Nintendo 64 with a flash cart, I could keep the byte code in the ROM to allow for static optimization.

22:43

The only problem with this is that it would violate how Java sees things, so I would need some kind of deferral for it to work properly. Normally, while a static final is being initialized, the fields that follow it have not been initialized yet. So when the fields are being initialized I will have to have a pointer of sorts. Essentially, the constructor will just set the pointer of the field to the precomposed object that already exists.

23:10

Actually, determining the size of the ZIP is quite simple. All I need is the size of the end of central directory record, along with the provided offset to the start of the central directory and the size of the central directory. Added together, that is the size of the ZIP.

23:19

And that was simple. I just needed to make sure it works with a bunch of arbitrary data placed before the ZIP. It says the ZIP starts at 2097152, and I did dd if=/dev/urandom bs=1048576 count=2, which is exactly 2097152 bytes. So yes, that works.
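The arithmetic from the last two entries, written out as a sketch. The names are made up. It assumes a plain single-disk ZIP without Zip64, with the central directory sitting directly before the end record, and that the position of the end record in the file is already known.

```java
/** Hypothetical helper for locating a ZIP inside a larger file. */
final class ZipExtent {
    /** Size of the end of central directory record without its comment. */
    static final int END_BASE_SIZE = 22;

    /** Total ZIP size: local entries, central directory, end record. */
    static long zipSize(long centralDirOffset, long centralDirSize,
        int commentLength) {
        // The central directory offset is relative to the start of the
        // ZIP, so it already counts everything before the directory.
        return centralDirOffset + centralDirSize
            + END_BASE_SIZE + commentLength;
    }

    /** Position in the file where the ZIP itself begins. */
    static long zipStart(long endRecordPos, long centralDirOffset,
        long centralDirSize) {
        // Step back over the central directory, then over everything
        // that the stored offset says comes before it.
        return endRecordPos - centralDirSize - centralDirOffset;
    }
}
```

As a made-up example: with 2097152 bytes of junk in front, a central directory offset of 1000 and size of 200, the end record sits at 2098352 in the file, so zipStart gives 2098352 - 200 - 1000 = 2097152, and zipSize with no comment gives 1222.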

23:56

OK, so the next thing to do tomorrow when I wake up is to start reading actual entries, and to implement inflation and the CRC32 calculator to determine whether data is valid.