SquirrelJME: 2016/06/18

08:37

Actually my previous times are invalid because I had more packages available while right now due to a reboot my built package cache has been cleared.

4.0s
2.4s
2.3s
2.4s
2.4s

So due to there being less packages, it is definitely much faster. However two seconds is still a bit slow.

08:42

One thing that could speed up searching ZIP entries would be if the hashcode of the entries were stored. However that would be a bit ugly. What I actually could do is binary sort all of the filenames using the String comparator. This way when searching for an entry it happens at a faster rate. This would reduce the number of searches and would find them faster.

08:49

That however requires being able to read the filename of the directory entries. So perhaps instead of initializing a new file for every pass of the entry, instead what is done is that there is a getEntryName which returns a string of the given index. That way the entry names can be cached and then used to initialize the entries also. Then with this, searching through entries does not have to initialize all of them, it just has to read their name. So this would reduce the number of allocations and parsing performed since creation of ZipEntry would be avoided.

09:45

Seems I never cached read entries, so they were never placed in the array to be used again, so on every get of an entry they were reinitialized.

10:03

Ok, so after these changes (and the caching of entries which was an error) I get the following results:

1.8s
1.0s
1.0s
2.0s
1.0s

So this is a definite 1 second faster than before.

10:06

When it comes to sorted entries, I suppose it should depend on how many entries are within the ZIP and if a request threshold was reached I suppose. However that would complicate things a bit, it would likely be best to just sort them all at the start. However, if ZIPs are just thrown away then the sorting will not matter at all. So I suppose for the package list that ZIPs should remain open?

10:23

If the ZIP remains open then a binary search would be better. However in the long run, ZIPs will remain open due to being used as resources. When a ZIP is read into the class path, it will pretty much never close as long as a resource or class will need to be looked up (in general cases, this would be common). So a sorted list by name would definitely be faster when it comes to searching by name.

21:25

And now my merge sort works. I just need to now test odd sized elements.

10:35

Big-O wise, binary search beats linear search every time, so I will go for a binary search when a ZIP is initialized.

10:55

I need a way to sort the arrays while also remember their original indices and such. The Comparator and sort operation has to go through all entries, however it passes only a string so the entry order is not known. What I could do instead of strings is sort by hashcode. I would have to handle cases where there are duplicate entries since it is unspecified which index is returned if there are duplicates.

10:59

Alternatively, I could sort a fake list sort of. I can have an array of Integer and then use a Comparator on it. The comparator will read the value and then compare the original strings instead. I could alternatively make a merge sort which takes the index based approach and uses that. Basically instead an array of indexes in their sorted order is used. Then once the values are properly sorted, they are moved back into place from the original depending on the actual index position. So the initial array is just an index from 0 through n. This would require all the strings be created but would not need a number of temporary integers to be created. However, I could not use a built in sorting algorithm. So for simplicity, it will be merge sort.

11:10

Alternatively instead of sorting names, I can just have a sort index of references instead. The name search can just go by these indexes instead of using a linear search.

11:24

I am likely to implement the sorting algorithm many times over especially in the CLDC libraries. Therefor I should have generic code be declared and then use the internal sorting code. Internally the sorting of any type of data can just be done with a single integer array of indexes and a special method. Although there would be a slight speed penalty due to method calls, all of the sorting algorithm code would result in no duplicated sorting algorithms. I really do not have to have 9 merge sort algorithm copies for (all primitives, Object, and Lists). The internal sorting algorithm will create a copy integer array with the input elements, then sort them using a special Comparator.

12:44

I am actually going to have to test the sorting algorithm, so make sure that it works properly.

12:41

Actually, the class should have a shorter name.

15:41

I believe I have a merge sort now.

21:29

And it fails for a given case.

21:34

Seems for this one merge case, one value is never merged in. It is perhaps the result of a merge of sorts.