So the exception that is hit is this:

AE0e\ 0\ 1\ false\ true\ true\ true\ 3\ 3\ 3\ 4

Tail (3) minus the true head (3) is zero.


Well, it does not occur when the block size is 64, only when it is 4. So it likely has to do with the block bounds, perhaps with how the true offset of the data is calculated.


Seems when the data is even, the block skipping is not correct.


This seems off (total - trueaddr). Since total is the number of bytes, it has to be corrected by the head value. So as such it is really (total + head) - (__a + head) or __a + (total - __a).


I should probably use the ticket system more, but I am the only developer so to speak.


So I just threw out my get first reading now and rewrote it, and it works. I suppose trying to fix broken code was problematic.


It seems this system I am on now had the basic assets as regular files and not symlinks; since fossil was not picking them up, I just forked. Since get via last and get via first are pretty much the same, I can just have a boolean for it.


Now I just need a slightly modified set for going through the list from the tail end.


Appears I have last working now, except in the case of B. A works however.


It is likely that B fails when the tail is at the edge of a block. So now B works and A fails. Using -1 works for A, but fails for B.

for (int i = (tail == 0 ? 0 : -1); i < blskip; i++)

However, it appears they both have a tail of zero, and checking confirms that they do.


I would say that then it is this (head + (total - __a)) that is problematic.


Ok, this case fails because it wants to read from the first address, which happens in this case where the total is aligned: the block size is 4 and there are 32 bytes total. So I just did a simple hasPrevious() check for now. A, B, and C pass, although D fails with what appears to be an off by one (likely the sliding window code).


Well, the inflate algorithm works properly, although the data which is referred to by the sliding window is not correct. If switching the block size back to 64 works without issue and gives the same result, then it is the sliding window code. However, when the block size is 64 an exception is thrown. Reading with the first based code is correct; it is the last going code that ends up causing issues.


Oh wow, the decompression code is much faster, although still a bit slow. But it is much improved now as it does not take forever to finish.


I actually gave up on the problem and then suddenly the simplest solution came to me which works. I hope, because the solution was ifed out and at this point I am feeling a bit insane. And it works. So that was simple. So as a result of the ByteDeque refactor the decompression of busybox is much faster now. I would say the decompression algorithm runs at about 8KiB/s, which is far better than 536B/s.


Now what I have is two classes: BufferAreaInputStream and SizeLimitedInputStream.


Let me see... The former is never used and the latter is only used once, so the former goes away.


Ok so, SizeLimitedInputStream bulk operation now makes reading the busybox binary faster. I would say perhaps it is going at 12KiB/s. Method overhead definitely cuts down on speed, bulk is always better.
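As a rough illustration of why the bulk operation helps, here is a minimal sketch of a size limited stream that forwards a whole range to the wrapped stream in one call. The class name `LimitedStreamSketch` and its fields are made up for this note and are not the actual SizeLimitedInputStream code.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not the project's SizeLimitedInputStream.
class LimitedStreamSketch extends InputStream
{
    private final InputStream in;

    // Number of bytes which may still be read before the limit is hit.
    private long remaining;

    LimitedStreamSketch(InputStream in, long limit)
    {
        this.in = in;
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException
    {
        if (remaining <= 0)
            return -1;

        int rv = in.read();
        if (rv >= 0)
            remaining--;
        return rv;
    }

    @Override
    public int read(byte[] b, int o, int l) throws IOException
    {
        if (o < 0 || l < 0 || l > b.length - o)
            throw new IndexOutOfBoundsException();
        if (l == 0)
            return 0;
        if (remaining <= 0)
            return -1;

        // One call to the wrapped stream instead of one call per byte,
        // clamped so that the limit is never passed.
        int want = (int)Math.min(l, remaining);
        int rv = in.read(b, o, want);
        if (rv > 0)
            remaining -= rv;
        return rv;
    }
}
```

Without the bulk override, the default `InputStream.read(byte[], int, int)` would call the single byte `read()` once per byte, which is where the method overhead comes from.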


DataPipeInputStream also only does single reads at a time, so I should increase the speed of that by supporting bulk reads. Since it is synchronized, the single byte operation can just be implemented on top of the bulk one.
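A minimal sketch of that idea, assuming a stream backed by a plain byte array (the real DataPipeInputStream reads from a pipe instead). The name `PipeStreamSketch` and its fields are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, the backing array stands in for the data pipe.
class PipeStreamSketch extends InputStream
{
    private final byte[] data;

    // Current read position within the data.
    private int at;

    // Reused so that single byte reads do not allocate.
    private final byte[] single = new byte[1];

    PipeStreamSketch(byte[] data)
    {
        this.data = data.clone();
    }

    @Override
    public synchronized int read() throws IOException
    {
        // The single byte read is just a bulk read with a length of one.
        byte[] single = this.single;
        int rv = read(single, 0, 1);
        return (rv < 0 ? -1 : (single[0] & 0xFF));
    }

    @Override
    public synchronized int read(byte[] b, int o, int l) throws IOException
    {
        // Compare the length against the space left after the offset,
        // not against the offset itself.
        if (o < 0 || l < 0 || l > b.length - o)
            throw new IndexOutOfBoundsException();
        if (l == 0)
            return 0;

        int left = data.length - at;
        if (left <= 0)
            return -1;

        int n = Math.min(l, left);
        System.arraycopy(data, at, b, o, n);
        at += n;
        return n;
    }
}
```

Both methods lock on the same object, so the nested call from `read()` into the bulk read just reenters the lock which is already held.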


Well I found a __l < __o, which is not good.


Adding bulk read for DataPipeInputStream has definitely increased the speed of it.


So the code is a bit faster, but to decompress 1,899,912 bytes it takes about 2 minutes, so that would be around 15KiB/s. The inflation tests take 3 seconds to complete. In comparison, the code from July 1st takes 3 seconds also. But running it again, the new code's lowest time is 1.9 seconds or 2 seconds. While the old code is 1.3 seconds.


Technically I do not need a drain buffer, I can just use the array that was passed in with its limit and such, although that would be a bit risky and vulnerable to timing attacks.


So the next point for refactoring would be the ZIP file reading code. I could also refactor and speed up the inflater. If I remove the need for BitCompactor and BitInputStream when handling input data, or at least increase the speed of BitInputStream, then things may be a bit better.


Also readBitsInt just calls readBitsLong and masks the result, which adds a slight cast penalty and an extra method call. The long version can instead call the int version and OR the values together as required.
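A sketch of how the long version could sit on top of the int version. This is not the actual BitInputStream: the class `BitReaderSketch`, the byte array source, and the least significant bit first order (which is how deflate packs its data) are all assumptions made for the example.

```java
// Hypothetical sketch, reads bits LSB first from a byte array.
class BitReaderSketch
{
    private final byte[] data;

    // Position of the next bit to read, counted from the start of the data.
    private int pos;

    BitReaderSketch(byte[] data)
    {
        this.data = data.clone();
    }

    // Reads up to 32 bits, the first bit read becomes the lowest bit.
    // Reading past the end of the data fails with an index exception.
    int readBitsInt(int count)
    {
        if (count < 0 || count > 32)
            throw new IllegalArgumentException("count");

        int rv = 0;
        for (int i = 0; i < count; i++)
        {
            int bit = (data[pos >>> 3] >>> (pos & 7)) & 1;
            rv |= bit << i;
            pos++;
        }
        return rv;
    }

    // Reads up to 64 bits by ORing together one or two int reads.
    long readBitsLong(int count)
    {
        if (count < 0 || count > 64)
            throw new IllegalArgumentException("count");

        if (count <= 32)
            return readBitsInt(count) & 0xFFFFFFFFL;

        long lo = readBitsInt(32) & 0xFFFFFFFFL;
        long hi = readBitsInt(count - 32) & 0xFFFFFFFFL;
        return lo | (hi << 32);
    }
}
```

With this arrangement the common small reads stay entirely in int arithmetic, and only reads above 32 bits pay for the long handling.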


Making BitCompactor not thread safe might improve some speed. It could also be associated with an output stream or a DataPipe instead of having a callback.


Optimizing BitCompactor gets me now 1.008s which is better than before. Not much of an improvement but it is getting there.


Running the code on HotSpot's VM, it decides to optimize the inflation code just before it finishes. After it does that, inflation goes by quickly. However, the code is still far too slow to be practical. I should make more classes not thread safe, since in general the classes are only used by a single thread at a time.


Thinking about it, the data pipes and related classes should not be thread safe. They are generally intended to be very fast and versatile. External locking can always be performed when they are used.
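To illustrate external locking, here is a small sketch in which the buffer class itself has no synchronization and the callers share a lock object instead. `PlainPipeSketch` and `ExternalLockDemo` are hypothetical names and not part of the data pipe code.

```java
import java.util.Arrays;

// Hypothetical sketch of a buffer with no internal locking.
class PlainPipeSketch
{
    private byte[] buf = new byte[16];

    // Number of bytes stored so far.
    private int count;

    void offer(byte b)
    {
        if (count == buf.length)
            buf = Arrays.copyOf(buf, count * 2);
        buf[count++] = b;
    }

    int size()
    {
        return count;
    }
}

class ExternalLockDemo
{
    // Two threads write to one pipe, all access goes through the lock.
    static int run() throws InterruptedException
    {
        final PlainPipeSketch pipe = new PlainPipeSketch();
        final Object lock = new Object();

        Runnable writer = () ->
        {
            for (int i = 0; i < 1000; i++)
                synchronized (lock)
                {
                    pipe.offer((byte)i);
                }
        };

        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();

        synchronized (lock)
        {
            return pipe.size();
        }
    }
}
```

Code which only ever touches the pipe from one thread skips the lock entirely, which is the case I care about for speed.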


InflateDataPipe can use some precalculated shift improvements also.
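One possible reading of precalculated shifts is a table of bit masks which is built once, so that the hot path does a lookup instead of a shift and subtract. This is only a sketch with the made up name `MaskTableSketch`, not what InflateDataPipe does, and whether the table is actually faster depends on the VM.

```java
// Hypothetical sketch of a precalculated mask table.
class MaskTableSketch
{
    // MASKS[n] has the lowest n bits set.
    private static final int[] MASKS = new int[33];

    static
    {
        for (int n = 0; n < 32; n++)
            MASKS[n] = (1 << n) - 1;

        // (1 << 32) wraps around to 1 for an int, so set this one directly.
        MASKS[32] = -1;
    }

    // Keeps only the lowest n bits of the value, n must be within 0 to 32.
    static int lowBits(int value, int n)
    {
        return value & MASKS[n];
    }
}
```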