So I need an efficient means of reading streamed ZIP files. It is possible though that actual entries are spaced out and potentially not next to each other as in a normal ZIP file.
I would suppose for stream based reading of ZIPs I will need a way to locate local file headers. So I need an adjustable buffered input stream where I can directly access the bytes in the buffer and know their actual positions. Once a local file header is reached I just then need to switch to a different size and detect the data descriptor if a uncompressed size was not specified. So generally something that would be a slight issue would potentially be ZIPs within ZIPs if they are placed correctly. However, I can calculate the CRC, size, and uncompressed size. Basically since the descriptor header is optional I essentially have to check every byte ahead of the current read position to determine if the compressed, uncompressed size, and CRC match the given file.
Appears the standard ZIP utility included in my system does not support reading ZIPs from a pipe and working with their data.
Basically for every byte that is read, checks will have to be made to determine if the end of the single entry has been reached. This would be not be very efficient since there would be a large number of tests. If the data descriptor was not option this task would be a bit easier. Probably something that would be a bit more efficient would be a double queue on the input bytes. So essentially the dynamic history stream would read from the last set of history and then there can be a peek method which can read ahead from the source stream. There can then be a get which returns the current history or at least a part of that history. This would be the most efficient means of writing the data. I can use the dynamic buffer code I previously wrote to manage the buffer and such.
nextEntry is called, it searches for the next entry based on header and
other file information. Then an entry is setup. Calling close on the entry for
it to work will basically read every byte until EOF is reached for that
specific entry before being marked as closed/finished.
I suppose after a
peek there is instead just
readAhead although that could
be confused with
read. I suppose instead that peek will just return the
actual requested in the future, then there can be another
grab which just
loads the given number of bytes.
DynamicByteBuffer could probably use a refactor to be much more efficient
and much cleaner.
Now that my code detects the local file header, I must now read it.
When it comes to uncompressed data, there is potential that the decompressor could read a bit ahead. Also one thing I considered is that it is possible for the decompressor to use the historical stream to read in byte sequences so to speak for what it needs. However, for the reader, I need a lower level reader which is associated with the compressed size and that one performs the detection of the end of entry if the size is undefined.
One issue is that I need to know the CRC of the uncompressed output and potentially the uncompressed size of the stream before that information is known to detect the end of the data. This would be a somewhat complex endeaver especially with the fact that the descriptor is optional. Of course this would be that much of an issue if the CRC were associated with the compressed data instead of the uncompressed one. I can always test this however.
Saw a meteor in the sky outside my window, I wonder how common that is.