Ok so, what I need to do is create actual threads and work on the structure manager.
An idea that I have for the structure manager is that partitioning free space that is available would cost a bunch of memory. So instead of having an active partition table, I will have two regions. The region at the start of the lowest available memory would be the allocation table. This region specifies basic object information and the regions of memory which are actually used. So basically there would be a handle table at the start for every allocation. At the start, the memory would be initialized to contain an intial table of a given size and a large contigious free chunk. Each allocation chunk would then have information such as its type, where its pointer is allocated, some GC information, and basically that would be it. It would have to be small so that lower memory systems are not insanely pressured into it. There would also need to be size information also. So right now if a 32-bit field is used for type and GC flags, then that means two pointer sized fields are used for the data pointer and the size.
- 16-bit: 4 + 2 + 2 = 8
- 32-bit: 4 + 4 + 4 = 12
- 64-bit: 4 + 8 + 8 = 20
I could actually split the start 32-bit field into other fields which may have atomic operations performed on it. When it comes to the class type that an object is, or the length of the array that can be placed in the data pointer information.
For arrays the type and length could be determined by the table index information. Then with this, I can have byte arrays which are mapped to specific regions of memory that are outside of the allocated memory zone.
For simplicity, the memory table contents will only pertain to their own pool region. This way I can have multiple pools where their table data is not placed in another pool.
I suppose the given allocation strategy shall be used:
- Go through all memory pools
- Try to allocate a given amount of bytes.
- Check to see if compacting all of the data within a single pool would be
enough to allocate the given number of bytes (without performing the
- If there is enough room, then everything in the pool is compacted until a large enough area of free space is available.
- Try to allocate another memory pool, if it works run the steps for 1 for that given pool.
- Garbage collect all memory pools and clear weak/unreferenced objects.
- Run step 1 again.
- To the running program, an
The compaction would have to consider also objects it cannot move (because they may be locked by a thread).
I suppose for simplicity and some debugging, the identity hash code of an object will be the table reference index. However a random value would likely be a better choice. Many of the references would have very close values with the same higher bits which would make it harder on hash maps. Random would be the best choice. However, the identity hash code could be a 16-bit value which is duplicated in the high and low places. This may be enough for Java ME since there is hopefully not going to be more than 65,000 objects at any one time.
If multiple pools are available then pools which cannot be locked are ignored for allocation attempts. So if multiple threads are allocating at the same time in a multi-pool scenario they can freely manage their own pools so to speak. One thing to consider however is that there needs to be a latch for the garbage collector (which needs to lock all pools). There would be a lock collision if the GC attempts to lock all pools. So I suppose that the first memory pool should have a special flag be used that is compared and swap to indicate that a garbage collector is running. An allocation would first compare and swap the GC lock to prevent a GC from being run, then it would lock the current pool, if that fails it would unlock the GC lock and try another pool. If the pool lock succeeds then the pool remains locked and the GC lock is cleared. In the event that two threads are allocating in a pool and they both determine that the GC has to run, they will then both CAS the GC lock. Whichever one gets the true condition performs the garbage collection. The one which fails getting the garbage collector will yield-spin until the GC lock becomes available again. Since another thread performed garbage collection it would start the allocation cycle all over again.
So the allocator would keep a reference to the first pool for the GC lock and then use the other pools accordingly. Actually if two threads were to hit the GC at the same time.
Actually I can compare and swap the same value. Use for example
compareAndSwapInt(a, 0, 0). Although thinking about it, since the flags are
quite simple it might be best if it were a pointer instead. However that would
complicate the math because it would need to be stored the positions of the
fields rather than being pure constants.
Actually if two threads are running and both hit the garbage collector lock then flagging the other to determine if a GC ran would be simply just comparing the value against the GC is running value. If that value was detected then the allocation algorithm restarts at the beginning.
One thing to consider however is cache coherency when it comes to data allocation.
One major problem though is false sharing. Since all of the table data is at the start and very close to each other, when one thread allocates data the other threads's CPUs will have to throw out their object information caches.
The current model I plan would work a bit better on older CPUs and such however. Also thinking about it, the memory pool information would be falsely shared also due to the locks so to speak.
One way would be to add a get and set operation in the memory pool manager. There would be a configuration pool so to speak which would have a pointer to the last allocated object. When a new object is allocated it will atomically be compared and set to the new allocated object. Then the current object's back chain will point to the object that was allocated before it.
. 1 ( --- 2 ( --- 3
Then the garbage collector would perform a global lock and then go through all threads and would mark any objects which can be reached. Then once that stage is done the object chain is walked down which destroys any objects which were not marked as visited and their chains reset accordingly.
However that would remove the capability of having compaction and would make it more difficult on lower memory systems. Also, when ordering fields, I need to have finals first with their alignment which is then followed by any volatiles or writable fields. This way there may potentially be a reduction in the amount of false sharing when reading field values that will never change in value.
When it comes to the memory allocator with this scheme, it just would not scale well.
Hypothetically, the table indexes would have to be at the minimum the size of a cache line. So depending on the architecture that is running, there would potentially be a bunch of wasted space in the starting table data.
So for the JIT to possibly generate better code for a target system, will need to have CPU metrics and such that can vary at run-time to adjust for how a system runs. So ideally, precompiled code will be generated exactly the same for the most part, however the layout of actual objects would vary depending on the host system. So despite having the same code, the cache line size could be 32 on one system and 64 on another, so table entries would be rounded up to this given size. This would also need to be the barrier between final and fields which are mutable and then ones which are volatile.
Currently reading an academic paper on Hoard. Technically I could use it since it is GPLv2+, however it is either C or C++ which when translated to Java does not really work well at all. So thinking about it, the current memory pool as is is a bad idea and will not work out at all. It should instead be a class which acts as a memory allocator. Then the structure manager could be managed on top of base memory manager implementations.
Ok so, currently my memory manager is a bit too high level. The kernel and
drivers may need to actually access the address space of the host on a fashion
that the infamous
Unsafe would do. So to the kernel, this would provide
basically access to the entire address space using a common interface and
such. I can also provide a similar interface for the MMUs that CPUs use also.
So if a MMU is available, when code is generated I can instead catch the
exception potentially rather than checking against a null pointer so to speak.
Although that would be a bit ugly personally.
Something that is interesting is that there is a reflection of a reflection of my eye and my iris/pupils in my glasses. That is light comes in, reflects off my eye, and then reflects off my glasses. The whites of the eye is a very dark purple while where the pupil and iris are completely black.
Due to the varying memory models on CPUs, there are not completely linear memory models. Some portions of memory could also be banked switched, there could be paging also. Then for certain architectures, code and data is kept separate (such as in the JVM) where code might not even be readable.
Since doubles and floats can be accessed via int/long I will not have them.
Actually having the memory access having a block would be very ugly. It would be best if the memory space were kept as flat as possible.
When it comes to memory management, some of it could be managed by the OS (if
there is one). Hypothetically,
valgrind could be used to determine if there
are any memory leaks.
I can keep the structure manager which can use an allocator for a native system and such.
Actually thinking about it, instead of the interpreter creating the memory pool and the required information from it, it can be assigned a memory pool by the kernel instead. This would make it possible to run the interpreter to run some code on the real kernel where it could flow into natively compiled code for example. Although it would be slow, there could be a practical application for this.
To handle the rerecording interpreter however I would need a way to determine the type and size of the allocated memory space. Memory allocation would also need to be deterministic, but in these cases it would be deterministic anyway.
One thing I must rethink is how to integrate the interpreter memory with the
kernel memory. At the start, the interpreter would have used the kernel
memory but that was switched to the kernel uses the interpreter memory. The
current method makes it easier for the re-recording interpreter to handle
re-recording. Since both states can potentially be very complex, I suppose
what I can do instead of a
Interpreter is have an object which has
interpreter configuration details. The configuration details would have the
ability to create a structure manager and memory accessor.
JVMKernel can also use the configuration system. Actually, instead the
stuff can be kept in the interpreter. However the kernel itself it initialized
with settings that the interpreter was initialized with. So basically the
interpreter would take the argument list and it could potentially modify them
before they are passed to the kernel. Then this way the kernel does not have to
use the argument rewriting (although it could still exist, just the
can do nothing with it now).
A difficulty with that is the constructor, since I can only use a default no-arg constructor. This means a factory must be used instead.
The interpreter would handle the arguments and such, there would be an early argument handling method which could potentially adjust the remaining set of arguments (or pass them through untouched). Then the arguments that the interpreter gives (which may be modified if rerecording is enabled) will be passed to the kernel. Alone, the interpreter would do nothing and as such anything that wants to use it must then take its initialization arguments and do what it needs to with it.
When it comes to the memory regions, perhaps instead everything should be
hidden behind the structure manager. Instead of the kernel directly accessing
MemoryAccessor interface, it can take it from the
Also the pointer type can be decoupled from the structure manager and placed in memory instead.
Then for memory regions (code, data, etc.) it can be utilized via the access manager by having it indicating the type of information it provides. So for example if the memory accessor has combined data and code it would indicate as such. If it had non-executable data and non-readable code then it could require two memory accessors to be used for each.