# Design Document

This document outlines several decisions and design considerations for the
SquirrelJME Virtual Machine. This document should reflect the most recent
path that SquirrelJME will be taking in its design. Note that the design may
change; it is not set in stone (that would be foolish).

For day-to-day notes on my current thought process you may look within the
[developer notes](assets/developer-notes/index.mkd).

For the current route of development see the [release route](route.mkd)
document.

Because Java ME is different from Java SE, there are some advantages and
disadvantages to keep in mind when reading this document. Note that this
document is mostly within the scope of Java ME and not Java SE.

 * Reduced library subset.
   * The main library is much smaller which means it will load faster and
     use less memory. Less memory means it can run on smaller/weaker systems.
 * No reflection.
   * There are `Class.forName()` and `Class.newInstance()`, but they are
     trivial to support.
   * More secure because access checks do not have to be performed at run-time
     to determine if it is permissible to access an object.
   * Values of `final` fields can be cached because, without reflection, their
     reference or primitive values can never be changed.
   * More secure because, for example, `Boolean.TRUE` cannot be changed to
     `false`, which could cause security exploits in code that relies on it
     being `true`.
   * The resulting virtual machine is smaller because metadata describing which
     fields and methods exist, along with their types, does not have to be
     included within the virtual machine. This reduces bloat.
   * Produces faster code because `final` fields, and especially
     `static final` values, can be accessed directly rather than requiring a
     read through a field pointer.
 * No finalizers (the `finalize()` method).
   * Finalizers are tightly integrated with how the garbage collector works.
   * It is never known when they will actually be called (if ever).
   * Timing attacks could be performed when finalizers are called between
     garbage collection runs.
 * No serialization.
   * No objects are `Serializable`.
   * Serialization uses virtual machine magic to access internal details;
     since there is no reflection, the information that would be used for
     serialization does not exist.
   * The `transient` keyword becomes obsolete.
   * Simplifies implementation.
 * No `invokedynamic` instruction.
   * Simplifies virtual machine operation, at the cost of lambdas (which could
     be smartly wrapped in anonymous classes by a compiler).

There are, however, also disadvantages:

 * Without reflection, one cannot include plugins dynamically from the program.
   * However, MIDP 3 added LIBlets, which may be optional and provide a
     limited alternative to plugins. These are fixed in the JAR/JAD, however,
     which makes them more difficult to use.
   * Alternatively, Java ME 8 has `ServiceLoader`, which allows JARs to
     potentially be merged to provide services. Also, by using
     `Class.forName()` and `Class.newInstance()`, plugins implementing a common
     interface can be initialized even when they are not directly known.
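
Below is a sketch of the `Class.forName()`/`Class.newInstance()` approach. It assumes a hypothetical `Plugin` interface shared by the host and its plugins; the names `Plugin`, `HelloPlugin`, and `PluginLoader` are illustrative only. The host loads a plugin by name and never references the plugin's class directly. The sketch is written so it also runs on Java SE, where `Class.newInstance()` is deprecated since Java 9 but still present.

```java
// Common interface shared by the host program and its plugins (hypothetical).
interface Plugin {
    String name();
}

// Example plugin; in practice it would come from a separately merged JAR.
final class HelloPlugin implements Plugin {
    public HelloPlugin() {
    }

    public String name() {
        return "hello";
    }
}

public final class PluginLoader {
    /** Loads and instantiates a plugin given its fully qualified class name. */
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated on Java SE 9+
    public static Plugin load(String className) {
        try {
            Class<?> cl = Class.forName(className);
            // Refuse classes which do not implement the common interface.
            if (!Plugin.class.isAssignableFrom(cl))
                throw new ClassCastException(className + " is not a Plugin");
            // Calls the public no-argument constructor.
            return (Plugin) cl.newInstance();
        } catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
            throw new IllegalArgumentException("Cannot load plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("HelloPlugin").name()); // prints hello
    }
}
```

In practice the class name would come from configuration or user input rather than a constant.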

For specific APIs, one should read the [Project Scope](scope.mkd) document
which outlines the APIs which exist for Java ME and whether they would be
implemented by SquirrelJME or by third-party vendors.

# Java SE vs SquirrelJME

This section goes into a bit more depth on the differences between SquirrelJME
and Java SE.

## Memory

Java SE has a standard garbage collector which, in general, frees objects that
are no longer referenced using a mark and sweep algorithm. In most cases it
will try to avoid releasing memory in the event that it could be used again;
also, doing major garbage collection sweeps can be computationally expensive.
Although this behavior is very virtual machine specific, it gives Java the
appearance of using a large amount of memory (it has historically been, and
still is, seen as a memory and resource hog), especially if the code running
on it has not been designed in an optimal way (which is usually the case). In
large instances the garbage collector may take a while to execute and can
consume considerable resources. The garbage collector is generally designed to
run in another thread and can hopefully collect objects concurrently, making
use of extra CPUs to decrease cleanup time. In general this garbage collector
is made for speed, where it only affects program execution speed when needed.
Recent JVM advancements can also allocate some objects on the stack so that
they do not touch the heap at all, which makes cleanup much easier when these
optimizations can be used.

SquirrelJME, on the other hand, will try to use the lowest amount of memory it
can by freeing objects from memory as soon as possible (in well designed code,
that is, code which does not use circular references). Once it is known that a
given object can be freed from memory, it will be freed to make room for other
programs and objects (within SquirrelJME). So in most cases SquirrelJME will
use the minimum memory footprint needed to run. Although the lowest amount of
memory will be used, a reference counted garbage collector has some slight
overhead in that it needs to count references to objects and be able to free
objects when their counts drop to zero. Parts of code which heavily use objects
will run a bit slower due to the counting required. This is definitely a
trade-off, as it gives memory a more important consideration than speed. Since
SquirrelJME targets low end systems, the memory saved can be very important.
However, unlike Java SE, all allocations are on the heap since the size of the
stack may be very small.

# Programming Language

SquirrelJME is written entirely in Java. This means that it only requires a
Java virtual machine and a Java compiler to be built. There are also no other
dependencies apart from what is within SquirrelJME itself; it is entirely
standalone and self-contained.

One may ask why Java and not another language such as C? Well, Java is a much
simpler language compared to C when it comes to syntax (C has the preprocessor,
structures, pointers, typedefs, function pointers, etc.). One main advantage
of Java is the consistency of the code.

One misconception about using Java is that it is impossible to use native code,
or that one will require an assembler to assemble code for things which Java
cannot do. This is not the case for SquirrelJME. The major and most important
part of SquirrelJME is the compiler, which can turn Java byte code into native
machine code. Since the compiler is very much integrated into SquirrelJME,
certain aspects of interacting with the host environment can be accessed by
changing how specific code is compiled, in a way which remains compatible with
Java but also provides native access when needed. Native access is provided by
replacing calls to special static methods within a special class with the
appropriate machine code rather than invoking a method call. These special
rewrites affect everything within a special package. This is used to provide
support for multiple operating systems and environments without causing name
collisions (since you cannot have classes with the same name in multiple
projects when they are merged together).

## Portability

I intend for SquirrelJME to be very portable so that it can be built for and
built on a large number of systems.

## Self Hosting

I intend for SquirrelJME to be self-hosting, in that it can build itself.

In the future a Java compiler will be written which can run on SquirrelJME
itself and allow building and compiling itself from source. This would also
allow other programs to be built from source, and SquirrelJME could then be
used as a self contained Java development environment.

# Environment

This details the environment in which SquirrelJME operates within the host
operating system.

## APIs

This describes the design of standard implementations.

### `javax.microedition.lcdui.*`

The LCDUI API is used by a large number of older J2ME applications to display
widgets and graphics on the screen. On SquirrelJME, everything is drawn by
SquirrelJME itself into its own framebuffer, which makes portability easier.
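
Here is a minimal sketch of that approach. It assumes a hypothetical `Framebuffer` class that stores one 32-bit ARGB `int` per pixel. Every drawing operation (here only a clipped `fillRect`) writes into a plain array, so a host backend only needs some way to copy that array to the real screen.

```java
public final class Framebuffer {
    final int width, height;
    // One 32-bit ARGB value per pixel, stored row by row.
    final int[] pixels;

    Framebuffer(int width, int height) {
        this.width = width;
        this.height = height;
        this.pixels = new int[width * height];
    }

    /** Fills a rectangle with a color, clipped to the framebuffer bounds. */
    void fillRect(int x, int y, int w, int h, int argb) {
        int x0 = Math.max(0, x), y0 = Math.max(0, y);
        int x1 = Math.min(width, x + w), y1 = Math.min(height, y + h);
        for (int py = y0; py < y1; py++)
            for (int px = x0; px < x1; px++)
                pixels[py * width + px] = argb;
    }

    public static void main(String[] args) {
        Framebuffer fb = new Framebuffer(4, 3);
        // Partially off-screen: only the visible 2x2 part is drawn.
        fb.fillRect(-1, 1, 3, 5, 0xFFFF0000);
        int drawn = 0;
        for (int p : fb.pixels)
            if (p == 0xFFFF0000)
                drawn++;
        System.out.println(drawn); // prints 4
    }
}
```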

#### `javax.microedition.lcdui.Image`

Since many systems may have varying image formats and supported native pixel
formats, instances of this class may be specific to the given hardware and may
share the limitations of the screen display. Images will generally use 32 bits
per pixel to store their pixel data, but some implementations may use images
with a lower bit depth (such as 256 color images).
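
When an implementation stores images at a lower bit depth, 32-bit ARGB colors have to be converted. The sketch below assumes RGB332 (3 bits red, 3 bits green, 2 bits blue) as the 256 color format. This is one common layout, not necessarily the one SquirrelJME uses, and a palette-indexed format would work differently. The helper names are hypothetical.

```java
public final class PixelFormats {
    /** Packs 32-bit ARGB into 8-bit RGB332; alpha is discarded. */
    static int argbToRgb332(int argb) {
        int r = (argb >>> 16) & 0xFF;
        int g = (argb >>> 8) & 0xFF;
        int b = argb & 0xFF;
        // Keep the top 3 bits of red and green and the top 2 bits of blue.
        return (r & 0xE0) | ((g & 0xE0) >>> 3) | (b >>> 6);
    }

    /** Expands RGB332 to opaque ARGB, repeating high bits to fill each channel. */
    static int rgb332ToArgb(int c) {
        int r3 = (c >>> 5) & 0x7;
        int g3 = (c >>> 2) & 0x7;
        int b2 = c & 0x3;
        int r = (r3 << 5) | (r3 << 2) | (r3 >>> 1);
        int g = (g3 << 5) | (g3 << 2) | (g3 >>> 1);
        int b = b2 * 0x55; // 0, 85, 170 or 255
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        int gray = argbToRgb332(0xFF808080);
        System.out.println(Integer.toHexString(gray));               // prints 92
        System.out.println(Integer.toHexString(rgb332ToArgb(gray))); // prints ff9292aa
    }
}
```

The round trip is lossy: mid gray `0xFF808080` comes back as `0xFF9292AA`. This is the kind of hardware-specific limitation an `Image` may share with the display.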

## Pathname Handling

Instances of the `Path` class will be strictly bound by the limitations of the
host system and will not allow those limitations to be skirted, as that
complicates compatibility.

As an example, DOS has severe filename limitations such as a maximum of 8
characters for a file name and 3 characters for an extension, along with other
naming restrictions. As such, getting a path which is not a valid DOS pathname
will result in an exception being thrown.
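
The following sketch checks a single DOS 8.3 file name component. The class `DosNameCheck` and method `check83` are hypothetical. The accepted character set is simplified to ASCII letters, digits, and a handful of punctuation characters, and reserved device names such as `CON` are not handled.

```java
import java.util.Locale;

public final class DosNameCheck {
    // Punctuation accepted in this sketch besides ASCII letters and digits.
    private static final String EXTRA = "_-!#$%&'()@^`{}~";

    /** Returns the upper-cased name, or throws if it is not a valid 8.3 name. */
    static String check83(String name) {
        int dot = name.indexOf('.');
        String base = (dot < 0 ? name : name.substring(0, dot));
        String ext = (dot < 0 ? "" : name.substring(dot + 1));
        if (base.isEmpty() || base.length() > 8)
            throw new IllegalArgumentException("Name must be 1 to 8 characters: " + name);
        if (ext.length() > 3 || ext.indexOf('.') >= 0)
            throw new IllegalArgumentException("Extension must be 0 to 3 characters: " + name);
        for (char c : (base + ext).toCharArray())
            if (!(c < 128 && Character.isLetterOrDigit(c)) && EXTRA.indexOf(c) < 0)
                throw new IllegalArgumentException("Invalid character '" + c + "' in " + name);
        // DOS names are case-insensitive and stored in upper case.
        return name.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(check83("readme.txt")); // prints README.TXT
        try {
            check83("verylongname.txt");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```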

# Compilation Time

This section contains information related to the operation of the Ahead-Of-Time
Compiler and the Just-In-Time Compiler.

# Virtual Machine

This section contains information related to the target independent virtual
machine at run-time.

## Garbage Collection

For simplicity, the garbage collector uses reference counting, with a sweep when
no more memory is available (or when GC is called manually). As such, cyclic
object references will not be freed unless one or both directions are
weakly referenced (using `WeakReference`). This means that the following
situations would permit both objects to be potentially collected:

 * A has a strong reference to B, B does not reference A.
 * A has a weak reference to B, B does not reference A.
 * A has a strong reference to B, B has a weak reference to A.
 * A has a weak reference to B, B has a strong reference to A.
 * A and B both have weak references to each other.
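
The following simplified model shows why cycles are a problem. It counts only strong references and is not SquirrelJME's collector: `Obj`, `link`, and `release` are hypothetical, and weak counts are not modeled. In a strong cycle where A references B and B references A, releasing the last outside reference to A leaves each object with a count of one, so neither is freed. When the back link is weak, it adds nothing to the strong count. Releasing A then cascades to B.

```java
import java.util.ArrayList;
import java.util.List;

public final class CycleDemo {
    /** A heap object under plain strong reference counting (weak counts not modeled). */
    static final class Obj {
        int strong;                                 // strong references to this object
        final List<Obj> fields = new ArrayList<>(); // strong references this object holds
        boolean freed;
    }

    /** Stores a strong reference from one object to another. */
    static void link(Obj from, Obj to) {
        from.fields.add(to);
        to.strong++;
    }

    /** Drops one strong reference; at zero the object is freed and releases its fields. */
    static void release(Obj o) {
        if (--o.strong == 0) {
            o.freed = true;
            for (Obj f : o.fields)
                release(f);
            o.fields.clear();
        }
    }

    public static void main(String[] args) {
        // Strong cycle A -> B -> A, plus one local variable referencing A.
        Obj a = new Obj(), b = new Obj();
        a.strong++;
        link(a, b);
        link(b, a);
        release(a); // the local variable goes out of scope
        System.out.println(a.freed + " " + b.freed); // prints: false false

        // Same shape, but the back link D -> C is weak, so it adds no strong count.
        Obj c = new Obj(), d = new Obj();
        c.strong++;
        link(c, d);
        release(c);
        System.out.println(c.freed + " " + d.freed); // prints: true true
    }
}
```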

Although reference counting may increase lock contention on the CPU and memory
buses, it greatly simplifies the design by not requiring complex garbage
collection algorithms. In most cases with reference counting, SquirrelJME is
always capable of using a minimal memory footprint, whether freed memory is
returned to the operating system or kept within SquirrelJME's own memory for
other programs running in it.

Objects will have two counts: the number of strong references pointing to this
object and the number of weak references pointing to this object. These two
counts determine if an object can be garbage collected. If an object has a
strong reference count of zero, then it will be garbage collected as long as
its weak count is also zero. Thus, an object which is never referenced at all
will be garbage collected. In a standard Java VM, a `WeakReference` will in
most cases only return its object while it has at least one strong reference.

### Strongly reached, weakly reached (`WeakReference`)

Because Java is garbage collected, Java ME has two types of references to
objects, which affect how garbage collection is performed. As in other
languages, this can be seen as a smart pointer or a special kind of reference
counting pointer.

A strong reference is a normal reference to an object, like a value which is
placed within a field (a static field or an instance field) or a local
variable in a method. Any objects that are strongly referenced cannot be
garbage collected because they are still in use.

A weak reference is another kind of reference to an object, provided by the
`WeakReference` class. Essentially, it does not have a strong bond to the
object it points to: as long as no strong references point to that object, it
may be garbage collected, after which the reference returns `null`. Weak
references cannot really be used as a cache due to the way the garbage
collector works; a cache should use a `SoftReference`, but Java ME does not
have such a class. At least with weak references, the virtual machine will in
the average case always use the least amount of memory. In short, weakly
referenced objects are garbage collected as soon as possible. When used for
memoization, weak references will not have the intended effect of reducing
calculations, but they will reduce the memory footprint.
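
The memoization trade-off shows up in a small sketch using the standard `java.lang.ref.WeakReference` class; the `WeakMemo` class and `squares` method are hypothetical. The table is held in a local strong reference while in use. Once callers drop it, it may be collected and is then recomputed on the next call. Under the reference counting described in this document that happens almost immediately, so memory stays low but the calculation is often repeated.

```java
import java.lang.ref.WeakReference;

public final class WeakMemo {
    // The cached table is only weakly held, so it may be collected at any time.
    private static WeakReference<int[]> cache = new WeakReference<>(null);

    /** Returns a table of squares 0..n-1, recomputing it if it was collected. */
    static int[] squares(int n) {
        int[] table = cache.get(); // strong reference while the table is in use
        if (table == null || table.length != n) {
            table = new int[n];
            for (int i = 0; i < n; i++)
                table[i] = i * i;
            cache = new WeakReference<>(table);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(squares(10)[9]); // prints 81
    }
}
```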

### Strong Count of Zero, Non-Zero Weak Count

When an object has no strong references and only weak references, that means
that it can soon be garbage collected. For simplicity, when `WeakReference`
detects that a target object has no strong references to it, it will detach
itself from that object, reduce the weak reference count, and return `null`.
If a `WeakReference` is no longer strongly referenced it will also cause a
detach to occur.
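
The two counts and the detaching behavior can be modeled as follows. This is a simplified, single-threaded sketch rather than SquirrelJME's implementation. `Obj` and `WeakRef` are hypothetical stand-ins for a heap object and `WeakReference`, and the counts are plain fields.

```java
public final class DualCount {
    /** A heap object with separate strong and weak counts (simplified model). */
    static final class Obj {
        int strong;
        int weak;
        boolean freed;

        void retain() {
            strong++;
        }

        void release() {
            strong--;
            maybeFree();
        }

        /** Memory is reclaimed only when both counts have dropped to zero. */
        void maybeFree() {
            if (strong == 0 && weak == 0)
                freed = true;
        }
    }

    /** Stand-in for WeakReference: detaches once its target has no strong references. */
    static final class WeakRef {
        private Obj target;

        WeakRef(Obj target) {
            this.target = target;
            target.weak++;
        }

        Obj get() {
            if (target != null && target.strong == 0)
                detach();
            return target;
        }

        /** Also used when this WeakRef is itself no longer strongly referenced. */
        void detach() {
            if (target != null) {
                Obj t = target;
                target = null;
                t.weak--;
                t.maybeFree();
            }
        }
    }

    public static void main(String[] args) {
        Obj o = new Obj();
        o.retain();                       // one strong reference
        WeakRef w = new WeakRef(o);       // one weak reference
        System.out.println(w.get() == o); // prints true
        o.release();                      // strong 0, weak 1: not freed yet
        System.out.println(o.freed);      // prints false
        System.out.println(w.get());      // prints null (detaches, weak drops to 0)
        System.out.println(o.freed);      // prints true
    }
}
```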

If the system is out of memory then all objects will be iterated and any
objects which only have weak references to them will be removed.
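
A sketch of this out-of-memory pass, using the same kind of simplified model (`Obj`, `WeakRef`, and `sweep` are hypothetical). Objects with no strong references are removed from the heap. Any weak references pointing at them are cleared first, so `get()` returns `null` afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class OomSweep {
    /** Simplified heap object; its weak count is the size of weakRefs. */
    static final class Obj {
        int strong;
        final List<WeakRef> weakRefs = new ArrayList<>();
    }

    /** Simplified stand-in for WeakReference. */
    static final class WeakRef {
        Obj target;

        WeakRef(Obj target) {
            this.target = target;
            target.weakRefs.add(this);
        }

        Obj get() {
            return target;
        }
    }

    /** Out-of-memory pass: removes every object without strong references. */
    static int sweep(List<Obj> heap) {
        int removed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.strong == 0) {
                // Clear the weak references first so their get() returns null.
                for (WeakRef w : o.weakRefs)
                    w.target = null;
                o.weakRefs.clear();
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Obj> heap = new ArrayList<>();
        Obj live = new Obj();
        live.strong = 1;
        Obj weakOnly = new Obj();
        WeakRef w = new WeakRef(weakOnly);
        heap.add(live);
        heap.add(weakOnly);
        System.out.println(sweep(heap));     // prints 1
        System.out.println(w.get() == null); // prints true
        System.out.println(heap.size());     // prints 1
    }
}
```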

### Strong Count of Zero, Zero Weak Count

This object can be garbage collected; it will be removed and its memory will
be made available for other allocations.

## Tasks

MIDP 3 allows multiple programs to be run at the same time (provided they
are actually different MIDlets). One thing to simplify the design of SquirrelJME
without needing much work.

## Synchronization and Locks

Java naturally provides synchronization which is used for writing code which
is thread safe. Since synchronization and monitors are very intertwined, the
design will reference future information. Since different CPUs and targets may
provide thread safety in a number of different ways, those details have been
removed and replaced with easy to determine common means.

### Loop Threading for Multi-Threading

Since Java is multi-threaded and SquirrelJME may run on top of a number of
systems which may have different threading models, the following differences
determine what happens when a loop needs to be repeated due to a failed
operation.

#### Preemptive

The loop should perform a given number of checks. Once a certain threshold is
reached, a longer duration sleep should be entered so that CPU cycles are not
wasted spinning. Essentially it waits upon a signal where possible.
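
On a preemptive system, the retry loop might look like the following sketch of a spin-then-sleep lock. `SpinThenSleepLock` is hypothetical, and `AtomicBoolean` stands in for whatever atomic compare-and-swap the target provides. `SPIN_LIMIT` is an arbitrary threshold. `Thread.sleep(1)` stands in for waiting on a signal, which a real implementation would prefer where the host supports it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public final class SpinThenSleepLock {
    // Failed attempts allowed before backing off to sleeping (arbitrary value).
    private static final int SPIN_LIMIT = 100;

    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() throws InterruptedException {
        for (int failures = 0; !held.compareAndSet(false, true); failures++) {
            // Below the threshold simply retry; past it, sleep so that CPU
            // cycles are not wasted while another thread holds the lock.
            if (failures >= SPIN_LIMIT)
                Thread.sleep(1);
        }
    }

    public void unlock() {
        held.set(false);
    }

    /** Two threads each increment a shared counter perThread times under the lock. */
    static int demo(final int perThread) throws InterruptedException {
        final SpinThenSleepLock lock = new SpinThenSleepLock();
        final int[] counter = {0};
        // Anonymous class rather than a lambda, since lambdas need invokedynamic.
        Runnable work = new Runnable() {
            @Override
            public void run() {
                try {
                    for (int i = 0; i < perThread; i++) {
                        lock.lock();
                        try {
                            counter[0]++;
                        } finally {
                            lock.unlock();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return counter[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(10000)); // prints 20000
    }
}
```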

#### Cooperative

The loop should yield and not attempt another try (because only a single
thread can run at one time), so that another thread which is able to run can
execute, potentially the one which holds the monitor for the given object.

It is possible that an internal threading manager can determine the best
thread to choose for execution next.
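
On a cooperative system, the same situation can be illustrated with a simulated scheduler, where each hypothetical `Task` runs until it yields. Task B finds the monitor owned by task A, so instead of retrying it yields. The round-robin scheduler then runs A, which releases the monitor, and B acquires it on its next turn. This is a sketch under those assumptions, not SquirrelJME's scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public final class CooperativeDemo {
    /** Monitor state; no atomics are needed since only one thread runs at a time. */
    static final class Monitor {
        String owner;
    }

    /** A cooperative thread: step() runs until the task yields or finishes. */
    interface Task {
        /** Returns true once the task has finished. */
        boolean step();
    }

    /** Round-robin scheduler: a task that yields goes to the back of the queue. */
    static void runAll(ArrayDeque<Task> runQueue) {
        while (!runQueue.isEmpty()) {
            Task t = runQueue.poll();
            if (!t.step())
                runQueue.add(t);
        }
    }

    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        final Monitor m = new Monitor();
        m.owner = "A"; // A starts out holding the monitor

        Task a = new Task() {
            @Override
            public boolean step() {
                log.add("A releases");
                m.owner = null;
                return true;
            }
        };
        Task b = new Task() {
            @Override
            public boolean step() {
                if (m.owner != null) {
                    // Retrying now is pointless: the owner cannot run until we yield.
                    log.add("B yields");
                    return false;
                }
                m.owner = "B";
                log.add("B acquires");
                return true;
            }
        };

        ArrayDeque<Task> runQueue = new ArrayDeque<>();
        runQueue.add(b); // B runs first and finds the monitor taken
        runQueue.add(a);
        runAll(runQueue);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [B yields, A releases, B acquires]
    }
}
```

A smarter threading manager could switch directly to the monitor's owner instead of cycling through the run queue.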