wyager 1 day ago

It's a bit of a bizarre design; the JVM's execution model is not sufficiently different from any standard procedural execution semantics to justify special hardware support. It makes more sense to use a normal JIT/AOT compiler that's modular over the target ISA.

There's that one ARM instruction specifically for JavaScript float-to-i32 casts, but at least that's relatively narrowly scoped.
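
(The instruction in question is, I believe, ARMv8.3's FJCVTZS; a rough sketch of the contrast with the ordinary convert, just to show how narrow it is:)

    FCVTZS  w0, d0   // normal convert: out-of-range inputs saturate to INT32_MIN/MAX
    FJCVTZS w0, d0   // "JavaScript convert": the result wraps modulo 2^32 instead,
                     // matching the ToInt32 behaviour the JS spec requires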

kmeisthax 1 day ago

>JavaScript float-to-i32 casts

To be clear, the behavior in question is Intel behavior that snuck its way into the JavaScript spec; ARM just doesn't wanna use the word "Intel" in public documentation. For similar reasons, there are a few other places where an "alternate" execution mode exists solely to emulate x86.

MrBuddyCasino 1 day ago

> "alternate" execution mode that exists solely to emulate x86

Is this something different from TSO?

elchananHaas 1 day ago

There was a narrow window where CPUs were big enough to have circuitry for JVM fast bytecode execution, but too small for strong JIT compilation. ARM deprecated Jazelle as soon as that window passed.

Teongot 17 hours ago

It's not so much that JITs became feasible; it's that bigger CPUs were less suitable for Jazelle's approach because of the behaviour of the in-order CPU pipeline.

Because Jazelle converted Java bytecodes into ARM instructions in sequence, there is no opportunity for any instruction scheduling. So a bytecode sequence like:

  // public static int get_x(int x, T a, T b) { return a.x+b.x; }
  aload_1
  getfield #N
  aload_2
  getfield #N
  iadd
would go down the pipeline as something like:

    LDR r1, [r0, #4]   // aload_1
  * LDR r1, [r1]       // getfield
    LDR r2, [r0, #8]   // aload_2
  * LDR r2, [r2]       // getfield
  * ADD r1, r1, r2     // iadd
There would be a pipeline stall before each instruction marked with a *.

On the first ARM9 CPUs with Jazelle, the pipeline was fairly similar to the standard 5-stage RISC pipeline (Fetch-Decode-Execute-MemoryAccess-Writeback), so this stall would be 1 cycle. That wasn't too bad - you could just accept that loads usually took 2 cycles, and it would still be pretty fast.

However, on later CPUs with a longer pipeline the load-use delay increased. By ARM11 it was 2 cycles - so now the CPU spends more time waiting on pipeline stalls than it spends actually executing instructions (in the five-instruction sequence above, that's 3 stalls × 2 cycles = 6 wasted cycles against only 5 cycles of useful work).

In contrast, even a basic JIT can implement instruction scheduling and find some independent instructions to do between a load and the use of the result, which makes the JIT much more performant than Jazelle could be.
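
For illustration - same registers and field offsets as the sequence above, and assuming the ARM9-style 1-cycle load-use delay - even a naive reordering helps:

    LDR r1, [r0, #4]   // aload_1: reference to a
    LDR r2, [r0, #8]   // aload_2: reference to b (independent, hides the first load's latency)
    LDR r1, [r1]       // getfield a.x
    LDR r2, [r2]       // getfield b.x (independent, hides the latency of loading a.x)
  * ADD r1, r1, r2     // iadd - the only remaining stall

That already cuts the stalls from three to one, and in longer bytecode sequences there's far more independent work for a JIT to interleave; Jazelle, translating one bytecode at a time, can't reorder across them.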

phire 1 day ago

It was more about memory usage than anything else.

It was designed for a market with feature phones running J2ME apps in about 4-8MB of RAM. The CPU was probably fast enough to do decent JIT compilation, but a JIT compiler would take a few hundred KB of code, and then suck up a large chunk of RAM for the code cache.

As far as I'm aware, it's not as fast as a JIT would be on the same hardware (due to the stack machine), but Jazelle has the advantage of being significantly faster than an interpreter while using about the same RAM (if not less).

pm215 1 day ago

Back in the day I worked for a company whose J2ME engine was much faster than the competition, which was pretty much because it had a JIT and the competition was interpreted. (We also had our own homebrewed implementation of most of the Java libraries, which tended to be more efficient because they weren't written in Java.)

AshamedCaptain 1 day ago

Yeah, people fail to realize it was not designed to speed up a JIT but to speed up an interpreter, as most of the Java ME/SE ARM VMs at the time were interpreters. I have my doubts it would succeed at that, but at least it was clearly designed to do that.

By the time you could start doing full desktop OSes and full Java VMs on small ARM chips, Jazelle (DBX) had effectively been stubbed out for years, handling only the simplest of opcodes or even none at all.