dmitrygr 1 day ago

Back in the mid 2000s I tried to reverse engineer some of it. Got to executing a few instructions. Found that ARMv5TEJ cores did not natively execute very many Java instructions. It was sad. I had hoped that later cores got better.

I never understood why ARM hid the Jazelle docs away. It made no sense. It is probably why Jazelle died.

cyberax 1 day ago

ARM wanted to deprecate Jazelle as fast as possible. Even during its development, it became clear that Java bytecode was never going to be quick, and that it's better to just JIT/AOT it.

TNorthover 1 day ago

I was working there at the time and even internally Jazelle documentation was hidden. Very odd for what it is.

miohtama 1 day ago

Likely a licensing issue with Sun/Oracle?

lxgr 23 hours ago

One rationale I've seen somewhere is that they intentionally avoided committing to a stable instruction set they'd need to support going forward, i.e. they deliberately kept support closely coupled to a specific JVM and OS implementation.

Of course, it might also just have been a cooperation with a specific JVM vendor (which could then promote its JVM as the fastest on a given set of CPUs).

wyager 1 day ago

It's a bit of a bizarre design; the JVM's execution model is not sufficiently different from any standard procedural execution semantics to justify special hardware support. It makes more sense to use a normal JIT/AOT compiler that's modular over the target ISA.

There's that one ARM instruction specifically for JavaScript float-to-i32 casts, but at least that's relatively narrowly scoped.
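
(That's presumably FJCVTZS from ARMv8.3's JavaScript-conversion extension; a rough A64 sketch of its use:)

    // double in d0 -> ECMAScript-style int32 in w0:
    // truncate toward zero, out-of-range values wrap modulo 2^32, NaN becomes 0
    FJCVTZS w0, d0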

kmeisthax 1 day ago

>JavaScript float-to-i32 casts

To be clear, the behavior in question is Intel behavior that snuck its way into the JavaScript spec; ARM just doesn't wanna use the word "Intel" in public documentation. For similar reasons there are a few other places where there's an "alternate" execution mode that exists solely to emulate x86.

MrBuddyCasino 1 day ago

> "alternate" execution mode that exists solely to emulate x86

Is this something different from TSO?

elchananHaas 1 day ago

There was a narrow window where CPUs were big enough to have circuitry for fast JVM bytecode execution, but too small for strong JIT compilation. ARM deprecated Jazelle as soon as that window passed.

Teongot 16 hours ago

It isn't so much that JITs became feasible, it's that bigger CPUs were less suited to Jazelle's approach because of the behaviour of the in-order CPU pipeline.

Because Jazelle converted Java bytecodes into ARM instructions in sequence, there is no opportunity for any instruction scheduling. So a bytecode sequence like:

  // public static int get_x(int x, T a, T b) { return a.x+b.x; }
  aload_1
  getfield #N
  aload_2
  getfield #N
  iadd
would go down the pipeline as something like:

    LDR r1, [r0, #4]   // aload_1
  * LDR r1, [r1]       // getfield
    LDR r2, [r0, #8]   // aload_2
  * LDR r2, [r2]       // getfield
  * ADD r1, r1, r2     // iadd
There would be a pipeline stall before each instruction marked with a *.

On the first ARM9 CPUs with Jazelle, the pipeline is fairly similar to the standard 5-stage RISC pipeline (Fetch-Decode-Execute-Memory-Writeback), so this stall would be 1 cycle. That wasn't too bad - you could just accept that loads usually took 2 cycles, and it would still be pretty fast.
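
Roughly, the load-use hazard on such a 5-stage pipeline looks like this (illustrative only, assuming a simple 1-cycle interlock):

    cycle:            1   2   3   4   5   6   7
    LDR r1, [r0, #4]  F   D   E   M   W
    LDR r1, [r1]          F   D   D   E   M   W   // held in decode one cycle, waiting for the loaded value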

However, on later CPUs with a longer pipeline, the load-use delay increased. By ARM11 it was 2 cycles - so now the CPU is spending more time waiting on pipeline stalls than it spends actually executing instructions.

In contrast, even a basic JIT can implement instruction scheduling and find some independent instructions to do between a load and the use of the result, which makes the JIT much more performant than Jazelle could be.
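
For example, a JIT emitting the same five instructions could schedule them as something like:

    LDR r1, [r0, #4]   // aload_1
    LDR r2, [r0, #8]   // aload_2 (independent load fills the delay slot)
    LDR r1, [r1]       // getfield a.x
    LDR r2, [r2]       // getfield b.x
    ADD r1, r1, r2     // iadd
With a 1-cycle load-use delay, only the final ADD can still be waiting on a load; with more surrounding code to interleave, a JIT can usually hide the longer delays too.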

phire 1 day ago

It was more about memory usage than anything else.

It was designed for a market with feature phones running J2ME apps in about 4-8MB of RAM. The CPU was probably fast enough to do decent JIT compilation, but a JIT compiler would take a few hundred KB of code, and then suck up a large chunk of RAM for the code cache.

As far as I'm aware, it is not as fast as a JIT would be on the same hardware (due to the stack machine), but Jazelle has the advantage of being significantly faster than an interpreter, while using about the same RAM (if not less).

pm215 1 day ago

Back in the day I worked for a company whose J2ME engine was much faster than the competition, which was pretty much because it had a JIT and the competition was interpreted. (We also had our own homebrewed implementation of most of the Java libraries, which tended to be more efficient because they weren't written in Java.)

AshamedCaptain 1 day ago

Yeah, people fail to realize it was not designed to speed up a JIT, but to speed up an interpreter, as most of the Java ME/SE ARM VMs at the time were interpreters. I have my doubts it succeeded at that, but at least it was clearly designed to do that.

By the time you could start doing full desktop OSes and full Java VMs on small ARM chips, Jazelle (DBX) had been stubbed out completely for years, handling only the simplest of opcodes or even none at all.