Apparently Starlet was a security and I/O module on the Wii's graphics chip: https://en.wikipedia.org/wiki/Hollywood_(graphics_chip)#Star...
Awesome. I did something similar on the old Raspberry Pi Zeros, which had the right chip for it. It was a Linux kernel module with a bit implemented in ARM assembly. All it executed was an add; I always meant to go back and reverse-engineer more of the instruction set.
.byte 0x05 @ iconst_2
.byte 0x06 @ iconst_3
.byte 0x60 @ iadd
.byte 0xAC @ ireturn
Back in the mid-2000s I tried to reverse engineer some of it. Got as far as executing a few instructions. Found that ARMv5TEJ cores did not natively execute very many Java instructions. It was sad. I had so hoped that later cores got better.
I never understood why ARM hid the Jazelle docs away. It made no sense, and it is probably why Jazelle died.
ARM wanted to deprecate Jazelle as fast as possible. Even during its development, it became clear that Java bytecode was never going to be quick, and that it's better to just JIT/AOT it.
I was working there at the time and even internally Jazelle documentation was hidden. Very odd for what it is.
Likely licensing issue with Sun/Oracle?
One rationale I've seen somewhere is that they were intentionally trying to not commit to a stable instruction set they'd need to support going forward, i.e. intentionally make support very closely coupled to a specific JVM and OS implementation.
Of course it might just also have been a cooperation with a specific JVM vendor (that could then promote their JVM as the fastest on a given set of CPUs).
It's a bit of a bizarre design; the JVM's execution model is not sufficiently different from any standard procedural execution semantics to justify special hardware support. It makes more sense to use a normal JIT/AOT compiler that's modular over the target ISA.
There's that one ARM instruction specifically for JavaScript float-to-i32 casts, but at least that's relatively narrowly scoped.
>JavaScript float-to-i32 casts
To be clear, the behavior in question is Intel behavior that snuck its way into the JavaScript spec; ARM just doesn't wanna use the word "Intel" in public documentation. For similar reasons there's a few other places where there's an "alternate" execution mode that exists solely to emulate x86.
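For reference, the instruction in question is FJCVTZS ("Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero", part of ARMv8.3's FEAT_JSCVT). A minimal AArch64 sketch of what it buys you:

    .arch armv8.3-a
    // int32_t js_to_int32(double d): JavaScript's ToInt32 in one
    // instruction - truncate toward zero, wrap modulo 2^32, NaN -> 0
    .global js_to_int32
js_to_int32:                // argument arrives in d0
    fjcvtzs w0, d0          // also sets the Z flag if the conversion was exact
    ret

Without it you need a several-instruction sequence to get the out-of-range and NaN cases to match the JS spec.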
> "alternate" execution mode that exists solely to emulate x86
Is this something different from TSO?
There was a narrow window where CPUs were big enough to have circuitry for JVM fast bytecode execution, but too small for strong JIT compilation. ARM deprecated Jazelle as soon as that window passed.
It's not so much that JITs became feasible, it's that bigger CPUs were less suitable for Jazelle's approach because of the behaviour of the in-order CPU pipeline.
Because Jazelle converted Java bytecodes into ARM instructions in sequence, there was no opportunity for any instruction scheduling. So a bytecode sequence like:
// public static int get_x(int x, T a, T b) { return a.x+b.x; }
aload_1
getfield #N
aload_2
getfield #N
iadd
would go down the pipeline as something like:

    LDR r1, [r0, #4]   // aload_1
  * LDR r1, [r1]       // getfield
    LDR r2, [r0, #8]   // aload_2
  * LDR r2, [r2]       // getfield
  * ADD r1, r1, r2     // iadd
There would be a pipeline stall before each instruction marked with a *. On the first ARM9 CPUs with Jazelle, the pipeline was fairly similar to the standard 5-stage RISC pipeline (Fetch-Decode-Execute-MemoryAccess-Writeback), so this stall would be 1 cycle. That wasn't too bad: you could just accept that loads usually took 2 cycles, and it would still be pretty fast.
However, on later CPUs with a longer pipeline, the load-use delay increased. By ARM11 it was 2 cycles - so now the CPU is spending more time waiting on pipeline stalls than it spends actually executing instructions.
In contrast, even a basic JIT can implement instruction scheduling and find some independent instructions to do between a load and the use of the result, which makes the JIT much more performant than Jazelle could be.
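For example, a scheduler only has to hoist the second independent load to remove two of the three stalls in the sequence above:

    LDR r1, [r0, #4]   // aload_1
    LDR r2, [r0, #8]   // aload_2, hoisted between the loads and their uses
    LDR r1, [r1]       // getfield: r1 has been ready for a cycle, no stall
    LDR r2, [r2]       // getfield: one instruction separates load and use
  * ADD r1, r1, r2     // iadd: the one remaining load-use stall

That's for the 5-stage pipeline; longer pipelines want even more aggressive reordering, which Jazelle's in-sequence translation can never do.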
It was more about memory usage than anything else.
It was designed for a market with feature phones running J2ME apps in about 4-8MB of RAM. The CPU was probably fast enough to do decent JIT compilation, but a JIT compiler would take a few hundred KB of code, and then suck up a large chunk of RAM for the code cache.
As far as I'm aware, it's not as fast as a JIT would be on the same hardware (due to the stack machine), but Jazelle has the advantage of being significantly faster than an interpreter while using about the same RAM (if not less).
Back in the day I worked for a company whose J2ME engine was much faster than the competition, which was pretty much because it had a JIT and the competition was interpreted. (We also had our own homebrewed implementation of most of the Java libraries, which tended to be more efficient because they weren't written in Java.)
Yeah, people fail to realize it was not designed to speed up a JIT, but to speed up an interpreter, as most of the Java ME/SE ARM VMs at the time were interpreters. I have my doubts it succeeded at that, but at least it was clearly designed to do that.
By the time you could start doing full desktop OSes and full Java VMs on small ARM chips, Jazelle (DBX) had been stubbed out completely for years, handling only the most simple of opcodes or even none at all.
Funny, I pulled up HN while I go through modding a Wii I got from hard rubbish.
This seems very useless. I'm not sure Jazelle was ever used for anything, and I'm not sure why anyone would want to either - least of all on the Wii's IO processor.
Still, this repo links to some other stuff I found interesting. The Starlet exploit which is linked is funny for how basic it is, and it also seems to be part of a much bigger and more ambitious (but mothballed?) project.
Jazelle was before my time, but what I like is that maybe v7, maybe even v8, cores have enough Jazelle logic to say "we don't do that here" and that's it. It was in one of their docs that the only remaining functionality is detecting the instructions as illegal. So all those little cost-optimized chips had to spend some silicon because of that decision years ago, which feels very x86.
Is this any different from detecting illegal instructions in general?
I would have thought any encoding in the unused part of the instruction space would generate a SIGILL without needing the "we don't do that here" logic.
A bit fuzzy. I mostly work on M-class so idk what the signal would be, but a cursory glance says there's a dedicated "go to Jazelle" instruction (BXJ) that maybe gets handled differently? That way, if any bytecodes overlap with ARM/Thumb encodings, it'll still know. Thinking about it, I'm more certain that's what it is: maybe some bytecode is also a valid non-Java instruction encoding, so they focus detection on the Jazelle mode entry instruction.
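For what it's worth, that matches the architected behaviour: on a trivial Jazelle implementation, BXJ is defined to behave exactly like BX, so the entry sequence just takes its ARM fallback path instead of faulting. A sketch (the label is illustrative):

    adr  r0, interp_loop  @ fallback address: a software bytecode interpreter
    bxj  r0               @ trivial Jazelle: behaves exactly as BX r0
interp_loop:
    @ interpreter loop would go here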
You're looking at it exactly the wrong way around: Jazelle support on the Wii's IO processor was useless – but thanks to this, it's not anymore :)
>least of all on the Wii's IO processor.
The Wii sold well, thus, if anything, it is a particularly good target for playing with Jazelle.
I was approaching it from the perspective of "What would be useful to do on a Wii?" When you approach it from the perspective of "I want to play with Jazelle - what can I use?" it makes much more sense. Thanks.
EDIT: Following one of the reference links, apparently you can enter Jazelle mode on the Nintendo 3DS's application cores. That's another suitable target. https://github.com/SonoSooS/libjz
Philosophical question: What does "useful" mean in the context of a game console?
In my view, it's people having fun with it and maybe learning a new thing or two, so I'd say this is as useful as it gets.