There was a narrow window where CPUs were big enough to have circuitry for JVM fast bytecode execution, but too small for strong JIT compilation. ARM deprecated Jazelle as soon as that window passed.
It so much that JITs became feasible, it's that bigger CPUs were less suitable for Jazelle's approach because of the behaviour of the in-order CPU pipeline.
Because Jazelle converted Java bytecodes into ARM instructions in sequence, there is no opportunity for any instruction scheduling. So a bytecode sequence like:
// public static int get_x(int x, T a, T b) { return a.x+b.x; }
aload_1
getfield #N
aload_2
getfield #N
iadd
would go down the pipeline as something like: LDR r1, [r0, #4] // a_load1
* LDR r1, [r1] // getfield
LDR r2, [r0, #8] // aload_2
* LDR r2, [r2] // getfield
* ADD r1, r1, r2 // iadd
There would be a pipeline stall before each instruction marked with a *.On the first ARM 9 CPUs with Jazelle, the pipeline is fairly similar to the standard 5 stage RISC pipeline (Fetch-Decode-Execute-MemoryAccess-Writeback) so this stall would be 1 cycle. That wasn't too bad - you could just accept that loads took usally 2 cycles, and it would still be pretty fast.
However, on later CPUs with a longer pipeline the load-use delay was increased. By ARM11, it was 2 cycles - so now the CPU is spending more time waiting for pipeline stalls that it spends actually executing instructions.
In contrast, even a basic JIT can implement instruction scheduling and find some independent instructions to do between a load and the use of the result, which makes the JIT much more performant than Jazelle could be.
It was more about Memory usage than anything else.
It was designed for a market with feature phones running J2ME apps in about 4-8MB of RAM. The CPU was probably fast enough to do decent JIT compilation, but a JIT compiler would take a few hundred KB of code, and then suck up a large chunk of RAM for the code cache.
As far as I'm aware, it's is not as fast as a JIT would be on the same hardware (due to the stack machine), but Jazelle has the advantage of being significantly faster than an interpreter, while using about the same RAM (if not less).
Back in the day I worked for a company whose J2ME engine was much faster than the competition, which was pretty much because it had a JIT and the competition was interpreted. (We also had our own homebrewed implementation of most of the Java libraries, which tended to be more efficient because they weren't written in Java.)
Yeah, people fail to realize it was not designed to speed up a JIT, but speed up an interpreter, as most of the Java ME/SE ARM VMs at the time were interpreters. I have my doubts it would succeed at that, but at least it was clearly designed to do that.
By the time you could start doing full desktop OSes and full Java VMs on small ARM chips, Jazelle (DBX) had been stubbed out completely for years, handling only the most simple of opcodes or even none at all.