tekknolagi 4 days ago

It is very exciting to get a multi-tier VM from just a bytecode-encoded version of the VM spec.

versteegen 4 days ago

Yes! I've been waiting for a practical tool like this, and would love to write a JIT for Squirrel/Quirrel using it.

But I'm looking through the luajit-remake codebase, and there is still a lot of code. Assuming that the drt and deegen directories are Deegen (however, at least drt/tvalue.h is clearly part of the VM, not of Deegen):

  > fd . -e h -e cpp | egrep -v "test|thirdparty|deegen|drt" | xargs wc --total=only --lines
  34734
  > fd . -e h -e cpp | egrep -v "test|thirdparty" | xargs wc --total=only --lines
  97629

In comparison, Lua 5.2.4 is 20.3k lines of C and LuaJIT 1.1.5, which is a (comparable?) method JIT compiler, is 22.8k lines of C and 4.8k lines of Lua (for dynasm and JIT support). LuaJIT 2.1 is 74.9k lines of C, 13.7k Lua.

vanderZwan 3 days ago

I think a large part of that might be the language they chose. Every C++ code example in the paper feels extremely verbose to me, and I wonder to what degree that is inherently required for encoding language semantics, and to what degree it's C++ syntax being noisy.

This is not a critique of the authors, btw. Considering the breadth and depth of the various types of domain-specific knowledge that have to be "synthesized" on a project like this, developing a mastery of C++ is almost a given. So implementing things in C++ was likely the most natural approach for them. It technically also might be the most portable choice, since anyone who has LLVM installed will also have a C++ compiler.

I do wonder what it would be like if this were built upon a language with more appropriate "ergonomics" though. Maybe they can invent and implement a DSL for Deegen in Deegen, haha.

versteegen 1 day ago

Well... if you look at an opcode example, eg [1], it's actually almost entirely ordinary C++ which would only look slightly different if this were a non-JIT-compiling interpreter without Deegen. I say that being moderately familiar with the PUC Lua, LuaJIT, Squirrel and other VM codebases. If this is enough to produce a good JIT, that's incredible. It's quite verbose C++ but that's code style, not due to Deegen. The Deegen bit is a few lines at the bottom, which is DSL-like. I think C++ is a good choice, because it's a good choice for writing a VM.

[1] https://github.com/luajit-remake/luajit-remake/blob/master/a...

Rochus 4 days ago

See also https://stefan-marr.de/papers/oopsla-larose-et-al-ast-vs-byt... which demonstrates that we can do that with GraalVM/Truffle, and the VM generated from the AST-based interpreter is even faster than the bytecode interpreter.

tekknolagi 4 days ago

There is significant warmup required, which is not good for most programs. Deegen's approach is very promising for interactive use or other situations that require low latency.

mike_hearn 3 days ago

There's warmup to get to the best possible performance, which, given that Deegen is a copy/patch baseline compiler, will be far above what Deegen can do. If you only care about Deegen-level performance then GraalVM will warm up to that point quite quickly. And Deegen's approach cannot easily go beyond that level because it's not a full compiler.

I think the GraalVM/Truffle guys are also working on a copy/patch mode and warmup optimizations. So the real question is who gets to full generation of both a baseline and a full top-tier JIT from one codebase quicker.