Item 42261548

I loved this clever, weird, awesome paper, so a short summary.

In many dynamic languages some values are stored on the heap ("boxed") and represented as a pointer, while others are represented as an immediate value ("unboxed" or "immediate"). Pointer tagging is a common way to do that: the low bit of the value tells you the value's type, and some types are immediate while others are boxed.

Naturally, the tag bits have a fixed value, so can't be used to store data. So for example your language might offer 61-bit integer immediates instead of 64-bit integers; the other three bits are used for tags. Possibly, larger integers are stored on the heap and treated as a different type (for example Python 2.X had separate int and long types for these cases).

However, it's hard to use this strategy for floats, because floats need all 64 bits (or 32 bits for single-precision, same difference). There's a trick called "NaN boxing" which makes use of the large number of NaNs in the float representation, but read the paper if you want more on that.

The authors' insight is that, suppose you have a three-bit tag and 011 is the tag for floats. By totally random chance, _some_ floats will end in 011; you can represent those as immediates with those tag bits. Obviously, that's unlikely, though you can raise the chances by using, like, 010, 011, 100, and 101 all as float tags. Still, the low bits are a bad choice. But what about high bits? Most floats have one of a couple high bit patterns, because most floats are either 0 or between, say, 1e-100 and 1e100. Floats outside that range can be boxed but since they're really rare it's not a big cost to box them.

So basically, we use high bits as our tag bits and map all the common float prefixes to float tags. This allows unboxing the vast majority of floats, which leads to big speedups on float-heavy benchmarks.

A personal note: I've been working in numerics and floating-point for a decade now and have had to deal with float boxing both from a research point of view (lots of runtime analysis systems for floats), from a user point of view (using unboxed float vectors for significant speedup in my own software), and from a teaching point of view (discussing boxing in my compilers class, using NaN-boxing as an example of cleverness).

This idea is so simple, so crazy, so stupid, and works so well, but I never thought of it. Bravo to the authors.

daanx • 3 hours ago

> This idea is so simple, so crazy, so stupid, and works so well, but I never thought of it. Bravo to the authors.

Thanks for the nice summary -- looking forward to read the paper!

The same idea of self-tagging is actually also used in Koka language [1] runtime system where by default the Koka compiler only heap allocates float64's when their absolute value is outside the range [2e-511,2e512) and not 0, infinity, or NaN (see [2]). This saves indeed many (many!) heap allocations for float intensive programs.

Since Koka only uses 1 bit to distinguish pointers from values, another slightly faster option is to only box negative float64's but of course, negative numbers are still quite common so it saves less allocations in general.

[1] https://koka-lang.github.io/koka/doc/book.html#sec-value-typ...

[2] https://github.com/koka-lang/koka/blob/dev/kklib/src/box.c#L...

ps. If you enjoy reading about tagging, I recently wrote a note on efficiently supporting seamless large integer arithmetic (as used in Koka as well) and discuss how certain hardware instructions could really help to speed this up [3]:

[3] https://www.microsoft.com/en-us/research/uploads/prod/2022/0... (ML workshop 2022)

rtpg • 4 hours ago

Do all float operations need to reconfirm those bits afterwards though? I suppose if you have some sort of JIT you can end up with a bunch of unboxed floats and would only pay the cost on boundaries though

2 replies

pansa2 • 3 hours ago

> reconfirm those bits afterwards

Thanks - I hadn't thought about that but it seems to be the main downside of this approach. The benefit of NaN-boxing is that it reassigns values that are otherwise unused - floating-point calculations will never generate NaNs with those bit patterns.

lifthrasiir • 3 hours ago

Only when they have to be boxed, but yes if you are talking about that.

pansa2 • 4 hours ago

> This allows unboxing the vast majority of floats, which leads to big speedups on float-heavy benchmarks.

NaN-boxing allows all floats to be unboxed though. The main benefit of the self-tagging approach seems to be that by boxing some floats, we can make space for 64-bit pointers which are too large for NaN-boxing.

The surprising part of the paper is that "some floats" is only a small minority of values - not, say, 50% of them.

1 reply

adgjlsfhk1 • 1 hour ago

50% means you only get 1 tag bit.

also you totally can fit 64 bit pointers inside a NaN. 46 bit pointers are only 48 bits and you have 53 bits of NaN payload. (you also could get an extra 3 bits if you only allow storing 8 byte aligned pointers unboxed)

1 reply

pansa2 • 1 hour ago

> 50% means you only get 1 tag bit.

That's enough to distinguish between "unboxed float" and "something else", where the latter can have additional tag bits.

> [64-bit] pointers are only 48 bits and you have 53 bits of NaN payload.

The paper specifically talks about support for "high memory addresses that do not fit in 48 bits". If you don't have to handle those high addresses, I don't think this approach has any benefits compared to NaN-boxing.

jonnycomputer • 4 hours ago

Thank you for the clear explanation!