steventhedev 6 days ago

This feels like a net negative result. It removes some of the complexity of using generics, but it couples between the data type and the collections it can be indexed by.

What are the benefits of this approach? Is it limited to data alignment, or is it out of a greater desire to remove generics?

6
whizzter 6 days ago

Intrusive linked lists as this is called removes unnecessary allocations and traversal (main reason why linked lists have such a horrible reputation), say a hypothetical example where you have a temporary listener object installed that listens to broadcasts on channel X and times out after 5 minutes.

Upon connecting the object is connected to the channel X linked list as well as some other list of objects that are killed at the timestamp 5 minutes in the future.

With an intrusive linked list the link-node resides within the object, the only thing needed when installing is linking it into the 2 lists (this is a few move-operations), an external linked list would _require 2 extra allocations_ for the linked-list nodes.

Broadcasting from X is almost the same since it's a simple traversal, albeit with double the cache pressure since the object and the linked-list node probably lives separately.

The real HORROR comes when disconnecting, if channel X has millions of listeners it could become a real bottleneck to traverse the linked list to find the link-node that connects the actual object since there could be tons of jumping around memory. An intrusive linked list would just disconnect itself if it's doubly-linked.

This is why hashsets/tables,vectors/arraylists,etc are often used in practice (because many "modern" OO languages never added good support for the machinery needed here) making linked lists look quite bad (there is other badness but using non-intrusive linked lists is almost always worse than using something else than a linked list altogether).

steventhedev 6 days ago

The generic version in TFA puts the data type allocated alongside the next pointer - no additional allocation needed. The only functional difference is if the zig compiler is not sufficiently advanced to understand it can reorder the fields (hence the alignment question).

The removal scenario is merely specifying that you are passing in ConnectionListNode instead of a Connection. Although maybe it's a good idea to think about how they compose comparatively.

flohofwoe 6 days ago

It's not at all unusual for intrusive linked lists though?

On AmigaOS (which is entirely built out of intrusive doubly linked list) the list node is placed at the start of a struct, so the pointer to the node is also the pointer to the linked item. There's also no 'coupling' because the list manipulation functions take pointers to the list node structs, but they don't need to know the 'outer' item struct.

Zig's @fieldParentPtr is more flexible since the node struct can be anywhere in the parent struct and you still can get from the node pointer to the item base pointer (and more importantly, it makes it trivial to link the same item into multiple linked lists).

Joker_vD 6 days ago

It's not unusual at all, it's just... should it be exposed? I personally prefer having "Node struct with next pointer + inlined generic data" design when it can be encapsulated away since, well, it can be encapsulated away, and the result data layout is the same. And when you expose this design, well, you end with all sorts of disasters, including OOP inheritance (no, really: [0]).

[0] https://catern.com/inheritance.html

flohofwoe 6 days ago

If the node embeds the data item you can't link the item into multiple linked lists, but if the data item embeds the node(s) suddenly that's trivial.

I don't know if this was the motiviation for the design change though, but IMHO intrusive linked list are the 'proper' linked list design, and C++-style "nodes with payload" are a weird outlier.

Another reason might be that generic code leads to a lot of code duplication in the binary (which may or may not be optimized away by the compiler though - but at the cost of increased build times though).

lelanthran 6 days ago

> If the node embeds the data item you can't link the item into multiple linked lists, but if the data item embeds the node(s) suddenly that's trivial.

Perhaps I'm missing something, but isn't it the other way around? In C, this is what I'd expect for the intrusive and extrusive:

    // Intrusive
    struct user_data_t {
       struct user_data_t *next;
       struct user_data_t *prev;
       // The object's fields
    };

    // Extrusive
    struct node_t {
       struct node_t *next;
       struct node_t *prev;
       struct user_data_t *payload;
    };
The intrusive one obviously can't be shared between lists.

flohofwoe 6 days ago

No, intrusive would be (note: no pointers in user_data_t):

    typedef struct { node_t* next } node_t;

    typedef struct { ... } payload_t;

    typedef struct {
        node_t node;
        payload_t payload;
    } user_data_t;
...and if you want user_data_t to be included into multiple lists:

    typedef struct {
        node_t list1_node;
        node_t list2_node;
        node_t list3_node;
        payload_t payload;
    } user_data_t;
...of course in C now it gets tricky to get to the start of user_data_t given a pointer to `list3_node`, but that's where @fieldParentPtr comes in.

The advantage versus your extrusive example is that the payload doesn't need to be referenced through a pointer, which drastically simplifies lifetime tracking / memory management.

lelanthran 6 days ago

> No, intrusive would be (note: no pointers in user_data_t):

    typedef struct { node_t* next } node_t;

    typedef struct { ... } payload_t;

    typedef struct {
        node_t node;
        payload_t payload;
    } user_data_t;

That's exactly the same as my intrusive structure, no?

> typedef struct { node_t list1_node; node_t list2_node; node_t list3_node; payload_t payload; } user_data_t;

In C, at any rate, that doesn't give you "inclusion into multiple lists". It gives you "inclusion into at most 3 lists. The extrusive example I posted gives "inclusion into multiple lists".

So, yeah, I'm still not seeing your point.

Zambyte 6 days ago

> That's exactly the same as my intrusive structure, no?

It's not, for the reason they put in parentheses:

> note: no pointers in user_data_t

An allocation of user_data_t also allocates space for payload_t, rather than just allocating space for a pointer to payload_t. Your structure requires an additional allocation to point the payload_t* at something. The fact that they hid the next node_t* in a struct doesn't matter though.

lelanthran 6 days ago

> It's not, for the reason they put in parentheses:

>> note: no pointers in user_data_t

I feel like I am taking crazy pills: where, in my intrusive example, are pointers in the `user_data_t`?

My intrusive code sample had no pointers for the user_data_t. It's exactly the same as GP's intrusive example.

Zambyte 5 days ago

It is not you that is crazy, I have been exhausted. My bad.

sph 6 days ago

Your link about inheritance is worth its own post.

EDIT: Previously: https://news.ycombinator.com/item?id=26988839

codr7 4 days ago

This approach is also common in C:

https://github.com/codr7/hacktical-c/tree/main/list

One pretty big advantage is you get some help from the compiler making sure you mean what you say. As opposed to just blindly casting items and assuming no one forgot to put the list in their struct. Also the possibility to be part of multiple lists.

GolDDranks 6 days ago

I think it matches really well for the Zig ethos of "artisanal, minimal code". More than often, a type is strongly tied to how it is being used in the code base. In that sense, having it be tightly coupled to a data structure isn't much of a problem. The opposite isn't true, and the data structure is independent of the embedding parent data. The implementation of that data structure itself can still be presented as a library, and the "generic" parts carefully tested.

reissbaker 6 days ago

Zig has no desire to remove comptime AFAIK (comptime being the larger Zig feature by which people implement generics in the language) — comptime is pretty much the reason to use Zig.

The benefits of intrusive linked lists are higher performance; you can use comptime to still have decoupled code.

surajrmal 6 days ago

The generic approach should be the same performance. This approach just lets you place your data in multiple lists without needing multiple allocations.

reissbaker 6 days ago

No, the generic approach requires your data to be spaced out further in memory (or to be heap allocated), which causes CPU cache misses and is slower. The entire reason for intrusive linked lists is performance. Standard linked lists are notoriously slow compared to almost any other similar data structure, which is why they are hardly ever used in real code (ArrayLists and vectors are much more common).

foldr 4 days ago

Maybe it requires this in Zig (I don’t know), but in general there’s no reason why you couldn’t allocate the nodes of an extrusive linked list from a pool residing on the heap or on the stack. For example, you could do this with the STL (for all that STL allocators are a pain to use in practice). Or you could have a slightly different API where you add nodes to the list by passing an entire Node<T> to the relevant function rather than just T, at which point you can trivially allocate the nodes as you please.

kevin_thibedeau 6 days ago

C++20 has consteval to do the same thing.

Someone 6 days ago

The only logical explanation I can see is that these are two decisions:

- linked lists aren’t useful on modern systems because traversing them causes to many cache misses. Therefore, we shouldn’t provide such a simple implementation.

- but we need one in low level OS code, and there, intrusive lists are preferred. Their limitation that you cannot store an object in multiple lists isn’t a problem, and the extra speed and the fact that you can move items between lists without allocations is desired.

I don’t see why the intrusive implementation has to be so basic, though. Can’t you, in Zig, express that a generic type T used in a generic list class has to have a nullable next field that points to a T?

throwawaymaths 5 days ago

> linked lists aren’t useful on modern systems because traversing them causes to many cache misses

Only if you "default to using" (and only use) malloc. Zig encourages you to use different allocators within the same program for different use cases, including, potentially allocators which will thrash the cache far less (for example thread local arenas).

yccs27 6 days ago

You could create a comptime function that adds the Node field to any type. I guess then you've arrived back at the previous generic version.