Item 43680655 - HN

mgaunard • 6 days ago

There are several advantages to intrusive node-based data structures that I haven't seen stated in the article or the comments:

- the same object can move between containers with no allocation and no need for a dedicated complex API

- the same object can be part of multiple containers at once; particularly useful for intrusive binary trees, for indexing data with different criteria.

- the container can be fully polymorphic, no need for all elements to be the same dynamic type.

- no need for complex allocators, you can just store the objects as you see fit.

7

suspended_state • 6 days ago

I think the main advantage of the intrusive definition is that you can use it to implement the non intrusive one easily, whereas the reverse isn't possible.

NobodyNada • 6 days ago

With a non-intrusive linked list type, can't you define the intrusive version as `Node<void>`?

boomlinde • 6 days ago

> the same object can move between containers with no allocation and no need for a dedicated complex API

This is true also of the previous design.

> no need for complex allocators, you can just store the objects as you see fit.

Same here; the previous design leaves it up to the user.

MrCroxx • 6 days ago

Agreed. Intrusive data structures are good for implementing multi-container data structures with lower allocation overhead. And they are widely used than expected. E.g. LRU is a classic multi-container problem.

paraboul • 6 days ago

> the same object can be part of multiple containers at once

I'm not sure I understand this one. Since the object contains the reference to where it belongs inside a container (e.g. object.node.next) how can it be re-used in multiple containers. Conversely, in a non-intrusive data structure, multiple containers can hold a ref to the same object through an intermediate node object

gnubison • 6 days ago

You add multiple next variables. buffer.next, buffer.next_displayed, etc

paraboul • 6 days ago

That's not an advantage of intrusive data structures then. That's precisely an advantage of non-intrusive data structure : object can be inserted in an arbitrary number of containers

mikepurvis • 6 days ago

But for the intrusive one you have the visibility into its membership in the other structures, since you're holding a reference to the object and its intrusive fields.

In the non-intrusive case, all you know is how you got there, you don't have return info to the other structures— particularly annoying if you're trying to destroy an object and have to speculatively look for references to it in the other lists in order to get rid of them.

paraboul • 5 days ago

Of course. But I wouldn’t qualify "multi-container use" to be an advantage of intrusive data structures per se.

dlahoda • 4 days ago

when zig people say multicontainer use, they mean C use, not O(log N) nor O(1)( which may be O(N) for small data). but very strict C.

Zambyte • 6 days ago

The object can contain multiple intrusive node fields.

grandempire • 6 days ago

Is there any generic implementation which is not intrusive? I expect C++ forward_list to look like

struct Node<T> { Node<T> *next; T x; }

_huayra_ • 6 days ago

At least in C++ land, that is not quite what is referred to as intrusive lists. It's basically the opposite / inside-out of what you wrote:

```C++ struct MyType : Node<MyType> { Node<MyType> next, prev; // rest of data }; ```

I usually reach for Boost.Intrusive when I need this [0].

[0] https://www.boost.org/doc/libs/1_88_0/doc/html/intrusive/usa...

grandempire • 6 days ago

I see. I am noticing the main difference is that forward_list manages the lifetime and allocation of nodes and that having a pointer to an object is equivalent to a pointer to the node (could be implemented as a helper).

The layouts look the same?

_huayra_ • 6 days ago

At the byte level, it's quite possible the layouts are the same. However, an "intrusive data structure" means that the nodes themselves are the data.

In other words, intrusive is like this:

struct T : NodeType<T> { // T data NodeType<T> next, prev; };

whereas non-intrusive data structures the T's are not the nodes themselves

struct NodeType<T> { NodeType<T> next, prev; T t; };

Doing non-intrusive means you need to own the lifecycle of the nodes (and make them copyable, move-constructible, or some combo thereof). Intrusive means that the caller can manage the nodes' lifecycles and just say "here's a node, I promise it'll live while the list is still alive" and have that node be on the stack, heap, static, etc.

rowanG077 • 6 days ago

All of these sound like grievous sources of bugs to me. Sure it's nice that it's possible but please don't do those things unless there isn't a safer way to do it.

Diggsey • 6 days ago

They are definitely more error-prone usage patterns, and coming from Rust the amount of hidden unsafety is quite unsettling (in addition to the obvious potential for user errors, depending on the aliasing model of your language/compiler, which I suspect Zig hasn't figured out yet, this parent pointer trick could even lead to miscompilations...)

Having said that, intrusive linked lists are quite an important data structure in code which is required to be allocation-free. Kernel code, signal handlers and code which requires consistent performance (eg. audio processing) need to be able to move objects between containers or have objects in multiple containers without allocating.

rowanG077 • 6 days ago

Use a separate data structure and then index into it with a linked list like API. No need to do something so dangerous if you simply need an allocation less linked list interface.

But yes there is a place for these data structures. It just shouldn't be common.

Diggsey • 6 days ago

If you just mean using array indexes instead of pointers, that can work in some cases, but can run into concurrency issues when the container needs to be resized. With pointers to nodes, new nodes can be allocated and old nodes freed on a separate thread without interfering with the performance sensitive thread, whereas resizing an array requires all access to stop while the backing memory is re-allocated.

I don't think you mean this, but elaborating on why a non-intrusive linked list is not very useful, that's because it takes linear time to find an object in a non-intrusive linked list, but constant time in an intrusive linked list.

For example:

    struct Node {
        Node* prev_by_last_update;
        Node* next_by_last_update;
        Node* prev_with_owner;
        Node* next_with_owner;
        Payload data;
    }

This intrusive structure allows reassigning and updating nodes, whilst being able to efficiently find eg. the next most recently updated node, or to iterate over nodes with the same owner.

bananabiscuit • 6 days ago

Do you mean you will likely make an error when implementing a standard textbook linked list? Or that a standard textbook linked list has non-obvious errors in its design? If so, would be curious to learn more.

simonask • 6 days ago

Most programmers can implement a linked list.

Extremely few can use one safely.

They're an elegant weapon for a more civilized age - namely one without multithreading, aliasing-based optimization, and a host of other nice things.

tauoverpi • 6 days ago

Multithreading is exactly where linked lists are useful as in the implementation of MPSC, MPMC, SPMC, and so on it allows one to safely construct objects out of vire of other threads then use a CAS to append the new item to the list making it visible for other threads to consume even without needing to allocate additional memory.

They're also pretty useful for free-list structures which overwrite entries to be removed (they're scheduled for deletion thus assumptiong here is that the object is now dead) with linked list nodes that would fit in that space thus never needing to allocate any additional as you can reuse existing space.

simonask • 5 days ago

Yeah, wait-free data structures often contain linked lists. That doesn’t cancel out all of the other problems.

Anecdotally, I’ve seen wait-free structures used incorrectly many more times than I’ve seen them used correctly. Some people think they are magically faster because “acquiring a mutex lock is expensive”, but that’s far from always the case.

mgaunard • 2 days ago

a CAS-loop, as suggested in the message you're replying to, is certainly not wait-free, only lock-free.

codr7 • 4 days ago

I don't see a problem with their safety, even in C.

More the opposite, their simplicity makes them safer to deal with than the alternatives. And sometimes their unique features (mostly the intrusive variant) are exactly what you need.

Multi-threaded accesses to a vector would be just as bad?

im3w1l • 6 days ago

Implementing anything is more error prone than using something battle-tested.

throwawaymaths • 6 days ago

Maybe advocate for wrappers in the stdlib to implement typesafe. LL, typesafe extrusive LL, etc. Using this as a primitive?

tauoverpi • 6 days ago

You don't need everything to be generic to have safety as the API provided by the object using the linked list can maintain safety by making it difficult to do the wrong thing and being correct by construction. Of course that doesn't stop a user of a library ignoring the safe way to handle types which use intrusive linked list nodes but that same user is likely going to shoot themselves in the foot regardless even when given a "safer" interface.

Trying to write a safer version of the above without losing the performance / advantages takes you into the realm of ATS2 https://www.cs.bu.edu/~hwxi/atslangweb/ which quickly becomes more difficult to work with than practical (and no, rust can't solve this with it's implementation of resource types as far as I'm aware).

So yes, doing the above makes sense when the problem demands it as allocating memory you don't need when there's a clear way to represent a graph without it just wastes cycles on memory management you didn't even need to do.

immibis • 6 days ago

bugs* ironic

It depends on whether you approach programming from a perspective of writing piles of obviously correct abstraction layers (the total effect of which is to leak so much the solution ends up being wrong anyway), or from the direct unabstracted perspective of actually making the computer do what you want it to do (which does not preclude the solution from using abstractions).

I call them the Uncle Bob vs Casey Muratori school, after Casey's lectures such as "Clean Code, Horrible Performance": https://www.youtube.com/watch?v=tD5NrevFtbU

rowanG077 • 6 days ago

Well uncle bob gives good advice. If you do exactly the opposite of what he tells you to do. so I don't think this comparison is apt at all.

immibis • 6 days ago

Can you elaborate on why he's a hack and perhaps how this relates to the idea of forcing yourself to use highly constrained inefficient "safe" data structures?

rowanG077 • 6 days ago

Because the excessive OOP and TDD he recommends is just bad. He also hates static typing and seems to think testing can somehow replace it.

I'm not for forcing inefficient safe data structures. You are pretending that this is a dichotomy when the vast majority of the time it isn't. You are completely misinterpreting what I said. I'm for safe data structures. And the data structures described are huge footguns. I made no point about efficiency.

Ygg2 • 6 days ago

In programming and life, both extremes can be equally wrong. I'm not saying Uncle Bob is the shining beacon to steer your boat towards at all cost, but neither is Casey Muratori.

> He also hates static typing

I mean, in OOP there are many reasons to dislike it. It's harder to extend and test. Calls to such functions within other methods are a nuisance to remove. E.g. `Instant.now()`.

rowanG077 • 6 days ago

I'm sorry I'm not familiar with Casey Muratori. So I have no clue what he says.

Ygg2 • 6 days ago

He's very data and performance oriented. He gives advice on how to write games, high-performance systems. His advice is solid and great even, but not when your source of latency is a cloud DB.

immibis • 6 days ago

I beg to differ even on that. When your source of latency is a cloud DB, learning about CPU caches won't help you, but learning to think about end-to-end performance absolutely will. "What sequence of queries achieves my goal with the shortest critical path?" is very similar to "What sequence of vector instructions achieves my goal with the shortest loop-carried dependency?"

Ygg2 • 5 days ago

> but learning to think about end-to-end performance absolutely will.

In a cloud DB webservice what will save you is having a good tracing/debugging story. From it, it will be easy to see E2E performance. Sure, some things will translate, but going SIMD optimized layout on a class, that will have to talk to DB across the ocean is a fool's errands.

To optimize you must have a good target and make sure you are measuring correctly.

immibis • 4 days ago

Why do you think that making a minimum number of queries to a cloud database is anything other than conceptually similar to making a SIMD optimized layout on a class?

Ygg2 • 4 days ago

That's not what I was talking about. Optimizing queries to DBs is nothing like optimizing SIMD layout.

Ygg2 • 6 days ago

> do what you want it to do

Until a colleague or future you, or someone else calling the code, forgets.

Which leads to "I WANT TO KILL WHOEVER WROTE THIS" debugging sessions.

Encapsulation exists for a reason.

cheezebubba • 6 days ago

How can you have elements of different types? I don't understand how you would know which type a given node was in?

throwawaymaths • 6 days ago

it was maybe easier to understand when @fieldParentPtr took the type as an argument, but if you look at this:

   var x: *T = @fieldParentPtr("node", node_ptr);

@fieldParentPtr knows that its result must have type pointer T; and then the builtin function "looks" inside of T and notices there is a "node" field. Then it can backtrack the pointer of T given the pointer to the node field, since the layout of T is well-defined at compile time.

So you could have a mixed-type linked list, and at the callsite zig can at compile-time back out the parent type. There obviously needs to be outside logic (and an outside mechanism to track) this with branches for each type that you could be putting into the linked list. And yes, this is more than a little bit type-unsafe because of the potential for type erasure.

You could probably pretty trivially write a typesafe wrapper for this, as suggested in one of the comments here.

cheezebubba • 6 days ago

Thanks. I still don't understand where the bits live that tell you which type T is for a given node.

Is this something like "list starts with type A, and that will tell you if the next node is type A or B"?

Is there a way to stuff those bits in with the node somehow so you always know the type you expect back?

throwawaymaths • 6 days ago

> Thanks. I still don't understand where the bits live that tell you which type T is for a given node.

Yup. You'd have to set that up yourself. It could be anywhere. It could be in a separate array, it could be in the previous node-data, etc.

you wouldn't put it in the Node (capital N) datastructure though, that is privileged.

Personally, I would say doing something like this is "bad idea bears" unless you are absolutely certain you need it and you should explore using, say, data payload type with a union field instead, if possible (though know that that comes with a potential memory penalty)

cheezebubba • 6 days ago

Aha Thanks!