There are several advantages to intrusive node-based data structures that I haven't seen stated in the article or the comments:
- the same object can move between containers with no allocation and no need for a dedicated complex API
- the same object can be part of multiple containers at once; particularly useful for intrusive binary trees, for indexing data with different criteria.
- the container can be fully polymorphic, no need for all elements to be the same dynamic type.
- no need for complex allocators, you can just store the objects as you see fit.
I think the main advantage of the intrusive definition is that you can use it to implement the non intrusive one easily, whereas the reverse isn't possible.
With a non-intrusive linked list type, can't you define the intrusive version as `Node<void>`?
> the same object can move between containers with no allocation and no need for a dedicated complex API
This is true also of the previous design.
> no need for complex allocators, you can just store the objects as you see fit.
Same here; the previous design leaves it up to the user.
Agreed. Intrusive data structures are good for implementing multi-container data structures with lower allocation overhead. And they are widely used than expected. E.g. LRU is a classic multi-container problem.
> the same object can be part of multiple containers at once
I'm not sure I understand this one. Since the object contains the reference to where it belongs inside a container (e.g. object.node.next) how can it be re-used in multiple containers. Conversely, in a non-intrusive data structure, multiple containers can hold a ref to the same object through an intermediate node object
You add multiple next variables. buffer.next, buffer.next_displayed, etc
That's not an advantage of intrusive data structures then. That's precisely an advantage of non-intrusive data structure : object can be inserted in an arbitrary number of containers
But for the intrusive one you have the visibility into its membership in the other structures, since you're holding a reference to the object and its intrusive fields.
In the non-intrusive case, all you know is how you got there, you don't have return info to the other structures— particularly annoying if you're trying to destroy an object and have to speculatively look for references to it in the other lists in order to get rid of them.
Is there any generic implementation which is not intrusive? I expect C++ forward_list to look like
struct Node<T> { Node<T> *next; T x; }
At least in C++ land, that is not quite what is referred to as intrusive lists. It's basically the opposite / inside-out of what you wrote:
```C++ struct MyType : Node<MyType> { Node<MyType> next, prev; // rest of data }; ```
I usually reach for Boost.Intrusive when I need this [0].
[0] https://www.boost.org/doc/libs/1_88_0/doc/html/intrusive/usa...
I see. I am noticing the main difference is that forward_list manages the lifetime and allocation of nodes and that having a pointer to an object is equivalent to a pointer to the node (could be implemented as a helper).
The layouts look the same?
At the byte level, it's quite possible the layouts are the same. However, an "intrusive data structure" means that the nodes themselves are the data.
In other words, intrusive is like this:
struct T : NodeType<T> { // T data NodeType<T> next, prev; };
whereas non-intrusive data structures the T's are not the nodes themselves
struct NodeType<T> { NodeType<T> next, prev; T t; };
Doing non-intrusive means you need to own the lifecycle of the nodes (and make them copyable, move-constructible, or some combo thereof). Intrusive means that the caller can manage the nodes' lifecycles and just say "here's a node, I promise it'll live while the list is still alive" and have that node be on the stack, heap, static, etc.
All of these sound like grievous sources of bugs to me. Sure it's nice that it's possible but please don't do those things unless there isn't a safer way to do it.
They are definitely more error-prone usage patterns, and coming from Rust the amount of hidden unsafety is quite unsettling (in addition to the obvious potential for user errors, depending on the aliasing model of your language/compiler, which I suspect Zig hasn't figured out yet, this parent pointer trick could even lead to miscompilations...)
Having said that, intrusive linked lists are quite an important data structure in code which is required to be allocation-free. Kernel code, signal handlers and code which requires consistent performance (eg. audio processing) need to be able to move objects between containers or have objects in multiple containers without allocating.
Use a separate data structure and then index into it with a linked list like API. No need to do something so dangerous if you simply need an allocation less linked list interface.
But yes there is a place for these data structures. It just shouldn't be common.
If you just mean using array indexes instead of pointers, that can work in some cases, but can run into concurrency issues when the container needs to be resized. With pointers to nodes, new nodes can be allocated and old nodes freed on a separate thread without interfering with the performance sensitive thread, whereas resizing an array requires all access to stop while the backing memory is re-allocated.
I don't think you mean this, but elaborating on why a non-intrusive linked list is not very useful, that's because it takes linear time to find an object in a non-intrusive linked list, but constant time in an intrusive linked list.
For example:
struct Node {
Node* prev_by_last_update;
Node* next_by_last_update;
Node* prev_with_owner;
Node* next_with_owner;
Payload data;
}
This intrusive structure allows reassigning and updating nodes, whilst being able to efficiently find eg. the next most recently updated node, or to iterate over nodes with the same owner. Do you mean you will likely make an error when implementing a standard textbook linked list? Or that a standard textbook linked list has non-obvious errors in its design? If so, would be curious to learn more.
Most programmers can implement a linked list.
Extremely few can use one safely.
They're an elegant weapon for a more civilized age - namely one without multithreading, aliasing-based optimization, and a host of other nice things.
Multithreading is exactly where linked lists are useful as in the implementation of MPSC, MPMC, SPMC, and so on it allows one to safely construct objects out of vire of other threads then use a CAS to append the new item to the list making it visible for other threads to consume even without needing to allocate additional memory.
They're also pretty useful for free-list structures which overwrite entries to be removed (they're scheduled for deletion thus assumptiong here is that the object is now dead) with linked list nodes that would fit in that space thus never needing to allocate any additional as you can reuse existing space.
Yeah, wait-free data structures often contain linked lists. That doesn’t cancel out all of the other problems.
Anecdotally, I’ve seen wait-free structures used incorrectly many more times than I’ve seen them used correctly. Some people think they are magically faster because “acquiring a mutex lock is expensive”, but that’s far from always the case.
a CAS-loop, as suggested in the message you're replying to, is certainly not wait-free, only lock-free.
I don't see a problem with their safety, even in C.
More the opposite, their simplicity makes them safer to deal with than the alternatives. And sometimes their unique features (mostly the intrusive variant) are exactly what you need.
Multi-threaded accesses to a vector would be just as bad?
Maybe advocate for wrappers in the stdlib to implement typesafe. LL, typesafe extrusive LL, etc. Using this as a primitive?
You don't need everything to be generic to have safety as the API provided by the object using the linked list can maintain safety by making it difficult to do the wrong thing and being correct by construction. Of course that doesn't stop a user of a library ignoring the safe way to handle types which use intrusive linked list nodes but that same user is likely going to shoot themselves in the foot regardless even when given a "safer" interface.
Trying to write a safer version of the above without losing the performance / advantages takes you into the realm of ATS2 https://www.cs.bu.edu/~hwxi/atslangweb/ which quickly becomes more difficult to work with than practical (and no, rust can't solve this with it's implementation of resource types as far as I'm aware).
So yes, doing the above makes sense when the problem demands it as allocating memory you don't need when there's a clear way to represent a graph without it just wastes cycles on memory management you didn't even need to do.
bugs* ironic
It depends on whether you approach programming from a perspective of writing piles of obviously correct abstraction layers (the total effect of which is to leak so much the solution ends up being wrong anyway), or from the direct unabstracted perspective of actually making the computer do what you want it to do (which does not preclude the solution from using abstractions).
I call them the Uncle Bob vs Casey Muratori school, after Casey's lectures such as "Clean Code, Horrible Performance": https://www.youtube.com/watch?v=tD5NrevFtbU
Well uncle bob gives good advice. If you do exactly the opposite of what he tells you to do. so I don't think this comparison is apt at all.
Can you elaborate on why he's a hack and perhaps how this relates to the idea of forcing yourself to use highly constrained inefficient "safe" data structures?
Because the excessive OOP and TDD he recommends is just bad. He also hates static typing and seems to think testing can somehow replace it.
I'm not for forcing inefficient safe data structures. You are pretending that this is a dichotomy when the vast majority of the time it isn't. You are completely misinterpreting what I said. I'm for safe data structures. And the data structures described are huge footguns. I made no point about efficiency.
In programming and life, both extremes can be equally wrong. I'm not saying Uncle Bob is the shining beacon to steer your boat towards at all cost, but neither is Casey Muratori.
> He also hates static typing
I mean, in OOP there are many reasons to dislike it. It's harder to extend and test. Calls to such functions within other methods are a nuisance to remove. E.g. `Instant.now()`.
I'm sorry I'm not familiar with Casey Muratori. So I have no clue what he says.
He's very data and performance oriented. He gives advice on how to write games, high-performance systems. His advice is solid and great even, but not when your source of latency is a cloud DB.
I beg to differ even on that. When your source of latency is a cloud DB, learning about CPU caches won't help you, but learning to think about end-to-end performance absolutely will. "What sequence of queries achieves my goal with the shortest critical path?" is very similar to "What sequence of vector instructions achieves my goal with the shortest loop-carried dependency?"
> but learning to think about end-to-end performance absolutely will.
In a cloud DB webservice what will save you is having a good tracing/debugging story. From it, it will be easy to see E2E performance. Sure, some things will translate, but going SIMD optimized layout on a class, that will have to talk to DB across the ocean is a fool's errands.
To optimize you must have a good target and make sure you are measuring correctly.
Why do you think that making a minimum number of queries to a cloud database is anything other than conceptually similar to making a SIMD optimized layout on a class?
That's not what I was talking about. Optimizing queries to DBs is nothing like optimizing SIMD layout.
> do what you want it to do
Until a colleague or future you, or someone else calling the code, forgets.
Which leads to "I WANT TO KILL WHOEVER WROTE THIS" debugging sessions.
Encapsulation exists for a reason.
How can you have elements of different types? I don't understand how you would know which type a given node was in?
it was maybe easier to understand when @fieldParentPtr took the type as an argument, but if you look at this:
var x: *T = @fieldParentPtr("node", node_ptr);
@fieldParentPtr knows that its result must have type pointer T; and then the builtin function "looks" inside of T and notices there is a "node" field. Then it can backtrack the pointer of T given the pointer to the node field, since the layout of T is well-defined at compile time.So you could have a mixed-type linked list, and at the callsite zig can at compile-time back out the parent type. There obviously needs to be outside logic (and an outside mechanism to track) this with branches for each type that you could be putting into the linked list. And yes, this is more than a little bit type-unsafe because of the potential for type erasure.
You could probably pretty trivially write a typesafe wrapper for this, as suggested in one of the comments here.
Thanks. I still don't understand where the bits live that tell you which type T is for a given node.
Is this something like "list starts with type A, and that will tell you if the next node is type A or B"?
Is there a way to stuff those bits in with the node somehow so you always know the type you expect back?
> Thanks. I still don't understand where the bits live that tell you which type T is for a given node.
Yup. You'd have to set that up yourself. It could be anywhere. It could be in a separate array, it could be in the previous node-data, etc.
you wouldn't put it in the Node (capital N) datastructure though, that is privileged.
Personally, I would say doing something like this is "bad idea bears" unless you are absolutely certain you need it and you should explore using, say, data payload type with a union field instead, if possible (though know that that comes with a potential memory penalty)