The generic approach should have the same performance. The intrusive approach just lets you place your data in multiple lists without needing multiple allocations.
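To make the "multiple lists, one allocation" point concrete, here is a rough C++ sketch (not Zig, and not any particular library's API). The names `Hook`, `AllTag`, `ReadyTag`, `Task`, `IntrusiveList` and `link_tasks_into_two_lists` are all made up for illustration. Each task carries one hook per list it can belong to, so linking it into a second list allocates nothing.

```cpp
// Hypothetical sketch: one object linked into two intrusive lists at once.

// One hook per list membership; the Tag only distinguishes the base classes.
template <typename Tag>
struct Hook {
    Hook* next = nullptr;
};

struct AllTag {};
struct ReadyTag {};

// The payload embeds both hooks, so a single Task object can sit in both lists.
struct Task : Hook<AllTag>, Hook<ReadyTag> {
    int id = 0;
};

template <typename Tag>
struct IntrusiveList {
    Hook<Tag>* head = nullptr;

    // Links the task in by pointer only; no node is allocated here.
    void push_front(Task& task) {
        Hook<Tag>& hook = task;  // select the hook that belongs to this list
        hook.next = head;
        head = &hook;
    }

    // Walks the list and recovers each Task from its hook.
    int sum_ids() const {
        int sum = 0;
        for (Hook<Tag>* h = head; h != nullptr; h = h->next) {
            sum += static_cast<Task*>(h)->id;
        }
        return sum;
    }
};

struct Sums {
    int all;
    int ready;
};

// Three tasks in one array: all three go into `all`, two of them also into `ready`.
inline Sums link_tasks_into_two_lists() {
    Task tasks[3];
    for (int i = 0; i < 3; ++i) tasks[i].id = i + 1;

    IntrusiveList<AllTag> all;
    IntrusiveList<ReadyTag> ready;
    for (Task& t : tasks) all.push_front(t);
    ready.push_front(tasks[0]);
    ready.push_front(tasks[2]);

    return Sums{all.sum_ids(), ready.sum_ids()};
}
```

The tasks live in one array and neither list owns them, so the lists must not outlive that storage.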
No, the generic approach requires your data to be spaced out further in memory (or to be heap allocated), which causes CPU cache misses and is slower. The entire reason for intrusive linked lists is performance. Standard linked lists are notoriously slow compared to almost any other similar data structure, which is why they are hardly ever used in real code (ArrayLists and vectors are much more common).
Maybe it requires this in Zig (I don’t know), but in general there’s no reason why you couldn’t allocate the nodes of an extrusive linked list from a pool residing on the heap or on the stack. For example, you could do this with the STL (for all that STL allocators are a pain to use in practice). Or you could have a slightly different API where you add nodes to the list by passing an entire Node<T> to the relevant function rather than just T, at which point you can trivially allocate the nodes as you please.
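A minimal sketch of both options, assuming C++17. The first part uses the standard `std::pmr::list` with a `std::pmr::monotonic_buffer_resource` backed by a stack buffer, so the list nodes come from that buffer instead of individual heap allocations. The second part shows the alternative API, where the caller hands over a whole `Node<T>`. The names `Node`, `ExtrusiveList`, `in_buffer`, `pooled_list_stays_in_buffer` and `sum_of_caller_owned_nodes` are hypothetical and only there for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory_resource>

// Returns true if address p lies inside [begin, begin + size).
inline bool in_buffer(const void* p, const std::byte* begin, std::size_t size) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(begin);
    return addr >= base && addr < base + size;
}

// Option 1: a standard (extrusive) list whose nodes are carved out of a stack buffer.
inline bool pooled_list_stays_in_buffer() {
    std::array<std::byte, 1024> buffer;
    // null_memory_resource() as upstream: throws std::bad_alloc rather than
    // silently falling back to the heap if the buffer runs out.
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::pmr::list<int> values(&pool);
    for (int i = 0; i < 8; ++i) values.push_back(i);

    // Every element (and therefore its node) should sit inside the buffer.
    bool all_inside = true;
    for (const int& v : values) {
        all_inside = all_inside && in_buffer(&v, buffer.data(), buffer.size());
    }
    return all_inside;
}

// Option 2: the list accepts whole nodes, so the caller picks where they live.
template <typename T>
struct Node {
    T value{};
    Node* next = nullptr;
};

template <typename T>
class ExtrusiveList {
public:
    // The caller owns the node's storage; the list only links it in.
    void push_front(Node<T>& node) {
        node.next = head_;
        head_ = &node;
    }
    const Node<T>* head() const { return head_; }

private:
    Node<T>* head_ = nullptr;
};

inline int sum_of_caller_owned_nodes() {
    // Nodes are contiguous in a stack array; nothing is heap allocated.
    std::array<Node<int>, 3> nodes{};
    ExtrusiveList<int> list;
    for (int i = 0; i < 3; ++i) {
        nodes[i].value = i + 1;
        list.push_front(nodes[i]);
    }
    int sum = 0;
    for (const Node<int>* n = list.head(); n != nullptr; n = n->next) sum += n->value;
    return sum;
}
```

In both cases the node storage ends up contiguous, so where the nodes sit in memory is decided by the allocation strategy, not by whether the list is intrusive.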