mattmcknight 6 days ago

It's weird because it is not a black box at the lowest level; we can see exactly what all of the weights are doing. It's just too complex for us to understand.

What is difficult is finding some intermediate pattern in between that we can label with an abstraction compatible with human understanding. It may not exist. For example, it may be more like how our brains produce language than like a logical rule-based system. We occasionally say the wrong word, skip a word, spell things wrong... violate the rules of grammar.

The inputs and outputs of the model are human language, so at least there we know the system as a black box can be characterized, if not understood.

_heimdall 6 days ago

> The inputs and outputs of the model are human language, so at least there we know the system as a black box can be characterized, if not understood.

This is actually where the AI safety debates tend to fall apart. From where I sit, we can't characterize the black box itself; we can only characterize its outputs.

More specifically, we can judge the quality of the output for a given input and attempt to infer what might have happened in between. But we really have no idea what happened in between, and though many of the "doomers" raise concerns that seem far-fetched, we have absolutely no way of knowing whether they are completely off base or are describing a system that just hasn't shown problems in its input/output pairs yet.