> Parsing isn't too bad compared to, say, Perl.
This is damning with faint praise. Perl is undecidable to parse! Even if C isn't as bad as Perl, it's still bad enough that there's an entire Wikipedia article devoted to how bad it is: https://en.wikipedia.org/wiki/Lexer_hack
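For anyone who hasn't had the pleasure, here's a rough sketch of the feedback loop the lexer hack forces on you (all names are made up for illustration, this isn't lifted from any real compiler): the lexer can't even pick a token kind for an identifier without consulting the parser's typedef table.

```c
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

/* Stand-in for the parser's typedef table -- semantic state the lexer
   has no business knowing about, but has to consult anyway. */
static const char *known_typedefs[] = { "size_t", "my_type", NULL };

static int is_typedef_name(const char *name)
{
    for (const char **p = known_typedefs; *p; ++p)
        if (strcmp(*p, name) == 0)
            return 1;
    return 0;
}

/* The hack itself: token classification depends on what the parser
   has seen so far, so information flows backwards in the pipeline. */
enum token_kind classify_identifier(const char *spelling)
{
    return is_typedef_name(spelling) ? TOK_TYPE_NAME : TOK_IDENTIFIER;
}
```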
> The Clang parser handles the situation in a completely different way, namely by using a non-reference lexical grammar. Clang's lexer does not attempt to differentiate between type names and variable names: it simply reports the current token as an identifier. The parser then uses Clang's semantic analysis library to determine the nature of the identifier. This allows a simpler and more maintainable architecture than The Lexer Hack. This is also the approach used in most other modern languages, which do not distinguish different classes of identifiers in the lexical grammar, but instead defer them to the parsing or semantic analysis phase, when sufficient information is available.
> Doesn't sound as much of a problem with the language as it is with the design of earlier compilers.
Unifying identifiers in the lexer doesn't solve the problem. The problem is getting the parser to produce a sane AST without needing information from deeper in the pipeline. If all you have is `foo * bar;`, what AST node do you produce for the operator? Something generic like "Asterisk", whose children get equally generic "Identifier" nodes (even though at this stage, unlike in the lexer, you should be distinguishing types from variables), and then you fix it up in some later pass. It's a flaw in the grammar, period. And it's excusable, because C is older than Methuselah and was hacked together in a weekend like Javascript and was never intended to be the basis for the entire modern computing industry. But it's a flaw that modern languages should learn from and avoid.
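To make the ambiguity concrete (the typedef here is just a sketch to set up the two readings), the exact same token sequence is either a declaration or an expression depending on what `foo` turned out to be earlier:

```c
/* Case 1: `foo` names a type, so `foo * bar;` is a declaration. */
typedef int foo;

int main(void)
{
    foo * bar = 0;   /* declares `bar` as `int *` */
    (void)bar;       /* silence the unused-variable warning */

    /* Case 2: drop the typedef and declare variables instead, and the
       very same tokens become a multiplication whose result is discarded:

       int foo = 6, bar = 7;
       foo * bar;
     */
    return 0;
}
```

The token stream is identical in both cases; the only thing that changes is symbol-table state, which is exactly the information a context-free parser doesn't have.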
C ain't simple: it's an organically complex language that just happens to be small enough that you can fit a compiler into the RAM of a PDP-11.