Item 43462159

mort96 • 6 days ago

What about C is simple? Its syntax is certainly not simple, it's hard to grok and hard to implement parsers for, and parsing depends on semantic analysis. Its macro system is certainly not simple; implementing a C preprocessor is a huge job in itself, it's much more complex than what appears to be necessary for a macro system or even general text processor. Its semantics are not simple, with complex aliasing rules which just exist as a hacky trade-off between programming flexibility and optimizer implementer freedom.

C forces programs to be simple, because C doesn't offer ways to build powerful abstractions. And as an occasional C programmer, I enjoy that about it. But I don't think it's simple, certainly not from an implementer's perspective.

uecker • 5 days ago

First (as in my other comment), the idea that C parsing depends on semantic analysis is wrong (and yes, I wrote C parsers). There are issues which may make implementing C parsers hard if you are not aware of them, but those issues hardly compare to the complexities of other languages, and can easily be dealt with if you know about them. Many people have implemented C parsers.

The idea that C does not offer ways to build powerful abstractions is also wrong in my opinion. It basically allows the same abstractions as other languages, but it does not provide as much syntactic sugar. Whether this syntactic sugar really helps or whether it obscures semantics is up for debate. In my opinion (having programmed a lot more C++ in the past), it does not, and C is better for building complex applications than C++. I build very complex applications in C myself, and some of the most successful software projects were built using C. I find it easier to understand complex applications written in C than in other languages, and I also find it easier to refactor C code which is messed up compared to untangling the mess you can create with other languages. I admit that some people might find it helpful to have the syntactic sugar as help for building abstractions. In C you need to know how to build abstractions yourself based on training or experience.
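To make "build abstractions yourself" concrete, here is a minimal sketch of one common pattern: an interface expressed as a table of function pointers. All names (`shape`, `shape_ops`, `make_rect`, `make_triangle`, `shape_area`) are made up for illustration and are not taken from any particular project.

```c
/* A hypothetical "shape" interface: a table of operations plus per-object data. */
struct shape;

struct shape_ops {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_ops *ops;  /* which implementation this object uses */
    double a, b;                  /* meaning depends on the concrete shape */
};

static double rect_area(const struct shape *s) { return s->a * s->b; }
static double tri_area(const struct shape *s)  { return 0.5 * s->a * s->b; }

static const struct shape_ops rect_ops = { rect_area };
static const struct shape_ops tri_ops  = { tri_area };

struct shape make_rect(double width, double height)
{
    struct shape s = { &rect_ops, width, height };
    return s;
}

struct shape make_triangle(double base, double height)
{
    struct shape s = { &tri_ops, base, height };
    return s;
}

/* Callers dispatch through the table without knowing the concrete kind. */
double shape_area(const struct shape *s)
{
    return s->ops->area(s);
}
```

A caller only ever touches `shape_area`, so adding another kind of shape means adding one more ops table and constructor; nothing that uses the interface has to change.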

I see a lot of negativity towards C in recent years which goes against clear evidence, e.g. "you can not build abstractions" or "all C programs segfault all the time", when in reality most of the programs I rely on daily, and which in my experience never crash, are written in C.

1 reply

mort96 • 4 days ago

Huh? How are you supposed to parse a statement like 'x * y;' without some form of semantic analysis? You need to be able to look up whether 'x' has been declared as a variable or a type, and parse it as either a multiplication expression or a variable declaration. Am I wrong on this?
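For reference, a small sketch of the ambiguity in question. The same token sequence `x * y;` appears in both functions; only the earlier declaration of `x` differs. The function names are invented for this example.

```c
/* Here `x` names a type, so `x * y;` declares y as a pointer to int. */
int star_as_declaration(void)
{
    typedef int x;
    int value = 5;
    x * y;          /* declaration: y has type int* */
    y = &value;     /* only valid because y is a pointer */
    return *y;
}

/* Here `x` names a variable, so `x * y;` is a multiplication
   whose result is discarded. */
int star_as_expression(void)
{
    int x = 6, y = 7;
    x * y;          /* expression statement; compilers may warn about the unused value */
    return x * y;
}
```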

1 reply

uecker • 4 days ago

True. But this does not require full semantic analysis, it only requires distinguishing between typedef names and other identifiers. You can argue that this is part of semantic analysis, but this would be rather pedantic. Tracking this could equally be seen as part of parsing.
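A rough sketch of what "tracking typedef names" could look like inside a parser, assuming a hypothetical flat table (`typedef_table`) and a made-up helper `classify_star_statement`. It deliberately ignores scoping; a real C parser needs a scoped table, because an inner declaration can shadow a typedef name.

```c
#include <string.h>

enum stmt_kind { STMT_DECLARATION, STMT_EXPRESSION };

/* Hypothetical, minimal typedef table: a fixed array of names, no scopes. */
struct typedef_table {
    const char *names[16];
    int count;
};

/* Called by the parser whenever it finishes parsing a typedef declaration. */
void add_typedef(struct typedef_table *t, const char *name)
{
    if (t->count < 16)
        t->names[t->count++] = name;
}

int is_typedef_name(const struct typedef_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return 1;
    return 0;
}

/* Decide how to parse a statement of the shape `first * second;`.
   The only "semantic" input is whether `first` is a typedef name. */
enum stmt_kind classify_star_statement(const struct typedef_table *t,
                                       const char *first)
{
    return is_typedef_name(t, first) ? STMT_DECLARATION : STMT_EXPRESSION;
}
```

The table is written to by the parser itself as it goes, which is why this bookkeeping can be described either as a sliver of semantic analysis or as parser state.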

pjc50 • 6 days ago

Parsing isn't too bad compared to, say, Perl.

The preprocessor is a classic example of simplicity in the wrong direction: it's simple to implement, and pretty simple to describe, but when actually using it you have to deal with complexity like multiple evaluation of macro arguments.
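A minimal sketch of the multiple-evaluation problem, using the familiar `MAX` macro. The helper names `next_value` and `count_calls_through_macro` are invented for the example.

```c
/* Classic function-like macro: each argument appears twice in the expansion. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int calls;   /* counts how often next_value() runs */

int next_value(void)
{
    calls++;
    return 10;
}

int count_calls_through_macro(void)
{
    calls = 0;
    /* Expands to ((next_value()) > (5) ? (next_value()) : (5)):
       one call for the comparison, a second for the selected branch. */
    int m = MAX(next_value(), 5);
    (void)m;
    return calls;   /* 2, although the source mentions next_value() once */
}
```

With a real function in place of the macro, the argument would be evaluated exactly once.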

The semantics are a disaster ("undefined behavior").

2 replies

kibwen • 6 days ago

> Parsing isn't too bad compared to, say, Perl.

This is damning with faint praise. Perl is undecidable to parse! Even if C isn't as bad as Perl, it's still bad enough that there's an entire Wikipedia article devoted to how bad it is: https://en.wikipedia.org/wiki/Lexer_hack

1 reply

account42 • 5 days ago

> The Clang parser handles the situation in a completely different way, namely by using a non-reference lexical grammar. Clang's lexer does not attempt to differentiate between type names and variable names: it simply reports the current token as an identifier. The parser then uses Clang's semantic analysis library to determine the nature of the identifier. This allows a simpler and more maintainable architecture than The Lexer Hack. This is also the approach used in most other modern languages, which do not distinguish different classes of identifiers in the lexical grammar, but instead defer them to the parsing or semantic analysis phase, when sufficient information is available.

Doesn't sound as much of a problem with the language as it is with the design of earlier compilers.

1 reply

kibwen • 5 days ago

Unifying identifiers in the lexer doesn't solve the problem. The problem is getting the parser to produce a sane AST without needing information from deeper in the pipeline. If all you have is `foo * bar;`, what AST node do you produce for the operator? Something generic like "Asterisk", and then its child nodes get some generic "Identifier" node (when at this stage, unlike in the lexer, you should be distinguishing between types and variables), and you fix it up in some later pass. It's a flaw in the grammar, period. And it's excusable, because C is older than Methuselah and was hacked together in a weekend like Javascript and was never intended to be the basis for the entire modern computing industry. But it's a flaw that modern languages should learn from and avoid.

C ain't simple, it's an organically complex language that just happens to be small enough that you can fit a compiler into the RAM of a PDP-11.

mort96 • 6 days ago

I would probably describe Perl as really complex to parse as well if I knew enough about it. Both are difficult to parse compared to languages with more "modern sensibilities" like Go and Rust, with their nice, mostly context-free grammars which can be parsed without terrible lexer hacks and separately from semantic analysis.

Walter Bright (who, among other things, has been employed to work on a C preprocessor) seems to disagree that the C preprocessor is simple to implement: https://news.ycombinator.com/item?id=20890749

> The preprocessor is fiendishly tricky to write. [...] I had to scrap mine and reimplement it 3 times.

I have seen other people in the general "C implementer/standards community" complain about it as well.

1 reply

pjc50 • 6 days ago

I wonder if we can dig out the original K&R preprocessor implementation?

1 reply

dbrower • 6 days ago

it was a lot simpler in capabilities. much of the complexity is because of feature creep.

1 reply

uecker • 5 days ago

What feature creep did the preprocessor have?

grandempire • 6 days ago

Each of these elements is even worse in every other language I can think of. What language do you think is simple in comparison?

2 replies

csb6 • 5 days ago

Pascal (and most other Wirth languages) is better in most of these respects than C. Of course there are other flaws with Pascal (cf “Why Pascal is not my favorite programming language”), but it proves that C has a lot of accidental complexity in its design.

1 reply

grandempire • 5 days ago

I agree pascal is simpler - but pascal also simplifies the memory model.

mort96 • 6 days ago

Go, Rust, Zig?

I'm curious, what language do you know of with a more complex macro system than the whole C preprocessor?

EDIT: To be clear to prospective downvoters, I'm not just throwing these languages out because they're hype or whatever. They all have a grammar that's much simpler to parse. Notably, you can construct a parse tree without a semantic analyser which is capable of running in lockstep with the parser to provide semantic information to the parser. You can just write a parser which makes a parse tree.

1 reply

unclad5968 • 6 days ago

I've never written a parser for any of those languages but my intuition is that Go is easier to parse than C. The others are debatable. Rust macros are definitely not simpler than C macros. I'm not sure what could be simpler than text substitution. Zig doesn't have macros and comptime is implemented as a language VM that runs as a compilation step (last I knew), so that's definitely not simpler. I don't use Go often, but I don't think it has macros at all so that's definitely simpler.

When people say that C is a simple language, my interpretation is that they mean it is easy to interpret what a C program does at a low level, not that it is simple to write.

1 reply

mort96 • 6 days ago

The other languages can be parsed by a parser alone. A parser for C needs a semantic analyzer working in tandem.

The C preprocessor is not text substitution.
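A small sketch of why "text substitution" is an inaccurate model: the preprocessor operates on tokens, and the `#` operator and argument pre-expansion have their own rules. The macro and function names (`LIMIT`, `STR`, `XSTR`, `tokens_not_text`) are chosen for the example.

```c
#include <string.h>

#define LIMIT 42
#define STR(a) #a          /* stringify the argument as written */
#define XSTR(a) STR(a)     /* expand the argument first, then stringify */

/* Returns 1 if every check below holds. */
int tokens_not_text(void)
{
    int LIMITS = 1;               /* `LIMITS` is a single token; LIMIT does not fire inside it */
    const char *s = "LIMIT";      /* macro names inside string literals are left alone */

    return LIMITS == 1
        && strcmp(s, "LIMIT") == 0
        && LIMIT == 42
        && strcmp(STR(LIMIT), "LIMIT") == 0   /* the operand of # is not macro-expanded */
        && strcmp(XSTR(LIMIT), "42") == 0;    /* expanded before reaching STR */
}
```

A plain search-and-replace of `LIMIT` with `42` would have produced `42S`, `"42"`, and the same string for both `STR` and `XSTR`.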

It is not easy to describe what C does at a low level. There are simple, easy to describe and wrong models of what C does "at a low level". C's semantics are defined by a very difficult to understand standards document, and if you use one of those simple and enticing mental models, you will end up with incorrect C code which works until you try a different compiler or enable optimisations.

2 replies

uecker • 5 days ago

A parser for C does not need a semantic analyzer. What C does is allow semantic analysis to be integrated into the parser.

The preprocessor has some weird behavior, but it is also not very complicated.

And I would argue that the abstract machine model of C is still relatively simple. There are certainly simpler languages in this regard, but they give up one of the key powers of C, i.e. that you can manipulate the representation of objects on a byte level.
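As a sketch of what byte-level access looks like in well-defined C: copying an object's representation with `memcpy`, and walking it through `unsigned char`. The function names are invented here, and the bit pattern mentioned in the usage note assumes the platform uses IEEE 754 single precision for `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into an integer of the same size. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inspect any object byte by byte through unsigned char, which the
   language permits for every object type. Returns -1 if all bytes are zero. */
int first_nonzero_byte_index(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++)
        if (p[i] != 0)
            return (int)i;
    return -1;
}
```

On an IEEE 754 platform, `float_bits(1.0f)` yields `0x3f800000`. Reading the same bytes through a `uint32_t *` cast instead of `memcpy` would run into the aliasing rules discussed earlier in the thread.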

unclad5968 • 6 days ago

By that argument the other languages mentioned are impossible to understand since they don't have a spec, except for Go again.

1 reply

mort96 • 6 days ago

No. The other languages have documented semantics too. Just happens that C's are in the shape of a standards document.