The thing I always loved about C was its simplicity, but in practice it's actually very complex with tons of nuance. Are there any low level languages like C that actually are simple, through and through? I looked into Zig and it seems to approach that simplicity, but I have reservations that I can't quite put my finger on...
The reality is, the only languages that are truly simple are Turing tarpits, like Brainfuck.
Reality is not simple. Every language that’s used for real work has to deal with reality. It’s about how the language helps you manage complexity, not how complex the language is.
Maybe Forth gets a pass, but there's a good reason it's effectively used only in very limited circumstances.
The perceived complexity from a semantic standpoint comes from the weakly-typed nature of the language. When the operands of an expression have different types, implicit promotions and conversions take place. This can be avoided by using the appropriate types in the first place. Modern compilers have warning flags that can spot such dodgy conversions.
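For instance, a minimal sketch of the kind of dodgy conversion those warning flags catch (the usual arithmetic conversions at work; `-Wsign-compare` on GCC/Clang flags exactly this):

```c
#include <stdio.h>

int main(void) {
    int a = -1;
    unsigned int b = 1;
    // The usual arithmetic conversions turn a into a huge unsigned
    // value before the comparison, so the branch is taken.
    if (a > b)
        printf("-1 > 1u: a was converted to UINT_MAX\n");
    return 0;
}
```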
The rest of the complexity stems from the language being a thin layer over a von Neumann abstract machine. You can mess up your memory freely, and the language doesn’t guarantee anything.
C is simple.
Representing computation as words of a fixed bit length, in random access memory, is not (see The Art of Computer Programming). And the extent to which other languages simplify things is by creating simpler memory models.
What about C is simple? Its syntax is certainly not simple, it's hard to grok and hard to implement parsers for, and parsing depends on semantic analysis. Its macro system is certainly not simple; implementing a C preprocessor is a huge job in itself, it's much more complex than what appears to be necessary for a macro system or even general text processor. Its semantics are not simple, with complex aliasing rules which just exist as a hacky trade-off between programming flexibility and optimizer implementer freedom.
C forces programs to be simple, because C doesn't offer ways to build powerful abstractions. And as an occasional C programmer, I enjoy that about it. But I don't think it's simple, certainly not from an implementer's perspective.
First (as in my other comment), the idea that C parsing depends on semantic analysis is wrong (and yes, I wrote C parsers). There are issues which may make implementing C parsers hard if you are not aware of them, but those issues hardly compare to the complexities of other languages, and can easily be dealt with if you know about them. Many people have implemented C parsers.
The idea that C does not offer ways to build powerful abstractions is also wrong in my opinion. It basically allows the same abstractions as other languages, but it does not provide as much syntactic sugar. Whether this syntactic sugar really helps or whether it obscures semantics is up for debate. In my opinion (having programmed a lot more C++ in the past), it does not, and C is better for building complex applications than C++. I build very complex applications in C myself, and some of the most successful software projects were built using C. I find it easier to understand complex applications written in C than in other languages, and I also find it easier to refactor C code which is messed up, compared to untangling the mess you can create with other languages. I admit that some people might find it helpful to have the syntactic sugar as help for building abstractions. In C you need to know how to build abstractions yourself, based on training or experience.
I see a lot of negativity towards C in recent years that goes against clear evidence, e.g. "you can not build abstractions" or "all C programs segfault all the time", when in reality most of the programs I rely on daily, and which in my experience never crash, are written in C.
Huh? How are you supposed to parse a statement like 'x * y;' without some form of semantic analysis? You need to be able to look up whether 'x' has been declared as a variable or a type, and parse it as either a multiplication expression or a variable declaration. Am I wrong on this?
True. But this does not require full semantic analysis; it only requires distinguishing between typedef names and other identifiers. You can argue that this is part of semantic analysis, but that would be rather pedantic. Tracking this could equally be seen as part of parsing.
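A minimal illustration of that typedef ambiguity (the names here are arbitrary):

```c
// Whether 'x * y;' is a declaration or an expression depends
// entirely on what x was declared to be.

typedef int x;

void as_declaration(void) {
    x * y;      // declares y as a pointer to int
    y = 0;      // an assignment, so y here is clearly a variable
}

void as_expression(void) {
    int x = 2, y = 3;   // the local x shadows the typedef
    x * y;              // multiplies x by y and discards the result
}
```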
Parsing isn't too bad compared to, say, Perl.
The preprocessor is a classic example of simplicity in the wrong direction: it's simple to implement, and pretty simple to describe, but when actually using it you have to deal with complexity like multiple evaluation of arguments.
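The canonical example of that multiple-evaluation trap:

```c
#include <stdio.h>

// Each parameter appears twice in the expansion, so an argument
// with a side effect can be evaluated twice.
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void) {
    int i = 10;
    int m = MAX(i++, 5);   // expands to ((i++) > (5) ? (i++) : (5))
    printf("m = %d, i = %d\n", m, i);  // prints m = 11, i = 12:
                                       // i++ ran twice
    return 0;
}
```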
The semantics are a disaster ("undefined behavior").
> Parsing isn't too bad compared to, say, Perl.
This is damning with faint praise. Perl is undecidable to parse! Even if C isn't as bad as Perl, it's still bad enough that there's an entire Wikipedia article devoted to how bad it is: https://en.wikipedia.org/wiki/Lexer_hack
> The Clang parser handles the situation in a completely different way, namely by using a non-reference lexical grammar. Clang's lexer does not attempt to differentiate between type names and variable names: it simply reports the current token as an identifier. The parser then uses Clang's semantic analysis library to determine the nature of the identifier. This allows a simpler and more maintainable architecture than The Lexer Hack. This is also the approach used in most other modern languages, which do not distinguish different classes of identifiers in the lexical grammar, but instead defer them to the parsing or semantic analysis phase, when sufficient information is available.
That doesn't sound as much like a problem with the language as with the design of earlier compilers.
Unifying identifiers in the lexer doesn't solve the problem. The problem is getting the parser to produce a sane AST without needing information from deeper in the pipeline. If all you have is `foo * bar;`, what AST node do you produce for the operator? Something generic like "Asterisk", and then its child nodes get some generic "Identifier" node (when at this stage, unlike in the lexer, you should be distinguishing between types and variables), and you fix it up in some later pass. It's a flaw in the grammar, period. And it's excusable, because C is older than Methuselah and was hacked together in a weekend like Javascript and was never intended to be the basis for the entire modern computing industry. But it's a flaw that modern languages should learn from and avoid.
C ain't simple, it's an organically complex language that just happens to be small enough that you can fit a compiler into the RAM of a PDP-11.
I would probably describe Perl as really complex to parse as well if I knew enough about it. Both are difficult to parse compared to languages with more "modern sensibilities" like Go and Rust, with their nice mostly context free grammars which can be parsed without terrible lexer hacks and separately from semantic analysis.
Walter Bright (who, among other things, has been employed to work on a C preprocessor) seems to disagree that the C preprocessor is simple to implement: https://news.ycombinator.com/item?id=20890749
> The preprocessor is fiendishly tricky to write. [...] I had to scrap mine and reimplement it 3 times.
I have seen other people in the general "C implementer/standards community" complain about it as well.
Each of these elements is even worse in every other language I can think of. What language do you think is simple in comparison?
Pascal (and most other Wirth languages) is better in most of these respects than C. Of course there are other flaws with Pascal (cf “Why Pascal is not my favorite programming language”), but it proves that C has a lot of accidental complexity in its design.
Go, Rust, Zig?
I'm curious, what language do you know of with a more complex macro system than the whole C preprocessor?
EDIT: To be clear to prospective downvoters, I'm not just throwing these languages out because they're hype or whatever. They all have a grammar that's much simpler to parse. Notably, you can construct a parse tree without a semantic analyser which is capable of running in lockstep with the parser to provide semantic information to the parser. You can just write a parser which makes a parse tree.
I've never written a parser for any of those languages but my intuition is that Go is easier to parse than C. The others are debatable. Rust macros are definitely not simpler than C macros. I'm not sure what could be simpler than text substitution. Zig doesn't have macros and comptime is implemented as a language VM that runs as a compilation step (last I knew), so that's definitely not simpler. I don't use Go often, but I don't think it has macros at all, so that's definitely simpler.
When people say that C is a simple language, my interpretation is that they mean it is easy to interpret what a C program does at a low level, not that it is simple to write.
The other languages can be handled by a parser alone. A parser for C needs a semantic analyzer working in tandem.
The C preprocessor is not text substitution.
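Right; one small demonstration that expansion works on tokens with its own rules rather than on raw text (if it were naive text substitution, this would recurse forever):

```c
#include <stdio.h>

int main(void) {
    int x = 5;
    // A macro name is not re-expanded inside its own expansion
    // (it gets "painted blue"), so x below expands exactly once,
    // to (x + 1), where the inner x refers to the variable.
    #define x (x + 1)
    printf("%d\n", x);   // prints 6
    return 0;
}
```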
It is not easy to describe what C does at a low level. There are simple, easy to describe and wrong models of what C does "at a low level". C's semantics are defined by a very difficult to understand standards document, and if you use one of those simple and enticing mental models, you will end up with incorrect C code which works until you try a different compiler or enable optimisations.
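A standard example of a simple mental model going wrong (assuming a typical optimizing compiler):

```c
#include <limits.h>
#include <stdio.h>

// Under a "C is portable assembly" model this looks like a valid
// overflow check. But signed overflow is undefined behavior, so the
// compiler may assume x + 1 never overflows and fold the whole
// function to 'return 0;'. GCC and Clang both do this at -O2.
int next_overflows(int x) {
    return x + 1 < x;
}

int main(void) {
    printf("%d\n", next_overflows(INT_MAX));
    return 0;
}
```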
A parser for C does not need a semantic analyzer. What C does is allow semantic analysis to be integrated into the parser.
The preprocessor has some weird behavior, but it is also not very complicated.
And I would argue that the abstract machine model of C is still relatively simple. There are certainly simpler languages in this regard, but they give up one of the key powers of C, i.e. that you can manipulate the representation of objects on a byte level.
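A sketch of that byte-level power, which the standard does permit (inspecting an object's representation through an unsigned char pointer):

```c
#include <stdio.h>

int main(void) {
    double d = 1.0;
    // Accessing an object's bytes via unsigned char * is one of the
    // few well-defined ways to look at its representation.
    const unsigned char *p = (const unsigned char *)&d;
    for (size_t i = 0; i < sizeof d; i++)
        printf("%02x ", p[i]);
    putchar('\n');
    return 0;
}
```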
By that argument the other languages mentioned are impossible to understand, since they don't have a spec (except for Go, again).
No. The other languages have documented semantics too. Just happens that C's are in the shape of a standards document.
It’s not really clear to me how you could have a simple low level language without tons of nuance. Something like Go is certainly simple without tons of nuance, but it’s not low level, and I think extending it to be low level might add a lot of nuance.
Forth would come to mind; some people have built surprising stuff with it, though I find it too low-level.
Lisp is built from a few simple axioms. Would that make it simple?
Lisp could be simple... but there's a lot of reasons it isn't.
It uses a different memory model than current hardware, which is optimized for C. While I don't know what goes on under SBCL's hood, the simpler Lisps I'm familiar with usually have a chunk of space for cons cells and a chunk of "vector" space kinda like a heap.
Lisp follows s-expression rules... except when it doesn't. Special forms, macros, and fexprs can basically do anything, and it's up to the programmer to know when sexpr syntax applies and when it doesn't.
Lisp offers simple primitives, but often also very complex functionality as part of the language. Just look at all the crazy stuff that's available in the COMMON-LISP package, for instance. This isn't really all that different than most high level languages, but no one would consider those "simple" either.
Lisp has a habit of using "unusual" practices. Consider Scheme's continuations and use of recursion, for example. Some of those - like first-class functions - have worked their way into modern languages, but imagine how they would have seemed to a Pascal programmer in 1990.
Finally, Lisp's compiler is way out there. Being able to recompile individual functions during execution is just plain nuts (in a good way). But it's also the reason you have EVAL-WHEN.
All that said, I haven't investigated microcontroller Lisps. There may be one or more of those that would qualify as "simple."
Mostly we have eval-when because of outdated defaults that are worth re-examining.
A Lisp compiler today should by default evaluate every top-level form that it compiles, unless the program opts out of it.
I made the decision in TXR Lisp and it's so much nicer that way.
There are fewer surprises and less need for boilerplate for compile time evaluation control. The most you usually have to do is tell the compiler not to run that form which starts your program: for instance (compile-only (main)). In a big program with many files that could well be the one and only piece of evaluation control for the file compiler.
The downside of evaluating everything is that these definitions sit in the compiler's environment. This pollution would have been a big deal when the entire machine was running a single Lisp image. Today I can spin up a process for the compiling. All those definitions that are not relevant to the compile job go away when that exits. My compiler uses a fraction of the memory of something like GCC, so I don't have to worry that these definitions are taking up space during compilation; i.e. that things which could be written to the object file and then discarded from memory are not being discarded.
Note how when eval-when is used, it's the club sandwich 99% of the time: all three toppings, :compile-toplevel, :load-toplevel, :execute are present. The ergonomics are not very good. There are situations in which it would make sense to only use some of these but they rarely come up.
So are entire branches of mathematics, and I feel safe in saying they are not "simple"
I would say Rust. Once you learn the basics, Rust is very simple and will point out any errors you have, so you get basically no runtime errors. Also the type system is extremely clean, making the code very readable.
But C itself is also a very simple language. I do not mean C++, but pure C. I would probably start with this. Yes, you will crash at runtime errors, but besides that it's a very, very simple language, which will give you a good understanding of memory allocation, pointers, etc.
Got through C and K&R with no runtime errors, on four platforms, but the first platform... Someone asked the teacher why a struct would not work in Lattice C. The instructor looked at the code, sat down at the student's computer, typed in a small program, compiled it, and calmly put the disks in the box with the manual and threw it in the garbage. "We will have a new compiler next week." We switched to Manx C, which is what we had on the Amiga. Structs worked in MS C, which I thought was the lettuce compiler. (Apparently a different fork of the Portable C Compiler; years later they admitted that it was still big-endian.)
Best programming joke: the teacher talked about "when your code becomes recalcitrant", and we had no idea what he meant. This was on the bottom floor of the library, so on break we went upstairs and used the dictionary. Recalcitrant means not obeying authority. We laughed out loud, and then went silent. Oops.
The instructor was a commentator on the cryptic-C challenges, and would often say... "That will not do what you think it will do" and then go on and explain why. Wow. We learned a lot about the pre-processor, and more about how to write clean and useful code.
Lattice C (on the Amiga) was my first C compiler! Do you remember what the struct issue was that you ran into? This was a pretty late version... like 5.x.
Modula-2 is a language operating on the same level (direct memory addressing, no GC etc) but with saner syntax and semantics.
It's still a tad more complicated than it needs to be - e.g. you could drop non-0-based arrays, and perhaps sets and even enums.
It depends what you mean by simple. C still is simple, but it doesn't include a lot of features that other languages do, and to implement them in C is not simple.
C is simple for some use cases, and not for others.
> C still is simple
Syntactically, yes. Semantically, no.
There are languages with tons of "features" with far, far less semantic overhead than C.
https://blog.regehr.org/archives/767
FWIW, writing programs in C has been my day job for a long time.
Exactly. There is a lot happening implicitly in a C program that the programmer has to be aware of and keep in mind. And it's made worse by valid compiler implementation choices. I remember chasing a bug for a day that came down to me forgetting that the particular implementation I was working with had signed characters and was sign-extending something at an inopportune time.
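The classic version of that trap, since char signedness is implementation-defined:

```c
#include <stdio.h>

int main(void) {
    // getchar() returns an int so that EOF (-1) stays distinguishable
    // from every byte value. Storing the result in a plain char breaks
    // that: where char is signed, byte 0xFF sign-extends and compares
    // equal to EOF, ending the loop early; where char is unsigned,
    // EOF is never matched and the loop never terminates.
    int c;   // must be int, not char
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}
```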
As someone who has had to parse C syntax for a living, I'd argue that it's not syntactically simple either. (Declarators are particularly nasty in C and even more so in C++).
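For a taste, one standard-issue nasty declarator (the name f is arbitrary):

```c
// f is a function taking an int and returning a pointer to an
// array of 3 pointers to functions taking no arguments and
// returning int. Reading that inside-out on sight takes practice.
int (*(*f(int))[3])(void);
```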
Entirely my point. Simpler in some ways, more difficult in others. Totally depends on the use case
The appeal of C is that you're just operating on raw memory, with some slight conveniences like structs and arrays. That's the beauty of its simplicity. That's why casting a struct to its first member works, why everything has an address, or why pointer arithmetic is so natural. Higher-level langs like C++ and Go try to retain the usefulness of these features while abstracting away the actuality of them, which is simultaneously sad and helpful.
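A small sketch of that first-member idiom (the struct names here are made up):

```c
#include <stdio.h>

struct header  { int tag; };
struct message { struct header h; char body[32]; };

int main(void) {
    struct message m = { { 42 }, "hi" };
    // A pointer to a struct, suitably converted, points to its first
    // member (C11 6.7.2.1), so this "downcast" is well-defined.
    struct header *hp = (struct header *)&m;
    printf("tag = %d\n", hp->tag);   // prints 42
    return 0;
}
```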
> The appeal of C is that you're just operating on raw memory ... why everything has an address, or why pointer arithmetic is so natural
That is just an illusion to trip unsuspecting programmers who have false mental models. Pointers are not addresses, and pointer arithmetic is rife with pitfalls. There is the whole pointer provenance thing, but that's more like the tip of the iceberg.
That is really the problem with C; it feels like you can do all sorts of stuff, but in reality you are just invoking nasal demons. The real rules on what you can and can not do are far more intricate and arcane, and nothing about them is very obvious on the surface level.
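One example of those arcane rules, from the dangling-pointer corner of the provenance iceberg:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);
    free(p);
    int *q = malloc(sizeof *q);
    // Even if p and q happen to hold the same bit pattern, p's value
    // became indeterminate at the free(): merely comparing p here is
    // undefined behavior. Pointers are not just addresses.
    if (p == q)
        printf("same address, not the same pointer\n");
    free(q);
    return 0;
}
```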
A typical C program of useful length typically includes a smattering of implicit type conversions that the programmer never intended or considered. It's the consequence of a feature that abstracts away how the type system and memory really[1] act.
[1]for certain definitions of 'really'
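A concrete instance of a conversion nobody asked for (integer promotion; this sketch assumes a 32-bit int):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t x = 0x80;
    // x is promoted to signed int before the shift. With a 32-bit
    // int, 0x80 << 24 does not fit, which is undefined behavior,
    // even though every variable in sight "looks" unsigned.
    uint32_t y = (uint32_t)x << 24;  // the cast sidesteps the trap
    printf("%08x\n", (unsigned)y);
    return 0;
}
```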
> That's why casting a struct to its first member works
Until WG14 makes everything you love about C "undefined behavior" in the name of performance.
> Until WG14 makes everything you love about C "undefined behavior" in the name of performance.
What do you mean?
I just looked up WG14 and I cannot see what you mean
A link perhaps? Am I going to have to "pin" my C compiler version?
Some people have this idea that when they write utter nonsense it should do what they meant; that is, they're skipping the whole discipline of programming and going straight from "I want it to work" to "It should work", without understanding what they're doing wrong.
For some of these people WG14 (the C language sub-committee of SC22, the programming languages sub-committee of JTC1, the Joint Technical Committee between ISO and the IEC) is the problem, because somehow they've taken this wonderful language where you just write stuff and it definitely works and does what you meant, and turned it into something awful.
This doesn't make a whole lot of sense, but hey, they wrote nonsense and they're angry that it didn't work, do we expect high quality arguments from people who mumble nonsense and make wild gestures on the street because they've imagined they are wizards? We do not.
There are others who blame the compiler vendors, and this at least makes a little more sense: the people who write Clang are literally responsible for how your nonsense C is translated into machine code which does... something. They probably couldn't have read your mind and ensured the machine code did what you wanted, especially because your nonsense doesn't mean that, but you can make an argument that they might do a better job of communicating the problem (C is pretty hostile to this, and C programmers no less so).
For a long time I thought the best idea was to give these people what they ostensibly "want": a language where it does something very specific. As a result it's slow and clunky, and maybe after you've spent so much effort to produce a bigger, slower version of the software a friend wrote in Python so easily, these C programmers will snap out of it.
But then I read some essays by C programmers who had genuinely set out on this path and realised to their horror that their fellow C programmers don't actually agree what their C programs mean, the ambiguity isn't some conspiracy by WG14 or the compiler vendors, it's their reality, they are bad at writing software. The whole point of software is that we need to explain exactly what the machine is supposed to do, when we write ambiguous programs we are doing a bad job of that.
The premise "lol who needs memory safety at runtime, you get sigsegv if there's a problem no biggie, lets make it FAST and dont bother with checks" was the original horror. There are enough cowboys around that loved the approach. It's actually not so surprising such mindset became cancerous over time. The need to extract maximum speed devoured the language semantics too. And it is spreading, webassembly mostly inherited it.
I've said before that C is small, but not simple.
Turing Tarpits like Brainfuck or the Binary Lambda Calculus are a more extreme demonstration of the distinction, they can be very tiny languages but are extremely difficult to actually use for anything non-trivial.
I think difficulty follows a "bathtub" curve when plotted against language size. The smallest languages are really hard to use, as more features get added to a language it gets easier to use, up to a point where it becomes difficult to keep track of all the things the language does and it starts getting more difficult again.
Modula-2, Object Pascal, Oberon, especially Oberon-07.
I would say Zig is the spiritual successor to the first two, while Go follows the Oberon and Limbo heritage.
> but in practice it's actually very complex with tons of nuance
That's because computers are very complex with tons of nuance.