> C doesn't try to save you from making mistakes. It has very few opinions about your code and happily assumes that you know exactly what you're doing. Freedom with responsibility.
I love C because it doesn't make my life very inconvenient to protect me from stubbing my toe in it. I hate C when I stub my toe in it.
> It has very few opinions about your code
I understand where this is coming from, but I think this is less true than it used to be, and (for that reason) it often devolves into arguments about whether the C standard is the actual source of truth for what you're "really" allowed to do in C. For example, the standard says I must never:
- cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)
- allow a signed integer to overflow
- pass a NULL pointer to memcpy, even if the length is zero
- read an uninitialized object, even if I "don't care" what value I get
- read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that
All of these are ways that (modern, standard) C doesn't really "do what the programmer said". A lot of big real-world projects build with flags like -fno-strict-aliasing, so that they can get away with doing these things even though the standard says they shouldn't. But then, are they really writing C or "C with custom extensions"? When we compare C to other languages, whose extensions are we talking about?
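To make the first bullet concrete, here is a minimal sketch (hypothetical `Foo`/`Bar` types, not from any real project) of the kind of cast the standard forbids, next to the union-based punning that GCC and Clang document as acceptable:

```c
#include <stdio.h>

/* Hypothetical types, just for illustration. */
struct Foo { int x; };
struct Bar { int x; };

union FooBar {
    struct Foo f;
    struct Bar b;
};

int main(void) {
    struct Foo foo = { 42 };

    /* Undefined behavior: accessing a struct Foo through a struct Bar lvalue
       violates C11 6.5p7, even though the layouts are identical. */
    struct Bar *bad = (struct Bar *)&foo;
    (void)bad;   /* dereferencing *bad is what the standard forbids */

    /* The GCC/Clang-documented alternative: pun through a union object and
       always access the memory through the union type. */
    union FooBar u;
    u.f = foo;
    printf("%d\n", u.b.x);   /* reads the Foo's bytes as a Bar */

    return 0;
}
```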
> cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)

Use the union type. Abusing it for aliasing violates the standard too, but GCC and Clang implement an extension that permits this. Alternatively, just allocate a char array and cast it as you please. Strict aliasing does not apply to char arrays, if I recall.

> allow a signed integer to overflow

Is this still true? I thought the reason for this was that C left the implementation to define how signed arithmetic worked, meaning you could not assume two's complement, but the most recent C standard was supposed to mandate two's complement.

> pass a NULL pointer to memcpy, even if the length is zero

There is a reason for this. memcpy is allowed to start reading early as a performance optimization, before it does a branch that checks whether there is anything to read. I do wonder what happens if you only want to copy 1 byte and that byte has invalid memory right next to it. Presumably, this optimization would read more than a byte.

> read an uninitialized object, even if I "don't care" what value I get

You are probably doing something wrong if you do this. It is not even good as an entropy source.

> read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that

Earlier C standards likely did not say anything about this because they did not support multithreading, but outside of possibly reading/writing to hardware registers, you do not want to do this because of races. Even if you think you know better, you almost certainly do not.

> the most recent C standard was supposed to mandate two's complement.
While that's true, overflow is not automatically wrapping, because it may instead trap for several reasons. (C++20, by comparison, mandates two's complement and defines out-of-range signed conversions as wrapping, though overflow itself is still undefined there too. [1])
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
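For what it's worth, if wrapping arithmetic is what's actually wanted, the usual workarounds are to do the math in unsigned and convert back, or to use the checked-arithmetic builtins. A small sketch (GCC/Clang assumed for `__builtin_add_overflow`):

```c
#include <limits.h>
#include <stdio.h>

/* Wrapping addition: unsigned arithmetic is defined to wrap modulo 2^N.
   Converting the result back to int is implementation-defined in C, but
   mainstream compilers wrap it as two's complement. */
static int wrapping_add(int a, int b) {
    return (int)((unsigned)a + (unsigned)b);
}

int main(void) {
    int r;
    /* GCC/Clang builtin: returns true on overflow and stores the wrapped result. */
    if (__builtin_add_overflow(INT_MAX, 1, &r))
        printf("overflowed, wrapped result: %d\n", r);

    printf("%d\n", wrapping_add(INT_MAX, 1));   /* INT_MIN in practice */
    return 0;
}
```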
> memcpy is allowed to start reading early as a performance optimization, [...]
Most modern memcpy implementations branch on the length anyway, because copying a word at a time is faster than copying byte by byte whenever the length allows it. Many also switch to SIMD when the copy size exceeds some threshold, for the same reason.
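As a rough illustration (not any real libc's code), a copy loop along those lines branches on the remaining length and prefers word-sized chunks, so a zero-length call never touches memory at all:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a length-branching copy: bulk word copies, then a byte tail.
   Real implementations add alignment handling and SIMD paths on top of this. */
static void *copy_sketch(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint64_t)) {   /* word-at-a-time for the bulk */
        uint64_t w;
        memcpy(&w, s, sizeof w);      /* memcpy sidesteps alignment/aliasing issues */
        memcpy(d, &w, sizeof w);
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--)                       /* byte tail; skipped entirely when n == 0 */
        *d++ = *s++;
    return dst;
}
```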
>> read an uninitialized object, even if I "don't care" what value I get
> You are probably doing something wrong if you do this.
The GP meant a case like this. Consider `struct foo { bool avail; int value; } foos[100];` where `value` is only set when `avail` is true. If we are summing all available `value`s, we may want to avoid a branch misprediction with something like `accum += foos[i].avail * foos[i].value;` for each `foos[i]`, since the actual `value` shouldn't matter when `avail` is false. But the current specification prohibits this construction, because it considers that each read from `foos[i].value` may return a different value (!). In reality, this kind of issue is so widespread that LLVM has a special "poison" value which gets resolved to some fixed value after the first use.
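A compilable sketch of that construction (the zero-initialization suggested in the next reply is what keeps the multiply trick formally well-defined):

```c
#include <stdbool.h>
#include <stddef.h>

struct foo {
    bool avail;
    int  value;   /* only meaningful when avail is true */
};

/* Branch-free sum as described above: multiply by avail (0 or 1) instead of
   branching on it. If the unavailable slots were never written, this formally
   reads indeterminate values; zero-initializing the array up front avoids that. */
static long sum_available(const struct foo *foos, size_t n) {
    long accum = 0;
    for (size_t i = 0; i < n; i++)
        accum += foos[i].avail * foos[i].value;
    return accum;
}
```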
Thanks for the explanations.
As for the last one, I would probably bzero() the whole array, as that is faster than setting just one field to zero in a loop, which is presumably what you would otherwise do until you have some need to “allocate” a value. That would avoid the problem entirely.
I know bzero() was removed from POSIX, but “bzero()” is nicer to write than “memset() it to zero”.
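In code, assuming the `struct foo` array from the example above, the zeroing being described is just a memset (or static/`{0}` initialization, which is already all-zero):

```c
#include <stdbool.h>
#include <string.h>

struct foo { bool avail; int value; };

static struct foo foos[100];   /* static storage is already all-bits-zero */

static void reset_foos(void) {
    /* Equivalent of bzero(foos, sizeof foos): all-bits-zero makes every
       avail false and every value 0 on any mainstream target. */
    memset(foos, 0, sizeof foos);
}
```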
> > cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)
> Use the union type. Abusing it for aliasing violates the standard too, but GCC and Clang implement an extension that permits this. Alternatively, just allocate a char array and cast it as you please. Strict aliasing does not apply to char arrays if I recall.
I could be misreading, but you seem to be implying that you can trick the aliasing rules by casting Foo* to char* and then casting the char* to Bar*, but that still violates the rule. Even a union isn't allowed as a way of aliasing, but as you say it's often allowed in practice and is heavily used in the Linux kernel (and Linus has made his opinion on this part of the language standard very clear).
In theory, the right way to access the bits of a Foo as a Bar is to memcpy to a fresh Bar object, and then memcpy back if you want to update the original variable. The compiler is then allowed to optimise this into a direct access of the bits.
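A sketch of that memcpy round trip, with made-up `Foo`/`Bar` definitions (the sizes have to line up for this to make sense):

```c
#include <string.h>

/* Hypothetical, layout-compatible types for illustration. */
struct Foo { int a; float b; };
struct Bar { int a; float b; };

/* Copy the object representation instead of casting the pointer; compilers
   routinely optimize these memcpy calls down to direct loads and stores. */
static struct Bar foo_as_bar(const struct Foo *f) {
    struct Bar b;
    memcpy(&b, f, sizeof b);
    return b;
}

static void bar_into_foo(struct Foo *f, const struct Bar *b) {
    memcpy(f, b, sizeof *f);
}
```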
You are misreading. I said to take a char * and then cast it to whatever you want. You can cast it to struct A *. Then you can cast the original char * to struct B *. The compiler will be fine with this since the strict aliasing rule excludes char *.
If you insist on doing what you described, just skip char * and mark the pointer with __attribute__((may_alias)) and then it will be okay. That is a compiler extension that lets you turn off strict aliasing rules.
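For reference, the documented form of that extension is a type attribute on the pointed-to type; a minimal sketch using it to look at a float's bits (not the Foo/Bar case, but the same mechanism, assuming a 32-bit float):

```c
#include <inttypes.h>
#include <stdio.h>

/* GCC/Clang extension: accesses through this type are exempt from
   type-based (strict) aliasing analysis. */
typedef uint32_t __attribute__((may_alias)) u32_alias;

int main(void) {
    float f = 1.0f;
    u32_alias *bits = (u32_alias *)&f;     /* would violate 6.5p7 with a plain uint32_t* */
    printf("0x%08" PRIx32 "\n", *bits);    /* prints the IEEE-754 bit pattern */
    return 0;
}
```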
Ah, I see. Like this:
```c
char x[sizeof(struct Foo)];
struct Foo* f = (struct Foo*)&x;
struct Bar* b = (struct Bar*)&x;
```
(I can't edit so replying instead.) But this isn't allowed either. You can access a struct Foo variable through a char* pointer but you can't use struct Foo* to access an object whose actual type ("effective type" in the words of the standard) is char array. The standard says:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
https://www.iso-9899.info/n1570.html#6.5p7
I realise that many implementations will allow it anyway but if you're relying on that then you may as well fall back to a straight cast from Foo* to Bar*, which is also not allowed in theory.
>> pass a NULL pointer to memcpy, even if the length is zero
> There is a reason for this. memcpy is allowed to start reading early as a performance optimization, before it does a branch that checks whether there is anything to read.
Where did you get this idea from? It's not possible, since you can hand it an address at the end of an array and a length of 0, where the array ends at the end of a page.
You can't read extra bytes in this case!
Handing memcpy() the address at the end of an array and length 0 is undefined behavior. It is often said that the reason for this is to allow memcpy() to read before it branches to make it fast.
This led me to think of the case where you hand it the address of the last byte of a byte array, where the byte right after it is in an unmapped page, and tell it to copy 1 byte. I suspect systems that have such an optimization would read beyond that 1 byte into invalid memory. This is my criticism of the idea of having memcpy(NULL, NULL, 0) be undefined in order to make that speed trick legal: by that logic, small copy lengths next to unmapped memory would also have to be undefined, yet they are not under the standard.
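In practice, the defensive pattern for the NULL-with-length-zero case is simply to skip the call; a trivial wrapper (the name is mine, not a standard function):

```c
#include <string.h>

/* memcpy with NULL pointers is undefined even when n == 0, so skip the
   call entirely when there is nothing to copy. */
static void *memcpy_or_nothing(void *dst, const void *src, size_t n) {
    if (n != 0)
        memcpy(dst, src, n);
    return dst;
}
```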
I've heard it put another way that I enjoyed: "C assumes you know what you're doing, which is only a problem if you don't know what you're doing."
Having spent many, many years paid to write C, and with no wish to write any more now that I learned Rust, I would suggest a rewording:
"C assumes you know what you're doing, which is only a problem because you don't know what you're doing."
Periodically, especially in r/cpp, I run into people who are apparently faultless and so don't make the mistakes that make these languages dangerous. Weirdly, none of these people seem to have written any software I can inspect to see for myself what that looks like, and the universe I live in doesn't seem to contain any of the resulting software either. I choose to interpret this mystery as: people are idiots and liars, but of course there could be other interpretations.
I wonder if in a few years you will never want to write another line of Rust again like another developer I know who used to be enamored with Rust.
That said, I have not written perfect C code myself, but I have fixed a number of mistakes others made in their C code. Many of my commits to OpenZFS were done to fix others’ mistakes, and a few of my commits even contained my own mistakes that I or others later caught. Feel free to inspect the codebase yourself; you should find it is a very well-written codebase.
> Periodically, especially in r/cpp I run into people who are apparently faultless and so don't make the mistakes that make these languages dangerous, weirdly none of these people seem to have written any software I can inspect to see for myself what that looks like, and furthermore the universe I live in doesn't seem to have any of the resulting software.
So basically Jeff Sutherland ever since he started talking about AI. "My AI agents have formed a Scrum team that's 30 times faster than any human developer!" Great, Jeff. Working in which company's production codebase?
Yeah, well, as stated: software written by humans will have bugs.
The real danger with Rust is the cult-like delusion that this isn't the case for them.
To be sure, my Rust has bugs in it, but none of them come close to the spooky nonsense that could happen in my C, and yet the performance is excellent. Probably more than once a day Rust's compiler rejects code that an analogous C compiler would wave through, and maybe it'd survive testing too, at least for a while.
No, it just makes it inconvenient to try to protect yourself from stubbing your toe in it.
C doesn't make anything inconvenient; that's its major appeal. Some things are convenient by design, yes, but it's not trying to prevent you from doing anything. That's a feature.
> C doesn't make anything inconvenient
Other than writing memory safe code, as history has shown.
Difficult, not inconvenient.
Because it allows things that are difficult, like writing your own memory allocators.
If you don't like working at that difficulty level, then C programming isn't for you. And that's fine.
It doesn't allow me to write my own memory allocator; it forces me to.
This line of argumentation reminds me of this:
Advertise and promote a shortcoming or a fault as a virtue.
For example, ultra-cheap single-use film cameras are advertised as "No Focusing Required." The truth is, no focusing is possible, because those cameras have cheap plastic fixed-focus lenses that won't move and can't be focused. What is a serious shortcoming for a camera — the inability to properly focus on the subject — is sold as a convenience: "You don't have to bother with focusing."
https://orangepapers.eth.limo/orange-propaganda.html#make_vi...