Item 43469732

criddell • 5 days ago

ASCII whitespace (in any encoding) by default, with an optional user-defined set of trim characters (like Python), would probably solve the needs of 90% of people rolling their own.
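
A minimal sketch of what that could look like, assuming C++17 and `std::string_view`. The names `trim`, `ascii_whitespace`, and `trim_examples` are hypothetical and not part of any standard library; the default set mirrors the six characters `std::isspace` accepts in the "C" locale.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical default trim set: the ASCII whitespace characters
// recognized by std::isspace in the "C" locale.
constexpr std::string_view ascii_whitespace = " \t\n\r\f\v";

// Hypothetical helper: returns a view of `s` with leading and trailing
// characters from `chars` removed. No allocation, no locale lookup.
constexpr std::string_view trim(std::string_view s,
                                std::string_view chars = ascii_whitespace) {
    const auto first = s.find_first_not_of(chars);
    if (first == std::string_view::npos) {
        return {};  // the whole string consists of trim characters
    }
    const auto last = s.find_last_not_of(chars);
    return s.substr(first, last - first + 1);
}

// Usage, similar in spirit to Python's str.strip([chars]).
void trim_examples() {
    assert(trim("\t hello world \n") == "hello world");
    assert(trim("xxhixx", "x") == "hi");
    assert(trim("   ").empty());
    assert(trim("a b") == "a b");  // interior whitespace is left alone
}
```

Returning a `std::string_view` keeps the helper cheap, but the result is only valid as long as the original buffer is.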

usrnm • 5 days ago

Not all Unicode whitespace characters take up exactly one byte when encoded in UTF-8. I'm not even talking about other possible encodings, just good old UTF-8. Let that sink in a bit, and you'll realize what a can of worms it is in a language where strings are just byte sequences.
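
To make that concrete, here is a small sketch, assuming C++17. The byte sequences are the UTF-8 encodings of three code points that carry the Unicode White_Space property; `rtrim_bytes` and `utf8_whitespace_examples` are hypothetical names used only for illustration. It shows a byte-set trim cutting a non-whitespace character in half.

```cpp
#include <cassert>
#include <string_view>

// UTF-8 encodings of a few Unicode whitespace code points.
constexpr std::string_view no_break_space    = "\xC2\xA0";      // U+00A0
constexpr std::string_view em_space          = "\xE2\x80\x83";  // U+2003
constexpr std::string_view ideographic_space = "\xE3\x80\x80";  // U+3000

// Hypothetical helper: strips trailing *bytes* that appear in `bytes`.
// It knows nothing about code point boundaries.
constexpr std::string_view rtrim_bytes(std::string_view s,
                                       std::string_view bytes) {
    const auto last = s.find_last_not_of(bytes);
    return last == std::string_view::npos ? std::string_view{}
                                          : s.substr(0, last + 1);
}

void utf8_whitespace_examples() {
    // None of these whitespace characters fits in a single byte.
    static_assert(no_break_space.size() == 2);
    static_assert(em_space.size() == 3);
    static_assert(ideographic_space.size() == 3);

    // "voila" with a grave accent: the last character is U+00E0,
    // encoded as C3 A0. It shares the byte A0 with U+00A0.
    constexpr std::string_view word = "voil\xC3\xA0";

    // Treating the bytes of U+00A0 as a trim set strips the A0 and
    // leaves a dangling lead byte, which is invalid UTF-8.
    assert(rtrim_bytes(word, no_break_space) == "voil\xC3");
}
```

A correct Unicode-aware trim has to decode whole code points before comparing them, which is exactly the part a byte-oriented string API doesn't give you.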

criddell • 5 days ago

Because it's tricky is exactly why it should be in the standard library.

The C++ standard library should just incorporate ICU by reference IMHO.

account42 • 5 days ago

ICU is an unreasonably large dependency for something that many users won't need. Its behavior also changes with new Unicode versions, which makes it incompatible with something that cares as much about backward compatibility as the C++ standard library.

criddell • 5 days ago

That’s the nature of Unicode: it’s complicated and a moving target.

As far as it being a large dependency, the beauty of C++ is that if you don’t use it, it won’t affect your build.

If ICU is too large, complex, and unstable for the C++ committee, then regular users don’t stand a chance.

account42 • 5 days ago

> As far as it being a large dependency, the beauty of C++ is that if you don’t use it, it won’t affect your build.

That's the theory. In practice, you have things like iostreams pulling in tons of locale machinery (which is really significant for static builds) even if you never use a locale other than "C". That locale machinery will include gigantic functions for formatting monetary amounts even if you never do any formatting.

> If ICU is too large, complex, and unstable for the C++ committee, then regular users don’t stand a chance.

Regular users have more specific requirements and can handle binary compatibility breaks better if those aren't coupled with other unrelated functionality.