dvh 5 days ago

I unexpectedly did some cpp a few days ago and I was surprised that the cpp standard library doesn't have a string trim function! Everybody is rolling their own. What is the reason behind that?

account42 5 days ago

What do you want to trim off? ASCII 0x20? Any ASCII white-space? Any Unicode white-space? Well the latter requires defined string encodings and depends on the Unicode version and you can't just use the latest without introducing subtle compatibility issues.

criddell 5 days ago

ASCII white space (in any encoding) by default with an optional user defined set of trim characters (like Python) would probably solve the needs of 90% of people rolling their own.
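A minimal sketch of what that could look like, assuming C++17 and `std::string_view`. The names `trim` and `kAsciiWhitespace` are made up for illustration and are not part of any standard library. It works on bytes, so it is only safe for ASCII-compatible encodings such as UTF-8, where bytes below 0x80 never occur inside a multi-byte sequence.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical default trim set: the six ASCII whitespace characters.
constexpr std::string_view kAsciiWhitespace = " \t\n\v\f\r";

// Hypothetical helper: strip any byte found in `chars` from both ends of `s`.
// Returns a view into the original buffer, so nothing is copied.
constexpr std::string_view trim(std::string_view s,
                                std::string_view chars = kAsciiWhitespace) {
    const auto first = s.find_first_not_of(chars);
    if (first == std::string_view::npos) {
        return {};  // empty input, or nothing but trim characters
    }
    const auto last = s.find_last_not_of(chars);
    return s.substr(first, last - first + 1);
}

inline void trim_examples() {
    assert(trim("  hi \n") == "hi");      // default: ASCII whitespace
    assert(trim("xxhixx", "x") == "hi");  // user-defined trim set
    assert(trim(" \t ").empty());         // all whitespace
}
```

Returning a view keeps the copy-or-not decision with the caller, at the cost of the usual lifetime caveat: the view must not outlive the string it points into.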

usrnm 5 days ago

Not all unicode whitespace characters take up exactly one byte when encoded in utf8. Not even talking about other possible encodings, just good old utf8. Let that sink in a bit, and you'll realize what a can of worms it is in a language where strings are just byte sequences.
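To make that concrete, here is a small sketch, assuming C++17 and the default "C" locale. The helper name `rtrim_bytes` is invented for the example. It trims byte by byte with `std::isspace`, which is what many hand-rolled versions do, and it leaves multi-byte Unicode whitespace alone.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical helper: byte-wise right trim using std::isspace.
inline std::string_view rtrim_bytes(std::string_view s) {
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
        s.remove_suffix(1);
    }
    return s;
}

inline void utf8_whitespace_examples() {
    // U+00A0 NO-BREAK SPACE is two bytes in UTF-8, U+2003 EM SPACE is three.
    constexpr std::string_view nbsp = "\xC2\xA0";
    constexpr std::string_view em_space = "\xE2\x80\x83";
    static_assert(nbsp.size() == 2);
    static_assert(em_space.size() == 3);

    // A trailing ASCII space is removed...
    assert(rtrim_bytes("abc ") == "abc");
    // ...but in the "C" locale none of the em space's bytes count as
    // whitespace, so the string keeps all six bytes.
    assert(rtrim_bytes("abc\xE2\x80\x83").size() == 6);
}
```

Handling those characters means decoding code points and consulting a Unicode whitespace table, which is exactly the part that depends on the encoding and the Unicode version.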

criddell 5 days ago

Because it's tricky is exactly why it should be in the standard library.

The C++ standard library should just incorporate ICU by reference IMHO.

account42 5 days ago

ICU is an unreasonably large dependency for something that many users won't need. Its behavior also changes with new Unicode versions, which makes it incompatible with something that cares as much about backward compatibility as the C++ standard library.

criddell 5 days ago

That’s the nature of Unicode: it’s complicated and a moving target.

As far as it being a large dependency, the beauty of C++ is that if you don’t use it, it won’t affect your build.

If ICU is too large, complex, and unstable for the C++ committee, then regular users don’t stand a chance.

account42 5 days ago

> As far as it being a large dependency, the beauty of C++ is that if you don’t use it, it won’t affect your build.

That's the theory. In practice, you have things like iostreams pulling in tons of locale machinery (which is really significant for static builds) even if you never use a locale other than "C". That locale machinery will include gigantic functions for formatting monetary amounts even if you never do any formatting.

> If ICU is too large, complex, and unstable for the C++ committee, then regular users don’t stand a chance.

Regular users have more specific requirements and can handle binary compatibility breaks better if those aren't coupled with other unrelated functionality.

tialaramex 5 days ago

I mean, you're a big grown-up language with generic programming, why can't you:

https://doc.rust-lang.org/std/primitive.str.html#method.trim...

C++ can't manage to do this because it doesn't give its primitive types methods, it doesn't have a sensible way to talk about methods on types, and it always coerces to function pointers...

    assert_eq!("123foo1bar123".trim_end_matches(char::is_numeric), "123foo1bar");

But it's pretty easy to at least do this:

    assert_eq!("11foo1bar11".trim_end_matches('1'), "11foo1bar");

(Yes, Rust does provide one that always trims off trailing whitespace, but that requires that you know, as Rust does, what the encoding is.)
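For readers more at home in C++, a rough analogue of those two calls might look like the sketch below, assuming C++17. The names `trim_end_matches` and `is_ascii_digit` are borrowed or invented for illustration. Note the differences: these are free functions rather than methods on the string type, they look at bytes rather than Unicode scalar values, and the digit test is ASCII-only, unlike Rust's `char::is_numeric`.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical predicate: ASCII digits only.
constexpr bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper: drop trailing bytes for as long as pred(byte) holds.
template <typename Pred>
constexpr std::string_view trim_end_matches(std::string_view s, Pred pred) {
    while (!s.empty() && pred(s.back())) {
        s.remove_suffix(1);
    }
    return s;
}

// Overload for a single byte; preferred over the template for a `char` argument.
constexpr std::string_view trim_end_matches(std::string_view s, char c) {
    while (!s.empty() && s.back() == c) {
        s.remove_suffix(1);
    }
    return s;
}

inline void trim_end_examples() {
    assert(trim_end_matches("123foo1bar123", is_ascii_digit) == "123foo1bar");
    assert(trim_end_matches("11foo1bar11", '1') == "11foo1bar");
}
```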

eru 5 days ago

The problem here isn't so much that it's not in the standard library (not everything needs to be in the standard library), but that everyone is rolling their own instead of using third party libraries.

tialaramex 5 days ago

"Everybody" is the problem, there are two proper use cases for a standard library†

1. Vocabulary. Things Everybody will want to talk about. It's easier to communicate if we all agree this is a List<Goose> than if we first have to negotiate do we mean MyLinkedList or ArrayList or HybridStorage::List, and if we can't agree do we need an adaptor layer. Vocab is a reason the stdlib should provide a string type (if the language itself does not), a growable array, a hash table, etc. With generic programming you likely want some algorithms in here too, all & any, sum, that sort of thing.

2. Shared features Everybody will find they need and might otherwise screw up. Trimming trailing whitespace, turning numbers into strings and vice versa, sorting, basic arithmetic, familiar constants.

This should be in category (2). "Everybody" will need this once in a while.

† In C++ instead the standard library functions as a way to not bother with package management, this does have amusing effects like how FreeBSD will end up with a linear algebra library required to build the OS.

pjmlp 4 days ago

I do mostly like C++, yet the linear algebra library belongs in the same garbage bin where the anti-GUI folks put the graphics proposal, even more so as it depends on something else, most likely not written in C++.

To add insult to injury, imagine FreeBSD also needing a Fortran compiler to build the OS.

This is something for vcpkg and conan, not the standard library.

eru 5 days ago

Yes. How much you'll want to stick in the standard library and how much into other 3rd party libraries also depends on how easy it is to work with these 3rd party libraries.

Two examples:

Rust 3rd party library support is so strong, that its standard library itself is actually partially built on top of 3rd party libraries.

Python, especially in the bad old days, e.g. 20 years ago, didn't have much support for (sane) package management and dependency management, so it was really convenient to have a standard library with 'batteries included'.

pjmlp 4 days ago

Thing is, while C and C++ might be latecomers to the whole cross-platform package management thing, vcpkg and conan are already quite established; what we have is folks stuck in the old ways.

jeroenhd 5 days ago

In a similar vein, I found out that Go doesn't have a string reverse function either. Everyone online pretends reversing strings is easy (just iterate through the array backwards! The world is US ASCII only, right?).
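The same trap is easy to show in C++. This is a sketch under the assumption that the input is valid UTF-8, and the function names are invented for the example. Reversing bytes scrambles any multi-byte character, while reversing by code point keeps each sequence intact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Naive approach: reverse the raw bytes. Fine for ASCII only.
inline std::string reverse_bytes(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
constexpr bool is_continuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Reverse by code point, assuming `s` is valid UTF-8.
inline std::string reverse_code_points(std::string_view s) {
    std::string out;
    out.reserve(s.size());
    std::size_t end = s.size();
    while (end > 0) {
        std::size_t start = end - 1;
        // Walk back over continuation bytes to the lead byte.
        while (start > 0 && is_continuation(s[start])) {
            --start;
        }
        out.append(s.substr(start, end - start));
        end = start;
    }
    return out;
}

inline void reverse_examples() {
    // "h\xC3\xA9llo" is "héllo"; the byte-wise reverse splits the two-byte é
    // and produces invalid UTF-8.
    assert(reverse_bytes("h\xC3\xA9llo") == "oll\xA9\xC3h");
    assert(reverse_code_points("h\xC3\xA9llo") == "oll\xC3\xA9h");
}
```

Even the code-point version is not "correct" in general: a base letter followed by a combining accent would come out with the accent attached to the wrong letter, which needs grapheme cluster segmentation to fix.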

Trimming strings isn't hard in most real world applications, on the other hand, and not putting it in the standard library means people won't confuse the way the trim method works (i.e. the user must make a choice between copying memory or reusing memory and risking memory lifetime/consistency issues). And that doesn't even include problems like "what if the string isn't utf8".

I'm more disappointed in Go, which makes a ton of questionable assumptions in the standard library to pretend difficult problems are easy. C++ wants to be correct and knowing what is or isn't whitespace is hard when you don't know the length of a single grapheme.

zabzonk 5 days ago

Slightly OT:

Interview question(s): "Write a function to reverse a string/linked list"

Me, as interviewee: "You spend a lot of time reversing things, do you?"

I don't understand why people are so obsessed with this kind of thing. In my entire career, I don't think I ever felt the need to reverse anything - iterate backwards, perhaps.

bluGill 5 days ago

That is the point - nobody does this in the real world so you don't have the solution memorized. However doing it is "easy" enough that you can actually do it in an interview. More than once I've worked with someone who had a great resume with a lot of experience, but we quickly figured out once they were on the job that they couldn't write code (I was sometimes involved in the hiring decision, but I never did the hiring alone).

What the world is looking for in a question like that is enough to figure out if you can program. Most people looking for a job have a lot of experience but they can't show you any code.

Any sane company in the US will only confirm the dates someone worked there and that they "left on good terms" - they will not tell you if the person was any good. If they must fire someone, HR will often offer to let the person write a resignation letter on the spot, meaning the person leaves on good terms - it is to your advantage overall to accept this offer - you can't sue for wrongful termination, which protects them, but in turn they will say you left on good terms instead of giving a bad reference.

As such there is often no indication someone is bad, and so they can jump from job to job despite being incompetent. Questions like this exist because you can solve them in an interview (at least a simplified ASCII-only version; if you need to work with an unknown character set it gets hard).

zabzonk 5 days ago

It's easy to come up with questions they can't prepare for - example, for a C++/SQL database job:

1) Present them with your database schema, give them time to read and (at least partially) understand it. Allow questions. Give them a workstation.

2) Get them to write a SELECT statement to pull stuff out of two or three tables.

3) Get them to integrate the query into a small C++ program. Have the program write data out to a text file.

You can do this fairly realistic stuff for any technologies. Or, for C++, you could use my favourite interview question: "Tell me about the copy constructor".

Maxatar 5 days ago

I'd really rather not tell you our database schema.

Instead of expecting businesses to tell you domain specific things and then answer questions about them, please just understand some basic principles behind a large class of algorithms.

Almost all algorithm questions boil down to a simple principle, can you take a problem and break it down into its simplest form; the simplest linked list to reverse is the empty linked list or a linked list with 1 node.

Can you then build upon the simplest case to solve the next simplest case; reverse a multi-node linked list by reversing the tail and then appending the head to the result.
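A literal transcription of that recursion into C++ might look like the sketch below, assuming C++14 or later. The `Node` and `List` types and the `cons` and `reverse_list` helpers are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Minimal singly linked list of ints with owning pointers.
struct Node {
    int value;
    std::unique_ptr<Node> next;
};
using List = std::unique_ptr<Node>;

// Hypothetical helper: put `value` in front of `tail`.
inline List cons(int value, List tail) {
    return std::make_unique<Node>(Node{value, std::move(tail)});
}

inline List reverse_list(List head) {
    // Simplest cases: an empty list or a single node is its own reverse.
    if (!head || !head->next) {
        return head;
    }
    // Next simplest case: reverse the tail, then append the old head.
    List reversed = reverse_list(std::move(head->next));
    Node* last = reversed.get();
    while (last->next) {
        last = last->next.get();
    }
    last->next = std::move(head);  // head->next is already null after the move
    return reversed;
}

inline void reverse_list_example() {
    List l = reverse_list(cons(1, cons(2, cons(3, nullptr))));
    assert(l->value == 3 && l->next->value == 2 && l->next->next->value == 1);
}
```

Appending walks to the end of the reversed tail each time, so this version is quadratic and recurses once per node. The usual follow-up question is to get it down to a single linear pass by re-pointing the `next` links in a loop.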

It really is unfortunate how many people, instead of trying to understand concepts, want to just memorize a bunch of hardcoded facts or trivia about programming languages or libraries. If you understand the basic principles, you can easily pick up minutia about C++ copy constructors or move constructors... but someone who has memorized a great deal of minutiae about C++ may never be able to understand some of the basic principles that broadly cover a multitude of data structures and algorithms.

zabzonk 5 days ago

> you can easily pick up minutia about C++ copy constructors

Hollow laughter. And if it were true (which it isn't) how well can you explain those "minutia"?

bluGill 5 days ago

That means you tell someone who you might not hire what your database schema is. Probably not something you want them to know. You also assume they know SQL - many C++ jobs only need minimal SQL knowledge and so you are fine with hiring someone who can write a select only with the help of google - but someone at that level wouldn't be able to solve your problem. I've spent a lot of time working in a language that was custom to the one company I worked for at the time - I can learn your language quickly (even C++ is not that hard - the dark corners means it takes years to become great but to be productive doesn't take very long), as such I don't want to force any particular language on the interview, I want something that proves they can write code.

zabzonk 5 days ago

> Probably not something you want them to know

Why not? But if your schema is so secret, come up with a simple one for use in interviews.

> You also assume they know SQL

I specifically said this was for a c++/sql job.

> so you are fine with hiring someone who can write a select only with the help of google

No, I'm not fine with that, even if it were do-able.

> I can learn your language quickly (even C++ is not that hard - the dark corners means it takes years to become great but to be productive doesn't take very long)

Wrongo. And not just for C++.

> I don't want to force any particular language on the interview, I want something that proves they can write code.

Obviously, we want very different things.

tmoertel 5 days ago

The goal of an interview isn't to get the candidate to write code that will be used in production. The goal is to observe the candidate doing something that predicts whether they're a viable hire. If a candidate cannot write a function to reverse a given sequence, especially in a situation where candidates have been led to expect that they'll be asked to do something just like that, then it becomes harder to believe that the candidate is a viable hire.

pjmlp 5 days ago

It is the closest thing to a casting call for programmers.

I would rather have that question, instead of how many golf balls fit into a plane.

At least the former has something to do with programming.

billforsternz 5 days ago

Using my fingers I'm guessing a golf ball is about 3cm in diameter. A 737 or an A320 cabin is, again my guess/estimate, 30m long, 4m wide, 2m high. So approx 30cm^3 into 200m^3. One million cm^3 in a m^3. I'm going with 5 million golf balls and hoping I'm right within an order of magnitude or so. I miss those kinds of questions, which have sadly died out.

pjmlp 5 days ago

And that helps proving someone is up to the task of writing a Website using Spring in a Kubernetes cluster in what manner?

billforsternz 5 days ago

Obviously it doesn't because it's more a test of reasoning ability and intelligence than specific domain skills. The theory is that smart programmers will be able to quickly pick up whatever specialised skills are needed for specific projects. Some people are good generalists. Others prefer to specialize. Employers are free to optimize for their circumstances and preferences I guess.

pjmlp 4 days ago

It works beautifully in practice, as proven by Google products source code.

zabzonk 5 days ago

Obviously, it doesn't. But having estimation skills (which is what the golf ball stuff is about) is always useful, specifically when doing testing. I wouldn't want to hire someone that couldn't estimate what the price of their supermarket trolley shop was.

pjmlp 4 days ago

How does that map to fibonacci sequences?

billforsternz 5 days ago

Just for fun I asked the question as a simple google search which these days of course triggers an AI answer. It made for rather sobering reading;

Query: "how many golf balls fit in a boeing 737"

Result:

Estimating, you could fit roughly 1.5 to 2 million golf balls inside a Boeing 737, depending on the specific model and how tightly they're packed.

Here's a breakdown of the estimation:

Boeing 737 Dimensions: A Boeing 737 has a cabin volume of approximately 3,000 cubic meters.

Golf Ball Volume: A golf ball has a volume of about 0.000004 cubic meters.

Calculation: Dividing the cabin volume by the golf ball volume (3,000 / 0.000004) yields an estimated 750,000 golf balls. However, this calculation assumes the balls are packed perfectly, which is unlikely.

Practical Considerations: In reality, you'd need to account for the space taken up by the plane's structure, seats, aisles, and other equipment, which reduces the usable space for golf balls.

Final Estimate: Therefore, a more realistic estimate would be around 1.5 to 2 million golf balls, which accounts for the inefficiencies of packing and the space taken up by the plane's interior.

Commentary: There are so many problems with this.

A) The actual diameter of a golf ball is 4.3 cm, so its volume is 4/3 * pi * (4.3/2)^3 = 42 cm^3. There's definitely one million cubic centimetres in a cubic metre because it's (100cm)x(100cm)x(100cm). Dividing 42 by one million gives 0.000042 cubic metres unless I'm going crazy. So approximately 0.00004 cubic metres not 0.000004 cubic metres, out by one order of magnitude.

B) Cabin volume is 3000 cubic metres. Really? Since it's about 4m wide by 2m high, it would have to be nearly 400 metres long for that to be true! Actual length 40 metres approx, actual volume 320 cubic metres approx. Out by one order of magnitude (again).

C) 3000 / 0.000004 = 750,000,000 not 750,000! This time the AI is just doing basic arithmetic and is out by three orders of magnitude. The actual calculation should be 300 / 0.00004 = 7,500,000. The various order of magnitude errors at each stage partially cancel each other, leaving us just one order of magnitude out.

D) Finally the various practical considerations it quite correctly raises mean we should reduce the number of golf balls we estimate, but the AI goes the wrong way, increasing it by a factor of 3 or so from less than a million to 1.5 to 2 million.

Conclusion: It's a hallucination raised to the fourth power.
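For anyone who wants to re-run the numbers in A) to C), here is a small sketch in C++. The function names are made up for the example, and the result is only an upper bound since it ignores packing gaps, seats, and structure.

```cpp
#include <cassert>
#include <cmath>

// Volume of a sphere with the given diameter (result is in diameter units cubed).
inline double sphere_volume(double diameter) {
    const double pi = std::acos(-1.0);
    const double r = diameter / 2.0;
    return 4.0 / 3.0 * pi * r * r * r;
}

// Upper bound on the ball count: cabin volume divided by ball volume.
inline double max_balls(double cabin_m3, double ball_diameter_cm) {
    const double ball_m3 = sphere_volume(ball_diameter_cm) / 1e6;  // 1 m^3 = 1e6 cm^3
    return cabin_m3 / ball_m3;
}

inline void golf_ball_checks() {
    // A) A 4.3 cm diameter gives roughly 42 cm^3.
    assert(std::round(sphere_volume(4.3)) == 42.0);
    // B) 4 m x 2 m x 40 m is 320 m^3, not 3,000.
    assert(4.0 * 2.0 * 40.0 == 320.0);
    // C) The quoted division really is 750 million, not 750 thousand.
    assert(std::abs(3000.0 / 0.000004 - 750'000'000.0) < 1.0);
    // With the corrected inputs the bound lands between 7 and 8 million.
    assert(max_balls(320.0, 4.3) > 7.0e6 && max_balls(320.0, 4.3) < 8.0e6);
}
```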

pjmlp 4 days ago

Indeed, at least it is on the right path to be hired by a wannabe FAANG startup.

jjmarr 5 days ago

If you want to reuse memory in C++, you'll either have to modify the string or return a string_view because strings must be null terminated (string_views are not). If you just chop off the last n bytes of a string, it won't be a string anymore.

I personally use std::string_view as much as possible especially for compile-time constants. Then you can slice as much as you want without reallocating.
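A short sketch of that pattern, assuming C++17. The helper names `drop_last` and `to_owned` are invented for the example. Slicing the view only adjusts a pointer and a length, which is also why the result is not null terminated at its own end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: drop the last n bytes without copying or allocating.
constexpr std::string_view drop_last(std::string_view s, std::size_t n) {
    s.remove_suffix(n < s.size() ? n : s.size());
    return s;
}

// Compile-time constant and a slice of it, both evaluated at compile time.
constexpr std::string_view kGreeting = "hello, world";
constexpr std::string_view kHello = drop_last(kGreeting, 7);
static_assert(kHello == "hello");
// The byte just past the view is ',' from the original literal, not '\0'.
static_assert(kGreeting[kHello.size()] == ',');

// Copy into a std::string when a null-terminated buffer is required,
// for example before calling a C API.
inline std::string to_owned(std::string_view s) { return std::string(s); }

inline void string_view_example() {
    assert(to_owned(kHello) == "hello");
    assert(to_owned(kHello).c_str()[5] == '\0');
}
```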

madduci 5 days ago

Exactly, same as for base64 encoding, sha256/512 hashes and many more.

pjmlp 5 days ago

In C++ frameworks it has existed for ages.

Why not in ISO C++?

Welcome to the ways of ISO and committee driven development, apparently no one cared enough to submit a paper, and do the work to win the paper voting into the standard.