I compile a lot of C++ code from a lot of places, and the only time I run into code that somehow simply doesn't work on newer versions of C++ and where the developers aren't even sure if they will accept any patches to fix the issue as they claim it "isn't supported" to use a newer version of C++--even for the public headers of a library--is, you guessed it: code from Google.
Meanwhile, most of the C++ code from Google seems to be written in some mishmash of different ideas, always at some halfway point along a migration between something ancient and something passable... but never anything I would ever dare to call "modern", and thereby tends to be riddled with state machines and manual weak pointers that lead to memory corruption.
So... I really am not sure I buy the entire premise of this article? Honestly, I am extremely glad that Google is finally leaving the ecosystem, as I generally do not enjoy it when Google engineers try to force their ridiculous use cases down peoples' throats, as they seem to believe they simply know better than everyone else how to develop software.
Like... I honestly feel bad for the Rust people, as I do not think the increasing attention they are going to get from Google is going to be at all positive for that ecosystem, any more than I think the massive pressure Google has exerted on the web has been positive or any more than the pressure Google even exerted on Python was positive (not that Python caved to much of it, but the pressure was on and the fact that Python refused to play ball with Google was in no small part what caused Go to exist at all).
(FWIW, I do miss Microsoft's being in the space, but they honestly left years ago -- Herb's existence until recently being kind of a token consideration -- as they have been trying to figure out a tactical exit from C++ ever since Visual J++ and, arguably, Visual Basic, having largely managed to pivot to C# and TypeScript for SDKs long ago. That said... Sun kicking Microsoft out of Java might have been really smart, despite the ramifications?)
> code from Google.
I spilled my coffee; I was just talking the other day with some coworkers about how I don't trust google open source. Sure, they open their code, but they don't give a damn about contributions or making it easy for you to use the projects. I feel a lot of this sentiment extends to GCP as well.
So many google projects are better than your average community one, but they never gain traction outside of google because it is just too damn hard to use them outside of google infra.
The only Google project that seems to evade this rule that I know of is Go.
> but they don't give a damn about contributions
Here is a concrete reason why Google open source sucks when it comes to contributions and I don't think it can be improved unless Google changes things drastically: (1) an external contributor makes a nice change and a PR on GitHub; (2) the change breaks internal use cases and their tests; (3) the team is unwilling to fix the PR or port the internal test (which may be a test several layers down the dependency tree) to open source.
> making it easy for you to use the projects
Google internally uses Blaze, the internal counterpart of Bazel. It's so ridiculously easy for one team to use another team's project that even just thinking about what the rest of us need to do to use another project feels like dreadful, thankless work. So people don't make that effort.
I do not see either of these two points changing. Sure there are individuals at Google that really care about open source community, but most don't, and so their project is forever a cathedral not a bazaar.
It is not only that, but often when google uses an open source project not owned by them they either try to take ownership of the project or fork it instead of trying to contribute to the original.
That's pretty common though? I mean isn't that part of the idea of open source? Forking is a pretty central part.
I don't see a problem here. Why should google have to deal with the opinions of a maintainer if they can just maintain their own version? Yeah, obviously it would be nice if they'd contribute their changes back to the upstream repo, but from a business perspective it's often not worth it.
At my company the inverse of this problem happened way more often: we find a problem but the maintainer just doesn't care. The backward-cpp library is a good example, where the maintainer just isn't that active in the issues. Why wait for him to respond if you can just fork it and keep on moving?
Which cases did you have in mind? Seems like it should be easy to find half a dozen examples since you claim it happens often.
KHTML, officially discontinued in 2023. -- "Embrace, extend, and extinguish" (EEE), also known as "embrace, extend, and exterminate", is a phrase that the U.S. Department of Justice found was used internally by Microsoft to describe its strategy. It's also possible that President-elect Donald Trump may interfere with the DOJ's proposed remedies; he said on the campaign trail that a Google break-up may not be desirable since it could "destroy" a company that the US highly values.
The GP's complaint was that Google "took over projects" or "forked them without trying to contribute to the original".
In the case of KHTML, they never used it in the first place, so it seems like a particularly inappropriate example. I assume you actually meant Webkit? In that case, they spent half a decade and thousands of engineer-years contributing to Webkit, so it doesn't fit the original complaint about not "trying to contribute" either.
November 4, 1998; 26 years ago (KHTML released)
June 7, 2005; 19 years ago (WebKit open-sourced)
https://chromium.googlesource.com/chromium/src/+/HEAD/third_... * (C) 1999-2003 Lars Knoll ([email protected])
* (C) 2002-2003 Dirk Mueller ([email protected])
* Copyright (C) 2002, 2006, 2008, 2012 Apple Inc. All rights reserved.
* Copyright (C) 2006 Samuel Weinig ([email protected])
"...they never used it in the first place" I think the point is that KHTML was already forked into webkit by apple long before google came along (though, they have in fact also now forked webkit into blink).
Thank you, I rest my case. I didn't even need to bring up the DragonEgg cartel (Chandler?) going down the gcc-llvm-clang pathway, used essentially for getting rid of the pesky GPL quoted above. With BSD-style licensing, the source code is no longer any of your business (not to mention the chrome-chromium differences, along with the textbook AndroidTV tivoization).
> I didn't even need to bring up the DragonEgg cartel (Chandler?) going down the gcc-llvm-clang pathway used essentially for getting rid of the pesky GPL quoted above.
That's... not even close to what happened?
Historically, LLVM was at one point proposed by Chris Lattner, while he was at Apple, to be upstreamed into GCC (and relicensed to GPL, natch) for use in the LTO optimization phase, which was declined. For most of its early existence, it used llvm-gcc as the frontend to generate LLVM IR. In the late '00s, serious effort was put into making a new frontend for LLVM IR, which we know as clang, primarily by Apple at that point; it became self-hosting in 2009 or 2010. Basically the moment clang became self-hosting, everyone jumped ship from using llvm-gcc to using clang to make LLVM IR.
Google shows up around this time, I think primarily motivated by the possibility that Clang offered for mass rewriting capabilities, since it has extraordinarily good location tracking (compared to the other compilers available), which is necessary for good rewriting tools. The other major area of Google's focus at this time is actually MSVC compatibility, and I distinctly remember Chandler talking in one of his presentations that you need to be able to compile code to trust it well enough to rewrite your code, so I think the compatibility story here was mostly (again) for rewriting.
Also around this time, gcc gains proper plugin support, and llvm-gcc is reworked into dragonegg to take advantage of the proper plugin support. But because clang now exists, dragonegg is no longer very interesting, with almost all the residual attempts to use dragonegg essentially being limited to people trying to use it to get LLVM IR out of gfortran, as LLVM had no fully-working Fortran compiler at that point.
Again, that seems to be in no way demonstrating the pattern that was claimed to be happening often.
AFAIK Google did not take ownership of gcc, nor did they try to fork it without contributing to the original. They used GCC for a good couple of decades while contributing to it, but eventually switched to a different compiler. The same for clang, they neither "took it over" nor "forked it without trying to contribute".
https://web.archive.org/web/20241123183550/https://en.wikipe... https://web.archive.org/web/20241125065641/https://arstechni...
Ars is controversial on YC news. Who knew?
One could ask whether Google works ‘open source’ or more ‘source available’; the source is there but you cannot contribute, if you can build it at all
No, "open source" doesn't imply open contribution. The standard terminology is cathedral vs bazaar.
Just to add a different perspective: sometimes people mean Open Source[1] when they say "open source," and sometimes they don't.
Personally, I take the cathedral/bazaar distinction to indicate different development cadences and philosophies, rather than whether contributions are allowed/encouraged.
Various cathedral-style projects (eg: FreeBSD, Emacs) still actively take contributions and encourage involvement.
There's something even further along the spectrum that's "we provide dumps of source code, but don't really want your patches." I'm not sure what the best term is for that, but "source [merely] available" sometimes has that connotation.
The quintessential example of providing source while discouraging contributions is SQLite. Nobody would argue that it's merely source available. It is fully open source.
In fact "source available" usually means you can see the source code, but there are severe restrictions on the source, such as no permission to modify the source even for your own use, or no permission to create forks of the project containing the modifications, or severe restrictions on such modifications. An example is MongoDB's Server Side Public License, which is source-available but not open source.
I think it depends on the contribution. I sent a bug report with a minimal test case. It was welcomed and quickly fixed. It is not a source code contribution, but I think it is a contribution.
OP is specifically talking about code contributions. You can (I have) make that type of contribution to proprietary software.
> sometimes people mean Open Source[1] when they say "open source," and sometimes they don't.
And when they don't when talking about source code, they are wrong. If someone says that an RJ45 cable is "a piece of software" because it's "soft" (you can bend it), would you say it's just a different perspective?
Open source, in the context of software, has a particular meaning. And it is the case that many software developers don't know it, so it's worth teaching them.
While I, too, believe that words should mean things, I don't think it's quite so cut-and-dry in this particular case. Part of the reason the term could not be trademarked was because it is too descriptive; it's easy for people to put those words together to describe software.
I agree that the OSI meaning is worth teaching. But perhaps not by saying "you're wrong; there is only one right way." Perhaps more like "some people attach XYZ specific meaning to that phrase, please be aware of it. Also, here is some history of the term if you like."
----
Aside: On re-reading this, I wonder if it comes across as testy... I think I am just channeling my annoyance with the language police of the world, in general, who sour people's interest in topics with their gatekeeping behavior. I don't mean it too personally towards you (:
To take a step back, it came from this comment:
> One could ask whether Google works ‘open source’ or more ‘source available’; the source is there but you cannot contribute, if you can build it at all
The author of this comment says "if you can't contribute, shouldn't you consider it `source available` instead of `open source`?".
There is only one valid answer: "No, you should not. It is still open source even if you cannot contribute". The context is clear, we are talking about "open source" vs "source available", which are both very specific in this context.
> I think I am just channeling my annoyance with the language police of the world, in general, who sour people's interest in topics with their gatekeeping behavior. I don't mean it too personally towards you (:
No offense taken, and I don't mean it personally either =). My point is just that in this context, the author of the comment was pretty clearly talking (asking, even?) about the difference between "open source" and "source available".
I don't even think it's shutting down the author: there was no other point than this, so the "thread" started by this author was purely about the meaning of those words.
Maybe you already know this and have discarded it (if so, no worries), but for what it's worth, this is my perspective on these things: Some people, in some contexts, use words like a laser — very specific, very targeted, with precise meanings, etc. Other people, other times (perhaps most people, most of the time?) use words more like ... a bucket of paint. Words are sloshy and approximate and about as precise as trying to sign your name using that bucket. Each has their value.
Inevitably, a laser-minded person talks with a sloshy-bucket person and misunderstandings ensue.
In sloshy-bucket land, I think "open source" has various connotations — a sense of community, encouraged contribution, being able to build it yourself, improve it yourself, etc.
And I think the commenter, in broad strokes, was saying that Google is not upholding those various virtues that are often associated with "open source," so felt the term was not a good (sloshy) fit.
In particular, I do not think they were asking the question you say they were asking.
In this space, it seems like there are both too many terms (so people rather just pick a popular one and over-apply it) and too few (so you can never find one that quite says what you want). Such is life, I guess. Maybe "open sourcey" would be good, to indicate it's talking about a hand-wavy vague "ness" rather than a particular nailed-down definition. "Google isn't being very open sourcey"? ¯\_(ツ)_/¯
Anyway, all this to say: in the ethos of trying to take a charitable interpretation of people's words, I think it's good to consider the bucket-of-paint possibility, before jumping to corrections and yes/no determinations.
----
Edit: It occurs to me that originally I misinterpreted you as being persnickety, when perhaps you were just trying to answer the question you felt they had asked. Sorry!
Note that I did not write the original answer: I answered to you :-).
> And I think the commenter, in broad strokes, was saying that Google is not upholding those various virtues that are often associated with "open source," so felt the term was not a good (sloshy) fit.
Totally valid! And I like the idea of considering the "bucket-of-paint" possibility before saying "no you're wrong". But on the other hand, sometimes it's worth agreeing on the meaning of words while discussing something.
I feel like I actually happen to regularly be on the bucket-of-paint side. I will often simplify the part of the discussion that I feel is not relevant by saying e.g. "okay this solution is bad, so if we look into this other solution we have to think about ...". And sometimes people really care about starting a discussion saying "by saying it's bad, you make it sound like whoever would think about it is stupid, and that's extreme. This solution is not necessarily bad, because in some situations it may work even though it is suboptimal". To which I tend to say "sure, I said it was bad as a way of saying that we seemed to agree that we would focus on the other one".
Until this point it's perfectly fine for me. What frustrates me is when the discussion continues in what I feel sounds like, e.g. "no, I think that your saying it is bad reflects that you disrespect whoever would think about it, and you should never have used that word in the first place. I am not sure I can ever have a meaningful discussion with you now that you used this word in this sentence, even if you later admitted that it was an oversimplification".
Anyway, communication is hard :-)
Googletest is the most widely used test library for C++. Googlemock is the only mocking library available that's reasonably feature complete.
If you are using googletest, you owe it to yourself to check out catch2, which I find much better and which uses modern C++. There are a few other test frameworks in C++ that look better than google test as well, but catch2 is the one I settled on (and it seems to be the best supported): feel free to check them out.
I've given up on mock frameworks. They make it too easy to make an interface for everything and then test that you are calling functions with the expected parameters instead of testing that the program works as you want. A slight change to how I call some function results in 1000 failed tests, and yet I'm confident that I didn't break anything the user could notice (sometimes I'm wrong in this confidence - but none of the failing tests give me any clue that I'm wrong!)
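To illustrate the failure mode, here is a minimal sketch of that kind of over-specified test; Mailer, MockMailer, and Greet are hypothetical names, not from any real codebase:

    #include <gmock/gmock.h>
    #include <gtest/gtest.h>
    #include <string>

    // The interface exists mostly so it can be mocked.
    class Mailer {
     public:
      virtual ~Mailer() = default;
      virtual void Send(const std::string& address, const std::string& body) = 0;
    };

    class MockMailer : public Mailer {
     public:
      MOCK_METHOD(void, Send, (const std::string&, const std::string&), (override));
    };

    // Code under test.
    void Greet(Mailer& mailer, const std::string& address) {
      mailer.Send(address, "Hello!");
    }

    TEST(GreetTest, SendsExactArguments) {
      MockMailer mailer;
      // Pins the exact call and arguments: tweak the greeting text or add a
      // parameter and this test fails, even though nothing user-visible broke.
      EXPECT_CALL(mailer, Send("bob@example.test", "Hello!"));
      Greet(mailer, "bob@example.test");
    }

Multiply that by a thousand call sites and you get the brittleness described above.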
catch2 has become fairly bloated. doctest takes all of the best parts of catch2 without all the bloat and the end result is a test framework that is literally over 10x faster than catch2. It's also like 90% compatible with catch2 so porting your tests to it is pretty easy.
Especially if you have a build process that always runs your unit tests, it's nice to have a very fast test/compile/debug loop.
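As a rough illustration of the overlap, here is a sketch assuming doctest; the same test body should compile under Catch2 v3 if you swap the include and link against its provided main:

    // doctest setup; for Catch2 v3 you would instead include
    // <catch2/catch_test_macros.hpp> and link against Catch2WithMain.
    #define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
    #include <doctest/doctest.h>

    static int add(int a, int b) { return a + b; }

    // TEST_CASE, CHECK and REQUIRE exist in both frameworks, which is why
    // porting is mostly mechanical.
    TEST_CASE("add handles negatives") {
        REQUIRE(add(2, 2) == 4);
        CHECK(add(-2, 2) == 0);
    }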
>catch2 has become fairly bloated. doctest takes all of the best parts of catch2 without all the bloat and the end result is a test framework that is literally over 10x faster than catch2. It's also like 90% compatible with catch2 so porting your tests to it is pretty easy.
I feel like you could make a madlib where you could plug in any two project names and this sentence would make sense.
Madlibs have become fairly bloated. Copypasta memes take all the best parts of madlibs without all the bloat and the end result is a form of mockery is literally over 10x faster than a madlib. It's also like 90% compatible with madlibs so porting your gibes is pretty easy.
I was just about to suggest doctest, you beat me to it! I'm all about faster compile times, and it was mostly a drop-in replacement for catch2 in my case.
Also, IMO, both doctest and catch2 are far superior to Google Test.
I've found exactly three places where I really want to have a mock available:
1) Databases and other persistent storage. Though in this case, the best mock for a database is generally another (smaller, easily snapshottable) database, not something like googlemock.
2) Network and other places where the hardware really matters. Sometimes, I really want to drop a particular message, to exercise some property of the sender. This is often possible to code around in greenfield projects, but in existing code it can be much simpler to just mock the network out (a hand-rolled fake for this case is sketched after this list).
3) Cases where I am calling out to some external black-box. Sometimes it's impractical to replicate the entire black-box in my test. This could be e.g. because it is a piece of specialized hardware, or it's non-deterministic in a way that I'd prefer my test not to be. I don't want to actually call out to an external black-box (hygiene), so some kind of a mock is more or less necessary.
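For case 2, a hand-rolled fake is usually enough. A minimal sketch, where Transport and DroppingTransport are hypothetical names:

    #include <string>
    #include <vector>

    // The seam: production code talks to the network only through this interface.
    class Transport {
     public:
      virtual ~Transport() = default;
      virtual bool Send(const std::string& message) = 0;
    };

    // Fake that drops the Nth message, so a test can exercise the sender's
    // retry/flush behaviour without a real network or a mocking framework.
    class DroppingTransport : public Transport {
     public:
      explicit DroppingTransport(int drop_index) : drop_index_(drop_index) {}

      bool Send(const std::string& message) override {
        const bool dropped = (count_++ == drop_index_);
        if (!dropped) delivered_.push_back(message);
        return !dropped;
      }

      // Tests assert on what was actually delivered, not on call signatures.
      const std::vector<std::string>& delivered() const { return delivered_; }

     private:
      int drop_index_;
      int count_ = 0;
      std::vector<std::string> delivered_;
    };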
For 1 have you looked at test containers?
Briefly, but frankly: copying small SQLite files around works so well in almost all cases that I don't feel the need for a new abstraction.
Sounds like the mocks are overused or used inappropriately in your experience (whether by a colleague or yourself).
Mocks have their place. A prototypical example is at user-visible endpoints (eg: a mock client).
I have found in my world it is easy to setup a test database (we use sqlite!) and the file system is fast enough (I have code to force using a different directory for files). I have been playing with starting a dbus server on a different port in my tests and then starting the real server to test against (with mixed results - I need a better way to know when dbus is running). I have had great success by writing a fake for one service that is painful - the fake tracks the information I really care about and so lets me query on things that matter not what the function signature was.
I'm not arguing that mocks don't have their place. However I have found that by declaring I won't use them at all I overall come up with better solutions and thus better tests.
Exactly! This one gets it, real communism has never been tried! On another note, I do not find it tiresome at all that any critique of any pattern/technique in SWE is always met with the "you're holding it wrong" rebuttal.
Do you not believe it's possible to hold something wrong? If someone is a skilled and experienced golfer, it's quite believable that they won't automatically be a skilled tennis player after three months of tennis playing. If someone is an experienced race car driver, they won't automatically be a skilled member of a basketball team. "You must be holding it wrong" can sometimes take years of practising holding it right, not just minutes or months.
If a team of people who have been SWEs for decades reports that something helped their team, and you try it and it doesn't work, and you have been SWEs for decades, that doesn't automatically mean they are charlatans selling nonsense. They might all be basketball players playing together for 5 years and you might be a team of a baseball player, a racecar driver, a track and field athlete, and a water polo player, trying to play basketball from only reading about it, with nobody who has done it or experienced it, and several people who quietly don't want to be playing it and are just nodding along while hoping it fails. The conclusion that they are liars and it can't possibly work is not a strong conclusion.
When I look closely, I discover that those people who tried agile and found it worked either were on much smaller projects with much simpler problems than large projects have, or they are not telling the full truth about agile (sometimes both). I'm glad agile works for small projects, but that it doesn't scale very well seems clear from all the large projects that have tried it and have gone back in major ways (generally not all the way back). The people who have failed projects still often sing the praises of agile, but we have no idea if the project would have failed if something else had been used.
I used to really like Google Test, and then Google decided in its infinite wisdom to make the OSS version depend on their C++ standard-library supplement Abseil, and not just that but the live-at-head version.
That makes sense internally for Google because they have their massive monorepo, but it sure as hell makes it a pain in the ass to adopt for everyone else.
I don't think you're reading those docs correctly. Googletest recommends living at head, but there's no reason you can't pin a release, either a git commit hash or a release label, of which there have been several. Googletest does not depend on the HEAD of abseil-cpp, it actually declares a direct dependency on an older LTS release of absl, but since you are building it from source any later release or commit of absl would work.
Google open source libraries are often a mess when you try to include more than one of them in the same project, but googletest isn't an example of the mess. It's actually pretty straightforward.
> Google open source libraries are often a mess when you try to include more than one of them in the same project
Completely agree. In isolation all of their libs are great, but inevitably I end up having to build Abseil from source, to then build Protobuf off of that, to then build gRPC off of that. If I can include the sanitizers under Google then that also becomes painful because Abseil (at least) will have ABI issues if it isn't built appropriately. Thinking about it I'd really just like a flat_hash_map replacement so I can drop Abseil.
Protobuf depending on Abseil (which has ongoing macOS build issues) is clinically insane. I tend to use protozero now, which trades two days' build heartache for half a day's boilerplate.
Wouldn't it be even more insane if protobuf had its own distinct string splitting/merging routines, its own flags and logging libraries, etc?
No. Not at all. String splitting is a couple of lines of code. I don't want to have to think about a logging framework just to read a protobuf - it can send stuff to stderr like everything else. If Google wants protobuf to be a widely accepted standard then it shouldn't require you to opt into their ecosystem to use it.
> Thinking about it I'd really just like a flat_hash_map replacement so I can drop Abseil.
boost has had a flat_hash_map implementation for quite a few versions now, which from what I could see generally beats or is competitive with the absl implementation: https://www.reddit.com/r/cpp/comments/yikfi4/boost_181_will_...
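If the codebase already funnels its map type through an alias, the swap can be close to a one-liner. A sketch, assuming Boost >= 1.81 is available:

    #include <boost/unordered/unordered_flat_map.hpp>
    #include <cstdio>
    #include <string>

    // Project-wide alias: switching between absl::flat_hash_map and
    // boost::unordered_flat_map becomes a one-line change here.
    template <class K, class V>
    using FlatHashMap = boost::unordered_flat_map<K, V>;
    // template <class K, class V>
    // using FlatHashMap = absl::flat_hash_map<K, V>;  // previous choice

    int main() {
        FlatHashMap<std::string, int> counts;
        counts["apple"] += 1;
        counts["pear"] += 2;
        for (const auto& [key, value] : counts) {
            std::printf("%s=%d\n", key.c_str(), value);
        }
    }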
The reddit thread mentions that the author was probably going to write a blog post about it at some point; I went and found it so you don't have to.
I was curious what exactly differentiates boost::unordered_flat_map from absl::flat_hash_map, and was not disappointed. It seems that the lion's share of the performance improvement comes from using more of the metadata for the reduced hash value, although there are a few other contributing factors.
The blog post further describes where absl::flat_hash_map performs better: iteration (and consequently erasure), which is ironic given those are a couple of areas where I always felt that absl::flat_hash_map was especially weak. But, it makes sense to double down on Abseil's strengths as well as its shortcomings.
https://bannalia.blogspot.com/2022/11/inside-boostunorderedf...
Iteration has been improved since, and now we’re beating Abseil on iteration plus erasure:
https://github.com/boostorg/boost_unordered_benchmarks/tree/...
Very cool!
I especially like how you can see the load factor across the graphs, where there are sharp downward spikes each time the map resizes, and how they vary as you move through the memory hierarchy.
I am curious what Abseil could learn from other modern hash map implementations, since my understanding is that the fundamental structure of its swisstables implementation hasn't changed meaningfully since 2017.
FWIW the flat hash map in Boost is now faster. I am not sure if integrating Boost is any easier for you.
I occasionally reconsider it so I can try a bunch of the FB alternatives (Folly, Thrift, CacheLib, etc.), but... yeah. Still just kind of waiting for a panacea.
It's been a few years to be fair, I stopped working with C++ in early 2021 or so so maybe I've just misremembered. I do remember having to take Abseil on where we previously didn't.
Google test and mock are quite powerful but are a big hit at both compile time and runtime, which matters for quick edit-compile-fix loops.
I still go back and forth on whether google test and mock are worth it.
Google benchmark is also nice.
> big hit at both compile time and runtime, which matters for quick edit-compile-fix loops
honestly if you write C++ for work, there's no excuse for your company to not give you the beefiest dev machine that money can reasonably buy. given that rust exists, I think "get a faster computer" is a totally valid answer to build times, especially now that skylake malaise era is over and CPUs are getting faster
> given that rust exists, I think "get a faster computer" is a totally valid answer to build times
I find this amusing because one of the main reasons i avoid Rust (in the sense that i prefer to build things written in other languages if possible - i don't mind if someone else uses it and gives me a binary/library i can use - and it never went beyond "i might check this at some point, sometime, maybe" in my mind) is the build times compared to most other compilers :-P.
Also, at least personally, if i get a faster computer i want my workflow to be faster.
You may want to add a '/s' at the end of your post there, because sarcasm doesn't really translate on the internet. The only way I can tell it's sarcasm is because nobody would really go 'throw away the old stuff, buy new stuff, waste more, pollute the oceans, consume, CONSUME!!!'.
Does it not support only running some or no tests? I only run the full test suite rarely, close to releases.
I blame monorepo culture. If it doesn't grow up in a context where it's expected to stand on its own, it crashes and burns when you kick it out of the nest.
I heard that Meta also has a monorepo but most of their open source projects are very community driven. I think it is a corporate mandate thing: no resources to be spent on open source, and open source contributions not being tracked as part of career development.
Meta does have a monorepo but their open source stuff lives outside it. Or at least it did when I worked on PyTorch (2019). I did all my work in the separate open-source PyTorch repo and then commits got mirrored back to the monorepo by some automated process.
You could also build and run it using completely standard tools; you didn’t need to download random internal source control software etc. like you do for e.g. Chromium.
Curious about the organizational dynamics around this kind of decision. There is no reason why google couldn't do the same.
I assume there is little will internally because everyone there is so focused on their performance reviews and helping external people using google open source projects is not tracked by that.
I think it's more of a strategic difference. Google seems like their long term planning involves thinking about open source less than Meta's. They're more wait-and-see about it.
React must've been destined to be open source from the get go: gotta create a mountain of js to hide in so the users can't strip out the malicious parts. Kubernetes on the other hand could've been internal forever and still would've made sense. It just happened to later make sense to open source it (it feels lopsided to me, like they kept certain parts secret. It wouldn't feel that way if they had planned it as OSS from the get go).
Tensorflow is/was decent. It looked like they made a lot of effort for it to be accessible for outsiders.
Have you tried building the damn thing?
The Nix build is still stuck on the one from 3-4 years back because bazel doesn't play well. Debian too has some issues building the thing...
As an industry we need to stop treating breaking changes as an acceptable thing. The rate of bit rot has accelerated to an absurd pace. I can't remember the package but I had to spend considerable time fixing a build because a package.. changed names.. for NO REASON. They just liked the new name better. This should be career death. You're wasting your fellow humans' time and energy on your vanity when you make a breaking change that is at all avoidable. I should be able to run a build script made 20 years ago and it should just work. No renamed package hunting, no WARNING WARNING DEPRECATED REWRITE ALL YOUR CODE FOR LEFTPAD 10.3 IMMEDIATELY in the console, no code changes, no fuss, we should expect it to just work. This state of affairs is a stain on our industry.
One day we will have bled enough and we'll switch to using cryptographic hashes of package contents (or of some recipe for deterministically building the thing on different architectures) instead of anything so flimsy as a name and version number.
For the humans, we can render the hashes as something friendly, but there's no reason to confuse the machines with our human notions of friendliness.
You’re basically describing nix and Guix.
They use a hash of the derivation and its inputs as a memoization strategy: providing yesterday's answer to today's question since it was asked yesterday. But so far as I know nobody's actually using those hashes for the initial request.
It's not like python will let you:
import nix.numpy-hsbdjd...8r5z2 as np
Such that the import mechanism ensures that the correct build of numpy is used. For that to work you'd have to change nix such that the hash did not digest parameters like `amd64-linux` which indicate the system architecture (you'd want those to be satisfied at import time).
In Guix at least (I assume also nix) you can build things from source with a verified hash. I.e. write a numpy package definition that says download the Numpy source from this URL, and expect its hash to be equal to this string. You could then depend on that package from another package ensuring it uses a numpy built from that bit-for-bit exact source tree. Does that not amount to the same thing as what you want?
this is why you build to a specific version of a library. drop your build script into a container with the versions of software it expects and it should do fine. containerization is the admittance that versioning environments is needed for most software. I expect the nix/guix crowds to win in the end.
Blindly wrapping a build script in a Dockerfile is not nothing, but it's no replacement for being careful while writing that script in the first place.
Otherwise I agree, because if you must be careful, you might as well use tooling that's built for such care. But if you're doing that, do you need the Dockerfile? And that's how you end up with nix/guix.
Having tried on other platforms, it's not Bazel, it's not even just Google.
It's python packaging, and the way the only really supported binary distribution method of Tensorflow for many, many years was to use Pip and hope it doesn't crash. And it's reflected in how the TF build scripts only support building the python lib as an artefact; everything else at the very least involves dissecting bazel intermediate targets.
The issue with Microsoft, until recently, has been the power of WinDev, which is the group responsible for anything C++ in Microsoft's dungeons.
Hence the failure of Longhorn, or any attempt coming out from Microsoft Research.
Ironically, given your Sun remark, Microsoft is back into the Java game, having their own distribution of OpenJDK, and Java is usually the only ecosystem that has day one parity with anything Azure puts out as .NET SDK.
I use the Microsoft JDK daily - to develop in Maui for Android. Other than that, I'm not too sure what anyone would use it for over the actual OpenJDK versions. I'm pretty sure the MS OpenJDK is mostly there to support pushing people to Azure (hence your observation) and Android. I don't think it is there for much else outside of that, but I'm happy to stand corrected if anyone has another use case for it.
It was thanks to Microsoft that you get to enjoy the JVM on ARM for example, or better escape analysis.
https://github.com/microsoft/openjdk-aarch64
https://www.infoq.com/news/2023/02/microsoft-openjdk-feature...
Sure, but the first link is surely only benefiting those using Windows on ARM? I do have Windows on ARM on a MacBook under VMWare, but my daily usage of Windows is under x64. Second link - not really knowing much about Java I don't know enough to comment. 99% of my Java use is indirect because it only gets touched by MSBuild when compiling my APK from C#.
What is "WinDev"? A quick search didn't turn up much except a French Wikipedia article.
Windows Development, as opposed to DevDiv, the Developer Division.
Two quite common names in the Microsoft ecosystem.
As a former MS employee some time ago I don't think I ever heard "windev". It was always referred to as "Windows". Though there were a lot of different groups within that, so sometimes you'd hear an initialism for a specific team. For example during some of my time there was a big organizational split between "core" and more UI oriented teams.
Here is an example in the press, with an email from Somasegar, the former leader of the developer division.
https://www.zdnet.com/article/microsoft-splits-up-its-xaml-t...
I was an employee in Windows on the date of that email. I left a few months later. Note that the email itself doesn't say "windev". It says "Windows" a bunch of times.
If I'm stretching this "windev" thing: the domain for a lot of employee accounts (including mine) was NTDEV, which had a longer history afaik, but nobody called an org that..
The journalist writes it though, as do many other folks.
I didn't come up with this definition myself.
If I am not mistaken, I can probably even dig up some Sinofsky references using it.
I think it was sort of externally derived based on "DevDiv", but as another former MS employee - albeit from DevDiv - I can confirm that "WinDev" is not something that was routinely used inside the company the way "DevDiv" is. Usually it's just "Windows", or "Windows org" if the context is ambiguous.
For a moment there I thought you were referring to this trademark: https://pcsoft.fr/windev/index.html Which was known at a time for having young women in light clothing in their marketing material.
aha, that's the windev that comes to mind too. I didn't know they were actually a french company, wild that they're still around... their ads were plastered everywhere in the 2000s.
Apparently they have a programming language for which you can "one-click-switch" between english and french for the keywords??? https://pcsoft.fr/windev/ebook/56/
The C++ from Google that people in the outside world are seeing is not the C++ the article is talking about. Chromium and open-sourced libraries from Google are not the same as C++ in Google3. I worked on both back in the day and... There are slightly different style guides (not hugely different), but most importantly the tooling is not the same.
The kind of mass refactorings / cleanups / static analysis talked about in this article are done on a much more serious and large scale on C++ inside the Google3 monorepo than they are in Chromium. Different build systems, different code review tools, different development culture.
Going from g3 to AOSP has been downright painful. It was like suddenly working at a different company, the contrast was so stark.
Interesting. I never worked in Android, but did in Chromium & Chromecast code bases. Biggest difference with Google3 was honestly in the tooling. Style guide was fairly close, maybe a bit more conservative. Also the lack of the core libs that eventually became Abseil.
I work full-time in Rust these days and every time I go back to working in C++ it's a bit of a cringe. If I look long enough, I almost always find a use-after-free, even from extremely competent developers. Footgun language.
Whatever gave you the idea Microsoft "left" C++ years ago? It has massive code bases in C++ and continues to invest in its compiler teams and actively tracks the C++ standard. It was the first compiler to implement C++20 mostly completely, including modules, which other compilers have yet to catch up to. Like other mature companies, Microsoft realized decades ago that they can't be a one-tech-dependent company and hence has code in C++ and .NET, and is now exploring Rust.
Cppwinrt is in maintenance mode[1]. Cppwin32 is abandoned (with windows.h as the official alternative). It is now possible to deploy WinUI 3 apps as single files in C#[2] but not in C++. From experience, the entire C++ side of WinUI 3 documentation is underbaked to the extent that the easiest approach is to read the C# documentation and attempt to guess the cppwinrt equivalent (as docs for cppwinrt are not really... there).
I don’t know if they’ve really abandoned C++ entirely—the compiler team certainly hasn’t, that’s true. But the above doesn’t feel like first-class support.
[1] https://github.com/microsoft/cppwinrt/issues/1289#issuecomme...
[2] https://learn.microsoft.com/en-us/dotnet/core/deploying/sing...
WinUI3 itself feels kind of abandoned. Heck, everything except desktop OS (which changes we neither need nor want) and cloud (where everyone has gone) feels a bit neglected.
C#/dotnet continues nicely, but the team is surprisingly small if you look closely.
Microsoft doesn't commit to UI frameworks in any language. By contrast, DirectX 11 and 12 (and Direct2D) are C++-native and have become core modules within NT. I don't think MS has abandoned C++, but the use case for C++ has shrunk considerably since the 1990s
If you go to the Visual C++ developer blog, you will notice it has been all about Unreal support during the last year, and not much else.
Besides the sibling comments: officially, Windows is going to undergo some rewrites under the Secure Future Initiative, and the MSVC team has been reduced in resources, to the point that now they are asking what features of C++23 people want to have.
https://developercommunity.visualstudio.com/t/Implement-C23-...
I suppose usually one would like to have everything from a language standard.
The C++20 winning run seems to have been one of a kind, and whatever made it possible is now gone.
Speaking of gone, Herb Sutter has left Microsoft, and that most certainly has something to do with whatever is going on with MSVC, C# improvements for low-level coding, and Rust adoption.
Being smart, well-educated, and knowing how to program isn't good enough for creating great code. It takes experience. I've been programming for 50 years now, and keep finding ways to make code more readable and more maintainable.
How do you find gimmicks from Bob Martin like (d + e*g), which in theory are great but which in practice would take loads of coaching to use?
I'm not familiar with that gimmick.
One thing I learned, for example, is do not access global immutable state from within a function. All inputs come through the parameters, all outputs through the parameters or the return value.
Global immutable or global mutable. I vehemently agree with the latter, but while I could definitely make a case for the former [1], I think it is a bit too extreme especially without language support.
Would you access a global M_PI constant? Or another function name? Or would you require every dependency to passed through?
[1] i.e. a total capability based system.
Global mutable state is to be avoided at all costs of course, but IMO global immutable state is to be avoided... at some costs.
The main issue comes in when you change (in the code! not as mutation!) the global immutable state and now you have to track down a bunch of usages. If it wasn't global, you could change it only in some local areas and not others.
You aren't likely to change M_PI to a new value (int 3 for performance?) so for pure constants, fine, global immutable state works. However many usages of global state are things like singletons, loggers and string messages that often eventually benefit from being passed in (i18n, testability etc.)
As to ergonomics, you can stuff all that global state into a single instance and have one more parameter that is passed around. It will still allow calls to eg change logging on their downstream functions much more easily than having singleton configuration.
As someone without a lot of experience (in my first dev job now), would you care to expand on this? Does this mean that you wouldn’t have a function fn() that manipulates a global variable VAR, but rather you’d pass VAR like fn(VAR)?
To expand on the other reply, some related things:
1. don't do console I/O in leaf functions. Instead, pass a parameter that's a "sink" for output, and let the caller decide what to do with it. This helps a lot when converting a command line program to a gui program. It also makes it practical to unit test the function (see the sketch after this list)
2. don't allocate storage in a leaf function if the result is to be returned. Try to have storage allocated and free'd in the same function. It's a lot easier to keep track of it that way. Another use of sinks, output ranges, etc.
3. separate functions that do a read-only gathering of data, from functions that mutate the data
Give these a try. I bet you'll like the results!
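A minimal C++ sketch of point 1; the function and names are hypothetical:

    #include <iostream>
    #include <sstream>

    // The leaf function writes to a caller-supplied sink instead of
    // talking to std::cout directly.
    void report_progress(std::ostream& sink, int done, int total) {
        sink << "processed " << done << " of " << total << " items\n";
    }

    int main() {
        // Production: the caller decides the output goes to the console.
        report_progress(std::cout, 3, 10);

        // Test: the same function writes into a string the test can inspect,
        // and a GUI front end could route it to a widget instead.
        std::ostringstream captured;
        report_progress(captured, 3, 10);
        // e.g. assert(captured.str() == "processed 3 of 10 items\n");
    }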
I heartily agree with #2 if the language isn't Zig. Which actually supports your point: allocating in leaf functions is idiomatic in Zig, and it works out fine, because there's no allocation without an Allocator, and even if that's passed in implicitly as part of a struct argument, error{OutOfMemory} will be part of the function signature. So there's no losing track of what allocates and what doesn't.
This actually supports your broader point about always passing state to functions, and never accessing it implicitly. Although I don't know that I agree with extending that to constants, but maybe with another several decades of experience under my belt I might come to.
Zig also makes it easy for 'constants' to change based on build-specific parameters, so a different value for testing, or providing an override value in the build script. I've found that to eliminate any problems I've had in the past with global constants. Sometimes, of course, it turns out you want those values to be runtime configurable, but as refactorings go that's a relatively straightforward one.
> So there's no losing track of what allocates and what doesn't.
Having an allocator implicitly passed in with a struct argument is not quite what I meant. D once had allocators as member functions, but that wound up being deprecated because the allocation strategy is only rarely tied to the struct.
There are some meaningful differences between Zig and D in this specific area, specifically, D uses exceptions and has garbage collection as the default memory strategy. That will surely result in different approaches to the leaf-allocation question being better for the one than for the other.
> Give these a try. I bet you'll like the results!
It sounds like too many words to refer to plain old inversion of control and CQRS. They're both tried and true techniques.
You've got the gist of it. By decoupling your function from the state of your application, you can test that function in isolation.
For instance, you might be tempted to write a function that opens an HTTP connection, performs an API call, parses the result, and returns it. But you'll have a really hard time testing that function. If you decompose it into several tiny functions (one that opens a connection, one that accepts an open connection and performs the call, and one that parses the result), you'll have a much easier time testing it.
(This clicked for me when I wrote code as I've described, wrote tests for it, and later found several bugs. I realized my tests did nothing and failed to catch my bugs, because the code I'd written was impossible to test. In general, side effects and global state are the enemies of testability.)
You end up with functions that take a lot of arguments (10+), which can feel wrong at first, but it's worth it, and IDEs help enormously.
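A sketch of that decomposition in C++; all names are hypothetical and the "parser" is deliberately toy-sized:

    #include <optional>
    #include <string>

    // The "open a connection / perform the call" part hides behind an
    // interface, so tests can substitute a fake that returns canned bodies.
    struct HttpClient {
        virtual ~HttpClient() = default;
        virtual std::string Get(const std::string& url) = 0;
    };

    // The parsing step is a pure function: trivial to test in isolation.
    std::optional<int> ParseUserCount(const std::string& body) {
        if (body.empty()) return std::nullopt;
        int value = 0;
        for (char c : body) {
            if (c < '0' || c > '9') return std::nullopt;
            value = value * 10 + (c - '0');
        }
        return value;
    }

    // The top-level function only wires the pieces together; pass a fake
    // HttpClient in tests and never touch the network.
    std::optional<int> FetchUserCount(HttpClient& client) {
        return ParseUserCount(client.Get("https://example.test/users"));
    }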
This pattern is called dependency injection.
https://en.wikipedia.org/wiki/Dependency_injection
See also, the "functional core, imperative shell" pattern.
Yes. Global variables or singletons are deeply miserable when it comes to testing, because you have to explicitly reset them between tests and they cause problems if you multithread your tests.
A global variable is a hidden extra parameter to every function that uses it. It's much easier if the set of things you have to care about is just those in the declared parameters, not the hidden globals.
Cool, I am just confirming my own bias against much of the „clean code” teachings. It might make the order of operations a bit easier to read - but no one uses it, so it doesn't matter.
There are lots of things that look like great methods, but experience with them often leads to disillusionment. For another example, Hungarian notation is a really great idea, heavily adopted by Microsoft Windows, and just does not deliver on its promises.
For example, types can have long names, but that doesn't work with HN. Changing a declaration to have a different type then means you've got endless cascading identifiers that need to be redone. And so on.
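A small sketch of that cascade, with hypothetical identifiers:

    #include <cstdio>

    int main() {
        // The prefix bakes the type into the name...
        unsigned long ulItemCount = 3;
        std::printf("%lu\n", ulItemCount);

        // ...so if the declaration later becomes a 64-bit type, every use of
        // ulItemCount has to be renamed (e.g. to ullItemCount) to keep the
        // notation honest, which is the cascading rework described above.
    }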
> Changing a declaration to have a different type then means you've got endless cascading identifiers that need to be redone.
This is actually a good thing, every mention of that identifier is a place that you might need to adapt for the new type. Hungarian notation is an excellent coping mechanism when you have to use compilers that don't do their own type checking - which used to be a huge issue when Hungarian notation was current.
On balance, it isn't a good thing. Having high refactoring costs means:
1. you become reluctant to do it
2. lots of diffs cluttering up your git history. I like my git history to be fairly narrowly targeted.
I don't use languages that don't do type checking. Microsoft uses Hungarian notation on their C interface and example code.
Could someone explain what this is since that expression is unsearchable?
So (d + e*g) is an example where, when you write mathematical operations, you put spaces around the lower-precedence ones and no spaces around the higher-precedence ones. This way you can grasp a bit faster which operation comes first: in (2 + 3*4) you know to evaluate 3*4 first, giving 12, and then add 2, giving 14 - and with variable names, of course, you are quicker to evaluate the result.
But no one has time to craft such details in the code.
I only have 20 years of development experience, so I'll defer to Walter here, but if I were to write that equation it would look like `d + (e * g)`. I don't trust mine or anyone's understanding of operator precedence. Just look at how ridiculously hard to read their implementations in parsers are.
Specifically d+e*g I might make an exception for in a code review (and allow it), since it's such a widely known precedence in mathematics you can expect the reader and writer to know the way it goes, but any more complex and I'd reject it in the review for lack of parentheses.
Operator precedence is so deeply burned into my brain I would never think of adding parens for it or modify the spacing.
I will use parens, however, for << and a couple other cases. It would be a shame to use lack of spacing to imply precedence, and yet get it wrong. Oops!
I also like to line up things to make vertical formatting of similar expressions, something a formatting program doesn't do. Hence I don't use formatters.
Parens were not the main part - the main part is writing multiplication without spaces and addition with spaces.
I would say it is a neat detail, but if no one cares or uses it, it is pretty much a "feel good about yourself" thing and not a practical one.
.. that seems like a strange optimization when there's a tool to indicate to both reader and compiler which operations will be performed first: brackets!
I second the observation of the state of Google C++. Just look at Chromium. There is a lot of unfinished refactoring there, as if people lost interest the moment the clean refactoring hit a roadblock requiring effort to communicate with other teams. Only by a sort of direct order from management can things be completed.
> Honestly, I am extremely glad that Google is finally leaving the ecosystem, as I generally do not enjoy it when Google engineers try to force their ridiculous use cases down peoples' throats, as they seem to believe they simply know better than everyone else how to develop software.
Well, you may be celebrating a bit prematurely then. Google still has a ton of C++ and they haven't stopped writing it. It's going to take ~forever until Google has left the C++ ecosystem. What did happen was that Google majorly scaled down their efforts in the committee.
When it comes to the current schism on how to improve the safety of C++ there are largely two factions:
* The Bjarne/Herb [1] side that focuses on minimal changes to the code. The idea here is to add different profiles to the language and then [draw the rest of the fucking owl]. The big issue here is that it's entirely unclear on how they will achieve temporal and spatial memory safety.
* The other side is represented by Sean Baxter and his work on Safe C++ [2]. This is basically a wholesale adoption of Rust's semantics. The big issue here is that it's effectively introducing a new language that isn't C++.
Google decided to pursue Carbon and isn't a major player in either of the above efforts. Last time I checked, that language is not not meant to be memory safe.
[1] https://github.com/BjarneStroustrup/profiles [2] https://safecpp.org/draft.html
(Carbon lang dev here.)
Carbon is intended to be memory safe! (Not sure whether you intended to write a double negative there.) There are a few reasons that might not be clear:
* Carbon has relatively few people working on it. We currently are prioritizing work on the compiler at the moment, and don't yet have the bandwidth to also work on the safety design.
* As part of our migration-from-C++ story, where we expect code to transition C++ -> unsafe Carbon -> safe Carbon, we plan on supporting unsafe Carbon code with reasonable ergonomics.
* Carbon's original focus was on evolvability, and didn't focus on safety specifically. Since then it has become clear that memory safety is a requirement for Carbon's success, and will be our first test of those evolvability goals. Talks like https://www.youtube.com/watch?v=1ZTJ9omXOQ0 better reflect more recent plans around this topic.
Not super familiar with Carbon but .. what's the elevator pitch for porting my C++ to unsafe Carbon? Can it be done with an automated refactoring tool or something?
I feel like if I'm gonna go through the whole nightmare of a code port I should get something for it as opposed to just relying on interop
The idea is that it is an incremental process. By default you should be able to make minimal changes to your code and it should mostly just work. Over time you can use features that more tightly couple you to Carbon, such as memory safety. Google's motivation is supporting its massive C++ codebase while providing a path for memory safety and other features. If your use case does not closely mirror Google's, namely that you have 10+ year old code that you intend to keep maintaining, Carbon probably doesn't make sense for you, and that is generally made pretty clear to anyone interested in the language.
Thanks for the correction, I appreciate it!
The double negative was not intended :)
People always like to talk about Carbon like that, yet the team is the first to point out that anyone who can use something else should.
Carbon is an experiment; they aren't even sure how it is going to work out in the first place.
> "If you can use Rust, ignore Carbon"
https://github.com/carbon-language/carbon-lang/blob/e09bf82d...
> "We want to better understand whether we can build a language that meets our successor language criteria, and whether the resulting language can gather a critical mass of interest within the larger C++ industry and communit"
https://github.com/carbon-language/carbon-lang/blob/e09bf82d...
Carbon isn't currently memory safe, but Chandler Carruth has made it clear that every security expert he talked to says the same thing: memory safety is a requirement for security.
He at least claims that Carbon will have memory safety features such as borrow checking down the line. I guess we'll see.
It’s worrying to me that Carbon separates data races and memory safety as two distinct things when data races can easily cause both spatial and temporal memory safety issues. Similarly, a lack of type safety can also cause spatial issues (e.g. many kernel exploits in Darwin were the result of causing type confusion in the SLAB allocator, resulting in an exploitable memory safety issue).
The entire philosophy errs too much in the direction of “being reasonable” and “pragmatic” while getting fundamental things wrong.
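A deliberately broken sketch of why the separation is blurry: the data race below is undefined behaviour and can surface as a temporal memory safety bug, since one thread may reallocate the vector's buffer while the other still reads through the old one.

    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> values;
        values.reserve(4);

        // Writer: push_back may reallocate the backing storage.
        std::thread writer([&] {
            for (int i = 0; i < 100000; ++i) values.push_back(i);
        });

        // Reader: races with the writer; front() can dereference a pointer
        // into an allocation that was just freed by a reallocation.
        std::thread reader([&] {
            for (int i = 0; i < 100000; ++i) {
                if (!values.empty()) {
                    volatile int x = values.front();
                    (void)x;
                }
            }
        });

        writer.join();
        reader.join();
    }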
> Over time, safety should evolve using a hybrid compile-time and runtime safety approach to eventually provide a similar level of safety to a language that puts more emphasis on guaranteed safety, such as Rust. However, while Carbon may encourage developers to modify code in support of more efficient safety checks, it will remain important to improve the safety of code for developers who cannot invest into safety-specific code modifications.
That’s really just paying lip service to Rust without recognizing that the key insight is that optional memory safety isn’t memory safety.
It is kind of neat just how much Rust has managed to disrupt the C++ ecosystem and dislodge its position.
In principle data races can cause memory safety issues, but they are usually very hard to exploit.
Java guarantees VM integrity in the face of data races, while for example many races are UB in theory and in practice in Go. Both are considered safe languages.
Sometimes pragmatism is in fact a valid goal.
edit: from a practical point of view I don't know how realistic is to retrofit memory safety to a language that lacks it.
An interesting claim to make. Do you have evidence to support that position?
I could just as easily offer a valid counter-analysis to explain the data: they're "hard" only because there are so many easier avenues that attackers often don't bother, not because they're intrinsically unlikely. Let's say you're successful in eliminating the temporal and spatial classes of failures completely (even Rust proponents do not claim this). They'll focus on data races and type confusion next.
> Both are considered safe languages
Go is considered a memory safe language today because C and C++ are the anchor we compare against, and we have overwhelming evidence against them (but they're also where a huge amount of value is, in terms of the systems they underpin). In 50 years' time, it's not inconceivable that a lot of these languages may lose their memory-safety designation if their runtimes continue to be written in C/C++ (Java) and/or data races remain unaddressed at the language level (Go), and we accumulate overwhelming evidence that exploits have just moved on to architectural defects in those languages.
It may raise costs of exploits but cybercrime is estimated to be a 10T dollar market next year so there’s clearly a lot of money to put towards exploits.
> from a practical point of view I don't know how realistic is to retrofit memory safety to a language that lacks it
I think letting people evolve into a more memory safe situation is good. I think doing it partially instead of tackling memory safety in all its forms is just asking for trouble - your attackers will be able to develop exploits more quickly than you are able to update all existing code to more secure language features.
> Herb side that proposes minimal changes
Herb is developing a whole second syntax; I wouldn't call that minimal changes. And it's probably the only way to evolve the language at this point, because, like you said, Sean is introducing a different language entirely, so it's not C++ at that point.
I really like some of Herb's ideas, but it seems less and less likely they'll ever be added to C++.
Have you seen some of his recent talks? Lots of the underpinnings of cppfront have been added or are in committee.
He compares it to the JS/TS relationship.
Nope, that is mostly a sales pitch; the only thing added thus far has been the spaceship operator.
He also sells the language differently from any other language that compiles to native code via C++, like Eiffel and Nim among others, because of the conflict of interest in having the WG21 chair propose yet another take on C++.
It's not really a valid comparison though. cppfront is a different language that just happens to be compatible with C++. TS/JS is where TS is just JS with types: you can comment out the types and it just runs. With cppfront's language you'll actually have to rewrite the code to get it to compile as C++.
typescript

    function add(a: number, b: number): number { return a + b };

javascript

    function add(a/*: number*/, b/*: number*/)/*: number*/ { return a + b };

cppfront

    add: (a: float, b: float): float = { a + b; }

cpp

    float add(float a, float b) { return a + b; }
> TS/JS is where TS is just JS with types: you can comment out the types and it just runs.
Is this true in the general case? I thought there were typescript features that didn't have direct JavaScript alternatives, for example enums.
Enums and namespaces are the only runtime features of TypeScript.
So, yes, you can't just strip types, but it's close.
Is there a comprehensive list of such incompatibilities documented somewhere?
That's not the same.
That guarantees that the types do not determine the output (e.g. no const enums), not that you can "strip" types to get the same output.
Not that I'm aware of.
Decorators would be another example. (Though they have always been marked experimental.)
And of course JSX, but that's not a TypeScript invention.
Do you realize that the TypeScript example contains strictly more information than the JavaScript one (namely, type declarations for three things) and is therefore more complex to compile, while the two C++ examples are semantically identical (the last expression in the function is returned implicitly, without having to write "return") and the new syntax is easier to parse?
There are several semantic differences between Cpp1 and Cpp2. Cpp2 moves from last use, which is the biggest one. In a contrived example, that could result in a "hello world" changing to "goodbye world" or any other arbitrary behavior change you want to demonstrate. Cpp2 also doesn't require you to order functions and types or declare prototypes, which means partial template specializations and function overloads can produce similar changes when migrating from Cpp1 to Cpp2.
I've written a little demo here: https://godbolt.org/z/xn1eqd5zb
You can see where CPPFront inserts a `cpp2::move` call automatically, and how that differs from a superficially equivalent Cpp1 function.
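For readers who don't want to click through, here is a rough Cpp1-only illustration of the kind of behavior change being described. This is my own contrived example, not the one in the godbolt link: if a migration tool decides a variable is at its last use and inserts a move, anything observable about the moved-from object can change.

    #include <iostream>
    #include <string>
    #include <utility>

    // Contrived type whose moved-from state is observably different,
    // to mimic the "hello world" -> "goodbye world" style of change.
    struct Greeting {
        std::string text = "hello world";
        Greeting() = default;
        Greeting(const Greeting& other) : text(other.text) {}
        Greeting(Greeting&& other) : text(std::move(other.text)) {
            other.text = "goodbye world";   // source changes observably on move
        }
    };

    void sink(Greeting) {}                  // takes its argument by value

    int main() {
        Greeting a;
        sink(a);                            // Cpp1: copies
        std::cout << a.text << "\n";        // prints "hello world"

        Greeting b;
        sink(std::move(b));                 // what an inserted move-from-last-use does
        std::cout << b.text << "\n";        // prints "goodbye world"
    }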
yes, of course. That's not my point. My point is TypeScript succeeds because it's just JavaScript with types. It's not a new language. cppfront is an entirely new language so it's arguably going to have a tougher time. Being an entirely new language, it is not analogous to typescript
> He compares it to the JS/TS relationship.
OP is right. TypeScript is a whole new syntax, and its shtick is that it can be transpiled into JavaScript.
This phenomenon is mostly because, as the article notes, Google has one of the largest C++ deployments in the world. And since much of the C++ code needs to be extremely platform-agnostic (any given library might be running in a web service, a piece of Chromium or Android, and an embedded smart home device), they tend to be very conservative about new features because their code always has to compile to the lowest-common-denominator (and, more importantly, they're very, very sensitive to performance regressions; the devil you know is always preferred to risking that the devil you don't know is slower, even if it could be faster).
Google can embrace modern processes, but the language itself had better be compilable on whatever ancient version of gcc works on the one mission-critical architecture they can't upgrade yet...
> I compile a lot of C++ code from a lot of places, and the only time I run into code that somehow simply doesn't work on newer versions of C++
I'm impressed that you even get as far as finding out whether that much C++ from disparate sources works on a newer version of C++. The myriad, often highly customized and correspondingly poorly documented build systems invented for each project, the maze of dependencies, the weird and conflicting source tree layouts and preprocessor tricks that many projects use... it's usually a pain in the neck to get a new library to even attempt to build, let alone integrate it successfully.
Don't get me wrong, we use C++ and ship a product using it, and I occasionally have to integrate new libraries, but it's very much not something I look forward to.
> riddled with state machines
Why is this bad? Normally, state machines are easy to reason about. The set of developers who say "I want to implement this logic as a state machine" is MUCH larger than the set of developers who say "I should make sure I fully understand every possible state and edge case ahead of time before making a state machine!"
> "I should make sure I fully understand every possible state and edge case ahead of time before making a state machine!"
Attempting to understand every state and edge case before writing code is a fool's errand because it would amount to writing the entire program anyway.
State machines are a clear, concise, elegant pattern to encapsulate logic. They're dead simple to read and reason about. And, get this, writing one FORCES YOU to fully understand every possible state and edge case of the problem you're solving.
You either have an explicit state machine, or an implicit one. In my entire career I have never regretted writing one the instant I even smell ambiguity coming on. They're an indefatigable sword to cut through spaghetti that's had poorly interacting logic sprinkled into it by ten devs over ten years, bring it into the light, and make the question and answer of how to fix it instantly articulable and solvable.
I truly don't understand what grudge you could have against the state machine. Of all the patterns in software development I'd go as far as to hold it in the highest regard above all others. If our job is to make computers do what we want them to do in an unambiguous and maintainable manner then our job is to write state machines.
> And, get this, writing one FORCES YOU to fully understand every possible state and edge case of the problem you're solving.
lol? ;P Here is just one example of a bug I know of that should exist, today, in Chrome, because, in fact, state machines are extremely hard to reason about. (I have filed previous ones that were fixed, and tons of bugs in Chrome in general are in this category. This one is top of mind as they haven't even acknowledged it yet.)
https://issues.webrtc.org/issues/339131894
Now, you are going to tell me "they are doing state machines wrong, as they don't have a way to discriminate on what the state even is in the first place"... and yet, that's the problem: the term "state machine" does not, in fact, imply a narrow set of inherent algorithmic constraints any more than "regular expressions" implies the language is regular; it is an engineering term, not a computer science one.
In the field, state machines just kind of happen by accident when a ton of engineers all try to add their own little corner of logic, and the state is then implied by the cross-product of the state of every variable manipulated by the edges of the machine. This results in a complete mess where, in fact, it is essentially impossible to prove that you've provided edges for every possible state. Nothing in the type system saves you from this, as the state isn't reified.
This contrasts with approaches to these problems that involve more structured concurrency, wherein the compiler is able to not only deduce but even enforce some concept of which kinds of edges are possible and where the state lies. (This, FWIW, is a reason why async/await is so much preferable to the kind of callback hell people used to find themselves in, maintaining a massive implicit state machine and hoping they covered all the cases.)
>State machines are a clear, concise, elegant pattern to encapsulate logic. They're dead simple to read and reason about. And, get this, writing one FORCES YOU to fully understand every possible state and edge case of the problem you're solving.
It doesn't force you to do that at all.
You can start piling in hacks to handle edge cases inside of certain states, for instance, instead of splitting them into their own states. Or the next dev does.
Now it's an implicit ball of mud that pretends to be something else and has an execution pattern that's different from the rest of your company's business logic, but it's still not actually strictly "correct" or easier to reason about for the edge cases.
And that's what most people do. They don't use it as a tool to force them to make things unambiguous. They bail when it gets hard and leave crappy implementations behind.
Copy-pasting from another reply to a different comment: As a simple example of something that's often left out in a way that fucks up a lot of devs' attempts at state machines, and is super annoying to draw in a typical state diagram: the passing of time.
This just hasn't been my experience but I suppose it's possible if your team is determined enough to write bad code. I'd still wager a bungled state machine is probably fairly easier to fix than a bungled mess of branches, but I've never seen such a thing.
I actually use passage of time as a factor in state machines all the time on game dev projects. It's pretty simple, just store a start time and check against it. I don't see how "ten seconds have passed since entering state A" is a more difficult condition than any other to draw or model.
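As a minimal sketch of what I mean (the state names and the ten-second threshold are just placeholders): record the entry time whenever you change state, and treat elapsed time as just another condition when stepping the machine.

    #include <chrono>

    using Clock = std::chrono::steady_clock;

    enum class State { A, B };

    struct Machine {
        State state = State::A;
        Clock::time_point entered = Clock::now();  // when the current state was entered

        void enter(State next) {
            state = next;
            entered = Clock::now();                // reset the timer on every transition
        }

        void step() {
            // "ten seconds have passed since entering state A" is just a condition
            if (state == State::A && Clock::now() - entered >= std::chrono::seconds(10))
                enter(State::B);
        }
    };

    int main() {
        Machine m;
        while (m.state == State::A) m.step();      // busy-waits ~10s for the demo, then exits
    }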
In my experience Ye Olde Web App backend services tend to be particularly bad with time because so much is generally done in the request/response model.
For business-logic reasons, where I've generally seen it fall apart is when things go from fairly simple things like "after six months of inactivity this account is considered idle" to more complex interactions of timers and activity types. "Move fast and break things", "just get to MVP" attitudes rarely have the discipline to formally draw all the distinct states out as the number of potential states starts to exceed a couple dozen.
The times I’ve bothered to write explicit state machines have created the most solid, confident and bug-free pieces of software I’ve ever built. I would send someone to the moon with them.
Couldn't this be said about any alternative solution? I fail to see how this is specific to state machines.
What do you suggest instead of a state machine?
The "riddled with state machines" from the post I was replying to, while sounding negative, is at least better than the "single state machine" which is probably combinatorially huge and would be impossible to maintain.
My rough rule of thumb, based on experience, is that if the state machine being a state machine is visible outside of its internal implementation (as opposed to just an interface with operational methods that don't hint at how things are managed behind the scenes), it's probably too leaky and/or incomplete.
I would trust code with extensive state-transition testing (regardless of internal implementation) - I wouldn't trust code that claimed to implement a state machine and didn't have that testing, or extensive documentation of edge cases and what was left out of the state machine.
As a simple example of something that's often left out in a way that fucks up state machines: the passing of time.
Like properly model a domain in domain terms?
And that won't be a state machine with the states having more fancy names?
It will be, but the idea of having an overview of the states is gone then. There are just modules -> objects, with the transitions being method calls. Nobody will have to know all the things about all the state transitions, resulting in another problem (dys)solved by architecture obscurity.
If needs be the state-machine can be reconstructed on a whiteboard by a team of five.
A state machine makes the actual program state first class and easy to reason about. One does not even need mutable state to model one. Whereas you appear to be advocating mutable objects. The state space then becomes a combinatorial explosion of all the hidden mutable state "encapsulated" inside the objects. Object-oriented programming is not the only way and often leads to a poor domain model. Some OOP evangelists even model a bank account with a mutable balance field and methods for making deposits. This is absolutely not a faithful model of the domain (ledgers have been used for hundreds/thousands of years). In summary, yes, a state machine can absolutely be a good domain model.
I don't follow the connection you're making.
State machines often are implemented with mutable objects.
And one does not need mutable objects to make "modules-> objects with the transitions being method calls". Every method call could return a fresh, immutable object, nothing requires mutation there.
I'd see a method like:
`TransitionTo(newState)`
as a major smell compared to an explicit
`TransistionToNewState`
and I think OOP can be helpful (hardly required, of course) in that one neat way of streamlining usage of your code is that, if you're implementing objects, then the object for something in "State A" might not even have "TransitionToStateC" if that's not a valid operation.
(No, you don't HAVE to write state machine code that allows you to ask for invalid operations, but it's a common pattern I've seen in real code and in online discussion/blogs/stack overflow.)
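A minimal sketch of that last idea, with hypothetical Open/Closed states standing in for "State A"/"State C": give each state its own type, so an invalid transition is not a runtime error but a method that simply does not exist.

    #include <utility>

    struct Closed;                       // forward declaration

    // Each state is its own type; a transition consumes the old state object.
    struct Open {
        Closed close() &&;               // Open -> Closed is the only transition offered
    };

    struct Closed {
        Open open() &&;                  // Closed -> Open is the only transition offered
    };

    Closed Open::close() && { return Closed{}; }
    Open   Closed::open() && { return Open{}; }

    int main() {
        Closed c;
        Open o = std::move(c).open();    // valid transition
        Closed c2 = std::move(o).close();
        // std::move(c2).close();        // would not compile: Closed has no close()
        (void)c2;
    }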
Objects and "method calls" generally implies mutable state to me, but yes the parent was not explicit about this. I assumed mutable (implicit) state was being argued in favour of an explicit state representation. Perhaps I misunderstood.
For a state machine, I would expect a function such as:
    transition : (old_state, event) -> new_state
Or if we use immutable objects, and one method per simple event, then something like:
    transition_event1 : () -> new_state
Which I think is similar to what you have. So I think we are in agreement here.
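In C++ terms, a minimal sketch of that shape (the state and event names are made up) is just a pure function over values; no mutation is required anywhere:

    #include <cassert>

    enum class State { Idle, Running, Done };
    enum class Event { Start, Finish };

    // transition : (old_state, event) -> new_state, with no mutation anywhere
    constexpr State transition(State old_state, Event event) {
        switch (old_state) {
            case State::Idle:    return event == Event::Start  ? State::Running : old_state;
            case State::Running: return event == Event::Finish ? State::Done    : old_state;
            case State::Done:    return old_state;
        }
        return old_state;  // unreachable if every enumerator is handled above
    }

    int main() {
        static_assert(transition(State::Idle, Event::Start) == State::Running);
        static_assert(transition(State::Running, Event::Finish) == State::Done);
        assert(transition(State::Done, Event::Start) == State::Done);
    }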
> I assumed mutable (implicit) state was being argued in favour of an explicit state representation.
I definitely was not: I would argue for structured logic rather than implicit state. The idea you are discussing seems to be more about imperative vs. functional design, and that would also be a lot better... but these are Google engineers managing a million interacting state machines via a giant pile of global (mutable) state, resulting in tons of bugs such as this one that isn't even fixed yet:
https://issues.webrtc.org/issues/339131894
A reply I just left elsewhere on this thread, noting that "state machine" doesn't imply the clean perfectly-factored concept you'd learn in a computer science class: these are ad hoc state machines that result from having a number of developers who don't really care and who think they can "dead reckon" the state of the system if they just add enough transition functions.
> these are Google engineers managing a million interacting state machines via a giant pile of global (mutable) state
Yeah that doesn't sound good. I understand the point you are making now and agree.
The problem is that the state-transition information is usually a complex set of events unleashed upon the little world built into the objects. It's basically a small reality simulation, parametrized from all sides, similar to a physics simulation with the filtered external world as input.
Now, if we asked someone to model a weather prediction via state machine, that would be obvious madness. But if we take a small object (like a cubic meter of air) and model that part to handle inputs and transfer forces, that generic state machine will do the job, because the atmospheric ocean of state machines knows more about the whole system than a single state machine does.
My point is: there is state in a system that is not explicitly modeled.
It's interesting to know what kind of state machines you're talking about. In my experience, most of the time it's an entity with a state property of finger-countable cardinality, where the state is assumed to be changed directly. And it's not easy to reason about, because the author has only heard about state machines and the state transitions are spread across the whole codebase.
I was talking in the most general sense. I am sure there are state machine implementations that are terrible to reason about, especially any that emerge from a codegen tool. But hopefully they are the exception and not the rule.
Woah, this caught my eye:
> especially any that emerge from a codegen tool
Can you give an example? Implement it as a state machine? But your program already exists as a set of transforms upon memory. Your program is a state machine! You just need to define the proper morphisms to map your problem domain to the computer domain.
Transformations are separable in principle; that's a fundamental property of them, one that state machines have only as an afterthought and that is hard even to represent.
It doesn't matter if they have equivalent power. One of those representations fundamentally allows your software to have an architecture, the other doesn't.
How much of software architecture is required because of the architecture? If your program has types that are the possible states, and functions to transform between those states, what architecture is needed beyond that? A grouping of related types, perhaps?
Yeah, just one layer of functions is enough for everybody.
Let's look next at that "compiler" thing and high-level languages. The hardware-native one suffices, no need for all that bloat.
I have a coding problem.
I'll use a state machine!
Now, I have two problems :-(
I've never understood this claim. I find state machines very hard to follow because there's no easy way to tell what paths lead to a given state; they're like using goto instead of functions (indeed they're often implemented that way).
Please describe "normally". State machines can turn into nightmares, just like any design pattern used poorly.
State machines don't have syntax for "transition here when event is encountered no matter what state you are in" so the whole diagram becomes a spaghetti mess if you have a lot of those escape hatches.
> State machines don't have syntax for "transition here when event is encountered no matter what state you are in" so the whole diagram becomes a spaghetti mess if you have a lot of those escape hatches.
I place a note at the top of my diagrams stating what the default state would be on receipt of an unexpected event. There is no such thing as "event silently gets swallowed because no transition exists", because, in implementation, the state machine `switch` statement always has a `default` clause which triggers all the alarm bells.
Works very well in practice; I used to write hard real-time munitions control software for blowing shit up. Never had a problem.
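For what it's worth, a minimal sketch of that shape (the states, events, and the "alarm" are just stand-ins; the real systems obviously did far more than print):

    #include <cstdio>

    enum class State { Safe, Armed };
    enum class Event { Arm, Disarm, SelfTestFailed };

    // On receipt of an unexpected event we fall back to a documented default
    // state (Safe here) and make a lot of noise; nothing is silently swallowed.
    State step(State state, Event event) {
        switch (state) {
            case State::Safe:
                switch (event) {
                    case Event::Arm: return State::Armed;
                    default: break;                      // unexpected: fall through to the alarm
                }
                break;
            case State::Armed:
                switch (event) {
                    case Event::Disarm: return State::Safe;
                    default: break;
                }
                break;
        }
        std::fprintf(stderr, "unexpected event in current state -- alarm!\n");
        return State::Safe;                              // the documented default state
    }

    int main() {
        State s = step(State::Safe, Event::Arm);         // Safe -> Armed
        s = step(s, Event::SelfTestFailed);              // unexpected -> alarm + default state
        return s == State::Safe ? 0 : 1;
    }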
> hard real-time munitions control software for blowing shit up. Never had a problem.
Ha, Ha, Ha! The juxtaposition of these two phrases is really funny. I would like to apply for a position on the Testing team :-)
> Ha, Ha, Ha! The juxtaposition of these two phrases is really funny. I would like to apply for a position on the Testing team :-)
It had its moments: used to go to a range where we'd set off detonators. Once or twice in production on site where we'd set off actual explosives.
State machines don't have a native syntax in C++ at all, so you can structure them however you want. It's easy to structure a state machine, if needed, so that all (or some) states can handle the same event in the same way.
I always thought this framework was neat: https://doc.qt.io/qt-5/statemachine-api.html
Downside of course is now you have a dependency on qt.
The downside is that you're now heap allocating at least one object for every state, and I'm willing to bet that each QState has an associated std::vector-style list of actions, and that each action is also its own object on the heap.
If you can afford to do things like this you can most likely use something other than C++ and save yourself a lot of headaches.
> If you can afford to do things like this you can most likely use something other than C++ and save yourself a lot of headaches.
Surely you can understand that, despite the recent c++ hate, my job doesn't give a fuck and we aren't migrating our massive codebase from c++ to... anything.
Switch + goto is very close to being a native syntax for state machines. It's also very efficient.
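A minimal sketch of that style, just to make it concrete: each state is a label, each transition is a goto, and the dispatch on the next input character is a plain switch.

    #include <cstdio>

    // Counts words in a C string with two states: "in space" and "in word".
    int count_words(const char* s) {
        int words = 0;

    in_space:
        switch (*s++) {
            case '\0': return words;
            case ' ':  goto in_space;
            default:   ++words; goto in_word;   // first character of a new word
        }

    in_word:
        switch (*s++) {
            case '\0': return words;
            case ' ':  goto in_space;
            default:   goto in_word;
        }
    }

    int main() {
        std::printf("%d\n", count_words("switch plus goto is a state machine"));  // prints 7
    }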
I believe HSMs can model this, but don't quote me. :)
Yes, of course in theory nested state machines should be able to model this. I feel like adding more complexity and bending the rules is a bit of a concession.
Back in the day we implemented HSM helper classes in about 500 LoC and generated them from Enterprise Architect. No need to write a GUI yourself, and it's better to have a visual for documentation and review. It worked very well until we replaced EA with docs-as-code; now I miss having a nice simulator and modeler for that workflow.
They can be. Or they can be... less easy.
Imagine you have an informally-specified, undocumented, at-least-somewhat-incomplete state machine. Imagine that it interacts with several other similar state machines. Still easy to reason about?
Now add multithreading. Still easy?
Now add locking. Still easy?
Cleanly-done state machines can be the cleanest way to describe a problem, and the simplest way to implement it. But badly-done state machines can be a total mess.
Alas, I think that the last time I waded in such waters, what I left behind was pretty much on the "mess" side of the scale. It worked, it worked mostly solidly, and it did so for more than a decade. But it was still rather messy.
> Imagine you have an informally-specified, undocumented, at-least-somewhat-incomplete state machine. Imagine that it interacts with several other similar state machines. Still easy to reason about?
You think that developers that wrote an informally-specified, undocumented, at-least-somewhat-incomplete state-machine would have written that logic as a non-state-machine in a formally-specified, documented and at-least-somewhat-complete codebase?
State-machines are exceptionally easy to reason about because you can at least reverse-engineer a state-diagram from the state-machine code.
Almost-a-state-machine-but-not-quite is exceptionally difficult to reason about because you cannot easily reverse-engineer a state diagram from the code.
In fact state machines are great for documentation even if the code is not explicitly written as a state machine!
Yes, and it's much better than having a dozen or more `bool` values that indicate some event occurred and put it into some "mode" (e.g. "unhealthy", "input exhausted", etc), where you have to infer what the hidden state machine is based on all of those bool values.
Want to add another "bool state"? Hello exponential growth...
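A sketch of the difference being described (names invented): with flags, every new bool doubles the number of combinations you have to reason about, most of which are meaningless; with a reified state, the meaningless combinations cannot even be expressed.

    #include <cstdio>

    // Implicit "hidden" state machine: 2^3 = 8 flag combinations,
    // of which only a few are actually meaningful.
    struct StreamFlags {
        bool healthy = true;
        bool input_exhausted = false;
        bool closed = false;
        // What does healthy==false && closed==false && input_exhausted==true mean?
    };

    // Reified state: exactly the states that exist, nothing else is representable.
    enum class StreamState { Healthy, Unhealthy, InputExhausted, Closed };

    const char* describe(StreamState s) {
        switch (s) {
            case StreamState::Healthy:        return "healthy";
            case StreamState::Unhealthy:      return "unhealthy";
            case StreamState::InputExhausted: return "input exhausted";
            case StreamState::Closed:         return "closed";
        }
        return "unknown";   // unreachable if all enumerators are handled
    }

    int main() { std::puts(describe(StreamState::InputExhausted)); }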
But that is just true of any problem-solving/programming technique.
In general, state/event-machine transition-table and decision-table techniques of structuring code are easier to comprehend than ad hoc or, even worse, poorly understood pattern-based techniques.
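A minimal sketch of the transition-table style (the states and events here are invented): the whole machine is one small 2D array you can read at a glance, and stepping it is a single lookup.

    #include <cstdio>

    enum State { Idle, Running, Done, NumStates };
    enum Event { Start, Finish, Reset, NumEvents };

    // transitions[state][event] -> next state: the whole machine in one table.
    constexpr State transitions[NumStates][NumEvents] = {
        /* Idle    */ { Running, Idle, Idle },
        /* Running */ { Running, Done, Idle },
        /* Done    */ { Done,    Done, Idle },
    };

    State step(State s, Event e) { return transitions[s][e]; }

    int main() {
        State s = Idle;
        s = step(s, Start);              // Idle -> Running
        s = step(s, Finish);             // Running -> Done
        s = step(s, Reset);              // Done -> Idle
        std::printf("%d\n", s == Idle);  // prints 1
    }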
> Now add multithreading. Still easy?
> Now add locking. Still easy?
Don't do that then.
Or rather, either manipulate the state machine from only a single thread at a time, or explicitly turn the multithreading into more states. If you need to wait for something, then instead of having "do X" you transition into the state "doing X". C#-style async does this by generating a state machine behind the scenes.
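A minimal sketch of that "do X" vs "doing X" split (start_download/poll_download are hypothetical stand-ins for whatever non-blocking work you actually have): the wait becomes a state, and the machine can be stepped from a single thread with no locking.

    #include <iostream>

    // Hypothetical stand-ins for non-blocking work: start it, then poll it.
    static int ticks_remaining = 0;
    void start_download() { ticks_remaining = 3; }
    bool poll_download()  { return --ticks_remaining <= 0; }

    enum class State { Idle, Downloading, Done };

    struct Machine {
        State state = State::Idle;
        void step() {
            switch (state) {
                case State::Idle:
                    start_download();            // kick off the work...
                    state = State::Downloading;  // ...and record that we are waiting
                    break;
                case State::Downloading:
                    if (poll_download())         // non-blocking completion check
                        state = State::Done;
                    break;
                case State::Done:
                    break;
            }
        }
    };

    int main() {
        Machine m;
        while (m.state != State::Done) m.step();
        std::cout << "done\n";
    }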
> riddled with state machines
What's wrong with state machines? Beats the tangled mess of nested ifs and fors.
That depends on your problem. I've seen useful state machines. I've also seen someone implement a simple decoder as a complex any-to-any state machine that couldn't be understood, where a single switch statement would have been better. That's nothing against state machines; some people have a hammer and are determined to prove it can drive any screw. It works, but it isn't how you should do it.
I've adopted a rule of thumb of having a very low bar for skipping straight to writing a state machine. I've never once regretted it, personally. I'm sure they can be misused, but I haven't come across that.
> Like... I honestly feel bad for the Rust people, as I do not think the increasing attention they are going to get from Google is going to be at all positive for that ecosystem
We are just now feeling this. Some original contributors have left the field, and lately the language has gone in directions I don't agree with.
But Google is not even the first. Amazon has had its eyes on Rust for quite some time already.
As an outsider, I'm curious what directions those are. Are you referring to effects or keyword generics or something else?
Endless bikeshedding about `Pin` would be one example. I'm also not sure keyword generics are the correct way.
The discussions around 'Pin' are the opposite of bikeshedding. It's not about what color to pick for the shed, it's about reworking the feature to make it hopefully easier to reason about and use.
I think the article is pretty interesting. There are so many more interesting takes than just another boring Hacker News moan about Google.
The technical pressure exerted on Python (which was resisted) is one thing. The social pressure incubated the most radical culture warriors the Internet has ever seen and its proponents have ruined the Python organization, driven away many people and have established a totalitarian and oppressive regime.
Interestingly, Google has fired the Python team this year. The revolution eats its own?
Anyway, Rust should take note and be extremely careful.
Based on what an ex-Google developer said in conversation at a party at the weekend (the discussion was about the choice of first language for a Computer Science degree course; yes, I do go to exciting parties; many of those attending have never even been a CS lecturer):
Some years ago Google decided that Go projects involved similar engineering effort with better performance and lower maintenance, and on that basis there was no reason to authorise new Python software; their existing projects would migrate as-and-when.