> code from Google.
I spilled my coffee. I was just talking the other day with some coworkers about how I don't trust Google open source. Sure, they open their code, but they don't give a damn about contributions or about making it easy for you to use the projects. I feel a lot of this sentiment extends to GCP as well.
So many Google projects are better than your average community one, but they never gain traction outside of Google because it is just too damn hard to use them outside Google infra.
The only Google project that seems to evade this rule that I know of is Go.
> but they don't give a damn about contributions
Here is a concrete reason why Google open source sucks when it comes to contributions and I don't think it can be improved unless Google changes things drastically: (1) an external contributor makes a nice change and a PR on GitHub; (2) the change breaks internal use cases and their tests; (3) the team is unwilling to fix the PR or port the internal test (which may be a test several layers down the dependency tree) to open source.
> making it easy for you to use the projects
Google internally uses Blaze, the internal version of Bazel. It's so ridiculously easy for one team to use another team's project that even just thinking about what the rest of us need to do to use another project feels like dreadful, thankless work. So people don't make that effort.
I do not see either of these two points changing. Sure, there are individuals at Google who really care about the open source community, but most don't, and so their projects are forever a cathedral, not a bazaar.
It is not only that: often, when Google uses an open source project it doesn't own, it either tries to take ownership of the project or forks it, instead of trying to contribute to the original.
That's pretty common though? I mean isn't that part of the idea of open source? Forking is a pretty central part.
I don't see a problem here. Why should Google have to deal with the opinions of a maintainer when they can just maintain their own version? Yeah, obviously it would be nice if they'd contribute their changes back to the upstream repo, but from a business perspective it's often not worth it.
At my company the inverse of this problem happened way more often: we find a problem, but the maintainer just doesn't care. The backward-cpp library is a good example, where the maintainer just isn't that active in the issues. Why wait for him to respond if you can just fork it and keep moving?
Which cases did you have in mind? Seems like it should be easy to find half a dozen examples since you claim it happens often.
KHTML, officially discontinued in 2023. -- "Embrace, extend, and extinguish" (EEE), also known as "embrace, extend, and exterminate", is a phrase that the U.S. Department of Justice found was used internally by Microsoft to describe its strategy for entering product categories involving widely used standards, extending those standards with proprietary capabilities, and then using those differences to disadvantage competitors. It's also possible that President-elect Donald Trump may interfere with the DOJ's proposed remedies; he said on the campaign trail that a Google break-up may not be desirable since it could "destroy" a company that the US highly values.
The GP's complaint was that Google "took over projects" or "forked them without trying to contribute to the original".
In the case of KHTML, they never used it in the first place, so it seems like a particularly inappropriate example. I assume you actually meant Webkit? In that case, they spent half a decade and thousands of engineer-years contributing to Webkit, so it doesn't fit the original complaint about not "trying to contribute" either.
November 4, 1998; 26 years ago (KHTML released)
June 7, 2005; 19 years ago (WebKit open-sourced)
https://chromium.googlesource.com/chromium/src/+/HEAD/third_...

* (C) 1999-2003 Lars Knoll ([email protected])
* (C) 2002-2003 Dirk Mueller ([email protected])
* Copyright (C) 2002, 2006, 2008, 2012 Apple Inc. All rights reserved.
* Copyright (C) 2006 Samuel Weinig ([email protected])
"...they never used it in the first place" I think the point is that KHTML was already forked into webkit by apple long before google came along (though, they have in fact also now forked webkit into blink).
Thank you, I rest my case. I didn't even need to bring up the DragonEgg cartel (Chandler?) going down the gcc-llvm-clang pathway used essentially for getting rid of the pesky GPL quoted above. With BSD-style, source code is no longer any of your business (not to mention chrome-chromium differences along the textbook AndroidTV tivoization).
> I didn't even need to bring up the DragonEgg cartel (Chandler?) going down the gcc-llvm-clang pathway used essentially for getting rid of the pesky GPL quoted above.
That's... not even close to what happened?
Historically, LLVM was at one point proposed by Chris Lattner, while he was at Apple, to be upstreamed into GCC (and relicensed to GPL, natch) for use at the LTO optimization phase, which was declined. For most of its early existence, it used llvm-gcc as the frontend to generate LLVM IR. In the late '00s, serious effort was put into making a new frontend for LLVM IR, which we know as clang, primarily by Apple at that point; it became self-hosting in 2009 or 2010. Basically the moment clang became self-hosting, everyone jumped ship from llvm-gcc to clang for producing LLVM IR.
Google shows up around this time, I think primarily motivated by the mass-rewriting possibilities Clang offered, since it has extraordinarily good location tracking (compared to the other compilers available), which is necessary for good rewriting tools. The other major area of Google's focus at this time was actually MSVC compatibility, and I distinctly remember Chandler saying in one of his presentations that you need to be able to compile code before you can trust a tool well enough to let it rewrite your code, so I think the compatibility story here was mostly (again) about rewriting.
Also around this time, gcc gains proper plugin support, and llvm-gcc is reworked into dragonegg to take advantage of it. But because clang now exists, dragonegg is no longer very interesting, with almost all the residual attempts to use it essentially limited to people trying to get LLVM IR out of gfortran, as LLVM had no fully working Fortran compiler at that point.
Again, that seems to be in no way demonstrating the pattern that was claimed to be happening often.
AFAIK Google did not take ownership of gcc, nor did they try to fork it without contributing to the original. They used GCC for a good couple of decades while contributing to it, but eventually switched to a different compiler. The same for clang, they neither "took it over" nor "forked it without trying to contribute".
https://web.archive.org/web/20241123183550/https://en.wikipe...
https://web.archive.org/web/20241125065641/https://arstechni...
Ars is controversial on YC news. Who knew?
One could ask whether what Google does is ‘open source’ or more ‘source available’; the source is there, but you cannot contribute, if you can build it at all
No, "open source" doesn't imply open contribution. The standard terminology is cathedral vs bazaar.
Just to add a different perspective: sometimes people mean Open Source[1] when they say "open source," and sometimes they don't.
Personally, I take the cathedral/bazaar distinction to indicate different development cadences and philosophies, rather than whether contributions are allowed/encouraged.
Various cathedral-style projects (eg: FreeBSD, Emacs) still actively take contributions and encourage involvement.
There's something even further along the spectrum that's "we provide dumps of source code, but don't really want your patches." I'm not sure what the best term is for that, but "source [merely] available" sometimes has that connotation.
The quintessential example of providing source while discouraging contributions is SQLite. Nobody would argue that it's merely source available. It is fully open source.
In fact "source available" usually means you can see the source code, but there are severe restrictions on the source, such as no permission to modify the source even for your own use, or no permission to create forks of the project containing the modifications, or severe restrictions on such modifications. An example is MongoDB's Server Side Public License, which is source-available but not open source.
I think it depends on the contribution. I sent a bug report with a minimal test case. It was welcomed and quickly fixed. It is not a source code contribution, but I think it is a contribution.
OP is specifically talking about code contributions. You can (I have) make that type of contribution to proprietary software.
> sometimes people mean Open Source[1] when they say "open source," and sometimes they don't.
And when they don't when talking about source code, they are wrong. If someone says that an RJ45 cable is "a piece of software" because it's "soft" (you can bend it), would you say it's just a different perspective?
Open source, in the context of software, has a particular meaning. And it is the case that many software developers don't know it, so it's worth teaching them.
While I, too, believe that words should mean things, I don't think it's quite so cut-and-dried in this particular case. Part of the reason the term could not be trademarked was that it is too descriptive; it's easy for people to put those words together to describe software.
I agree that the OSI meaning is worth teaching. But perhaps not by saying "you're wrong; there is only one right way." Perhaps more like "some people attach XYZ specific meaning to that phrase, please be aware of it. Also, here is some history of the term if you like."
----
Aside: On re-reading this, I wonder if it comes across as testy... I think I am just channeling my annoyance with the language police of the world, in general, who sour people's interest in topics with their gatekeeping behavior. I don't mean it too personally towards you (:
To take a step back, it came from this comment:
> One could ask whether what Google does is ‘open source’ or more ‘source available’; the source is there, but you cannot contribute, if you can build it at all
The author of this comment says "if you can't contribute, shouldn't you consider it `source available` instead of `open source`?".
There is only one valid answer: "No, you should not. It is still open source even if you cannot contribute". The context is clear, we are talking about "open source" vs "source available", which are both very specific in this context.
> I think I am just channeling my annoyance with the language police of the world, in general, who sour people's interest in topics with their gatekeeping behavior. I don't mean it too personally towards you (:
No offense taken, and I don't mean it personally either =). My point is just that in this context, the author of the comment was pretty clearly talking (asking, even?) about the difference between "open source" and "source available".
I don't even think it's shutting down the author: there was no other point than this, so the "thread" started by this author was purely about the meaning of those words.
Maybe you already know this and have discarded it (if so, no worries), but for what it's worth, this is my perspective on these things: Some people, in some contexts, use words like a laser — very specific, very targeted, with precise meanings, etc. Other people, other times (perhaps most people, most of the time?) use words more like ... a bucket of paint. Words are sloshy and approximate and about as precise as trying to sign your name using that bucket. Each has their value.
Inevitably, a laser-minded person talks with a sloshy-bucket person and misunderstandings ensue.
In sloshy-bucket land, I think "open source" has various connotations — a sense of community, encouraged contribution, being able to build it yourself, improve it yourself, etc.
And I think the commenter, in broad strokes, was saying that Google is not upholding those various virtues that are often associated with "open source," so felt the term was not a good (sloshy) fit.
In particular, I do not think they were asking the question you say they were asking.
In this space, it seems like there are both too many terms (so people rather just pick a popular one and over-apply it) and too few (so you can never find one that quite says what you want). Such is life, I guess. Maybe "open sourcey" would be good, to indicate it's talking about a hand-wavy vague "ness" rather than a particular nailed-down definition. "Google isn't being very open sourcey"? ¯\_(ツ)_/¯
Anyway, all this to say: in the ethos of trying to take a charitable interpretation of people's words, I think it's good to consider the bucket-of-paint possibility, before jumping to corrections and yes/no determinations.
----
Edit: It occurs to me that originally I misinterpreted you as being persnickety, when perhaps you were just trying to answer the question you felt they had asked. Sorry!
Note that I did not write the original answer: I answered to you :-).
> And I think the commenter, in broad strokes, was saying that Google is not upholding those various virtues that are often associated with "open source," so felt the term was not a good (sloshy) fit.
Totally valid! And I like the idea of considering the "bucket-of-paint" possibility before saying "no you're wrong". But on the other hand, sometimes it's worth agreeing on the meaning of words while discussing something.
I feel like I actually happen to regularly be on the bucket-of-paint side. I will often simplify the part of the discussion that I feel is not relevant by saying e.g. "okay this solution is bad, so if we look into this other solution we have to think about ...". And sometimes people really care about starting a discussion saying "by saying it's bad, you make it sound like whoever would think about it is stupid, and that's extreme. This solution is not necessarily bad, because in some situations it may work even though it is suboptimal". To which I tend to say "sure, I said it was bad as a way of saying that we seemed to agree that we would focus on the other one".
Until this point it's perfectly fine for me. What frustrates me is when the discussion continues in what I feel sounds like, e.g. "no, I think that your saying it is bad reflects that you disrespect whoever would think about it, and you should never have used that word in the first place. I am not sure I can ever have a meaningful discussion with you now that you used this word in this sentence, even if you later admitted that it was an oversimplification".
Anyway, communication is hard :-)
Googletest is the most widely used test library for C++. Googlemock is the only mocking library available that's reasonably feature complete.
If you are using googletest, you owe it to yourself to check out catch2, which I find much better and which uses modern C++. There are a few other C++ test frameworks that look better than googletest as well, but catch2 is the one I settled on (and it seems to be the best supported): feel free to check them out.
I've given up on mock frameworks. They make it too easy to make an interface for everything and then test that you are calling functions with the expected parameters, instead of testing that the program works as you want. A slight change to how I call some function results in 1000 failed tests, and yet I'm confident that I didn't break anything the user could notice (sometimes I'm wrong in that confidence, but none of the failing tests give me any clue that I'm wrong!)
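To make the failure mode concrete, here's a minimal googlemock sketch; the Mailer interface and SignUp function are hypothetical stand-ins, not from any real codebase:

    #include <string>

    #include <gmock/gmock.h>
    #include <gtest/gtest.h>

    // Hypothetical interface and function under test.
    class Mailer {
     public:
      virtual ~Mailer() = default;
      virtual void Send(const std::string& to, const std::string& body) = 0;
    };

    void SignUp(Mailer& mailer, const std::string& user) {
      mailer.Send(user, "Welcome!");
    }

    class MockMailer : public Mailer {
     public:
      MOCK_METHOD(void, Send, (const std::string& to, const std::string& body),
                  (override));
    };

    TEST(Signup, SendsWelcomeMail) {
      MockMailer mailer;
      // Over-specified: this pins the exact arguments and call count.
      // Refactor SignUp() to tweak the body text and the test fails, even
      // though nothing a user could notice has changed.
      EXPECT_CALL(mailer, Send("bob@example.com", "Welcome!"));
      SignUp(mailer, "bob@example.com");
    }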
catch2 has become fairly bloated. doctest takes all of the best parts of catch2 without all the bloat and the end result is a test framework that is literally over 10x faster than catch2. It's also like 90% compatible with catch2 so porting your tests to it is pretty easy.
Especially if you have a build process that always runs your unit tests, it's nice to have a very fast test/compile/debug loop.
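To give a sense of the compatibility claim: the test below builds with doctest, and the body is also valid catch2 (under catch2 v2 you'd swap the first two lines for #define CATCH_CONFIG_MAIN and #include <catch2/catch.hpp>). The incompatible 10% is mostly things like doctest's SUBCASE vs catch2's SECTION.

    #define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
    #include <doctest/doctest.h>

    #include <vector>

    TEST_CASE("vector grows when pushed") {
      std::vector<int> v;
      v.push_back(42);
      REQUIRE(v.size() == 1);  // aborts this test on failure, in both frameworks
      CHECK(v.front() == 42);  // records a failure and continues, in both
    }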
>catch2 has become fairly bloated. doctest takes all of the best parts of catch2 without all the bloat and the end result is a test framework that is literally over 10x faster than catch2. It's also like 90% compatible with catch2 so porting your tests to it is pretty easy.
I feel like you could make a madlib where you could plug in any two project names and this sentence would make sense.
Madlibs have become fairly bloated. Copypasta memes take all the best parts of madlibs without all the bloat and the end result is a form of mockery that is literally over 10x faster than a madlib. It's also like 90% compatible with madlibs so porting your gibes is pretty easy.
I was just about to suggest doctest, you beat me to it! I'm all about faster compile times, and it was mostly a drop-in replacement for catch2 in my case.
Also, IMO, both doctest and catch2 are far superior to Google Test.
I've found exactly three places where I really want to have a mock available:
1) Databases and other persistent storage. Though in this case, the best mock for a database is generally another (smaller, easily snapshottable) database, not something like googlemock.
2) Network and other places where the hardware really matters. Sometimes, I really want to drop a particular message, to exercise some property of the sender (see the sketch after this list). This is often possible to code around in greenfield projects, but in existing code it can be much simpler to just mock the network out.
3) Cases where I am calling out to some external black-box. Sometimes it's impractical to replicate the entire black-box in my test. This could be e.g. because it is a piece of specialized hardware, or it's non-deterministic in a way that I'd prefer my test not to be. I don't want to actually call out to an external black-box (hygiene), so some kind of a mock is more or less necessary.
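For case 2, a hand-rolled double can stay tiny. A sketch, where Transport and DroppingTransport are hypothetical names: a transport that deterministically "loses" the Nth message, so a sender's retry path can be exercised without real hardware.

    #include <cstddef>
    #include <string>
    #include <vector>

    class Transport {
     public:
      virtual ~Transport() = default;
      virtual bool Send(const std::string& msg) = 0;  // false = delivery failed
    };

    class DroppingTransport : public Transport {
     public:
      explicit DroppingTransport(std::size_t drop_index)
          : drop_index_(drop_index) {}

      bool Send(const std::string& msg) override {
        if (count_++ == drop_index_) return false;  // the simulated lost packet
        delivered_.push_back(msg);
        return true;
      }

      const std::vector<std::string>& delivered() const { return delivered_; }

     private:
      std::size_t drop_index_;
      std::size_t count_ = 0;
      std::vector<std::string> delivered_;
    };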
For 1), have you looked at Testcontainers?
Briefly, but frankly: copying small SQLite files around works so well in almost all cases that I don't feel the need for a new abstraction.
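Sketched out, the pattern is roughly this (fixture.db, the paths, and the helper name are all made up for illustration):

    #include <filesystem>
    #include <string>

    #include <sqlite3.h>

    // Each test gets its own throwaway copy of a small, pre-built golden
    // database, so tests can mutate it freely without interfering.
    sqlite3* OpenScratchDb(const std::string& test_name) {
      namespace fs = std::filesystem;
      const fs::path scratch = fs::temp_directory_path() / (test_name + ".db");
      fs::copy_file("testdata/fixture.db", scratch,
                    fs::copy_options::overwrite_existing);
      sqlite3* db = nullptr;
      sqlite3_open(scratch.string().c_str(), &db);
      return db;  // the test runs its queries, then calls sqlite3_close(db)
    }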
Sounds like the mocks are overused or used inappropriately in your experience (whether by a colleague or yourself).
Mocks have their place. A prototypical example is at user-visible endpoints (eg: a mock client).
I have found in my world it is easy to set up a test database (we use sqlite!), and the file system is fast enough (I have code to force using a different directory for files). I have been playing with starting a dbus server on a different port in my tests and then starting the real server to test against (with mixed results; I need a better way to know when dbus is running). I have had great success writing a fake for one service that is painful: the fake tracks the information I really care about, and so lets me query on the things that matter, not on what the function signature was.
I'm not arguing that mocks don't have their place. However I have found that by declaring I won't use them at all I overall come up with better solutions and thus better tests.
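A sketch of what such a fake can look like, with a hypothetical EventBus standing in for the painful service; the point is that it answers domain-level questions instead of asserting on call shapes:

    #include <algorithm>
    #include <string>
    #include <vector>

    class EventBus {
     public:
      virtual ~EventBus() = default;
      virtual void Publish(const std::string& topic,
                           const std::string& payload) = 0;
    };

    class FakeEventBus : public EventBus {
     public:
      void Publish(const std::string& topic,
                   const std::string& payload) override {
        events_.push_back({topic, payload});  // record the fact, nothing more
      }

      // A query phrased in terms the test cares about ("did an event go out
      // on this topic?"), not in terms of exact call arguments.
      bool SawTopic(const std::string& topic) const {
        return std::any_of(events_.begin(), events_.end(),
                           [&](const Event& e) { return e.topic == topic; });
      }

     private:
      struct Event {
        std::string topic, payload;
      };
      std::vector<Event> events_;
    };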
Exactly! This one gets it: real communism has never been tried! On another note, I do not find it tiresome at all that any critique of any pattern/technique in SWE is always met with the "you are holding it wrong" rebuttal.
Do you not believe it's possible to hold something wrong? If someone is a skilled and experienced golfer, it's quite believable that they won't automatically be a skilled tennis player after three months of tennis playing. If someone is an experienced race car driver, they won't automatically be a skilled member of a basketball team. "You must be holding it wrong" can sometimes take years of practising holding it right, not just minutes or months.
If a team of people who have been SWEs for decades reports that something helped their team, and you try it and it doesn't work, and you have been SWEs for decades, that doesn't automatically mean they are charlatans selling nonsense. They might all be basketball players playing together for 5 years and you might be a team of a baseball player, a racecar driver, a track and field athlete, and a water polo player, trying to play basketball from only reading about it, with nobody who has done it or experienced it, and several people who quietly don't want to be playing it and are just nodding along while hoping it fails. The conclusion that they are liars and it can't possibly work is not a strong conclusion.
When I look closely, I discover that the people who tried agile and found it worked either were on much smaller projects with much simpler problems than large projects have, or are not telling the full truth about agile (sometimes both). I'm glad agile works for small projects, but that it doesn't scale very well seems clear from all the large projects that have tried it and gone back in major ways (generally not all the way back). The people who have failed projects still often sing the praises of agile, but we have no idea whether those projects would have failed if something else had been used.
I used to really like Google Test, and then Google decided in its infinite wisdom to make the OSS version depend on their C++ standard-library replacement Abseil, and not just that, but the live-at-head version of it.
That makes sense internally for Google because they have their massive monorepo, but it sure as hell makes it a pain in the ass to adopt for everyone else.
I don't think you're reading those docs correctly. Googletest recommends living at head, but there's no reason you can't pin a release, either a git commit hash or a release label, of which there have been several. Googletest does not depend on the HEAD of abseil-cpp, it actually declares a direct dependency on an older LTS release of absl, but since you are building it from source any later release or commit of absl would work.
Google open source libraries are often a mess when you try to include more than one of them in the same project, but googletest isn't an example of the mess. It's actually pretty straightforward.
> Google open source libraries are often a mess when you try to include more than one of them in the same project
Completely agree. In isolation all of their libs are great, but inevitably I end up having to build Abseil from source, to then build Protobuf off of that, to then build gRPC off of that. If I count the sanitizers as Google projects, those become painful too, because Abseil (at least) will have ABI issues if it isn't built appropriately. Thinking about it, I'd really just like a flat_hash_map replacement so I can drop Abseil.
Protobuf depending on Abseil (which has ongoing macOS build issues) is clinically insane. I tend to use protozero now which trades half a day’s boilerplate for two days’ build heartache.
Wouldn't it be even more insane if protobuf had its own distinct string splitting/merging routines, its own flags and logging libraries, etc?
No. Not at all. String splitting is a couple of lines of code. I don't want to have to think about a logging framework just to read a protobuf; it can send stuff to stderr like everything else. If Google wants protobuf to be a widely accepted standard then it shouldn't require you to opt into their ecosystem to use it.
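For what the claim is worth, a serviceable split really is only a few lines of standard C++:

    #include <sstream>
    #include <string>
    #include <vector>

    std::vector<std::string> Split(const std::string& s, char sep) {
      std::vector<std::string> parts;
      std::istringstream in(s);
      for (std::string part; std::getline(in, part, sep);)
        parts.push_back(part);
      return parts;
    }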
> Thinking about it I'd really just like a flat_hash_map replacement so I can drop Abseil.
boost has had a flat_hash_map implementation for quite a few versions now, which from what I could see generally beats or is competitive with the absl implementation: https://www.reddit.com/r/cpp/comments/yikfi4/boost_181_will_...
The reddit thread mentions that the author was probably going to write a blog post about it at some point; I went and found it so you don't have to.
I was curious what exactly differentiates boost::unordered_flat_map from absl::flat_hash_map, and was not disappointed. It seems that the lion's share of the performance improvement comes from using more of the metadata for the reduced hash value, although there are a few other contributing factors.
The blog post further describes where absl::flat_hash_map performs better: iteration (and consequently erasure), which is ironic given those are a couple of areas where I always felt that absl::flat_hash_map was especially weak. But, it makes sense to double down on Abseil's strengths as well as its shortcomings.
https://bannalia.blogspot.com/2022/11/inside-boostunorderedf...
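For anyone tempted by the swap: assuming Boost 1.81 or newer, the container is close to a drop-in for the everyday operations. A minimal sketch:

    #include <boost/unordered/unordered_flat_map.hpp>
    #include <string>

    int main() {
      // was: absl::flat_hash_map<std::string, int> counts;
      boost::unordered_flat_map<std::string, int> counts;
      counts["requests"] += 1;
      counts.emplace("errors", 0);
      return counts.count("requests") == 1 ? 0 : 1;
    }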
Iteration has been improved since, and now we’re beating Abseil on iteration plus erasure:
https://github.com/boostorg/boost_unordered_benchmarks/tree/...
Very cool!
I especially like how you can see the load factor across the graphs, where there are sharp downward spikes each time the map resizes, and how they vary as you move through the memory hierarchy.
I am curious what Abseil could learn from other modern hash map implementations, since my understanding is that the fundamental structure of its swisstables implementation hasn't changed meaningfully since 2017.
FWIW the flat hash map in Boost is now faster. I am not sure if integrating Boost is any easier for you.
I occasionally reconsider it so I can try a bunch of the FB alternatives (Folly, Thrift, CacheLib, etc.), but... yeah. Still just kind of waiting for a panacea.
It's been a few years, to be fair; I stopped working with C++ in early 2021 or so, so maybe I've just misremembered. I do remember having to take on Abseil where we previously didn't.
Google test and mock are quite powerful but are a big hit at both compile time and runtime, which matters for quick edit-compile-fix loops.
I still go back and forth on whether google test and mock are worth it.
Google benchmark is also nice.
> big hit at both compile time and runtime, which matters for quick edit-compile-fix loops
Honestly, if you write C++ for work, there's no excuse for your company not to give you the beefiest dev machine that money can reasonably buy. Given that Rust exists, I think "get a faster computer" is a totally valid answer to build times, especially now that the Skylake malaise era is over and CPUs are getting faster.
> given that rust exists, I think "get a faster computer" is a totally valid answer to build times
I find this amusing because one of the main reasons i avoid Rust (in the sense that i prefer to build things written in other languages if possible - i don't mind if someone else uses it and gives me a binary/library i can use - and it never went beyond "i might check this at some point, sometime, maybe" in my mind) is the build times compared to most other compilers :-P.
Also, at least personally, if i get a faster computer i want my workflow to be faster.
You may want to add a '/s' at the end of your post there, because sarcasm doesn't really translate on the internet. The only way I can tell it's sarcasm is because nobody would really go 'throw away the old stuff, buy new stuff, waste more, pollute the oceans, consume, CONSUME!!!'.
Does it not support only running some or no tests? I only run the full test suite rarely, close to releases.
I blame monorepo culture. If it doesn't grow up in a context where it's expected to stand on its own, it crashes and burns when you kick it out of the nest.
I heard that Meta also has a monorepo, but most of their open source projects are very community driven. I think it is a corporate-mandate thing: no resources to be spent on open source, and open source contributions not being tracked as part of career development.
Meta does have a monorepo but their open source stuff lives outside it. Or at least it did when I worked on PyTorch (2019). I did all my work in the separate open-source PyTorch repo and then commits got mirrored back to the monorepo by some automated process.
You could also build and run it using completely standard tools; you didn’t need to download random internal source control software etc. like you do for e.g. Chromium.
Curious about the organizational dynamics around these kinds of decisions. There is no reason why Google couldn't do the same.
I assume there is little will internally because everyone there is so focused on their performance reviews, and helping external people use Google open source projects is not something those reviews track.
I think it's more of a strategic difference. Google seems like their long term planning involves thinking about open source less than Meta's. They're more wait-and-see about it.
React must've been destined to be open source from the get go: gotta create a mountain of js to hide in so the users can't strip out the malicious parts. Kubernetes on the other hand could've been internal forever and still would've made sense. It just happened to later make sense to open source it (it feels lopsided to me, like they kept certain parts secret. It wouldn't feel that way if they had planned it as OSS from the get go).
Tensorflow is/was decent. It looked like they made a lot of effort for it to be accessible for outsiders.
Have you tried building the damn thing?
The Nix build is still stuck on a version from 3-4 years back because Bazel doesn't play well. Debian, too, has some issues building the thing...
As an industry we need to stop treating breaking changes as an acceptable thing. The rate of bit rot has accelerated to an absurd pace. I can't remember the package but I had to spend considerable time fixing a build because a package.. changed names.. for NO REASON. They just liked the new name better. This should be career death. You're wasting your fellow humans' time and energy on your vanity when you make a breaking change that is at all avoidable. I should be able to run a build script made 20 years ago and it should just work. No renamed package hunting, no WARNING WARNING DEPRECATED REWRITE ALL YOUR CODE FOR LEFTPAD 10.3 IMMEDIATELY in the console, no code changes, no fuss, we should expect it to just work. This state of affairs is a stain on our industry.
One day we will have bled enough and we'll switch to using cryptographic hashes of package contents (or of some recipe for deterministically building the thing on different architectures) instead of anything so flimsy as a name and version number.
For the humans, we can render the hashes as something friendly, but there's no reason to confuse the machines with our human notions of friendliness.
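A toy sketch of the idea, with std::hash standing in for a real cryptographic digest (a real system would hash the actual package contents or build recipe with something like SHA-256):

    #include <cstddef>
    #include <functional>
    #include <map>
    #include <string>

    using ContentHash = std::size_t;

    ContentHash HashContents(const std::string& contents) {
      return std::hash<std::string>{}(contents);  // placeholder, not cryptographic
    }

    struct Registry {
      std::map<ContentHash, std::string> store;     // hash -> package contents
      std::map<std::string, ContentHash> friendly;  // e.g. "numpy-1.26" -> hash

      ContentHash Add(const std::string& name, const std::string& contents) {
        const ContentHash h = HashContents(contents);
        store[h] = contents;
        friendly[name] = h;  // the name is presentation only; identity is h
        return h;
      }

      const std::string& FetchByName(const std::string& name) {
        return store.at(friendly.at(name));  // name -> hash -> exact contents
      }
    };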
You’re basically describing nix and Guix.
They use a hash of the derivation and its inputs as a memoization strategy: providing yesterday's answer to today's question since it was asked yesterday. But so far as I know nobody's actually using those hashes for the initial request.
It's not like python will let you:

    import nix.numpy-hsbdjd...8r5z2 as np

such that the import mechanism ensures that the correct build of numpy is used. For that to work you'd have to change nix so that the hash did not digest parameters like `amd64-linux`, which indicate the system architecture (you'd want those to be satisfied at import time).
In Guix at least (I assume also nix) you can build things from source with a verified hash, i.e. write a numpy package definition that says: download the numpy source from this URL, and expect its hash to equal this string. You could then depend on that package from another package, ensuring it uses a numpy built from that bit-for-bit exact source tree. Does that not amount to the same thing as what you want?
This is why you build against a specific version of a library. Drop your build script into a container with the versions of software it expects and it should do fine. Containerization is the admission that versioning environments is needed for most software. I expect the nix/guix crowds to win in the end.
Blindly wrapping a build script in a Dockerfile is not nothing, but it's no replacement for being careful while writing that script in the first place.
Otherwise I agree, because if you must be careful, you might as well use tooling that's built for such care. But if you're doing that, do you need the Dockerfile? And that's how you end up with nix/guix.
Having tried on other platforms, it's not Bazel; it's not even just Google.
It's Python packaging, and the fact that the only really supported binary distribution method for TensorFlow for many, many years was to use pip and hope it didn't crash. It's reflected in how the TF build scripts only support building the Python lib as an artefact; everything else at the very least involves dissecting Bazel intermediate targets.