roughly 4 days ago

I like this!

In the grand HN tradition of being triggered by a word in the post and going off on a not-quite-but-basically-totally-tangential rant:

There are (at least) three areas here that are footguns with these kinds of calculations:

1) 95% is usually a lot wider than people think - people take 95% as “I’m pretty sure it’s this,” whereas it’s really closer to “it’d be really surprising if it were not this” - by and large people keep their mental error bars too close.

2) probability is rarely truly uncorrelated - call this the “Mortgage Derivatives” maxim. In the family example, rent is very likely to be correlated with food costs - so, if rent is high, food costs are also likely to be high. This skews the joint distribution - model the terms as independent and you'll be surprised at how improbable the actual outcome was (the sketch after this list puts rough numbers on it).

3) In general normal distributions are rarer than people think - they tend to require some kind of constraining factor to enforce them. We see them a bunch in nature because there tend to be negative feedback loops all over the place, but once you leave the relatively tidy garden of Mother Nature for the chaos of human affairs, normal distributions get pretty abnormal.
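To put rough numbers on point 2 (all figures invented purely for illustration), here's a quick Monte Carlo sketch comparing a budget where rent and food are sampled independently against one where a shared cost-of-living factor drives both:

    import random

    N = 100_000

    def totals(correlated):
        out = []
        for _ in range(N):
            if correlated:
                # a shared cost-of-living factor pushes rent and food up or down together
                col = random.uniform(0.8, 1.2)
                rent = col * random.uniform(1200, 1500)
                food = col * random.uniform(400, 600)
            else:
                rent = random.uniform(1200, 1500)
                food = random.uniform(400, 600)
            out.append(rent + food)
        out.sort()
        return out

    for label, correlated in (("independent", False), ("correlated", True)):
        t = totals(correlated)
        print(label, round(t[int(0.025 * N)]), "-", round(t[int(0.975 * N)]))

The midpoint barely moves, but the correlated run's 95% interval comes out roughly twice as wide - exactly the "surprised at how improbable the actual outcome was" effect.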

I like this as a tool, and I like the implementation, I’ve just seen a lot of people pick up statistics for the first time and lose a finger.

btilly 4 days ago

I strongly agree with this, and particularly point 1. If you ask people to provide estimated ranges for answers that they are 90% confident in, people on average produce roughly 30% confidence intervals instead. Over 90% of people don't even get to 70% confidence intervals.

You can test yourself at https://blog.codinghorror.com/how-good-an-estimator-are-you/.

Nevermark 4 days ago

From link:

> Heaviest blue whale ever recorded

I don't think estimation errors regarding things outside of someone's area of familiarity say much.

You could ask a much "easier" question from the same topic area and still get terrible answers: "What percentage of blue whales are blue?" Or just "Are blue whales blue?"

Estimating something often encountered but uncounted seems like a better test. Like how many cars pass in front of my house every day. I could apply arithmetic, soft logic and intuition to that. But that would be a difficult question to grade, given it has no universal answer.

kqr 4 days ago

I have no familiarity with blue whales but I would guess they're 1--5 times the mass of lorries, which I guess weigh like 10--20 cars which I in turn estimate at 1.2--2 tonnes, so primitively 12--200 tonnes for a normal blue whale. This also aligns with it being at least twice as large as an elephant, something I estimate at 5 tonnes.

The question asks for the heaviest, which I think cannot be more than three times the normal weight, and probably no less than 1.3. That lands me at 15--600 tonnes using primitive arithmetic. The calculator in OP suggests 40--320.
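For anyone who wants to sanity-check that chain of guesses, here's a rough Monte Carlo in the calculator's spirit - as I understand it, it treats each a~b range as roughly the 95% interval of a normal, so I do the same and multiply samples through:

    import random

    def sample(lo, hi):
        # treat lo--hi as the 95% interval of a normal distribution
        mu = (lo + hi) / 2
        sigma = (hi - lo) / 3.92
        return random.gauss(mu, sigma)

    N = 100_000
    masses = sorted(
        sample(1, 5) * sample(10, 20) * sample(1.2, 2) * sample(1.3, 3)
        for _ in range(N)
    )
    print(round(masses[int(0.025 * N)]), "-", round(masses[int(0.975 * N)]), "tonnes")

That should land somewhere near the calculator's 40--320, and it shows why the naive 15--600 from multiplying all the lows and all the highs together is wider than it needs to be.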

The real value is apparently 170, but that doesn't really matter. The process of arriving at an interval that is as wide as necessary but no wider is the point.

Estimation is a skill that can be trained. It is a generic skill that does not rely on domain knowledge beyond some common sense.

dotaenjoyer 4 days ago

I would say general knowledge in many domains may help with this as you can try and approximate to the nearest thing you know from that domain.

How you get good at being a generalist is the tricky part; my best bet is reading and doing a lot of trivia (I found crosswords to be somewhat effective at this, but far from efficient).

kqr 3 days ago

No, that has nothing to do with it. Trivia helps you narrow down an interval, but it is not necessary for constructing a correct interval, which can be of any width.

Nevermark 3 days ago

I am dying inside imagining someone who had to use crossword puzzles to learn how to read. There must be a better way to educate the masses!

yen223 4 days ago

I guess people didn't realise they are allowed to, and in fact are expected to, put very wide ranges for things they are not certain about.

peeters 3 days ago

So the context of the quiz is software estimation, where I assume it's an intentional parable of estimating something you haven't seen before. It's trying to demonstrate that your "5-7 days" estimate probably represents far more certainty than you intended.

For some of these, your answer could span orders of magnitude. E.g. my answer for the heaviest blue whale would probably be 5-500 tons because I don't have a good concept of things that weigh 500 tons. The important point is that I'm right around 9 times in 10, not that I had a precise estimate.

duckmysick 3 days ago

I don't know, an estimate spanning three orders of magnitude doesn't seem useful.

To continue your example of 5-7 days, it would turn into an estimate of 5-700 days. So somewhere between a week and two years. And fair enough, whatever you're estimating will land somewhere in between. But how do I proceed from there with actual planning or budget?

throwup238 3 days ago

> But how do I proceed from there with actual planning or budget?

You make up the number you wanted to hear in the first place that ostensibly works with the rest of the schedule. That’s why engineering estimates are so useless - it’s not that they’re inaccurate or unrealistic - it’s that if we insisted on giving them realistic estimates we’d get fired and replaced by someone else who is willing to appease management and just kick the can down the road a few more weeks.

baq 3 days ago

Your question is akin to asking ‘how do I make the tail wag the dog?’

Your budget should be allocated for say 80% confidence (which the tool helpfully provides behind a switch) and your stakeholders must be on board with this. It shouldn’t be too hard to do since everyone has some experience with missed engineering deadlines. (Bezos would probably say 70% or even less.)

peeters 3 days ago

I mean it's no less useful than a more precise, but less certain estimate. It means you either need to do some work to improve your certainty (e.g. in the case of this quiz, allow spending more than 10 minutes or allow research) or prepare for the possibility that it's 700 days.

Edit: And by the way, given a large enough view, estimates like this can still be valuable, because when you add these estimates together the resulting probability distribution narrows considerably. E.g. at just 10 tasks of this size, you get a 95% CI of 245~460 per task. At 20, 275~430 per task.

Note that this is obviously reductive as there's no way an estimate of 5-700 would imply a normal distribution centred at 352.5; it would be more like a log-normal distribution with most of the mass around 10 days. And additionally, this treats each task as independent - i.e. one estimate being at the high end wouldn't mean another one would be as well.
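Under the same reductive normal assumption, the narrowing is easy to check by hand: the standard error of the per-task average shrinks with the square root of the number of tasks. A minimal sketch:

    import math

    lo, hi, z = 5, 700, 1.96
    mu = (lo + hi) / 2             # 352.5 days
    sigma = (hi - lo) / (2 * z)    # ~177 days

    for n in (1, 10, 20):
        se = sigma / math.sqrt(n)  # standard error of the per-task average
        print(n, "tasks:", round(mu - z * se), "-", round(mu + z * se), "per task")

which reproduces the roughly 245~460 and 275~430 figures above.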

MichaelDickens 3 days ago

It shouldn't matter how familiar you are with the question. If you're pretty familiar, give a narrow 90% credence interval. If you're unfamiliar, give a wide interval.

jrowen 4 days ago

This jibes with my general reaction to the post, which was that the added complexity and difficulty of reasoning about the ranges actually made me feel less confident in the result of their example calculation. I liked the $50 result; you can tack on a plus or minus range but generally feel like you're about breakeven. On the other hand, "95% sure the real balance will fall into the -$60 to +$220 range" feels like it's creating a false sense of having more concrete information when you've really just added compounding uncertainties at every step (if we don't know that each one is definitely 95%, or the true min/max, we're just adding more guesses to be potentially wrong about). That's why I don't like the Drake equation: every step is just compounding wild-ass guesses - is it really producing a useful number?

kqr 4 days ago

It is producing a useful number. As more truly independent terms are added, the error grows with the square root of the number of terms while the point estimate grows linearly. In the aggregate, the error makes up less of the point estimate.

This is the reason Fermi estimation works. You can test people on it, and almost universally they get more accurate with this method.

If you got less certain of the result in the example, that's probably a good thing. People are default overconfident with their estimated error bars.

jrowen 3 days ago

I read a bit on Fermi estimation, and I'm not quite sure exactly what the "method" is in contrast to a less accurate method - is it basically just getting people to think in terms of dimensional analysis? This passage from the Wikipedia article is interesting:

> By contrast, precise calculations can be extremely complex but with the expectation that the answer they produce is correct. The far larger number of factors and operations involved can obscure a very significant error, either in mathematical process or in the assumptions the equation is based on, but the result may still be assumed to be right because it has been derived from a precise formula that is expected to yield good results.

So the strength of it is in keeping it simple and not trying to get too fancy, with the understanding that it's just a ballpark/sanity check. I still feel like the Drake equation in particular has too many terms for which we don't have enough sample data to produce a reasonable guess. But I think this is generally understood and it's seen as more of a thought experiment.

pests 4 days ago

> People are default overconfident with their estimated error bars.

You say this, yet 'roughly' in a top-level comment mentions people keep their error bars too close.

kqr 4 days ago

Sorry, my comment was phrased confusingly.

Being overconfident with error bars means placing them too close to the point estimate, i.e. the error bars are too narrow.

pests 3 days ago

Ah right thanks, I read that backwards.

bigfudge 4 days ago

They mean the same thing. The original comment pointed out that people’s qualitative description and mental model of the 95% interval mean they are overconfident… they think 95% means ‘pretty sure I’m right’ rather than ‘it would be surprising to be wrong’.

roughly 3 days ago

I think the point is to create uncertainty, though, or to at least capture it. You mention tacking a plus/minus range to $50, but my suspicion is that people's expected plus/minus would be narrower than the actual - I think the primary value of the example is that it makes it clear there's a very real possibility of the outcome being negative, which I don't think most people would acknowledge when they got the initial positive result. The increased uncertainty and the decreased confidence in the result is a feature, not a bug.

pertdist 4 days ago

I did a project with non-technical stakeholders modeling likely completion dates for a big Gantt chart. Business stakeholders wanted probabilistic task completion times because some of the tasks were new and impractical to quantify with fixed times.

Stakeholders really liked specifying work times as t_i ~ PERT(min, mode, max) because it mimics their thinking and handles typical real-world asymmetrical distributions.

[Background: PERT is just a re-parameterized beta distribution that's more user-friendly and intuitive https://rpubs.com/Kraj86186/985700]
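For anyone who wants to play with that at home, a minimal sketch of sampling t_i ~ PERT(min, mode, max) via the usual modified-PERT beta re-parameterization (shape parameter lambda = 4); the example task numbers are made up:

    import random

    def pert(mn, mode, mx, lam=4.0):
        # modified-PERT shape parameters for the underlying beta distribution
        a = 1 + lam * (mode - mn) / (mx - mn)
        b = 1 + lam * (mx - mode) / (mx - mn)
        return mn + (mx - mn) * random.betavariate(a, b)

    # e.g. a task estimated as "probably 5 days, could be 3, might blow out to 14"
    samples = sorted(pert(3, 5, 14) for _ in range(100_000))
    print("median:", round(samples[50_000], 1),
          "95%:", round(samples[2_500], 1), "-", round(samples[97_500], 1))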

kqr 4 days ago

This looks like a much more sophisticated version of PERT than I have seen used. When people around me have claimed to use PERT, they have just added together all the small numbers, all the middle numbers, and all the big numbers. That results in a distribution that is too extreme in both lower and upper bound.
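A quick illustration of why that's too extreme, using a triangular distribution as a stand-in for PERT (the exact shape doesn't matter for the point); the four task estimates are invented:

    import random

    tasks = [(2, 3, 6), (1, 2, 4), (3, 5, 10), (2, 4, 8)]  # (min, mode, max) in weeks

    naive_lo = sum(t[0] for t in tasks)   # add up all the small numbers -> 8
    naive_hi = sum(t[2] for t in tasks)   # add up all the big numbers   -> 28

    N = 100_000
    totals = sorted(
        sum(random.triangular(lo, hi, mode) for lo, mode, hi in tasks)
        for _ in range(N)
    )
    print("naive bounds:", naive_lo, "-", naive_hi)
    print("simulated 95%:", round(totals[int(0.025 * N)]), "-", round(totals[int(0.975 * N)]))

The simulated 95% interval comes out well inside the naive 8--28, because it's very unlikely that every task hits its worst (or best) case at the same time.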

baq 3 days ago

that... is not PERT. it's 'I read a tweet about three-point estimates', and I'm using a generous interpretation of 'read'

baq 4 days ago

arguably this is how it should always be done - fixed durations for any tasks are little more than wishful thinking.

dawnofdusk 3 days ago

>rent is very likely to be correlated with food costs - so, if rent is high, food costs are also likely to be high

Not sure I agree with this. It's reasonable to have a model where the mean rent may be correlated with the mean food cost, but given those two parameters we can model the fluctuations about the mean as uncorrelated. In any case at the point when you want to consider something like this you need to do proper Bayesian statistics anyways.

>In general normal distributions are rarer than people think - they tend to require some kind of constraining factor on the values to enforce.

I don't know where you're getting this from. One needs uncorrelated errors, but this isn't a "constraint" or "negative feedback".

roughly 3 days ago

The family example is a pat example, but take something like project planning - two tasks, each one takes between 2 and 4 weeks - except that they’re both reliant on Jim, and if Jim takes the “over” on task 1, what are the odds he takes the “under” on task 2?

This is why I joked about it as the mortgage derivatives maxim - what happened in 2008 (mathematically, at least - the parts of the crisis that aren’t covered by the famous Upton Sinclair quote) was that mortgage-backed derivatives were modeled as an aggregate of a thousand uncorrelated outcomes (a mortgage going bust), without taking into account that at least a subset of the conditions leading to one mortgage going bust would also lead to a separate, unrelated mortgage going bust. The results were not uncorrelated, and treating them as such meant the “1 in a million” outcome was substantially more likely in reality than the model allowed.
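A toy version of that point, with completely invented numbers - 1,000 loans, each with a 2% chance of going bust, versus the same loans partly driven by a shared "bad economy" factor:

    import random

    LOANS, TRIALS, THRESHOLD = 1000, 10_000, 60

    def busts(shared):
        bad_year = random.random() < 0.05        # 5% chance of a downturn
        p = 0.15 if (shared and bad_year) else 0.02
        return sum(random.random() < p for _ in range(LOANS))

    for label, shared in (("independent", False), ("correlated", True)):
        tail = sum(busts(shared) >= THRESHOLD for _ in range(TRIALS)) / TRIALS
        print(label, "P(60+ busts) ~", tail)

The independent model says 60+ simultaneous busts is essentially impossible; with the shared factor it happens about one year in twenty.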

Re: negative feedback - that’s a separate point from the uncorrelated errors problem above, and a critique of using the normal distribution at all for modeling many different scenarios. Normal distributions rely on some kind of, well, normal scattering of the outcomes, which means there’s some reason why they’d tend to clump around a central value. We see it in natural systems because there’s some constraints on things like height and weight of an organism, etc, but without some form of constraint, you can’t rely on a normal distribution - the classic examples being wealth, income, sales, etc, where the outliers tend to be so much larger than average that they’re effectively precluded by a normal distribution, and yet there they are.

To be clear, I’m not saying there are not statistical methods for handling all of the above, I’m noting that the naive approach of modeling several different uncorrelated normally distributed outcomes, which is what the posted tool is doing, has severe flaws which are likely to lead to it underestimating the probability of outlier outcomes.

youainti 4 days ago

> I’ve just seen a lot of people pick up statistics for the first time and lose a finger.

I love this. I've never thought of statistics like a power tool or firearm, but the analogy fits really well.

ninalanyon 3 days ago

Unfortunately it's usually someone else who loses a finger, not the person wielding the statistics.

rssoconnor 3 days ago

Normal distributions are the maximum entropy distributions for a given mean and variance. Therefore, in accordance with the principle of maximum entropy, unless you have some reason to not pick a normal distribution (e.g. you know your values must be non-negative), you should be using a normal distribution.

tgv 3 days ago

At least also accept a log-normal distribution. Sometimes you need a factor like .2 ~ 5, but that isn't the same as N(2.6, 1.2).
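To make that concrete, here's the difference between reading ".2 ~ 5" as that normal versus as a log-normal fitted to the same 95% endpoints:

    import math
    import random

    N = 100_000
    # normal reading: mean 2.6, sigma chosen so that 2.6 +/- 1.96*sigma = (0.2, 5)
    normal = sorted(random.gauss(2.6, (5 - 0.2) / 3.92) for _ in range(N))
    # log-normal reading: ln(x) ~ Normal(0, s) with exp(-1.96*s) = 0.2 and exp(1.96*s) = 5
    s = math.log(5 / 0.2) / (2 * 1.96)
    lognormal = sorted(random.lognormvariate(0, s) for _ in range(N))

    for name, xs in (("normal", normal), ("log-normal", lognormal)):
        below_one = sum(x < 1 for x in xs) / N
        print(name, "median:", round(xs[N // 2], 2), "P(factor < 1):", round(below_one, 2))

Same endpoints, very different stories: the normal puts the bulk of its mass around 2.6 and occasionally goes negative, while the log-normal centres on 1 with half its mass below it.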

kqr 3 days ago

> you should be using a normal distribution.

...if the only things you know about an uncertain value are its expectation and variance, yes.

Often you know other things. Often you don't know expectation and variance with any certainty.

jbjbjbjb 3 days ago

I think to do all that you’d need a full-on DSL rather than something pocket-calculator-like. I think adding a triangular distribution would be good though.

gamerDude 3 days ago

Great points. I think the idea of this calculator could simply be extended to specific use cases to make the statistical calculation simple and take into account additional variables - moving being one example.

larodi 4 days ago

Actually using it already after finding it a few days ago on HN.

JKCalhoun 3 days ago

> 2) probability is rarely truly uncorrelated

Without having fully digested how the Unsure Calculator computes its results, it seems to me you could perhaps "weight" the ranges you pass to the calculator. Rather than a standard bell curve, the Calculator could apply a more tightly focused — or perhaps skewed — curve for that term.

If you think your salary will be in the range of 10 to 20, but more likely closer to 10, you could:

10<~20 (not to be confused with less-than)

or: 10!~20 (not to be confused with factorial)

or even: 10~12~20 to indicate a range of 10 to 20 ... leaning toward 12.
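That last form maps naturally onto an asymmetric distribution - for instance, purely as an illustration, a parser could read "10~12~20" as the min/mode/max of a triangular distribution:

    import random

    def skewed_range(lo, mode, hi, n=100_000):
        # interpret "lo~mode~hi" as a triangular distribution leaning toward the mode
        xs = sorted(random.triangular(lo, hi, mode) for _ in range(n))
        return round(xs[n // 2], 1), round(xs[int(0.025 * n)], 1), round(xs[int(0.975 * n)], 1)

    median, p2_5, p97_5 = skewed_range(10, 12, 20)
    print("median:", median, "95%:", p2_5, "-", p97_5)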

roughly 3 days ago

The correlation in this case isn't about the distribution for the individual event, it's about the interactions between them - so, for instance, Rent could be anywhere between 1200 and 1800, and Food could be anywhere between 100 and 150, but if Rent is 1200, it means Food is more likely to be 100, and if Food is 150, it means Rent is more likely to be 1800. Basically, there's a shared factor that's influencing both (local cost of living) that's the actual thing you need to model.

So, a realistic modeling isn't 1200~1800 + 100~150, it's (1~1.5)*(1200 + 100) - the "cost of living" distribution applies to both factors.