Nice article.
The flapping from EpicUp's 140.99.244.0/23 prefix should have been subject to route dampening. This is per-peer or per-prefix rate limiting, typically enforced on all peers by ISPs to prevent this exact issue of a single prefix making up a significant portion of global BGP churn.
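For anyone who hasn't seen dampening up close, here is a rough sketch of the per-prefix penalty mechanism, assuming the classic exponential-decay model. The class name and the numbers (1000 penalty per flap, suppress at 2000, reuse at 750, 15 minute half-life) are illustrative assumptions on my part, not something taken from the article or from any particular network's config.

```python
class FlapDampener:
    """Hypothetical per-prefix dampening state (illustrative values only)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                 reuse_limit=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life_s = half_life_s
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # The penalty halves every half_life_s seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update = now

    def record_flap(self, now):
        # Each flap adds a fixed penalty on top of the decayed value.
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress_limit:
            self.suppressed = True
        return self.suppressed

    def is_suppressed(self, now):
        # A suppressed prefix is reused once its penalty decays below reuse_limit.
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed
```

With these assumed values, three flaps one minute apart are enough to suppress the prefix, and it only comes back after roughly two half-lives of quiet.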
I’m unconvinced by the correlation between the updates that the author attributes to knock-on effects. It would be pretty janky to have your advertisements be based on the path to other autonomous systems’ prefixes, especially unstable ones.
I don’t think there is a 40-minute periodicity either (at least there wasn’t 8 years ago when I was deep in the BGP world). Smells like something this dataset happened to show, either by luck or because of the network the author was getting the BGP feed from.
If you dig into the data and look at which ASes and prefixes are experiencing changes, you’ll find it’s all over the place and there isn’t really any bigger pattern.
On any given day there are usually a few noisy ISPs because of bad circuits or misconfigurations. Then there are new prefixes flapping in and out as a new thing is brought online for the first time, etc. Then sprinkle in path changes for regular draining maintenance, etc.
It’s simultaneously fascinating and a little horrifying how a little ISP in Kansas experiencing a fiber-consuming backhoe shows up on routers in Perth. Yet the frequency of updates is kept to <10 Hz globally through tons of hand-tuned policies.
Route dampening has mostly fallen out of fashion with networks these days.
Most setups were horribly misconfigured, and (most) routers are no longer as extremely CPU starved as they once were. That doesn't mean it no longer exists, of course: when I did BGP battleships ( https://blog.benjojo.co.uk/post/bgp-battleships ) I found that 3356 (at the time) was doing route dampening, so play had to be paused for a while.
That seems crazy to me. What guardrails are there against a single hacked router pumping 10000 path changes/sec?
The direct peering to the router is likely going to have a bad time, but the route advertisement interval I mention in the article is going to coalesce all of those updates together. Downstream peers would only see one update every 30 seconds (or so).
That’s only true if they can be coalesced. Even with RPKI, an intermediate transit router can path-length flap 100,000 routes in every 30-second interval.
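To make the coalescing point concrete, here is a toy sketch of interval batching. It assumes fixed 30 second windows and keeps only the latest path per prefix; real implementations run a per-peer timer and also skip updates that match what was last advertised, so treat `coalesce` and its input format as hypothetical.

```python
def coalesce(updates, interval_s=30.0):
    """Batch updates into fixed windows, keeping the last path per prefix.

    updates: iterable of (timestamp_s, prefix, as_path), sorted by time.
    Returns a list of (flush_time_s, prefix, as_path) seen downstream.
    """
    sent = []
    pending = {}          # prefix -> most recent as_path in this window
    window_end = None
    for ts, prefix, path in updates:
        if window_end is None:
            window_end = ts + interval_s
        # Flush every window that closed before this update arrived.
        while ts >= window_end:
            sent.extend((window_end, p, ap) for p, ap in pending.items())
            pending.clear()
            window_end += interval_s
        pending[prefix] = path   # later updates overwrite earlier ones
    if pending:
        sent.extend((window_end, p, ap) for p, ap in pending.items())
    return sent
```

Ten thousand changes to one prefix inside a window collapse to a single downstream update, but changes spread across many distinct prefixes do not collapse at all, which is the gap dampening was meant to cover.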
Depending on the RA interval alone is negligence, and if you encounter a small ISP that isn’t dampening your updates directly, their peering session with any of the major transit providers is at risk.
Route dampening guardrails were super common 7 years ago, and there hasn’t been any technological development that replaces what they did, so I highly doubt they fell out of favor.
Yup, unless that component has been disabled (which is quite rare) or the other side is BIRD, a bgpd that doesn't buffer anything!
See my adjacent comment to yours. I would like to see why you think dampening is out of favor. Interval batching is not an equivalent protection. If you were playing BGP battleships you were likely playing at a rate where a single prefix was not updating more than once per minute.
That wouldn’t reach the dampening thresholds that were normally configured, at least the ones I encountered with all of the transit providers.