esprehn 5 days ago

The big challenge with the approach, not touched on in the post, is version skew. During a deploy you'll have some new clients talking to old servers and some old clients talking to new servers. The ViewModel is a minimal representation of the data and you can constrain it with backwards compatibility guarantees (e.g. Protos or Thrift), while the UI component JSON and its associated JS must be compatible with the running client.

Vercel fixes this for a fee: https://vercel.com/docs/skew-protection

I do wonder how many people will use the new React features and then have short outages during deploys, like the FOUC of the past. Even their Pro plan has only 12 hours of protection, so if you leave a tab open for 24 hours and then click a button, it might hit a server where the server components and functions are incompatible.

yawaramin 5 days ago

Wouldn't this be easy to fix by injecting a version number field in every JSON payload and, if the expected version doesn't match the received one, just forcing a redirect/reload?
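
Roughly what I have in mind, as a minimal sketch rather than a finished implementation. The `version` field, the payload shape, and the function names are all hypothetical, and the reload is passed in as a callback so the snippet also runs outside a browser:

```typescript
// Hypothetical payload shape: the server stamps every response with its build id.
interface VersionedPayload<T> {
  version: string;
  data: T;
}

type SkewAction<T> = { kind: "render"; data: T } | { kind: "reload" };

// Compare the build the client was shipped with against the build that produced the payload.
function checkSkew<T>(clientVersion: string, payload: VersionedPayload<T>): SkewAction<T> {
  if (payload.version !== clientVersion) {
    return { kind: "reload" };
  }
  return { kind: "render", data: payload.data };
}

// Apply the decision. `reload` is injected (an assumption of this sketch) so it can be tested.
function handlePayload<T>(
  clientVersion: string,
  payload: VersionedPayload<T>,
  render: (data: T) => void,
  reload: () => void,
): void {
  const action = checkSkew(clientVersion, payload);
  if (action.kind === "reload") {
    reload();
  } else {
    render(action.data);
  }
}
```

In a browser the last argument would be something like `() => window.location.reload()`.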

pfhayes 5 days ago

Forcing a reload is a regression compared to the "standard" method proposed at the start of the article. If you have a REST API that requests attributes about a model, and the client is responsible for the presentation of that model, then it is much easier to support outdated clients (perhaps outdated by weeks or months, in the case of mobile apps) without interruption, because their pre-existing logic continues to work.

yawaramin 4 days ago

Arguable that it's a 'regression'...loading pages is kinda the normal behaviour in a web browser. You can try to paper over that basic truth but you can't abstract it away forever. Also, the original comment I replied to said it would be a 'big challenge', but if you accept that the web is the web and sometimes pages can load or even reload, then it's not really a 'challenge' any more at all.

presentation 5 days ago

Vercel's skew protection feature keeps old versions alive for a while and routes requests that come from an old client to that old version, with some API endpoints to forcibly kill old versions if need be, etc. I find it works reasonably well.
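
To illustrate the general idea only (this is not Vercel's implementation or API; the class and method names are made up for the example), here is a small sketch of version-pinned routing: old deployments stay registered, a request that names a still-live deployment goes back to it, and everything else goes to the current one.

```typescript
// Hypothetical router: maps deployment ids to backend URLs and pins clients to their version.
class DeploymentRouter {
  private readonly live = new Map<string, string>();
  private currentId: string;

  constructor(currentId: string, currentUrl: string) {
    this.currentId = currentId;
    this.live.set(currentId, currentUrl);
  }

  // Roll out a new deployment while keeping the older ones alive.
  promote(id: string, url: string): void {
    this.live.set(id, url);
    this.currentId = id;
  }

  // Forcibly kill an old deployment. The current one cannot be retired.
  retire(id: string): void {
    if (id !== this.currentId) {
      this.live.delete(id);
    }
  }

  // Route to the client's own deployment if it is still alive, else to the current one.
  route(clientDeploymentId?: string): string {
    const pinned =
      clientDeploymentId === undefined ? undefined : this.live.get(clientDeploymentId);
    if (pinned !== undefined) {
      return pinned;
    }
    const current = this.live.get(this.currentId);
    if (current === undefined) {
      throw new Error("current deployment is not registered");
    }
    return current;
  }
}
```

Once an old deployment is retired, its clients fall through to the current version, which is where a version check or reload would still be needed.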

yawaramin 4 days ago

Wouldn't a solution that works perfectly be better than one that works 'reasonably well'?

presentation 19 hours ago

Your solution doesn't work perfectly. It works perfectly in the sense that your engineers won't see errors related to this situation, but it does not work perfectly in that your users have a crappy experience. For example, if you have some long form and, after a user inputs a ton of stuff, you just refresh their browser for them and wipe it all out, that is a crappy experience. Or you refresh their browser when their internet connection is bad and then prevent them from using your app until the whole thing reloads.

Maybe that doesn't matter for your use case, or you're willing to do a lot more legwork to prevent issues like that from occurring, but there will always be tradeoffs.

tantalor 5 days ago

Thrashing is why

yawaramin 4 days ago

Sorry what do you mean by 'thrashing' in this context?

tantalor 4 days ago

Reload causes skew causes reload

yawaramin 4 days ago

How does reload cause skew? Reload will just load the latest version of the webapp. That's the point.

tantalor 3 days ago

If you force a reload before the rollout is complete, the user will still experience skew, because you haven't finished the rollout. The website will be completely unusable for a significant fraction of users. You might as well turn off the website during the rollout. This is the main concern of skew - how to keep the website usable at all times for all users across versions.

If your rollout times are very short then skew is not a big concern for you, because it will impact very few users. If it lasts hours, then you have to solve it.

After the rollout is complete, then reload is fine. It's a bit user hostile but they will reload into a usable state.
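
A back-of-the-envelope sketch of why this hurts mid-rollout. The model is a simplifying assumption on my part: a fraction `p` of servers runs the new version, every request lands on a random server with no stickiness, and any version mismatch forces a reload.

```typescript
// Probability that a page load and one follow-up request hit servers running
// different versions, when a fraction p of servers has the new version and
// routing is uniformly random (both assumptions of this sketch).
function mismatchProbability(p: number): number {
  if (!(p >= 0 && p <= 1)) {
    throw new RangeError("p must be between 0 and 1");
  }
  // new page + old server, or old page + new server
  return 2 * p * (1 - p);
}

// Halfway through the rollout, half of all follow-up requests would force a reload,
// and the reloaded page faces the same odds again.
console.log(mismatchProbability(0.5)); // prints 0.5
```

Under this model the mismatch rate is zero only before the rollout starts and after it finishes, which is why a forced reload is only safe at those two points.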

yawaramin 1 day ago

If a webapp rollout lasts hours, you have a much bigger problem than skew which needs to be addressed urgently.

esprehn 16 hours ago

For most large scale apps (web or native) rollouts take multiple hours or even days. Ramps are slow to avoid widespread incidents and allow canary analysis to detect issues.

ricardobeat 3 days ago

Stickiness at the load balancer level helps mitigate these issues.