Some of these issues could perhaps be addressed by having fixed retention of PII in the online systems and encryption at rest in the offline systems. If a user wants to access data of theirs that has gone offline, they take the decryption hit. Of course, it helps to be critical about how much data should be retained in the first place.
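To make the tiering concrete, here is a rough sketch, assuming a per-user key scheme plus hypothetical `online_db`, `cold_storage`, and `user_keys` stores (none of these names come from any real product; the only real API used is the `cryptography` package's Fernet):

```python
# Sketch of tiered retention: PII lives online for a fixed window, then is
# moved offline encrypted at rest. `online_db`, `cold_storage`, `user_keys`,
# `records_older_than`, and `to_bytes` are all assumed interfaces.
from datetime import datetime, timedelta, timezone
from cryptography.fernet import Fernet  # pip install cryptography

RETENTION = timedelta(days=90)  # fixed online retention window (assumed)

def archive_stale_records(online_db, cold_storage, user_keys):
    """Move records past the retention window offline, encrypted at rest."""
    cutoff = datetime.now(timezone.utc) - RETENTION
    for record in online_db.records_older_than(cutoff):
        f = Fernet(user_keys[record.user_id])  # per-user Fernet key
        cold_storage.put(record.id, f.encrypt(record.to_bytes()))
        online_db.delete(record.id)

def read_offline(cold_storage, user_keys, user_id, record_id):
    """Accessing offline data is where the user takes the decryption hit."""
    f = Fernet(user_keys[user_id])
    return f.decrypt(cold_storage.get(record_id))
```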
It is true that protecting the user's privacy costs more than not protecting it, but some organizations feel a moral obligation or have a legal duty to do so. And some users value their own privacy enough that they are willing to deal with the decreased convenience.
As an engineer, I find it neat that figuring out how to delete data is often a more complicated problem than figuring out how to create it. I welcome government regulations that encourage more research and development in this area, since from my perspective that aligns actually-interesting technical work with the public good.
> As an engineer, I find it neat that figuring out how to delete data is often a more complicated problem than figuring out how to create it.
Unfortunately, this is a deeply hard problem in theory, and it has been thoroughly studied in computer science. When GDPR first came out I was doing core research on “delete-optimized” databases. The same problem shows up in other domains. Regulations don’t have the power to dictate mathematics.
I know of several examples in multiple countries where data-deletion laws are flatly ignored by the government because it is literally impossible to comply with them, even when the government wants to. Often this data supports a critical public good, so simply not collecting it would have adverse consequences for citizens.
tl;dr: delete-optimized architectures are so profoundly pathological for query performance, and to a lesser extent insert performance, that no one can use them for most practical applications. It is fundamental to the computer science of the problem. Denial of this reality leads to situations like the above, where non-compliance is required because the law didn’t concern itself with the physics of computation.
If the database is too slow to load the data, then it doesn’t matter how fast your deterministic hard deletion is, because there is no data in the system to delete.
Any improvements in the situation are solving minor problems in narrow cases. The core theory problems are what they are. No amount of wishful thinking will change this situation.
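To see the shape of the tradeoff, here is a toy log-structured store (my own illustration, not the parent’s research): deletes are cheap to issue, but every query pays for every outstanding delete until a full compaction rewrites the data.

```python
# Toy illustration of the delete/query tension: a log-structured store where
# hard deletes leave tombstones that every read must filter out until a
# compaction pass rewrites all segments.
class LogStore:
    def __init__(self):
        self.segments = [[]]      # immutable, append-only segments
        self.tombstones = set()   # user_ids whose data is logically deleted

    def insert(self, user_id, value):
        self.segments[-1].append((user_id, value))

    def delete_user(self, user_id):
        self.tombstones.add(user_id)  # O(1), but the bytes are still on disk

    def query(self, predicate):
        # Reads pay a filter cost for every delete ever issued.
        for seg in self.segments:
            for user_id, value in seg:
                if user_id not in self.tombstones and predicate(value):
                    yield user_id, value

    def compact(self):
        # True hard deletion: rewrite everything, skipping tombstoned users.
        live = [(u, v) for seg in self.segments for u, v in seg
                if u not in self.tombstones]
        self.segments, self.tombstones = [live], set()
```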
Instantaneous deletes might be impossible, but I really doubt that it’s physically impossible to eventually delete user data. If you soft delete first to hide the user’s data, and then it takes hours, weeks, or months to eventually purge it from all systems, that’s fine. Regulators aren’t expecting you to edit old backups, only that they eventually get cleared in a reasonable time.
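A rough sketch of that pattern, assuming a DB-API style handle (e.g. sqlite3) and hypothetical `users` / `user_events` tables:

```python
# Soft-delete-then-purge: hide the data immediately, hard-delete it later
# from a background job once the grace period has elapsed.
from datetime import datetime, timedelta, timezone

GRACE = timedelta(days=30)  # time allowed for the purge to propagate (assumed)

def soft_delete_user(db, user_id):
    """Hide the user's data right away; the hard purge happens later."""
    db.execute("UPDATE users SET deleted_at = ? WHERE id = ?",
               (datetime.now(timezone.utc).isoformat(), user_id))

def purge_job(db):
    """Background job: hard-delete anything soft-deleted past the grace period."""
    cutoff = (datetime.now(timezone.utc) - GRACE).isoformat()
    db.execute("DELETE FROM user_events WHERE user_id IN "
               "(SELECT id FROM users WHERE deleted_at < ?)", (cutoff,))
    db.execute("DELETE FROM users WHERE deleted_at < ?", (cutoff,))
```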
Seems that companies are capable of moving mountains when the task is tracking the user and bypassing privacy protections. But when the task is deleting the user’s data, it’s “literally impossible”.
It would be interesting to hear more about your experience with systems where deletion has been deemed "literally impossible".
Every database I have come across in my career has a delete function. Often it is slow. In many places I worked, deleting or expiring data cost almost as much as, or sometimes more than, inserting it... but we still expired the data because that's a fundamental requirement of the system. So everything costs 2x, so what? The interesting thing is how to make it cost less than 2x.
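The standard trick for getting under 2x is to partition by time, so expiry drops whole partitions instead of touching individual rows; real databases do this with DROP PARTITION or DROP TABLE, which is close to a metadata-only operation. A toy in-memory sketch of the same idea:

```python
# Time-partitioned expiry: expiring a day of data is one O(1) dict pop
# instead of millions of row deletes.
from datetime import datetime, timedelta, timezone

class TimePartitionedStore:
    def __init__(self, ttl_days=30):
        self.ttl = timedelta(days=ttl_days)
        self.partitions = {}  # day (date) -> list of records

    def insert(self, record, now=None):
        day = (now or datetime.now(timezone.utc)).date()
        self.partitions.setdefault(day, []).append(record)

    def expire(self, now=None):
        cutoff = ((now or datetime.now(timezone.utc)) - self.ttl).date()
        for day in [d for d in self.partitions if d < cutoff]:
            self.partitions.pop(day)  # whole partition gone in one step
```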