dataflow 4 hours ago

I was wondering the same thing but I guess the key is probably that you don't actually have to do it correctly every time. Just tokenizing based on a few common characteristics (brace pairing, quotes, indentation, newlines, etc.) should let you trim a ton without knowing anything about the language, I imagine?

My real worry is what if this ends up running dangerous code. Like what if you have a disabled line that writes instead of reading, that randomly gets reactivated?

1
steventhedev 3 hours ago

It's intended to run for producing compiler test cases, so there shouldn't be any code that's actually running.

CPython includes a flag to only run parsing/compiling to bytecode. While you can use it like they did here and run the code - it really depends on how much you trust every possible subset of your code