The system uses on purpose those simple words, since they are "tellers" of the style of the user in a context independent way. Burrows' papers explain this very well, but in general we want to capture low-level structure rather than topics or the exact, non-obvious words used. I tested the system with 10k words and with the most common words removed, and you get totally different results (still useful, but not style matching): basically, you get users grouped by interests.
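For anyone who hasn't read the Burrows papers, here is a rough sketch of the idea in Python. It is not the code of the system discussed here, just an illustration of a Burrows-style Delta: take the most frequent words across all users, z-score each word's relative frequency, and compare users by the mean absolute difference of their z-scores. The function names, the `top_n` parameter, the simple regex tokenizer, and the use of the population standard deviation are all assumptions made for the example.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens, vocab):
    # Relative frequency of each vocabulary word in one user's text.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(texts, top_n=100):
    """texts maps a user name to that user's text.

    Returns a dict mapping (user_a, user_b) to a distance;
    smaller values mean more similar usage of common words.
    """
    tokens = {name: tokenize(t) for name, t in texts.items()}

    # Vocabulary = the top_n most frequent words over all users.
    # Stop words are kept on purpose: they carry the style signal.
    corpus = Counter()
    for toks in tokens.values():
        corpus.update(toks)
    vocab = [w for w, _ in corpus.most_common(top_n)]

    names = sorted(tokens)
    freqs = {n: relative_freqs(tokens[n], vocab) for n in names}

    # z-score every word across users so that very frequent words
    # do not dominate the distance.
    z = {n: [] for n in names}
    for i in range(len(vocab)):
        column = [freqs[n][i] for n in names]
        mu, sigma = mean(column), pstdev(column)
        for n in names:
            z[n].append((freqs[n][i] - mu) / sigma if sigma else 0.0)

    # Delta = mean absolute difference of z-scores for each pair.
    deltas = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            deltas[(a, b)] = mean(abs(x - y) for x, y in zip(z[a], z[b]))
    return deltas
```

Filtering the most common words out of `vocab` before scoring would shift the features toward topic words, which is roughly the "grouped by interests" behavior described above.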
>The system uses on purpose those simple words, since they are "tellers" of the style of the user in a context independent way.
Yes, that's good! I didn't state my interest clearly, though. I'd like to see the "analyze" result with the stop words excluded, not for the style comparison part, but for the reasons you state and others.
I think grouping users by interests would be a more interesting application. Most users don't have multiple accounts, but everyone probably shares some interests with other users, whom they might enjoy discovering.
Pretty sure the point here is to demonstrate how governments or other surveillance orgs can easily find your alt accounts even if you use Tor or any number of security tools.