IIRC, Clippy’s most famous feature was interrupting you to offer advice. The advice was usually basic/useless/annoying, hence Clippy’s reputation, but a powerful LLM could actually make the original concept work. It would not be simply a chatbot that responds to text, but rather would observe your screen, understand it through a vision model, and give appropriate advice. Things like “did you know there’s an easier way to do what you’re doing”. I don’t think the necessary trust exists yet to do this using public LLM APIs, nor does the hardware to do it locally, but crack either of those and I could see ClipGPT being genuinely useful.
The way I remember it, a lot of software had "help" documentation with full-text search in the late 1980s and early 1990s, but the common denominator was that it didn't work, in the sense that you got useful answers less than 10% of the time. Until Google came along, users were trained to avoid full-text search facilities.
The full text facility attached to Clippy really was helpful, getting useful answers around 50% of the time. I thought the whole point of making him an engaging cartoon character was to overcome the prejudice mid-1990s users had towards full-text search in help.
It looks like you're writing a letter.
Would you like help?
* Get help with writing the letter
* Just type the letter without help
|_| Don't show me this tip again
It looks like you're one of the 1% of humans who still write letters themselves! Dear me, imagine that, what do you think this is, the 90s or something?! Would you like to join the other 99% of humans and doomscroll and shytpost while I write that letter for you?
We are probably getting closer to that with the newer multimodal LLMs, but you'd almost need to take screenshots at intervals and feed them directly to the LLM, giving it a sort of chronological context to help it understand what the user is trying to do and gauge the user's intentions.
As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.
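For what it's worth, the "screenshots at intervals" idea can be sketched roughly like this. It is only a minimal sketch under assumptions: `ScreenTimeline`, `Frame`, and the `capture` callable are hypothetical names made up for illustration, the capture function is whatever screenshot tool you happen to have, and no particular LLM API is assumed. It just keeps a rolling, chronological window of frames and skips ones where the screen did not change.

```python
import hashlib
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Frame:
    taken_at: float  # Unix timestamp of the capture
    png: bytes       # raw image bytes


class ScreenTimeline:
    """Rolling window of recent screenshots, oldest first (hypothetical helper)."""

    def __init__(self, capture: Callable[[], bytes], max_frames: int = 8):
        # `capture` is an assumption: any callable that returns image bytes.
        self._capture = capture
        self._frames: Deque[Frame] = deque(maxlen=max_frames)
        self._last_digest: Optional[str] = None

    def tick(self, now: Optional[float] = None) -> bool:
        """Capture one frame; return False if the screen was unchanged."""
        png = self._capture()
        digest = hashlib.sha256(png).hexdigest()
        if digest == self._last_digest:
            return False  # skip byte-identical frames to save tokens
        self._last_digest = digest
        self._frames.append(Frame(time.time() if now is None else now, png))
        return True

    def context(self) -> List[Frame]:
        """Frames in chronological order, ready to hand to a model."""
        return list(self._frames)
```

You would call `tick()` every few seconds and pass `context()` to whichever multimodal model you trust, local or otherwise. The bounded `deque` is the design choice that matters: it caps how much screen history ever exists to be sent anywhere.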
> As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.
Of the technical, hang-out-on-HN crowd? Ya, probably not many.
Of the other 99.99% of computer users? The majority of them wouldn't even think about it, let alone care. To quote a phrase, “the user is going to pick dancing pigs over security every time”.
Even without the nonchalant attitude towards security, the majority of the population has been so conditioned to assume that everything they do on a computer is already being sent to 1) Apple, 2) Google, 3) Microsoft, or 4) their employer that they're burnt out on caring.
All that is to say that if you can make a widely-available real-time LLM assistant that appeals to non-technical users, please invite me to your private-island-celebrity-filled-yacht-parties.
I think we're well into the paradigm of "hidden employee activity monitoring software" already taking periodic screenshots and sending them to an LLM somewhere, which then generates aggregate performance metrics and dashboards for managers. I've heard of multiple companies working on this for $bigcorp environments, customer service/call center workstation PCs, etc.
Models with native video understanding would do the trick - Advanced Voice Mode on the ChatGPT iOS/Android app lets you use your camera, works pretty well; there's also https://aistudio.google.com/live (AFAIK there are no open-source models with similar capabilities)
> I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM
shudders.
Even funnier would be to make it unnecessarily mean and vexing.
Wait, are you really looking this up? You don't even know how to do this? Are you kidding me?
Gosh, it's been an hour and you still haven't fixed this bug? Are you retarded or something? You don't deserve this job.
> and give appropriate advice
"It's time to work, Dave"
It can still be annoying; I feel it is part of his personality.
It looks like you are writing a comment on Hacker News.
Would you like help with:
- Commas? There shouldn't be one behind "responds to text"
- Capitalization? You've missed a D in "did you know..."
- Punctuation? You've missed a question mark behind "what you’re doing". It goes inside the quotes, of course!
[] Don't ever suggest anything like this ever again.
Microsoft is infamously adding AI to Windows to constantly watch your screen, and people understandably are not super excited about it.
I personally can’t wait to ask to recall something I saw before but can’t quite remember where.
Pretty soon I won’t even need biological memory.
i added a minutely scrot cronjob about a year ago and haven't used it once. remembering "that website i was on last week" is apparently not a real problem I was having
If it ran entirely on the local machine and didn't send information back to Microsoft, I think people would be far more accepting of it.
That's exactly what Recall was and is.
> Things like “did you know there’s an easier way to do what you’re doing”
That could come off just as patronizing as the original Clippy. If it said things like "Would you like me to generate you a letter for X?" it would be miles ahead of the original.