Fun idea storing embeddings in inodes! Very clever!
I want to point out that this isn’t suitable for the kind of workloads you’d actually use a vector database for. There’s no notion of a search index. It’s always an O(N) linear search through all of your files: https://github.com/perone/vectorvfs/blob/main/vectorvfs/cli....
Still, fun idea :)
The lack of an index is not bad at all if you have the vectors stored contiguously in RAM: the mechanical sympathy is great, SIMD will spin like a top, and that's before you even get to multithreading. Circa 2014 or so I worked on a search engine that scanned maybe 2GB worth of vectors for 10 million documents; queries were turned around in much less than a second and nobody complained about the speed.
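To make the "contiguous in RAM" point concrete, here is a minimal sketch of a brute-force top-k scan over vectors packed back to back in a single buffer. It is stdlib-only Python, so it shows the memory layout and the O(N) pass rather than any SIMD speedup; the `top_k` name, the dot-product scoring, and the toy data are assumptions made for illustration, not anything taken from VectorVFS or the search engine mentioned above.

```python
import heapq
from array import array


def top_k(flat, dim, query, k=3):
    """Return (score, row) pairs for the k rows with the largest dot product.

    `flat` holds n vectors of length `dim` stored back to back, so the scan
    walks memory strictly front to back.
    """
    n = len(flat) // dim
    scores = (
        (sum(flat[i * dim + j] * query[j] for j in range(dim)), i)
        for i in range(n)
    )
    return heapq.nlargest(k, scores)


# Four 2-dimensional vectors in one contiguous float32 buffer (toy data).
vectors = array("f", [1.0, 0.0, 0.0, 1.0, 1.0, 1.0, -1.0, 0.0])
print(top_k(vectors, dim=2, query=[1.0, 0.0], k=2))
```

In a real system the inner loop would be handed to a vectorized library or native code; the single flat buffer is what makes that handoff cheap.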
If you gotta gather the data from a lot of different inodes, it is a different story.
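For contrast, a rough sketch of what gathering from many inodes looks like: one directory walk plus one xattr read per file before any comparison can start. The attribute name `user.example.embedding` and the little-endian float32 packing are hypothetical choices for this example, not VectorVFS's actual on-disk format, and `os.getxattr` is only available on Linux.

```python
import os
import struct

# Hypothetical attribute name, chosen for this sketch only.
ATTR = "user.example.embedding"


def encode(vec):
    """Pack a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def decode(blob):
    """Unpack little-endian float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def gather(root):
    """Yield (path, vector) for every file under root that carries ATTR."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                blob = os.getxattr(path, ATTR)  # one syscall per file
            except OSError:
                continue  # no embedding stored, or xattrs unsupported here
            yield path, decode(blob)
```

Every vector here costs a separate syscall and possibly a separate disk seek, which is the "different story" compared with streaming through one buffer.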
Thanks. There is a bit of nuance there. For example, you can build an index in a first pass, which will indeed be linear, and then keep it around in an open prompt for subsequent queries; I'm planning to implement that mode soon. But I agree, it is not intended to search 10 million files, and you seldom have that use case in local use anyway.
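One way the "first pass, then keep it open" mode could look, as a sketch only: the class below pays the linear load once and answers later queries from memory. `ResidentIndex`, the `loader` callable, and the cosine scoring are all assumptions for illustration, not the planned VectorVFS implementation. Each query is still a linear scan, but over vectors already in RAM rather than over the filesystem.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ResidentIndex:
    """Loads all vectors on the first query, then serves queries from memory."""

    def __init__(self, loader):
        self._loader = loader  # callable returning {path: vector}
        self._items = None

    def _ensure_loaded(self):
        if self._items is None:
            # The expensive linear pass happens here, exactly once.
            self._items = list(self._loader().items())

    def query(self, vec, k=5):
        """Return the k paths whose vectors are most similar to vec."""
        self._ensure_loaded()
        scored = sorted(
            ((cosine(vec, v), p) for p, v in self._items), reverse=True
        )
        return [p for _, p in scored[:k]]
```

A prompt loop would create one `ResidentIndex` at startup and call `query` for each line the user types.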
O(n) is still OK for vector search if n isn't too large. Filesystem search solutions are currently terrible, with background indexing jobs and poor relevance. This won't scale to every file on your system, but anything in your working documents folder would easily work well.
An index could be built on top of this though if desired. No need to have it in the FS itself.
But then there's no point in storing anything in xattrs.
The reason would be that it's there as the source of truth, and when files e.g. get copied around, so does the metadata. The indexer doesn't need to be synchronous wrt such operations though, it can just watch the FS for changes and spin up reindexing as needed asynchronously.
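A rough sketch of the asynchronous side, under stated assumptions: instead of a real change-notification API such as inotify, this polls modification times and diffs two snapshots to decide what to reindex. The `snapshot` and `diff` helpers are hypothetical names for this example. Note that mtime only reflects content changes; picking up an xattr-only update would need a different signal.

```python
import os


def snapshot(root):
    """Map each file path under root to its modification time in nanoseconds."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                snap[path] = os.stat(path).st_mtime_ns
            except OSError:
                pass  # file vanished between listing and stat
    return snap


def diff(old, new):
    """Return (paths to reindex, paths to drop from the index)."""
    stale = sorted(p for p, m in new.items() if old.get(p) != m)
    gone = sorted(p for p in old if p not in new)
    return stale, gone
```

A background thread could call `snapshot` on a timer, run `diff` against the previous result, and re-read the xattrs only for the paths in `stale`, leaving the files themselves as the source of truth.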