Hi, it is quite different: there is no LLM involved. We could certainly use it for RAG, for example, but what is currently implemented is basically a way to generate embeddings (vector representations) that are then used for search later. It is all offline and local; no data from your files is ever sent to the cloud.
I understand that LLMs aren't involved in generating the embeddings and adding the xattrs. I was just wondering what the value-add is if there's no other background process (like mds on macOS) that uses them to build a search index.
I guess what I'm asking is: how does VectorVFS enable search beyond iterating through all files and comparing each file's embedding with the embedding of the search query? The project description says "efficient and semantically searchable" and "eliminating the need for external index files or services", but I can't think of any more efficient way to do a search without literally walking the entire filesystem tree to find the file with the most similar vector.
Edit: reading the docs [1] confirmed this. The `vfs search TERM DIRECTORY` command:
> will automatically iterate over all files in the folder, look for supported files and then embed the file or load existing embeddings directly from the filesystem.
[1]: https://vectorvfs.readthedocs.io/en/latest/usage.html#vfs-se...
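For concreteness, that kind of linear scan could look roughly like the sketch below. This is not VectorVFS's actual code or attribute layout: the xattr name, the raw-float32 storage format, and the externally computed query embedding are all assumptions for illustration.

```python
import os
import numpy as np

# Hypothetical xattr key; the real project may use a different name/format.
XATTR_NAME = "user.vectorvfs.embedding"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def brute_force_search(root: str, query_emb: np.ndarray, top_k: int = 5):
    """Walk every file under root, read its embedding from an xattr,
    and rank by cosine similarity to the query embedding -- an O(n) scan."""
    scored = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                raw = os.getxattr(path, XATTR_NAME)      # Linux-only xattr read
            except OSError:
                continue                                 # no embedding stored, skip
            emb = np.frombuffer(raw, dtype=np.float32)   # assumes raw float32 bytes
            scored.append((cosine(query_emb, emb), path))
    return sorted(scored, reverse=True)[:top_k]
```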
Yeah, this kind of setup is indefinitely scalable, but it's not searchable without a meta DB/index keeping track of all the nodes.
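For example, a minimal meta index could just be a matrix of embeddings plus the paths they belong to, persisted to disk, so a query becomes one matrix-vector product instead of a filesystem walk. Everything here is hypothetical, nothing VectorVFS ships:

```python
import json
import numpy as np

class EmbeddingIndex:
    """Toy on-disk index: an (n, d) float32 matrix of file embeddings and the
    n paths they came from. Searching is a single matrix-vector product."""

    def __init__(self):
        self.paths = []
        self.matrix = np.empty((0, 0), dtype=np.float32)

    def add(self, path: str, emb: np.ndarray):
        row = emb.astype(np.float32)[None, :]
        self.matrix = row if self.matrix.size == 0 else np.vstack([self.matrix, row])
        self.paths.append(path)

    def search(self, query_emb: np.ndarray, top_k: int = 5):
        norms = np.linalg.norm(self.matrix, axis=1) * np.linalg.norm(query_emb) + 1e-12
        sims = self.matrix @ query_emb / norms
        order = np.argsort(sims)[::-1][:top_k]
        return [(float(sims[i]), self.paths[i]) for i in order]

    def save(self, prefix: str):
        np.save(prefix + ".npy", self.matrix)
        with open(prefix + ".json", "w") as f:
            json.dump(self.paths, f)
```

The trade-off is the usual one: the index has to be kept in sync with the filesystem (by a watcher or a background daemon like mds), whereas the xattr-only approach never goes stale but pays the full walk on every query.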
Using it for RAG is smart indeed, especially with a multimodal encoder (vision RAG); the implementation would be straightforward given what you already have.
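Roughly, the retrieval step could reuse something like the `brute_force_search` sketch above: embed the question, pull the top-k most similar files, and pack their contents into the prompt for whatever LLM sits on top. Assuming text files for simplicity; the LLM call itself is out of scope here.

```python
def build_rag_prompt(question: str, question_emb, root: str, top_k: int = 3) -> str:
    """Retrieve the most similar files via their stored embeddings and assemble
    a context-stuffed prompt. Reuses the hypothetical brute_force_search above."""
    hits = brute_force_search(root, question_emb, top_k=top_k)
    context = []
    for score, path in hits:
        with open(path, "r", errors="replace") as f:
            context.append(f"# {path} (similarity {score:.2f})\n{f.read()}")
    return ("Answer using only the context below.\n\n"
            + "\n\n".join(context)
            + f"\n\nQuestion: {question}")
```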