Item 43682013

adeptima • 6 days ago

Meilisearch is great, used it for a quick demo

However if you need a full-text search similar to Apache Lucene, my go-to options are based on Tantivy

Tantivy https://github.com/quickwit-oss/tantivy

Asian language, BM25 scoring, Natural query language, JSON fields indexing support are all must-have features for me

Quickwit - https://github.com/quickwit-oss/quickwit - https://quickwit.io/docs/get-started/quickstart

ParadeDB - https://github.com/paradedb/paradedb

I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

Any thoughts on up-to-date hybrid search experience are greatly appreciated

jitl • 6 days ago

Quickwit was bought by Datadog, so I feel there's some risk quickwit-oss becomes unmaintained if Datadog's corporate priority shifts in the future, or OSS maintenance stops providing return on investment. Based on the Quickwit blog post, they are relicensing to Apache2 and releasing some enterprise features, so it seems very possible the original maintainers will move to other things, and it's unclear if enough community would coalesce to keep the project moving forward.

https://quickwit.io/blog/quickwit-joins-datadog#the-journey-...

1 reply

iambateman • 6 days ago

I have an implementation of Quickwit, so I've thought about this.

The latest version is stable and fast enough, that I think this won't be an issue for a while. It's the kind of thing that does what it needs to do, at least for me.

But I totally agree that the project is at risk, given the acquisition.

kk3 • 6 days ago

As far as combining full-text search with embedding vectors goes, Typesense has been building features around that - https://typesense.org/docs/28.0/api/vector-search.html

I haven't tried those features but I did try Meilisearch awhile back and I found Typesense to index much faster (which was a bottleneck for my particular use case) and also have many more features to control search/ranking. Although just to say, my use case was not typical for search and I'm sure Meilisearch has come a long way since then, so this is not to speak poorly of Meilisearch, just that Typesense is another great option.

3 replies

Kerollmops • 6 days ago

Meilisearch just improved the indexing speed and simplified the update path. We released v1.12 and highly improved indexing speed [1]. We improved the upgrade path with the dumpless upgrade feature [2].

The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.

That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid costs and remove the indexing time.

[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1... [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...

1 reply

kk3 • 6 days ago

Thank you for the response here. Not being able to upgrade the machine without completely re-indexing has actually become a huge issue for me. My use case is that I need to upgrade the machine to perform a big indexing operation that happens all at once and then after that reduce the machine resources. Typesense has future plans to persist the index to disk but it's not on the road map yet. And with the indexing improvements, Meilisearch may be a viable option for my use case now. I'll be checking this out!

irevoire • 6 days ago

I hate the way typesense are doing their « hybrid search ». It’s called fusion search and the idea is that you have no idea of how well the semantic and full text search are being doing, so you’re going to randomly mix them together without looking at all at the results both searches are returning.

I tried to explain them in an issue that in this state it was pretty much useless because you would always have one or the other search strategy that would give you awful results, but they basically said « some other engine are doing that as well so we won’t try to improve it » + a ton a justification instead of just admitting that this strategy is bad.

1 reply

jabo • 6 days ago

We generally tend to engage in in-depth conversations with our users.

But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

For what it’s worth, the approach used in Typesense is called Reciprocal Rank Fusion (RRF) and it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.

1 reply

irevoire • 6 days ago

> But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

Well, in this case I was just trying to be a normal user that want the best relevancy possible and couldn’t find a solution. But the reason why I couldn’t find it was not because you didn’t want to spend more time on my case, it was because typesense provide no solution to this problem.

> it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.

Yeah, cool or in other word « it’s bad, we know it and we can’t help you, but it’s the state of the art, you should instruct yourself ». But guess what, meilisearch may need some fine-tuning around your model etc, but in the end it gives you the tool to make a proper hybrid search that knows the quality of the results before mixing them.

If other people want to see the original issue: https://github.com/typesense/typesense/issues/1964

1 reply

spiderfarmer • 6 days ago

I think this is a good example of why people should disclose their background when commenting on competing products/projects. Even if the intentions were sound, which seems to be the case here, upfront disclosure would have given the conversation more weight and meaning.

jimmydoe • 5 days ago

+1 typesense is really fast. the only drawback is starting up is slow when index getting larger. the good thing is full text search (excl vector) is relatively stable feature set, so if your use case is just FTS, you won't need to restart very often for version upgrade.

inertiatic • 6 days ago

>I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

Start off with ES or Vespa, probably. ES is not hard at all to get started with, IMO.

Try RRF - see how far that gets you for your use case. If it's not where you want to be, time to get thinking about what you're trying to do. Maybe a score multiplication gets you where you want to be - you can do it in Vespa I think, but you have to hack around the inability to express exactly that in ES.

navaed01 • 6 days ago

I’m using Typesense hybrid search, it does the job, well priced and is low-effort to implement. Feel free to ask any specific questions

1 reply

Kerollmops • 6 days ago

You should try Meilisearch then, you'll be astonished by the quality of the results and the ease of setup.

1 reply

yencabulator • 5 days ago

https://news.ycombinator.com/user?id=Kerollmops

> Meilisearch Co-Founder and Tech Lead.

You really should disclose your affiliation.

Epicism • 4 days ago

Try LanceDB https://github.com/lancedb/lancedb

It’s based off of the data fusion engine, has vector indexing and BM 25 indexing, has pipes on and rust bindings

Kerollmops • 6 days ago

> I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

You know that Meilisearch is the way to go, right? Tantivy, even though, I love the product, doesn't support vector search. Its Hybrid search is stunningly good. You can try it on our demo [1].

[1]: https://wheretowatch.meilisearch.com/

1 reply

oulipo • 6 days ago

why couldn't it be possible to just embed Meilisearch/Tantivy/Quickwit inside Postgres as a plugin to simplify the setup?

1 reply

Kerollmops • 6 days ago

> [..] to simplify the setup?

It would be simpler to keep Meilisearch and its key-value store out of Postgres' WAL and stuff and better propose a good SQL exporter (in the plan).

1 reply

oulipo • 6 days ago

Perhaps on a technical level, but for a dev, if I just need to install Postgres and some plugins and, boom, I have a full searchable index, it's even easier