Interesting!
1. How does it compare against alternatives?
2. Do you have benchmarks?
1. I would argue there are no "real alternatives". The two closest alternatives in feature space are Ibis and Snowpark.
- Ibis, because although it can target multiple engines (as our docs state, we are built on and rely heavily on Ibis), its execution model is "single engine, single session": nothing is expected to persist beyond the current session, and an Ibis expression can only target a single engine. We want to be multi-engine and to have some artifacts persist across sessions (by way of caching).
- Snowpark, because it is somewhat "multi-engine" by way of external functions or Python stages, but it is locked to Snowflake. In some sense, we want to be Anypark: Snowpark-like functionality centered on whichever engine you choose, with performant interop with any other engine.
2. We don't yet have anything I would hold up as a benchmark. We don't aim to be "best in class" or the "fastest engine"; we aim to be "in class" for as many operations as possible (the word we use is "performant"). Our goal is to make it easy for an org to choose whichever engine(s) it finds most performant across the full space of {developer, computation} x {time, cost}. That said, Hussain has demonstrated how having the "whole pipeline" available, with execution deferred, can enable specialized optimization by way of predicate pushdown (https://ibis-project.org/posts/udf-rewriting/).
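To make the "artifacts durable across sessions (by way of caching)" point from the Ibis comparison concrete, here is a minimal sketch of the general idea, assuming a content-addressed on-disk cache: results are keyed by a hash of the expression (including which engine it targets) and written to disk, so a later session can reuse them instead of recomputing. This is not xorq's API or its actual cache implementation; the expression dict, `expr_key`, `cached_execute`, and `fake_engine` are all hypothetical.

```python
import hashlib
import json
import os
import tempfile
from typing import Any, Callable, Dict

def expr_key(expr: Dict[str, Any]) -> str:
    """Hash a canonical JSON encoding of a (hypothetical) expression description.
    The target engine is part of the expression, so it is part of the key."""
    canonical = json.dumps(expr, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def cached_execute(expr: Dict[str, Any], run: Callable[[Dict[str, Any]], Any], cache_dir: str) -> Any:
    """Return the on-disk result for expr if one exists; otherwise run it and persist it."""
    path = os.path.join(cache_dir, expr_key(expr) + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = run(expr)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result

# Two calls stand in for two separate sessions sharing one cache directory.
cache_dir = tempfile.mkdtemp()
calls = []

def fake_engine(expr):
    """Stand-in for a real engine: records that it ran and sums the values."""
    calls.append(expr["engine"])
    return {"total": sum(expr["values"])}

expr = {"engine": "duckdb", "op": "sum", "values": [1, 2, 3]}
first = cached_execute(expr, fake_engine, cache_dir)   # computed, then written to disk
second = cached_execute(expr, fake_engine, cache_dir)  # read back from disk; engine not called
print(first, second, len(calls))
```

Because the key is derived from the expression rather than from an in-memory object, it survives process restarts, which is the property a single-session execution model does not need to provide.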
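As a rough illustration of why deferred execution over the whole pipeline makes predicate pushdown possible, here is a toy sketch, assuming a tiny hypothetical plan representation (`Scan`, `Map`, `Filter`) that is neither Ibis's nor xorq's. Because nothing runs until `execute` is called, `push_down` can first rewrite the plan so a filter on an input column runs before an expensive per-row function (standing in for a UDF), and then folds the filter into the scan itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, Optional

Row = dict

@dataclass(frozen=True)
class Scan:
    rows: tuple                                   # source rows
    pred: Optional[Callable[[Row], bool]] = None  # filter applied while scanning

@dataclass(frozen=True)
class Map:
    child: Any
    out_col: str               # column the function adds
    fn: Callable[[Row], Any]   # stands in for an expensive UDF

@dataclass(frozen=True)
class Filter:
    child: Any
    col: str
    pred: Callable[[Any], bool]

def push_down(node):
    """Rewrite the deferred plan so filters run as early as possible."""
    if isinstance(node, Map):
        return Map(push_down(node.child), node.out_col, node.fn)
    if isinstance(node, Filter):
        child = push_down(node.child)
        # A filter on a column the Map does not produce can move below the Map.
        if isinstance(child, Map) and node.col != child.out_col:
            return Map(push_down(Filter(child.child, node.col, node.pred)), child.out_col, child.fn)
        # A filter directly over a scan becomes part of the scan itself.
        if isinstance(child, Scan):
            col, pred, old = node.col, node.pred, child.pred
            return Scan(child.rows, lambda r: (old is None or old(r)) and pred(r[col]))
        return Filter(child, node.col, node.pred)
    return node

def execute(node) -> Iterator[Row]:
    """Lazily evaluate the plan, yielding rows one at a time."""
    if isinstance(node, Scan):
        for r in node.rows:
            if node.pred is None or node.pred(r):
                yield r
    elif isinstance(node, Map):
        for r in execute(node.child):
            yield {**r, node.out_col: node.fn(r)}
    elif isinstance(node, Filter):
        for r in execute(node.child):
            if node.pred(r[node.col]):
                yield r

calls = []  # counts invocations of the "expensive" function

def expensive_score(row):
    calls.append(1)
    return row["x"] * 10

rows = tuple({"x": i} for i in range(1000))
plan = Filter(Map(Scan(rows), "score", expensive_score), "x", lambda x: x < 5)

naive = list(execute(plan))
naive_calls = len(calls)
calls.clear()
optimized = list(execute(push_down(plan)))
optimized_calls = len(calls)
print(naive == optimized, naive_calls, optimized_calls)
```

Both plans return the same rows, but the rewritten one calls the expensive function only for rows that survive the filter. Real engines apply far richer rules; this only shows the shape of the idea.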
Thanks for your interest, and please feel free to challenge any of the above or point us to anything you think we might have overlooked!
Best,
Dan
What size data is "in class" for Xorq? Can it process data out-of-core?
Yes: "we" are out-of-core to the extent that the engines used in the deferred expressions we execute are out-of-core (our "batteries-included" engine is a modified DataFusion).
We have previously demonstrated iterative batch training using our "batteries-included" engine. I'll try to post a reference later, but I need to run now due to family obligations.
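For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of iterative batch training over a stream, assuming the engine hands back data one batch at a time. It is not the demonstration mentioned above and uses no xorq API; `stream_batches` is a hypothetical stand-in for an engine streaming record batches, and the model is a toy linear regression fitted with one mini-batch SGD step per batch, so only one batch is ever held in memory.

```python
import random
from typing import Iterator, List, Tuple

Batch = List[Tuple[float, float]]

def stream_batches(n_rows: int, batch_size: int, seed: int = 0) -> Iterator[Batch]:
    """Yield (x, y) rows in fixed-size batches; only one batch is in memory at a time.
    Hypothetical stand-in for an engine streaming record batches from storage."""
    rng = random.Random(seed)
    for start in range(0, n_rows, batch_size):
        size = min(batch_size, n_rows - start)
        batch = []
        for _ in range(size):
            x = rng.uniform(-1.0, 1.0)
            batch.append((x, 3.0 * x + 2.0))  # y = 3x + 2, noise-free for clarity
        yield batch

def train(batches: Iterator[Batch], lr: float = 0.1) -> Tuple[float, float]:
    """Fit y ~ w*x + b with one gradient step per batch (mini-batch SGD on squared error)."""
    w, b = 0.0, 0.0
    for batch in batches:
        n = len(batch)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train(stream_batches(n_rows=200_000, batch_size=100))
print(round(w, 2), round(b, 2))
```

Memory use depends on `batch_size`, not on `n_rows`, which is the property that lets training proceed when the full dataset does not fit in RAM.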
This is an example of out-of-core processing: https://www.xorq.dev/posts/trino-duckdb-asof-join
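As a rough illustration of why an as-of join lends itself to out-of-core execution, here is a hypothetical pure-Python sketch of a backward as-of join over two timestamp-sorted streams. It is not the implementation from the linked post or xorq's code; `asof_join` and the trade/quote tuples are made up for illustration. Because both inputs are sorted, one pass suffices and only the latest quote per symbol is kept in memory.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Trade = Tuple[int, str, float]  # (timestamp, symbol, price)
Quote = Tuple[int, str, float]  # (timestamp, symbol, bid)

def asof_join(trades: Iterable[Trade], quotes: Iterable[Quote]) -> Iterator[Tuple[int, str, float, Optional[float]]]:
    """Backward as-of join: attach to each trade the latest quote for the same symbol
    whose timestamp is <= the trade's timestamp. Both inputs must be sorted by timestamp.
    Only the latest quote per symbol is held in memory, so inputs can be streamed."""
    latest: Dict[str, float] = {}
    quote_iter = iter(quotes)
    pending = next(quote_iter, None)
    for ts, sym, price in trades:
        # Consume every quote at or before this trade's timestamp.
        while pending is not None and pending[0] <= ts:
            latest[pending[1]] = pending[2]
            pending = next(quote_iter, None)
        yield (ts, sym, price, latest.get(sym))

quotes = [(1, "A", 9.9), (2, "B", 20.1), (4, "A", 10.1), (6, "B", 20.5)]
trades = [(0, "A", 10.0), (3, "A", 10.0), (5, "B", 20.3), (7, "A", 10.2)]
result = list(asof_join(trades, quotes))
print(result)
```

A trade with no earlier quote for its symbol gets `None`, mirroring the usual left-join behavior of backward as-of joins.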
Anecdotally, TPC-H at 10 TB is pretty doable with DuckDB nowadays, so xorq goes as far as your engine can take you.