As some others have already posted: there is more to copying data than just moving it; it is about observability. A lot of companies have built their own frameworks for this.
One example I find useful for a lot of use cases is Dagster. You can define resources to encapsulate complexity (https://docs.dagster.io/guides/build/external-resources) and, in fact, build custom DSLs on top with components (https://docs.dagster.io/guides/labs/components).
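To make the resource idea concrete, here is a minimal sketch, assuming a recent Dagster version with Pythonic resources; WarehouseResource, conn_string, and daily_orders are made-up names for illustration:

    from dagster import ConfigurableResource, Definitions, asset

    # Hypothetical resource: connection details and query logic live in
    # one place instead of being repeated in every asset.
    class WarehouseResource(ConfigurableResource):
        conn_string: str

        def query(self, sql: str) -> list:
            # a real implementation would open conn_string and execute sql
            return [{"sql": sql}]

    @asset
    def daily_orders(warehouse: WarehouseResource) -> list:
        # the asset declares *what* it needs; the resource hides *how*
        return warehouse.query("SELECT * FROM orders WHERE day = CURRENT_DATE")

    defs = Definitions(
        assets=[daily_orders],
        resources={"warehouse": WarehouseResource(conn_string="duckdb:///local.db")},
    )

If you load that with dagster dev -f yourfile.py, the resource should show up as a swappable, configurable dependency in the UI, which is where a lot of the observability value comes from.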
At Magenta/Telekom we are building on https://georgheiler.com/event/magenta-data-architecture-25/ - you can follow along with this template: https://github.com/l-mds/local-data-stack/. You may find these examples useful for understanding how to use Dagster and graph-based data pipelines at scale.
Dagster is definitely one of the more polished orchestrators out there, and I like how it embraces the idea of resources and custom DSL-style components. It’s also cool that you’re using it at Magenta/Telekom at scale.
The only caution I have is that many orchestrators (Dagster, Airflow, ...) are typically task-level or graph-level frameworks: they do a great job of linking up big stages of a pipeline, but I've found that you still end up writing quite a bit of manual or ad-hoc code. That's not necessarily a knock on Dagster; it's just the reality that orchestrators focus on coordinating tasks rather than giving you a row-by-row DSL with robust side-effect semantics.
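To illustrate, here's a toy sketch (the asset and field names are invented): Dagster happily wires raw_events into cleaned_events at the graph level, but everything inside the task boundary, the dedup and validation, is still hand-rolled Python:

    from dagster import Definitions, asset

    @asset
    def raw_events() -> list:
        # stand-in upstream asset; in reality this might pull from an API
        return [
            {"event_id": "a1", "amount": "9.99"},
            {"event_id": "a1", "amount": "9.99"},  # duplicate
            {"amount": "3.50"},                    # missing key
        ]

    @asset
    def cleaned_events(raw_events: list) -> list:
        # the orchestrator handles the edge raw_events -> cleaned_events;
        # the row-by-row semantics below are plain ad-hoc code
        seen, out = set(), []
        for row in raw_events:
            key = row.get("event_id")
            if key is None or key in seen:
                continue  # manual bad-row handling and dedup
            seen.add(key)
            out.append({**row, "amount": float(row["amount"])})
        return out

    defs = Definitions(assets=[raw_events, cleaned_events])

Retries, partial failures, or exactly-once guarantees at the row level are all on you here; the framework only sees the two tasks and the edge between them.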
Still, those links you shared look super useful for seeing how Dagster can be extended with domain-specific abstractions. Appreciate you pointing those out; I'll check them out.