Okay, I admit it: this was a clickbait title designed to get you to open this email fast. Sorry not sorry.

Now that you’re here, let me tell you about what I did this week!

A closer look at Great Expectations

I spent a couple of hours watching Great Expectations (GE) introduction and training videos, and took a close look at how they're positioning the framework.

These are my main take-aways:

  1. They're positioning GE as a "data quality tool" for data engineering use cases within engineering-heavy organisations. From what I've seen, these orgs tend to be easier to sell to, so that makes sense.
  2. GE has an impressive library of supported statistical tests. It's really comprehensive.
  3. GE is heavy on terminology. There's a 4-minute video in which the main author walks through it, and it's very dense. This can hurt adoption, especially in orgs without big engineering resources (read: orgs like NetCheck).

For engineering-heavy orgs, I’d argue that “just going with GE” is a good call, and there’s no need for another solution in the space. It has a steep learning curve, but it’s worth it if you have the resources.

OTOH, I don't see companies like NetCheck adopting it anytime soon. The learning curve is too steep, they wouldn't benefit from GE's ecosystem integrations, and GE's workflow is too unfamiliar to them.

If I end up building a prototype for exactly this problem, it'd be a good idea to focus on the following:

  1. easy to integrate into existing workflows
  2. little terminology
  3. accessible learning curve

I've started collecting some ideas and thoughts for such a prototype here:

Prototype Product Ideas

Apart from Great Expectations, the only competing effort I could find was Monte Carlo. It's oriented even more towards data engineering and operations, and is (supposedly) not much help for research applications.

I reached out to the Great Expectations team to set up interviews with some of them, but haven't heard back yet.

Interview with Elizabeth