This week was packed with interviews far and wide. And I learned a lot of new things! Let’s go through it, shall we?
Seldo Voss, Netlify
Laurie Voss, who goes by Seldo, is a Data Evangelist at Netlify. That means half of his job is being an internal data analyst, and the other half is writing blog posts & giving public talks such as this one:
https://www.youtube.com/watch?v=37GUq41zJok
Interviewing Seldo was super interesting, because it offers an insight into a high-velocity Silicon Valley company that collects a ton of data and tries to make decisions based on that data. This data can be anything from how customers use the platform, how happy they are with it, and which new trends are worth keeping an eye on, to how competitors appear in the media or how many Slack messages are sent at what time.
Seldo gave me a lot, a freaking lot, of new insights regarding my prototype:
- humans are terribly bad at coming up with good expectations about data. here are three examples:
- at all of the companies he has worked for, somebody freaked out in february because key business metrics were down by 10% - while the reason was simply that february has roughly 10% fewer days than the average month. learning: always report in 28-day rolling windows (see the first sketch after this list).
- Netlify bills some customers for consumed bandwidth, so its bandwidth calculation is very important. there was a bug where the size of HTTP headers was ignored in that calculation - but because it had been wrong since the beginning, nobody caught it for years, until the sales team noticed some weirdly low bills!
- they once wanted to change how revenue was calculated. to verify their changes, they looked up the known total revenue for january 1st of 2019, as calculated by the finance team. surely that wouldn’t change, it was months ago! a couple of weeks later, the finance team found out that they had made a typo when recording a contract in their system - the real revenue was 20 cents higher.
- when the aforementioned 2019 revenue figure changed, the hard-coded sanity check at the very first step of the pipeline failed, resulting in a full day of data downtime. learning: you seldom want to fail pipelines when expectations aren’t met; most things are fine with a warning (see the second sketch after this list).
- when I asked Seldo about “evolving datasets”, he replied that this distinction is “delightfully academic” - even the datasets that you’d think are static change all the time, and there is no such thing as static data. (researchers will probably disagree - this quote really showed me how different business data can be!)
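To make the 28-day rolling window idea concrete, here is a minimal pandas sketch - the daily signups series is made up purely for illustration:

```python
import numpy as np
import pandas as pd

# hypothetical daily signup counts for january through march
days = pd.date_range("2021-01-01", "2021-03-31", freq="D")
daily = pd.Series(
    np.random.default_rng(0).poisson(100, len(days)), index=days, name="signups"
)

# per-calendar-month totals make february look ~10% worse just because it is shorter
monthly = daily.resample("M").sum()

# a trailing 28-day window keeps every point comparable to every other point
rolling_28d = daily.rolling("28D").sum()

print(monthly)
print(rolling_28d.tail())
```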
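And here is how I picture the “warn, don’t fail” learning in code - my own sketch, not Netlify’s actual pipeline; the check names and thresholds are invented:

```python
import logging

def check(name: str, passed: bool, blocking: bool = False) -> None:
    """Record the outcome of a data expectation.

    Most checks only log a warning when they fail; only the few that are
    explicitly marked as blocking stop the pipeline.
    """
    if passed:
        return
    if blocking:
        raise RuntimeError(f"blocking check failed: {name}")
    logging.warning("check failed: %s", name)

# hypothetical usage inside a pipeline step
row_count = 1_027
check("table is not empty", row_count > 0, blocking=True)
check("row count looks like a normal day", 900 <= row_count <= 1_100)
```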
Tal Gluck, Great Expectations
After spending some time last week looking at Great Expectations (GE), I reached out via their community Slack with the following message:

Tal reached out to me and offered a video call. He’s a developer advocate at Superconductive (the company behind GE), and we talked for about 30 minutes. Here are the most important takeaways:
- GE improves data quality. Tal mentions three big reasons for bad quality: wrong data (e.g. caused by keyboard errors during data entry), wrong assumptions (e.g. analysts assuming that taxi fares are always positive, although that’s not true) and data drift (assumptions that were correct, but the dataset changed). see the sketch after this list for what such an expectation can look like in GE.
- there’s not a lot of commercial competition to GE
- my impression that GE has a steep learning curve was echoed
- GE users tend to find initial setup to be the biggest challenge
- simplicity is top of mind for them; the cloud product they’re building should hopefully be easier to use (note from Simon: and probably also easier to monetise)
- research + non-engineering orgs haven’t yet been huge adopters of GE
- although the city of Prague and some Brazilian government bodies have thought about using it
- Tal referred to a project called “Test-Driven Data Analysis”, which sounds very similar to what I’m building: http://tdda.info
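To give a flavour of what such an expectation looks like, here is a minimal sketch against GE’s pandas-backed API as I currently understand it - the fare data and column name are made up for illustration:

```python
import great_expectations as ge
import pandas as pd

# made-up taxi fares; negative values do occur in real data (e.g. refunds)
df = ge.from_pandas(pd.DataFrame({"fare_amount": [7.5, 12.0, -3.0, 52.25]}))

# the analyst's (wrong) assumption, written down as an expectation
result = df.expect_column_values_to_be_between("fare_amount", min_value=0)

print(result.success)                            # False: the -3.00 refund breaks the assumption
print(result.result["partial_unexpected_list"])  # e.g. [-3.0]
```

Note that the call returns a result object instead of raising, so a failed expectation can be surfaced as a warning rather than taking the whole pipeline down - which fits Seldo’s learning above.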