Databand, an AI-based observability platform for data pipelines designed specifically to detect when something is going wrong with a data source when engineers are using a disparate set of data management tools, has closed a $14.5 million funding round.
The Series A is being led by Accel with participation from Blumberg Capital, Lerer Hippeau, Ubiquity Ventures, Differential Ventures, and Bessemer Venture Partners. Blumberg led the company’s seed round in 2018. Databand has now raised around $18.5 million in total and is not disclosing its valuation.
The problem Databand is solving is getting more urgent by the day, as evidenced by the exponential yearly rise in the amount of data, measured in zettabytes, created globally. And as data workloads grow in size and use, they also become ever more complex.
On top of that, a typical organization today uses a wide range of applications and platforms to manage source material, storage, usage and so on. That means when there is a glitch in any one data source, it can be a challenge to identify where the issue is and what it is. Doing so manually can be time-consuming, if not impossible.
“Our users were in a constant battle with ETL (extract, transform, load) logic,” said Josh Benamram, Databand’s CEO and co-founder, who spoke to me from New York (the company is based both there and in Tel Aviv, and also has developers and operations in Kiev). “Users didn’t know how to organize their tools and systems to produce reliable data products.”
It is really hard to focus attention on failures, he said, when engineers are juggling analytics dashboards, the performance of machine learning models, and other demands on their time; and that’s before considering whether a data supplier might have changed an API at some point, which could throw the data source off completely.
And if you’ve ever been on the receiving end of that data, you know how frustrating (and, more seriously, how disastrous) bad data can be. Benamram said it’s not uncommon for engineers to miss anomalies entirely, and for those anomalies to be spotted only by “CEOs looking at their dashboards and suddenly thinking something is off.” Not a great scenario.
Databand’s approach is to use big data to better handle big data: it crunches various pieces of information, including pipeline metadata such as logs, runtime info, and data profiles, along with information from Airflow, Spark, Snowflake, and other sources, and puts the results into a single platform. That gives engineers a single view of what’s happening, so they can better see where bottlenecks or anomalies are appearing, and why.
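To make the general idea concrete, here is a minimal, hypothetical Python sketch of the pattern described above: run metadata from several tools is merged into one view per dataset, and each new run is compared against that dataset’s history. It is not Databand’s implementation or API. The `RunRecord` fields, the source labels and the simple z-score test (flagging values more than three standard deviations from the historical mean) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RunRecord:
    # Hypothetical record shape; real tools expose far richer metadata.
    source: str       # label for the tool that reported the run, e.g. "airflow"
    dataset: str      # the data asset the run produced
    run_at: int       # ordering key (e.g. a timestamp or run sequence number)
    row_count: int    # a simple data-profile metric
    runtime_s: float  # a runtime metric, in seconds


def unify(*feeds):
    """Merge per-tool metadata feeds into a single view, grouped by dataset."""
    by_dataset = {}
    for feed in feeds:
        for rec in feed:
            by_dataset.setdefault(rec.dataset, []).append(rec)
    # Sort each dataset's runs chronologically so the last one is the latest.
    for runs in by_dataset.values():
        runs.sort(key=lambda r: r.run_at)
    return by_dataset


def zscore_flag(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` std devs from the history mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly stable history: any change is suspicious
    return abs(latest - mu) / sigma > threshold


def find_anomalies(by_dataset, threshold=3.0):
    """Check the latest run of each dataset against its earlier runs."""
    alerts = []
    for dataset, runs in sorted(by_dataset.items()):
        if len(runs) < 3:
            continue
        *past, latest = runs
        for metric in ("row_count", "runtime_s"):
            history = [getattr(r, metric) for r in past]
            value = getattr(latest, metric)
            if zscore_flag(history, value, threshold):
                alerts.append((dataset, metric, value))
    return alerts


# Usage: five normal runs reported by one tool, then a latest run from another
# tool in which the row count collapses (say, after an upstream API change).
history = [
    RunRecord("airflow", "orders", run_at=t, row_count=n, runtime_s=r)
    for t, (n, r) in enumerate(
        [(10_000, 61.0), (10_020, 59.5), (9_990, 60.2), (10_010, 60.8), (10_005, 59.9)]
    )
]
latest = [RunRecord("snowflake", "orders", run_at=5, row_count=150, runtime_s=60.4)]

alerts = find_anomalies(unify(history, latest))
print(alerts)  # → [('orders', 'row_count', 150)]
```

The row count is flagged while the runtime, which stays close to its usual range, is not. That is the kind of cross-tool signal a single consolidated view is meant to surface. A production system would likely use more robust statistics and many more metrics than this toy threshold.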