Databases: Are We There Yet? - Spasov
SUMMARY
Rangel Spasov, founder of Saberstack and a former CTO in ad tech, gaming, and e-commerce, presents at Clojure/Conj 2025 on incremental computation for databases, critiquing the inefficiency of traditional systems as data grows and proposing the DBSP formalism, integrated with Datomic, for near-instant queries.
STATEMENTS
- Traditional databases recompute entire queries from scratch each time, causing performance to degrade as data volume increases, particularly with operations like JOINs.
- Scaling user bases often leads companies to add data warehouses, queues, and additional services, resulting in high system complexity and eventual financial strain.
- Database queries are compiled from declarative languages like SQL or Datalog into pure functions that process the full dataset without reusing prior computations.
- Incremental computation processes only data deltas to maintain and update query results efficiently, avoiding redundant work on unchanged portions.
- Datomic's immutability and exposed transaction log make it exceptionally suitable for incremental query engines compared to other databases.
- With incremental approaches, query execution time scales with the size of updates rather than total data, enabling consistent millisecond responses in most scenarios.
- The speaker's open-source Clojure library implements DBSP circuits as transducers specifically for Datomic, allowing complex queries on large datasets to run instantly.
- Z-sets (zets), integer-weighted sets in which positive weights record insertions and negative weights record deletions, combine with transducers to form the core mechanism for correct incremental query processing (see the sketch after this list).
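To make the delta idea concrete, here is a minimal Clojure sketch. It is an illustration under assumed semantics, not the library's actual API: a z-set is modeled as a map from element to integer weight, where +1 marks an insertion and -1 a deletion, and a count is maintained by folding each delta into the running result instead of rescanning the data.

```clojure
;; A z-set: a map from element to integer weight.
;; Weight +1 means "inserted once"; -1 means "deleted once".
(defn zset+
  "Merge two z-sets by adding weights, dropping zero-weight elements."
  [a b]
  (reduce-kv (fn [acc k w]
               (let [w' (+ (get acc k 0) w)]
                 (if (zero? w') (dissoc acc k) (assoc acc k w'))))
             a b))

;; Incrementally maintained count: fold each delta's total weight into
;; the running result instead of recounting the whole dataset.
(defn apply-delta [{:keys [data total]} delta]
  {:data  (zset+ data delta)
   :total (+ total (reduce + 0 (vals delta)))})

(reduce apply-delta
        {:data {} :total 0}
        [{"alice" 1 "bob" 1}   ; two insertions
         {"bob" -1}            ; one deletion
         {"carol" 1}])         ; one insertion
;; => {:data {"alice" 1, "carol" 1}, :total 2}
```

Because each step touches only the delta, the cost per update is proportional to the size of the change, which is exactly the scaling property described above.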
IDEAS
- Audience poll reveals far more love for Clojure than for databases, highlighting widespread frustration with current data systems.
- Fictional company story illustrates how initial simple setups with Postgres evolve into tangled ecosystems of warehouses and reverse ETL services as user demands grow.
- Heavy analytics queries, like scanning a year's data at peak hours, can unexpectedly crash operational databases without clear root causes.
- Decisions to bypass primary databases and write directly to warehouses create data silos, forcing awkward reverse data flows for real-time features.
- Traditional query engines treat nearly identical datasets as entirely new, wasting computation on unchanged elements despite minor additions.
- Columnar storage represents one rare evolution in databases, but popular systems remain fundamentally unchanged in core execution logic.
- Incremental computation decouples query speed from total data size, transforming databases into systems that handle evolving data streams efficiently (the join sketch after this list shows the mechanism).
- Any database providing a total order of updates—such as transactions—can integrate incremental techniques, making it a versatile paradigm shift.
- Datomic stands out for incremental work because its design inherently supports immutable histories and log exposure, unlike mutable alternatives like MySQL.
- Open-source development of Datomic's incremental engine involves weekly GitHub updates, demonstrating rapid iteration in Clojure-based tools.
- Z-sets (zets), weighted sets that record insertions and deletions as positive and negative counts, pair naturally with transducers to enable precise delta-based recomputation.
- Eliminating auxiliary services like data warehouses through incrementality restores simplicity, allowing single-node execution for complex analytics.
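The JOIN case mentioned in the statements has a well-known incremental answer. Under standard z-set semantics (a textbook incremental view maintenance identity, not code from the talk), the change to a join decomposes into three delta terms: Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB. A hedged sketch with hypothetical helper names:

```clojure
;; Join two z-sets of tuples on a key; a joined pair's weight is the
;; product of the input weights (standard z-set semantics).
(defn zjoin [key-a key-b a b]
  (reduce-kv (fn [acc ta wa]
               (reduce-kv (fn [acc tb wb]
                            (if (= (key-a ta) (key-b tb))
                              (update acc [ta tb] (fnil + 0) (* wa wb))
                              acc))
                          acc b))
             {} a))

;; Delta rule: only terms involving a delta are evaluated, so the work
;; tracks the size of the change rather than the size of the relations.
(defn delta-join [key-a key-b a b da db]
  (merge-with + (zjoin key-a key-b da b)
                (zjoin key-a key-b a db)
                (zjoin key-a key-b da db)))

(def users  {{:id 1 :name "Ada"} 1})
(def orders {{:user 1 :item "book"} 1})
;; A new order arrives: compute only the affected join rows.
(delta-join :id :user users orders {} {{:user 1 :item "pen"} 1})
;; => {[{:id 1, :name "Ada"} {:user 1, :item "pen"}] 1}
```

A production engine would index each relation by join key rather than scanning pairs; the nested scan here only keeps the sketch short.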
INSIGHTS
- Persistent inefficiencies in databases stem not from data growth itself but from the failure to reuse computations across similar states, trapping organizations in escalating complexity.
- The allure of specialized tools like warehouses masks a deeper flaw: query engines that ignore temporal continuity, turning minor changes into full-scale recomputations.
- Immutability in systems like Datomic unlocks a paradigm where data evolution becomes an asset, transforming logs into blueprints for efficient, targeted updates.
- Incremental formalism reveals that most real-world updates involve tiny deltas, rendering total-data scaling obsolete and millisecond latencies achievable universally.
- Clojure's transducers bridge theoretical incrementality with practical implementation, proving functional programming's edge in handling stateful data flows elegantly.
- By prioritizing delta processing over wholesale restarts, databases can reclaim their role as responsive cores, obviating the need for fragmented service architectures.
QUOTES
- "Who here loves their database? Some, yes, but significantly less people love their database than their programming language."
- "Postgres works great up to the point where it doesn't. So whether somebody issued a heavy analytics query at 5:00 p.m... our database is down."
- "Most databases have not really changed since they were first invented... in terms of query execution they all start from scratch every time."
- "Incremental computation enables efficient reuse between query runs. So we don't do the same work over and over again."
- "Datomic in my experience is by far the best system I've seen that is the best fit for incremental computation."
- "You can achieve single milliseconds response times reliably all the time."
- "Transducers and zets are the workhorses that make a correct incremental query computation possible."
HABITS
- Use Clojure consistently (since 2013) to solve complex real-time system problems handling millions of requests per minute.
- Begin product development with straightforward setups combining a web server and primary database like Postgres for initial user onboarding.
- Note the common reflex of responding to database failures by adding queues and warehouses rather than overhauling core query logic, a pattern the talk critiques.
- Maintain active open-source development by pushing weekly updates to GitHub repositories for incremental tools.
- Engage conference audiences with interactive polls to gauge sentiments on tools before diving into technical critiques.
FACTS
- The talk was recorded on November 13, 2025, at Clojure/Conj in Charlotte, North Carolina, lasting eight minutes.
- Speaker Rangel Spasov served as CTO in ad tech, gaming, and e-commerce firms managing real-time systems for millions of users.
- Traditional APIs typically must respond within about 60 milliseconds, while data warehouses often take 10 to 60 seconds for similar queries.
- Databases compile declarative SQL or Datalog into pure functions that process entire datasets, with execution times varying from 50 milliseconds to over 50 minutes.
- Incremental computation requires only a total order of updates, such as Datomic's transaction log, to work across diverse systems; a consumption sketch follows this list.
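As an illustration of consuming that total order, the sketch below assumes the Datomic peer API (datomic.api/tx-report-queue, which delivers one report per transaction); apply-delta! is a hypothetical stand-in for whatever incremental engine consumes the deltas.

```clojure
(require '[datomic.api :as d])

;; Each transaction report carries the datoms of one transaction, in
;; total order. Mapping a datom's :added flag to a weight (+1 for an
;; assertion, -1 for a retraction) turns the log into z-set deltas.
(defn tx->delta [{:keys [tx-data]}]
  (reduce (fn [delta datom]
            (assoc delta
                   [(:e datom) (:a datom) (:v datom)]
                   (if (:added datom) 1 -1)))
          {}
          tx-data))

;; Hypothetical consumer loop feeding each delta to an incremental
;; engine supplied as apply-delta!.
(defn follow! [conn apply-delta!]
  (let [q (d/tx-report-queue conn)]  ; blocking queue of tx reports
    (future
      (loop []
        (apply-delta! (tx->delta (.take q)))
        (recur)))))
```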
REFERENCES
- Datomic database system, praised for immutability and transaction log exposure.
- Postgres as a primary relational database example in scaling stories.
- Databricks warehouse, added for AI capabilities and direct data writes.
- SQL and Datalog as declarative query languages.
- DBSP formalism for incremental computation circuits.
- Clojure library implementing DBSP as transducers targeting Datomic.
- Z-sets (zets), the weighted-set data structure paired with transducers.
- Open-source incremental query engine on GitHub.
HOW TO APPLY
- Identify bottlenecks in your current database by polling teams on frustrations and tracing failures to heavy queries during peak loads.
- Start with a simple architecture using a primary database like Postgres connected to a web server for handling initial user requests and writes.
- When scaling reveals slowdowns, introduce a data warehouse and queue to offload analytics without disrupting operational API responses.
- For data trapped in warehouses, build a reverse ETL service to copy and transform it back to the primary database for real-time feature development.
- Transition to incremental computation by ensuring your database exposes a total order of updates, then integrate tools like DBSP circuits via transducers for delta processing, as sketched below.
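As a hedged sketch of that last step (an illustration of the approach, not the actual library API), circuit stages can be modeled as ordinary Clojure transducers composed with comp over a stream of z-set deltas:

```clojure
;; Hypothetical circuit stages as ordinary transducers over a stream
;; of z-set deltas ({tuple weight} maps). Composing with comp mirrors
;; wiring DBSP operators into a circuit.
(defn zfilter [pred]
  ;; Keep only tuples matching pred; weights pass through unchanged.
  (map (fn [delta]
         (into {} (filter (fn [[tuple _]] (pred tuple))) delta))))

(defn zcount []
  ;; Stateful stage: emit the running count after each delta.
  ;; Holds state in an atom, so build a fresh circuit per stream.
  (let [total (atom 0)]
    (map (fn [delta] (swap! total + (reduce + 0 (vals delta)))))))

(def circuit
  (comp (zfilter #(= :purchase (:type %)))
        (zcount)))

;; Three deltas in, the running count after each delta out.
(into [] circuit
      [{{:type :purchase :id 1} 1}
       {{:type :view :id 2} 1}          ; filtered out
       {{:type :purchase :id 1} -1}])   ; a retraction
;; => [1 1 0]
```

Each stage sees only deltas, so the pipeline's cost per update tracks the size of the change rather than the size of the dataset.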
ONE-SENTENCE TAKEAWAY
Embrace incremental computation to process only data changes, revolutionizing databases for instant, scalable queries without complex pipelines.
RECOMMENDATIONS
- Integrate incremental query engines with Datomic to leverage its immutability for efficient large-scale analytics.
- Avoid direct writes to warehouses; maintain data flow through primary databases to prevent silos and reverse ETL needs.
- Explore Clojure transducers for implementing DBSP, enabling reusable circuits on any update-ordered system.
- Watch the open-source engine's GitHub repository, updated weekly, to stay ahead in real-time data handling.
- Replace proliferating services with single-node incremental setups, focusing efforts on delta-based optimizations for millisecond responses.
MEMO
In the bustling world of software engineering, where Clojure enthusiasts gather annually, Rangel Spasov took the stage at Clojure/Conj 2025 to deliver a pointed critique of modern databases. With a wry audience poll exposing the chasm between affection for programming languages and exasperation with data stores, Spasov painted a vivid "fictional" tale of a startup's descent into data chaos. What begins as a sleek Postgres-backed API serving hundreds of requests per second unravels under growth: a rogue analytics query crashes the system at rush hour, prompting a frantic cascade of warehouses, queues, and even direct writes to tools like Databricks. Soon, engineers are tangled in reverse ETL pipelines, shuttling data back for features that demand sub-second responses—leaving the company mired in complexity and depleted funds.
Delving into history, Spasov unpacked the unchanging heart of databases. Since their inception, these systems have treated every query as a fresh start, compiling SQL or Datalog into pure functions that chew through entire datasets afresh. Add a single row, and the engine ignores the 99.9% overlap, recalculating from zero—whether it takes 50 milliseconds or 50 minutes. Innovations like columnar storage offer tweaks, but popular engines, from MySQL to Postgres, cling to this scratch-built ethos. Spasov posed the inefficiency starkly: Why recompute the unchanged when data evolves incrementally in real life?
Looking ahead, Spasov championed incremental computation as the antidote: a general technique that processes only deltas, the differences between states, to update results with surgical precision. No longer beholden to total data volume, queries scale with the size of the change, delivering reliable millisecond responses even on massive datasets. This formalism, embodied in DBSP, thrives especially with Datomic, whose immutable log provides a perfectly ordered chronicle of updates. Spasov's own open-source Clojure library implements these circuits as transducers operating on Z-sets (or "zets"), enabling single-node execution of complex joins over large datasets.
The implications ripple beyond tech stacks. By sidelining warehouses and acronym-laden services, incremental approaches restore simplicity, letting databases reclaim their centrality. Spasov, drawing from CTO stints handling millions of users across ad tech and gaming, urged adoption: Datomic fits best, but the paradigm suits graphs, warehouses, or anything with update ordering. As weekly GitHub pushes advance his engine, the message is clear—databases need not lag behind; they can evolve to match the fluid reality of data. In a field obsessed with speed, this could be the quiet revolution that keeps APIs humming and companies thriving.