Materialization is costly - it's usually the only thing forcing the DB to actually read the data on disk, for any columns it wasn't filtering by. But other processes internal to the DB may sometimes require data to be materialized as well. Streaming data out of the DB to the user requires serialization, and serialization requires materialization So whatever final shape the data returned from your outermost query has when it "leaves" the DB, that data will always get materialized.
no work (especially no IO) actually happens, because no data from the inner query needs to be materialized to resolve the outer query. SELECT false FROM (SELECT * FROM foo WHERE x = 1)
It's the master screw that forces all the activity dependent on it - that would otherwise stay abstract - to "really happen."ĭatabases don't materialize anything unless they're forced to. Materialization is the "enumeration" in a Streams abstraction, or the "thunk" in a lazy-evaluation language. To materialize something is to turn it from symbolic-expression form, into "hard" data - a result-set of in-memory row-tuples. The inner query from the view isn't "compiled" - forced to be in some shape - but rather sits there in symbolic form, "pasted" into your query, where the query planner can then manipulate and optimize/streamline it further, to suit the needs of the outer query. It's especially important to note that the `y + 1` expression from the view definition isn't computed in this query.
which, with efficient query planning, boils down to SELECT a FROM (SELECT x * 2 AS a, y + 1 AS b FROM foo) A database view is a "stored query" that presents itself as a table, that you can further query against.Īnd then you `SELECT a FROM bar`, then the "question" you're really asking is just: As its capabilities expand beyond parity with SQL (though I agree that's absolutely the best place for them to start and optimize), there are tremendous wins here that could power the next generation of real-time systems.ĮDIT: some clarifications and additional examples On the other hand, Materialize could allow businesses to realize this architecture, with its vast benefits to millisecond-scale data freshness and analytical flexibility, simply by writing SQL queries as if it was a traditional system.
#Materialize code#
(At my company it's required viewing for engineers across our stack, because every data structure is a materialized view no matter where on frontend or backend that data structure lives.)Ĭonfluent is building an incredible business helping companies to build these types of systems on top of Kafka, Samza, and architectural principles originally developed at LinkedIn, but more along the lines of "if you'd like this query to be answered, or this recommender system to be deployed for every user, we can reliably code a data pipeline to do so at LinkedIn scale" than "you can run this query right away against our OLAP warehouse without knowing about distributed systems." (If it's more nuanced than this please correct me!) For a primer on materialized views, and one of the key rationales for Materialize's existence, there's no better presentation than Martin Kleppman's "Turning the Database Inside-Out" (2015).