Tuesday, July 7, 2009

Real-Time Search, A Computer Science Tutorial

The recent elevation of real-time search to buzzword prominence has made plain just how little the software and tech community understands about these types of systems. Databases focused on real-time search have been around a long time, usually called "stream" or "adaptive dataflow" systems. Here is a very short tutorial on the relationship between databases for real-time search (we will call them "stream databases") and more traditional relational database systems.

In a traditional database, you dynamically apply a constraint ("query") to a mostly static set of data. For a stream database, you reverse the relationship; data is dynamically applied to a mostly static set of constraints. In this way, the queries become the "data" in the database. Consider this a database analogue to the old computer science adage about equivalency of "data" and "code".

The implementation challenge this poses usually escapes immediate notice. Constraints are a spatial data type. As has been observed on this blog before, traditional access methods offer poor performance and scaling when applied to spatial data types. In other words, we can expect real-time search to scale about as well as spatial data types scale using relational access methods, which is to say far too poorly for Internet scale applications. Stream databases for "real-time search" have been sold by the major database vendors for many years, and the reason most people have never heard of them is that it is very hard to make them scale well enough for most applications.

So when I evaluate the recent barrage of real-time search startups, the first question that I always ask is very simple: How will it scale? Surprisingly, very little thought has usually been given to this question, and it seems that there is an assumption that real-time search has never really been tried rather than that it has been studied for decades by companies like IBM with little progress toward making it scale. Given the computer science history of real-time search I expect these startups will discover these problems very quickly, but I wonder how much money will be spent by investors before then.

No comments:

Post a Comment