Saturday, July 20, 2013

There is no such thing as NoSQL

You might have heard about that new NoSQL thing. You wonder

  • Is NoSQL faster than SQL?
  • Can NoSQL replace my existing SQL database?
  • Will NoSQL fix my problem with my SQL database?
  • Is NoSQL an option for my next project?

Unfortunately this article can't give you an answer, because these questions are wrong. What's wrong about them? You see, the problem is this:

There is no such thing as NoSQL!

NoSQL is a buzzword.

For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.

But in the past few years, people started to question this dogma.  People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.

The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.

So what do NoSQL databases have in common?

Actually, not much.

You often hear phrases like:
  • NoSQL is scalable!
  • NoSQL is for BigData!
  • NoSQL violates ACID!
  • NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.

So what sets NoSQL databases apart?

So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:


Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development

Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.

Graph databases

Examples: Neo4j, GiraffeDB.
Strengths: Data Mining

Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.

Key-Value Stores

Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys

They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.

Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.

Is SQL obsolete?

Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.

Are they right?

No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.

NoSQL databases aren't a replacement for SQL - they are an alternative.

Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.

Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.

Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.

No comments:

Post a Comment