Category Archives: scalability

10 Key/Value Store, Distributed, Open Source Databases


  • Master-less, so remains operational even if multiple nodes fail
  • Near linear scalability
  • Architecture same of both large and small clusters
  • Key/value model, flat namespace, can store anything


  • Key/value. Can store data types such as sets, sorted lists, hashes and do operations on them such as set intersection and incrementing the value in a hash.
  • In-memory dataset
  • Easy to setup, master/slave replication


  • Very simple data model with 5 attributes: keys, values, timestamps, expiry date, flags for metadata
  • Chain replication across nodes that are geographically dispersed. Not single points of failure
  • Excellent performance for large batches (~200k) read/write operations
  • Runs on commodity hardware or blades. Does not require SAN


  • High performance, massively scalable, modeled after Google’s Bigtable
  • Runs on top of a distributed file system such as Apache Hadoop DFS, GlusterDS, or Kosmos File System
  • Data model is a traditional, but huge table, that is physically stored in sort order of the primary key


  • High scalability due to allowing only very simple key/value data access.
  • Used by LinkedIn
  • Not an object or a relational database. Just a big, distributed, fault-tolerant, persistent hash table
  • Includes in-memory caching, so separate caching tier isn’t required


  • High performance persistent storage that’s compatible with Memcache protocol


  • NoSQL database with messaging server
  • All data maintained in RAM. Persistence via a write ahead log.
  • Asynchronous replication and hot standby
  • Supports stored procedures
  • Data model: tuples (unique key plus any number of other fields); spaces (multiple tuples)

Apache Cassandra

  • Can use massive cluster of commodity servers with no single point of failure. Can be deploy across multiple data centers.
  • Was used by Facebook for Inbox Search until 2010
  • Read/write scales linearly with number of nodes
  • Data replicated across multiple nodes
  • Supports MapReduce, Pig, and Hive
  • Has SQL-like CQL providing for a hybrid between key/value and tabular database


  • NoSQL key/value that provides lower latency and higher throughput than some alternatives
  • Replicates data to multiple nodes
  • Very easy to administer and maintain
  • Data model: key plus zero or more attributes


  • Great performance even on small clusters with millions of keys
  • Nodes replicated via master-to-master replication.  Hot backups and restores
  • Very small client footprint
  • Built on top of Tokyo Tyrant


MapReduce vs Traditional RDDBMS

Traditional RDBMS MapReduce
Max Data Size per DB many gigabytes many petabytes
Access Method interactive & batch batch
Updates read & update many times read many times — write once
Schema Define before inital load of data, and difficult to make (and test?) extensive schema changesAlso has structured data, in which field entries are in a defined format such as XML or the type of a db column.

The keys/values are defined at the the time of data insert.

No schema required, and if schema then it can be changed dynamically without regression impact.Fields may be unstructured (free text, images) or semi-structured (spreadsheet or even this table, in which some description is implied by the row and column headings).

The keys/values are defined at the time of processing.

System Integrity/Availability Generally designed to be high.Recovery made more difficult due to multiple autonomous components (SAN, database cluster, applications server farm), that must communicate with each other. Assumed to be low.Failure is assumed to have occured, so computations occur on multiple redundant nodes whose results are sorted and merged, or are rescheduled.

Tasks generally do not have dependencies on each other in a shared-nothing architecture.

Scalability non-linear (cannot simply add additional cloned servers to the cluster forever) linear (can scale by factor of 10, 100, or even 1000 cloned nodes)
Normalization Database performs better if data is normalized. Database performs better is data is de-normalized since the data is scattered across multiple nodes. As de-normalized, all information is available within each block as it is read. Example is a webserver log, in which hostnames are specified in full each time.
Data locality Data is typically stored on a SAN, and access speed limited by fibre bandwidth. Additional processing may be done on an application server, in which bandwidth is limited by the datacenter LAN. Data is processed by CPU on the same computer which hosts the data on internal drives. Access speed is limited by internal bus bandwidth. This model is called “lack of data motion”.


HDFS fault tolerance

HDFS is fault tolerant. Each file is broken up into blocks, and each block must be written to more than one server. The number of servers is configurable, but three is the common configuration. Just as with RAID, this provides fault tolerance and increase retrieval performance.

When a block is read, its checksum indicates whether the block is valid or corrupted. If corrupted, and depending on the scope of the corruption, the block may be rewriten or the server may be taken out of the cluster and the blocks spread to other existing servers. If the cluster is running within an elastic cloud then either the server is healed or a new server is added.

Unlike high end SAN hardware which is architected to avoid failure, HDFS assumes that its low end equipment will fail so it has self-healing built into its operating model.

Cheap Hardware

In theory, a big data cluster uses low cost commodity hardware (2 CPUs, 6-12 drives, 32 GB RAM). By clustering many cheap machines, high performance can be achieved at a low cost, along with high reliability due to decentralization.

There is little benefit to running Hapdoop nodes in a virtualized environment (e.g VMWare), since when the node is active (batch processing) it may be pushing RAM and CPU utilization to its limits. This is in contrast to an application or database server which has idle and bursts, but generally has constant utilization at some medium level. What is of greater benefit is a cloud implementation (e.g. Amazon Elastic Cloud) in which one can scale from a few nodes to hundreds or thousands of nodes in real time as the batch cycles through its process.

Unlike a traditional n-tier architecture, Hadoop combines compute & storage on the same box. In contrast, an Oracle cluster would typically store its databases on a SAN, and application logic would reside on yet another set of application servers which probably do not utilize their inexpensive internal drives for application specific tasks.

A Hadoop cluster is linearly scalable, up to 4000 nodes and dozens of petabytes of data.

In a traditional db cluster (such as Oracle RAC), the architecture of the cluster should be designed with knowledge of the schema and volume (input and retrieval) of the data. WIth Hadoop, scalability is, at worst, linear. Using a cloud architecture, additional Hadoop nodes can be provisioned on the fly as node utilization increases or decreases.