Cloud Bigtable

Cloud Bigtable is a fully managed, scalable NoSQL big data database service for large analytical and operational workloads
Scales to petabytes with consistent sub-10ms latency
Learns and adjusts to access patterns
Useful for machine learning applications
Powers Google services such as Search, Maps and Gmail
Supports high read and write throughput with low latency
Integrates with popular big data tools such as Hadoop and Cloud Dataflow, and supports the HBase API
Consider Cloud Bigtable if you require:
- > 1TB structured data
- Very high rate of writes
- Read/write latency < 10 ms
- Strong consistency
- Compatibility with the Hadoop HBase API

Storage Model

Stores data in massively scalable key-value sorted tables
Rows are indexed with a single key
Related columns are grouped into column-families
Each column is identified by a combination of column-family and column-qualifier

In this table:
- Column Family = follows
- Column qualifiers are used as data e.g. tjefferson
- Tables are sparsely populated (not all cells have a value)
Each row/column intersection can contain multiple cells (versions) at different timestamps, thereby providing a history
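
The storage model above can be illustrated with a small in-memory sketch. This is not the Cloud Bigtable client API; the class SparseTable and the row keys gwashington and jadams are hypothetical names chosen for illustration, while the follows family and tjefferson qualifier come from the notes above.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> (family, qualifier) -> versions, newest first."""

    def __init__(self):
        # Only cells that were written exist, so the table is sparse.
        self._rows = defaultdict(lambda: defaultdict(list))

    def write(self, row_key, family, qualifier, value, timestamp):
        versions = self._rows[row_key][(family, qualifier)]
        versions.append((timestamp, value))
        # Keep the newest version first.
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def read_latest(self, row_key, family, qualifier):
        versions = self._rows.get(row_key, {}).get((family, qualifier))
        return versions[0][1] if versions else None

    def read_history(self, row_key, family, qualifier):
        return list(self._rows.get(row_key, {}).get((family, qualifier), []))

    def row_keys(self):
        # Rows are returned in sorted key order, as in a key-sorted table.
        return sorted(self._rows)


table = SparseTable()
table.write("jadams", "follows", "tjefferson", b"1", timestamp=100)
table.write("gwashington", "follows", "tjefferson", b"1", timestamp=100)
table.write("gwashington", "follows", "tjefferson", b"0", timestamp=200)

latest = table.read_latest("gwashington", "follows", "tjefferson")
history = table.read_history("gwashington", "follows", "tjefferson")
missing = table.read_latest("gwashington", "follows", "jadams")  # never written
keys = table.row_keys()
```

Here latest is the value written at timestamp 200, history holds both versions, and missing is None because that cell was never populated.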

Bigtable Architecture

All client requests go through a front end server
Nodes are organised into a Cloud Bigtable Cluster belonging to a Cloud Bigtable Instance – a container for the cluster
Cluster throughput can be increased by adding Nodes

Cloud Bigtable data is sharded into blocks of contiguous rows called tablets

"Bigtable maintains data in lexicographic order by row key. The row range for a table is dynamically partitioned. Each row range is called a tablet, which is the unit of distribution and load balancing."
Source: Bigtable: A Distributed Storage System for Structured Data
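
A minimal sketch of what lexicographic ordering and row-range partitioning imply for lookups. The split keys and the function tablet_for are assumptions made up for this example; real split points are chosen by the service.

```python
import bisect

# Row keys are byte strings compared lexicographically, so "user10" sorts
# before "user2".
ordered = sorted([b"user10", b"user2", b"user1"])

# Hypothetical split points. Tablet i holds keys in [split_keys[i-1], split_keys[i]).
split_keys = [b"g", b"n", b"t"]  # 4 tablets: [.., g), [g, n), [n, t), [t, ..)


def tablet_for(row_key: bytes) -> int:
    """Return the index of the tablet whose row range contains row_key."""
    return bisect.bisect_right(split_keys, row_key)


first = tablet_for(b"apple")   # before b"g"
second = tablet_for(b"g")      # a split key starts the next tablet
third = tablet_for(b"nash")
last = tablet_for(b"zebra")
```

Because the ranges are contiguous and sorted, rows with nearby keys land in the same tablet, which is why row key design affects how evenly load is spread.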

Tablets are stored on Colossus, Google’s file system, in SSTable format
An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings
Performance scales linearly with the number of nodes in a cluster
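
The SSTable description above (an ordered, immutable map from byte-string keys to byte-string values) can be sketched as follows. This illustrates the map semantics only: persistence and the on-disk format are not modeled, and the class name SSTableSketch is hypothetical.

```python
import bisect


class SSTableSketch:
    """Ordered, immutable map from bytes keys to bytes values (in memory only)."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples are used so the contents cannot be changed after construction.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key):
        # Binary search is possible because the keys are sorted.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


sst = SSTableSketch({b"b": b"2", b"a": b"1", b"c": b"3"})
hit = sst.get(b"b")
miss = sst.get(b"z")
window = sst.scan(b"a", b"c")
```

Sorted keys make both point lookups and range scans cheap, and immutability means an update is written as new data rather than changing the map in place.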

Load Balancing

Each zone is managed by a primary process, balancing workload and data
This process balances load by splitting larger or busier tablets in half
Conversely, smaller and less-busy tablets are merged, thereby reducing fragmentation
Balancing of traffic and split/merge activity is handled automatically
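
The split/merge behaviour described above can be sketched as a single rebalancing pass. This is a toy illustration and not how the service is implemented; the Tablet class, the thresholds, and the use of a request count as a measure of "busy" are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only to make the example small.
MAX_ROWS = 4         # larger tablets are split
HOT_REQUESTS = 100   # busier tablets are split
MIN_ROWS = 2         # tablets at or below this size count as small
COLD_REQUESTS = 10   # tablets at or below this load count as less busy


@dataclass
class Tablet:
    rows: list       # sorted row keys held by this tablet
    requests: int    # recent request count


def is_small_and_cold(t):
    return len(t.rows) <= MIN_ROWS and t.requests <= COLD_REQUESTS


def split(t):
    """Split a tablet in half, dividing its load between the halves."""
    mid = len(t.rows) // 2
    half = t.requests // 2
    return [Tablet(t.rows[:mid], half), Tablet(t.rows[mid:], t.requests - half)]


def rebalance(tablets):
    # Pass 1: split tablets that are large or busy.
    after_split = []
    for t in tablets:
        too_big_or_hot = len(t.rows) > MAX_ROWS or t.requests > HOT_REQUESTS
        if len(t.rows) > 1 and too_big_or_hot:
            after_split.extend(split(t))
        else:
            after_split.append(t)
    # Pass 2: merge adjacent tablets that are both small and less busy.
    merged = []
    for t in after_split:
        prev = merged[-1] if merged else None
        if prev is not None and is_small_and_cold(prev) and is_small_and_cold(t):
            merged[-1] = Tablet(prev.rows + t.rows, prev.requests + t.requests)
        else:
            merged.append(t)
    return merged


tablets = [
    Tablet([b"a", b"b", b"c", b"d", b"e", b"f"], 20),  # large: split
    Tablet([b"g", b"h"], 500),                         # busy: split
    Tablet([b"i"], 1),                                 # small and less busy
    Tablet([b"j"], 2),                                 # small and less busy
]
result = rebalance(tablets)
row_ranges = [t.rows for t in result]
```

Only adjacent tablets are merged, so every tablet still covers one contiguous range of row keys after the pass.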