- Cloud Bigtable is a fully managed, scalable NoSQL big data database service for large analytical and operational workloads
- Scales to petabytes with consistent sub-10ms latency
- Learns and adjusts to access patterns
- Useful for machine learning applications
- Powers Google services such as Search, Maps and Gmail
- Supports high read and write throughput with low latency
- Integrates with popular big data tools such as Hadoop and Cloud Dataflow through the HBase API
- Consider Cloud Bigtable if you require:
  - More than 1 TB of structured data
  - A very high rate of writes
  - Read/write latency under 10 ms
  - Strong consistency
  - Compatibility with the Hadoop HBase API
Storage Model
- Stores data in massively scalable, sorted key-value tables
- Rows are indexed with a single key
- Related columns are grouped into column families
- Each column is identified by a combination of column family and column qualifier

- In this table:
  - Column family = follows
  - Column qualifiers are used as data, e.g. tjefferson
  - Tables are sparsely populated (not all cells have a value)
- Each row/column intersection can contain multiple cells (versions) at different timestamps, thereby providing a history
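
The storage model above can be illustrated with a small in-memory sketch. This is a toy model, not the Cloud Bigtable client API: the `SparseTable` class and its method names are hypothetical, as are the row keys `gwashington` and `jadams` and the timestamps. Only the `follows` family and the `tjefferson` qualifier come from the notes.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a Bigtable-style table (hypothetical, not the real client)."""

    def __init__(self):
        # row key -> (column family, column qualifier) -> [(timestamp, value), ...]
        # Only cells that were written exist, so the table is sparse.
        self._rows = defaultdict(lambda: defaultdict(list))

    def write(self, row_key, family, qualifier, value, timestamp):
        versions = self._rows[row_key][(family, qualifier)]
        versions.append((timestamp, value))
        # Keep the newest version first.
        versions.sort(key=lambda version: version[0], reverse=True)

    def read_latest(self, row_key, family, qualifier):
        # .get() avoids creating empty entries for cells that were never written.
        versions = self._rows.get(row_key, {}).get((family, qualifier), [])
        return versions[0][1] if versions else None

    def history(self, row_key, family, qualifier):
        # All versions of one cell, newest first.
        return list(self._rows.get(row_key, {}).get((family, qualifier), []))

    def scan(self):
        # Row keys come back in lexicographic order.
        return sorted(self._rows)


table = SparseTable()
table.write("jadams", "follows", "tjefferson", 1, timestamp=100)
table.write("jadams", "follows", "tjefferson", 0, timestamp=200)
table.write("gwashington", "follows", "jadams", 1, timestamp=100)

latest = table.read_latest("jadams", "follows", "tjefferson")        # newest version
missing = table.read_latest("gwashington", "follows", "tjefferson")  # empty cell
```

Here the qualifier itself (`tjefferson`) carries information, the unwritten cell reads back as `None`, and `history` exposes the timestamped versions of a single row/column intersection.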
Bigtable architecture
- All client requests go through a front-end server
- Nodes are organised into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance (a container for the cluster)
- Cluster throughput can be increased by adding nodes
- Cloud Bigtable data is sharded into blocks of contiguous rows called tablets
  - "Bigtable maintains data in lexicographic order by row key. The row range for a table is dynamically partitioned. Each row range is called a tablet, which is the unit of distribution and load balancing." (Bigtable: A Distributed Storage System for Structured Data)
- Tablets are stored on Colossus, Google’s file system, in SSTable format
- An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings
- Performance scales linearly with the number of nodes in a cluster
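
As a rough illustration of how contiguous row ranges map to tablets, the sketch below partitions sorted row keys into fixed-size blocks and finds the block covering a given key with a binary search. The function names, the `user#...` row keys, and the fixed `rows_per_tablet` are assumptions made for the example; the notes do not say how Bigtable chooses tablet boundaries, only that the row range is partitioned dynamically.

```python
from bisect import bisect_right


def split_into_tablets(sorted_row_keys, rows_per_tablet):
    """Partition sorted row keys into contiguous blocks ("tablets")."""
    return [
        sorted_row_keys[i:i + rows_per_tablet]
        for i in range(0, len(sorted_row_keys), rows_per_tablet)
    ]


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose row range covers row_key."""
    start_keys = [tablet[0] for tablet in tablets]
    # bisect_right gives the position of the first start key greater than
    # row_key; the tablet just before that position covers the key.
    return max(bisect_right(start_keys, row_key) - 1, 0)


# Hypothetical row keys, kept in lexicographic order.
row_keys = sorted([
    "user#0001", "user#0002", "user#0417",
    "user#0900", "user#1500", "user#2000",
])
tablets = split_into_tablets(row_keys, rows_per_tablet=2)
tablet_index = find_tablet(tablets, "user#0500")
```

Because rows are sorted by key, a lookup or range scan only needs to touch the tablets whose ranges overlap the requested keys.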
Load Balancing
- Each zone is managed by a primary process that balances workload and data volume
- This process balances load by splitting larger or busier tablets in half
- Conversely, smaller and less-busy tablets are merged, thereby reducing fragmentation
- Balancing of traffic and split/merge activity is handled automatically
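
The split/merge behaviour described above might be sketched as follows. Everything here is illustrative: the `Tablet` dataclass, the `rebalance` function, and all thresholds are hypothetical, and the even division of load on a split is a simplifying assumption. The notes do not describe the actual criteria Bigtable uses.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    row_keys: list           # contiguous, sorted row keys
    requests_per_sec: float  # observed load on this tablet


def split(tablet):
    """Split a tablet in half; assumes load is spread evenly over its rows."""
    mid = len(tablet.row_keys) // 2
    half_load = tablet.requests_per_sec / 2
    return [
        Tablet(tablet.row_keys[:mid], half_load),
        Tablet(tablet.row_keys[mid:], half_load),
    ]


def merge(left, right):
    """Merge two adjacent tablets into one."""
    return Tablet(
        left.row_keys + right.row_keys,
        left.requests_per_sec + right.requests_per_sec,
    )


def rebalance(tablets, max_rows=4, max_rps=1000.0, min_rows=2, min_rps=100.0):
    """One balancing pass; all thresholds are made-up example values."""

    def small_and_quiet(tablet):
        return len(tablet.row_keys) < min_rows and tablet.requests_per_sec < min_rps

    # Pass 1: split tablets that are too large or too busy.
    after_split = []
    for tablet in tablets:
        too_big = len(tablet.row_keys) > max_rows
        too_busy = tablet.requests_per_sec > max_rps
        if len(tablet.row_keys) > 1 and (too_big or too_busy):
            after_split.extend(split(tablet))
        else:
            after_split.append(tablet)

    # Pass 2: merge neighbouring tablets that are both small and quiet.
    result = []
    for tablet in after_split:
        if result and small_and_quiet(result[-1]) and small_and_quiet(tablet):
            result[-1] = merge(result[-1], tablet)
        else:
            result.append(tablet)
    return result


balanced = rebalance([
    Tablet(["a", "b", "c", "d", "e", "f"], 1500.0),  # large and busy: split
    Tablet(["g"], 10.0),                             # small and quiet
    Tablet(["h"], 5.0),                              # small and quiet: merged with "g"
])
```

In this toy run the first tablet is split into two halves and the two single-row tablets are merged, so the pass ends with three tablets of more even size and load.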