Understanding Column-Family Stores

Column-family stores, also known as wide-column or columnar stores, organize data in rows with many columns associated with the same key, grouped into column families for efficient access. Examples include Google BigTable, HBase, and Cassandra, which support large-scale distributed storage and can handle billions of rows and columns. The data model allows for dynamic column addition, time-based versioning, and efficient data retrieval through a hierarchical structure of rows, column families, and columns.

Uploaded by

Abeer Mahmoud

Column-Family Stores
Introduction
Basics
• AKA (Also Known As) wide-column, columnar

• Data model

• rows that have many columns, all associated with the same key

• Column family

• is a group of related columns

• often accessed together

Data Model: Column

• A column is the basic data item

• represented as a 3-tuple

• column name

• value

• timestamp
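The 3-tuple above can be sketched as a small Python type. This is only an illustration: the class name `Column` and the choice of microseconds for the timestamp are assumptions, not part of any particular store's API.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """One column: the basic data item, stored as a 3-tuple."""
    name: str       # column name
    value: str      # stored value
    timestamp: int  # version marker; microseconds since the epoch is an assumption


col = Column(name="firstName", value="Martin", timestamp=1_700_000_000_000_000)

# A NamedTuple unpacks like any ordinary 3-tuple.
name, value, ts = col
```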
Data Model: Row
• Row is a collection of columns attached to the same row key
• columns can be added to any row at any time without having to add them to all rows
// row
"martin-fowler" : {
    firstName: "Martin",
    lastName: "Fowler",
    location: "Boston"
}
// row
"Jack-fowler" : {
    firstName: "Jack",
    lastName: "Fowler"
}
Data Model: Column Family (CF)

• CF is a set of columns containing related data

• For example, UserInfo in table User is a column family
Data Model: Interpretation 1

• Each column family is roughly analogous to a relational table

• A column family can be viewed as a map of maps

• Map<rowKey, Map<columnKey, columnValue>>
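A minimal sketch of the map-of-maps view, using plain Python dictionaries. The alias `ColumnFamily`, the helper `get_column`, and the sample data are illustrative assumptions; real stores add persistence, sorting, and versioning on top of this idea.

```python
from collections import defaultdict
from typing import Dict, Optional

# Map<rowKey, Map<columnKey, columnValue>>
ColumnFamily = Dict[str, Dict[str, str]]

# The "UserInfo" family; rows do not need to share the same columns.
user_info: ColumnFamily = defaultdict(dict)
user_info["martin-fowler"]["firstName"] = "Martin"
user_info["martin-fowler"]["location"] = "Boston"
user_info["Jack-fowler"]["firstName"] = "Jack"


def get_column(cf: ColumnFamily, row_key: str, column_key: str) -> Optional[str]:
    """Two-step lookup: row key first, then column key; None if either is absent."""
    return cf.get(row_key, {}).get(column_key)


location = get_column(user_info, "martin-fowler", "location")
missing = get_column(user_info, "Jack-fowler", "location")
```

A missing column is simply an absent key, so sparse rows cost nothing extra.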
Data Model: Interpretation (Visual)
Column-family Stores
• Representative

• Cassandra

• BigTable

• HBase

• Hypertable

• Accumulo
Ranked list: [Link]
BigTable

• Google’s paper:
Chang, F. et al. (2008). Bigtable: A Distributed Storage System for
Structured Data. ACM TOCS, 26(2), pp 1–26.

• Data model: column family

• Multi-dimensional map

• (row:string, column:string, timestamp:int64) -> value
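The multi-dimensional map can be pictured as a dictionary keyed by a (row, column, timestamp) triple. The sketch below is illustrative only: the row key, column names, and small integer timestamps are made up, and an in-memory dict stands in for the distributed store.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, int]  # (row, column, timestamp)
table: Dict[Key, str] = {}

# Two versions of the same cell, plus one cell in another column.
table[("com.example.www", "content:", 3)] = "<html>v1</html>"
table[("com.example.www", "content:", 5)] = "<html>v2</html>"
table[("com.example.www", "language:", 3)] = "EN"

# All timestamps stored for one (row, column) pair, newest first.
versions = sorted(
    (ts for (row, col, ts) in table
     if row == "com.example.www" and col == "content:"),
    reverse=True,
)
latest = table[("com.example.www", "content:", versions[0])]
```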

HBase
• An open source implementation of Google BigTable

• Initial release: 2008

• Implementation: Java

• Runs on top of Hadoop Distributed File System

• Operating systems: Linux, Windows (requires Cygwin)

• can handle billions of rows with millions of columns

Cassandra
• Developed at Facebook

• Initial release in 2008, stable release in 2013

• Written in Java

• Operations

• Cassandra Query Language (CQL)

• MapReduce support (can cooperate with Hadoop)

BigTable & HBase in Detail
BigTable: Overview
• Highly available distributed storage

• One big table distributed on multiple nodes

• Built with semi-structured data in mind

• billions of URLs with their content over time (versions)

• user data: profiles, preferences, queries

• geographical data: road maps, satellite images

BigTable: Large Scale

• Petabytes of data across thousands of computers

• Billions of users

• thousands of queries per second

BigTable: Uses
• At Google, used for

• Google Analytics

• a web service that tracks and analyzes web traffic

• Google Finance

• stock information

• Personalized Search

• shows personalized search results based on users' preferences

• YouTube

• Google Earth & Maps

• and more
A big table

• Appears as one table

• characteristics:

• sparse

• distributed

• multidimensional map

Characteristics

• A map

• Is an associative array

• values can be quickly looked up for a given key

• key identifies the row

• value is the set of columns for that row

Characteristics
• Persistent

• Data is stored on disk and remains there after operations finish

• Distributed

• BigTable data is distributed across multiple machines

• BigTable runs on top of Google File System (GFS)

• The table is split based on the rows

Characteristics
• Sparse

• different rows have different columns

• some columns may be empty for some rows

• Sorted

• An associative array is normally unsorted

• but BigTable keeps rows sorted by row key

• related data will be adjacent

• For example, to keep all pages of the same domain together, reverse the URL's hostname and use it as the row key

• [Link]: all pages from the edu domain will be adjacent
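A short sketch of the reversed-hostname trick. The function name `reversed_host_key` and the hostnames are hypothetical; the point is only that sorting the reversed keys places pages from the same domain next to each other.

```python
def reversed_host_key(host: str) -> str:
    """Reverse the dot-separated parts: 'www.example.com' -> 'com.example.www'."""
    return ".".join(reversed(host.split(".")))


hosts = ["www.alpha.edu", "www.example.com", "cs.beta.edu", "news.example.com"]

# Rows are kept sorted by key, so sorting here mimics the stored order.
row_keys = sorted(reversed_host_key(h) for h in hosts)
# All "com.example.*" keys are adjacent, followed by all "edu.*" keys.
```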

Characteristics
• Multidimensional

• Table is indexed by row

• a Table contains one or more column families

• at least one column family is created when the table is created

• a column family can contain a varying number of columns, which are usually related

• columns in a column family can be created on the fly

• three levels of hierarchy: row, column family, and column

Characteristics

• Time based

• another dimension in BigTable

• keep multiple versions of the same data

• at retrieval time (search), if no time is specified, the latest version is returned
Example
Table Model
• (row, column, timestamp) -> cell content

• Multiple versions of the same cell

Rows & Partitioning
• A table is split by row ranges into sub-tables called tablets

• A tablet

• consists of a set of consecutive rows

• it is the unit of data distribution & load balancing

• stored on one server

• rows are sorted by key

• reading is efficient

• single row or scan range of rows

• designing the row key is very important, as it determines how data is stored

• it influences read performance and data distribution
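Because rows are sorted by key, a range scan reduces to two binary searches followed by a contiguous slice. The sketch below assumes an in-memory sorted list of row keys; the helper `scan` and the keys are illustrative.

```python
import bisect
from typing import List

sorted_keys = [
    "com.example.news",
    "com.example.www",
    "edu.alpha.www",
    "edu.beta.cs",
    "org.example.www",
]


def scan(keys: List[str], start: str, stop: str) -> List[str]:
    """Return the keys in the half-open range [start, stop) of a sorted list."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return keys[lo:hi]


# "/" is the ASCII character right after ".", so this range covers every "edu." key.
edu_rows = scan(sorted_keys, "edu.", "edu/")
```

This is why row key design matters: keys that are read together should sort together.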

Table splitting
• A table starts as one tablet

• As it grows, it is split into multiple tablets (100-200 MB each)

• the tablet size can be configured
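The growth-and-split behavior can be sketched as follows. This is a simplification under stated assumptions: a row count (`max_rows`) stands in for the byte-size threshold, and the split point is simply the middle row; the function name is hypothetical.

```python
from typing import List


def split_into_tablets(sorted_rows: List[str], max_rows: int) -> List[List[str]]:
    """Halve a tablet recursively until no tablet holds more than max_rows rows."""
    if len(sorted_rows) <= max_rows:
        return [sorted_rows]
    mid = len(sorted_rows) // 2
    return (split_into_tablets(sorted_rows[:mid], max_rows)
            + split_into_tablets(sorted_rows[mid:], max_rows))


rows = [f"row{i:03d}" for i in range(10)]
tablets = split_into_tablets(rows, max_rows=3)
tablet_sizes = [len(t) for t in tablets]
```

Each resulting tablet is still a run of consecutive rows, so the tablets taken in order reproduce the original sorted table.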
Splitting a Tablet
Columns & Column Families
• Column Family

• group of columns

• basic unit of access control

• data is usually of the same type (related)

• data in the same column family is compressed together

• operations:

• create a column family (required before any data can be stored in it)

• store data under any column key in the family

• A table should have a small number of column families (hundreds at most)

• The number of columns in a column family is unbounded

• a column key is identified by family:qualifier
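A minimal sketch of the family:qualifier naming rule, assuming a toy in-memory `Table` class (hypothetical, not any library's API). Column families are fixed when the table is created, while qualifiers inside a family can be introduced freely at write time.

```python
class Table:
    """Toy table: families are declared up front, qualifiers are free-form."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row key -> {column key -> value}

    def put(self, row_key, column_key, value):
        # Split "family:qualifier" at the first colon.
        family, _, _qualifier = column_key.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.rows.setdefault(row_key, {})[column_key] = value


t = Table(families=["content", "anchors"])
t.put("com.example.www", "anchors:news.example.org", "Example home")

try:
    t.put("com.example.www", "author:name", "someone")  # family was never created
    rejected = False
except KeyError:
    rejected = True
```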

Example
• Web pages table

• store web pages and their information in a table

• use URL as the row key

• several column families to store various attributes of Web pages

• content (multiple content over time)

• anchors (multiple anchors)

• language (single value)

Example (cont.)
• Three column families

• “content:” holds the content of the web page

• “language:” holds the language of the web page

• “anchors:” holds the pages that reference the web page identified by the row key

• the URL of the referring page is used as the column name

• and the cell value is the anchor text

• anchors is an example of a dynamic column family
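The web pages table can be sketched as nested dictionaries following the row, column family, column hierarchy, with a timestamp-to-value map in each cell. All URLs, timestamps, and the helper `add_anchor` are illustrative assumptions.

```python
# row key -> column family -> column qualifier -> {timestamp: value}
webtable = {
    "com.example.www": {
        "content": {"": {3: "<html>old</html>", 6: "<html>new</html>"}},
        "language": {"": {3: "EN"}},
        "anchors": {},  # dynamic: one column per referring page
    }
}


def add_anchor(table, target_row, referring_url, anchor_text, ts):
    """Record that referring_url links to target_row using anchor_text."""
    cell = table[target_row]["anchors"].setdefault(referring_url, {})
    cell[ts] = anchor_text


add_anchor(webtable, "com.example.www", "news.example.org", "Example home", 8)
add_anchor(webtable, "com.example.www", "blog.example.net", "a useful site", 9)

anchor_columns = sorted(webtable["com.example.www"]["anchors"])
```

New anchor columns appear as pages link in, without any schema change to the table.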

Timestamps
• Each cell may contain multiple versions of its content

• In the previous example, the same URL may have different content over time

• A timestamp is a 64-bit integer

• assigned automatically as real time (the time when the cell was written)

• or specified explicitly by the user

• At retrieval time

• the most recent version is returned if no timestamp is specified

• if a timestamp is specified

• the version with that timestamp is returned, or, if there is no exact match, the latest version earlier than the specified timestamp
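The retrieval rules above can be sketched with a versioned cell that keeps its timestamps sorted and uses binary search. The class `Cell` and the small integer timestamps are illustrative assumptions, not an actual BigTable or HBase API.

```python
import bisect


class Cell:
    """A cell holding multiple timestamped versions of a value."""

    def __init__(self):
        self._timestamps = []  # kept in ascending order
        self._values = []      # _values[i] belongs to _timestamps[i]

    def put(self, ts, value):
        i = bisect.bisect_left(self._timestamps, ts)
        if i < len(self._timestamps) and self._timestamps[i] == ts:
            self._values[i] = value  # overwrite the existing version
        else:
            self._timestamps.insert(i, ts)
            self._values.insert(i, value)

    def get(self, ts=None):
        if not self._timestamps:
            return None
        if ts is None:
            return self._values[-1]  # most recent version
        # Index of the last version with timestamp <= ts, if any.
        i = bisect.bisect_right(self._timestamps, ts)
        return self._values[i - 1] if i > 0 else None


cell = Cell()
cell.put(10, "v1")
cell.put(30, "v3")
cell.put(20, "v2")
```

Here `cell.get()` returns the newest version, `cell.get(20)` is an exact match, and `cell.get(25)` falls back to the version written at timestamp 20.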
