Overview of BigTable and Cloud Services

BigTable is a distributed storage system developed at Google for managing large amounts of structured data. It stores data as multidimensional sorted maps and provides real-time read/write access to petabytes of data across thousands of commodity servers. BigTable's data is distributed across many machines, and it uses the Google File System (GFS) for storage. It grew out of Google's need to manage data for services such as Search, Analytics, Maps and Gmail.

Uploaded by

sharath_rakki

BigTable

• A BigTable table may be petabytes in size and distributed among
tens to thousands of machines. It is designed for storing items such as
billions of URLs, with many versions per page; over 100 TB of satellite
image data; and hundreds of millions of users; and for serving thousands
of queries a second.
• BigTable was developed at Google and has been in use since 2005 in
dozens of Google services. An open source version, HBase, was
created by the Apache project on top of the Hadoop core. Apache
Cassandra, first developed at Facebook to power its inbox search,
is similar to BigTable with a tunable consistency model and no master.
• BigTable is designed with semi-structured data storage in mind. It is a
large map that is indexed by a row key, column key, and a timestamp.
Each value within the map is an array of bytes that is interpreted by
the application. Every read or write of data to a row is atomic,
regardless of how many different columns are read or written within
that row.
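The indexing scheme described above can be sketched as a plain dictionary. This is only an illustrative in-memory model, not BigTable's API: the class `ToyTable`, its method names, and the sample row and column keys are all hypothetical, and row-level atomicity is not modeled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a BigTable-style table:
# (row key, column key, timestamp) -> uninterpreted bytes.
Cell = Tuple[str, str, int]


class ToyTable:
    def __init__(self) -> None:
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one value; the table never interprets the bytes."""
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str, timestamp: int) -> Optional[bytes]:
        """Return the value stored at exactly this index, or None."""
        return self._cells.get((row, column, timestamp))

    def row(self, row: str) -> Dict[Tuple[str, int], bytes]:
        """Return every (column, timestamp) -> value stored for one row."""
        return {(c, t): v for (r, c, t), v in self._cells.items() if r == row}


table = ToyTable()
table.put("com.example.www", "contents:html", 3, b"<html>v3</html>")
table.put("com.example.www", "anchor:other.example", 5, b"Example")
```

Only the cells that were written exist in the dictionary, which is also a simple way to picture the sparseness of the table.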
Characteristics of BigTable:
• Map: A map is a data structure that allows one to quickly look up the value
associated with a key. BigTable is a collection of (key, value) pairs where the
key identifies a row and the value is the set of columns.
• Persistent: The data is stored persistently on disk.
• Distributed: BigTable's data is distributed among many independent machines.
At Google, BigTable is built on top of GFS (Google File System). The Apache open
source version of BigTable, HBase, is built on top of HDFS (Hadoop Distributed File
System) or Amazon S3.
• Sparse: The table is sparse, meaning that different rows in a table may use
different columns, with many of the columns empty for a particular row.
• Sorted: Most associative arrays are not sorted; a key is simply hashed to a
position in a table. BigTable sorts its data by keys. This helps keep related data
close together, usually on the same machine. For example, if domain names are
used as keys in a BigTable, storing them with their components reversed keeps
related domains next to each other:
[Link]
[Link]
[Link]
• Multidimensional: A table is indexed by rows. Each row contains one or more
named column families. Column families are defined when the table is first
created. Within a column family, one may have one or more named columns. All
data within a column family is usually of the same type. Columns within a column
family can be created on the fly. Rows, column families and columns provide a
three-level naming hierarchy in identifying data. For example:
• Time-based: Time is another dimension in BigTable data.
Every column family may keep multiple versions of column
family data. If an application does not specify a timestamp, it
will retrieve the latest version of the column family.
Alternatively, it can specify a timestamp and get the latest
version that is earlier than or equal to that timestamp.
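To illustrate why sorted keys matter, here is a small sketch of the reversed-hostname convention that Google's BigTable paper describes for web data. The domain names are hypothetical placeholders and `reverse_domain` is a helper invented for this example.

```python
from typing import List


def reverse_domain(domain: str) -> str:
    """Reverse the dot-separated components: www.example.com -> com.example.www."""
    return ".".join(reversed(domain.split(".")))


# Hypothetical domains, used only for illustration.
domains: List[str] = [
    "www.example.com",
    "mail.example.org",
    "mail.example.com",
    "www.example.org",
]

# Sorting the raw names interleaves the two sites.
plain_order = sorted(domains)

# Sorting the reversed names groups each site's hosts together.
row_keys = sorted(reverse_domain(d) for d in domains)
```

With the raw names, hosts of `example.com` and `example.org` alternate in sort order; with reversed keys, all `com.example.*` rows come before all `org.example.*` rows, so a scan over one site touches one contiguous key range.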
BigTable: Columns and column families
BigTable: Rows and partitioning
• A table is logically split by row into multiple
subtables called tablets. A tablet is a set of
consecutive rows of a table and is the unit of
distribution and load balancing within BigTable.
Because the table is always sorted by row, reads
of short ranges of rows are efficient: one typically
communicates with a small number of machines.
Hence, a key to ensuring a high degree of locality
is to select row keys properly.
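A rough sketch of how a sorted table split into tablets keeps short range scans local. The tablet boundaries, server names, and function names below are assumptions made for illustration; real tablet location lookups go through BigTable's own metadata hierarchy.

```python
import bisect
from typing import List

# Hypothetical layout: three tablets over reversed-domain row keys.
# tablet_end_keys[i] is the last row key held by tablet i; the final
# tablet is treated as open-ended.
tablet_end_keys = ["com.example.mail", "com.example.www", "org.example.www"]
tablet_servers = ["server-a", "server-b", "server-c"]


def tablet_for(row_key: str) -> int:
    """Return the index of the tablet whose key range contains row_key."""
    i = bisect.bisect_left(tablet_end_keys, row_key)
    return min(i, len(tablet_end_keys) - 1)


def servers_for_range(start_key: str, end_key: str) -> List[str]:
    """Return the servers a scan of [start_key, end_key] would contact."""
    first, last = tablet_for(start_key), tablet_for(end_key)
    return tablet_servers[first:last + 1]
```

A scan over a narrow key range such as `com.example.maps` to `com.example.news` lands on a single tablet, while a scan spanning `com.*` through `org.*` touches all three. Choosing row keys so that data read together sorts together is what keeps the first case common.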
BigTable: Timestamps
• Each column family cell can contain multiple versions of
content. For example, in the earlier example, we may have
several timestamped versions of page contents associated with
a URL. Each version is identified by a 64-bit timestamp that
either represents real time or is a value assigned by the client.
Reading column data retrieves the most recent version if no
timestamp is specified, or otherwise the latest version that is no
later than the specified timestamp.
• A table is configured with per-column-family settings for
garbage collection of old versions. A column family can be
defined to keep only the latest n versions or to keep only the
versions written since some time t.
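The versioning and garbage-collection rules above can be sketched for a single cell as follows. `VersionedCell` and its methods are hypothetical names, timestamps are plain integers, and the read bound is treated as inclusive, which is an assumption of this sketch.

```python
import bisect
from typing import List, Optional


class VersionedCell:
    """Toy model of one cell holding several timestamped versions."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []  # kept in ascending order
        self._values: List[bytes] = []

    def write(self, timestamp: int, value: bytes) -> None:
        """Insert a version, overwriting any version with the same timestamp."""
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            self._values[i] = value
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)

    def read(self, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version, or the newest one at or before timestamp."""
        if not self._timestamps:
            return None
        if timestamp is None:
            return self._values[-1]
        i = bisect.bisect_right(self._timestamps, timestamp)
        return self._values[i - 1] if i > 0 else None

    def keep_latest(self, n: int) -> None:
        """Garbage-collect all but the latest n versions (n must be >= 1)."""
        if n < 1:
            raise ValueError("n must be at least 1")
        self._timestamps = self._timestamps[-n:]
        self._values = self._values[-n:]

    def keep_since(self, t: int) -> None:
        """Garbage-collect versions written before time t."""
        i = bisect.bisect_left(self._timestamps, t)
        self._timestamps = self._timestamps[i:]
        self._values = self._values[i:]
```

In BigTable these policies are configured per column family rather than invoked by hand; the two methods here only show what each policy discards.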
BigTable: Chubby
• Chubby is a highly available and persistent distributed lock service
that manages leases for resources and stores configuration
information. The service runs with five active replicas, one of which
is elected as the master to serve requests. A majority must be
running for the service to work. Paxos is used to keep the replicas
consistent. Chubby provides a namespace of files and directories. Each
file or directory can be used as a lock.

In BigTable, Chubby is used to:

  • ensure there is only one active master
  • store the bootstrap location of BigTable data
  • discover tablet servers
  • store BigTable schema information
  • store access control lists
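Two of the ideas above lend themselves to a short sketch: the majority requirement for the replicas, and using a lock file to ensure a single active master. `ToyLockService` is a hypothetical single-process stand-in, not the Chubby API, and the lock path and owner names are made up.

```python
from typing import Dict, Optional


def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas may be down while a majority is still running."""
    return replicas - majority(replicas)


class ToyLockService:
    """Hypothetical in-memory lock namespace keyed by file path."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}

    def try_acquire(self, path: str, owner: str) -> bool:
        """Grant the lock if it is free or already held by owner."""
        if path in self._holders:
            return self._holders[path] == owner
        self._holders[path] = owner
        return True

    def release(self, path: str, owner: str) -> None:
        """Release the lock, but only if owner currently holds it."""
        if self._holders.get(path) == owner:
            del self._holders[path]

    def holder(self, path: str) -> Optional[str]:
        return self._holders.get(path)


locks = ToyLockService()
first = locks.try_acquire("/bigtable/master-lock", "master-1")
second = locks.try_acquire("/bigtable/master-lock", "master-2")
```

With five replicas, a majority is three, so the service keeps working with up to two replicas down. In the lock sketch, only the first candidate obtains the master lock; the second is refused until the lock is released.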
BigTable indexing hierarchy
Google Big Data services
• Search
• Analytics
• Maps
• Gmail
OpenStack
• OpenStack is a project originally started by NASA and Rackspace for delivering a
cloud computing and storage platform. Today, OpenStack is a global
collaboration of developers and technologists producing an open source cloud
computing platform for public and private clouds.
• The technology consists of a series of interrelated projects delivering various
components for a cloud infrastructure solution. OpenStack software delivers a
massively scalable cloud operating system consisting of three major
components:
  • Compute: open source software designed to provision and manage large
networks of virtual machines, creating a redundant and scalable cloud
computing platform.
  • Object Storage: open source software for creating redundant, scalable object
storage using clusters of standardized servers to store petabytes of accessible
data (code-named "Swift").
  • Image Service: provides discovery, registration, and delivery services for virtual
disk images (code-named "Glance").
• OpenStack has attracted more than 500 member organizations, including Dell,
Cisco, Citrix, HP, EMC, VMware, Red Hat, IBM and Intel, and the project is
currently managed by the non-profit OpenStack Foundation.
Microsoft Azure
• Microsoft Azure is widely considered both a
Platform as a Service (PaaS) and Infrastructure
as a Service (IaaS).
• Microsoft Azure is one of several major public
cloud service providers operating on a large
global scale. Other major providers include
Google Cloud Platform (GCP), Amazon Web
Services (AWS) and IBM.
Azure products and services
Microsoft categorizes Azure cloud services into 18 main product types, including:
• Compute -- These services enable a user to deploy and manage virtual
machines (VMs), containers and batch processing, as well as support
remote application access.
• Web -- These services support the development and deployment of web
applications, and also offer features for search, content delivery,
application programming interface (API) management, notification and
reporting.
• Data storage -- This category of services provides scalable cloud storage
for structured and unstructured data and also supports big data projects,
persistent storage (for containers) and archival storage.
• Analytics -- These services provide distributed analytics and storage, as
well as features for real-time analytics, big data analytics, data lakes,
machine learning, business intelligence (BI), internet of things (IoT) data
streams and data warehousing.
• Networking -- This group includes virtual networks, dedicated
connections and gateways, as well as services for traffic management and
diagnostics, load balancing, domain name system (DNS) hosting, and
network protection against distributed denial-of-service (DDoS) attacks.
• Media and content delivery network (CDN) -- These services include on-
demand streaming, digital rights protection, encoding and media
playback and indexing.
• Hybrid integration -- These are services for server backup, site recovery
and connecting private and public clouds.
• Identity and access management (IAM) -- These offerings ensure only
authorized users can access Azure services, and help protect encryption
keys and other sensitive information in the cloud. Services include
support for Azure Active Directory and multifactor authentication (MFA).
• Internet of things -- These services help users capture, monitor and
analyze IoT data from sensors and other devices. Services include
notifications, analytics, monitoring and support for coding and execution.
• Development -- These services help application developers share code,
test applications and track potential issues. Azure supports a range of
application programming languages, including JavaScript, Python, .NET
and [Link].
• Security -- These products provide capabilities to identify and respond to
cloud security threats, as well as manage encryption keys and other
sensitive assets.
• Artificial intelligence (AI) and machine learning -- This is a wide range of
services that a developer can use to infuse machine learning, AI and
cognitive computing capabilities into applications and data sets.
• Containers -- These services help an enterprise to create, register, orchestrate
and manage huge volumes of containers in the Azure cloud, using
common platforms such as Docker and Kubernetes.
• Databases -- This category includes Database as a Service (DBaaS)
offerings for SQL and NoSQL, as well as other database instances, such as
Azure Cosmos DB and Azure Database for PostgreSQL. It also includes SQL
Data Warehouse support, caching, and hybrid database integration and
migration features.
• Migration -- This suite of tools helps an organization estimate
workload migration costs, and perform the actual migration of
workloads from local data centers to the Azure cloud.
• Mobile -- These products help a developer build cloud applications
for mobile devices, providing notification services, support for back-
end tasks, tools for building APIs and the ability to couple geospatial
(location) context with data.
• Management -- These services provide a range of backup, recovery,
compliance, automation, scheduling and monitoring tools that can
help a cloud administrator manage an Azure deployment.
Integrating Data Sources
• A primary purpose of data integration is to present data in
new and useful ways, yielding new insights and, in business,
new advantages. Recognizing the needs of the organization
prior to “organizing” the data is useful in a broad range of Big
Data projects, including business and scientific research. Big
Data integration combines traditional data, social media data,
data from the Internet of Things (IoT), and transactional data.
Data that is not compatible, or has not been translated or
transformed, is essentially useless for such projects.
• Organizations use master data management (MDM) systems to
promote the collection, aggregation, consolidation, and delivery
of reliable data throughout the organization. Additionally, newer
tools, such as Scribe and Sqoop, are being used to support the
integration of Big Data.
• Managing “integrated” Big Data gives more
confidence in decision-making and provides
better insights. The process of integrating
huge data sets can be quite complicated and
can present several challenges. Challenges
faced during the integration process include
uncertainty about data management, syncing
across data sources, finding insights, and
skill availability.
Big Data Integration Tools
• As “traditional” tools for data integration continue to evolve, they
should be re-evaluated for their abilities to process the ever-
increasing variety of unstructured data, as well as the growing
volume of Big Data. Integration technologies must have a
common platform to support Data Quality and profiling.
• In traditional data warehouses, ETL (extract, transform, and load)
technologies are used to organize data. Those technologies have
evolved, and continue to evolve, to work within Big Data
environments.
• When using the cloud, data can be organized using integration
Platform-as-a-Service (iPaaS). This service is generally easy to use
and can include data from Cloud-based sources, such as Software-
as-a-Service (SaaS).
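As a minimal sketch of the extract, transform, and load steps mentioned above, the following uses only Python's standard library. The sample data, the `sales` table, and the cleaning rules are invented for illustration; a real Big Data pipeline would use dedicated integration tooling.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical raw input with inconsistent formatting and one bad row.
RAW_CSV = """customer,amount,currency
 alice ,10.50,usd
BOB,3,USD
carol,not-a-number,usd
"""


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: parse the CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[str, float, str]]:
    """Transform: normalize names and currencies, drop unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append(
            (row["customer"].strip().lower(), amount, row["currency"].strip().upper())
        )
    return clean


def load(rows: List[Tuple[str, float, str]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

Keeping the three stages as separate functions mirrors the ETL split: each stage can be swapped (a different source, stricter cleaning, another target store) without touching the others.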
The Challenges of Big Data Integration

• Finding Staff
• Bringing in the Data
• Synchronization
• Data Management Tools
• Choosing a Strategy
