
Google File System Overview and Design

Uploaded by

wovit87983

GOOGLE FILE SYSTEM

© Yogesh Lohumi
AGENDA
● Introduction
● Design Overview
● System Interactions
● Master Operations
● Fault Tolerance and Diagnosis
● Measurements
● Conclusion
● References

INTRODUCTION

● Google File System (GFS) is a distributed file system developed by Google for its own use.

● It is a scalable file system for large, distributed, data-intensive applications.

● It is widely used within Google as a storage platform for the generation and processing of data.

INSPIRATION
▸ Multiple clusters distributed worldwide.

▸ Thousands of queries served per second.

▸ A single query reads hundreds of MB of data.

▸ Google stores dozens of copies of the entire Web.

▸ Large-scale data processing needs performance, reliability, scalability, and availability.

Image credit: Google data center

GFS ARCHITECTURE
Design Assumptions
▸ Component Failures
● File System consists of hundreds of
machines made from commodity
parts.
▸ Huge File Sizes
▸ Workload
● Large streaming reads.
● Small random reads.
● Large, sequential writes that append
data to file.
▸ Applications & API are co-designed
● The goal is a simple file system that places a light burden on applications.
Metadata
▸ The file and chunk namespaces.

▸ The mappings from files to chunks.

▸ The locations of each chunk’s replicas.
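
The three kinds of metadata listed above can be pictured as three in-memory tables. The following is only a minimal sketch: the class names, field names, and methods (`MasterMetadata`, `ChunkInfo`, `create_file`, `add_chunk`, `locate`) are hypothetical and are not taken from GFS itself.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkInfo:
    """Per-chunk record (hypothetical layout)."""
    handle: int                                        # chunk identifier
    replicas: list[str] = field(default_factory=list)  # chunk servers holding a copy


@dataclass
class MasterMetadata:
    """Toy model of the three metadata tables named on the slide."""
    namespace: set[str] = field(default_factory=set)                       # file namespace
    file_to_chunks: dict[str, list[int]] = field(default_factory=dict)     # file -> chunk handles
    chunks: dict[int, ChunkInfo] = field(default_factory=dict)             # handle -> replica locations

    def create_file(self, path: str) -> None:
        self.namespace.add(path)
        self.file_to_chunks[path] = []

    def add_chunk(self, path: str, handle: int, replicas: list[str]) -> None:
        # Append a new chunk to the file and record where its replicas live.
        self.file_to_chunks[path].append(handle)
        self.chunks[handle] = ChunkInfo(handle, list(replicas))

    def locate(self, path: str, chunk_index: int) -> ChunkInfo:
        # Resolve (file, chunk index) to the chunk handle and its replica locations.
        handle = self.file_to_chunks[path][chunk_index]
        return self.chunks[handle]


meta = MasterMetadata()
meta.create_file("/logs/crawl-0001")
meta.add_chunk("/logs/crawl-0001", handle=42, replicas=["cs-a", "cs-b", "cs-c"])
info = meta.locate("/logs/crawl-0001", 0)
print(info.handle, info.replicas)
```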

GFS Client Code
▸ Code on the client machine that interacts with GFS.
▸ Interacts with the master for metadata operations.
▸ Interacts with chunk servers for all read and write operations.
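
The split described above (metadata from the master, data from the chunk servers) can be sketched as a toy read path. Everything here is an illustrative assumption: `ToyMaster`, `ToyChunkServer`, `ToyClient`, and their methods are hypothetical in-memory stand-ins, not the real GFS client library, and the sketch only handles reads that stay within one chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated in the slides


class ToyMaster:
    """Holds metadata only: file -> list of (chunk handle, replica server names)."""

    def __init__(self):
        self.file_to_chunks = {}

    def lookup(self, path, chunk_index):
        return self.file_to_chunks[path][chunk_index]


class ToyChunkServer:
    """Holds chunk data only: chunk handle -> bytes."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


class ToyClient:
    def __init__(self, master, servers):
        self.master = master
        self.servers = servers  # server name -> ToyChunkServer

    def read(self, path, offset, length):
        # Step 1: translate the byte offset into a chunk index.
        chunk_index = offset // CHUNK_SIZE
        # Step 2: ask the master for the chunk handle and replica locations.
        handle, locations = self.master.lookup(path, chunk_index)
        # Step 3: fetch the data directly from one replica (here simply the first).
        server = self.servers[locations[0]]
        return server.read(handle, offset % CHUNK_SIZE, length)


master = ToyMaster()
server = ToyChunkServer()
server.chunks[1] = b"hello world"
master.file_to_chunks["/demo"] = [(1, ["cs-a"])]
client = ToyClient(master, {"cs-a": server})
print(client.read("/demo", 6, 5))
```

Note that file data never passes through the master in this sketch; the master only answers the lookup.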
GFS ARCHITECTURE
Master
● Contains the system metadata like:
▸ Namespaces
▸ Access Control Information
▸ Mappings from files to chunks
▸ Current location of chunks
● Also helps in:
▸ Garbage collection
▸ Synching across Chunk Servers
Image credit: Google data center
Chunk Server

▸ Machines containing physical files divided into chunks.

▸ For reliability, each chunk is replicated on multiple chunk servers.

➢ Each master server can have a number of associated chunk servers.

Image credit: Google data center

Chunk Size
Having a large uniform chunk size of 64 MB has the following advantages:
▸ Reduced client-master interaction.

▸ Reduced network overhead.

▸ Reduction in the size of metadata stored.
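
A little arithmetic makes these advantages concrete. The sketch below compares the 64 MB chunk size with a hypothetical 1 MB chunk size chosen only for contrast; the helper names `chunk_index` and `chunk_count` are illustrative.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated above


def chunk_index(offset, chunk_size=CHUNK_SIZE):
    """Index of the chunk that contains the given byte offset."""
    return offset // chunk_size


def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


one_gib = 1024 * 1024 * 1024
small_chunk = 1024 * 1024  # hypothetical 1 MB chunk size, for comparison only

# Fewer chunks means fewer location lookups at the master and fewer
# metadata entries for the master to store.
print(chunk_count(one_gib))               # chunks in a 1 GiB file at 64 MB
print(chunk_count(one_gib, small_chunk))  # chunks in the same file at 1 MB
print(chunk_index(200 * 1024 * 1024))     # chunk holding byte offset 200 MiB
```

Under these assumptions a 1 GiB file needs 16 chunk entries at 64 MB, against 1024 entries at 1 MB.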



Master Operations
▸ Namespace Management and Locking (separate read and write locks)

▸ Replica Placement (maximise data reliability, availability, and network bandwidth utilisation)

▸ Creation, Re-replication and Rebalancing

▸ Garbage Collection

▸ Stale Replica Detection (a chunk replica may become stale if its chunk server fails and misses mutations; hence the master maintains a version number for each chunk)
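
Stale replica detection by version number can be sketched as follows. This is a simplified model under stated assumptions: the class `ChunkVersionTable` and its methods are hypothetical, and the version is bumped in a single `grant_lease` step (in the GFS paper the master increases the version when it grants a new lease on the chunk). A replica whose server is down at that moment keeps the old number and is reported as stale.

```python
class ChunkVersionTable:
    """Toy model of per-chunk version tracking at the master."""

    def __init__(self):
        self.master_version = {}   # chunk handle -> latest version known to the master
        self.replica_version = {}  # (chunk handle, server) -> version held by that replica

    def add_replica(self, handle, server):
        # A new replica starts at the master's current version for the chunk.
        current = self.master_version.setdefault(handle, 0)
        self.replica_version[(handle, server)] = current

    def grant_lease(self, handle, live_servers):
        # Bump the version and inform only the replicas that are reachable.
        new_version = self.master_version.get(handle, 0) + 1
        self.master_version[handle] = new_version
        for server in live_servers:
            if (handle, server) in self.replica_version:
                self.replica_version[(handle, server)] = new_version

    def stale_replicas(self, handle):
        # Any replica behind the master's version missed at least one round of mutations.
        current = self.master_version.get(handle, 0)
        return sorted(
            server
            for (h, server), version in self.replica_version.items()
            if h == handle and version < current
        )


table = ChunkVersionTable()
for name in ("cs-a", "cs-b", "cs-c"):
    table.add_replica(42, name)
table.grant_lease(42, live_servers=["cs-a", "cs-b"])  # cs-c is down and misses the bump
print(table.stale_replicas(42))
```

Stale replicas found this way are candidates for garbage collection and re-replication, both of which appear in the list above.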


Fault Tolerance and Diagnosis
▸ High Availability

▸ Fast recovery, chunk replication, and master replication (shadow masters)

▸ Data Integrity

▸ Checksums are used to detect corrupted data
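
The checksum idea can be sketched as below. The GFS paper describes chunks split into 64 KB blocks, each with its own 32-bit checksum; the use of CRC-32 via `zlib.crc32` here is an assumption made for illustration, and the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks, per the GFS paper


def block_checksums(data, block_size=BLOCK_SIZE):
    """Compute one 32-bit checksum per block of the chunk data."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def corrupted_blocks(data, expected, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


original = bytes(200 * 1024)            # 200 KB of zeros, i.e. four blocks
stored = block_checksums(original)      # checksums recorded at write time

damaged = bytearray(original)
damaged[70 * 1024] ^= 0xFF              # flip bits inside the second block
print(corrupted_blocks(bytes(damaged), stored))
```

On a mismatch, a chunk server in this model would refuse to return the block, and the data would be read from another replica instead.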
Conclusion
Google File System
▸ Supports large-scale data processing workloads on COTS x86 servers.
▸ Component failures are the norm rather than the exception.
▸ Optimized for huge files that are mostly appended to and then read sequentially.
▸ Fault tolerance through constant monitoring, replication of crucial data, and fast, automatic recovery.
▸ Delivers high aggregate throughput to many concurrent readers and writers.
References
▸ Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google File System." ACM SIGOPS Operating Systems Review 37, no. 5 (2003): 29–43.
▸ Thekkath, Chandramohan A., Timothy Mann, and Edward K. Lee. "Frangipani: A Scalable Distributed File System." In Proceedings of the 16th ACM Symposium on Operating System Principles, pages 224–237, Saint-Malo, France, October 1997.
▸ [Link]
▸ [Link] [Link]/internet/basics/google-file-
[Link]
▸ [Link]
▸ [Link]
▸ [Link] WUIP40Nw