Distributed Database Architecture

Q: What are the distinctive features of distributed database systems compared to traditional centralized systems?

Distributed database systems store data across multiple geographic locations, which provides redundancy and keeps data available even if a particular site fails. Unlike centralized systems, they require distributed management and consistency protocols to maintain data integrity across sites. Storing data closer to where it is needed reduces latency. The cost is added complexity in keeping data consistent and a heavier reliance on network performance.

Q: How does a key-value NoSQL database differ from a document NoSQL database, both in structure and use cases?

A key-value NoSQL database stores data as a collection of key-value pairs, where each key is unique and identifies exactly one value. This structure is simple and efficient for retrieval by key, which makes it well suited to caching and session management. In contrast, a document NoSQL database stores data in richer formats such as JSON, BSON, or XML, allowing nested structures and semi-structured data. This model is more flexible and suits applications such as content management systems or social media platforms, which must handle diverse data types and evolving schemas.
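The structural difference can be sketched in a few lines of plain JavaScript. This is an in-memory illustration only, not any database's API: the names `sessionCache`, `getSession`, `posts`, `getPath` and `findByPath`, and all sample values, are hypothetical.

```javascript
// Key-value model: the value is opaque to the store; the key is the only access path.
const sessionCache = new Map();
sessionCache.set("session:42", JSON.stringify({ userId: 7, cart: ["book"] }));

function getSession(key) {
  const raw = sessionCache.get(key);
  return raw === undefined ? null : JSON.parse(raw);
}

// Document model: the store understands the structure, so it can filter
// on nested fields instead of only fetching by key.
const posts = [
  { _id: 1, author: { name: "asha", city: "Pune" }, tags: ["nosql", "cap"] },
  { _id: 2, author: { name: "ravi", city: "Mumbai" }, tags: ["sql"] },
];

// Resolve a dotted path such as "author.city" inside one document.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Return the documents whose field at `path` equals `expected`.
function findByPath(docs, path, expected) {
  return docs.filter((doc) => getPath(doc, path) === expected);
}
```

A call such as `findByPath(posts, "author.city", "Pune")` has no counterpart in the key-value half, where the only question the store can answer is "what is stored under this key?".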

Q: In what scenarios would a generalized query language like SQL be inadequate, prompting the use of specialized query languages in NoSQL databases?

SQL can be inadequate when data models are highly dynamic, hierarchical, or nested in ways that do not fit well into relational tables. For instance, NoSQL databases like MongoDB provide specialized query languages that query nested and complex data directly, supporting operations on data shapes that SQL handles only awkwardly. Where a workload requires horizontal scalability with heterogeneous data structures, or where a single operation spans multiple entities embedded in one document, specialized query languages offer more direct and efficient solutions.

Q: Discuss the role and advantages of sharding in a NoSQL database implementation.

Sharding distributes data across multiple servers, which spreads the load and can improve performance through parallel query processing. It enables horizontal scaling: the system handles more traffic by adding more shards, which is critical in large-scale applications. The advantages include improved query response times, more efficient storage management, and the ability to keep the remaining data available if some shards encounter issues. However, effective sharding requires careful shard-key selection to avoid unbalanced data loads.
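One common way to pick a shard is to hash the shard key and take the result modulo the number of shards. The sketch below assumes that scheme; it is not how any particular database routes requests, and `fnv1a`, `shardFor` and `distribute` are hypothetical helper names.

```javascript
// FNV-1a style 32-bit hash over the UTF-16 code units of a string.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// Map a shard key to a shard number in the range [0, shardCount).
function shardFor(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Count how many keys land on each shard, to inspect the balance.
function distribute(keys, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[shardFor(key, shardCount)] += 1;
  return counts;
}
```

Running `distribute` over a realistic sample of keys before choosing a shard key is a cheap way to spot skew. Note that a plain modulo scheme remaps most keys when the shard count changes, which is one reason production systems use range chunks or consistent hashing instead.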

Q: Analyze how replication within distributed databases enhances fault tolerance and availability.

Replication copies and maintains the same data across multiple sites, enhancing fault tolerance by keeping the database available even if some nodes fail. Because data can be read from any available replica, it stays accessible despite site-specific failures. Replication also supports load balancing, since read operations can be distributed across replicas, reducing the demand on any single node. However, keeping replicas consistent requires careful management to avoid stale reads and write conflicts.

Q: What challenges do distributed database systems face regarding data consistency, and how are they typically addressed?

Distributed database systems struggle to maintain data consistency because data is stored and replicated across multiple sites and nodes. Network latency, node failures, and network partitions can all lead to inconsistencies. These challenges are typically addressed through consensus protocols like Paxos or Raft, which ensure that a majority of nodes agree on the data state. Additionally, techniques such as two-phase commit ensure transaction consistency across nodes, though they add latency and require tight coordination between nodes.
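The "majority of nodes" rule reduces to simple arithmetic, sketched below. This shows only the counting rule, not Paxos or Raft themselves; the function names are hypothetical, and the read/write quorum condition `r + w > n` is an addition from general quorum-replication practice rather than something stated above.

```javascript
// Smallest number of nodes that forms a strict majority of nodeCount.
function majority(nodeCount) {
  return Math.floor(nodeCount / 2) + 1;
}

// acks: one boolean per replica, true if that replica acknowledged the write.
// The write counts as committed once a majority has acknowledged it.
function isCommitted(acks) {
  const received = acks.filter(Boolean).length;
  return received >= majority(acks.length);
}

// With n replicas, w write acknowledgements and r replicas consulted per read,
// every read set overlaps every write set when r + w > n.
function quorumsOverlap(n, w, r) {
  return r + w > n;
}
```

Because any two majorities of the same cluster share at least one node, two conflicting values cannot both be committed, which is the property the consensus protocols build on.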

Q: What are the trade-offs of using BASE properties in a distributed system instead of ACID properties?

Using BASE properties allows for greater scalability and availability by accepting that not all parts of the system are updated immediately. BASE trades strict consistency for eventual consistency, where updates propagate to all replicas over time. This model benefits applications that need high availability and serve large numbers of distributed users, such as social networks. However, it may not be appropriate for systems requiring strong transactional consistency, where ACID properties ensure atomicity, consistency, isolation, and durability across transactions.
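Eventual consistency can be illustrated with a toy store that accepts a write on one replica and applies it to the others later. This is a single-process simulation under simplifying assumptions (one writer, no conflicts, explicit propagation step); the class `EventuallyConsistentStore` and its methods are hypothetical.

```javascript
class EventuallyConsistentStore {
  constructor(replicaCount) {
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.pending = []; // updates accepted but not yet applied on every replica
  }

  // Accept the write on replica 0 immediately and queue it for the others.
  write(key, value) {
    this.replicas[0].set(key, value);
    for (let i = 1; i < this.replicas.length; i++) {
      this.pending.push({ replica: i, key, value });
    }
  }

  // Read from one specific replica; the answer may be stale.
  read(key, replicaIndex) {
    return this.replicas[replicaIndex].get(key);
  }

  // Apply all queued updates in order, after which the replicas agree.
  propagate() {
    for (const { replica, key, value } of this.pending) {
      this.replicas[replica].set(key, value);
    }
    this.pending = [];
  }
}
```

Between `write` and `propagate`, a reader on another replica still sees the old value. That window is the price BASE pays for answering every request without waiting on the other replicas.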

Q: In what situations might a NoSQL database be preferred over a traditional RDBMS, and why?

NoSQL databases are often preferred for large-scale workloads: handling big data, scaling out across distributed systems, and managing semi-structured or unstructured data efficiently. They offer flexible schema designs and can accommodate rapidly evolving data models without downtime for schema changes. NoSQL systems such as key-value stores, document stores, and column-family stores are particularly useful for applications like real-time web analytics, IoT data collection, and social media storage, where data volume and velocity are high.

Q: Evaluate the impact of network performance on distributed database management, particularly concerning data access and replication.

Network performance significantly affects distributed database management because data access and replication across distant nodes depend on network latency, reliability, and bandwidth. High latency slows transaction processing and data retrieval, affecting end-user applications and operational efficiency. High-performance, redundant networking therefore becomes pivotal for maintaining data consistency and availability during replication. Distributed databases must also tolerate network failures while balancing the overhead of synchronizing data across sites.

Q: How does the CAP Theorem apply to distributed data systems, and what are its implications?

The CAP Theorem, proposed by Eric Brewer, asserts that a distributed data system can provide only two of the following three guarantees simultaneously: Consistency, Availability, and Partition Tolerance. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts, and systems prioritize according to their application needs. For instance, some systems favor availability and partition tolerance, accepting eventual consistency, while others ensure consistency and partition tolerance at the cost of immediate availability.

The document discusses distributed database systems, highlighting their architecture, advantages, and disadvantages. It explains the differences between structured and unstructured data, as well as the various types of databases such as NoSQL and relational databases. Additionally, it covers key concepts like consistency, availability, and partition tolerance in distributed systems.

Uploaded by

prafull.barathe

Unit 5: NoSQL Database

Distributed Database (DDB): A collection of databases that are stored across multiple locations but appear to users as a single, unified database. In a DDB, the data is stored across multiple distributed sites connected via a network, and it is managed by a distributed DBMS (DDBMS).
Characteristics:
1) Data is stored across multiple sites/servers. The distribution is based on geographical requirements. Users do not need to know where the data is stored.
2) Multiple copies of data may exist; users do not need to worry about how the data is divided into fragments.
3) The DDBMS controls the storage of and access to data.
4) Each site has some degree of autonomy to manage its own DB.
5) Relies on the network for communication.
Advantages:
1) Queries can be processed at the site containing the data, which reduces network traffic.
2) Replicating data across sites ensures that the failure of one site does not disrupt the entire system.
3) Easy to add new sites to accommodate growth.
4) Data can be stored close to where it is needed, reducing latency.

Disadvantages:
1) Managing and synchronizing distributed data adds complexity.
2) Ensuring data consistency across all sites is difficult.
3) Relies heavily on network performance.

Distributed DB architecture: There are several architectures in distributed DB systems: 1) Client-server 2) Peer-to-peer 3) Multi-tier 4) Federated architecture.
1) Client-server architecture: The client sends queries to a centralized server, which manages the DDBS. The server is responsible for coordinating transactions, data storage and access. It is simple to implement and manage due to the centralized server system.
2) Peer-to-peer (P2P): All sites have equal roles and responsibilities. The sites interact with each other to access the DB.

Distributed DB system architecture (figure: Site 1 and Site 2, each with its own server, connected through a communication network)
A true distributed system has its own database at each site, and the sites communicate with each other through a communication network. Each site maintains its own independent system, but the sites are integrated through a middleware layer that provides common access to the data.

Types of data: Data can be categorized as 1) structured data 2) unstructured data 3) semi-structured data.
1) Structured data: Data that is organized in a fixed schema, typically in rows and columns.
- It follows a strict schema and is easily searchable using a language like SQL.
- Stored in tabular formats with well-defined data types.
- Relational DBs are used to store it (SQL databases, e.g. Oracle).
Example: customer details DB
CustomerID | Name | Age | E-mail
Adv: Easy to store, query and analyze.
2) Unstructured data: Data that does not follow a predefined schema or a specific structure, making it difficult to store in traditional databases and to analyze.
- Unstructured data can be stored using DBs such as MongoDB.
- For example: 1) Text files 2) Images 3) Videos 4) Emails
Adv: 1) Handles a variety of formats 2) Suitable for social media data
Disadv: 1) Difficult to search and process 2) Requires advanced tools to analyze

3) Semi-structured data: Data that does not have a strict schema. It is partially organized, but not as rigid as structured data.
- Uses markup like JSON, XML or YAML, where the data is flexible.
Example: 1) XML and JSON documents 2) HTML files, config files and sensor data with metadata
- Uses NoSQL databases for storage.
Adv: 1) Flexible and adaptable to changes 2) Easier to analyze as compared to unstructured data
Disadv: 1) More complex to process than structured data 2) Requires special tools for querying
Not only SQL (NoSQL): NoSQL means "not only SQL"; it is not an RDBMS.
- It is specially designed for storing large amounts of data in a distributed environment.
- It is not bound by a table schema, unlike an RDBMS.
- It avoids join operations.
Key features: 1) NoSQL DBs do not require a fixed schema, allowing storage of varied data formats 2) Easy to scale by distributing data across multiple nodes 3) Ensures HA and consistent performance 4) Suitable for storing and managing massive datasets generated by social media, IoT apps and e-commerce
Advantages: 1) Dynamic schema and good resource scalability 2) Lower operational cost 3) Supports semi-structured data 4) No static schema 5) Supports distributed computing 6) No complicated relationships 7) Relatively simple data models
Disadv: 1) ACID compliance is weaker compared to RDBMS 2) Consistency challenges 3) Requires learning new query languages
Types of NoSQL DB: NoSQL DBs are categorized based on their data models and how they organize and manage data. Each type is optimized for specific use cases, providing unique capabilities.
1) Key-value database: Data is stored as key-value pairs, where a unique key identifies a piece of data. It is highly efficient for simple lookup operations.
For example: Redis, Amazon DynamoDB, Riak
Use cases: 1) Caching 2) Session management 3) Real-time recommendations
Adv: 1) Extremely fast for read/write operations 2) Simple data structure
Disadvantages: 1) Limited querying capabilities
2) Document database: Data is stored in semi-structured formats like JSON, BSON or XML. Documents can contain nested data, making it suitable for hierarchical and complex data.
For example: 1) MongoDB 2) Couchbase 3) RavenDB
Use cases: 1) CMS 2) E-commerce apps 3) Mobile app backends
Adv: 1) Flexible schema for evolving data 2) Supports complex data
Disadvantage: 1) Not as well suited to highly relational data
3) Wide-column database: Data is stored in rows and columns. Each row can have a different number of columns.
For example: 1) Apache Cassandra 2) HBase
Use cases: 1) IoT data storage 2) Time-series data 3) Event logging
Adv: 1) Efficient for write-heavy operations
Disadvantages: 1) Limited support for complex queries
4) Graph database: Data is represented as nodes and edges. Ideal for apps requiring relationship-heavy data processing.
For example: 1) Neo4j 2) Amazon Neptune 3) TigerGraph
Use cases: 1) Social networking 2) Fraud detection 3) Recommendation engines
Advantages: 1) Optimized for traversing relationships 2) Flexible querying of connected data
Disadvantages: 1) Complex to set up and maintain

CAP Theorem (Brewer's Theorem): The CAP theorem, proposed by Eric Brewer, states that in a distributed data system it is impossible to simultaneously guarantee all three properties, i.e. Consistency, Availability and Partition Tolerance.
1) Consistency: Every read operation receives the most recent write. Ensures that all nodes in the system return the same data at the same time.
2) Availability: Every request receives a response regardless of the system's state. Ensures the system is operational all the time.
3) Partition tolerance: The system continues to operate even if there is a network failure.
There are 3 combinations in NoSQL:
1) Consistency + Availability (CA): Works only as long as there are no network partitions.
2) Consistency + Partition Tolerance (CP): Ensures data consistency across nodes but may delay responses. E.g. MongoDB, HBase
3) Availability + Partition Tolerance (AP): Ensures availability but may return stale data. E.g. DynamoDB, Cassandra
BASE properties of NoSQL: BASE stands for Basically Available, Soft state, Eventual consistency. The BASE model aims for HA, scalability and performance in distributed systems.
1) Basically Available: Ensures that the system remains operational and serves requests even if some part of it fails. Availability is prioritized over strict consistency. E.g. a shopping website still allows adding items.
2) Soft state: The system does not assume a fixed state at all times. Data may be replicated and updated across various nodes over time. E.g. a distributed cache updates over time to ensure better performance.
3) Eventual consistency: Guarantees that all updates to data eventually propagate to all nodes. E.g. some users' posts take longer to appear for others.
When to use BASE: 1) Scalability is crucial 2) HA is more important than consistency 3) Performance needs outweigh strict consistency 4) Temporary inconsistency is tolerated, e.g. social media posts
ACID consistency model: It is a set of principles that ensures the reliability and robustness of DB transactions in relational DBs. ACID stands for Atomicity, Consistency, Isolation and Durability.
1) Atomicity: Ensures that a transaction is all-or-nothing. If any part of the transaction fails, the entire transaction is rolled back.
2) Consistency: Guarantees that a transaction takes the DB from one valid state to another valid state, maintaining DB rules.
3) Isolation: Ensures that concurrent transactions do not interfere with each other. Each transaction executes as if it were the only one in the system.
4) Durability: Ensures that once a transaction is committed, it remains permanent, even in the event of a failure.
Why use the ACID model: It is crucial for apps that require
1) Data integrity: financial transactions, healthcare systems
2) Reliability: systems where inconsistency leads to issues
SQL vs NoSQL
No. | SQL | NoSQL
1 | Full form is Structured Query Language | Full form is Not only SQL
2 | It is a declarative query language | It is not a declarative language
3 | Structured and predictable data | Unstructured and unpredictable data
4 | Relational DB is table based | Key-value pair storage, column store, document store, graph store
5 | Data and its relationships are stored in separate tables | No predefined schema
6 | Tight consistency | Eventual consistency
7 | Examples: MySQL, Oracle, MS SQL, PostgreSQL, SQLite, DB2 | Examples: MongoDB, BigTable, Neo4j, CouchDB, Cassandra, HBase

MongoDB: It is an open-source NoSQL document-oriented database designed for scalability and flexibility. It stores data in a JSON-like format called BSON, making it ideal for apps with dynamic and unstructured data.
Features: 1) Data is stored in BSON format, allowing a flexible schema 2) Supports horizontal scaling using sharding 3) Powerful tools for data transformation and analysis 4) Supports queries for geospatial data
Advantages: 1) Flexible schema 2) Rich query language 3) HA 4) Horizontal scaling 5) Ease of integration
Use cases: 1) CMS 2) E-commerce 3) Real-time analytics 4) Mobile apps
Example document:
{ movie: "Hanuman", ticket: 200, screen: 2 }
CRUD operations:
1) Create: Insert documents into a collection.
   Command: insertOne(document)
   To create multiple documents: insertMany([documents])
2) Read: find(query) queries documents in a collection.
   e.g. db.users.find({ age: { $gt: 20 } })
3) Update: Modify a specific field in a document that matches a filter.
   Command: updateOne(filter, update)
4) Delete: Remove documents from a collection.
   Command: deleteOne(filter)
Aggregation pipeline: It is a framework in MongoDB that processes data in several stages. Each aggregation stage transforms the documents and passes the output to the next stage.
Features: 1) Each stage performs an operation 2) Supports mathematical, string, array and date-based operations 3) Can reshape documents and produce computed results 4) Optimized for large datasets
Syntax: db.collection.aggregate([ { stage 1 }, { stage 2 }, ... ]);
- A basic pipeline provides filters that operate like queries.
- Pipeline operations provide tools for grouping documents by specified fields.
- The pipeline provides efficient data aggregation using native operations.
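The stage-by-stage idea of the aggregation pipeline can be imitated over a plain array, as a rough sketch of the concept. This is not the MongoDB API: `match`, `groupSum`, `sortBy` and `runPipeline` are hypothetical helpers that only loosely mirror the $match, $group and $sort stages, and the `bookings` data is made up.

```javascript
// Each stage is a function from an array of documents to a new array.
const match = (predicate) => (docs) => docs.filter(predicate);

// Group by keyField and sum valueField within each group.
const groupSum = (keyField, valueField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[valueField]);
  }
  return Array.from(totals, ([key, total]) => ({ _id: key, total }));
};

// Sort by a numeric field; direction 1 is ascending, -1 is descending.
const sortBy = (field, direction = 1) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));

// Run the stages in order, feeding each stage's output to the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const bookings = [
  { movie: "Hanuman", screen: 2, ticket: 200 },
  { movie: "Hanuman", screen: 1, ticket: 250 },
  { movie: "Salaar", screen: 2, ticket: 300 },
  { movie: "Salaar", screen: 3, ticket: 100 },
];

const revenue = runPipeline(bookings, [
  match((b) => b.ticket >= 200), // filter first, so later stages see less data
  groupSum("movie", "ticket"),   // then total the ticket price per movie
  sortBy("total", -1),           // highest total first
]);
```

Putting the filter stage first is the usual design choice, because every later stage then has fewer documents to process.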
MapReduce: MongoDB can perform map-reduce operations to perform aggregation. It basically has 2 phases:
1) Map: Applies a function to each document, emitting key-value pairs.
2) Reduce: Combines the values associated with the same key.
Advantages: 1) Processes data across multiple documents in parallel 2) Handles complex logic that might not be achievable otherwise
Disadv: 1) Slower compared to the aggregation pipeline 2) More complex 3) It is deprecated (since MongoDB 5.0).
Syntax:
db.collection.mapReduce(
  function () { emit(key, value); },
  function (key, values) { return reducedValue; },
  { out: "output_collection" }
)
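The two phases can be reproduced in plain JavaScript to show what happens between map and reduce. This is a single-threaded sketch of the idea, not MongoDB's implementation: here `emit` is passed to the map function as an argument (in the mongo shell it is a global and the document is `this`), and `mapReduce`, `orders` and `totals` are hypothetical names with made-up data.

```javascript
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();

  // emit collects every value under its key until the reduce phase runs.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Map phase: call mapFn once per document.
  for (const doc of docs) mapFn(doc, emit);

  // Reduce phase: collapse the values of each key into one result.
  const out = {};
  for (const [key, values] of groups) out[key] = reduceFn(key, values);
  return out;
}

const orders = [
  { customer: "asha", amount: 100 },
  { customer: "ravi", amount: 40 },
  { customer: "asha", amount: 250 },
];

const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.customer, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
```

The same per-customer total is a single group stage in the aggregation pipeline, which is the main reason the pipeline is preferred for cases like this.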

Indexing in MongoDB: Used to improve the speed of query performance by reducing the number of documents the DB needs to scan.
Features: 1) Enables efficient query execution 2) Reduces I/O overhead 3) Supports multiple types of indexes tailored to different queries
Types of index:
1) Single-field index: Index on a single field in a document.
2) Compound index: Index on multiple fields.
3) Multikey index: Index that can be built on array data.
4) Text index: Supports text search.
5) Hashed index: Index using hash values, ideal for sharding.
6) Geospatial index: Enables spatial queries on geographic data.
7) Wildcard index: Index on all fields in a document.
8) TTL index: Automatically removes documents after a TTL period.
9) Unique index: Ensures that the indexed field contains unique values.
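Why an index reduces the number of documents scanned can be shown with a toy single-field index. This sketch uses a hash map from field value to document positions purely for illustration; MongoDB's ordinary indexes are B-tree based and also support range queries, which this does not. All function names are hypothetical.

```javascript
// Without an index: look at every document (a full collection scan).
function findByScan(docs, field, value) {
  let scanned = 0;
  const matches = [];
  for (const doc of docs) {
    scanned += 1;
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, scanned };
}

// Build a single-field index: field value -> positions of matching documents.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(position);
  });
  return index;
}

// With the index: touch only the documents the index points at.
function findByIndex(docs, index, value) {
  const positions = index.get(value) || [];
  return { matches: positions.map((p) => docs[p]), scanned: positions.length };
}
```

The index costs memory and must be updated on every write, so it pays off only for fields that queries actually filter on.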
Sharding in MongoDB: Sharding is used to distribute data across multiple servers or clusters.
Why sharding: 1) Distributes data across multiple machines 2) Distributes the read/write load among different shards 3) Handles datasets that exceed the capacity of a single machine 4) Ensures data remains accessible even if some shards fail
Sharding architecture: It consists of
1) Config servers: Store metadata about data distribution.
2) Query routers (mongos): Route client requests to the correct shard.
3) Shards: Contain the actual data, typically deployed as replica sets.
Advantages of sharding: 1) Scales horizontally by adding shards 2) Balances query and storage loads across multiple servers 3) Fault tolerance: replication within shards ensures availability
Disadvantages: 1) Complex to set up and maintain 2) Poor shard key selection can lead to uneven data distribution 3) Increased overhead to retrieve data spread across shards
