MongoDB.pdf

Technical Overview
Learnizo Global LLP

Agenda
• Welcome and Introductions
• The World We Live In
• MongoDB Technical Overview
• Use Case Discussion
• Demo

MongoDB – It’s About the Data

MongoDB, It’s for the Developers

MongoDB Brings It All Together
Volume of Data
• Trillions of records
• 100s of millions of queries per second
Agile Development
• Iterative
• Continuous
Hardware Architectures
• Cloud Computing
• Commodity servers

MongoDB Use Cases
• User Data Management
• High Volume Data Feeds
• Content Management
• Operational Intelligence
• Product Data Management

10gen: The Creators of MongoDB
• Set the direction & contribute code to MongoDB
• Foster community & ecosystem
• Provide MongoDB management services
• Provide commercial services
• Founded in 2007
– Dwight Merriman, Eliot Horowitz
– DoubleClick, Oracle, MarkLogic, HP
• $31M+ in funding
– Flybridge, Sequoia, Union Square
• Worldwide Expanding Team
– 150+ employees
– NY, CA and UK

Agenda
• The World We Live In
• Demo

NoSQL
• Key-value
• Graph database
• Document-oriented
• Column family

Traditional Architecture
• Relational
– Hard to map to the way we code
• Complex ORM frameworks
– Hard to evolve quickly
• Rigid schema is hard to change, necessitates migrations
– Hard to scale horizontally
• Joins, transactions make scaling by adding servers hard

RDBMS Limitations
Productivity
Cost

MongoDB
• Built from the start to solve the
scaling problem
• Consistency, Availability, Partition tolerance (can’t have all three)
• Configurable to fit requirements

Theory of NoSQL: CAP
CAP Theorem: satisfying all three at the same time is impossible
[Diagram: triangle with corners C, A, and P]
• Many nodes
• Nodes contain replicas of partitions of data
• Consistency
– all replicas contain the same version of data
• Availability
– system remains operational when nodes fail
• Partition tolerance
– multiple entry points
– system remains operational when the system is split

ACID - BASE
ACID
• Atomicity
• Consistency
• Isolation
• Durability
BASE
• Basically Available (CP)
• Soft-state
• Eventually consistent (AP)
MongoDB is easy to use
MySQL
START TRANSACTION;
INSERT INTO contacts VALUES
  (NULL, 'joeblow');
INSERT INTO contact_emails VALUES
  (NULL, 'joe@blow.com', LAST_INSERT_ID()),
  (NULL, 'joseph@blow.com', LAST_INSERT_ID());
COMMIT;
MongoDB
db.contacts.save( {
  userName: "joeblow",
  emailAddresses: [
    "joe@blow.com",
    "joseph@blow.com" ] } );

As simple as possible, but no simpler
[Chart: Scalability & Performance plotted against Depth of functionality, positioning Memcached, Key/Value stores, and RDBMS]

Representing & Querying Data

Schema design
• RDBMS: join

Schema design
• MongoDB: embed and link
• Embedding is the nesting of objects and arrays inside
a BSON document (prejoined). Links are references
between documents (client-side follow-up query).
• "contains" relationships, one to many; duplication of
data, many to many

Tables to Collections of JSON Documents
{
  title: 'MongoDB',
  contributors:
  [
    { name: 'Eliot Horowitz',
      email: 'eliot@10gen.com' },
    { name: 'Dwight Merriman',
      email: 'dwight@10gen.com' }
  ],
  model:
  {
    relational: false,
    awesome: true
  }
}

Terminology
RDBMS      MongoDB
Table      Collection
Row(s)     JSON Document
Index      Index
Join       Embedding & Linking

Documents
Collections contain documents
Documents can contain other documents
and/or
Documents can reference other documents
Flexible/powerful ability to relate data
Schemaless
Flexible Schema

CRUD
• Create
– db.collection.insert( <document> )
– db.collection.save( <document> )
– db.collection.update( <query>, <update>, { upsert: true } )
• Read
– db.collection.find( <query>, <projection> )
– db.collection.findOne( <query>, <projection> )
• Update
– db.collection.update( <query>, <update>, <options> )
• Delete
– db.collection.remove( <query>, <justOne> )

Documents
> var p = { author: "roger",
            date: new Date(),
            title: "Spirited Away",
            avgRating: 9.834,
            tags: ["Tezuka", "Manga"] }
> db.posts.save(p)

Linked vs Embedded Documents
Embedded:
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 …",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    {
      author : "Fred",
      date : "Sat Jul 26 2010 …",
      text : "Best Movie Ever"
    }
  ],
  avgRating : 9.834 }
Linked:
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT…",
  tags : [ "Tezuka", "Manga" ],
  comments : [ 6, 274, 1135, 1298, 2245, 5623 ],
  avg_rating : 9.834 }
comments:
{ _id : 274,
  movie_id : ObjectId("4c4ba5c0672c6…"),
  author : "Fred",
  date : "Sat Jul 24 2010 20:51:0…",
  text : "Best Movie Ever" }
{ _id : 275,
  movie_id : ObjectId("3d5ffc88…"),
  author : "Fred",
  … }
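The difference between the two layouts shows up at read time. The sketch below is plain JavaScript with arrays standing in for the posts and comments collections; no MongoDB driver is involved, and the sample documents and helper names (commentsEmbedded, commentsLinked) are hypothetical.

```javascript
// In-memory stand-ins for the two layouts (hypothetical sample data).
const embeddedPost = {
  _id: "post1",
  author: "roger",
  comments: [{ author: "Fred", text: "Best Movie Ever" }],
};

const linkedPost = { _id: "post1", author: "roger", comments: [274] };
const commentsCollection = [
  { _id: 274, movie_id: "post1", author: "Fred", text: "Best Movie Ever" },
  { _id: 275, movie_id: "post2", author: "Fred", text: "Another comment" },
];

// Embedded: the comments arrive with the post ("prejoined"), so one read suffices.
function commentsEmbedded(post) {
  return post.comments;
}

// Linked: the client performs a follow-up lookup for each referenced _id.
function commentsLinked(post, comments) {
  return post.comments.map((id) => comments.find((c) => c._id === id));
}

console.log(commentsEmbedded(embeddedPost).map((c) => c.text));
console.log(commentsLinked(linkedPost, commentsCollection).map((c) => c.text));
```

Both calls return the same comment text; the linked layout simply pays for it with an extra lookup per reference.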

Querying
> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "roger",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
tags : [ "Tezuka", "Manga" ] }
Note:
- _id is unique, but can be anything you’d like

Query Operators
• Conditional Operators
– $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
– $lt, $lte, $gt, $gte
// find posts with any tags
> db.posts.find( {tags: {$exists: true }} )
// find posts matching a regular expression
> db.posts.find( {author: /^rog*/i } )
// count posts by author
> db.posts.find( {author: 'roger'} ).count()

Atomic Operations
• $set, $unset, $inc, $push, $pushAll, $pull, $pullAll,
$bit
> comment = { author: "fred",
              date: new Date(),
              text: "Best Movie Ever" }
> db.posts.update( { _id: "..." },
                   { $push: { comments: comment } } );
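To make the effect of these operators concrete, here is a minimal in-memory imitation of $set, $inc and $push in plain JavaScript. It is only a sketch of the observable behaviour on top-level fields, not how the server implements updates; applyUpdate and the commentCount field are hypothetical.

```javascript
// Applies a small subset of update operators to a document, in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value; // overwrite or create the field
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // create the array if absent
    doc[field].push(value); // append to the end
  }
  return doc;
}

const post = { _id: 1, author: "roger", tags: ["Tezuka", "Manga"] };
applyUpdate(post, {
  $push: { comments: { author: "fred", text: "Best Movie Ever" } },
  $inc: { commentCount: 1 },
});
console.log(post.comments.length, post.commentCount); // 1 1
```

The point of the operators is that the change is expressed as a delta on one document, so the client does not need to read the document, modify it, and write it back.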

Arrays
• $push – append
• $pushAll – append array
• $addToSet and $each – add if not contained, add list
• $pop – remove last
• $pull – remove all occurrences/criteria
  – { $pull : { field : {$gt: 3} } }
• $pullAll – removes all occurrences of each value
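The array operators differ mainly in how they treat duplicates and matches. A rough plain-JavaScript sketch of three of them follows; the helper names are hypothetical and the functions only imitate the result on a single array.

```javascript
// $addToSet with $each: append only the values not already present.
function addToSet(array, values) {
  for (const v of values) {
    if (!array.includes(v)) array.push(v);
  }
  return array;
}

// $pull with a criterion such as {$gt: 3}: drop every matching element.
function pullGreaterThan(array, limit) {
  return array.filter((v) => !(v > limit));
}

// $pullAll: drop every occurrence of each listed value.
function pullAll(array, values) {
  return array.filter((v) => !values.includes(v));
}

console.log(addToSet(["Tezuka", "Manga"], ["Manga", "Anime"])); // Tezuka, Manga, Anime
console.log(pullGreaterThan([1, 5, 2, 8, 3], 3)); // 1, 2, 3
console.log(pullAll([1, 2, 1, 3, 2], [1, 2])); // 3
```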

Indexes
// Create index on any field in document
> db.posts.ensureIndex( { author: 1 } )
// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { "comments.author": "Fred" } )
// Index on tags (array values)
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: "Manga" } )
// Geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location": { $near: [22, 42] } } )

Aggregation/Batch Data Processing
• Map/Reduce can be used for batch data processing
– Currently being used for totaling, averaging, etc
– Map/Reduce is a big hammer
• Simple aggregate functions available
• (2.2) Aggregation Framework: Simple, Fast
– No JavaScript needed, runs natively on server
– Filter or Select Only Matching Sub-documents or
Arrays via new operators
• MongoDB Hadoop Connector
– Useful for Hadoop Integration
– Massive Batch Processing Jobs

Why Replicate?
• Data Redundancy
• Automatic Failover / High Availability
• Distribution of read load
• Disaster recovery

Replica Sets
Asynchronous
Replication

Replica Sets
• One primary, many secondaries
– Automatic replication to all secondaries
• Different delays may be configured
– Automatic election of new primary on failure
– Writes go to the primary, reads can go to secondaries
• Priority of secondary can be set
– Hidden for administration/back-ups
– Lower score for less powerful machines
• Election of new primary is automatic
– Majority of replica set must be available
– Arbiters can be used
• Many configurations possible (based on use case)
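The majority requirement is why arbiters exist: an arbiter votes but holds no data, so adding one to a two-member set gives three voters and lets the set survive the loss of one member. A minimal sketch of the rule in plain JavaScript (canElectPrimary is a hypothetical helper, not a MongoDB API):

```javascript
// A primary can be elected only if a strict majority of voting members is reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers > Math.floor(votingMembers / 2);
}

console.log(canElectPrimary(3, 2)); // true: 2 of 3 is a majority
console.log(canElectPrimary(2, 1)); // false: 1 of 2 is not a majority
console.log(canElectPrimary(3, 1)); // false: the minority side stays without a primary
```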


Replica Sets
Automatic
Leader Election

Sharding
Write Scalability
[Diagram: one mongod holding key range 0..100 is split across two mongods holding key ranges 0..30 and 31..100]

Sharding
Write Scalability
[Diagram: two mongods holding key ranges 0..30 and 31..100]

Sharding
Write Scalability
[Diagram: key range 31..100 is split further, giving four mongods holding key ranges 0..30, 31..60, 61..90, and 91..100]

Sharding Administration
• Splitting data into chunks
– Automatic
– Existing data can be manually “pre-split”
• Migration of chunks/balancing between servers
– Automatic
– Can be turned off/chunks can be manually moved
• Shard key
– Must be selected by you
– Very important for performance!
• Each shard is really a replica set
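To see why the shard key matters for performance, consider how a request is routed by key range. The sketch below reuses the four key ranges from the sharding diagrams and assumes integer shard keys; the shard names and both helper functions are hypothetical, and real chunk metadata is kept by the cluster rather than in a literal array.

```javascript
// Chunk ranges taken from the slides; the shard names are hypothetical.
const chunks = [
  { min: 0, max: 30, shard: "shardA" },
  { min: 31, max: 60, shard: "shardB" },
  { min: 61, max: 90, shard: "shardC" },
  { min: 91, max: 100, shard: "shardD" },
];

// A write or an exact-match read goes to the single shard owning the key.
function routeToShard(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError(`no chunk covers shard key ${key}`);
  return chunk.shard;
}

// A range query touches only the shards whose chunks overlap the range.
function shardsForRange(low, high) {
  return chunks.filter((c) => c.max >= low && c.min <= high).map((c) => c.shard);
}

console.log(routeToShard(42)); // shardB
console.log(shardsForRange(25, 65)); // shardA, shardB, shardC
```

A query that does not include the shard key cannot be narrowed this way and has to be sent to every shard.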

Full Deployment
[Diagram: three MongoS routers in front of four shards holding key ranges 0..30, 31..60, 61..90, and 91..100; each shard is a replica set with one Primary and two Secondaries]

[Diagram: Queries pass through three MongoS routers to four shards (key ranges 0..30, 31..60, 61..90, and 91..100), each a replica set with one Primary and two Secondaries; three Config servers complete the deployment]

MMS: MongoDB Monitoring Service
• SaaS solution providing instrumentation and visibility
into your MongoDB systems
• 3,500+ customers signed up and using service

Agenda
• MongoDB and the New Frontier
• Demo


Queries
• Importing Data into MongoDB
– mongoimport --db test --collection restaurants --file dataset.json
• Exporting Data from MongoDB
– mongoexport --db test --collection newcolln --out myexport.json

Queries
• MongoDB query operation:
• Query in SQL

Create/Insert Queries
• db.collection.insert()
db.inventory.insert(
  { item: "ABC1",
    details: { model: "14Q3", manufacturer: "XYZ Company" },
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    category: "clothing" } )

Find Queries
• db.collection.find()
• db.inventory.find( {} )
• db.inventory.find( { type: "snacks" } )
• db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
• db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
• db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
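How these query documents are read can be shown with a tiny matcher in plain JavaScript. It covers only equality, $in, $lt, $gt and $or on top-level fields, and it is a sketch of the semantics rather than the server's implementation; the matches and find helpers and the three sample documents are hypothetical.

```javascript
// Returns true when doc satisfies query; fields in one query document are ANDed.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") return cond.some((sub) => matches(doc, sub));
    const value = doc[key];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Operator document such as { $lt: 9.95 }: every operator must hold.
      return Object.entries(cond).every(([op, arg]) => {
        switch (op) {
          case "$in": return arg.includes(value);
          case "$lt": return value < arg;
          case "$gt": return value > arg;
          default: throw new Error(`unsupported operator ${op}`);
        }
      });
    }
    return value === cond; // plain equality
  });
}

const inventory = [
  { item: "chips", type: "snacks", price: 2.5, qty: 200 },
  { item: "steak", type: "food", price: 19.99, qty: 20 },
  { item: "bread", type: "food", price: 3.0, qty: 50 },
];

const find = (query) => inventory.filter((doc) => matches(doc, query));

console.log(find({ type: "food", price: { $lt: 9.95 } }).map((d) => d.item)); // bread
console.log(find({ $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] }).map((d) => d.item)); // chips, bread
```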

Update Queries
• To use operators on fields of a subdoc use $
db.inventory.update( { item: "MNO2" },
  { $set: { category: "apparel",
            details: { model: "14Q3", manufacturer: "XYZ Company" } },
    $currentDate: { lastModified: true } },
  false, true )
• false (third argument, upsert): do not insert a new document when none matches
• true (fourth argument, multi): update all matching documents

Queries
• Aggregation
– SQL Query
SELECT state, SUM(pop) AS totalPop FROM zipcodes
GROUP BY state HAVING totalPop >= (10*1000*1000)
– MongoDB
db.zipcodes.aggregate( [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
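The two pipeline stages map directly onto GROUP BY and HAVING. As a rough illustration, the same computation in plain JavaScript over a hypothetical four-row sample (the population figures are made up):

```javascript
// Hypothetical sample rows shaped like the zipcodes collection.
const zipcodes = [
  { state: "NY", pop: 6000000 },
  { state: "NY", pop: 5000000 },
  { state: "VT", pop: 600000 },
  { state: "CA", pop: 12000000 },
];

// $group stage: sum pop per state (SQL: GROUP BY state with SUM(pop)).
const totals = new Map();
for (const { state, pop } of zipcodes) {
  totals.set(state, (totals.get(state) || 0) + pop);
}

// $match stage: keep groups of at least 10 million (SQL: HAVING).
const result = [...totals]
  .map(([state, totalPop]) => ({ _id: state, totalPop }))
  .filter((group) => group.totalPop >= 10 * 1000 * 1000);

console.log(result); // NY with 11000000 and CA with 12000000
```

Because $match comes after $group, it filters on the computed totalPop, exactly as HAVING filters after aggregation in SQL.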

Indexes
• db.collection.find({field: 'value'}).explain()
• db.collection.ensureIndex({title: 1});
• db.collection.dropIndex("index_name");
• db.mycoll.ensureIndex({'address.coord': '2d'})
• db.mycoll.find({"address.coord": { $near: [70, 40], $minDistance: 0.05 }})

Indexed and Non-Indexed Search
> db.mycoll.find({"name" : "Tov Kosher Kitchen"}).pretty().explain()
{
"cursor" : "BtreeCursor name_1","isMultiKey" : false, "n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,…
}
> db.mycoll.find({"cuisine" : "Jewish/Kosher"}).pretty().explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 316,
"nscannedObjects" : 25359,
"nscanned" : 25359,…
}
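The gap between n and nscanned in the two explain() outputs is the cost of a missing index. A small plain-JavaScript sketch of the same effect follows; the 1000-document data set and the helper names are hypothetical, and a Map stands in for the B-tree (real indexes are ordered, a Map is not).

```javascript
// Hypothetical data set: 1000 documents, each with a distinct name.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  name: `Restaurant ${i}`,
  cuisine: i % 4 === 0 ? "Bakery" : "Other",
}));

// Without an index every document is examined (compare "BasicCursor").
function collectionScan(field, value) {
  let scanned = 0;
  const found = docs.filter((d) => {
    scanned += 1;
    return d[field] === value;
  });
  return { n: found.length, nscanned: scanned };
}

// With an index, a lookup structure maps each value to its documents
// (compare "BtreeCursor name_1").
const nameIndex = new Map();
for (const d of docs) {
  if (!nameIndex.has(d.name)) nameIndex.set(d.name, []);
  nameIndex.get(d.name).push(d);
}

function indexedLookup(value) {
  const found = nameIndex.get(value) || [];
  return { n: found.length, nscanned: found.length };
}

console.log(collectionScan("cuisine", "Bakery")); // n: 250, nscanned: 1000
console.log(indexedLookup("Restaurant 42")); // n: 1, nscanned: 1
```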
