Technical Overview
Learnizo Global LLP
Agenda
• Welcome and Introductions
• The World We Live In
• MongoDB Technical Overview
• Use Case Discussion
• Demo
MongoDB – It’s About the Data
MongoDB, It’s for the Developers
MongoDB Brings It All Together
Volume of Data
• Trillions of records
• Hundreds of millions of queries per second
Agile Development
• Iterative
• Continuous
Hardware Architectures
• Cloud Computing
• Commodity servers
MongoDB Use Cases
• User Data Management
• High Volume Data Feeds
• Content Management
• Operational Intelligence
• Product Data Management
10gen: The Creators of MongoDB
• Set the direction & contribute code to MongoDB
• Foster community & ecosystem
• Provide MongoDB management services
• Provide commercial services
• Founded in 2007
– Dwight Merriman, Eliot Horowitz
– DoubleClick, Oracle, MarkLogic, HP
• $31M+ in funding
– Flybridge, Sequoia, Union Square
• Worldwide Expanding Team
– 150+ employees
– NY, CA and UK
Agenda
• Welcome and Introductions
• The World We Live In
• MongoDB Technical Overview
• Use Case Discussion
• Demo
Why was MongoDB built?
NoSQL
• Key-value
• Graph database
• Document-oriented
• Column family
Traditional Architecture
• Relational
– Hard to map to the way we code
• Complex ORM frameworks
– Hard to evolve quickly
• Rigid schema is hard to change, necessitates migrations
– Hard to scale horizontally
• Joins, transactions make scaling by adding servers hard
RDBMS Limitations
Productivity
Cost
MongoDB
• Built from the start to solve the
scaling problem
• Consistency, Availability, Partitioning
- (can’t have it all)
• Configurable to fit requirements
Theory of NoSQL: CAP
CAP Theorem: satisfying all three at the same time is impossible
[Diagram: triangle with corners C, A and P]
• Many nodes
• Nodes contain replicas of partitions of data
• Consistency
– all replicas contain the same version of data
• Availability
– system remains operational when nodes fail
• Partition tolerance
– multiple entry points
– system remains operational when the system is split
Supported languages
ACID - BASE
ACID (CP)
• Atomicity
• Consistency
• Isolation
• Durability
BASE (AP)
• Basically Available
• Soft-state
• Eventually consistent
MongoDB is easy to use
MySQL
START TRANSACTION;
INSERT INTO contacts VALUES
  (NULL, 'joeblow');
INSERT INTO contact_emails VALUES
  (NULL, 'joe@blow.com', LAST_INSERT_ID()),
  (NULL, 'joseph@blow.com', LAST_INSERT_ID());
COMMIT;
MongoDB
db.contacts.save( {
  userName: "joeblow",
  emailAddresses: [
    "joe@blow.com",
    "joseph@blow.com" ] } );
As simple as possible, but no simpler
[Chart: Scalability & Performance plotted against Depth of functionality, positioning Memcached, Key / Value stores and RDBMS]
Representing & Querying Data
Schema design
• RDBMS: join
Schema design
• MongoDB: embed and link
• Embedding is the nesting of objects and arrays inside a BSON document (pre-joined). Links are references between documents (resolved with a client-side follow-up query).
• Embed for "contains" relationships and one-to-many; link for many-to-many, or when embedding would duplicate data
Schema design
Tables to Collections of JSON Documents
{
  title: 'MongoDB',
  contributors:
  [
    { name: 'Eliot Horowitz',
      email: 'eliot@10gen.com' },
    { name: 'Dwight Merriman',
      email: 'dwight@10gen.com' }
  ],
  model:
  {
    relational: false,
    awesome: true
  }
}
Terminology
RDBMS     MongoDB
Table     Collection
Row(s)    JSON Document
Index     Index
Join      Embedding & Linking
Documents
Collections contain documents
Documents can contain other documents
and/or
Documents can reference other documents
Flexible/powerful ability to relate data
Schemaless
Flexible Schema
CRUD
• Create
– db.collection.insert( <document> )
– db.collection.save( <document> )
– db.collection.update( <query>, <update>, { upsert: true } )
• Read
– db.collection.find( <query>, <projection> )
– db.collection.findOne( <query>, <projection> )
• Update
– db.collection.update( <query>, <update>, <options> )
• Delete
– db.collection.remove( <query>, <justOne> )
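The update form with { upsert: true } appears under Create because it inserts a document when nothing matches the query. A minimal in-memory sketch of that behaviour, in plain JavaScript: `matches` and `upsert` are hypothetical helpers over an array, not the MongoDB API, and only equality queries with a $set update are handled.

```javascript
// In-memory sketch of upsert semantics. `matches` and `upsert` are hypothetical
// helpers over a plain array, not the MongoDB API. Only equality queries and
// { $set: {...} } updates are supported.
function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

function upsert(collection, query, update) {
  const existing = collection.find((doc) => matches(doc, query));
  if (existing) {
    Object.assign(existing, update.$set); // a match exists: plain update
    return "updated";
  }
  // No match: insert a document built from the query fields plus the $set fields.
  collection.push(Object.assign({}, query, update.$set));
  return "inserted";
}

const contacts = [];
const firstCall = upsert(contacts, { userName: "joeblow" }, { $set: { visits: 1 } });
const secondCall = upsert(contacts, { userName: "joeblow" }, { $set: { visits: 2 } });
console.log(firstCall, secondCall, contacts.length); // inserted updated 1
```

The first call creates the document, the second modifies it, so the collection never holds two copies.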
Documents
> var p = { author: "roger",
            date: new Date(),
            title: "Spirited Away",
            avgRating: 9.834,
            tags: ["Tezuka", "Manga"] }
> db.posts.save(p)
Linked vs Embedded Documents
Embedded: comments are nested inside the post
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 …",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    {
      author : "Fred",
      date : "Sat Jul 26 2010…",
      text : "Best Movie Ever"
    }
  ],
  avgRating: 9.834 }
Linked: the post stores comment ids; the comments live in their own collection
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [ 6, 274, 1135, 1298, 2245, 5623 ],
  avg_rating: 9.834 }
comments
{ _id : 274,
  movie_id : ObjectId("4c4ba5c0672c6…"),
  author : "Fred",
  date : "Sat Jul 24 2010 20:51:0…",
  text : "Best Movie Ever" }
{ _id : 275,
  movie_id : ObjectId("3d5ffc88…"),
  author : "Fred",
  … }
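The practical difference between the two layouts is the number of reads. A rough sketch in plain JavaScript, where arrays stand in for collections: the second comment and both helper functions are hypothetical and only illustrate the follow-up lookup that a linked design needs.

```javascript
// Sketch: embedded comments arrive with the post; linked comments need a
// follow-up lookup. Arrays stand in for collections.
const commentCollection = [
  { _id: 274, author: "Fred", text: "Best Movie Ever" },
  { _id: 1135, author: "Ann", text: "Loved it" }, // hypothetical sample comment
  { _id: 9999, author: "Bob", text: "Unrelated" }, // hypothetical, belongs to another post
];

const embeddedPost = {
  text: "Spirited Away",
  comments: [{ author: "Fred", text: "Best Movie Ever" }],
};
const linkedPost = { text: "Spirited Away", comments: [274, 1135] };

// Embedded: one read returns everything.
function commentsOfEmbedded(post) {
  return post.comments;
}

// Linked: a second, client-side query resolves the ids,
// similar in spirit to find({ _id: { $in: post.comments } }).
function commentsOfLinked(post, collection) {
  return collection.filter((comment) => post.comments.includes(comment._id));
}

const resolved = commentsOfLinked(linkedPost, commentCollection);
console.log(resolved.map((comment) => comment.author)); // [ 'Fred', 'Ann' ]
```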
Querying
> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "roger",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "Spirited Away",
tags : [ "Tezuka", "Manga" ] }
Note:
- _id is unique, but can be anything you’d like
Query Operators
• Conditional Operators
– $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
– $lt, $lte, $gt, $gte
// find posts with any tags
> db.posts.find( { tags: { $exists: true } } )
// find posts whose author starts with "rog" (case-insensitive regular expression)
> db.posts.find( { author: /^rog/i } )
// count posts by author
> db.posts.find( { author: 'roger' } ).count()
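To make the operator semantics concrete, here is a small matcher in plain JavaScript. It is a sketch under simplifying assumptions, not MongoDB's implementation: `matchesCondition` and `find` are hypothetical helpers, only top-level fields are supported, and only the operators listed in the switch are handled.

```javascript
// Minimal query matcher over plain objects (sketch, not MongoDB's implementation).
function matchesCondition(value, cond) {
  if (cond instanceof RegExp) {
    return typeof value === "string" && cond.test(value);
  }
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    // An operator document such as { $gt: 5, $lt: 9 }: every operator must hold.
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (value !== undefined) === arg;
        case "$ne": return value !== arg;
        case "$in": return arg.includes(value);
        case "$lt": return value < arg;
        case "$lte": return value <= arg;
        case "$gt": return value > arg;
        case "$gte": return value >= arg;
        default: throw new Error("unsupported operator: " + op);
      }
    });
  }
  // Plain value: equality, or membership when the field holds an array.
  return Array.isArray(value) ? value.includes(cond) : value === cond;
}

// All fields in the query must match (implicit AND).
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) => matchesCondition(doc[field], cond))
  );
}

// Hypothetical sample data.
const posts = [
  { author: "roger", tags: ["Tezuka", "Manga"], avgRating: 9.834 },
  { author: "Rogelio", avgRating: 7.1 },
  { author: "fred", tags: [], avgRating: 5 },
];

console.log(find(posts, { tags: { $exists: true } }).length); // 2
console.log(find(posts, { author: /^rog/i }).length); // 2
console.log(find(posts, { avgRating: { $gt: 5, $lt: 9 } }).length); // 1
```

Note that an empty tags array still counts as existing, which is why the first query returns two posts.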
Atomic Operations
• $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
> comment = { author: "fred",
              date: new Date(),
              text: "Best Movie Ever" }
> db.posts.update( { _id: "..." },
                   { $push: { comments: comment } } );
Arrays
• $push - append
• $pushAll – append array
• $addToSet and $each – add if not contained,
add list
• $pop – remove last
• $pull – remove all occurrences/criteria
• { $pull : { field : {$gt: 3} } }
• $pullAll – removes all occurrences of each value
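The array operators can be pictured as small functions over an array. The sketch below is plain JavaScript on in-memory arrays; `arrayOps` is a hypothetical name, each function returns a new array rather than updating a stored document, and the criteria form of $pull (for example { $gt: 3 }) is left out.

```javascript
// In-memory sketch of the array update operators. Each function returns a new
// array; the real operators modify the stored document on the server.
const arrayOps = {
  $push: (arr, value) => [...arr, value],
  $pushAll: (arr, values) => [...arr, ...values],
  $addToSet: (arr, value) => (arr.includes(value) ? arr : [...arr, value]),
  // direction 1 removes the last element, -1 removes the first
  $pop: (arr, direction = 1) => (direction === -1 ? arr.slice(1) : arr.slice(0, -1)),
  $pull: (arr, value) => arr.filter((item) => item !== value),
  $pullAll: (arr, values) => arr.filter((item) => !values.includes(item)),
};

const tags = ["Tezuka", "Manga"];
const pushed = arrayOps.$push(tags, "Manga"); // duplicates are allowed
const added = arrayOps.$addToSet(tags, "Tezuka"); // already present, unchanged
const pulled = arrayOps.$pull(pushed, "Manga"); // every occurrence removed
const popped = arrayOps.$pop(tags); // last element removed

console.log(pushed); // [ 'Tezuka', 'Manga', 'Manga' ]
console.log(pulled); // [ 'Tezuka' ]
```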
Indexes
Create index on any Field in Document
> db.posts.ensureIndex( { author: 1 } )
// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { "comments.author": "Fred" } )
// Index on tags (array values)
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: "Manga" } )
// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location": { $near: [22, 42] } } )
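An index on an array field such as tags holds one entry per array element, which is why a query for a single tag can use it. A simplified sketch in plain JavaScript, with a Map standing in for the index: `buildIndex` and the sample documents are hypothetical, and documents that lack the field are simply skipped here.

```javascript
// Sketch of an index over a field that may hold arrays: one entry per element.
// A Map from key value to document ids stands in for the B-tree.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const raw = doc[field];
    if (raw === undefined) continue; // simplification: skip documents without the field
    const keys = Array.isArray(raw) ? raw : [raw];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// Hypothetical sample data.
const indexedPosts = [
  { _id: 1, author: "roger", tags: ["Tezuka", "Manga"] },
  { _id: 2, author: "fred", tags: ["Anime"] },
  { _id: 3, author: "roger", tags: ["Manga"] },
];

const tagIndex = buildIndex(indexedPosts, "tags");
const authorIndex = buildIndex(indexedPosts, "author");
console.log(tagIndex.get("Manga")); // [ 1, 3 ]
console.log(authorIndex.get("roger")); // [ 1, 3 ]
```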
Aggregation/Batch Data Processing
• Map/Reduce can be used for batch data processing
– Currently being used for totaling, averaging, etc
– Map/Reduce is a big hammer
• Simple aggregate functions available
• (2.2) Aggregation Framework: Simple, Fast
– No JavaScript needed, runs natively on the server
– Filter or Select Only Matching Sub-documents or
Arrays via new operators
• MongoDB Hadoop Connector
– Useful for Hadoop Integration
– Massive Batch Processing Jobs
Deployment & Scaling
Why Replicate?
• Data Redundancy
• Automatic Failover / High Availability
• Distribution of read load
• Disaster recovery
Replica Sets
Asynchronous
Replication
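Asynchronous replication means the primary applies a write before the secondaries have copied it, so a read from a secondary can briefly return old data. A toy model in plain JavaScript: the `Primary` and `Secondary` classes are hypothetical and keep the operation log as a simple in-memory array.

```javascript
// Toy model of asynchronous replication. The primary records each write in an
// operation log; a secondary copies entries only when it syncs.
class Primary {
  constructor() {
    this.data = {};
    this.oplog = [];
  }
  write(key, value) {
    this.data[key] = value;
    this.oplog.push({ key, value }); // recorded for secondaries to replay later
  }
}

class Secondary {
  constructor() {
    this.data = {};
    this.applied = 0; // number of oplog entries already replayed
  }
  sync(primary) {
    for (; this.applied < primary.oplog.length; this.applied++) {
      const { key, value } = primary.oplog[this.applied];
      this.data[key] = value;
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();

primary.write("avgRating", 9.834);
const staleRead = secondary.data.avgRating; // undefined: not replicated yet
secondary.sync(primary);
const freshRead = secondary.data.avgRating; // 9.834 after catching up

console.log(staleRead, freshRead); // undefined 9.834
```

This is the trade-off behind sending reads to secondaries: read load is spread out, at the cost of possibly stale results.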
Replica Sets
• One primary, many secondaries
– Automatic replication to all secondaries
• Different delays may be configured
– Automatic election of new primary on failure
– Writes to primaries, reads can go to secondaries
• Priority of secondary can be set
– Hidden for administration/back-ups
– Lower score for less powerful machines
• Election of new primary is automatic
– Majority of replica set must be available
– Arbiters can be used
• Many configurations possible (based on use case)
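The majority rule explains why arbiters are useful. A short sketch in plain JavaScript, assuming every member has one vote: `majority` and `canElectPrimary` are hypothetical helper names.

```javascript
// Sketch of the election rule: a primary can be elected only while a strict
// majority of the voting members is reachable. Assumes one vote per member.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

console.log(canElectPrimary(3, 2)); // true: one of three members is down
console.log(canElectPrimary(2, 1)); // false: two data nodes, one down
console.log(canElectPrimary(3, 2)); // true: the same two data nodes plus an arbiter
```

With two data-bearing members, losing either one blocks elections; adding an arbiter, which votes but holds no data, brings the set to three votes so one failure can be tolerated.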
Replica Sets
Automatic
Leader Election
Sharding
Sharding
[Diagram: a single mongod holds key range 0..100; for write scalability it is split across two mongod instances holding key ranges 0..30 and 31..100]
Sharding
[Diagram: two mongod instances holding key ranges 0..30 and 31..100; write scalability]
Sharding
[Diagram: key range 31..100 is split further, giving four mongod instances holding key ranges 0..30, 31..60, 61..90 and 91..100; write scalability]
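Range-based sharding amounts to a lookup from shard key to the server that owns the matching key range. A sketch in plain JavaScript using the four key ranges from the diagrams: the shard names and the `routeByKey` and `routeByRange` helpers are hypothetical, and the ranges are treated as inclusive integers as drawn on the slides.

```javascript
// Sketch of range-based routing with the key ranges from the slides.
// Shard names are hypothetical.
const chunks = [
  { min: 0, max: 30, shard: "shardA" },
  { min: 31, max: 60, shard: "shardB" },
  { min: 61, max: 90, shard: "shardC" },
  { min: 91, max: 100, shard: "shardD" },
];

// A read or write that names the shard key goes to exactly one shard.
function routeByKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError("no chunk covers key " + key);
  return chunk.shard;
}

// A range query on the shard key touches only the shards whose ranges overlap it.
function routeByRange(low, high) {
  return chunks.filter((c) => c.max >= low && c.min <= high).map((c) => c.shard);
}

console.log(routeByKey(45)); // shardB
console.log(routeByRange(25, 35)); // [ 'shardA', 'shardB' ]
```

Writes for different keys land on different servers, which is where the write scalability in the diagrams comes from.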
Sharding Administration
• Splitting data into chunks
– Automatic
– Existing data can be manually “pre-split”
• Migration of chunks/balancing between servers
– Automatic
– Can be turned off/chunks can be manually moved
• Shard key
– Must be selected by you
– Very important for performance!
• Each shard is really a replica set
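Chunk splitting can be pictured as cutting an over-full key range at a middle key. The sketch below is plain JavaScript under loud simplifications: `splitChunk` is a hypothetical helper, keys are integers, and the split point is the median key, whereas the real splitter works from chunk size.

```javascript
// Sketch of splitting one chunk into two at the median shard key.
// Assumes integer keys; the real system decides based on chunk size.
function splitChunk(chunk) {
  const keys = [...chunk.keys].sort((a, b) => a - b);
  const splitKey = keys[Math.floor(keys.length / 2)];
  return [
    { min: chunk.min, max: splitKey - 1, keys: keys.filter((k) => k < splitKey) },
    { min: splitKey, max: chunk.max, keys: keys.filter((k) => k >= splitKey) },
  ];
}

// Hypothetical contents of the 31..100 chunk.
const bigChunk = { min: 31, max: 100, keys: [88, 35, 70, 40, 95, 62] };
const [lower, upper] = splitChunk(bigChunk);

console.log(lower.min, lower.max, lower.keys); // 31 69 [ 35, 40, 62 ]
console.log(upper.min, upper.max, upper.keys); // 70 100 [ 70, 88, 95 ]
```

After a split, the balancer is free to migrate one of the halves to a less loaded shard.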
Full Deployment
[Diagram: three MongoS routers in front of four shards with key ranges 0..30, 31..60, 61..90 and 91..100; each shard is a replica set with one primary and two secondaries; write scalability]
[Diagram: queries enter through the MongoS routers, which are backed by three config servers; behind them are the same four shards, each a replica set with one primary and two secondaries]
MMS: MongoDB Monitoring Service
• SaaS solution providing instrumentation and visibility
into your MongoDB systems
• 3,500+ customers signed up and using service
Agenda
• Welcome and Introductions
• MongoDB and the New Frontier
• MongoDB Technical Overview
• Use Case Discussion
• Demo
Queries
• Importing Data into MongoDB
– mongoimport --db test --collection restaurants --file dataset.json
• Exporting Data from MongoDB
– mongoexport --db test --collection newcolln --out myexport.json
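By default both tools work with one JSON document per line rather than a single JSON array. A small plain JavaScript sketch of that file layout: `parseJsonLines` and `toJsonLines` are hypothetical helpers and the sample records are made up for illustration.

```javascript
// Sketch of the one-document-per-line layout used by default for import and
// export files. Helper names and sample data are hypothetical.
function parseJsonLines(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // ignore blank lines
    .map((line) => JSON.parse(line));
}

function toJsonLines(docs) {
  return docs.map((doc) => JSON.stringify(doc)).join("\n") + "\n";
}

const fileContents =
  '{"name":"Sample Diner","cuisine":"American"}\n' +
  '{"name":"Sample Bakery","cuisine":"Bakery"}\n';

const restaurants = parseJsonLines(fileContents);
console.log(restaurants.length); // 2
console.log(toJsonLines(restaurants) === fileContents); // true
```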
Queries
• MongoDB query operation:
• Query in SQL
Create/Insert Queries
• db.collection.insert()
db.inventory.insert(
  { item: "ABC1",
    details: { model: "14Q3", manufacturer: "XYZ Company" },
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    category: "clothing" } )
Find Queries
• db.collection.find()
• db.inventory.find( {} )
• db.inventory.find( { type: "snacks" } )
• db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
• db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
• db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
Update Queries
• Update operators such as $set start with $; use dot notation (e.g. "details.model") to change a single field of a subdocument
db.inventory.update( { item: "MNO2" },
  { $set: { category: "apparel", details: { model: "14Q3", manufacturer: "XYZ Company" } },
    $currentDate: { lastModified: true } },
  false, true )
• false (third argument, upsert): do not insert a new document if nothing matches
• true (fourth argument, multi): update all matching documents
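Setting a whole subdocument and setting one field through dot notation behave differently: the first replaces the subdocument, the second leaves its other fields alone. A plain JavaScript sketch on in-memory objects: `applySet` is a hypothetical helper that mimics only this aspect of $set, and the model value "15Q1" is invented for the example.

```javascript
// Sketch of $set on a plain object. A key without dots replaces the whole
// value; a dotted key walks into the subdocument and changes one field.
function applySet(doc, setSpec) {
  for (const [path, value] of Object.entries(setSpec)) {
    const parts = path.split(".");
    let target = doc;
    for (const part of parts.slice(0, -1)) {
      // simplification: create missing intermediate objects as needed
      if (typeof target[part] !== "object" || target[part] === null) target[part] = {};
      target = target[part];
    }
    target[parts[parts.length - 1]] = value;
  }
  return doc;
}

const makeItem = () => ({
  item: "MNO2",
  details: { model: "14Q3", manufacturer: "XYZ Company" },
});

const replaced = applySet(makeItem(), { details: { model: "15Q1" } });
const patched = applySet(makeItem(), { "details.model": "15Q1" });

console.log(replaced.details); // { model: '15Q1' }
console.log(patched.details); // { model: '15Q1', manufacturer: 'XYZ Company' }
```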
Queries
• Aggregation
– SQL Query
SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)
– MongoDB
db.zipcodes.aggregate( [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gte: 10*1000*1000 } } } ] )
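The two pipeline stages can be traced by hand on a few rows. The sketch below reproduces them in plain JavaScript over an in-memory array: the population figures are invented sample data and the `groupByState` and `matchMinPop` helpers are hypothetical stand-ins for the $group and $match stages.

```javascript
// Plain-JavaScript walk-through of the $group and $match stages.
// The rows are invented sample data, not real census figures.
const zipcodes = [
  { state: "NY", pop: 6000000 },
  { state: "NY", pop: 5000000 },
  { state: "VT", pop: 600000 },
  { state: "CA", pop: 12000000 },
];

// Stage 1, like $group: one output document per state with the summed population.
function groupByState(docs) {
  const totals = new Map();
  for (const { state, pop } of docs) {
    totals.set(state, (totals.get(state) || 0) + pop);
  }
  return [...totals].map(([state, totalPop]) => ({ _id: state, totalPop }));
}

// Stage 2, like $match (or SQL HAVING): keep only the large groups.
function matchMinPop(groups, minPop) {
  return groups.filter((group) => group.totalPop >= minPop);
}

const bigStates = matchMinPop(groupByState(zipcodes), 10 * 1000 * 1000);
console.log(bigStates);
// [ { _id: 'NY', totalPop: 11000000 }, { _id: 'CA', totalPop: 12000000 } ]
```

The filter runs after grouping because it tests the computed total, which is the same reason the SQL version uses HAVING rather than WHERE.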
Indexes
• db.collection.find({field: 'value'}).explain()
• db.collection.ensureIndex({title: 1});
• db.collection.dropIndex("index_name");
• db.mycoll.ensureIndex({'address.coord': '2d'})
• db.mycoll.find({"address.coord": { $near: [70, 40], $minDistance: 0.05 }})
Indexed and NonIndexed Search
> db.mycoll.find({"name" : "Tov Kosher Kitchen"}).pretty().explain()
{
  "cursor" : "BtreeCursor name_1",
  "isMultiKey" : false,
  "n" : 1,
  "nscannedObjects" : 1,
  "nscanned" : 1,
  "nscannedObjectsAllPlans" : 1,…
}
> db.mycoll.find({"cuisine" : "Jewish/Kosher"}).pretty().explain()
{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 316,
  "nscannedObjects" : 25359,
  "nscanned" : 25359,…
}
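The gap between the two explain outputs comes from how many documents have to be examined. A plain JavaScript sketch with a Map standing in for the index: the data set is generated, the helper names are hypothetical, and the returned fields only mimic the spirit of n and nscanned.

```javascript
// Sketch: the same lookup with and without an index, counting examined documents.
// Generated sample data: 1000 restaurants with unique names.
const restaurantDocs = [];
for (let i = 0; i < 1000; i++) {
  restaurantDocs.push({ _id: i, name: "restaurant-" + i, cuisine: "cuisine-" + (i % 10) });
}

// Without an index every document is examined (compare "BasicCursor").
function collectionScan(docs, field, value) {
  let nscanned = 0;
  let n = 0;
  for (const doc of docs) {
    nscanned++;
    if (doc[field] === value) n++;
  }
  return { n, nscanned };
}

// With an index only the matching entries are touched (compare "BtreeCursor").
function buildFieldIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

function indexScan(index, value) {
  const hits = index.get(value) || [];
  return { n: hits.length, nscanned: hits.length };
}

const nameIndex = buildFieldIndex(restaurantDocs, "name");
console.log(indexScan(nameIndex, "restaurant-42")); // { n: 1, nscanned: 1 }
console.log(collectionScan(restaurantDocs, "cuisine", "cuisine-3")); // { n: 100, nscanned: 1000 }
```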
Thank You