Introduction

Christian Kvalheim - christkv@10gen.com
                @christkv
Today's Talk
• Quick introduction to NoSQL

• Some Background about mongoDB

• Using mongoDB

• Deploying mongoDB
Database Landscape

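[Figure: chart with "scalability & performance" on the vertical axis and "depth of functionality" on the horizontal axis; memcached and key/value stores sit toward the top left, RDBMS toward the bottom right]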
What is NoSQL?




• Key/Value
• Column
• Graph
• Document
Key-Value Stores
• A mapping from a key to a value
• The store doesn't know anything about the key
  or value
• The store doesn't know anything about the insides
  of the value
• Operations
 • Set, get, or delete a key-value pair
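The three operations above can be sketched with a plain in-memory map. This is only an illustration, not any particular product: the `KeyValueStore` class and the `session:42` key are hypothetical names.

```javascript
// Hypothetical in-memory key-value store; the class name is illustrative only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // The store treats keys and values as opaque: it never inspects them.
  set(key, value) { this.data.set(key, value); }
  get(key) { return this.data.get(key); }
  delete(key) { return this.data.delete(key); }
}

const store = new KeyValueStore();
store.set("session:42", '{"user":"chris"}');
console.log(store.get("session:42"));
store.delete("session:42");
console.log(store.get("session:42")); // undefined: the pair is gone
```

The value is stored as an opaque string here, so any query on what is inside it has to happen in the application.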
Column-Oriented Stores
• Like a relational store, but flipped around: all data
 for a column is kept together
• An index provides a means to get a column value for a
  record
• Operations:
 • Get, insert, delete records; update fields
 • Stream column data in and out of Hadoop
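To make "flipped around" concrete, here is a minimal sketch that turns row-shaped records into one array per column. The records, the `views` field and the `toColumns` helper are made up for illustration; real column stores add compression and on-disk layout on top of this idea.

```javascript
// Row layout: one object per record.
const rows = [
  { id: 1, author: "Chris", views: 10 },
  { id: 2, author: "Fred", views: 25 },
  { id: 3, author: "Chris", views: 7 },
];

// Column layout: one array per field; position i belongs to record i.
function toColumns(records) {
  const columns = {};
  records.forEach((record, i) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = new Array(records.length);
      columns[field][i] = value;
    }
  });
  return columns;
}

const columns = toColumns(rows);
// Scanning one column touches only that column's data.
const totalViews = columns.views.reduce((sum, v) => sum + v, 0);
// The position acts as the index from a record to its column value.
const authorOfSecond = columns.author[1];
console.log(totalViews, authorOfSecond);
```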
Graph Databases
• Stores vertex-to-vertex edges
• Operations:
 • Getting and setting edges
 • Sometimes possible to annotate vertices or edges
• Query languages support finding paths between
  vertices, subject to various constraints
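A path query can be sketched as a breadth-first search over an adjacency list. The vertex names and the `setEdge` / `findPath` helpers are hypothetical; graph databases expose this through their own query languages rather than code like this.

```javascript
// Edges stored as an adjacency list: vertex -> list of neighbouring vertices.
const edges = new Map();

function setEdge(from, to) {
  if (!edges.has(from)) edges.set(from, []);
  edges.get(from).push(to);
}

// Breadth-first search; returns one shortest path or null if none exists.
function findPath(start, goal) {
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const vertex = path[path.length - 1];
    if (vertex === goal) return path;
    for (const next of edges.get(vertex) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null;
}

setEdge("alice", "bob");
setEdge("bob", "carol");
setEdge("alice", "dave");
console.log(findPath("alice", "carol"));
```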
Document Stores
• The store is a container for documents
• Documents are made up of named fields
 • Fields may or may not have type definitions
 • e.g. XSDs for XML stores, vs. schema-less JSON stores
• Can create "secondary indexes"
 • These provide the ability to query on any document field(s)
• Operations:
 • Insert and delete documents
 • Update fields within documents
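As a rough sketch of how a secondary index relates to the documents, the toy store below keeps a map from field value to document ids and falls back to a full scan when no index exists. `DocumentStore` and its methods are hypothetical names, not MongoDB's API.

```javascript
// Hypothetical in-memory document store with one secondary index per field.
class DocumentStore {
  constructor() {
    this.docs = new Map();    // _id -> document
    this.indexes = new Map(); // field name -> Map(value -> Set of _id)
    this.nextId = 1;
  }
  createIndex(field) {
    const index = new Map();
    for (const doc of this.docs.values()) this.addToIndex(index, field, doc);
    this.indexes.set(field, index);
  }
  addToIndex(index, field, doc) {
    if (!(field in doc)) return;
    if (!index.has(doc[field])) index.set(doc[field], new Set());
    index.get(doc[field]).add(doc._id);
  }
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.set(stored._id, stored);
    for (const [field, index] of this.indexes) this.addToIndex(index, field, stored);
    return stored._id;
  }
  // Uses the index when one exists, otherwise scans every document.
  find(field, value) {
    const index = this.indexes.get(field);
    if (index) return [...(index.get(value) || [])].map((id) => this.docs.get(id));
    return [...this.docs.values()].filter((doc) => doc[field] === value);
  }
}

const posts = new DocumentStore();
posts.createIndex("author");
posts.insert({ author: "Chris", text: "About MongoDB..." });
posts.insert({ author: "Fred", text: "Best Post Ever!" });
console.log(posts.find("author", "Chris"));
```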
What is mongoDB?
MongoDB is a scalable, high-performance,
open source NoSQL database.

• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS
• Company behind mongoDB
 – (A)GPL license, own copyrights, engineering team
 – support, consulting, commercial license

• Management
 – Google/DoubleClick, Oracle, Apple, NetApp
 – Funding: Sequoia, Union Square, Flybridge
 – Offices in NYC, Palo Alto, London, Dublin
 – 100+ employees
Where can you use it?
MongoDB is Implemented in C++
• Platforms: 32/64-bit Windows, Linux, Mac OS X,
  FreeBSD, Solaris

Drivers are available in many languages

10gen supported
• C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript,
  Perl, PHP, Python, Ruby, Scala, Node.JS

Community supported
• Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
  http://www.mongodb.org/display/DOCS/Drivers
History
• First release - February 2009
• v1.0 - August 2009
• v1.2 - December 2009 - MapReduce, ++
• v1.4 - March 2010 - Concurrency, Geo
• v1.6 - August 2010 - Sharding, Replica Sets
• v1.8 - March 2011 - Journaling, Geosphere
• v2.0 - September 2011 - V1 Indexes, Concurrency
• v2.2 - Soon - Aggregation, Concurrency
Terminology
RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
Documents
  Blog Post Document

> p = { author: "Chris",
         date: new ISODate(),
         text: "About MongoDB...",
         tags: ["tech", "databases"]}

> db.posts.save(p)
Querying

> db.posts.find()

   { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author : "Chris",
     date : ISODate("2012-02-02T11:52:27.442Z"),
     text : "About MongoDB...",
     tags : [ "tech", "databases" ] }

Notes:
     _id is unique, but can be anything you'd like
Introducing BSON
JSON has a powerful but limited set of datatypes
 • arrays, objects, strings, numbers, booleans and null

BSON is a binary representation of JSON
 • Adds extra datatypes such as Date, Int types, ObjectId, …
 • Optimized for performance and navigational abilities
 • And compression

MongoDB sends and stores data in BSON
 • bsonspec.org
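To show what "binary representation" means, here is a deliberately tiny encoder that handles only top-level string fields, following the layout described at bsonspec.org (int32 document size, typed elements, NUL terminator). `encodeStringDocument` is a hypothetical helper for illustration; real drivers ship complete BSON libraries.

```javascript
// Minimal BSON encoder sketch: supports only string-valued, top-level fields.
function encodeStringDocument(doc) {
  const elements = [];
  for (const [name, value] of Object.entries(doc)) {
    const nameBytes = Buffer.from(name + "\0", "utf8");   // field name as a cstring
    const valueBytes = Buffer.from(value + "\0", "utf8"); // string payload plus trailing NUL
    const length = Buffer.alloc(4);
    length.writeInt32LE(valueBytes.length, 0);            // byte length, including the NUL
    elements.push(Buffer.from([0x02]), nameBytes, length, valueBytes); // 0x02 = UTF-8 string
  }
  const body = Buffer.concat([...elements, Buffer.from([0x00])]); // document terminator
  const size = Buffer.alloc(4);
  size.writeInt32LE(body.length + 4, 0);                  // total size counts these 4 bytes too
  return Buffer.concat([size, body]);
}

const encoded = encodeStringDocument({ hello: "world" });
console.log(encoded.length, encoded.toString("hex"));
```

The length prefixes are what make the format easy to navigate: a reader can skip over a field or a whole document without parsing its contents.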
Secondary Indexes
Create index on any Field in Document

//   1 means ascending, -1 means descending
 > db.posts.ensureIndex({author: 1})

> db.posts.findOne({author: 'Chris'})

  { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Chris", ... }
Compound Indexes
Create index on multiple fields in a Document

//   1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1, date: -1})

> db.posts.find({author: 'Chris'}).sort({date: -1})

  [{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author: "Chris", ...},
   { _id : ObjectId("4f61d325c496820ceba84124"),
     author: "Chris", ...}]
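Why a compound index helps a filter-plus-sort query: entries are ordered by the first key, then by the second within each first-key value, so all posts by one author sit together and are already in the requested order. The sketch below mimics that ordering with a comparator, assuming an author-ascending, date-descending index; the sample posts and `compareAuthorAscDateDesc` are made up for illustration.

```javascript
// Order entries the way an {author: 1, date: -1} index would:
// author ascending, then date descending within each author.
function compareAuthorAscDateDesc(a, b) {
  if (a.author !== b.author) return a.author < b.author ? -1 : 1;
  return b.date.getTime() - a.date.getTime();
}

const posts = [
  { author: "Fred", date: new Date("2012-02-03T13:23:11Z") },
  { author: "Chris", date: new Date("2012-02-02T11:52:27Z") },
  { author: "Chris", date: new Date("2012-03-15T09:00:00Z") },
];

const indexOrder = [...posts].sort(compareAuthorAscDateDesc);
// One contiguous range of the index answers "posts by Chris, newest first".
const chrisNewestFirst = indexOrder.filter((p) => p.author === "Chris");
console.log(indexOrder.map((p) => p.author + " " + p.date.toISOString()));
```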
Query Operators
Conditional Operators
- $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
- $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find({tags: {$exists: true }})

// find posts matching a regular expression
> db.posts.find({author: /^ro*/i })

// count posts by author
> db.posts.find({author: 'Chris'}).count()
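The semantics of a few of these operators can be sketched as a small matcher over plain objects. This is not MongoDB's implementation and covers only five operators plus regular expressions; `matches`, the `operators` table and the `votes` field are hypothetical.

```javascript
// Hypothetical matcher covering a handful of operators on top-level fields.
const operators = {
  $exists: (value, wanted) => (value !== undefined) === wanted,
  $ne: (value, operand) => value !== operand,
  $in: (value, list) => list.includes(value),
  $gt: (value, operand) => value > operand,
  $lte: (value, operand) => value <= operand,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object") {
      // Every operator in the condition must hold for the field.
      return Object.entries(condition).every(([op, operand]) => operators[op](value, operand));
    }
    return value === condition; // plain equality
  });
}

const posts = [
  { author: "Chris", tags: ["tech", "databases"], votes: 3 },
  { author: "Ross", votes: 7 },
];
const withTags = posts.filter((p) => matches(p, { tags: { $exists: true } }));
const byRegex = posts.filter((p) => matches(p, { author: /^ro/i }));
console.log(withTags.length, byRegex.length);
```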
Examine the query plan
> db.posts.find({"author": 'Chris'}).explain()
{
    "cursor" : "BtreeCursor author_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "indexBounds" : {
        "author" : [
            [
                "Chris",
                "Chris"
            ]
        ]
    }
}
Atomic Operations
  $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

// Create a comment
> new_comment = { author: "Fred",
                      date: new Date(),
                      text: "Best Post Ever!"}

// Add to post
> db.posts.update({ _id: "..." },
                  {"$push": {comments: new_comment},
                   "$inc": {comment_count: 1}
                  });
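What the modifiers do to a document can be sketched on a plain object. The server applies them atomically to a single document; the hypothetical `applyUpdate` helper below only illustrates the resulting shape for `$set`, `$inc` and `$push`.

```javascript
// Hypothetical helper: applies $set, $inc and $push to an in-memory object.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts from 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = []; // a missing field becomes an array
    doc[field].push(item);
  }
  return doc;
}

const post = { author: "Chris", text: "About MongoDB..." };
const newComment = { author: "Fred", text: "Best Post Ever!" };
applyUpdate(post, { $push: { comments: newComment }, $inc: { comment_count: 1 } });
console.log(JSON.stringify(post));
```

Sending the modifiers instead of the whole document means the comment and its counter change together, without a read-modify-write round trip in the client.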
Nested Documents
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Chris",
      date : "Thu Feb 02 2012 11:50:01",
      text : "About MongoDB...",
      tags : [ "tech", "databases" ],
      comments : [{
          author : "Fred",
          date : "Fri Feb 03 2012 13:23:11",
          text : "Best Post Ever!"
      }],
      comment_count : 1
    }
Secondary Indexes
// Index nested documents
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Fred"})

// Index on tags (multi-key index)
> db.posts.ensureIndex({ tags: 1 })
> db.posts.find( { tags: "tech" } )
Geo
  • Geo-spatial queries
   • Require a geo index
   • Find points near a given point
   • Find points within a polygon/sphere


// geospatial index
> db.posts.ensureIndex({ "author.location": "2d" })
> db.posts.find({ "author.location" :
                  { $near : [22, 42] } })
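The result of a `$near` query on a flat 2d index can be approximated by sorting points by straight-line distance. The brute-force sketch below scans every point, which is exactly what a geo index is there to avoid; the `near` helper and the sample authors are hypothetical.

```javascript
// Brute-force stand-in for $near on flat 2d coordinates: closest points first.
function near(points, target, limit) {
  const distance = (p) =>
    Math.hypot(p.location[0] - target[0], p.location[1] - target[1]);
  return [...points].sort((a, b) => distance(a) - distance(b)).slice(0, limit);
}

const authors = [
  { name: "A", location: [25, 46] },
  { name: "B", location: [22, 42] },
  { name: "C", location: [0, 0] },
];
const closest = near(authors, [22, 42], 2);
console.log(closest.map((a) => a.name));
```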
Map Reduce
    The caller provides map and reduce functions written
    in JavaScript
// Emit each tag
> map = function() {
    this.tags.forEach(
      function(item) { emit(item, 1); }
    );
  };

// Calculate totals
> reduce = function(key, values) {
     var total = 0;
     var valuesSize = values.length;
     for (var i=0; i < valuesSize; i++) {
       total += parseInt(values[i], 10);
     }
     return total;
  };
Map Reduce
// run the map reduce
> db.posts.mapReduce(map, reduce, {"out": { inline : 1}});
{
    "results" : [
        {"_id" : "databases", "value" : 1},
        {"_id" : "tech", "value" : 1}
    ],
    "timeMillis" : 1,
    "counts" : {
        "input" : 1,
        "emit" : 2,
        "reduce" : 0,
        "output" : 2
    },
    "ok" : 1
}
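The data flow behind this output can be simulated without a server: map emits key/value pairs, emitted values are grouped by key, and reduce folds each group. `runMapReduce` and the module-level `emit` below are hypothetical stand-ins for what the server provides. Keys with a single emitted value skip reduce, which is why the output above reports "reduce" : 0.

```javascript
let groups; // key -> list of emitted values for the current run

// Stand-in for the emit() that the server makes available to map functions.
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

function runMapReduce(docs, map, reduce) {
  groups = new Map();
  docs.forEach((doc) => map.call(doc)); // each document is bound to `this`
  const results = [];
  for (const [key, values] of groups) {
    // A key with a single emitted value is passed through without calling reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const map = function () {
  this.tags.forEach(function (item) { emit(item, 1); });
};
const reduce = function (key, values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
};

const results = runMapReduce(
  [{ tags: ["tech", "databases"] }, { tags: ["tech"] }],
  map,
  reduce
);
console.log(results);
```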
Aggregation - coming in 2.2
// Count tags
> agg = db.posts.aggregate(
    {$unwind: "$tags"},
    {$group : {_id : "$tags",
               count : {$sum: 1}}}
  )

> agg.result
  [{"_id": "databases", "count": 1},
   {"_id": "tech", "count": 1}]
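The two pipeline stages can be mimicked on plain arrays: `$unwind` produces one document per array element, and `$group` with `$sum: 1` counts documents per key. The `unwind` and `groupCount` helpers are hypothetical and cover only this one case.

```javascript
// $unwind stand-in: one output document per element of the array field.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// $group stand-in for {_id: "$field", count: {$sum: 1}}.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

const posts = [
  { author: "Chris", tags: ["tech", "databases"] },
  { author: "Fred", tags: ["tech"] },
];
const tagCounts = groupCount(unwind(posts, "tags"), "tags");
console.log(tagCounts);
```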
GridFS
 Save files in mongoDB
 Stream data back to the client

# (Python) Create a new instance of GridFS; db is an existing pymongo database
>>> import gridfs
>>> fs = gridfs.GridFS(db)

# Save file to mongo (open in binary mode)
>>> my_image = open('my_image.jpg', 'rb')
>>> file_id = fs.put(my_image)

# Read file
>>> fs.get(file_id).read()
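Under the hood, GridFS splits a file into chunk documents that carry a `files_id` and a sequence number `n`. The sketch below shows that splitting and reassembly on an in-memory buffer; `toChunks`, `fromChunks`, the file id and the 4-byte chunk size are illustrative assumptions, not the driver's code or defaults.

```javascript
// Split a buffer into GridFS-style chunk documents.
function toChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let n = 0; n * chunkSize < data.length; n++) {
    chunks.push({
      files_id: fileId,
      n, // position of this chunk within the file
      data: data.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
  return chunks;
}

// Reassemble the file by ordering chunks on n, whatever order they arrive in.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const original = Buffer.from("0123456789");
const pieces = toChunks("file-1", original, 4);
console.log(pieces.map((c) => c.data.length));
console.log(fromChunks(pieces).toString());
```

Because each chunk is an ordinary document, a reader can fetch them one at a time, which is what makes streaming back to the client possible.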
Rich Documents

• Intuitive
• Developer friendly
• Encapsulates whole objects
• Performant
• Scalable
Rich Documents
{   _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    line_items : [ { sku: 'tt-123',
                     name: 'Coltrane: Impressions' },
                   { sku: 'tt-457',
                     name: 'Davis: Kind of Blue' } ],
    address : { name: 'Banker',
                street: '111 Main',
                zip: 10010 },
    payment: { cc: 4567,
               exp: new Date(2012, 7, 7) },
    subtotal: 2355
}
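Deployment

• Single server
 - need a strong backup plan
• Replica sets
 - High availability
 - Automatic failover
• Sharded
 - Horizontally scale
 - Auto balancing

[Diagram: a single server is one primary (P); a replica set is one primary with two secondaries (P S S); a sharded cluster is several such replica sets]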
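How a sharded deployment decides where a document lives can be sketched as a lookup from shard key to key range. The chunk boundaries, shard names and `shardFor` helper below are hypothetical, and the author field is assumed to be the shard key; a real cluster keeps this table in its config metadata and splits and moves ranges as part of auto balancing.

```javascript
// Hypothetical chunk table: each chunk covers shard-key values in [min, max).
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Route a shard-key value to the shard whose range contains it.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

console.log(shardFor("chris"), shardFor("mike"), shardFor("ross"));
```

Queries that include the shard key can be sent to one shard; queries without it have to be sent to all of them.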
MongoDB Use Cases
• Archiving
• Content Management
• Ecommerce
• Finance
• Gaming
• Government
• Metadata Storage
• News & Media
• Online Advertising
• Online Collaboration
• Real-time stats/analytics
• Social Networks
• Telecommunications
In Production
download at mongodb.org

conferences, appearances, and meetups
http://www.10gen.com/events

Facebook: http://bit.ly/mongofb
Twitter: @mongodb
LinkedIn: http://linkd.in/joinmongo


  support, training, and this talk brought to you by