BIG DATA MODULE 2:
MONGODB
Contents
Chapter 1: Introduction to NoSQL
Chapter 2: CRUD Operations
Chapter 3: Queries
Chapter 4: Aggregation Framework
The Relational World
Characteristics:
Structured Data (Table/Schema)
Standardized Data (Normal Forms)
Standard (SQL)
Transactional (ACID)
Complex Query (Join)
Constraints (Data Integrity)
NoSQL
NoSQL databases (read: Not Only SQL) address the following
needs:
- high availability
- high read and/or write performance
- processing of large volumes of data.
NoSQL
A NoSQL engine is...
- A data structure
- An absence of constraints
- A modelling method
- A flexible data schema
- A query model tied to an object model
- A lack of transactions, and trade-offs
NoSQL or SQL ?
CAP theorem
NoSQL solutions
Key/value: like the associative arrays of programming languages.
Each key maps to one value.
Column-oriented: each key maps to a set of columns, each with a value.
Document-oriented: each key maps to a set of fields/values that can be nested hierarchically.
Graph-oriented: data is modeled as nodes that have links between them.
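The four families can be pictured with ordinary JavaScript data structures. This is only an illustrative sketch: the variable names and sample values are invented here, and real engines store and index data very differently.

```javascript
// Key/value: one key maps to one opaque value.
const keyValue = new Map([["user:42", "Camille"]]);

// Column-oriented: one key maps to a set of named columns, each with a value.
const columnFamily = new Map([
  ["user:42", { firstName: "Camille", city: "Paris" }],
]);

// Document-oriented: one key maps to fields/values that can be nested.
const documents = new Map([
  ["user:42", { firstName: "Camille", address: { city: "Paris" }, tags: ["a", "b"] }],
]);

// Graph-oriented: nodes plus links (edges) between them.
const graph = {
  nodes: ["Camille", "Ronald"],
  edges: [{ from: "Camille", to: "Ronald", label: "knows" }],
};

console.log(keyValue.get("user:42"));
console.log(columnFamily.get("user:42").city);
console.log(documents.get("user:42").address.city);
console.log(graph.edges[0].label);
```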
NoSQL & Document Oriented: Example
MongoDB
MongoDB is a document-oriented database developed by 10gen (now MongoDB, Inc.).
A "document" is an entry in a database.
Documents are written as JSON, which Mongo stores in a binary form (BSON).
MongoDB
● Document-oriented database means:
stored objects are represented as BSON documents
This makes it easy to map them to the objects we manipulate in our programs.
● MongoDB is schemaless.
No document schema is required to store the data.
MongoDB
MongoDB is a DBMS:
- document-oriented
- free
- scalable: replication, auto-sharding
- flexible: no data schema, full-text index
- written in C++.
Operational database
Document Data Model
Shell and Drivers
Vocabulary
Basic commands
CHAPTER 2
CRUD Operations
Document JSON-Format
Official Site: [Link]
A JSON document has the form { } (object) or [ ] (array).
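As a small illustration of the two root forms, the sketch below parses one object document and one array document with the standard JSON.parse function. The field names and values are invented for the example.

```javascript
// A JSON document whose root is an object, with a nested array and a nested object.
const objectText = '{"name": "Camille", "age": 30, "tags": ["a", "b"], "address": {"city": "Paris"}}';
const objectDoc = JSON.parse(objectText);

// A JSON document whose root is an array.
const arrayDoc = JSON.parse('[1, "two", {"three": 3}]');

console.log(objectDoc.tags[1]);         // second element of the nested array
console.log(objectDoc.address.city);    // field of the nested object
console.log(arrayDoc[2].three);         // field of the object inside the array
console.log(JSON.stringify(arrayDoc));  // back to JSON text
```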
Document JSON-value
JSON-Example-1
JSON-Example-2
JSON-Example-3
Object with nested arrays and objects
CRUD Operations
MongoDB's CRUD operations are available as functions/methods (an API)
of a programming language, not as a separate language (e.g. SQL).
Vocabulary:
MongoDB language
● Update
● [Link]()
● [Link]()
● [Link]()
● [Link]()
● Query
● [Link]()
● [Link]()
Insert, Find & Count
[Link]({…})
The unique ID field of each document is called "_id" and can be provided. If it is not provided, an
ObjectId is generated from the time, machine, process ID and a process-dependent
counter.
"_id" does not have to be a scalar value; it can be a document, e.g. _id : {a:1, b:'ronald'}
[Link] || findOne ({…}, {field1 : true, …}).pretty()
//no argument will find all docs
[Link]({…})
Creating a document
Three ways to insert a document into MongoDB:
- Insert a document with insert
- Insert a document with update
- Insert a document with save
Insert a document with insert
- document is a set of key/value pairs
- documents is a list of such sets of key/value pairs
- collection is the name of the collection to which you want to add the document(s).
If the collection does not exist yet, it is created (this is how collections are created)
Insert a document with insert
Insert a document with save
Case: the document contains _id 🡺 it replaces the stored document with that _id (or is inserted if none exists).
Case: the document does not contain _id 🡺 it is inserted.
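The two cases can be mimicked in plain JavaScript. This is a sketch of the behaviour only, not the shell's implementation: the Map stands in for a collection, and the counter nextId is a hypothetical stand-in for ObjectId generation.

```javascript
// In-memory stand-in for a collection, keyed by _id (hypothetical helper).
const collection = new Map();
let nextId = 1; // stand-in for ObjectId generation

function save(doc) {
  if (doc._id === undefined) {
    // No _id: behaves like an insert with a generated identifier.
    doc = { ...doc, _id: nextId++ };
  }
  // With an _id: the stored document is replaced as a whole
  // (or inserted if that _id is not present yet).
  collection.set(doc._id, doc);
  return doc;
}

const inserted = save({ name: "Camille" });     // insertion
save({ _id: inserted._id, city: "Paris" });     // replacement: "name" is gone
console.log(collection.get(inserted._id));
```

Note that the second call replaces the whole document rather than merging fields.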
Updating a document with update
criteria has the same form as for find
update_data is a set of key/value pairs defining operations on fields.
multi (Boolean):
- false (default): updates one matching document (which one?)
- true: updates all matching documents
upsert (Boolean):
- false: update only
- true: update, or insert if no document matches
Update a document with update
Update an array with update
[Link]({ myQuery }, {myField: "newValue", … })
🡺 Replaces the existing document
[Link]({ myQuery }, {$set : {myField: "newValue"}})
🡺 Creates or updates myField
[Link]({ myQuery }, {$inc : {age: 1}})
[Link]({ myQuery }, {$unset : {myField: 1}})
[Link]({ myQuery }, {$set : {myField: "newValue"}}, {upsert:true})
🡺 Creates or updates the document specified by { myQuery } with myField
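To make the difference between a replacement and an operator update concrete, here is a plain-JavaScript sketch of what these update specifications do to a single document. The helper applyUpdate is hypothetical and covers only $set, $inc and $unset. It is not how MongoDB applies updates internally.

```javascript
// Hypothetical helper: applies an update specification to one document
// and returns the resulting document (the input is left untouched).
function applyUpdate(doc, update) {
  const hasOperators = Object.keys(update).some((key) => key.startsWith("$"));
  if (!hasOperators) {
    // No operator: the whole document is replaced, only _id is kept.
    return { _id: doc._id, ...update };
  }
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value; // create or overwrite the field
  }
  for (const [field, step] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + step; // a missing field starts at 0
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field]; // remove the field
  }
  return result;
}

const person = { _id: 1, name: "Camille", age: 30 };
console.log(applyUpdate(person, { city: "Paris" }));            // replacement
console.log(applyUpdate(person, { $set: { city: "Paris" } }));  // adds city, keeps the rest
console.log(applyUpdate(person, { $inc: { age: 1 } }));         // increments age
console.log(applyUpdate(person, { $unset: { age: 1 } }));       // drops age
```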
Arrays Update
Delete a document with remove
- without any parameter: all documents are deleted (so use it with care)
- query has the same form as for find; it designates the documents that will be deleted.
Delete a document with remove
Reading document with find
- without parameters, all documents are returned
- criteria is a set of key/value pairs specifying operators on the fields of the documents sought
- projection is a set of key/value pairs limiting the fields returned for the documents sought (this option will be covered later)
Example:
[Link]()
[Link]( { 'prenom': 'Camille' } )
Reading document with find
Reading document with findOne
The findOne function does the same thing as find, but without the need to handle a
cursor. Use it when you want to retrieve a single document (by its identifier,
for example).
If the query matches multiple documents, the first one found is returned.
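The contrast between find and findOne can be sketched on an in-memory array. The assumptions are that only equality criteria are handled, that a plain array replaces the cursor, and that "first found" means array order. The sample data and the helper matches are invented for the example.

```javascript
const people = [
  { _id: 1, prenom: "Camille", age: 30 },
  { _id: 2, prenom: "Ronald", age: 41 },
  { _id: 3, prenom: "Camille", age: 25 },
];

// Equality-only matching: every key of the criteria must equal the document's value.
function matches(doc, criteria) {
  return Object.entries(criteria).every(([key, value]) => doc[key] === value);
}

// find: all matching documents (an array here, a cursor in the shell).
function find(docs, criteria = {}) {
  return docs.filter((doc) => matches(doc, criteria));
}

// findOne: the first matching document, or null when nothing matches.
function findOne(docs, criteria = {}) {
  const found = docs.find((doc) => matches(doc, criteria));
  return found === undefined ? null : found;
}

console.log(find(people).length);                       // no criteria: everything
console.log(find(people, { prenom: "Camille" }));       // two documents
console.log(findOne(people, { prenom: "Camille" }));    // only the first of them
```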
Cursors
CHAPTER 3
Queries
mongoimport
Operators
Comparison Operators
Element Operators
Evaluation Operators
Used to query the database
Operators: Logical
Query on a list (embedded)
Operator on a list
Query with dot notation (".")
Operator on a list
Aggregation framework
The aggregation framework extends MongoDB's capabilities for querying and
processing data.
With its operators, developers can sort the data targeted by a query,
aggregate it by groups, and apply various operations to it.
Pipeline
Example I
Example II
Operations Pipeline
$match: filters documents with a query, equivalent to find
$project: formats results, including removing/adding fields
$unwind: flattens an array; each element of the array is emitted as a separate document
$group: aggregates data
$sort: sorts documents
$limit: limits the number of documents returned
$skip: skips a given number of documents in the result
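Of these stages, $unwind is often the least intuitive. The following plain-JavaScript sketch mimics its default behaviour on an in-memory array. The function unwind and the sample data are invented here, and options of the real stage are not covered.

```javascript
// Emits one copy of the document per array element, with the array field
// replaced by that element. Documents whose field is missing, null or an
// empty array produce no output.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return [];
    const elements = Array.isArray(value) ? value : [value];
    return elements.map((element) => ({ ...doc, [field]: element }));
  });
}

const posts = [
  { _id: 1, title: "Intro", tags: ["nosql", "mongodb"] },
  { _id: 2, title: "Empty", tags: [] },
];

console.log(unwind(posts, "tags"));
```

Here the first post yields two documents (one per tag) and the second yields none.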
Cardinality
$match
Usage equivalent to find({...})
$sort, $limit, $skip
$sort: sorts documents
Memory-intensive; cannot use an index (at least after grouping)
[Link]([ {$match: { state:"NY" } }, {$sort: { population:-1 } } ])
$limit: Limits the number of documents
$skip: excludes certain documents from the result
Only makes sense when you do a sort first
First skip, then limit (the order of the stages in the pipeline matters)
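Why the order of $skip and $limit matters can be seen on a small in-memory example. The helpers below are plain-JavaScript stand-ins for the three stages, and the city data is invented.

```javascript
const cities = [
  { name: "A", population: 300 },
  { name: "B", population: 500 },
  { name: "C", population: 100 },
  { name: "D", population: 400 },
  { name: "E", population: 200 },
];

// Stand-ins for the pipeline stages (each returns a new array).
const sortDesc = (docs, field) => [...docs].sort((a, b) => b[field] - a[field]);
const skip = (docs, n) => docs.slice(n);
const limit = (docs, n) => docs.slice(0, n);

const sorted = sortDesc(cities, "population");

// Skip the largest city, then keep two: ranks 2 and 3.
const skipThenLimit = limit(skip(sorted, 1), 2);

// Keep the two largest, then skip one: only rank 2 is left.
const limitThenSkip = skip(limit(sorted, 2), 1);

console.log(skipThenLimit.map((c) => c.name));
console.log(limitThenSkip.map((c) => c.name));
```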
$project
$project: allows you to rework a document by adding, deleting, renaming … fields.
$project
● Remove keys - If you don't mention a key, it is not included, except for _id,
which must be explicitly suppressed {$project: {_id: 0, …
● Add keys (also possible to create new subdocuments)
● Keep keys: {$project: {myKey: 1, …
● Rename keys / Use functions: $toUpper, $toLower, $add, $multiply
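As an illustration of these rules, the sketch below reproduces in plain JavaScript what a stage such as {$project: {_id: 0, city: {$toUpper: "$city"}, population: "$pop"}} expresses. The document and its field names (city, state, pop) are assumptions made for the example.

```javascript
const zipDoc = { _id: "10001", city: "new york", state: "NY", pop: 21102 };

// Plain-JavaScript equivalent of the projection described above.
function project(doc) {
  return {
    // _id is left out, mirroring _id: 0
    city: doc.city.toUpperCase(), // $toUpper applied to the city field
    population: doc.pop,          // pop is exposed under the new name population
    // state is not mentioned, so it is dropped
  };
}

console.log(project(zipDoc));
```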
$project
Another example
$unwind
$group
Group operations
$sum: adds a value to a key for each document in the group
mySum: {$sum: 1} counts the number of elements
sum_prices: {$sum: "$price"} sums the values of a field
$avg, $min, $max: average, minimum or maximum value of a key
Create arrays: $push, $addToSet
categories: {$addToSet: "$category"}
Only useful after a sort: $first, $last
{$group:{_id:"$_id.state", population:{$first:"$population"}}}
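The accumulators can be mimicked on an in-memory array. The sketch below corresponds roughly to {$group: {_id: "$category", count: {$sum: 1}, sum_prices: {$sum: "$price"}, names: {$addToSet: "$name"}, firstName: {$first: "$name"}}}. The data and output field names are invented, and the set is kept in insertion order here, which the real $addToSet does not guarantee.

```javascript
const items = [
  { category: "fruit", name: "apple", price: 2 },
  { category: "fruit", name: "pear", price: 3 },
  { category: "tool", name: "hammer", price: 10 },
  { category: "fruit", name: "apple", price: 4 },
];

function groupByCategory(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.category)) {
      // The first document seen for a group fixes the $first value.
      groups.set(doc.category, {
        _id: doc.category, count: 0, sum_prices: 0, names: [], firstName: doc.name,
      });
    }
    const group = groups.get(doc.category);
    group.count += 1;                // {$sum: 1}
    group.sum_prices += doc.price;   // {$sum: "$price"}
    if (!group.names.includes(doc.name)) {
      group.names.push(doc.name);    // {$addToSet: "$name"}
    }
  }
  return [...groups.values()];
}

console.log(groupByCategory(items));
```

Because firstName depends on which document comes first, it is only meaningful when the input has been sorted beforehand.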