MongoDB
Comprehensive and extremely detailed guide to MongoDB, covering architecture, data modeling, CRUD operations, indexing, aggregation, replication, sharding, transactions, security, monitoring, schema design strategies, performance tuning, backup, and best practices with full conceptual explanations.
MongoDB is a powerful NoSQL database designed for high scalability, flexibility, and performance. Unlike relational databases, it stores data as BSON (Binary JSON) documents, allowing for nested objects and arrays.
MongoDB Architecture Explained
MongoDB’s architecture consists of several components working together:
- mongod: The core database server process that handles data storage and client requests.
- mongos: A routing service used in sharded clusters to direct queries to the correct shard.
- Config Server: Holds metadata about the cluster structure (used in sharding).
- Replica Set: A group of mongod instances providing redundancy and fault tolerance.
- Shard: Each shard stores a subset of data in a distributed manner.
Why this architecture?
- High availability through replication.
- Horizontal scalability using sharding.
- Schema flexibility enabling rapid development.
Data Modeling Strategies – Detailed Concepts
Embedded vs. Referenced Data
Embedded Documents (Denormalization)
When data is tightly coupled and always accessed together, embedding is preferred. Example:
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"username": "harsha",
"email": "harsha@example.com",
"posts": [
{ "title": "First Post", "content": "Welcome to MongoDB" },
{ "title": "Second Post", "content": "Advanced NoSQL concepts" }
]
}
Pros:
- Single document read → Fast performance.
- Easier atomic updates on the whole document.
Cons:
- Document size limit (16MB max).
- Difficult to query sub-documents independently.
Referenced Documents (Normalization)
When data grows large or changes frequently, referencing avoids duplication:
// User document
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"username": "harsha",
"email": "harsha@example.com"
}
// Post document
{
"_id": ObjectId("607f1f77bcf86cd799439022"),
"userId": ObjectId("507f1f77bcf86cd799439011"),
"title": "First Post",
"content": "Welcome to MongoDB"
}
Pros:
- Smaller document size.
- Easier updates on individual documents.
Cons:
- More round-trip queries needed → Slower reads.
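When the extra round trip matters, the join can often be pushed to the server with the $lookup aggregation stage. The following is a minimal sketch, assuming the posts live in a collection named posts and store the author's _id in userId; the helper name buildUserWithPostsPipeline is hypothetical.

```javascript
// Builds an aggregation pipeline that returns one user together with their posts.
// Assumes a `posts` collection whose `userId` field holds the user's _id.
function buildUserWithPostsPipeline(username) {
  return [
    { $match: { username } },
    {
      $lookup: {
        from: 'posts',          // collection to join (assumed name)
        localField: '_id',      // field on the user document
        foreignField: 'userId', // field on the post document
        as: 'posts'             // name of the output array field
      }
    }
  ];
}

const pipeline = buildUserWithPostsPipeline('harsha');
console.log(JSON.stringify(pipeline, null, 2));
// In mongosh: db.users.aggregate(pipeline)
```

The join matches only when userId and _id have the same BSON type, so the reference should be stored as an ObjectId rather than as a string.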
CRUD Operations – In Depth
Create
db.users.insertOne({
username: 'alice',
email: 'alice@example.com',
createdAt: new Date()
});
insertMany for bulk inserts:
db.users.insertMany([
{ username: 'bob', email: 'bob@example.com' },
{ username: 'carol', email: 'carol@example.com' }
]);
Read
- find() returns a cursor.
- findOne() returns a single document.
db.users.find({ username: 'alice' }).pretty();
Projection limits fields returned:
db.users.find({}, { username: 1, email: 1 });
Update
Use atomic operators like $set, $inc:
db.users.updateOne(
{ username: 'alice' },
{ $set: { email: 'alice123@example.com' } }
);
Upsert creates a new document if none match:
db.users.updateOne(
{ username: 'eve' },
{ $set: { email: 'eve@example.com' } },
{ upsert: true }
);
Delete
db.users.deleteOne({ username: 'bob' });
db.users.deleteMany({ createdAt: { $lt: new Date('2023-01-01') } });
Indexing – Detailed Concepts
An index is a data structure that lets MongoDB find matching documents without scanning the whole collection, at the cost of extra storage and slightly slower writes.
- Single Field Index: { username: 1 }
- Compound Index: { username: 1, email: -1 }
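A compound index can also serve queries that constrain only its leading fields (a prefix of the index key). The toy check below illustrates that rule only; it is not how the query planner is implemented, and the function name usableIndexPrefix is hypothetical.

```javascript
// Returns the leading index fields that the query constrains.
// An empty result means the query cannot use the index through its prefix.
function usableIndexPrefix(indexKey, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of Object.keys(indexKey)) {
    if (!wanted.has(field)) break; // a gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const index = { username: 1, email: -1 };
console.log(usableIndexPrefix(index, ['username']));          // returns ['username']
console.log(usableIndexPrefix(index, ['email', 'username'])); // returns ['username', 'email']
console.log(usableIndexPrefix(index, ['email']));             // returns []
```

A query on email alone skips the leading username field, so under this rule it would need its own index.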
Example:
db.users.createIndex({ username: 1 });
Text Index for Search:
db.articles.createIndex({ content: "text" });
TTL Index: Automatically deletes documents after a time period.
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
Aggregation Framework – Full Explanation
Aggregation pipelines process documents through stages:
- $match: Filter documents.
- $group: Group by fields.
- $project: Shape documents.
- $sort, $limit, $skip: Pagination and ordering.
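The ordering and pagination stages are usually combined in a fixed order: filter, sort, skip, limit. Here is a minimal sketch of a helper that builds such a pipeline; the helper name buildPagePipeline and the createdAt field are assumptions for illustration.

```javascript
// Builds a pipeline that filters, orders, and returns one page of results.
function buildPagePipeline(filter, sortSpec, page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  return [
    { $match: filter },               // filter first so later stages see fewer documents
    { $sort: sortSpec },              // a deterministic order keeps pages consistent
    { $skip: (page - 1) * pageSize }, // drop the earlier pages
    { $limit: pageSize }              // keep a single page
  ];
}

// Page 3 with 20 items per page; _id breaks ties between equal createdAt values.
const pipeline = buildPagePipeline({ status: 'completed' }, { createdAt: -1, _id: 1 }, 3, 20);
console.log(pipeline[2].$skip); // 40
// In mongosh: db.orders.aggregate(pipeline)
```

Large $skip values still make the server walk past the skipped documents, so deep pagination may be better served by filtering on the last seen sort key.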
Example – Total Spent per Customer (completed orders):
db.orders.aggregate([
{ $match: { status: 'completed' } },
{ $group: { _id: '$customerId', totalSpent: { $sum: '$amount' } } },
{ $sort: { totalSpent: -1 } }
]);
Faceted aggregation example:
db.products.aggregate([
{
$facet: {
priceStats: [
{ $group: { _id: null, avgPrice: { $avg: '$price' }, maxPrice: { $max: '$price' } } }
],
categoryCounts: [
{ $group: { _id: '$category', count: { $sum: 1 } } }
]
}
}
]);
Replication – Explained in Depth
A Replica Set is a group of mongod processes:
- One Primary node handles writes.
- Secondary nodes replicate the primary's oplog and can serve reads when the read preference allows it.
Benefits:
- Data redundancy → High availability.
- Failover: Automatic election of a new primary if primary fails.
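Electing a new primary requires a majority of the voting members, which is why replica sets usually have an odd number of them. The arithmetic can be sketched as follows; the function name electionMath is hypothetical.

```javascript
// Votes needed to elect a primary, and how many voting members can be lost
// while the remaining ones can still form a majority.
function electionMath(votingMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return { majority, faultTolerance: votingMembers - majority };
}

for (const n of [3, 4, 5]) {
  const { majority, faultTolerance } = electionMath(n);
  console.log(`${n} members: majority ${majority}, tolerates ${faultTolerance} failure(s)`);
}
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```

Going from three to four members adds cost without adding fault tolerance, while a fifth member does.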
Example Initiation:
rs.initiate();
rs.add('mongodb-secondary:27017');
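rs.status();
Monitor replication lag:
rs.printSecondaryReplicationInfo(); // replaces the deprecated rs.printSlaveReplicationInfo()
⚡ Pro Tip: Use read preference to spread reads across the primary and secondaries; reads from secondaries may return slightly stale data.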
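Read preference is commonly set in the connection string. The sketch below builds one with the replicaSet and readPreference URI options; the host names, the set name rs0, and the helper buildReplicaSetUri are placeholders, not values from a real deployment.

```javascript
// Read preference modes accepted in a MongoDB connection string.
const READ_PREFERENCES = ['primary', 'primaryPreferred', 'secondary', 'secondaryPreferred', 'nearest'];

// Builds a replica-set connection string from a list of host:port seeds.
function buildReplicaSetUri(hosts, replicaSet, readPreference) {
  if (!READ_PREFERENCES.includes(readPreference)) {
    throw new Error(`unknown read preference: ${readPreference}`);
  }
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(',')}/?${params.toString()}`;
}

const uri = buildReplicaSetUri(
  ['mongodb-primary:27017', 'mongodb-secondary:27017'], // placeholder hosts
  'rs0',                                                // placeholder set name
  'secondaryPreferred'
);
console.log(uri);
// mongodb://mongodb-primary:27017,mongodb-secondary:27017/?replicaSet=rs0&readPreference=secondaryPreferred
```

With secondaryPreferred, reads go to a secondary when one is available and fall back to the primary otherwise.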
Sharding – In-Depth Explanation
Sharding partitions data across multiple servers (shards).
Why Shard?
- Datasets too big for a single machine.
- High write throughput.
Key Concepts
- Shard Key: Determines document distribution.
- Must have high cardinality and uniform distribution.
- Avoid monotonically increasing keys (e.g., timestamps).
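To see why a monotonically increasing key is a problem, the toy simulation below places 100 "newest" keys on four shards, first by contiguous key range and then by a hash of the key. It is only an illustration: the shard count, the key space, and the use of MD5 are assumptions, and MongoDB's real chunk placement and hashing differ.

```javascript
const { createHash } = require('crypto');

const SHARDS = 4;

// Ranged placement: the key space [0, keySpace) is cut into equal contiguous ranges.
function rangedShard(key, keySpace) {
  return Math.min(SHARDS - 1, Math.floor((key / keySpace) * SHARDS));
}

// Hashed placement: hash the key first, then map the hash onto a shard.
function hashedShard(key) {
  const digest = createHash('md5').update(String(key)).digest();
  return digest.readUInt32BE(0) % SHARDS;
}

// Counts how many keys land on each shard under a placement function.
function countPerShard(keys, place) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) counts[place(key)] += 1;
  return counts;
}

// The 100 most recent keys of an increasing sequence (think timestamps).
const recentKeys = Array.from({ length: 100 }, (_, i) => 900 + i);

console.log('ranged:', countPerShard(recentKeys, (k) => rangedShard(k, 1000)));
console.log('hashed:', countPerShard(recentKeys, hashedShard));
```

Under ranged placement all 100 writes land on the last shard, while hashing spreads them out; the trade-off is that range queries on a hashed key have to visit every shard.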
Example Setup:
sh.enableSharding('mydatabase');
sh.shardCollection('mydatabase.users', { username: 1 });
Architecture Flow:
- Application → mongos → Appropriate shard based on shard key.
- Config Server stores metadata mapping.
Transactions – Full Concept
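Multi-document transactions provide ACID guarantees (Atomicity, Consistency, Isolation, Durability) across several documents and collections. They require a replica set or sharded cluster; a standalone mongod does not support them.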
Example:
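// Run inside an async function; `client` is a connected MongoClient.
const accounts = client.db('mydatabase').collection('accounts'); // database name assumed
const session = client.startSession();
try {
  // withTransaction commits on success and aborts if the callback throws
  await session.withTransaction(async () => {
    await accounts.updateOne(
      { _id: 'acc1' },
      { $inc: { balance: -100 } },
      { session }
    );
    await accounts.updateOne(
      { _id: 'acc2' },
      { $inc: { balance: 100 } },
      { session }
    );
  });
} finally {
  await session.endSession();
}
⚠️ Overhead: Transactions are heavier than single-document writes. Use only when necessary.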
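Many cases that look like they need a transaction can be handled by a single-document update, which is atomic on its own. The sketch below guards a withdrawal with a condition in the filter; the function name withdraw is hypothetical, and fakeAccounts is a tiny in-memory stand-in so the example runs without a server (it is not a real driver object).

```javascript
// Withdraws `amount` only if the balance is sufficient. The filter and the
// update are applied atomically to the single matching document.
// `accounts` is expected to be a driver Collection (or anything with updateOne).
async function withdraw(accounts, accountId, amount) {
  const result = await accounts.updateOne(
    { _id: accountId, balance: { $gte: amount } }, // guard: enough funds
    { $inc: { balance: -amount } }
  );
  return result.modifiedCount === 1; // false: no such account or insufficient funds
}

// In-memory stand-in that mimics just enough of updateOne for this demo.
const fakeAccounts = {
  doc: { _id: 'acc1', balance: 150 },
  async updateOne(filter, update) {
    const ok = this.doc._id === filter._id && this.doc.balance >= filter.balance.$gte;
    if (ok) this.doc.balance += update.$inc.balance;
    return { matchedCount: ok ? 1 : 0, modifiedCount: ok ? 1 : 0 };
  }
};

(async () => {
  const first = await withdraw(fakeAccounts, 'acc1', 100);
  const second = await withdraw(fakeAccounts, 'acc1', 100);
  console.log(first, second, fakeAccounts.doc.balance); // true false 50
})();
```

Moving money between two accounts still touches two documents, so that case keeps needing a transaction.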
Security – Detailed Best Practices
User Authentication
use admin
db.createUser({
user: 'admin',
pwd: passwordPrompt(), // prompts for the password instead of leaving it in shell history
roles: [{ role: 'userAdminAnyDatabase', db: 'admin' }]
});
Enable in mongod.conf:
security:
  authorization: "enabled"
Role-Based Access Control (RBAC)
- Fine-grained permissions.
- Example Roles: readWrite, dbAdmin.
Data Encryption
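- Data at Rest: Enable filesystem-level or volume-level encryption; MongoDB Enterprise and Atlas also provide encryption at rest in the storage engine.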
- In-Transit: Enable TLS/SSL.
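In practice, fine-grained permissions mean giving each application account only the built-in role it needs on one database. The sketch below builds the document that would be passed to db.createUser(); the helper name appUserSpec, the user name, and the database name are illustrative assumptions.

```javascript
// Builds a createUser document for an application account limited to one database.
// Uses the built-in roles `read` and `readWrite`.
function appUserSpec(user, pwd, dbName, readOnly = false) {
  return {
    user,
    pwd,
    roles: [{ role: readOnly ? 'read' : 'readWrite', db: dbName }]
  };
}

// A reporting account that can only read from mydatabase.
const spec = appUserSpec('reporting', '<password>', 'mydatabase', true);
console.log(JSON.stringify(spec));
// In mongosh, after switching to the target database: db.createUser(spec)
```

Keeping administrative roles such as userAdminAnyDatabase on a separate account limits the damage if application credentials leak.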
Monitoring and Performance Tuning – In Detail
Monitoring Tools
- mongostat: Real-time stats (operations per second).
- mongotop: Tracks read/write per collection.
- MongoDB Atlas: Built-in performance monitoring dashboard.
Example Monitoring Commands:
mongostat --host localhost
mongotop --host localhost
Performance Tuning
- Keep documents small; 16MB is the hard per-document limit, not a target.
- Use projection to fetch only necessary fields.
- Avoid inefficient queries (full collection scans).
- Index only necessary fields.
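Full collection scans show up in explain() output as a COLLSCAN stage in the winning plan. The sketch below walks a plan and reports whether it contains one. It assumes the classic plan shape (stage, inputStage, inputStages); newer servers may nest the plan differently, for example under queryPlan. The sample objects are hand-written for illustration, not captured from a server, and the function names are hypothetical.

```javascript
// Collects the stage names of a winning plan, depth first.
function planStages(plan) {
  if (!plan) return [];
  const node = plan.queryPlan || plan; // some versions wrap the plan in `queryPlan`
  const children = [node.inputStage, ...(node.inputStages || [])].filter(Boolean);
  return [node.stage, ...children.flatMap((child) => planStages(child))];
}

// True when the winning plan reads the whole collection.
function isCollectionScan(explainOutput) {
  return planStages(explainOutput.queryPlanner.winningPlan).includes('COLLSCAN');
}

// Hand-written samples shaped like explain() output.
const indexed = {
  queryPlanner: { winningPlan: { stage: 'FETCH', inputStage: { stage: 'IXSCAN' } } }
};
const unindexed = { queryPlanner: { winningPlan: { stage: 'COLLSCAN' } } };

console.log(isCollectionScan(indexed), isCollectionScan(unindexed)); // false true
// In mongosh: isCollectionScan(db.users.find({ username: 'alice' }).explain())
```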
Backup and Restore Strategies – In Depth
Backup with mongodump
mongodump --db=mydatabase --out=/backup/mydatabase
Restore with mongorestore
mongorestore --nsInclude='mydatabase.*' /backup/mydatabase
⚡ Pro Tip: Use filesystem-level snapshots for large datasets.
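Scheduled backups are easier to manage when each run writes to its own dated directory. The following sketch builds the mongodump arguments for that; the helper name mongodumpArgs and the /backup root are assumptions, and the command is only printed here rather than executed.

```javascript
// Builds mongodump arguments for a dated, compressed backup directory.
function mongodumpArgs(dbName, backupRoot, now = new Date()) {
  const stamp = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return [`--db=${dbName}`, `--out=${backupRoot}/${stamp}`, '--gzip'];
}

const args = mongodumpArgs('mydatabase', '/backup', new Date('2024-01-15T00:00:00Z'));
console.log(['mongodump', ...args].join(' '));
// mongodump --db=mydatabase --out=/backup/2024-01-15 --gzip

// To run it for real, pass the arguments to a child process, for example:
// require('child_process').spawn('mongodump', args, { stdio: 'inherit' });
```

A dump taken with --gzip needs the same flag on mongorestore.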
Schema Design Best Practices – Full Explanation
- Embed for tightly-coupled data.
- Reference for large, frequently changing datasets.
- Avoid very large arrays inside documents.
- Apply schema validation:
db.createCollection('users', {
validator: {
$jsonSchema: {
bsonType: 'object',
required: ['username', 'email'],
properties: {
username: { bsonType: 'string' },
email: { bsonType: 'string' }
}
}
}
});
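Validation can also be attached to a collection that already exists by using the collMod command. The sketch below builds such a command for a set of required string fields; the helper name validatorCommand is hypothetical, and the choice of the moderate validation level is an assumption that suits collections with legacy documents.

```javascript
// Builds a collMod command that attaches a $jsonSchema validator
// requiring the given fields, each typed as a string.
function validatorCommand(collection, requiredStringFields) {
  const properties = {};
  for (const field of requiredStringFields) {
    properties[field] = { bsonType: 'string' };
  }
  return {
    collMod: collection,
    validator: {
      $jsonSchema: { bsonType: 'object', required: requiredStringFields, properties }
    },
    validationLevel: 'moderate', // updates to already-invalid documents are not checked
    validationAction: 'error'    // reject writes that fail validation
  };
}

const cmd = validatorCommand('users', ['username', 'email']);
console.log(JSON.stringify(cmd, null, 2));
// In mongosh: db.runCommand(cmd)
```

Setting validationAction to warn instead logs violations without rejecting the write, which can help when introducing a schema gradually.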