Understanding MongoDB Indexes
MongoDB stores data as BSON documents inside collections. Without a suitable index, a query must scan every document in the collection, which scales linearly with document count and quickly becomes a bottleneck. Indexes act as sorted lookup structures that map key values to document locations, enabling MongoDB to locate results in logarithmic time.
Index Storage Architecture
Under the hood, MongoDB uses a B‑Tree (specifically a B+Tree) structure for most index types. Each node contains sorted key entries and pointers to child nodes, guaranteeing O(log n) search, insert, and delete performance. The leaf nodes hold the document’s _id or a recordId, allowing direct access to the storage engine.
The storage engine (WiredTiger by default) keeps each index in its own file, separate from the collection data. Index changes are recorded in the journal together with the document write and reach the data files at the next checkpoint. Index maintenance still competes for the same cache, I/O, and CPU, so heavy index builds can consume considerable resources.
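To make the O(log n) claim concrete, the sketch below compares a linear scan with a binary search over a sorted array of keys. It is only an analogy: a flat sorted array stands in for the tree, and the function names are illustrative, not part of MongoDB.

```javascript
// Count comparisons for a linear scan versus a binary search over sorted keys.
function linearScan(keys, target) {
  let comparisons = 0;
  for (let i = 0; i < keys.length; i++) {
    comparisons++;
    if (keys[i] === target) return { index: i, comparisons };
  }
  return { index: -1, comparisons };
}

function binarySearch(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    comparisons++;
    if (sortedKeys[mid] === target) return { index: mid, comparisons };
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { index: -1, comparisons };
}

// One million sorted keys: 0, 2, 4, ...
const sortedKeys = Array.from({ length: 1000000 }, (_, i) => i * 2);
const lastKey = 1999998;
console.log(linearScan(sortedKeys, lastKey).comparisons);   // 1000000
console.log(binarySearch(sortedKeys, lastKey).comparisons); // at most 20
```

The gap widens as the collection grows: doubling the number of keys doubles the scan cost but adds only one step to the search.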
Why Index Selection Matters
Choosing the right index directly affects:
- Query latency - a well‑chosen index can reduce execution time from seconds to milliseconds.
- Write throughput - each additional index adds overhead to insert, update, and delete operations.
- Storage consumption - indexes duplicate data; over‑indexing can inflate disk usage.
Balancing these trade‑offs is the essence of a robust indexing strategy.
Code Example: Creating a Simple Index
// Node.js - using the native MongoDB driver
const { MongoClient } = require('mongodb');

async function createIndex() {
  const client = await MongoClient.connect('mongodb://localhost:27017', { useUnifiedTopology: true });
  const db = client.db('blog');
  const posts = db.collection('posts');

  // Create an ascending index on the "author" field
  const result = await posts.createIndex({ author: 1 }, { name: 'idx_author' });
  console.log('Index created:', result);
  await client.close();
}

createIndex().catch(console.error);
The createIndex command returns the index name, confirming successful creation. This basic pattern is the foundation for more sophisticated indexing tactics discussed later.
Choosing the Right Index Types
MongoDB offers a rich set of index types to address diverse query patterns. Selecting the appropriate type hinges on query predicates, sort requirements, and cardinality of the fields.
Single‑Field and Compound Indexes
- Single‑field indexes are ideal for queries that filter on a single attribute (e.g., status).
- Compound indexes combine multiple fields and can satisfy both filter and sort stages when the field order matches the query pattern. As a rule of thumb, follow the ESR guideline: fields matched by equality first, then sort fields, then fields used in range predicates.
// Create a compound index on "category" (ascending) and "publishedAt" (descending)
await db.collection('articles').createIndex(
{ category: 1, publishedAt: -1 },
{ name: 'idx_category_published' }
);
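Field order in a compound index is easier to reason about with a small helper. The sketch below orders keys according to the Equality, Sort, Range (ESR) guideline described in MongoDB's documentation. The helper `buildEsrIndexKeys`, the `views` field, and the sample query are hypothetical and only illustrate the ordering.

```javascript
// Hypothetical helper: order compound index keys as Equality, Sort, Range.
function buildEsrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  // 1. Fields matched by exact value come first.
  for (const field of equality) keys[field] = 1;
  // 2. Sort fields follow, keeping their requested direction.
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction;
  }
  // 3. Fields filtered by range operators ($gt, $lt, ...) go last.
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// Assumed query: find({ category: 'db', views: { $gt: 100 } }).sort({ publishedAt: -1 })
const esrKeys = buildEsrIndexKeys({
  equality: ['category'],
  sort: { publishedAt: -1 },
  range: ['views'],
});
console.log(JSON.stringify(esrKeys)); // {"category":1,"publishedAt":-1,"views":1}
```

The resulting object can be passed as the first argument to createIndex. Treat ESR as a starting point and confirm the choice with an explain plan.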
Multikey Indexes
When an indexed field holds an array, MongoDB creates a separate index entry for each element, producing a multikey index. This makes queries like { tags: 'mongodb' } efficient.
await db.collection('products').createIndex({ tags: 1 }, { name: 'idx_tags' });
Beware of index key explosion: if an array contains thousands of elements, the index size can balloon dramatically.
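A rough way to see key explosion is to count the entries one document contributes to a multikey index. The sketch below is a simplified model, assuming array elements are primitive values and that duplicate elements within one document collapse to a single entry. The function name is illustrative.

```javascript
// Rough estimate of index entries generated for one document on one field.
// Assumption: one entry per distinct primitive array element;
// a scalar, missing field, or empty array contributes a single entry.
function estimateIndexEntries(doc, field) {
  const value = doc[field];
  if (!Array.isArray(value) || value.length === 0) return 1;
  return new Set(value).size;
}

const smallProduct = { tags: ['mongodb', 'index', 'mongodb'] };
const hugeProduct = { tags: Array.from({ length: 5000 }, (_, i) => `tag-${i}`) };

console.log(estimateIndexEntries(smallProduct, 'tags')); // 2
console.log(estimateIndexEntries(hugeProduct, 'tags'));  // 5000
```

Multiplying the per-document estimate by the document count gives a quick sanity check before creating an index on an array field.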
Text Indexes
Full‑text search across string fields uses a special text index. It tokenizes words, removes stop‑words, and supports language‑specific stemming.
await db.collection('comments').createIndex(
{ content: 'text' },
{ default_language: 'english', name: 'idx_comment_text' }
);
When you query with $text, MongoDB leverages the inverted index for fast relevance scoring.
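As a sketch of how a relevance-ranked search might look, the helper below builds the filter and options for a $text query sorted by textScore. The helper name `buildTextSearch` and the search phrase are hypothetical; the commented usage assumes a running server and the text index on content.

```javascript
// Hypothetical helper: build the arguments for a relevance-ranked $text query.
function buildTextSearch(phrase, limit = 10) {
  return {
    filter: { $text: { $search: phrase } },
    options: {
      // Expose the relevance score and order results by it.
      projection: { content: 1, score: { $meta: 'textScore' } },
      sort: { score: { $meta: 'textScore' } },
      limit,
    },
  };
}

const textSearch = buildTextSearch('index performance', 5);
console.log(JSON.stringify(textSearch.filter)); // {"$text":{"$search":"index performance"}}

// Usage with the Node.js driver (not executed here):
// const hits = await db.collection('comments')
//   .find(textSearch.filter, textSearch.options)
//   .toArray();
```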
Geospatial Indexes
Location‑based queries rely on 2d or 2dsphere indexes. The latter follows GeoJSON standards and supports spherical calculations.
await db.collection('places').createIndex(
{ location: '2dsphere' },
{ name: 'idx_location_2dsphere' }
);
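A common stumbling block with 2dsphere queries is coordinate order: GeoJSON points list longitude first, then latitude. The sketch below builds a $near filter for the location field. The helper name and the sample coordinates are illustrative assumptions.

```javascript
// Hypothetical helper: build a $near filter for a 2dsphere-indexed "location" field.
function buildNearQuery(longitude, latitude, maxDistanceMeters) {
  if (longitude < -180 || longitude > 180) throw new RangeError('longitude out of range');
  if (latitude < -90 || latitude > 90) throw new RangeError('latitude out of range');
  return {
    location: {
      $near: {
        // GeoJSON order is [longitude, latitude].
        $geometry: { type: 'Point', coordinates: [longitude, latitude] },
        $maxDistance: maxDistanceMeters, // meters for 2dsphere indexes
      },
    },
  };
}

const nearby = buildNearQuery(-73.9857, 40.7484, 500);
console.log(JSON.stringify(nearby.location.$near.$geometry.coordinates)); // [-73.9857,40.7484]

// Usage with the Node.js driver (not executed here):
// const places = await db.collection('places').find(nearby).limit(10).toArray();
```

The range checks catch swapped arguments only when the latitude value falls outside the valid longitude-independent bounds, so they are a partial safeguard at best.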
Partial and Sparse Indexes
Partial indexes index only documents that meet a filter expression, reducing index size and write cost.
// Note: $in inside partialFilterExpression needs MongoDB 6.0 or later; older servers reject it
await db.collection('orders').createIndex(
{ status: 1 },
{ partialFilterExpression: { status: { $in: ['shipped', 'delivered'] } }, name: 'idx_status_partial' }
);
Sparse indexes omit documents missing the indexed field, useful for optional attributes.
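The difference between the two is easiest to see by asking which documents receive an index entry. The sketch below is a simplified model, not MongoDB's implementation; the function names and sample orders are illustrative.

```javascript
// Simplified model of which documents receive an index entry.
function inPartialIndex(doc, allowedStatuses) {
  // Mirrors partialFilterExpression: { status: { $in: allowedStatuses } }
  return allowedStatuses.includes(doc.status);
}

function inSparseIndex(doc, field) {
  // A sparse index skips documents that lack the field entirely;
  // a field explicitly set to null is still indexed.
  return Object.prototype.hasOwnProperty.call(doc, field);
}

const sampleOrders = [
  { _id: 1, status: 'shipped', coupon: 'SPRING' },
  { _id: 2, status: 'pending' },
  { _id: 3, status: 'delivered', coupon: null },
  { _id: 4, status: 'cancelled' },
];

const partialCount = sampleOrders.filter((o) => inPartialIndex(o, ['shipped', 'delivered'])).length;
const sparseCount = sampleOrders.filter((o) => inSparseIndex(o, 'coupon')).length;
console.log(partialCount); // 2
console.log(sparseCount);  // 2
```

A partial index with an { $exists: true } filter expresses the same idea as a sparse index while allowing richer conditions.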
When Not to Index
- Fields with low cardinality (e.g., boolean flags) rarely benefit from an index unless combined in a compound index.
- Highly volatile fields (frequent updates) may degrade write performance.
- Large arrays that cause key explosion should be reconsidered or stored in a separate collection.
Choosing the right combination of index types is a balancing act between read efficiency and write overhead.
Implementing an Effective Indexing Strategy
A systematic approach to indexing prevents ad‑hoc index creation, which often leads to redundant structures and hidden performance penalties.
Step 1: Profile Real‑World Queries
Enable the database profiler or use MongoDB Atlas Performance Advisor to capture slow queries. Prioritize queries that exceed your SLA (e.g., > 100 ms). Export the query shapes and identify common filter fields, sort orders, and projected fields.
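// Enable profiling for slow operations (>100 ms) with the profile command.
// (In mongosh the equivalent is db.setProfilingLevel(1, { slowms: 100 }).)
await db.command({ profile: 1, slowms: 100 });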
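Once slow operations are captured, grouping them by query shape shows where an index would pay off most. The sketch below assumes profiler entries shaped like documents from system.profile, with ns, millis, and a command holding filter and sort. The function name and the sample entries are made up for illustration.

```javascript
// Group profiler entries by namespace, filter fields, and sort fields.
function summarizeSlowQueries(profileEntries) {
  const shapes = new Map();
  for (const entry of profileEntries) {
    const command = entry.command || {};
    const filterFields = Object.keys(command.filter || {}).sort();
    const sortFields = Object.keys(command.sort || {});
    const shape = `${entry.ns} filter=[${filterFields}] sort=[${sortFields}]`;
    const stats = shapes.get(shape) || { shape, count: 0, totalMillis: 0 };
    stats.count += 1;
    stats.totalMillis += entry.millis;
    shapes.set(shape, stats);
  }
  // Most expensive shapes first.
  return [...shapes.values()].sort((a, b) => b.totalMillis - a.totalMillis);
}

// Hypothetical entries, e.g. from db.collection('system.profile').find().toArray()
const profileSample = [
  { ns: 'blog.posts', millis: 250, command: { filter: { author: 'a', status: 'x' }, sort: { createdAt: -1 } } },
  { ns: 'blog.posts', millis: 300, command: { filter: { status: 'y', author: 'b' }, sort: { createdAt: -1 } } },
  { ns: 'blog.comments', millis: 120, command: { filter: { postId: 1 } } },
];

const slowShapes = summarizeSlowQueries(profileSample);
console.log(slowShapes[0].shape); // blog.posts filter=[author,status] sort=[createdAt]
```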
Step 2: Design Indexes Around Query Patterns
Map each high‑frequency query to an index that satisfies filter → sort → projection stages. Follow the covering index principle: if an index contains all fields needed by the query, MongoDB can return results directly from the index without fetching the full document.
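// Example covering index for a paginated feed.
// createIndex has no projection option: every field the query returns must be part of the key.
await db.collection('feed').createIndex(
  { userId: 1, createdAt: -1, postId: 1 },
  { name: 'idx_feed_user_created' }
);
// The query is covered only if it projects indexed fields and excludes _id,
// e.g. { _id: 0, postId: 1, createdAt: 1 }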
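A query is covered only when every field in its filter and its projection is part of the index key, and _id is excluded unless the index contains it. The sketch below checks that condition for simple queries. It is a simplified model that looks at top-level field names only and ignores multikey and operator edge cases; the function name is hypothetical.

```javascript
// Simplified check: can this index answer the query without fetching documents?
function isCoveredQuery(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterCovered = Object.keys(filter).every((field) => indexed.has(field));
  // _id is returned by default, so it must be excluded unless the index contains it.
  const idRequested = projection._id !== 0;
  if (idRequested && !indexed.has('_id')) return false;
  const included = Object.entries(projection)
    .filter(([, value]) => value === 1)
    .map(([field]) => field);
  const projectionCovered = included.length > 0 && included.every((field) => indexed.has(field));
  return filterCovered && projectionCovered;
}

const feedIndex = { userId: 1, createdAt: -1, postId: 1 };

console.log(isCoveredQuery(feedIndex, { userId: 'abc123' }, { _id: 0, postId: 1, createdAt: 1 })); // true
console.log(isCoveredQuery(feedIndex, { userId: 'abc123' }, { postId: 1, createdAt: 1 }));         // false
```

The second call returns false because _id is included implicitly and is not part of the index.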
Step 3: Validate with Explain Plans
Run explain('executionStats') on the query to verify index usage, index bounds, and the number of documents examined.
const stats = await db.collection('feed').find({ userId: 'abc123' })
.sort({ createdAt: -1 })
.limit(20)
.explain('executionStats');
console.log(JSON.stringify(stats.queryPlanner.winningPlan, null, 2));
Key indicators of a good plan:
- IXSCAN stage present, with no COLLSCAN or in‑memory SORT stage.
- totalKeysExamined and totalDocsExamined both close to nReturned.
- executionTimeMillis meets performance targets.
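These checks can be automated against the explain output. The sketch below assumes the classic plan layout, where stages nest through inputStage or inputStages; newer servers may wrap the plan differently. The function names, the 10x ratio, and the 100 ms target are arbitrary assumptions for illustration.

```javascript
// Walk the winning plan and collect every stage name.
function collectStages(plan, stages = []) {
  if (!plan) return stages;
  if (plan.stage) stages.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages || []) collectStages(child, stages);
  return stages;
}

// Return a list of warnings for an explain('executionStats') result.
function reviewExplain(explain, maxMillis = 100) {
  const stages = collectStages(explain.queryPlanner.winningPlan);
  const stats = explain.executionStats;
  const warnings = [];
  if (!stages.includes('IXSCAN')) warnings.push('no IXSCAN stage in winning plan');
  if (stages.includes('SORT')) warnings.push('in-memory SORT stage');
  if (stats.totalDocsExamined > Math.max(stats.nReturned, 1) * 10) {
    warnings.push('examines far more documents than it returns');
  }
  if (stats.executionTimeMillis > maxMillis) warnings.push('slower than target');
  return warnings;
}

// Hand-written sample results, not real server output.
const goodExplain = {
  queryPlanner: { winningPlan: { stage: 'LIMIT', inputStage: { stage: 'FETCH', inputStage: { stage: 'IXSCAN' } } } },
  executionStats: { nReturned: 20, totalKeysExamined: 20, totalDocsExamined: 20, executionTimeMillis: 3 },
};
const badExplain = {
  queryPlanner: { winningPlan: { stage: 'SORT', inputStage: { stage: 'COLLSCAN' } } },
  executionStats: { nReturned: 20, totalKeysExamined: 0, totalDocsExamined: 50000, executionTimeMillis: 480 },
};

console.log(reviewExplain(goodExplain).length); // 0
console.log(reviewExplain(badExplain).length);  // 4
```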
Step 4: Automate Index Builds in CI/CD
Treat index creation as part of schema migrations. Store index definitions in version‑controlled files and apply them during deployment.
// Example MongoDB migration script (JavaScript file)
module.exports = async function (db) {
  // Enforce unique email addresses
  await db.collection('users').createIndex({ email: 1 }, { unique: true, name: 'idx_email_unique' });
  // TTL index: remove each session once its expiresAt time has passed
  await db.collection('sessions').createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0, name: 'idx_sessions_ttl' });
};
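A migration is safer when it first compares the desired definitions with what already exists. The sketch below computes that difference by index name. It assumes `existing` has the shape returned by the driver's collection.indexes() (objects with name and key); the function name and sample definitions are hypothetical.

```javascript
// Compare version-controlled index definitions with the indexes already present.
function planIndexChanges(desired, existing) {
  const existingNames = new Set(existing.map((ix) => ix.name));
  const desiredNames = new Set(desired.map((ix) => ix.options.name));
  return {
    toCreate: desired.filter((ix) => !existingNames.has(ix.options.name)),
    // Never propose dropping the mandatory _id index.
    toDrop: existing
      .filter((ix) => ix.name !== '_id_' && !desiredNames.has(ix.name))
      .map((ix) => ix.name),
  };
}

// Desired state for a hypothetical "users" collection.
const desiredIndexes = [
  { key: { email: 1 }, options: { unique: true, name: 'idx_email_unique' } },
  { key: { createdAt: -1 }, options: { name: 'idx_created' } },
];
// Sample of what collection.indexes() might return.
const existingIndexes = [
  { name: '_id_', key: { _id: 1 } },
  { name: 'idx_email_unique', key: { email: 1 } },
  { name: 'idx_legacy', key: { legacy: 1 } },
];

const indexPlan = planIndexChanges(desiredIndexes, existingIndexes);
console.log(indexPlan.toCreate.map((ix) => ix.options.name)); // [ 'idx_created' ]
console.log(indexPlan.toDrop);                                // [ 'idx_legacy' ]
```

Matching by name alone does not detect an index whose keys or options changed under the same name, so a real migration tool would also compare the definitions.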
Step 5: Monitor Index Health Over Time
Index usage shifts as data and query patterns evolve. Use the collStats command to watch totalIndexSize and the $indexStats aggregation stage to track how often each index is used.
const stats = await db.command({ collStats: 'orders' });
console.log('Total index size (bytes):', stats.totalIndexSize);
const idxUsage = await db.collection('orders').aggregate([ { $indexStats: {} } ]).toArray();
console.log(idxUsage);
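If an index shows accesses.ops close to zero for weeks, consider dropping it. Keep in mind that these counters are tracked per node and reset when the mongod process restarts, so check accesses.since and every replica set member before deciding.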
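Filtering the $indexStats output for drop candidates can be sketched as follows. The code assumes each entry carries name and accesses.ops / accesses.since, as returned by the $indexStats stage; the 14-day window, function name, and sample data are illustrative assumptions.

```javascript
// Pick indexes with no recorded use over a sufficiently long observation window.
function findDropCandidates(indexStats, now, minDays = 14, maxOps = 0) {
  const minAgeMs = minDays * 24 * 60 * 60 * 1000;
  return indexStats
    .filter((ix) => ix.name !== '_id_') // the _id index cannot be dropped
    .filter((ix) => Number(ix.accesses.ops) <= maxOps)
    // Ignore counters that started too recently to be meaningful.
    .filter((ix) => now - new Date(ix.accesses.since) >= minAgeMs)
    .map((ix) => ix.name);
}

// Hand-written sample, not real server output.
const usageSample = [
  { name: '_id_', accesses: { ops: 0, since: new Date('2024-01-01T00:00:00Z') } },
  { name: 'idx_status', accesses: { ops: 0, since: new Date('2024-01-01T00:00:00Z') } },
  { name: 'idx_customer', accesses: { ops: 5321, since: new Date('2024-01-01T00:00:00Z') } },
  { name: 'idx_new', accesses: { ops: 0, since: new Date('2024-02-25T00:00:00Z') } },
];

const dropCandidates = findDropCandidates(usageSample, new Date('2024-03-01T00:00:00Z'));
console.log(dropCandidates); // [ 'idx_status' ]
```

On a replica set, run the check against each member, since an index unused on the primary may still serve reads on a secondary.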
Step 6: Handle Index Rebuilds Gracefully
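Since MongoDB 4.2, every index build uses an optimized process that holds an exclusive lock only briefly at the start and end of the build, so reads and writes continue in between; the older background option is deprecated and ignored. On large replica sets or sharded clusters where even that impact matters, consider a rolling index build, which builds the index on one member at a time.
// The background option is ignored on MongoDB 4.2+ and can be omitted
await db.collection('logs').createIndex({ timestamp: -1 });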
Architectural Best Practices Summary
| Principle | Action |
|---|---|
| Read‑First | Profile queries before adding indexes. |
| Selectivity | Prioritize fields with high cardinality. |
| Compound Order | Follow ESR: equality fields first, then sort fields, then range fields. |
| Covering | Include projected fields in the index to avoid document fetches. |
| Maintenance | Periodically review indexStats and drop unused indexes. |
| Automation | Manage index definitions via migrations under version control. |
By integrating these steps into the development lifecycle, teams can achieve predictable query performance while keeping write overhead and storage consumption in check.
Frequently Asked Questions
1️⃣ When should I use a partial index instead of a regular compound index?
Partial indexes are ideal when only a subset of documents participates in a query pattern. For example, if you frequently query only active users, a partial index on { status: 1 } with the filter { status: "active" } reduces index size and write cost compared to indexing the entire collection.
2️⃣ How does MongoDB handle index updates during high‑write bursts?
Each write updates the affected index entries in the WiredTiger cache and records the change in the journal; the modified pages reach the data files at the next checkpoint. The journal and checkpoints together provide durability. Every additional index adds work to each write, so an excessive number of indexes increases CPU, cache, and I/O pressure. Keep the index count to the minimum required for your query workload.
3️⃣ Can I safely drop an index that appears to be unused?
Before dropping, confirm the index has zero or negligible accesses over a representative monitoring window (e.g., 7‑14 days). Use db.collection.aggregate([{ $indexStats: {} }]) to view accesses.ops. If the index is truly unused, dropping it will free storage and improve write throughput.
4️⃣ What is the impact of building indexes on a sharded collection?
An index on a sharded collection exists separately on each shard. When you issue createIndex through mongos, the command is forwarded to the shards that hold data for the collection, and each shard builds its own copy. On large or busy clusters, a rolling index build (one replica set member at a time, with the balancer stopped) limits disruption. Always test index creation on a staging replica of the sharded cluster before applying to production.
5️⃣ How do TTL indexes differ from regular indexes?
TTL (Time‑To‑Live) indexes are single‑field indexes on a date field that let MongoDB delete documents automatically. A document becomes eligible for removal once its indexed date plus expireAfterSeconds has passed. A background task checks for expired documents periodically (every 60 seconds by default), so deletion can lag slightly behind the expiry time. TTL indexes are useful for session data, audits, or cache collections.
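The expiry rule itself is simple arithmetic: a document becomes eligible once the indexed date plus expireAfterSeconds is in the past. The sketch below models that rule for a single date value. It ignores arrays of dates, and the function names are illustrative, not part of MongoDB.

```javascript
// When does a document become eligible for TTL deletion?
function ttlExpiry(doc, field, expireAfterSeconds) {
  const value = doc[field];
  // Documents whose field is missing or not a date never expire.
  if (!(value instanceof Date)) return null;
  return new Date(value.getTime() + expireAfterSeconds * 1000);
}

function isEligibleForDeletion(doc, field, expireAfterSeconds, now) {
  const expiry = ttlExpiry(doc, field, expireAfterSeconds);
  return expiry !== null && expiry <= now;
}

const session = { createdAt: new Date('2024-01-01T00:00:00Z') };

console.log(ttlExpiry(session, 'createdAt', 3600).toISOString());
// 2024-01-01T01:00:00.000Z
console.log(isEligibleForDeletion(session, 'createdAt', 3600, new Date('2024-01-01T00:30:00Z'))); // false
console.log(isEligibleForDeletion(session, 'createdAt', 3600, new Date('2024-01-01T01:00:00Z'))); // true
```

With expireAfterSeconds set to 0, the indexed field itself acts as the expiry time, which suits fields such as expiresAt.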
Conclusion
An effective MongoDB indexing strategy blends data‑driven profiling, thoughtful index selection, and continuous monitoring. By leveraging the appropriate index type (single‑field, compound, multikey, text, or geospatial) and adhering to best‑practice patterns such as covering indexes and partial indexes, developers can dramatically cut query latency while preserving write performance. Automation through migration scripts, coupled with regular $indexStats reviews, ensures the index landscape evolves alongside the application.
Investing time in a disciplined indexing approach pays dividends: faster user experiences, reduced server costs, and a smoother path to scaling. Apply the steps outlined herein, and let your MongoDB deployment achieve its full performance potential.
