
Why Your Bulk Data Load Slows Down Over Time: The Hidden Cost of Index Maintenance

 

Introduction

Have you ever started a bulk data load that ran lightning-fast initially, only to watch it grind to a crawl hours later? You're not alone. This is one of the most common—and most misunderstood—performance issues in Oracle databases.

In this post, I'll share a real-world case study where a data load that started at 550,000 rows/hour degraded to just 40,000 rows/hour—a 93% performance drop. More importantly, I'll explain exactly why this happens and how to fix it.

The Scenario

Our team was loading data into a staging table as part of a nightly ETL process. The setup looked like this:

Table Configuration:

CREATE TABLE STG_CUSTOMER_EXTRACT (
    CUSTOMER_ID       NUMBER,
    ACCOUNT_NUMBER    VARCHAR2(50),
    BILLING_ID        VARCHAR2(30),
    REGION_CODE       VARCHAR2(10),
    LOAD_DATE         DATE
);

-- Three indexes to support downstream queries
CREATE INDEX IDX_CUSTOMER_ID ON STG_CUSTOMER_EXTRACT(CUSTOMER_ID);
CREATE INDEX IDX_ACCOUNT_NUM ON STG_CUSTOMER_EXTRACT(ACCOUNT_NUMBER);
CREATE INDEX IDX_BILLING_ID ON STG_CUSTOMER_EXTRACT(BILLING_ID);


| Attribute | Value |
| --- | --- |
| Target Row Count | 87 million |
| Table Size | ~8 GB |
| Number of Indexes | 3 B-tree indexes |
| Load Method | Row-by-row INSERT |
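
For illustration, a row-by-row load of this kind typically looks something like the sketch below. This is not the team's actual ETL code; the source table name SRC_CUSTOMER_FEED and the 10,000-row commit interval are made up:

-- Illustrative only: SRC_CUSTOMER_FEED and the commit interval are hypothetical
DECLARE
    v_rows PLS_INTEGER := 0;
BEGIN
    FOR rec IN (SELECT customer_id, account_number, billing_id, region_code
                  FROM src_customer_feed)
    LOOP
        INSERT INTO stg_customer_extract
            (customer_id, account_number, billing_id, region_code, load_date)
        VALUES
            (rec.customer_id, rec.account_number, rec.billing_id, rec.region_code, SYSDATE);

        v_rows := v_rows + 1;
        IF MOD(v_rows, 10000) = 0 THEN
            COMMIT;   -- periodic commit every 10,000 rows (hypothetical)
        END IF;
    END LOOP;
    COMMIT;
END;
/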

Nothing unusual, right? Just a standard staging table with some indexes. What could go wrong?

The Observation: A Tale of Two Performance Profiles

We monitored the load throughout the night and observed something alarming:

| Time | Rows Loaded | Insert Rate | Avg Time/Insert | Status |
| --- | --- | --- | --- | --- |
| 3:00 PM | 550,000 | 550K/hour | 0.003 sec | ✅ Great |
| 4:00 PM | 800,000 | 248K/hour | 0.011 sec | ⚠️ Slowing |
| 5:00 PM | 850,000 | 51K/hour | 0.066 sec | 🔴 Problem |
| 9:00 PM | 2.1M | 44K/hour | 0.078 sec | 🔴 Critical |
| 2:00 AM | 3.2M | 26K/hour | 0.101 sec | 🔴 Severe |

The per-row insert time increased more than 33-fold, from 0.003 seconds to 0.101 seconds.

But here's what really caught our attention—the Logical I/O per insert also increased dramatically:

| Time | Logical I/O per Insert |
| --- | --- |
| 3:00 PM | 561 |
| 5:00 PM | 794 |
| 2:00 AM | 805 |

Why would a simple INSERT require roughly 43% more I/O as the load progressed?

The Root Cause: B-tree Index Maintenance

The answer lies in understanding how Oracle maintains B-tree indexes during INSERT operations.


What Happens on Every Single INSERT

When you insert one row into a table with 3 indexes, Oracle must:

Step 1: INSERT the row into a table data block → 2-3 I/Os

Step 2: For EACH of the 3 indexes, Oracle must:

Navigate from Root block

Traverse to correct Branch block

Find the correct Leaf block

Insert the index entry in sorted order

If the leaf block is full → SPLIT the block

Cost per index: 4-6 I/Os

Total I/O per INSERT: 14-21 logical I/Os (instead of just 2-3 without indexes)

Visual: The Hidden Work Behind One INSERT

INSERT 1 ROW INTO TABLE .............. 2-3 I/Os (table data block)
├─ INDEX #1: CUSTOMER_ID ............. 4-6 I/Os (Root Block → Branch Block → Leaf Block → INSERT HERE)
├─ INDEX #2: ACCOUNT_NUM ............. 4-6 I/Os (Root Block → Branch Block → Leaf Block → INSERT HERE)
└─ INDEX #3: BILLING_ID .............. 4-6 I/Os (Root Block → Branch Block → Leaf Block → INSERT HERE)


Multiply this by 87 million rows, and you begin to see the problem.
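
You can observe this cost on your own system with SQL*Plus autotrace; a minimal sketch, assuming you have the PLUSTRACE role (or SELECT access to the relevant V$ views) and using made-up sample values:

-- Show execution statistics, including logical I/O, for a single statement
SET AUTOTRACE ON STATISTICS

-- Sample values are invented for illustration
INSERT INTO STG_CUSTOMER_EXTRACT (CUSTOMER_ID, ACCOUNT_NUMBER, BILLING_ID, REGION_CODE, LOAD_DATE)
VALUES (1001, 'ACC-000123', 'BILL-000042', 'EU', SYSDATE);

-- "db block gets" + "consistent gets" in the statistics output are the logical
-- I/Os for this one insert, index maintenance included
SET AUTOTRACE OFF
ROLLBACK;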

Why Performance Degrades Over Time

Here's the critical insight: B-tree indexes grow taller as more data is inserted.

The Index Height Problem

As more data is inserted, indexes grow taller, requiring more I/O operations for each insert:
| Milestone | Rows Loaded | Index Height | I/O per Insert | Leaf Block Splits | Insert Time | Status |
| --- | --- | --- | --- | --- | --- | --- |
| Start | 0 - 20M | 2-3 levels | ~6 I/Os | Rare | 0.003s | ✅ Fast |
| Early | 20M - 50M | 3-4 levels | ~9 I/Os | Moderate | 0.02s | ⚠️ Slowing |
| Middle | 50M - 87M | 4 levels | ~12 I/Os | Frequent | 0.05s | 🔴 Slow |
| End | 87M+ | 4-5 levels | 15-18 I/Os | Very Frequent | 0.08-0.10s | 🔴 Critical |

Key Insight: Insert time increased by 33x (from 0.003s to 0.10s) as the index grew from height 2 to height 5!

Understanding the Progression

At 0-20M rows: Indexes are small (height 2-3). Navigation is fast. Block splits are rare.
At 20-50M rows: Indexes grow taller (height 3-4). More blocks to traverse. Block splits become common.
At 50-87M rows: Indexes are tall (height 4-5). Every insert requires navigating through multiple levels. Block splits are frequent.
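
One way to watch this progression is to check the index heights periodically during the load. A sketch, with the caveat that BLEVEL in USER_INDEXES (the number of branch levels, i.e. height minus one) is only as fresh as the last statistics gathering:

-- Branch levels and leaf block counts as of the last stats gathering
SELECT index_name, blevel, leaf_blocks
  FROM user_indexes
 WHERE table_name = 'STG_CUSTOMER_EXTRACT';

-- For an exact, current height (note: this locks the index while it runs)
ANALYZE INDEX idx_customer_id VALIDATE STRUCTURE;
SELECT height, lf_blks, pct_used FROM index_stats;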

The Block Split Problem

When an index leaf block is full and you need to insert a new entry, Oracle must split the block:
BEFORE SPLIT
Leaf Block (FULL) 🔴: A, B, C, D, E, F, G, H
Problem: we need to insert "C1", but the block is full!

AFTER SPLIT
Original Leaf Block (~50% full): A, B, C, C1, D
New Leaf Block (~50% full): E, F, G, H

Result: One full block becomes two half-full blocks. The parent branch block must also be updated to point to both blocks.

Cost of Each Block Split

| Operation | Additional I/Os | Redo Generated |
| --- | --- | --- |
| Allocate new block | 1 | Yes |
| Write original block (half the entries) | 1 | Yes |
| Write new block (other half) | 1 | Yes |
| Update parent branch block | 2 | Yes |
| Total | 6+ I/Os | Significant |
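
Block split activity is visible in the instance statistics, so you can confirm whether splits are piling up while the load runs. A sketch; on a shared instance these counters reflect all sessions since startup, not just your load:

-- Cumulative block split counters since instance startup
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('leaf node splits', 'leaf node 90-10 splits', 'branch node splits');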

The Redo Log Cascade

Every index change generates redo log entries. Here's what a single INSERT produces:
// Redo entries generated per INSERT:
├─ Table block change
├─ Undo block for table change
├─ Index #1 leaf block change
├─ Index #1 undo entry
├─ Index #2 leaf block change
├─ Index #2 undo entry
├─ Index #3 leaf block change
├─ Index #3 undo entry
└─ Additional entries for any block splits


This explains the wait events we observed in our AWR report:

| Wait Event | Increase from Baseline |
| --- | --- |
| log file sync | +993% |
| log file switch (checkpoint incomplete) | +548% |
| write complete waits | +7,329% |

The database was drowning in redo log activity.
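
Redo volume is easy to quantify from the loading session itself, which makes before/after comparisons concrete. A minimal sketch using the session statistics views:

-- Redo generated by the current session so far; run before and after a batch and diff the values
SELECT sn.name, ms.value
  FROM v$mystat ms
  JOIN v$statname sn ON sn.statistic# = ms.statistic#
 WHERE sn.name = 'redo size';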

The Math: Why This Matters

Let's compare the total I/O for our 87-million row load:

❌ With Indexes (Current State)

Per Row:
• Table INSERT: 2 I/Os
• Index #1: 5 I/Os (avg)
• Index #2: 5 I/Os (avg)
• Index #3: 5 I/Os (avg)
• Block splits: 2 I/Os (avg)
Total: 19 I/Os per row
For 87M rows:
19 × 87,000,000 = 1.65 BILLION I/Os

✅ Without Indexes (Optimized)

Per Row:
• Table INSERT: 2 I/Os
Total: 2 I/Os per row
For 87M rows:
2 × 87,000,000 = 174M I/Os
Plus index rebuild (one-time):
~300 million I/Os
Grand Total: 474 million I/Os
Result: 71% reduction in total I/O

The Solution: Load First, Index Later

The fix is elegantly simple: disable indexes during the load, then rebuild them afterward.

Step 1: Prepare the Table (Pre-Load)

-- Reduce redo generation during the load (NOLOGGING applies to direct-path operations)
ALTER TABLE STG_CUSTOMER_EXTRACT NOLOGGING;

-- Make indexes unusable (no maintenance during INSERT)
ALTER INDEX IDX_CUSTOMER_ID UNUSABLE;
ALTER INDEX IDX_ACCOUNT_NUM UNUSABLE;
ALTER INDEX IDX_BILLING_ID UNUSABLE;

Step 2: Load the Data
SQL*Loader (Recommended for Files)

sqlldr userid=etl_user/password \
    control=customer_extract.ctl \
    DIRECT=TRUE \
    PARALLEL=TRUE \
    ROWS=500000 \
    log=customer_extract.log
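
If the source is another table (or a database link) rather than a flat file, a direct-path INSERT achieves a similar effect. A sketch, with a hypothetical source table name; parallel DML must be enabled in the session for the PARALLEL hint to take effect:

-- Direct-path, parallel insert from a table source (SRC_CUSTOMER_FEED is hypothetical)
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 8) */ INTO STG_CUSTOMER_EXTRACT t
SELECT customer_id, account_number, billing_id, region_code, SYSDATE
  FROM src_customer_feed;

COMMIT;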

Step 3: Rebuild Indexes (Post-Load)

-- Rebuild indexes using parallel processing
ALTER INDEX IDX_CUSTOMER_ID REBUILD PARALLEL 8 NOLOGGING;
ALTER INDEX IDX_ACCOUNT_NUM REBUILD PARALLEL 8 NOLOGGING;
ALTER INDEX IDX_BILLING_ID REBUILD PARALLEL 8 NOLOGGING;

-- Reset to normal settings
ALTER TABLE STG_CUSTOMER_EXTRACT LOGGING;
ALTER INDEX IDX_CUSTOMER_ID LOGGING NOPARALLEL;
ALTER INDEX IDX_ACCOUNT_NUM LOGGING NOPARALLEL;
ALTER INDEX IDX_BILLING_ID LOGGING NOPARALLEL;

-- Gather fresh statistics
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'STG_CUSTOMER_EXTRACT');
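
A quick sanity check after the rebuild (not part of the original runbook) confirms that every index is back to VALID and shows its final height:

-- All three indexes should report STATUS = 'VALID' after the rebuild
SELECT index_name, status, blevel, leaf_blocks
  FROM user_indexes
 WHERE table_name = 'STG_CUSTOMER_EXTRACT';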

Results: Before and After

| Metric | Before (With Indexes) | After (Optimized) | Improvement |
| --- | --- | --- | --- |
| Insert rate | 40,000/hour (degraded) | 2,000,000+/hour | 50x faster |
| Time per insert | 0.08-0.10 sec | 0.001-0.003 sec | 30-100x faster |
| Total I/O | 1.65 billion | 474 million | 71% less |
| Redo generated | Massive | Minimal | 90%+ reduction |
| Total load time | 10+ hours | 45 min + 15 min rebuild | 90% faster |
| Database impact | Severe contention | Minimal | Dramatically improved |

When to Use This Approach

✅ Ideal Scenarios

Bulk loading millions of rows
Initial data migrations
Nightly ETL batch processes
Data warehouse staging loads
Any scenario where indexes aren't queried during load

❌ Not Recommended For

OLTP systems with concurrent reads
When unique constraints must be enforced row-by-row
Small loads (under 100K rows—overhead isn't worth it)
When foreign key constraints reference the indexed columns

Important Considerations

Recovery Implications

When using NOLOGGING:

Changes are not recoverable via archive logs
Take a backup immediately after the load completes
Alternatively, use LOGGING if point-in-time recovery is critical
Constraint Handling

If you have unique or primary key constraints, marking their indexes UNUSABLE is not enough on its own: Oracle will not skip a unique index that enforces a constraint, and inserts will typically fail with ORA-01502. A common approach is to disable the constraint for the duration of the load and re-enable it afterward, which validates the data and rebuilds the enforcing index if it was dropped. A sketch is shown below.
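
A minimal sketch, assuming a hypothetical unique constraint named UK_STG_CUSTOMER on the staging table:

-- UK_STG_CUSTOMER is a hypothetical constraint name for illustration
-- Disabling typically drops the constraint's enforcing index unless KEEP INDEX is specified
ALTER TABLE STG_CUSTOMER_EXTRACT DISABLE CONSTRAINT UK_STG_CUSTOMER;

-- ... run the bulk load ...

-- Re-enabling validates every row and rebuilds the enforcing index if it was dropped
ALTER TABLE STG_CUSTOMER_EXTRACT ENABLE CONSTRAINT UK_STG_CUSTOMER;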

Key Takeaways

Index maintenance is expensive - Each INSERT must update every index on the table.
The cost grows over time - As indexes get taller, navigation requires more I/O.
Block splits compound the problem - Full leaf blocks trigger expensive split operations.
Redo generation multiplies - Every index change generates redo log entries.
The solution is simple - Disable indexes during load, rebuild afterward.
Bulk operations win - Building an index once is far more efficient than maintaining it row-by-row.

Conclusion
The next time you're loading millions of rows and notice performance degrading over time, remember: your indexes might be the culprit.

The counterintuitive solution—removing indexes during load—can transform a 10-hour nightmare into a 1-hour success story. Your database (and your on-call rotation) will thank you.
