Investigating High Churn in Oracle

Problem: Had an issue whereby the archive log volume was filling up. This post tries to explain steps taken to find the root cause of the churn.

Log Switch Frequency Map….

The first step was to verify if in fact the churn was anything out of the ordinary. We use Quest Toad so I was able to easily see the number of Log Switches per hour. [Database | Diagnose | Log Switch Frequency Map]

image

The Frequency Map quickly shows that since around 11am on 18/07 there was a big increase in the number of Log Switches. Our database is set to Log Switch every 90MB so to go from averaging less that 90MB an Hour to around 6GB an hour is a big deal.

If you don’t have access to Toad, then the query below also shows the log frequency.

SELECT TO_CHAR(FIRST_TIME,'YYYY-MM-DD') DAY,
TO_CHAR(FIRST_TIME,'HH24') HOUR,
COUNT(*) TOTAL
FROM V$LOG_HISTORY
GROUP BY TO_CHAR(FIRST_TIME,'YYYY-MM-DD'),TO_CHAR(FIRST_TIME,'HH24')
ORDER BY TO_CHAR(FIRST_TIME,'YYYY-MM-DD'),TO_CHAR(FIRST_TIME,'HH24')
ASC;

Session Browser….

Check Session browser or gv$session as a first step to investigating churn as the issue may be glaringly obvious. More details on T-SQL for session investigation can be found in my blog entry T-SQL: Session Locks, Blocks & Waits.

Grid Control….

The grid control Top Activity section under performance is a good starting point to see what is going on, on the database. I found a couple of good pointers with some SQL being run to Update a table, which i could then query with developers.

Automatic Workload Repository (AWR) Reports….

The next step was to find out what had been running when all the log switches were happening. To do this I ran an AWR report covering a full hour from 9am to 10am when there had been128 log switches.

[Grid Control | Targets | Databases | DB_Name | Performance | Top Activity | Additonal Monitoring Links – Snapshots]

image

Select the beginning Time then click Go:

image

Change the Action Drop Down to [View report] and Click Go:

image

Then Choose a End Snapshot which i have chosen as 10am, then Click OK.

image

Once the report was created, i analysed some areas which were to do with high level of log switches, which suggests a lot of writes and tablespace activity.

image

The Tablespace IO Stats shows the Tablespace which is getting the most activity and quite quickly i could see the number of writes against this 1 tablespace was more 17 times higher than any other user tablespace. (342,188 versus 20,308)

That helped me narrow down where i needed to look which was further confirmed by the Segments by Physical Writes section in the AWR Report.

image

image

Other useful sections are SQL Ordered by Execution which will show how many times a particular SQL ID has been executed.

Log Miner…

To see more in depth details of what was in a particular archive log we can log mine in Quest Toad for Oracle. [Database | Diagnose | LogMiner]

image

Then Choose Files to LogMine. I chose only 1 which was at a period of high activity so covering a small space in time. You can choose either Redo logs or Archived logs.

image

Confirm the LogMining Range and then clock Finish.

image

I normally increase the number of columns that are returned to include Segment Owner and Segment Name so that i can then narrow down to the Schema which i found to be causing the problem during the AWR report.

image

Then Click the Play Button and the LogMining will start.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: