Investigating High Churn in Oracle

Problem: Had an issue whereby the archive log volume was filling up. This post tries to explain steps taken to find the root cause of the churn.

Log Switch Frequency Map….

The first step was to verify if in fact the churn was anything out of the ordinary. We use Quest Toad so I was able to easily see the number of Log Switches per hour. [Database | Diagnose | Log Switch Frequency Map]

image

The Frequency Map quickly shows that since around 11am on 18/07 there was a big increase in the number of Log Switches. Our database is set to Log Switch every 90MB so to go from averaging less that 90MB an Hour to around 6GB an hour is a big deal.

If you don’t have access to Toad, then the query below also shows the log frequency.

SELECT TO_CHAR(FIRST_TIME,'YYYY-MM-DD') DAY,
TO_CHAR(FIRST_TIME,'HH24') HOUR,
COUNT(*) TOTAL
FROM V$LOG_HISTORY
GROUP BY TO_CHAR(FIRST_TIME,'YYYY-MM-DD'),TO_CHAR(FIRST_TIME,'HH24')
ORDER BY TO_CHAR(FIRST_TIME,'YYYY-MM-DD'),TO_CHAR(FIRST_TIME,'HH24')
ASC;

Session Browser….

Check Session browser or gv$session as a first step to investigating churn as the issue may be glaringly obvious. More details on T-SQL for session investigation can be found in my blog entry T-SQL: Session Locks, Blocks & Waits.

Grid Control….

The grid control Top Activity section under performance is a good starting point to see what is going on, on the database. I found a couple of good pointers with some SQL being run to Update a table, which i could then query with developers.

Automatic Workload Repository (AWR) Reports….

The next step was to find out what had been running when all the log switches were happening. To do this I ran an AWR report covering a full hour from 9am to 10am when there had been128 log switches.

[Grid Control | Targets | Databases | DB_Name | Performance | Top Activity | Additonal Monitoring Links – Snapshots]

image

Select the beginning Time then click Go:

image

Change the Action Drop Down to [View report] and Click Go:

image

Then Choose a End Snapshot which i have chosen as 10am, then Click OK.

image

Once the report was created, i analysed some areas which were to do with high level of log switches, which suggests a lot of writes and tablespace activity.

image

The Tablespace IO Stats shows the Tablespace which is getting the most activity and quite quickly i could see the number of writes against this 1 tablespace was more 17 times higher than any other user tablespace. (342,188 versus 20,308)

That helped me narrow down where i needed to look which was further confirmed by the Segments by Physical Writes section in the AWR Report.

image

image

Other useful sections are SQL Ordered by Execution which will show how many times a particular SQL ID has been executed.

Log Miner…

To see more in depth details of what was in a particular archive log we can log mine in Quest Toad for Oracle. [Database | Diagnose | LogMiner]

image

Then Choose Files to LogMine. I chose only 1 which was at a period of high activity so covering a small space in time. You can choose either Redo logs or Archived logs.

image

Confirm the LogMining Range and then clock Finish.

image

I normally increase the number of columns that are returned to include Segment Owner and Segment Name so that i can then narrow down to the Schema which i found to be causing the problem during the AWR report.

image

Then Click the Play Button and the LogMining will start.

Advertisements

T-SQL: Investigating session activity

Shows what is currently running on an instance including SQL text.

SELECT 
DB_NAME(req.database_id) as [Database],
sqltext.TEXT,
req.session_id,
s.login_time,
req.status,
req.command,
req.cpu_time,
req.total_elapsed_time,
s.login_name, s.last_request_start_time,
req.reads, req.writes
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext
left join sys.dm_exec_sessions s on s.session_id = req.session_id
inner join sys.databases d on d.database_id = req.database_id
where req.session_id > 50

Shows details of all connected sessions on an instance, can be narrowed down by uncommenting login_name

SELECT
session_id, login_time, host_name, program_name, cpu_time, client_interface_name, 
login_name, [status], last_request_start_time, last_request_end_time 
FROM sys.dm_exec_sessions s
--where login_name = 'sa'
ORDER BY login_time DESC

T-SQL: Session Locks, Blocks & Waits

Find all locking and blocking on a database

SELECT    lpad(' ',DECODE(request,0,0,1))||sid||'('||inst_id||')' sess, 
        id1, id2, lmode, request, type, ctime, block
FROM gV$LOCK
WHERE (inst_id,id1) IN (SELECT inst_id,id1 FROM gV$LOCK WHERE lmode = 0)
ORDER BY id1,request

Query to Drill down access locks per spid / schema / Object name

SELECT /*+ RULE */
       lk.SID, se.username, se.osuser, se.machine,
       DECODE (lk.TYPE,
               'TX', 'Transaction',
               'TM', 'DML',
               'UL', 'PL/SQL User Lock',
               lk.TYPE
              ) lock_type,
       DECODE (lk.lmode,
               0, 'None',
               1, 'Null',
               2, 'Row-S (SS)',
               3, 'Row-X (SX)',
               4, 'Share',
               5, 'S/Row-X (SSX)',
               6, 'Exclusive',
               TO_CHAR (lk.lmode)
              ) mode_held,
       DECODE (lk.request,
               0, 'None',
               1, 'Null',
               2, 'Row-S (SS)',
               3, 'Row-X (SX)',
               4, 'Share',
               5, 'S/Row-X (SSX)',
               6, 'Exclusive',
               TO_CHAR (lk.request)
              ) mode_requested,
       TO_CHAR (lk.id1) lock_id1, TO_CHAR (lk.id2) lock_id2, ob.owner,
       ob.object_type, ob.object_name,
       DECODE (lk.BLOCK, 0, 'No', 1, 'Yes', 2, 'Global') BLOCK, 
       se.lockwait
  FROM v$lock lk, dba_objects ob, v$session se
 WHERE lk.TYPE IN ('TM', 'UL')
   AND lk.SID = se.SID
   AND lk.id1 = ob.object_id(+)
   AND se.username = 'SCHEMA_NAME';
   --AND (lk.SID = '452');
   --and ob.object_name = 'ACCESSORY';

Show all tables which have any access lock mode on them:

SELECT b.inst_id,
       b.session_id AS sid,
       NVL(b.oracle_username, '(oracle)') AS username,
       a.owner AS object_owner,
       a.object_name,
       Decode(b.locked_mode, 0, 'None',
                             1, 'Null (NULL)',
                             2, 'Row-S (SS)',
                             3, 'Row-X (SX)',
                             4, 'Share (S)',
                             5, 'S/Row-X (SSX)',
                             6, 'Exclusive (X)',
                             b.locked_mode) locked_mode,
       b.os_user_name
FROM   dba_objects a,
       gv$locked_object b
WHERE  a.object_id = b.object_id
ORDER BY 1, 2, 3, 4;

View all Wait Class Descriptions

SELECT name, wait_class FROM V$EVENT_NAME ORDER BY name; I

Bug 7234778 – Unnecessary "cursor: pin S wait on X" waits [ID 7234778.8]

HOW TO FIND BLOCKING SESSION FOR MUTEX WAIT EVENT cursor: pin S wait on X [ID 786507.1]

SELECT p2raw 
,to_number(substr(to_char(rawtohex(p2raw)), 1, 8), 'XXXXXXXX') sid 
FROM v$session 
WHERE event = 'cursor: pin S wait on X'

Show waits relating to Library Cache Pin

--Run as SYS
SELECT sid, event, p1raw, seconds_in_wait, wait_time
FROM sys.gv_$session_wait
WHERE event = 'library cache pin'
AND state = 'WAITING';

Prioritise session waits by the session waiting the longest

--prioritizes session waits by the session waiting the longest
SELECT sid,
       username,
       event,
       total_waits,
       100 * round((total_waits / sum_waits),2) pct_of_total_waits,
       time_wait_sec,
       total_timeouts,
       average_wait_sec,
       max_wait_sec
FROM
(SELECT a.event, 
       b.sid sid,
       decode (b.username,null,c.name,b.username) username,
       a.total_waits total_waits,
       round((a.time_waited / 100),2) time_wait_sec,
       a.total_timeouts total_timeouts,
       round((average_wait / 100),2)
       average_wait_sec,
       round((a.max_wait / 100),2) max_wait_sec
  FROM sys.v_$session_event a, 
       sys.v_$session b,
       sys.v_$bgprocess c,
       sys.v_$process d
 WHERE a.event NOT IN
          ('lock element cleanup',
          'pmon timer',
          'rdbms ipc message',
          'smon timer',
          'SQL*Net message from client',
          'SQL*Net break/reset to client',
          'SQL*Net message to client',
          'SQL*Net more data from client',
          'dispatcher timer',
          'Null event',
          'parallel query dequeue wait',
          'parallel query idle wait - Slaves',
          'pipe get',
          'PL/SQL lock timer',
          'slave wait',
          'virtual circuit status',
          'WMON goes to sleep'
          )
   AND a.event NOT LIKE 'DFS%'
   AND a.event NOT LIKE 'KXFX%'
   AND a.sid = b.sid
   AND d.addr = b.paddr 
   AND c.paddr (+) = b.paddr
   and b.username is not NULL  
),
(select sum(total_waits) sum_waits
 FROM sys.v_$session_event a, 
       sys.v_$session b
 WHERE a.event NOT IN
          ('lock element cleanup',
          'pmon timer',
          'rdbms ipc message',
          'smon timer',
          'SQL*Net message from client',
          'SQL*Net break/reset to client',
          'SQL*Net more data from client',
          'SQL*Net message to client',
          'dispatcher timer',
          'Null event',
          'parallel query dequeue wait',
          'parallel query idle wait - Slaves',
          'pipe get',
          'PL/SQL lock timer',
          'slave wait',
          'virtual circuit status',
          'WMON goes to sleep'
          )
   AND a.event NOT LIKE 'DFS%'
   AND a.event NOT LIKE 'KXFX%'
   and b.username is not NULL
   AND a.sid = b.sid)
order by 6 desc, 1 asc

Prioritises wait by their wait time and filters out most ‘idle’ waits

--prioritizes waits by their wait time and filters out most "idle" waits
select event,
       total_waits,
       round(100 * (total_waits / sum_waits),2) pct_waits,
       time_wait_sec,
       round(100 * (time_wait_sec / greatest(sum_time_waited,1)),2)
       pct_time_waited,
       total_timeouts,
       round(100 * (total_timeouts / greatest(sum_timeouts,1)),2) 
       pct_timeouts,
       average_wait_sec
from
(select event,
       total_waits,
       round((time_waited / 100),2) time_wait_sec,
       total_timeouts,
       round((average_wait / 100),2) average_wait_sec
from sys.v_$system_event
where event not in 
('lock element cleanup', 
 'pmon timer', 
 'rdbms ipc message',
 'rdbms ipc reply',
 'smon timer', 
 'SQL*Net message from client', 
 'SQL*Net break/reset to client',
 'SQL*Net message to client',
 'SQL*Net more data from client',
 'dispatcher timer',
 'Null event',
 'parallel query dequeue wait',
 'parallel query idle wait - Slaves',
 'pipe get',
 'PL/SQL lock timer',
 'slave wait',
 'virtual circuit status',
 'WMON goes to sleep',
 'jobq slave wait',
 'Queue Monitor Wait',
 'wakeup time manager',
 'PX Idle Wait') AND 
 event not like 'DFS%' AND 
 event not like 'KXFX%'),
(select sum(total_waits) sum_waits,
        sum(total_timeouts) sum_timeouts,
        sum(round((time_waited / 100),2)) sum_time_waited
 from sys.v_$system_event
 where event not in 
 ('lock element cleanup', 
 'pmon timer', 
 'rdbms ipc message',
 'rdbms ipc reply',
 'smon timer', 
 'SQL*Net message from client', 
 'SQL*Net break/reset to client',
 'SQL*Net message to client',
 'SQL*Net more data from client',
 'dispatcher timer',
 'Null event',
 'parallel query dequeue wait',
 'parallel query idle wait - Slaves',
 'pipe get',
 'PL/SQL lock timer',
 'slave wait',
 'virtual circuit status',
 'WMON goes to sleep',
 'jobq slave wait',
 'Queue Monitor Wait',
 'wakeup time manager',
 'PX Idle Wait') AND 
 event not like 'DFS%' AND 
 event not like 'KXFX%')
order by 4 desc, 1 asc

T-SQL to find all columns in DB of a certain data type

This script finds all columns which are of a certain data type including any user defined data types.

SELECT OBJECT_NAME(c.OBJECT_ID) TableName, c.name ColumnName, 
    t.name, c.max_length, c.precision, c.scale
FROM sys.columns AS c
JOIN sys.types AS t ON c.user_type_id=t.user_type_id
WHERE t.name = 'tinyint' --you can change text to other datatypes
--and OBJECT_NAME(c.OBJECT_ID) = 'Class_card'
ORDER BY c.OBJECT_ID;
GO

Introduction to MySQL

Decided to learn a bit more about MySQL as my team at work will soon be taking on some MySQL instances.

Installation: http://dev.mysql.com/downloads/ (MySQL 5.5)

How to Install: http://www.wikihow.com/Install-the-MySQL-Database-Server-on-Your-Windows-PC

The installation on windows is fairly painless. took around 5-10 minutes to install with simple click through windows.

Administration via a GUI?: I started using the MySQL Workbench (5.2 CE) which ships with the installation. Other tools are available including probably the most widely known which is the MAAT Kit (http://www.maatkit.org/)

Workbench home screen:

image

Workbench simple Query Editor:

image

Admin Page:

image

Command Line: the command line interface initially didn’t work for me. To resolve this i had to add the MySQL bin directory (C:\Program Files (x86)\MySQL\MySQL Server 5.5\bin) into the environment variables on my Machine.

To do this on vista Right click my Computer | Properties | Advanced System Settings [Left pane] | Environment Variables…

Connecting via CMD

mysql -u root -p -h localhost

Some other Simple commands:

show databases;

use database;

show tables;

Reference Manual: http://dev.mysql.com/doc/refman/5.5/en/index.html

DELETE Top N records in a batch script (T-SQL)

BEGIN

-- Number of rows to delete per batch
DECLARE @N INT 
-- Total count of rows matching specified criterion
DECLARE @cnt decimal(18,2) 
-- No. of times DELETE statement should loop to delete all relevent records
DECLARE @loops INT 

SET @N = 1000000

SELECT @cnt = COUNT(*) FROM aagtrans 
    WHERE    (amount_1 =0.00 and amount_2 = 0.00 
            and amount_3 = 0.00 and amount_4 = 0.00)

SET @loops = CEILING(@cnt/@N)

WHILE @loops > 0
    BEGIN
        DELETE TOP (@N) FROM aagtrans 
        WHERE    (amount_1 =0.00 and amount_2 = 0.00 
                and amount_3 = 0.00 and amount_4 = 0.00)
        
        SET @loops = @loops - 1
    END

END

ORA-01436: CONNECT BY loop in user data

We have been continuously getting the following error [ORA-01436: CONNECT BY loop in user data] whenever we run a refresh via a refresh group with more than 1 mview. This is happening on several different schemas.

The data refreshes fine as individual refreshes (COMPLETE)

Full Error

ORA-12012: error on auto execute of job 5839709
ORA-01436: CONNECT BY loop in user data
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2545
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2751
ORA-06512: at "SYS.DBMS_IREFRESH", line 685
ORA-06512: at "SYS.DBMS_REFRESH", line 195
ORA-06512: at line 1

The Following has been attempted:

Refresh of mviews manually – WORKS

Refresh of a refresh group with only 1 mview – WORKS

Refresh of a refresh group with more than 1 mview – ERRORs

Workaround

This looks to be caused by some sort of dependency loop with object_id’s for mviews. use the SQL below to locate offending objects which have a looping dependency

SELECT    p_obj# p, CONNECT_BY_ISCYCLE "Cycle",
        CONCAT(SYS_CONNECT_BY_PATH(d_obj#,'/'), '/') x 
FROM ( SELECT p_obj#, d_obj# 
FROM sys.dependency$, sys.obj$ p, sys.obj$ d 
WHERE p_obj# = p.obj# AND d_obj# = d.obj# AND p.type# = 42 AND d.type# = 42) d
CONNECT BY NOCYCLE PRIOR d.p_obj# = d.d_obj# ;

Locate the row where Cycle is 1, in that row we will get the object ids of the involved objects (P column is parent object, X column is the chain of dependent objects)

Then find the offending objects by running the following:

select * from dba_objects
where object_id in (17679693, 17679697);

Where object_ID’s are where Cycle was equal to 1 in the previous query.

Drop and Recreate any mviews that are returned. This will alleviate the problem. This can be confirmed by running the following and no rows will be returned.

SELECT    p_obj# p, CONNECT_BY_ISCYCLE "Cycle",
        CONCAT(SYS_CONNECT_BY_PATH(d_obj#,'/'), '/') x 
FROM ( SELECT p_obj#, d_obj# 
FROM sys.dependency$, sys.obj$ p, sys.obj$ d 
WHERE p_obj# = p.obj# AND d_obj# = d.obj# AND p.type# = 42 AND d.type# = 42) d
WHERE  CONNECT_BY_ISCYCLE = 1
CONNECT BY NOCYCLE PRIOR d.p_obj# = d.d_obj# ;

Further Info

ORA-01436 While Refreshing a Refresh Group [ID 793843.1]

OERR: ORA 1436 CONNECT BY loop in user data [ID 18840.1]