Efficiently retrieving Oracle CLOBs in high latency environments

The Problem

We have recently come across an interesting scenario whereby a Java application living in a container in Kubernetes in GCP needs to talk back to an Oracle 12.1 database in an on-premises DC.

There are several givens in these situations: the latency in the chatter between a cloud environment and an on-premises environment will be higher than between two services talking in the same DC, and, depending on the way you look at it, creating a high latency situation will highlight poor code/queries or inefficiencies that may well have been hidden for years on premises thanks to the sub-2ms latency that developers and ops were lucky enough to have.

So, although there is short term pain, there is definitely a lot to gain in spending the resource effort on making the app-to-database conversations more efficient.


So how do you start investigating when an application developer complains that his database queries are taking 70-80 seconds since moving the application to the cloud – queries that used to take 2-3 seconds? Mainly by breaking down the problem and finding little nuggets (or red herrings!) to concentrate on.

We are lucky to have a multi disciplined squad/team so we are able to quickly troubleshoot from various angles including database, networking and our kubernetes architecture.

What metrics did we have?

Although we like to trust our developers, we also like to have proven data or logs to back up slowness claims. Our developer provided us with Kibana logs and Grafana dashboards which showed that in some cases a particularly heavy query was taking a significant amount of time (greater than 30 seconds, up to 80 seconds). The application had a timeout of 30 seconds, so in these cases it was user impacting.

We also had the query which was causing the problem, and from the Kibana logs we could see that the issue was only happening on big datasets – or at least was far more pronounced on big datasets.

Creating a safe testing environment

We knew we had a problem with the query in production, but I didn't want to affect Prod performance any further, so we backported a copy of the poorly performing dataset into our QA database environment.

As an example, if we had continued to use prod to test the SELECT query, we might have inadvertently flushed hot data out of the buffer pool, replacing it with our testing data.

Creating a repeatable test and finding patterns

Generally when I am investigating a problem, my approach is scientific: I make sure I am doing both a fair test and a controlled experiment. A fair test is important because when changing variables you should only change one thing at a time, and a control is important so that you have one test that follows the normal expected behaviour.

The query itself always returned 3727 rows with around 35MB of data from the database.

I embedded the query into a script and set some sqlplus options:

spool qadb21-local-5000.log
set AUTOTRACE TRACEONLY
set ARRAYSIZE 5000
set TIMING on
set TERMOUT off
SELECT /*+ ALL_ROWS */ V.ID as V1, A.*, V.*, M.*
-- FROM clause with the joined tables omitted
WHERE A.ID = '2dfdda64-e15c-4cdd-9d56-2db1d013c6a0';
spool off

AUTOTRACE TRACEONLY – I didn't want to write any results to the spool, only the trace. The reason for this is that it's possible the writing of the result set to a file would add seconds to the query completion.

ARRAYSIZE – To test fetch size in sqlplus. I tested multiple array sizes of 50, 125, 250, 1000, 2500 & 5000.

ALL_ROWS Hint – Just to force the same optimizer behaviour on every run.

I tested 4 scenarios in total:

  • Running the query directly on the Oracle DB [<1 second runtime]
  • Running the query from my laptop to the Oracle DB [25-30 seconds runtime]
  • Running the query from a container in our GCP testing project to the Oracle DB [1:50 – 2:00 minutes runtime]
  • Running the query from a container in our GCP PreProd project to the Oracle DB [1:50 – 2:00 minutes runtime]

By testing 2 different GCP projects against our database, we could see whether the problem was environment specific or a general error. The query was returning in similar durations in both our testing project and our preprod project, so we deduced that it was environment independent – but continued to double-check our networking config.

Below is an extract of the kind of output I was using to make sure the test stayed fair. I only took results if the recursive calls were 0 – i.e. no hard parses.

0 recursive calls
0 db block gets
32406 consistent gets
28457 physical reads
0 redo size
3065681 bytes sent via SQL*Net to client
522366 bytes received via SQL*Net from client
6008 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3727 rows processed

As the example output above shows, we were seeing over 6000 roundtrips for 3727 rows of data. This rang some alarm bells: in a high latency environment every roundtrip pays the full network round-trip time, so the cost multiplies quickly – this was a key area of focus.

SQLPlus has a default fetch size of 15, so even at its worst we should be seeing no more than approx. 500 roundtrips (rows processed / fetch size * 2).
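As a back-of-the-envelope sanity check, that estimate can be reproduced in a few lines of Java (the class and method names are mine, and the fetches-times-two rule is just our rough heuristic, not the driver's exact accounting):

```java
// Rough estimate of SQL*Net roundtrips for a result set:
// ceil(rows / fetchSize) fetches, times two trips per fetch (heuristic).
public class RoundtripEstimate {
    static long expectedRoundTrips(long rows, long fetchSize) {
        long fetches = (rows + fetchSize - 1) / fetchSize; // ceiling division
        return fetches * 2;
    }

    public static void main(String[] args) {
        // 3727 rows at SQL*Plus's default fetch size of 15 -> ~498 trips,
        // far below the 6000+ roundtrips we actually observed.
        System.out.println(expectedRoundTrips(3727, 15));
    }
}
```

The observed 6008 roundtrips were an order of magnitude above even this worst-case estimate, which is what pointed us towards some kind of per-row behaviour.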

I set up a SQL trace against my connection and this also showed a high level of chatter between the database and the client in all test cases. (https://oracle-base.com/articles/misc/sql-trace-10046-trcsess-and-tkprof)


FETCH #139847718960616:c=0,e=4,p=0,cr=0,cu=0,mis=0,r=0,dep=4,og=4,plh=2542797530,tim=15317741724313
CLOSE #139847718960616:c=0,e=1,dep=4,type=3,tim=15317741724339
EXEC #139847718275456:c=0,e=22,p=0,cr=0,cu=0,mis=0,r=0,dep=4,og=4,plh=3765558045,tim=15317741724381
FETCH #139847718275456:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=4,og=4,plh=3765558045,tim=15317741724409
FETCH #139847718275456:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=1,dep=4,og=4,plh=3765558045,tim=15317741724424
FETCH #139847718275456:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=0,dep=4,og=4,plh=3765558045,tim=15317741724435
CLOSE #139847718275456:c=0,e=0,dep=4,type=3,tim=15317741724446


Our network engineers tested various parts of our cloud to on-prem architecture, including transfer speeds between different environments, how many hops it was taking to get to and from the database, and packet loss.

We found that the network route the query was taking included one extra hop, so we did some remediation work to correct this – but we didn't see much impact on query times [a reduction of less than 1-2 seconds].

We also did a tcpdump of the packets between my laptop and the database server, to see what was actually happening when the query ran…

sudo tcpdump -i en0 host -w query.pcap
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:58:05.142087 IP > ICMP echo request, id 35751, seq 0, length 64
08:58:05.146186 IP > ICMP echo reply, id 35751, seq 0, length 64
2 packets captured
804 packets received by filter
0 packets dropped by kernel

We ran the pcap file through Wireshark and could see straight away that the client was going back and forth to the database constantly – thousands of times in a 2 minute period. This matched up pretty well with the database statistics showing a high level of roundtrips, and also with the SQL trace output.

Digging Deeper into the fetch size

I broke up the query, looked into all the data types of the result set, and found that there was a CLOB column storing JSON data. After investigating, I found that sqlplus does individual row retrieval for queries with CLOBs – SQLPlus has been developed so that it doesn't use a fetch array when retrieving CLOBs (https://asktom.oracle.com/pls/apex/asktom.search?tag=performance-issue-with-clob).

To test that the CLOBs were the culprit, I converted the CLOB data to varchar2, reran the query, and found that the data was returned in under 1 second with 2 roundtrips.

UPDATE M SET METRICS_DATA_VAR = dbms_lob.substr( METRICS_DATA, 4000, 1 );

How does the Java Driver CLOB retrieval work?

It seems it works in a similar fashion to the sqlplus retrieval. Our CLOBs ranged between 4200 and 5000 characters. The default fetch size for CLOBs is 4000 characters, so you could potentially have 2 fetches for 1 CLOB depending on its size.

As of the Oracle 11.2 JDBC driver you can use a prefetch.

if (statement1 instanceof OracleStatement) {
    ((OracleStatement) statement1).setLobPrefetchSize(50000);
}

or can be done globally for the application;

System.setProperty(OracleConnection.CONNECTION_PROPERTY_DEFAULT_LOB_PREFETCH_SIZE, "50000");
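The same setting can also be supplied as a plain JDBC connection property ("oracle.jdbc.defaultLobPrefetchSize", which is what the OracleConnection constant resolves to). A minimal sketch – the user, password and URL below are placeholders, not our real values:

```java
import java.util.Properties;

// Sketch: pass the default LOB prefetch size as a JDBC connection property.
// Credentials and connect string are illustrative placeholders.
public class LobPrefetchProps {
    static Properties connectionProps(int lobPrefetchSize) {
        Properties props = new Properties();
        props.setProperty("user", "app_user");
        props.setProperty("password", "app_password");
        props.setProperty("oracle.jdbc.defaultLobPrefetchSize",
                String.valueOf(lobPrefetchSize));
        return props;
    }

    public static void main(String[] args) {
        Properties p = connectionProps(50000);
        System.out.println(p.getProperty("oracle.jdbc.defaultLobPrefetchSize")); // prints 50000
        // then: DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/SVC", p);
    }
}
```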


Setting the CONNECTION_PROPERTY_DEFAULT_LOB_PREFETCH_SIZE to 50000 has made a huge difference to our application performance. Why 50000? I worked it out based on being able to fit 10 rows per fetch without overloading the application Java heap.
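The arithmetic behind that choice is simple enough to sketch (the helper name is mine; 5000 characters is the upper end of our CLOB sizes and 10 is the rows-per-fetch target):

```java
// Sketch: size the LOB prefetch from the largest expected CLOB and a
// target rows-per-fetch; the target is what you tune against heap pressure.
public class LobPrefetchSizing {
    static int prefetchSize(int maxClobChars, int rowsPerFetch) {
        return maxClobChars * rowsPerFetch;
    }

    public static void main(String[] args) {
        System.out.println(prefetchSize(5000, 10)); // prints 50000
    }
}
```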

We have seen queries that were taking 70 seconds on big datasets now taking less than 3 seconds. Round trip latency is the major factor at play here, and reducing the number of fetch trips the query makes to retrieve the data significantly boosts performance.
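To see why roundtrips dominate, just multiply them by the round-trip time. The 12 ms RTT below is purely an illustrative assumption, but it shows how 6000+ trips alone can account for runtimes in the 70-80 second range:

```java
// Sketch: network time spent purely waiting on roundtrips = trips x RTT.
// The 12 ms RTT is an assumed figure for illustration, not a measured value.
public class LatencyModel {
    static double networkSeconds(long roundTrips, double rttMillis) {
        return roundTrips * rttMillis / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(networkSeconds(6008, 12.0)); // ~72 s of pure waiting
        System.out.println(networkSeconds(2, 12.0));    // ~0.02 s with only 2 roundtrips
    }
}
```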

ORA-00020: maximum number of processes exceeded

We got the following error on our Development database

866322-Wed Mar 28 15:56:40 2018
866347:ORA-00020: maximum number of processes (1256) exceeded

An easy way to see how exhausted your processes/sessions are getting is by running:

SELECT inst_id, resource_name, current_utilization, max_utilization, limit_value
FROM   gv$resource_limit
WHERE  resource_name IN ('processes','sessions');

Which will give something like this – max utilization is the high water mark level:

[Screenshot: gv$resource_limit output showing current and max utilization per instance]
From the screenshot above, you can see the processes on instance 1 have been fully exhausted at some point, but currently are fine.

You can also see which machines/schemas are causing any potential process exhaustion:

select   s.inst_id,
         s.machine,
         s.username,
         count(*)
from     gv$session s,
         gv$process p
where    s.paddr   = p.addr
and      s.inst_id = p.inst_id
group by s.inst_id,
         s.machine,
         s.username
order by 4 desc;

Nice summary of the difference between Processes, Sessions and Connections from AskTom:

A connection is a physical circuit between you and the database. A connection might be one of many types — most popular being DEDICATED server and SHARED server. Zero, one or more sessions may be established over a given connection to the database as shown above with sqlplus. A process will be used by a session to execute statements. Sometimes there is a one to one relationship between CONNECTION->SESSION->PROCESS (eg: a normal dedicated server connection). Sometimes there is a one to many from connection to sessions (eg: like autotrace, one connection, two sessions, one process). A process does not have to be dedicated to a specific connection or session however, for example when using shared server (MTS), your SESSION will grab a process from a pool of processes in order to execute a statement. When the call is over, that process is released back to the pool of processes.

ORA-00742: Log read detects lost write on a Data Guard standby RAC database

We had an issue whereby the standby of our 5 node RAC cluster had the following errors reported:

ORA-00742: Log read detects lost write in thread 4 sequence 101460 block 6145
ORA-00312: online log 29 thread 4: '/oraredo/RACDB1_A/standby_redo29a.log'
Tue Feb 20 08:41:44 2018
MRP0: Background Media Recovery process shutdown (RACDB11)
Tue Feb 20 08:45:43 2018
Errors in file /u01/app/oracle/diag/rdbms/RACDB1dg/RACDB11/trace/RACDB11_arc3_18603.trc:
ORA-00742: Log read detects lost write in thread 4 sequence 101460 block 6145
ORA-00312: online log 29 thread 4: '/oraredo/RACDB1_A/standby_redo29a.log'
Tue Feb 20 08:45:43 2018
ARC3: All Archive destinations made inactive due to error 742
Tue Feb 20 08:45:43 2018
ARC3: Closing local archive destination LOG_ARCHIVE_DEST_1: '/oraarch/RACDB1/4_101460_813692162.arc' (error 742) (RACDB11)
Tue Feb 20 08:45:43 2018
ARC4: Archiving not possible: error count exceeded
Tue Feb 20 08:49:55 2018
Archived Log entry 444009 added for thread 4 sequence 101491 rlc 813692162 ID 0x974e092 dest 2:
RFS[30]: No standby redo logfiles available for T-4
RFS[30]: Opened log for thread 4 sequence 101492 dbid 4059507730 branch 813692162

This caused the following Cloud Control alert:

EM Event: Critical:RACDB1 - The Data Guard status of RACDB1DG is Error ORA-16766: Redo Apply is stopped.

On researching the error, it looked like logical log corruption on the standby redo log 29 thread 4.

MOS advised that there were some problems reaching the standby local archiving disk from the primary, and therefore the log corruption occurred. OERR: ORA-7286 “sksagdi: cannot obtain device information.” Reference Note (Doc ID 20342.1).

We confirmed with our network administrators that the following firewall features are disabled:

SQLNet fixup protocol 
Deep Packet Inspection (DPI) 
SQLNet packet inspection 
SQL Fixup 
SQL ALG (Juniper firewall)

Also check with the storage admin for issues on the standby disks, and make sure enough space exists.

Then restart the standby archiver processes – identify all arc processes on each node for this standby and then kill them:

ps -ef|grep arc 
kill -9 <arc PIDs>

Then, from the broker interface, dgmgrl, restart the transport and apply:

edit database <primary> set state=transport-off; 
edit database <primary> set state=transport-on; 
edit database <standby> set state=apply-off; 
edit database <standby> set state=apply-on;

This corrects the error on the primary, although redo apply is still stopped, so the next step is to roll forward the physical standby using an RMAN incremental backup (Doc ID 836986.1).

An important thing to note is that even if you do not use RMAN for standard backup and recovery in your environment, you can still use it to roll forward the standby. We use NetApp snapshots for our general backup and recovery, but this process uses the native RMAN method.

Following the steps in the RMAN rollforward document brings the database past the point of logical corruption and we can then resume apply on the standby and let it naturally catch up.

A couple of things to note:

I disabled our standard 30 minute backups by disabling our snapshot cron job.

We also wrote a little script to nohup the actual backup because it would take a while.

We created sufficient space in our archive log area to hold the rman backup on both the primary and the secondary database volumes.

On the Primary:

vi rman.cmd

connect target /
run {
  allocate channel t1 type disk;
  allocate channel t2 type disk;
  allocate channel t3 type disk;
  allocate channel t4 type disk;
  BACKUP INCREMENTAL FROM SCN 795871180305 DATABASE FORMAT '/oraarch/RACDB1/forstandby/ForStandby_%U' tag 'FORSTANDBY';
}

We allocated 4 parallel channels to create the backup. In hindsight, we could have allocated 8 channels as we run an 8 CPU host hyper-threaded to 16 cores.

vi rman.sh

rman cmdfile=rman.cmd msglog=/oraarch/RACDB1/forstandby/rman.log

to run the script:

nohup sh rman.sh &

Once the RMAN backup is running, it can be monitored using:

SELECT sid, serial#, opname, sofar, totalwork,
       round(sofar/totalwork*100, 2) pct_done
FROM   v$session_longops
WHERE  opname LIKE 'RMAN%'
AND    opname NOT LIKE '%aggregate%'
AND    totalwork != 0
AND    sofar <> totalwork;

You can also get an estimated completion time by running:

col dbsize_mbytes for 99,999,990.00 justify right head "DBSIZE_MB"
col input_mbytes for 99,999,990.00 justify right head "READ_MB"
col output_mbytes for 99,999,990.00 justify right head "WRITTEN_MB"
col output_device_type for a10 justify left head "DEVICE"
col complete for 990.00 justify right head "COMPLETE %" 
col compression for 990.00 justify right head "COMPRESS|% ORIG"
col est_complete for a20 head "ESTIMATED COMPLETION"
col recid for 9999999 head "ID"

select recid
 , output_device_type
 , dbsize_mbytes
 , input_bytes/1024/1024 input_mbytes
 , output_bytes/1024/1024 output_mbytes
 , (output_bytes/input_bytes*100) compression
 , (mbytes_processed/dbsize_mbytes*100) complete
 , to_char(start_time + (sysdate-start_time)/(mbytes_processed/dbsize_mbytes),'DD-MON-YYYY HH24:MI:SS') est_complete
 from v$rman_status rs
 , (select sum(bytes)/1024/1024 dbsize_mbytes from v$datafile) 
 where status='RUNNING'
 and output_device_type is not null;

On an NFS filesystem, the files that are created will be hidden until they are fully written, so it may be a while before you see a written file… unless you look for hidden files:

ls -lah

The end result when the backup is complete will look something like this:

-rw-r----- 1 oracle oinstall 15G Feb 20 22:21 ForStandby_0ssrntdu_1_1
-rw-r----- 1 oracle oinstall 28G Feb 20 22:38 ForStandby_0qsrntds_1_1
-rw-r----- 1 oracle oinstall 22G Feb 20 22:38 ForStandby_0tsrnte0_1_1
-rw-r----- 1 oracle oinstall 22G Feb 20 23:01 ForStandby_0rsrntdt_1_1
-rw-r----- 1 oracle oinstall 21G Feb 21 00:21 ForStandby_10sro3bd_1_1
-rw-r----- 1 oracle oinstall 15G Feb 21 00:40 ForStandby_0usro2c2_1_1
-rw-r----- 1 oracle oinstall 13G Feb 21 01:10 ForStandby_0vsro3ba_1_1
-rw-r----- 1 oracle oinstall 17G Feb 21 01:28 ForStandby_11sro4n4_1_1
-rw-r----- 1 oracle oinstall 22G Feb 21 02:47 ForStandby_12sro9co_1_1
-rw-r----- 1 oracle oinstall 25G Feb 21 02:56 ForStandby_15sroda0_1_1
-rw-r----- 1 oracle oinstall 22G Feb 21 03:45 ForStandby_13sroafn_1_1
-rw-r----- 1 oracle oinstall 30M Feb 21 03:45 ForStandby_18srolbl_1_1
-rw-r----- 1 oracle oinstall 14G Feb 21 04:03 ForStandby_14sroc7m_1_1
-rw-r----- 1 oracle oinstall 15G Feb 21 05:51 ForStandby_16srohte_1_1
-rw-r----- 1 oracle oinstall 27G Feb 21 05:54 ForStandby_17sroifq_1_1

Our RMAN backup created 15 backup pieces. When I was monitoring the backup, I could see that each ‘piece’ was reading around 500GB of data from the database – each file above is the churn that was found in one of the 500GB pieces that RMAN worked through.

At this point I re-enabled my backups on the primary and took a manual snapshot straight away, just in case.

Then SCP the files over to the standby side – ideally in the same file structure:

scp For* oracle@standbyhost:/oraarch/RACDB1/forstandby/.
oracle@xxx.xx.xxx.xxx's password:

ForStandby_0qsrntds_1_1 100%   27GB  81.3MB/s   05:46
ForStandby_0rsrntdt_1_1 100%   22GB  90.2MB/s   04:07
ForStandby_0ssrntdu_1_1 100%   15GB  62.1MB/s   04:02
ForStandby_0tsrnte0_1_1 100%   22GB  40.4MB/s   09:11
ForStandby_0usro2c2_1_1 100%   14GB  45.6MB/s   05:19
ForStandby_0vsro3ba_1_1 100%   13GB  76.4MB/s   02:53
ForStandby_10sro3bd_1_1 100%   21GB  92.5MB/s   03:51
ForStandby_11sro4n4_1_1 100%   17GB  78.6MB/s   03:36
ForStandby_12sro9co_1_1 100%   21GB  90.3MB/s   04:01
ForStandby_13sroafn_1_1 100%   21GB  91.8MB/s   03:56
ForStandby_14sroc7m_1_1 100%   13GB  91.3MB/s   02:30
ForStandby_15sroda0_1_1 100%   25GB  76.1MB/s   05:33
ForStandby_16srohte_1_1 100%   14GB  69.0MB/s   03:34
ForStandby_17sroifq_1_1 100%   27GB  92.0MB/s   04:57
ForStandby_18srolbl_1_1 100%   30MB  29.8MB/s   00:00
ForStandbyCTRL.bck 100%   30MB  29.8MB/s   00:00

The next step is to restore the rman backup over the existing standby database – again we used scripts to do this:

vi rman.cmd

connect target /
CATALOG START WITH '/oraarch/RACDB1/forstandby';
run {
  allocate channel t1 type disk;
  allocate channel t2 type disk;
  allocate channel t3 type disk;
  allocate channel t4 type disk;
  # apply the incremental backup to the standby (per Doc ID 836986.1)
  RECOVER DATABASE NOREDO;
}

vi rman.sh

rman cmdfile=rman.cmd msglog=/oraarch/RACDB1/forstandby/rman.log

Before running the recovery, I spooled out the current datafile names in the standby:

spool datafile_names_step9.txt 
set lines 200 
col name format a60 
select file#, name from v$datafile order by file# ; 
spool off

Run the recovery with nohup:

nohup sh rman.sh &

The restore should be a lot quicker than the backup as it only has to read in the changes that have occurred and write them to the datafiles. Progress can be monitored using the same script provided above for monitoring the backup.

You can also grep the log to see completed pieces:

less rman.log | grep "restore complete"
channel t3: restore complete, elapsed time: 00:10:45
channel t4: restore complete, elapsed time: 00:12:11
channel t2: restore complete, elapsed time: 00:16:17
channel t1: restore complete, elapsed time: 00:18:04
channel t4: restore complete, elapsed time: 00:10:27
channel t3: restore complete, elapsed time: 00:15:09
channel t1: restore complete, elapsed time: 00:10:38
channel t2: restore complete, elapsed time: 00:17:10

After the database has been recovered, the standby control file needs to be restored from the backup taken on the primary.

From RMAN, connect to STANDBY database and restore the standby control file:

Note: In a RAC environment I would advise not running the shutdown/startup commands via RMAN because it will create conflicts with CRS and potentially cause your instances to crash. The MOS document assumes a single instance database. I would suggest doing something similar to the below instead:

srvctl stop database -d RACDB1DG 
srvctl start database -d RACDB1DG -i RACDB11 -o nomount
rman target / 
RMAN> RESTORE STANDBY CONTROLFILE FROM '<Backup_Path>/ForStandbyCTRL.bck'; <<< Controlfile backup piece name 
srvctl stop database -d RACDB1DG 
srvctl start database -d RACDB1DG -i RACDB11 -o mount

The last few steps are to clear the standby logfiles and restart the MRP process

This should kick everything into life and the standby should start applying redo.

DGMGRL> validate database 'RACDB1DG'

Database Role: Physical standby database
 Primary Database: RACDB1

Ready for Switchover: Yes
 Ready for Failover: Yes (Primary Running)

After a couple of days the incremental backup can be deleted:

RMAN> report obsolete; 
RMAN> delete obsolete;

The REPORT OBSOLETE and DELETE OBSOLETE commands work in two steps:

  • For each datafile for which there are full backup, datafile copy, or level 0 incremental backups, RMAN identifies the oldest full or level 0 backup or copy that is not obsolete under the retention policy being tested. Any full backup, level 0 incremental backup, or datafile copy of a datafile older than the one identified in this step is considered obsolete.
  • Any archived logs and level 1 incremental backups that are older than the oldest non-obsolete full backup are then obsolete, because there is no full or level 0 backup to which they can be applied.
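The two steps above can be modelled with a toy sketch (my own illustration of the rule for a REDUNDANCY-style policy, not RMAN's actual implementation; times are abstract integers):

```java
import java.util.Arrays;
import java.util.List;

// Toy model of REPORT OBSOLETE: keep the newest `redundancy` level-0
// backups; logs and level-1 backups older than the oldest kept level-0
// are obsolete. Purely illustrative.
public class ObsoleteModel {
    // Step 1: given ascending level-0 backup times, find the oldest one kept.
    static int oldestKeptLevel0(List<Integer> level0Times, int redundancy) {
        int keepFrom = Math.max(0, level0Times.size() - redundancy);
        return level0Times.get(keepFrom);
    }

    // Step 2: anything older than that cutoff has no level-0 to apply to.
    static boolean isLogObsolete(int logTime, int oldestKeptLevel0Time) {
        return logTime < oldestKeptLevel0Time;
    }

    public static void main(String[] args) {
        List<Integer> level0 = Arrays.asList(10, 20, 30); // backup "days"
        int keep = oldestKeptLevel0(level0, 2);           // keep the newest 2
        System.out.println(keep);                         // prints 20
        System.out.println(isLogObsolete(15, keep));      // true: log predates day 20
    }
}
```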

There is a possible bug that causes this redo log corruption, but Oracle has not yet confirmed that it was the cause of our issue.