ORA-00020: maximum number of processes exceeded

We got the following error on our Development database

866322-Wed Mar 28 15:56:40 2018
866347:ORA-00020: maximum number of processes (1256) exceeded

An easy way to see how exhausted your processes/sessions are getting is by running:

SELECT inst_id, resource_name, current_utilization, max_utilization, limit_value
FROM gv$resource_limit 
WHERE resource_name in ('processes','sessions');

Which will give something like this – max utilization is the high water mark level:



From this output, you can see the processes on instance 1 have been fully exhausted at some point, but are currently fine.

You can also see which machines/schemas are causing any potential process exhaustion:

select  s.inst_id,
        s.username,
        s.machine,
        count(*) session_count
from    gv$session s,
        gv$process p
where   s.paddr       =  p.addr
and     s.inst_id     =  p.inst_id
group by s.inst_id,
         s.username,
         s.machine
order by 4 desc;

Nice summary of the difference between Processes, Sessions and Connections from AskTom:

A connection is a physical circuit between you and the database. A connection might be one of many types — the most popular being DEDICATED server and SHARED server. Zero, one or more sessions may be established over a given connection to the database, as shown above with sqlplus. A process will be used by a session to execute statements. Sometimes there is a one to one relationship between CONNECTION->SESSION->PROCESS (eg: a normal dedicated server connection). Sometimes there is a one to many from connection to sessions (eg: like autotrace, one connection, two sessions, one process). A process does not have to be dedicated to a specific connection or session however; for example, when using shared server (MTS), your SESSION will grab a process from a pool of processes in order to execute a statement. When the call is over, that process is released back to the pool of processes.
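The one-to-many relationships described above can be observed directly in the instance views. A minimal sketch, using the standard gv$process/gv$session columns, that counts sessions per server process:

```sql
-- Count the sessions attached to each server process; a dedicated server
-- connection typically shows 1, while an autotrace-style connection can show 2.
select  p.inst_id,
        p.spid                as os_process_id,
        count(s.sid)          as sessions
from    gv$process p
        left join gv$session s
               on  s.paddr   = p.addr
               and s.inst_id = p.inst_id
group by p.inst_id, p.spid
order by sessions desc;
```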


Insufficient access rights to perform the operation when running setspn

When attempting to add an SPN to a service account for SQL Server, you may get the following error if you are not a domain admin:

Checking domain DC=..,DC=....,DC=..,DC=..

Registering ServicePrincipalNames for CN=vsqldev01 svc,OU=Service Accounts,OU=Shared Resources,OU=..,DC=..,DC=
Failed to assign SPN on account 'CN=vsqldev01 svc,OU=Service Accounts,OU=Shared Resources,OU=E..,DC=.,DC=...
error 0x2098/8344 
-> Insufficient access rights to perform the operation.

If you're lucky, you can get your domain admin to grant you the required permissions against the OU in Active Directory. They would need to do the following:

On a Domain Controller, run adsiedit.msc (Doing this via the normal dsa.msc console will not expose the spn permissions that need to be added)

Then run the following sequence of actions:

Right-click on the OU and select Properties
Select the "Security" tab
Select the "Advanced" tab
Select the "Add" button
Enter the security principal name and select the security principal
Select the "Properties" tab
Set "Apply to:" to "Descendant User objects"
Tick "Read servicePrincipalName" - Allow
Tick "Write servicePrincipalName" - Allow
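Alternatively, the same permissions can be granted from the command line with dsacls. A sketch, assuming a placeholder OU distinguished name — the SELF principal grants accounts under the OU rights over their own servicePrincipalName attribute, which also enables self-registration of SPNs:

```
REM Grant Read/Write Property on servicePrincipalName to SELF
REM for descendant user objects of the OU (run as a domain admin).
dsacls "OU=Service Accounts,OU=Shared Resources,DC=domain,DC=com" /I:S /G "SELF:RPWP;servicePrincipalName;user"
```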


Set up Windows Auth to a SQL Instance with SQL Ops Studio on Mac OS

Microsoft released a preview of SQL Ops Studio in the last week or two, and as a Mac user I was interested to see how well the interface would work compared to SSMS.

Here is a quick intro to the product from Microsoft: https://www.youtube.com/watch?v=s5DopE7ktwo
More details about Ops Studio can be found here: https://docs.microsoft.com/en-us/sql/sql-operations-studio/what-is
Download – https://docs.microsoft.com/en-us/sql/sql-operations-studio/download

For reference, I am using macOS Sierra 10.12.6, and this is what I did to get Windows authentication working properly.

If you try to get Windows AD Auth working, you might initially see this error:


The link in the message above takes you to: Connect SQL Operations Studio (preview) to your SQL Server using Windows authentication – Kerberos

There are 3 main areas to configure before Windows Auth to a SQL Instance works.

Service Principal Names (SPN)

A service principal name (SPN) is a unique identifier of a service instance. SPNs are used by Kerberos authentication to associate a service instance with a service logon account. This allows a client application to request that the service authenticate an account even if the client does not have the account name.

As an example, on a SQL clustered instance called VSQLDEV01, you can check whether the SPN is set by running the following in a command prompt/PowerShell terminal:

setspn -L VSQLDEV01

If it doesn't return a row in the following format:


MSSQLSvc/FQDN:port | MSSQLSvc/FQDN, where:

  • MSSQLSvc is the service that is being registered.
  • FQDN is the fully qualified domain name of the server.
  • port is the TCP port number.

You can add an SPN to register the service account by doing the following:
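The exact command was missing from the original post; a hedged sketch using setspn's -S switch (which checks for duplicate SPNs before adding) — the FQDN, port and account name below are placeholders for your own environment:

```
REM Register the SQL Server SPN against the service account (run as a user
REM with write access to servicePrincipalName on the account).
setspn -S MSSQLSvc/vsqldev01.domain.company.com:1433 DOMAIN\svc-sqldev01
```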


Get Key Distribution Center (KDC) and join Domain

The KDC is usually just the FQDN of your Domain Controller – fairly straightforward to find out via the nltest command on a Windows machine:

 nltest /dsgetdc:DOMAIN.COMPANY.COM
Configure KDC in krb5.conf on your mac

Edit /etc/krb5.conf in an editor of your choice and configure the following keys (default_realm lives in the [libdefaults] section, and the kdc entry sits under your realm in the [realms] section):

sudo vi /etc/krb5.conf

[libdefaults]
  default_realm = DOMAIN.COMPANY.COM

[realms]
  DOMAIN.COMPANY.COM = {
    kdc = dc-33.domain.company.com
  }

Then save the krb5.conf file and exit.

Test Granting and Retrieving a ticket from the KDC

On your Mac, run the following in a terminal:

kinit username@DOMAIN.COMPANY.COM

Authenticate, then check that the ticket has been granted by running klist:

Credentials cache: API:9999E999-99CA-9999-AC9C-99A999999D99
Issued                Expires               Principal
Mar  8 07:55:10 2018  Mar  8 17:55:01 2018  krbtgt/DOMAIN@DOMAIN

Hopefully, if the instructions are followed, you should be ready to go!

Make a connection using Windows Auth via SQL Ops Studio.

To confirm that a connection has authenticated via Kerberos, you can check in SQL once a connection is made:

SELECT session_id, auth_scheme
FROM sys.dm_exec_connections
WHERE auth_scheme = 'KERBEROS';

Should return your connected session:


ORA-00742: Log read detects lost write on a Data Guard standby RAC database

We had an issue whereby the standby of our 5 node RAC cluster had the following errors reported:

ORA-00742: Log read detects lost write in thread 4 sequence 101460 block 6145
ORA-00312: online log 29 thread 4: '/oraredo/RACDB1_A/standby_redo29a.log'
Tue Feb 20 08:41:44 2018
MRP0: Background Media Recovery process shutdown (RACDB11)
Tue Feb 20 08:45:43 2018
Errors in file /u01/app/oracle/diag/rdbms/RACDB1dg/RACDB11/trace/RACDB11_arc3_18603.trc:
ORA-00742: Log read detects lost write in thread 4 sequence 101460 block 6145
ORA-00312: online log 29 thread 4: '/oraredo/RACDB1_A/standby_redo29a.log'
Tue Feb 20 08:45:43 2018
ARC3: All Archive destinations made inactive due to error 742
Tue Feb 20 08:45:43 2018
ARC3: Closing local archive destination LOG_ARCHIVE_DEST_1: '/oraarch/RACDB1/4_101460_813692162.arc' (error 742) (RACDB11)
Tue Feb 20 08:45:43 2018
ARC4: Archiving not possible: error count exceeded
Tue Feb 20 08:49:55 2018
Archived Log entry 444009 added for thread 4 sequence 101491 rlc 813692162 ID 0x974e092 dest 2:
RFS[30]: No standby redo logfiles available for T-4
RFS[30]: Opened log for thread 4 sequence 101492 dbid 4059507730 branch 813692162

This caused the following Cloud Control alert:

EM Event: Critical:RACDB1 - The Data Guard status of RACDB1DG is Error ORA-16766: Redo Apply is stopped.

On researching the error, it looked like logical log corruption on the standby redo log 29 thread 4.

MOS advised that there were problems reaching the standby local archiving disk from the primary, which is how the log corruption occurred. OERR: ORA-7286 "sksagdi: cannot obtain device information." Reference Note (Doc ID 20342.1)

We confirmed with our network administrators that the following firewall features are disabled:

SQLNet fixup protocol 
Deep Packet Inspection (DPI) 
SQLNet packet inspection 
SQL Fixup 
SQL ALG (Juniper firewall)

Also check with the storage admin for any issues on the standby disks, and make sure enough space exists.

Then restart the standby arch processes – identify all the archiver processes on each node of the standby and kill them:

ps -ef|grep arc 
kill -9 <arc PIDs>

Then, from the broker interface (dgmgrl), restart the transport and apply:

edit database <primary> set state=transport-off; 
edit database <primary> set state=transport-on; 
edit database <standby> set state=apply-off; 
edit database <standby> set state=apply-on;

This corrects the error on the primary, although redo apply is still stopped, so the next step is to roll forward the physical standby using an RMAN incremental backup (Doc ID 836986.1).

An important thing to note is that even if you do not use RMAN for standard backup and recovery in your environment, you can still use it to roll forward the standby. We use NetApp snapshots for our general backup and recovery, but this process uses the native RMAN method.

Following the steps in the RMAN rollforward document brings the database past the point of logical corruption, and we can then resume apply on the standby and let it catch up naturally.

A couple of things to note:

I disabled our standard 30 minute backups from taking place by disabling our snapshot cron job.

We also wrote a little script to nohup the actual backup, because it would take a while.

We created sufficient space in our archive log area to hold the rman backup on both the primary and the secondary database volumes.

On the Primary:

vi rman.cmd

connect target /
run {
  allocate channel t1 type disk;
  allocate channel t2 type disk;
  allocate channel t3 type disk;
  allocate channel t4 type disk;
  BACKUP INCREMENTAL FROM SCN 795871180305 DATABASE FORMAT '/oraarch/RACDB1/forstandby/ForStandby_%U' tag 'FORSTANDBY';
}

We allocated 4 parallel channels to create the backup. In hindsight, we could have allocated 8 channels, as we run an 8 CPU host hyper-threaded to 16 cores.

vi rman.sh

rman cmdfile=rman.cmd msglog=/oraarch/RACDB1/forstandby/rman.log

to run the script:

nohup sh rman.sh &

Once the RMAN backup is running, it can be monitored using:

SELECT SID, SERIAL#, CONTEXT, SOFAR, TOTALWORK,
       ROUND(SOFAR/TOTALWORK*100, 2) "%_COMPLETE"
FROM   GV$SESSION_LONGOPS
WHERE  OPNAME LIKE 'RMAN%'
AND    OPNAME NOT LIKE '%aggregate%'
AND    TOTALWORK != 0
AND    SOFAR <> TOTALWORK;

You can also get an estimated completion time by running:

col dbsize_mbytes for 99,999,990.00 justify right head "DBSIZE_MB"
col input_mbytes for 99,999,990.00 justify right head "READ_MB"
col output_mbytes for 99,999,990.00 justify right head "WRITTEN_MB"
col output_device_type for a10 justify left head "DEVICE"
col complete for 990.00 justify right head "COMPLETE %" 
col compression for 990.00 justify right head "COMPRESS|% ORIG"
col est_complete for a20 head "ESTIMATED COMPLETION"
col recid for 9999999 head "ID"

select recid
 , output_device_type
 , dbsize_mbytes
 , input_bytes/1024/1024 input_mbytes
 , output_bytes/1024/1024 output_mbytes
 , (output_bytes/input_bytes*100) compression
 , (mbytes_processed/dbsize_mbytes*100) complete
 , to_char(start_time + (sysdate-start_time)/(mbytes_processed/dbsize_mbytes),'DD-MON-YYYY HH24:MI:SS') est_complete
 from v$rman_status rs
 , (select sum(bytes)/1024/1024 dbsize_mbytes from v$datafile) 
 where status='RUNNING'
 and output_device_type is not null

On an NFS filesystem, the files being created will be hidden until they are fully written, so it may be a while before you see a written file... unless you look for hidden files:

ls -lah

The end result when the backup is complete will look something like this:

-rw-r----- 1 oracle oinstall 15G Feb 20 22:21 ForStandby_0ssrntdu_1_1
-rw-r----- 1 oracle oinstall 28G Feb 20 22:38 ForStandby_0qsrntds_1_1
-rw-r----- 1 oracle oinstall 22G Feb 20 22:38 ForStandby_0tsrnte0_1_1
-rw-r----- 1 oracle oinstall 22G Feb 20 23:01 ForStandby_0rsrntdt_1_1
-rw-r----- 1 oracle oinstall 21G Feb 21 00:21 ForStandby_10sro3bd_1_1
-rw-r----- 1 oracle oinstall 15G Feb 21 00:40 ForStandby_0usro2c2_1_1
-rw-r----- 1 oracle oinstall 13G Feb 21 01:10 ForStandby_0vsro3ba_1_1
-rw-r----- 1 oracle oinstall 17G Feb 21 01:28 ForStandby_11sro4n4_1_1
-rw-r----- 1 oracle oinstall 22G Feb 21 02:47 ForStandby_12sro9co_1_1
-rw-r----- 1 oracle oinstall 25G Feb 21 02:56 ForStandby_15sroda0_1_1
-rw-r----- 1 oracle oinstall 22G Feb 21 03:45 ForStandby_13sroafn_1_1
-rw-r----- 1 oracle oinstall 30M Feb 21 03:45 ForStandby_18srolbl_1_1
-rw-r----- 1 oracle oinstall 14G Feb 21 04:03 ForStandby_14sroc7m_1_1
-rw-r----- 1 oracle oinstall 15G Feb 21 05:51 ForStandby_16srohte_1_1
-rw-r----- 1 oracle oinstall 27G Feb 21 05:54 ForStandby_17sroifq_1_1

Our RMAN backup created 15 backup pieces. When I was monitoring the backup, I could see that each 'piece' was reading around 500GB of data from the database, so each file above is the result of the churn found in each 500GB piece that RMAN processed.

At this point I re-enabled the backups on the primary and, just in case, took a manual snapshot straight away.

Then SCP the files over to the standby side – ideally into the same file structure:

scp For* oracle@standbyhost://oraarch/RACDB1/forstandby/.
oracle@xxx.xx.xxx.xxx's password:

ForStandby_0qsrntds_1_1 100%   27GB  81.3MB/s   05:46
ForStandby_0rsrntdt_1_1 100%   22GB  90.2MB/s   04:07
ForStandby_0ssrntdu_1_1 100%   15GB  62.1MB/s   04:02
ForStandby_0tsrnte0_1_1 100%   22GB  40.4MB/s   09:11
ForStandby_0usro2c2_1_1 100%   14GB  45.6MB/s   05:19
ForStandby_0vsro3ba_1_1 100%   13GB  76.4MB/s   02:53
ForStandby_10sro3bd_1_1 100%   21GB  92.5MB/s   03:51
ForStandby_11sro4n4_1_1 100%   17GB  78.6MB/s   03:36
ForStandby_12sro9co_1_1 100%   21GB  90.3MB/s   04:01
ForStandby_13sroafn_1_1 100%   21GB  91.8MB/s   03:56
ForStandby_14sroc7m_1_1 100%   13GB  91.3MB/s   02:30
ForStandby_15sroda0_1_1 100%   25GB  76.1MB/s   05:33
ForStandby_16srohte_1_1 100%   14GB  69.0MB/s   03:34
ForStandby_17sroifq_1_1 100%   27GB  92.0MB/s   04:57
ForStandby_18srolbl_1_1 100%   30MB  29.8MB/s   00:00
ForStandbyCTRL.bck 100%   30MB  29.8MB/s   00:00

The next step is to restore the rman backup over the existing standby database – again we used scripts to do this:

vi rman.cmd

connect target /
CATALOG START WITH '/oraarch/RACDB1/forstandby';
run {
  allocate channel t1 type disk;
  allocate channel t2 type disk;
  allocate channel t3 type disk;
  allocate channel t4 type disk;
  RECOVER DATABASE NOREDO;
}

vi rman.sh

rman cmdfile=rman.cmd msglog=/oraarch/RACDB1/forstandby/rman.log

Before running the recovery, I spooled out the current datafiles in the standby:

spool datafile_names_step9.txt 
set lines 200 
col name format a60 
select file#, name from v$datafile order by file# ; 
spool off

Run the recovery with nohup:

nohup sh rman.sh &

The restore should be a lot quicker than the backup, as it only has to read in the changes that have occurred and write them to the datafiles. Progress can be monitored using the same script provided above for monitoring the backup.

You can also grep the log to see completed pieces:

less rman.log | grep "restore complete"
channel t3: restore complete, elapsed time: 00:10:45
channel t4: restore complete, elapsed time: 00:12:11
channel t2: restore complete, elapsed time: 00:16:17
channel t1: restore complete, elapsed time: 00:18:04
channel t4: restore complete, elapsed time: 00:10:27
channel t3: restore complete, elapsed time: 00:15:09
channel t1: restore complete, elapsed time: 00:10:38
channel t2: restore complete, elapsed time: 00:17:10

After the database has been recovered, the standby control file needs to be restored from the backup taken on the primary.

From RMAN, connect to STANDBY database and restore the standby control file:

Note: In a RAC environment, I would advise not running the shutdown/startup commands via RMAN, because it will create conflicts with CRS and potentially cause your instances to crash. The MOS document assumes a single instance database. I would suggest doing something like the following instead:

srvctl stop database -d RACDB1DG 
srvctl start database -d RACDB1DG -i RACDB11 -o nomount
rman target / 
RMAN> RESTORE STANDBY CONTROLFILE FROM '<Backup_Path>/ForStandbyCTRL.bck'; <<< Controlfile backup piece name 
srvctl stop database -d RACDB1DG 
srvctl start database -d RACDB1DG -i RACDB11 -o mount

The last few steps are to clear the standby logfiles and restart the MRP process.
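A hedged sketch of those final steps, run on the standby once it is mounted — the group number below is a placeholder and should be repeated for each standby redo log group listed in v$standby_log:

```sql
-- Clear each standby redo log group (repeat per group/thread)
ALTER DATABASE CLEAR LOGFILE GROUP 29;

-- Restart managed recovery (MRP) in the background
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```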

This should kick everything into life and the standby should start applying redo.

DGMGRL> validate database 'RACDB1DG'

Database Role: Physical standby database
 Primary Database: RACDB1

Ready for Switchover: Yes
 Ready for Failover: Yes (Primary Running)

After a couple of days the incremental backup can be deleted:

RMAN> report obsolete; 
RMAN> delete obsolete;

The REPORT OBSOLETE and DELETE OBSOLETE commands work in two steps:

For each datafile for which there are full backup, datafile copy, or level 0 incremental backups, RMAN identifies the oldest full or level 0 backup or copy that is not obsolete under the retention policy being tested. Any full backup, level 0 incremental backup, or datafile copy of a datafile older than the one identified in this step is considered obsolete.
Any archived logs and level 1 incremental backups that are older than the oldest non-obsolete full backup are then obsolete because there is no full or level 0 backup to which they can be applied.
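What counts as obsolete is driven by the configured retention policy, so it is worth checking it before deleting anything. A sketch, with an assumed 7-day recovery window:

```
RMAN> SHOW RETENTION POLICY;
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
```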

There is a possible bug that causes the redo log corruption, but Oracle have not yet confirmed that this was the cause of our issue:


Check compressed database backup ratio

If you are running compressed backups natively in SQL Server, you can check the compression ratio using the following script:

SELECT backup_size/compressed_backup_size 'Ratio', backup_size/1024/1024/1024 'Original Size GB', compressed_backup_size/1024/1024/1024 'Compressed Size GB', 
backup_start_date, backup_finish_date FROM msdb..backupset
where database_name = 'Database_Name'
and backup_size/compressed_backup_size > 1
order by backup_start_date desc;

Send alert out when SQL Server Restarts

An easy-to-set-up procedure which runs every time SQL Server starts. This can be particularly useful in clusters where the instances can move around nodes.

CREATE PROCEDURE [dbo].[spEmailSQLServerRestart]
AS
-- Declare Variables
DECLARE @strServer VARCHAR(128) = CONVERT(VARCHAR(128), SERVERPROPERTY('ComputerNamePhysicalNetBIOS'))
DECLARE @strMailSubject VARCHAR(128) = 'SQL Server '
+ UPPER(@@SERVERNAME) + ' restarted!'
DECLARE @strMailBody VARCHAR(1000) = 'SQL Server '
+ UPPER(@@SERVERNAME) + ' restarted at '
+ CONVERT(VARCHAR(12), GETDATE(), 108) + ' on '
+ CONVERT(VARCHAR(12), GETDATE(), 103) + '.'
+ ' Now running on server: ' + @strServer
-- Wait for the database mail engine to start
WAITFOR DELAY '00:00:15'

DECLARE @sender varchar (50)
SELECT @sender = name from msdb..sysmail_profile where profile_id = 1

-- Send Email
EXEC msdb.dbo.sp_send_dbmail @profile_name = @sender,
@recipients = 'DBAOnCallAlerts@autotrader.co.uk',
@subject = @strMailSubject, @body = @strMailBody,
@body_format = 'HTML';

Just deploying the procedure in this scenario is not enough to trigger the alert. We now need to tell SQL Server that it should scan for this procedure every time it starts up. This is known as auto execution in SQL Server and is set by sp_procoption (from SQL Server 2008).

To Set a procedure to run on instance startup, you can run:

-- Sets stored procedure for automatic execution (the procedure must live in master).
EXEC sp_procoption @ProcName = 'spEmailSQLServerRestart',
     @OptionName = 'startup',
     @OptionValue = 'on';

To see which procedures have the startup option set:

SELECT name, type_desc, create_date, modify_date
FROM sys.procedures
WHERE is_auto_executed = 1

From Microsoft:

Startup procedures must be in the master database and cannot contain INPUT or OUTPUT parameters. Execution of the stored procedures starts when all databases are recovered and the “Recovery is completed” message is logged at startup.


ADRCI error – DIA-48448: This command does not support multiple ADR homes

Problem: when running adrci on an Oracle server with multiple homes, you can get the following error when trying to run some commands:

DIA-48448: This command does not support multiple ADR homes

The reason is the existence of multiple Oracle homes under the ADR base – database instances, ASM, listeners and so on. The solution is to set the ADRCI home for the component you want to operate on.

As the error above stipulates, this happens because the adrci home is not set. To check the homes available on the server:

ADRCI: Release - Production on Tue Feb 21 22:03:57 2017

ADR base = "/u01/app/oracle"
adrci> show homes
ADR Homes:

To set the home to listener_scan3, run the following command:

adrci> SET HOME diag/tnslsnr/myhost/listener_scan3

You should now be able to complete actions against this home, such as purging alert logs over X number of days.
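For example, purging can then be run against the selected home — a sketch; adrci's -age argument is in minutes, so 43200 is roughly 30 days:

```
adrci> purge -age 43200 -type ALERT
adrci> purge -age 43200 -type TRACE
```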