Replication Troubleshooting

Some info to help diagnose possible issues with replication.

How many undelivered commands are in the distribution database?…

SELECT
ds.*
, ma.article
, da.publication
, da.name
FROM dbo.MSdistribution_status DS
INNER JOIN dbo.msdistribution_agents da ON da.id = ds.agent_id
INNER JOIN dbo.msarticles ma ON ma.publisher_id = da.publisher_id and ma.article_id = ds.article_id
ORDER BY
UndelivCmdsInDistDB DESC
,publication

Why is a complete snapshot being generated when a new article is added (SQL 2005)?

This is expected behaviour if you have a merge or snapshot publication. If you have a transactional publication, a snapshot of all articles will always be generated if the immediate_sync publication property is set to true. Typically, the immediate_sync publication property is set to true if you allowed anonymous subscriptions while creating the publication through the CreatePublication wizard. To prevent the complete snapshot, run the script below:

EXEC sp_changepublication
@publication = 'MainPub',
@property = N'allow_anonymous',
@value = 'false'
GO

EXEC sp_changepublication
@publication = 'MainPub',
@property = N'immediate_sync',
@value = 'false'
GO

Query Timeout Expired

When applying a large snapshot to the subscriber, a ‘Query Timeout Expired’ message is given on the replication monitor, and the snapshot stops processing. The query timeout usually happens after 30 minutes.

There is a manual workaround which involves BCPing the file into the destination database

To BCP in a file created by the snapshot job, run the following

H:\>BCP -h
usage: BCP {dbtable | query} {in | out | queryout | format} datafile
[-m maxerrors]            [-f formatfile]          [-e errfile]
[-F firstrow]             [-L lastrow]             [-b batchsize]
[-n native type]          [-c character type]      [-w wide character type]
[-N keep non-text native] [-V file format version] [-q quoted identifier]
[-C code page specifier]  [-t field terminator]    [-r row terminator]
[-i inputfile]            [-o outfile]             [-a packetsize]
[-S server name]          [-U username]            [-P password]
[-T trusted connection]   [-v version]             [-R regional enable]
[-k keep null values]     [-E keep identity values]
[-h "load hints"]         [-x generate xml format file]

For example:

bcp "dbname"."dbo"."REP_IMAGE" in "F:\TMP_SHARE\REP_TABLE2_IMAGE_3.bcp" -e "F:\TMP_SHARE\errorfile.log" -t"\n<x$3>\n" -r"\n<,@g>\n" -Sservername -T -w

Before starting replication again, complete the following tasks:

1. Truncate the live msrepl_commands table

2. Truncate the live msrepl_transactions table

3. Ensure the replication procedures are on the subscriber database. These are the INSERT, DELETE, & UPDATE procs for each article in the subscription.

Manually extract the INS, DEL, UPD procs for all articles from the snapshot folder and run them into the subscriber.

4. Once this is done, restart the distribution agent

5. Attempt a dummy transaction from source to destination.

Log reader agent failed and its history shows message: “No such interface”

You need to re-register your log reader agent. Try regsvr32 logread.exe; you might also have to register the entire contents of C:\Program Files\Microsoft SQL Server\90\Com (Hilary Cotter)

Not all my logreaders start up – what can I do?

Increase the max_worker_threads setting in the syssubsystems table of the msdb database.

What are the Pros and cons of restarting the log reader agent?

Sometimes under extreme high load you will get deadlocking between the log reader agent and the distribution clean up agent. In this case stopping the log reader agent to let the distribution clean up agent do its job will alleviate the problem. It is recommended that in this case you use a remote distributor. You also can bounce the log reader agent when you want to switch profiles.

If you stop it, the latency will increase and if you stop for a significant time, the commands’ age might exceed the retention period. Also, the log can’t be backed up fully (and therefore truncated) unless the log reader agent has marked it as read.

I receive the error 14100: Specify all articles when subscribing to a publication using concurrent snapshot processing

If you add a new table to an existing publication using sp_addarticle when you try to subscribe to that newly added article from an existing subscription, using sp_addsubscription, the error above may be received. This applies when the existing publication set up with concurrent snapshot option and means that you can’t synchronize subscriptions for such publications without a complete resynch.

There are 2 unofficial workarounds: (a) you can circumvent the check by specifying @reserve = ‘internal’ when you add the subscription for the new article and the snapshot agent should generate snapshot for the new article after that and (b) you could change the immediate_sync property in syspublications to 0 (see sp_changepublication).

Other more official workarounds including changing the sync_method from ‘concurrent’ to either ‘database snapshot’ (enterprise edition only) and ‘native’ (which locks table during snapshot generation). Change the sync_method will force a reinitialization of all your subscriptions at this point. Alternatively you could create another publication and use this instead.

How to….. truncate the transaction log? After restoring a database to another server, when I subsequently try to shrink the log I get the following error: “The log was not truncated because records at the beginning of the log are pending replication”

Before truncating the log, you can execute sp_repldone. In cases where this is not enough, you might have to set up this database as a transactional publisher before executing sp_repldone, then remove the publication afterwards.

How to find out which commands are waiting to be replicated?

Use this to get the timestamp of the latest command to be replicated:

select transaction_timestamp
from subscriberdatabasename..MSreplication_subscriptions

Then run this in the distribution database (replace the value with the one returned from above:)

sp_browsereplcmds @xact_seqno_start = '0x000000AF00000043000B00000001'

How to safely backup transactional replication?

Have a look in BOL for “Strategies for Backing Up and Restoring Transactional Replication” and “Backing Up and Restoring Replication Databases”. The key for standard transactional replication is whether you use ‘sync with backup’ or not. If you do, you are ensured that your distribution backup can never get ahead of the publisher backup (ie no transactions enter the msrepl_commands which haven’t already been backed up on the publisher), and all will be well. However this will introduce latency (even using log shipping you can only backup the logs once per minute at the highest frequency, and this is clearly not ideal). If you don’t use this option, after disaster recovery you’ll have to ignore some transactions and treat errors manually (using -SKIPERRORS).

As for the subscriber backups which are to be restored, this is not usually seen as being so crucial. As long as they are restored to a time before the distribution restore, then commands can be sent down by the distribution agent – details in BOL for this. Alternatively you could of course reinitialize.

What happens when a transaction fails at the publisher – does it still run at the subscriber?

If you have a transaction on the publisher, you may check @@error and then call rollback but whether you rollback or not, the sp is still executed on the subscriber. This situation is altered (no subscriber call) if you set the transaction isolation level to serializable. This is important to do because even if you trap the same error in the transaction on the subscriber and rollback there, the error is registered and the distribution agent will fail. SkipErrors would avoid this problem but ideally the call shouldn’t be sent from the publisher to the subscriber if it has already failed once.

How to read the transactions for TR in non-binary format?

These transactions exist in the transactions table MSrepl_commands: use sp_browsereplcmds to view them. In the case of a queue, use sp_replqueuemonitor to read the MSreplication_queue table and sp_browsereplcmds to look at the compensating commands when there is conflict resolution.

Advertisements

6 Responses to Replication Troubleshooting

  1. Pingback: Microsoft – SQLServer – Error Message – Backup Log with Truncate_Only or with no_log is deprecated. The simple recovery model should be used to automatically truncate the Transaction Log | Daniel Adeniji's – Learning in the Op

  2. Pingback: Microsoft – SQLServer – Error Message – Backup Log with Truncate_Only or with no_log is deprecated. The simple recovery model should be used to automatically truncate the Transaction Log | Daniel Adeniji's – Learning in the Op

  3. Pingback: Microsoft – SQLServer – Error Message – Backup Log with Truncate_Only or with no_log is deprecated. The simple recovery model should be used to automatically truncate the Transaction Log | Daniel Adeniji's – Learning in the Op

  4. Pingback: Microsoft – SQLServer – Error Message – Backup Log with Truncate_Only or with no_log is deprecated. The simple recovery model should be used to automatically truncate the Transaction Log | Daniel Adeniji's – Learning in the Op

  5. Pingback: Microsoft – SQLServer – Error Message – Backup Log with Truncate_Only or with no_log is deprecated. The simple recovery model should be used to automatically truncate the Transaction Log | Daniel Adeniji's – Learning in the Op

  6. Pingback: Microsoft – SQLServer – Error Message – Backup Log with Truncate_Only or with no_log is deprecated. The simple recovery model should be used to automatically truncate the Transaction Log | Daniel Adeniji's – Learning in the Op

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: