Vertica – Recovery Monitoring

Use the admintools view_cluster tool from the command line to see the cluster state:

$ /opt/vertica/bin/admintools -t view_cluster»

DB | Host | State

———+————–+————

<data_base> | 112.17.31.10 | RECOVERING

<data_base> | 112.17.31.11 | UP

<data_base> | 112.17.31.12 | UP

<data_base> | 112.17.31.17 | UP

________________________________

2 Main tables for monitoring recovery within the database –

RECOVERY_STATUS

PROJECTION_RECOVERIES

However the database has to be UP to be able to use! so only really useful when only one node is recovering and cluster is k-safety compliant.

Specifically, the recovery_status system table includes information about the node that is recovering, the epoch being recovered, the current recovery phase, and running status:

select node_name, recover_epoch, recovery_phase, current_completed, is_running 
from recovery_status;

The projection_recoveries system table maintains history of projection recoveries. To check the recovery status, you can summarize the data for the recovering node, and run the same query several times to see if the counts change. Differing counts indicate that the recovery is working and in the process of recovering all missing data.

select node_name, status , progress from projection_recoveries; 

After recovery is complete, all nodes should show UP.

Checking the epoch should show all the nodes at the same level.

EPOCH

A logical unit of time in which a single change is made to data in the system.

SELECT GET_CURRENT_EPOCH();
Advertisements

Cannot detach a suspect or recovery pending database

After losing connectivity to the data and log volume for one of our SQL servers, we received the following error:

LogWriter: Operating system error 2
(failed to retrieve text for this error. Reason: 15100) encountered.

Which resulted in the database on that volume showing as being suspect.

After gaining connection back to the volume, I tried to detach the database, but received the following error:

Cannot detach a suspect or recovery pending database. 
It must be repaired or dropped. (Microsoft SQL Server, Error: 3707) 

To resolve i did the following:

ALTER DATABASE emergencydemo SET EMERGENCY;
GO

EMERGENCY mode marks the database as READ_ONLY, disabled logging, and access is limited to sysadmins. Marking the database in this mode is a first step for resolving log corruption.

ALTER DATABASE emergencydemo set single_user
GO
DBCC CHECKDB (emergencydemo, REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS;
GO
ALTER DATABASE emergencydemo set multi_user
GO

This should resolve any corruption and bring the database online. The database will come out of EMEREGENCY mode automatically.