Corrupt database: Could not read and latch page (foo:bar) with latch type SH

Peter Schmitz

Administrator
Staff member
Every so often, a database will end up in a corrupt state. There's nothing much you can do to about it, as normally speaking, these types of failures just happen. You can minimize the impact on your business by using redundant hardware and a solid backup strategy, but you might still have to deal with corruption every now and then.

You will typically notice it by the database being unavailable and not responsive. If you then check SQL Server Management Studio, you will see the database flagged as corrupt. It will have a different icon in front of it's name, and behind the database name, it will have the text: "(corrupt)":
CorruptDatabase.png

The SQL Server Error Log yielded a bit more information about the cause of failure:

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: foo; actual: bar). It occurred during a read of page (foo:bar) in database ID 7 at offset foo in file 'F:\Database\<filename>.ndf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

In my case, a server crashed in the middle of a heavy load process. After getting the server back up and running, the database was flagged as Suspect. My first step was ensuring the database would become available for querying by putting it in Emergency Mode. This can be achieved by using the following command:

Code:
ALTER DATABASE <database name> SET EMERGENCY
GO

The database will now show up as being in Emergency mode, which gives it another (red) icon, as well as the suffix of (Emergency):

EmergencyDatabase.png
This will allow you (and any other user in the sysadmin role) to connect to the database and issue queries against it, including those needed to do some immediate troubleshooting. The SQL Error log offers the advice to try and run a DBCC CHECKDB, which is what I tried:

Code:
DBCC CHECKDB WITH ALL_ERRORMSGS

This provided some additional information:

Code:
Msg 7985, Level 16, State 2, Line 1
System table pre-checks: Object ID 3. Could not read and latch page (foo:bar) with latch type SH. Check statement terminated due to unrepairable error.
DBCC results for '<database name>'.
Msg 5233, Level 16, State 98, Line 1
Table error: alloc unit ID foo, page (foo:bar). The test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. The values are foo and -bar.
CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object.
CHECKDB found 0 allocation errors and 1 consistency errors in database '<database name>'.

Unfortunately, the first error informs you that one of the system tables (object_id 3 is the sysrscols table, which is a base system table) has become corrupt, and due to the nature of the table, the only solution is to restore from a backup.

Luckily in my case, the database lost was a test database, so no harm done. If it happens to you in a production environment, I do hope that you happen to have a solid backup strategy in place.
 
Top