Home NewsX Recovering and Validating Data After Unexpected SQL Server Failovers

Recovering and Validating Data After Unexpected SQL Server Failovers

by info.odysseyx@gmail.com
0 comment 9 views


In SQL Server environments, including on-premises, Azure SQL Database, or SQL Managed Instance, an unexpected failover can cause Availability Group roles to become out of sync. In such cases, if the original primary replica encounters an issue while the new primary replica takes over and transactions are in progress or uncommitted, data inconsistencies can occur. To recover or verify data loss, it is essential to resynchronize or verify the new primary replica with the old primary replica as soon as it comes back online. This ensures that any lost transactions are reconciled. If critical data is missing during the failover, it must be recovered and merged to maintain database consistency. This blog explains the steps to recover or verify data loss using SQL Server Database Compare Utility.

To simulate this situation with a SQL Server Box product, I set up two SQL Server VMs in the same subnet, turned on Always-On HA for both instances, restored a simplified WideWorldImportersDW (WWIDW) database, and put it in full recovery mode. After creating the necessary certificates and logins/users for a domain-less availability group (AG), I created an asynchronous commit AG (it had to be asynchronous, as a synchronous AG cannot simulate transaction loss).

Mitch_Half_Hooksloot_0-1723479461596.png

Using a trading simulator, many trades per second were performed on the underlying database.

Mitch_Ban_Hooksloot_1-1723479461601.png

During this activity, the primary instance was stopped, resulting in a group of committed transactions that were not replicated to the secondary instance. The goal of this post is to recover the missing transactions.

After stopping sql1, SSMS can no longer connect to sql1 to update its status, so you need to open the dashboard on sql2.

Mitch_Half_Hooksloot_2-1723479461605.png

When you open the Failover Wizard, you will be warned about data loss.

Mitch_Ban_Hooksloot_3-1723479461608.png

The wizard wants to inform the user that data loss may occur.

Mitch_Half_Hooksloot_4-1723479461612.png

Click to see results:

Mitch_Half_Hooksloot_5-1723479461615.png

The sql2 dashboard displays the current cluster status.

Mitch_Half_Hooksloot_6-1723479461618.png

To capture a new default state: Create a database snapshot immediatelyIdeally, before opening a new application connection:

Mitch_Ban_Hooksloot_7-1723479461621.png

(In this case, the transaction is not logged to sql2 because we used sql1 instead of sqlistener in the transaction simulator.)

Restart the sql1 instance. If you refresh, you will see that the old primary instance is now in an Out of Sync/Recovering state.

Mitch_Half_Hooksloot_8-1723479461622.png

Create a database snapshot on the previous primary (sql1). ~ before Making the existing basics secondary (Otherwise, the missing transactions will be lost when the previous primary transaction is synchronized to the secondary transaction.🙁

Mitch_Ban_Hooksloot_9-1723479461625.png

At this point, you can resume data movement to make the former primary database a secondary database.

Mitch_Half_Hooksloot_10-1723479461627.png

It takes one minute for the old primary node to roll back the lost transactions and resynchronize with the new primary node, which must now become a sync secondary node.

Mitch_Ban_Hooksloot_11-1723479461629.png

SQL Server Database Compare is now available.SSDC (Threaded Code)) Application to check if transactions are lost between snapshots of new primary (sql2 – source) and snapshots of old primary (sql1 – target). See SSDBC documentation for setup instructions.

Mitch_Ban_Hooksloot_12-1723479461630.png

In this case, there are 313 hash differences (updates) and 423 missing (inserts) rows. Note that there are no unreplicated deletes due to the timing of the simulated transaction cascade deletes (when the configured percentage is exceeded, deletes stop until inserts/updates catch up):

Mitch_Half_Hooksloot_13-1723479461643.png

If you look in the SSDBC folder in My Documents, there is a SQL script file for each table in the database, with a number in front of it to indicate the order in which they are executed (based on foreign key references). If your database has DRI configured, you may need to combine the scripts if the referenced table has an identity column and the Capture Identity Values ​​option is turned on. In the same folder, there is a log file with more details about the comparison.

Mitch_Ban_Hooksloot_14-1723479461653.png

When you open one of the files, you will see update and insert statements. The update statement is written as safely as possible by checking (using additional conditions in the where clause) that the column values ​​have not been changed subsequently in the new primary.

Mitch_Ban_Hooksloot_15-1723479461669.png

If you run all the change scripts for WWIDW on sql2, then re-run the SSDBC on WWIDW on sql2 (not a snapshot) and the sql1 snapshot, you will see that the databases are now identical (this only works on a static new primary).

Mitch_Half_Hooksloot_16-1723479461677.png

We can also use: tablediff utility or SSDT Generate a difference script.

The SSDBC download package includes a PowerShell script that makes it easy to run comparison operations on multiple servers/databases, built from an Excel workbook. You can list the source and target servers and databases in the provided Servers.xlsx workbook and run the BulkDatabaseRecovery.ps1 script.

Please contact us if you have any feedback or suggestions to improve this team asset. Azure Databases SQL Customer Success Engineering Team. Thank you for your support!





Source link

You may also like

Leave a Comment

Our Company

Welcome to OdysseyX, your one-stop destination for the latest news and opportunities across various domains.

Newsletter

Subscribe my Newsletter for new blog posts, tips & new photos. Let's stay updated!

Laest News

@2024 – All Right Reserved. Designed and Developed by OdysseyX