Recover Crashed Exchange 2013 Mailbox Server in DAG

Recovering a crashed mailbox server is a straightforward process if the server is a DAG member. You do it by running setup.exe /m:RecoverServer. However, a few preparatory steps are needed for a smooth recovery. The following steps recover crashed mailbox servers in a DAG. I will explain each step in more detail below.
  1. Reset the computer accounts of the crashed servers in AD.
  2. Install a new server OS to replace each crashed server, then install the Windows features, prerequisites, and updates.
  3. Remove the passive database copies hosted on the crashed servers. If the servers are still accessible, you can manually delete the leftover DB and log files from them.
  4. Remove the crashed servers from the DAG. This can be done in the Exchange admin center (EAC) or in EMS.
  5. Evict (remove) the crashed servers from the cluster in Failover Cluster Manager.
  6. Start the recovery by running the setup file from a command prompt with the necessary switches. More details later in this section.
After the recovery completes, the servers show up again in the EAC. Then add them back to the DAG and reseed the database copies as necessary. Done!
Here are the detailed steps I followed for the recovery:
1. Reset the crashed computer accounts in AD.
    You can do this in the ADUC console. Right-click the crashed computer account and choose “Reset Account”.
 
2. Install new server OS to replace the old crashed servers.
Install the new servers with the same specs as the old ones. You also need to install some Windows features, the Microsoft filter packs, and the Unified Communications Managed API 4.0 runtime.
a) On the new machine, open PowerShell with “Run as Administrator”.
b) Install the necessary Windows features in PowerShell:
Install-WindowsFeature AS-HTTP-Activation, Desktop-Experience, NET-Framework-45-Features, RPC-over-HTTP-proxy, RSAT-Clustering, RSAT-Clustering-CmdInterface, Web-Mgmt-Console, WAS-Process-Model, Web-Asp-Net45, Web-Basic-Auth, Web-Client-Auth, Web-Digest-Auth, Web-Dir-Browsing, Web-Dyn-Compression, Web-Http-Errors, Web-Http-Logging, Web-Http-Redirect, Web-Http-Tracing, Web-ISAPI-Ext, Web-ISAPI-Filter, Web-Lgcy-Mgmt-Console, Web-Metabase, Web-Mgmt-Console, Web-Mgmt-Service, Web-Net-Ext45, Web-Request-Monitor, Web-Server, Web-Stat-Compression, Web-Static-Content, Web-Windows-Auth, Web-WMI, Windows-Identity-Foundation
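c) Download and install the other prerequisites mentioned above (the filter packs and the Unified Communications runtime).
Note: Give each new server the same computer name as the one it replaces and join it to the domain.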
 
3. Remove the database passive copies on crashed servers.
You can do this in EMS on a healthy server (since the crashed servers are not accessible). Open EMS and type:
[PS] C:\>Get-MailboxDatabaseCopyStatus *\<name of your crashed server>
Make sure every database copy listed in the output is in the ServiceDown state. You can then remove the failed copies with the following command. It will show some warnings; don’t worry, just proceed.
[PS] C:\>Get-MailboxDatabaseCopyStatus *\<name of your crashed server> | Remove-MailboxDatabaseCopy
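If you want a guard in front of the removal, the check and the removal can be combined in one short script. This is only a sketch: the server name MBX2 is a made-up placeholder, and it assumes you run it in EMS on a healthy DAG member.

```powershell
# Hypothetical placeholder: replace with the name of your crashed server.
$failedServer = 'MBX2'

# List every database copy hosted on the crashed server (DatabaseName\ServerName).
$copies = Get-MailboxDatabaseCopyStatus -Identity "*\$failedServer"

# Collect any copy that is NOT in the ServiceDown state.
$notDown = $copies | Where-Object { $_.Status -ne 'ServiceDown' }

if ($notDown) {
    # Something on that server still responds; stop and review by hand.
    Write-Warning 'Some copies are not in ServiceDown. Nothing was removed.'
    $notDown | Format-Table Name, Status
}
else {
    # All copies are down: remove them. You will still be prompted to confirm.
    $copies | Remove-MailboxDatabaseCopy
}
```

The script deliberately leaves the confirmation prompts in place, so you still see each copy before it is removed.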
4. Remove the crashed servers from the DAG.
You can remove the crashed servers from the DAG in the EAC.
Go to EAC >> servers >> database availability groups. Select the DAG and click the “Manage DAG Membership” icon. Remove the crashed servers from there.
or
You can remove the crashed servers in EMS:
Remove-DatabaseAvailabilityGroupServer -Identity <your DAG Name> -MailboxServer <Your failed server name>
If the failed server is offline and the command cannot reach it, add the -ConfigurationOnly switch, which removes the server from the DAG configuration in Active Directory only.
 
5. Evict the crashed servers from the failover cluster.
Removing the failed servers from the DAG does not remove them from the failover cluster itself, so we have to remove them manually. Before doing this, you can check which cluster nodes are currently down from an elevated command prompt:
C:\> cluster.exe node
To evict a node from the cluster:
Go to Failover Cluster Manager >> [your cluster name] >> Nodes >> select your failed server >> right-click and choose “More Actions” >> Evict
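The same eviction can also be done from PowerShell with the FailoverClusters module. The lines below are a rough sketch, assuming they are run in an elevated session on a surviving DAG member where that module is installed; MBX2 is a hypothetical node name.

```powershell
# Load the failover clustering cmdlets (installed with the clustering tools).
Import-Module FailoverClusters

# Show the nodes the cluster currently reports as down.
Get-ClusterNode | Where-Object { $_.State -eq 'Down' } | Format-Table Name, State

# Evict the failed node. 'MBX2' is a placeholder for your crashed server.
# -Force skips the confirmation prompt, so double-check the name first.
Remove-ClusterNode -Name 'MBX2' -Force
```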
6. Start the recovery process
Go to the directory where the setup files are located and run:
setup.exe /m:RecoverServer /IAcceptExchangeServerLicenseTerms
 
In most cases, if you use the original Exchange installation media for the recovery, setup stops with an error asking for a later cumulative update (CU) than the one on that media. If so, download the latest released Exchange CU from Microsoft, extract it to a folder, and run setup from that folder with the same switches shown above.
You can also check the current Exchange version and build numbers of the healthy servers in EMS:
 [PS] C:\>Get-ExchangeServer | Format-List Name, Edition, AdminDisplayVersion

While the recovery process reads the server's configuration from its AD objects and reinstalls Exchange, you can follow the progress in the console.
7. Add the servers back to the DAG and reseed the DB copies
After the servers are recovered, reboot them so they function properly. Then:
       a) add the servers back to the DAG.
       b) reseed the passive DB copies.
This is quite a simple process, so I won’t go into detail here.
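For readers who prefer the shell, the two sub-steps could look roughly like this in EMS. It is a minimal sketch, not the exact commands from my recovery: DAG1, MBX2, DB01, and DB02 are hypothetical names. Because the passive copies were removed in step 3, it uses Add-MailboxDatabaseCopy, which creates the copy and seeds it.

```powershell
# Hypothetical names: replace with your DAG, recovered server, and databases.
$dagName         = 'DAG1'
$recoveredServer = 'MBX2'
$databases       = 'DB01', 'DB02'

# a) Add the recovered server back to the DAG.
Add-DatabaseAvailabilityGroupServer -Identity $dagName -MailboxServer $recoveredServer

# b) Re-create the passive copies; each new copy is seeded automatically.
foreach ($db in $databases) {
    Add-MailboxDatabaseCopy -Identity $db -MailboxServer $recoveredServer
}

# Check the health of the copies on the recovered server afterwards.
Get-MailboxDatabaseCopyStatus -Server $recoveredServer |
    Format-Table Name, Status, CopyQueueLength, ContentIndexState
```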
Congratulations! Your recovery is now complete.
Note: In my case, I ran into some minor issues when reseeding the DB and had to take additional steps to fix them. The first error was:
ERROR:
The seeding operation failed. Error: An error occurred while performing the seed operation. Error: Unable to delete logs at ‘D:\M datalogs’. The database has been seeded successfully. If any obsolete log files exist, manually delete them to prevent database divergence. Error: System.IO.IOException: The file or directory is corrupted and unreadable. at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.FileInfo.Delete() at Microsoft.Exchange.Cluster.Replay.DatabaseSeederInstance.DeleteLogFiles(DirectoryInfo di, String logfilePrefix, String logfileSuffix, Int32& logNum) at Microsoft.Exchange.Cluster.Replay.DatabaseSeederInstance.
So I assumed there was some corruption on my hard disk. Fortunately, that server had no active database copies, so I first ran chkdsk on the volume without the /r switch. Since it found disk errors, I then ran chkdsk with /r /f:
C:\> chkdsk d: /r /f
Then I reseeded the DB again, and another error came up.
ERROR:
The seeding operation failed. Error: An error occurred while performing the seed operation. Error: Failed to notify source server ‘[my source server name]’ about the local truncation point. Hresult: 0xc8000713. Error: Unable to find the file.
According to this article, the cause is an old log folder that is out of sync. I followed this procedure:
     1) Dismount the active copy of the database in the EAC.
     2) Find the database file path and log folder path in EMS. Let us assume here that the edb file path is D:\Database\mydatabase.edb and the log folder path is D:\Database\Logs.
[PS] C:\>Get-MailboxDatabase mdb04 | fl *path*
 
     3) Log in to the source server hosting that database, here ‘[my source server name]’, and run eseutil.exe in an elevated command prompt to verify that the DB is in a clean shutdown state. Use the DB path obtained in step 2.
C:\> eseutil /mh D:\Database\mydatabase.edb
(If the DB state is Clean Shutdown, you can continue with the next step. If the state is Dirty Shutdown, you need to recover the database using the log files first. This article will help you.)
     4) If the DB is in a clean shutdown state, you can delete all log files in the log folder obtained in step 2. If you are unsure, rename the log folder instead and delete it later. If there are thousands of log files, you can also delete them with the following command, run from inside the log folder:
D:\Database\Logs> rmdir . /s /q
 
     5) Mount the database. This creates new log files.
     6) Reseed the database copies.
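The non-destructive parts of this procedure can be sketched in EMS as follows. Treat it as an illustration under assumptions rather than the exact commands I ran: it assumes you run it locally on the source server, mdb04 and the edb path are the example values from step 2, and MBX2 is a hypothetical name for the server holding the passive copy. The log cleanup in step 4 is intentionally left manual.

```powershell
# Example values from the steps above; MBX2 is a hypothetical passive server.
$database      = 'mdb04'
$edbPath       = 'D:\Database\mydatabase.edb'
$passiveServer = 'MBX2'

# 1) Dismount the active copy.
Dismount-Database -Identity $database -Confirm:$false

# 3) Dump the database header and look for the clean shutdown state.
$header = & eseutil /mh $edbPath
if ($header -match 'State:\s*Clean Shutdown') {
    Write-Host 'Clean shutdown. Rename or clear the log folder now (step 4).'
    Read-Host 'Press Enter once the log folder has been cleaned up'

    # 5) Mount the database; this creates new log files.
    Mount-Database -Identity $database

    # 6) Reseed the passive copy, replacing any files left from the failed seed.
    Update-MailboxDatabaseCopy -Identity "$database\$passiveServer" -DeleteExistingFiles
}
else {
    Write-Warning 'Database is not in clean shutdown. Recover it before touching the logs.'
}
```

The script pauses before mounting so the log folder is only touched by hand, after you have seen the Clean Shutdown result yourself.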