SharePoint: Let's talk about Disaster Recovery!

Disaster. It sounds worse than it is, until it happens. Then it's worse than it sounds. A lot of companies I have worked with had trouble figuring out how to handle a disaster in SharePoint, mostly because they used tools and processes that were not designed for SharePoint.

Honestly, I'm not even sure I have found the best way, but it works and seems to be supported by Microsoft. Backup, and in the end restore, are a lot more complex and worrisome than many people think. So, without further ado: Let's talk backups!

Disaster
First of all, we need to define what a "disaster" even is. We can, of course, suggest what a disaster might be, but the customer needs to define which scenarios the disaster recovery has to cover. I usually only consider the obvious ones:

  • Farm is not available (as a whole)

  • Some SharePoint servers are missing

  • SQL Server is not available

Each of these could be broken down into a lot more detail, depending on the farm and the SQL installation. I also only cover problems in which users can't work anymore. I usually don't consider a broken site collection a disaster; that's just business as usual. BUT: It is your decision and your customer's, not mine. You might as well put the CEO's login problems on that list. It's not recommended, but you can do it.

Backup / Restore
The backup part is the worst in my eyes, because it takes a lot of planning and testing. We could just imagine disaster cases and be happy with that, but a backup needs to be tested. Tested a lot. At least once a year. Once. A. Year. Now you might wonder why I put so much emphasis on testing it over and over again. Simple: A lot of things will or might happen to your SharePoint farm (e.g. new solutions, patches) or to your SQL environment (e.g. patches, clustering, new nodes), and they ultimately change the backup you are creating. Any one of these changes might break the plan you have.
It's not only the farm itself. A change in your virtualization environment (e.g. Hyper-V, VMware) might also need to be considered.
Maybe there are changes to your backup software that you were never notified about. The only way to make sure you have a working backup is to test it over and over again.
A final aspect is restore time. How long can you wait until your farm needs to be up and running again? You have to test this repeatedly, because your content databases will, hopefully, grow.
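Because restore time grows with the content databases, it can help to project it forward from a measured restore test. The following Python sketch assumes (hypothetically) that restore time scales linearly with database size at the throughput you measured in your last test, and that the databases grow at a constant monthly rate. A real restore also includes restoring servers and running checks, so treat the result as a lower bound, not a promise.

```python
def estimate_restore_hours(db_size_gb: float, monthly_growth: float,
                           throughput_gb_per_hour: float, months_ahead: int) -> float:
    """Project content database restore time a number of months ahead.

    Assumptions (hypothetical, for illustration only):
    - databases grow at a constant compound monthly rate (0.05 = 5 %)
    - restore throughput, measured in a real restore test, stays constant
    """
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    projected_size_gb = db_size_gb * (1 + monthly_growth) ** months_ahead
    return projected_size_gb / throughput_gb_per_hour


def fits_recovery_window(hours: float, max_hours: float) -> bool:
    """True if the projected restore finishes within the agreed downtime."""
    return hours <= max_hours


# Hypothetical figures: 200 GB today, 5 % growth per month, 100 GB/h measured.
hours = estimate_restore_hours(200, 0.05, 100, months_ahead=12)
print(f"Projected restore time in 12 months: {hours:.1f} h")
print("Within a 4 h window:", fits_recovery_window(hours, max_hours=4))
```

With these made-up numbers the projection is about 3.6 hours, so a four-hour window would still hold, but only just. Rerun it with the real figures after every restore test.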

But how can we create a working backup and restore it?

First of all: Don't you dare use snapshots. Snapshots are a no-no. Personally, I use them only for my development system, because if it crashes I tend not to lose that much information. For production environments I strongly recommend proper backup mechanisms. Snapshots have a lot of problems: you can't save them independently, and they are not consistent. Don't do it; you might end up with more problems and be forced to reinstall the SharePoint farm. Not the best option, I think.

Glad we settled that. I also recommend always backing up the server as a whole, not just parts of the configuration. If you are using a virtualization environment, I don't see a reason not to. Of course you need a lot of disk space, and your network infrastructure needs a lot of capacity, but using these as arguments against a full server backup is just cheap. Always consider this: Is the information I back up vital to my business? If yes, you should be able to afford a proper, tested backup concept.

In my opinion it is enough to back up your application servers every couple of weeks, ideally around the time you change your configuration, because after a configuration change there is no turning back without losing data. And data is money. Of course, you have to back up your SQL databases much more often, something like twice a day, depending on the business needs. Also consider using a dedicated window for application server backups, e.g. the maintenance window, because in my experience it was pretty hard to get a farm running again if I had created the backup without shutting SharePoint down. This depends on your backup software: some products can shut down the services, others can't. Sometimes it works, sometimes it doesn't. I recommend testing the scenario you would like to have. In my case it's usually: shut down services, shut down server, back up, done. It might also work with running servers, but that's up to your testing.
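The rhythm described above (SQL databases roughly twice a day, application servers every couple of weeks, plus a restore test at least once a year) can be written down as a small policy and checked automatically. This Python sketch assumes you can get the time of the last successful run for each item from your backup software; the item names, the exact intervals, and how you collect the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical policy based on the intervals in this post; adjust to business needs.
MAX_AGE = {
    "sql_databases": timedelta(hours=12),        # roughly twice a day
    "application_servers": timedelta(weeks=2),   # every couple of weeks
    "restore_test": timedelta(days=365),         # at least once a year
}


def overdue_items(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return policy items that never ran or whose last run is older than allowed."""
    overdue = []
    for item, max_age in MAX_AGE.items():
        last = last_run.get(item)
        if last is None or now - last > max_age:
            overdue.append(item)
    return overdue


# Hypothetical timestamps, e.g. exported from your backup software.
now = datetime(2024, 6, 1, 12, 0)
last_run = {
    "sql_databases": datetime(2024, 6, 1, 6, 0),
    "application_servers": datetime(2024, 5, 1, 22, 0),
    "restore_test": datetime(2023, 9, 15),
}
print(overdue_items(last_run, now))
```

Here only the application server backup is flagged, since it is about a month old. An item that has never run at all counts as overdue, which is the safer default.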

Now, let's say you don't have a working backup and want to install the SharePoint farm from scratch. Don't use the old configuration database. You would need an identical farm to reuse it, which is rarely the case. Instead, I recommend reinstalling the farm (e.g. using PowerShell) and attaching the content databases. A new configuration database is created during the installation anyway.
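In PowerShell, attaching a content database is done with the `Mount-SPContentDatabase` cmdlet, one call per database. If you keep an inventory of which content database belongs to which web application, a small Python helper can turn it into commands you review and then run in the SharePoint Management Shell. The database names, URLs, and server name below are hypothetical, and the sketch assumes the web applications have already been recreated in the new farm.

```python
def ps_quote(value: str) -> str:
    """Quote a value as a PowerShell single-quoted string (embedded ' is doubled)."""
    return "'" + value.replace("'", "''") + "'"


def mount_commands(database_server: str, databases: dict[str, str]) -> list[str]:
    """Build one Mount-SPContentDatabase command per content database.

    `databases` maps content database name -> URL of the web application
    it should be attached to (hypothetical inventory).
    """
    return [
        f"Mount-SPContentDatabase -Name {ps_quote(name)} "
        f"-WebApplication {ps_quote(url)} "
        f"-DatabaseServer {ps_quote(database_server)}"
        for name, url in databases.items()
    ]


# Hypothetical inventory of the old farm's content databases.
inventory = {
    "WSS_Content_Intranet": "https://intranet.contoso.com",
    "WSS_Content_Projects": "https://projects.contoso.com",
}
for command in mount_commands("SQL01", inventory):
    print(command)
```

Generating the commands instead of typing them during an outage keeps the attach step repeatable, and the inventory itself is worth keeping next to your backup documentation.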

Since we are already talking about restoring the farm, we might as well continue. If your application server backups were taken while the servers were running, and you did not restore your SQL server / cluster / databases first, you will have a lot of funny event log errors to ignore. The restore itself is pretty easy: mostly you just wait until your servers and data are restored.
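Restoring SQL first is really a dependency order: the SharePoint servers depend on the SQL server and its databases, so those have to come back first. A minimal Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) can derive a restore order from such dependencies; the component names and the dependency map are hypothetical and should mirror your own farm.

```python
from graphlib import TopologicalSorter

# Hypothetical farm layout: each component maps to what must be restored before it.
dependencies = {
    "sql_cluster": set(),
    "sql_databases": {"sql_cluster"},
    "app_server_1": {"sql_databases"},
    "web_frontend_1": {"sql_databases"},
    "web_frontend_2": {"sql_databases"},
}

# static_order() yields every component after all of its prerequisites
# and raises graphlib.CycleError if the map contains a cycle.
restore_order = list(TopologicalSorter(dependencies).static_order())
print(" -> ".join(restore_order))
```

Components that become ready at the same time (here the SharePoint servers) can be restored in parallel once the SQL side is back.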

But what then? Guess what: You have to test. A lot. You should have a test matrix for cases like this. Usually you just test a couple of standard features you use a lot (e.g. workflows, metadata) and check a couple of health states (e.g. Are the servers up and running? Are the services running?). Depending on your business requirements it can be a simple Excel list with just two rows, or it might be a lot more complex. But always remember: You need to be able to work with the list while under a lot of pressure.
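A test matrix does not need to be more than a list of checks with a pass/fail status. This Python sketch is one hypothetical way to keep it that simple; the check names mirror the examples above, and the `Check` class and `summary` function are illustrative, not part of any tool. The same columns work just as well in an Excel list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    area: str                      # e.g. "Feature" or "Health"
    description: str
    passed: Optional[bool] = None  # None = not tested yet


def summary(matrix: list[Check]) -> str:
    """One-line status that is easy to read under pressure."""
    failed = [c.description for c in matrix if c.passed is False]
    untested = [c.description for c in matrix if c.passed is None]
    if failed:
        return "FAILED: " + "; ".join(failed)
    if untested:
        return "NOT TESTED: " + "; ".join(untested)
    return "ALL CHECKS PASSED"


# Hypothetical matrix mirroring the examples in the text.
matrix = [
    Check("Health", "All SharePoint servers are up and running"),
    Check("Health", "SharePoint services are running"),
    Check("Feature", "A workflow starts and completes"),
    Check("Feature", "Managed metadata can be assigned to an item"),
]
matrix[0].passed = True
matrix[1].passed = True
print(summary(matrix))
```

Failures are reported before untested checks, so the most urgent information is always at the front of the line.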

tl;dr:

  • create disaster cases

  • create a backup process

  • create a test matrix

  • test your backup

  • be happy.


I tried a lot of solutions that were supposed to work but in the end didn't. I can only give you this advice: Every solution you think is possible should be tested first, then declared to work. Keep in mind that this is just a short summary of what needs to be done. There are a lot more things to do! Don't underestimate this process; it's very time-consuming.
