Click here for disaster! SharePoint standby farms
Office Server & Services MVP. Platform architect with a thing for SharePoint. Speaker. Trainer. Involuntary DBA. Architect at Xylos.
Disaster Recovery Concepts
High Availability & Disaster Recovery
Critical factor in any SharePoint deployment
Keep it Simple
Disaster Recovery (DR)
High Availability (HA)
Protects against complete datacenter failure.
Protects against individual component failures within the same datacenter.
Architect for redundancy
Office Online Server
Provider or SharePoint Hosted Add-ins
operations & procedures
Recovery Time Objective (RTO): When will my system be available again?
Recovery Point Objective (RPO): How much data can I afford to lose?
Service Level Agreement (SLA): How long can I be down?
Putting things in perspective
Outage at 08:00
Full recovery at 12:00
Last backup at 20:00
Full recovery at 08:15
Last backup at 07:55
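The two timelines above can be turned into numbers. What follows is a minimal sketch, assuming the 20:00 backup in the first scenario was taken the evening before the outage. The dates and the helper name `Get-RecoveryWindow` are illustrative and do not come from the slides.

```powershell
# Hypothetical helper: derives downtime and data-loss window from three timestamps.
function Get-RecoveryWindow {
    param(
        [datetime]$LastBackup,
        [datetime]$Outage,
        [datetime]$FullRecovery
    )
    [pscustomobject]@{
        Downtime = $FullRecovery - $Outage   # compare against the agreed RTO
        DataLoss = $Outage - $LastBackup     # compare against the agreed RPO
    }
}

# Scenario 1: nightly backup (assumed to be the previous evening), restore by hand.
$a = Get-RecoveryWindow -LastBackup '2017-06-28 20:00' -Outage '2017-06-29 08:00' -FullRecovery '2017-06-29 12:00'

# Scenario 2: near-continuous protection and a fast failover.
$b = Get-RecoveryWindow -LastBackup '2017-06-29 07:55' -Outage '2017-06-29 08:00' -FullRecovery '2017-06-29 08:15'

$a, $b | Format-Table Downtime, DataLoss
```

Under those assumptions the first scenario works out to four hours of downtime and twelve hours of lost data, the second to fifteen minutes of downtime and five minutes of lost data.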
RPO/RTO versus Cost
It’s all about the Business
Business Continuity Planning
Setting expectations is key
Involve all stakeholders when planning
Analyze data & systems
Consider non-technical elements
Establish recovery targets
What should be restored and what not?
Is some data more important than other data?
How must the restored system behave?
Balance costs & risks when designing a solution
The most crucial step
Test, test, test!
SharePoint DR Possibilities & Limitations
How not to protect your SharePoint farm?
“Never mind, we’ll just use snapshots”
Virtual Machine Snapshots
VM Snapshots are not supported with SharePoint
Timer jobs, the search index and other state information are the reason
It’s not supported, but that does not mean it does not work
Power off all servers in the farm, including SQL Server
Create your VM snapshots
Manage snapshots as a single atomic unit
Power on all servers in the farm
Patching might be a valid reason to take snapshots
Think about: AD membership, content freshness
“Never mind, we’ll just use a stretched farm”
Stretched Farm for HA/DR
Only supported with a highly consistent intra-farm latency of <1 ms (one way), 99.9% of the time over a period of ten minutes, and a bandwidth of at least 1 gigabit per second.
How can we protect our farm?
Supported techniques with SharePoint
Cold standby: setting up a new farm in a new location (preferably by using a scripted deployment) and restoring from backups.
Shipped backups reduce complexity
Slowest (hours or days)
Expensive to recover
Requires manually configuring servers
Warm standby: regularly ship backups or VM images to an alternate data center for constructing a duplicate farm.
Inexpensive to recover
Availability in minutes or hours
Virtual server farm can require little configuration upon recovery
Expensive and time consuming to maintain
May require additional tools
Hot standby: failover farm maintained in an alternate data center.
Often fairly fast to recover (seconds or minutes)
Lower RTO, lower RPO
Expensive to configure and maintain
Requires maintaining two farms in parallel
Network bandwidth & latency are considerations
Disaster Recovery Strategies
Potential Supportability Issues…
Rebuild Farm / Cold Standby
Never simply dismiss this option
Relies on backup/restore
Script your install
Documentation is essential
Keep a change log
Introducing the Standby Farm
Two completely separate farms
Near identical configuration
No shared storage involved
Keep data up to date on the DR site
Failover is always manual
SQL Availability Groups
Introduced in SQL Server 2012
Allows for some very flexible designs
Failover a group of databases
Transparent to SharePoint
Relies on failover clustering (without shared storage)
Relatively easy to set up but can be complex to operate!
SYNCHRONOUS for HA
ASYNCHRONOUS for DR
It’s all about the data
Two types of SharePoint databases when it comes to DR:
Databases that do support asynchronous commit
Databases that don’t support asynchronous commit (Mostly “farm-specific” databases)
Databases NOT supporting asynchronous commit
Central Admin Content
Usage and Health Data Collection
User Profile Synchronization
Content databases are kept up to date through async commit. They will be made available to the DR farm in read-only mode.
Service applications don’t support read-only databases. Service applications will be created upon disaster, after their databases are made read/write.
Configuration for the DR farm
Use the same service accounts where possible
Configure the same web applications as in production
Only farm-specific service applications are deployed
Install and configure all customizations
Modify the hosts file to perform local testing
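For the local testing mentioned in the last bullet, a hosts entry on a test machine can point the production URL at a DR web front end. This is a sketch only: the IP address and host name below are placeholders, not values from the slides.

```powershell
# Assumed values: replace with a DR web front end address and your web application host name.
$drFrontEndIp = '10.20.0.10'          # hypothetical DR WFE address
$hostName     = 'portal.contoso.com'  # hypothetical web application host name

$hostsPath = Join-Path $env:SystemRoot 'System32\drivers\etc\hosts'
$entry     = "$drFrontEndIp`t$hostName"

# Append the entry only if the host name is not listed yet (run from an elevated prompt).
if (-not (Select-String -Path $hostsPath -Pattern ([regex]::Escape($hostName)) -Quiet)) {
    Add-Content -Path $hostsPath -Value $entry
}
```

Remember to remove the entry after testing, otherwise that machine keeps resolving the production URL to the DR farm.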
Not all service applications are created equal
“Simple” service apps are no big deal:
User Profile Service
Business Connectivity Service
Option 1: Create a separate service app and just crawl your read-only content databases (or dual crawl your production). Use this if index freshness is important, and redo your customizations.
Option 2: Back up the search admin database and use it to create a new service app in DR. Use this if index freshness is not important.
Option 3: Full service app backup and restore through native tools. Use this if index freshness and customizations are important but RTO is not.
Enterprise Search - Considerations
Many features rely on search
Manage RTO/RPO expectations
Search config is kept in the search admin database
Search analytics are kept in a different database. Only native backup and restore will help you if you need them.
Other potential pitfalls
Provider & SharePoint hosted add-ins
User Profile Synchronization encryption key
Secure Store master key
TTL on DNS records for SharePoint & SQL listener names
Refreshing the sitemap on read-only databases:
$db = Get-SPDatabase | ? { $_.Name -eq "DatabaseName" }
$db.RefreshSitesInConfigurationDatabase()
Alternative: Azure Site Recovery
Use Azure IaaS as your disaster recovery environment
Works with multiple virtualization vendors
Automated protection and replication of virtual machines
When Disaster Strikes
General steps to follow
Declaration of disaster
Notification of all stakeholders
Establish a communication line
Follow the steps in your DR plan
Bring SQL & SharePoint online in your DR site
Create service applications
Perform initial validation of web & service applications
Perform the switchover (DNS)
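Bringing SQL online in the DR site usually means forcing the asynchronous replica to take over. The sketch below assumes a DR instance named `SQLDR01` and an availability group named `SPAG_DR` (both hypothetical) and that the Windows cluster still has, or has been forced into, quorum on the DR side.

```powershell
# Requires the SqlServer (or SQLPS) module for Invoke-Sqlcmd.
# Hypothetical names: DR replica instance 'SQLDR01', availability group 'SPAG_DR'.
$drInstance = 'SQLDR01'
$agName     = 'SPAG_DR'

# Forced failover of an asynchronous replica: transactions not yet received are lost.
Invoke-Sqlcmd -ServerInstance $drInstance `
    -Query "ALTER AVAILABILITY GROUP [$agName] FORCE_FAILOVER_ALLOW_DATA_LOSS;"

# Check which role the DR replica now holds.
Invoke-Sqlcmd -ServerInstance $drInstance -Query @"
SELECT ag.name, ars.role_desc, ars.operational_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_groups AS ag ON ag.group_id = ars.group_id
WHERE ars.is_local = 1;
"@
```

Once the replica reports the primary role, the databases are read/write and the service applications can be created against them.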
When disaster strikes…
Availability Groups Configuration
Don’t touch the Failover Cluster Manager to administer your Availability Groups!
Relies on Windows Failover Clustering
Databases have to be in the FULL recovery model
Take a full backup of all databases first
Add-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools
Install SQL on all servers
Install Failover-Clustering feature on all servers
Create cluster without shared storage
Make sure you have the right to create computer objects in AD
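The installation steps above can be scripted. A minimal sketch, assuming three nodes in a single subnet; the node names, cluster name and IP address are placeholders.

```powershell
# Hypothetical node names, cluster name and cluster IP address.
$nodes = 'SQL01', 'SQL02', 'SQLDR01'

# Install the feature on every node (the same command as on the slide, run remotely).
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Add-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools
}

# Validate the nodes, then create the cluster without claiming any shared disks.
Test-Cluster -Node $nodes
New-Cluster -Name 'SPSQLCLU' -Node $nodes -StaticAddress '10.10.0.50' -NoStorage
```

`New-Cluster` creates the cluster computer object in Active Directory, which is why the account running it needs the right to create computer objects.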
Manage quorum & votes
Every node has a vote
You always want an odd number of votes
If you have an even number of nodes you need another vote to make it odd again:
disk witness (requires shared storage)
file share witness (my favorite)
The votes are in!
Don’t assign a vote to your DR instance
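The quorum advice above maps to a few failover clustering cmdlets. A sketch under assumed names: a witness share `\\FS01\ClusterWitness` and a DR node called `SQLDR01`, neither of which comes from the slides.

```powershell
# Hypothetical names: witness share '\\FS01\ClusterWitness', DR node 'SQLDR01'.

# Two voting nodes in the primary datacenter plus a file share witness gives three votes.
Set-ClusterQuorum -NodeAndFileShareMajority '\\FS01\ClusterWitness'

# Remove the vote from the DR node so that losing the DR site cannot affect quorum.
(Get-ClusterNode -Name 'SQLDR01').NodeWeight = 0

# Review the result.
Get-ClusterNode | Format-Table Name, State, NodeWeight
Get-ClusterQuorum
```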
Make your life easier
Use the same disk layout across your SQL servers
Use a single administration account
Open the necessary firewall ports (1433, 5022)
Don’t forget to transfer logins:
Copy-SqlLogin -Source SQL01 -Destination SQL02
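The firewall ports in the list above can be opened with the built-in firewall cmdlets. A sketch, assuming default instances and the conventional endpoint port; `SQL02` is a placeholder node name.

```powershell
# Run on each SQL Server node.
# 1433 = default instance port, 5022 = conventional availability group endpoint port.
New-NetFirewallRule -DisplayName 'SQL Server (TCP 1433)' -Direction Inbound `
    -Protocol TCP -LocalPort 1433 -Action Allow
New-NetFirewallRule -DisplayName 'SQL AG endpoint (TCP 5022)' -Direction Inbound `
    -Protocol TCP -LocalPort 5022 -Action Allow

# From another node, confirm that both ports are reachable ('SQL02' is hypothetical).
Test-NetConnection -ComputerName 'SQL02' -Port 1433
Test-NetConnection -ComputerName 'SQL02' -Port 5022
```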
Reality Check – Number of databases
SharePoint has a lot of databases
Some customers give every site collection its dedicated content database
The recommended maximum number of databases that belong to an availability group is 100 per server. Watch out for thread exhaustion.
Enable SQL Server for Availability Groups
Do this on every cluster node
Restart SQL Server service afterwards
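Instead of using SQL Server Configuration Manager on each node, the feature can be enabled from PowerShell. A sketch, assuming default instances on three placeholder node names and that the SqlServer (or SQLPS) module is available.

```powershell
# Node names are hypothetical; default instances are assumed.
foreach ($node in 'SQL01', 'SQL02', 'SQLDR01') {
    # Enables the Always On feature and restarts the SQL Server service.
    # -Force suppresses the confirmation prompt for the restart.
    Enable-SqlAlwaysOn -ServerInstance $node -Force
}
```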
Create 2 Availability Groups
Group 1 (PROD)
Group 2 (PROD & DR)
Synchronous commit for HA with automatic failover
Primary datacenter only
Asynchronous commit for DR with manual failover
Content & supported service application databases
Creating an availability group
Databases to be included
Listener configuration (hostname)
Initial data synchronization method
Group 1 - High Availability
Group 2 - High Availability & Disaster Recovery
Connecting SharePoint to the right group
Each availability group will have a hostname to connect to (“listener”)
Use 2 SQL Aliases, pointing to the different listeners
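SQL client aliases are stored in the registry, so they can be created with a short script on every SharePoint server. This is a sketch; the alias names, listener names and port 1433 are assumptions.

```powershell
# Hypothetical alias and listener names; run on every SharePoint server.
$aliases = @{
    'SPSQL-HA' = 'AGL-PROD'   # listener of group 1 (PROD)
    'SPSQL-DR' = 'AGL-DR'     # listener of group 2 (PROD & DR)
}

# 64-bit and 32-bit client alias locations in the registry.
$paths = 'HKLM:\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo',
         'HKLM:\SOFTWARE\WOW6432Node\Microsoft\MSSQLServer\Client\ConnectTo'

foreach ($path in $paths) {
    if (-not (Test-Path $path)) { New-Item -Path $path -Force | Out-Null }
    foreach ($alias in $aliases.Keys) {
        # DBMSSOCN = TCP/IP network library; 1433 is the assumed listener port.
        New-ItemProperty -Path $path -Name $alias -PropertyType String `
            -Value "DBMSSOCN,$($aliases[$alias]),1433" -Force | Out-Null
    }
}
```

SharePoint then connects to the alias names, so moving a set of databases to a different listener only requires changing the alias.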
Alternative: PowerShell Support for SharePoint
Add-DatabaseToAvailabilityGroup -AGName "SPAG" -FileShare "\\SQLAO1\Backup" -ProcessAllDatabases
$ag = Get-AvailabilityGroupStatus
SQL Server 2016 possibilities
No support for SharePoint 2013 :/
Automatic failover with more than 2 nodes
Automatic seeding of databases instead of taking backups yourself
Distributed Availability Groups (supported with SharePoint since May 2017)
Disaster Recovery with SharePoint is no walk in the park
Talk to the business, discuss requirements (RTO/RPO)
If you want to do it properly: a lot of planning, expertise and money
Underlying SQL layer is key to success
Don’t underestimate the operational consequences
Don’t forget supportability aspects of your solution
No testing, no Disaster Recovery solution
SUGUK London 29/06/2017 | @thomasvochten