What do you do when your client's WAN can't support the busines-critical application you just built? You install SQL Server replication. This has been an interesting process trying to get the a client of ours with two offices running an application we developed at a reasonable pace at both ends. The network environment for this client is very locked down - it's more locked down than any government or multi-national corporation network that I have developed for. We engaged an infrastructure management compnay that we recommend to our clients to assist us in setting up the initial replication between the two offices - the reason we went this route was twofold. 1) Allow the users in both offices to run the application within their local office so that the network latency didn't effect the usability of the application (by having to traverse the WAN) and 2) Give our client a business continuity plan that allows them to run off of the remote office's server in the event they have to vacate their corporate office.
The process is pretty straight-forward...
- On the home server, create a publisher for the database (this adds additional columns for replication to every table in the database)
- Backup the database and copy it to the remote server
- Restore the database to the second server and add a subscription to the publisher on the other SQL Server
OK, that's oversimplified but those are the basic steps - it took us 16 hours to copy the database between the two offices over the WAN and about an hour to get everything working in the remote office.
Performance issues...
We started to see strange things that shut down replication throughout the day (usually 2-3 times a day) with an enumeration error - which was really that the query being run for replication was timing out. so we made a few changes
On the replication agent, you can set a schedule for when the replication job starts - now this is an interesting thing - because replicaiton should run and sit idle when it has nothing to process... so, I set the job to start each hour from 5:00 AM through midnight (allowing it to stop during our nightly processing) - but in the event that it did shut down - it would re-start the next hour - which depending on the urgancy of the replication this may be acceptable. When I reduced the time to every 10 minutes I found that the job was ended in success - but not going to idle - so, that defeated my purpose.
When replication was first set up we used the Default agent profile which has the following key parameters:
BarchSize: 100000
ChangesPerHistory: 100
DestThreads: 4
DownloadedGenerations/Read/WritePerBatch: 100
KeepAliveMessageInterval: 300
DeadlockRetries: 5
PollingInterval: 60
QueryTimeout: 300
There is an alternate profile that we switched t 'High Volume - Server to Server' with the following key parameters:
BatchSize: 100000
ChangesPerHistory: 100
DestThreads: 4
DownloadedGenerationsPerBatch: 2000
DownloadedRead/WriteChangesPerBatch: 100
KeepAliveMessageInterval: 300
DeadlockRetries: 5
PollingInterval: 60
QueryTimeout: 600
This profile doubles the query time-out value - we found that in our environment we eliminated the enumeration errors which had been shutting down the replication. Haven't seen the error again.
This turned out to be a good solution for our client and we will keep this solution for future reference.