Repair a broken Replication Sql Server 2005 vs Sql Server 2008
BEFORE YOU READ THIS: In the end we decided to turn off the continue on conflict option, because too many conflicts were ignored and the tables got too far out of sync... So the continue on conflict is not such a great option afterall...
This is actually a personal note that I wrote down in june 2007.
That day we broke the peer to peer replication across our three servers. But I fixed it without having to restore databases and rebuild the replication.
The bad news: Today we broke replication again. This time the cause was a unique index on 2 columns that was being filled from 2 nodes simultaniously. That is BAD.
The good news: I fixed it.
Last time the replication broke I noticed a list of stored procedures that are responsible for theC(R)UD operations of replication. There are insert, update and delete procedures for all tables in replication and for every subscribing node. So for 1 table there are 2 insert, 2 update and 2 delete procedures.
To fix replication you can tweak these stored procedures to return success even in case of a failure. But be careful! If the databases get even more out of sync that’s even worse!
If a delete or update fails you can skip the error message that the row does not exist on the subscribing node.
However if an insert fails, you will have to write your own check to see if the insert will fail and than skip that insert.
Last friday the support team had a problem with peer-to-peer replication after installing SP 3 for Sql Server 2005. Even the update for SP3 did not resolve the issue. The issue was that every 30 seconds replication claimed that the process cannot allocate memory, although it was still replicating.
So with replication in a fault state and a wish to upgrade to Sql Server 2008 in the near future the support team decided to upgrade now and hope that that would fix the issue. Well… needless to say… it didn’t.
Non of the support people know how to set up replication, so I ended up doing it. The first time it all seemed to work, tracers worked fine and arrived after about 3 seconds. But the moment I started some of the processes replication failed. That was a set back, because now the databases were out of sync… I decided to ignore the fact that the databases were out of sync and set up replication all over again. I could do this, because I was and still am sure that these missing records on two of the nodes aren’t going to give us problems.
With a situation like this I would have had to spend at least a day to get back in sync.
As far as this article - http://technet.microsoft.com/en-us/library/bb934199.aspx - on technet claims: Once I turn on the option to continu on conflict, I do not have to manually change all the stored procedures to continu on conflict. If that is true it will save me a lot of time! First thing on monday I will try this on a test environment before I will use this on the production server.
I will definitely let you know how this went!