Wednesday, March 7, 2012

Merge replication downloading old data to the suscriber

I'm having an error with merge replication my suscriber are downloading old data, all this start happen after I install SP4.

let explaint this with more details ,

if a made an update to a row in a table from the publisher , the agent history show that 1 change have downloaded to the suscriber, a put an audit in the suscriber and yes one update was downloaded but the row still the same, look like the suscriber downloaded old data instead of the curent change that i made.

If i delete the row, the rows gets delete at the suscriber, if a insert the row the row get insterted ok.

The problem only happen with rows inserted at the suscriber, that later on get update at the publisher. My suscriber has MSDE 2000 SP3 an my Server has SQL Enterprice 2000 SP4.

Any help would be apreciate. thxs.

After 2 weeks of research, my coworkers and me found out that the Update trigers for the publication were incorrectly createad by the sp_repladdcolumn store procedure. There is a call inside that store procedure to another store procedure call sp_MSaddmergetriggers call after the article tigers were delete with the store procedure sp_MSdroparticletriggers. Inside the sp_MSaddmergetriggers another store procedure is called sp_MSaddupdatetrigger the store procedure create the Update triger for the article related with the replication , it was made upside the sp_MSaddmergetriggers because the update triger for the replication is far more complex than the insert and delete.

Inside the sp_MSaddupdatetrigger the value for the varible inside the triger @.missingbm is pupulated from the table sysmergearticles column missing_cols but some how the value for the column was wrong. this column tell you if the amout of columns for the table is not equal to the max colmun id in the syscolumns table, this can only happen if you have deleted columns from the table. well fixing the update triger was as simple as in the sentence

set @.missingbm = 0X00

include the right value from the sysmergearticles missing_cols column instead of 0X00.

this is not an issue on sql server 2005 since the update triger does not have contants inside, it gets the values directly from the table sysmergearticles.

if you read the code inside these store procedure sp_MSaddmergetriggers and sp_MSaddupdatetrigger from the master database in the system store procedure you will see that this store procedure were made in rush. ON sql 2005 they took time to build this procedures.

|||Thanks Pedro for looking into this, I'll look into this as well to see if this is truly and bug and see if it has been addressed in any existing QFE or not. I can assure you that the stored procedure was not made into a rush, it's just that in SQL 2005 there was a lot more to take into account and in 5 years we had more time and more eyes to work on it . Understand that you're on SQL 2000 SP4, and you most likely hit an untested or uncommon scenario, if it was very common it most likely would have been hit and reported a long time ago.|||

Thanks Greg for looking into the post. Rush or not I have no intentions of blame anybody, any software have bugs that is an inherited from complex systems and thousands of lines of code.

I would explain my scenario so you might want to use on your testing environment (I’m assuming that you work for Microsoft). As you said this is not something common and I can’t replicate the problem at will. I have a few servers with the same data base and same publication not all of the servers were affected with this problem, something that tells me that this behavior can’t be replicated easily. To make clear my problem I would say that I have a non-convergence between subscriber and publisher in a pull subscription on the update statement. If the row is inserted at the subscriber any update made at this row in the subscriber will replicate to the publisher but not the other way around , and vice versa if the row is inserted at the publisher any update in the publisher is replicated at the subscriber but the update at the subscriber for this row inserted in the publisher is not replicated at the publisher. Even if I update the row at both the publisher and the subscriber I have no conflicts and I should have since is the same rowguid.

All this start happening after I upgrade my servers to SP4 from SP3 and use the sp_repladdcolumn to add 5 columns to one of my tables. The table needs especial conditions to be involved on this behavior on my case where 5 tables with these conditions and only 2 tables were affected.

1. The tables need to have at least one columns deleted and a few added after. This means that the max(colid) from syscolumns for that table needs to be different than the count(colid) from syscolumns.

This query will tell you those tables.

SELECT *

FROM sysmergearticles

WHERE objid in (SELECT id

FROM syscolumns

GROUP BY id

HAVING max(colid) <> count(colid) )

ORDER BY name

2. the Publication need to be merge replication with pull suscribers, column tracking need to be true in the merge articles and the merge article need to have sub filter query.

3. al least one article needs to have a vertical partition , this will happen any way if you use the sp_repladdcolumn.

4. I don’t know if this is necessary or not but you may need it, the rowguid column in the merge articles can’t be at the end.

On my impresion the problem is that colv1 value (this value is in the msmerge_contents table) generated ussing the missing columns values from the mergearticles table , is generated incorrectly since the value for @.missingbm is incorrectly, and this value is use in the update trigger on this sentence.

set @.cv = { fn UPDATECOLVBM(@.cv, @.nick, @.bm, @.missingbm, { fn GETMAXVERSION(@.lineage) }) }

Another thing that update tiger for the merge articles were correclty generatead in the suscriber but incorreclty in the publisher that is why the colv1 values are diferents between the publisher ans suscribers.

I will post another comment explaing the code inside the store procudure sp_MSaddmergetriggers and sp_MSaddupdatetrigger and how this was done different in 2005 for column leveltracking merge replication

Hope this help. Regards

Pedro R. Lopez.

|||

Pedro, thanks for the feedback. If possible, could you please submit your feedback to the SQL Server Feedback site, this way it gets logged in our bugs database and is on everyone's radar:

http://connect.microsoft.com/SQLServer/Feedback

|||

I think I've run into a similar problem.

I have a table that is being replicated through merge replication. It has been a while since it was first published, so some columns have been added and removed.

A few weeks ago, SQL Server 2000 Service Pack 4 was installed on the server.

Today I got a complaint that one of the columns was not being updated in the subscriber.

While running a trace on the server, I noticed that after updating one column in the table, the stored procedure that does the updating (the sp_upd_<guid> procedure) was called as if the column "next to" the updated column was updated.

Doing some research on this problem, I came across your post

My situation is a bit different from the one described here, this is what happened at my end:

- The table was first already part of a transactional replication
- some columns were added and removed
- then the table was also published via merge replication (push) to another server / database
- one column was added to the table
- Service pack 4 was installed

Things that are different:

- The query posted at point 1) to get the suspect articles doesn't return all the articles when run on the subscriber.
- There are no pull subscribers, only one push-subscriber. No filtering is done.

The solution posted here (manually altering the @.missingbm to the value found in sysmergearticles.missing_cols in the update trigger on the table) works in the Publisher, because there the whole history of the table (added and removed columns) is known.

In the subscriber, this doesn't work, because the missing_cols doesn't contain a value other than the default (0x00). I might be able to use the values from the Publisher, but I'm not 100% sure that this would work. If I try it and it's not the right value, I might be in even deeper trouble.

This has happened on two servers (our development server, and at a client), so this may be a reproducable problem. Our server was updated from service pack 3, at the client service pack 2 was installed before updating to sp4.

So, to summarize my question: What do I have to change to make an update in the subscriber work? You seem to have done a lot of research into this problem, so you might be able to help me out.

Kind regards,

Bart Holthuijsen

|||

Bart keep in the suscriber the value 0x00 since this values is local to the database , but it confuse me because the same table structure shuold be at both the publisher and the suscriber. Now, can you test in your system if the non-convergence problem persist for new rows inserted after you made de fix ? in both direction suscriber-publisher and publusher-suscriber. for the old row that have this problem before, the problem will persist since the colv1 columns in the merge contents table is corructed, maybe deleting this rows and re-inserting again will solve your problem, I hope you know the rows with the problem, this rows are any rows inserted or updated after you install the Sp4 or since you detect the problem.

any other question let me know.

|||Here are the test results for today.

query to get columns in both tables (run at the publisher):
declare @.TableName sysname
set @.TableName = 'Werkrapportveld'
declare @.LocalTableName nvarchar(256)
SET @.LocalTableName = N'<PUBLISHER>.dbo.' + @.TableName
declare @.RemoteTableName nvarchar(256)
SET @.RemoteTableName = N'<SUBSCRIBER>.dbo.' + @.TableName

select @.TableName as TableName, s1.name as ColumnName, s1.colid as LocalColID, s2.colid as RemoteColID
FROM syscolumns s1
left outer join <SUBSCRIBER>.dbo.syscolumns s2
on s1.name = s2.name
-- uncomment this to only get columns with different ids (these are the columns that will not be updated correctly)
--and s1.colid <> s2.colid
where s1.id = object_id(@.LocalTableName)
AND s2.id = object_id(@.RemoteTableName)
order by s1.colid

Results:

TableName ColumnName LocalColID RemoteColID
Werkrapportveld WerkrapportVeldID 1 1
Werkrapportveld Omschrijving 2 2
Werkrapportveld DataTypeID 3 3
Werkrapportveld DatumVanaf 4 4
Werkrapportveld DatumTot 5 5
Werkrapportveld InvoerMedewerkerID 6 6
Werkrapportveld MutatieMedewerkerID 7 7
Werkrapportveld InvoerDatum 8 8
Werkrapportveld MutatieDatum 9 9
Werkrapportveld DataTypeFormatID 11 10
Werkrapportveld rowguid 12 11
Werkrapportveld Format 14 12

The create date of the update triggers:

<PUBLISHER data>:
select crdate
from sysobjects
where name = 'upd_6E0445E642B0409C9CB2A4115359CA75'
2006-06-01 10:48:52.140

Note: this was the day Service Pack 4 was installed

code in the update trigger:
/* only do the map down when needed */
set @.missingbm = 0x00

select missing_cols,missing_col_count,name
from sysmergearticles where name = 'WerkrapportVeld'

missing_cols missing_col_count name
0x0012 2 WerkrapportVeld

<SUBSCRIBER data>:
crdate
2006-06-01 10:49:30.220
(sp4 install date)

update trigger:
/* only do the map down when needed */
set @.missingbm = 0x00

missing_cols missing_col_count name
0x00 0 WerkrapportVeld

** We now add a new column **

exec sp_repladdcolumn
@.source_object = 'WerkrapportVeld'
, @.column = 'dummy'
, @.typetext = 'tinyint null'
, @.publication_to_add = default
, @.schema_change_script = null
, @.force_invalidate_snapshot = 1
, @.force_reinit_subscription = 0
GO

output:
Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'sp_sel_6E0445E642B0409CCF919DB8A88D4CB5_pal'. The stored procedure will still be created.
/*after running the procedure, the sp_sel_6E0445E642B0409CCF919DB8A88D4CB5_pal does exist, so that warning can probably be ignored*/

<PUBLISHER data>:
crdate : 2006-11-03 12:03:52.653

update trigger:
/* only do the map down when needed */
set @.missingbm = 0x0012

missing_cols missing_col_count name
0x0012 2 WerkrapportVeld

<SUBSCRIBER data>:

crdate : 2006-11-03 12:06:12.577

update trigger:
/* only do the map down when needed */
set @.missingbm = 0x00

missing_cols missing_col_count name
0x00 0 WerkrapportVeld

Columns: same, but one exta column
TableName ColumnName LocalColID RemoteColID
Werkrapportveld dummy 15 13

Updates that happen at the publisher work, changes are replicated to the subscriber.

Subscriber updates still go wrong:
When the Format column is updated, the update procedure for rowguid is called (rowguid in the publisher has the colid of Format in the subscriber)
When the dummy column is updated, the update procedure is called for no column (all parameters are 'default'). I think this happens because the dummy column has id 13 in the subscriber. In the publisher, there is no column with that id, so no column is updated.

** Drop the column **

exec sp_repldropcolumn @.source_object = 'WerkrapportVeld'
, @.column = 'Dummy'
, @.force_invalidate_snapshot = 1
, @.force_reinit_subscription = 0
GO

output:
Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'sp_sel_6E0445E642B0409CCF919DB8A88D4CB5_pal'. The stored procedure will still be created.
/*after running the procedure, the sp_sel_6E0445E642B0409CCF919DB8A88D4CB5_pal does exist, so that warning can probably be ignored*/

<PUBLISHER data>:
crdate : 2006-11-03 12:49:59.483

update trigger:
/* only do the map down when needed */
set @.missingbm = 0x0052

missing_cols missing_col_count name
0x0052 3 WerkrapportVeld

<SUBSCRIBER data>:

crdate : 2006-11-03 12:51:39.030

update trigger:
/* only do the map down when needed */
set @.missingbm = 0x0010

missing_cols missing_col_count name
0x0010 1 WerkrapportVeld

Update at the publisher still works correctly.

Subscriber update:
When the Format column is updated, the update procedure for rowguid is called (rowguid in the publisher has the colid of Format in the subscriber)

This is still the same as before adding and removing the dummy column.

Conclusions:
The good news seems to be that the update triggers in the publisher and subscriber databases are only wrong after service pack 4 is installed (bug in the service pack scripts?). If a column is added or removed by the replication procedures after the install, the triggers will be fixed.
Unfortunately, updates that happen at the subscriber are still wrong. Columns that are updated that have a different ColID than in the publisher will update a different column, or no column at all if no column with that id exists in the publisher.

I still don't know how to fix the problem with updates on the subscriber. After adding the dummy column (which alters the update trigger), the trigger code is correct. But somewhere down the line these correct values are used to call the update stored procedure on the publisher, and there the wrong parameters are used. I haven't found out yet how the parameters for the update are determined.

I hope this will help someone determine what the problem is and hopefully find a fix.|||I don't know if anyone is still following this thread hijack, but here are some new findings.

The table trigger at the subscriber is still not correct. It has been altered to work correctly if the newly dropped column was the only column that was dropped, but that is not the case. Any further modification to the database schema seem to work okay too, they just start at the drop of the Dummy column in the example code posted above.

The table triggers for insert / update and delete can be regenerated by calling the "sp_MSaddmergetriggers" system stored procedure for the tables to do. This is easier (and safer) than manually altering the triggers.

I've made a script to do this. It recreates the triggers for all articles that have missing columns.
This should be executed at the publisher and at the subscriber(s).

DECLARE @.Error INT;
SET @.Error = 0;

DECLARE @.TableName SYSNAME;

-- create the cursor
IF(@.Error = 0)
BEGIN
DECLARE Cur_Recreate_Trigger
CURSOR LOCAL FAST_FORWARD FOR
-- get the name of the tables with missing columns
SELECT name
FROM sysmergearticles
WHERE missing_col_count > 0;
SET @.Error = @.@.Error;
END;

--Open the cursor
IF(@.Error = 0)
BEGIN
OPEN Cur_Recreate_Trigger;
SET @.Error = @.@.Error;
END;

IF(@.Error = 0)
BEGIN
-- get the first data
FETCH NEXT FROM Cur_Recreate_Trigger INTO @.TableName;
SET @.Error = @.@.Error;

-- while results found:
WHILE(@.Error = 0 and @.@.Fetch_Status <> -1)
BEGIN
IF((@.@.Fetch_Status <> -2) AND (@.Error = 0))
BEGIN
-- show the table to do
RAISERROR(@.TableName,0,1) WITH NOWAIT;
-- re-create the triggers
EXEC @.Error = dbo.sp_MSaddmergetriggers @.TableName;
END;

IF(@.Error = 0)
BEGIN
-- get the next tablename to do
FETCH NEXT FROM Cur_Recreate_Trigger INTO @.TableName;
SET @.Error = @.@.Error;
END;
END;
END;

After the script has run, the @.missingbm in the triggers have the value that can be found in the sysmergearticles table. Unfortunately, in my case, the information in the subscriber database is incorrect.

While doing some more testing, I found out that during the initial initialization, the missing_col_count column in the subscriber is the same as in the publisher. Since all further database schema modifications were done by the replication stored procedures (sp_repladdcolumn and sp_repldropcolumn) the value for missing_col_count should be the same in both the publisher and the subscriber.

The following query SHOULD return no rows:
SELECT s1.name, s1.missing_col_count, s2.missing_col_count
FROM <publisher>.dbo.sysmergearticles s1
INNER JOIN <subscriber>.dbo.sysmergearticles s2
ON s1.name = s2.name
WHERE s1.missing_col_count <> s2.missing_col_count

Unfortunately for me, some rows are returned for my databases.
I know this can be "fixed" by reinitializing the subscriber, but I'd rather not do that, because that would mean I'd have to take the database offline while doing the re-init. Just doing the re-init also doesn't guarantee that it won't happen again in the future, just that it is fixed for the moment.

Since this problem only affects a few columns in my case, I'd much rather make SQL Server update the missing_col_count and missing_cols in the sysmergearticles table and manually choose which data to use by updating the column I want to keep (update tablename set column = column where <some criteria>). Replication would take care of updating the other database.

Now I just have to find a way to have SQL Server update the sysmergearticles table.

Does anyone know how to do this?

Alternatively, I could manually update the data in the sysmergearticles table, but I haven't figured out what values to use for the columns. Plus, updating system tables is a big no-no.

Any help?|||

Bart on my opinion this is a bug, Problem that you may not be able to solve.

If is possible for you to drop the publication and recreate it again, do it. Before re-create it make sure that you re-arrange the columns in your table to have no gap's in syscolumns, you could do this by script out you table, move the data to a temporaly table and recreate the table move the data back. do this for any table that have the missing_cols > 0 on sysmergeartcicles.

after that , reinit your suscriber. having 0 in missing columns for all your replicated tables is the best.

if you find any solution please , post it into this treat.

|||

There are known bugs in SQL Server 2000 with vertical partitioning and DML.

The whole feature is redesigned in SQL Server 2005 and customers should be moving to SQL Server 2005 when they can.

If this is a serious issue, I would recommend contacting CSS.

|||Thank you both for the feedback.

I fixed this (for now) by re-initializing the merge replication. I'll try not to drop any columns until we've upgraded to SQL Server 2005. I've had my eye on some of the new features in '05 anyway, so hopefully this will convince someone further up the ladder.

I couldn't re-create the tables on the publisher to fill the gaps in the column ids, because the tables are part of another publication too. Plus: this is live data, so the users would not be very happy if I took the database down.

No comments:

Post a Comment