Forum

Please or Register to create posts and topics.

Remove Duplicate Places and Duplicate Place Details

Attached find a script that I used to remove duplicate places and duplicate place details records in an RM8 database.  Maybe it will be helpful to others as a starting point.

Uploaded files:
Tom Holden has reacted to this post.
Tom Holden

This is exactly what I was looking for...very timely!

I have a large imported GEDCOM that is plagued with poor standardization of place names.  PlaceClean does a good job at fixing many of these, which is still fairly manually intensive, but the manual Merge is far more laboriously intensive.  I have PlaceCleaned A-F and this script reduced unique Places by nearly 10%, from 18k to 16.5k entries.

Thanks!
Chad

Thanks @salkind for posting this. I don't recall seeing a need for merging duplicates pre-RM8 and never addressed it. Do you know why it became a need in your case?

Perhaps @we9v describes it - RM8's manual merge being laborious. I've seen some commentary on it being more so than in RM7. Must confess that I am an infrequent RM8 user at this stage.

Has there been any changes to the RM8 database since this was written?   I tried running the script in two different SQLite IDE's and it doesn't seem to find anything (though I have hundreds of dups after placecleaning and merging gedcoms in)?

No changes that I can recall and the script runs without SQLite error on a database created under RM 8.1.8.

The lack of reported duplicates in the zDupPlaceTable and zDupPlaceDetailTable inyour case despite believing you have many is likely due to the rigorous test for duplication. Almost every field in each record of the PlaceTable is tested; to be declared a duplicate place means that every field tested in one record has to match every field in another. Because you have been dealing with multiple sources of data for Places, you may well have cases where the Name fields match, but one of Abbrev, Normalized, Latitude, Longitude et al does not. You could edit the script to comment out those fields that you don't wish to test.  @salkind kindly formatted the script so that each field test is on a separate line. Just preface a line with two consecutive hyphens to comment it out.

To compare on Name alone, you could shorten the statement that creates the zDupPlaceTable to:

DROP TABLE IF EXISTS zDupPlaceTable;
CREATE /* TEMP */ TABLE zDupPlaceTable
AS
    select min(p1.placeid) primaryid, p2.PlaceID dupid, p1.name primaryname
    from PlaceTable p1,
         PlaceTable p2
    where p1.PlaceType = p1.PlaceType
      and p1.Name = p2.Name
      and p1.PlaceID < p2.PlaceID
      and p1.PlaceType = 0
    group by p2.placeid
    order by primaryname;

Thank you Tom, your code worked perfectly.

That removed a little over 630 duplicates 🙂

Matthew