Remove Duplicate Places and Duplicate Place Details

Quote from salkind on 2021-10-14, 6:17 pm
Attached is a script that I used to remove duplicate places and duplicate place details records in an RM8 database. Maybe it will be helpful to others as a starting point.
Uploaded files:
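For readers who can't open the attachment, the general shape of such a script is: build a worktable that maps each duplicate PlaceID to the lowest surviving PlaceID, repoint every record that references a duplicate, then delete the duplicate rows. Below is a minimal sketch of that flow for places only, matching on Name alone for brevity (the attached script tests many more fields and also handles place details). EventTable.PlaceID is assumed to be the referencing column; check it against your RM8 schema, and work on a copy of the database, before running anything like this.

-- 1. Map each duplicate place to the lowest PlaceID carrying the same Name.
DROP TABLE IF EXISTS zDupPlaceTable;
CREATE TABLE zDupPlaceTable AS
SELECT MIN(p1.PlaceID) AS PrimaryID, p2.PlaceID AS DupID, p1.Name AS PrimaryName
FROM PlaceTable p1, PlaceTable p2
WHERE p1.PlaceType = p2.PlaceType
  AND p1.Name = p2.Name
--  AND p1.Latitude = p2.Latitude   -- stricter field tests can be toggled off with a leading "--"
  AND p1.PlaceID < p2.PlaceID
  AND p1.PlaceType = 0
GROUP BY p2.PlaceID;

-- 2. Repoint events that use a duplicate place to the surviving place
--    (EventTable.PlaceID is an assumed column name).
UPDATE EventTable
SET PlaceID = (SELECT PrimaryID FROM zDupPlaceTable WHERE DupID = EventTable.PlaceID)
WHERE PlaceID IN (SELECT DupID FROM zDupPlaceTable);

-- 3. Delete the now-unreferenced duplicate places.
DELETE FROM PlaceTable
WHERE PlaceID IN (SELECT DupID FROM zDupPlaceTable);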
Quote from Chad Kurszewski on 2021-11-29, 9:40 am
This is exactly what I was looking for...very timely!
I have a large imported GEDCOM that is plagued with poor standardization of place names. PlaceClean does a good job of fixing many of these, though it is still fairly labor-intensive; the manual Merge is far more laborious. I have PlaceCleaned A-F, and this script reduced unique Places by nearly 10%, from 18k to 16.5k entries.
Thanks!
Chad

Quote from Tom Holden on 2021-12-30, 11:42 pm
Thanks @salkind for posting this. I don't recall seeing a need for merging duplicates pre-RM8 and never addressed it. Do you know why it became a need in your case?
Perhaps @we9v describes it - RM8's manual merge being laborious. I've seen some commentary on it being more so than in RM7. Must confess that I am an infrequent RM8 user at this stage.

Quote from Matthew Friend on 2022-06-27, 1:01 pm
Have there been any changes to the RM8 database since this was written? I tried running the script in two different SQLite IDEs and it doesn't seem to find anything (though I have hundreds of dups after PlaceCleaning and merging GEDCOMs in).

Quote from Tom Holden on 2022-06-27, 9:07 pm
No changes that I can recall, and the script runs without SQLite error on a database created under RM 8.1.8.
The lack of reported duplicates in the zDupPlaceTable and zDupPlaceDetailTable in your case, despite believing you have many, is likely due to the rigorous test for duplication. Almost every field in each record of the PlaceTable is tested; to be declared a duplicate place, every field tested in one record has to match every field in another. Because you have been dealing with multiple sources of data for Places, you may well have cases where the Name fields match but one of Abbrev, Normalized, Latitude, Longitude et al. does not. You could edit the script to comment out the tests for fields you don't wish to compare. @salkind kindly formatted the script so that each field test is on a separate line; just preface a line with two consecutive hyphens to comment it out.
To compare on Name alone, you could shorten the statement that creates the zDupPlaceTable to:
DROP TABLE IF EXISTS zDupPlaceTable;
CREATE /* TEMP */ TABLE zDupPlaceTable AS
SELECT MIN(p1.PlaceID) AS PrimaryID, p2.PlaceID AS DupID, p1.Name AS PrimaryName
FROM PlaceTable p1, PlaceTable p2
WHERE p1.PlaceType = p2.PlaceType
  AND p1.Name = p2.Name
  AND p1.PlaceID < p2.PlaceID
  AND p1.PlaceType = 0
GROUP BY p2.PlaceID
ORDER BY PrimaryName;
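The script also builds a zDupPlaceDetailTable for place details. If you want the same Name-only relaxation there, a corresponding statement might look like the sketch below. It assumes place details are PlaceTable rows with PlaceType = 2 whose MasterID column points at the owning place; confirm those names against the attached script before acting on the results.

-- Name-only duplicate test for place details: two details count as duplicates
-- when they belong to the same master place and share the same Name.
-- (PlaceType = 2 and MasterID are assumed to describe place details in RM8.)
DROP TABLE IF EXISTS zDupPlaceDetailTable;
CREATE /* TEMP */ TABLE zDupPlaceDetailTable AS
SELECT MIN(p1.PlaceID) AS PrimaryID, p2.PlaceID AS DupID, p1.Name AS PrimaryName
FROM PlaceTable p1, PlaceTable p2
WHERE p1.PlaceType = p2.PlaceType
  AND p1.MasterID = p2.MasterID
  AND p1.Name = p2.Name
  AND p1.PlaceID < p2.PlaceID
  AND p1.PlaceType = 2
GROUP BY p2.PlaceID
ORDER BY PrimaryName;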

Quote from Matthew Friend on 2022-06-27, 10:00 pm
Thank you Tom, your code worked perfectly.
That removed a little over 630 duplicates 🙂
Matthew

Quote from Tricia Aanderud on 2024-06-09, 12:41 pm
I was not able to get the query to work with the MIN function.
So I did a workaround: I created a temp table and loaded it with the duplicates. A few extra steps, but as far as I can tell it worked.
Note: this program only looks at the name - it's not strict at all.
Any feedback welcomed.
Uploaded files:
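The attachment isn't reproduced above, but a workaround along those lines (an explicit temp table filled with Name-only matches, written so that no MIN() aggregate is needed) might look something like the following sketch; it borrows the table and column names from Tom's statement earlier in the thread rather than from the attached script.

-- Create the worktable explicitly instead of with CREATE TABLE ... AS SELECT.
DROP TABLE IF EXISTS zDupPlaceTable;
CREATE TEMP TABLE zDupPlaceTable (
  PrimaryID   INTEGER,
  DupID       INTEGER,
  PrimaryName TEXT
);

-- Pair every place with an earlier place of the same name. The earlier row that
-- has no still-earlier namesake is the survivor, so no MIN() is required.
INSERT INTO zDupPlaceTable (PrimaryID, DupID, PrimaryName)
SELECT p1.PlaceID, p2.PlaceID, p1.Name
FROM PlaceTable p1
JOIN PlaceTable p2
  ON p1.Name = p2.Name
 AND p1.PlaceID < p2.PlaceID
 AND p2.PlaceType = 0
WHERE p1.PlaceType = 0
  AND NOT EXISTS (SELECT 1
                  FROM PlaceTable p0
                  WHERE p0.Name = p1.Name
                    AND p0.PlaceType = 0
                    AND p0.PlaceID < p1.PlaceID);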