Issues
This project arose from a request in the Forum, “Fix & Merge Hundreds of Newspapers.com Sources”. The poster is a heavy consumer of Newspapers.com sources through RM’s TreeShare with Ancestry.com and had the following issues:
1. A long Source List in the application and repetitiously long report Bibliographies, because each page of a newspaper has its own Master Source.
2. Repetitious listing of “Newspapers.com” in Source Names and in the Title in Bibliographies and Footnotes. Her approach was to delete it manually in every Master Source, but she still had hundreds to do.
3. Leading punctuation in the Footnotes and Bibliographies, because the Author value is empty in sources from this Ancestry collection.
4. “N.p.” and “n.d.” notations in Footnotes and Bibliographies when the value for Publisher, Publish Place or Publish Date is empty.
Solution
Because the sources were imported via TreeShare, they are of the Ancestry Record type, i.e., they were created using the built-in Ancestry Record Source Template. Built-in Source Templates cannot be edited through the RM user interface, but they are defined in the same table that holds user-defined templates. Thus, the built-in templates can be modified by using SQLite to edit entries in the SourceTemplateTable. We can address Issues #3 and #4 by modifying the Footnote and Bibliography sentence templates in the Ancestry Record template. That will also benefit citations that have empty values from some other Ancestry collections (see “Ancestry TreeShare – Impact”).
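As a rough illustration of editing a template row through SQLite, here is a minimal Python sketch against a toy in-memory table. The column subset (TemplateID, Name, Footnote, Bibliography), the TemplateID value, the sentence text and the helper name `apply_cleaned_template` are all assumptions made for the example; they are not taken from the downloadable scripts.

```python
import sqlite3

# Toy stand-in for an RM database; this column subset is an assumption.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE SourceTemplateTable ("
    "TemplateID INTEGER PRIMARY KEY, Name TEXT, Footnote TEXT, Bibliography TEXT)"
)
# Hypothetical template row: the ID and sentence text are illustrative only.
con.execute(
    "INSERT INTO SourceTemplateTable VALUES (?, ?, ?, ?)",
    (1, "Ancestry Record", "[Author], [Title].", "[Author]. [Title]."),
)

SUFFIX = " – cleaned"  # marker appended to the template name


def apply_cleaned_template(con, template_id, footnote, bibliography):
    """Overwrite the sentence templates once and mark the row with SUFFIX.

    Returns 1 if the row was updated, 0 if it was already marked.
    """
    cur = con.execute(
        "UPDATE SourceTemplateTable "
        "SET Footnote = ?, Bibliography = ?, Name = Name || ? "
        "WHERE TemplateID = ? AND Name NOT LIKE '%' || ?",
        (footnote, bibliography, SUFFIX, template_id, SUFFIX),
    )
    con.commit()
    return cur.rowcount
```

Calling `apply_cleaned_template(con, 1, new_footnote, new_bibliography)` a second time changes nothing, because the name already carries the suffix. Checking the suffix in the WHERE clause is simply one way to keep the update from being applied twice.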
Issues #1 and #2 are more challenging because the values of the source and citation variables that appear in the Footnote and Bibliography sentences are stored in an XML data structure. To solve #1, we want to “lump” all citations of a given newspaper Title under one Master Source. That requires that the data that differentiate the Master Sources for a common newspaper be deleted or transferred from the Master Source to the Citation Details. For example, the Page # must be extracted from the Source Name in the SourceTable and moved to the Detail ([Page] variable in XML) for each Citation of that Source in the CitationTable. That is only one of several steps that must be carried out for each of a newspaper’s multiple Master Sources and Citations.
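The page-moving step can be pictured with a short Python sketch. It assumes a Source Name that ends in “ - Page n” and a Fields document shaped like `<Root><Fields><Field><Name>…</Name><Value>…</Value></Field></Fields></Root>`; both the name layout and the XML shape are assumptions for illustration, and the helper names `split_page` and `set_field` are hypothetical. The actual script does this work in SQL.

```python
import re
import xml.etree.ElementTree as ET

# Assumed Source Name layout: "<newspaper title> - Page <n>".
PAGE_SUFFIX = re.compile(r"\s*[-,]\s*Page\s+(\S+)\s*$", re.IGNORECASE)


def split_page(source_name):
    """Return (name without the page suffix, page), or (name, None)."""
    m = PAGE_SUFFIX.search(source_name)
    if not m:
        return source_name, None
    return source_name[:m.start()], m.group(1)


def set_field(fields_xml, name, value):
    """Set the <Value> of the named <Field>, adding the field if missing."""
    root = ET.fromstring(fields_xml)
    fields = root.find("Fields")
    if fields is None:
        fields = ET.SubElement(root, "Fields")
    for field in fields.findall("Field"):
        if field.findtext("Name") == name:
            val = field.find("Value")
            if val is None:
                val = ET.SubElement(field, "Value")
            val.text = value
            break
    else:
        # No field with that name yet: append a new one.
        field = ET.SubElement(fields, "Field")
        ET.SubElement(field, "Name").text = name
        ET.SubElement(field, "Value").text = value
    return ET.tostring(root, encoding="unicode")
```

For each Master Source, the cleaned name from `split_page` would replace the Source Name, and the extracted page would be written into the Fields XML of every Citation of that source with `set_field(citation_xml, "Page", page)`.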
Once all the data manipulations are complete, there will be multiple identical Master Sources for a given newspaper Title. RM’s AutoMerge Sources function can finish the job.
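Before running AutoMerge, it can be reassuring to confirm that identical Master Sources now exist. The sketch below groups a toy SourceTable on three assumed columns (Name, TemplateID, Fields). The criteria that AutoMerge actually applies are not documented here, so treat this only as a rough sanity check; the sample rows and the helper name `duplicate_groups` are hypothetical.

```python
import sqlite3

# Toy stand-in for the SourceTable; the column subset is an assumption.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE SourceTable ("
    "SourceID INTEGER PRIMARY KEY, Name TEXT, TemplateID INTEGER, Fields TEXT)"
)
# Hypothetical rows: two identical sources and one distinct source.
con.executemany(
    "INSERT INTO SourceTable (Name, TemplateID, Fields) VALUES (?, ?, ?)",
    [
        ("The Daily Herald", 1, "<Root/>"),
        ("The Daily Herald", 1, "<Root/>"),
        ("The Evening Star", 1, "<Root/>"),
    ],
)


def duplicate_groups(con):
    """List (Name, count) for sources identical on the compared columns."""
    return con.execute(
        "SELECT Name, COUNT(*) FROM SourceTable "
        "GROUP BY Name, TemplateID, Fields "
        "HAVING COUNT(*) > 1 ORDER BY Name"
    ).fetchall()
```

Any newspaper that appears in the result has at least two sources that match on these columns and is a candidate for merging.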
Before/After Screenshots
The database undergoing modification was from RM7, hence the screenshots are of RM7. However, the solution also works with RM8 and RM9.
[Screenshot: Before]
[Screenshot: Transition]
[Screenshot: After]
Download Scripts
Procedure
1. Back up your database in case you need to revert to it.
2. Open your database with a SQLite manager that provides a stand-in RMNOCASE collation (see “RMNOCASE – faking it in SQLiteSpy” or “RMNOCASE – faking it in SQLite Expert, command-line shell et al”) and that supports the REGEXP_REPLACE() function.
3. Load and execute Sources-NewspapersCom-LumpClean.sql.
4. If the name of the Ancestry Record source template does not already end in “ – cleaned”, load and execute SourceTemplate-AncestryRecord-cleaned.sql.
5. On returning to RM, run Rebuild Indexes in Database Tools.
6. In RM, open the Source List and run AutoMerge.
7. If a few sources using the Ancestry Record template remain for the same newspaper and you wish to have only one, use RM’s Manual Merge for Sources.
8. Repeat the procedure after you have added more Newspapers.com sources via TreeShare.
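For anyone who prefers to run the scripts from Python rather than a GUI manager, the two requirements on the SQLite manager (an RMNOCASE collation and a REGEXP_REPLACE() function) can be approximated with the standard sqlite3 module. This is a sketch under assumptions: the collation is a simple case-insensitive stand-in and not RM’s own, the argument order (text, pattern, replacement) is assumed, and Python’s regular-expression syntax may differ from what the scripts expect. The name `open_rm_database` is hypothetical.

```python
import re
import sqlite3


def rmnocase(a, b):
    """Stand-in collation: case-insensitive compare, not RM's real one."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)


def regexp_replace(text, pattern, replacement):
    """REGEXP_REPLACE(text, pattern, replacement) built on re.sub.

    The argument order is an assumption about what the scripts expect.
    """
    if text is None:
        return None
    return re.sub(pattern, replacement, text)


def open_rm_database(path):
    """Open a database with the fake collation and function registered."""
    con = sqlite3.connect(path)
    con.create_collation("RMNOCASE", rmnocase)
    con.create_function("REGEXP_REPLACE", 3, regexp_replace)
    return con
```

A script file could then be run with `con.executescript(open(script_path, encoding="utf-8").read())`, where `script_path` is wherever you saved the download. Because the collation is only an approximation, running Rebuild Indexes in RM afterwards remains necessary.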
Notes
- Should you have reason to revert the Ancestry Record source template to the format supplied by the application, load and execute SourceTemplate-AncestryRecord-Reset.sql in your SQLite manager, after editing it to point to an RM database file of the same major version number from which to fetch the built-in format.
- Should you upgrade the database or drag and drop to another database, the “Ancestry Record – cleaned” template will revert to the built-in format. Run step #4 on the target database to restore it.
- The user reported that TreeShare does not report any change as a consequence of this procedure; it would seem to rely solely on the link to the Ancestry Record stored in the RM7 LinkAncestryTable (AncestryTable in RM8, RM9).
- The procedures should also work on RM8 and RM9.
- The main script is not what I would call ‘elegant’. It grew like Topsy as I explored the database and evolved the process through a sequence of building blocks. Someone cleverer than I with SQLite might well produce a better, faster version.
thejerrybryan
20 August 2016 00:18:32
The reports that I post-process are usually 50 to 60 pages, so the scale is not quite so grand as yours. Nevertheless, I try to do as much of the cleanup as possible with global replaces. If you want to go that route, you will have to play with it to see what works for your needs and what doesn’t. I usually try to do the global replaces in a text editor, processing the RTF file produced by RM before it has been touched by Microsoft Word. That requires learning a little bit about RTF tags. The other approach (and sometimes I do both) is to do the global replaces from within Microsoft Word. You have to turn on Word’s equivalent of what WordPerfect used to call “Reveal Codes” to expose white space characters to global replace. It’s tricky business, either with a text editor or with Word, but if you can figure out what meets your needs you can save a huge amount of time with global replaces. In case you want to try the RTF file plus text editor option, I wouldn’t recommend Notepad for a file that big. I would recommend Notepad++ instead, a very powerful and free text editor.
On your other questions, I really haven’t looked at them. What you are doing seems much fancier than what I do, and I have to admit that I don’t use Tom’s script. I still do it kind of manually, where I add dummy children to anybody I want to force into the next generation in a narrative report. I leave such dummy children in my production database at all times, and when I’m preparing a report for a family reunion I copy my production database into a reporting database that I can manipulate as necessary to produce the report. I use File->Copy, not Drag-and-Drop or GEDCOM Export/Import, to make the copy. Then, I have an SQLite script that essentially deletes the dummy children without also deleting the FamilyTable entries for the parent of the dummy children. You can’t do the delete of dummy children that I’m talking about from within RM itself, because RM is smart enough to also delete the FamilyTable entry for the person to be carried into the next generation, and using the FamilyTable entries to carry the person into the next generation is the whole point of the exercise. So my delete of the dummy children is an incomplete delete, which is exactly what is needed. But I’m afraid that my rather manual technique doesn’t scale up very well to your use case. That’s part of the reason that Tom wrote his script.
Jerry