Sources – Newspapers.com – Cleaner Footnotes and Simpler Bibliography

Issues

This project arose from a request Fix & Merge Hundreds of Newspapers.com Sources in the Forum. The poster is a heavy consumer of the Newspapers.com sources through RM’s TreeShare with Ancestry.com and had issues with:

  1. A long Source List in the application and repetitiously long report Bibliographies due to a different Master Source for each page of a newspaper.
  2. Repetitious listing of “Newspapers.com” in Source Names and in the Title in Bibliographies and in Footnotes. Her approach was to manually delete it in every Master Source but still had hundreds to do.
  3. Leading punctuation in the Footnotes and Bibliographies because the Author value is empty in sources from this Ancestry collection.
  4. “N.p.” and “n.d.” notations in Footnotes and Bibliographies when a value for Publisher, Publish Place or Publish Date is empty.

Solution

Because the sources were imported via TreeShare, they are Ancestry Record type, i.e., they are created using the built-in Ancestry Record Source Template. Built-in Source Templates are uneditable through the RM user interface but are defined in the same table that holds user-defined templates. Thus, the built-in templates can be modified by using SQLite to edit entries in the SourceTemplateTable. We can address Issues #3 and #4 by modifying the Footnote and Bibliography sentence templates in the Ancestry Record template. That will be of benefit also to citations having empty values from some other Ancestry Collections (see Ancestry TreeShare – Impact).

Issues #1 and #2 are more challenging because the values of the source and citation variables that appear in the Footnote and Bibliography sentences are stored in a XML data structure. To solve #1, we want to “lump” all citations of a given newspaper Title under one Master Source. That requires that the data that differentiates the Master Sources for a common newspaper must be deleted or transferred from the Master Source to the Citation Details. For example, the Page # must be extracted from the Source Name in the SourceTable and moved to the Detail ([Page] variable in XML) for each Citation of that Source in the CitationTable. There are more steps than that alone for each of that one newspaper’s multiple Master Sources and Citations.

Once all the data manipulations are complete, there will be multiple identical Master Sources for a given newspaper Title. RM’s AutoMerge Sources function can finish the job.

Before/After Screenshots

The database undergoing modification was from RM7, hence the screenshots are of RM7. However, the solution also works with RM8 and RM9.

Before

Example of one source in the original database. Note that the Source Name and the Title variable (italics) are identical and contain the unwanted “Newspapers.com”, the title of the newspaper, the publish date and the page number. The [Page] variable at the Citation level contains description of the item of interest and the publish date (repeated from the [Title] variable). All three sentences have unwanted leading punctuation and white space.
A Master Source for each page cited from a given newspaper. This example of ‘extreme splitting’ of sources is perfectly acceptable for some users while, for others, the long Source List and report Bibliographies are objectionable and ‘lumping’ to one Master Source per newspaper is preferred.

Transition

These Before/After shots of the Edit Source window show the operations needed to prepare Sources and Citations for lumping Sources by Newspaper Title and the resulting sentence previews from an improved Ancestry Record source template.

After

Now just one Arizona Republic in the Source List instead of many individual Page #’s. In some cases such as the Arizona Daily Star at the top of the list, RM’s Source AutoMerge leaves two Master Sources that look identical and it is necessary to Manual Merge the two to end with just one. Despite fields looking identical in the Source Editor, AutoMerge compares the full XML strings of each source and there’s no match if the order of otherwise identical fields is different.

Download Scripts

Procedure

  1. Backup your database in case you need to revert to it.
  2. Open your database with a SQLite manager having RMNOCASE – faking it in SQLiteSpy or RMNOCASE – faking it in SQLite Expert, command-line shell et al and supporting the REGEXP_REPLACE() function.
  3. Load and execute Sources-NewspapersCom-LumpClean.sql.
  4. If the Ancestry Record source template does not have ” – cleaned” appended to it, load and execute SourceTemplate-AncestryRecord-cleaned.sql.
  5. On returning to RM, run Rebuild Indexes in Database Tools.
  6. In RM, open the Source List and run AutoMerge.
  7. If you have two or so remaining sources for the same newspaper using the Ancestry Record template and you wish to have only one, use RM’s Manual Merge for Sources.
  8. Repeat after you have added more Newspapers.com sources via TreeShare.

Notes

  1. Should you have reason to revert the Ancestry Record source template to the format supplied by the application, load and execute in your SQLite manager SourceTemplate-AncestryRecord-Reset.sql, edited to find a RM database file of the same major version number to fetch the built-in format.
  2. Should you upgrade or drag’n’drop to another database, the “Ancestry Record – cleaned” template will revert to the built-in format. Run step #4 on the target database to restore it.
  3. The user reported that TreeShare does not report any change as a consequence of this procedure; it would seem to rely solely on the link to the Ancestry Record stored in the RM7 LinkAncestryTable (AncestryTable in RM8, RM9).
  4. The procedures should work also on RM8 and RM9.
  5. The main script is not what I would call ‘elegant’. It grew like Topsy as I explored the database and evolved the process through a sequence of building blocks. Someone cleverer than I with SQLite might well produce a better, faster version.

Source Names – Append Surnames

Sources view in RM9 after running the script to append Surnames of people using each Source.

In response to a request in the Forum, Adding Surname to Truncated Source Names in RM9, posted here is a script that may be of use to others. The user’s objective is to facilitate distinguishing which Master Sources (among many similar source names derived from imports from FamilySearch, Ancestry, et al) are relevant to families or persons of interest without clicking down through Citations, Citation Uses and dead ends from which he needs to back out to go onto the next. With 32k Sources, 32k Citations (he’s a ‘source-splitter’) and 113k ‘uses’ among 7k people and 3k families, one can appreciate the scope of his challenge.

Description

This batch script appends to the Source Name the unique surnames of all persons in the database who have a citation or ‘use’ of a Master Source in their profile. The list of surnames is enclosed in drawing symbols: ╣surnamelist╠ acting as bookends. The resulting extended name is gracefully truncated at 256 characters if caused by the operation, the maximum RM9 supports in a drag’n’drop transfer; the value is easily changed in four places in the same statement.

The script execution creates a series of temporary Views (in-memory queries) to build
the final View “TruncNewName” from which the SourceTable is updated with the
surnamelist╠ appended. These Views are lost when the SQLite manager closes the database.

At the start of the script after REINDEXing against the fake RMNOCASE collation, the SourceTable is updated with Names stripped of previous ╣surnamelist╠. Therefore, the script can be rerun again after changes have been made in the database.

Because the database has been REINDEXed against the fake RMNOCASE, the RM Rebuild Indexes tool must be run on returning to work on the database with the RootsMagic app.

Execution time for the script on the sample database with 32k sources is 20-25 seconds on a 5-yr old i5, middling laptop with HDD so it is not prohibitively long to use repeatedly. I do not know how the time would scale with larger databases.

Requires

Requires a SQLite manager that has a “fake RMNOCASE collation” and supports REGEXP_REPLACE(). Script was developed and tested with SQLiteSpy 1.9.16 64-bit and the fake RMNOCASE extension from
RMNOCASE – faking it in SQLiteSpy. Backup before using!

Script File Download

Further

For source ‘lumpers’, the string of surnames might well be too long for the Source Name. Even for ‘splitters’, some source such as a census page for a residential school, hospital, prison could have a long list of surnames. The current Sources view in RM9 is amenable to showing only 100-150 characters depending on screen size and the position of the divider while a drag’n’drop truncates at 256. That may call for two things:

  1. A similar procedure to append surnames to the Citation Name instead of the Surname which still risks field overflow and truncation for heavily ‘reused’ Citations, e.g., the above example and those sometimes resulting from the “Merge all duplicate citations” tool.
  2. An enhancement to RM to provide a popout of the full content of the Source and Citation Names when hovered over or selected.

Working on this project has got me thinking about a means to differentiate what RM thinks are duplicate citations when they differ in Media or Web tags. Stay tuned…

And please comment here or in the Forum on whether this script does anything positive or negative for you!

Source Citation Sort RM8 #sources #citations

Issue

As for earlier versions, the order in which Citations for a given fact are presented in RM8 is the order in which they were added. Some users wish for them to be sorted alphabetically by name.

Solution

A brute force method was described for earlier versions at Source Citation Sort . In that case, the name of the Master Source was presented in Citation Manager so the CitationTable itself had to be reordered according to the Source Name and required links from tables for media and web tags to track the changes. With the introduction of TreeShare, there was a further complication for links between Citations and Ancestry.com. Revisions to the database structure in RM8 cause that script to fail.

Luckily, those changes in database design for RM8 that enabled reusable citations (master citation concept) make the job of sorting the citations easier and faster than for pre- RM8. It is the new CitationLinkTable that must be re-ordered and that has no knock-on effect for media, web tags or Ancestry.com links which remain linked to the CitationTable which does not need to be re-ordered. The new CitationName field requires that it be the second sorting field after the Source Name.

To sort citations in a RM8 database, load the following script into a SQLite manager with a fake #RMNOCASE extension and execute it against the target .rmtree file (backup first!). Note that the database file likely must not be open in RootsMagic as it may cause a ‘file is locked’ error on the outboard SQLite. When you reopen the database in RM8, run Files > Tools > Rebuild Indexes to clear out indexing errors from the outboard process reported by RM8 Test Integrity.

This post is the outcome of the Forum discussion CitationSort.sql giving an error when run..,

A Sample Query Created with Views #events #colorcoding #sql #views #citations #sources

I don’t post here as often as I probably should. I find that most of my SQL queries are quite specific to my own personal research and are of little general interest. On the other hand, Tom has a genius for coming up with queries that are of very general utility. The queries I’m going to post in this particular message are probably very much in the category of being quite specific to my own personal research, but perhaps some of the techniques I’m using might be of interest to others. In particular, for complex queries I now use SQL VIEW statements to create a number of views, and I then construct my queries from these views. This breaks the query into smaller pieces, which I find simpler. I’m just a simple country boy, so I like simple.

Having said that, breaking complex tasks into many smaller pieces can be taken too far, to the point that managing the pieces can become more complex than the original problem was. The reader will have to decide if I have gone too far in that direction. Also, using VIEW statements can make SQL queries run a bit slower than they would
otherwise. Again, the reader will have to decide if the technique of using views is advantageous or not. It is advantageous for me, but it might not be advantageous for everybody.

The overall query we are going to be working on is to identify facts without sources for a particular sub-population of individuals in my database. So we have to go through a process identifying the individuals in the sub-population, identifying their facts (both individual and family), and identifying which of those facts do not have sources. We start by considering a very simple query to list the values in RM’s PersonTable for everyone who is color coded as red. Events-CitationsMissing-UsingViews.sql 2017-04-17 This compilation of many of the following scripts can be downloaded and executed to list events with no citations.

SELECT P.*
FROM PersonTable AS P
WHERE P.Color = 1;

I’m going to present a number of similar examples in the same general vein, but in the meantime I would point out the following. There is no particular magic about the color red. I just needed an example. I could have chosen a different color or combination of colors for the example. I could could have chosen an RM Named Group for the example, but the example would have been slightly more complicated. Etc. But the point is that it’s very easy in SQL to test for color coding. So I have moved some selection logic that might possibly be very complicated to do in SQL such as “all the descendants of John Doe” out of SQL and into RM. Which is to say, it’s very easy in RM to color code all descendants of John Doe as red but it’s not so easy in SQL to select all the descendants of John Doe.

Having created this query, we can run it to test it. We can then convert it into a VIEW as follows. We DROP the VIEW before doing a CREATE so that the query can be run multiple times. We include IF EXISTS so that the DROP will run correctly the first time before the VIEW has ever been created. And we make a TEMP VIEW so that the VIEW is kept in memory and doesn’t clutter up the actual database.

DROP VIEW IF EXISTS PersonTableView;
CREATE TEMP VIEW PersonTableView AS
SELECT P.*
FROM PersonTable AS P
WHERE P.Color = 1;

Having created a view in this manner, we can then invoke it to actually use it or just to test it again as follows.

SELECT P.*
FROM PersonTableView AS P;

Obviously, if this is all we have done, we have introduced extra work and complexity into an SQL query process for no particular advantage. The advantage comes only when we construct more complicated queries. In particular, we can replace every reference to PersonTable in a complex query with a reference to PersonTableView. Having done so, the query is automatically filtered to include only those individuals who are color coded as red (or whatever filtering we have specified in the definition of PersonTableView).

Here are some other examples of other possible definitions of PersonTableView.

-- PersonTableView Example #1
--
-- This one filters on the color maroon and it omits any individuals whose
-- given name contains the string "dummy". I have a number of dummy people
-- in my database,  denoted by the string "dummy" in the given name field.
-- For many queries, I wish to exclude these dummy people. This query does
-- a JOIN with the NameTable to gain access to the given name, but it does
-- not select any of the columns from the NameTable to be included in the view.
-- A similar idea could be used to filter on a particular surname, e.g.
--
--             AND N.Surname LIKE('Smith%');
--
DROP VIEW IF EXISTS PersonTableView;
CREATE TEMP VIEW PersonTableView AS
SELECT P.*
FROM PersonTable AS P
       INNER JOIN
     NameTable AS N ON N.OwnerID = P.PersonID
WHERE P.Color = 8 AND N.Given NOT LIKE('%dummy%'); -- color 8 is maroon
 
-- PersonTableView Example #2
--
-- This example filters on a Named Group which is defined within
-- RM. I have hard coded the group number as group 37 for simplicity,
-- but it would also be easy to include a JOIN between the GroupTable
-- and the LabelTable to specify the name of the group by its name.
-- In order to hard code the group number, you have to look up the
-- group number in the LabelTable yourself which is completely trivial.
-- For the purposes of this example, I have assumed that we have done
-- so and that the group of interest is group 37. The query
-- does a cross JOIN between the GroupTable and the PersonTable for the
-- purpose of making the selection with a WHERE statement, but no columns
-- from the GroupTable are selected in the query. (A cross JOIN is when you
-- list multiple table names and you are separate them with a comma rather
-- than with a JOIN statement.)
--
DROP VIEW IF EXISTS PersonTableView;
CREATE TEMP VIEW PersonTableView AS
SELECT P.*
FROM PersonTable AS P,
     GroupTable AS G
WHERE G.GroupID=37
         AND
      G.StartID <= P.PersonID AND P.PersonID <= G.EndID;
 
-- PersonTableView Example #3
--
-- This example filters on a small list of people based on their
-- PersonID. Such a VIEW is usually most useful during the
-- development and debugging of a query to target the query
-- towards a very small population of individuals who have
-- the properties of interest to the query.
--
DROP VIEW IF EXISTS PersonTableView;
CREATE TEMP VIEW PersonTableView AS
SELECT P.*
FROM PersonTable AS P
WHERE P.PersonID IN (3,17,33);

Having created a filtered PersonTable called PersonTableView, you might like to have a filtered NameTable called NameTableView to go along with it. Such a view might be created by SQL such as the following.

-- NameTableView example.-- Create a view of the FactTypeTable filtered to exclude dummy facts.
DROP VIEW IF EXISTS NameTableView;
CREATE TEMP VIEW NameTableView AS
SELECT N.*
FROM NameTable AS N
        INNER JOIN
     PersonTableView AS P ON N.OwnerID = P.PersonID;

You could then use NameTableView instead of NameTable in queries, or you could just list your filtered NameTable as follows.

SELECT N.*
FROM NameTableView AS N;

Having created a filtered PersonTable called PersonTableView, you might like to have a filtered FamilyTable called FamilyTableView to go along with it. Such a view might be created by SQL such as the following.

-- FamilyTableView example.
--
-- The FamilyTable is filtered to include only individuals who are included in the filtered PersonTable.
-- The DISTINCT parameter is needed to avoid selecting the same row of the FamilyTable twice if
-- both the Father and the Mother are in PersonTableView. The VIEW is set up to select the family
-- if either or both spouses are in the PersonTableView. If you want the VIEW to select the family
-- only if both spouses are in the PersonTableView, change the OR to AND.
DROP VIEW IF EXISTS FamilyTableView;
CREATE TEMP VIEW FamilyTableView AS
SELECT DISTINCT F.*
FROM FamilyTable AS F
        INNER JOIN
     PersonTableView AS P ON (F.FatherID = P.PersonID OR F.MotherID = P.PersonID);

We could go on in this same vein for a while, just giving examples of how to make queries without actually making a query. Instead, let us now try to make an example that might actually be useful. In particular, let us make a query that lists facts without a source. We start by making a View of the FactTypeTable. As will continue to be the case, much of this view is specific to my own research, but I include it to present ideas.

-- FactTypeTableView example.
--
-- I use a lot of dummy facts - facts that will never appear in any reports but
-- which record "to do" type of information, research log type of information, etc.
-- Probably nobody else than me does this, but it still provides a good example.
-- The VIEW omits these dummy facts. I use the standard Reference Number fact
-- as a dummy fact (I'm phasing out this usage), and facts whose name incudes an
-- asterisk as dummy facts. This query omits such fact types from the VIEW.
-- If this type of filtering is not needed, you could just reference the FactTypeTable
-- directly without creating a VIEW or else have a FactTypeView that includes
-- the entire FactTypeTable.
--
DROP VIEW IF EXISTS FactTypeTableView;
CREATE TEMP VIEW FactTypeTableView AS
SELECT F.*
FROM FactTypeTable AS F
WHERE (F.Abbrev NOT LIKE '%*%' AND F.Abbrev NOT LIKE 'Ref%');

We now use the filtered VIEW of the FactTypeTable to create a filtered VIEW of the EventTable. Obviously, the filtering of the FactTypeTable and the filtering of the EventTable could have been done in a single VIEW. But I find it simpler to split the filtering into separate pieces.

-- EventTableView example.
--
-- This VIEW simply applies the concept of dummy facts from the FactTypeTable to the EventTable.
-- It also adds one new variable so that that the EventTableView has the name of fact type available
-- to display for each event, a piece of data that is not available in the EventTable itself.
-- I wonder about the wisdom if this approach. We could instead leave out the name of the fact type
-- for now and bring it in later in the query. That way, the EventTableView would be a pure
-- representation of what is in the actual EventTable.
--
DROP VIEW IF EXISTS EventTableView;
CREATE TEMP VIEW EventTableView AS
SELECT E.*,
       F.Abbrev
FROM EventTable AS E
       INNER JOIN
     FactTypeTableView AS F ON E.EventType = F.FactTypeID;

We now use the filtered VIEW of the EventTable to create two new filtered VIEW’s of the EventTable – one filtered VIEW for individual facts and one filtered VIEW for family facts. The two filtered views can be combined together with a UNION as required, but care is required because individual facts and family facts work somewhat differently.

-- Two EventTableView's - example to deal with individual facts vs. family facts.
DROP VIEW IF EXISTS IndividualEventView;
CREATE TEMP VIEW IndividualEventView AS
SELECT E.*
FROM EventTableView AS E
        INNER JOIN
     PersonTableView AS P ON E.OwnerID = P.PersonID
WHERE E.OwnerType = 0;
--
DROP VIEW IF EXISTS FamilyEventView;
CREATE TEMP VIEW FamilyEventView AS
SELECT E.*
FROM EventTableView AS E
        INNER JOIN
     FamilyTableView AS F ON E.OwnerID = F.FamilyID
WHERE E.OwnerType = 1;

We are now getting close to our goal of making a report of facts without sources. We require three queries for facts without sources, each of which we make into a VIEW. One query is for individual facts. One query is for family facts where the person is a Father. And one query is for family facts where the person is Mother. The latter two could surely be combined, but I have chosen not to do so. Note that the queries need to test for NULL and testing for NULL with SQL can be tricky and error prone if you are not extremely careful and mindful of how it works. I believe and hope that I have gotten it right. At least my code seems to produce correct results.

-- CitationViewIndividualEvents example.
--
-- We use a LEFT JOIN from the individuals to the citations to identify events
-- without citations, and the test for NULL effectively takes place after
-- the JOIN is complete.
--
-- First, we create a VIEW of the CitationTable filtered on OwnerType = 2 (citations for events)
-- The VIEW will be used several times and has the effect of isolating the test for citations
-- for events to being made at only a single place in the overall query.
--
DROP VIEW IF EXISTS CitationViewEvent;
CREATE TEMP VIEW CitationViewEvent AS
SELECT C.*
FROM CitationTable AS C
WHERE C.OwnerType = 2;
--
-- We now create the three queries for individual events, family events for the father, and family events for the mother.
-- The JOIN structure for the family event queries is slightly more complicated than is the JOIN structure for the
-- individual event query. The extra complication is because the RIN number comes from a different table than the
-- name of the fact when a family fact is involved.
--
DROP VIEW IF EXISTS CitationViewIndividualEvents;
CREATE TEMP VIEW CitationViewIndividualEvents AS   -- query for individual events
SELECT E.Ownerid AS RIN,
       E.Abbrev
FROM IndividualEventView AS E
       LEFT JOIN
     CitationViewEvent AS C ON C.OwnerID = E.EventID
WHERE C.OwnerID IS NULL;
 
DROP VIEW IF EXISTS FatherFactCitations;
CREATE TEMP VIEW FatherFactCitations AS
SELECT F.FatherID AS RIN,
       E.Abbrev
FROM FamilyEventView AS E
       INNER JOIN
     FamilyTableView AS F ON E.OwnerID = F.FamilyID
       LEFT JOIN
     CitationViewEvent AS C ON C.OwnerID = E.EventID
WHERE C.OwnerID IS NULL;
 
DROP VIEW IF EXISTS MotherFactCitations;
CREATE TEMP VIEW MotherFactCitations AS
SELECT F.MotherID AS RIN,
       E.Abbrev
FROM FamilyEventView AS E
       INNER JOIN
     FamilyTableView AS F ON E.OwnerID = F.FamilyID
       LEFT JOIN
     CitationViewEvent AS C ON C.OwnerID = E.EventID
WHERE C.OwnerID IS NULL;

Finally, each VIEW we need has been created. We can list the results with a query like the following. The query lists only the RIN number and the name of each fact for that person with no citation. We could easily list additional information, but I find that the RIN number is all I need. I use it in RM with Magnifying Glass -> Record to navigate to the correct person. The query has a minor quirk that is not worth fixing. Namely, a family fact without a citation will be listed for both spouses even if only one of the spouses is included in PersonTableView. But I regard a missing source as a problem to be fixed, and for a family fact the missing source can be fixed just as well from either spouse. Whenever I fix such a problem, I immediately rerun the query to get a reduced list of problems to be fixed.

SELECT * FROM CitationViewIndividualEvents
   UNION
SELECT * FROM FatherFactCitations
   UNION
SELECT * FROM MotherFactCitations;

I will end my narrative by reiterating what I said at the beginning: many (maybe most) users may find this way of developing queries not to be any simpler than a more normal way. But I really like developing such a script just a piece at a time. And by developing it in the order described, I can easily test and validate each piece of the script as I go. So by the end, it should just work.

Jerry

Discussions & comments from Wikispaces site


ve3meo

Kudos, Jerry

ve3meo
19 August 2016 00:05:58

Great development and exposition! A valuable learning page.

Tom

TreeShare – Link Pasted Ancestry Sources #ancestrycom #TreeShare #citations #sources

Facebook Discussion Forums Discussion
EXPERIMENTAL

When a TreeShare linked Ancestry Source is Memorised and Pasted through RootsMagic Citation Manager, the pasted source loses its TreeShare link. The TreeShare update to the Ancestry Tree creates a new Other Source rather than linking to the Ancestry Source. This script creates a TreeShare link for such pasted Ancestry Sources from that of the original linked Ancestry Source.

Caveat:
The script has had little testing. Possibly there are types of Ancestry Sources whose extID in RM is NOT independent of the person or event, the principal on which the script was designed. Undesired outcomes might result.

Usage

Safest approach is to close out of TreeShare, run the script and reopen TreeShare to work through the updates to the Ancestry Tree.
For a newly downloaded Tree in which editing is just beginning, it is possible to run the script while in the Edit Person screen launched from TreeShare or even in the TreeShare Compare screen. I cannot say yet whether either is any less safe than closing and reopening TreeShare to carry out the updates. It is certainly more convenient and faster.

Download

TreeShare-LinkPastedAncestrySources Rev 2018-01-27 excludes OtherSources

Note to RootsMagic Developers

The Memorise-Paste function should be enhanced to memorise the extID of the linked citation along with its CitationID and to paste both the new Citation record and the new LinkAncestryTable record with the original extID. That procedure would not suffer the ambiguity that the above script has to deal with for Other Sources because it cannot know which linked citation was the original where there are multiples.

Observations

RM created sources are uploaded by TreeShare to become “Other Sources” on the Ancestry Tree. A link is (unreliably) established between the RM Citation and the Other Source as a record in the LinkAncestryTable. Its extID value appears to be person or even event-dependent. Thus the simple matching of an unlinked SourceID and Fields values from the CitationTable with citations that do have TreeShare links, as used for Ancestry Sources is insufficient for Other Sources.

Sources – Adventures in Extreme Splitting #sources #sourcetemplates #replace

This page brings to bear some SQLite tools in support of Jerry Bryan’s discussion in the RootsMagic forums titled Adventures in Extreme Splitting. In summary, the concept is that every unique citation is a Master Source, i.e., all data fields are to be found in the Master Source, none under Source Details. Jerry’s observations are that RootsMagic behaves better as a result in these respects:

  • templated sources export to standard GEDCOM and thus to third-party software and websites with better integrity. See Ultimate Splitting of Sources for Better GEDCOM from RootsMagic (PDF)
  • sources transfer more completely to FamilySearch Family Tree (RM does not transfer Research Notes and Detail WebTags, only Source Text and first Master WebTag so this procedure moves Research Notes and Detail Web Tags to the Master Source fields)
  • easy duplication
  • one page to edit all citations of a source instead of one page per citation, if the change would otherwise be at the conventional Source Details level.
  • one set of images per source which covers all citations of the source
  • a complete source can be created in the Source Manager

A disadvantage is that the Bibliography may be verbose, becoming just an alphabetical sort of the complete endnotes, without duplicates and there is little opportunity to benefit from succinct Short Footnotes.
[Inline comment:

external image user_none_lg.jpg ve3meo Jun 18, 2013

This bears some scrutiny. True if the Bibliography sentence is the same as the footnote but not so if the Bibliography sentence template uses a small subset of the source fields as do many of the built-in templates.
]

Convert Existing Sources and Citations

This query converts all sources and citations and the source templates they use to extremely split sources. It’s a work in progress – feedback invited. The current version converts lumped source templates, sources and citations to extremely split duplicates for review against the original “lumpy” sources. The new split source names are prepended with the carat character (^) to the original source name and appended to it are the name of the person to whom the citation belongs followed by the fact type in parentheses and, conditionally another name for family and alternate name facts. Images, WebTags, Repositories et al are now remapped. This is highly experimental so, of course, you will not want to run it against your working database without the appropriate precautions!!! Try it and see what you get.

  1. May be advisable to run Delete Phantoms and Source Templates – Merge Duplicates first.
  2. Sources-CreateExtremelySplitFromCitations.sql Stop here to review the new “ultimate split” sources and citations against the originals. rev 7 Oct 2014 now handles a Source having no Master Source fields defined in its Source Template (caused a RootsMagic error)
  3. Sources-DeleteAfterSplit.sql 30 Jun 2013. Delete the “lumpy” stuff.
  4. Deletion of unused custom source templates yet to be done; alternatively run the Conversion of Templates procedure below on custom templates so that they are ready to be used as Ultimately Split.
  5. Delete Phantoms: probably advisable to run this after the split.
  6. After the deletion, use RootsMagic Database Tools to test integrity, rebuild indexes if integrity not OK, and compact to recover space.

One problem I have seen with the results is that the differentiation of the Master Source Name due to the appending of the Person Name and Fact Type prevents the Source Manager’s AutoMerge from merging otherwise identical Master Sources. Manual merging is way too laborious so until RootsMagic gives AutoMerge options, another outboard utility is needed. Stay tuned for Sources – Merge Duplicate Masters.

Conversion of Templates

A Free Form template is readily used for extreme splitting as there is but one Source Details field, [Page], which can be left empty and the Master Source fields may be filled arbitrarily with text. There are several built-in templates which are already setup for extreme splitting as they have no Source Details fields:
Artifact, archived
Artifact, Family, photographed (privately held)
Artifact, privately held
Bible Records
Broadside, original
Broadside, reprint
Family Bible Record
Genetic Test (DNA) report
Legal Document, unrecorded (family copy)
Letter, Historic, private
Maps, Historic
Maps, Topographic
Military, Muster Rolls (manuscript)
Photo, Portrait, Archived (Annotated)
Photo, Portrait, Private (Annotated with Provenance)
Photo, Portrait, Private, scanned
School Records (student transcripts)
Slip Laws, Federal
Slip Laws, state
The remaining ~400 templates have Source Details fields that preclude them from being used for extreme splitting unless modified.

The following query moves all Source Details fields into the Master Source for all sources, thus enabling them to be used for extreme splitting.

-- SourceTemplateExtremeSplitConvert.sql
/*
2013-06-18 Tom Holden ve3meo
Converts all fields in the Source Template defined as Source Details or Citation Details
to Master Source fields, thus enabling the template to be used for Extreme Spliting.
 
WARNING - use this on unused templates; it does not convert existing citations and master sources.
*/
UPDATE SourceTemplateTable
SET FieldDefs = CAST(REPLACE(CAST(FieldDefs AS TEXT), '<CitationField>True</CitationField>', '<CitationField>False</CitationField>') AS BLOB)
WHERE 1;-- Apply constraints here, otherwise all Source Templates are modified.
 
-- View FieldDefs
SELECT CAST(FieldDefs AS TEXT)
FROM SourceTemplateTable T;-- view results

As it stands, this query modifies all the built-in templates, which is fine for experimentation but to be avoided in practice. Better to create editable templates first and then run this query with WHERE 1 changed to WHERE TemplateID > 9999. Refer to Source Templates for ways to batch import or convert to editable source templates.

Sources without Media

Needing to find Master Sources lacking media and to move media from Source Details to Master Source as part of his conversion to Extreme Splitting, Jerry posted Query for Sources Without Media.

Inline comments


ve3meo

Comment: This bears some scrutiny. True if the…

ve3meo
18 June 2013 20:21:51

This bears some scrutiny. True if the Bibliography sentence is the same as the footnote but not so if the Bibliography sentence template uses a small subset of the source fields as do many of the built-in templates.

Sources – Unreverse Author Names #sources #tmg

Contents

    To “Unreverse” something is to restore it from a reversed state. Names inadvertently entered into a name type field in a templated source in “reverse” order, i.e., surname, given are not properly handled by RootsMagic 7 and earlier. The resulting sentences for the Footnote, Short Footnote and Bibliography are screwed up. RootsMagic wants entries for name type fields to be in normal or forward order, i.e., given surname and without commas. This would not be a problem for one at a time source creation because it will be readily detected in the sentence previews and can be corrected on the spot. However, it is a much bigger problem if that is the way that many sources were imported, as is the case for direct imports from TMG.

    This script unreverses reverse names found in the Author field of templated sources. There may be other name-type fields used in templated sources, e.g., Editor, Compiler, ItemAuthor, Family, Owner, AgencyAuthor, ChapAuthor, BookAuthor, SuplAuthor, …. The script can be readily modified for each one of these.

    It reverses correctly those entries having a single name with one comma. It will screw up entries having multiple persons in the one field.

    On executing the script, the status display will show that a number of records have been updated. All of the temporary VIEWs that the script creates will be empty and can be manually DROPped and will be gone anyway when the SQLite manager closes the database.

    By partially executing the script up but not including the UPDATE statement, the SourceFieldsNew VIEW will show the original SourceTable.Fields value, the extracted name and the unreversed name.

    Sources-AuthorUnreverse.sql

    Source List Query #sources #sourcetemplates #citations #xml #date

    Description

    Here are the most advanced and useful SQLite queries for reviewing citations in your RootsMagic 4-8 database. They provide some of the same info that you would get in the RootsMagic Source List report and then some you don’t… You have the advantage of sorting and filtering results in ways you cannot within RM and then finding the person and fact using that source in RM. And they provide the key ID’s with which you can find specific table records and possibly edit fields directly using SQLite (if you know what you are doing). What is yet missing is expansion of the data into sentences using the sentence templates; this requires a high level programming language and is unlikely to be added.

    Two Queries

    Two separate queries are needed because basic SQLite managers can only present results in tabular format and screen width is insufficient to show all the fields that make up a fully detailed RM4 Source List report. The two queries provide, respectively,:

    1. List of all the citations (relatively many) with details against the names of the Master Sources (RM4-8 as of 2022-02-21)
    2. List of all the Master Sources (relatively few) with their details but not the Source Details of each citation (RM4-8 as of 2021-03-03)

    Using a SQLite manager that supports two or more queries in separate windows or tabs (e.g. SQLiteSpy), it is easy to have both queries’ results available by toggling between windows. To view the properties of media items, use the Media List Query in a third window.

    1. Source List – Citations

    Download SourceList.sql RMtrix_tiny_check.png for RM4 to #RM7
    2011-11-06 now reports Free Form and orphaned citations; count of media items linked to citation; improved format of CitFields; compatability with managers other than SQLiteSpy.
    2011-11-18 now outputs Source Fields as per MasterSources.sql plus Event Place and Event Site (Place Detail)

    Download SourceList-RM8.sql #rm8 version of SourceList.sql 2022-02-21

    This complex query was built on Romer’s much earlier Source Template List – Query. In so doing, I discovered errors and inconsistencies in and among it and my All Citations – Query and All Citations & Dupes Count – Query. Hopefully, these are now resolved with revisions to all of them. Major additions include date decoding from Date Decoder and the stripping of XML tags from the FIELDS field pioneered in Source Detail View (Parsing XML). The result is a very comprehensive and readable spreadsheet. Here are two examples of output displays from two different SQLite managers, a different set of columns selected for display:

    SourceListQueryScreenShotExpert.png
    From the latest version of the query, as displayed by SQLite Expert. Note the wrapping of text.

    SQLite Expert requires that you comment out the ORDER BY clause in order to sort on a results column; it adds an ORDER BY clause and re-runs the query. It has excellent filtering tools but one can add one’s own WHERE clause to filter results.

    SourceListQueryScreenShot.png
    Screenshot of results of the query from SQLiteSpy

    At the bottom of the screen, you see the content of the highlighted cell. Clicking on any column heading will sort on that one column. The order displayed in this screenshot is defined in the query which can be readily changed to suit your particular requirements. This one is sorted on Source Name, Cit Text, Cit Comment, Person 1, Cited by, EventDate, Person 2 and would be useful for reviewing citations of a source having identical texts.

    Column Definitions

    ColumnDescriptionEditable
    Source Namename of the Master Source; primary sort field in this queryN
    Source Fieldsthe field names and values entered in the Master SourceY
    Cit Fieldsthe field names and values entered in the Source Details screen for a citationY
    Qualcitation quality code (decoding would take up too much screen space but you’ll get the drift)Y
    Cit TextSource Details textY
    Cit CommentSource Details commentY
    Cited byperson, family or fact type that cited the sourceN
    EventDatefact/event date (helps to pick out in the Edit Person screen which of two or more facts of the same type cited the source)N
    Event PlacePlace of the fact/eventN
    Event SitePlace Detail of the fact/eventN
    RIN 1record number of person whose fact cited the sourceN
    Person 1name of person whose fact cited the sourceN
    MRINFamilyID of FamilyTable, the Marriage Record Number that is invisible in RM4N
    RIN 2record number of spouseN
    Person 2name of spouseN
    CitIDCitationID of CitationTableN
    SrcIDSourceID of SourceTableN
    Templatename of Source TemplateN
    TpltIDTemplateID of SourceTemplateTableN
    Tplt Type“OEM”=Built-in Source Template (TemplateID<10000); “USR”=user-defined Source Template (TemplateID>9999)N

    2. Master Sources

    Download MasterSources.sql RMtrix_tiny_check.png #rm7
    2011-11-06 now reports Free Form and orphaned citations; count of media items linked to master source; improved format of SrcFields

    2017-03-21 show unused Master Sources and correct citation count (was offset by 1)

    2020-06-21 the offset ‘corrected’ above was for unused Master Sources and threw all others off by 1; this rev works for both used and unused Master Sources.

    Download MasterSources-RM8.sql #rm8
    2021-03-03 Converted to work with both RM7 and RM8 format – PJ Feb2021

    MasterSourcesScreenShot.png

    Column Definitions

    ColumnDescriptionEditable
    SrcIDSourceID of SourceTableN
    Source Namename of the Master Source; primary sort field in this queryN
    RefNumberMaster Source File #Y
    SrcFieldsthe field names and values entered in the Master Source screenY
    ActualTextMaster Source textY
    CommentsMaster Source commentsY
    IsPrivate1 if Master Source marked private, else 0 but currently unused (?)Y
    Citationsnumber of times Master Source is citedN
    Templatename of Source TemplateN
    TpltIDTemplateID of SourceTemplateTableN

    Discussions & comments from Wikispaces site


    thejerrybryan

    The Queries Don’t Run on My System

    thejerrybryan
    05 November 2011 23:09:17

    Neither query will run. There are no error messages. It’s just that there are no results at all – almost as if my database were empty. But I can hand code very simple queries that demonstrate my chosen SQL manager SQLiteSpy is working ok and that my database is not empty.

    Any ideas?

    Thanks,
    Jerry


    thejerrybryan

    thejerrybryan
    06 November 2011 00:10:08

    I haven’t looked at the first query yet, but the second query won’t run because all I have are Free Form templates. Hence, the following:

    FROM SourceTemplateTable
    INNER JOIN SourceTable ON
    SourceTemplateTable.TemplateID = SourceTable.TemplateID

    yields a NULL result because the Free Form template appears not to be in the SourceTemplateTable.

    Jerry

    Problem Adding Entries to MediaLinkTable #media #views #citations #sources

    I don’t usually post a page here requesting help, but I’ll make an exception in this case. I have created a query that adds rows to the MediaLinkTable. The purpose of the added rows has to do with the fact that I’m an extreme source splitter and therefore I link media files to RM’s Master Sources and not to RM’s Source Details. I now find that because of the way I would like my data to appear in a new product called GedSite, it would be convenient for the same media files to be linked to all the Source Details associated with each Master Source.

    It seems like a query that’s simple enough. After running the query, the MediaLinkTable looks perfect from SQLiteSpy. After running the query, the data look perfect from the RM user interface. After running the query, the data looks perfect in GEDCOM I export for use with GedSite. And after running the query, the data looks perfect in GedSite (well, not quite perfect – there is some duplication of some of the data, but that’s acknowledged to be a bug in GedSite and not a problem in the GEDCOM). So what’s the problem? After running the query, I’m no longer able to tag any media files in RM to any kind of object to which media files can be tagged.

    When I say that after running the query that the data looks perfect in the RM user interface, I mean the following. There are several ways to look at RM’s media tags, and for example you can do it from the Media Gallery. If you highlight a particular media item, you see all the tags in a panel on the right side of the screen. Tags to Master Sources manifest in the list as tags to “sources” and tags to Source Details manifest in the list as tags to “citations”. After running the query, all the tags are there, both the tags to “sources” that were there before running the query and tags to “citations” that were added by the query.

    A minor glitch that I don’t think has anything to do with anything is that the MediaLinkTable is an indexed table, and you have to run RM’s File->Database tools->Rebuild Indexes tool to rebuild the indexes after running the query, but I have done so.

    The symptom when trying to add new media tags from the RM user interface is that doing so initially looks like it has worked, but then what appears to be the newly added tag hasn’t been added after all. There is no error message. This is a wild shot in the dark, but I have the sense that my query needs to do something to “close out” or “commit” what it has done, and I’m probably missing something obvious. So any advice would be most appreciated. The text of my query follows. The rather curious looking condition AND S.Name LIKE(‘*%’) is because all my extremely split sources have a name that starts with an * and all my older lumped sources have a name that doesn’t start with an *. I only want to apply this query to the extremely split sources. The condition ON ML.OwnerType = 3 serves to select only existing media links to Master Sources because those are the ones I want to replicate as links to Source Details.

    Jerry

    P.S. This query needs to do an update on a JOIN and SQLite does not support doing an update on a JOIN, or at least not directly. You can still do an update on a JOIN in SQLite by hiding the JOIN in a sub-query or by hiding the JOIN in a view. I have hidden the JOIN (several of them, actually) in a view.

     

    ----------------------
    -- Create media links for all citations whose master sources have media links,
    -- making the media links for the citations match the media links for the master sources.
    DROP VIEW IF EXISTS CitationsNeedingMedia;
    CREATE TEMP VIEW CitationsNeedingMedia AS
    SELECT C.CitationID AS OwnerID, S.SourceID, S.Name, ML.MediaID, M.MediaFile, 4 as OwnerType,
           ML.IsPrimary, ML.Include1, ML.Include2, ML.Include3, ML.Include4,
           ML.SortOrder, ML.RectLeft, ML.RectTop, ML.RectRight, ML.RectBottom, ML.Note,
           ML.Caption, ML.RefNumber, ML.Date, ML.SortDate, ML.Description
    FROM CitationTable AS C
              JOIN
         SourceTable AS S ON C.SourceID = S.SourceID AND S.Name LIKE('*%')
            JOIN
         MediaLinktable AS ML ON ML.OwnerType = 3 AND ML.OwnerID = S.SourceID
            JOIN
         MultimediaTable AS M ON ML.MediaID = M.MediaID
    ORDER BY S.Name, C.CitationID;
     
    INSERT OR ROLLBACK INTO MediaLinkTable (MediaID, OwnerType, OwnerID,  -- These are the "new" items. In particular
                                                                          -- OwnerType must be 4 and OwnerID must be the CitationID
                                            IsPrimary, Include1, Include2, Include3, Include4,
                                            SortOrder, RectLeft, RectTop, RectRight, RectBottom, Note,
                                            Caption, RefNumber, Date, SortDate,Description)
    SELECT MediaID, OwnerType, OwnerID,
           IsPrimary, Include1,  Include2, Include3,  Include4,
           SortOrder, RectLeft,  RectTop,  RectRight, RectBottom, Note,
           Caption,   RefNumber, Date,     SortDate,  Description
    FROM CitationsNeedingMedia;

    Discussions & comments from Wikispaces site


    ve3meo

    RootsMagic needs to close and reopen the database

    ve3meo
    06 February 2017 22:11:05

    I see the same behaviour, Jerry. Closing and reopening RootsMagic clears the blockage; I expect it is simply a matter of closing and reopening the database. RM must set some internal counter or pointer on one of the affected tables to its last rowid and when it tries to append a record at that pointer + 1, it hits a SQLite error which it does not report. Your script has already created a row at that point and it has to be unique.

    Tom


    thejerrybryan

    thejerrybryan
    07 February 2017 03:21:04

    Thanks. I guess I thought I had already tried the close and open RM trick, but I’ll try it again. In the meantime, here is a question – is this an issue you have encountered before? Which is to say is the same thing likely to happen with any script that adds rows to a table? Most of my queries have been just reports not updates. And my few updates have usually updated a row in place rather than adding new rows. So this is sort of new territory for me.

    Jerry

    Source Citation Sort #sources #citations

    Contents

      A longstanding wish is for RootsMagic to list sources alphabetically in the Citation Manager rather than in the order they were cited for the particular fact. This would seem to be a fairly trivial programming change but as of RootsMagic 7, it has yet to be delivered. Here is a script that accomplishes the same ends by brute force – a complete reordering of the records in the CitationTable. Of course, that requires revising all other tables that point to it.
      CitationSort.png
      This first script is not essential to getting the Citation Manager sorted but does make inspection of the data tables a little easier. It reorders the rows in the SourceTable (the Master Sources) alphabetically by name. It requires a fake RMNOCASE collation extension and you should run RootsMagic File>Database tools>Rebuild Indexes on the database when you return to work on it. SourcesSort.sql

      This script does the job for Citation Manager. It, too, requires the SQLite manager to have a fake RMNOCASE collation but should not cause a failure in the RootsMagic Integrity Check nor require its Rebuild Indexes. CitationSort.sql

      Both were developed using SQLiteSpy.

      I have not tested extensively so be sure to make a backup or work on a copy of your database and check that there are no unwanted consequences before committing to it as your ongoing file.

      N.B. CitationSort.sql needs revision so that it will not disconnect or cross-connect Ancestry Sources linked by TreeShare in RootsMagic 7.5+. It can be used on databases that are not connected by TreeShare. There may be a similar issue for sources linked with FamilySearch through RootsMagic’s FamilySearch Central.