RM9 (now 10) Data Documentation Initiative

Please join me in welcoming Kevin McLarnon ( @kmclarnon ) in the role of Editor on this site and, especially, for volunteering to lead a renewed initiative to improve the documentation we have on the design of the RootsMagic v9 #database. He is counting on users better-versed than himself in sqlite3 and #RM9 (#RM10) to collaborate, contribute to and comment on the #datadefinitions but brings to the process experience with Google Workspace teams, websites and pedagogy.

The locus of activity is the page RM10 Data Dictionary which links to the collaborative spreadsheet on which all are invited to comment. Some of you will be approached for special contributions and you can also contact him directly. You will hear more from Kevin about the project in blog posts and/or Forum messages.

Richard Otter Genealogy Scripts

https://github.com/ricko2001/Genealogy-scripts

Richard Otter is a skilled, professional developer who is also a RootsMagic user and has been building a library of scripts that are worthwhile checking out for possible solutions for your own needs. He publishes them on GitGub, list of them screenshotted below 2023-05-19. Check for updates and fuller descriptions. He has posted notices of significant ones in the Forum and, hopefully, will continue to do so.

Screen capture of scripts on May 19, 2023.

Sources – Newspapers.com – Cleaner Footnotes and Simpler Bibliography

Issues

This project arose from a request Fix & Merge Hundreds of Newspapers.com Sources in the Forum. The poster is a heavy consumer of the Newspapers.com sources through RM’s TreeShare with Ancestry.com and had issues with:

  1. A long Source List in the application and repetitiously long report Bibliographies due to a different Master Source for each page of a newspaper.
  2. Repetitious listing of “Newspapers.com” in Source Names and in the Title in Bibliographies and in Footnotes. Her approach was to manually delete it in every Master Source but still had hundreds to do.
  3. Leading punctuation in the Footnotes and Bibliographies because the Author value is empty in sources from this Ancestry collection.
  4. “N.p.” and “n.d.” notations in Footnotes and Bibliographies when a value for Publisher, Publish Place or Publish Date is empty.

Solution

Because the sources were imported via TreeShare, they are Ancestry Record type, i.e., they are created using the built-in Ancestry Record Source Template. Built-in Source Templates are uneditable through the RM user interface but are defined in the same table that holds user-defined templates. Thus, the built-in templates can be modified by using SQLite to edit entries in the SourceTemplateTable. We can address Issues #3 and #4 by modifying the Footnote and Bibliography sentence templates in the Ancestry Record template. That will be of benefit also to citations having empty values from some other Ancestry Collections (see Ancestry TreeShare – Impact).

Issues #1 and #2 are more challenging because the values of the source and citation variables that appear in the Footnote and Bibliography sentences are stored in a XML data structure. To solve #1, we want to “lump” all citations of a given newspaper Title under one Master Source. That requires that the data that differentiates the Master Sources for a common newspaper must be deleted or transferred from the Master Source to the Citation Details. For example, the Page # must be extracted from the Source Name in the SourceTable and moved to the Detail ([Page] variable in XML) for each Citation of that Source in the CitationTable. There are more steps than that alone for each of that one newspaper’s multiple Master Sources and Citations.

Once all the data manipulations are complete, there will be multiple identical Master Sources for a given newspaper Title. RM’s AutoMerge Sources function can finish the job.

Before/After Screenshots

The database undergoing modification was from RM7, hence the screenshots are of RM7. However, the solution also works with RM8 and RM9.

Before

Example of one source in the original database. Note that the Source Name and the Title variable (italics) are identical and contain the unwanted “Newspapers.com”, the title of the newspaper, the publish date and the page number. The [Page] variable at the Citation level contains description of the item of interest and the publish date (repeated from the [Title] variable). All three sentences have unwanted leading punctuation and white space.
A Master Source for each page cited from a given newspaper. This example of ‘extreme splitting’ of sources is perfectly acceptable for some users while, for others, the long Source List and report Bibliographies are objectionable and ‘lumping’ to one Master Source per newspaper is preferred.

Transition

These Before/After shots of the Edit Source window show the operations needed to prepare Sources and Citations for lumping Sources by Newspaper Title and the resulting sentence previews from an improved Ancestry Record source template.

After

Now just one Arizona Republic in the Source List instead of many individual Page #’s. In some cases such as the Arizona Daily Star at the top of the list, RM’s Source AutoMerge leaves two Master Sources that look identical and it is necessary to Manual Merge the two to end with just one. Despite fields looking identical in the Source Editor, AutoMerge compares the full XML strings of each source and there’s no match if the order of otherwise identical fields is different.

Download Scripts

Procedure

  1. Backup your database in case you need to revert to it.
  2. Open your database with a SQLite manager having RMNOCASE – faking it in SQLiteSpy or RMNOCASE – faking it in SQLite Expert, command-line shell et al and supporting the REGEXP_REPLACE() function.
  3. Load and execute Sources-NewspapersCom-LumpClean.sql.
  4. If the Ancestry Record source template does not have ” – cleaned” appended to it, load and execute SourceTemplate-AncestryRecord-cleaned.sql.
  5. On returning to RM, run Rebuild Indexes in Database Tools.
  6. In RM, open the Source List and run AutoMerge.
  7. If you have two or so remaining sources for the same newspaper using the Ancestry Record template and you wish to have only one, use RM’s Manual Merge for Sources.
  8. Repeat after you have added more Newspapers.com sources via TreeShare.

Notes

  1. Should you have reason to revert the Ancestry Record source template to the format supplied by the application, load and execute in your SQLite manager SourceTemplate-AncestryRecord-Reset.sql, edited to find a RM database file of the same major version number to fetch the built-in format.
  2. Should you upgrade or drag’n’drop to another database, the “Ancestry Record – cleaned” template will revert to the built-in format. Run step #4 on the target database to restore it.
  3. The user reported that TreeShare does not report any change as a consequence of this procedure; it would seem to rely solely on the link to the Ancestry Record stored in the RM7 LinkAncestryTable (AncestryTable in RM8, RM9).
  4. The procedures should work also on RM8 and RM9.
  5. The main script is not what I would call ‘elegant’. It grew like Topsy as I explored the database and evolved the process through a sequence of building blocks. Someone cleverer than I with SQLite might well produce a better, faster version.

Database Revert RM9 to RM8 to RM7

Introduction

Conversion from #RM9 to #RM8 is pretty trivial with little loss except for RM9’s new features. From RM8 to #RM7 is much more complex given the significant structural changes between the two. To convert from RM9 to RM7 is simply a two-stage process with a RM8 compatible file as an intermediary. It’s possible to combine the two into one procedure but that’s for another time.

Losses

Comparison of database properties between the original RM9 database and after having been converted down to RM7 via RM8 and then upconverted back to RM9. Demonstrates that the core data has been preserved with the only variance in the number of Citations (0.05% increase) and the number of Tasks (12% decrease).

That there should be a variance in Tasks is unsurprising because of the structural changes between RM7 and RM9; Task Folders from RM9 could be non-existent in RM7.

The tiny variance in Citations is perhaps more surprising as it might be expected to go in the opposite direction, e.g., citations for Associations in the original RM9 database would be absent in RM7 and unrecoverable in the re-upgrade to RM9. In this case, however, it is because a few of the original RM9 Citations were reused and there were 6131 Citation Links which were transformed to 6131 individual Citations in the downgrade from RM8 to RM7.

The loss of Associations in the downgrade has no effect on the Events count as they are records in a different table, only reported in RM9’s Enhanced Properties List which also reveals the loss of Task folders.

Original RM9 database on the left; to the right is the traveller from RM9 down to RM7 and back.

Of course, the database properties lists do not tell the whole story. It would also be unsurprising to find that Notes, especially formatting, may not be preserved in a downgrade or upgrade due to the major reiterations of the Note Editor code. Certainly, new features in RM8 or RM9 will not be supported in a lower version and those that are database dependent, including Associations, Color Code Sets and Labels, and Saved Search Criteria cannot survive a round trip from a higher version to lower and back.

Stage 1: From RM9 to RM8

The only barrier to RM8 opening a RM9 .rmtree file is the version number of the database. For RM8, it cannot deal with a version higher than 8000; RM9’s is 9000 while RM7’s is 6000 (as it is for RM6). This statement opens the door to RM8:

UPDATE ConfigTable
  SET DataRec=
    CAST(
         REPLACE(DataRec,'<Root><Version>9000</Version>',
                         '<Root><Version>8000</Version>')
         AS BLOB
        )
WHERE RecID=1
; 

The next step is only needed to prevent RM8 from offering the Association fact type (which it cannot support) as a conventional individual type which one could use to add such events, leading to confusion should the database be reopened in RM9.

DELETE FROM FactTypeTable WHERE FactTypeID=902; -- Association Fact Type

--DELETE FROM EventTable WHERE EventType=902; -- Association individual event, if any

The second statement is only needed should such Association individual events have been created before upgrading back to RM9.

A possible enhancement to this down-conversion from RM9 to RM8 might be to convert Associations in RM9 to either Shared Events or simple individual type events with the name of the other person in the event description.

Stage 2: From RM8 to RM7

The heavy work in developing a procedure for this down-conversion was covered in Convert RM8 Database to RM7.

Source Names – Append Surnames

Sources view in RM9 after running the script to append Surnames of people using each Source.

In response to a request in the Forum, Adding Surname to Truncated Source Names in RM9, posted here is a script that may be of use to others. The user’s objective is to facilitate distinguishing which Master Sources (among many similar source names derived from imports from FamilySearch, Ancestry, et al) are relevant to families or persons of interest without clicking down through Citations, Citation Uses and dead ends from which he needs to back out to go onto the next. With 32k Sources, 32k Citations (he’s a ‘source-splitter’) and 113k ‘uses’ among 7k people and 3k families, one can appreciate the scope of his challenge.

Description

This batch script appends to the Source Name the unique surnames of all persons in the database who have a citation or ‘use’ of a Master Source in their profile. The list of surnames is enclosed in drawing symbols: ╣surnamelist╠ acting as bookends. The resulting extended name is gracefully truncated at 256 characters if caused by the operation, the maximum RM9 supports in a drag’n’drop transfer; the value is easily changed in four places in the same statement.

The script execution creates a series of temporary Views (in-memory queries) to build
the final View “TruncNewName” from which the SourceTable is updated with the
surnamelist╠ appended. These Views are lost when the SQLite manager closes the database.

At the start of the script after REINDEXing against the fake RMNOCASE collation, the SourceTable is updated with Names stripped of previous ╣surnamelist╠. Therefore, the script can be rerun again after changes have been made in the database.

Because the database has been REINDEXed against the fake RMNOCASE, the RM Rebuild Indexes tool must be run on returning to work on the database with the RootsMagic app.

Execution time for the script on the sample database with 32k sources is 20-25 seconds on a 5-yr old i5, middling laptop with HDD so it is not prohibitively long to use repeatedly. I do not know how the time would scale with larger databases.

Requires

Requires a SQLite manager that has a “fake RMNOCASE collation” and supports REGEXP_REPLACE(). Script was developed and tested with SQLiteSpy 1.9.16 64-bit and the fake RMNOCASE extension from
RMNOCASE – faking it in SQLiteSpy. Backup before using!

Script File Download

Further

For source ‘lumpers’, the string of surnames might well be too long for the Source Name. Even for ‘splitters’, some source such as a census page for a residential school, hospital, prison could have a long list of surnames. The current Sources view in RM9 is amenable to showing only 100-150 characters depending on screen size and the position of the divider while a drag’n’drop truncates at 256. That may call for two things:

  1. A similar procedure to append surnames to the Citation Name instead of the Surname which still risks field overflow and truncation for heavily ‘reused’ Citations, e.g., the above example and those sometimes resulting from the “Merge all duplicate citations” tool.
  2. An enhancement to RM to provide a popout of the full content of the Source and Citation Names when hovered over or selected.

Working on this project has got me thinking about a means to differentiate what RM thinks are duplicate citations when they differ in Media or Web tags. Stay tuned…

And please comment here or in the Forum on whether this script does anything positive or negative for you!

DBeaver Heredis 2023 and RootsMagic 9

They say a picture is worth a thousand words. This page compares pictures of the #RM9 and Heredis 2023 databases (both SQLite 3) as automatically created by the same tool.

Entity Relationship Diagrams

An Entity-relationship Diagram (ERD) says a lot about a database. @patjones recently posted a RM9 Database Diagram with many of the linkages or relationships among its tables that are made on-the-fly by the SQL queries from the RM software or the ones found on this site. What a database Entity Relationship Diagram editor generates from a database’s self-definition includes the tables, fields and types, stored Triggers (actions on 1 or more tables by some database event) and stored Views (stored queries). So it is interesting to compare the automatically generated ERDs of the RM9 database against that of the Heredis 2023, from the same editor, DBeaver.

RootsMagic 9.0 Diagram

Heredis 2023 Diagram

Comparison

ObjectRootsMagic 9.0Heredis 2023
Tables2965
Views00
Indexes46123
Table Triggers0156

The comparison table and diagram show that the Heredis database is considerably more complex than RootsMagic 9 with more than double the number of tables and nearly triple the number of indexes which accelerate the association of rows across tables and speed up searches. And more divergent is the extensive use of Triggers by H and their complete absence in RM. What do these numbers say about the two databases?

Tables

That there are more tables in H than RM is from a couple of causes:

  • data normalisation: I think H is more rigorous in applying the principal, e.g., it has a separate table for the Alternate Name while RM has but one for all types of names and uses a field to flag which is the Primary record for a given person. Moreover, H has names in multiple tables: the Primary appears to be in the equivalent of RM’s PersonTable (Individus) while there is another for Surnames (Noms) and one for Given Names (Prenoms). Similarly, RM’s MediaLinkTable which relates a media item to a Person, Couple, Event, Source, Citation, Place… (according to an OwnerType value) is achieved in H by separate tables for each ‘OwnerType’.
  • a larger feature set? Hard to say. I’m not using H sufficiently to identify the differences and my understanding of table and field names in French is not thorough.

Views

Neither has stored views (stored queries) so that tells us nothing. I’ve used temporary Views on RM databases, lasting until the database is closed by the sqlite manager. It’s possible that both applications do, too, but we cannot see them from outside the app’s sqlite engine.

Indexes

With double the number of tables, it stands to reason that there could be double the number of indexes. I don’t know why there should be triple but, in the past, I’ve suggested an index be added or revised in RM. SQLite is itself very good at optimising queries and may spawn an index that it thinks is appropriate unless there is a pre-defined one that includes all the key fields of the query. Maybe H developers have attempted to cover all the possible bases with explicit indexes.

What I have yet to find out is if there is any custom collation sequence in H as there is in RM which has given us database explorers so much grief.

Triggers

This is the most pronounced difference between RM and H. H has many routine operations embedded in the database, being executed by the database engine, rather than in the higher level code which then has to talk to the database engine to make the same thing happen. For example, H has a trigger named “MajDateMediasEvenementInsert” which automatically updates the ‘Date modified’ value in its ‘EventTable’ when a link has been added to their ‘MediaLinkTable’. Looks like this:

CREATE TRIGGER "MajDateMediasEvenementInsert"
AFTER INSERT ON LiensMediaEvenement
FOR EACH ROW
BEGIN
UPDATE Evenements SET DateModification=MAX(NEW.DateModification, Evenements.DateModification) WHERE Evenements.CodeID=NEW.XrefEvenement;
END;

Maybe not the best example for comparison because I don’t think RM would update the EventTable when a media item is tagged to an event.

DBeaver

Having searched around for a (free) application that could create an Entity Relationship Diagram (ERD) from a SQLite 3 database, I landed on one that might be described as an Integrated Development Environment (IDE) for databases, the DBeaver Community Edition. It is self-described as a “Universal Database Tool” and I’ve only superficially scratched its surface figuring out how to connect a SQLite database, generate the ERD from its definition and edit it to illustrate relationships. But it is much more powerful than that as it looks to be a multi-database manager with which one can query the database data. And there is a Pro version which adds many more capabilities.

RM9 Data Definitions #datadefinitions

RootsMagic 9.0.0.0 was released on Feb 27, 2023 with some new features necessitating changes to the database design. All previous version #database files undergo an upgrade procedure to the RM9 structure and are not backward compatible. However, #RM9 and #RM8 table structures are identical with the following exceptions:

PersonTable

9 new fields named color1, …, color9 have been added in support of the ColorCode sets 1 through 10. The original “color” field is set 1.

FanTable, FanTypeTable

New tables added in support of the new Associations feature. They bear a resemblance, respectively, to the EventTable and FactTypeTable.

PayloadTable

New table added in support of Saved Criteria Search and Saved Criteria Group. So far:

OwnerType

New OwnerType values have been added in support of the new tables. These will be identified as encountered through usage and testing.

TableRecTypeOwnerTypeOwnerID
Payload1 (SavedCriteriaSearch)80
Payload2 (SavedCriteriaGroup)20TagTable.TagValue, GroupTable.GroupID
FANTable19CitationLinkTable, MediaLinkTable,…

Table Specs

This is a tab-delimited result of a query of the tables.

Remove Duplicate Merged Notes

Introduction

This script was prompted by a discussion in the RootsMagic Community on Feb 10 ’23: Notes are duplicated. The duplicate begins with MERGED NOTE. It is not uncommon, viz this Google search for the MERGED NOTE flag generated by RootsMagic. Its results return many pages, from sites.rootsmagic.com and from other sites with content from RM databases. There must be many more unpublished cases among users who have used RM’s Merge functions.

Problem

From RM4 on, merging of persons preserved the Person (General) Note value of the secondary person by concatenating it with that of the primary person separated by the string ‘– MERGED NOTE ————‘ and some whitespace. In some cases, both persons had the same Note but RM concatenated them anyway. And in others, there have been multiple merges into a person with duplication of notes occurring in any pattern.

Complicating the test for duplication is that some notes are identical except for the loss of trailing whitespace (when GEDCOM is involved) or the dropping of a symbol such as “<” or perhaps some punctuation.

Solution

This script parses the PersonTable.Note into the parts between the MERGED NOTE flags, keeps the alphanumerically distinctly different parts and concatenates them into a new Note with a symbolic demarcation ‘-+-+-+-+-+-+-+-‘ that is less brazen but still useful. It presents a table with both the NewNote and the OldNote for inspection.

Result of executing the script. At this point, the original Note has not yet been updated. The last query in the script generates these results for reviewing, easily done by traversing the Note columns with the arrow keys.

Updating the PersonTable with the new Note is done by executing a selected statement within a comments block. The original flagged Notes are copied into a temporary table from which they can be restored, prior to re-executing the script after updating PersonTable with the revised Note.

Requirements

Uses Common Table Expression (CTE) and the REGEXP_REPLACE() function and was tested on SQLiteSpy 1.9.15 Win32. Any current SQLite manager that has been compiled with a relatively recent SQLite source code (CTE was introduced a few years ago) and has an extension or includes the REGEXP_REPLACE() function using the same syntax as SQLiteSpy should work.

Usage

  1. Close the database in RM with a backup.
  2. Open the database in SQLiteSpy or equivalent.
  3. Load script and execute.
    Executing the whole script (F9 in Spy) does not change the data in the PersonTable; it presents you with a table to compare the draft new Note to the original|current Note. You have to select|highlight the UPDATE statement in the comments at the bottom of the script and execute just it (Ctrl+F9) to apply the new Note to the PersonTable. I set it up that way so that unwanted outcomes might be found and mitigated by editing the original Note BEFORE the database is changed.
  4. Inspect the draft NewNote against the OldNote.
  5. If satisfied with the results, select the first UPDATE statement at the end of the script and execute it (Ctrl+F9 in Spy). See screenshot below. You could add a constraint to the query to update only to the last Person you have reviewed, assuming you are working in order of ascending RIN:
    AND PersonID <= last_PersonID_checked;
  6. If you wish to revert to the original note, select the second UPDATE statement at the end of the script before running the script again (which will update the temporary table PersonNoteOriginal) or before closing the database out of sqlite.
  7. The three temporary views and the temporary table created by the script won’t empty until the script is run a second time, reflecting that the table has been emptied of those with the MERGED NOTE flag; they will all be dropped on closing the database from the SQLite manager.
Select (highlight) this portion of the script and execute only it (CTRL+F9 in SQLiteSpy) to replace the original Note with the new.

Script Download

2023-02-17 Script revised for easier comparison of new vs old Notes and name changed.

Using ChatGPT AI to get help on querying the RM database

Introduction

With all the excitement surrounding how well Artificial Intelligence is doing to write papers that can fool teachers and having seen someone post a message about using it to generate some software coding, I was curious to see what the ChatGPT interactive AI research tool could say about creating a SQLite query of a RootsMagic database. The answer, at the moment, is ‘not much’ but it does tell you how to get started with the SQLite Command-Line Interface (CLI). The composition itself is very good!

Screenshot of part of the response, full response transcribed below.

How do I query my RootsMagic sqlite database?

You can use the SQLite command-line interface to query your RootsMagic database. To do this, follow these steps:

  1. Download and install the SQLite command-line interface if you don’t already have it: https://sqlite.org/download.html
  2. Open a terminal or command prompt window and navigate to the directory where the SQLite executable is located.
  3. Connect to your RootsMagic database by typing the following command, replacing “database_file” with the path to your database file:
sqlite3 database_file
  1. Once you are connected to the database, you can run SQL queries against it by typing them into the command prompt and hitting enter. For example, to get a list of all tables in the database, you can run the following query:
SELECT name FROM sqlite_master WHERE type='table';
  1. To see the schema for a particular table, you can use the .schema command, followed by the name of the table. For example:
.schema people
  1. To see the contents of a table, you can use the SELECT statement. For example:
SELECT * FROM people;

This will display all rows and columns in the “people” table. You can use various clauses in your SELECT statement, such as WHERE, GROUP BY, and ORDER BY, to filter and sort the results.

I hope this helps! Let me know if you have any questions. (How polite and generous is that!?)

What database engine is used for a RootsMagic 7 database?

RootsMagic 7 uses the Firebird (Oops! got that wrong- that’s pre-RM4) open-source database engine to store and manage data in its databases. Firebird is a powerful and reliable database engine that is well-suited for use in genealogy software like RootsMagic. It is capable of handling large amounts of data and can support a wide range of data types and structures. Firebird is also designed to be easy to use and maintain, making it a good choice for software like RootsMagic that is intended for use by non-technical users.

Firebird is based on the InterBase database engine, which was developed by Borland in the 1980s. Firebird was created as an open-source fork of InterBase in the early 2000s, and has since become a popular choice for a variety of applications that require a robust and reliable database engine.