In Praise of SQLite

Just learned of some interesting things about SQLite in this article that popped up in my Google News feed: In Praise of SQLite by Nikos Vaggalis, dated 28 July 2022. The latest version of SQLite is 3.39.2. I don’t monitor releases so this story made me aware of some significant changes delivered in 3.39.0 and other developments in the library and in extensions. I’ll highlight some that I think will be of potential interest to RootsMagic Toolmakers but recommend you read the original report. Check your current SQLite manager for updates that may bring it up to using the 3.39.x library.

  • Changes:
    • Added (long overdue) support for RIGHT and FULL OUTER JOIN
    • Added the sqlite3_db_name() interface
    • The HAVING clause of a SELECT statement is now allowed on any aggregate query, even queries that do not have a GROUP BY clause
  • sqlean, a Github repo that collects extensions covering all the missing SQLite functions: #sqlean
    • fileio: read and write files
    • fuzzy: fuzzy string matching and phonetics
    • re: regular expressions
    • text: string functions
    • unicode: Unicode support
    • uuid: Universally Unique IDentifiers
    • vsv: CSV files as virtual tables
    • closure: navigate hierarchic tables with parent/child relationships
    • envfuncs: read environment variables
    • eval: run arbitrary SQL statements
    • isodate: additional date and time functions
    • text2: additional string functions
    • xmltojson: convert XML to JSON string
    • zipfile: read and write zip files
  • Utilities:
    • LiteTree, a modification of the SQLite engine to support branching, like git!
    • SQLSite, a tool for serving simple websites, static files and JSON APIs directly from a SQLite database.
    • sql.js, SQLite compiled to JavaScript.sql.js is a javascript SQL database. It allows you to create a relational database and query it entirely in the browser

New App Aids RM Media Management

TestExternalFiles by Richard J. Otter is a new Windows utility that helps identify these issues:

  • A file on disk may get renamed, or moved, breaking the link
    from the database.
  • A file may be added to the media folder on disk but then not attached to the
    desired database element.
  • A file may be added to RM, but then detached from all sources, facts etc , leaving it
    “un-tagged”.
  • A file may be added to the database more than once.

That #media is one of the largest tags in the cloud in the sidebar says something about needed improvements in RootsMagic’s media management over its lifetime to date. Previously published scripts or apps may also have addressed one or more of these issues but what is novel or different about TestExternalFiles is:

  • Developed in the Python programming language
  • Executable either from the Python script (.py) if you have Python installed or as a standalone command line utility (.exe) for which #Python need not be installed
  • Optional exclusion of specified folders and/or files from the directory listing to be compared with the database’s collection of media items
  • Outputs a .txt file listing the results of its queries under these headings:
    • Files Not Found
    • Unreferenced Files
    • Referenced Folders
    • Files with no Tags
    • Duplicated Files

TestExternalFiles has no user interface. One edits a .ini file to specify paths to the database file under test, the root folder for the media files and the output file along with setting which options are to be run and listing the exclusions. Double-clicking on the .py or .exe in the File Manager (or otherwise launching the utility) results in a brief flash of a blank window and it’s done. It is fast! Look for and open the output file to see the results of its analysis and act on as wanted.

Download the latest release from Richard’s github server ricko2001/Genealogy-scripts where you will also find fuller documentation and a bunch of other tools he has developed for his own use. The quality of his coding reflects his professional career in software development.

Convert RM8 Database to RM7 #rm8 #rm7

Issue

As of writing, still no Book Publisher feature in RM8 and along with other persistent issues, some users wish to return to the RM7.5 database from which they had ‘upgraded’. However, the only ways back using the RM user interface are lossy:

  • #GEDCOM which, along with other attendant losses (see GEDCOM & DnD transfer losses) does not import #namedgroup and custom #reports.
  • RM8’s File > Export Data > DropBox translates the active .RMTREE file to a RM7 compatible .RMGC file intended for use with the RootsMagic Mobile app (now retired for Android) but it loses shared events, and likely more.

A more complete conversion is wanted.

2023-05-04: Update re #RM9: despite the inclusion of the Book Publisher feature missing from RM8, there are RM9 users who seek a way to return to RM7 for other reasons. A tweak to a RM9 database to change its version number to RM8 opens up the solution below to carry through to RM7. See Database Revert RM9 to RM8 to RM7.

Solution

Both RM7 and RM8 use similar SQLite databases so, obviously, a transfer of data through SQLite operations is the most direct means possible. However, “similar” is not “identical” and there are challenges with transferring Tasks, TreeShare, FamilySearch, Custom Reports and Report Settings and other File Settings, History, Bookmarks, et al. The script above provides the most thorough and complete conversion achieved to date and is built on and supersedes:

For a full understanding of what has gone into the current procedure, they remain useful references.

Procedure

  1. In RM8, open and prepare your RM8 database file for export by running the set of File>Tools>Database Tools. Note the file’s full path and name for entry into the SQLite script. Close the database.
  2. In RM7, create a new empty database or open the database you wish to overwrite (I’ll leave it to you to backup or make a copy!). If an existing file, run the RM7 set of File>Database Tools. Note the file’s full path and name so you can find it with your SQLite manager. Close the database.
  3. Open the target RM7 database with your SQLite manager (one with a fake RMNOCASE collation – see #rmnocase).
  4. Open your SQLite manager’s SQL Editor on this database file and load into it the script listed above under Solution.
  5. Edit this script to change the path and name in the ATTACH DATABASE line near the beginning to that of your source RM8 .rmtree database file. Edit 2 more lines near the end flagged by **** to explicit media paths.
  6. Execute the script.
  7. On reopening the target RM7 database with RM7, run File>Database Tools.
  8. If Media links are broken despite the edits you made to the script, use the Media Gallery’s Fix Broken Media Links tool and/or, as appropriate, Search & Replace (Ctrl+H) on Media filenames to fix them.

Notes

  1. Copies over the ConfigTable records from RM8 which can include Custom Reports and Book Publisher settings that originated in the RM7 database which had been upgraded to RM8, thereby making the round trip from RM7 to RM8 back to RM7 pretty much intact, except for possible losses in the area of Research Logs and Folders due to structural differences.
  2. This procedure is orders-of-magnitude faster than GEDCOM export-import, approximately 2 minutes from a 330MB RM8 database file, a few seconds for a small one. 
  3. A surprising outcome from this script development is that I’ve used the script to demonstrate  that RM 8.1.8 TreeShare falsely reports mismatches between Ancestry Sources and RM’s copies after applying ‘Merge all duplicate citations ‘. See: TreeShare mysteries from RM8.1.8 ‘Merge all duplicate citations’ (likewise for RM8.2).
  4. At some point, a RootsMagic 8 update or its successor will change the database in a way that breaks the script or causes an incompatibility.
  5. RootsMagic 7 is no longer being developed so features such as TreeShare and FamilySearch integration will break when those services change their APIs (the Application Program Interface with which RM7 interacts).
  6. Color-coding suffers in translation because RM8 has 28 colours, 13 of which are outside the range for RM7 and those that are within do not all map to the corresponding color.
  7. I was ‘lazy’ with data typing despite there having been changes between the two versions. SQLite itself is also ‘loose’ with enforcement. For example, many fields in RM7 that were type BLOB became type TEXT in RM8. My prior experience with RM in the past was that it did not care when the content was textual so I’ve made no attempt to CAST these TEXT fields back to BLOB on import. Yet, it’s possible it may give rise to some obscure error.
  8. I get a memory access error in citations using Find Everywhere on one file that originated in RM7 and imported back from RM8, yet there is no such error in the original nor in the RM8 upgrade. That’s an obscure error not present in other files I’ve converted.
  9. Mac users probably cannot run RM7 any more so this procedure is of little interest except for its potential to solve some RM8 problems by bouncing the data down to RM7 and back to RM8.
  10. Please let me know what errors you encounter, discoveries you’ve made, benefits you’ve realised…, probably best through the Forum, given its message editor is superior to the Comment editor.
  11. Good luck!

RMTREE Properties Query for RM8

This is an ‘upgrade’ from RMGC_Properties – Query #database for pre-RM8 databases. It provides an expansion over the information provided on the RM8 Home – File Properties screen. With a SQLite manager that supports runtime variables, a summary report can be given (just those variables with no leading hyphen).

It ATTACHes a reference empty, unused RM8 database that you provide from your up-to-date version of RM8 against which certain built-in items are compared, e.g., Fact Types and Source Templates, from your database under test. You’ll need to edit a line in the script to point to that empty database on your system.

Here’s an example of the script’s output for a database that was created around the time of the first public release:

ValueVariableRemark
6000Versionvs Control version 8000
– pre/post RM8 releaseWARNING! Database from a pre-release version of RM8
1608Peopleall records in PersonTable
0– Nameless Peopleno record in NameTable for that RIN
153– Unresolved Duplicate Name Pairspairs of Given and Surnames, not flagged as “Not a Problem”
20– Resolved* Duplicate Name Pairsflagged as “Not a Problem” – flags lost on transfer
5– Unresolved Duplicates with Media Linkssecondary persons’ links lost on merge
257Alternate namesall records in NameTable where IsPrimary=0
0– Orphaned Alternate names*no Primary name record found
551Familiesall records in FamilyTable
69Fact Typesno. of records from FactTypeTable
5– Custom Fact Typesno. of custom Fact Types
9– Customised Built-in Fact Typesno. of built-in Fact Types modified
34– Unused Fact Typesno. of Fact Types not used
0– Blank Fact Type NamesFactTypes must be named
0– Blank FactType SentencesFactType needing definition
6733Eventsall records of EventTable
0– Orphaned Eventsevents for which no person or family match in respective tables
138– Event WitnessesAll records in WitnessTable of persons sharing events
17— Nominal Witnessesnot Persons from database, but named in WitnessTable as sharing an event
0— Headless WitnessesPersonID (RIN) in WitnessTable missing from PersonTable
0— Witnesses to Lost EventsEventID in WitnessTable cannot be found in EventTable
0— Witnesses with blank Roleno role has been assigned from RoleTable or the RoleTable role is empty
5— Witnesses with Custom Sentencea custom sentence has been assigned, unique to this witness
26— Witnesses with Notea note has been entered for this witness to an event
84— Rolesno. of records from RoleTable
26— Custom Rolesno. of custom roles
1— Customised Built-in Rolesno. of built-in roles modified
54— Unused Rolesno. of roles not used
0— Blank Role NamesRoles needing definition
2— Blank Role SentencesRoles needing definition
1485Total Placesall records in PlaceTable incl Places and Place Details (Sites)
177– System Placessystem supplied Places: LDS Temples
1081– User Placesuser defined Places excl Sites
330— Used, having Geo-coordinatesnon-empty Lat or Long
42— Unused User Places*not used by EventTable, will be dropped in a transfer
227— User Place Detailsuser defined Sites
22— Used, having Place Detail Notes*Site Notes will be lost in a transfer
60— Used, having Geo-coordinatesnon-empty Lat or Long
13— Unused Place Details*Sites will be lost in a transfer
833Source Templates# of records from SourceTemplateTable
418– Custom Source Templates# of custom Source Templates
415– Unupdated Built-in SourceTemplates# not matching reference database
392– Unused Custom SourceTemplates*lost on transfer
1– Incomplete Source Templatesmissing part of definition
256Total Sourcesall records from SourceTable
3– Unused Sources*SourceTable records unused by CitationTable
6080Total Citationsall records from CitationTable
6080Total Citation Linksall records from CitationLinkTable
11– Duplicate Citationsidentical in most respects, cluttering reports
0– Sourceless Citations*no SourceTable record for this CitationTable record
54– Headless Citations*CitationTable records for which no Person, Event, Family, AltName found; cleaned on transfer
18Repositoriesall records from AddressTable of type Repository
827Multimedia itemsall records from MultimediaTable
7– lacking thumbnailprobably an imported reference to an image file that has yet to be found
0– duplicate multimedia filenamesprobably having different paths
25– with Date & Description*(TBC) if a record has both, the Description is lost in a transfer
3894Multimedia linksall records from MediaLinkTable
0– duplicate multimedia linksimage appears multiple times for person, fact
26Addressesall records from AddressTable of type Address
0– blank namesName field of AddressTable record is blank
43Tasksall records from TaskTable
2– Research Logall records from TaskTable of Type 1
41– ToDoall records from TaskTable of Type 2
0– Correspondenceall records from TaskTable of Type 3
1Foldersall records from TagTable of Type 1
34Groups*all records from TagTable of Type 0
* NOT TRANSFERABLEvia GEDCOM or Drag&Drop to another RM database

Future additions could include information about the FamilySearch and Ancestry linkages.

Returning to RM7.5 custom reports, groups and Publisher from RM8 via GEDCOM

UPDATE 2022-05-14: better than GEDCOM is to convert directly from RM8 to RM7 for which there are a couple of possibilities:

  1. RM8’s File > Export Data > DropBox creates a copy of the active .RMTREE file to a RM7 compatible .RMGC file intended for use with the RootsMagic Mobile app (now retired for Android). It is a complete conversion except, in my test, for Roles for shared events, which should be an easy fix for the developers. In my test, the RoleTable was empty which invalidates every shared event.
  2. Use one of my SQLite scripts that were developed subsequent to this original post:
    Direct Import of RM8 database into RM7 – Part 1
    Direct Import of RM8 database into RM7 – Part 2

Once the RM8 Export to DropBox is fixed, it would be operationally more attractive and is supported by RM Inc.

Issue

As of writing, still no Book Publisher feature in RM8 and along with other persistent issues, some users wish to return to the RM7.5 database from which they had ‘upgraded’. However, the only way back using the RM user interface is via #GEDCOM which, along with other attendant losses (see GEDCOM & DnD transfer losses) does not import #namedgroup and custom #reports. The original #RM7 database contains the custom reports and book specifications and possibly useful groups as they existed at the time of committing to the upgrade but importing into it results in an overload of duplicates, as described by PatrickR in his Apr 9 post to the RootsMagic User Community: Is there a way to copy custom reports to another database?

A solution

Years ago, I had an approach to creating a quasi-empty shell database file from a populated database that I thought might be adapted to this particular case: Depopulate but keep Customs, Places, Sources. Maybe a variant that depopulated but instead kept groups, custom reports, and book definitions could be used to import the GEDCOM from RM8 to good effect. I’m pleased to report that my experiment was successful, to the extent that GEDCOM transfer allows.

Procedure

  1. Review GEDCOM & DnD transfer losses and, in your RM8 database, ensure that your Fact Types are all set to export to GEDCOM. Run the RM8 database tools and Export the database having chosen to export everyone and everything.
  2. In RM7, make a copy or backup of the original database from which the RM8 ‘child’ was created. Run the RM7 database tools. Close the RM7 database file.
  3. Open the RM7 database file with your SQLite manager (one with a fake #RMNOCASE extension), load and execute this SQL file:
  1. Reopen the now-depopulated database file with RM7.
  2. Run the RM7 Database Tools.
  3. Import the GEDCOM file from RM8 choosing the option
    Preserve record numbers
  4. Done! Review what you’ve got. Note that the Media Gallery has to build thumbnails anew.

Possibilities

It should be possible to transfer data more completely by directly copying data from the RM8 database to the RM7.5 database via SQLite. However, because of changes in database structure, it’s not straightforward for several tables. Also, there may be serious incompatibility between the ConfigTable content of the two versions which may preclude the carryover of custom reports and other settings.

Privatise Extra BMD Events #events #reports #private

Problem

Many persons in my database have multiple Birth events created from different sources; likewise for other vital facts. Even though I may have marked one as Primary, the others still clutter reports: narratives look like repetitious gobbledygook; calendars have the same person multiple times, even on the same day. I want to clean up the publishing mess without losing the evidence.

This is especially the case for databases downloaded from an Ancestry Member Tree where events are generated by accepting or assigning a source and choosing to make the data into an alternate fact because it varies in some way from an existing fact.

Solution

Many report settings in RootsMagic have an option to include or exclude private events. I could reduce report clutter by setting the non-Primary Birth events to Private without losing the data that led to their creation. The Primary event should have whatever consideration and logic went into it being deemed the most accurate. I haven’t set a Primary in every case where my database has multiples so I want two things:

  1. A batch process that sets Private all the non-Primary events of a set having a Primary.
  2. A list of all the sets of multiple events that do not have a Primary so I can follow up.

Individual Summary with multiple Birth events, highlighted.

Individual Summary excluding Private events, in this case, the extra Birth events.

Implementation

BMD_private_multiples.sql 2020-02-15 Click to view; right-click to download.

The above SQLite script builds a series of temporary SQLite Views (stored queries), does the possible privatising and exits displaying the View “BMDmultstofix” which lists the persons needing follow up attention to set Primary one of the plural (multiple) events. The Views are all in memory and are lost when the SQLite manager exits the database.

Usage

The script executes very quickly on small files so it is possible to keep the SQLite manager open on the database and to re-execute the script at intervals while editing the database in RootsMagic. One could simply set the Primary event out of a group of multiple events for a person in RM’s Edit Person screen and then run the script to set the others Private or manually set them. Re-running the script will shrink the list needing Primaries by those you have set.

For family-type events (Marriage), the View “BMDmultstofix” will list both spouses separately. When the Marriage event is set to Primary on either spouse, the two separate listings will go.

Possible Enhancements

Colour Coding

The list of persons needing attention could be quite large, depending how assiduously you have been setting the Primary flag and the number of people and events in your database. So it might be nice to extend the last query or even last View to pick up the color code for each person which could be used to sort or filter the list.

Private as Default

Instead of privatising only those non-Primary events in groups having a Primary, another approach would be to privatise all non-Primaries. That would really clean out the reports and require user editing to get the desired outputs. This may be preferable.

Enhanced Version

BMD_private_multiples2.sql 2020-02-16

Both of the hypothesized enhancements have been incorporated. To ‘fix’ a group of multiples, one of the events must be set Primary and un-Privatised or another event added that consolidates one’s research into an assertion. By default, a new event is not Private and need only be made Primary.

In SQLiteSpy, any column can be sorted by clicking on the heading. Some other software, SQLite Expert is one example, also support filtering by column. These tools enable grouping color code or by surname without having to modify the script or write your own custom query of BMDmultsToFixClr view.

Screenshot

Screenshot from SQLiteSpy of results from running BMD_private_multiples2.sql

Trying to find duplicate spouses

I am trying to find duplicate spousess.
I created the following query:
select fatherid,
(select n1.surname || “, ” || n1.given from nametable n1 where fatherid = ownerid) as fathername,
motherid,
(select n2.surname || “, ” || n2.given from nametable n2 where motherid = ownerid) as mothername
from familytable
order by fatherid, motherid;

1. Should there be blank fatherid’s for a motherid in the result?
2. Or likewise blank motherids for a fatherid in the results?
3. Am I safe to think that if there are 2 motherid’s for the same fatherid that it is a duplicate?

Backup database open in both SQLite and RootsMagic #backup

Problem

Neither the RootsMagic File>Backup nor File>Copy can proceed because the database file “is being used by another process” (RootsMagic’s own or your SQLite manager or other outboard SQLite connection). And if you’ve chosen Backup, RM gracelessly blocks you from further work with an always-on-top fake news message window “Creating backup file…” requiring closure of RM to clear.

If you were using a SQLite manager that’s connected to the database file, you have to close it in order to use RM’s utilities and then reopen it with numerous steps to finally execute the SQL you’ve been carefully crafting to make some important changes to your data. That disruption takes time, may lose some valuable history from the SQLite manager and, perhaps, derail your train of thought. Same goes for ordinary users getting thrown off track after running Compact and other unknown things that screw up the RM backup.

Outboard zip utilities such as 7Zip, RAR, WinZip,… all, to the best of my knowledge, require the closure of all other processes for them to proceed but do handle such a blockage gracefully, unlike RM4-7. It’s not sufficient that RM get to handle the blockage gracefully; it shouldn’t be blocked at all!

Some Workarounds

Looking into this issue in 2019, I’ve discovered some useful alternatives to RM’s backup and copy utilities that have arisen since the days of RM4 and Windows XP in 2009/10 when the original SQLite Tools for RootsMagic 4 Wiki began. SQLite Expert Professional lists “Backup” as a feature that likely uses the Online Backup API, perhaps the best suited to the task but at a price.

This post presents four free, fast backup methods that allow the database to remain open in RootsMagic and in one (or more) outboard SQLite connections. They have been tested with a database open in RM7.5.9 and simultaneously in three SQLite managers: SQLite Expert Personal 4.2.0.784, SQLiteSpy 1.9.12 and SQLiteStudio  3.2.1.

  1. Windows 10 File Explorer
  2. SQLite Command: VACUUM INTO
  3. SQLite Archive File
  4. Export to SQL Text file

Windows 10 File Explorer

My first surprise was to discover that the Zip function integrated in File Explorer (possibly from even before Win10) does not object to the database file being open. Nor does its copy function. The latter may be the easiest and fastest method.

Copy

With the File Explorer window open and the “filename.rmgc” database file selected, Copy (Ctrl+C) and Paste (Ctrl+V) results in “filename – Copy.rmgc”. Repeat and the copy is automatically serialized as

filename – Copy (2).rmgc”
filename – Copy (3).rmgc”

No manual file naming is needed to prevent overwriting of the last copy and any copy can be immediately opened by RM or any of the SQLite apps. The Created Date timestamp for the copy is the Date Modified timestamp of the original, so its value puts the copy’s version of the data into context, provided you have some log (recorded or remembered) of what you did before and after.

Zip

Right-click on the selected “filename.rmgc” database file and, from the popup context menu, click on Send to > Compressed (zipped) folder. This will popup a dialog box to allow you to change the default name “filename.zip”.

Repeat and File Explorer automatically serializes the name:
filename (2).zip”
filename (3).zip”

You could change the zip filename extension to “.rmgb” at any of these backups so that RootsMagic could find and restore from it but I wouldn’t bother for two reasons:

  1. It interferes with serializing the filename to preserve successive backups.
  2. It’s easy to extract from the Zip folder using File Explorer which gives you great control over where the extracted file goes with opportunity to rename and move it afterward.

Using File Explorer from within the apps

All of the mentioned apps launch a basic Windows File Explorer dialog Window to Open a database or carry out some other operation. All support Copy, Paste and Send to Compressed (zipped) Folder. Because RM’s window filters for .rmgc files, it is quicker selecting the database file. Presumably, the speed of the subsequent operation is independent of the application that opened the File Explorer dialog. When the copy of zip is complete, simply Cancel the window to back out to where you were with the database.

Copying or Zipping from the Command-Line or PowerShell

If File Explorer could copy or zip the database file while open in RootsMagic and SQLite, it seemed reasonable that these other Windows utilities could, too. The Zip is only available in PowerShell but both can copy files. Unfortunately, both the CLI and PS detected another process and refused to proceed.

SQLite command: VACUUM INTO

The VACUUM command was provided in SQLite 3 and all of the referenced SQLite managers from the get-go but it operated directly on the database in memory. About two months ago (ver 3.27), the command was extended with an INTO clause that writes the result of the VACUUM to a new database file without modifying the original. Effectively, this means that one can make a vacuumed copy of the currently opened database from a SQLite manager as a backup. If the original has not been recently vacuumed by either RootsMagic or by an outboard SQLite manager, then the vacuumed copy will be smaller in size.

The copy can be arbitrarily named and, if named with the .rmgc extension, can be opened directly by RootsMagic without going through a Restore process.

Typing the command and filename and ensuring the latter is unique is less efficient than the File Explorer method. A potential productivity gain would be to write a Windows command or Powershell script to launch the current sqlite3.exe command line shell program to open the database file to VACUUM INTO a filename defined by the script; it could incorporate date and time in the filename. Alternatively, the batch or command script could also zip the resulting .rmgc file and give the zip file the unique name.

After some time, the referenced SQLite managers will undoubtedly update to SQLite 3.27 to support the command and some may incorporate some higher level controls such as a file explorer dialog to facilitate the naming of the vacuumed file.

This command should work in SQLite for both Windows and MacOS.

For the SQLite 3 command line shell program, look for sqlite-tools-win32-x86-3270200.zip or later at https://www.sqlite.org/download.html. Also available for MacOS: sqlite-tools-osx-x86-3270200.zip.

SQLite Archive File

Since 2014, there has been development of a SQLite Archiver utility “sqlar” using SQLite and zlib to archive files of any type to a SQLite database container. The file contents are zipped to a Blob-type field in a table with one row for each archived file. In 2018, it has been made into an optional extension to the SQLite source code and is now integrated in the SQLite command-line shell tool mentioned above. It is possible for a SQLite manager to be compiled with this extension but I don’t know of any yet.

The SQLite command-line tool from 3.22 can be opened and at its prompt given the “.ar” command to  create a SQLite Archive file, add|update files to it, list contents, and extract files from it. From 3.23, the recommended use is from the OS CLI to call sqlite3.exe with the -A option.

In this example in sqlite3.exe, I updated the SQLite Archive file “_LGS_pres.db” with the database “_LGS_pres.rmgc” that was open in RM and the 3 SQLite managers.

sqlite> .ar -uvf _LGS_pres.db _LGS_pres.rmgc
_LGS_pres.rmgc

The Archive file could contain many other files, e.g., the media files or other unrelated files, just as a Zip folder can.

Moreover, the SQLite Archiver can operate as a zip utility producing a .zip file (“compressed folder” in Win10) instead of a SQLite Archive file, skipping over the locks that trip up RootsMagic’s backup and 7Zip.

For more, see SQLite Archive Files.

Export to SQL Text file

This is an old method that has merit because it reads the SQLite database, not just copying it, to create SQL commands which can later be executed to build the identical database. Some SQLite managers support it (SQLiteStudio) and it is available in the command-line shell:

Use the “.dump” command to convert the entire contents of a database into a single ASCII text file. This file can be converted back into a database by piping it back into sqlite3.

The resulting text file is bigger than the database file but it can be piped into a zip utility for compression. Whether there is a limit that a large database might run into remains to be seen.

For more, see Converting An Entire Database To An ASCII Text File

Observations

  1. There are multiple ways to backup a RootsMagic database while it is open in both the RootsMagic application and one or more outboard SQLite connections.
  2. Using these other ways instead of the RootsMagic backup function may also avoid the recurring situation when RootsMagic blocks itself from completing a backup; a forced shutdown is the only way to resume working in RootsMagic.
  3. For the less technically proficient, Windows File Explorer provides a convenient alternative to the RM backup, even from within the RootsMagic user interface.
  4. RootsMagic developers should look into some of these alternatives that work with SQLite to resolve the conflict with outboard SQLite connections. Especially appropriate could be the Online Backup API because it is native to SQLite and aware of other users of the database, allowing them to “continue uninterrupted while a backup of an online database is made”.

Caveats

  1. Testing these has been necessarily superficial so there may be issues as yet undetected. Please comment below if you come across any.
  2. Make sure that the database is not being modified by one of the connections while the ‘backup’ is going on. When the backup is opened, there could be unexpected results. This is especially true of those methods that work without a SQLite connection to the database, i.e., File Explorer, SQLite Archiver.
  3. Those methods that require a SQLite connection to the database, i.e., VACUUM INTO, Export and Online Backup API, need the RMNOCASE collation (see RMNOCASE; faking it in SQLite Expert, command-line shell et al). It is possible that the resulting database will fail the RootsMagic database integrity test; simply use its Rebuild Indexes tool to correct.

Media Duplicates – Reports and Remedies

Problem

TreeShare update operations that add images from Ancestry to the RootsMagic database can result in successive downloads of the same image file saved to a revised name, each added to the Media Gallery and independently tagged to their respective subjects. I don’t think this is either necessary or desirable (maybe there is a good reason yet to be revealed by the RootsMagician!). Moreover, such duplication (or worse, multiplication) bloats usage of storage space and inflates the Media Gallery making it harder to manage, navigate, find things. The few respondents to my survey  reported 10-20% of their media files were duplicates while an exercise with a 10-person subset of my 420 person Ancestry Tree after many TreeShare ops trying to resolve differences was far higher. The discussion and scripts that follow help understand the magnitude of the bloat and offer surgical tools to reduce it. These tools reduced the 92 media in the Gallery to 49.

Background

The issue predates TreeShare and could arise from combining databases but it is likely that it had not been noticed as frequently or to the same degree. Earlier effort dating back to RM4 (see Media Repair Queries) concerned itself with duplicate file names in different folders, not with duplicate content within differently named files.

RootsMagic places all TreeShare media from Ancestry in one folder for a given database. It does not attempt to substitute an existing duplicate media file for the one to be or is being downloaded; rather, it modifies the name of the downloading duplicate file with a unique suffix. Consequently, the old script would not even detect these differently named files with identical content, only those having the same name in separate folders. Moreover, it could generate false positives for files of the same name with different content leading to the deletion of potentially valuable information.

Finding Duplicates

By comparing RootsMagic thumbnails

RootsMagic’s database has a property for each image file that can be used as a basis for finding duplicate image content – the thumbnail image it creates of the image file. This thumbnail is stored in the database in the Thumbnail field in the MultiMediaTable. However, because it is a tiny thumbnail, it is a crude approximation of the original and it may be possible that similar images, such as some pairs of Census pages or BMD Indexes could give rise to identical thumbnails. Experience will tell. It’s a very convenient parameter to exploit as seen in this query result from a tiny, test database.

Examining the database’s MultimediaTable shows the duplicate file’s revised name:
 In this case, we can confidently surmise from the common part of the filenames AND the identical thumbnails that these undoubtedly have duplicate content.

By using CCleaner

Other files may not have strongly similar filenames despite identical thumbnails and, given the risk of false positives from the latter, a further test is required. The free CCleaner and other utilities can resolve if they have the same or different content; we’ll continue here with CCleaner and its Tools > Duplicate Finder feature set to search the folders of interest for image files that match each other only by Size and by Content. Here is a screenshot of its results:
The highlighted names are of those in the database’s Media Gallery. The others are copies made using Windows File Explorer and are unused by (and therefore unrecorded in) the RM database. The script correctly reported the two matched files used by the database from their identical thumbnails.

CCleaner exports its results as a tab-delimited text file which can be imported into a spreadsheet:

CCleaner Duplicate Files report imported into Google Sheets; headers added by user.

Combining the data from the RootsMagic database with that from the CCleaner Duplicate Files report should provide what is needed to identify and replace duplicates in the Media Gallery and to delete the then unused duplicate files from computer storage. Combining may be done by importing the CCleaner data into the RM file, temporarily, or by exporting RM data to the spreadsheet. What follows is the first approach.

Migrate CCleaner Duplicate Search results to SQLite

With the CCleaner data in the spreadsheet, formulas can be applied to create a SQLite INSERT statement for each reported file. We will create an empty temporary table with a corresponding structure into which the generated statements will insert values from the spreadsheet. I’ve created a publicly accessible Google Sheet which anyone can copy to their own spreadsheet to work on or to use directly.

Scripts for finding Duplicates

Before running any of these scripts, you should use RM’s File > Database Tools:

  1. Test Database Integrity
  2. If #1 fails, Rebuild Indexes and repeat #1
  3. Delete Phantoms

And it is probably advisable to run the latest script from Delete Phantoms to clear out any detritus that may just confuse or exaggerate comparisons of Database Properties before and after the Merge Duplicates script below. Use your OS File Explorer/Manager to get the properties of the media folder(s) used by the database.

Thumbnails

Media-DupThumbnailsList.sql Make sure you open the RM Media Gallery and let thumbnail creation complete before running this script against your database.  It creates a temp table named zDupMediaTable which is displayed at the end of script execution to show you the duplicates found. The script makes no modifications to the database; this table will be used by the merge script.  Note that it ignores media files that are not image files or any file type that is not in the subject database’s Media Gallery. And beware that there is a risk of false positives because thumbnail images are of such low resolution. If you intend to proceed to merge duplicates, do not close the database from the SQLite manager as that will delete the temp table needed.

Files

Media-DupFilesList.sql You have to edit this script to include the INSERT commands you will copy from the spreadsheet before running it on the database. It creates an identically named temp zDupMediaTable to be used by the merge script and makes no changes to the database. Because Ccleaner does a bit-wise or byte-wise comparison, there is no chance of a false match between different images. However, it will not match files having exactly the same image data but a difference in just one character of metadata. If you intend to proceed to merge duplicates, do not close the database from the SQLite manager as that will delete the temp table needed.

Merging Duplicates

Media-MergeDuplicates.sql This is where the rubber hits the road and database changes are afoot. You have taken safety precautions? And your SQLite manager still has the temp zDupMediaTable in the database from either of the above scripts? This script won’t do anything without it. Fire away!

When you next run TreeShare after script execution, all the people whose media has been affected by the merge and is tagged to anything related to the person will be marked as Changed and all previous Changed people will first have been cleared. Nothing material should have been changed by the script but flagging them so helps you to confirm that is the case.

Cleanup

Repeat the initial steps of RM Database Tools and Delete Phantoms  to clean out new detritus and then RM Database Properties for comparison of Before and After.

Delete surplus duplicate media files

The last step in the script lists the commands for deleting duplicate files from the media folder(s) that are no longer used by the database. Copy and paste them into a text editor for review and to avoid their loss if you have to close the database in the SQLite manager (some RootsMagic operations such as Backup and Restore cannot proceed if the database is open in another application). If you are confident that any or all are unused by any other database, copy and paste the respective ERASE commands into the CMD shell. They will execute immediately without asking for confirmation; the script could be modified to require a confirmation of each ERASE. For a large set, break up the command set into chunks.

Repeat getting folder properties for the After results to compare with Before, i.e., number of media files and total size.

Miscellaneous

  1. How well this procedure will scale with the size of the database is unknown. Certainly, once some large numbers of INSERTs and ERASEs is reached, there will be issues with copying and pasting and perhaps the size of the script for the INSERTs. Nor can we predict the speed with which the scripts will execute.
  2. How TreeShare functions remains a mystery in part. Events for a couple, such as Marriage, are especially problematic because we are advised that Ancestry handles them as separate individual events. It seems advisable, if not necessary, to have on the Ancestry Tree identical sources for the Marriage event for each person to be assured that RM will create the couple’s Marriage event with all the sources. And updating via TreeShare sometimes is a tail-chasing exercise in which ostensibly identical citations or media on both sides are perceived as new for the other with the update one way provoking the offer of a new one the other.
  3. The flagging of Changed People is done by setting the Modified field to 1 for records in LinkAncestryTable whose LinkType is 0. The script also sets the Modified field to 1 for records of LinkType 4 (citations) and 11 (media) but these have no effect on anything apparent in RM 7590.
  4. This procedure does not provide ERASE commands for media files that are unique (not duplicates) and unused by the database. The procedure on this early page could be adapted to do so: Scrapbook Files Status

Updates responding to TreeShare

Working out the wrinkles in a new script addressing some TreeShare issues, I was using some other scripts to help me understand what was going on and ran into a couple of issues that I’ve fixed: