RM-LineageLoopDetector.sql: Uncovering Lineage Loops in RootsMagic Databases

Breakthrough! Something I’ve wanted for as long as I can remember is a means of detecting lineage loops (an ancestor is their own descendant) in family tree databases created with RootsMagic. This is my second major interaction with Google Gemini Flash 2.5 which took 18 attempts (only 18!) with lots of testing to nail down what I think is amazingly smart code that I could never have conceived and don’t fully understand. It’s robust and highly efficient.

Screenshot from SQLiteSpy of two loops from a large database.
Child in Problematic Family LinkFull Loop Path (IDs)Problematic Father (Descendant of Child)Problematic Mother (Descendant of Child)Family Where Loop Occurs
John Zzzzz-4278942789->42789N/AN/AJohn Zzzzz-42789=N/A
Duane Michael Xxxxxx-4406644066->44078->44066Anthony Michael Xxxxxx-44078N/AAnthony Michael Xxxxxx-44078=Verna Mildred Yyyyy-44058

Gemini generated the following description, covering what the script does, its design principles, and expected performance:


RM-LineageLoopDetector.sql: Uncovering Lineage Loops in RootsMagic Databases

Genealogical databases, especially those built over many years or compiled from various sources, can sometimes contain logical inconsistencies known as “lineage loops” or “pedigree loops.” These occur when an individual is recorded as being their own ancestor, or an ancestor of their parent, creating an impossible cycle in the family tree. While RootsMagic is designed to prevent some direct entry errors, more complex or indirect loops can slip through.

The RM-LineageLoopDetector.sql script is a powerful SQLite query specifically designed to identify and report these problematic lineage loops within your RootsMagic database.

What it Does

This script scans your RootsMagic database to find instances where:

  • A person is recorded as their own parent/ancestor (self-loop).
  • A person’s parent is also a descendant of that person (a more complex cycle).

For each loop detected, the script provides clear, actionable information:

  • Child in Problematic Family Link: The individual whose family relationship initiates the discovered loop.
  • Problematic Father / Mother (Descendant of Child): Identifies which parent in the problematic family is also a descendant of the child.
  • Family Where Loop Occurs: Provides a user-friendly representation of the family (e.g., “Father’s Name-ID=Mother’s Name-ID”) that contains the specific link causing the loop. This is crucial for RootsMagic users, as FamilyIDs are not visible in the application’s interface.
  • Full Loop Path (IDs): A sequence of PersonIDs illustrating the exact path taken to complete the loop, making it easy to trace the issue within RootsMagic.

The script is carefully designed to report only one canonical instance for each unique lineage loop, avoiding redundant results even if multiple paths within the loop could technically trigger detection.

How it Was Designed

The development of this script involved an iterative process to balance accuracy, performance, and user-friendliness:

  1. Recursive Common Table Expressions (CTEs): The core of the solution leverages SQLite’s WITH RECURSIVE functionality. A CTE named CycleFinder performs a depth-first search through the family tree’s parent-child relationships.
  2. Targeted Loop Detection: Instead of building a complete ancestor list for every person (which is inefficient for large databases), CycleFinder starts its search from every direct child-parent link in your database. It then attempts to trace ancestors from the parent’s side, looking for the original “child” that initiated that specific path.
  3. Efficient Cycle Prevention: Early iterations struggled with performance due to string manipulation (INSTR function) within the recursive path tracking. The current design judiciously uses INSTR to prevent infinite recursion and detect general path revisiting, but the primary loop detection is based on the current_person_id eventually equalling the start_person_id (the child who began the trace). This balance significantly improved performance.
  4. Canonical Loop Reporting: A RankedLoops CTE was introduced to address the challenge of reporting a single, meaningful instance for each unique loop. This CTE uses a sophisticated PARTITION BY clause that extracts all unique PersonIDs involved in a detected loop, sorts them, and concatenates them into a consistent “canonical loop identifier.” ROW_NUMBER() then selects the shortest or lowest-FamilyID loop from that canonical set, ensuring clean, non-redundant output.
  5. User-Centric Output: Recognizing that RootsMagic users don’t see FamilyIDs, the final SELECT statement includes intelligent joins to the NameTable to convert FamilyIDs into human-readable “FatherName-ID=MotherName-ID” format, directly assisting users in locating and correcting errors within the RootsMagic application.

What Performance to Expect

The script is optimized for performance within the constraints of SQLite’s recursive CTE capabilities:

  • Small to Medium Databases: For databases up to tens of thousands of records, you can expect the script to execute very quickly, often in a matter of milliseconds to a few seconds.
  • Large Databases: For very large databases (e.g., hundreds of thousands of Child and Family records), the execution time will naturally increase. Recent tests on a database with 274,000 ChildTable records and 159,000 FamilyTable records completed in approximately 3 minutes 45 seconds on a 7-year-old Intel i5 processor. While this isn’t instantaneous, it is a very reasonable time for performing a comprehensive graph cycle detection on such a scale.
  • Factors Affecting Performance: Your computer’s processing speed (CPU, RAM, SSD vs. HDD) and the depth/complexity of your family tree (very long, unbroken lineages will require deeper recursion) will influence actual execution times.

In summary, the RM-LineageLoopDetector.sql script provides an efficient and user-friendly way to maintain the integrity of your RootsMagic family history data by accurately pinpointing and describing otherwise hidden lineage loops.


If you’ve read this far and want more detail on the development process, you should view the dialog I had with Gemini at https://g.co/gemini/share/2cdd5db67e6f.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.