Forum

Please or Register to create posts and topics.

Diving into UTCModDate | Date Last Edited statistics

Picking up on @thejerrybryan comment about doing 'real genealogy' rather than developing SQLite scripts, I'm guilty of the latter. My recent dive into @cherylc 's problem with drag'n'drop also got me wanting to explore the UTCModDate column that is found in the RM database structure since RM8 was introduced. Examination of it could tell something about the history of a database and be of possible value in identifying when major changes occurred, such as a mass import or drag'n'drop or an automatic merge. Perhaps also useful in picking through backups to find the last one before a data catastrophe.

The attached script DatesLastEdited.sql collects the UTCModDate for all or selected records from the tables having greatest bearing on each person into a temporary table that can then be queried to aid one's analysis.

Included in the script file are a number of such queries that can be selectively executed:

  • LIST EARLIEST Record DateTime by PID (PersonID)
  • LIST LATEST Record DateTime by PID
  • LIST MEDIAN record DateTime for each person (requires sqlean extension for stats_median(); the extension also has functions for standard deviation and variance)

The above three are stored as VIEWs for subsequent queries.

  • COUNT RECORDS for each person [PID]
  • LIST Range of DateTimes for each person's records
  • LIST Range of DateTimes + Median DateTime for each person's records
  • LIST DateLastEdited for all records for a given person
  • LIST DISTRIBUTION BY DateTime of all Records
  • LIST DISTRIBUTION BY DATE of all Records
DateCOUNT()
6514
1899-12-301856
1999-03-141
2001-03-021
2001-05-221
2002-06-301
2005-03-181
2008-03-102
2009-03-31279
2009-10-155

This is an example from a small, play database that originated in FamilyOrigins 9 if I recall correctly which might explain the early dates. However, EditDate and UTCModDate were not widely available in the database until RM8 which accounts for the large number of null values or 0.0  which translates to 1899-12-30.


PIDTableRecordUTCModDateDateTime
428UT3195
428UT3196
428NT4540.01899-12-30 00:00:00
428NT17120.01899-12-30 00:00:00
428ET92840122.44968442132009-11-05 10:47:32
428ET92940122.44968442132009-11-05 10:47:32

This is a subset of the records associated with the Person having a RIN (PID or PersonID) = 428. Two WebTag records with null dates from the URLTable UT, followed by two Name records from the NameTable NT having the 0.0 date and then two records from the EventTable  ET with valid dates.


PIDTableEarliestMedianLatestTable
658ET2009-11-05 10:47:322012-11-06 17:52:372024-12-31 01:13:58NT

The earliest record for person 658 (apart from null and 0.0) is from the EventTable in 2009, the median (half of the records are below and half are above this value) is late 2012 and the latest the end of last year from the NameTable NT. That the Median is so much closer to the Earliest than to the Latest indicates that most of the records are earlier. Other statistical calculations are supported by the sqlean extension.


With the PID or RIN for the person and table and record identified, it is possible to dig further into the person's profile. A useful adjunct would be one of the views in Search - wayfinding from data tables to RootsMagic screens

Uploaded files:
kevync has reacted to this post.
kevync

There are few things that do not touch the UTCmodeDate that some user might  confused by.  For example. FamilySearch and Ancestry depending on  hints etc.might be deceiving.

To you point I have found  some points of reference when reviewing database & moddates that told me the "history"