Forum

Please or Register to create posts and topics.

Question re orphan citations #rm8

Page 1 of 2Next

This was posted by Charles Banks as a comment on RM8 Data Definitions Spreadsheet #database #datadefinitions

Tom,
I imported my tree from Family Tree Maker using a GEDCOM file. In FTM I had many duplicate events which I have used SQLite to locate and delete. What I want to do now is ensure that I don’t have any orphan citations.

In #RM8, a citation may be 'orphaned' in a couple of ways:

  1. The CitationLinkTable.OwnerID may point to a non-existent record in the 'owner' table (according to OwnerType).
  2. The CitationLinkTable.CitationID may point to a non-existent record in the CitationTable, the 'master' citation.

The second case is the easier so let's start with it:

DELETE FROM CitationLinkTable
WHERE CitationID
IN (SELECT CitationID FROM CitationLinkTable
EXCEPT SELECT CitationID FROM CitationTable
)

The first case is an elaboration of the same because the link may be 'owned' by one of several tables:
OwnerType 'ownerTable'
0                     PersonTable
1                      Family...
2                     Event
7                     Name
So, there are 4 similar statements or queries required, one for each OwnerType, e.g.,:

Personal or General citations (OwnerType=0)

DELETE FROM CitationLinkTable
WHERE OwnerType=0
AND OwnerID IN
(
SELECT OwnerID FROM CitationLinkTable WHERE OwnerType=0
EXCEPT SELECT PersonID FROM PersonTable
)

Family citations (OwnerType=1)

DELETE FROM CitationLinkTable
WHERE OwnerType=1
AND OwnerID IN
(
SELECT OwnerID FROM CitationLinkTable WHERE OwnerType=1
EXCEPT SELECT FamilyID FROM FamilyTable
)

Event citations (OwnerType=2)

DELETE FROM CitationLinkTable
WHERE OwnerType=2
AND OwnerID IN
(
SELECT OwnerID FROM CitationLinkTable WHERE OwnerType=2
EXCEPT SELECT EventID FROM EventTable
)

Name Citations (OwnerType=7)

DELETE FROM CitationLinkTable
WHERE OwnerType=7
AND OwnerID IN
(
SELECT OwnerID FROM CitationLinkTable WHERE OwnerType=7
EXCEPT SELECT NameID FROM NameTable
)

 

@sbankscharles, continuing with your question on orphaned citations, in the discussion about navigating from a citation to its uses you added this: "In addition, I would like to see if there are citations that have no sources or events". That's a little different from your OP in which you wanted to "ensure that I don't have any orphan citations". I responded to the latter above with queries that delete records from the CitationLinkTable not linked to a record in the CitationTable or not linked to a record in any of the 'owner' tables (Person, Name, Family, Event). There did and does not seem to be any point in seeing these records.

However, I did not address the orphaned Citation record having no Master Source. Again, I don't think there is a need to see the record; just delete it. For pre-RM8 databases, it was covered in Delete Phantoms, which has yet to be updated. Here's a query to delete these 'headless' (or 'source-less') citation records (I see it differs from what I wrote 12 years ago!)

DELETE FROM CitationTable
WHERE CitationID IN
(SELECT CitationID FROM CitationTable C
WHERE C.SourceID IN
(SELECT SourceID FROM CitationTable
EXCEPT
SELECT SourceID FROM SourceTable)
)
;

 

There are two other reasons for orphaned records:

  1. Corruption or bad data from merging data in FTM2019
  2. Deleting an event using SQL. (Guilty)

I replaced the DELETE query for Owner Type 2 with a SELECT statement and that returned 443 rows.  The first row just happened to be an issue I was looking into.

First question; Are you saying the results show I have 443 orphaned citations?  The citation does link to a birth event, along with 2 other citations.  However, one citation in that record points to a source simply titled "Source #2".  I have 87 of those, which all appear to be bad data.

Charles

 

Just for fun, here's the old query I wrote to delete source-less citation records:

DELETE FROM CitationTable
WHERE CitationID IN
(SELECT CitationID FROM CitationTable c
LEFT JOIN SourceTable s ON c.SourceID=s.SourceID
WHERE s.SourceID ISNULL
)
;

The newer one seems to be more consistent time-wise on repeated executions, appearing to be in the ballpark of the average of the times for the old one. But the new one occasionally took much longer than the max of the older. This was on a smallish dataset (2600 records in CitationTable, 380 in SourceTable) ; more tellingly would be a very large database.

Quote from Charles Banks on 2022-04-17, 11:20 am

Are you saying the results show I have 443 orphaned citations?

That result shows there are 443 records in the CitationLinkTable pointing to non-existent records in the EventTable. Those records may or may not point to existing records in the CitationTable. Either way, the link from a 'master'citation to an event is broken and useless.

The existence of a Master Source named "Source #2" with bad data in the SourceTable corrupts all of its citations but would not cause any of them to appear 'orphaned' or be deleted by any of these queries.

Tom,

I'm really confused by things and hope you will continue to help.  I don't like running delete queries without first verifying the results.  That is the reason I would like to see a list of orphan citations.  In addition, since many citations include information about who the citation is used for, it would also let me know to check on certain people.  Family Tree Maker 2019 introduced a lot garbage into my tree.  I have found and resolved a lot of issues my doing a methodical job of cleaning up records.

With the queries you have shared, I replaced "DELETE FROM" WITH "SELECT * FROM".  I guess the first question is, can I do that with these queries.  Out of the 433 records return, none appear to be orphans.  There may be other issues with the records, but I would not want to delete them without verifying.  I really do want to see a list of orphans.

Thanks

Charles

 

 

Quote from Charles Banks on 2022-04-20, 7:47 am

 

With the queries you have shared, I replaced "DELETE FROM" WITH "SELECT * FROM".  I guess the first question is, can I do that with these queries.

Yes.

Out of the 433 records return, none appear to be orphans.  There may be other issues with the records, but I would not want to delete them without verifying.  I really do want to see a list of orphans.

Semantics! Which query returns the 433 records? Whatever one calls it, one of the links between master source, a master citation thereof, the record linking the master citation to an entity (a record for a Person, Name, Family, Event) is broken. So it sounds like you want to get the data from the breakpoint backward or forward along the intact part of the chain. Examine that data for clues to what it could have been linked to in order to reconstruct the link or embark on research to replace the missing information.

For example, I just tried a file that returns 141 broken links from the CitationLink Table to the NameTable:

SELECT CitationID FROM CitationLinkTable WHERE OwnerType=7
EXCEPT SELECT NameID FROM NameTable ;

So I want to go the opposite direction to the master source via the master citation from this phantom 'use' of the citation. The WayMarks Views are unhelpful as they were designed to show the way through the RM user interface - I don't think there is a way to do so in RM (there wasn't in 7). So, quick and dirty without decoding XML content et al, here's a stab at going up the chain to the Source (if any, and there might not even be a Citation record, in which case that is truly an orphaned and childless record in the CitationLinkTable).

SELECT CL.CitationID CitLink, * FROM
(SELECT CitationID FROM CitationLinkTable
WHERE OwnerType=7
EXCEPT SELECT NameID FROM NameTable
) AS CL
LEFT JOIN CitationTable C USING(CitationID)
LEFT JOIN SourceTable S USING(SourceID)
;

Of course, you will want to replace the * with the selected fields you would want for your inspection. In my case, all the broken links have a complete path to the master source. Because of the LEFT JOINs, you should see [null] in those columns from a table for which no record is found.

In this test case, I can see that there are instances where there is useful information with which to find the person in the database and determine whether the citation is redundant or adds value if connected. Particularly handy was that many of the C.Comments fields have a URL to an Ancestry source record. While I stored these insecure http URLs several years ago, Ancestry still recognises them and redirects to its current https URL for the same record.

Not sure what you mean by "Semantics!"  but the query I've been working is

DELETE FROM CitationLinkTable
WHERE OwnerType=2
AND OwnerID IN
(
SELECT OwnerID FROM CitationLinkTable WHERE OwnerType=2
EXCEPT SELECT EventID FROM EventTable
)

Since I've been deleting events I have use 2.  Neither of the two additional queries you just provide return null information.

I was under the impression that citations were only linked to one person.  I have a lot of citations linked to one person that state the citation is for the record of another person.

So, I have 87 Sources that all a Source Name of 'Source #1' or 'Source #22.  All the citations linked to them have no name or additional information.  Two people who I've checked so far have additional sources tied to the same event.  Should all citations have a name?  Here is what I'd like to do.  How do I find the events from the results of this query

SELECT st.SourceID, st.name, c.CitationID from SourceTable st
LEFT JOIN CitationTable C USING(SourceID)
where st.SourceID < 86

 

Quote from Charles Banks on 2022-04-21, 5:52 am

Not sure what you mean by "Semantics!"

Semantics: the terms we use versus what we mean by a table record that is orphaned or headless or childless or has a broken link and, in some cases, broken in which direction. I just thought of another name that is very appropriate to RM fieldnames: 'ownerless' meaning its OwnerID cannot be found in the corresponding primary key in the table designated by OwnerType.

but the query I've been working is

DELETE FROM CitationLinkTable
WHERE OwnerType=2
AND OwnerID IN
(
SELECT OwnerID FROM CitationLinkTable WHERE OwnerType=2
EXCEPT SELECT EventID FROM EventTable
)

Since I've been deleting events I have use 2.  Neither of the two additional queries you just provide return null information.

The two additional queries from my #7 have nothing to do with events - they are for citations for people's Names (OwnerType=7) and were intended to give you an example that you could adapt for other entities (the OwnerType for Events is 2). You must get your head around how the OwnerType in a record in one table dictates which other table contains the record that 'owns' it.

I was under the impression that citations were only linked to one person.  I have a lot of citations linked to one person that state the citation is for the record of another person.

If you have used RM8's Merge Duplicate Citations, then the same Citation may be linked to multiple entities. All Citation 'uses' are linked to their 'owner' tables through the CitationLinkTable.

Page 1 of 2Next