Adding Marriage events based on Marriage Bond event
Quote from Mark Whidby on 2024-06-23, 7:28 pmI created a custom event type Marriage Bond to accurately reflect the first-party evidence I have. Now, I want to automatically create a Marriage event based on Marriage Bond events so that marriage date is populated in the RM UI.
Some couples already have a Marriage event, most do not. I have the SQL to identify the Marriage Bond events and the couple, but I'm struggling with the "WHERE NOT IN" sub-select I need in order to skip couples that already have a Marriage event for the Marriage Bond event.
Help!
I created a custom event type Marriage Bond to accurately reflect the first-party evidence I have. Now, I want to automatically create a Marriage event based on Marriage Bond events so that marriage date is populated in the RM UI.
Some couples already have a Marriage event, most do not. I have the SQL to identify the Marriage Bond events and the couple, but I'm struggling with the "WHERE NOT IN" sub-select I need in order to skip couples that already have a Marriage event for the Marriage Bond event.
Help!
Quote from thejerrybryan on 2024-06-23, 9:00 pmTry something like the following to find the FamilyID's where there is a Marriage Bond fact but no Marriage fact.
Normally when working with the EventTable, I'm careful to specify the EventTable.OwnerType to distinguish between individual facts and family facts. But in this case, the filtering by fact type Marriage Bond and fact type Marriage takes care of having the correct EventTable.OwnerType. The filtering is done by doing a JOIN with the FactTypeTable.
The outer "A" query is going to become your subquery. But it itself contains a lower level subquery. The outer query which will become a subquery for your code is for couples with a Marriage Bond fact and the lower level subquery is for couples without a Marriage fact.
I make heavy uses of aliases anyway, but I think this might be a case where aliases are mandatory. Which is to say that a main query cannot see inside a subquery except for the columns that are directly in the subquery's SELECT statement, but a subquery can see the data in the main query. So I use aliases E and FT in the main query and EE and FTFT in the subquery to be sure the subquery is using its own data and not using data from the main query.
I have a very fuzzy understanding in SQL of this issue of what is called scope in C++. Namely, what data is defined where in the code and what code in the program can access the data has been defined. Some languages have almost a global scope system where all data can be accessed everywhere in the code. In the case of C++, the scope of data is very tightly controlled and limited, and I have a strong understanding of the C++ rules. But I'm pretty fuzzy about the scope of data in SQLite scripts. Perhaps Tom or other gurus can further explicate this issue. I have learned just enough through the years that they way I used aliases in this script manages the scope of the data correctly so that the subquery can't see the data in the main query by accident.
SELECT A.*
FROM
(
SELECT E.OwnerID
FROM EventTable AS E
JOIN FactTypeTable AS FT ON FT.FactTypeID = E.EventType AND FT.Name LIKE 'Marriage Bond'
WHERE E.OwneriD NOT IN
(
SELECT EE.OwnerID
FROM EventTable AS EE
JOIN FactTypeTable AS FTFT ON FTFT.FactTypeID = EE.EventType AND FTFT.Name LIKE 'Marriage'
)
) AS A
Try something like the following to find the FamilyID's where there is a Marriage Bond fact but no Marriage fact.
Normally when working with the EventTable, I'm careful to specify the EventTable.OwnerType to distinguish between individual facts and family facts. But in this case, the filtering by fact type Marriage Bond and fact type Marriage takes care of having the correct EventTable.OwnerType. The filtering is done by doing a JOIN with the FactTypeTable.
The outer "A" query is going to become your subquery. But it itself contains a lower level subquery. The outer query which will become a subquery for your code is for couples with a Marriage Bond fact and the lower level subquery is for couples without a Marriage fact.
I make heavy uses of aliases anyway, but I think this might be a case where aliases are mandatory. Which is to say that a main query cannot see inside a subquery except for the columns that are directly in the subquery's SELECT statement, but a subquery can see the data in the main query. So I use aliases E and FT in the main query and EE and FTFT in the subquery to be sure the subquery is using its own data and not using data from the main query.
I have a very fuzzy understanding in SQL of this issue of what is called scope in C++. Namely, what data is defined where in the code and what code in the program can access the data has been defined. Some languages have almost a global scope system where all data can be accessed everywhere in the code. In the case of C++, the scope of data is very tightly controlled and limited, and I have a strong understanding of the C++ rules. But I'm pretty fuzzy about the scope of data in SQLite scripts. Perhaps Tom or other gurus can further explicate this issue. I have learned just enough through the years that they way I used aliases in this script manages the scope of the data correctly so that the subquery can't see the data in the main query by accident.
SELECT A.*
FROM
(
SELECT E.OwnerID
FROM EventTable AS E
JOIN FactTypeTable AS FT ON FT.FactTypeID = E.EventType AND FT.Name LIKE 'Marriage Bond'
WHERE E.OwneriD NOT IN
(
SELECT EE.OwnerID
FROM EventTable AS EE
JOIN FactTypeTable AS FTFT ON FTFT.FactTypeID = EE.EventType AND FTFT.Name LIKE 'Marriage'
)
) AS A
Quote from Mark Whidby on 2024-06-27, 12:10 pmThanks very much, I'll give that a try.
Thanks very much, I'll give that a try.
Quote from Mark Whidby on 2024-06-27, 1:32 pmTurns out that was essentially what I was doing, but it's not quite what I need. I'm struggling with a scenario where a husband has multiple event types like this example:
- Husband A has Marriage Bond with Wife 1 and Marriage with Wife 1.
- Husband A has Marriage Bond with Wife 2, but no Marriage with Wife 2.
- Husband A has Marriage with Wife 3 but no Marriage Bond with Wife 3.
The code handles use cases 1 & 3, but not 2. I think I need to look at the couples on the two event types to accomplish what I seek to do.
Thoughts?
Turns out that was essentially what I was doing, but it's not quite what I need. I'm struggling with a scenario where a husband has multiple event types like this example:
- Husband A has Marriage Bond with Wife 1 and Marriage with Wife 1.
- Husband A has Marriage Bond with Wife 2, but no Marriage with Wife 2.
- Husband A has Marriage with Wife 3 but no Marriage Bond with Wife 3.
The code handles use cases 1 & 3, but not 2. I think I need to look at the couples on the two event types to accomplish what I seek to do.
Thoughts?
Quote from thejerrybryan on 2024-06-27, 2:03 pmSomething sounds exactly backwards. The code is designed to handle your option 2. The code is not designed to handle your options 1 or 3. That is the reverse of what you are seeing. So I wonder if your larger query is somehow or other reversing the logic from this query when it is run as a subquery.
What happens if you run the code as a standalone query instead of running as a subquery in a larger query? Does it return the correct families?
The code is already looking at the couples. Which is to say it's matching Husband A and Wife 1 stuff with Husband A and Wife 1 stuff, and similarly for Husband A and Wife 2 and for Husband A and wife 3. But it's displaying only the FamilyID and not the FatherID nor the MotherID. If you want the FatherID and the MotherID, you can get them as follows. The FatherID and MotherID will correspond to the PersonID in the PersonTable.
SELECT A.*, F.FatherID, F.MotherID
FROM
(
SELECT E.OwnerID
FROM EventTable AS E
JOIN FactTypeTable AS FT ON FT.FactTypeID = E.EventType AND FT.Name LIKE 'Marriage Bond'
WHERE E.OwneriD NOT IN
(
SELECT EE.OwnerID
FROM EventTable AS EE
JOIN FactTypeTable AS FTFT ON FTFT.FactTypeID = EE.EventType AND FTFT.Name LIKE 'Marriage'
)
) AS A
JOIN FamilyTable AS F ON F.FamilyID = A.OwnerID
Something sounds exactly backwards. The code is designed to handle your option 2. The code is not designed to handle your options 1 or 3. That is the reverse of what you are seeing. So I wonder if your larger query is somehow or other reversing the logic from this query when it is run as a subquery.
What happens if you run the code as a standalone query instead of running as a subquery in a larger query? Does it return the correct families?
The code is already looking at the couples. Which is to say it's matching Husband A and Wife 1 stuff with Husband A and Wife 1 stuff, and similarly for Husband A and Wife 2 and for Husband A and wife 3. But it's displaying only the FamilyID and not the FatherID nor the MotherID. If you want the FatherID and the MotherID, you can get them as follows. The FatherID and MotherID will correspond to the PersonID in the PersonTable.
SELECT A.*, F.FatherID, F.MotherID
FROM
(
SELECT E.OwnerID
FROM EventTable AS E
JOIN FactTypeTable AS FT ON FT.FactTypeID = E.EventType AND FT.Name LIKE 'Marriage Bond'
WHERE E.OwneriD NOT IN
(
SELECT EE.OwnerID
FROM EventTable AS EE
JOIN FactTypeTable AS FTFT ON FTFT.FactTypeID = EE.EventType AND FTFT.Name LIKE 'Marriage'
)
) AS A
JOIN FamilyTable AS F ON F.FamilyID = A.OwnerID
Quote from Mark Whidby on 2024-06-27, 2:55 pmI see my mistake. I wanted to pull Father and Mother names to spot-check my code before making any changes so I was joining FamilyTable to NameTable twice on FatherID and MotherID. But in my NOT IN clause I lost track of what I was doing and used FatherID instead of OwnerID. Correcting that makes the code work. Thanks!
I see my mistake. I wanted to pull Father and Mother names to spot-check my code before making any changes so I was joining FamilyTable to NameTable twice on FatherID and MotherID. But in my NOT IN clause I lost track of what I was doing and used FatherID instead of OwnerID. Correcting that makes the code work. Thanks!
Quote from Mark Whidby on 2024-06-30, 10:48 amOK I have it working thanks to your help; here's my final script:
INSERT OR REPLACE INTO EventTable (EventType, OwnerType, OwnerID, Date, SortDate)
SELECT 300, 1, Event.OwnerID, Event.Date, Event.SortDate
FROM EventTable Event
WHERE Event.EventType = 1008
AND Event.OwnerID NOT IN (
SELECT DISTINCT
Event.OwnerID
FROM EventTable Event
WHERE Event.EventType = 300
)However, I noticed that in some cases RM shows the Marriage event before the Marriage Bond event, so I have to manually tweak SortDate on the Marriage event to make it show up after the Marriage Bond.
I don't fully grok how to work with Dates in RM yet, but what's the best way to have SortDate = Date + 1 day?
OK I have it working thanks to your help; here's my final script:
INSERT OR REPLACE INTO EventTable (EventType, OwnerType, OwnerID, Date, SortDate)
SELECT 300, 1, Event.OwnerID, Event.Date, Event.SortDate
FROM EventTable Event
WHERE Event.EventType = 1008
AND Event.OwnerID NOT IN (
SELECT DISTINCT
Event.OwnerID
FROM EventTable Event
WHERE Event.EventType = 300
)
However, I noticed that in some cases RM shows the Marriage event before the Marriage Bond event, so I have to manually tweak SortDate on the Marriage event to make it show up after the Marriage Bond.
I don't fully grok how to work with Dates in RM yet, but what's the best way to have SortDate = Date + 1 day?
Quote from thejerrybryan on 2024-06-30, 1:07 pmCheck out https://sqlitetoolsforrootsmagic.com/dates-sortdate-algorithm/
In general, when I am typing in dates by hand, I don't like to provide fake sort dates just to get facts into the correct order. So if I have two facts both on 12 Jul 1944, I will give one of them a sort date of 12 Jul 1944-1 and the other one a sort date of 12 Jul 1944-2 rather than giving them sort dates of 12 Jul 1944 and 13 Jul 1944. But in your use case, you are not typing in the sort dates. So to use the sort date suffix tool properly, you would need to adjust the sort date both on the fact that's already there and on the new fact you are inserting. That would be quite a mess. So doing it with sort date + 1 is surely the best strategy for your use case.
Check out https://sqlitetoolsforrootsmagic.com/dates-sortdate-algorithm/
In general, when I am typing in dates by hand, I don't like to provide fake sort dates just to get facts into the correct order. So if I have two facts both on 12 Jul 1944, I will give one of them a sort date of 12 Jul 1944-1 and the other one a sort date of 12 Jul 1944-2 rather than giving them sort dates of 12 Jul 1944 and 13 Jul 1944. But in your use case, you are not typing in the sort dates. So to use the sort date suffix tool properly, you would need to adjust the sort date both on the fact that's already there and on the new fact you are inserting. That would be quite a mess. So doing it with sort date + 1 is surely the best strategy for your use case.
Quote from Mark Whidby on 2024-06-30, 1:14 pmThanks as always for the guidance, and thank you very much for the link.
Thanks as always for the guidance, and thank you very much for the link.
Quote from Tom Holden on 2024-06-30, 8:27 pmWhile simply incrementing the SortDate value by 1 will change the order, it is actually messing with the bit range reserved for Flags. A shift of +1 from an unmodified date has no visible effect on the textual date displayed (so it's not obvious that the value has been changed) but a larger shift invokes date modifiers which may be unnecessarily confusing.
Invoking the -1, -2, ... date offsets would be ideal but, as Jerry points out, you would have to apply the "-1" offset to the one you want first followed by "-2" because the SortDate value for the offsetted date is actually less than that of the un-offsetted date. So that does get a bit convoluted compared to setting the SortDate to 1 day greater.
Adding 1 day is itself not trivial. I spent far too long getting my order of operations right but this algorithm does increment by 1 day for most dates.
SortDate=(SortDate>>45<<45) + (SortDate&((1<<39)-1)) + ((((SortDate>>39) & 63)+1)<<39)
However, it does not rollover ends of months, years so maybe a solution based on offsets would be worth the effort. Have a look at Dates - Same Day Sort Order for a solution you might adapt.
While simply incrementing the SortDate value by 1 will change the order, it is actually messing with the bit range reserved for Flags. A shift of +1 from an unmodified date has no visible effect on the textual date displayed (so it's not obvious that the value has been changed) but a larger shift invokes date modifiers which may be unnecessarily confusing.
Invoking the -1, -2, ... date offsets would be ideal but, as Jerry points out, you would have to apply the "-1" offset to the one you want first followed by "-2" because the SortDate value for the offsetted date is actually less than that of the un-offsetted date. So that does get a bit convoluted compared to setting the SortDate to 1 day greater.
Adding 1 day is itself not trivial. I spent far too long getting my order of operations right but this algorithm does increment by 1 day for most dates.
SortDate=(SortDate>>45<<45) + (SortDate&((1<<39)-1)) + ((((SortDate>>39) & 63)+1)<<39)
However, it does not rollover ends of months, years so maybe a solution based on offsets would be worth the effort. Have a look at Dates - Same Day Sort Order for a solution you might adapt.