Another version of a Set Living query #living

Superseded 2015-01-25. See Living Flag – Set Globally.

I’m posting my version of SQL for Set Living. It proves to be remarkably similar to Tom’s.

SQL #1, we turn on the Living Flag for everybody in the database. Essentially, we make everybody private and then followup later by setting individuals as not private when they don’t need to be.

--                             Start by privatizing everyone in the database
 
UPDATE PersonTable
  SET Living = 1     -- 0 is not living, 1 is living
  WHERE Living != 1  -- In case it's faster not to update the ones that are already set to living
;

This code differs from Tom’s in that it includes the WHERE Living != 1 clause. The idea is to improve performance slightly by only setting to 1 those flags that are not already 1. In most programming contexts, this would be a silly distinction to make. In a procedural language such as C/C++, I would never consider code such as if (x != 1) x = 1 when a simple x = 1 would do, and indeed the simple form would run faster. But disks are vastly slower than main computer memory and it seems to me that avoiding the write operation associated with SET LIVING = 1 whenever possible might improve performance. But this case the UPDATE runs so quickly either way that I can’t prove which is faster. However, in later queries in this sequence it turns out that it’s important to keep the number of rows being operated on as small as possible, not so much for performance reasons as for other reasons.

SQL #2, we turn off the Living Flag for everyone whose birth date is before 1906.

--                             Unprivatize everyone born before 1906
 
UPDATE PersonTable
  SET Living = 0
  WHERE PersonID IN
    (
     SELECT P.PersonID
       FROM PersonTable AS P
              INNER JOIN
            NameTable AS N ON ( (P.PersonID = N.OwnerID) AND (N.NameType = 0) )  -- Use only the primary name table entry
       WHERE N.BirthYear > 0     --  BirthYear Exists
                AND
             N.BirthYear < 1906
    )
;

This is a case where I like Tom’s code much better than mine. Mine uses the birth year from the NameTable, which is not as reliable as using the EventTable as does Tom. Also, mine only uses the birth date, whereas Tom’s code uses any fact date from the EventTable. So his code is going to detect more people who should have the Living Flag turned off than mine. I had intended my code as a quick and dirty proof of concept, and I had intended to go back and reference the EventTable later with my final code. In looking Tom’s code to fully interpret dates, it seemed more complicated than I wanted to deal with until I had completed my proof of concept. But Tom already wrote very simple code to interpret dates as needed by this query.

However, Tom’s code at this point also includes turning off the the Living Flag for every individual who has a Death Fact, irrespective of what date (if any) is associated with the Death Fact. That’s an important item to accomplish, but my overall process requires that it be left until last.

SQL #3, we turn off the Living Flag for the parents of everyone whose Living Flag has already been turned off. It’s essential at this point in the process that the only reason the Living Flag has been turned off for anybody is because they have an Event Date (any event) prior to 1906. The parents of any such individuals are guaranteed to be deceased. But if the Living Flag is turned off for somebody because they have a Death Fact and the Death Date is 2007, then there is no guarantee that the parents are deceased.

--                             Unprivatize parents of everyone who is already unprivatized
 
UPDATE PersonTable
  SET Living = 0
  WHERE PersonID IN
    (
          SELECT PP.PersonID
            FROM PersonTable AS P
                   INNER JOIN
                 FamilyTable AS F ON (F.FamilyID = P.ParentID)
                   INNER JOIN
                 PersonTable AS PP ON (F.FatherID = PP.PersonID)
           WHERE ( (P.Living = 0) AND (PP.Living = 1) )
 
          UNION
 
          SELECT PP.PersonID
            FROM PersonTable AS P
                   INNER JOIN
                 FamilyTable AS F ON (F.FamilyID = P.ParentID)
                   INNER JOIN
                 PersonTable AS PP ON (F.MotherID = PP.PersonID)
           WHERE ( (P.Living = 0) AND (PP.Living = 1) )
    )
;

A UNION is required to get both the father and the mother. But once you figure out how to do the father, you just clone the father code to make it into the mother code and combine the two with UNION.

It took me a long time to figure out how to do this UPDATE. The primary conceptual problem was understanding how to reference both the PersonTable.PersonID of the person and the PersonTable.PersonID of the parents in the same sub-query, and then how to pass the correct PersonID out of the sub-query back to the query on the outside of the sub-query. In retrospect, it seems obvious and simple. But before I figured it out, it seemed almost impossible. I plan to post a separate "lessons learned" page about just that issue.

Note that contrary to intuition, SQL #3 doesn’t need to look at any dates whatsoever because the heavy lifting on dates has already been completed in SQL #2. And if I were to change SQL #2 to look at the dates of all fact types as in Tom’s code, I still would not need to change SQL #3 at all.

As Tom pointed out, the problem at this point is that you really need to run a recursion on this query, and to run the recursion enough labels deep to get all the ancestors, not just the parents. I can’t think of any way to accomplish such a recursion within a single UPDATE. But SQL #3 can be run as many times as you wish, and each time it is run it will get one additional generation of ancestors. But how many times should you run it? To that end, we can do the following.

SQL #4, for which individuals (or for how many) could we turn off the Living Flag if we ran SQL #3 again.

SELECT Z.PersonID           -- or SELECT COUNT(Z.PersonID)
  FROM
    (
          SELECT PP.PersonID
            FROM PersonTable AS P
                   INNER JOIN
                 FamilyTable AS F ON (F.FamilyID = P.ParentID)
                   INNER JOIN
                 PersonTable AS PP ON (F.FatherID = PP.PersonID)
           WHERE ( (P.Living = 0) AND (PP.Living = 1) )
 
          UNION
 
          SELECT PP.PersonID
            FROM PersonTable AS P
                   INNER JOIN
                 FamilyTable AS F ON (F.FamilyID = P.ParentID)
                   INNER JOIN
                 PersonTable AS PP ON (F.MotherID = PP.PersonID)
           WHERE ( (P.Living = 0) AND (PP.Living = 1) )
    ) AS Z
;

So we can run SQL #3 and SQL #4, followed by SQL #3 and SQL #4, over and over again until SQL #4 yields no rows. At that point, SQL #3 will have done all it can do and we can proceed to SQL #5.

For this query, P.Living in the sub-query is the Living Flag for the individual, and PP.Living in the sub-query is the Living Flag for the parent of the individual. The WHERE statement is coded as WHERE ( (P.Living = 0) AND (PP.Living = 1) ), which tests for situations where the individual is not living but the parent is still flagged as living. And remember that the not living condition at this point is strictly due to event dates, not yet on the presence of a death fact. So it would be sufficient to code the WHERE simply as WHERE P.Living = 0 in the SQL #3. But writing the WHERE statement as it was written is more efficient than just testing for P.Living because it keeps the number of rows in the sub-query as small as possible. And more importantly, SQL #4 will not produce the correct results without testing both P.Living and PP.Living.

SQL #5, turn off the Living Flag for all individuals who have a death fact. It is now safe to do so because we have done everything we can do with ancestors.

--                             Unprivatize everyone with a valid death date
--
--   We separate the processing of the death year from the processing of the birth year because
--   we need to process ancestors before we unprivatize based on the existence of a death date
 
UPDATE PersonTable
  SET Living = 0
  WHERE PersonID IN
    (
     SELECT P.PersonID
       FROM PersonTable AS P
              INNER JOIN
            NameTable AS N ON ( (P.PersonID = N.OwnerID) AND (N.NameType = 0) )  -- Use only the primary name table entry
       WHERE
         N.DeathYear > 0         -- Death year exists, doesn't matter what it is if it exists
     )
;

My code tests for the presence of a death date, whereas Tom’s code tests for the presence of a death fact. Tom’s code is complete and mine is not. I need to change mine to match Tom’s.

Jerry

Discussions & comments from Wikispaces site

Leave a Reply Cancel reply

Discussions & comments from Wikispaces site

Comment: “…really need to run a recursion on this query…”

Leave a Reply Cancel reply