I don’t post here as often as I probably should. I find that most of my SQL queries are quite specific to my own personal research and are of little general interest. On the other hand, Tom has a genius for coming up with queries that are of very general utility. The queries I’m going to post in this particular message are probably very much in the category of being quite specific to my own personal research, but perhaps some of the techniques I’m using might be of interest to others. In particular, for complex queries I now use SQL VIEW statements to create a number of views, and I then construct my queries from these views. This breaks the query into smaller pieces, which I find simpler. I’m just a simple country boy, so I like simple.
Having said that, breaking complex tasks into many smaller pieces can be taken too far, to the point that managing the pieces can become more complex than the original problem was. The reader will have to decide if I have gone too far in that direction. Also, using VIEW statements can make SQL queries run a bit slower than they would
otherwise. Again, the reader will have to decide if the technique of using views is advantageous or not. It is advantageous for me, but it might not be advantageous for everybody.
The overall query we are going to be working on is to identify facts without sources for a particular sub-population of individuals in my database. So we have to go through a process identifying the individuals in the sub-population, identifying their facts (both individual and family), and identifying which of those facts do not have sources. We start by considering a very simple query to list the values in RM’s PersonTable for everyone who is color coded as red. Events-CitationsMissing-UsingViews.sql 2017-04-17 This compilation of many of the following scripts can be downloaded and executed to list events with no citations.
SELECT P.* FROM PersonTable AS P WHERE P.Color = 1;
I’m going to present a number of similar examples in the same general vein, but in the meantime I would point out the following. There is no particular magic about the color red. I just needed an example. I could have chosen a different color or combination of colors for the example. I could could have chosen an RM Named Group for the example, but the example would have been slightly more complicated. Etc. But the point is that it’s very easy in SQL to test for color coding. So I have moved some selection logic that might possibly be very complicated to do in SQL such as “all the descendants of John Doe” out of SQL and into RM. Which is to say, it’s very easy in RM to color code all descendants of John Doe as red but it’s not so easy in SQL to select all the descendants of John Doe.
Having created this query, we can run it to test it. We can then convert it into a VIEW as follows. We DROP the VIEW before doing a CREATE so that the query can be run multiple times. We include IF EXISTS so that the DROP will run correctly the first time before the VIEW has ever been created. And we make a TEMP VIEW so that the VIEW is kept in memory and doesn’t clutter up the actual database.
DROP VIEW IF EXISTS PersonTableView; CREATE TEMP VIEW PersonTableView AS SELECT P.* FROM PersonTable AS P WHERE P.Color = 1;
Having created a view in this manner, we can then invoke it to actually use it or just to test it again as follows.
SELECT P.* FROM PersonTableView AS P;
Obviously, if this is all we have done, we have introduced extra work and complexity into an SQL query process for no particular advantage. The advantage comes only when we construct more complicated queries. In particular, we can replace every reference to PersonTable in a complex query with a reference to PersonTableView. Having done so, the query is automatically filtered to include only those individuals who are color coded as red (or whatever filtering we have specified in the definition of PersonTableView).
Here are some other examples of other possible definitions of PersonTableView.
-- PersonTableView Example #1 -- -- This one filters on the color maroon and it omits any individuals whose -- given name contains the string "dummy". I have a number of dummy people -- in my database, denoted by the string "dummy" in the given name field. -- For many queries, I wish to exclude these dummy people. This query does -- a JOIN with the NameTable to gain access to the given name, but it does -- not select any of the columns from the NameTable to be included in the view. -- A similar idea could be used to filter on a particular surname, e.g. -- -- AND N.Surname LIKE('Smith%'); -- DROP VIEW IF EXISTS PersonTableView; CREATE TEMP VIEW PersonTableView AS SELECT P.* FROM PersonTable AS P INNER JOIN NameTable AS N ON N.OwnerID = P.PersonID WHERE P.Color = 8 AND N.Given NOT LIKE('%dummy%'); -- color 8 is maroon -- PersonTableView Example #2 -- -- This example filters on a Named Group which is defined within -- RM. I have hard coded the group number as group 37 for simplicity, -- but it would also be easy to include a JOIN between the GroupTable -- and the LabelTable to specify the name of the group by its name. -- In order to hard code the group number, you have to look up the -- group number in the LabelTable yourself which is completely trivial. -- For the purposes of this example, I have assumed that we have done -- so and that the group of interest is group 37. The query -- does a cross JOIN between the GroupTable and the PersonTable for the -- purpose of making the selection with a WHERE statement, but no columns -- from the GroupTable are selected in the query. (A cross JOIN is when you -- list multiple table names and you are separate them with a comma rather -- than with a JOIN statement.) -- DROP VIEW IF EXISTS PersonTableView; CREATE TEMP VIEW PersonTableView AS SELECT P.* FROM PersonTable AS P, GroupTable AS G WHERE G.GroupID=37 AND G.StartID <= P.PersonID AND P.PersonID <= G.EndID; -- PersonTableView Example #3 -- -- This example filters on a small list of people based on their -- PersonID. Such a VIEW is usually most useful during the -- development and debugging of a query to target the query -- towards a very small population of individuals who have -- the properties of interest to the query. -- DROP VIEW IF EXISTS PersonTableView; CREATE TEMP VIEW PersonTableView AS SELECT P.* FROM PersonTable AS P WHERE P.PersonID IN (3,17,33);
Having created a filtered PersonTable called PersonTableView, you might like to have a filtered NameTable called NameTableView to go along with it. Such a view might be created by SQL such as the following.
-- NameTableView example.-- Create a view of the FactTypeTable filtered to exclude dummy facts. DROP VIEW IF EXISTS NameTableView; CREATE TEMP VIEW NameTableView AS SELECT N.* FROM NameTable AS N INNER JOIN PersonTableView AS P ON N.OwnerID = P.PersonID;
You could then use NameTableView instead of NameTable in queries, or you could just list your filtered NameTable as follows.
SELECT N.* FROM NameTableView AS N;
Having created a filtered PersonTable called PersonTableView, you might like to have a filtered FamilyTable called FamilyTableView to go along with it. Such a view might be created by SQL such as the following.
-- FamilyTableView example. -- -- The FamilyTable is filtered to include only individuals who are included in the filtered PersonTable. -- The DISTINCT parameter is needed to avoid selecting the same row of the FamilyTable twice if -- both the Father and the Mother are in PersonTableView. The VIEW is set up to select the family -- if either or both spouses are in the PersonTableView. If you want the VIEW to select the family -- only if both spouses are in the PersonTableView, change the OR to AND. DROP VIEW IF EXISTS FamilyTableView; CREATE TEMP VIEW FamilyTableView AS SELECT DISTINCT F.* FROM FamilyTable AS F INNER JOIN PersonTableView AS P ON (F.FatherID = P.PersonID OR F.MotherID = P.PersonID);
We could go on in this same vein for a while, just giving examples of how to make queries without actually making a query. Instead, let us now try to make an example that might actually be useful. In particular, let us make a query that lists facts without a source. We start by making a View of the FactTypeTable. As will continue to be the case, much of this view is specific to my own research, but I include it to present ideas.
-- FactTypeTableView example. -- -- I use a lot of dummy facts - facts that will never appear in any reports but -- which record "to do" type of information, research log type of information, etc. -- Probably nobody else than me does this, but it still provides a good example. -- The VIEW omits these dummy facts. I use the standard Reference Number fact -- as a dummy fact (I'm phasing out this usage), and facts whose name incudes an -- asterisk as dummy facts. This query omits such fact types from the VIEW. -- If this type of filtering is not needed, you could just reference the FactTypeTable -- directly without creating a VIEW or else have a FactTypeView that includes -- the entire FactTypeTable. -- DROP VIEW IF EXISTS FactTypeTableView; CREATE TEMP VIEW FactTypeTableView AS SELECT F.* FROM FactTypeTable AS F WHERE (F.Abbrev NOT LIKE '%*%' AND F.Abbrev NOT LIKE 'Ref%');
We now use the filtered VIEW of the FactTypeTable to create a filtered VIEW of the EventTable. Obviously, the filtering of the FactTypeTable and the filtering of the EventTable could have been done in a single VIEW. But I find it simpler to split the filtering into separate pieces.
-- EventTableView example. -- -- This VIEW simply applies the concept of dummy facts from the FactTypeTable to the EventTable. -- It also adds one new variable so that that the EventTableView has the name of fact type available -- to display for each event, a piece of data that is not available in the EventTable itself. -- I wonder about the wisdom if this approach. We could instead leave out the name of the fact type -- for now and bring it in later in the query. That way, the EventTableView would be a pure -- representation of what is in the actual EventTable. -- DROP VIEW IF EXISTS EventTableView; CREATE TEMP VIEW EventTableView AS SELECT E.*, F.Abbrev FROM EventTable AS E INNER JOIN FactTypeTableView AS F ON E.EventType = F.FactTypeID;
We now use the filtered VIEW of the EventTable to create two new filtered VIEW’s of the EventTable – one filtered VIEW for individual facts and one filtered VIEW for family facts. The two filtered views can be combined together with a UNION as required, but care is required because individual facts and family facts work somewhat differently.
-- Two EventTableView's - example to deal with individual facts vs. family facts. DROP VIEW IF EXISTS IndividualEventView; CREATE TEMP VIEW IndividualEventView AS SELECT E.* FROM EventTableView AS E INNER JOIN PersonTableView AS P ON E.OwnerID = P.PersonID WHERE E.OwnerType = 0; -- DROP VIEW IF EXISTS FamilyEventView; CREATE TEMP VIEW FamilyEventView AS SELECT E.* FROM EventTableView AS E INNER JOIN FamilyTableView AS F ON E.OwnerID = F.FamilyID WHERE E.OwnerType = 1;
We are now getting close to our goal of making a report of facts without sources. We require three queries for facts without sources, each of which we make into a VIEW. One query is for individual facts. One query is for family facts where the person is a Father. And one query is for family facts where the person is Mother. The latter two could surely be combined, but I have chosen not to do so. Note that the queries need to test for NULL and testing for NULL with SQL can be tricky and error prone if you are not extremely careful and mindful of how it works. I believe and hope that I have gotten it right. At least my code seems to produce correct results.
-- CitationViewIndividualEvents example. -- -- We use a LEFT JOIN from the individuals to the citations to identify events -- without citations, and the test for NULL effectively takes place after -- the JOIN is complete. -- -- First, we create a VIEW of the CitationTable filtered on OwnerType = 2 (citations for events) -- The VIEW will be used several times and has the effect of isolating the test for citations -- for events to being made at only a single place in the overall query. -- DROP VIEW IF EXISTS CitationViewEvent; CREATE TEMP VIEW CitationViewEvent AS SELECT C.* FROM CitationTable AS C WHERE C.OwnerType = 2; -- -- We now create the three queries for individual events, family events for the father, and family events for the mother. -- The JOIN structure for the family event queries is slightly more complicated than is the JOIN structure for the -- individual event query. The extra complication is because the RIN number comes from a different table than the -- name of the fact when a family fact is involved. -- DROP VIEW IF EXISTS CitationViewIndividualEvents; CREATE TEMP VIEW CitationViewIndividualEvents AS -- query for individual events SELECT E.Ownerid AS RIN, E.Abbrev FROM IndividualEventView AS E LEFT JOIN CitationViewEvent AS C ON C.OwnerID = E.EventID WHERE C.OwnerID IS NULL; DROP VIEW IF EXISTS FatherFactCitations; CREATE TEMP VIEW FatherFactCitations AS SELECT F.FatherID AS RIN, E.Abbrev FROM FamilyEventView AS E INNER JOIN FamilyTableView AS F ON E.OwnerID = F.FamilyID LEFT JOIN CitationViewEvent AS C ON C.OwnerID = E.EventID WHERE C.OwnerID IS NULL; DROP VIEW IF EXISTS MotherFactCitations; CREATE TEMP VIEW MotherFactCitations AS SELECT F.MotherID AS RIN, E.Abbrev FROM FamilyEventView AS E INNER JOIN FamilyTableView AS F ON E.OwnerID = F.FamilyID LEFT JOIN CitationViewEvent AS C ON C.OwnerID = E.EventID WHERE C.OwnerID IS NULL;
Finally, each VIEW we need has been created. We can list the results with a query like the following. The query lists only the RIN number and the name of each fact for that person with no citation. We could easily list additional information, but I find that the RIN number is all I need. I use it in RM with Magnifying Glass -> Record to navigate to the correct person. The query has a minor quirk that is not worth fixing. Namely, a family fact without a citation will be listed for both spouses even if only one of the spouses is included in PersonTableView. But I regard a missing source as a problem to be fixed, and for a family fact the missing source can be fixed just as well from either spouse. Whenever I fix such a problem, I immediately rerun the query to get a reduced list of problems to be fixed.
SELECT * FROM CitationViewIndividualEvents UNION SELECT * FROM FatherFactCitations UNION SELECT * FROM MotherFactCitations;
I will end my narrative by reiterating what I said at the beginning: many (maybe most) users may find this way of developing queries not to be any simpler than a more normal way. But I really like developing such a script just a piece at a time. And by developing it in the order described, I can easily test and validate each piece of the script as I go. So by the end, it should just work.
Discussions & comments from Wikispaces site
19 August 2016 00:05:58
Great development and exposition! A valuable learning page.