CTE's and Recursive Queries

#1 · 2024-07-21, 10:34 pm

I had to develop a new version of my query to color code people based on their relationship values after the release of 10.0.1. I had to it without using recursive queries for the simple reason that I didn't know how to do recursive queries. And as I have been reading up on recursive queries, I also came to realize that using Common Table Expressions (CTE's) can be a big assist in writing recursive queries.

So I did a search in this forum for references to both features. The first reference to recursive queries that I found was in 2015 by Tom The first reference to CTE's that I found was in 2017 by Pat

So I'm only 9 years behind Tom and only 7 years behind Pat on this stuff. I haven't written a sample recursive query yet, but I have now written a very simple but possibly useful sample of a CTE. A CTE turns out to be just a subquery that you give a name and define ahead of time. I did not pick up that simple meaning of what a CTE just from the name of the feature. I love subqueries, but they can sometimes really clutter up a script. Therefore, I think I'm going to like CTE's. So here is my sample script using a CTE. And as you can tell, I still really like using aliases.

WITH BirthEvents AS
(
SELECT E.*
FROM EventTable AS E
JOIN FactTypeTable AS FT ON FT.FactTypeID = E.EventType AND FT.Name LIKE 'Birth'
)

SELECT B.* -- or just SELECT * FROM BirthEvents;
FROM BirthEvents AS B;

kevync has reacted to this post.

#2 · 2024-07-22, 11:35 am

Check out these recursive queries for RM relatives.
https://github.com/ricko2001/Genealogy-scripts/tree/main/RM%20-SQL%20for%20creating%20useful%20groups

Also based on Tom's work, but expanded.
I use them in the GroupFromSQL utility.

kevync has reacted to this post.

#3 · 2024-07-22, 8:07 pm

I love subqueries, but they can sometimes really clutter up a script. Therefore, I think I'm going to like CTE's.

I think recursive / CTE queries could have a lot of uses (especially for relationships (cousins, aunts, uncles etc). I just have not take the time to play with them to figure out the syntax to to get them to work for me me other than a couple test cases

#4 · 2024-07-22, 11:43 pm

Quote from thejerrybryan on 2024-07-22, 11:43 pm
I have now completed my first actual recursive / CTE query. Curiously, the one I chose to do first had nothing to do with relationships and my color coding project. I will get back to that one. The query I did had to do with parsing RM's internal XML strings. The first XML string I did was SourceTable.Fields, but the same basic code should work for any of RM's XML columns. I only broke the XML strings down as far as to the <Field>...</Field> level. Each separate <Field>...</Field> becomes a row that is returned from the recursive CTE. I plan to add to the query to parse on down <Value>...</Value> level etc.
A recursive / CTE is not the best XML parsing tool in the world, but I can make it work, whereas I could not really parse RM's XML strings without recursive queries.
I had sort of two problems in working on the query. The first one was a very rookie mistake and really didn't have to do with recursive /CTE queries. Namely I needed to use the INSTR() function to find the locations of the <Field> and </Field> tags. My code looked perfect, but it the recursion wasn't happening. The problem is that I was giving the INSTR() function search arguments of <field> and </field>, but the INSTR() function is case sensitive.
The other problem isn't really a problem exactly. It's just that I must have missed the SQL tutorial that talks about the scope of data - what data that outer queries can see inside a subquery, and what data that subqueries can see in the outer query. I have basically figured it out, but the concept comes in play big time with recursive / CTE queries. That's because the recursive / CTE queries effectively become a deeply nested collection of identical subqueries and data has to pass up and down the nested levels. The tutorials I have seen seem not to address this issue at all. And the sample codes do things with columns names at the various levels of the recursion that seem really scary, but that actually do seem to work quite well without any explanation all of how they work.
Well, a third issue that wasn't a problem because I have a lot experience with it is that SourceTable.Fields is a BLOB field. As such, the SQLite string functions sometimes seem to me to pick up UTF-8 header data as if it was a part of the character string. I decided a long time ago that any time I encountered a BLOB field in RM's database that was really text, I would always CAST it to text before doing any processing on the data.
The best recursive / CTE tutorial I have run into so far may be found at this address.

I have now completed my first actual recursive / CTE query. Curiously, the one I chose to do first had nothing to do with relationships and my color coding project. I will get back to that one. The query I did had to do with parsing RM's internal XML strings. The first XML string I did was SourceTable.Fields, but the same basic code should work for any of RM's XML columns. I only broke the XML strings down as far as to the <Field>...</Field> level. Each separate <Field>...</Field> becomes a row that is returned from the recursive CTE. I plan to add to the query to parse on down <Value>...</Value> level etc.

A recursive / CTE is not the best XML parsing tool in the world, but I can make it work, whereas I could not really parse RM's XML strings without recursive queries.

I had sort of two problems in working on the query. The first one was a very rookie mistake and really didn't have to do with recursive /CTE queries. Namely I needed to use the INSTR() function to find the locations of the <Field> and </Field> tags. My code looked perfect, but it the recursion wasn't happening. The problem is that I was giving the INSTR() function search arguments of <field> and </field>, but the INSTR() function is case sensitive.

The other problem isn't really a problem exactly. It's just that I must have missed the SQL tutorial that talks about the scope of data - what data that outer queries can see inside a subquery, and what data that subqueries can see in the outer query. I have basically figured it out, but the concept comes in play big time with recursive / CTE queries. That's because the recursive / CTE queries effectively become a deeply nested collection of identical subqueries and data has to pass up and down the nested levels. The tutorials I have seen seem not to address this issue at all. And the sample codes do things with columns names at the various levels of the recursion that seem really scary, but that actually do seem to work quite well without any explanation all of how they work.

Well, a third issue that wasn't a problem because I have a lot experience with it is that SourceTable.Fields is a BLOB field. As such, the SQLite string functions sometimes seem to me to pick up UTF-8 header data as if it was a part of the character string. I decided a long time ago that any time I encountered a BLOB field in RM's database that was really text, I would always CAST it to text before doing any processing on the data.

The best recursive / CTE tutorial I have run into so far may be found at this address.

kevync has reacted to this post.

#5 · 2024-07-23, 1:56 pm

Thanks for sharing!

Many of the recursive queries I have seen are confusing. I basically understand the results -- but not the path the recursive steps the query took to arrive.

Recursive seem to be perfect approach in most cases for relationships, ancestor, descendants. Finding all 2nd cousins (including nth removed) from start person for example. Not sure how to go about that exactly.

Kevin

#6 · 2024-07-23, 3:06 pm

Quote from thejerrybryan on 2024-07-23, 3:06 pm
I have developed some very simple examples that have helped me understand how recursive CTE's work. These examples do not involved the RM database at all. They are much simpler than that, so they do not provide any practical advice on how to use recursive CTE's to walk the RM database. But these examples have helped me to understand the underpinnings of CTE's as just being nested subqueries in disguise.
In fact, anything you could do with recursive CTE's you could in principle do with nested subqueries. However, the trouble with nested subqueries is that you would have to know in advance how many levels to nest the subqueries and you would have to replicate the same code in each level of the nested subqueries. So that means that using nested subqueries in place of recursive CTE's is extremely hard to do in practice, even if it could be done in principle.
In any case, what I have done is to have written nested subqueries to do the same thing as a recursive CTE. Here is the first example.
-- #3
SELECT 1 AS x, 'eee' AS y
UNION ALL
SELECT x + 1 AS x, 'ddd' AS y
FROM
(
-- #2
SELECT 1 AS x, 'ccc' AS y
UNION ALL
SELECT x + 1 AS x, 'bbb' AS y
FROM
(
-- #1
SELECT 1 AS x, 'aaa' AS y
)
)
When you run this query, here are the results. Basically, I am running three iterations that start with x = 1 and add 1 to x each time.
x y
1 eee
2 ddd
3 ddd
I added a second variable y to provide character string labels from aaa to eee to be able to see which lines are being executed. Except, the only lines for which you see the labels appear are ddd and eee and they are in the main query. That's because the main query is the only one that produces rows you can see. The main query is gathering data from the subqueries, but you can't see the results from the subqueries directly.
Remember that subqueries run before the queries that call them. In this case main query #3 runs last, subquery #2 runs second, and subquery #1 runs first. It's like when you analyze mathematical expressions with parentheses. You work your way from the inside out, and a subquery is "inside" the query that calls it. So the x with the value of 1 was the last assigned and it was never incremented. The x with the value of 2 was the second assigned. It was assigned the value of 1 and it was incremented once. The x with the value 3 was the first assigned. It was assigned the value of 1 and it was incremented twice. It's important to make note of the fact that none of the y values were ever propagated up from a subquery to the query that called the subquery.
Here's a second version that's the same script except that the y values were propagated upwards so you can see the subquery that first initiated the x value. The x = 1 has a y = ccc and hence was initiated in the main query. It never went through any of the subqueries and was never incremented. The x = 2 has a y = bbb and hence was initiated in subquery #2. It went through one subquery and was incremented once. The x = 3 has a y = ccc and hence was initiated in subquery #3. It went through two subqueries and was incremented twice.
-- #3
SELECT 1 AS x, 'ccc' AS y
UNION ALL
SELECT x + 1 AS x, y AS y
FROM
(
-- #2
SELECT 1 AS x, 'bbb' AS y
UNION ALL
SELECT x + 1 AS x, y AS y
FROM
(
-- #1
SELECT 1 AS X, 'aaa' AS y
)
)
x y
1 ccc
2 bbb
3 aaa
The treatment of y deserves some attention. In particular, it is selected in some cases as y AS y. It looks sort of funny, but what's actually happening is that the first value of y listed was returned by this particular subquery's lower level subquery and the second value of y listed will be returned to the higher level query that called this particular query.
Finally, here is a recursive CTE version of the same query. It was going to be a mess to pass the y variables between the levels of subquery, so in the interest of keeping the query simple I only have an x variable. But the actual processing is exactly like other two scripts except that the subqueries are handled by the recursive CTE mechanism. It's the same subqueries, except they are generated dynamically instead me having to type them in, and except that I don't have to know how many levels of subquery there are going to be ahead of time.
WITH RECURSIVE my_CTE( x ) AS
(
SELECT 1 AS x
UNION ALL
SELECT x + 1 AS x
FROM my_CTE
WHERE x < 3
)
SELECT x
FROM my_CTE
x
1
2
3
I want to make note of one final thing. It was also in the subquery version, but it seems to stand out more in the recursive CTE version. It's where we have x + 1 AS x. Just as I described before with y, the first x is the x returned from the lower level recursion. Then 1 is added to it and the second x is the x that is returned to the higher level recursion.

I have developed some very simple examples that have helped me understand how recursive CTE's work. These examples do not involved the RM database at all. They are much simpler than that, so they do not provide any practical advice on how to use recursive CTE's to walk the RM database. But these examples have helped me to understand the underpinnings of CTE's as just being nested subqueries in disguise.

In fact, anything you could do with recursive CTE's you could in principle do with nested subqueries. However, the trouble with nested subqueries is that you would have to know in advance how many levels to nest the subqueries and you would have to replicate the same code in each level of the nested subqueries. So that means that using nested subqueries in place of recursive CTE's is extremely hard to do in practice, even if it could be done in principle.

In any case, what I have done is to have written nested subqueries to do the same thing as a recursive CTE. Here is the first example.

-- #3
SELECT 1 AS x, 'eee' AS y
UNION ALL
SELECT x + 1 AS x, 'ddd' AS y
FROM
(
-- #2
SELECT 1 AS x, 'ccc' AS y
UNION ALL
SELECT x + 1 AS x, 'bbb' AS y
FROM
(
-- #1
SELECT 1 AS x, 'aaa' AS y
)
)

When you run this query, here are the results. Basically, I am running three iterations that start with x = 1 and add 1 to x each time.

x y
1 eee
2 ddd
3 ddd

I added a second variable y to provide character string labels from aaa to eee to be able to see which lines are being executed. Except, the only lines for which you see the labels appear are ddd and eee and they are in the main query. That's because the main query is the only one that produces rows you can see. The main query is gathering data from the subqueries, but you can't see the results from the subqueries directly.

Remember that subqueries run before the queries that call them. In this case main query #3 runs last, subquery #2 runs second, and subquery #1 runs first. It's like when you analyze mathematical expressions with parentheses. You work your way from the inside out, and a subquery is "inside" the query that calls it. So the x with the value of 1 was the last assigned and it was never incremented. The x with the value of 2 was the second assigned. It was assigned the value of 1 and it was incremented once. The x with the value 3 was the first assigned. It was assigned the value of 1 and it was incremented twice. It's important to make note of the fact that none of the y values were ever propagated up from a subquery to the query that called the subquery.

Here's a second version that's the same script except that the y values were propagated upwards so you can see the subquery that first initiated the x value. The x = 1 has a y = ccc and hence was initiated in the main query. It never went through any of the subqueries and was never incremented. The x = 2 has a y = bbb and hence was initiated in subquery #2. It went through one subquery and was incremented once. The x = 3 has a y = ccc and hence was initiated in subquery #3. It went through two subqueries and was incremented twice.

-- #3
SELECT 1 AS x, 'ccc' AS y
UNION ALL
SELECT x + 1 AS x, y AS y
FROM
(
-- #2
SELECT 1 AS x, 'bbb' AS y
UNION ALL
SELECT x + 1 AS x, y AS y
FROM
(
-- #1
SELECT 1 AS X, 'aaa' AS y
)
)

x y
1 ccc
2 bbb
3 aaa

The treatment of y deserves some attention. In particular, it is selected in some cases as y AS y. It looks sort of funny, but what's actually happening is that the first value of y listed was returned by this particular subquery's lower level subquery and the second value of y listed will be returned to the higher level query that called this particular query.

Finally, here is a recursive CTE version of the same query. It was going to be a mess to pass the y variables between the levels of subquery, so in the interest of keeping the query simple I only have an x variable. But the actual processing is exactly like other two scripts except that the subqueries are handled by the recursive CTE mechanism. It's the same subqueries, except they are generated dynamically instead me having to type them in, and except that I don't have to know how many levels of subquery there are going to be ahead of time.

WITH RECURSIVE my_CTE( x ) AS
(
SELECT 1 AS x

UNION ALL

SELECT x + 1 AS x
FROM my_CTE
WHERE x < 3

)

SELECT x
FROM my_CTE

x
1
2
3

I want to make note of one final thing. It was also in the subquery version, but it seems to stand out more in the recursive CTE version. It's where we have x + 1 AS x. Just as I described before with y, the first x is the x returned from the lower level recursion. Then 1 is added to it and the second x is the x that is returned to the higher level recursion.

kevync has reacted to this post.

#7 · 2024-07-26, 3:48 am

I'm late to this thread but I'm a CTE addict and use them all the time in my day job. Recursive CTE are indeed very confusing but there are several articles on line in the SQL Server database community that help since SQLite is very similar to SQL Server where CTE are concerned.

kevync has reacted to this post.