Recursive CTE to Parse SourceTable.Fields

#1 · 2024-07-29, 11:44 am

This is closely related to several recent threads, but it's going to be large enough that I decided to start a new thread. Namely, I am now posting my first attempt at using a recursive CTE for a real query against an RM database. This query parses out the <Field>...</Field> values in SourceTable.Fields. I will not describe the functioning of the query in detail because there are some detailed comments within the body of the code.

I am a source splitter. For anyone who is not a source splitter, the same code would need to be replicated and rehomed to CitationTable.Fields, and then a JOIN would be needed between the two queries.

Here's an example of what this query can produce for my obituary citations. Notice that the Funeral Home or NewsPaper column should have some data, but not both. I'm bothered by some data that's apparently missing, which is why I wanted to create this query. For example, why do I not have page numbers for all the newspaper obituaries? I need to find out. And why do I not have Web sites for more of the newspaper obituaries? It's probably ok because I have the clippings. But I want to double check. It's virtually impossible to do this kind of checking from within the RM user interface.

Uploaded files:

recursive_cte_source_fields.sql

kevync has reacted to this post.

#2 · 2024-07-31, 4:14 pm

I've been pre-occupied with guests and prep for same in our new house to pay attention to complex posts here (and elsewhere) but I must say that I'm looking forward to reviewing this one and your earlier one delving into CTE and recursion. The latter appeared to give fresh insight in layman's language that may help me graduate from having to find examples that work whenever I try to create one (long intervals in-between don't help). This example where you have transformed each string of XML into a table row is really interesting and puts the data into a form that facilitates visual inspection. For that purpose it is far superior to what I was able to do in Source List Query from which this screenshot is extracted:

The columns "Src Fields" and "Cit Fields" are the contents of the XML blobs in SourceTable and CitationTable stripped of XML tags and putting each XML Name and Value pair on a new line. While much more readable than the raw XML it's not nearly as easy to inspect as your format. It was probably done before I'd heard of CTE's and certainly at a time that I wouldn't have known how to begin!

#3 · 2024-07-31, 4:59 pm

Do be aware that the nice format has little to do with CTE's or recursive CTE's. The CTE and the recursive CTE simply collect and parse the SourceTable.Fields data into rows (and could be expanded to collect and parse the CitationTable.Fields data into rows). But the CTE and the recursive CTE don't put data into the nice format. Rather, the conversion of rows into columns is a painful manual process in writing the SQL code. The pain is a direct consequence of SQLite not supporting anything remotely resembling a built-in TRANSPOSE or PIVOT function.

To do this painful process, I used a trick that I found on the Internet rather than a trick that I invented. The trick essentially transposes or pivots a single cell at a time. This conceptually creates a very sparse matrix where each row is null except for one column and each column is null except for one row. Then a GROUP BY combined with a MAX function on each column collapses the entire sparse matrix into a single good looking row that has been transposed or pivoted as compared to the original data. You never actually see the sparse matrix because of the GROUP BY and the MAX function, even during debugging. But it's there. You could see it by temporarily removing the GROUP BY and MAX, but I didn't see any reason to do so after I understood the concept.

But to present the SourceTable.Fields data and the CitationTable.Fields data in this nice manner, the same trick would have to be written for each separate source template of interest. That's because it's the <Field>...</Field> items in the Fields data that are becoming the nice looking columns. The <Field>...</Field> items in the Fields data correspond to the variables in the source template, and each source template can and usually does have different variables than other source templates.