Simple SQLite Question – maybe for Tom – Forum

#1 · 2019-04-09, 9:40 am

The following SQLite query works perfectly for my purposes.

SELECT FT.Name, FT.Abbrev, FT.GedcomTag, E.EventType, FT.FactTypeID, E.EventID, E.OwnerID, E.Note

FROM FactTypeTable AS FT

JOIN

(SELECT E.* FROM EventTable AS E WHERE OwnerType = 1) AS E ON FT.FactTypeID = E.EventType

ORDER BY FT.Abbrev, E.OwnerID;

However, if instead I do ORDER BY FT.Name, E.OwnerID; then the ordering is not quite right, even though FT.Abbrev and FT.Name have the same value. In particular, I have quite a few user defined facts of with a name of the form "Marriage xxxxxxx" such as "Marriage Record", "Marriage Narrative", etc. These are all mixed together as if they were just named "Marriage". I can't see any problem with the names, either with SQLite or in the RM user interface. So why won't the FT.Name field sort correctly?

Is there a way in this WordPress forum to enter SQL code as type code so it looks better?

#2 · 2019-04-09, 2:12 pm

That mystifies me, too. At first, I wondered if the repeated "E" alias and what seemed an unneeded SELECT might have a bearing and, maybe it does, because I don't have a sample database so I revised your query to:

SELECT FT.Name, FT.Abbrev, FT.GedcomTag, E.EventType, FT.FactTypeID, E.EventID, E.OwnerID, E.Note

FROM FactTypeTable AS FT

JOIN

EventTable AS E ON FT.FactTypeID = E.EventType AND E.OwnerType = 1

ORDER BY FT.Name, E.OwnerID;

Try it to see if it has any bearing on your results; I doubt it will. What surprised me is that, despite SQLite's query optimiser adding an existing index to the process, it actually is consistently slower than your original by 20-30%.

Maybe your issue is that you have not REINDEXed in SQLite so that the indexes are set by the fake RMNOCASE collation. The NAME field is collated by RMNOCASE while ABBREV is not.

~~As to entering code in Forum posts, I regret that this Forum plugin does not appear to support a code block.~~ Discovered a setting for Minimalistic Editor which when unchecked opens up more editing capabilities! That ? on the menu lists keyboard shortcuts, one of which (shift+alt+x) formats a block of selected text as code.

The forum also supports .sql attachments.

#3 · 2019-04-09, 4:28 pm

The redundant E alias is sort of a lame attempt at optimization, and I realize the optimizer should take care of what I was trying to accomplish. But my own optimization does seem to work well. (I trust it's clear what the optimization is trying to accomplish.)

I'm sure RMNOCASE must be the problem, but it's hard to see the details of exactly how RMNOCASE would be the problem. The "big picture" sorting on the Name field of the FactTypeTable is correct - Annulment is before Divorce, Divorce is before Marriage, Marriage is before Wedding Announcement, etc. It's only within the event types whose names start with "Marriage ..." that there is a problem.

Even before you replied, I tried it without the otherwise "redundant" E alias, and the results are no different. Also, there is no problem if you only reference the FactTypeTable. The "Marriage ..." fact types respond properly to ORDER statements in that case. Here follows an example where the sorting is still wrong.

SELECT FT.FactTypeID, FT.Name, FT. Abbrev, FT.GedcomTag
FROM FactTypeTable AS FT
JOIN
EventTable AS E ON FT.FactTypeID = E.EventType
WHERE FT.Name LIKE('%marriage%')
ORDER BY FT.Name;

But if you comment out the JOIN to the EventTable, the sorting is correct, viz.

SELECT FT.FactTypeID, FT.Name, FT. Abbrev, FT.GedcomTag
FROM FactTypeTable AS FT
--JOIN
--EventTable AS E ON FT.FactTypeID = E.EventType
WHERE FT.Name LIKE('%marriage%')
ORDER BY FT.Name;

So some part of the problem must have something to do with the JOIN. But it's a perfectly normal and correct JOIN, the best I can tell. I also tried the JOIN in both orders, FactTypeTable joined to EventTable vs. EventTable joined to FactTypeTable, and as expected the behavior is the same in either case.

#4 · 2019-04-09, 4:37 pm

The addition of COLLATE NOCASE to the query solves the problem. But I repeat that it's difficult to see what characteristic of RMNOCASE there could be that would produce my particular symptoms.

Obviously, this is a really trivial issue - solved either by COLLATE NOCASE or by sorting on Abbrev instead of Name. It's just a very puzzling mystery that I'm very curious about.

#5 · 2019-04-09, 4:44 pm

Quote from thejerrybryan on 2019-04-09, 4:28 pm
(I trust it's clear what the optimization is trying to accomplish.)

No, it's not. EXPLAIN QUERY PLAN shows that it suppresses the use of an existing compound index but the sequence and the numbers of steps is the same. The optimiser thinks the index is appropriate but it actually degrades the speed. If you were deliberately trying to suppress the optimiser, there is probably a more obvious way to do so.

I'm sure RMNOCASE must be the problem, but it's hard to see the details of exactly how RMNOCASE would be the problem. The "big picture" sorting on the Name field of the FactTypeTable is correct - Annulment is before Divorce, Divorce is before Marriage, Marriage is before Wedding Announcement, etc. It's only within the event types whose names start with "Marriage ..." that there is a problem.

Did you REINDEX in SQLIte?

Also try ORDER BY FT.Name COLLATE NOCASE, E.OwnerID;

#6 · 2019-04-09, 4:47 pm

We doubled. What is the collation sequence you are using as the fake RMNOCASE?

#7 · 2019-04-09, 8:31 pm

I'm using SQLiteSpy with rmnocase_fake-sqlitespy.dll

As far as optimization, a JOIN conceptually (but not really) first does a full cross join which can result in an enormous number of rows. For example, a full cross join on two tables with 1,000 rows each results in a table with 1,000,000 rows. A JOIN then conceptually (but not really) applies the JOIN criteria plus any WHERE criteria to the full cross join. If relational databases really worked like that, they would never work in practice because they would use too much memory and they would be too slow - hence an optimizer is an essential part of any practical relational database engine. In effect, one key role of the the optimizer is to reverse the conception and behind the scenes to do the WHERE and ON filtering before forming the JOIN instead of after.

Putting WHERE criteria inside a sub-query conceptually (but not not necessarily really) can greatly reduce the number of rows being joined before rather than after the JOIN is processed. In practice, it's probably more of a way to think about the problem than achieving any actual optimization. In any case, putting WHERE criteria inside a sub-query is functionally equivalent to the trick I sometimes use of putting WHERE criteria into a temporary VIEW, except that the sub-query usually runs much faster than the temporary VIEW.

I wish I hadn't introduced the sub-query into this discussion. The real issue remains the curious behavior of RMNOCASE for my particular query.

Re-indexing from within the RM user interface has no effect on the problem.

#8 · 2019-04-09, 10:15 pm

I meant, did you run the SQLite command:

REINDEX;

before your query so that the indexes are based on the fake collation, not the real RMNOCASE? After finishing with SQLite, you would use the RM Rebuild Indexes to cause its SQLite engine to REINDEX using the real RMNOCASE.

#9 · 2019-04-10, 11:51 am

The REINDEX from within SQLiteSpy does not fix it. The symptoms are the same. Writ large, the sorting is "correct" Divorce before Marriage, etc. But within "Marriage ..." the sorting appears random. It's like it's doing a substring to look only at the first few characters. But My code certainly doesn't have a substring or anything similar.

Just to confirm, sorting on the Abbrev field continues to yield correct sorting results after the REINDEX from within SQLiteSpy.

#10 · 2019-04-10, 11:59 am

On a different subject but one that came up in this thread, Tom got me curious about the SQL rules for aliases. Namely, what happens if there is a naming collision between an alias inside a subquery and an alias in the main query outside the subquery. As I had been assuming, there is no conflict. Inside the subquery, the alias refers to the definition that is inside the subquery, and outside the the subquery the same named alias refers to the definition that is outside the subquery.

But ---- a subquery can refer to an alias that is defined outside the subquery. In order for a subquery to do so, the same name must not be defined as an alias inside the subquery. So even though apparent naming collisions are legal and and the behavior is well defined, it can be confusing to the reader and such practice is probably not in good odor.

Simple SQLite Question - maybe for Tom