Forum

Please or Register to create posts and topics.

Further GROUP BY Strangeness

First of all, because I have been doing so much work comparing data between two databases, I have gotten into the habit of doing an ATTACH rather than an OPEN. For example, to compare RM7 and RM9 databases, I first execute the following ATTACH operations.

ATTACH DATABASE 'C:\Users\jbryan\OneDrive\rm7\jerryrm7.rmgc' AS RM7;
ATTACH DATABASE 'C:\Users\jbryan\OneDrive\rm9\jerryrm9.rmtree' AS RM9;

Having done so, I can then do either of the following queries to query either my RM7 database or my RM9 database.

SELECT FT.*
FROM RM7.FactTypeTable AS FT;

or

SELECT FT.*
FROM RM9.FactType AS FT;

With that in mind, I would expect either of the following queries to produce identical results, but they don't.

-- #1 USING FT.Name directly
SELECT COUNT(A.FactTypeID) AS FT_Count, A.*
FROM
(
SELECT 7 AS Version, FT.FactTypeID, FT.Name, FT.Abbrev
FROM RM7.FactTypeTable AS FT

     UNION

SELECT 9 AS Version, FT.FactTypeID, FT.Name, FT.Abbrev
FROM RM9.FactTypeTable AS FT

) AS A
WHERE A.Name LIKE '%Census%'
GROUP BY A.Name;

 

-- #2 using SUBSTR on FT.Name
SELECT COUNT(A.FactTypeID) AS FT_Count, A.*
FROM
(
SELECT 7 AS Version, FT.FactTypeID, SUBSTR(FT.Name,1) AS FT_Name, FT.Abbrev
FROM RM7.FactTypeTable AS FT

     UNION

SELECT 9 AS Version, FT.FactTypeID, SUBSTR(FT.Name,1) AS FT_Name, FT.Abbrev
FROM RM9.FactTypeTable AS FT

) AS A
WHERE A.FT_Name LIKE '%Census%'
GROUP BY A.FT_Name;

Query #1 gets an FT_COUNT of 1 for Census and an FT_Count of 3 for Census (family), which is incorrect.

Query #2 gets an FT_COUNT of 2 for Census and an FT_Count of 2 for Census (family), which is correct. The count is 2 rather than 1 because the UNION provides a Census row from RM7 and also a Census row from RM9, and similarly for Census (family).

I’m very much mystified. The FT.Name column is TEXT rather than BLOB in both RM7 and RM9, and the column is indexed in both RM7 and RM9. I have tried the FT.Abbrev column instead, and it works fine if I use it directly rather than using a SUBSTR of it. FT.Abbrev also is TEXT rather than BLOB and is indexed, both in RM7 and RM9. In any case, I’m using the SUBSTR technique successfully to get around the problem. The problem happens in other tables than the FactTypeTable. It's just that the FactTypeTable is small and therefore is an easy table for running tests.

For the purposes of this message to the forum, I actually ran these queries against new and empty RM7 and RM9 databases because my production databases have extra rows in the FactTypeTable for census. Anyone else should be able to use the same technique to reproduce my results using new and empty RM7 and RM9 databases.

This sort of processing is the key to the way my scripts work for comparing an RM7 and an RM9 database. I run a query for both databases while adding a version number column for RM7 vs. RM9. I then do a GROUP BY on the columns that should be the same. I follow the GROUP BY with HAVING FT_COUNT != 2 and any rows that show up in the query represent differences between my RM7 and RM9 databases.

 

My quick test of your scripts does not produce any discrepancy between the results which are as expected when applied to two virgin database, one from RM7 and one from RM9. Your unexpected inconsistency must point to a problem within your system, which seems really strange. Maybe you should try a different sqlite tool. I used SQLiteSpy 64-bit and ran the scripts both without REINDEXing and with a fake RMNOCASE.

I updated recently to what appeared to be the newest SQLiteSpy, except that when I downloaded what I thought was the 64 bit version it turned out after installation to be 32 bit. Do you have a link to the 64 bit download?

I ran my script #1 again just to be sure that I'm reporting the results correctly. It definitely does not GROUP BY correctly, getting a count of 1 and a count of 3 instead of two counts of 2. Did you run against a new and empty RM7 database and a new and empty RM9 database.

Uploaded files:
  • group_by_problem.jpg

Got the newest 64 bit SQLiteSpy installed, and the problem disappeared. Sorry for the trouble. I think I know what happened when I thought I was installing the newest 64 bit version and I was really installing the newest 32 bit version. I won't admit to the embarrassing details, but it was a rookie mistake.

Just thought that I'd mention that another possible variable in your tests is the SQLite library version that your query tool is compiled/linked to. (It certainly doesn't explain your different query results in the same session...)

I had a recent experience that surprised me. I updated my tool of choice and found that one of y queries suddenly slowed to a crawl. Turns out it was the SQLite library version. I fixed the badly written query.