When is REINDEX required before running a script?

Quote from thejerrybryan on 2025-08-28, 10:01 amI seldom run a REINDEX in SQLite before running a script, but sometimes I get a Malformed Database error unless I do run the REINDEX. I have wondered why it is necessary sometimes to REINDEX and not at other times. I think I now have a suspicion of why.
When I was working on a script to help another user add Web Tag text to citations, I had a small subset of the user's database. I didn't use REINDEX and had no problems. I developed the script, tested against the small subset of the database, and all was well. But when I ran the script against the full database, I got the Malformed Database error.
The problem almost certainly has to be related to RMNOCASE, and in particular that the fake RMNOCASE I use with SQLite is not an exact match with the real RMNOCASE in RM. So it occurs to me that it's possible for the fake RMNOCASE to place items in the exact same order as the real RMNOCASE some of the time but not at other times, depending on how much data there is and on what the data is.
As an extreme example, a column containing only one row can never be out of order, no matter what. The fake RMNOCASE is likely to treat major items like English language letters the same as the real RMNOCASE, but is likely to miss on more obscure items because of the difficulty of reverse engineering all the minute details how the real RMNOCASE really works. So it seems to me that my small subset database didn't include any data where the fake RMNOCASE and the real RMNOCASE put any items in a different order, but the full database did include such data.
Or at least that's my best guess as of today. Does that correspond to the experience and opinions of the rest of you, or am I way off base?
I seldom run a REINDEX in SQLite before running a script, but sometimes I get a Malformed Database error unless I do run the REINDEX. I have wondered why it is necessary sometimes to REINDEX and not at other times. I think I now have a suspicion of why.
When I was working on a script to help another user add Web Tag text to citations, I had a small subset of the user's database. I didn't use REINDEX and had no problems. I developed the script, tested against the small subset of the database, and all was well. But when I ran the script against the full database, I got the Malformed Database error.
The problem almost certainly has to be related to RMNOCASE, and in particular that the fake RMNOCASE I use with SQLite is not an exact match with the real RMNOCASE in RM. So it occurs to me that it's possible for the fake RMNOCASE to place items in the exact same order as the real RMNOCASE some of the time but not at other times, depending on how much data there is and on what the data is.
As an extreme example, a column containing only one row can never be out of order, no matter what. The fake RMNOCASE is likely to treat major items like English language letters the same as the real RMNOCASE, but is likely to miss on more obscure items because of the difficulty of reverse engineering all the minute details how the real RMNOCASE really works. So it seems to me that my small subset database didn't include any data where the fake RMNOCASE and the real RMNOCASE put any items in a different order, but the full database did include such data.
Or at least that's my best guess as of today. Does that correspond to the experience and opinions of the rest of you, or am I way off base?

Quote from Tom Holden on 2025-08-28, 10:20 amYes, that's my take on it, too. While an Integrity Check always fails in SQLite on a database that passes the test in RM before you run REINDEX, it seems to be largely confined to rows in one of indexes for SourceTable (or maybe it's SourceTemplateTable) when you use the better fake RMNOCASE. If your query avoids those indexes it succeeds without that error. However, as you surmise, if data that is indexed is in the mismatches between fake and proprietary RMNOCASE AND said index is used (esp. modified) by the script, then an error can arise.
Years ago, I ran a test case with large number of Unicode characters in the Name table.Surname field, one per record:
Against a RootsMagic database comprising 8091 different Unicode characters in Surnames, PRAGMA Integrity_Check returns 187 missing rowids from indexes idxSurname and idxSurnameGiven with SQLite Expert using this substitute collation. SQLiteSpy returns 16042 missing rowids from the same two indexes including 9 from idxSourceTemplateName with its fake collation. RootsMagic itself reports two missing rowids from each of the two Person Name indexes so there is something wrong even with the genuine RMNOCASE collation.
... https://sqlitetoolsforrootsmagic.com/rmnocase-faking-it-in-sqlite-expert-command-line-shell-et-al/
Yes, that's my take on it, too. While an Integrity Check always fails in SQLite on a database that passes the test in RM before you run REINDEX, it seems to be largely confined to rows in one of indexes for SourceTable (or maybe it's SourceTemplateTable) when you use the better fake RMNOCASE. If your query avoids those indexes it succeeds without that error. However, as you surmise, if data that is indexed is in the mismatches between fake and proprietary RMNOCASE AND said index is used (esp. modified) by the script, then an error can arise.
Years ago, I ran a test case with large number of Unicode characters in the Name table.Surname field, one per record:
Against a RootsMagic database comprising 8091 different Unicode characters in Surnames, PRAGMA Integrity_Check returns 187 missing rowids from indexes idxSurname and idxSurnameGiven with SQLite Expert using this substitute collation. SQLiteSpy returns 16042 missing rowids from the same two indexes including 9 from idxSourceTemplateName with its fake collation. RootsMagic itself reports two missing rowids from each of the two Person Name indexes so there is something wrong even with the genuine RMNOCASE collation.
... https://sqlitetoolsforrootsmagic.com/rmnocase-faking-it-in-sqlite-expert-command-line-shell-et-al/