RMNOCASE for future versions

#1 · 2024-06-05, 9:26 pm

Curious what anyone thinks the chances are of RMNOCASE being removed in future versions (RM10 and beyond). It seems like it creates a lot of headaches, and the substitutes may not always work for every user's needs.

Kevin

#2 · 2024-06-05, 9:32 pm

I'd estimate that the chance of that happening is 10%.

How do you like that number? Everyone else will say 0%, but I'm ever hopeful.

I'm hopeful because there is already an issue with RMNOCASE being different on Windows and MacOS. That should be fixed.

It doesn't have to be removed, just documented, or made compatible with another open source collation. Some kind of Unicode supporting collation is required for user happiness.

I'm thinking of removing RMNOCASE from a test database. Just recreate all of the tables using NOCASE instead of RMNOCASE. I think that one can't just alter the table, but would have to do a data move within the database to get the desired result.
The next question is whether RM will notice.
I will, because of the large number of accented chars in my db, but it would be an interesting exercise.
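
A minimal sketch of that data move, in Python with the standard sqlite3 module. The table DemoNames, its columns, and the stand-in comparison function are hypothetical: the stand-in is not the real RMNOCASE and is registered only so the demo can create its own table. It does not answer whether RM itself would notice.

```python
import sqlite3

def stand_in_rmnocase(a: str, b: str) -> int:
    # Hypothetical placeholder, NOT the real RMNOCASE: compare case-folded text.
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)

conn = sqlite3.connect(":memory:")
conn.create_collation("RMNOCASE", stand_in_rmnocase)

# Hypothetical table standing in for one that declares COLLATE RMNOCASE.
conn.execute(
    "CREATE TABLE DemoNames (Id INTEGER PRIMARY KEY, Surname TEXT COLLATE RMNOCASE)"
)
conn.executemany(
    "INSERT INTO DemoNames (Surname) VALUES (?)",
    [("Otter",), ("nunez",), ("Adams",)],
)
conn.commit()

# The data move: build a NOCASE copy, fill it, then swap it in for the original.
conn.executescript(
    """
    BEGIN;
    CREATE TABLE DemoNames_new (Id INTEGER PRIMARY KEY, Surname TEXT COLLATE NOCASE);
    INSERT INTO DemoNames_new (Id, Surname) SELECT Id, Surname FROM DemoNames;
    DROP TABLE DemoNames;
    ALTER TABLE DemoNames_new RENAME TO DemoNames;
    COMMIT;
    """
)

# Unregister the stand-in: the rebuilt table should no longer need it.
conn.create_collation("RMNOCASE", None)

schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'DemoNames'"
).fetchone()[0]
rows = [r[0] for r in conn.execute("SELECT Surname FROM DemoNames ORDER BY Surname")]
print(schema)
print(rows)
```

Dropping a table also drops its indexes, so in a real database any indexes on the original table would have to be recreated after the swap.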

#3 · 2024-06-05, 10:08 pm

I would probably go with 2% - not impossible, but highly improbable. I do think the existing RMNOCASE is a hindrance to certain needed enhancements to RM. I would honestly prefer just getting rid of it if it cannot be replaced by something better. But I'm not exactly sure what that something better would be.

I don't remember Bruce Buzbee ever saying anything about any future plans for multilingual support for RM, but Michael Booth did say on one of the RM forums at least once that there were future plans for multilingual support. I think the problem is extremely difficult if multilingual support is to include multilingual sorting. And I think multilingual sorting is necessary in things like sorted lists of people such as People List View or name indexes in reports and is also necessary for searching. There are lots of other examples of places in RM where multilingual sort is needed if there is to be true multilingual support.

I don't understand all the issues associated with multilingual sorting well enough to comment much further. I do know that UNICODE does not inherently handle multilingual sorting. Rather, multilingual sorting of UNICODE has to be handled by the application writ large. Which is to say, I suppose it could be handled at the application database level, such as by collations within SQLite, or at the actual application level, such as by code in RM.
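
As a small illustration of that point, here is a sketch in Python showing what happens when strings are ordered by raw code point with no collation logic at all. The sample names are made up for the demo.

```python
# Python's default string sort compares code points, with no language rules.
names = ["apple", "Zebra", "Åberg", "éclair"]

by_code_point = sorted(names)

# 'Z' (U+005A) < 'a' (U+0061) < 'Å' (U+00C5) < 'é' (U+00E9)
print(by_code_point)
print([hex(ord(n[0])) for n in by_code_point])
```

The result puts "Zebra" before "apple" and pushes the accented names to the end, which is not what a reader of any particular language would expect. Something on top of the code points has to supply the ordering rules.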

#4 · 2024-06-06, 6:16 am

I'm thinking of removing RMNOCASE from a test database. Just recreate all of the tables using NOCASE instead of RMNOCASE. I think that one can't just alter the table, but would have to do a data move within the database to get the desired result.

Interesting, I said the same thing to a friend about his struggles, although the plan there was basically to move to a separate SQLite database and then copy the data back to the RM database. I'm curious whether RM would notice RMNOCASE missing; I'm guessing that at some point it likely would, if not immediately.

#5 · 2024-06-06, 6:23 am

And I think multilingual sorting is necessary in things like sorted lists of people such as People List View or name indexes in reports and is also necessary for searching. There are lots of other examples of places in RM where multilingual sort is needed if there is to be true multilingual support.

I guess since most other programs are not "open" (Database) then that is why it does not matter. I get the need for multilingual but do not appreciate the challenges thereof. I have a friend that has been very challenged with his C apps. So far, I have not done anything too extensive with adding info that is impacted by RMNOCASE so I have been okay, and in part because I have avoided

#6 · 2024-06-06, 8:14 am

Quote from Richard Otter on 2024-06-05, 9:32 pm
...
I'm thinking of removing RMNOCASE from a test database. Just recreate all of the tables using NOCASE instead of RMNOCASE. I think that one can't just alter the table, but would have to do a data move within the database to get the desired result.
The next question is whether RM will notice.
...

I did that many years ago before getting a fake RMNOCASE going. Created a new empty database using the RM DDL with NOCASE in place of RMNOCASE and copied all the records over from the RM database. That let me freely operate with sqlite having no objection. I don't recall that the RM app had any issue but my data was devoid of accents and other alphabets.
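
A rough sketch of that approach in Python, assuming the DDL can be read from sqlite_master and that the text "RMNOCASE" appears in it only as a collation name. The file names, the DemoPeople table, and the stand-in collation used to seed the demo source are all hypothetical; the copy step itself opens the source without any RMNOCASE registered.

```python
import os
import sqlite3
import tempfile

def copy_with_nocase(src_path: str, dst_path: str) -> None:
    """Recreate every table from src in dst with NOCASE, then copy the rows."""
    src = sqlite3.connect(src_path)  # no RMNOCASE registered on this connection
    dst = sqlite3.connect(dst_path)
    tables = src.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for name, ddl in tables:
        # Naive text substitution on the DDL (an assumption of this sketch).
        dst.execute(ddl.replace("RMNOCASE", "NOCASE"))
        rows = src.execute(f'SELECT * FROM "{name}"').fetchall()
        if rows:
            marks = ", ".join("?" * len(rows[0]))
            dst.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)
    dst.commit()
    src.close()
    dst.close()

def stand_in(a: str, b: str) -> int:
    # Hypothetical placeholder, NOT the real RMNOCASE; only used to seed the demo.
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "demo_rm.sqlite")
dst_path = os.path.join(workdir, "demo_nocase.sqlite")

# Seed a small source database whose columns declare COLLATE RMNOCASE.
seed = sqlite3.connect(src_path)
seed.create_collation("RMNOCASE", stand_in)
seed.execute(
    "CREATE TABLE DemoPeople (Id INTEGER PRIMARY KEY, "
    "Surname TEXT COLLATE RMNOCASE, Given TEXT COLLATE RMNOCASE)"
)
seed.executemany(
    "INSERT INTO DemoPeople (Surname, Given) VALUES (?, ?)",
    [("Otter", "R"), ("adams", "K")],
)
seed.commit()
seed.close()

copy_with_nocase(src_path, dst_path)

# The copy can be sorted with plain SQLite because it only uses NOCASE.
check = sqlite3.connect(dst_path)
copied = check.execute(
    "SELECT Surname, Given FROM DemoPeople ORDER BY Surname"
).fetchall()
check.close()
print(copied)
```

This sketch copies tables only. Indexes, views, and triggers in the source would need the same DDL treatment, and as noted above, data with accents or other alphabets may sort differently under NOCASE.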

#7 · 2024-06-06, 4:22 pm

Looking at https://www.sqlite.org/draft/datatype3.html

7.1 & 7.2

Collating functions only matter when comparing string values. Numeric values are always compared numerically, and BLOBs are always compared byte-by-byte using memcmp().
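
That behavior can be checked with a short Python sketch. ALLSAME is a hypothetical collation invented for the demo; it claims every pair of strings is equal, which makes it easy to see where a collation is and is not consulted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical collation that reports every pair of strings as equal.
conn.create_collation("ALLSAME", lambda a, b: 0)

def scalar(sql: str):
    # Run a one-value query and return that value.
    return conn.execute(sql).fetchone()[0]

# Text comparison: the collation decides, so 'a' and 'b' compare equal.
text_equal = scalar("SELECT 'a' = 'b' COLLATE ALLSAME")

# Numeric comparison: the collation is ignored.
number_equal = scalar("SELECT 1 = 2 COLLATE ALLSAME")

# BLOB comparison: also ignores the collation and compares bytes.
blob_equal = scalar("SELECT x'61' = x'62' COLLATE ALLSAME")

print(text_equal, number_equal, blob_equal)
```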

RMNOCASE would seem to use a Proprietary collating sequence

Maybe someone knows more or has more experience.

#8 · 2024-06-06, 4:49 pm

"RMNOCASE would seem to use a Proprietary collating sequence"

It's not so much that RMNOCASE uses a proprietary collating sequence as that it actually is a proprietary collating sequence. I haven't yet found the source code for the fake RMNOCASE we can use with SQLite scripts. But I know roughly the concept of how the real RMNOCASE works.

First of all, it collates the English alphabet in the same manner as the standard NOCASE does. Namely, it collates a the same as A, it collates b the same as B, etc. Even in UNICODE, this can be done in the same manner as it is done in ASCII/ANSI. Namely, you make a temporary copy of the strings to be compared. Then in the temporary copy, you convert all lower case English letters to the equivalent upper case English letters, leaving all other characters unchanged. This can be done by flipping a single bit or by using a translate table. Then you compare the temporary strings.
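
The two folding methods described above can be sketched in Python as follows. The function names are made up for illustration, and this is only a model of the concept, not the code of NOCASE or RMNOCASE.

```python
import string

# Translate table mapping a..z to A..Z; every other character is left alone.
UPPER_TABLE = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

def fold_by_bit(s: str) -> str:
    # 'a'..'z' differ from 'A'..'Z' only in bit 0x20, so clearing it upper-cases.
    return "".join(chr(ord(c) & ~0x20) if "a" <= c <= "z" else c for c in s)

def fold_by_table(s: str) -> str:
    # Same result as fold_by_bit, using the translate table instead.
    return s.translate(UPPER_TABLE)

def compare_nocase(a: str, b: str) -> int:
    # Compare temporary folded copies; the inputs themselves are never modified.
    x, y = fold_by_table(a), fold_by_table(b)
    return (x > y) - (x < y)

print(fold_by_bit("Otter"), fold_by_table("ñandú"))
print(compare_nocase("abc", "ABC"), compare_nocase("a", "B"))
```

Both folds leave non-English characters such as ñ and ú untouched, which is exactly the gap the next step is meant to fill.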

RMNOCASE adds to this process the conversion of certain non-English characters in the temporary strings to certain English letters. For example, it converts the characters Å and å both to A, it converts the characters Ü and ü both to U, and it converts the characters Ñ and ñ both to N. This sometimes has the effect of making names in languages that use those characters sort incorrectly for those languages.

I don't know or understand every aspect of how RMNOCASE does this. For example, I really don't know how it handles accents in French because French doesn't treat an accented letter as a new letter. But in any case, after the temporary strings are converted, it is the temporary strings that are compared. The temporary strings are never stored in the RM database nor are they ever displayed in the RM user interface. They are only used for the comparisons.
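
Here is a minimal sketch of that idea as a custom SQLite collation in Python. The collation name DEMO_FOLD, the table, and the mapping are hypothetical: the mapping contains only the six characters mentioned above and is not the real RMNOCASE table, whose full contents I don't know.

```python
import sqlite3
import string

# Fold a..z to A..Z, then add the handful of accented letters from the examples.
FOLD = {ord(lo): up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)}
FOLD.update({ord(ch): "A" for ch in "Åå"})
FOLD.update({ord(ch): "U" for ch in "Üü"})
FOLD.update({ord(ch): "N" for ch in "Ññ"})

def demo_fold_compare(a: str, b: str) -> int:
    # Only the temporary folded copies are compared; stored values are untouched.
    x, y = a.translate(FOLD), b.translate(FOLD)
    return (x > y) - (x < y)

conn = sqlite3.connect(":memory:")
conn.create_collation("DEMO_FOLD", demo_fold_compare)

conn.execute("CREATE TABLE DemoNames (Surname TEXT)")
originals = ["Zuniga", "Ñunez", "nolan", "Åberg", "adams"]
conn.executemany("INSERT INTO DemoNames VALUES (?)", [(n,) for n in originals])

ordered = [
    r[0]
    for r in conn.execute(
        "SELECT Surname FROM DemoNames ORDER BY Surname COLLATE DEMO_FOLD"
    )
]
stored = {r[0] for r in conn.execute("SELECT Surname FROM DemoNames")}
print(ordered)
```

In this sketch "Åberg" sorts as if it were "ABERG" and so lands first. Swedish, for one, places Å after Z, which is the kind of language-specific mismatch described above.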

#9 · 2024-06-06, 5:01 pm

Look and you shall see. Edited: it didn't take my link, so let's try again.

https://github.com/mooredan/unifuzz/blob/master/unifuzz.c.

#10 · 2024-06-06, 5:18 pm

I understand the concept of bit flipping, and it is used for other things like math, so that is interesting. I wonder whether, if something like all the different character sets were 128 characters apart, it would be relatively simple. Thanks for your great insight and detailed response, as always.

Kevin