The Encoding of SortDates such as 1840-1 and 1840-2
Quote from thejerrybryan on 2025-12-24, 2:17 pm
I have been trying to understand RM's encoding of SortDates such as 1840-1 and 1840-2 so that I could possibly improve my draft script to handle Jaime's problem with Burial facts that have year-only dates. It turns out that the answer is already on the RM SortDates page, although I, at least, had to work out an additional nuance.
The first basic idea is that EventTable.Date always provides room for two dates: a begin date and an end date. For most events, most of the time, there is only one date, and it is stored in the begin-date area. When there is only one date, the end date is left "blank" (not literally blank characters). EventTable.SortDate echoes this structure: there are always two dates encoded in a SortDate.
EventTable.Date is a character field, and its various characters are fairly easy to figure out. EventTable.SortDate is an integer field whose data elements are encoded as bit strings that must be extracted by bit twiddling (shifting bits, AND-ing and OR-ing them, and so on). They are harder to figure out, but that work has already been done.
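To make the bit twiddling concrete, here is a minimal Python sketch that packs a begin date and an end date into one integer and unpacks them again. It assumes a simplified, hypothetical layout (each date a 24-bit field of 14-bit year, 4-bit month, 6-bit day, with the begin date to the left of the end date); RootsMagic's real SortDate uses different bit offsets and extra flag bits, which are documented on the RM SortDates page. The names `pack_date`, `encode`, `unpack_date`, and `decode` are illustrative only.

```python
from typing import Optional, Tuple

# HYPOTHETICAL, simplified layout for illustration only; RM's real SortDate
# uses different bit offsets and extra flag bits (see the RM SortDates page).
# Each date is a 24-bit field: 14-bit year | 4-bit month | 6-bit day.
MONTH_BITS, DAY_BITS = 4, 6
DATE_BITS = 14 + MONTH_BITS + DAY_BITS  # 24 bits per date
NO_END_YEAR = 0x3FFF  # 14 one-bits in the end-year slot mean "no end date"

Date = Tuple[int, int, int]

def pack_date(year: int, month: int = 0, day: int = 0) -> int:
    """Pack one date into a 24-bit field; 0 means month/day not given."""
    return (year << (MONTH_BITS + DAY_BITS)) | (month << DAY_BITS) | day

def encode(begin: tuple, end: Optional[tuple] = None) -> int:
    """Shift the begin date to the left of the end date and OR them together."""
    end_field = pack_date(*end) if end else pack_date(NO_END_YEAR)
    return (pack_date(*begin) << DATE_BITS) | end_field

def unpack_date(field: int) -> Date:
    """Recover (year, month, day) from a 24-bit field by shifting and AND-ing."""
    day = field & ((1 << DAY_BITS) - 1)
    month = (field >> DAY_BITS) & ((1 << MONTH_BITS) - 1)
    year = field >> (MONTH_BITS + DAY_BITS)
    return (year, month, day)

def decode(sort_date: int) -> Tuple[Date, Optional[Date]]:
    """Split a packed value back into (begin, end); end is None if absent."""
    begin = unpack_date(sort_date >> DATE_BITS)
    end = unpack_date(sort_date & ((1 << DATE_BITS) - 1))
    return begin, (None if end[0] == NO_END_YEAR else end)

print(decode(encode((1840, 5, 12), (1845,))))  # ((1840, 5, 12), (1845, 0, 0))
print(decode(encode((1840,))))                 # ((1840, 0, 0), None)
```

Decoding is the mirror image of encoding: shift right to bring a field down to the low bits, then AND with a mask of that field's width.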
When there is an end date, you can have date ranges such as 1840-1845 and 1840-1846. Because both dates are encoded into a single integer in EventTable.SortDate, and the bits for the begin date are to the left of the bits for the end date, 1840-1845 sorts ahead of 1840-1846. The 1840 is a tie in the sort, and 1845 vs. 1846 becomes the tie breaker. But it isn't multiple compares one after the other; it's a single compare of two 64-bit integers.
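The claim that one integer compare does the work of a begin-then-end compare can be checked with a small sketch. It assumes the same simplified, hypothetical layout as before (24-bit date fields of 14-bit year, 4-bit month, 6-bit day, begin to the left of end), not RM's exact bit offsets; `field` and `encode` are illustrative names.

```python
import itertools

# HYPOTHETICAL simplified layout (not RM's exact bit offsets): each date is a
# 24-bit field (14-bit year | 4-bit month | 6-bit day), begin left of end.
def field(year: int, month: int = 0, day: int = 0) -> int:
    return (year << 10) | (month << 6) | day

def encode(begin: tuple, end: tuple) -> int:
    return (field(*begin) << 24) | field(*end)

# 1840-1845 vs 1840-1846: the begin bits tie, the end bits break the tie.
assert encode((1840,), (1845,)) < encode((1840,), (1846,))

# One integer compare gives the same answer as comparing begin, then end.
dates = [(1839, 12, 31), (1840, 0, 0), (1840, 3, 1), (1845, 0, 0), (1846, 0, 0)]
ranges = list(itertools.product(dates, repeat=2))
for p, q in itertools.product(ranges, repeat=2):
    assert (encode(*p) < encode(*q)) == (p < q)
print("single-integer compare matches field-by-field compare")
```

This works because the end-date field can never overflow into the begin-date bits, so the begin date always dominates the comparison.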
When there is no end date, that fact is encoded in the end-date portion of the SortDate as 14 consecutive 1 bits (0x3fff in hex). Because that is the largest value 14 bits can hold, a SortDate of just 1840 always sorts after a SortDate of 1840-1845 or 1840-1846.
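A short sketch of the "no end date" marker, again assuming the simplified, hypothetical 24-bit-per-date layout rather than RM's real offsets. Filling a missing end year with all ones pushes a plain 1840 behind every 1840 range, while 1841 still sorts after all of them.

```python
# HYPOTHETICAL simplified layout (not RM's exact offsets): 24-bit date fields
# (14-bit year | 4-bit month | 6-bit day) with the begin date left of the end.
NO_END_YEAR = 0x3FFF  # 14 consecutive 1 bits = "no end date"

def field(year: int, month: int = 0, day: int = 0) -> int:
    return (year << 10) | (month << 6) | day

def encode(begin: tuple, end: tuple = ()) -> int:
    # A missing end date is filled with the all-ones year, the largest value.
    return (field(*begin) << 24) | field(*(end or (NO_END_YEAR,)))

sort_dates = {
    "1840": encode((1840,)),
    "1840-1845": encode((1840,), (1845,)),
    "1840-1846": encode((1840,), (1846,)),
    "1841": encode((1841,)),
}
order = sorted(sort_dates, key=sort_dates.get)
print(order)  # ['1840-1845', '1840-1846', '1840', '1841']
```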
That brings us to sort dates such as 1840-1 and 1840-2. How are they encoded? The nuance I didn't understand is that RM has no special code to handle this case. 1840-1 is just a range of two dates: begin = 1840 and end = 1. On its face, a range going from year 1840 to year 1 is totally illogical, as is a range going from year 1840 to year 2. But it doesn't matter how logical or illogical it is. All that matters is that the compare works: 1840-1 and 1840-2 sort in the correct order, and both sort before plain 1840. Also, you wouldn't enter 1840-1 or 1840-2 into RM's Date field. You would only enter such "illogical" dates into RM's SortDate field.
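Because there is no special case, a year-only dash date can be encoded by splitting on the dash and treating the suffix as an end year. This is a minimal sketch under the same simplified, hypothetical layout (not RM's real offsets); `sort_value` is an illustrative name, not part of RM or any existing script.

```python
# HYPOTHETICAL simplified layout (not RM's exact offsets): 24-bit date fields
# (14-bit year | 4-bit month | 6-bit day) with the begin date left of the end.
NO_END_YEAR = 0x3FFF

def field(year: int) -> int:
    return year << 10  # year-only date: month and day bits stay 0

def sort_value(text: str) -> int:
    """Encode a year-only SortDate such as "1840" or "1840-2".

    No special case: "1840-2" is simply a range with begin=1840 and end=2.
    """
    begin, _, end = text.partition("-")
    end_year = int(end) if end else NO_END_YEAR
    return (field(int(begin)) << 24) | field(end_year)

entries = ["1840", "1840-2", "1841", "1840-1", "1839"]
order = sorted(entries, key=sort_value)
print(order)  # ['1839', '1840-1', '1840-2', '1840', '1841']
```

The "illogical" end years 1 and 2 are tiny compared with the all-ones marker, which is exactly why both dash dates land just ahead of plain 1840.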
Another small nuance is that the largest year the RM UI supports is 4999, even though the 0x3fff value (16383) in the SortDate bits is larger than 4999. If you try to enter a dash date bigger than 4999, RM just ignores it. I can't imagine anybody entering dash dates beyond about -10 or -15, so the 4999 limit is no problem at all.
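A script that writes dash SortDates directly might apply the same ceiling as the UI. The sketch below assumes the simplified, hypothetical layout used above (not RM's real offsets); `dash_sort_value` is an illustrative helper, and rejecting negative suffixes is an assumption of this sketch, since its unsigned fields cannot hold negative years.

```python
# HYPOTHETICAL simplified layout (not RM's exact offsets); year-only dates.
MAX_UI_YEAR = 4999    # largest year the RM UI accepts, per the post
NO_END_YEAR = 0x3FFF  # 16383, the "no end date" marker

def field(year: int) -> int:
    return year << 10  # month and day bits left at 0

def dash_sort_value(year: int, n: int):
    """Return the packed value for "year-n", or None if n is out of range.

    Mirrors the UI by ignoring suffixes above 4999. Rejecting n < 0 is an
    assumption: this sketch's unsigned fields cannot hold negative years.
    """
    if n < 0 or n > MAX_UI_YEAR:
        return None
    return (field(year) << 24) | field(n)

plain_1840 = (field(1840) << 24) | field(NO_END_YEAR)
assert NO_END_YEAR > MAX_UI_YEAR
# Even the largest accepted suffix still sorts before plain 1840.
assert dash_sort_value(1840, MAX_UI_YEAR) < plain_1840
print(dash_sort_value(1840, 5000))  # None
```

Since every accepted suffix is at most 4999, it always stays below the 16383 marker, so the ceiling costs nothing for the ordering.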