Contents
- 1Initial SortDate Algorithm
- 1.1Download
- 2Advanced Date Algorithms by R. Steven Turley
- 2.1Algorithms
- 2.2Downloads
- 2.3Discussions & comments from Wikispaces site
- 2.3.1Extended modifier codes and revised algorithm
- 2.3.2Comment: "If the date is missing, the value 0xffffffff (2^65-1) is stored"
- 2.3.3Comment: flags "27 from 30 since 31 after"
- 2.3.4Comment: "Ds = If no Date then 9223372036854775807 else ((Y1 + 10000)
- 2.4Inline comments
- 2.4.1Comment: second year Y2 bits
Initial SortDate Algorithm
The SortDate column in the EventTable contains very large numbers that have defied until now any easy method of decoding. After many experiments with date entry and inspection of the stored values, I now have deduced an algorithm that works fine for encoding single, pure dates and decoding corresponding SortDates. Date modifiers such as Bef, Aft, From-To, etc. have effects that I do not fully understand but there are patterns emerging which can be more completely identified only with more comprehensive testing.
My current model of the RM4 SortDate encoding algorithm begins with a base year of 10000BC that has an already pretty large number as a constant, C. From the date, the year number is extracted, added to 10000 and the result multiplied by a very large coefficient, Ky. Added to that is the month number multiplied by the month coefficient, Km, plus the day number multiplied by the day coefficient, Kd. Algebraically, the encoding expression looks like this:
SortDate (Ds) = Ky*(Y + 10000) + Km*M + Kd*D + C, where Ky = 562949953421312 (2^49) Km = 35184372088832 (2^45) Kd = 549755813888 (2^39) C = 17178820620 (2^33.9999119432032) Y = yyyy M = mm D = dd
The decoding expressions are the inverse:
Y = (Ds - C) / Ky - 10000 M = (Ds - C) % Ky / Km D = (Ds - C) % Ky % Km / Kd R = (Ds - C) % Ky % Km % Kd where Ds = SortDate R = remainder % = modulus operator
The remainder, R, is significant for the decoding of modifiers while M=15, and D=63 are results when modifiers are used, as can be seen in the filtered result set below from a RM4 database, using the current decoder:
Date SortDate Year Month Day Remainder Single Dates D.+19470000..+00000000.. 6725563110703235084 1947 0 0 0 D.+19470100..+00000000.. 6725598295075323916 1947 1 0 0 D.+19470101..+00000000.. 6725598844831137804 1947 1 1 0 D.+19470102..+00000000.. 6725599394586951692 1947 1 2 0 D.+18501214..+00000000.. 6671386874267828236 1850 12 14 0 Modified Dates After DA+18280000..+00000000.. 6659134466444753951 1828 15 63 1047571 DA+18710000..+00000000.. 6683341314441870367 1871 15 63 1047571 Range DR+18100000..+18120000.. 6648438962291474447 1809 15 63 544962772995 DR+19160000..+19180000.. 6708111657465282575 1915 15 63 545073922051 DS+18980000..+19500000.. 6697978558337253394 1897 15 63 545107476486 Before DB+18250000..+00000000.. 6656883216385835008 1824 15 63 549755813876 DB+19130000..+00000000.. 6706422812286910464 1912 15 63 549755813876
It appears that the Before and After modifiers result in remainders that are constants, independent of the date, while the Range modifiers result in varying remainders. Thus inspection of the remainder and testing for known constants should result in ascertaining the type of modifier that was used.
Download
SortDateEncodeDecode.sql
Contains both the encoder query and the decoder query. Separate as required. Most GUI SQLite managers allow you to select the statements you want to execute. This script is deprecated; see below for more complete solutions.
Advanced Date Algorithms by R. Steven Turley
Here are the algorithms for decoding RM5 dates. These may be the same as RM4, but I haven’t tested them (ve3meo Dec 31, 2012 tested in RM6 and okay). They work for all normal dates, but I haven’t tested them on old dates or Quaker dates. I’m hoping those will just translate to their Julian equivalents and work okay. I’ll report those tests later. These dates are best understood in terms of bit fields on a long (64 bit) integer. Bit 0 is the least significant bit.
bits | value |
0-9 | flag indicating date modifiers (see table below) |
10-15 | day for the second date; 0 if no second date |
16-19 | month for the second date (1=Jan, 2=Feb, etc), 0 if no second date |
20-33 | year for the second date+10000; 0x3fff if no second date |
34-38 | these were always zero in my tests, must be reserved for something else |
39-44 | day for the first date; 0 if no day is supplied |
45-48 | month for the first date; 0 if no month is supplied |
49-64 | year for the first date + 10000 |
If the date is missing, the value 0xffffffff (2^65-1) is stored. This makes an event with no date, the last date in the list as far as the sort order is concerned.
The flag fields tell what kind of modifiers the date has. Certainty modifiers are not included in SortDates since they don’t affect the sort order. Here are the possible flag values:
flag value | meaning |
0 | before |
3 | by |
6 | to |
9 | until |
12 | normal date (no modifier) |
15 | between/and |
18 | from/to |
21 | dash (25 Oct 2011-28 Oct 2011) |
24 | or |
27 | from |
30 | since |
31 | after |
Note that the value of these flags determines the sort order for dates as listed above. Since dates that don’t have ranges store their year as 0x3fff, they will always come after dates with ranges that have a year less than this.
Algorithms
For almost all compilers, shift and bit-wise mask operators are faster than the division and remainder operations outlines above. If your system supports it, I recommend the following algorithms to extract the dates and flag from sort dates. In these formulas i>>j is the operation of shifting the number i j bits to the right. The operation i<<j shifts the number i j bits to the left. The operation i&j performs a bitwise and of the two numbers. This is the notation used in C, C++, C#, and python. I’m not sure about other languages. Let Ds be the sort date, as above, Y1 be the year of the first date, M1 be the month of the first date, D1 be the day of the first date, Y2 be the year of the second date, M2 be the month of the second date, D2 by the day of the second date, and F by the flag.
Ds = ((Y1 + 10000)<<49) & (M1<<45) & (D1<<39) & (Y2<<20) & (M2<<16) & (D2<<10) & F ??
Ds = If no Date then
9223372036854775807
else
((Y1 + 10000)<<49) + (M1<<45) + (D1<<39) + (If Date2 then (Y2 + 10000)<<20 else
17178820608)
+ (M2<<16) + (D2<<10) + F
Y1 = (Ds>>49) – 10000
M1 = (Ds>>45) & 0xf
D1 = (Ds>>39) & 0x3f
Y2 = (Ds>>20) & 0x3fff – 10000
M2 = (Ds>>16) & 0xf
D2 = (Ds>>10) & 0x3f
F = Ds & 0x3ff
Downloads
SortDateDecodeDev.sql an example decoder using Steve’s algorithm
SortDateEncoderDev2.sql an encoding example using the modified version of Steve’s algorithm; streamlined, faster and more readable code than SortDateEncoderDev.sql.
SortDates (2013-01-01).rmgbRM6 database with samples of most date formats for testing with above.
ve3meo
04 November 2012 02:40:45
Steve, I thought your statement about the comparative speed of division versus bit-shifting operations was right on and would have a big difference. Much to my surprise, it works out to be only about 20-25% faster on a very large record set (over 317,000). On my computer, the algorithm executed as division and modulus took around 2s while the same using bit-shifting took around 1.6s with SQLiteSpy.
/* CompareSortDateDecoderSpeeds */
SELECT Date,
(SortDate – 17178820620) / 562949953421312 – 10000 AS Year,
(SortDate – 17178820620) % 562949953421312 / 35184372088832 AS Month,
(SortDate – 17178820620) % 562949953421312 % 35184372088832 / 549755813888 AS Day,
(SortDate – 17178820620) % 562949953421312 % 35184372088832 % 549755813888 AS Remainder
FROM EventTable
;
SELECT Date,
(Ds >> 49) – 10000 AS Y1,
(Ds >> 45) & 15 AS M1,
(Ds >> 39) & 63 AS D1,
Ds & 1023 AS F
FROM (SELECT Date, SortDate AS Ds FROM EventTable)
;