Why are BirthYear and DeathYear in NameTable?

#1 · 2022-08-07, 5:09 pm

Why does NameTable have BirthYear and DeathYear attributes since they doesn't have anything to do with person's names? And even if a person has several names (say, several different given name variations), the BirthYear and DeathYear are identical for every name.

And the rows selected by the query

SELECT OwnerID, count(*) as total_names from NameTable group by OwnerID having total_names > 1 order by OwnerID;

are the same as when you group by OwnerID, BirthYear, DeathYear:

SELECT OwnerID, BirthYear,DeathYear, count(*) as tuple_total from NameTable group by OwnerID, BirthYear, DeathYear having tuple_total > 1

So joining these queries and selecting results from the join where total_names != tuple_total never returns anything:

select r1.OwnerID, total_names, r2.BirthYear, r2.DeathYear, tuple_total from (SELECT OwnerID, count(*) as total_names from NameTable group by OwnerID having total_names > 1) as r1
join
(SELECT OwnerID, BirthYear,DeathYear, count(*) as tuple_total from NameTable group by OwnerID, BirthYear, DeathYear having tuple_total > 1) as r2
on r1.OwnerID=r2.OwnerID
WHERE total_names!=tuple_total;

#2 · 2022-08-07, 10:01 pm

I believe that this is a structural holdover from the era of slower computers and different database engines intended to speed up the display of people index and other output functions where the Primary name and year of Birth and/or Death are required. That saves having to correlate the name from the NameTable with the much larger EventTable and finding the related Birth/Death events therein and extracting the year from the encoded date field, every time such output is needed. Thus those two fields in the NameTable are intermediate registers between the EventTable and outputs requiring name and YoB/YoD.

The Database Tools > RebuildIndexes function repopulates those fields en masse (along with launching the SQLite VACUUM function). They can get out of sync with the EventTable due to certain editing sequences not treated by the program as a trigger for an update.

I don't think it should be necessary these days to have this redundant data. Contemporary computer platforms for RootsMagic 4+ should be fast enough to get the results of a query of the EventTable for the YoB/YoD in the blink of an eye. That would eliminate the risk of a mismatch between the primary B/D event and the display of the People index et al.

#3 · 2022-08-08, 3:20 pm

Thanks a lot!

#4 · 2022-09-07, 12:56 am

It's called denormalization of data.
The whole idea of relational database design is to store a piece of data one time.
As Tom explains, sometime sacrifices need to be made to made a concept practical.

#5 · 2023-03-16, 8:39 pm

I think there are some subtle differences between the birth and death information in the NameTable vs. the EventTable.

In the NameTable, it's birth year and death year. These are always exact numbers. In the EventTable, it's Birth Date and Death Date. these can be exact dates, and they can also be Before dates and After dates and About dates and Estimated dates and date ranges.

Each person only has one Birth year and one Death year in the Name table. A person can have multiple Birth facts in the EventTable and multiple Death facts in the EventTable.

The Birth and Death facts in the EventTable can be private or not, primary or not, and they can include proof indicators. The Birth year and Death year in the Name table include none of that additional data.

That being said, I feel certain that the driving force for including the Birth year and Death year in the NameTable was the ease of programming the sidebar Index of names. Otherwise, the sidebar Index of names would have had to deal with all the logic of multiple Birth and Death facts that might or might not have a variety of kinds of dates and that might or might not have a variety of kinds of private, primary, and proof indicators.

Whether all that constitutes denormalization or not, I am not 100% sure. The Birth year and Death year can certainly be derived from the EventTable via come calculations. But the calculations would be very convoluted and complex.

#6 · 2023-03-16, 11:06 pm

I haven't checked but I would guess that the birth and death facts marked primary would be the ones whose dates are copied to the names table.

And, I believe that the term denormalized is exactly applicable in this case. One piece of data is completely derived from another. As I sad, you gotta do what you gotta do.
However, I'll be this has been the design for many years, maybe before the switch to SQLite. I wonder whether the addition of an index that includes the primary flag would eliminate the need for these fields in the names table. There are certainly plenty of questions in the forums about out-of-sync data. The experiment should be run by RM devs.

#7 · 2023-03-16, 11:41 pm

The Year dates get populated from other fields in the absence of Birth and Death, i.e., Christen and Burial, respectively.

Indeed, the redundant Years date from Family Origins in the '90s when computers were much slower. There would still be some adverse impact from normalising using current hardware but the tolerable size of the database would be very much greater.

RootsMagic's Rebuild Indexes was and maybe still is a bit of a slogger. See Rebuild Indexes and Update Birth and Death Years from over a decade ago!

Your mention of indexing on the IsPrimary flag recalls that issue back then. There was a problem with one or some indexes that slowed performance which I got around with in sqlite by doing some operation on IsPrimary such as WHERE abs(IsPrimary)... There was a big improvement in speed across many different queries.