Contents
They say a picture is worth a thousand words. This page compares pictures of the #RM9 and Heredis 2023 databases (both SQLite 3) as automatically created by the same tool.
Entity Relationship Diagrams
An Entity-relationship Diagram (ERD) says a lot about a database. @patjones recently posted a RM9 Database Diagram with many of the linkages or relationships among its tables that are made on-the-fly by the SQL queries from the RM software or the ones found on this site. What a database Entity Relationship Diagram editor generates from a database’s self-definition includes the tables, fields and types, stored Triggers (actions on 1 or more tables by some database event) and stored Views (stored queries). So it is interesting to compare the automatically generated ERDs of the RM9 database against that of the Heredis 2023, from the same editor, DBeaver.
RootsMagic 9.0 Diagram
Heredis 2023 Diagram
Comparison
Object | RootsMagic 9.0 | Heredis 2023 |
Tables | 29 | 65 |
Views | 0 | 0 |
Indexes | 46 | 123 |
Table Triggers | 0 | 156 |
The comparison table and diagram show that the Heredis database is considerably more complex than RootsMagic 9 with more than double the number of tables and nearly triple the number of indexes which accelerate the association of rows across tables and speed up searches. And more divergent is the extensive use of Triggers by H and their complete absence in RM. What do these numbers say about the two databases?
Tables
That there are more tables in H than RM is from a couple of causes:
- data normalisation: I think H is more rigorous in applying the principal, e.g., it has a separate table for the Alternate Name while RM has but one for all types of names and uses a field to flag which is the Primary record for a given person. Moreover, H has names in multiple tables: the Primary appears to be in the equivalent of RM’s PersonTable (Individus) while there is another for Surnames (Noms) and one for Given Names (Prenoms). Similarly, RM’s MediaLinkTable which relates a media item to a Person, Couple, Event, Source, Citation, Place… (according to an OwnerType value) is achieved in H by separate tables for each ‘OwnerType’.
- a larger feature set? Hard to say. I’m not using H sufficiently to identify the differences and my understanding of table and field names in French is not thorough.
Views
Neither has stored views (stored queries) so that tells us nothing. I’ve used temporary Views on RM databases, lasting until the database is closed by the sqlite manager. It’s possible that both applications do, too, but we cannot see them from outside the app’s sqlite engine.
Indexes
With double the number of tables, it stands to reason that there could be double the number of indexes. I don’t know why there should be triple but, in the past, I’ve suggested an index be added or revised in RM. SQLite is itself very good at optimising queries and may spawn an index that it thinks is appropriate unless there is a pre-defined one that includes all the key fields of the query. Maybe H developers have attempted to cover all the possible bases with explicit indexes.
What I have yet to find out is if there is any custom collation sequence in H as there is in RM which has given us database explorers so much grief.
Triggers
This is the most pronounced difference between RM and H. H has many routine operations embedded in the database, being executed by the database engine, rather than in the higher level code which then has to talk to the database engine to make the same thing happen. For example, H has a trigger named “MajDateMediasEvenementInsert” which automatically updates the ‘Date modified’ value in its ‘EventTable’ when a link has been added to their ‘MediaLinkTable’. Looks like this:
CREATE TRIGGER "MajDateMediasEvenementInsert"
AFTER INSERT ON LiensMediaEvenement
FOR EACH ROW
BEGIN
UPDATE Evenements SET DateModification=MAX(NEW.DateModification, Evenements.DateModification) WHERE Evenements.CodeID=NEW.XrefEvenement;
END;
Maybe not the best example for comparison because I don’t think RM would update the EventTable when a media item is tagged to an event.
DBeaver
Having searched around for a (free) application that could create an Entity Relationship Diagram (ERD) from a SQLite 3 database, I landed on one that might be described as an Integrated Development Environment (IDE) for databases, the DBeaver Community Edition. It is self-described as a “Universal Database Tool” and I’ve only superficially scratched its surface figuring out how to connect a SQLite database, generate the ERD from its definition and edit it to illustrate relationships. But it is much more powerful than that as it looks to be a multi-database manager with which one can query the database data. And there is a Pro version which adds many more capabilities.
Triggers are not commonly used in transactional databases where there are lots of changes happening as they can slow down processes considerably. The same goes for fixed relationships which force table inserts & edits to be done in a particular order and require all linked data to be entered, which isn’t always desirable or possible with software such as RootsMagic where users want to enter data in various orders and not be forced to enter particular data.
Thanks for your observations, Pat. What I note about many of the triggers among the few I’ve looked at is that they are doing what I might call ‘housekeeping’ such as adding or updating the two date fields found in most every table: Creation and Modified. And deleting a record in a link table, such as the MediaLinkTable in RM when either the MediaTable record is deleted or the Owner record is deleted. RM has been remiss in leaving such things behind which is why the Delete Phantoms tool was added (incomplete as it was and probably still is); sometimes they might cause a problem. I don’t have your experience with transactional databases but it seems to me a good idea for performance and integrity to build as much as one can into the database engine to process rather than out into the application platform.
I’m unsure how DBeaver got the relationships or whatever is meant by the lines joining the tables – are they derived from the Triggers? I don’t recall any specific SQL or PRAGMAS that define relationships among tables as a step in building a database structure in SQLite. As far as I can tell, the popup label for each of the dozen lines terminating at the Notes table on the extreme right just identify the two linked tables, e.g. “FK_Repositories_Notes [Foreign Key]”. If I click on Notes.CodeID, all of those lines highlight in green along with the fields in the linked tables that relate to it. But they don’t seem to correlate with the few Triggers I’ve looked at.
AHA! I found it. Just reflects how I’ve still only scratched the surface of SQLite. In the Repositories table definition, we see:
XrefNote integer UNIQUE DEFAULT NULL REFERENCES Notes(CodeID) ON DELETE SET NULL
I’ve learned something more from playing with DBeaver’s ERD!
I have looked at the Heredis schema with great interest. It reminds me of a couple of times I have looked at Gramps.
The first time I looked at Gramps it was version 1.0. The functionality was very limited and the database was not SQLite. The second time I looked at Gramps, it was version 5.0. The function was still very limited as compared to something like RootsMagic but it was much better than was version 1.0. But of more interest, by the time Gramps had gotten to version 5.0, it was able to run on two different databases, one of which was SQLite.
So I created a little Gramps database that was based on SQLite and then looked at the SQLite scheme. It was very easy to figure out. It had many fewer tables than either RootsMagic or Heredis, and the tables that did exist had many fewer columns than the equivalent tables in either RootsMagic or Heredis. The point being, the database schema tells you a lot about the capabilities of the app – both for better and for worse.
I agree with your closing comment. My sense of what I’ve seen so far with Heredis is that it is a more rigorously defined database than is RM with measures within the database that ensure referential integrity and with a higher degree of data normalisation. That seems to me to be a good thing but may require more complex queries to get the same result set. For example, all Note data for Persons, Events, Citations, Places et al reside in a common set of 7 tables with names ranging from Notes to NotesFullText_stat (whatever that means!).
Another comment from a database developer perspective is that multiple indexes slow down data saving as each index has to be updated for each record saved. So much so that when bulk data loading it is normal to disable all but the clustered index (the main index) and rebuild them after loading. It is a trade off depending on the database use – is it primarily input or primarily lookup or a balancing act in between?
Also the development software used can dictate a lot of the database structure – with a lot done in the intermediate layer between database and user interface, there is no need for multiple foreign keys. I’ve rarely implemented foreign key relationships in databases for these various reasons – would you want to be forced to put in a place every time you enter a fact? – a foreign key relationship would force that by preventing saving until the place data was saved first.
But I do like a nice database diagram with relationships added in!