WebTrees Website – Pre-Processing Tools using SQLite in Visual Basic et al #webtrees #visualbasic

The following paragraphs are extracted from private messages from Tom, davistom1, who kindly agreed to their being brought onto a page for wider dissemination. He talks about the relationship between his RM4 database and media collection and two websites he operates, one a webtrees, the other GenWeb, both worth a visit. He developed some procedures using SQLite and Visual Basic that he runs against his RM database prior to exporting a GEDCOM for upload to the sites. Tom has made a deployable version of his RMSanity application along with its source code and it’s available for download on this page. (page created by another Tom, ve3meo).


I long ago developed a SQLITE based preprocessing program which reads the RM4 database (or RM5 for that matter) and does a whole bunch of sanity tests, including locating any duplicate or unused places, place details, media files, etc. which I can then resolve directly in RM4 (or RM5). I do the preprocessing/cleanup before export to GEDCOM in preparation for importing the latter into a webtrees database for a public website which I operate.

The computer based parts of my genealogy activity center around a desktop RM4 database (I haven’t actually switched to RM5 yet because of some of the headaches associated with the media handling re-design) and two websites which I operate. One website is my own webtrees based family history portal located at http://fotgp.com and the other is the Northampton County, North Carolina GenWeb site located at http://ncgenweb.us/northampton, for which I am site coordinator. All of these activities share parts of a large, currently ~11Gb, media library. The media library file tree is organized with top level directories representing some 75 repositories, mostly public record repositories, with sub tree branches mimicking the repository filing system but with some ad hoc repository branches also. My RM4 database currently links into about 7500 items in this media library. I have a whole bunch of homebrew code used to manage this mess, and a good bit of it directly queries the RM4 SQLite database for an array of sanity testing and data transformation between the various end uses.

My webtrees site is updated frequently from the RM4 database by first sanity testing it, manually tracking down and correcting discrepancies in it, backing it up, exporting it to GEDCOM, processing the GEDCOM through a bunch of transformation steps and finally importing the transformed GEDCOM into the site database. I do not routinely edit the webtrees database via its web interface, though that is what webtrees and its predecessor, phpGedView, were designed to do. My personal genealogy data baseline is my RM4 desktop database.

The connection with the GenWeb site is primarily the latter’s Cemeteries and Burials, Photo Albums and Places portals, the homebrew MySQL database which these portals employ, their links into the previously described media library and the homebrew code used to update the MySQL database. The only link between these and RM4 SQLite is that they share the same master media library, so no more about them here.

I get into RM SQLite databases two ways: (1) for some purposes I use the SQLite Developer app but mostly, (2) I use some homebrew code with the System.Data.SQLite ADO.NET provider library. I use SQLite Developer mainly in developing and debugging queries. Trying to distill the logic of my various SQLite queries to something which is both (1) compatible with the focus of your wiki and (2) presentable is a significant undertaking. Let me elaborate a bit and then we can explore whether it would be useful.

The RM4 database sanity test noted above is illustrative. I have written a desktop app which uses the aforementioned System.Data.SQLite ADO.NET provider to test for and report discrepancies in the following: (1) broken media links, (2) non-privatized SSNs, (3) orphaned place details, (4) un-used place details, (5) un-used places, (6) un-used media items and (7) duplicate media items. The app also requires two SQLite extensions, a function which tests for existence of a media library file denoted by its path and a NOCASE collation substitute for the infamous RMNOCASE. Incidentally, I don’t ever write to the RM4 database outside the RM4 app.

SQLite Queries in RMSanity

Following are examples of the hard coded SQLite queries used for some of these tests:

-- Broken media links
SELECT m, MediaFileExists(m) AS e
FROM (
      SELECT mediapath || mediafile AS m
      FROM MultimediaTable
      ) AS mm
WHERE e=0;
 
-- Non-Privatized SSNs
SELECT et.OwnerID
FROM EventTable AS et
JOIN FactTypeTable AS ft
ON et.EventType=ft.FactTypeID
WHERE ft.Name='Soc Sec No'
AND NOT et.IsPrivate=1;
 
-- Orphaned place details
SELECT pd.*, p.Name
FROM PlaceTable AS pd
LEFT OUTER JOIN PlaceTable AS p
ON pd.MasterID=p.PlaceID
WHERE pd.PlaceType=2
AND p.PlaceID ISNULL;
 
-- Un-Used media items
SELECT MediaPath || MediaFile AS media
FROM MultimediaTable AS mm
LEFT OUTER JOIN MediaLinkTable AS ml
ON mm.MediaID = ml.MediaID
WHERE ml.MediaID ISNULL;
 
-- Duplicate media items
SELECT *
FROM (
      SELECT COUNT(MediaID) AS c, MediaPath, MediaFile
      FROM MultimediaTable
      GROUP BY MediaPath, MediaFile
      )
WHERE c > 1;

SQLite Extensions in RMSanity

VB versions of the two extensions are:

<SQLiteFunction(Name:="RMNOCASE", FuncType:=FunctionType.Collation)> _
 
Friend Class NoCaseCollation
 
Inherits SQLiteFunction
 
Public Overrides Function Compare(ByVal param1 As String, ByVal param2 As String) As Integer
 
Return String.Compare(param1, param2, True)
 
End Function
 
End Class

and:

<SQLiteFunction(Name:="MediaFileExists", FuncType:=FunctionType.Scalar)> _
 
Friend Class MediaFileExists
 
Inherits SQLiteFunction
 
Public Overrides Function Invoke(ByVal args() As Object) As Object
 
Return My.Computer.FileSystem.FileExists(args(0))
 
End Function
 
End Class

Usage of RMSanity

When successfully installed, the RMSanity app will be found in the Start Menu. Starts quickly, with a blank screen Windows Console followed shortly by an Open File dialog filtered to show RootsMagic (*.rmgc) database files. Open any database and then an Open File dialog opens to name and locate the RMSanity log file. On completing that step, RMSanity proceeds to run its seven tests:

  1. Broken media links
  2. Non-privatized SSNs
  3. Orphaned place details
  4. Unused media items
  5. Unused place details
  6. Unused places
  7. Duplicate media items

If all tests pass, the console window screen looks similar to this:

RMSanityOK.PNG
RMSanity: all tests passed.

As written, the application stops testing with the first test failed:
RMSanityFail.PNG
Each failed test then lists the information with which you can find and fix the problem using RootsMagic. In the above example, the individual with RIN 120 (I120) has a SSN fact that has not been privatized and so would be exported in the GEDCOM possibly to a website. Clearing all reported problems is Tom’s objective before exporting to his webtrees website. The log file contains exactly what you see on the screen so that you can refer to it as you work through the problems.

To close the application, DO NOT USE the windows X if you do not wish to be alarmed by an error message. Type ‘exit’ into the RMSanity screen and press Enter.

Download and Installation

Caveat Emptor

The first thing Tom says is that he developed RMSanity for his own use and purposes. If it is of use to anyone else, great, but he cannot support it. Use at your own risk! Risk is negligible because it is a read-only application that does not modify the database.

That said, this Tom thought it would be of interest to some and perhaps stimulate some others into further development of utilities in support of our use of RootsMagic. That RMSanity is in Visual Basic while the intent has been to develop the Bundled Utilities in Visual C# ought not be a deterrent – concepts are transferable if code is not. Those more comfortable with Visual Basic may find that the RMSanity source code is a great place to start.

Prerequisites

RMSanity uses the version of SQLite that requires Windows .NET 4 Framework (not just the Client). Install .NET 4 Framework.

Download

RMSanity.zip

Installation

Extract all the files under the Publish folder to a temporary folder. Find and click on setup.exe in the temporary folder. The installation will add a RMSanity folder to your start menu with a link to rmsanity.exe.

Won’t run?

Probably the same problem and solution as described for RMtrix.

Source code

Open Module1.vb to view the Visual Basic source code.

Discussions & comments from Wikispaces site


LessTX

Simple Understanding

LessTX
13 February 2012 23:26:32

I’m going to try to translate that page into what a typical end user might understand, tell me where I’m wrong:

This program checks your database for inconsistencies and unused items, and then alerts you to things you want to fix.

So, to compare it to the TMG Utility (the only 3rd party genealogy utility with which I am familiar), it would have 7 items on the list
(Broken media links, Non-privatized SSNs, Orphaned place details, Unused media items, Unused place details, Unused places, Duplicate media items) and be the “log only” option. It doesn’t actually change the database in any way.

These 7 things seem like very good items to have on the wish list for the utility program, but there would need to be the step of “fixing” those things within the utility.

Excellent stuff!


ve3meo

ve3meo
14 February 2012 02:29:47

That’s it in a nutshell, Less.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.