Surname ALL-CAPS to Propercase #names

Introduction

This procedure responds to a call for help from a RootsMagic user who had received a 40,000 person GEDCOM file in which all the surnames were in capital letters and which he wanted to integrate with a database in which surnames are mixed case. How could the SURNAMES be converted to Surnames without editing in RM each of the 40k people or a Search and Replace for each unique name (numbering in the thousands)? I suggested a few ways (see Alternative Solutions below) but realised that a wholly SQLite solution working directly on the database might be do-able and be the best.

Update

2022-11-02: I have to admit that I undertook the challenge of doing this procedure without realising that NameClean had the ALL-CAPS conversion feature. I may have had no prior need to use it and merrily presumed that the requestor had hit a dead end trying to do it in RM7. While NameClean executes the updates much more slowly than mine does, that may be of no serious consequence for a one-time operation. And, to my surprise, NameClean does have some inventory of Scottish “Mac…” surnames which it defaults to, rather than upper-casing the following letter, regardless, i.e., MACRON (French) does not become MacRon but Macron, MACKINTOSH becomes Mackintosh whereas MACNEILL=>MacNeil, etc. “MC” is likely rule-based: MCCOY=>McCoy.

Development

The current release of SQLite 3 does not have a Titlecase or Propercase conversion function but at least one SQLite manager, SQLiteSpeed, has an internal extension with it. I tried it first with mixed results on a small set of test names (second column in the table below). The white cells have acceptable results, yellow unacceptable, peach ambiguous because the result might be acceptable for one person but not for another (cultural variation…).

Scouring the Internet, I turned up a very clever recursive query for Titlecase conversion that could work in any current-enough SQLite manager as it uses the standard functions. Tried it with SQLiteSpeed, SQLiteSpy and SQLite Expert Personal. The first two seem to have inconsistent support for Unicode, e.g., in the lower() or substr() functions while Expert provided the best results and execution was very fast.

Considering the procedure, I thought the conversion should only apply to names that are ALL CAPS and not to any others so as not to undo correct entries such as the green test names. That prompted the REGEXP filter which behaved differently on Spy and Expert and not at all on Speed but I eventually settled on one using standard functions common to all.

Further, there should be an intermediate step to inspect the converted names before applying them to the database NameTable to allow for deletions and corrections of undesired results that can only be found through inspection.

The end result is a SQL file that is not a single script intended to be executed in one go. Rather, it contains a series of short scripts enclosed as comments, each being a step to be executed in order. If you execute the whole file accidentally, there is no result because everything is commented. Each step executes very quickly but for the user interaction. The slowest execution is the actual update of the NameTable which still took only 35s for 35,000 records on a near-obsolete i5 Windows 10 laptop and hard-disk drive.

Procedure

Requirements

  1. SQLite Expert Personal or equivalent with full support for Unicode, run-time variables and loadable extensions
  2. RMNOCASE – faking it in SQLite Expert, command-line shell et al #rmnocase
  3. You’ve backed up your database or are working on a disposable version!
  4. Read and follow the instructions
  5. If you are on MacOS and can make it work with some SQLite tool, please let us know!

Download

29 Oct 2022

Usage

  1. Open and select the RM database in SQLIte Expert Personal and ensure the RMNOCASE extension is loaded. Any RM database from V4 to at least V8 should work.
  2. Load the SQL file in the Editor. Read through it to familiarise yourself with its comments, format…
  3. You will get no result other than a warning if you simply execute the file because the code is within comments. You must select (highlight) a block of code within each STEP and execute it, in sequence.
  4. STEP ONE: makes a copy of the Surnames and key field from NameTable in the permanent SurnamesOLD table. The last step removes this table from the database. All other tables created in the procedure are temporary, in memory only, and are lost when the database is closed.
  5. STEP TWO: makes temp.SurnameWORK table of unique UPPER CASE Surnames to be converted to Proper Case. Prompts for first letter on which to filter. Leave blank to get all.
    You will be repeatedly prompted in subsequent steps.
  6. STEP THREE: Converts Surnames from SurnamesWORK to Proper Names in the temp.ProperName table using the recursive query adapted for this purpose. The query lists the results for review but edits must be made in the Data view on the ProperName table. You can use its sorting and filtering tools to help with a large number of records.
    1. ADDENDUM 1: Creates temp.SpecialName table to adjust names beginning with Mac or Mc by capitalising the following letter. This code could be expanded for other special cases. The table can be reviewed and edited before being applied to the ProperName table in…
    2. ADDENDUM 2: Updates the ProperName table from the SpecialName table.
  7. STEP FOUR: Updates the NameTable from the ProperName table.
  8. STEP FIVE: Contains a number of queries with which to check results.
  9. STEP SIX: Leave no trace behind if you are finished – deletes the ProperName table from the database.
  10. N.B.: When you next open the database with RootsMagic, you must run the File > Database tools > Rebuild Indexes else it will fail the Integrity Check and may give some strange results in sorting and searching.

Alternative Solutions

  1. A higher level language such as perl has the needed Titlecase() function which could be used to process and return the result to a sqlite query from the platform.
  2. Spreadsheet and text editors may have the needed function: Google Sheets and Notepad++ both do. You could export a query of the NameID and Surname fields from NameTable. Process the list externally. Import the revised list to a temporary table and update the NameTable from it.
  3. Use Notepad++ or a scripting tool to operate on the GEDCOM file. For example, adapt this procedure on NotePad++. While it’s been written for something other than GEDCOM, I think the regex for GEDCOM to cover hyphenated and non-hyphenated names would be:
    (?:1 NAME |\G(?!^))[ -]\K(\w)(\w+)
  4. Other genealogy software do have a conversion tool: Legacy Family Tree has it built in and offers a feature-restricted free trial; Family Historian has an available user-developed plugin that you could use in its 30-day unrestricted trial. You’d want to check if there are any important losses in the import-export operations needed in using one of these.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.