- 1Raison d'etre
- 2SQLite now has the INSTR function!
- 3Sample results of parsed Place names
- 4Where are the commas? query
- 5Parse the Standardized Place names query
- 6Update Place Abbreviations with two parts of the Standardized Place name
- 7Update the working Place name with three parts
- 8Further ideas
- 8.1Discussions & comments from Wikispaces site
- 8.1.1Place Details
This page responds to a request from vyger seeking a way to parse the Standardized Place name using SQLite akin to what might be done in Visual Basic with the InStr function:
... With the RM geocoded table the Normalized is generally a 4 component field delimited by commas, what I would like is set the Name to the substring left of the third comma and the Abbrev to the substring left of the second comma...
As of SQLite 3.7.15 dated 2012-12-12, the INSTR(X,Y) function was finally included. Until that time, I do not think it would have been possible to parse a comma delimited string in SQLite without using a higher level language, either to extend SQLite or to use SQLite as a data source. Now it has become feasible with those SQLite managers that have incorporated SQLite 3.7.15 or later. Unfortunately, as of this writing, SQLiteSpy has not been updated since 2011 and thus does not support the INSTR function; a new version that will has been promised but no timeline given. To carry out the development of a suitable query, I was fortunate to find that SharpPlus SQLite Developer has undergone recent revision and therefore does support it. So does SQLite Expert but only the paid version of SQLite Developer and SQLite Expert support the fake RMNOCASE collation needed to modify the Place Name. I’m hopeful that a future version of RMtrix will incorporate the SQLite INSTR function and these queries.
|Screenshot of results from PlaceParse.sql as displayed by SQLite Developer|
The results above show some of the contents of a temporary table xPlacePartsTable containing the PlaceID and Normalized columns from RM’s PlaceTable, the comma placements within the values of Normalized and the parsing of Normalized into four parts. Because it is directly and uniquely related to PlaceTable via PlaceID, it is easy to assemble the Standardized Name parts and update the working Name and Abbrev accordingly.
PlaceCommaParse.sql This query creates an initial temporary table xPlaceCommaTable with the columns from PlaceID to Comma3:
-- PlaceCommaParse.sql /* 2013-02-17 Tom Holden ve3meo Creates a temporary table of non-empty Standardized Place names with the positions of up to three commas in the string. Can be used to parse out the 4 parts of the name for further use such as the generation of a 2-part Abbreviation and 3-part Name for reports. */ DROP TABLE IF EXISTS xPlaceCommaTable; CREATE TEMP TABLE xPlaceCommaTable AS SELECT PlaceID ,Normalized ,Comma1 ,Comma2 ,Comma2 + INSTR(SUBSTR(Normalized, Comma2 + 1), ',') AS Comma3 FROM ( SELECT PlaceID ,Normalized ,Comma1 ,Comma1 + INSTR(SUBSTR(Normalized, Comma1 + 1), ',') AS Comma2 FROM ( SELECT PlaceID ,Normalized ,INSTR(Normalized, ',') AS Comma1 FROM PlaceTable WHERE PlaceType = 0 AND Normalized NOT LIKE '' ) );
PlaceParse.sql This query uses the initial temporary table to build an extended table with all the columns from the first plus the parsed parts of the Standardized Name as shown in the figure above:
-- PlaceParse.sql /* 2013-02-17 Tom Holden ve3meo Requires existence of table created by PlaceCommaParse.sql. Extracts the parts of a 4-part Standardized Place name and saves them to a temporary table */ DROP TABLE IF EXISTS xPlacePartsTable; CREATE TEMP TABLE xPlacePartsTable AS SELECT * ,CASE WHEN Comma1 > 0 THEN SUBSTR(Normalized, 1, Comma1 - 1) ELSE Normalized END AS Place1 ,CASE WHEN Comma2 > Comma1 THEN SUBSTR(Normalized, Comma1 + 1, Comma2 - Comma1 - 1) WHEN Comma1 > 0 THEN SUBSTR(Normalized, Comma1 + 1) ELSE '' END AS Place2 ,CASE WHEN Comma3 > Comma2 THEN SUBSTR(Normalized, Comma2 + 1, Comma3 - Comma2 - 1) WHEN Comma2 > Comma1 THEN SUBSTR(Normalized, Comma2 + 1) ELSE '' END AS Place3 ,CASE WHEN Comma3 > Comma2 THEN SUBSTR(Normalized, Comma3 + 1) ELSE '' END AS Place4 FROM xPlaceCommaTable;
PlaceAbbrevUpdate.sql This query writes up to the first two parts of the Standardized Name to the Abbrev column of PlaceTable:
-- PlaceAbbrevUpdate.sql /* 2013-02-17 Tom Holden ve3meo Combines up to the first two parts of the Standardized Place name (the Normalized column) and places the concatenated result in the Abbrev column of PlaceTable for Places having non-empty Normalized fields. Requires temp xPlacePartsTable generated by PlaceParse.sql or equiv. */ UPDATE PlaceTable SET Abbrev = ( SELECT CASE WHEN Comma1 > 0 THEN Place1 || ', ' || Place2 ELSE Place1 END AS Abbrev FROM xPlacePartsTable WHERE PlaceTable.PlaceID = xPlacePartsTable.PlaceID ) WHERE PlaceID IN ( SELECT PlaceID FROM xPlacePartsTable );
A similar query can set the working Place Name to up to the first three parts of the Standardized name:
-- PlaceNameUpdate.sql /* 2013-02-17 Tom Holden ve3meo Combines up to the first three parts of the Standardized Place name (the Normalized column) and places the concatenated result in the Name column of PlaceTable for Places having non-empty Normalized fields. Requires temp xPlacePartsTable generated by PlaceParse.sql or equiv. AND (fake) RMNOCASE collation. */ UPDATE PlaceTable SET Name = ( SELECT CASE WHEN Comma2 > Comma1 THEN Place1 || ', ' || Place2 || ', ' || Place3 WHEN Comma1 > 0 THEN Place1 || ', ' || Place2 ELSE Place1 END AS Name FROM xPlacePartsTable WHERE PlaceTable.PlaceID = xPlacePartsTable.PlaceID ) WHERE PlaceID IN ( SELECT PlaceID FROM xPlacePartsTable );
With these queries as examples, others can be readily developed.
- Some may prefer just the first part of the Standardized Place name for the Place Abbreviation, i.e., typically the municipality’s name alone.
- Those whose Standardized Names are more typically only 3 levels (e.g., Canadian places when geo-coded have only municipality, province, country) may prefer just two parts for the working Place Name (municipality, province).
- Perhaps revise the UPDATE query so that the working Place Name receives 2 parts of a 3-level Place and 3 parts of a 4-level place.
- The temporary table, xPlacePartsTable, when viewed is itself useful for inspecting Standardized Place names for incomplete names. Depending on the SQLite manager, it can be sorted and filtered for special views.
- xPlacePartsTable itself can be edited. Thus Placen values can be changed in it and UPDATEs run to propagate changes to the RM database. Note that TEMP tables are lost when the SQLite manager that created them closes its connection to the database nor are they available to another SQLite manager open at the same time.