We are only just beginning our foray into the world of SQL Server 2008. It has some fantastic new capabilities like spatial datatypes and methods.
The problem:
We publish datasets and documents that are specific to certain geographic areas of Alberta. The public comes to our website and wants to find all information that pertains to a town, city or area. We currently use Oracle Spatial to calculate where our publications are geographically, but with the move off of Oracle, we need a new method.
Disclosure:
Prior to working on this problem, I had no prior experience working with spatial data. It’s very likely that my methodology for resolving this issue is not the best one, but it works for me! for now.
Alberta is divided into areas based on the [pdf] National Topographic System (NTS). Most of our publications have four geographic coordinates in degrees that define its study area. These coordinates can be used to calculate which publications intersect or overlap NTS areas.
I started by building a table of NTS areas.
CREATE TABLE NTS_Area (Sheet, Bounding_Poly geography);
INSERT INTO NTS_Area (Sheet, Bounding_Poly) VALUES ('72e', geography::STGeomFromText('POLYGON((-110 49, -110 50, -112 50, -112 49, -110 49))', 4269)); INSERT INTO NTS_Area (Sheet, Bounding_Poly) VALUES ('72l', geography::STGeomFromText('POLYGON((-110 50, -110 51, -112 51, -112 50, -110 50))', 4269));
The method STGeomFromText() returns an instance of the type geography based on the the first parameter, which can be one of Point, MultiPoint, LineString, MultiLineString, Polygon, Multipolygon, or Geometry Collection. A polygon requires a minimum of three distinct points; in this case, I am providing four distinct points to create a four-sided polygon, with the fifth point being the same as the first point in order to close the polygon.
The second parameter, 4269, is the spatial reference ID and refers to the version of NAD83 that is our organizational standard. Other supported spatial references can be found in the system catalog sys.spatial_reference_systems.
My source database is still on SQL Server 2000, so I pull the four coordinates along with their key to a table on my 2008 server for processing every night. Right now it’s just over 3500 records after eliminating those records that may not contain four coordinates, have invalid coordinates, or are outside of Alberta.
CREATE TABLE [dbo].[BOUNDS]( [ID] [decimal] (13, 0) NOT NULL, [WEST_COORDINATE] [decimal] (18, 12) NOT NULL, [EAST_COORDINATE] [decimal] (18, 12) NOT NULL, [NORTH_COORDINATE] [decimal] (18, 12) NOT NULL, [SOUTH_COORDINATE] [decimal] (18, 12) NOT NULL, [ID_GEO] [geography] NULL, PRIMARY KEY CLUSTERED
I set up a trigger to create the geography object representing the polygon upon insert into the table.
UPDATE [dbo].[BOUNDS] SET [ID_GEO] = geography::STGeomFromText('POLYGON ((' + CAST([EAST__COORDINATE] AS VARCHAR) + ' ' + CAST([SOUTH_COORDINATE] AS VARCHAR) + ', ' + CAST([EAST_COORDINATE] AS VARCHAR) + ' ' + CAST([NORTH_COORDINATE] AS VARCHAR) + ', ' + CAST([WEST_COORDINATE] AS VARCHAR) + ' ' + CAST([NORTH_COORDINATE] AS VARCHAR) + ', ' + CAST([WEST_COORDINATE] AS VARCHAR) + ' ' + CAST([SOUTH_COORDINATE] AS VARCHAR) + ', ' + CAST([EAST_COORDINATE] AS VARCHAR) + ' ' + CAST([SOUTH_COORDINATE] AS VARCHAR) + '))', 4269) WHERE ID IN (SELECT ID FROM BOUNDS)
The next task was to determine how the publication polygons intersected the NTS area polygons. To cut down on noise, we exclude any matches where the intersection of the two areas was less than 10 percent.
You can see here to the right an example of how a polygon of a publication intersects with multiple NTS areas. In this case, the study area is in the northeast corner of the province and includes part of Saskatchewan. We are interested in only NTS areas 74l and 74m and the calculations will ignore the part of the polygon to the right of the two marked NTS areas.
I created a view to perform this calculation.
SELECT bc.[ID], nts.[SHEET] FROM [dbo].[BOUNDS] AS bc CROSS JOIN [dbo].[NTS_AREA] AS nts WHERE (bc.[ID_GEO].STIntersection(nts.[SHEET_GEO]).STArea() / nts.[SHEET_GEO].STArea() > .1) OR (bc.[ID_GEO].STIntersection(nts.[SHEET_GEO]).STArea() / bc.[ID_GEO].STArea() > .1)
So what does this mean? I will use the NTS area 74l in the image to the left (numbers have been rounded to simplify the math).
bc.[ID_GEO].STIntersection(nts.[SHEET_GEO]).STArea() / nts.[SHEET_GEO].STArea() > .1
Is the area of the intersection between the publication polygon and the NTS polygon divided by the area of the NTS polygon greater than .1 (10%)?
The area of intersection (green colour) is 8.5 and the total area of the NTS area is 12.9, resulting in an intersection percentage of 66%. This satisfies the criteria for inclusion.
bc.[ID_GEO].STIntersection(nts.[SHEET_GEO]).STArea() / bc.[ID_GEO].STArea() > .1
Is the area of the intersection between the publication polygon and the NTS polygon divided by the area of the publication polygon greater than .1 (10%)?
The area of intersection (green colour) is 8.5 and the total area of the publication study area is 129.4, resulting in an intersection of 6%. Less than 10% of the study area intersects with this area of the province. This calculation is effective when you have a very small study area that may be entirely within a single NTS area. It could fail the first calculation of being less than 10% of an intersection with the entire study area, but it would be 100% of the study area and thus be found by a search of a specific NTS area.
The results are then used to update a table in the originating SQL Server 2000 database with 34,000 new records.
This process works, but not as well as I would like it to. The whole job takes almost an hour to run, compared to the 30 minutes on the Oracle server. I’m pursuing research on spatial indexes and effective use of regular expressions in T-SQL, but I welcome any suggestions on how to improve this process.