How To Spatially Enable Business Information Systems

Build a Geodatabase

Step 1:  Survey the existing database structures and determine if entities contain geographic elements such as latitude, longitude, address, city, state, zip, county or region.  Tables with attributes such as these are often the starting points to building a spatial data repository from an existing database.  Also, records associated to items that are geotagged, such as pictures and SMS messages can be used to build geodatabases.

Step 2:  Determine if the role of the existing database will interfere with the GIS and vice a versa. Most GIS are used for reporting and analytical purposes and must achieve high performance data retrieval speeds to be effective.  In that regard, spatially enabling a volitile, busy online transaction processing database can be a big mistake.  Rather, a good geodatabase will likely be a data warehouse isolated from other databases on separate hardware dedicated to GIS.   Also, even if the data creation and maintenance activities originate in a geodatabase, those activities should be performed in an online transaction processing geodatabase separate from the data warehouse geodatabase. 

Step 3:  Determine the data update requirements.  Does the spatial data get updated on intervals such as nightly and weekly (like most data warehouses) or is the data updated continuously in real time (like a tracking system or online trasaction processing system)?  These types of geodatbases must be laid out much differently from one another.  For example, data warehouse GIS systems usually require performing a high volume of data processing as data updates are loaded in batches.  These operations are best done by scheduled jobs that execute after the data replication jobs that load the data warehouse are completed.  On the other hand, geodatbases which manage real-time systems for emergency responses and other tracking systems must update the spatial data quickly as the events occur. Factors such as these influence the geodatabase design dramatically and must be considered before the geodatabse design phase begins.

Step 4:  Design the geododatabase to ensure geographic data entities (commonly reffered to as feature classes) are created in a normalized manner.  By normalized manner we mean to prevent redundant data.  Occasionaly this may require modification to existing data structures. For example, in geodatabases addresses should be consolidated to a single table and made unique within that table.  When many tables within a database are attributed with address fields, the same addresses can then occur in multiple tables. In a geodatabase un-normalized data such as this can force geocoding operations to be peformed multiple times on the same address which makes the system in-efficient.  Also, in any database redundant data should be eliminated to prevent values in tables from contradicting one another, wasting storage space and slowing down query operations. 

Step 5:  Create the new geodatabase. Today there are at least three ways to spatially enable an RDBMS:
  1. SQL Spatial Extensions
  2. ESRI ArcSDE
  3. ESRI ArcSDE on top of SQL Spatial Extensions
The advantages and disadvantages of each are beyond the scope of this paper, however in general these guidelines will help you decide which way to go. If you already have ESRI ArcGIS in house or you will need a full-featured GIS to produce maps and perform advanced spatial analysis and rendering, then go with option two or three. Most GIS professionals, E.G. Cartographers and GIS Analysts, rely on ESRI ArcGIS. To get the enterprise GIS features found in ESRI ArcGIS Desktop, you will need ArcSDE (included with ESRI ArcGIS Server) installed on the database server first. On the other hand, if you don't require the advanced features of ArcGIS then you can simply use the SQL Spatial Extensions alone. SQL Spatial Extensions are available in many popular RDBMS such as Oracle, SQL Server, DB2, Informix and PostgreSQL. To use SQL Spatial Extensions you will have to refer to the RDBMS vendors documentation to determine if you must install additional software or purchase special licensing. In most cases if you have an enterprise level RDBMS license SQL Spatial Extensions will be included and if it is not enabled already then it is simply a matter of enabling the SQL Spatial Extensions option in the database configuration. If you install ArcSDE and don't also enable the SQL Spatial Extensions (option 2), then you won't be able to submit spatial SQL statements to the RDBMS engine. Therefore, if you install ArcSDE you might as well enable SQL Spatial Extensions also if you can. ESRI ArcGIS and spatial extensions will avoid one another and work fine separately unless you force them to play together by including ArcSDE configuration keywords when tables are spatially enabled. In that case, ESRI ArcSDE and ArcGIS will work with tables built with the SQL Spatial Extensions, but I have found it very tricky and could write an entire white paper on that topic as well.
I highly recommend SQL Spatial Extensions to all organizations with an enterprise RDBMS. If you are not using SQL Spatial Extensions and / or ArcSDE, you are probably not using your database to it's full potential. For example, imagine writing a SQL statement to produce a report that lists all of the restaurants within 1,000 yards of a gas station. Or, write a SQL statement that lists all the gas stations in the city limits that are also within 100 meters of a river. These features and many more are simple to accomplish with the built in OpenGIS SQL functions such as ST_WITHIN and ST_BUFFER. You don't even need mapping software, such as ArcGIS, to get the power of GIS if you use SQL Spatial Extensions.
If you decide to use ArcSDE, nothing is required outside of ArcSDE to accomplish geodatabase creation if you don't mind developing DDL-like scripts with ArcSDE command line commands such as "sdelayer". The ArcSDE Adminstrators guide which is installed on the database server with ArcSDE explains the ArcSDE commands in detail. Also, be aware that you don't have to enable the SQL Spatial Extension RDBMS option with ArcSDE, but you can if you wish to go with option three described above. If you don't like writing "sdelayer" scripts then ESRI ArcGIS Desktop includes ArcCatalog which is much like the "Enterprise Manager" of ESRI.  With ArcCatalog you may create geodatabases and the spatial tables in them from a user friendly GUI environment without command line scripts involved.  In addition, ESRI ArcGIS Desktop provides a 4th GL model builder and a case tool which translates Visio UML diagrams into geodatabase tables if you prefer to use that route. In the long run, it is worth the money to get ArcGIS Desktop if you want to get the most out of your spatial data. Still, be aware I started using ArcGIS Desktop over 13 years ago and I still have a lot to learn about it. It is extremely powerful, but very time-consuming to learn to use properly.
Regardless which option above you choose to build your geodatabase, the process of RDBMS spatial enablement also varies depending on if you create the geodatabase from a brand new empty database instance or you create it from an existing database instance which already has data in it. In most cases the geodatabase is created in an empty database instance and then data is loaded into it, but be aware that it does not have to be empty to be spatially enabled. For example, I spatially enabled a data warehouse and it would have required the company to purchase another data center to duplicate the data warehouse in new, separate databases, so we opted to spatially enable the existing databases. If you spatially enable an existing database instance the database retains all the functionality of a non-spatial database and can be used in every way it could before it was enabled. However, be aware that ArcSDE spatial enablement places triggers and adds a sequence field to each spatially enabled table. I have seen some legacy database applications have issues with that, but it is rare. 

Step 6:  Perform Initial Spatial Enablement to batch build geometry in the geodatabase tables.  We shall focus on vector data and exclude raster (imagery) for the time being.  When a spatial table (feature class) is created in a geodatabase, it is a completely normal database table except for the objectid and shape columns that ArcSE adds to it.  When these columns are added to existing database tables ESRI software recognizes the tables as spatial tables.  However, the spatial tables will not render on maps at this point since the shape column contains no geometery for the GIS to render.  This is where initial spatial enablement comes in.  Initial spatial enablement usually involves running processes which read rows of data in tables and uses parts of that data to create some type of geometry.  Once the geometry is made from the data in rows, it is then stored back into the shape column of the spatial table.  For rows with address, city, state, zip or county this process usually involves geocoding points.  Geocoding addresses is a big topic and is best left for another white paper.  In other cases, rows may contain a "from-address" and a "to-address", thus forming a route line.  Other tables may have rows that must be poplulated with polygon shapes derived from things such as county, state or other areas.   The numbers of ways to derive and create spatial data are too many to describe, but the basic business spatial entities are often created from methods similar to ones described above.  Initial spatial enablement can be pretty involved especially if there is special logic required (often refered to as geoprocessing) to derive the geometry.  Geoprocessing can involve development of custom software applications, scripts or models.  Once these applications, scripts or models are developed the intitial spatial enablement is kicked off.  Depending on the size of the spatial table, initial spatial enablement can take seconds or it can take days or weeks. 

Step 7:  Perform Continuous Spatial Enablement to maintain the geometry within the geodatabase tables.  Continuous spatial enablement is performed to replace geometry in shape columns when data values from which the geometry is derived changes.  For example, a spatial table which contains point shapes derived from geocoded postal addresses must be re-geocoded when the data values of the postal addresses change.  Also, if new postal addresses are added to the table the geocoding operation must be performed to create the point geometry for the new data.  Consequently, continuous spatial enablement is best when performed only on rows of data that have changed or were added.  Often to accomplish continuous spatial enablement, scheduled data processing jobs are used.  These jobs can be created from the geoprocessing logic developed for the initial spatial enablement step above, but made more intelligent as to not re-process all of the data needlessly.  Another and usually better way to accomplish this is by "on-update" and "on-insert" database triggers.  Triggered continuous enablement is better because it is good for both batch and real-time updates, but be wary since triggers can also lead to problems that may interfere with other operations that occur in the database not related to GIS.  Geodatabases in DB2 (Spatial Extender) and Informix (Data Blade) must employ database spatial extensions, but spatial extensions (Oracle Spatial) are optional for Oracle and not available with SQL Server, yet. Ultimately, spatial extensions are a standard that extend SQL itself and are what enables spatial data creation logic to be performed by the database with stored procedures, triggers and the like.  As an alternative to spatial extensions, geometry creation logic can be written with the ESRI ArcObjects code libraries called by Microsoft .NET, Visual Basic 6.0, C++ or Java programs.  ArcObjects offers many functions beyond those in the spatial extensions, and has the advantage of portability enabling it to be used upon any of the ArcSDE supported RDBMS.  Still, another method of geometry creation logic is done with the ESRI ArcGIS Desckop Geoprocessing Toolbox, Models and Scripts.  Search for "geoprocessing" at the ESRI Devleoper Network website to get started that way.