Sunday, March 1, 2015

Geodatabases, Attributes, and Domains

Introduction:


One of the most important data models in modern GIS is the geodatabase. It is the primary mechanism for organizing, storing, and using geographic data in ESRI ArcGIS. The geodatabase is object-oriented, meaning that features and attributes are stored together in a single unit, or object. This is generally preferable to the georelational model, in which features and attributes are stored separately and linked by a common feature ID. The georelational model is the traditional approach, and its use has declined since the introduction of ESRI's geodatabase.

There are a number of ways that a geodatabase can be set up to reduce overhead in data collection. Especially important are domains, which restrict the values that can be entered into a field of a feature class. A domain lets the geodatabase designer specify a range that an attribute must fall within, or a list of pre-specified values to select from. Imagine being out in the field digitizing the locations of trees. For each tree point you need to record some attributes, such as nearby ground cover and tree height. In any study, minimizing field time is important, and a domain can help achieve this. If you accidentally entered 100 meters instead of 10 meters for a tree's height, the domain would flag the value as falling outside the range you specified, say 5 to 25 m. You could then recognize the input error and correct it on the spot, before the erroneous value reached the analysis stage and negatively affected your results. A domain could also be used for ground cover, providing a list of options to select from (e.g., grass, dirt) rather than forcing the user to type the same values over and over again.
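The two kinds of domain described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how ArcGIS implements domains; the class names `RangeDomain` and `CodedValueDomain` are my own, and the values come from the tree example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeDomain:
    """Accepts any number between minimum and maximum, inclusive."""
    minimum: float
    maximum: float

    def accepts(self, value):
        return self.minimum <= value <= self.maximum


@dataclass(frozen=True)
class CodedValueDomain:
    """Accepts only values from a fixed list of choices."""
    codes: frozenset

    def accepts(self, value):
        return value in self.codes


# Values taken from the tree example in the text.
tree_height_m = RangeDomain(5.0, 25.0)
ground_cover = CodedValueDomain(frozenset({"grass", "dirt"}))

print(tree_height_m.accepts(10.0))   # the intended height
print(tree_height_m.accepts(100.0))  # the typo, which falls outside the range
print(ground_cover.accepts("grass"))
```

The range check catches the 100-meter typo, and the coded-value check rules out misspellings because only the listed options can be chosen.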

In this exercise, we were tasked with the construction of a geodatabase, development of domains, and creation of a feature class for use in our microclimate survey that will be conducted later this semester. For the survey, we will be collecting many different pieces of data so it is important to consider how we want to store them. Generally, it is a good idea to use fewer feature classes with multiple attributes, which results in a clean, logical and queryable dataset usable for further analysis.

The data pieces that we will be collecting include: wind speed, wind direction, humidity, dew point, surface temperature, temperature at 2 m above the surface, wind chill, ground cover, and notes. Because it is often advisable to aggregate related data into a single feature class, that is what I will do. The result will be a single microclimate feature class whose attributes hold each of these measurements.

Another important consideration when designing a geodatabase is the desired data type for the fields that will be used. In this example, the two general data types that will be used are text and numbers. However, there are a number of different number types that can be used, each with its own pros and cons.

This table shows number data types, detailing their storable range, application, and storage size.
For this particular study, I decided that using floats would be the best option for numerical data storage, because I wanted to be able to include decimal values, but didn't need the data range that doubles can provide.
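To get a feel for what choosing a float over a double gives up, the sketch below round-trips a number through 32-bit storage using Python's standard `struct` module. It assumes the ArcGIS Float type is an ordinary 4-byte single-precision value; the helper name `as_float32` and the sample reading are made up for illustration.

```python
import struct


def as_float32(value):
    """Round-trip a Python float (64-bit) through 4-byte single-precision storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


reading = 23.7  # hypothetical temperature reading
stored = as_float32(reading)

print(stored)                 # the value as a 4-byte float would hold it
print(abs(stored - reading))  # the rounding error introduced by the smaller type
```

For readings with one or two decimal places and a magnitude under a few hundred, the rounding error is far smaller than the precision of a handheld weather instrument, which is why the smaller type is sufficient here.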

The next step is to decide what kinds of domains will be needed to facilitate field collection. I will first outline the conceptual basis for each domain, then provide a detailed tutorial on creating the geodatabase, developing the domains, and creating the feature class.

Wind Speed: Wind speed can't be lower than 0, and it will almost certainly not exceed 50 mph. As mentioned earlier, this will be stored as a float.
Wind Direction: This will be an azimuthal measurement, with North as 0 degrees, ranging up to 360 degrees. It will also be stored as a float.
Humidity: Humidity is recorded as a percentage from 0 to 100%, where 100% represents saturation.
Surface Temperature: Temperature normally doesn't fall below -40 degrees Fahrenheit even at this time of year, and certainly won't exceed 60.
Temperature at 2 Meters: This will use the same domain values as surface temperature.
Dew Point: Since dew point is itself a temperature, it makes sense to use the same domain range as the temperature fields.
Wind Chill: This won't drop lower than -40 either, so the same domain can be used.
Ground Cover: Possible values include grass, snow, concrete, blacktop, gravel, water, or sand. It is also advisable to include an "other" option, in case an unforeseen value arises.
Notes: No domain will be used, but this field will be valuable for attaching any other important information to each microclimate point.
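The list above can also be written down as data, which makes it easy to check a sample reading before committing to the design. This is a plain-Python sketch, not ArcGIS code. The domain names and the field names WIND_DIRECTION and HUMIDITY are my own assumptions; the other field names follow the all-caps convention used in the Methods section.

```python
# Range domains: name -> (minimum, maximum), as outlined in the list above.
RANGE_DOMAINS = {
    "Wind Speed": (0.0, 50.0),       # mph
    "Wind Direction": (0.0, 360.0),  # degrees from North
    "Humidity": (0.0, 100.0),        # percent
    "Temperatures": (-40.0, 60.0),   # degrees Fahrenheit
}

GROUND_COVER_CODES = {
    "grass", "snow", "concrete", "blacktop", "gravel", "water", "sand", "other",
}

# Which range domain each numeric field uses (some field names are assumed).
FIELD_DOMAINS = {
    "WIND_SPEED": "Wind Speed",
    "WIND_DIRECTION": "Wind Direction",
    "HUMIDITY": "Humidity",
    "DEW_POINT": "Temperatures",
    "TEMPERATURE_SURFACE": "Temperatures",
    "TEMPERATURE_2M": "Temperatures",
    "WIND_CHILL": "Temperatures",
}


def find_violations(record):
    """Return the names of fields whose values fall outside their domain."""
    problems = []
    for field, domain in FIELD_DOMAINS.items():
        if field not in record:
            continue
        low, high = RANGE_DOMAINS[domain]
        if not low <= record[field] <= high:
            problems.append(field)
    cover = record.get("GROUND_COVER")
    if cover is not None and cover not in GROUND_COVER_CODES:
        problems.append("GROUND_COVER")
    return problems


# A made-up reading with a mistyped wind speed (500 instead of 5.0).
sample = {"WIND_SPEED": 500.0, "HUMIDITY": 64.0, "GROUND_COVER": "snow"}
print(find_violations(sample))
```

NOTES is deliberately absent from both tables, since that field takes free text.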

Methods:

Begin by opening ArcCatalog and navigating to the folder that you would like to store your geodatabase in.

Select the folder, then right-click and choose New -- File Geodatabase.
New File Geodatabase.gdb will appear under the Contents tab. Right-click it, choose Rename, and call it Microclimate.gdb.

Next, right click on your Microclimate.gdb and choose Properties.

Database Properties pop-up. Notice that I selected the Domains tab, as that is what we will be working on next.
The above window is where we will define the domains outlined in the previous section. Simply click the Domain Name box to create a new domain, and be sure to give it a relevant description.

Here is the populated Domains tab. Currently selected is the Ground Cover domain, which has a field type of Text and a domain type of Coded Values. This means that fields using this domain will offer a selectable drop-down list of values, so they won't have to be typed repeatedly in the field. Click inside the Code area to specify a relevant code, then click in Description to add the matching ground cover description, as shown above. Refer to the previous section for the complete list of coded values for ground cover.
The rest of the domains that will be added are numerical domains. They each use a floating point datatype, and just require a range of acceptable values to be specified.

The temperatures domain is selected. Note that the field type is a Float, and the minimum value is -40, and the maximum is 60. These represent the minimum and maximum acceptable values for temperature in Fahrenheit.
Add the remaining domains in the same manner, specifying the minimum and maximum values listed in the Introduction. Remember to give each domain a relevant description.
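The same geodatabase and domains could also be built with a script instead of the ArcCatalog dialogs. The following is a rough sketch that I have not run against this project: it lists the arcpy geoprocessing calls as plain data so the plan can be inspected anywhere, and only the `run` function needs an ArcGIS install. The helper names, the domain descriptions, and the domain names other than Ground Cover and Temperatures are assumptions.

```python
import os

# Range domains from the Introduction: (name, minimum, maximum).
RANGE_DOMAINS = [
    ("Wind Speed", 0.0, 50.0),
    ("Wind Direction", 0.0, 360.0),
    ("Humidity", 0.0, 100.0),
    ("Temperatures", -40.0, 60.0),
]

GROUND_COVER = ["grass", "snow", "concrete", "blacktop",
                "gravel", "water", "sand", "other"]


def geodatabase_steps(folder, gdb_name="Microclimate.gdb"):
    """Return the arcpy tool calls, as (tool name, arguments) pairs."""
    gdb = os.path.join(folder, gdb_name)
    steps = [("CreateFileGDB_management", (folder, gdb_name))]
    for name, low, high in RANGE_DOMAINS:
        # Create a float range domain, then set its minimum and maximum.
        steps.append(("CreateDomain_management",
                      (gdb, name, name + " range", "FLOAT", "RANGE")))
        steps.append(("SetValueForRangeDomain_management",
                      (gdb, name, low, high)))
    # Create the text coded-value domain, then add one code per cover type.
    steps.append(("CreateDomain_management",
                  (gdb, "Ground Cover", "Ground cover type", "TEXT", "CODED")))
    for code in GROUND_COVER:
        steps.append(("AddCodedValueToDomain_management",
                      (gdb, "Ground Cover", code, code.capitalize())))
    return steps


def run(steps):
    """Execute the planned steps. Requires ArcGIS, since arcpy ships with it."""
    import arcpy
    for tool, args in steps:
        getattr(arcpy, tool)(*args)


# Inspect the plan without needing ArcGIS; the folder path is a placeholder.
for tool, args in geodatabase_steps("C:/data"):
    print(tool, args)
```

Keeping the plan separate from its execution means the domain list can be reviewed, or reused for another survey, before anything is written to disk.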

Next, we will need to create a new microclimate feature class.

Right-click on Microclimate.gdb and select New -- Feature Class as shown above.
A new pop-up window will appear. Name your feature class microclimate_[your username], and provide an alias if you'd like. Under Type, choose Point Features, which specifies that the feature class will store points. Click Next.

You will now be prompted for the coordinate system that will be used for XY coordinates in this dataset. For now, we will leave this blank. Choose Next. An XY Tolerance option will appear; accept the default value (it should be 0.001 Unknown Units) and click Next. A Configuration Keyword option will then appear; accept Default and choose Next.

Now, you will see the window shown below.

This is the window that allows you to add fields. These will be similar to the domains we added earlier, but will more closely follow the outline from the end of the introduction section. Recall that we will have the following fields: wind speed, wind direction, humidity, dew point, surface temperature, temperature at 2m, wind chill, ground cover, and notes.
Here are the Field Names and Data Types. Use either all uppercase or all lowercase letters, and avoid spaces, dashes, and special characters in the field names. Also, be sure to choose Float as the Data Type for the numeric fields, because our domains will only appear if the data types match.
This area will be shown below the Field Name/Data Type window shown above. In the Domain area, be sure to select the corresponding domain for each field. For example, for WIND_SPEED, choose the Wind Speed domain from the drop-down. You won't see our Ground Cover domain as an option on a Float field, because it is a text-based domain. For the GROUND_COVER field, choose Text as the Data Type so the Ground Cover domain will be available.
Set up your remaining fields in this fashion. Remember that DEW_POINT, TEMPERATURE_SURFACE, TEMPERATURE_2M and WIND_CHILL will all use the Temperatures domain. NOTES will use no domain at all, allowing us to enter any notes without restriction.
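The feature class and its fields could likewise be scripted. As a hedged sketch, untested against this project, the code below plans the arcpy calls as data and leaves execution to a separate function that needs ArcGIS. The feature class name `microclimate_username` is a placeholder, WIND_DIRECTION and HUMIDITY are assumed field names, and the domain names must match whatever was entered on the Domains tab.

```python
import os

# (field name, data type, domain name or None), following the steps above.
FIELDS = [
    ("WIND_SPEED", "FLOAT", "Wind Speed"),
    ("WIND_DIRECTION", "FLOAT", "Wind Direction"),
    ("HUMIDITY", "FLOAT", "Humidity"),
    ("DEW_POINT", "FLOAT", "Temperatures"),
    ("TEMPERATURE_SURFACE", "FLOAT", "Temperatures"),
    ("TEMPERATURE_2M", "FLOAT", "Temperatures"),
    ("WIND_CHILL", "FLOAT", "Temperatures"),
    ("GROUND_COVER", "TEXT", "Ground Cover"),
    ("NOTES", "TEXT", None),  # free text, so no domain
]


def feature_class_steps(gdb, fc_name):
    """Return the arcpy tool calls, as (tool name, arguments) pairs."""
    fc = os.path.join(gdb, fc_name)
    # Point feature class with no coordinate system, as in the wizard.
    steps = [("CreateFeatureclass_management", (gdb, fc_name, "POINT"))]
    for name, field_type, domain in FIELDS:
        steps.append(("AddField_management", (fc, name, field_type)))
        if domain is not None:
            # The field's data type must match the domain's field type.
            steps.append(("AssignDomainToField_management", (fc, name, domain)))
    return steps


def run(steps):
    """Execute the planned steps. Requires ArcGIS, since arcpy ships with it."""
    import arcpy
    for tool, args in steps:
        getattr(arcpy, tool)(*args)


for tool, args in feature_class_steps("Microclimate.gdb", "microclimate_username"):
    print(tool, args)
```

Listing the fields in one table makes the type-matching rule easy to audit: every Float field points at a float range domain, and GROUND_COVER is the only Text field with a domain.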

It is also advisable to import basemap data into your geodatabase for reference while in the field, and the process for doing this depends on your area of interest, and data availability.

Discussion:


The geodatabase, domains, and feature class that were just created will be helpful for our microclimate survey. They will reduce the busy-work that would otherwise be needed in the field if we recorded a separate point for each attribute. Instead, data collection is streamlined: we take the microclimate readings at each point and simply enter them into the proper fields on the GPS device. It is worth noting, however, that over-aggregation can also be problematic. If two pieces of data have distinct functions, they should be put in separate feature classes.

Conclusion:


This exercise was valuable because, as I've demonstrated, proper planning for a field project and database design that accommodates its goals are very important in the geospatial field. Good design reduces overhead, allowing for quick, concise data collection and higher data reliability because many potentially erroneous values are eliminated at entry. It is also valuable to be able to walk a classmate or co-worker through the process of designing a geodatabase, which is what this exercise ultimately called for.

Sources:

Geog 337, 336 Class Notes
ArcGIS Help. (2013, July 30). ArcGIS field data types. Retrieved March 1, 2015, from http://resources.arcgis.com/en/help/main/10.1/index.html#//003n0000001m000000
