Common Data Formats and Access Methods for Geospatial Data

If you are new to geospatial data, I recommend reading the article “The fundamentals of GeoSpatial data”, which explains the basics of vector and raster data.

If you are looking for an article about how to find geospatial data, read the article “Accessing and Managing Geospatial data”.

Ways to Access data

When you have found a data set you wish to use, there are typically several alternative ways to access the data and perhaps also different data formats to choose between. In the “table of common geospatial data formats”, you can read about the different formats and their merits.

Which method of access you choose depends on what you want to do with the data. There are four common scenarios:

You want to be able to edit/modify the data
You want to be able to use the data as input for analysis
You want to be able to change the styling (symbology) of the data
You just want to see the data as it has been stylised (symbolized) by the data provider

These scenarios are ordered from the most to the least demanding, so data that satisfies a lower-numbered scenario can always be used for a higher-numbered one, i.e. if you can edit the data, you can also change its symbology.

You want to be able to edit/modify the data

In this situation, you typically need to store your data in a local folder on your computer or use a dedicated geospatial database server on which you have editing rights. Also, keep in mind that you typically don’t edit raster data directly, but rather modify it through an analytic process.

Data is located in folders on your computer. You have typically downloaded the data from somewhere or exported it from a “data source that can be used as input for analysis” (see below). It is highly recommended that you do not mix geospatial data with other types of data, such as videos, in the same folder, i.e. do not use a standard folder such as Documents or Downloads to store your geospatial data.

The advantage of having data in a local folder is that it is typically the fastest way for the GIS to access data, and it also works when you are offline. Further, if you are going to edit/modify the data, this is often the only solution. The drawback is that geospatial data tends to take up a lot of disk space. Also, if the original data set you downloaded gets updated, these updates will not be reflected in your copy, so you risk having an out-of-date data set.

There are many different file formats for storing geospatial data. You can find a list of the most common formats in the “Table of common geospatial data formats”. To read more about creating an efficient folder structure for geospatial data and how to access the structure from your GIS, read the article “Establish a Structured Geodata collection”.

Dedicated geospatial database servers. While storing data on a shared network drive is an easy way of making data available to multiple users within an organisation, it has some severe drawbacks, especially in two areas. The first is speed: accessing data in a dedicated geospatial database such as PostgreSQL or Oracle is typically faster than accessing the data in a shared folder when only part of the data is needed, for instance, a specific type of field in a specific part of the country. This is because the filtering can take place on the server, so only the required data needs to be sent over the network to the GIS; this naturally depends on the processing power of the database server. The second is multi-user editing, which a geospatial database server supports, whereas data in a shared network folder offers only limited support for it. Accessing data from a geospatial database server typically requires a database client to be installed in the GIS app (Oracle and PostgreSQL clients are preinstalled in both QGIS and ArcGIS Pro).
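To make the idea of server-side filtering a little more concrete, here is a minimal sketch of a query that asks the database for only one type of field inside a bounding box. It assumes a PostgreSQL database with the PostGIS extension; the table `fields` and the columns `id`, `crop_type` and `geom` are hypothetical names chosen for illustration, and the `%s` placeholders follow the convention used by the psycopg client library. The sketch only builds the query and does not connect to a database.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Bounding box in longitude/latitude (EPSG:4326)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def build_filtered_query(bbox: BoundingBox, crop_type: str) -> tuple[str, tuple]:
    """Return a parameterised SQL string and the matching parameters.

    The table and column names are hypothetical. The `&&` operator is the
    PostGIS bounding-box overlap test, so the attribute filter and the
    spatial filter are both evaluated on the server.
    """
    sql = (
        "SELECT id, crop_type, ST_AsGeoJSON(geom) AS geometry "
        "FROM fields "
        "WHERE crop_type = %s "
        "AND geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)"
    )
    params = (crop_type, bbox.min_x, bbox.min_y, bbox.max_x, bbox.max_y)
    return sql, params


# Made-up bounding box and attribute value, for illustration only.
sql, params = build_filtered_query(BoundingBox(8.0, 54.5, 13.0, 57.8), "wheat")
print(sql)
print(params)
```

Only the rows that pass both filters would travel over the network, which is the reason the database approach tends to be faster than reading a whole file from a shared folder.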

You want to be able to use the data as input for analysis

If you do not need to be able to edit the data, there are three common methods of access:

Data is located in a folder on a shared network drive. From the perspective of the GIS app, this is like having data in a local folder, except that shared network drives typically do not allow write access to the data (except for the data manager). This means that exactly the same file formats and methods of connecting to the data from the GIS app apply as if the data were in a local folder.

Data is located on a shared web store. This can be something like GitHub, Google Drive or just any website. All you need to access this type of data is a web address (Uniform Resource Locator, URL), for instance, “https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/significant_month.geojson” (this URL points to the USGS GeoJSON file of significant earthquakes during the last 30 days). Typical formats for geospatial data shared in this way are GeoJSON and KML. You can read how to access this type of data in ArcGIS, QGIS and Python. Sharing simple geospatial data using GitHub or Google Drive is also rather easy; you can read how in the article “Sharing geospatial data using standard web products such as GitHub, Google Drive and WordPress”.
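As a rough illustration of how little is needed to read this kind of data, the sketch below uses only the Python standard library. The function names `fetch_geojson` and `point_features` are my own, and the small `sample` collection is hand-written with made-up values so that the parsing step can be tried without a network connection.

```python
import json
from urllib.request import urlopen

USGS_URL = (
    "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/"
    "significant_month.geojson"
)


def fetch_geojson(url: str, timeout: float = 30.0) -> dict:
    """Download a GeoJSON document and parse it into a dictionary."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def point_features(collection: dict) -> list[tuple[float, float, dict]]:
    """Return (longitude, latitude, properties) for every Point feature."""
    points = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # GeoJSON stores coordinates as longitude, latitude[, elevation].
            lon, lat = geometry["coordinates"][:2]
            points.append((lon, lat, feature.get("properties") or {}))
    return points


# Hand-written FeatureCollection with made-up values (not real USGS data).
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"title": "Sample event"},
   "geometry": {"type": "Point", "coordinates": [12.5, 55.7, 10.0]}}
]}
""")

for lon, lat, props in point_features(sample):
    print(lon, lat, props.get("title"))
```

With a network connection, `point_features(fetch_geojson(USGS_URL))` should work the same way on the live feed, assuming the feed keeps serving a FeatureCollection of Point features.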

Accessing image data via web standards.

Please remember

Always cite the data you have used and the date when you accessed the data. Most data sets also demand that you specify copyright and license information when you use the data. Read more in the article “Copyright and licencing practices”.

Pictures of maps
Indirect spatial data
Pictures of maps with coordinates
Raster data
Vector data

In addition to these five types of data, you also need to distinguish between local data and remote (service-based) data. Local data is stored on your local computer or on a network drive connected to your computer, such as OneDrive, while remote data is typically managed by someone else and accessed over an internet connection when you need it.

Pictures of maps

Pictures of maps are something you typically find in a report (PDF) or on the internet. There are three reasons why you should, if possible, avoid using this type of data:

They are typically not meant to be used outside their original context, so you can easily be breaching someone’s copyright. You can find a general description of copyright and licencing practices here.
Pictures of maps often lack coordinate information, especially if they are from a PDF or web page. You therefore need to add coordinate information through a process called “georeferencing”.
Pictures of maps can generally only be used as background maps for other data. A picture is a visualisation of data, not the data itself, so you can’t change the way it looks or use it in numerical analyses without creating your own data based on it. See the article on digitalisation.
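To give an impression of what georeferencing produces, here is a small sketch based on the world file, one common way of storing the result as six numbers next to the image. This is only one of several ways to store georeferencing information, and the numbers in the example are made up. The six values define an affine transformation from pixel position (column, row) to map coordinates.

```python
from typing import NamedTuple


class WorldFile(NamedTuple):
    """The six world-file parameters, in the order they appear in the file."""
    a: float  # pixel size in the x-direction, in map units
    d: float  # rotation term for the y-coordinate
    b: float  # rotation term for the x-coordinate
    e: float  # pixel size in the y-direction, usually negative
    c: float  # x-coordinate of the centre of the upper-left pixel
    f: float  # y-coordinate of the centre of the upper-left pixel


def parse_world_file(text: str) -> WorldFile:
    """Parse the text of a world file (six numbers, one per line)."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values, found {len(values)}")
    return WorldFile(*values)


def pixel_to_map(wf: WorldFile, col: float, row: float) -> tuple[float, float]:
    """Convert a pixel position (column, row) to map coordinates (x, y)."""
    x = wf.a * col + wf.b * row + wf.c
    y = wf.d * col + wf.e * row + wf.f
    return x, y


# Made-up example: 10 m pixels, no rotation, arbitrary origin.
example = parse_world_file("10.0\n0.0\n0.0\n-10.0\n500000.0\n6200000.0")
print(pixel_to_map(example, 100, 50))
```

In practice the GIS estimates these parameters for you from the control points you click during georeferencing, so you rarely have to write them by hand.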

Indirect spatial data

Indirect spatial data is data that does not contain spatial coordinates but instead an identifier (key) to a well-defined spatial location. There are two common types of indirect spatial data: data that uses addresses as the key, and data that uses the name of an administrative or statistical unit, e.g. a country, region, municipality or census area.
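The sketch below shows the basic idea of turning indirect spatial data into data with coordinates by joining on the key. All codes, coordinates and population figures are made up for illustration, and the function name `join_by_key` is my own; in a GIS the same operation is usually called a table join or attribute join.

```python
# Hypothetical lookup table: area code -> representative point (lon, lat).
area_locations = {
    "A01": (12.5, 55.7),
    "B02": (10.4, 55.4),
}

# Hypothetical indirect spatial data: rows that carry only the key.
population_rows = [
    {"area_code": "A01", "population": 1000},
    {"area_code": "B02", "population": 500},
    {"area_code": "Z99", "population": 42},  # key with no known location
]


def join_by_key(rows, locations, key):
    """Attach coordinates to each row whose key is found in `locations`.

    Returns two lists: the rows that were matched (with `lon` and `lat`
    added) and the rows whose key could not be found.
    """
    matched, unmatched = [], []
    for row in rows:
        location = locations.get(row[key])
        if location is None:
            unmatched.append(row)
        else:
            matched.append({**row, "lon": location[0], "lat": location[1]})
    return matched, unmatched


matched, unmatched = join_by_key(population_rows, area_locations, "area_code")
print(len(matched), "rows matched,", len(unmatched), "rows unmatched")
```

Keeping track of the unmatched rows is worthwhile, because spelling variations in names and addresses are a frequent reason for keys not matching.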

Exactly which choices you are presented with depends on the software running the metadata server and which ways of access are available for the data you have located. When it comes to accessing the data, you typically have two main options: either download the data or access it as a service. Both options often support different standard data formats.

If you wish to access the data as a service, i.e. let the data stay at the original location and access it online, the most common standards are WMS and WFS. WMS is typically used for ready-symbolized data like digital versions of paper maps or images, while WFS is more suitable if you want to do your own symbolization or even do analysis. You can read more about the different standards for data services in the post “Data and service standards for geospatial data”.
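Behind the scenes, both standards are used by sending a web request with a set of standard parameters. The sketch below builds such request URLs with the Python standard library. The endpoint `https://example.com/ows` and the layer names are hypothetical; the parameter names follow WMS 1.3.0 and WFS 2.0.0, but a real service may require additional or different values, so treat this as an illustration rather than a recipe.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.com/ows"  # hypothetical service endpoint


def wms_getmap_url(base_url, layer, bbox, width, height):
    """Build a WMS 1.3.0 GetMap URL that returns a ready-drawn PNG image.

    `bbox` is (min_lon, min_lat, max_lon, max_lat); CRS:84 is used so the
    axis order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks for the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


def wfs_getfeature_url(base_url, type_name, count):
    """Build a WFS 2.0.0 GetFeature URL that returns the features themselves."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
        "COUNT": count,  # upper limit on the number of features returned
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical layer names and a made-up bounding box.
wms_url = wms_getmap_url(BASE_URL, "demo:basemap", (8.0, 54.5, 13.0, 57.8), 800, 600)
wfs_url = wfs_getfeature_url(BASE_URL, "demo:roads", 10)
print(wms_url)
print(wfs_url)
```

The difference between the two requests mirrors the difference described above: the WMS request asks for a picture of a given size, while the WFS request asks for the underlying features.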

If you wish to download the data to your local computer, there is a host of formats for this. The most common formats for downloading data are GML and GeoJSON, although others such as “shp” files and “geopackages” are also often seen. Again, you can read more about the different standards in the post “Data and service standards for geospatial data”.

The choice of downloading the data vs. accessing it as a service is not trivial. The obvious advantages of downloading data are:

You don’t have to be online
The data won’t disappear
It is typically faster in analysis

The advantages of accessing data as a service are:

The dataset might be huge and not fit onto your computer or you might only need a limited part of the data.
You will always be using the current, updated version; administrative data in particular is typically updated frequently.
You don’t have to hassle with storing the data on your computer.

You can find a description of how to load data into a selection of software systems in the post “loading geospatial data”.