How Open is Open Source Spatial Data?

By Everett Pruitt

At its core, OpenGulf attempts to model forms of open scholarship by relying on open methods, open source geospatial information, open source software, and collaboration wherever possible. Adopting this perspective, one project has focused on building a derived dataset from a well-known historical gazetteer of the Gulf region, a work that we know as Lorimer’s Gazetteer of the Persian Gulf, Oman and Central Arabia. The Gazetteer was written as a colonial intelligence product on the geography and history of the Persian Gulf and was declassified in the 1950s. It is clearly a colonial entity: it fixates on weapons and head counts, and even discusses how cutting off water to a region would be the first step in a military occupation. The project that we are working on is antithetical to this colonial intelligence product. Where its authors relied primarily on closed (publicly unavailable) colonial sources, we turn to open sources and aim to push the fruits of our research into the domain of open data.

Open source information is not derived exclusively from academics and the general population. The GeoNames geospatial database that we rely on for contemporary places is partially user generated, but it also contains information derived from government databases, most notably, in the case of the Iran data, the National Geospatial-Intelligence Agency (NGA). The NGA is a US intelligence agency tasked with providing GEOINT (geospatial intelligence) for US national security, founded in its current formation in 1996 but existing since 1939. The NGA has focused on the Gulf due to the region’s importance to American foreign policy and Operation Desert Storm. A plurality of the GeoNames locations found in the OpenGulf database were created by the NGA. Like the Gazetteer’s authors, the NGA relies on closed sources and produces classified products, in its case to advance US national security interests. But times have changed: merely knowing locations is no longer the exclusive domain of governments, and as such, the NGA has published its data on the locations it has placed on a map.

Because of our use of GeoNames, we are actually using a significant amount of data from the NGA. On its About page, GeoNames lists the NGA as the most important source of data it has. So what does this mean? As academics on this project, we attempt to prioritize the openness of our project at every turn. Indeed, one of the appeals of GeoNames is the idea that someone with in-depth, on-the-ground knowledge of these places can modify and update the data. But this is not the reality we face. The data is primarily derived from government sources. Is this a problem? In the humanities there is a standard of knowing and critiquing the process of data creation, but due to the closed nature of NGA data, this is impossible within GeoNames. While academia strives for transparency, the missions of intelligence agencies necessitate opacity, and open source has brought the two together. GeoNames is built on data derived through both closed and open methods and transformed into open source data through its publication. While we may not know how the NGA created the data, we can now use it. As with the Gazetteer, we benefit from it, but unlike the textual Gazetteer, we cannot parse it, as this data is nothing more than a spreadsheet. Any account of its possible biases and methodologies is absent, but the data itself matches Lorimer and other available datasets, open and commercial.

The NGA’s publication of its final data, in contrast to the Gazetteer’s initially classified nature, is emblematic of how open source information is changing both academia and intelligence products. The NGA collected this data for reasons similar to those of the British Empire, but geographic information is no longer the closely guarded secret it once was. Anyone can access high-quality satellite imagery on Google Earth, and even higher-quality data is available commercially. The knowledge of locations is no longer what is secret; the subsequent analysis is. This benefits us as academics, as we now have access to more granular data than at any other moment in history. Open source is not just the domain of academics and the internet, but also of the private and public sectors. The issue is no longer getting the data; it is filtering through it. Digital humanities is attempting to meet this challenge through the application of technology to the humanities, and for this reason we use QGIS and GitHub, technologies the NGA uses too. Open source information is forcing a reevaluation of traditional methods, and it is benefiting us all.