Lessons learned while building the Deepwater Horizon Database: Toward improved data sharing in coastal science

TitleLessons learned while building the Deepwater Horizon Database: Toward improved data sharing in coastal science
Publication TypeJournal Article
Year of Publication2016
AuthorsThessen AE, McGinnis S, North EW
JournalComputers & Geosciences
Date PublishedFeb
ISBN Number0098-3004
Accession NumberWOS:000368220600009
AbstractProcess studies and coupled-model validation efforts in geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need which fundamentally relies upon synthesis of oceanography and hydrocarbon chemistry. Yet, there are no publically accessible databases which integrate these diverse data types in a georeferenced format, nor are there guidelines for developing such a database. The objective of this research was to analyze the process of building one such database to provide baseline information on data sources and data sharing and to document the challenges and solutions that arose during this major undertaking. The resulting Deepwater Horizon Database was approximately 2.4 GB in size and contained over 8 million georeferenced data points collected from industry, government databases, volunteer networks, and individual researchers. The major technical challenges that were overcome were reconciliation of terms, units, and quality flags which were necessary to effectively integrate the disparate data sets. Assembling this database required the development of relationships with individual researchers and data managers which often involved extensive e-mail contacts. The average number of emails exchanged per data set was 7.8. Of the 95 relevant data sets that were discovered, 38 (40%) were obtained, either in whole or in part. Over one third (36%) of the requests for data went unanswered. The majority of responses were received after the first request (64%) and within the first week of the first request (67%). Although fewer than half of the potentially relevant datasets were incorporated into the database, the level of sharing (40%) was high compared to some other disciplines where sharing can be as low as 10%. Our suggestions for building integrated databases include budgeting significant time for e-mail exchanges, being cognizant of the cost versus benefits of pursuing reticent data providers, and building trust through clear, respectful communication and with flexible and appropriate attributions. (C) 2015 Elsevier Ltd. All rights reserved.