Data in the Time of Cholerics: Where to Find Preserved Federal Data

During the recent change in federal government, researchers and librarians were concerned about loss of access to federal data, particularly in the area of environmental science where the new administration’s policies appeared to contradict scientific consensus. Early indications suggested that federal datasets and scientific information would be removed from the web entirely, or at least restricted in access.

In response to these concerns, a number of academic institutions and other organizations began organizing data preservation efforts to ensure continued public access to endangered datasets. While the websites of individual federal agencies continue to serve as the primary repositories of public data, this post focuses on a few public websites that aggregate and preserve federal datasets, with a brief description of each.

Data.gov – Launched in May 2009, this repository was designed to improve access to machine-readable datasets generated by the federal government. As of April 2017, data.gov contains over 192,000 datasets, including climate data, environmental information, and other science and research data. Datasets can be found by combining keyword searches with filters such as location, federal agency, publisher, dataset type, and format.
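For readers who prefer to script this kind of keyword-plus-filter search, the following is a minimal sketch rather than an official recipe. It assumes the data.gov catalog exposes the standard CKAN `package_search` action at `catalog.data.gov`, and the filter field names used here (`res_format`, `organization`) and the example values are assumptions that may need adjusting against the live catalog.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the data.gov catalog runs CKAN and serves its Action API here.
CKAN_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(keywords, filters=None, rows=10):
    """Build a package_search URL that combines keywords with field filters.

    `filters` maps a catalog field name to a required value, for example
    {"res_format": "CSV"}. The field names are assumptions, not a vetted list.
    """
    params = {"q": keywords, "rows": rows}
    if filters:
        # Filters are sent as Solr-style field:"value" clauses joined with AND.
        clauses = [f'{field}:"{value}"' for field, value in filters.items()]
        params["fq"] = " AND ".join(clauses)
    return CKAN_SEARCH_URL + "?" + urlencode(params)


def fetch_titles(url):
    """Fetch a search URL and return the titles of the matching datasets."""
    with urlopen(url) as response:
        payload = json.load(response)
    # CKAN wraps search hits in result -> results.
    return [dataset["title"] for dataset in payload["result"]["results"]]
```

Calling `build_search_url("climate", {"res_format": "CSV"})` only builds the address and needs no network access; passing the result to `fetch_titles` performs the actual request.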

DataRefuge – Spearheaded by the Penn Program in Environmental Humanities, DataRefuge has organized a number of DataRescue events whose activities include archiving federal websites and datasets, with a focus on preserving federal climate and environmental data. DataRefuge has so far preserved 190 datasets, including data from the Department of Energy, the Environmental Protection Agency, the National Oceanic and Atmospheric Administration, and the National Aeronautics and Space Administration.

DataLumos – An archive maintained by the Inter-university Consortium for Political and Social Research (ICPSR), this crowd-sourced repository allows users to upload federal government datasets directly or to recommend datasets for ICPSR to add. A current search of the archive returns only 20 datasets, most of which contain social science data. The archive is expected to grow with further contributions and may include more natural science datasets in the future.

End of Term Web Archive – Although not specifically focused on dataset preservation, the End of Term Web Archive began in 2008 and set out to harvest and preserve the web pages on federal government domains comprehensively during presidential election years, focusing on the period of potential administration changes. A public access copy of the archive is hosted by the Internet Archive, and the collection is valuable for tracking changes to federal websites over time.

Early concerns about the swift, widespread removal of federal data appear to have overestimated the current administration’s ability or resolve to remove data from the web. Even so, the availability of federal government data remains tenuous: removing public access requires only an executive signature and a few subsequent keystrokes. Librarians must continue to be diligent in monitoring federal information policy to ensure future access to publicly available federal data.

Eric Prosser, Science Liaison Librarian, Fort Lewis College, ejprosser@fortlewis.edu

We welcome your comments and suggestions. If you have a resource that you would like to see highlighted please leave us a comment.