Using Google Dataset Search to find Open Data

Figuring out where to start when looking for datasets can be an arduous task for many students and faculty. Common roadblocks include information buried on government websites, limited access to dataset databases at their institution, or a topic so specialized that typical exploration turns up little data. Google Dataset Search, a tool launched by Google in 2018, may help to simplify this process.

What is Google Dataset Search?

Google Dataset Search is a specialized search engine designed to help researchers discover datasets that are openly available on the web. Its single search field makes it easier to find relevant datasets quickly using any combination of keywords, phrases, Boolean operators or even the site: command.

Search results present information about each dataset, or metadata, in a simplified format that lets the searcher quickly decide whether the result is relevant and credible. This metadata can include the unique identifier, date last updated, creator/provider, authors, license for use/reuse, and a brief description of the dataset, with related academic research or articles linked below the title. Most importantly, bright blue buttons direct users to where the data may be downloaded for free.

Figure 1: Example search using keywords and Boolean operators to find open data about bird song syntax.

How Does it Work?

Google Dataset Search uses an algorithm to identify and index datasets found in open access online repositories. This means that the datasets you see in search results come from places on the web that openly share a downloadable version of the data and provide enough metadata to be indexed by the search tool. Google Dataset Search (GDS) aggregates and organizes this information and presents it in a logical way so that the search process is simplified.

Using Metadata to Organize

The search tool relies heavily on the metadata connected with the original dataset. This might include keywords, a description, and information about how the dataset can be downloaded and in what file type. The metadata is usually created by the dataset contributors, and it allows GDS to categorize the dataset and display information about it on the platform.
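To make the idea of dataset metadata concrete, here is a minimal sketch of a record written with the schema.org Dataset vocabulary, which Google's guidance for data providers describes as one format the tool can read. Every value below (title, author, license, URL) is a placeholder invented for illustration, and the choice of fields is an assumption rather than a complete or required list.

```python
import json

# Placeholder metadata for a hypothetical dataset. The property names come
# from the schema.org Dataset vocabulary; every value is invented.
dataset_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example bird song recordings",
    "description": "Placeholder description of what the dataset contains "
                   "and how it was collected.",
    "keywords": ["bird song", "syntax", "bioacoustics"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "Example Author"},
    "dateModified": "2024-01-15",
    # How the data can be downloaded and in what file type.
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/data/bird-song.csv",
        }
    ],
}

# Serialize to JSON-LD text, the form typically embedded in a dataset's web page.
json_ld = json.dumps(dataset_metadata, indent=2)
print(json_ld)
```

A repository would normally generate this kind of markup automatically, so most contributors only need to fill in the descriptive fields on the deposit form.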

User-Friendly Interface

The interface feels familiar because it borrows the look of Google Search. Unlike in Google Search, however, you don’t want to type a sentence or question. Rather, be specific and use keywords and phrases. The results include key details about each dataset, such as its source, creation date, and usage permissions. Filter options such as date updated, download format, usage rights, topic and provider help refine your search, and additional related datasets are offered on the left side of the screen. Lastly, results can be easily shared with a stable link, bookmarked or cited from the result record.

Tips for Researchers, Teachers and Librarians

Create clear and concise searches, similar to how you would in a database or in Google Scholar. Don’t use a sentence or question; rather, use keywords, phrases, Boolean operators and/or the site: command. If you’re unable to find what you need, try synonyms for the words you are using to describe your topic. If you still can’t find what you need, reach out to the subject librarian at your institution.
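The search advice above can be sketched in a few lines of Python that assemble a query from keywords, quoted phrases, synonyms joined with OR, and a site: restriction. The helper names, the example search terms, and the results URL pattern are all assumptions made for illustration; the same query string can simply be typed into the search field.

```python
from urllib.parse import urlencode

# Assumed address of the Google Dataset Search results page.
BASE_URL = "https://datasetsearch.research.google.com/search"


def build_query(keywords, phrases=(), any_of=(), site=None):
    """Combine keywords, exact phrases, OR alternatives and a site: filter."""
    parts = list(keywords)
    # Quote exact phrases so they are searched as a unit.
    parts += [f'"{phrase}"' for phrase in phrases]
    # Group synonyms with OR so that any one of them may match.
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    # Restrict results to a single provider's domain.
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query):
    """Return a shareable results URL for the given query string."""
    return BASE_URL + "?" + urlencode({"query": query})


query = build_query(
    keywords=["syntax"],
    phrases=["bird song"],
    any_of=["recordings", "audio"],
    site="zenodo.org",
)
print(query)
print(search_url(query))
```

Swapping the terms in any_of is a quick way to try synonyms without rewriting the rest of the search.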

Once you find the dataset you need, be sure to critically assess it for reliability. For example, examine the dataset’s metadata within the GDS search results, and explore any documentation provided at the source to verify that the publisher is credible by academic standards.
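One way to make that metadata check systematic is a small checklist, sketched here in Python. The field names mirror the details this article says a result can show, but the dictionary layout and the example record are hypothetical: GDS does not export results in this form, so the values would be copied by hand from the result page.

```python
# Details named in this article as signals of relevance and credibility.
# The key names are hypothetical, chosen only for this sketch.
CHECKLIST = [
    "identifier",
    "date_updated",
    "provider",
    "authors",
    "license",
    "description",
]


def missing_fields(record):
    """Return the checklist fields that are absent or empty in a record."""
    return [field for field in CHECKLIST if not record.get(field)]


# Invented example record with no authors listed and no license stated.
record = {
    "identifier": "doi:10.0000/example",
    "date_updated": "2024-01-15",
    "provider": "Example Repository",
    "authors": [],
    "description": "Placeholder description of the dataset.",
}

print(missing_fields(record))
```

A missing field does not make a dataset unreliable on its own, but it is a prompt to look for that information in the documentation at the source.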

If you are a researcher who openly shares data, remember to provide clear and comprehensive metadata along with it. This helps to ensure that your dataset is described accurately and is more visible when it is aggregated into Google Dataset Search.

Because the interface is easy to use, a librarian in any field, regardless of knowledge of computer science or data science, can use Google Dataset Search to help find data in a wide variety of fields, including arts, humanities, history and more. Take some time to get familiar with the platform by searching for topics that are popular in your subject areas. You can also request to add the resource to your library’s database listing or add it as a resource in your subject guides. If you’re familiar with data science or if it’s part of your assigned subjects, you may also consider gathering information to help prepare researchers for the practical next steps, such as cleaning data in Excel or using tools such as RStudio, Tableau, Python, Jupyter Notebooks or ArcGIS to analyze and visualize data.

Summary

Overall, Google Dataset Search is a useful tool for users at any level who are starting to gather data on a research topic. It is particularly helpful for searching in one place, using a simplified process and getting to the downloadable data quickly.

Kimberly Jackson, STEM Librarian, The Claremont Colleges

We welcome your comments and suggestions. If you have a resource that you would like to see highlighted please leave us a comment.
