Research Organization Registry

I work for an institution that is commonly referred to in a dozen different ways, among them University of California Berkeley (also with a hyphen, comma, or “at” after “California”), UC Berkeley (also with periods after the initials), UCB, California, Cal, Cal Berkeley, and just plain Berkeley. Even institutions that are referred to in less variable ways can change names, merge with another organization, split into two or more entities, or have a name or acronym that is similar or identical to that of another institution.

Multiple institutional names become a significant problem whenever the affiliation of researchers needs to be unambiguously identified. It’s an issue that affects, for example:

  • librarians, when assessing journals using institutional publication and citation rates;
  • research institutions, when tracking publications and other research outputs;
  • researchers and publishers, when identifying authors who may be covered under Open Access or Read and Publish agreements;
  • publishers and database managers, when trying to ensure the consistency and correctness of the scholarly record;
  • funding organizations, when tracking grant recipients.

The Research Organization Registry (ROR), a recent initiative by California Digital Library (CDL), Crossref, and DataCite, attempts to address the issue of institutional disambiguation through the development of a non-commercial, open, and interoperable system of unique persistent identifiers for research organizations (see Figure 1). After the successful establishment of identifiers for online resources (DOI, digital object identifier) and for authors (ORCID, Open Researcher and Contributor ID), ROR IDs are intended to fill a significant gap in the open science infrastructure.

Figure 1. ROR.org home page. Accessed 22 March 2023 at https://ror.org

Format: ROR IDs have the form of a unique 9-character string on the ror.org domain. They are preferably written as a full URL, such as https://ror.org/01an7q238, which resolves to the ROR record for the institution. The character string is generated at random and is not intended to be human-interpretable. Related institutions do not have related ROR IDs, but are linked from their ROR records.
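Since a ROR ID may turn up as a bare string or as a full URL, it can help to normalize it to the preferred URL form before storing or comparing it. The following is a minimal sketch, not ROR's own validation logic. The function name normalize_ror is hypothetical, and the pattern assumes only that the ID is nine lowercase letters or digits; the registry may apply stricter rules that this check does not cover.

```python
import re

ROR_PREFIX = "https://ror.org/"

# Assumption: nine lowercase letters or digits. The post states only the
# length, so this is a loose shape check rather than an official rule.
_ROR_ID = re.compile(r"[0-9a-z]{9}")


def normalize_ror(value: str) -> str:
    """Return the full-URL form of a ROR ID given as a bare string or a URL."""
    candidate = value.strip().lower()
    # Drop an optional scheme and the ror.org host, if present.
    candidate = re.sub(r"^(https?://)?ror\.org/", "", candidate)
    if not _ROR_ID.fullmatch(candidate):
        raise ValueError(f"not a ROR ID: {value!r}")
    return ROR_PREFIX + candidate


print(normalize_ror("01an7q238"))
print(normalize_ror("ror.org/01an7q238"))
```

Both calls print https://ror.org/01an7q238, the form recommended above.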

ROR records: ROR records can be accessed by searching the registry, through the ROR API, or by downloading ROR data in bulk. Each record in ROR includes the organization’s ROR ID, primary name, type (Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, or Other), and location. Other information can include the organization’s alternate names; a link to its website; relationships to other organizations (parent, child, related, successor, and predecessor) and links to their ROR records; and other identifiers (see Figure 2). The ROR data structure includes a comprehensive list of field names and value types, together with detailed definitions and policies for selected fields. ROR IDs and their associated metadata are provided under a Creative Commons CC0 1.0 Universal Public Domain Dedication.
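To make the shape of a record concrete, the fields described above can be modeled roughly as follows. This is an illustrative sketch only: the class and attribute names are hypothetical and do not reproduce the field names of the actual ROR data structure, and the alternate names in the example are borrowed from the variants listed at the top of this post rather than copied from the live record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Relationship:
    kind: str    # "parent", "child", "related", "successor", or "predecessor"
    ror_id: str  # ROR ID of the other organization


@dataclass
class OrgRecord:
    # Information the post says every record includes.
    ror_id: str
    name: str
    org_type: str
    location: str
    # Optional information.
    alternate_names: List[str] = field(default_factory=list)
    website: Optional[str] = None
    relationships: List[Relationship] = field(default_factory=list)
    other_ids: Dict[str, str] = field(default_factory=dict)

    def known_as(self, text: str) -> bool:
        """True if text equals the primary or an alternate name, ignoring case."""
        wanted = text.strip().casefold()
        return any(wanted == n.casefold() for n in [self.name, *self.alternate_names])


# Illustrative values; only the ID and primary name come from Figure 2.
berkeley = OrgRecord(
    ror_id="https://ror.org/01an7q238",
    name="University of California, Berkeley",
    org_type="Education",
    location="Berkeley, United States",
    alternate_names=["UC Berkeley", "Cal"],
)

print(berkeley.known_as("uc berkeley"))
```

Matching a free-text affiliation against the primary and alternate names is the simplest form of the disambiguation that ROR is meant to support; real matching would need to handle punctuation and word-order variants as well.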

Figure 2. ROR record for the University of California, Berkeley. Accessed 22 March 2023 at https://ror.org/01an7q238

Scope: ROR IDs have now been assigned to more than 100,000 “top level” research organizations, defined as entities that produce, manage, fund, publish, or provide facilities or training for research (see ROR’s “What is the definition of a research organization?”). For the purposes of ROR, “top level” means, for example, a university rather than a department or research group (see “Is my organization in scope for ROR?”). Perhaps this is the reason that ROR itself does not have a ROR ID, but its parent organizations CDL, Crossref, and DataCite do.

Adoption: ROR IDs have been integrated into the Crossref metadata schema, DataCite Commons, and the ORCID API and Affiliation Manager. ROR also aligns with the requirements of the Nelson memo issued by the White House Office of Science and Technology Policy in August 2022. That memo calls for Federal agencies to make available the “metadata associated with scholarly publications and data resulting from federally funded research,” including “affiliations…referencing digital persistent identifiers, as appropriate” (p. 6).

Funding: The development of ROR was funded through grants and donations from stakeholder institutions. ROR recently announced its intention to move to dedicated funding from the three founding organizations for its core operating costs. The dedicated funding will be supplemented by contributions (including a funding campaign coordinated by the Global Sustainability Coalition for Open Science Services, SCOSS) and, for time-limited projects, grants.

Curation: ROR relies on what it calls a “centralized, community-based curation process,” in which institutions do not “own” their ROR records. Any member of the public can submit a request to add an institution to the ROR registry, or suggest a revision to an existing institutional record. To ensure consistency with standards and policies, the requests and suggestions are reviewed by the Metadata Curation Lead, who consults as needed with ROR’s Curation Advisory Board; submissions are tracked on the ROR Updates GitHub Project site.

Some remaining questions: The criteria for assigning ROR IDs are not always clear; for example, the UC Berkeley-affiliated Innovative Genomics Institute (IGI) and California Institute for Quantitative Biosciences (QB3) have ROR IDs, but the Helen Wills Neuroscience Institute does not. Also, at the time I began exploring ROR for this post, the records for UC Berkeley, UC San Francisco, and UC Santa Cruz were not linked as parent organizations from the QB3 record, nor was QB3 listed reciprocally as a child organization on the records of the three UC campuses (see Figure 2 for the UC Berkeley record). These oversights have since been brought to ROR’s attention via the public curation request form, but as of this writing the records have not been amended. It remains to be seen how scalable and sustainable the community-based curation process will be. Unique persistent identifiers are useful, of course, only to the extent that their stability, accuracy, and comprehensiveness lead to wide adoption.
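Part of this kind of relationship checking could in principle be automated. The sketch below flags one-sided links, where one record names a parent, child, or other relation and the other record does not link back. It is not a description of ROR's curation tooling: the function name, the input layout, and the placeholder identifiers are all hypothetical, and the pairing of successor with predecessor is an assumption. It also would not have caught the QB3 case, in which the links were absent in both directions.

```python
# Assumption: each relationship type has the inverse shown here.
INVERSE = {
    "parent": "child",
    "child": "parent",
    "related": "related",
    "successor": "predecessor",
    "predecessor": "successor",
}


def missing_reciprocals(links):
    """Find links that are recorded on one organization but not mirrored.

    links maps an organization ID to a set of (kind, other_id) tuples.
    Returns a sorted list of (org_id, kind, other_id) entries that would
    need to be added to make every relationship reciprocal.
    """
    missing = []
    for org, rels in links.items():
        for kind, other in rels:
            mirrored = (INVERSE[kind], org)
            if mirrored not in links.get(other, set()):
                missing.append((other, INVERSE[kind], org))
    return sorted(missing)


# Placeholder IDs, not real ROR IDs: org-a names org-b as its parent,
# but org-b does not list org-a as a child.
example = {
    "org-a": {("parent", "org-b")},
    "org-b": set(),
}
print(missing_reciprocals(example))  # [('org-b', 'child', 'org-a')]
```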

Elliott Smith, Emerging Technologies & Bioinformatics Librarian, University of California Berkeley

