
Introduction to Open Science

Learning Objectives

After this lesson, you should be able to:

  • Explain what Open Science is
  • Explain the components of Open Science
  • Describe the behaviors of Open Science
  • Explain why Open Science matters in education, research, and society
  • Understand the advantages and the challenges to Open Science



What is Open Science?


"Open Science is transparent and accessible knowledge that is shared and developed through collaborative networks"

-Vicente-Saez & Martinez-Fuentes 2018






"Open Science is a collaborative and transparent approach to scientific research that emphasizes the accessibility, sharing, and reproducibility of data, methodologies, and findings to foster innovation and inclusivity"

-ChatGPT






"A series of reforms that interrogate every step in the research life cycle to make it more efficient, powerful and accountable in our emerging digital society."

-Jeffrey Gillan


The Research Life Cycle from Open Science Framework





Other Definitions

"Open Science is defined as an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community." - UNESCO Definition

"Open Science is the movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of society, amateur or professional..." - Wikipedia definition


Foundational Open Science Skills

1. Building a culture of scientists eager to share research materials - such as data, code, methods, documentation, and early results - with colleagues and society at large, in addition to traditional publications



2. Mastery of digital tools to create reproducible science that others can build upon



3. Understanding the push towards increased transparency and accountability for those practicing science (i.e., compliance)






Open Science Word Cloud by Pownall et al. 2023







What is Open Science | The Royal Society







2023: the Year of Open Science

The White House, joined by 10 federal agencies and a coalition of more than 85 universities, declared 2023 the Year of Open Science to raise awareness of the benefits of Open Science and to steer the scientific community towards its adoption.

NASA leads a prominent program called Transform to Open Science which includes an online class on Open Science.


NASA Transform to Open Science (TOPS)








6 Pillars of Open Science

Open Access Publications

Open Data

Open Educational Resources

Open Methodology

Open Peer Review

Open Source Software

Wait, how many pillars of Open Science are there really?

The number can be from 4 to 8






Open Access Publications


Definition

"Open access is a publishing model for scholarly communication that makes research information available to readers at no cost, as opposed to the traditional subscription model in which readers have access to scholarly information by paying a subscription (usually via libraries)." -- OpenAccess.nl




Open Access Journal Examples

Major publishers have provided access points for publishing your work



Types of Publishing Business Models:

  1. Subscription model - the author pays a smaller fee (or no fee) for the article to be published. The publisher then sells subscription access to the article (usually to institutes of higher education).

  2. Open Access model - The author pays a larger fee to make the article freely available to anyone through a Creative Commons license.

    • Open Access publishing in Nature costs $12,290!
    • Open Access publishing in PLOS ONE costs $2,290




Research Article Versions

  1. Preprint - In academic publishing, a preprint is a version of a scholarly paper that precedes formal peer review and publication in a scientific journal. The preprint is often available for free online as a non-typeset version.

    Pre-print Services
    • ASAPbio Pre-Print Server List - a comprehensive list of pre-print servers maintained by ASAPbio, a scientist-driven non-profit promoting transparency and innovation in the field of life science communication.
    • ESSOAr - Earth and Space Science Open Archive hosted by the American Geophysical Union.
    • Peer Community In (PCI) - a free recommendation process of scientific preprints based on peer reviews
    • OSF.io Preprints are partnered with numerous projects under the "-rXivs"
    The rXivs
    • AfricArXiv

    • AgrirXiv

    • Arabixiv

    • arXiv - is a free distribution service and an open-access archive for 2,086,431 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

    • BioHackrXiv

    • BioRxiv - is an open access preprint repository for the biological sciences.
    • BodorXiv
    • EarthArXiv - is an open access preprint repository for the Earth sciences.
    • EcsArXiv - a free preprint service for electrochemistry and solid state science and technology
    • EdArXiv - for the education research community
    • EngrXiv - for the engineering community
    • EvoEcoRxiv - is an open access preprint repository for Evolutionary and Ecological sciences.
    • MediArXiv - for Media, Film, & Communication Studies
    • MedRxiv - is an open access preprint repository for Medical sciences.
    • PaleorXiv - is an open access preprint repository for Paleo Sciences
    • PsyArXiv - is an open access preprint repository for Psychological sciences.
    • SocArXiv - is an open access preprint repository for Social sciences.
    • SportrXiv - is an open access preprint repository for Sports sciences.
    • ThesisCommons - open Theses
  2. Author's accepted manuscript (AAM) - includes changes made during the peer-review process. It has not yet been typeset or formatted by the publisher. This version often carries an embargo period of 12-24 months.

  3. Published version of record (VOR) - includes stylistic edits, online & print formatting. This is the version that publishers claim ownership of with copyrights or exclusive licensing.



Copyrights and Science Publishing

Upon completion of a peer-reviewed science paper, the author typically either (1) signs over the copyright of the paper to the publisher or (2) signs an exclusive license agreement with the publisher.

For example, authors who publish in Science retain their copyright but sign a 'license to publish' agreement with AAAS.

Elsevier requires authors to sign over copyright of the article, but authors retain some rights of distribution.



New Open Access Mandates in US

The White House Office of Science and Technology Policy (OSTP) has recently released a policy document known as the Nelson Memo stating that taxpayer-funded research must be open access by 2026 with no embargo period.

Authors can comply with the memo by either:

  1. Publishing Open Access (this usually requires higher fees)
  2. Distributing the Author's Accepted Manuscript (AAM)

Read NSF's open access plan in response to the Nelson Memo

Read USDA's open access plan in response to the Nelson Memo





Additional Info

University of Arizona Libraries information on Open Access publishing including agreements with several journals to reduce or waive publishing fees.

https://www.coalition-s.org/









Open Data


Definitions

“Open data and content can be freely used, modified, and shared by anyone for any purpose” - The Open Definition

"Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike." - Open Data Handbook

Wikipedia definition




Data are the foundation for any scientific endeavor. A lot of thought needs to go into how to best collect, store, analyze, curate, share, and archive data.


DIKW Pyramid



FAIR Principles

In 2016, the FAIR Guiding Principles for scientific data management and stewardship were published in Scientific Data.

Findable: Making data discoverable by the wider academic community and the public

Accessible: Using unique identifiers, metadata and a clear use of language and access protocols

Interoperable: Applying standards to encode and exchange data and metadata

Reusable: Enabling the repurposing of research outputs to maximize their research potential
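
The four principles above can be read as a checklist to run against a dataset's metadata record. The following is a minimal sketch of that idea, not an official FAIR validator: the field names, their grouping under each principle, and the example record (including the placeholder DOI and URL) are all illustrative assumptions.

```python
# Hypothetical checklist: these field names and their grouping are
# illustrative assumptions, not a schema required by the FAIR principles.
REQUIRED_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url", "access_protocol"],
    "Interoperable": ["file_format", "metadata_standard"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(record):
    """Return, per FAIR principle, the fields that are missing or empty."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Made-up metadata record used only to exercise the checklist.
example_record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Example vegetation survey",
    "keywords": ["vegetation", "survey"],
    "access_url": "https://repository.example.org/dataset/1",
    "access_protocol": "HTTPS",
    "file_format": "CSV",
    "metadata_standard": "",  # left empty on purpose
    "license": "CC-BY-4.0",
    # "provenance" is omitted on purpose
}

print(fair_gaps(example_record))
# {'Interoperable': ['metadata_standard'], 'Reusable': ['provenance']}
```

A check like this only confirms that fields are filled in. Whether an identifier actually resolves, or whether a license permits reuse, still needs a human reader.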





Reasons to Make your Data Open

  • Avoids unnecessary duplication. Duplication of research is costly for society, and places unnecessary burden on heavily researched people and populations.
  • The data underlying publications are maintained and accessible, allowing for validation of results.
  • Data openness leads to more collaboration and advances research and innovation.
  • Your research is more visible and has greater impact. Publications which allow access to the underlying data get more citations. Greater visibility also allows for better validation and scrutiny of findings.
  • Other researchers can cite your data, which will drive up your citation number and increase your influence in your field of research.
  • Storing your data in a public repository also provides you with secure and ongoing storage that may otherwise not be available to you. -Foster Open Science





As Open as Possible, as Closed as Necessary

There are many circumstances where open data could be harmful:

  • Data on human health

  • Location of endangered species or archaeological sites

  • Data that individuals or groups do not want to be public

    CARE Principles

    The CARE Principles for Indigenous Data Governance were drafted at the International Data Week and Research Data Alliance Plenary co-hosted event "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop," 8 November 2018, Gaborone, Botswana.

    Collective Benefit

    • C1. For inclusive development and innovation
    • C2. For improved governance and citizen engagement
    • C3. For equitable outcomes

    Authority to Control

    • A1. Recognizing rights and interests
    • A2. Data for governance
    • A3. Governance of data

    Responsibility

    • R1. For positive relationships
    • R2. For expanding capability and capacity
    • R3. For Indigenous languages and worldviews

    Ethics

    • E1. For minimizing harm and maximizing benefit
    • E2. For justice
    • E3. For future use
  • Data for making lethal weapons



Open vs. FAIR

FAIR does not demand that data be open: See one definition of open: http://opendefinition.org/

Open data are not necessarily FAIR



Additional Info










Open Educational Resources


Definitions

"Open Educational Resources (OER) are learning, teaching and research materials in any format and medium that reside in the public domain or are under copyright that have been released under an open license, that permit no-cost access, re-use, re-purpose, adaptation and redistribution by others." - UNESCO

Wikipedia definition

Digital Literacy Organizations
  • The Carpentries - teaches foundational coding and data science skills to researchers worldwide
  • EdX - Massive Open Online Courses (not all open), founded by Harvard University and MIT
  • EveryoneOn - mission is to unlock opportunity by connecting families in underserved communities to affordable internet service and computers, and delivering digital skills trainings
  • ConnectHomeUSA - is a movement to bridge the digital divide for HUD-assisted housing residents in the United States under the leadership of national nonprofit EveryoneOn
  • Global Digital Literacy Council - has dedicated more than 15 years of hard work to the creation and maintenance of worldwide standards in digital literacy
  • IndigiData - training and engaging tribal undergraduate and graduate students in informatics
  • National Digital Equity Center - a 501(c)(3) non-profit and nationally recognized organization with a mission to close the digital divide across the United States
  • National Digital Inclusion Alliance - advances digital equity by supporting community programs and equipping policymakers to act
  • Net Literacy
  • Open Educational Resources Commons
  • Project Pythia - is the education working group for Pangeo and is an educational resource for the entire geoscience community
  • Research Bazaar - is a worldwide festival promoting the digital literacy emerging at the centre of modern research
  • TechBoomers - is an education and discovery website that provides free tutorials of popular websites and Internet-based services in a manner that is accessible to older adults and other digital technology newcomers
Educational Materials










Open Methodology


Definitions

"An open methodology is simply one which has been described in sufficient detail to allow other researchers to repeat the work and apply it elsewhere." - Watson (2015)

"Open Methodology refers to opening up methods that are used by researchers to achieve scientific results and making them publicly available." - Open Science Network Austria



Sharing Research Computer Code

Scientists around the globe are creating computer code for scientific analysis. These are valuable contributions that need to be shared!

Platforms like GitHub and GitLab are ideal for collaboratively developing code and sharing it on the open internet. In FOSS, we will show you how to use GitHub for sharing code, documentation, hosting websites, and software version control.






Publishing Your Methods or Protocols

Platforms for Publishing Protocols & Bench Techniques





PreRegistration

Preregistration is detailing your research and analysis plan and submitting it to an online registry before you engage in the research.


PreRegistration in the Research Life Cycle

Why Do This?

Preregistration makes your process more open and records the difference between your initial research plan and what you actually end up doing.

Preregistration separates hypothesis-generating (exploratory) from hypothesis-testing (confirmatory) research. Both are important. But the same data should not be used to both generate and test a hypothesis. This can happen unintentionally, and it reduces the credibility of your results.

It also helps us avoid practices like p-hacking or Hypothesizing After the Results are Known (HARKing).
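
To see why testing many unplanned outcomes undermines credibility, the following small simulation may help. It is a sketch under stated assumptions, not a model of any real study: there is no true effect, every outcome is independent and normally distributed with a known standard deviation of 1, and both groups are the same size. The function names and parameter values are hypothetical.

```python
import math
import random


def p_value(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known standard deviation of 1.
    """
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2 / n)
    return 1 - math.erf(abs(z) / math.sqrt(2))


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=30,
                        alpha=0.05, seed=1):
    """Fraction of null studies reporting at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        significant = False
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: no real effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if p_value(a, b) < alpha:
                significant = True
        hits += significant
    return hits / n_studies


print("one preregistered outcome:", false_positive_rate(1))
print("best of ten outcomes:     ", false_positive_rate(10))
```

Under these assumptions, the chance of at least one false positive among ten independent tests is 1 - 0.95^10, or about 0.40, compared with 0.05 for a single test. The simulated rates should land near those values. Naming the outcome and the test before collecting data removes the option of choosing the best of ten afterwards.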


Additional Info

Read this publication by Nosek et al. 2018

Open Science Framework Preregistration https://www.cos.io/initiatives/prereg










Open Peer Review



Definitions

Open peer review is an umbrella term for a number of overlapping ways that peer review models can be adapted in line with the aims of Open Science, including making reviewer and author identities open, publishing review reports and enabling greater participation in the peer review process.

-Ross-Hellauer et al. (2017)


Wikipedia's definition




Traditional Closed Peer-Review System


  • Throughout and after the process, the author remains unaware of the reviewers' identities, while the reviewers know the identity of the authors.
  • All communications between authors, reviewers and editors remain private




Complaints with the Traditional Closed Peer-Review System

  • Unreliable and Inconsistent
  • Delays and Expense
  • Lack of Accountability and Risks of Subversion
  • Social and Publication Biases
  • Lack of Incentives

Ross-Hellauer 2017





Open Peer-Review Ideas


Open Peer Review Options at PLOS




Defenders of the Traditional Peer-Review System




Example Open Peer-Review Systems

F1000Research - An open research publishing platform that offers open peer review and rapid publication. The article from Ross-Hellauer et al. (2017) has open peer reviews.


Platforms for Reviewing Preprints










Open Source Software

Definitions

"Open source software is code that is designed to be publicly accessible—anyone can see, modify, and distribute the code as they see fit. Open source software is developed in a decentralized and collaborative way, relying on peer review and community production." - Red Hat

Wikipedia definition



Scientific research (and many companies) relies on open source software to operate



Open Source Software



When you create new software, a library, or a package, you become its parent and guardian.


Image Credit: XKCD Dependency










WHY do Open Science?

A chapter by Fecher & Friesike in Bartling & Friesike (2014) posits that there are 5 main schools of thought in Open Science, which represent 5 underlying motivations:

  1. Democratic school: primarily concerned with making scholarly work freely available to everyone
  2. Pragmatic school: primarily concerned with improving the quality of scholarly work by fostering collaboration and improving critiques
  3. Infrastructure school: primarily focused on the platforms, tools, and services necessary to conduct efficient research, collaboration, and communication
  4. Public school: primarily concerned with societal impact of scholarly work, focusing on engagement with broader public via citizen science, understandable scientific communication, and less formal communication
  5. Measurement school: primarily concerned with the existing focus on journal publications as a means of measuring scholarly output, and focused on developing alternative measurements of scientific impact

Fecher & Friesike, "Open Science: One Term, Five Schools of Thought", in Bartling & Friesike (2014)

We have added a sixth school of thought:

  6. Compliance school: governments, universities, and granting agencies have embraced Open Science and are mandating some elements (e.g., data sharing with publications)








Discussion Questions

Which of the pillars of Open Science is nearest to your own heart?

Open Access Publications

Open Data

Open Educational Resources

Open Methodology

Open Peer Review

Open Source Software

Are any of the pillars more important than the others?
Are there any pillars not identified that you think should be considered?
What characteristics might a paper, project, or lab group require to qualify as doing Open Science?
What are some barriers to you, your lab group, or your domain doing Open Science?
What motivates you to do Open Science?
Do you feel that you fall into a particular "school"? If so, which one, and why?
Are there any motivating factors for doing Open Science that don't fit into this framework?









Open Scholarship Grassroots Community Networks

International Open Science Networks
US-based Open Science Networks
  • CI Compass - provides expertise and active support to cyberinfrastructure practitioners at USA NSF Major Facilities in order to accelerate the data lifecycle and ensure the integrity and effectiveness of the cyberinfrastructure upon which research and discovery depend.
  • Earth Science Information Partners (ESIP) Federation - is a 501(c)(3) nonprofit supported by NASA, NOAA, USGS and 130+ member organizations.
  • Internet2 - is a community providing cloud solutions, research support, and services tailored for Research and Education.
  • Minority Serving Cyberinfrastructure Consortium (MS-CC) envisions a transformational partnership to promote advanced cyberinfrastructure (CI) capabilities on the campuses of Historically Black Colleges and Universities (HBCUs), Hispanic-Serving Institutions (HSIs), Tribal Colleges and Universities (TCUs), and other Minority Serving Institutions (MSIs).
  • NASA Transform to Open Science (TOPS) - coordinates efforts designed to rapidly transform agencies, organizations, and communities for Earth Science
  • OpenScapes - is an approach for doing better science for future us
  • The Quilt - non-profit regional research and education networks collaborate to develop, deploy and operate advanced cyberinfrastructure that enables innovation in research and education.
Oceania Open Science Networks








Self Assessment

True or False: All research papers published in the top journals, like Science and Nature, are always Open Access.
Answer

False

Major Research journals like Science and Nature have an "Open Access" option when a manuscript is accepted, but they charge an extra fee to the authors to make those papers Open Access.

These high page costs are exclusionary to the majority of global scientists who cannot afford to front these costs out of pocket.

This will soon change, at least in the United States. The Executive Branch of the federal government recently mandated that future federally funded research be made Open Access by 2026.

True or False: an article states all of the research data used in the experiments "are available upon request from the corresponding author(s)," meaning the data are "Open"
Answer

False

In order for research to be open, the data need to be freely available from a digital repository, like Data Dryad, Zenodo.org, or CyVerse.

Data that are 'available upon request' do not meet the FAIR data principles.

Using a version control system to host the analysis code and computational notebooks, and including these in your Methods section or Supplementary Materials, is an example of an Open Methodology?
Answer

Yes!

Using a VCS like GitHub or GitLab is a great step towards making your research more reproducible.

Ways to improve your open methodology can include documentation of your physical bench work, and even video recordings and step-by-step guides for every part of your project.

You are asked to review a paper for an important journal in your field. The editor asks if you're willing to release your identity to the authors, thereby "signing" your review. Is this an example of "Open Peer Review"?
Answer

Maybe

There are many opinions on what 'open-review' should consist of. A reviewer signing their review and releasing their identity to the authors is a step toward a more open process. However, it is far less open than publishing the peer-review reports online next to the final published paper.

You read a paper where the author(s) wrote their own code and licensed it as "Open Source" software for a specific set of scientific tasks which you want to replicate. When you visit their personal website, you find the GitHub repository does not exist (because it's now private). You contact the authors asking for access, but they refuse to share it "due to competing researchers who are seeking to steal their intellectual property". Is the software open source?
Answer

No

An author's statement that they have given their software a permissive license does not, by itself, make the software open source.

Always make certain there is a LICENSE associated with any software you find on the internet.

In order for the software to be open, it must follow the Open Source Initiative definition
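
As a first pass at the LICENSE check above, a short script can look for a license file at the top level of a downloaded repository. This is a minimal sketch: the list of file names is an assumption based on common conventions, and `find_license_files` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Common license file names, compared case-insensitively.
# This list is an assumption, not an exhaustive standard.
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying"}


def find_license_files(repo_dir):
    """Return the names of top-level files that look like a license file."""
    return sorted(
        path.name
        for path in Path(repo_dir).iterdir()
        if path.is_file() and path.name.lower() in LICENSE_NAMES
    )


if __name__ == "__main__":
    # Check the current working directory.
    found = find_license_files(".")
    print(found if found else "No license file found")
```

Finding a file is only the first step. You still need to read it and confirm that the license it names is one approved under the Open Source Initiative definition.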