With a slate of excellent presenters and over 160 registrants for this 19 April 2023 Forum, we held a robust discussion around the value of open data and software sharing in a world of open science, from the perspectives of researchers and of those who manage support systems.
Much gratitude goes to our generous sponsors of this forum: Association for Computing Machinery, AIP Publishing, GeoScienceWorld, Silverchair, and STM.
Session one focused on the value of open data and software sharing in a world of open science from a researcher perspective, and was moderated by Kristina Vrouwenvelder, Program Manager of Open Science with the American Geophysical Union. The session included three researchers (Kathleen Gregory, Researcher, University of Vienna; Mumin Oladipo, Researcher, KolaDaisi University; Elise F. Zipkin, Associate Professor and Director of the Ecology, Evolution, and Behavior Program, Michigan State University), all practitioners of open science.
The discussion highlighted what openness means to the researcher: that open science practices are transformative, remove barriers to sharing science globally, and increase reproducibility and transparency. But the paths towards incorporating these open science practices in research aren’t always obvious or easy, and the range of new concepts, approaches, and tools that researchers are presented with can be complex and overwhelming.
Dr. Kathleen Gregory, Researcher, University of Vienna, in the unique position of investigating open science practices, shared a meta-researcher's perspective. Kathleen proposed that we ask the following three questions (or, as she phrased them, 'provocations') and take practical steps to encourage, monitor, and support open data.
- Provocation 1: Does ‘openness’ depend on context and demand flexibility?
- Provocation 2: What is our responsibility around attribution and credit?
- Provocation 3: What is the value of data management and data sharing?
Kathleen asked: is openness binary? Is something either open or closed, or does openness depend on context and, as such, demand more flexible approaches? To explore these questions, she shared the results of a recent study that is part of the Meaningful Data Counts research project being conducted at the Scholarly Communications Lab in Canada. In a survey of researchers across disciplines, 466 of the nearly 2,500 respondents said that they do not actually reuse data. Their reasons varied, including that data reuse is not relevant to their research methods or that they get more credit for creating their own data. See 00:04:48 in the recording for more details on the study: Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. Preprint: https://doi.org/10.5281/zenodo.7555266
Speaking from her own experiences as someone who practices open science, Kathleen thinks one of the real values in data management planning and in sharing data openly is that it forces us to slow down, and provides an opportunity to really think about the choices that we are making in research and with our own data.
Hear Kathleen’s presentation at [00:04:48]
Mumin Oladipo, Researcher, KolaDaisi University, spoke of the benefits of open data and software sharing and noted that he has been a beneficiary of open science. To begin, he defined open data and software as “data and software that can be/are universally used or distributed free of charge. Open science is the practice of science in which others can collaborate and contribute, and it’s freely available under terms that enable it to be reused or distributed.” Oladipo believes there are four main areas to consider:
- What are the needs for open data and software? They improve accessibility and create opportunities, especially for researchers in developing countries who lack access to certain information.
- What are the benefits of open data and software? Sharing data on platforms makes it available to those who wouldn’t otherwise have had such opportunities, and lets them contribute to the knowledge base.
- What are the challenges of open data and software? Inadequate infrastructure, readiness, government and corporate funding, and distribution restrictions, especially for early career researchers.
- What is the impact of open data and software? It creates more opportunities for citations, which increases reputation, and makes resources available to help expand the knowledge base.
Oladipo also noted that using GitHub to collaborate amongst his team throughout the research process, as well as with others such as editors and reviewers after submission, has proven to be a very useful tool and helped him and his team overcome some challenges.
Hear Oladipo’s presentation at [00:16:24]
Dr. Elise Zipkin, Department of Integrative Biology at Michigan State University, shared how her lab approaches open science; its research focuses on estimating the abundance and distributions of species and how those change at large scales. She believes a huge benefit of open data, and of open science in general, is that it has allowed her lab to tackle those kinds of questions and the challenges around large data. To her, the most important thing about open data is that it allows us to continue to ask and answer questions that would otherwise be unanswerable.
Some challenges Elise’s lab faces concern the mathematical and statistical models they’ve developed and making them available to the public. They have a complete process to clean the code and make it accessible. Referring back to Kathleen’s earlier comment that openness is “really not binary,” Elise noted that we should meet each person where they are and develop policies accordingly. This helps each person leading a project know at the onset that everything they are going to do will be made public.
In closing, Elise shared: “we need to continue thinking about how to make data open and why it’s valuable, keep updating the process, and know that it’s something that we have to think about and work on regularly.”
Hear Elise’s presentation at [00:28:09]
Watch the Q&A session [00:35:04] to hear the speakers discuss flexibility in meeting researchers where they are in their process of sharing data: what do they think the role of a professional society, such as AGU, is in keeping this flexibility inherent in our processes while still providing guidelines for researchers? The session also includes a discussion about documentation and how we can promote good reusability.
Responses to the two questions asked during the break were:
How does your organization provide guidance on Open Research practices?
- LibGuide on open research, open access, open data, open educational resources, etc.
- Policy documentation
- Publishing guidelines
- Data management and sharing service
- Article templates, some web guidance, meetings, posts
- Webpages sharing guidance for data and software citation policies
- Data editors direct authors about sharing open data and software
What challenges have you seen adopting open research practices?
- Time and resources
- Publisher-side barriers: implementation
- Fear of being wrong / loss of control / loss of ownership
- Specific guidelines for disciplines
- Communicating the value of adopting practices
- No rewards
- Complex research groups don’t always speak the same language
- Integration with existing platforms
- Time / culture / not knowing benefits
- Balancing openness and patentability
Session two was introduced by Shelley Stall, VP of Open Science Leadership at AGU, and focused on data and software sharing in a world of open science from the support system perspective. Shelley introduced the three speakers for session two: Danie Kincade, Director of the Biological and Chemical Oceanography Data Management Office, Woods Hole Oceanographic Institution; Dr. Allen Pope, Program Director for Polar Cyberinfrastructure at the National Science Foundation; and Lauren Kmec, Managing Editor, Science Magazine.
Danie Kincade, Director of the Biological and Chemical Oceanography Data Management Office, Woods Hole Oceanographic Institution, emphasized that repositories, and more specifically disciplinary repositories serving a particular community, fill a certain niche in the open science enterprise for curating and publishing open data. She drilled into three particular roles:
- Repositories partner with researchers, not only to educate them on better data hygiene and data management practices but to do the heavy lifting of curating and then publishing the data in a way that’s FAIR.
- They provide glue for a research community, helping shepherd or steer data to and from a particular research effort and all of its disciplinary repositories, and then, in a federated way, pulling that data back together for reuse in a holistic manner.
- Engage in this broader data publishing community to really drive effective data sharing forward.
Taking a holistic view of what repositories can do for open science, Danie stated: “When researchers arrive, they’re driven by the need to satisfy funding requirements, so they don’t fully appreciate the value that is provided by sharing their data. They are satisfying their funding requirements, but they’re also getting credit for the hard work that they do; citing or sharing data can lead to increased citations and collaborations. By contributing the data, they’re actually adding to a community resource that’s available for reuse.”
Contributing data adds to the broader community resource and also ensures transparency of results for peer review. At a societal level, that transparency can boost public confidence in the scientific process, and data can go on to contribute to resource management and policy efforts. It is also accessible to the public and can be used for education. Danie also pointed to making data FAIR (Findable, Accessible, Interoperable, and Reusable) and noted that the principles apply not only to making data FAIR for human use but also for machine use. Very few researchers can make their data FAIR for automated workflows and computer analyses, so the repository needs to help the researcher, and this can be done using technologies and best practices.
Repositories face various challenges, such as size, collection type, and vocabularies, when stewarding content through the process. Repositories help researchers throughout the data life cycle: at the time of proposal, they provide guidance on data management planning and formatting, which improves data quality and interoperability and ensures the rich metadata necessary for reuse and peer review. This ultimately provides access to all the data information through related information, tracks the usage of that data for attribution to the PI, and archives it to ensure the data will live on. Educating researchers in better data hygiene and data management practices helps them improve their own processes, while repositories contribute to a rich resource that feeds the scientific endeavor.
Hear Danie’s presentation at [1:07:24]
Dr. Allen Pope, Program Director for Polar Cyberinfrastructure at the National Science Foundation, spoke about the value of open data and software sharing from his role at NSF. He gave examples of different programs that support open data and software, noting that each has a different flavor. From the U.S. federal government, and NSF in particular, there is growing importance placed on open, accessible, and reproducible science across all government-supported science. Allen referenced the memo from the Office of Science and Technology Policy, which he believes is a baseline for much of the work that NSF does, especially coming from the GEO world and the FAIROS RCNs (Findable, Accessible, Interoperable, Reusable, Open Science Research Coordination Networks). These programs support different ways to bring scientists together to share how they are doing open science and making their data and code more available. 2023 is the federal Year of Open Science; you can see what the different government agencies are doing at open.science.gov.
Within the Geoscience Directorate there is a Cyberinfrastructure Working Group. Some of the goals of that working group are to advance geoscience research, promote openness and participation through open science, and specifically to pursue AI and ML innovation. The group’s goals are not decoupled from the disciplinary geoscience goals, rather they are interlinked and moving forwards together. More information is at https://www.nsf.gov/geo/geo-ci/index.jsp.
Allen emphasized that open science needs people, not just tools: training is a big part of any program, and it’s about building a multi-faceted future science workforce. We want to build adaptable workflows, and we want to have particular use cases so that we know which tools are going to get used. We can translate lessons from some communities to other communities to really make the most of those investments.
Hear Allen’s presentation at [1:19:17]
Lauren Kmec, Managing Editor, Science Magazine, offered the journal perspective on data and software sharing, both broadly and with a more specific focus on some recent developments. Her talk largely focused on data, but the overall message generally applies to software and code sharing as well. Many different groups have issued recommendations pertaining to open data; some are broad in scope, such as the widely applicable FAIR principles (https://www.go-fair.org/fair-principles/), while others are discipline-specific, such as the MIBBI guidelines. Lauren believes the question that needs to be asked is: what level of familiarity do authors have? Even if authors are familiar with the accepted standards for their field, publishing in a multi-disciplinary journal may bring requirements that extend beyond what they know. In that regard, publishers have a responsibility both to emphasize open data principles and to provide clear guidance.
Lauren referenced a summary of eight standards from a 2014 collaborative reproducibility workshop that included journal editors, disciplinary experts, and funder representatives. The goal of the workshop was to formulate shared standards for open practices across journals. “A year after these went public, more than 500 journals had at least begun to implement the guidelines,” Lauren said.
Lauren also stated that we need to think about data retention policies that allow authors to alter their data, which could have unintended consequences. All of this raises the question of whether there can, and perhaps even should, be stronger relationships and more intentional collaborations between journals and repositories. There are obvious benefits, such as quality control and data curation ensuring that data sets are machine readable and have robust metadata, and a permanently accessible home with proper citation and licensing. These clearly strengthen the connection between the article and its data set and promote discoverability and reuse.
Positive impacts of such collaborations may include making data seamlessly and privately available to editors and reviewers, data curation services, enhanced equity at no cost to the author, and other benefits of simplifying the process for authors, promoting reproducibility, and encouraging collaboration. A lot of behind-the-scenes work is required to translate overarching principles into daily practice smoothly.
Hear Lauren’s presentation at [1:32:41]
Make sure to watch the Q&A session [01:41:53] to hear a discussion about what it means to have the right persistent identifiers (PIDs).
In closing, the CHORUS community effort is dedicated to making open research work, and our goal is to help our main stakeholders (publishers, institutions, and funders) scale their OA compliance. We work to develop metrics about open data and to improve the overall quality of metadata relating to open research. We also host forums and workshops like today’s to connect stakeholders so they can learn from and, hopefully, build trust with each other. We again thank our generous sponsors of this forum: Association for Computing Machinery, AIP Publishing, GeoScienceWorld, Silverchair, and STM.