Open Access

Peer-reviewed

Research Article

Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide

Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

Affiliation School of Information Sciences, University of Tennessee, Knoxville, Tennessee, United States of America

Roles Data curation, Formal analysis, Project administration, Writing – original draft, Writing – review & editing

Affiliation Center for Information and Communication Studies, University of Tennessee, Knoxville, Tennessee, United States of America

Roles Conceptualization, Formal analysis, Funding acquisition, Project administration, Writing – original draft, Writing – review & editing

Roles Conceptualization, Formal analysis, Writing – review & editing

Affiliation University of Idaho Libraries, University of Idaho, Moscow, Idaho, United States of America

Roles Formal analysis

Affiliation College of Communication and Information, University of Tennessee, Knoxville, Tennessee, United States of America

Roles Conceptualization, Formal analysis

Affiliation Departments of Biology and Environmental Science and Sustainability, Widener University, Chester, Pennsylvania, United States of America

Affiliation North Carolina State University Libraries, North Carolina State University, Raleigh, North Carolina, United States of America

Affiliation UIC University Library, University of Illinois at Chicago, Chicago, Illinois, United States of America

  • Carol Tenopir, 
  • Natalie M. Rice, 
  • Suzie Allard, 
  • Lynn Baird, 
  • Josh Borycz, 
  • Lisa Christian, 
  • Bruce Grant, 
  • Robert Olendorf, 
  • Robert J. Sandusky

PLOS

  • Published: March 11, 2020
  • https://doi.org/10.1371/journal.pone.0229003

With data becoming a centerpiece of modern scientific discovery, data sharing by scientists is now a crucial element of scientific progress. This article aims to provide an in-depth examination of the practices and perceptions of data management, including data storage, data sharing, and data use and reuse by scientists around the world.

The Usability and Assessment Working Group of DataONE, an NSF-funded environmental cyberinfrastructure project, distributed a survey to a multinational and multidisciplinary sample of scientific researchers in two waves in 2017–2018. We focused our analysis on examining the differences across age groups, sub-disciplines of science, and sectors of employment.

Most respondents displayed what we describe as high- and mediocre-risk data practices by storing their data on their personal computers, departmental servers, or USB drives. Respondents appeared to be satisfied with short-term storage solutions; however, only half of them were satisfied with available mechanisms for storing data beyond the life of the process. Data sharing and data reuse were viewed positively: over 85% of respondents said they would be willing to share their data with others and would use data collected by others if it could be easily accessed. A vast majority of respondents felt that the lack of access to data generated by other researchers or institutions was a major impediment to progress in science at large, yet only about half thought that it restricted their own ability to answer scientific questions. Although attitudes towards data sharing and data use and reuse are mostly positive, practice does not always support data storage, sharing, and future reuse. Assistance through data managers or data librarians, readily available data repositories for both long-term and short-term storage, and educational programs that raise awareness and help engender good data practices are clearly needed.

Citation: Tenopir C, Rice NM, Allard S, Baird L, Borycz J, Christian L, et al. (2020) Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide. PLoS ONE 15(3): e0229003. https://doi.org/10.1371/journal.pone.0229003

Editor: Sergi Lozano, Universitat de Barcelona, SPAIN

Received: April 29, 2019; Accepted: January 28, 2020; Published: March 11, 2020

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: All relevant data are available in Dryad (10.5061/dryad.m27m0b4).

Funding: The project was funded as part of the National Science Foundation, Division of Cyberinfrastructure, Data Observation Network for Earth (DataONE) NSF award #1430508 under a Cooperative Agreement, William Michener, P.I. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The DataONE Usability & Assessment Working Group contributed to all aspects of this project.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Science is increasingly data intensive, and recent technological developments, computational abilities, and new digital environments are placing data at the center of scientific discovery [ 1 ]. This “Fourth Paradigm” of data-intensive scientific discovery is built on three pillars of “capture, curation, and analysis” [ 2 ]. With data becoming a centerpiece of modern scientific discovery, data sharing by scientists is now a crucial element of scientific progress.

Data sharing is also a foundation for Open Science, the initiative “to make scientific research and data accessible to all. It includes practices such as publishing open scientific research, campaigning for open access and generally making it easier to publish and communicate scientific knowledge… [including] ways to make science more transparent and accessible during the research process” [ 3 ]. On January 15, 2019, U.S. President D. Trump signed into law H.R. 4174, the Foundations for Evidence-Based Policymaking Act of 2018, which supported implementation of the principles of Open Science in the United States: “[the law] improves evidence-based policy through strengthening Federal agency evaluation capacity; furthering interagency data sharing and open data efforts; and improving access to data for statistical purposes while protecting confidential information [ 4 ].”

The goals of Open Science include greater interdisciplinary scientific collaboration, accessibility of data, and greater reproducibility and transparency of scientific work. These are dependent on increased sharing of scientific data and open access data. Data sharing is increasingly seen as an essential driver of the direction in which science is moving worldwide and across disciplines [ 5 , 6 , 7 ].

Sound data management practices are required to achieve the goals of Open Science. Best practices in data management require scientists to share their data by depositing datasets in trusted subject, governmental, or institutional repositories, by providing metadata that makes their data findable, and by citing or acknowledging their reuse of data. Understanding the actual behaviors of scientists is key to understanding what can be done to support the scientific community using the best practices.

Data management best practices are required throughout the full data lifecycle (Fig 1) and are well described in the “FAIR Guiding Principles for scientific data management and stewardship [ 8 , 9 ],” which outline a set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. The FAIR principles provide “guidance for scientific data management and stewardship and are relevant to all stakeholders in the current digital ecosystem. They directly address data producers and data publishers to promote maximum use of research data [ 10 ].” FAIR has been supported by many communities, including the G20 Summit in 2016 and the Association of European Research Libraries (LIBER) in 2017 [ 10 , 11 , 12 , 13 ]. A coalition of groups representing the Earth and space sciences, headed by the American Geophysical Union (AGU), set out in 2017 to develop standards that will connect researchers, publishers, and data repositories in the Earth and space sciences to enable FAIR principles [ 14 ].

Fig 1. https://doi.org/10.1371/journal.pone.0229003.g001

In 2018, LIBER published an Open Science Roadmap that outlined the specific actions that libraries could take to promote the concept and implementation of open science [ 11 ].

Other U.S. and international organizations, societies, and projects have also endorsed and are actively moving toward supporting these concepts and principles.

DataONE is an environmental cyberinfrastructure project focused on meeting “the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data… [created to] ensure the preservation, access, use and reuse of multi-scale, multi-discipline, and multi-national science data via three primary cyberinfrastructure elements and a broad education and outreach program.” DataONE has been supported by the U.S. National Science Foundation since 2009. The DataONE Usability & Assessment Working Group (UAWG) has been studying scientists’ attitudes towards and practices with data sharing and reuse for a decade [ 5 , 6 ]. This paper reports on the results of our third survey, conducted with scientists from a variety of subject disciplines and in many different countries.

Our study explores scientists’ data sharing attitudes and practices by focusing on specific stages of the data lifecycle, including describe, preserve, and discover. It also examines where scientists store data in both the short and long term, what metadata standards scientists use (if any) to describe data, and what barriers and incentives scientists face when sharing, finding, and reusing data.

An analysis of a subset of these data, focused on the practices and attitudes of geophysicists and drawn from the survey distributed to members of the American Geophysical Union (1,372 responses from 116 countries), was published last year [ 15 ].

National and institutional factors impacting data sharing

In the past decade, governments, funding agencies, and publishers have begun implementing more rigorous open access data policies and mandates [ 16 , 17 ]. In the United States, the White House Office of Science and Technology Policy published a memorandum in 2013 directing all federal agencies to increase public access to research and specifically allow open access to data and scholarly publications supported by federal funding [ 18 ]. A number of major U.S. federal funders have increased access to the results of funded research [ 8 ] through a variety of policies, including requiring funded projects to share data underlying the research when the results of the research are published. Some U.S. funding agencies, including the Centers for Disease Control (CDC), the Department of Defense (DOD), the Food and Drug Administration (FDA), the National Aeronautics and Space Administration (NASA), and the United States Agency for International Development (USAID), require grant recipients to submit a yearly data sharing plan as part of the application process.

The National Science Foundation [ 19 ] requires applicants to submit a data management plan as a part of the funding application, and encourages data sharing. “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing [ 19 ].” The U.S. Geological Survey (USGS) Public Access Plan “outlines a framework for activities to increase public access to scholarly publications and digital scientific data resulting from research funded by the USGS [ 20 ]” and requires providing free public access to scientific data and information products that are developed or funded by USGS.

Over the past decade, the European Union and its member states have been implementing comprehensive open data policies. The European Strategy Forum on Research Infrastructures in 2010 stated that “data in their various forms (from raw data to scientific publications) will need to be stored, maintained, and made available and openly accessible to all scientific communities [ 1 ].” In 2012, the European Commission published a recommendation and a roadmap encouraging all member states to make publicly funded research data and results accessible to the public [ 21 ]. Most public sector data in Europe is open, meaning it is widely accessible and available for reuse, sometimes with no restrictive conditions [ 22 ]. In addition, the EU has invested in digital public data infrastructure, launching two major online portals: the Open Data Europe Portal (ODP open-data.europa.eu/fr/data) “that harvests metadata from public sector portals throughout Europe… [and] focuses on data made available by European countries” and the European Data Portal (EDP europeandataportal.eu) which contains datasets collected and published by the European Institutions [ 23 ]. The European Commission in 2018 published a report and an action plan focused on implementing FAIR and providing recommendations and actions for stakeholders in Europe and beyond [ 23 ].

While requirements and regulations for open access to data vary regionally, data sharing practices of researchers worldwide are also affected by policies and regulations implemented by major stakeholders, such as publishers, journals and repositories. In addition to data sharing mandates by governments and funding agencies, changing professional codes of ethics and requirements by journals and publishers toward open data are resulting in broader support from research communities towards the practice [ 24 ]. Private foundations and other major funding organizations that require data sharing by researchers include, among others, the Bill and Melinda Gates Foundation, American Heart Association, and Howard Hughes Medical Institute [ 25 ].

Many journals and societies (e.g., American Geophysical Union) require the deposit in appropriate public repositories of all data used in the studies reported in their publications [ 25 , 26 ]. The National Academy of Sciences and its affiliates recommend that publishers implement open data access policies, and require researchers who want to be published to share their data [ 27 ].

Individual factors impacting data sharing

Individual attitudes and practices related to data sharing may be at odds with journal or government mandates. The attitudes and practices of scientists have been investigated in a number of international and regional studies, which focus on current practices, willingness to share data, possible incentives to share, and perceived barriers [ 5 , 6 , 7 , 28 , 29 , 30 , 31 ].

In the past decade, these studies have shown that scientists increasingly recognize the real benefits of open data [ 32 ]. Our 2011 study found that a majority of the scientists were willing to share their data “if certain conditions are met (such as formal citation and sharing reprints) [ 5 ].” An expanded follow-up survey conducted several years later demonstrated “increased acceptance of and willingness to engage in data sharing, as well as an increase in actual data sharing behaviors [ 6 ].”

A 2016 international survey of scientists involved in environmental research found that 82% of respondents agreed that in their scientific community, open data was “very important,” and 17% thought it was of “intermediate importan(ce)” [ 7 ]. According to the same survey, support for data sharing “arose from research-intrinsic motives ranging from general considerations, i.e., the acceleration of scientific research and applications, to personal motivations, i.e., dissemination and recognition of research results, personal commitment to open data and requests from data users [ 7 ].” A study by Kim and Stanton [ 30 ] confirmed that individual factors that influence data-sharing behavior among scientists include perceived career benefit and career risk, scholarly altruism and the perceived amount of effort. Another study found that scientists seem to be more willing to share their data as a direct response to a request made by their peers, as they perceive this helps to ensure that their data will be cited and used properly [ 32 ].

Even though many funding agencies require data sharing, studies have shown that many scientists do not share their data, even those who receive funding from agencies that require data sharing [ 31 , 6 ]. We found that major barriers to sharing data are related to insufficient funding and time constraints to prepare the data for access and reuse [ 5 ], and that these barriers are persistent and include concerns about the need to publish their analysis first [ 6 ]. The Belmont Forum open data survey of environmental scientists produced similar results; the top barriers to publishing data included the need to publish first, legal constraints, and concerns about loss of credit or recognition [ 7 ]. According to the same survey, the age of the respondents had an impact on perceived barriers; “desire to publish results before releasing data was somewhat more prevalent at early stages of a research career [ 7 ].”

Scientists often explain their reluctance to share data as a concern that their data will be misused or misinterpreted [ 28 , 7 ]. In disciplines that deal with human subjects, patients’ or respondents’ privacy concerns, as well as legal regulations, could be additional barriers to data sharing [ 28 ].

Another significant barrier, according to Gorgolewski, Margulies, and Milham [ 33 ], is researchers’ fear that reuse of their data and the resulting scrutiny by other researchers may reveal errors or discrepancies in the datasets or in their interpretation.

Factors that impact researchers’ willingness to share data include 1) the extent to which they are trained in data management best practices, 2) the availability of organizational support to assist them, and 3) the extent to which they feel assured that the original datasets will be acknowledged and cited when reused [ 29 ]. Although there may be mandates to share data, the lack of perceived incentives for researchers to spend additional time and effort to prepare datasets for sharing is another reason why more data is not available publicly [ 28 ]. An opportunity to be a co-author on a study that uses their data could be a significant incentive for data sharing because of the need to publish scholarly articles for career advancement in academia [ 34 ]. Results of the Tenopir et al. [ 5 , 6 ] surveys indicated that both discipline and respondent age are factors in data sharing attitudes and practices, with significant disciplinary differences in perceived incentives and barriers. While younger respondents generally expressed more positive attitudes towards data sharing, their responses showed that in practice they share less of their data than older researchers.

In addition, the type of data used by scientists and the ease with which this data can be reused affect data sharing and reuse practices: “reported use of models and remote-sensed data had a large positive effect on reuse behavior [ 35 ]”. Kim et al. confirmed that types of “data sources used by academic researchers were found to have a significant relationship with data sharing and reuse behaviors [ 36 ]”.

While most researchers seem to be satisfied with the initial steps of the data lifecycle (searching for, collecting, and storing data in the short term), long-term data preservation is much more challenging and problematic, and most organizations do not provide sufficient training and support for data management [ 5 , 6 ]. Institutional involvement in training and data management assistance is crucial, since the lack of skills and knowledge needed to prepare datasets for sharing is one primary reason why scientists choose not to share their data [ 31 , 36 ].

In this study, we used surveys to examine attitudes and practices at three stages of the data lifecycle: description, preservation and discovery. In particular, our questions focused on where data are stored in both the short and long term, what metadata standards are used (if any) to describe data, and barriers and incentives to data sharing and discovery.

We sought to answer the following research questions:

  • Where are the scientists storing their research data?
  • Are the scientists satisfied with the process of storing their data during and beyond the life of research projects?
  • How many scientists share data and how much of their data do scientists share?
  • What are the attitudes toward sharing and reusing research data?
  • Do scientists use sound data management practices, such as creating data management plans, providing metadata, and preserving data for the long term?
  • What are the barriers and incentives for sharing research data?
  • How much support and training in data management do organizations offer?
  • How do attitudes and practices of data sharing differ by sub-discipline of science and primary work sector?

For the purposes of analysis in this paper, we defined good data practices as those that support the FAIR data principles and are likely to guard against data loss, help facilitate sharing with other researchers, and help facilitate long-term curation. Good data practices were assigned when respondents indicated that they store data in a repository. In contrast, mediocre data practices are operationally defined as practices that have the potential of making data findable with some additional effort, but do not obviously support FAIR principles, such as storing data in a personal cloud or on an institutional/departmental/PI’s server. Finally, bad data practices are defined as those that put data at risk and make it difficult to find or preserve, such as storing data on a flash drive, personal computer, or on paper.

Material and methods

We conducted a survey distributed in two waves with the assistance of the American Geophysical Union (AGU) and several partners and colleagues. The first wave of the survey was conducted in 2017–2018, when AGU distributed an email with a link to the instrument to all 62,000 of its members. An email reminder was sent out in August 2017, and the survey was closed in March 2018 with 1372 responses, for a response rate of approximately 2.2%.

The second wave was distributed between late 2017 and spring 2018 by a variety of organizations, including the Ecological Society of America, United States Geological Survey, Elsevier, Wiley, Baltic Association of Media Research, and LabArchives. The survey was also distributed by a number of colleagues in the Middle East, Eastern / Central Europe, and Eastern and Northern Europe. The survey link was disseminated both through email and by posting a link to the survey on Twitter. The second wave of the survey ended on May 11, 2018, with 812 responses. Combined, the two waves of the survey include 2184 responses. Survey data were collected in Qualtrics and housed on a secure server at the University of Tennessee. Researchers used the IBM SPSS 25 statistical analysis software package for data analysis.
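The response figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and are not part of the study's materials. No response rate is computed for the second wave, because the text gives no count of how many people received that invitation.

```python
# Figures reported in the text above.
AGU_MEMBERS_EMAILED = 62_000  # first-wave invitations sent by AGU
WAVE1_RESPONSES = 1_372
WAVE2_RESPONSES = 812


def response_rate(responses: int, invited: int) -> float:
    """Return the response rate as a percentage of those invited."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return 100.0 * responses / invited


wave1_rate = response_rate(WAVE1_RESPONSES, AGU_MEMBERS_EMAILED)
total_responses = WAVE1_RESPONSES + WAVE2_RESPONSES

print(f"Wave 1 response rate: {wave1_rate:.1f}%")
print(f"Combined responses: {total_responses}")
```

Rounded to one decimal place, the first-wave rate agrees with the approximately 2.2% reported, and the two waves sum to the 2184 combined responses.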

The study was approved by the University of Tennessee Institutional Review Board (IRB) as an online survey that did not gather personally identifiable information. Findings are reported in aggregate and do not contain any personally identifiable information. The informed consent agreement asked respondents to indicate that they understood the terms and were over 18 years of age. In compliance with the IRB approval for work with human subjects, respondents could skip any question or withdraw from the study at any time.

Research instrument

This survey was developed based on two previous surveys of scientists conducted by members of the DataONE Usability and Assessment working group [ 5 , 6 ]. For consistency, questions from previous surveys were reviewed and most previous questions were kept in the new survey. Some new questions and potential responses were added to reflect recent changes and developments in data management. Some questions that were less relevant were removed. The survey consisted of two parts: a demographics section and a section exploring data use and reuse, data storage, and data sharing.

In the section focused on demographics, respondents were asked about their primary sector of employment, primary subject discipline, type of research activity, primary place of employment, as well as the year of birth, highest degree attained, and their gender. We decided to use chronological age instead of career stage to be consistent with our previous studies and to see whether age has an influence on data attitudes or practices.

In order to explore a respondent’s practices and attitudes towards data sharing, use, and reuse, the survey included multiple questions focused on each of those topics. The survey consisted of questions asking participants to express the degree to which they agree or disagree with various statements, as well as several yes/no questions.

To discover the attitudes and practices of scientists regarding data, respondents were asked questions focused on various aspects of data management, including their current data practices, available support, practices of organizations and institutions that fund or employ them, data reuse, sharing and barriers, metadata, and institutional frameworks. In the event that the selection of possible responses did not accurately represent the respondents’ opinion, several questions included an option to select "Other" and write in a response.

Almost three-quarters of the respondents who answered the question regarding their primary work sector (see Fig 2) were employed in the academic sector (72.8%), followed by government (16.6%), non-profit (4.3%), commercial (3.6%), and other (2.7%).

Fig 2. https://doi.org/10.1371/journal.pone.0229003.g002

Our analysis combines the described disciplines into four categories. The largest group of respondents are in the physical sciences (43.3%), just over a quarter are in life sciences (26.3%), and 10.7% said their discipline was computer science and engineering (see Table 1). The rest of the respondents (19.8%) represent various other disciplines. For the full list of disciplines, see S1 Table.

Table 1. https://doi.org/10.1371/journal.pone.0229003.t001

Over a third of respondents said they primarily conduct field research (34.6%); over a quarter primarily engage in modeling (27.9%); and about a quarter conduct lab research (24.4%) (see Fig 3).

Fig 3. https://doi.org/10.1371/journal.pone.0229003.g003

Respondents identified 87 different countries when asked about the location of their primary place of employment, with 44.6% of respondents indicating their primary employment was in the United States. Only eight countries had over 50 respondents. Our analysis groups countries of employment into six regions. Almost half of the respondents were employed in U.S./Canada (47.1%), followed by Europe/Russia (24.7%), Asia/Southeast Asia (11.0%), Africa/Middle East (6.6%), South and Central America (5.9%), and Australia/New Zealand (4.8%) (see Fig 4). Analysis of the regional differences of this survey will be published in a forthcoming manuscript.

Fig 4. https://doi.org/10.1371/journal.pone.0229003.g004

The original year of birth variable was transformed into age, and our analysis groups ages into five clusters. Of the respondents who answered this question (n = 1984), 15.2% are 29 years old and under; 28% are 30–39 years old; 21.8% are 40–49 years old; 18.6% are 50–59 years old; and 16.3% are over 60.
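The transformation from year of birth to age group can be sketched as follows. This is an illustration only, not the authors' SPSS procedure: the function name, the default reference year of 2018, and the decision to place respondents aged exactly 60 in the oldest group are all assumptions, since the text does not specify them.

```python
def age_group(birth_year: int, reference_year: int = 2018) -> str:
    """Map a year of birth to one of the five age groups used in the analysis.

    reference_year is an assumption; the text does not state which year
    was used to convert year of birth into age.
    """
    age = reference_year - birth_year
    if age < 0:
        raise ValueError("birth_year is after reference_year")
    if age <= 29:
        return "29 and under"
    if age <= 39:
        return "30-39"
    if age <= 49:
        return "40-49"
    if age <= 59:
        return "50-59"
    # Assumption: age 60 itself is counted in the oldest group.
    return "60 and over"


# Example: group a few hypothetical birth years.
for year in (1995, 1985, 1950):
    print(year, age_group(year))
```

Because only the year of birth was collected, any such conversion is accurate to within one year, so respondents near a group boundary could fall on either side depending on the reference year chosen.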

Among those respondents who answered the gender question (n = 2003), 65.5% specified male, 32.2% specified female, and 2.3% preferred not to answer.

Of those respondents who answered the question “Does your primary funding agency require you to provide a data management plan?” (n = 1966), almost half (48.6%) said yes, just over a third said no (38.2%), and 13.2% did not know if their primary funding agency required a data management plan.

Current data practices

Using best practices when storing data is important both during and beyond the life of the project. Short-term storage was defined as “storing my data during the life of the project” while long-term was “storing my data beyond the life of the project.” Typically, data storage practices vary and may reflect convenience rather than consideration of sound storage practices.

Responses to the question “How much of your data do you currently store or deposit in the following locations?” indicated that a personal computer was the primary location for data storage (61.3% of respondents store all or most of their data on a personal computer), followed by the respondent’s institution’s server (42.9%) and USB/external drive (29.8%). This question did not distinguish between short-term and long-term storage; it referred to general data storage practices. Safer storage options such as cloud storage or repositories of various kinds are used far less often (Table 2).

Table 2. (This is the combined number of respondents who store most or all of their data in the specified location. Each response option was coded as a separate question, since respondents may store data in several locations simultaneously; the answer “Not sure” is omitted from the table.)

https://doi.org/10.1371/journal.pone.0229003.t002

To distinguish better data storage practices from worse ones, we regrouped the various data storage and deposit practices into three groups. We defined good data practices as storing data in a repository of some kind; mediocre data practices as storing data in the personal cloud, or on an institutional/departmental/PI’s server; and bad data practices as storing data on a flash drive, personal computer, or on paper. Fig 5 shows the number of responses for each data storage group and clearly shows that mediocre or bad data storage practices are much more common. Good data practices were the least popular option, with over half of the respondents indicating that they did not store any of their data in any kind of repository.

Fig 5. (The number is a sum of answers “Most data” and “All data.”)

https://doi.org/10.1371/journal.pone.0229003.g005
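The three-way grouping described above amounts to a lookup from storage location to practice tier, applied only to locations where a respondent keeps most or all of their data. The sketch below is one possible way to express it; the dictionary keys and function name are hypothetical labels chosen for illustration, not the survey's exact wording or the authors' coding scheme.

```python
from typing import Dict, Set

# Storage location -> practice tier, following the definitions in the text.
# Keys are hypothetical labels, not the survey's exact response options.
PRACTICE_TIER: Dict[str, str] = {
    "repository": "good",
    "personal_cloud": "mediocre",
    "institutional_server": "mediocre",  # institutional/departmental/PI's server
    "flash_drive": "bad",
    "personal_computer": "bad",
    "paper": "bad",
}

# Only these answers are counted, as in the note to Fig 5.
COUNTED_ANSWERS = {"Most data", "All data"}


def tiers_for_respondent(answers: Dict[str, str]) -> Set[str]:
    """Return the tiers in which a respondent stores most or all of their data.

    `answers` maps a storage location to the respondent's answer for it.
    One respondent can appear in several tiers, because each location
    was coded as a separate question.
    """
    return {
        PRACTICE_TIER[location]
        for location, answer in answers.items()
        if location in PRACTICE_TIER and answer in COUNTED_ANSWERS
    }


example = {
    "repository": "Not sure",
    "personal_computer": "All data",
    "personal_cloud": "Most data",
}
print(sorted(tiers_for_respondent(example)))  # ['bad', 'mediocre']
```

Returning a set rather than a single label reflects the point made in the note to Table 2: respondents may store data in several locations at once, so the tiers are not mutually exclusive.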

Data storage practices vary greatly across age groups (Chi-Square = 21.063; p < .001): the younger the respondents, the less likely they were to adhere to good data practices. While over a third of respondents who were fifty years or older demonstrated good data practices, the share was lower in each younger age group: under a quarter among respondents aged 40–49, 21.6% among those aged 30–39, and only 19.2% in the youngest age group (29 years old and under).
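For readers unfamiliar with the test reported here, the Pearson chi-square statistic compares observed cell counts in a contingency table with the counts expected if the two variables were independent. The sketch below computes the statistic and its degrees of freedom in plain Python. It is not the authors' SPSS analysis, and the counts in the example are hypothetical, not the survey data.

```python
from typing import List, Tuple


def pearson_chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    if not table or len({len(row) for row in table}) != 1:
        raise ValueError("table must be non-empty with rows of equal length")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (NOT the survey data): rows are age groups,
# columns are [good practice, not good practice].
hypothetical_counts = [
    [20, 80],
    [30, 70],
    [40, 60],
]
chi2, dof = pearson_chi_square(hypothetical_counts)
print(f"chi-square = {chi2:.3f}, df = {dof}")
```

Turning the statistic into a p-value requires the chi-square distribution with the given degrees of freedom; statistical packages handle this step, for example `scipy.stats.chi2_contingency` in SciPy or the Crosstabs procedure in SPSS.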

There are also disciplinary differences (Pearson Chi-Square = 52.949; p < .001) in stated data storage practices. In Marine Sciences nearly half (46.5%) of respondents report good data storage practices, followed by Space and Planetary Science (36.4%), Atmospheric Science (31%) and Environmental Science (28.3%). In the rest of the disciplines, the share of scientists who report good data storage practices is between a fifth and a quarter of the respondents. Biology is the last discipline on the list, with only 11.4% of biologists adhering to good data storage practices. Analysis by sector of employment suggests that governmental (33.5%) and non-profit employees (28.1%) are leaders in good data storage practices, followed by the commercial sector (23.7%), academia (21.9%), and other sectors (14.3%).

Even when researchers are not using good data storage practices, they seem satisfied with their own practices ( Fig 6 ). This mismatch between good practices and satisfaction may show that data storage is less important to them than data collection and analysis. Approximately three-quarters of respondents (74.5%) report they are satisfied with the process of storing their data during their project (short-term). However, only half (52.4%) are satisfied with the process of storing data beyond the life of the project (long-term).

(Locating suitable repository n = 1812; provenance information n = 1803; tools for preparing metadata n = 1808).

https://doi.org/10.1371/journal.pone.0229003.g006

Chi-square tests showed that satisfaction with short-term data storage had some disciplinary variation (Pearson Chi-Square = 72.280; p = 0.01). Disciplines where more than three-quarters of respondents expressed satisfaction with short-term storage included Marine/Ocean (79.1%), Psychology (77.3%), and Biology (76.3%). The lowest level of satisfaction was from respondents representing Other Sciences (58.7%), Information Science/Computer Science (59.6%) and Physical Science (59.8%).

Satisfaction with data storage beyond the life of a project also has disciplinary variation. Respondents with the highest level of satisfaction with long-term data storage were Marine/Ocean (55.8%), Biology (50.9%) and Agriculture and Natural Resources (52.1%). The least satisfied were respondents from Space and Planetary Science: less than a third of them (27.3%) said they were satisfied with the process of storing data beyond the life of a project.

Attitudes towards both short-term (Pearson Chi-Square = 68.926; p = .002) and long-term (Pearson Chi-Square = 52.175; p = .000) data storage show variation across the age groups. The older the participants, the more satisfaction they expressed with both short-term and long-term data storage ( Table 3 ).

https://doi.org/10.1371/journal.pone.0229003.t003

When researchers are seeking data to answer their research questions, over three-quarters rely on themselves or their colleagues and about two-thirds of respondents state that they search for existing data. However, most researchers are making this search without consulting a data manager or librarian.

Whether respondents consult with a data manager or librarian may reflect the availability of those resources. While the age of the respondents did not have a significant impact on their reaching out to a librarian or data manager for help acquiring data, a Chi-square test (Pearson Chi-Square = 12.283; p = .015) showed a statistically significant relationship between the sector of employment and the respondent’s habit of asking data managers for help. The highest prevalence of those who ask data managers for assistance is in the commercial sector (33.4%), followed by non-profit (21.5%) and the government (20.0%). The lowest percentage of respondents who reach out to data managers (only 17.3%) is found in the academic sector.

There are statistically significant variations by discipline with regard to working with both librarians (Pearson Chi-Square = 92.647; p = .000) and data managers (Pearson Chi-Square = 30.667; p = .004) in seeking new data. Respondents from the following disciplines were more accustomed to consulting with librarians: Information/Computer Science (35.8%), Other (34.0%), Engineering (33.6%), and Agriculture/Natural Resources (27.6%). In all other disciplines, less than a fifth of respondents asked librarians for assistance with data.

A similar disciplinary variation is evident in the willingness of the scientists to ask data managers for help: respondents representing Agriculture and Natural Resources (29.7%), Information Science (28.1%) and Engineering (25.9%) are most likely to consult with data managers. In all other disciplines, just 10 to 20 percent of respondents indicated they consult with data managers.

Data management support and practices.

The results of the survey highlighted the need for organizations to offer more formal training and assistance in data management to scientists, or to better publicize the support they do offer. Overall, only about a third of the respondents stated that their organizations provide any training or assistance.

When asked what kind of assistance or training is provided by the respondent’s organization or project, about a quarter to a third of respondents said they are provided with assistance in creating data management plans (33.3%); training on best practices in data management (31.3%); assistance on creating metadata to describe data or datasets (27.6%); and training on data citation (27.6%). There were differences amongst work sectors in both the types and extent of training and assistance in data management provided by organizations ( Table 4 ). Government sector respondents indicated the highest rates of provided training and assistance, while the employees of the academic sector reported the lowest rates. With respect to training on best practices, assistance creating data management plans, and help developing metadata, the correlation to work sector was statistically significant. While there was some variation among different sectors on training for data citation, it was not statistically significant.

https://doi.org/10.1371/journal.pone.0229003.t004

There were noteworthy disciplinary differences regarding training and assistance in data management provided by the organization (see Table 5 ).

https://doi.org/10.1371/journal.pone.0229003.t005

Researchers’ levels of satisfaction with tools and practices associated with data management were low: only about a third of the respondents expressed satisfaction with the data tools and practices listed in Fig 6 .

There was some work sector variation in respondents’ satisfaction with tools and practices ( Table 6 ). Government employees were most satisfied with tools for preparing metadata (29.8%), while respondents from the commercial sector were the least satisfied with those tools (17.1%). In all other sectors, approximately a quarter of respondents expressed satisfaction with the metadata tools.

https://doi.org/10.1371/journal.pone.0229003.t006

Levels of satisfaction with the ability to track and verify provenance information (Chi-Square = 24.456; p = .002), as well as the ease of locating a suitable repository for the deposit of data, also varied among work sectors. Here, again, governmental employees showed the highest level of satisfaction of all work sectors, while respondents from the non-profit sector expressed the least satisfaction.

Satisfaction with available metadata tools also varied significantly by the primary subject discipline of the respondents. Respondents who indicated that their primary discipline was Space and Planetary Science (36.4%), Atmospheric Science (32.3%), and Hydrology (27.3%) expressed a higher degree of satisfaction with the tools for preparing metadata, while those in Geology (21.8%) and Biology (15.8%) expressed the least satisfaction with the metadata tools (Chi-Square = 59.157; p = .000).

Respondents’ ability to easily locate a suitable repository also differed across disciplines (Chi-Square = 62.686; p = .000). Respondents representing Space and Planetary Science (45.5%), Information/Computer Science (40.4%), and Geology/Earth Science (38.9%) stated that it was easy for them to locate a data repository, while for the scientists in Agriculture and Natural Resources (28.7%), Biology (27.2%) and Other Sciences (25.5%) it seemed to be a more laborious task.

Data sharing and reuse.

The idea of using data produced by other researchers was viewed positively by the vast majority of respondents: 87% said they would use such data if it were easily available. Respondents were also enthusiastic about sharing their own data: 86.7% said they are willing to share data across a broad group of researchers. There was only limited variation between sectors of employment: respondents from the commercial sector were less willing to share their data (72.1%), while in all other sectors the number of those willing to share ranged from 86.7% to 89.6%. Fig 7 shows the disciplinary variation in the willingness of respondents to share their data:

https://doi.org/10.1371/journal.pone.0229003.g007

Respondents representing Environmental Science (96.1%), Space and Planetary Science (94.4%), and Information/Computer Science (92.5%) had the most positive attitudes towards sharing their data. Three-quarters of the respondents (77.3%) said that they would be willing to place at least some of their data into a central data repository with no restrictions ( Table 7 ). Respondents were more hesitant about the idea of sharing all of their data: less than half of respondents (44.5%) said they would be willing to share all of their data with no restrictions. Around half of respondents (56.4%) said that they would be more willing to share data if they could place some conditions on access. At the same time, for the vast majority of respondents (92.1%) getting cited by users of their datasets was important.

(Answers “Agree Somewhat” and “Agree Strongly” combined).

https://doi.org/10.1371/journal.pone.0229003.t007

Lack of access to data generated by others was seen by most respondents (74.6%) as a major impediment to science, but only half (50.5%) thought it affects their own personal ability to answer scientific questions ( Table 8 ).

https://doi.org/10.1371/journal.pone.0229003.t008

When asked what would increase their confidence in using data collected by others, the vast majority (82.1%) of respondents thought it most important to see written details about collection and quality assurance methods accompanying the data, followed by explicitly stated metadata standards (69.1%) and detailed information about the provenance (60.9%) ( Table 9 ).

https://doi.org/10.1371/journal.pone.0229003.t009

Over a third of respondents (38.3%) report (see Fig 8 ) that they are regular users of data collected by others. We define “regular users” as those who answered “Always” or “Frequently” to the question about use of data collected by others.

https://doi.org/10.1371/journal.pone.0229003.g008
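The "regular users" definition above amounts to collapsing a frequency scale into two categories. A minimal sketch follows, assuming hypothetical scale labels other than "Always" and "Frequently" (which are the only two named in the text) and assuming skipped questions are recorded as `None`; the function name `share_regular_users` is also made up for the example.

```python
# The two answers that define a "regular user" in the text.
REGULAR_ANSWERS = {"Always", "Frequently"}


def share_regular_users(responses):
    """Return the percentage of answered responses that are Always/Frequently.

    Skipped questions (None) are excluded from the denominator, since
    respondents were allowed to skip any question.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return 0.0
    regular = sum(1 for r in answered if r in REGULAR_ANSWERS)
    return 100.0 * regular / len(answered)


# Invented answers, not survey data; "Occasionally", "Rarely" and "Never"
# are assumed labels for the rest of the scale.
sample = ["Always", "Frequently", "Occasionally", "Rarely", "Never", None]
print(f"{share_regular_users(sample):.1f}% regular users")  # 40.0% regular users
```

The same top-two-answers rule is used elsewhere in the paper, for example where "Agree Somewhat" and "Agree Strongly" are combined.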

There is, however, variation in the use of data collected by others, based on work sector and discipline. Government and commercial employees seemed to be using data collected by others more frequently; 59.1% and 50.1%, respectively, were regular users of such data. In the non-profit and academic sectors only about a third of respondents (36.2% and 34.8%) were regularly using data that they did not collect.

Responses to this question also varied by the primary subject discipline (Chi-Square = 87.461; p = .000). Respondents who indicated the highest rates (Always or Frequently) of regular use of data generated by others appear to be clustered in four disciplines: Atmospheric Science (50.4%), Marine/Ocean (46.5%), Space and Planetary Science (45.5%), and Hydrology (44.0%). In the next category of disciplines, about a third of respondents used data collected by someone outside of their research team on a regular basis: Physical Sciences (37.1%), Geology/Earth Science (33.1%), Information/Computer Science (32.6%), Environmental Science/Ecology (32.2%), and Agriculture and Natural Resources (27.7%). In the last category, less than a quarter of respondents were regular users of others’ data, including Engineering (23.6%), Other (22.0%), Biology (16.7%) and Psychology (13.6%).

Barriers to data sharing.

A multiple-response question asked researchers to select from a list of possible reasons why all or part of their data might not be available to others. For those who acknowledged that at least some of their data were not available, the reasons most commonly selected included the need to publish first; insufficient time to make the data available; lack of rights to make the data public; and the lack of funding ( Fig 9 ). These barriers to data sharing are similar to those observed in earlier studies [ 5 , 6 , 7 ].

https://doi.org/10.1371/journal.pone.0229003.g009
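Because this was a multiple-response question, each respondent can contribute to several reasons at once, so the percentages are computed per reason against the number of respondents and can sum to more than 100. The sketch below illustrates that tallying logic; the shortened reason labels, the sample answers, and the function name `tally_multi_response` are assumptions for the example, not the survey instrument or its data.

```python
from collections import Counter


def tally_multi_response(selections):
    """Return {reason: percent of respondents who selected it}.

    `selections` holds one collection of chosen reasons per respondent.
    Each respondent is counted at most once per reason.
    """
    n = len(selections)
    if n == 0:
        return {}
    counts = Counter(
        reason for chosen in selections for reason in set(chosen)
    )
    return {reason: 100.0 * count / n for reason, count in counts.items()}


# Invented answers from four respondents, not survey data.
answers = [
    {"need to publish first", "insufficient time"},
    {"need to publish first"},
    {"lack of rights", "lack of funding", "insufficient time"},
    {"need to publish first", "lack of funding"},
]
for reason, percent in sorted(tally_multi_response(answers).items()):
    print(f"{reason}: {percent:.1f}%")
```

With these made-up answers the four percentages add up to 200%, which is expected for a question where several options can be ticked.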

More than a third of respondents (36.4%) said they use some metadata standard to describe their data. The survey instrument presented respondents with a list of many commonly known metadata standards, plus an “Other” option, and asked them to identify any that they used; responses are listed in Fig 10 . Significantly, almost an equal number indicated they use no metadata standard (“None”) (35.9%), and nearly a quarter said they use "Metadata standardized within my institution/lab" (24%).

https://doi.org/10.1371/journal.pone.0229003.g010

The use of a metadata standard differed significantly by sector of employment (Chi-Square = 53.532; p = .000). Government employees are most likely to use some metadata standard (54.9%), followed by respondents working in the commercial sector (43.4%). In the academic, non-profit and other sectors, only about a third of respondents were using a specified metadata standard.

Institutional framework.

Ensuring good practices in data management is an increasingly important priority for many funding agencies [ 37 , 38 , 39 ]. Almost half of the respondents (48.6%) reported that their primary funding agency requires a data management plan, while 13.2% are not sure whether a data management plan is required.

Significantly, some disciplines appear to be much more likely to require data management plans from their scientists (Chi-Square value = 63.41; p = 0.000). Respondents representing Space and Planetary Science ranked number one in being required by their primary agency to provide a data management plan (71.4%), followed closely by Marine/Ocean Sciences (64.1%). For all results based on subject discipline, see Table 10 .

(n = 1966).

https://doi.org/10.1371/journal.pone.0229003.t010

Over a third of respondents (36.9%) said that their organization had a formal process for managing data in the short term, while almost half of the respondents (45.0%) said their organization did not have one. The situation is very similar with long-term data storage: 38.2% report that their organization had a formal process for storing data beyond the life of the project, while 39.9% did not have such a process.

Responses indicate significant differences among work sectors in the organization’s involvement in management and data storage. Non-profit (62.3%) and commercial (60.7%) sectors are leaders in short-term data management (Pearson Chi-Square = 93.106, p = .000), followed by the government sector (52.5%). According to respondents, academia is the least involved at an organizational level in short-term data management: only about a third of respondents who work in an academic setting say that their organizations have a formal process for data management. With respect to establishing processes for long-term data storage, the government sector is leading (61.8%), followed by commercial (54.1%) and non-profit (51.4%). Again, the academic sector seemed to be the least involved in processes addressing long-term data storage, with only 31.3% of respondents stating that their employers have a formal process for storing data beyond the life of the project. Those respondents whose organizations have a formal process for short-term data management or long-term data storage report that the following actors/units were involved in data management ( Table 11 ).

https://doi.org/10.1371/journal.pone.0229003.t011

Responses indicated a fairly even distribution of responsibility for data management and storage activities across IT units, colleagues, and data managers. There are some differences in what units/actors are involved in managing or storing data during or beyond the life of the project by the respondent’s primary sector of employment ( Table 12 ). For example, in the academic sector, most respondents named “Research support unit(s)” (15.8%), followed by “Colleagues in my own unit/department” (14.8%), and “IT units” (14.5%). In the government sector, data management and storage appear to be more structured and organized: the most popular option among respondents employed by governments indicates allocation of responsibilities to "Designated data managers" (40.2%), followed by “IT departments” (32.9%), “Colleagues in my own unit/department” (30.9%), and “Research support unit(s)” (24.6%). In the commercial sector, the most popular answers are “IT units” (31.6%), “Colleagues in my own unit/department” (30.3%), and “Designated data manager(s)” (25.0%).

https://doi.org/10.1371/journal.pone.0229003.t012

Limitations.

The two-wave distribution method led to data being gathered over nine months, which is not optimal but which still provides value since it fell within one academic year. We cannot calculate a total response rate for the second wave of the survey because it was distributed to professional organizations and individual researchers and we do not know how many people received or saw the invitation for the survey.

Due to the IRB requirements, the respondents could skip any question or stop the survey at any time, so the number of responses to each question may differ. The length of the survey, estimated to take about twenty minutes to complete, could have potentially contributed to survey fatigue. In addition, since we rely on a volunteer sample, self-selection bias may have occurred, stemming from the fact that people who were knowledgeable and had an opinion about data sharing could have been more likely to take the survey or conversely, they may have felt that there was no need to answer this survey since these practices are ingrained in their work habits. Since the results of the survey were self-reported, we assume that participants responded truthfully and to the best of their ability.

Discussion and conclusions

Data use/data storage: researchers store their data in a variety of places, representing good, mediocre and bad data practices.

Most researchers report they store their data on their personal computer (60.3%), departmental servers (42.9%) or USB drives (29.8%), but storage options associated with good data practices are also being used to a lesser extent. Among these, the most popular options are other data repositories or archives (16.6%), institutional repositories (16.5%), discipline-based repositories (5.6%), and publisher or publisher-related repositories (4.6%). Governmental and non-profit employees seem to be the leaders in good data practices, while academics and employees of the commercial sector are lagging behind. Age also plays a role: adherence to good data storage practices increased with each age group, from 19.2% in the youngest age group to 31.1% in the oldest.

Respondents appear to be satisfied with short-term storage solutions that provide them with more physical proximity to their data during the collection and analysis phases of their project; however, only half of them are satisfied with available mechanisms for storing data beyond the life of the project. If data resources are to be stored and preserved for the long-term, organizations need to provide easier access to long-term data storage solutions, and training and assistance to researchers on long-term data management.

Survey responses indicate that the available data management assistance to researchers is often inadequate or not known to them. Respondents in Information Science, Engineering, and Agriculture/Natural Resources disciplines appear to be more cognizant of available resources and engage them most frequently to support their data management needs. Approximately a third of respondents from these three disciplines say they consult with data managers or librarians; significantly, responses from researchers in all other disciplines indicate low rates of engagement—between ten and twenty percent—of professionals to assist with their data management needs. Further research would be necessary to determine if these distinctions are the result of broader availability of data management services in specific disciplines, or if the researchers in certain disciplines are simply more aware of the existence of such services.

Researchers employed in the government sector are most likely to work with data managers, perhaps because of open data mandates and requirements for adherence to established data management practices in government agencies. Somewhat confoundingly, data from this survey indicates that researchers in the academic sector are least likely to ask data managers or librarians for assistance in finding data for their research, even as they are the sector most likely to have these institutionalized services available to them. Although librarians and data managers may be available, most researchers in academia are unaware of assistance or do not take advantage of such assistance.

Data reuse, sharing and barriers

The vast majority of respondents have a positive attitude towards sharing data and data reuse. Over a third of respondents use data collected by others in their research on a regular basis (most of those respondents representing the governmental and commercial sectors), while 87% would use data collected by someone else if it could be easily accessed. According to respondents, written details on collection and quality assurance methods, explicitly stated metadata standards and detailed provenance information are the most important criteria influencing their confidence in using data collected by others.

The idea of data sharing is seen in a positive light, with 86.7% saying they would be willing to share their data across a broad group of researchers. More than half of respondents (56.4%) would be more willing to share if they could place some conditions on use by those reusing their data. A citation is an almost universal requirement: 92.1% of respondents said it was important for them to receive citation credit from those who would use their data. The need to publish first was reported as the main barrier to sharing data, followed by the lack of rights to data, time to properly prepare data for sharing, and funding restrictions limiting their ability to prepare and deposit the data.

A vast majority of respondents feel that the lack of access to data generated by other researchers or institutions was a major impediment to progress in science at large, yet only about a half of them thought that it restricted their own ability to answer scientific questions. This discrepancy between the perceived effects on others and the respondents themselves could be viewed as a variation of the third person effect theory and could be further explored in future research.

Institutional framework/data management support

Data management plans are now required by funding agencies in many countries [ 37 , 38 ]. Around half of the respondents say their funders require a data management plan, most of them in the US and Europe; within research communities, the Space and Planetary Science and the Marine Science fields appear to be strong proponents of data management plans. Regarding a formal process for data management, non-profit and commercial sectors are leading (with over 60% of respondents indicating that their organizations have a formal process), while the academic sector seemed to be least likely to have institutionalized processes. Designated data managers and data librarians can promulgate awareness of requirements and assist researchers in preparing good data management plans.

Across all sectors, respondents perceive libraries as less involved than other units in their organizations in providing data management support services. Either many libraries are not providing these services, or their existing services are not appropriately communicated to researchers. Employees of government institutions indicate stronger support for organizing data management and storage by designated professional employees, including data managers, research support units, and IT units. Responses from the government sector indicated the highest levels of available data management assistance and training to employees, while respondents from the academic sector observed the lowest rates of available training and assistance. Clearly there is a need to do more training and assistance in the academic sector, or to more fully communicate the availability of existing services.

Satisfaction with the tools and practices associated with data management seems to be low: only about a third of respondents express satisfaction with tools for preparing metadata and their ability to track and verify provenance information. Access to appropriate repositories also seems to be an issue: only 37.4% of respondents say it is easy for them to locate a suitable repository for deposit of data.

Slightly more than a third of respondents use some kind of commonly recognized metadata standard, while another third say they use no standard, and about a quarter say they use a “metadata standardized within my institution/lab.” The government and commercial sectors are leaders in using a metadata standard, with over half of government respondents reporting use of such standards. Low rates of uptake of widely used metadata standards are a concern, both for data discovery and for the necessary levels of description that are vital to understanding [ 39 ]. Data managers, data librarians, and affordable metadata tools should help researchers with metadata creation. It is unrealistic to expect every researcher to be a metadata expert, but search and retrieval of data sets rely on complete and accurate metadata, and understanding of data for responsible reuse requires comprehensive details about methodologies, data structure and processing, and provenance. There is more work to be done in making this happen.

Conclusions

The results of this study align with other research projects that examine the attitudes and behavior of scientists with regard to the use, storage, and reuse of data. For example, our results show trends similar to those found by researchers from Leiden University’s Centre for Science and Technology Studies and Elsevier, who used a combination of qualitative and quantitative methods to examine motivations for and barriers to data sharing by researchers [ 40 ]. Like that study, our survey highlights the existing gap between positive attitudes towards open data and open science and researchers’ actual implementation of good data practices. Both surveys also demonstrated higher levels of good data practices by researchers working in scientific fields that do not deal with human subjects and where a significant amount of data is gathered by large-scale instrumentation shared by a number of researchers and research teams.

Since most questions of this survey closely emulated those from two previous studies conducted in 2011 and 2015 by DataONE Usability and Assessment group [ 5 , 6 ], we are able to discuss the general direction of the changes in attitudes and behaviors of the global scientific community. The progress in moving towards open science is reflected in the growing acceptance of the concepts crucial for open science, specifically the concepts of data sharing and reuse. Over the past decade, the number of respondents who said that they are willing to share their data with other researchers has increased. At the same time, the results of the survey demonstrate a growing acceptance of data reuse.

In addition to the increase in the number of scientists who hold positive attitudes towards data sharing and reuse, there is also a change in behavior: more respondents display good data practices, reuse data, and share their data. The use and understanding of metadata is another indicator of the progress that a scientific community is making in accepting and following good data practices. Similarly, the number of respondents who acknowledge various barriers that prevent them from sharing their data has been steadily decreasing over the examined time frame.

Over three waves of the survey, there is a noticeable positive dynamic in terms of organizational involvement in data sharing, use, and reuse. The number of respondents who stated that their organizations require data management plans is on the rise, and more say their organizations offer training and assistance in issues related to data management. Although the reasons behind this dynamic are outside of the framework of this study, we can hypothesize that general acceptance of the importance of open data and open science by various stakeholders in the research process, including publishers, governments, and funding agencies, has a positive effect on organizations becoming more involved and more helpful with data management.

Finally, it is important to acknowledge the differences in data sharing, use, and reuse between respondents employed by four work sectors. The governmental sector emerged as a leader in positive attitudes, good data practices, and organizational involvement in data training and management. Examination of several important indicators, such as willingness to share and reuse data, displaying good data practices, and growing organizational involvement in data management, shows the governmental sector to be the frontrunner in accepting and implementing data management practices.

The results of this study can provide additional insights into what aspects of the implementation of FAIR principles to promote open data require extra attention and effort from stakeholders. For example, ensuring best data practices through Data Management Plans is listed among fifteen priority recommendations by the European Commission [ 23 ], but only about half of the respondents currently indicate that their primary funding agency requires a data management plan.

Our international study of the data management practices and attitudes of scientists shows there is variation in data practices based on the work sector, subject discipline, and, sometimes, the age of researchers. Although attitudes towards data sharing and data use and reuse are mostly positive, practice does not always support data storage, sharing, and use in the future. While there is noticeable progress in moving toward open data and open science, there still is a discrepancy between positive attitudes and actual implementation of those principles by the scientific community. Goodwill and positive attitudes, however, suggest that with stronger organizational involvement in providing training and support of good data practices, there is potential for major positive changes. Assistance from data managers or data librarians, readily available data repositories for both long-term and short-term storage, and educational programs to engender good data practices are clearly needed.

Supporting information

S1 Table. Primary subject discipline.

https://doi.org/10.1371/journal.pone.0229003.s001

https://doi.org/10.1371/journal.pone.0229003.s002

Acknowledgments

We would like to thank Bill Michener, Principal Investigator for DataONE, the University of New Mexico; Brooks Hanson, Executive Vice President for Science for the American Geophysical Union; Mike Frame and Lisa Zolly at U.S. Geological Survey, and colleagues at the ESA (Ecological Society of America), USGS (United States Geological Survey), Elsevier, and Wiley.

Also, we would like to thank Matt Dunie of LabArchives, Jim Malone of ORAU, Dr. Alison Specht of the Centre for the Synthesis and Analysis of Biodiversity, Dr. Epp Lauk of the University of Jyväskylä, Dr. Aukse Balcytiene of the Vytautas Magnus University for distributing the survey to their colleagues.

  • 1. Critchlow T, Van Dam KK. Data-Intensive Science. Chapman and Hall/CRC; 2016 Apr 19.
  • 2. Tansley S, Tolle KM. The fourth paradigm: data-intensive scientific discovery. Hey AJ, editor. Redmond, WA: Microsoft Research; 2009 Oct.
  • 3. UNESCO. Open Science Movement. 2018.
  • 4. White House. Bill Announcement. January 14, 2019.
  • 10. Association of European Research Libraries. Implementing FAIR Data Principles: The Role of Libraries. 2017.
  • 11. Ayris P, Bernal I, Cavalli V, Dorch B, Frey J, Hallik M, et al. LIBER Open Science Roadmap.
  • 13. Key E, Davis R, Lee T, Samors RJ. International Cooperation Toward FAIR Principles and Open Data in Transdisciplinary Research. In AGU Fall Meeting Abstracts; 2018 Dec.
  • 14. AGU coalition receives a grant to advance open and FAIR data standards in the Earth and Space Sciences. AGU; 2017 August 28.
  • 16. Corti L, Van den Eynden V, Bishop L, Woollard M. Managing and sharing research data: a guide to good practice. Sage; 2014 Mar 1.
  • 18. Sheenan J. Increasing Access to the Results of Federally Funded Science. 2016.
  • 19. National Science Foundation. Proposal & Award Policies & Procedures Guide (PAPPG). 2018.
  • 20. U.S.G.S. Public Access to Results of Federally Funded Research at the U.S. Geological Survey. 2019.
  • 21. European Commission. European Commission Open Data Policy. 2019.
  • 22. European Commission. European Data Portal FAQ. 2018.
  • 23. European Commission. Turning FAIR into reality. 2018.
  • 24. Levesque RJ. Data sharing mandates, developmental science, and responsibly supporting authors.
  • 25. UCSF Libraries. Data Sharing & Data Management. 2018.
  • 26. PLOS. (2019). Data availability. Retrieved from https://journals.plos.org/plosone/s/data-availability
  • 28. Downey AS, Olson S, editors. Sharing clinical research data: workshop summary. National Academies Press; 2013 Jun 7.
  • 36. Kim J, Schuler ER, Pechenina A. Predictors of Data Sharing and Reuse Behavior in Academic Communities. In Knowledge Discovery and Data Design Innovation: Proceedings of the International Conference on Knowledge Management (ICKM 2017); 2017 Oct 19 (Vol. 14, p. 1). World Scientific.
  • 39. Jahnke L, Asher A, Keralis SD. The problem of data. 2012.
  • 40. Berghmans S, Cousijn H, Deakin G, Meijer I, Mulligan A, Plume A, et al. Open data: The researcher perspective. 2017.

  • Open access
  • Published: 27 July 2021

Data sharing practices and data availability upon request differ across scientific disciplines

  • Leho Tedersoo 1 , 2 ,
  • Rainer Küngas 1 ,
  • Ester Oras 1 , 3 , 4 ,
  • Kajar Köster   ORCID: orcid.org/0000-0003-1988-5788 1 , 5 ,
  • Helen Eenmaa 1 , 6 ,
  • Äli Leijen 1 , 7 ,
  • Margus Pedaste 7 ,
  • Marju Raju 1 , 8 ,
  • Anastasiya Astapova 1 , 9 ,
  • Heli Lukner 1 , 10 ,
  • Karin Kogermann 1 , 11 &
  • Tuul Sepp   ORCID: orcid.org/0000-0002-8677-7069 1 , 12  

Scientific Data volume 8, Article number: 192 (2021)

  • Genetic databases
  • Molecular ecology

Data sharing is one of the cornerstones of modern science that enables large-scale analyses and reproducibility. We evaluated data availability in research articles across nine disciplines in Nature and Science magazines and recorded corresponding authors’ concerns, requests and reasons for declining data sharing. Although data sharing has improved in the last decade and particularly in recent years, data availability and willingness to share data still differ greatly among disciplines. We observed that statements of data availability upon (reasonable) request are inefficient and should not be allowed by journals. To improve data sharing at the time of manuscript acceptance, researchers should be better motivated to release their data with real benefits such as recognition, or bonus points in grant and job applications. We recommend that data management costs should be covered by funding agencies; publicly available research data ought to be included in the evaluation of applications; and surveillance of data sharing should be enforced by both academic publishers and funders. These cross-discipline survey data are available from the plutoF repository.

Introduction

Technological advances and the accumulation of case studies have led many research fields into the era of ‘big data’: the possibility to integrate data from various sources for secondary analysis, e.g. meta-studies and meta-analyses [1, 2]. Nearly half of researchers commonly use data generated by other scientists [3]. Data sharing is a scientific norm and an important part of research ethics in all disciplines, and it is increasingly endorsed by publishers, funders and the scientific community [4, 5, 6]. Despite decades of argumentation [7], much of the published data is still essentially unavailable for integration into secondary data analysis and evaluation of reproducibility, a proxy for reliability [8, 9, 10]. Furthermore, the deposited data may be incomplete, sometimes intentionally [11, 12, 13, 14], e.g. when they exhibit mismatching sample codes or lack important metadata such as the sex and age of studied organisms in biological and social sciences.

Although the vast majority of researchers prefer data sharing [12, 15], scientists tend to be concerned about losing their priority in future publishing and about potential commercial use of their work without their consent or participation [12, 16, 17]. Researchers working on human subjects may be bound by legal agreements not to reveal sensitive data [16, 18]. Across research fields, papers indicating available data are cited on average 25% more [19]. In research using microarrays, papers with access to raw data accumulate on average 69% more citations compared with other articles [20]. Unfortunately, a higher citation rate has not motivated many researchers enough to release their data, although referees and funding agencies account for bibliometrics when evaluating researchers and their proposals [21]. Multiple case studies have revealed high variation in data availability in different journals and disciplines, ranging from 9 to 76% [8, 11, 13, 19, 22, 23, 24]. Based on previous research, data requests to authors are successful in 27–59% of cases, whereas the request is ignored in 14–41% of cases [10, 25, 26, 27, 28]. To promote access to data, many journals have implemented mandatory data availability statements and require data storage in supplementary materials or specific databases [29, 30]. Because of poor enforcement, these measures have not always guaranteed access to published data: links break, metadata are lacking, or authors are unwilling to share upon request [8, 26].

This study aims to map and evaluate cross-disciplinary differences in data sharing, authors’ concerns and reasons for denying access to data, and whether these decisions are reflected in article citations (Fig. 1). We selected the scholarly articles published in the journals Nature and Science because of their multidisciplinary contents, stringent data availability policies outlined in authors’ instructions, and high-impact conclusions derived from data of exceptional size, accuracy and/or novelty. We hypothesised that in spite of overall improvement in data sharing culture, the actual data availability and reasons for declining the requests to share data depend on scientific disciplines because of field-specific ‘traditions’, ‘sensitivity’ of data, or their economic potential. Our broader goal is to improve data sharing principles and policies among authors, academic publishers and research foundations.

Figure 1. Schematic rationale of the study.

Initial and final data availability

We evaluated the availability of the most critical data in 875 articles across nine scientific disciplines (Table S1) published in Nature and Science over two 10-year intervals (2000–2009 and 2010–2019) and, where these data were not available for access, we contacted the authors. The initial (pre-contacting) full and at least partial data availability averaged 54.2% (range across disciplines, 33.0–82.8%) and 71.8% (40.4–100.0%), respectively. Stepwise logistic regression models revealed that initial data availability differed by research field, type of data, journal and publishing period (no vs. full availability: n = 721; Somers’ D = 0.676; model R² = 0.476; P < 0.001). According to the best model (Table S2), the data were less readily available in materials for energy and catalysis (W = 68.0; β = −1.52 ± 0.19; P < 0.001), psychology (W = 55.6; β = −1.11 ± 0.15; P < 0.001), optics and photonics (W = 18.8; β = −0.59 ± 0.14; P < 0.001) and forestry (W = 9.8; β = −0.52 ± 0.19; P = 0.002) compared with other disciplines, especially humanities (Fig. 2). Data availability was relatively lower in the period of 2000–2009 (W = 82.5; β = −0.57 ± 0.10; P < 0.001) and when the most important data were in the form of a dataset (relative to image/video and model; W = 41.5; β = −1.23 ± 0.19; P < 0.001; Fig. 3). Relatively less data were available for Nature (W = 32.7; β = −0.57 ± 0.19; P < 0.001), with striking several-fold differences in optics and photonics (Fig. 2).

Figure 2. Differences in partial (grey) and full (black) data availability among disciplines depending on journal and publishing period (P1, 2000–2009; P2, 2010–2019) before contacting the authors (n = 875). Letters above bars indicate statistically significant difference groups among disciplines in full data availability compared to no data availability. Asterisks show significant differences in full data availability between journals and publishing periods.

Figure 3. Types of critical data (n = 875). (a) Distribution of data types among disciplines (blue, dataset; purple, image; black, model); (b) partial (light shades) and full (dark shades) data availability among disciplines depending on the type of critical data (DS, dataset; Img, image; Mod, model) before contacting the author(s).

Upon contacting the authors of 310 papers, the overall data availability improved by 35.0%. Full and at least partial availability averaged 69.5% (range across disciplines, 57.0–87.9%) and 83.2% (64.9–100.0%), respectively (Fig. 4), 60 days after contacting, a reasonable time frame [4]. The final data availability (after contacting the authors) was best predicted by scientific discipline, data type and time lapse since publishing (no vs. full availability: n = 580; D = 0.659; model R² = 0.336; P_adj < 0.001; Table S2) but with no major changes in the ranking of disciplines or data types compared with the initial data availability (Fig. 4). It took a median of 15 days to receive data from the authors (Fig. 5), with a minimum time of 13 minutes. Four authors sent their data after the 60-day period since the initial request (max. 107 days). The rate of receiving data was unrelated to any studied parameter.

Figure 4. Differences in partial (grey) and full (black) data availability among disciplines after data requests (n = 672) depending on the type of critical data (DS, dataset; image; model) and publishing period (P1, 2000–2009; P2, 2010–2019). Numbers above bars indicate statistically significant difference groups among disciplines in full data availability.

Figure 5. Histogram of time for receiving data from authors upon request within the 60-day reasonable time period (blue bars) and beyond (purple bar; data excluded from analyses; n = 199 requests). Note the base-2 logarithmic scale until 60 days.

Authors’ responses to data requests

The data were obtained from the authors in 39.4% of data requests on average, with a range of 27.9–56.1% among research fields. The likelihood of receiving data, or of the request being declined or ignored, depended mostly on the time period and field of research. According to the best model (n = 310; D = 0.300; model R² = 0.106; P_adj < 0.001; Table S2), the data were obtained less frequently for the earlier time period (29.4% vs. 56.0%; W = 20.4; β = 0.56 ± 0.12; P_adj < 0.001). Receiving data upon request tended to be lowest in the field of forestry (W = 3.6; β = −0.31 ± 0.16; P_adj = 0.177), especially when compared with microbiology (Fig. 2).

Declined data requests averaged 19.4%, and this rate differed most strongly among the research fields. The best model (n = 310; D = 0.508; model R² = 0.221; P_adj < 0.001) revealed that the data were most likely to be withheld upon request in the fields of social sciences (W = 24.3; β = −1.09 ± 0.22; P_adj < 0.001), psychology (W = 20.0; β = −0.73 ± 0.20; P_adj < 0.001) and humanities (W = 5.0; β = −0.67 ± 0.30; P_adj = 0.078) compared with natural sciences (Fig. 2). Furthermore, the data request was more likely to be declined when the data complexity was high (W = 9.8; β = −0.59 ± 0.19; P_adj = 0.005), when the paper was not open access in ISI Web of Science (W = 4.0; β = −0.37 ± 0.18; P_adj = 0.132) and when it was published in Science rather than Nature (W = 4.6; β = −0.35 ± 0.16; P_adj = 0.096), although the latter two effects are non-significant when accounting for multiple testing.

We received no response to 41.3% of our data requests, including two biweekly reminders. Responding to the data request differed most strongly among scientific disciplines and time periods (Fig. 6). Altogether 49.0% and 28.9% of requests were ignored by the authors of earlier (2000–2009) and later (2010–2019) papers, respectively. According to the best model (n = 310; D = 0.429; model R² = 0.200; P_adj < 0.001; Table S2), articles from the earlier time period (W = 9.3; β = 0.41 ± 0.13; P_adj = 0.007) and the fields of forestry (W = 13.4; β = −0.57 ± 0.16; P_adj < 0.001) and ecology (W = 7.0; β = −0.53 ± 0.20; P_adj = 0.024) had the greatest likelihood of no response, whereas social scientists (W = 7.7; β = 0.87 ± 0.31; P_adj = 0.016) answered most frequently.

Figure 6. Authors’ response to data request (n = 199) depending on discipline (blue, declined; orange, ignored; purple, obtained). Bars indicate 95% CI of Sison and Glaz [51]. Letters above bars indicate statistically significant difference groups in frequency of data availability by each category based on Tukey post-hoc test and Bonferroni correction.

In general, there was no residual effect of time since publication when the publication period was included in the best model. Within the 2010–2019 period, we specifically tested whether the authors publishing in 2018 and 2019 were less likely to share their data because of potential conflicting publishing interests. This hypothesis was not supported, and a non-significant reverse trend was observed: the proportion of data obtained from the authors increased from 44% in 2010–2017 to 63% in 2018–2019. Accounting for time since publishing across the entire survey period, the data availability upon request decayed at a rate of 5.9% per year based on an exponential model. This estimate was marginally higher than the 3.5% annual loss of publicly available data (Fig. 7). The number of articles was insufficient to test differences in data decay rates among disciplines.
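To illustrate what annual loss rates of 3.5% and 5.9% imply over a 10- or 20-year window, here is a minimal sketch. It assumes a constant proportional loss compounded yearly, i.e. a remaining fraction of (1 − r)^t; the text does not state whether the exponential model was parameterised this way or in continuous form, so the numbers are indicative only. The function names are hypothetical.

```python
import math


def remaining_fraction(annual_loss: float, years: float) -> float:
    """Fraction of data still obtainable after `years`.

    Assumption: the same proportion `annual_loss` is lost every year,
    so availability follows (1 - annual_loss) ** years.
    """
    return (1.0 - annual_loss) ** years


def half_life_years(annual_loss: float) -> float:
    """Years until half of the data are no longer obtainable (same assumption)."""
    return math.log(0.5) / math.log(1.0 - annual_loss)


# Rates quoted in the text: 3.5% (publicly available data), 5.9% (upon request).
for label, rate in [("publicly available", 0.035), ("upon request", 0.059)]:
    after_10 = remaining_fraction(rate, 10)
    after_20 = remaining_fraction(rate, 20)
    print(f"{label}: {after_10:.1%} left after 10 years, "
          f"{after_20:.1%} after 20 years, "
          f"half-life about {half_life_years(rate):.1f} years")
```

Under this assumption, the seemingly small gap between the two rates compounds into a clearly shorter half-life for data that are available only upon request.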

Figure 7. Decay in critical data availability initially (blue circles; n = 672), at the end of a 60-day contacting period (purple circles; n = 672) and upon request from the authors (black circles; n = 310).

Authors’ concerns and reasons for declining data sharing

Upon contacting the authors, we recorded and categorised their concerns and requests related to data sharing (n = 188 authors) and their reasons for declining (n = 65). Altogether 22.9% of authors were concerned about certain aspects of our request (Fig. 8). Authors of non-open access publications (W = 4.6; β = 0.49 ± 0.23; P_adj = 0.064) and authors in the humanities (W = 9.7; β = 1.11 ± 0.36; P_adj = 0.004) expressed concerns or requests of any type relatively more often (Table S2). In particular, researchers in the fields of humanities (W = 15.2; β = 1.36 ± 0.35; P_adj < 0.001), materials for energy and catalysis (W = 6.4; β = 0.65 ± 0.26; P_adj = 0.022) and ecology (W = 5.6; β = 0.81 ± 0.34; P_adj = 0.036) were more concerned about the study’s specific purpose than researchers on average.

Figure 8. Frequency distribution of authors’ (a) concerns and requests (n = 199) and (b) reasons for declining data sharing (n = 67). White bars indicate answers where no concerns or reasons were specified.

Data sharing was declined by 33.0% of the 188 established contacts. When we specifically inquired about the reasons, the lack of time to search for data (29.2%), loss of data (27.7%) and privacy or legal concerns (23.1%) were most commonly indicated by the authors (Fig. 8), whereas no specific answer was provided by 10.8% of authors. According to the best binomial models (Table S2), social scientists indicated data loss more commonly than other researchers (W = 10.9; β = 1.04 ± 0.32; P_adj = 0.003) and psychologists pointed most commonly to legal or privacy issues (W = 4.9; β = 0.85 ± 0.38; P_adj = 0.078). Declines due to legal issues became increasingly common in more recent publications (days since 01.01.2000: W = 7.2; β = 0.07 ± 0.03; P_adj = 0.035). The lack of time to search tended to be more common for older studies (W = 4.0; β = −0.73 ± 0.37; P_adj = 0.135).

Data storage options and citations

The ways in which the data were released differed greatly among disciplines (Fig. 9), with the most common storage options being the supplementary materials on the publisher’s website (62.2% of articles), various data archives (22.3%) and upon request from corresponding authors (19.7%). Although 29.8% of articles declared depositing data in multiple sources, no source was indicated for 35.0% of articles. Declaring data availability upon request (n = 172) ranged from 1.0% in psychology to 52.0% in forestry, with greater frequency in earlier studies (days back since 31.09.2019: W = 15.0; β = 0.016 ± 0.004; P < 0.001) and in articles by non-North American corresponding authors (by primary affiliation; W = 5.6; β = 0.23 ± 0.10; P = 0.018). With a few exceptions (three datasets only commercially available, one removed during final acceptance and one homepage corrupt), all data were successfully located for other indicated data sources, but only 42.3% of data could be obtained from the authors upon request in practice. This rate is comparable to articles with no such statement (38.3%; Chi-square test: P = 0.501).

Figure 9. Preferred ways of data storage in articles (n = 875) representing different disciplines (blue, text and supplement; purple, data archive; yellow, authors’ homepage; vermillion, previous publications; grey, museum; black, upon (reasonable) request; white, none declared).

The number of citations to articles ranged from 0.0 to 692.9 per year (median, 23.1). In contrast to the hypothesis that articles with available data accumulate more citations [20], general linear modelling revealed no significant effect of initial or final data availability on annual citations. The model demonstrated that the average number of yearly citations was explained by research discipline (F(8,855) = 11.2; R² = 0.105; P < 0.001), data type (F(2,855) = 7.0; R² = 0.016; P < 0.001), open access status (F(1,855) = 4.5; R² = 0.005; P = 0.034) and the interaction term between open access and discipline (F(8,855) = 2.94; R² = 0.027; P = 0.003). Post-hoc tests indicated that articles with a dataset as a critical data source were cited on average 6% more than those with an image or model, and open access articles attracted 9% more citations than regular articles. Because of high variability in citation counts, it was not possible to test the interaction terms with scientific discipline in the current dataset. We speculate that the articles in Nature and Science are heavily cited on the basis of their key findings and interpretations, which may mask the few extra citations arising from re-use of the data.

Our study uniquely points to differences among scientific disciplines in data availability as published along with the article and upon request from the authors. We demonstrate that in several disciplines such as forestry, materials for energy and catalysis and psychology, critical data are still unavailable for re-analysis or meta-analysis for more than half of the papers published in Nature and Science in the last decade. These overall figures roughly match those reported for other journals in various research fields [8, 11, 13, 22], but exceed the lowest reported values of around 10% available data [13, 23, 24]. Fortunately, data availability tends to improve, albeit slowly, in nearly all disciplines (Figs. 3, 7), which confirms recent implications from psychological and ecological journals [13, 31]. Furthermore, the reverse trend we observed in microbiology corroborates the declining metagenomics sequence data availability [22]. Typically, such large DNA sequence data sets are used to publish tens of articles over many years by the teams producing these data; hence releasing both raw data and datasets may jeopardise their expectations of priority publishing. The weak discipline-specific differences between Nature and Science (Fig. 2) may be related to how certain subject editors implemented and enforced stringent data sharing policies.

After rigorous attempts to contact the authors, data availability increased by one third on average across disciplines, with full and at least partial availability reaching 70% and 83%, respectively. These figures are at the top end of studies conducted thus far [8, 22] and indicate the relatively superior overall data availability in Science and Nature compared with other journals. However, the relative rates of retrieving data upon request, declining to share data and ignoring the requests were on par with studies covering other journals and specific research fields [10, 12, 25, 26, 28]. Across 20 years, we identified an overall loss of data at an estimated rate of 3.5% and 5.9% per year for initially available data and data effectively available upon request, respectively. This rate of data decay is much lower than the 17% per year previously reported in plant and animal sciences based on a comparable approach [24].

While the majority of data are eventually available, it is alarming that less than half of the data clearly stated to be available upon request could be effectively obtained from the authors. Although there may be objective reasons such as force majeure, these results suggest that many authors declaring data availability upon contacting may have abused the publishers’ or funders’ policy that allows statements of data availability upon request as the only means of data sharing. We find that this infringes research ethics and undermines fair competition among research groups. Researchers hiding their own data may be in a position of power compared with fair players in situations of big data analysis, when they can access all data (including their own), while others have more limited opportunities. Data sharing is also important for securing the possibility to re-analyse and re-interpret unexpected results [9, 32] and to detect scientific misconduct [25, 33]. More rigorous control of data release would prevent manuscripts with serious issues in sampling design or analytical procedures from being prepared, reviewed and eventually accepted for publication.

Our study uniquely recorded the authors’ concerns and specific requests when negotiating data sharing. Concerns and hesitations about data sharing are understandable because of potential drawbacks and misunderstandings related to data interpretation and priority of publishing [17, 34] that may outweigh the benefits of recognition and passive participation in broader meta-studies. Nearly one quarter of researchers expressed various concerns or had specific requests depending on the discipline, especially about the specific objectives of our study. Previous studies with questionnaires about hypothetical data sharing unrelated to actual data sharing reveal that financial interests, priority of additional publishing and fear of challenging the interpretations after data re-analysis constitute the authors’ major concerns [12, 35, 36]. Another study indicated that two thirds of researchers sharing biomedical data expected to be invited as co-authors upon use of their data [37], although this does not fulfil the authorship criteria [6, 38]. At least partly related to these issues, the reasons for declining data sharing differed among disciplines: while social scientists usually referred to the loss of data, psychologists most commonly pointed out ethical/legal issues. Recently published data were, however, more commonly declined due to ethical/legal issues, which indicates rising concerns about data protection and potential misuse. Although we offered a possibility to share anonymised data sets, such trimmed data sets were never obtained from the authors, suggesting that ethical issues were not the only reason for declining. Because research fields strongly differed in the frequency of no response to data requests, most unanswered requests can be considered declines that avoid official replies, which may harm the authors’ reputation.

Because we did not sample randomly across journals, our interpretations are limited to the journals Nature and Science. Our study across disciplines did not account for the particular academic editor, which may have partly contributed to the differences among research fields and journals. Not all combinations of disciplines, journals and time periods received the intended 25 replicate articles because of the poor representation of certain research fields in the 2000–2009 period. This may have reduced our ability to detect statistically significant differences among the disciplines. We also obtained estimates for the final data availability for seven out of nine disciplines. Although we excluded the remaining two disciplines from comparisons of initial and final data availability, it may have slightly altered the overall estimates. The process of screening the potentially relevant articles chronologically backwards resulted in overrepresentation of more recent articles in certain relatively popular disciplines, which may have biased comparisons across disciplines. However, the paucity of residual year effect and year × discipline interaction in overall models and residual time effect in separate analyses within research fields indicate a minimal bias (Figure S1).

We recorded the concerns and requests of authors that had issues with initial data sharing. Therefore, these responses may be relatively more sceptical than the opinions of the majority of the scientific community publishing in these journals. It is likely that the authors who did not respond have concerns and reasons for declining similar to those who refused data sharing.

Our experience shows that receiving data typically required long email exchanges with the authors, contacting other referred authors or sending a reminder. Obtaining data took a median of 15 days, representing a substantial effort for both parties [39]. This could have been easily avoided by releasing data upon article acceptance. On the other hand, we received tips for analysis, cautions against potential pitfalls and the authors’ informed consent upon contacting. According to our experience, more than two thirds of the authors need to be contacted for retrieving important metadata, variance estimates or specifying methods for meta-analyses [40]. Thus, contacting the authors may be commonly required to fill gaps in the data, but such extra specifications are easier to provide compared with searching and converting old datasets into a universally understandable format.

Given the various concerns and the tedium of re-formatting and uploading data, authors should be better motivated to share data [41]. Data formatting and releasing certainly benefit from clear instructions and support from funders, institutions and publishers. In certain cases, public recognition such as badges of open data for articles following the best data sharing practices and increasing numbers of citations may promote data release by an order of magnitude [42]. Citable data papers are certainly another way forward [43, 44], because these provide access to a well-organised dataset and add to the authors’ publication record. Encouraging authors to list published data sets with download and citation metrics in grant and job applications alongside other bibliometric indicators should promote data sharing. Linking released data to publicly available researcher accounts such as ORCID, ResearcherID and Google Scholar would benefit authors, other researchers and evaluators. To account for many authors’ fear of data theft [17] and to prioritise the publishing options of data owners, setting a reasonable embargo period for third-party publishing may be needed in specific cases such as immediate data release following data generation [45] and dissertations.

All funders, research institutions, researchers, editors and publishers should collectively contribute to turning data sharing into a win-win situation for all parties and the scientific endeavour in general. Funding agencies may have a key role here due to the lack of conflicting interests and a possibility of exclusive allocation to depositing and publishing huge data files [46]. Funders have efficient enforcing mechanisms during reporting periods, with an option to refuse extensions or approval of forthcoming grant applications. We advocate that funders should include published data sets, if relevant, as an evaluation criterion besides other bibliometric information. Research institutions may follow the same principles when issuing institutional grants and employing research staff. Institutions should also insist that their employees follow open data policies [45].

Academic publishers also have a major role in shaping data sharing policies. Although deposition and maintenance of data incur extra costs to commercial publishers, they should promote data deposition in their servers or public repositories. An option is to hire specific data editors for evaluating data availability in supplementary materials or online repositories and refusing final publishing before the data are fully available in a relevant format [47]. For efficient handling, clear instructions and a machine-readable data availability statement option (with a QR code or link to the data) should be provided. In non-open access journals, the data should be accessible free of charge or at a reduced price to non-subscribers. Creating specific data journals or ‘data paper’ formats may promote publishing and sharing data that would otherwise pile up in the drawer because of disappointing results or the lack of time for preparing a regular article. The leading scientometrics platforms Clarivate Analytics, Google Scholar and Scopus should index data journals equally with regular journals to motivate researchers to publish their data. There should be a possibility of article withdrawal by the publisher if the data availability statements are incorrect or the data have been removed post-acceptance [30]. Much of the workload should stay on the editors, who are paid by the supporting association, institution or publisher in most cases. The editors should grant the referees access to these data during the reviewing process [48], requesting from them a second opinion about data availability and about any reasons given for withholding data. Similar stringent data sharing policies are increasingly implemented by various journals [26, 30, 47].

In conclusion, data availability in top scientific journals differs strongly by discipline, but it is improving in most research fields. As our study exemplifies, the ‘data availability upon request’ model is insufficient to ensure access to datasets and other critical materials. Considering the overall data availability patterns, authors’ concerns and reasons for declining data sharing, we advocate that (a) data releasing costs ought to be covered by funders; (b) shared data and the associated bibliometric records should be included in the evaluation of job and grant applications; and (c) data sharing enforcement should be led by both funding agencies and academic publishers.

Materials and Methods

Data collection

To assess differences in data availability among research disciplines, we focused our study on Nature and Science, two high-impact, general-interest journals that practise relatively stringent data availability policies 49 . Because of major changes in public attitudes and journals’ policies on data sharing, our survey focused on two study periods, 2000–2009 and 2010–2019. We selected nine scientific disciplines as defined by the Springer Nature publishing group - biomaterials and biotechnology, ecology, forestry, humanities, materials for energy and catalysis, microbiology, optics and photonics, psychology and social sciences (see Table S1 for details) - for analysis based on their coverage in Nature and Science and the data-driven character of their research. These nine disciplines were selected based on the competence of our team and the objective of covering as wide a range of research fields as possible, including natural sciences, social sciences and humanities. The articles were searched by discipline, keywords and/or manual browsing as follows. For Nature, our search was refined as https://www.nature.com/search?order=date_desc&journal=nature&article_type=research&subject=microbiology&date_range=2010-2019 (italicised parts varied). For Science, the corresponding search string was the following: https://search.sciencemag.org/?searchTerm=microbiology&order=newest&limit=textFields&pageSize=10&startDate=2010-01-01&endDate=2019-08-31&articleTypes=Research%20and%20reviews&source=sciencemag%7CScience . In both journals, the articles were retrieved by browsing search results chronologically backwards from September 2019 or September 2009 until reaching 25 articles matching the criteria. When the number of suitable articles was insufficient, we searched using additional discipline-specific keywords in the title and, when necessary, browsed all issues manually.

In some research fields, 25 articles could not be found for all journal and time period combinations; therefore, data availability was evaluated for 875 articles in total (Table S1). In each article, we identified a specific analysis or result that was critical for the main conclusion of that study based on both the authors’ emphasis and our subjective assessment. We determined whether the underlying data of these critical results - datasets, images (including videos), models (including scripts, programs and analytical procedures) or physical items - were available in the main text, supplementary materials or other indicated sources such as specific data repositories, authors’ homepages or museums, or upon request to the corresponding author (Figure S2). When available, we downloaded these data, checked for relevant metadata, identifiers and other components, and evaluated whether it was theoretically possible to repeat these specific analyses and include these materials in a field-specific metastudy. For example, in the case of a dataset, we evaluated the data table for the presence of relevant metadata and sample codes necessary to perform the analysis; for any statistical procedure, the authors must have used such a data table in their original work. We considered the data too raw if they either required a large amount of work (other than common data transformations) to generate the data table or model, or left us in doubt whether the same data table could be reproduced with the methods described. Raw high-throughput sequencing data are typical examples of incomplete datasets, because they usually lack necessary metadata and require a thorough bioinformatics analysis, with the output depending on the software and options selected. As further examples, certain raw optical images or videos make no sense without expert filtering, and computer scripts are of limited use without thorough instructions.
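The Nature search URL quoted above can be reproduced programmatically. The following is a minimal sketch, assuming that the parts that varied between searches were the `subject` and `date_range` parameters; the helper name `nature_search_url` and the `2000-2009` value are illustrative assumptions rather than details given by the authors.

```python
from urllib.parse import urlencode

# Base address taken from the search URL quoted in the text.
NATURE_SEARCH = "https://www.nature.com/search"


def nature_search_url(subject: str, date_range: str) -> str:
    """Build a Nature search URL for research articles in one subject and period.

    Hypothetical helper: only the parameter names and the 'microbiology' /
    '2010-2019' values come from the URL quoted in the article.
    """
    params = {
        "order": "date_desc",        # newest first, so results are browsed backwards
        "journal": "nature",
        "article_type": "research",
        "subject": subject,          # assumed to vary by discipline
        "date_range": date_range,    # assumed to vary by study period
    }
    return f"{NATURE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # '2000-2009' is an assumed value for the earlier study period.
    for period in ("2000-2009", "2010-2019"):
        print(nature_search_url("microbiology", period))
```

Building the query from a parameter dictionary keeps the fixed filters (journal, article type, ordering) in one place, so only the discipline and period need to change between searches.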

If these critical data were unavailable or only partly available (i.e., missing some integral metadata, instructions or explanations), we contacted the first corresponding author, or the author named as the contact for access to the specific item, requesting the data for a metastudy by using a pre-defined format and an institutional email address (Item S1). In the email, we carefully specified the materials required to produce a particular figure or table to avoid confusing or upsetting the authors with an unclear request. We indicated that the data were intended for a metastudy on a related topic in order to test the authors’ willingness to share the data for actual use, rather than merely their stated intention to share with no specific purpose. We similarly evaluated the received data for integrity and requested further information, if necessary, to meet the standards. We also recorded the responses of corresponding authors to data requests, including any specific requests or concerns and reasons for declining (Item S1).

The authors were mostly contacted early in the week, and two reminders were sent ca. 14 and 28 days later if necessary (Item S1). The reminders were also addressed to other corresponding authors if relevant. If emails were returned with an error message, we contacted other corresponding authors or used an updated email address found on the internet or in newer publications. We considered 60 days from sending the first email a reasonable time period for the authors to locate and send the requested data 4 .
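The contact timeline described above can be written down as simple date arithmetic. This is an illustrative sketch only: the function name and the example start date are hypothetical, and the offsets are taken as exactly 14, 28 and 60 days although the text gives the reminder intervals as approximate.

```python
from datetime import date, timedelta

REMINDER_OFFSETS_DAYS = (14, 28)  # "ca. 14 and 28 days later" in the text
RESPONSE_WINDOW_DAYS = 60         # period considered reasonable for a reply


def request_schedule(first_email: date) -> dict:
    """Return reminder dates and the response deadline for one data request."""
    return {
        "first_email": first_email,
        "reminders": [first_email + timedelta(days=d) for d in REMINDER_OFFSETS_DAYS],
        "deadline": first_email + timedelta(days=RESPONSE_WINDOW_DAYS),
    }


if __name__ == "__main__":
    # Hypothetical first contact on a Monday; not a date reported in the study.
    schedule = request_schedule(date(2019, 10, 7))
    for key, value in schedule.items():
        print(key, value)
```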

For each article, we recorded the details of publishing (date printed, journal, discipline), corresponding authors (number, country of first affiliation, acquaintance with the contact author) and data (availability, type, ways of access) 50 . Data complexity was evaluated based on the relative amount of extra work required of the authors to polish the raw data (e.g. low-complexity data include raw DNA sequence data, raw images, artefacts; high-complexity data include bioinformatics-treated molecular data sets, noise-removed images, models and scripts). On 23 March 2020, we recorded the open access status and number of citations for each article using searches in the ISI Web of Science ( https://apps.webofknowledge.com/ ). The citation count was expressed as citations per year, discounting the first 90 days, during which citations are initially fewer.
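One plausible reading of the citation measure is the citation count divided by the article's age in years after subtracting the first 90 days. The sketch below implements that reading; the exact formula, the 365.25-day year and the function name are assumptions, since the text does not spell them out.

```python
from datetime import date

GRACE_DAYS = 90          # initial period discounted, as stated in the text
DAYS_PER_YEAR = 365.25   # assumed year length for the conversion


def citations_per_year(citations: int, published: date, counted_on: date) -> float:
    """Annual citation rate after discounting the first GRACE_DAYS days.

    Assumed formula: citations / ((days since publication - 90) / 365.25).
    """
    effective_days = (counted_on - published).days - GRACE_DAYS
    if effective_days <= 0:
        raise ValueError("article is too recent to compute a citation rate")
    return citations / (effective_days / DAYS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical article; 23 March 2020 is the counting date given in the text.
    print(citations_per_year(120, date(2015, 6, 1), date(2020, 3, 23)))
```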

Data analysis

The principal aim of this study was to determine the relative importance of scientific discipline and time period for data availability and for authors’ concerns in response to data sharing requests, while accounting for multiple potentially important covariates (Fig. 1). The response variables, i.e. initial and final data availability (none, partly or fully available), authors’ responses (ignored, data shared or declined), concerns and reasons for declining, follow a multinomial distribution 50 and were hence transformed to dummy variables. Similarly, the multi-level explanatory variables (discipline, topic overlap, countries and continents of corresponding authors, data type and complexity) were transformed to dummies, whereas continuous variables (linear time, number of citations, time to obtain data, number of corresponding authors) were square root- or logarithm-transformed where appropriate. All analyses were performed in STATISTICA 12 (StatSoft Inc., Tulsa, OK, USA).
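The variable preparation described above can be illustrated in a few lines. The authors worked in STATISTICA, so this is only a sketch of the general technique: the example values are invented, and the `+1` offset inside the logarithm is an assumption made here to handle zero counts.

```python
import math


def to_dummies(values):
    """Turn a multi-level categorical variable into one 0/1 column per level."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical availability codes for four articles.
availability = ["none", "partly", "fully", "none"]
dummies = to_dummies(availability)

# Hypothetical citation counts, transformed to reduce right skew.
citations = [0, 4, 25, 100]
sqrt_citations = [math.sqrt(x) for x in citations]
log_citations = [math.log1p(x) for x in citations]  # log(x + 1); offset is an assumption

if __name__ == "__main__":
    print(dummies)
    print(sqrt_citations)
    print(log_citations)
```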

Data analysis of the dummy-transformed multinomial and binomial variables was performed using stepwise logistic regression model selection with a binomial link function using corrected Akaike information criterion (AICc) as a selection criterion, and Somers’ D statistic and model determination coefficients (R 2 ) as measures of overall goodness of fit. Determination coefficients and Wald’s W statistic were used to estimate the relative importance of explanatory variables. We calculated 95% confidence intervals for multiple proportions 51 using the R package MultinomialCI ( https://rdrr.io/cran/MultinomialCI/ ). The increased risk of false discoveries arising from multiple comparisons was accounted for by using Bonferroni correction of P-values (expressed as P adj ) where appropriate.
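Two of the quantities named above have simple closed forms. The sketch below uses the standard small-sample formula AICc = AIC + 2k(k + 1)/(n − k − 1) and the standard Bonferroni adjustment (each P-value multiplied by the number of tests, capped at 1). It illustrates the general definitions and is not the authors' STATISTICA workflow; the function names and example values are hypothetical.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def bonferroni(p_values):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical model fit and hypothetical raw P-values.
    print(aicc(log_likelihood=-100.0, k=3, n=103))
    print(bonferroni([0.01, 0.04, 0.5]))
```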

Models with continuous response variables (proportion of available data, annual citations, time to receive data) were tested using general linear models in two steps. First, the model selection included only dummy and continuous explanatory variables. Second, multilevel categorical predictors corresponding to significant dummies, as well as significant continuous variables, were included in the final model selection, which was based on forward selection. To check for potential biases related to the article selection procedure in both periods, we tested the effect of discipline, period and year and all their interaction terms on initial data availability by retaining all variables in the model (Figure S1). Differences among these factor levels were tested using Tukey post-hoc tests for unequal sample size, which account for multiple testing issues.

Data availability

The entire dataset is available in spreadsheet format in the PlutoF data repository 50 .

Code availability

No specific code was generated for analysis of these data.

Fan, J. et al . Challenges of big data analysis. Nat. Sci. Rev. 1 , 293–314 (2014).

Kitchin, R. The data revolution: Big data, open data, data infrastructures and their consequences. (Sage Publications, London, 2014).

Science Staff. Challenges and opportunities. Science 331 , 692–693 (2011).

Cech, T. R. et al . Sharing publication-related data and materials: responsibilities of authorship in the life sciences. National Academies Press, Washington, D.C. (2003).

Fischer, B. A. & Zigmond, M. J. The essential nature of sharing in science. Sci. Engineer. Ethics 16 , 783–799 (2010).

Duke, C. S. & Porter, H. H. The ethics of data sharing and reuse in biology. BioScience 63 , 483–489 (2013).

Fienberg, S. E. et al . Sharing Research Data. National Academy Press, Washington, D.C. (1985).

Begley, C. G. & Ioannidis, J. P. Reproducibility in science: improving the standard for basic and preclinical research. Circul. Res. 116 , 116–126 (2015).

Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349 , aac4716 (2015).

Hardwicke, T. E. & Ioannidis, J. P. Populating the Data Ark: An attempt to retrieve, preserve, and liberate data from the most highly-cited psychology and psychiatry articles. PLoS One 13 , e0201856 (2018).

Roche, D. G. et al . Public data archiving in ecology and evolution: how well are we doing? PLoS Biol. 13 , e1002295 (2015).

Tenopir, C. et al . Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLoS One 10 , e0134826 (2015).

Hardwicke, T. E. et al . Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. R. Soc. Open Sci. 5 , 180448 (2018).

Witwer, K. W. Data submission and quality in microarray-based microRNA profiling. Clin. Chem. 59 , 392–400 (2013).

Stuart, D. et al . Whitepaper: Practical challenges for researchers in data sharing. figshare https://doi.org/10.6084/m9.figshare.5975011 (2018).

Borgman, C.L. Scholarship in the digital age: Information, infrastructure, and the Internet. MIT press, Cambridge (2010).

Longo, D. L. & Drazen, J. M. Data sharing. New England J. Med. 375 , 276–277 (2016).

Lewandowsky, S. & Bishop, D. Research integrity: Don’t let transparency damage science. Nature 529 , 459–461 (2016).

Colavizza, G. et al . The citation advantage of linking publications to research data. PLoS One 15 , e0230416 (2020).

Piwowar, H. A. et al . Sharing detailed research data is associated with increased citation rate. PLoS One 2 , e308 (2007).

Hicks, D. et al . Bibliometrics: the Leiden Manifesto for research metrics. Nature 520 , 429–431 (2015).

Eckert, E. M. et al . Every fifth published metagenome is not available to science. PLoS Biol. 18 , e3000698 (2020).

Sherry, C. et al . Assessment of transparent and reproducible research practices in the psychiatry literature. Preprint at https://osf.io/jtkcr/download (2019).

Vines, T. H. et al . The availability of research data declines rapidly with article age. Curr. Biol. 24 , 94–97 (2014).

Wicherts, J. M. et al . The poor availability of psychological research data for reanalysis. Am. Psychol. 61 , 726–728 (2006).

Vines, T. H. et al . Mandated data archiving greatly improves access to research data. FASEB J. 27 , 1304–1308 (2013).

Krawczyk, M. & Reuben, E. (Un)available upon request: Field experiment on researchers’ willingness to share supplementary materials. Account. Res. 19 , 175–186 (2012).

Vanpaemel, W. et al . Are we wasting a good crisis? The availability of psychological research data after the storm. Collabra 1 , 1–5 (2015).

Grant, R. & Hrynaszkiewicz, I. The impact on authors and editors of introducing data availability statements at Nature journals. Int. J. Digit. Curat. 13 , 195–203 (2018).

Hrynaszkiewicz, I. et al . Developing a research data policy framework for all journals and publishers. Data Sci. J. 19 , 5 (2020).

Wallach, J. D. et al . Reproducible research practices, transparency, and open access data in the biomedical literature, 2015–2017. PLoS Biol. 16 , e2006930 (2018).

Kraus, W. L. Do you see what I see? Quality, reliability, and reproducibility in biomedical research. Mol. Endocrinol. 28 , 277–280 (2014).

Wicherts, J. M. et al . Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One 6 , e26828 (2011).

Wallis, J. C., Rolando, E. & Borgman, C. L. If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology. PLoS One 8 , e67332 (2013).

Blumenthal, D. et al . Withholding research results in academic life science. JAMA 277 , 1224–1228 (1997).

Kim, Y. & Stanton, J. M. Institutional and individual influences on scientists’ data sharing practices. J. Comput. Sci. Edu. 3 , 47–56 (2013).

Federer, L. M. et al . Biomedical data sharing and reuse: Attitudes and practices of clinical and scientific research staff. PLoS One 10 , e0129506 (2015).

Patience, G. S. et al . Intellectual contributions meriting authorship: Survey results from the top cited authors across all science categories. PLoS One 14 , e0198117 (2019).

Volk, C., Lucero, Y. & Barnas, K. Why is data sharing in collaborative natural resource efforts so hard and what can we do to improve it? Environ. Manage. 53 , 883–893 (2014).

Tedersoo, L. et al . Towards global patterns in the diversity and community structure of ectomycorrhizal fungi. Mol. Ecol. 21 , 4160–4170 (2012).

Reichman, O. J. et al . Challenges and opportunities of open data in ecology. Science 331 , 703–705 (2011).

Kidwell, M. C. et al . Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biol. 14 , e1002456 (2016).

Candela, L., Castelli, D., Manghi, P. & Tani, A. Data journals: a survey. J. Ass. Inform. Sci. Technol. 66 , 1747–1762 (2015).

Callaghan, S. et al . Making data a first class scientific output: data citation and publication by NERC’s Environmental Data Centres. Int. J. Digit. Curat. 7 , 107–113 (2012).

Dyke, S. O. & Hubbard, T. J. Developing and implementing an institute-wide data sharing policy. Genome Med. 3 , 1–8 (2011).

Heidorn, P. B. Shedding light on the dark data in the long tail of science. Libr. Trends 57 , 280–299 (2008).

Langille, M. G. et al . “Available upon request”: not good enough for microbiome data! Microbiome 6 , 8 (2018).

Morey, R. D. et al . The Peer Reviewers’ Openness Initiative: incentivizing open research practices through peer review. R. Soc. Open Sci. 3 , 150547 (2016).

Alsheikh-Ali, A. A. et al . Public availability of published research data in high-impact journals. PLoS One 6 , e24357 (2011).

Tedersoo, L. et al . Data sharing across disciplines: ‘available upon request’ holds no promise. University of Tartu; Institute of Ecology and Earth Sciences https://doi.org/10.15156/BIO/1359426 (2021).

Sison, C. P. & Glaz, J. Simultaneous confidence intervals and sample size determination for multinomial proportions. J. Am. Stat. Ass. 90 , 366–369 (1995).

Acknowledgements

We thank all authors who released their data along with their article or responded to our data request. Although some of the obtained datasets are used in a series of meta-analyses or released by us upon agreement, we apologise to the authors who spent a significant amount of time to provide the data, which we cannot use for secondary analyses. We thank A. Kahru, T. Soomere, Ü. Niinemets and J. Allik for their constructive comments on an earlier version of the manuscript.

Author information

Authors and affiliations.

Estonian Young Academy of Sciences, Kohtu 6, 10130, Tallinn, Estonia

Leho Tedersoo, Rainer Küngas, Ester Oras, Kajar Köster, Helen Eenmaa, Äli Leijen, Marju Raju, Anastasiya Astapova, Heli Lukner, Karin Kogermann & Tuul Sepp

Mycology and Microbiology Center, University of Tartu, Ravila 14a, 50411, Tartu, Estonia

Leho Tedersoo

Institute of Chemistry, University of Tartu, Ravila 14a, 50411, Tartu, Estonia

Institute of History and Archaeology, University of Tartu, Jakobi 2, 51005, Tartu, Estonia

Department of Forest Sciences, University of Helsinki, PO Box 27 (Latokartanonkaari 7), Helsinki, FI-00014, Finland

Kajar Köster

School of Law, University of Tartu, Näituse 20, 50409, Tartu, Estonia

Helen Eenmaa

Institute of Education, University of Tartu, Salme 1a, 50103, Tartu, Estonia

Äli Leijen & Margus Pedaste

Department of Musicology, Music Pedagogy and Cultural Management, Estonian Academy of Music and Theatre, Tatari 13, 10116, Tallinn, Estonia

Institute for Cultural Research and Fine Arts, University of Tartu, Ülikooli 16, 51003, Tartu, Estonia

Anastasiya Astapova

Institute of Physics, University of Tartu, W. Ostwaldi 1, 50411, Tartu, Estonia

Heli Lukner

Institute of Pharmacy, University of Tartu, Nooruse 1, 50411, Tartu, Estonia

Karin Kogermann

Institute of Ecology and Earth Sciences, University of Tartu, Vanemuise 46, 51003, Tartu, Estonia

Contributions

All authors contributed to study design, work with literature and writing. L.T. analysed data and led writing.

Corresponding author

Correspondence to Leho Tedersoo .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Tedersoo, L., Küngas, R., Oras, E. et al. Data sharing practices and data availability upon request differ across scientific disciplines. Sci Data 8 , 192 (2021). https://doi.org/10.1038/s41597-021-00981-0

Received : 11 December 2020

Accepted : 29 June 2021

Published : 27 July 2021

DOI : https://doi.org/10.1038/s41597-021-00981-0

Researcher attitudes toward data sharing in public data repositories: a meta-evaluation of studies on researcher data sharing

Journal of Documentation

ISSN : 0022-0418

Article publication date: 31 May 2021

Issue publication date: 19 December 2022

Purpose

The purpose of this paper is to report a study of how research literature addresses researchers' attitudes toward data repository use. In particular, the authors are interested in how the term data sharing is defined, how data repository use is reported and whether there is need for greater clarity and specificity of terminology.

Design/methodology/approach

To study how the literature addresses researcher data repository use, relevant studies were identified by searching Library Information Science and Technology Abstracts, Library and Information Science Source, Thomson Reuters' Web of Science Core Collection and Scopus. A total of 62 studies were identified for inclusion in this meta-evaluation.

Findings

The study shows a need for greater clarity and consistency in the use of the term data sharing in future studies to better understand the phenomenon and allow for cross-study comparisons. Furthermore, most studies did not address data repository use specifically. In most analyzed studies, it was not possible to segregate results relating to sharing via public data repositories from other types of sharing. When sharing in public repositories was mentioned, the prevalence of repository use varied significantly.

Originality/value

Researchers' data sharing is of great interest to library and information science research and practice, as it informs academic libraries that are implementing data services to support these researchers. This study explores how the literature approaches this issue, especially the use of data repositories, which is strongly encouraged. This paper identifies the potential for additional study focused on this area.

  • Data management
  • Data sharing

Thoegersen, J.L. and Borlund, P. (2022), "Researcher attitudes toward data sharing in public data repositories: a meta-evaluation of studies on researcher data sharing", Journal of Documentation , Vol. 78 No. 7, pp. 1-17. https://doi.org/10.1108/JD-01-2021-0015

Emerald Publishing Limited

Copyright © 2021, Jennifer L. Thoegersen and Pia Borlund

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

This study examines how researcher data sharing has been studied in the research literature. Over the past decade, there has been an increasing, international demand to make the data underlying research more available to the research community and the public. Pressure to share data has been placed on researchers by funding institutions and journal publishers, many of which have begun to encourage or require researchers to share data (MacMillan, 2014, p. 544). The reasons for this shift in expectations are varied, but Borgman (2012) presents four broad rationales for sharing research data: reproducibility, serving the public interest, asking new questions and advancing research (p. 1067).

However, there is no single definition of data sharing. In an open science context, which emphasizes the importance of publicly sharing scientific knowledge as soon as practicable, data sharing is framed as research data being made publicly available with as few restrictions on reuse as possible and referred to as Open Data (Nielsen, 2011, para. 2; Open Knowledge Foundation, 2014). The FAIR Guiding Principles take data sharing a step further, focusing not just on public accessibility, but also on utility, encouraging researchers to make data findable, accessible, interoperable and reusable (Wilkinson et al., 2016, p. 3).

Adhering to these expectations presents challenges for researchers, including a lack of resources, the need for data management skills and time constraints (Sayogo and Pardo, 2013, p. S25). Complicating the issue further is the plethora of ways researchers can share data. They must decide how and where they will share their data; data dissemination methods include departmental and researcher websites, by request, cloud services, publications and data journals (Bishoff and Johnston, 2015, p. 11; Mischo and O'Donnell, 2014, p. 35). Funding agencies and journal publishers, as well as the open science and FAIR data movements, generally encourage the use of public data repositories (which Uzwyshyn (2016, p. 18) defines as “large database infrastructures set up to manage, share, access, and archive researchers' datasets”) when practicable and possible from a legal and ethical standpoint (Holdren, 2013, p. 5).

Academic libraries have taken a leading role in supporting and shaping campus and national research data management and sharing (Christensen-Dalsgaard et al., 2012). As academic libraries have taken on this mantle, Library and Information Science (LIS) research has begun to investigate data sharing prevalence among researchers and the factors influencing researcher data sharing. However, most studies are not focused specifically on the use of public data repositories, and there is no comprehensive look at researcher attitudes toward these repositories.

There are two recent reviews of data sharing literature. Examining the concept through three lenses (individual, institutional, international), Chawinga and Zinn's (2019) systematic literature review highlights existing barriers to data sharing and provides suggestions for overcoming these barriers. Alternatively, Perrier et al.'s (2020, p. 14) meta-synthesis of qualitative studies examines researchers' views on data sharing broadly and explores the disconnect between data sharing requirements and the still low level of sharing among researchers. Both studies focus specifically on “open data,” arguing for the importance of the public availability of data, with Chawinga and Zinn (2019) stating that the terms “data sharing” and “open data” are synonymous and defining data sharing as “a deliberate effort to make all raw research data fully available for public access” (p. 110).

The current study, which evaluates both qualitative and quantitative studies, also focuses on the public availability of data, though concentrating specifically on the use of public data repositories. However, in contrast to the previous reviews, this study begins by questioning how the literature uses the term “data sharing,” acknowledging the term's inherent ambiguity, and explores how research on data sharing is being conducted across a variety of disciplines.

The overall objectives of this study are to identify how the term “data sharing” is defined and operationalized in the literature, how sharing data in public data repositories is addressed and how researchers' attitudes toward data sharing relate to their data sharing behavior.

The remainder of this paper is structured as follows: Section 2 explains the methodology for identifying studies to include in the analysis. Section 3 presents the results of the study relating to how previous studies have addressed research data sharing and the use of data repositories. Section 4 discusses the results. Finally, Section 5 provides concluding remarks.

2. Methods and dataset

This section outlines the search for and identification of research literature focused on researchers' data sharing attitudes. While the main area of interest was literature in the LIS field, studies on data sharing have been published in many disciplines, and the literature search was also conducted to allow for disciplinary breadth. As such, searches were performed in two LIS databases – Library Information Science and Technology Abstracts (LISTA) and Library and Information Science Source (LISS) – and two multidisciplinary databases – Thomson Reuters' Web of Science Core Collection (WoS) and Scopus. Google Scholar was used as a check to ensure comprehensiveness of results returned from the databases. Incomplete metadata prevented Google Scholar from being included as a key database in this study.

The search combined the terms data repository, data sharing and open data with attitudes, beliefs and perceptions as well as researcher, scientist and faculty. An example Boolean search would be: (“data repository” or “data repositories” or “data sharing” or “open data”) and (attitudes or beliefs or perceptions) and (researcher or scientist or faculty).
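The example Boolean search above follows a regular pattern: synonyms joined with "or" inside parentheses, and the three concept groups joined with "and". A minimal sketch of that pattern is given below; the helper names are hypothetical, straight quotation marks are used for phrases, and database-specific syntax (field codes, truncation) is not modeled.

```python
def or_group(terms):
    """Join synonyms with 'or', quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " or ".join(quoted) + ")"


def build_query(*groups):
    """Join concept groups with 'and' to form one Boolean search string."""
    return " and ".join(or_group(g) for g in groups)


query = build_query(
    ["data repository", "data repositories", "data sharing", "open data"],
    ["attitudes", "beliefs", "perceptions"],
    ["researcher", "scientist", "faculty"],
)

if __name__ == "__main__":
    print(query)
```

Keeping each concept as its own list makes it straightforward to add or drop a synonym and regenerate the same query for each database.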

The initial search was performed in all four databases in May 2019 and updated in March 2020. The results for the searches in LISTA, LISS, WoS and Scopus were 90, 105, 169 and 184, respectively, with a total of 548 results (see Table 1). In total, 328 results remained after duplicates were removed. Two studies were removed because they were reporting on the same study as other articles in the list, and six studies were removed for being in a language other than English. Finally, only detailed, empirical studies with researchers as study participants and a focus on influences on data sharing behaviors were included, as determined by a review of the article titles and abstracts and, if necessary, the text of the article. While the literature search was extensive, it was not exhaustive; the focus was on published, peer-reviewed studies, meaning gray literature was excluded.
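The record counts reported above can be checked with simple arithmetic. In this sketch the per-database hits, the post-deduplication total and the two removal counts come from the text; treating the removals as sequential subtractions from the 328 deduplicated records is an assumption about how the steps were applied.

```python
# Hits per database, as reported in the text.
hits = {"LISTA": 90, "LISS": 105, "WoS": 169, "Scopus": 184}

total_hits = sum(hits.values())
after_deduplication = 328                      # reported in the text
duplicates_removed = total_hits - after_deduplication

same_study_removed = 2                         # reported in the text
non_english_removed = 6                        # reported in the text
# Assumption: both removals apply to the deduplicated set, before eligibility screening.
screened_for_eligibility = after_deduplication - same_study_removed - non_english_removed

if __name__ == "__main__":
    print(total_hits, duplicates_removed, screened_for_eligibility)
```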

Is the term “data sharing” defined and, if so, how?

Is public data repository use addressed and, if so, how?

How do researchers' attitudes toward data sharing compare to their data sharing behavior?

The purposes for most of the studies evaluated fall into three broad groups (see Appendix). The first group of studies (n = 35) is concerned with researcher attitudes (or perceptions) and practices related to data sharing (and sometimes more broadly, data management). The second group of studies (n = 14) is specifically concerned with the factors and barriers influencing researchers' data sharing. The final group of studies (n = 12) explores both researcher attitudes and influences on sharing. The remaining article focuses on the research data management needs of chemistry researchers.

The studies have a broad geographical scope. The location of the studies is shown in Figure 1. Of the 62 studies, 16 were conducted in the United States and 21 were conducted in multiple countries or in an international context [1]. The remaining studies were conducted in individual countries in Africa (2), Asia (7), Australia (1), Europe (13), North America (1) and South America (1).

The studies investigated data sharing attitudes of researchers across an array of disciplines. These areas ranged from very broad, for example, social sciences ( Bradić-Martinović and Zdravković, 2014 ; Kim and Adler, 2015 ; Polanin and Terzian, 2019 ), to more specific disciplines, for example, optical coherence tomography ( Lurie et al. , 2015 ). Participants in 13 of the studies were from health disciplines, while researchers in social sciences disciplines and natural science and engineering disciplines were the focus of 9 and 21 studies, respectively. The remaining eight studies included a mix of participants in fields across these broad disciplinary areas.

While the literature search included studies published between 1977 and 2020, those that met the inclusion criteria were published between 2008 and 2020, the majority ( n  = 46) since 2015.

For all the studies, researchers were the informants on issues related to data sharing, sometimes in addition to tangential topics such as data management, data literacy and open access. In 12 of the studies, there were additional participants, including library personnel ( Mozersky et al. , 2020 ; Scheliga and Friesike, 2014 ); research participants ( Hate et al. , 2015 ; Mazor et al. , 2017 ; Merson et al. , 2015 ); ethics board members ( Mazor et al. , 2017 ; Merson et al. , 2015 ; Mozersky et al. , 2020 ); and government employees ( Merson et al. , 2015 ; Schmidt et al. , 2016 ). Zenk-Möltgen et al. (2018) first analyzed journal data policies, before surveying authors on data sharing. As this study was focused on researcher attitudes, results from these nonresearcher populations were excluded.

The studies had a mix of quantitative (35), qualitative (23) and mixed (4) approaches. Almost all of the quantitative studies used surveys. Zenk-Möltgen et al. (2018) also conducted a document analysis first. Andreoli-Versbach and Mueller-Langer (2014) based their analysis on researchers' online presence. Survey response rates ranged from 2.2% to 100% with an average of 31.99% (excluding the ten studies for which a response rate was not reported and could not be calculated) while the number of respondents ranged from 40 to 1,829.

Most of the qualitative studies involved interviews or focus groups, which were analyzed for themes. Laine (2017) used interviews to develop case studies relating to two open research projects, and Grubb and Easterbrook (2011) used written questionnaires with open-ended questions.

This section presents the analysis of the studies and addresses each of the study questions individually.

3.1 Is the term “data sharing” defined and, if so, how?

As Kurata et al. (2017 , p. 2) discuss, the term “data sharing” can refer to a wide range of behaviors. In some contexts, it is used interchangeably with “open data” ( Chawinga and Zinn, 2019 , p. 110). In others, it is broader, encompassing any kind of data sharing. The first aim of this study was to identify how these studies define and operationalize data sharing.

For most of the included studies, a definition of “data sharing” was not explicitly stated, though an approximation could be inferred based on context and details of the study. Scheliga and Friesike's (2014) qualitative study focused on obstacles to Open Science among researchers. In this study, “data sharing” referred to sharing data publicly (though not necessarily in a repository). On the other hand, several studies in the health disciplines were focused on patient data, which suggests on-request or restricted access sharing (e.g. Hate et al. , 2015 ; Jao et al. , 2015 ).

There were 13 studies (21%) that explicitly defined data sharing. In addition, two studies ( Pardo Martínez and Poveda, 2018 , pp. 2–3; Tenopir et al. , 2018 , p. 892) provided a definition of Open Data, and Andreoli-Versbach and Mueller-Langer (2014 , pp. 1624–1625) defined “voluntary data sharing”.

While the definitions are similar, they also vary in terms of what is being shared, how it is being shared and with whom. Four of the studies (Saeed and Ali, 2019, p. 290; Tenopir et al., 2015, p. 3; Wu and Worrall, 2019, p. 765; Zhu, 2019, p. 2) used broad definitions. Wu and Worrall (2019) quoted Borgman's definition directly as “the release of research data for use by others” (p. 765). Zhu (2019) used an almost identical definition: “releasing research data that can be used by others” (p. 2), while Saeed and Ali (2019) defined data sharing as “the practice of making data used for academic research available to other investigators” (p. 290). Tenopir et al. (2015) state that data sharing “...occurs when scientists intentionally make their own data available to other people for their use in research or other related scientific endeavors” (p. 3).

Most of these studies broadly refer to others when identifying with whom data is shared and list examples of data sharing methods ranging from using public data repositories to sharing privately. Tenopir et al. (2011) also used a broad definition of “providing access for use and reuse of data” (p. 1).

Some of the studies qualified sharing by type of data being shared. Kim and Zhang (2015) referred to “raw data sets that have informed published articles to other researchers outside one's own research group(s) through various means such as data repositories, public web spaces, supplementary materials, or personal communications upon request” (p. 189), and Bezuidenhout (2019) focused “on the sharing of non-human data by individual scientists as part of their daily research practice” (p. 16).

The remaining studies with data sharing definitions qualify the method of sharing. Andreoli-Versbach and Mueller-Langer (2014) focused “on researchers' institutional or personal websites and data entries of the researchers under study in public data repositories” (p. 1625), and defined voluntary data sharing as “the extent to which researchers voluntarily make their data available in a ‘clearly and precisely documented’ way and ‘readily available to any researcher’”.

Borghi and Van Gulick (2018) defined sharing to include “activities involving the dissemination of conclusions drawn from neuroimaging data as well as the sharing of the underlying data itself through a general or discipline-specific repository” (p. 10).

For Kim and Stanton (2013) , data sharing behavior was defined “as the extent to which scientists provide other scientists with their research data and information related to their published articles by depositing them into data repositories and providing them upon request” (p. 4). Similar definitions were used by three other studies ( Ju and Kim, 2019 , pp. 583–584; Kim and Adler, 2015 , p. 409; Kim and Nah, 2018 , p. 125).

Based on an analysis of these studies, research on data sharing attitudes rarely explicitly defines the term “data sharing,” though the intended meaning can often be inferred by the context (e.g. public sharing in studies concerning Open Science). Among the studies that do define data sharing, definitions vary and often include a variety of methods of sharing, with several of the studies limiting the definition to particular methods of sharing.

3.2 Is public data repository use addressed and, if so, how?

For both accessibility and archiving purposes, the use of public data repositories is preferable to other forms of data sharing and is encouraged by many funders and journal publishers ( Holdren, 2013 ; MacMillan, 2014 ). It is also more in line with the goals of the Open Data movement ( Open Knowledge Foundation, 2014 ). Unless data need to be restricted for privacy, confidentiality or other ethical or legal reasons, the ideal method of sharing is through an open, public data repository.

Given the emphasis on public data repository use, this study was especially interested in how the included studies addressed public data repositories as opposed to other methods of sharing.

Most of the studies included did not explore public data repository use specifically. Several of the studies mentioned data repositories in their definition of data sharing ( Borghi and Van Gulick, 2018 , p. 10; Ju and Kim, 2019 , pp. 583–584; Kim and Adler, 2015 , p. 409; Kim and Nah, 2018 , p. 125; Kim and Stanton, 2013 , p. 4). However, other ways of sharing are included as well (e.g. via websites and providing data on request).

Similarly, several studies grouped data repository use with other forms of sharing. Investigating economics and management researchers, Andreoli-Versbach and Mueller-Langer (2014 , p. 1625) reported on sharing via either public data repositories or websites (16.8% of respondents). Several studies discussed publishing data, but the method of publishing was unclear ( Borghi and Van Gulick, 2018 ; Cheah et al. , 2015 ). Lurie et al. (2015 , p. 3) reported 4% ( n  = 52) of respondents shared data publicly, but not how.

Tenopir et al. (2011) reported high willingness to share research data in a “central data repository with no restrictions” (p. 15) among both research- (74%) and teaching-intensive (79%) respondents.

Of the included studies, 12 (19%) separately reported on some type of data repository use, six of which clearly indicated that the repository was “public” or “open.” Other studies referred to subject repositories, institutional repositories or simply repositories. In interviews with social sciences researchers (n = 30) who collected qualitative data, Mozersky et al. (2020) found that some participants were “unfamiliar with the very idea of sharing qualitative data with a repository” (p. 5), and only one participant reported sharing data in a repository. The highest reported repository usage was in Spallek et al.'s (2019, p. 70) survey of international dental researchers. All respondents (n = 42) indicated some level of support for data sharing and 64% were required (mostly by funding agencies) to share data in a data repository.

When sharing in public data repositories was mentioned specifically and separately, the percentage of respondents reporting public repository use varied from 3.26% to 39.26% (see Table 2). Aydinoǧlu et al. (2017, pp. 278–279) reported on open access data repository and institutional open repository use (8.3 and 3.2%, respectively). In their study of Arab universities, Elsayed and Saleh (2018, pp. 288–290) found 64.4% of respondents shared data, but only 5.1% of these (3.26% of total respondents) did so in an open data repository. Federer et al. (2015, p. 9) found high use of public repositories/databases in the field of health, as did Huang et al. (2012, p. 401) in the fields of biodiversity, biogeography and conservation.
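Comparing these figures requires care about the denominator: a percentage "of those who share" is not a percentage "of all respondents". The short Python sketch below shows the conversion using the rounded percentages quoted above for Elsayed and Saleh (2018); the function name is illustrative. The product comes to roughly 3.3%, close to the 3.26% reported. The small gap is plausibly due to rounding in the published percentages, which is an assumption here rather than something stated in the source.

```python
def share_of_all_respondents(share_who_share, share_in_repo_among_sharers):
    """Convert a proportion of sharers into a proportion of all respondents."""
    return share_who_share * share_in_repo_among_sharers


# Rounded percentages quoted in the text for Elsayed and Saleh (2018).
overall = share_of_all_respondents(0.644, 0.051)
print(f"{overall:.2%}")
```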

3.3 How do researchers' attitudes toward data sharing compare to their data sharing behavior?

The final aim of this study was to explore how this literature addresses the relationship between researchers' data sharing attitudes and their data sharing behavior.

Throughout the literature on researchers' data sharing attitudes and behaviors, there is a tension between the ideal and reality. A large percentage of researchers support the idea of open data, but far fewer have actually shared their own data (Aydinoǧlu et al., 2017; Diekmann, 2012; Hall, 2013; Zhu, 2019). In a survey of UK-based researchers, Zhu (2019, p. 5) found that although 86% of respondents indicated that sharing data online was important, only 21% had deposited data in an online repository. The survey found no significant differences in sharing between the four broad disciplinary areas studied (Medical and Life Sciences, Natural Sciences and Engineering, Social Sciences, Arts and Humanities). Similarly, Hall (2013, p. 383) found in interviews with environmental studies faculty at US academic institutions that though most participants believed data sharing was valuable, most also felt that their data would not be useful to others. Aydinoǧlu et al. (2017, pp. 279–280) and Diekmann (2012, p. 27) also found researchers with no data sharing experience but positive attitudes toward data sharing in their respective studies of Turkish researchers and American agricultural sciences researchers.

In interviews with Canadian neurology researchers, Ali-Khan et al. (2017 , pp. 2–3) found that a lack of clarity around terms and expectations led to uncertainty and may inhibit data sharing among researchers who are generally favorable toward the concept of Open Science.

Some studies did indicate closer alignment of attitude and action. Interviews with US astronomy researchers showed that participants were apprehensive toward data sharing in principle and practice – due largely to the necessity for very detailed documentation to be able to reuse secondary data and the possibility for misinterpretation ( Wynholds et al. , 2011 , p. 384). Australian social sciences researchers interviewed by Hickson et al. (2016 , pp. 259–260) expressed negative attitudes toward data sharing, including concerns that their data either would not be useful to other researchers or would be used by others to publish. Laine's (2017 , p. 7) case study of two Finnish interdisciplinary open research projects presents researchers who have enthusiastically embraced openness through most of the research process. Interviewed researchers viewed openness as an asset, as by making their research public early in the research cycle, they can demonstrate work in a particular research area well prior to publishing results.

These diverse views demonstrate the importance of both attitudes and practicalities in influencing data sharing behaviors. A series of studies by Kim and colleagues investigating data sharing behaviors support this assessment ( Kim and Adler, 2015 ; Kim and Burns, 2016 ; Kim and Kim, 2015 ; Kim and Nah, 2018 ; Kim and Stanton, 2013 ; Kim and Zhang, 2015 ). These studies examined the relationship between data sharing behaviors and a variety of factors including internal researcher perceptions (career risk, career benefit and effort required to share data) and external factors (e.g. pressure from funders and journal publishers, availability of a data repository). Though results varied depending on population, in all cases both internal and external factors had a significant relationship to data sharing behaviors.

4. Discussion

While sharing data is not in itself a new phenomenon, research into researcher data sharing is an emerging area. Research across disciplines, including LIS, is being conducted to better understand researchers' data sharing beliefs, motivations and actions, generally with the intent to identify strategies to guide data sharing practices. As LIS research delves deeper into this area, there is value to be gained by increasing the clarity and granularity in which we speak of, research and report on data sharing.

Given the relative novelty of the research area in LIS, the nebulous nature of the term “data sharing” in the literature is unsurprising. As pointed out by Kurata et al . (2017 , p. 2), the vagueness of the term and the discrepancies of its use present a challenge. Many of the included studies did not explicitly define data sharing, and many of those that did used different definitions. When definitions are unclear or diverge, exploring and comparing researchers' attitudes across studies is difficult or impossible. This is especially true when focusing on public data repository use.

Borgman's (2012) broad definition of data sharing is a helpful umbrella term for the many ways in which researchers allow others to access and reuse their research data. Variations of this definition are widely used, including by some of the analyzed studies, and provide a clear baseline to support mutual understanding.

However, as the data sharing attitudes and activities of researchers continue to be explored, focusing on specific kinds of sharing is important. The same researcher can report widely divergent attitudes and behaviors depending on the kind of sharing under discussion (Tenopir et al., 2011, p. 6). In data sharing, particulars are important. In addition, given the discrepancies in defining data sharing in the literature, participants are also likely to have very divergent understandings of the term. As such, more studies that clearly define the type of data sharing under study would be valuable, so that, as we continue to consider the data sharing attitudes and behaviors of researchers, we have the ability to clearly differentiate between different contexts, institutions, countries and disciplines, and over time.

More specificity would be especially beneficial in increasing understanding of researchers' attitudes toward using public data repositories. While data repository use was reported on separately in some of the studies, several of the analyzed studies grouped it with other forms of data sharing. When there are no practical, ethical or legal reasons preventing it, the use of data repositories is strongly encouraged by funding agencies, journal publishers and libraries. The benefits of data sharing, often cited in LIS research and data management courses, while not entirely dependent on public accessibility, discoverability and reliable stewardship, are, generally speaking, enhanced by it. As such, researchers' attitudes and behaviors related to public data sharing in data repositories are of particular interest to LIS research. Exploring what influences changes in researcher attitudes and behaviors related to public data repositories could provide an avenue toward increasing their use.

Researchers' influences and attitudes toward data sharing, and their actual data sharing behaviors, are complex and dependent on many factors. Sharing data is viewed largely positively among researchers, but many researchers are hitting one of many walls preventing them from sharing their own data more broadly. Addressing these walls will be important for entities interested in increasing public data sharing.

The studies included in this analysis that explicitly discuss public data repositories reported widely varying levels of public repository use among researchers. While Zhu (2019, p. 5) found similar data repository attitudes and behaviors across the broad disciplinary areas, it is clear that disciplinary norms contribute to data sharing views (Kim and Adler, 2015, p. 415; Kim and Burns, 2016, p. 240), and certain disciplines (e.g. astronomy) have a much stronger culture of sharing data, especially publicly (Scheliga and Friesike, 2014, p. 9).

Laine (2017 , p. 7) presents an interesting angle for transforming the conversation about data sharing, by shifting one of the perceived risks to data sharing (scooping) to a benefit. Researchers in the research projects examined in this study viewed openness as a way to advertise their work in an area. Framing openness in research in this way could create an incentive for researchers to share earlier and more often.

Another interesting area for exploration would be reducing the effort related to data sharing. A dilemma for librarians and policymakers who want to promote data sharing is what desired outcome to focus on. Public sharing of data in data repositories is preferable; however, it is also a less prevalent method of data sharing. Is it better to encourage any kind of data sharing initially in an effort to gradually modify attitudes and norms? Or is it better to push toward public data repository use specifically? If the latter, it will be necessary to identify ways to reduce barriers including the lack of incentivization and the effort required to identify and deposit in a data repository. This is especially true among researchers working with sensitive and qualitative data, which present unique challenges to public sharing in particular.

In order to better understand how to move forward in data sharing guidance, more studies that clearly differentiate types of data sharing would be beneficial, in both how participants are asked about data sharing and how results are reported. It is difficult to understand researcher attitudes and benchmark progress when definitions are inconsistent. Given the preference for the use of public data repositories to share data when possible, increasing research related to this method of sharing and clearly segmenting it from other methods of sharing in the study design and results would allow librarians and policymakers to gain a better understanding of researchers' attitudes toward this method of sharing specifically and how they change over time and across disciplines. Based on the studies in this analysis, frequency of sharing in public data repositories is low among most groups of researchers. Several studies identified barriers to data sharing, and further research into approaches for reducing these barriers should be done. Conversely, additional research exploring why researchers do share in public repositories (as opposed to not sharing data or sharing using alternative methods) could bring insights that could be used to encourage repository use among other researchers.

4.1 Limitations

There were several limitations to the search for and identification of literature for this study. While the search for relevant literature was extensive, it was not comprehensive. LISTA and LISS were chosen to provide depth in LIS literature, due to the central role academic libraries have taken in research data management and sharing. As data sharing studies have been published across many other disciplinary journals as well, searches were performed in Web of Science and Scopus to provide disciplinary breadth. However, these databases are not comprehensive, and data sharing literature across some disciplines may not have been identified. Google Scholar was searched, but was excluded as most of the results were included in the database searches or were gray literature, which was not included in this study. Database searches were performed in English, and only English language studies were included. The search terms focused on data sharing attitudes of researchers. Studies examining data management more broadly often investigate researcher data sharing and may have provided additional insight; however, these were purposefully excluded in order to focus on studies with detailed findings on researcher data sharing.

5. Conclusion

The overall objectives of this study were to identify how the term “data sharing” is defined and operationalized in the literature, how sharing data in public data repositories is addressed in the literature and how researchers' attitudes toward data sharing compare to their data sharing behavior.

The evaluation showed that studies could be separated into three categories: studies of research data sharing attitudes and practices, studies of influences on researchers' data sharing and studies of both researcher attitudes and influences on data sharing. Though heavily skewed toward the United States, studies were from a wide array of countries, covering a variety of disciplines and employing a mix of qualitative and quantitative methods. Most studies did not explicitly define data sharing, and those that did generally used broad definitions or focused on particular methods of data sharing. Public data repositories as a method of data sharing were also rarely addressed explicitly and separately. When they were, reported public data repository use among researchers varied greatly between studies. Many studies reported a disconnect between researchers' attitudes toward data sharing and their data sharing behaviors.

As libraries continue to promote data sharing among researchers, it will be important for data management librarians to understand both how researchers' data sharing behavior is shaped and how library data management services can help mold this behavior. There are many factors influencing researcher data sharing behaviors, both internal and external. By addressing these factors, librarians and policymakers can help shape the future of the Open Data environment. In order to promote sharing data publicly in data repositories, libraries need to understand researchers' attitudes and behaviors toward this method of sharing, why they share publicly – and why they do not.

This study explored influences on researcher data sharing, specifically via public repositories. It highlighted inconsistencies in how data sharing is defined and categorized, which limit the ability to make comparisons and draw broad conclusions across studies. Sharing data with collaborators is vastly different from sharing publicly, and researchers' attitudes toward these are quite different as well. The wide spectrum of data sharing should be studied in meaningful segments that align with the goal of – inasmuch as possible – advancing science through the sharing of scientific data.


Figure 1. Studies by country. Includes studies conducted in multiple countries (n = 7). Excludes international studies (n = 14)

Table 1. Results of literature search

Table 2. Reported use of open, open access, online or public repositories/databases

Appendix. Studies included in analysis

Note(s): *Focus group (n = 7); Survey (n = 101); **Excludes non-researcher participants; ***Two separate surveys in 2009/2010 (n = 1,329) and 2013/2014 (n = 1,015)

[1] A distinction was made between studies where authors selected participants from multiple specific countries (n = 7) and studies with an international scope (n = 14).

Ali-Khan, S.E., Harris, L.W. and Gold, E.R. (2017), “Point of view: motivating participation in open science by examining researcher incentives”, Elife, Vol. 6, p. e29319, doi: 10.7554/eLife.29319.

Andreoli-Versbach, P. and Mueller-Langer, F. (2014), “Open access to data: an ideal professed but not practised”, Research Policy, Vol. 43 No. 9, pp. 1621-1633, doi: 10.1016/j.respol.2014.04.008.

Aydinoǧlu, A.U., Dogan, G. and Taskin, Z. (2017), “Research data management in Turkey: perceptions and practices”, Library Hi Tech, Vol. 35 No. 2, pp. 271-289, doi: 10.1108/LHT-11-2016-0134.

Bezuidenhout, L. (2019), “To share or not to share: incentivizing data sharing in life science communities”, Developing World Bioethics, Vol. 19 No. 1, pp. 18-24, doi: 10.1111/dewb.12183.

Bishoff, C. and Johnston, L. (2015), “Approaches to data sharing: an analysis of NSF data management plans from a large research university”, Journal of Librarianship and Scholarly Communication, Vol. 3 No. 2, pp. 1-27, doi: 10.7710/2162-3309.1231.

Borghi, J.A. and Van Gulick, A.E. (2018), “Data management and sharing in neuroimaging: practices and perceptions of MRI researchers”, PloS One, Vol. 13 No. 7, p. e0200562, doi: 10.1371/journal.pone.0200562.

Borgman, C.L. (2012), “The conundrum of sharing research data”, Journal of the American Society for Information Science and Technology, Vol. 63 No. 6, pp. 1059-1078, doi: 10.1002/asi.22634.

Bradić-Martinović, A. and Zdravković, A. (2014), “Researchers' interest in data service in Bosnia and Herzegovina, Croatia, and Serbia”, IASSIST Quarterly, Vol. 38 No. 2, p. 22.

Chawinga, W.D. and Zinn, S. (2019), “Global perspectives of research data sharing: a systematic literature review”, Library and Information Science Research, Vol. 41 No. 2, pp. 109-122, doi: 10.1016/j.lisr.2019.04.004.

Cheah, P.Y., Tangseefa, D., Somsaman, A., Chunsuttiwat, T., Nosten, F., Day, N.P.J., Bull, S. and Parker, M. (2015), “Perceived benefits, harms, and views about how to share data responsibly: a qualitative study of experiences with and attitudes toward data sharing among research staff and community representatives in Thailand”, Journal of Empirical Research on Human Research Ethics, Vol. 10 No. 3, pp. 278-289, doi: 10.1177/1556264615592388.

Christensen-Dalsgaard, B., van den Berg, M., Grim, R., Horstmann, W., Jansen, D., Pollard, T. and Roos, A. (2012), “Ten recommendations for libraries to get started with research data management”, available at: http://libereurope.eu/wp-content/uploads/The research data group 2012 v7 final.pdf.

Diekmann, F. (2012), “Data practices of agricultural scientists: results from an exploratory study”, Journal of Agricultural and Food Information, Vol. 13 No. 1, pp. 14-34, doi: 10.1080/10496505.2012.636005.

Elsayed, A.M. and Saleh, E. (2018), “Research data management and sharing among researchers in Arab universities: an exploratory study”, Ifla Journal-International Federation of Library Associations, Vol. 44 No. 4, pp. 281-299, doi: 10.1177/0340035218785196.

Federer, L.M., Lu, Y.-L., Joubert, D.J., Welsh, J. and Brandys, B. (2015), “Biomedical data sharing and reuse: attitudes and practices of clinical and scientific research staff”, PloS One, Vol. 10 No. 6, p. e0129506, doi: 10.1371/journal.pone.0129506.

Grubb, A.M. and Easterbrook, S.M. (2011), “On the lack of consensus over the meaning of openness: an empirical study”, PloS One, Vol. 6 No. 8, doi: 10.1371/journal.pone.0023420.

Hall, N.F. (2013), “Environmental studies faculty attitudes towards sharing of research data”, Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 383-384.

Hate, K., Meherally, S., Shah More, N., Jayaraman, A., Bull, S., Parker, M. and Osrin, D. (2015), “Sweat, skepticism, and uncharted territory: a qualitative study of opinions on data sharing among public health researchers and research participants in Mumbai, India”, Journal of Empirical Research on Human Research Ethics, Vol. 10 No. 3, pp. 239-250, doi: 10.1177/1556264615592383.

Hickson, S., Poulton, K.A., Connor, M., Richardson, J. and Wolski, M. (2016), “Modifying researchers' data management practices: a behavioural framework for library practitioners”, IFLA Journal, Vol. 42 No. 4, pp. 253-265, doi: 10.1177/0340035216673856.

Holdren, J.P. (2013), “Increasing access to the results of federally funded scientific research”, Office of Science and Technology Policy, available at: https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf.

Huang, X., Hawkins, B.A., Lei, F., Miller, G.L., Favret, C., Zhang, R. and Qiao, G. (2012), “Willing or unwilling to share primary biodiversity data: results and implications of an international survey”, Conservation Letters, Vol. 5 No. 5, pp. 399-406, doi: 10.1111/j.1755-263X.2012.00259.x.

Jao, I., Kombe, F., Mwalukore, S., Bull, S., Parker, M., Kamuya, D., Molyneux, S. and Marsh, V. (2015), “Research stakeholders' views on benefits and challenges for public health research data sharing in Kenya: the importance of trust and social relations”, PloS One, Vol. 10 No. 9, p. e0135545, doi: 10.1371/journal.pone.0135545.

Ju, B. and Kim, Y. (2019), “The formation of research ethics for data sharing by biological scientists: an empirical analysis”, Aslib Journal of Information Management, Vol. 71 No. 5, pp. 583-600, doi: 10.1108/AJIM-12-2018-0296.

Kim, Y. and Adler, M. (2015), “Social scientists' data sharing behaviors: investigating the roles of individual motivations, institutional pressures, and data repositories”, International Journal of Information Management, Vol. 35 No. 4, pp. 408-418, doi: 10.1016/j.ijinfomgt.2015.04.007.

Kim, Y. and Burns, C.S. (2016), “Norms of data sharing in biological sciences: the roles of metadata, data repository, and journal and funding requirements”, Journal of Information Science, Vol. 42 No. 2, pp. 230-245, doi: 10.1177/0165551515592098.

Kim, Y. and Kim, S. (2015), “Institutional, motivational, and resource factors influencing health scientists' data-sharing behaviours”, Journal of Scholarly Publishing, Vol. 46 No. 4, pp. 366-389, doi: 10.3138/jsp.46.4.05.

Kim, Y. and Nah, S. (2018), “Internet researchers' data sharing behaviors: an integration of data reuse experience, attitudinal beliefs, social norms, and resource factors”, Online Information Review, Vol. 42 No. 1, pp. 124-142, doi: 10.1108/OIR-10-2016-0313.

Kim, Y. and Stanton, J.M. (2013), “Institutional and individual influences on scientists' data sharing behaviors: a multilevel analysis”, Proceedings of the Association for Information Science and Technology, Vol. 50 No. 1, p. 1.

Kim, Y. and Zhang, P. (2015), “Understanding data sharing behaviors of STEM researchers: the roles of attitudes, norms, and data repositories”, Library and Information Science Research, Vol. 37 No. 3, pp. 189-200, doi: 10.1016/j.lisr.2015.04.006.

Kurata, K., Matsubayashi, M. and Mine, S. (2017), “Identifying the complex position of research data and data sharing among researchers in natural science”, Sage Open, Vol. 7 No. 3, p. 2158244017717301, doi: 10.1177/2158244017717301.

Laine , H. ( 2017 ), “ Afraid of scooping—case study on researcher strategies against fear of scooping in the context of open science ”, Data Science Journal , Vol. 16 , doi: 10.5334/dsj-2017-029 .

Lurie , K.L. , Mistree , B.F.T. and Ellerbee , A.K. ( 2015 ). “ Perspectives of the optical coherence tomography community on code and data sharing ”, in Fujimoto , J.G. , Izatt , J.A. and Tuchin , V.V. (Eds), p. 93122M , doi: 10.1117/12.2082412 .

MacMillan , D. ( 2014 ), “ Data sharing and discovery: what librarians need to know ”, The Journal of Academic Librarianship , Vol. 40 No. 5 , pp. 541 - 549 , doi: 10.1016/j.acalib.2014.06.011 .

Mazor , K.M. , Richards , A. , Gallagher , M. , Arterburn , D.E. , Raebel , M.A. , Nowell , W.B. , Curtis , J.R. , Paolino , A.R. and Toh , S. ( 2017 ), “ Stakeholders' views on data sharing in multicenter studies ”, Journal of Comparative Effectiveness Research , Vol. 6 No. 6 , pp. 537 - 547 , doi: 10.2217/cer-2017-0009 .

Merson , L. , Phong , T.V. , Nhan , L.N.T. , Dung , N.T. , Ngan , T.T.D. , Kinh , N.V. , Parker , M. and Bull , S. ( 2015 ), “ Trust, respect, and reciprocity: informing culturally appropriate data-sharing practice in Vietnam ”, Journal of Empirical Research on Human Research Ethics , Vol. 10 No. 3 , pp. 251 - 263 , doi: 10.1177/1556264615592387 .

Mischo , W. and O'Donnell , M. ( 2014 ), “ An analysis of data management plans in University of Illinois National Science Foundation grant proposals ”, Journal of EScience Librarianship , Vol. 3 No. 1 , p. 3 , doi: 10.7191/jeslib.2014.1060 .

Mozersky , J. , Walsh , H. , Parsons , M. , McIntosh , T. , Baldwin , K. and DuBois , J.M. ( 2020 ), “ Are we ready to share qualitative research data? Knowledge and preparedness among qualitative researchers, IRB members, and data repository curators ”, IASSIST Quarterly , Vol. 43 No. 4 , pp. 1 - 23 , doi: 10.29173/iq952 .

Nielsen , M. ( 2011 ), “ Definitions of open science? The open-science archives ”, available at: https://lists.okfn.org/pipermail/open-science/2011-July/005607.html .

Open Knowledge Foundation ( 2014 ), “ Open data – an introduction ”, available at: http://webarchive.okfn.org/okfn.org/201404/opendata/ .

Pardo Martínez , C. and Poveda , A. ( 2018 ), “ Knowledge and perceptions of open science among researchers—a case study for Colombia ”, Information , Vol. 9 No. 11 , p. 292 , doi: 10.3390/info9110292 .

Perrier , L. , Blondal , E. and MacDonald , H. ( 2020 ), “ The views, perspectives, and experiences of academic researchers with data sharing and reuse: a meta-synthesis ”, PloS One , Vol. 15 No. 2 , p. e0229182 , doi: 10.1371/journal.pone.0229182 .

Polanin , J.R. and Terzian , M. ( 2019 ), “ A data-sharing agreement helps to increase researchers' willingness to share primary data: results from a randomized controlled trial ”, Journal of Clinical Epidemiology , Vol. 106 , pp. 60 - 69 , doi: 10.1016/j.jclinepi.2018.10.006 .

Saeed , S. and Ali , P.M. ( 2019 ), “ Research data management and data sharing among research scholars of life sciences and social sciences ”, DESIDOC Journal of Library and Information Technology , Vol. 39 No. 6 , pp. 290 - 299 , doi: 10.14429/djlit.39.06.14997 .

Sayogo , D.S. and Pardo , T.A. ( 2013 ), “ Exploring the determinants of scientific data sharing: understanding the motivation to publish research data ”, Government Information Quarterly , Vol. 30 , pp. S19 - S31 , doi: 10.1016/j.giq.2012.06.011 .

Scheliga , K. and Friesike , S. ( 2014 ), “ Putting open science into practice: a social dilemma? ”, First Monday , Vol. 19 No. 9 , doi: 10.5210/fm.v19i9.5381 .

Schmidt , B. , Gemeinholzer , B. and Treloar , A. ( 2016 ), “ Open data in global environmental research: the Belmont Forum's open data survey ”, PloS One , Vol. 11 No. 1 , p. e0146695 , doi: 10.1371/journal.pone.0146695 .

Spallek , H. , Weinberg , S. , Manz , M. , Nanayakkara , S. , Zhou , X. and Johnson , L. ( 2019 ), “ Perceptions and attitudes toward data sharing among dental researchers ”, JDR Clinical and Translational Research , Vol. 4 No. 1 , pp. 68 - 75 , doi: 10.1177/2380084418790451 .

Tenopir , C. , Allard , S. , Douglass , K. , Aydinoǧlu , A.U. , Wu , L. , Read , E. , Manoff , M. and Frame , M. ( 2011 ), “ Data sharing by scientists: practices and perceptions ”, PloS One , Vol. 6 No. 6 , p. e21101 , doi: 10.1371/journal.pone.0021101 .

Tenopir , C. , Christian , L. , Allard , S. and Borycz , J. ( 2018 ), “ Research data sharing: practices and attitudes of geophysicists ”, Earth and Space Science , Vol. 5 No. 12 , pp. 891 - 902 , doi: 10.1029/2018EA000461 .

Tenopir , C. , Dalton , E.D. , Allard , S. , Frame , M. , Pjesivac , I. , Birch , B. , Pollock , D. and Dorsett , K. ( 2015 ), “ Changes in data sharing and data reuse practices and perceptions among scientists worldwide ”, PloS One , Vol. 10 No. 8 , p. e0134826 , doi: 10.1371/journal.pone.0134826 .

Uzwyshyn , R. ( 2016 ), “ Research data repositories: the what, when, why, and how ”, Computers in Libraries , Vol. 36 No. 3 , pp. 18 - 21 .

Wilkinson , M.D. , Dumontier , M. , Aalbersberg , Ij. J. , Appleton , G. , Axton , M. , Baak , A. , Blomberg , N. , Boiten , J.-W. , da Silva Santos , L.B. , Bourne , P.E. , Bouwman , J. , Brookes , A.J. , Clark , T. , Crosas , M. , Dillo , I. , Dumon , O. , Edmunds , S. , Evelo , C.T. , Finkers , R. and Mons , B. ( 2016 ), “ The FAIR guiding principles for scientific data management and stewardship ”, Scientific Data , Vol. 3 , p. 160018 .

Wu , S. and Worrall , A. ( 2019 ), “ Supporting successful data sharing practices in earthquake engineering ”, Library Hi Tech , Vol. 37 No. 4 , pp. 764 - 780 , doi: 10.1108/LHT-03-2019-0058 .

Wynholds , L. , Fearon , D.S. Jr , Borgman , C.L. and Traweek , S. ( 2011 ), “ When use cases are not useful: data practices, astronomy, and digital libraries ”, Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries , pp. 383 - 386 .

Zenk-Möltgen , W. , Akdeniz , E. , Katsanidou , A. , Nasshoven , V. and Balaban , E. ( 2018 ), “ Factors influencing the data sharing behavior of researchers in sociology and political science ”, Journal of Documentation , Vol. 74 No. 5 , pp. 1053 - 1073 , doi: 10.1108/JD-09-2017-0126 .

Zhu , Y. ( 2019 ), “ Open-access policy and data-sharing practice in UK academia ”, Journal of Information Science , pp. 1 - 12 , doi: 10.1177/0165551518823174 .

Acknowledgements

The authors thank Lisa Federer, Erica DeFrain and Rasmus Thøgersen for providing valuable feedback on various drafts of this article. In addition, they thank the anonymous reviewers whose comments helped improve and clarify this manuscript.


Data Sharing And Data Reuse: An Investigation Of Descriptive Information Facilitators And Inhibitors

  • March 20, 2019
  • Affiliation: School of Information and Library Science
  • This dissertation examines how descriptive information inhibits or facilitates data sharing and reuse. DataONE serves as the test environment. The objective is to identify the descriptive information made discoverable through DataONE and then determine which of it helps scientists judge data reusability. The study uses a mixed-methods approach that combines a data profiling assessment, in the form of a quantitative and qualitative content analysis, with a quasi-experiment think-aloud. The content analysis was conducted on a stratified sample of data extracted from DataONE to examine the types of descriptive information made available through the shared data. Participants searched a quasi-experiment interface and thought aloud about what information inhibited or facilitated their determination of data reusability. Additionally, participants completed a post-result usefulness survey, a post-search rank order survey, and a post-search factors survey. The content analysis shows that the shared data contains 30 unique pieces of descriptive information found in the records. The think-aloud indicates that scientists found some pieces of descriptive information particularly useful for determining data reusability. These include: (a) the data description, (b) the attribute table, and (c) the research methods. In conclusion, metadata schema, member node standards, and community standards affect what types of descriptive information are provided through the shared data. Attribute and unit lists, research methods information, and succinctly written abstracts facilitate data reuse. However, long abstracts, the same information repeated in multiple places, and the exclusion of data descriptions inhibit data reuse.
The findings and recommendations assist funding agencies and scientific organizations in understanding the current state of data being shared and prioritizing how to meet the needs of scientists regarding data reuse. This dissertation provides guidance to developers of current and future data sharing environments and infrastructures, research data management and scientific communities, scientific data managers, creators of data management plans, and funding agencies; and has implications beyond DataONE.
  • December 2016
  • scientific data management
  • mixed-methods
  • Library science
  • data sharing
  • Information technology
  • Information science
  • https://doi.org/10.17615/a7vg-fp11
  • Dissertation
  • In Copyright
  • Michener, William
  • Losee, Robert
  • Hossein Jarrahi, Mohammed
  • Rajasekar, Arcot
  • Moore, Reagan
  • Greenberg, Jane
  • Doctor of Philosophy
  • University of North Carolina at Chapel Hill Graduate School

The Benefits of Data Sharing Now Outweigh the Risks

By François Candelon, Guillaume Sajust de Bergues, David Zuluaga Martínez, Harsha Chandra Shekar, and Marcos Aguiar

Key Takeaways

  • New technology can help lower the trust threshold among firms, even competitors, and overcome perceived engineering and regulatory challenges.
  • The digital world is more standardized and services provide secure platforms to store, share, and analyze data, reducing the need for in-house tech expertise and infrastructure.
  • The value opportunity of data sharing is estimated to be 2.5% of global GDP.

Many of today’s biggest industry challenges won’t be solved by a company toiling alone, drawing only on its proprietary data. Complex issues such as fraud detection, supply chain optimization, and drug discovery can often be tackled most effectively through collaboration, pooling data from multiple industry players. In this scenario, everyone wins––both individual companies and the industry at large. And the win could be substantial: in 2019, the Organization for Economic Co-operation and Development estimated the value opportunity of data sharing at 2.5% of the global GDP.

Although most companies remain resistant to strategic data sharing, some industry heavyweights are already realizing the benefits. In the US, automobile insurance companies are collaborating through a claim-history information exchange called LexisNexis CLUE Auto. For these insurers, sharing proprietary claims data has significantly sped up the underwriting process and reduced liability risk; as a result, 99% of US underwriters now participate. The European aerospace company Airbus has also taken a collaborative approach with its suppliers and customers. In 2017, it launched the digital platform Skywise to address industrywide operational challenges, such as predictive maintenance and fleet performance, through data sharing. The platform is estimated to generate hundreds of millions of dollars in revenue and cost savings annually among all participants.

Such examples, however, are outliers. Even in instances when sharing data would solve some intractable problems and generate value for all involved, executives’ fears hold them back. They are concerned about the engineering and regulatory challenges and, crucially, they worry that the data they share might be used against them by other firms. But our research shows that such perceptions are largely outdated.

What’s changed? The technology. Compared with even five years ago, today’s software and tools, as well as new forms of data, can mitigate or resolve many of the engineering and regulatory challenges that companies (rightly) cite, while also reducing the need for trust between companies that would benefit from collaboration. Now is the right moment for savvy executives to revisit strategic data sharing.

New Tech Tackles Engineering and Regulatory Challenges

Today, companies can manage the traditional engineering and regulatory challenges associated with sharing data in wholly new ways. Two commonly cited obstacles offer cases in point.

Obstacle #1: “We lack the infrastructure, common standards, and talent.” Over the years, data engineering challenges have taken different forms––for example, companies might not have the internal know-how or the digital infrastructure to share data efficiently and securely. These deficits have been made more pronounced by the lack of universal data formatting standards and common sharing protocols.

While these obstacles have stalled data sharing in the past, new technology provides solutions. Data sharing services offered by companies such as Databricks are now widely available. These services provide secure platforms for organizations to store, share, and analyze data, reducing the need for in-house tech expertise and infrastructure. Moreover, the digital world is far more standardized than it once was, even without an explicit universal standard for sharing data. Connectors between systems, such as application programming interfaces, are more homogenous and their documentation more easily accessible than even five years ago.

Emerging technologies may also soon provide a workaround for companies dealing with talent shortages. For example, before data can be exchanged, it must be cleaned and formatted. This has traditionally been a time-consuming task. Today, generative AI can be programmed to identify inconsistencies and errors within the data and generate scripts to fix these issues, which enables companies to automate parts of the process.

Obstacle #2: “It’s too difficult to comply with regulations.” The rise of strict regulatory frameworks, ranging from data privacy to antitrust to cross-border data flows, has made data sharing feel like more trouble than it’s worth for many companies. They have struggled to find a straightforward path to compliance.

Companies have long been wary of sharing data because of the possibility that sensitive, personally identifiable information (PII) could slip through the cracks. Along with reputational damage, this kind of leak has the potential to violate data privacy regulations like the EU’s General Data Protection Regulation. Today, however, numerous vendors offer tools that can help companies handle sensitive data. For example, companies can use discovery tools to scan and analyze their data repositories to identify hidden sensitive or confidential information before they share their data. Companies can also use modern data anonymization tools to remove or encrypt PII in an irreversible way, ensuring it can’t be linked back to an individual.
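As a rough illustration of the two steps described above (discovering PII, then removing or irreversibly transforming it), here is a minimal sketch in Python. The regular expressions, field names, and the `prepare_record` helper are hypothetical and far simpler than what commercial discovery and anonymization tools do. A keyed hash with a discarded key is only one possible approach and does not by itself guarantee compliance with any regulation.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical patterns for illustration; real discovery tools cover many more PII types.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_pii(text):
    """Discovery step: list (kind, match) pairs for PII-like strings in free text."""
    hits = [("email", m) for m in EMAIL_RE.findall(text)]
    hits += [("phone", m) for m in PHONE_RE.findall(text)]
    return hits


def redact(text):
    """Removal step: replace PII-like strings with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def new_key():
    """Random secret for the keyed hash; discard it after the export to prevent re-linking."""
    return secrets.token_bytes(32)


def pseudonymize(value, key):
    """Keyed one-way hash: the same input maps to the same token, but not back again."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def prepare_record(record, key):
    """Build the shareable version of one record (field names are assumptions)."""
    return {
        "customer": pseudonymize(record["customer_id"], key),
        "notes": redact(record["notes"]),
        "claim_amount": record["claim_amount"],
    }
```

Running `find_pii` before sharing shows what would have leaked; `prepare_record` then produces the version that leaves the company.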

Antitrust regulation can also be a concern, but it shouldn’t deter efforts to share data. As already mentioned, virtually all US insurers collaborate on the LexisNexis CLUE Auto platform. Although this partnership could have attracted the scrutiny of regulators, it hasn’t––because the objective is to address legitimate industrywide issues. Likewise, the largest automotive manufacturers in Germany cocreated the Catena-X initiative, a collaborative data exchange network that––far from being censured on antitrust grounds––received the active support of the German government.

There is one caveat: in many parts of the world, regulations governing cross-border data flows are highly fragmented and restrictive. This often limits data sharing to within regulatory blocs, such as the EU or the US––making data exchanges across these blocs challenging. We see a path forward, however, with new types of data that have emerged with the rise of AI, such as features, embeddings, or AI model parameters. These new forms of data, used in lieu of raw data, could allow for safe sharing that respects regulators’ objectives, such as protecting individual data privacy––although this would require regulators to update their policies to take these new forms of data into account.

Trust Is Still Paramount

Advances in technology make handling sensitive data more secure, but companies are still squeamish because of perceived strategic risk. This perception creates a collective action problem; companies see both the value and the risk, but rarely an incentive to be first movers. That sense of risk can take different forms depending on the data sharing partner, whether that’s a direct competitor, a customer or supplier in the supply chain, or an aggregator.

When data sharing counterparties are also competitors, companies fear revealing their intellectual property (IP) or “secret sauce” hidden in the data. For example, the US auto insurers contributing to the LexisNexis CLUE Auto database worry that they could indirectly reveal confidential information. Claim data contains information such as the amount paid or vehicle type which, when put together, might reveal valuable information about a company’s customer base. However, LexisNexis, in its role as data sharing intermediary, also provides trust-creating features. Crucially, only members that report their data to the platform are allowed to withdraw information, and the service strictly controls which information can be withdrawn.
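The two rules in the last sentence (contribute before you query, and only approved fields come back) can be sketched in a few lines of Python. This is a toy model under our own assumptions, not a description of how LexisNexis implements its service; the `Exchange` class, the field names, and the whitelist are all hypothetical.

```python
class Exchange:
    """Toy 'give-to-get' exchange: only contributors may query, and only
    whitelisted fields are ever returned."""

    # Hypothetical whitelist; sensitive fields such as amount_paid stay inside.
    ALLOWED_FIELDS = {"claim_date", "claim_type"}

    def __init__(self):
        self._records = []
        self._contributors = set()

    def report(self, member, record):
        """Store a record and mark the member as a contributor."""
        self._records.append({**record, "reported_by": member})
        self._contributors.add(member)

    def query(self, member, vehicle_id):
        """Return the permitted view of all claims for one vehicle."""
        if member not in self._contributors:
            raise PermissionError(f"{member} must report data before querying")
        return [
            {k: r[k] for k in self.ALLOWED_FIELDS if k in r}
            for r in self._records
            if r.get("vehicle_id") == vehicle_id
        ]
```

A member that has never called `report` gets a `PermissionError`; a contributing member sees the claim date and type but never the amount paid or the reporting insurer.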

Data sharing therefore works within the industry, despite the risk, because (1) the auto insurers have sufficient trust in one another and the data sharing system; and (2) the substantial benefits of the data sharing system outweigh any lingering strategic risk.

In other instances, such as when companies share data within their supply chains, executives worry that suppliers and customers could use shared data to gain an edge in commercial negotiations. In the case of the airplane industry’s Skywise database, participating airlines are sharing sensitive details about their operations, such as in-flight and engineering data. These details could be used by Airbus or other companies in the supply chain to evaluate things like a company’s headroom ahead of pricing negotiations. Yet despite the existence of strategic risk, sharing is made possible by the trust that Airbus, its suppliers, and customers each place in the data sharing system and its ability to improve the supply chain’s functioning in a neutral way.

This trust can be explained by multiple mechanisms. For example, Airbus built transparency into the platform through data governance, which makes participants’ behavior observable to other participants. Airbus has also created an ecosystem of trustworthy participants through the training and certification that Skywise delivers to partners it has vetted through an intensive verification system.

Conversely, when companies don’t trust each other enough, the perceived risk of data sharing outweighs the expected benefits. Consider the aggregator service Order with Google. This service collects data, including pricing and most popular menu items, from multiple food delivery services. But Google also has internal teams that use this data to understand the marketplace and offer the food delivery services insights to improve their clients’ business. This may be a primary reason why Uber Eats, for example, does not participate in Google’s aggregator service. Uber would likely benefit from the insights and incremental orders, but arguably would not want its data used to help competitors.

How Technology Makes Trust Easier

The good news is, when trust doesn’t come naturally, technology can help to lower the “trust threshold.” In other words, technology can act as a partial substitute for trust, especially at the start, to ignite the relationship between prospective data sharing partners in the following ways:

Technology provides transparency into partners’ data governance and usage. Technology can help increase trust among firms by creating data sharing systems that are trustworthy. For example, when companies decide to collaborate, they establish data sharing agreements. These agreements specify minimum data governance standards––the processes and policies that each partner must implement to manage and protect any data that falls into their hands. Historically, it’s been difficult to monitor how these standards get implemented in practice, but today, modern software can help provide transparency for all involved parties. For instance, specialized tools can create irrevocable records of data transaction history to automate various aspects of data governance monitoring, including the review and enforcement of each partner’s data governance policies.
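One common way to make a transaction history tamper-evident is a hash chain, in which each log entry's hash also covers the hash of the entry before it. The sketch below illustrates that general idea in Python; it is an assumption about how such a record could be built, not a description of any specific vendor tool, and the `AuditLog` class and its fields are hypothetical.

```python
import hashlib
import json


def _digest(entry, prev_hash):
    """Hash an entry together with the previous entry's hash."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which altering any past entry breaks the chain."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self.entries = []  # list of (entry, hash) pairs

    def append(self, actor, action, dataset):
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        entry = {"seq": len(self.entries), "actor": actor,
                 "action": action, "dataset": dataset}
        self.entries.append((entry, _digest(entry, prev)))

    def verify(self):
        """Recompute every hash in order; any edit to history returns False."""
        prev = self.GENESIS
        for entry, stored in self.entries:
            if _digest(entry, prev) != stored:
                return False
            prev = stored
        return True
```

Each partner can keep a copy of the latest hash; if the log is later rewritten, `verify` fails or the final hash no longer matches the copy.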

Beyond governance, data sharing agreements also typically outline how partners are allowed to use the shared data. But here too, these agreements have been hard to enforce because it’s difficult to monitor what partners do with the data once they have access to it. Virtual data rooms, such as Snowflake Global Data Clean Room, are secure online spaces for data storage and distribution that can address this issue. Such spaces include tools that set permissions and restrictions on the data or track data access and usage––thus enabling the data owner to control how their data is used and analyzed even after it’s been shared.

Technology reduces the risk of strategic information leaks with synthetic data. Beyond the risk of inadvertently releasing data protected by privacy regulations, there is also the risk of mistakenly releasing raw data that contains confidential company information and IP. One emerging solution is the use of synthetic data, which is created to have the same characteristics as a real-world data set, but without including real-world data. Research has shown that synthetic data is very difficult to reverse-engineer when properly synthesized, which would enable wary companies to share data without fear of a damaging leak.
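To make "same characteristics, no real records" concrete, the following Python sketch fits a simple two-column Gaussian model to real rows and then samples fresh rows from it. The Gaussian model and the function names are our assumptions for illustration. Production synthesizers use much richer models, and this toy version carries no formal privacy guarantee.

```python
import math
import random
import statistics


def fit(rows):
    """Estimate means, standard deviations, and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in rows])
    rho = max(-1.0, min(1.0, cov / (sx * sy)))  # clamp against rounding error
    return mx, my, sx, sy, rho


def sample(params, n, rng):
    """Draw n synthetic rows from a bivariate normal with the fitted statistics."""
    mx, my, sx, sy, rho = params
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        # Mix z1 into y so the two columns keep the fitted correlation.
        y = my + sy * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        rows.append((x, y))
    return rows
```

Only the five fitted parameters feed the generator, so the synthetic rows reproduce the aggregate statistics of the original rows without copying any of them.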

Technology circumvents sharing company data directly by employing decentralized model training. We are observing an ever-growing hunger for data to train AI models. Some companies are starting to realize the benefits of jointly training an AI model with other companies, in service of addressing industrywide issues. With access to a wider data set, the joint model will provide more valuable insights beyond the capacity of a single company. Joint models are typically trained by a single trusted organization––a joint venture or a third party––charged with collecting the data from each company and using the consolidated data for training. Federated learning is emerging as an alternative, decentralized approach. It uses data from multiple companies, but the data doesn’t leave each individual company’s premises. This allows for companies to share the insights embedded in their data and contribute to a collective effort without the risks associated with sharing the data itself with other firms.
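The decentralized approach can be sketched with federated averaging, one widely used federated learning scheme: each company refines the shared model on its own data, and a coordinator averages the resulting parameters, weighted by how much data each company holds. The Python below is a minimal sketch for a one-variable linear model; the function names, learning rate, and round count are illustrative assumptions, and real deployments add secure aggregation and other protections.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient steps on one company's private (x, y) pairs.
    Only the updated weights and the sample count leave the company."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates):
    """Coordinator step: average the weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return (w, b)


def train(clients, rounds=300):
    """Alternate local training and averaging; raw data never moves."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in clients]
        weights = federated_average(updates)
    return weights
```

Here each element of `clients` stands for one company's local data set. The coordinator only ever sees `(w, b)` pairs and counts, which is the property that lets firms contribute insights without handing over the underlying records.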

A recent example of federated learning is the shared platform MELLODDY, developed by a European consortium of ten pharmaceutical companies aiming to accelerate drug discovery. The companies codeveloped the platform to enhance machine learning models with data from each company, but without directly exposing their proprietary information. The interest for these pharmaceutical companies is not only to be compliant with patient data regulations, but also to enable collaboration in a low-trust environment wherein firms that collaborate gain a competitive edge.

Technology alone won’t be able to fully overcome the trust gap between companies. The use of virtual data rooms, for instance, may allow companies to monitor the analyses conducted there, but they can’t control how counterparties will use the insights derived from these analyses. Some risk will remain.

But the potential benefits of data sharing increasingly outweigh such risks. Companies should take advantage of today’s technology to address issues across their industry by starting to collaborate with firms they may not yet fully trust.

Over time, technology itself may begin to create a “trust flywheel.” As partners benefit from the value of sharing data and gain confidence in the process, they will feel encouraged to keep sharing, and in some cases, to deepen these relationships or seek out new partners that can add more richness and depth. For some leaders, this scenario may seem far-fetched––but today’s software and tools have made it something all companies can achieve.

The authors would like to thank Gaurav Jha for his contribution to this article.

© Boston Consulting Group 2024. All rights reserved.

PhD Defence Sara Shakeri

SecConNet|Smart and Secure Container Networks for Trusted Big Data Sharing

Many organizations are interested in sharing data with others. However, they can do this only if a secure platform is available. Digital Data Marketplaces (DDMs) are emerging as a framework for organizations to share their data. To increase trust among participating organizations, multiple agreements should be established to determine policies about who has access to what. Translating these high-level sharing policies into actionable code and setting up an infrastructure that implements and enforces the policies is still a big challenge. In SecConNet, we research novel container network architectures, which utilize programmable infrastructures and virtualization technologies across multiple administrative domains whilst maintaining the security and quality requirements of requesting parties for both private sector and scientific use cases. Containers are lightweight alternatives to full-fledged virtual machines. A container can operate as a secure, isolated, and individual entity that manages and processes the data it is given on behalf of its owner. However, for multi-organization (chain) applications, groups of containers need access to the same data and/or need to exchange data among themselves. Technologies to connect containers are developed with primary attention to their performance, but the greatest challenge is the creation of secure and reliable multi-domain container networks. We first investigate different technologies to evaluate their capabilities to support the network infrastructure requirements of secure data sharing. We then propose a P4-based network for building a multi-domain DDM. Finally, we use the capabilities of the P4-based network to monitor the transactions in the DDM.

Oudezijds Voorburgwal 229 - 231 1012 EZ Amsterdam

Secure Data Sharing in the Cloud

  • First Online: 01 January 2013

  • Danan Thilakanathan 3 ,
  • Shiping Chen 4 ,
  • Surya Nepal 4 &
  • Rafael A. Calvo 3  

Cloud systems can be used to enable data sharing capabilities, and this can provide an abundance of benefits to the user.

Mell P, Grance T (2012) The NIST definition of cloud computing. NIST Spec Publ 800:145. National Institute of Standards and Technology, U.S. Department of Commerce. Source: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf. Accessed on Oct 2012

Wikipedia definition of Cloud computing (2012). Source: http://en.wikipedia.org/wiki/Cloud_computing. Accessed on Oct 2012

Healey M (2010) Why IT needs to push data sharing efforts. Information Week. Source: http://www.informationweek.com/services/integration/why-it-needs-to-push-data-sharing-effort/225700544. Accessed on Oct 2012

Gellin A (2012) Facebook’s benefits make it worthwhile. Buffalo News.

Riley DA (2010) Using google wave and docs for group collaboration. Library Hi Tech News.

Wu R (2012) Secure sharing of electronic medical records in cloud computing. Arizona State University, ProQuest Dissertations and Theses

Pandey S, Voorsluys W, Niu S, Khandoker A, Buyya R (2012) An autonomic cloud environment for hosting ECG data analysis services. Future Gener Comput Syst 28(1):147–154

Bender D (2012) Privacy and security issues in cloud computing. Comput Internet Lawyer 1–15.

Judith H, Robin B, Marcia K, Fern H (2009) Cloud computing for dummies. For Dummies.

SeongHan S, Kobara K, Imai H (2011) A secure public cloud storage system. International conference on internet technology and secured transactions(ICITST) 2011, pp 103–109.

Zhou M, Zhang R, Xie W, Qian W, Zhou A (2010) Security and privacy in cloud computing: a survey. Sixth international conference on semantics knowledge and grid (SKG) 2010:105–112

Rocha F, Abreu S, Correia M (2011) The final Frontier: confidentiality and privacy in the cloud, pp 44–50.

Huang R, Gui X, Yu S, Zhuang W (2011) Research on privacy-preserving cloud storage framework supporting ciphertext retrieval. International conference on network computing and information security 2011:93–97

Xiao Z, Xiao Y (2012) Security and privacy in cloud computing. IEEE Commun Surveys Tutorials 99:1–17

Chen D, Zhao H (2012) Data security and privacy protection issues in cloud computing. International conference on computer science and electronics engineering, pp 647–651.

Zhou M (2010) Security and privacy in the cloud: a survey. Sixth international conference on semantics knowledge and grid (SKG) 2010:105–112

Wang J, Liu C, Lin GTR (2011) How to manage information security in cloud computing, pp 1405–1410.

Wang Y (2011) The role of SaaS privacy and security compliance for continued SaaS use. International conference on networked computing and advanced information management (NCM) 2011:303–306

Oza N, Karppinen K, Savola R (2010) User experience and security in the cloud-An empirical study in the finnish cloud consortium. IEEE second international conference on cloud computing technology and science (CloudCom) 2010:621–628

Sarathy R, Muralidhar K (2006) Secure and useful data sharing. Decis Support Syst 204–220.

Butler D Data sharing threatens privacy, vol 449(7163). Nature Publishing Group, pp 644–645.

Mitchley M (2006) Data sharing: progress or not? Credit, Manage 10–11.

Feldman L, Patel D, Ortmann L, Robinson K, Popovic T (2012) Educating for the future: another important benefit of data sharing. Lancet 1877–1878.

Geoghegan S (2012) The latest on data sharing and secure cloud computing. Law, Order 24–26.

Sahafizadeh E, Parsa S (2010) Survey on access control models. 2nd international conference future computer and communication (ICFCC) 2010, pp V1–1-V1-3.

Parker RB (1973) A definition of privacy. Rutgers Law Rev 275.

Schwab AP, Frank L, Gligorov N (2011) Saying privacy, meaning confidentiality. Am J Bioeth 44–45.

HIPAA Privacy (2012) U.S. Department of Health and Human Services. Source: http://www.hhs.gov/ocr/privacy/hipaa/understanding/index.html. Accessed on Nov 2012

Internet privacy? (2001) School Libraries in Canada 20–22.

Donlon-Cotton C (2010) Privacy and social networking. Law, Order 16–17.

Priscilla MR (1986) Privacy, government information, and technology. Public Admin Rev 629–634. Source: http://www.jstor.org/stable/976229. Accessed on Oct 2012

Federal Privacy Act. About.com Source: http://usgovinfo.about.com/library/weekly/aa121299a.htm. Accessed on Oct 2012

Mcbeth J (2011) Governments need privacy too. The Straits Times.

WikiLeaks. Source: http://wikileaks.org. Accessed on Oct 2012

Verma R (2012) Confidentiality and privacy issues. The Law Handbook. Education Law. Source: http://www.lawhandbook.org.au/handbook/ch06s03s08.php. Accessed on Oct 2012

Ruhr (2011) Cloud computing: Gaps in the ’cloud’. NewsRx Health Sci.

Zunnurhain K, Vrbsky SV (2010) Security attacks and solutions in clouds. CloudCom2010 Poster.

Motivations of a Criminal Hacker. Microsoft TechNet. Source: http://technet.microsoft.com/en-us/library/cc505924.aspx. Accessed on Oct 2012

Hacking Attacks-How and Why. Crucial Paradigm Web Solutions. Source: http://www.crucialp.com/resources/tutorials/website-web-page-site-optimization/hacking-attacks-how-and-why.php

Andy P (2007) Salesforce.com Scrambles To Halt Phishing Attacks. http://InternetNews.com. Accessed on Oct 2012

Charles A (2011) PlayStation Network: hackers claim to have 2.2m credit cards. The Guardian Technology Blog. Source: http://www.guardian.co.uk/technology/blog/2011/apr/29/playstation-network-hackers-credit-cards. Accessed on Oct 2012

Whitney L (2011) Feds investigate alleged attacks on Gmail accounts. CNet news. Source: http://news.cnet.com/8301-1009_3-20068229-83/feds-investigate-alleged-attacks-on-gmail-accounts. Accessed on Oct 2012

Jim C, Chyen Yee L (2011) Hacker attacks threaten to dampen cloud computing’s prospects. Reuters article. Source: http://www.reuters.com/article/2011/06/03/us-cloudcomputing-idUSTRE7521WQ20110603. Accessed on Oct 2012

Dominguez K (2012) Trend micro researchers identify vulnerability in hotmail. Trend Micro. Source: http://blog.trendmicro.com/trendlabs-security-intelligence/trend-micro-researchers-identify-vulnerability-in-hotmail/. Accessed on Oct 2012

Choney S (2011) Hotmail, Yahoo Mail users also targets in attacks. NBC News. Source: http://www.nbcnews.com/technology/technolog/hotmail-yahoo-mail-users-also-targets-attacks-123078. Accessed on Oct 2012

Galvin N (2012) File-sharing service users in cloud over access to data. The Age.

Hulme G (2009) Amazon web services DDoS attack and the cloud. InformationWeek. Source: http://www.informationweek.com/security/amazon-web-services-ddos-attack-and-the/229204417. Accessed on Oct 2012

Hachman M (2012) New facebook phishing attack steals accounts, financial information. PC Mag. Source: http://www.pcmag.com/article2/0,2817,2398922,00.asp. Accessed on Oct 2012

Albanesius C (2012) Ramnit computer worm compromises 45K facebook logins. PC Mag. Source: http://www.pcmag.com/article2/0,2817,2398432,00.asp. Accessed on Oct 2012

NIST Privacy and Security guidelines (2012) NIST. Source: http://csrc.nist.gov/publications/nistpubs/800-144/SP800-144.pdf. Accessed on Oct 2012

Cavoukian A (2008) Privacy in the clouds. Identity Inf Soc 1(1):89–108

Sabahi F (2011) Cloud computing security threats and responses. IEEE 3rd international conference communication software and networks (ICCSN) 2011, pp 245–249.

Li J, Zhao G, Chen X, Xie D, Rong C, Li W, Tang L, Tang Y (2010) Fine-grained data access control systems with user accountability in cloud computing. IEEE second international conference on cloud computing technology and science(CloudCom) 2010, pp 89–96.

Naone E (2011) Homomorphic encryption. Technol Rev 50–51.

Yao J, Chen S, Nepal S, Levy D, Zic J (2010) TrustStore: making Amazon S3 trustworthy with services composition. 10th IEEE/ACM international conference cluster, cloud and grid computing (CCGrid) 2010, pp 600–605.

Scale ME (2009) Cloud computing and collaboration. Library Hi Tech News, pp 10–13.

Ratley N (2012) Data-sharing ‘would benefit patients’. The Southland Times.

Melis RJF, Vehof H, Baars L, Rietveld MC (2011) Sharing of research data. Lancet 378(9808):1995

Feldman L, Patel D, Ortmann L, Robinson K, Popovic T (2012) Educating for the future: another important benefit of data sharing. Lancet 379(9829):1877–1878

What’s in it for me? the benefits of sharing credit data (2011). Banker, Middle East.

Zhao G, Rong C, Li J, Zhang F, Tang Y (2010) Trusted data sharing over untrusted cloud storage providers. IEEE second international conference cloud computing technology and science(CloudCom) 2010, pp 97–103.

Luther M (2010) Federated key management for secure cloud computing. Voltage security conference presentation. Source: http://storageconference.org/2010/Presentations/KMS/17.Martin.pdf. Accessed on Nov 2012

Pate S, Tambay T (2011) Securing the Cloud-Using encryption and key management to solve today’s cloud security challenges. Storage Networking Industry Association 2011. Source: http://www.snia.org/sites/default/education/tutorials/2011/spring/security/PateTambay_Securing_the_Cloud_Key_Mgt.pdf. Accessed on Nov 2012

Encryption and Key Management (2012) Cloud security alliance wiki. Source: https://wiki.cloudsecurityalliance.org/guidance/index.php/Encryption_and_Key_Management. Accessed on Nov 2012

Mather T (2010) Key management in the cloud. O’Reilly Community. Source: http://broadcast.oreilly.com/2010/01/key-management-in-the-cloud.html. Accessed on Nov 2012

OASIS Key Management Interoperability Protocol (2012) Web site. Source: https://www.oasisopen.org/committees/tc_home.php?wg_abbrev=kmip #overview. Accessed on Nov 2012

Key Management Interoperability Protocol (2012) Wikipedia definition. http://en.wikipedia.org/wiki/Key_Management_Interoperability_Protocol. Accessed on Nov 2012

Barker E, Barker W, Burr W, Polk W, Smid M (2007) Recommendation for key management-Part 1: general (Revised). Computer Security. NIST Spec Publ 800-57. Source: http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57-Part1-revised2_Mar08-2007.pdf. Accessed on Nov 2012

ISO/IEC 11770–5:2011 Information technology-Security techniques-Key management-Part 5: group key management. ISO Standards catalogue. Source: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=54527. Accessed on Nov 2012

ISO 11568–2:2012 Financial services - Key management (retail) - Part 2: Symmetric ciphers, their key management and life cycle. ISO Standards catalogue. Source: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=53568. Accessed on Nov 2012

Lei S, Zishan D, Jindi G (2010) Research on key management infrastructure in cloud computing environment. 9th international conference on grid and cooperative computing (GCC) 2010, pp 404–407.

Fathi H, Shin S, Kobara K, Chakraborty S, Imai H, Prasad R (2006) LR-AKE-based AAA for network mobility (NEMO) over wireless links. IEEE J Select Areas Commun 24(9):1725–1737

Sanka S, Hota C, Rajarajan M (2010) Secure data access in cloud computing. IEEE 4th international conference internet multimedia services architecture and application(IMSAA) 2010, pp 1–6.

Bennani N, Damiani E, Cimato S (2010) Toward cloud-based key management for outsourced databases. IEEE 34th annual computer software and applications conference workshops (COMPSACW) 2010, pp 232–236.

Goyal V, Pandey O, Sahai A, Waters B (2006) Attribute-based encryption for fine-grained access control of encrypted data. 13th ACM conference on computer and communications security (CCS ’06) 2006, pp 89–98.

Tu S, Niu S, Li H, Xiao-ming Y, Li M (2012): Fine-grained access control and revocation for sharing data on clouds. IEEE 26th international parallel and distributed processing symposium workshops and PhD forum (IPDPSW) 2012, pp 2146–2155.

Li M, Yu S, Zheng Y, Ren K, Lou W (2013) Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption. IEEE Trans Parallel Distrib Syst 131–143.

Wang X, Zhong W (2010) A new identity based proxy re-encryption scheme. International conference biomedical engineering and computer science (ICBECS) 2010:145–153

Tran DH, Nguyen HL, Zha W, Ng WK (2011) Towards security in sharing data on cloud-based social networks. 8th International conference on information, communications and signal processing (ICICS) 2011, pp 1–5.

Yu S, Wang C, Ren K, Lou W (2010) Achieving secure, scalable, and fine-grained data access control in cloud computing. In: INFOCOM, 2010 proceedings IEEE, pp 1–9

Yang Y, Zhang Y (2011) A generic scheme for secure data sharing in cloud. 40th international conference parallel processing workshops (ICPPW) 2011, pp 145–153.

Liu Q, Wang G, Wu J (2012) Check-based proxy re-encryption scheme in unreliable clouds. 41st international conference on parallel processing workshops (ICPPW) 2012, pp 304–305.

Author information

Authors and affiliations.

Department of Electrical Engineering, The University of Sydney, Sydney, NSW, 2006, Australia

Danan Thilakanathan & Rafael A. Calvo

CSIRO ICT Centre, Cnr Vimiera and Pembroke Roads, Marsfield, NSW, 2122, Australia

Shiping Chen & Surya Nepal

Corresponding author

Correspondence to Danan Thilakanathan .

Editor information

Editors and affiliations.

CSIRO ICT Centre, Marsfield, New South Wales, Australia

Surya Nepal

Telstra Corporation Limited, Melbourne, Victoria, Australia

Mukaddim Pathan

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Thilakanathan, D., Chen, S., Nepal, S., Calvo, R.A. (2014). Secure Data Sharing in the Cloud. In: Nepal, S., Pathan, M. (eds) Security, Privacy and Trust in Cloud Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38586-5_2

DOI : https://doi.org/10.1007/978-3-642-38586-5_2

Published : 04 September 2013

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-642-38585-8

Online ISBN : 978-3-642-38586-5

eBook Packages : Engineering Engineering (R0)

Purdue University Graduate School

File(s) under embargo

until file(s) become available

SECURE AUTHENTICATION AND PRIVACY-PRESERVING TECHNIQUES IN VEHICULAR AD-HOC NETWORKS

A VANET is formed by vehicles, road units, infrastructure components, and various connected objects. It aims mainly to ensure public safety and traffic control. New emerging applications include value-added and user-oriented services. While this technological advancement promises ubiquitous deployment of the VANET, security and privacy challenges must be addressed. Hence, vehicle authentication is a vital process to detect malicious users and prevent them from harming legitimate communications. However, the authentication process uses sensitive information to check the vehicle’s identity. Sharing this information will harm vehicle privacy. In this thesis, we aim to deal with these issues:

  • How can we ensure vehicle authentication and avoid sensitive and identity information leaks simultaneously?
  • When nodes are asked to provide identity proof, how can we ensure that the shared information is only used by an authorized entity?
  • Can we define an effective scheme to distinguish between legitimate and malicious network nodes?

This dissertation aims to address the preservation of vehicle private information used within the authentication mechanism in VANET communications.

The VANET characteristics are thoroughly presented and analyzed. Security requirements and challenges are identified. Additionally, we review the proposed authentication techniques and the most well-known security attacks while focusing on the need for privacy preservation and its challenges.

To fulfill the privacy preservation requirements, we propose a new solution called Active Bundle AUthentication Solution based on SDN for Vehicular Networks (ABAUS). We introduce Software Defined Networks (SDN) as an authentication infrastructure to guarantee the authenticity of each participant. Furthermore, we enhance the preservation of sensitive data by the use of an Active Data Bundle (ADB) as a self-protecting security mechanism. It ensures data protection throughout the whole data life cycle. ABAUS defines a dedicated registration protocol to verify and validate the different members of the network.

The first solution focuses on legitimate vehicle identification and sensitive data protection. A second scheme, called BEhaviour-based REPutation scheme for privacy preservation in VANET using blockchain technology (BEREP), is designed to recognize and eliminate malicious users. Dedicated public blockchains are used by a central trust authority to register vehicles and store their behavior evaluation, and a trust scoring system allows nodes to evaluate the behavior of their communicators and detect malicious infiltrated users.

By enhancing sensitive data preservation during the authentication process and detecting malicious attempts, our proposed work helps to tackle serious challenges in VANET communications.
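
The trust-scoring idea behind the second scheme can be illustrated with a small Python sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the exponentially weighted update, the `alpha` and `threshold` values, the class name `ReputationLedger`, and the vehicle pseudonyms are all hypothetical. A plain in-memory dictionary stands in for the blockchain record kept by the trust authority.

```python
class ReputationLedger:
    """Toy stand-in for a behaviour record; not the BEREP design."""

    def __init__(self, alpha=0.3, threshold=0.4, initial=0.5):
        self.alpha = alpha          # weight of the newest rating (assumed)
        self.threshold = threshold  # scores below this are flagged (assumed)
        self.initial = initial      # score given to a newly seen pseudonym
        self.scores = {}            # pseudonym -> trust score in [0, 1]

    def rate(self, pseudonym, rating):
        """Blend a new behaviour rating in [0, 1] into the stored score."""
        if not 0.0 <= rating <= 1.0:
            raise ValueError("rating must be in [0, 1]")
        old = self.scores.get(pseudonym, self.initial)
        new = (1 - self.alpha) * old + self.alpha * rating
        self.scores[pseudonym] = new
        return new

    def is_malicious(self, pseudonym):
        """Flag a node whose score has dropped below the threshold."""
        return self.scores.get(pseudonym, self.initial) < self.threshold


ledger = ReputationLedger()
for _ in range(3):
    ledger.rate("veh-17", 1.0)  # consistently good behaviour
    ledger.rate("veh-42", 0.0)  # consistently bad behaviour
print(ledger.is_malicious("veh-17"), ledger.is_malicious("veh-42"))
```

Ratings are keyed by pseudonym rather than by real identity, which loosely reflects the privacy goal stated above; how pseudonyms are issued and linked to registered vehicles is outside the scope of this sketch.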

Degree Type

  • Doctor of Philosophy
  • Computer Science

Campus location

  • West Lafayette

  • System and network security
  • Data security and protection

CC BY 4.0

LyondellBasell Industries NV Class A

LyondellBasell Earnings: Sequential Growth Supports Our Recovery Thesis

LyondellBasell's first-quarter results showed demand was recovering in line with our outlook for the cadence of the year. Adjusted EBITDA rose sequentially from the fourth quarter driven by profit growth in most segments. In 2024, we think Lyondell will generate higher profits from a favorable price-to-cost spread and improved volumes, which will allow the company to run its plants at a higher capacity utilization.

COMMENTS

  1. PDF Full thesis Data sharing across research and public communities ch1

    Data sharing contexts are reflected by the cross-level data providers and human mediators rooted in different groups, whereas data sharing processes are reflected ... develop the proposal into a full thesis. iii Dr. Jen Hammock from the Smithsonian Institute is an amazing and longtime collaborator. Jen gave very generously of her time, patience ...

  2. Data sharing, management, use, and reuse: Practices and ...

    Background With data becoming a centerpiece of modern scientific discovery, data sharing by scientists is now a crucial element of scientific progress. This article aims to provide an in-depth examination of the practices and perceptions of data management, including data storage, data sharing, and data use and reuse by scientists around the world. Methods The Usability and Assessment Working ...

  3. Qualitative Data Sharing: Data Repositories and Academic Libraries as

    In academia, the idea of data as a valuable commodity has taken hold in the form of data sharing. Data sharing in academia can accelerate the pace of research, encourage new research questions and design, help avoid duplication of research, provide resources for student research, and reduce the burden on research subjects (Borgman, 2015; Lyon ...

  4. A focus groups study on data sharing and research data management

    Abstract. Data sharing can accelerate scientific discovery while increasing return on investment beyond the researcher or group that produced them. Data repositories enable data sharing and ...

  5. (PDF) Data sharing, management, use, and reuse: Practices and

    practices in data management require scientists to share their data by depositing datasets in trusted subject, governmental, or institutional repositories, by providing metadata that makes their ...

  6. PDF Blockchain-Empowered Trustworthy Data Sharing: Fundamentals

    hensive survey on blockchain-based data-sharing architectures and applications to fill the gap. First, we present the foundations of blockchains and discuss the challenges of current data-sharing techniques. Second, we focus on the convergence of blockchain and data sharing to give a clear picture of this landscape and

  7. Data sharing practices and data availability upon request differ across

    Data sharing is one of the cornerstones of modern science that enables large-scale analyses and reproducibility. We evaluated data availability in research articles across nine disciplines in ...

  8. Researcher attitudes toward data sharing in public data repositories: a

    Both studies focus specifically on "open data," arguing for the importance of publicly available of data, with Chawinga and Zinn (2019) stating that the terms "data sharing" and "open data" are synonymous and defining data sharing as "a deliberate effort to make all raw research data fully available for public access" (p. 110).

  9. Data Sharing Fundamentals: Definition and Characteristics

    Data sharing is defined as "the domain-independent process of giving third parties access to the data sets of others" [19]. It is enacted by providing and facilitating access for compliant use and ...

  10. Secure data sharing in cloud and IoT by leveraging attribute-based

    This thesis is brought to you by Scholars' Mine, a service of the Missouri S&T Library and Learning Resources. This ... Data sharing is very important to enable different types of cloud and IoT-based services. For example, organizations migrate their data to the cloud and share it with

  11. Dissertation or Thesis

    This dissertation provides guidance to developers of current and future data sharing environments and infrastructures, research data management and scientific communities, scientific data managers, creators of data management plans, and funding agencies; and has implications beyond DataONE. Date of publication. December 2016; Keyword

  12. PDF SECURE DATA SHARING IN CLOUD A Thesis

    The data file is owned by one of the users and is shared with other users. The data owner decides who can access and edit the file. The major contributions through this proposed methodology are as follows: Confidentiality of data by using symmetric encryption. Secure data sharing over the cloud without the use of Elliptic Curve or Bilinear

  13. Blockchain-Based Research Data Sharing Framework for ...

    A platform that allows owners to control and get rewards from sharing their data would be an important enabler of research data-sharing, since presently, such incentives for researchers to share their data are largely missing. Our approach delivers a usable blockchain based model for a collection of researchers' data, providing accountability ...

  14. PDF Research Data Sharing, Reuse, and Metrics: Adoption and ...

    Data sharing is widely believed to be beneficial to science and is now supported by digitization and new online infrastructures for sharing datasets. Nevertheless, differences in research cultures and the ... Thank you for taking the time to read over sections of my thesis several times and for providing your valuable feedback. Thanks to Kate ...

  15. PolyU Electronic Theses: High-performance packing and searching for

    In this thesis, we study blockchain-big data sharing in terms of the basic concepts, challenging issues, and high-performance solutions. In particular, we conduct a comprehensive survey of big data sharing and present the system architecture and layered research framework of blockchain-based big data sharing. Inside the research framework, we ...

  16. The Effects of Data Sharing on a Perishable Goods Supply Chain

    This thesis will explore the benefits data sharing can bring to both suppliers and retailers of perishable goods. 1.2 Company Background Our thesis sponsor is recognized as a leading packaged salad and fresh produce supplier, we will use an alias of SuperSalad for the purposes of this thesis. Despite SuperSalad's strong presence

  17. Data Sharing in Psychology

    Impetus for data sharing: Open science and e-science movements. Almost all major funding agencies in the US and abroad are developing policies around open sharing of research data and other research products (Gewin, 2016; McKiernan et al., 2016).The impetus for the push towards open science, which at a minimum encompasses the open sharing of the products of research (such as research articles ...

  18. Own Data? Ethical Reflections on Data Ownership

    In discourses on digitization and the data economy, it is often claimed that data subjects shall be owners of their data. In this paper, we provide a problem diagnosis for such calls for data ownership: a large variety of demands are discussed under this heading. It thus becomes challenging to specify what—if anything—unites them. We identify four conceptual dimensions of calls for data ...

  19. Blockchain-Based Data Sharing System for AI-Powered Network ...

    The explosive development of mobile communications and networking has led to the creation of an extremely complex system, which is difficult to manage. Hence, we propose an AI-powered network framework that uses AI technologies to operate the network automatically. However, due to the separation between different mobile network operators, data barriers between diverse operators become ...

  20. (PDF) Data Sharing Practices among Researchers at South African

    Concerns with data sharing identified in this study were data privacy and confidentiality (57.1%), the fact that it takes time and effort (37.3%), intellectual property rights (31.3%), and ...

  21. The Benefits of Data Sharing Now Outweigh the Risks

    While these obstacles have stalled data sharing in the past, new technology provides solutions. Data sharing services offered by companies such as Databricks are now widely available. These services provide secure platforms for organizations to store, share, and analyze data, reducing the need for in-house tech expertise and infrastructure.

  22. PhD Defence Sara Shakeri

    Digital Data Marketplaces (DDMs) are emerging as a framework for organizations to share their data. To increase trust among participating organizations multiple agreements should be established to determine policies about who has access to what.

  23. Secure Data Sharing in the Cloud

    Cloud systems [1, 2] can be used to enable data sharing capabilities, and this can provide an abundance of benefits to the user. There is currently a push for IT organisations to increase their data sharing efforts. According to a survey by InformationWeek [], nearly all organisations shared their data somehow, with 74 % sharing their data with customers and 64 % sharing with suppliers.

  24. A Secure Data Sharing Platform using Blockchain and IPFS

    Due to the involvement of TTP, such systems lack trust, transparency, security, and immutability. To overcome these issues, this paper proposed a blockchain-based secure data sharing platform by ...

  25. Secure Authentication and Privacy-preserving Techniques in Vehicular Ad

    VANET is formed by vehicles, road units, infrastructure components, and various connected objects. It aims mainly to ensure public safety and traffic control. New emerging applications include value-added and user-oriented services. While this technological advancement promises ubiquitous deployment of the VANET, security and privacy challenges must be addressed. Thence, vehicle ...

  26. Tesla makes big progress in China and its stock soars 15%

    All quotes are in local exchange time. Real-time last sale data for U.S. stock quotes reflect trades reported through Nasdaq only. Intraday data delayed at least 15 minutes or per exchange ...

  27. Private and Secure Students' Data Sharing in Educational Systems

    Abstract: Protecting students' private data in the education system is an important issue. However, it is not an easy task. Students' data is important in the process ...

  28. LyondellBasell Earnings: Sequential Growth Supports Our Recovery Thesis

    Provide specific products and services to you, such as portfolio management or data aggregation. Develop and improve features of our offerings. Gear advertisements and other marketing efforts ...