Archiving and sharing data

Introduction
Research data is a valuable output, so it’s important to consider how you can preserve it and, if possible, share it with others after the end of a research project. Where possible, datasets should be deposited in a data repository that can support long-term storage of the data underpinning research outputs, enable the discovery and sharing of data for future reuse by others, and provide a persistent identifier for long-term discoverability.
Many funders and publishers require that research data be shared as openly as possible. LSE's Research Data Management Policy recognises that effective management of research data for long-term preservation and reuse is an integral part of good research practice.
Every dataset and research project is different, so it is important to consider your data preservation and sharing plans at the beginning of your project. This allows you to tailor your approach to the data you are working with.
Why archive and share your data?
- facilitates transparency, integrity, validation and reproducibility of research findings
- ensures funder and publisher requirements are met
- preserves data in the long term to protect against loss, obsolescence or deterioration
- increases the visibility of data, which can then be cited by others, and enhances opportunities for collaboration with your research colleagues
Considerations for sharing
The first step is to work out which data can be shared. It may be that you can share the whole dataset, but there are various reasons why this might not be possible. You may also need to keep in mind any funder requirements for sharing data.
Working with sensitive personal data requires additional planning at the beginning of a project. With this planning in place, it may be possible to share a subset of the data using a combination of informed consent, anonymisation and controlled access. Equally, if you are using third-party copyrighted materials or commercial data, you may find yourself restricted by law or contract in what you can share openly.
If you cannot share all of your data, you should consider whether parts of it, or supporting documentation and metadata, can be shared, to make your research as transparent and open as possible.
‘Sharing’ your data doesn’t have to mean sharing it with everyone. Our advice is to make your data ‘as open as possible, as closed as necessary’, which means applying the level of access that is appropriate for the data. See the ‘Access’ section below for more details.
Preservation and disposal
As well as considering what data could be made available to others, you should also consider what data you wish to preserve for the long term. It is not advised that you use your OneDrive for long-term storage, as this takes up storage space and puts the data at greater risk of loss. Instead, you should preserve the dataset and its accompanying documentation in your chosen data repository.
Not all research data need to be kept, and it may be impractical to keep everything, so you should be selective about what you keep. For further advice, the University of Bath hosts a useful table on deciding which data to keep and which to delete.
You should consider how long you plan on preserving data for, as some funders specifically ask that data be preserved for a minimum number of years. LSE also provides general guidance in the Retention Schedule.
Finally, for any data that won’t be preserved or shared, you will need to ensure you securely delete it. For digital data, note that normal file deletion doesn’t truly erase the data: it typically only moves the file to the Recycle Bin, and the data may still be recoverable even after the bin has been emptied. Use an ‘eraser tool’ to ensure data is permanently removed from your devices.
The best option for archiving and sharing your work is a data repository, as repositories are specifically designed for this purpose.
If you publish your data with an external repository, send us the DOI or URL and we can create a record for the dataset in our research data catalogue.
Subject repositories
We recommend sharing your data via a subject-specific data repository wherever possible. This is where you are most likely to find people doing similar work who may want to reuse your dataset, and the repository will have been tailored to preserve and share particular types of data. Starting points include:
A key discipline-specific repository in the social sciences is the UK Data Archive, funded by the ESRC.
Nature also maintains a list of repositories by discipline, including social sciences data repositories.
You can also find a data repository relevant to your discipline at the Registry of Research Data Repositories (re3data).
Cross-disciplinary/multi-purpose repositories
If no discipline-specific repository is available, you can also deposit in multi-purpose data repositories such as Figshare or Zenodo. Zenodo and Figshare both allow registered users to deposit data free of charge and issue DOIs for datasets.
Data journals
You can also share your data by publishing in a data journal. Data journals are scholarly journals that specialise in publishing papers about datasets – examples include Data & Policy, Data in Brief, Journal of Open Humanities Data and Scientific Data. They allow an author to focus on the data itself, giving details of its collection, processing, software, file formats, etc, rather than offering conclusions or findings. This allows the reader to understand when, how and why the data was collected and what the data product is. It is also a great way for researchers to gain credit and impact for data production and data sharing. Please be aware that most data journals do not host data files, so you would also need to preserve the dataset in a repository.
A note on project websites
Posting your research data on a project website is not recommended as the primary distribution method for sharing data. This is because websites do not offer long-term preservation facilities, and this also wouldn’t meet funder requirements for sharing data. Instead, share your data in a repository and then link out to this from your project website.
When you have decided where you’ll be archiving and/or sharing your chosen datasets, there are two more key decisions you’ll need to take when depositing:
Access
There are a number of control mechanisms you can use to restrict who can access your data. Different repositories offer different levels of controlled access but broadly will cover:
Open: data is open to all researchers to access, usually without having to register an account with the data repository.
Embargoed: data is locked down for a pre-set amount of time.
Safeguarded: users must register for access and are asked to sign a generic end-user licence agreement in order to use the data.
Controlled / Closed: data is too sensitive or confidential to be made available on open or safeguarded access and can only be accessed with special permission from the data owner. Usually, the data can only be accessed via a Trusted Research Environment (TRE). For secure access to data at LSE, please see the secure data webpages.
Licensing
When sharing data, you should assign the data a licence so others can clearly see what they’re allowed to do with it. You can utilise one of the Creative Commons licences, or create your own bespoke terms.
You can also license software. These licences typically cover both the software itself and the underpinning source code. For help in choosing your licence, we recommend using Choose a License.
Preparing your data
You’ll then need to take some practical steps to prepare your data files for the deposit. You can find useful guidance on preparing your data from the UK Data Service.
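One practical step that is sometimes taken before a deposit, and that is an assumption here rather than something the guidance above prescribes, is to record a checksum for each file so that you and later users can confirm the files have not changed. The sketch below uses only the Python standard library; the function names and the manifest file name MANIFEST.sha256 are hypothetical choices for illustration, and your chosen repository may have its own requirements.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(deposit_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to deposit_dir) to its checksum."""
    return {
        p.relative_to(deposit_dir).as_posix(): sha256_of(p)
        for p in sorted(deposit_dir.rglob("*"))
        if p.is_file()
    }


def write_manifest(deposit_dir: Path, manifest_name: str = "MANIFEST.sha256") -> Path:
    """Write one 'checksum  relative/path' line per file (hypothetical file name)."""
    manifest = build_manifest(deposit_dir)
    manifest.pop(manifest_name, None)  # skip a manifest left over from an earlier run
    out = deposit_dir / manifest_name
    lines = [f"{digest}  {name}" for name, digest in manifest.items()]
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```

Calling write_manifest(Path("my_deposit")) on a folder of prepared files would place the manifest alongside them, where "my_deposit" is a placeholder folder name.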
Data access statements
Once you’ve deposited your data, make the most of this by linking the data to your published findings via a data access statement (also known as a data availability statement). Data access statements are used in published works to link to any data that underpins the publication and to lay out the terms of access.
In a data access statement you should include:
- a link to where the data can be accessed, ie, a data repository
- details of terms of access, such as licensing information
- if data cannot be shared, an explanation of why this is
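The three elements listed above can be assembled into a statement in a consistent way. The following is a minimal sketch, not a required format: the function name, its parameters and the wording of the generated sentences are all hypothetical, and your journal or publisher may ask for different phrasing.

```python
from typing import Optional


def data_access_statement(
    location: Optional[str] = None,
    terms: Optional[str] = None,
    reason_not_shared: Optional[str] = None,
) -> str:
    """Build a short data access statement from the three listed elements.

    location: link to where the data can be accessed (e.g. a repository DOI)
    terms: terms of access, such as licensing information
    reason_not_shared: explanation for any data that cannot be shared
    """
    if location is None:
        # Nothing is shared, so an explanation is needed.
        if not reason_not_shared:
            raise ValueError("Explain why the data cannot be shared.")
        return (
            "The data supporting this publication cannot be shared "
            f"because {reason_not_shared}."
        )

    statement = f"The data supporting this publication are available at {location}"
    if terms:
        statement += f" under {terms}"
    statement += "."
    if reason_not_shared:
        # Partial sharing: say what is withheld and why.
        statement += f" Some data cannot be shared because {reason_not_shared}."
    return statement
```

For example, data_access_statement("https://doi.org/10.xxxx/example", "a CC BY 4.0 licence") returns a single sentence giving the link and the licence; the DOI shown is a placeholder, not a real identifier.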
Your chosen journal or publisher may guide the format and placement of a statement. If no ‘Data access’ or ‘Data availability’ section exists, add the statement to your ‘Acknowledgements’ section instead. The University of Manchester have created a variety of data access statements that cover numerous different scenarios and can be adapted for use.
FAIR sharing
To get the most out of sharing your data you need to make it FAIR. The FAIR principles ensure data are shared in a way which enables maximum reuse for humans and machines. FAIR's 15 principles cover four areas where practical steps can be taken to improve use of your data by making it:
- Findable
- Accessible
- Interoperable
- Reusable
Findable: This can be achieved by having clear, descriptive metadata included in the data repository record and using persistent identifiers to easily link to your data.
Accessible: Once a researcher has found the data, it needs to be clear how to access it. This can be achieved by choosing a repository that supports the access restrictions your data needs and by making it clear within the record how the data can be accessed and used. You can also make this clear in the data access statement in your publications, which links the data to the publication.
Interoperable: Data should also be shared in a way that ensures it can be used in a wide variety of systems and combined with other datasets, which is difficult if you have used a lot of proprietary software to create the data. Effective data exchange relies on using open file formats and discipline-specific standards eg, methodologies, ontologies.
Reusable: To aid reusability, deposit documentation alongside your data to enable users with no prior knowledge of the data to clearly understand how the research was carried out and what the data mean. You should also consider how you will license your data to maximise reuse where possible, using open licences such as Creative Commons licences, or specific software licences.
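As an illustration of the descriptive metadata and documentation mentioned above, the sketch below checks that a metadata record has its basic fields filled in before it is saved in an open format (JSON). The field names, the list of required fields and the example values are all assumptions made for illustration; they do not follow any particular repository's schema, and the DOI is a placeholder.

```python
import json

# Hypothetical minimum set of descriptive fields; real repositories define their own.
REQUIRED_FIELDS = ("title", "creators", "description", "keywords", "identifier", "licence")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record: dict) -> str:
    """Serialise the record as JSON, refusing records with gaps."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError(f"Metadata record is missing: {', '.join(gaps)}")
    return json.dumps(record, indent=2, ensure_ascii=False)


# Example record with placeholder values.
record = {
    "title": "Example survey dataset",
    "creators": ["A. Researcher"],
    "description": "Anonymised responses to an example survey.",
    "keywords": ["survey", "example"],
    "identifier": "https://doi.org/10.xxxx/example",  # persistent identifier (placeholder)
    "licence": "CC BY 4.0",
    "file_formats": ["CSV"],  # open formats support interoperability
}

metadata_json = to_json(record)
```

The resulting text could be saved alongside the data files as part of the deposited documentation.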
CARE Principles
The CARE Principles complement the FAIR data principles. The CARE (Collective benefit, Authority to control, Responsibility and Ethics) Principles for Indigenous Data Governance are intended to provide guidance to data-focused projects involving Indigenous peoples. However, they can also be a useful framework for working with research data from other vulnerable participant groups.