Archiving and sharing data

Introduction

Research data is a valuable output, so it’s important to consider how you can preserve it and, if possible, share it with others after the end of a research project. Where possible, datasets should be deposited in a data repository that can support long-term storage of the data underpinning research outputs, enable others to discover and reuse the data, and provide a persistent identifier for long-term discoverability.

Many funders and publishers require research data to be shared as openly as possible. LSE's Research Data Management Policy recognises that effective management of research data for long-term preservation and reuse is an integral part of good research practice.

Every dataset and research project is different, so it is important to consider your data preservation and sharing plans at the beginning of your project. This allows you to tailor your approach to the data you are working with.

Why archive and share your data?

facilitates transparency, integrity, validation and reproducibility of research findings
ensures funder and publisher requirements are met
preserves data in the long term to protect against loss, obsolescence or deterioration
increases the visibility of data that can be cited by others and enhances opportunities for collaboration with your research colleagues

Selecting data to archive/share

Considerations for sharing

The first step is to work out which data can be shared. It may be that you can share the whole dataset, but there are various reasons why this might not be possible. You may also need to keep in mind any funder requirements for sharing data.

Working with sensitive personal data requires additional planning at the beginning of a project. With this planning in place, it may be possible to share a subset of the data through a combination of informed consent, anonymisation and controlled access. Similarly, if you are using third-party copyrighted materials or commercial data, you may be restricted by law or contract in what you can share openly.

If you cannot share all of your data, you should consider whether parts of it, or supporting documentation and metadata, can be shared, to make your research as transparent and open as possible.

‘Sharing’ your data doesn’t have to mean sharing it with everyone. Our advice is to make your data ‘as open as possible, as closed as necessary’, which allows you to apply the level of access control your data requires. See the Access section below for more details.

Preservation and disposal

As well as considering what data could be made available to others, you should also consider what data you wish to preserve for the long term. We do not recommend using OneDrive for long-term storage, as it takes up storage space and is more at risk of data loss. Instead, preserve the data and accompanying documentation in your chosen data repository.

Not all research data need to be kept, and keeping everything may be impractical, so be selective about what you keep. For further advice, the University of Bath hosts a useful table on deciding which data to keep and which to delete.

You should consider how long you plan to preserve data, as some funders specifically require data to be preserved for a minimum number of years. LSE also provides general guidance in its Retention Schedule.

Finally, any data that won’t be preserved or shared must be securely deleted. For digital data, note that normal file deletion doesn’t truly erase the data: it usually just moves the file to the Recycle Bin, and even after the Recycle Bin is emptied the data may remain recoverable. Use an ‘eraser tool’ to ensure data is permanently removed from your devices.

Where to archive/share data

The best option for archiving and sharing your work is a data repository, as repositories are specifically designed for this purpose.

If you publish your data with an external repository, send us the DOI or URL and we can create a record for the dataset in our research data catalogue.

Subject repositories

Wherever possible, we recommend sharing your data via a subject-specific data repository. This is where you are most likely to reach people doing similar work who may want to reuse your dataset, and such repositories are tailored to preserve and share particular types of data. These include:

A key discipline-specific repository in the social sciences is the UK Data Archive, funded by the ESRC.

Nature also maintains a list of repositories by discipline, including social sciences data repositories.

You can also find a data repository relevant to your discipline at the Registry of Research Data Repositories (re3data).

LSE Research Online

You can also deposit data in LSE Research Online (LSERO), our institutional repository for the preservation and sharing of research data produced or collected at LSE. This is a good option where no discipline-specific repository is available, or where you want to store and share your research data within LSE systems and link it to your LSERO research publications. For more information, see the section ‘Deposit data in LSE Research Online’.

Cross-disciplinary/multi-purpose repositories

You can also deposit in multi-purpose data repositories such as Figshare or Zenodo. Zenodo and Figshare both allow registered users to deposit data free of charge and issue DOIs for datasets.

Data journals

You can also share your data by publishing in a data journal. Data journals are scholarly journals that specialise in publishing papers about datasets; examples include Data & Policy, Data in Brief, Journal of Open Humanities Data and Scientific Data. They allow an author to focus on the data itself, giving details of its collection, processing, software, file formats, etc., rather than offering conclusions or findings. A data paper helps readers understand when, how and why the data were collected and what the data product is, and it is a great way for researchers to gain credit and impact for data production and data sharing. Please be aware that most data journals do not host data files, so you will also need to preserve the dataset in a repository.

A note on project websites

Posting your research data on a project website is not recommended as the primary method of sharing data. Websites do not offer long-term preservation facilities, and sharing data this way would not meet funder requirements. Instead, share your data in a repository and link to it from your project website.

How to archive/share

When you have decided where you’ll archive and/or share your chosen datasets, there are two further key decisions you’ll need to make when depositing:

Access

There are a number of control mechanisms you can use to restrict who can access your data. Different repositories offer different levels of controlled access, but these broadly fall into the following categories:

Open: data are open to all researchers, usually without the need to register an account with the data repository.

Embargoed: data are locked down for a pre-set period of time.

Safeguarded: users must register for access and are asked to sign a generic end user licence agreement in order to use the data.

Controlled / Closed: data are too sensitive or confidential to be made available on open or safeguarded access and can only be accessed with special permission from the data owner. Usually, such data can only be accessed via a Trusted Research Environment (TRE). For secure access to data at LSE, please see the secure data webpages.

Licensing

When sharing data, you should assign it a licence so others can clearly see what they’re allowed to do with it. You can use one of the Creative Commons licences or create your own bespoke terms.

You can also license software. Software licences typically cover both the software itself and the underlying source code. For help choosing a licence, we recommend the Choose a License website.

Preparing your data

You’ll then need to take some practical steps to prepare your data files for the deposit. You can find useful guidance on preparing your data from the UK Data Service.

Data access statements

Once you’ve deposited your data, make the most of it by linking the data to your published findings via a data access statement (also known as a data availability statement). Data access statements are used in published works to link to any data underpinning the publication and to set out the terms of access.

In a data access statement you should include:

a link to where the data can be accessed, such as a data repository
details of the terms of access, such as licensing information
if the data cannot be shared, an explanation of why

Your chosen journal or publisher may specify the format and placement of the statement. If there is no ‘Data access’ or ‘Data availability’ section, add the statement to your ‘Acknowledgements’ section instead. The University of Manchester has created example data access statements covering a range of scenarios, which can be adapted for use.

FAIR sharing

To get the most out of sharing your data you need to make it FAIR. The FAIR principles ensure data are shared in a way which enables maximum reuse for humans and machines. FAIR's 15 principles cover four areas where practical steps can be taken to improve use of your data by making it:

Findable
Accessible
Interoperable
Reusable

Findable: This can be achieved by having clear, descriptive metadata included in the data repository record and using persistent identifiers to easily link to your data.

Accessible: Once a researcher has found the data, it needs to be clear how they can access and use it. This can be achieved by choosing a repository that supports the access restrictions appropriate to your data and by stating clearly in the record how the data can be used. You can also make this clear in the data access statement in your publications, which links the data to the publication.

Interoperable: Data should be shared in a way that allows it to be used across a wide variety of systems and combined with other datasets, which is difficult if the data depend heavily on proprietary software. Effective data exchange relies on open file formats and discipline-specific standards, e.g. methodologies and ontologies.

Reusable: To aid reusability, deposit documentation alongside your data so that users with no prior knowledge of the data can clearly understand how the research was carried out and what the data mean. You should also consider how you will license your data to maximise reuse, where possible using open licences such as Creative Commons licences or specific software licences.

CARE Principles

The CARE Principles complement the FAIR data principles. The CARE (Collective benefit, Authority to control, Responsibility and Ethics) Principles for Indigenous data governance are intended to guide data-focused projects involving Indigenous peoples; however, they can also be a useful framework for working with research data from other vulnerable participant groups.

Deposit data in LSE Research Online

LSERO is where LSE staff and students can deposit datasets related to their LSE publications, or other data created at LSE that has long-term value. We can archive your data in secure, reliable long-term storage and, if you choose, openly share it with high-quality metadata and indexing for greater visibility and discovery.

We provide a hands-on deposit service tailored to your data requirements: we advise on the most appropriate options for archiving and sharing your data and, if you are new to the process, take you through each step.

Up to 20 GB of data per project can be deposited at no charge. Deposits greater than 25 GB may be subject to a charge and must be discussed with us before the deposit is made.

By depositing data in LSERO you can:

deposit your data files and related documentation for long-term archiving and access;
receive a DataCite DOI: a unique, persistent identifier for your dataset, so that it can be cited and linked to;
license your data and choose access conditions to control how your data can be accessed and used.

You can also submit information about datasets that you have deposited in other data repositories to LSERO, which you can then link to your LSERO publications, including theses.

To deposit data please first read our guidance on how to prepare your research data for deposit at LSE and then complete our data deposit form. If you’d like to book a consultation to discuss your data, please select this option when completing the form.
