A sattelite

Version control

How can I manage changes to files so I know which is the current version?

  • Devise and apply a file naming convention
  • Pick a system and be consistent

Knowing what is the current version of a file helps manage revisions and protects against unauthorised changes. There are different ways to approach version control, but be consistent whatever procedure you adopt.

General advice on version control is to:

  • Apply a new version number before editing the file.
  • Keep master files of raw and cleaned data sets, and work from copies.
  • Assign responsibility for master files to someone in your research team.
  • Control access permissions to files – read, write, full access.
  • Record changes.
  • Develop a formal procedure for deleting redundant copies – old versions, drafts, working copies.

Source: based on Corti et al. (2014) Managing Research Data (Sage: London) p.73

If you are continually working on a file, a system to identify the current version is essential. There are two ways to approach this, the easiest visual way is in the file name, and an alternative is to include version information in the file itself.

File names

Changes to data or contextual documentation file names should be according to a uniform convention based on procedures established by the researcher or research team.

File names should be independent of their storage location. Use succinct file titles that provide enough information that you can understand what is in the file without having to open it. Use of dates or version numbers (or both) in a file name is useful to identify versions. Keep characters to the ISO basic Latin alphabet and Arabic numerals. Use underscores and hyphens to replace blanks.

If you are using dates, use the ISO 8601 standard of Four-digit year-two-digit month-two-digit day. So, 8 March 2015 would be “2015-03-08”.

If you are numbering files sequentially, be sure to include enough digits in your file naming convention. For example, a collection with 100 plus files should begin 001.

A numbering concept can also be used. One example is three-level versioning, or "Major.Minor.Revision".

Major numbers begin at 1. Minor and revision numbers always start with 0. Therefore, the first version of a data file is "1.0.0". Numbers change when the file changes. The number of levels (say, more or less than three) of numbering and their limits can be adapted according to the particular needs of the project. Which number changes depends on the relevance of changes. For example, a quantitative research project might adopt the following versioning policy for a data file.

Three-level versioning and criteria for changing version number:

Modification

First position/major number

Second position/minor number

Third position/revision number

One or more cases in a record is inserted or deleted

One or more variables is inserted or deleted

One or more variables is inserted or deleted

One or more new columns are inserted

One more sample/wave is added

Changes when altering a variable, i.e. significant relevant corrections or additions are made in the data (labels, recoding, data formats, etc.).

Modified when some simple revisions of labels are made.

Software and platforms may have automated version management. LSE’s Office365 platform captures previous versions of files for review and retrieval. Alternatives when working in external collaborations, or if the same file is being constantly edited, include version management software such as Git (also used in Github). Should they be considered suitable platforms for storing research data, Dropbox and Google Drive also automate revision histories. Microsoft Word also captures version information, but not previous versions, in its metadata fields

Version information in files

An alternative or matching approach is to include version information in the file itself. This allows an extra safeguard to match file name information. For example, LSE policy documents use a template that requires versioning information be included. This contains information on the date of the version, its number, and what is done to change the file:

Document control example