Version Control

Learning Objectives

After this lesson, you should be able to:

  • Understand the basics of Git as a resource for reproducible programming
  • Describe tools and approaches for creating your Git repositories
  • Describe best practices for maintaining GitHub organizations and repositories
  • Maintain your own GitHub user profile and repositories

Version control refers to keeping track of the versions of a file, a set of files, or a whole project as they change over time.

Some version control tools:

  • Microsoft Office's Track Changes functionality
  • Apple's Time Machine
  • Google Docs' Version History
  • Git

Version control is as much a philosophy as a set of tools; you don't need to master Git to utilize version control (though it is certainly a worthwhile tool for many researchers).


Git-related Definitions

Git - tool for version control.

GitHub - a web platform that hosts remote Git repositories and adds interactive collaboration features such as issues and pull requests.

repo - short for repository

local - on your personal computer.

remote - somewhere other than your computer. GitHub can host remote repositories.

clone - a copy of a remote repository that lives locally on your computer. Changes you push from a clone update the repository online.

fetch - download the latest changes from the remote repository to your local computer, without yet applying them to your working files.

branch - a history of changes to a repository. You can have parallel branches with separate histories, allowing you to keep a "main" version and development versions.

fork - a copy of someone else's repository stored remotely under your own GitHub account. From a fork, you can make pull requests to the original repository.

upstream - primary or main branch of original repository.

downstream - branch or fork of repository.

commit - record a snapshot of your changes in the repository's history.

push - send your local commits to the remote repository.

merge - take the changes from one branch or fork and apply them to another, usually the main branch.

pull request - proposed changes to/within a repository.

issue - suggestions or tasks needed for the repository. Allows you to track decisions, bugs with the repository, etc.
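
Several of the terms above (remote, clone, commit, push, fetch, merge) can be tried out without a GitHub account. The following is a minimal sketch, assuming Git is installed. A bare repository in a temporary directory stands in for the remote that GitHub would normally host, and every name in it (seed, hub.git, alice, bob, the file names, the example identities) is a hypothetical placeholder.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Build a small repository with one commit to share.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "Example Researcher"    # placeholder identity
git config user.email "researcher@example.org"
echo "project notes" > README.txt
git add README.txt
git commit -q -m "Initial commit"
cd ..

# remote: a bare copy that plays the role of a repository hosted on GitHub.
git clone -q --bare seed hub.git

# clone: two collaborators each get a local copy linked to the remote.
git clone -q hub.git alice
git clone -q hub.git bob

# commit + push: Alice records a change locally, then sends it to the remote.
cd alice
git config user.name "Alice"                 # placeholder identity
git config user.email "alice@example.org"
echo "alice's results" > results.txt
git add results.txt
git commit -q -m "Add results"
git push -q origin main
cd ..

# fetch: Bob downloads the new commit; his working files do not change yet.
cd bob
git fetch -q origin
ls    # results.txt is not listed yet

# merge: apply the fetched commit to Bob's local main branch.
git merge -q origin/main
ls    # results.txt is now listed
```

With a real GitHub repository, the only change is that the clone command takes the repository's URL instead of a local path.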

Git vs. GitHub

Git is a command-line program for version control of repositories. It keeps track of the changes you make to files in your repository and stores that history in a .git folder inside the repository. A new entry is recorded each time you make a commit. Git links these commits together in a "tree" of history, so you can go back to any previous commit. Because Git stores content that is unchanged between commits only once, and compresses what it stores, this is much more efficient than keeping a separate full copy of every version of a project.
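
The commit history described above can be seen in a few commands. This is a minimal sketch, assuming Git is installed; the directory name demo, the file notes.txt, and the user name and email are hypothetical placeholders.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a repository; Git keeps its bookkeeping in the .git folder.
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "Example Researcher"    # placeholder identity
git config user.email "researcher@example.org"

# First commit: record a snapshot of notes.txt.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

# Second commit: change the file and record the new version.
echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# Show the history, newest commit first.
git log --oneline

# Print the file as it was one commit ago, without changing the working copy.
git show HEAD~1:notes.txt
```

The last command reads the older version straight from the history, so the current notes.txt is left untouched.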

You could utilize Git completely on its own, on your local computer, and get a lot of benefits. You will have a history of the changes you made to a project, allowing you to go back to any old version of your work. However, where Git really shines is in collaborative work. In order to effectively collaborate with others on a project, you need two basic features: a way to allow people to work in parallel, and a way to host repositories somewhere where everyone can access them. The first feature is branching, which is part of Git, and the hosting part can be taken care of by platforms like GitHub, GitLab, or Bitbucket. We will focus on GitHub.
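
Branching, the feature that lets people work in parallel, can be sketched locally as follows. This assumes Git is installed; the branch name new-figure, the file names, and the identity are hypothetical placeholders.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "Example Researcher"    # placeholder identity
git config user.email "researcher@example.org"

# Commit a starting point on the main branch.
echo "shared analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Add analysis"

# Start a parallel line of work on its own branch.
git checkout -q -b new-figure
echo "figure code" > figure.txt
git add figure.txt
git commit -q -m "Add figure code"

# Back on main, the branch's work is not visible.
git checkout -q main
ls    # figure.txt is not listed here

# Merge the branch to bring its commits into main.
git merge -q new-figure
ls    # figure.txt is now listed
```

Until the merge, main is unaffected by anything committed on new-figure, which is what allows a stable version and a development version to coexist.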

GitHub is a site that can remotely host your Git repositories. By putting your repository onto GitHub, you get a backup of the repository, a way to collaborate with others, and a lot of other features.

Practical Git Techniques

Once you have learned the basics of Git, for example through the Software Carpentry Git Lesson, there are a few further topics that are worth digging into.

Useful GitHub Features

At its core, GitHub is just a place to host your Git repositories. However, it offers a lot of functionality that has less to do with Git, and more to do with our favorite topic, Project Management. We will walk through a few of these useful features.


Git is not really for storing or manipulating data, especially large files. But the CyVerse Discovery Environment is a great place to serve, store, and share data.

Self Assessment

True or False: Using Git requires a GitHub account


False. Git is open source software that runs on your own computer.

GitHub is a company owned by Microsoft.

Other platforms like GitLab, Bitbucket, and GNU Savannah also offer hosting for Git repositories.

True or False: Using Git is easy


False. Using Git can be frustrating for even the most experienced users.

When you find a new repository on GitHub that you think can help your research, what are the first things you should do?

Look at the README

Most GitHub repositories have a README file which explains what you're looking at.

Look at the LICENSE

Not all repositories are licensed the same way - be sure to check the LICENSE file to see whether the software is open source, or if it has specific requirements for reuse.

