GIT workshop

Manuela Salvucci

2019-11-06

Outline

  • What is version control?
  • Why bother with “formal” version control?
  • How to install and get started with GIT
  • Use GIT core features
  • Review files history, revert/amend changes
  • Collaborate online with others with BitBucket, GitHub or GitLab
  • Hands-on examples

Version control

What is version control?

“Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.”

(Pro Git, Scott Chacon and Ben Straub, 2014)

From https://wac-cdn.atlassian.com/dam/jcr:34e935dd-3108-40ef-bb3d-9ed01d977d6d/hero.svg?cdnVersion=659

Without version control… a way too familiar picture

From http://phdcomics.com/comics/archive.php?comicid=1531

Without version control… “informal” versioning

  • None
  • Named files:
    • OK:
      • manuscript_my_draft.docx
      • manuscript_my_draft_with_coauthor_comments.docx
    • Better:
      • manuscript_draft_v01.docx
      • manuscript_draft_v02.docx
  • Named zip-files:
    • manuscript_drafts.zip
    • manuscript_cell_submission.zip
    • manuscript_pnas_submission.zip
    • manuscript_pnas_revisions.zip
    • manuscript_pnas_proofs.zip
  • Sync online services (Microsoft/Dropbox/Google/Overleaf/Sharelatex)

Without version control… challenges

  • Time consuming
  • Error prone
  • Requires self-discipline (save everything, good file names, sticking to a routine, …)
  • Relationship between changes in multiple files is lost
  • Information about what, when and why something changed is lost
    • How would you go about finding out when the p-value for Figure 2.A got set to the (wrong) value?
  • Non-linear history (parallel versions)
  • Disk space

Why bother with “formal” version control?

  • We are too busy to use inefficient, manual, error-prone versioning
  • Research is increasingly collaborative:
    • we need a better way to document the rationale behind data cleaning, analysis steps, generation of figures, write-ups…
    • we need a better way to “merge” inputs and feedback to the project from collaborators
    • often your future self is the collaborator (and you don’t reply to emails…)
  • We do research anywhere:
    • on our workstation at work
    • on the laptop at home/bus/conference
    • on a dedicated facility workstation
  • Projects are always evolving and never “really” finished

What can Version Control Systems do for my research?

  • Version Control Systems are software that keep track of your files and their full history
  • Project files and “history” in the form of “snapshots”/“checkpoints” are organized in a folder
  • Explicitly indicate what file(s) and what change(s) to store with a named snapshot (include why the changes were made)
  • Can “go back in time” and see/use files as they looked at a specific snapshot
  • Can see what changed between snapshots, and in what snapshot content was first introduced
  • Can “experiment” by having “organized parallel versions” of files
  • Synchronize different copies of the project between different computers/collaborators

Version Control Systems - Vocabulary

  • Version Control Systems are software that keep track of your files and their full history
  • Project files and “history” in the form of “snapshots”/“checkpoints” are organized in a folder -> repository
  • Explicitly indicate what file(s) and what change(s) to store with a named snapshot (include why the changes were made) -> commit or revision
  • Can “go back in time” or “jump forward” and see/use files as they looked at a specific snapshot -> checkout or revert
  • Can see what changed between snapshots, and in what snapshot content was first introduced -> diff, annotate and blame
  • Can “experiment” by having “organized parallel versions” of files -> branch
  • Synchronize different copies of the project between different computers/collaborators -> push & pull

What type of files can I track with version control?

  • All types of files can be tracked with version control (but big files may require special care)
  • Version control is most useful for plain “text”-files (txt, md, tex, csv, .py, .R, .m, html, ….) where differences between versions can be “easily” visualized and multiple changes can be merged/combined automatically
  • Version control also works for binary files (docx, xlsx, etc.), but it can only tell us that a file changed: it cannot visualize the change, and it cannot merge changes automatically

What version control systems are available?

  • GIT, PerForce, Mercurial, Subversion (SVN), Bazaar, Concurrent Versions System (CVS), Monotone, ….
  • We will focus on GIT in this workshop

Centralised vs. distributed version control system

  • In the centralized setup, there is a single (central) copy of the project and each user will apply changes to the central copy

  • In the distributed setup, each user has their own (full) copy of the project (a clone)

  • SVN, Perforce and CVS are examples of centralised version control systems

  • GIT and Mercurial are examples of distributed version control systems

GIT

GIT

From https://git-scm.com/
  • Popular version control software:
    • Distributed system
    • Free and Open Source
    • Available for Windows, Linux and Mac
    • A lot of support, infrastructure and tools available to interface with GIT:
      • graphical user interfaces (GUIs)
      • seamless integration with Integrated Development Environments (IDEs) for R, MATLAB, Python, …
      • cloud services (BitBucket, GitHub, GitLab)
  • Developed by Linus Torvalds in 2005 to manage the development of Linux; now maintained by Junio Hamano

Tracking large files with GIT

“Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise”

https://git-lfs.github.com/

  • Extension to “normal” GIT -> git lfs track to choose which files LFS manages, then git add as usual
  • Files are stored “externally”, so that GIT operations can run seamlessly and fast
  • Good solution for “large” files (100 MB - 2 GB)
  • Alternatives:
    • git-annex
    • Do not version control large files
      • set permission to Read Only
      • version control metadata instead

Exclude files from tracking

  • .gitignore: “special” file to list files and folders to intentionally not track

  • Prevents files/folders from showing when running git status -> less clutter

  • Ignored files can still be added explicitly by running git add -f (short for force)

  • Rationale: not all files need to be version controlled

    • figures, tables, manuscript pdf generated by running code -> version control the raw data and the code to generate outputs instead
    • temporary files, compiled outputs, …
  • Check out https://www.gitignore.io/ to help identify files to ignore
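
A minimal sketch of how .gitignore behaves, run in a scratch folder. The ignore rules and file names (figures/, *.tmp, notes.tmp) are made up for illustration and are not taken from the workshop material.

```shell
repo="$(mktemp -d)"          # scratch folder, safe to delete afterwards
cd "$repo"
git init -q

# Hypothetical ignore rules: generated figures and temporary files
printf '%s\n' 'figures/' '*.tmp' > .gitignore

mkdir figures
echo "generated output" > figures/figure1.png
echo "scratch notes" > notes.tmp
echo "# Draft" > manuscript.md

# Ignored paths are not listed; only .gitignore and manuscript.md show up
git status --short

# Ask GIT which rule causes a path to be ignored
git check-ignore -v notes.tmp
```

git check-ignore is handy when a file unexpectedly does not appear in git status.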

Installation

Git installation

  1. Go to https://git-scm.com/
  2. Download the executable to the suggested directory
  3. Install by following the step-by-step instructions and accepting the default settings
  4. Verify installation completed successfully
Go to https://git-scm.com/

Select to download latest stable GIT release for Windows

Wait for executable to download

Save executable in suggested folder

Click on executable to start installation process

Select Install anyway

Select Yes

Accept the default settings by clicking on Next on each installer screen

Accept default settings by clicking on Install

Monitor installation progress

Select Launch Git Bash, deselect View Release Notes and click on Finish

Verify installation completed successfully
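
One way to verify the installation from GIT bash is to ask GIT for its version. The exact version number printed depends on the release you downloaded.

```shell
# Any "git version ..." line means GIT is installed and on the PATH
git --version
```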

Now you install GIT on your computer (5-10 min)

  1. Go to https://git-scm.com/
  2. Download the executable to the suggested directory
  3. Install by following the step-by-step instructions and accepting the default settings
Signal once installation progress has started

Demo

Demo

To demonstrate we are going to go through an example for writing a manuscript.

  • We will track the history of our manuscript and accompanying files in git
  • We will use git to see the history of our files and to undo a mistake
  • We will use git to synchronize the files between multiple computers and to collaborate with other authors

Writing papers with Markdown

“Markdown is a lightweight markup language with plain text formatting syntax”

(Wikipedia)

  • Markdown text (.md extension) can be converted to other formats (.docx, .pdf, .html) with Pandoc
  • References can also be stored in plain text files (.bib)
  • Learn more about markdown here
  • Try it online

Writing papers with Markdown - example

Markdown manuscript

---
title: 'HCP: A Matlab package to create beautiful heatmaps with richly annotated covariates'
authors:
 - name: Manuela Salvucci
   orcid: 0000-0001-9941-4307
   affiliation: 1
 - name: Jochen H. M. Prehn
   orcid: 0000-0003-3479-7794
   affiliation: 1
affiliations:
 - name: Centre for Systems Medicine, Department of Physiology and Medical Physics, Royal College of Surgeons in Ireland, Dublin, Ireland
   index: 1
date: 20 January 2019
bibliography: paper.bib
---

# Summary
A heatmap is a graphical technique that maps 2-dimensional matrices of numerical values to colors to provide an immediate
and intuitive visualization of the underlying patterns [@Eisen1998]. Heatmaps are often used in conjunction with cluster
analysis to re-order observations and/or features by similarity and thus, rendering common and distinct patterns more apparent.
When generating these visualizations, it is often of interest to interpret the underlying patterns in the context of other
data sources. In the field of bioinformatics, heatmaps are frequently used to visualize high-throughput and high-dimensional
datasets, such as those derived from profiling biological samples with *-omic* technologies (whole genome sequencing,
transcriptomics and proteomics). Often, biological samples (for example, patient tumour samples) are characterized at
multiple *-omic* level and it is of interest to contrast and compare patterns captured at the different molecular layers
along with their associations with other observable features (covariates). The concurrent display of continuous or
categorical covariates enriches the visualization with additional information such as group membership.

Markdown pdf

Getting started

Two main approaches to get a git repository:

  • start a repository from scratch -> git init
  • start by cloning an existing repository -> git clone

Practical example

Practical example

  1. Make a project folder
  2. Start a GIT bash in the project folder
  3. Configure GIT
  4. Initialize repository
  5. Standard workflow
    1. Make edits
    2. (git add)
    3. git commit
    4. Repeat
  6. git status
  7. git diff
  8. git log
  9. git revert
Make a project folder

Name it demo

Open GIT bash

GIT bash

Configure GIT
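
A sketch of the configuration step. The name and email are placeholders. To keep the sketch from touching your real settings it configures a scratch repository only; in the workshop you would most likely add --global so the identity applies to every repository on your computer.

```shell
repo="$(mktemp -d)"          # scratch repository for this sketch
cd "$repo"
git init -q

# Identity recorded in every commit (placeholder values, replace with your own).
# Use "git config --global ..." to set it once for all your repositories.
git config user.name "Workshop User"
git config user.email "user@example.org"

# Read the values back to confirm they were stored
git config user.name
git config user.email
```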

Initialize repository

The folder still looks empty after git init: GIT keeps its data in a hidden .git directory that is normally not shown
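
The same step on the command line, as a sketch. The demo folder is created in a temporary location here (an assumption, so the example does not clash with an existing folder).

```shell
project="$(mktemp -d)/demo"   # scratch location standing in for the demo folder
mkdir "$project"
cd "$project"

git init      # creates the repository

ls            # prints nothing: there are no visible files yet
ls -a         # also lists hidden entries, including the .git directory
```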

If you explicitly open the .git subdirectory, you can see a lot of files internal to GIT. You do not need to directly interact with these files (and do not delete them)

Create a new text document for a manuscript we are writing

Rename the file to manuscript.md to indicate that the file is formatted with markdown

First manuscript draft

We can use the git status command to check the status of the repository

To prepare a new file to be added to the repository, we use git add. If we re-run git status we now see that the file is staged
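
A sketch of what staging looks like, using the compact --short form of git status. It runs in a scratch repository and the file content is a placeholder.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

echo "# My manuscript" > manuscript.md

git status --short      # "?? manuscript.md": the file is untracked
git add manuscript.md
git status --short      # "A  manuscript.md": the file is staged for the next commit
```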

To store changes in the repository, we use git commit. We specify a commit message after -m to record what we did
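
A sketch of the first commit in a scratch repository. The identity, file content and commit message are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

echo "# My manuscript" > manuscript.md
git add manuscript.md

# -m supplies the commit message on the command line
git commit -m "Add first manuscript draft"

git status --short      # prints nothing: there are no outstanding changes
```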

Anatomy of a commit

  • Includes:
    • what changed compared to the previous commit (snapshot)
      • which files are affected by changes and how
    • rationale for the change (commit message)
    • timestamp
    • “name”: unique identifier represented by a SHA-1 hash
      • for example: 4fc82ba7bb3f3a3de8ac57f16b6a926a7e60a21e
      • the first 7 characters are typically sufficient to identify a commit -> shorthand version 4fc82ba
    • “parent” commit (reference to previous snapshot)
      • first commit is special (has no parent)
      • the most recent commit on the current branch is special (HEAD normally points to it)
  • The full series of commits makes up the whole project
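
The pieces listed above can be inspected with git log format placeholders. This is a sketch in a scratch repository with two placeholder commits; the hashes printed will differ on every run.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

echo "# My manuscript" > manuscript.md
git add manuscript.md
git commit -q -m "Add first manuscript draft"
echo "## References" >> manuscript.md
git commit -q -a -m "Add references section"

# %H full hash, %h short hash, %P parent hash(es), %an author, %ad date, %s subject
git log -1 --format='name:    %H%nshort:   %h%nparent:  %P%nauthor:  %an%ndate:    %ad%nmessage: %s'

full="$(git rev-parse HEAD)"            # HEAD resolves to the latest commit
short="$(git rev-parse --short HEAD)"   # abbreviated form of the same hash
echo "$short is shorthand for $full"
```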

Guidelines on commits… size matters

  • Commit small units of changes and commit often
  • A good unit of change is a small, self-contained, working change
    • GOOD: data.csv, process_data_figure1.py, make_figure1.py
    • BAD: 1 commit with a day’s worth of work (on multiple fronts)
  • Rule of thumb: commit together what you would need to undo if you later want to disregard this change

Guidelines on commits… message

  • Write good commit messages:
    • GOOD: Update ReadMe to include ‘how-to-install’ section. Fixes issue #1
    • BAD: Major fixup
    • which of the 2 messages above would you rather read the evening before a deadline?
  • A perfect commit message summarises the what and why of the change, not the how (can be seen from the diffs)
  • Other advice includes:
    • keep the message subject concise (<50 characters) -> log looks cleaner
    • add additional details (if needed) after a blank line and wrap at 72 characters -> readability
    • use the imperative mood (Add vs. Added) -> if the change gets reverted, the message reads better (Revert Add …)
    • use commit.template to pre-fill the message editor with a template
  • Examples of how (not to) write commit messages
  • More tips on writing good commit messages
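
A sketch of a commit message with a short subject and a separate body. Passing -m twice makes GIT store the second text as a new paragraph after a blank line. The file and the message text are made up for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

echo "How to install: see below." > ReadMe.md
git add ReadMe.md

# First -m: concise, imperative subject. Second -m: the "why", as the body.
git commit -q \
  -m "Add how-to-install section to ReadMe" \
  -m "New users could not find the installation steps."

subject="$(git log -1 --format=%s)"   # subject line only
body="$(git log -1 --format=%b)"      # body only
echo "$subject"
echo "$body"
```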

Practical example

We use git status to check that there are no outstanding changes

Let us do some more work on the manuscript. We need to add more details for materials and methods and add a section for references

Once we have finished with our change, we use git commit to add the new version of the file to the GIT repository

We changed our mind, and we will use mutation data instead of RNASeq. Let us update the materials and methods

Commit the change as before

Add figures and tables to our manuscript

We can use the git diff command to see how our current files differ from the last version committed to the repository
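
A sketch of git diff on an uncommitted edit, in a scratch repository with placeholder content.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

printf '%s\n' '# My manuscript' '' '## Materials and methods' > manuscript.md
git add manuscript.md
git commit -q -m "Add first manuscript draft"

# Edit the file without committing
echo '## Figures and tables' >> manuscript.md

# Lines starting with "+" were added since the last commit, "-" were removed
git diff
```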

If we create a figures directory with some image files and run git status, we see that this directory is untracked by GIT

We add the whole directory with git add and rerun git status. Now it lists the files as new instead

To commit the files, we use git commit -a instead of listing them. -a means all: it commits everything we have staged with git add, plus all modifications to files that are already tracked (new, untracked files still need git add first)
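
A sketch of adding a whole directory and then committing with -a. The figures folder and its single file are stand-ins for real images.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

echo '# My manuscript' > manuscript.md
git add manuscript.md
git commit -q -m "Add first manuscript draft"

mkdir figures
echo "placeholder image data" > figures/figure1.png   # stand-in for a real image
git status --short        # "?? figures/": the directory is untracked

git add figures           # stage every file inside the directory
echo 'See Figure 1.' >> manuscript.md   # tracked file, modified but not staged
git status --short        # staged new file plus an unstaged modification

# -a also picks up the unstaged modification to the tracked manuscript.md
git commit -q -a -m "Add figure 1 and reference it in the text"
git status --short        # prints nothing: everything is committed
```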

The git log command shows the history of the repository, with the most recent change on top. --oneline gives a more compact representation
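
A sketch comparing the two forms of git log on a scratch repository with three placeholder commits.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

for i in 1 2 3; do
  echo "## Section $i" >> manuscript.md
  git add manuscript.md
  git commit -q -m "Add section $i"
done

git log             # full entries: hash, author, date and message
git log --oneline   # one line per commit: short hash and subject, newest first
```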

git diff can also be used to show the difference between two revisions in the history. We need to specify the two commit identifiers

Actually, we changed our mind again, and want to use RNASeq. Let us revert the previous change

GIT will ask us for a commit message for the revert. The default message is fine

GIT confirms the change, like a normal commit

The history captures the revert

Note that we did not just go back to a previous revision. We selectively undid the RNASeq->mutation change, but we still have the figures and tables, which were added afterwards. GIT has automatically merged our changes together
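
The whole revert scenario can be replayed in a scratch repository. This is a sketch: the manuscript text and the write_manuscript helper are made up for illustration, and --no-edit accepts the default revert message without opening an editor.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

# Hypothetical helper: writes the draft, $1 is the data type named in the methods
write_manuscript() {
  printf '%s\n' '# My manuscript' '' '## Materials and methods' \
    "We analysed $1 data." '' '## Results' 'To be written.' '' \
    '## References' 'To be written.' > manuscript.md
}

write_manuscript RNASeq
git add manuscript.md
git commit -q -m "Add first draft"

write_manuscript mutation
git commit -q -a -m "Use mutation data instead of RNASeq"
mutation_commit="$(git rev-parse HEAD)"     # remember the commit to undo later

printf '%s\n' '' '## Figures and tables' 'Figure 1.' >> manuscript.md
git commit -q -a -m "Add figures and tables"

# Undo only the mutation change; the later figures commit is kept
git revert --no-edit "$mutation_commit"

git log --oneline     # the revert appears as a new, fourth commit
cat manuscript.md     # RNASeq is back and the figures section is still there
```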

Other useful GIT commands

  • git rm FILENAME: delete tracked file
  • git mv FILENAME1 FILENAME2: rename file from FILENAME1 to FILENAME2
  • git log --follow FILENAME: inspect the log of a file (even across renames)

I am working on it…

  • git add -p FILENAME: add portions of changes you made to a file
    • preserve other changes, but they will not be captured in this commit
    • useful when you set out to make some changes, but you could not help fixing (other) unrelated stuff
  • squash (via git rebase -i or git merge --squash): pool related commits into a single commit
  • git stash: stash away work in progress which is in a state that is too preliminary to be committed and get back to it later
    • git stash list
    • git stash pop
    • git stash drop

Oops, I did not mean to do that…. Let’s pretend it never happened

  • git commit --amend: by far the most used command
    • useful when you forgot to add a file before committing or you would like to change commit message
  • git revert SHA: revert changes applied by SHA by creating a new commit
  • git checkout FILENAME: undo (uncommitted) changes to FILENAME
  • git checkout SHA: checkout a snapshot where all was good
  • git reset: undo changes, degree of annihilation depends on flags (--soft vs. --hard), be careful
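
A sketch of two of the commands above, git commit --amend and git checkout on a file, in a scratch repository. File names and messages are placeholders, and the sketch writes git checkout -- FILENAME, where the -- simply marks what follows as a file name.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

echo '# My manuscript' > manuscript.md
echo '% bibliography entries go here' > paper.bib

git add manuscript.md
git commit -q -m "Add manuscirpt draft"     # typo in message, paper.bib forgotten

# Stage the forgotten file and replace the last commit (content and message)
git add paper.bib
git commit -q --amend -m "Add manuscript draft and bibliography"
git log --oneline         # still one commit, now with the corrected message

# Discard an uncommitted edit to a tracked file
echo 'accidental edit' >> manuscript.md
git checkout -- manuscript.md
git status --short        # prints nothing: the edit is gone
```

Amending rewrites the last commit, so it is best reserved for commits that have not been shared with collaborators yet.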

Oops, something went terribly terribly wrong, at some point in the past

  • git show SHA: inspect (suspicious) commit
  • git blame: when and who changed/broke this?
  • git bisect: run binary search to identify when problem was introduced
    • extremely useful command, a life-saver
    • requires knowing what right vs. wrong means (unit tests, ground truth, …)
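
A sketch of an automated git bisect on a scratch repository with six placeholder commits, where the fourth one introduces a line marked BUG. The check passed to git bisect run is a made-up stand-in for a real test: it exits with 0 (good) when the marker is absent and 1 (bad) when it is present.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.org"

for i in 1 2 3 4 5 6; do
  line="analysis step $i"
  if [ "$i" -eq 4 ]; then line="analysis step $i BUG"; fi   # the problem sneaks in here
  echo "$line" >> analysis.txt
  git add analysis.txt
  git commit -q -m "Add analysis step $i"
done

first_commit="$(git rev-list --max-parents=0 HEAD)"   # known good
git bisect start HEAD "$first_commit" >/dev/null      # bad commit first, then good

# GIT checks out commits in between and runs the check on each of them
git bisect run sh -c '! grep -q BUG analysis.txt' >/dev/null

culprit="$(git rev-parse refs/bisect/bad)"            # first bad commit found
culprit_subject="$(git log -1 --format=%s "$culprit")"
git bisect reset >/dev/null 2>&1                      # return to where we started

echo "Problem introduced by: $culprit_subject"
```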

GIT SubModules… Russian-doll repositories

“It often happens that while working on one project, you need to use another project from within it. Perhaps it’s a library that a third party developed or that you’re developing separately and using in multiple parent projects. A common issue arises in these scenarios: you want to be able to treat the two projects as separate yet still be able to use one from within the other.”

https://git-scm.com/book/en/v2/Git-Tools-Submodules

Exercise 1

Exercise 1 (20 min)

  1. Create a folder named “christmas_repo”
  2. Open GIT bash, verify the installation and configure GIT
  3. Initialize a GIT repository in the folder
  4. Create a text file (“wish_list.md”) with 3 gifts you wish to receive for Christmas
  5. Add and commit the wish list file to GIT
  6. Edit the wish list file, and add 2 more presents
  7. Use GIT to check the difference between the current and previous version
  8. Commit the updated file
  9. Create a file (“recipients.md”) with a list of people you plan to buy gifts for
  10. Check the status of the GIT repository
  11. Add and commit the new file to GIT
  12. Create a file (“past_gifts.md”) with a list of what gifts you gave last year
  13. Maybe you remembered a few more people you would like to give gifts to. Add them to “recipients.md”
  14. Add the new file and commit “past_gifts.md” and “recipients.md” to GIT
  15. Look at the GIT history
  16. Revert the change that added more presents to the wish list in step 6
  17. Play around with doing more changes and commits

GIT support tools

GIT support tools

  • Graphical user interfaces (GUIs)
  • Integration with software Integrated Development Environments (IDEs) for R, MATLAB, Python, …
  • Cloud services (BitBucket, GitHub, GitLab)

GIT graphical user interface (GUI)

From https://git-scm.com/downloads/guis

GIT integration with software Integrated Development Environments (IDE)

RStudio (R IDE)

MATLAB (MATLAB IDE)

PyCharm (Python IDE)

Cloud services that support GIT… “social” coding

  • Servers that can host a copy of your repository
  • Useful as a backup
  • Can make synchronization and collaboration easier
  • Free plans available
  • Most popular alternatives: BitBucket, GitHub, GitLab
  • Other alternatives include Crucible, AWS CodeCommit, …

Comparison of key features in free plans from GitHub, BitBucket and GitLab

  • Similar products, select the one that best suits your needs

Bitbucket

BitBucket example

  1. Create account and log in
  2. Push demo repository to BitBucket
  3. Show history and diffs
  4. Collaboration scenario
    1. Cloning repository (before last commit)
    2. Making changes
    3. Pushing and pulling
    4. Inspecting history
Go to the BitBucket website and click Get Started

Follow the instructions and fill in the required info

Verify email

Log in with your credentials

Choose your username

Finalize setup

Complete account creation

Create a repository for the demo

Since we have an existing repository to upload, we follow the instructions under “Get your local repository on BitBucket”

We switch to GIT bash to upload. On the RCSI network we need to use the HTTPS protocol (instead of SSH)
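
A minimal sketch of the upload, under one loud assumption: a local bare repository stands in for the empty repository created on BitBucket, so the commands can be tried offline. With the real server, the remote address would be the HTTPS URL shown on the repository page; the URL in the comment uses placeholder names. File name, commit message and identity are also placeholders.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the empty repository on the server (assumption: local bare repository)
git -c init.defaultBranch=master init -q --bare server.git

# An existing local repository with one commit
git -c init.defaultBranch=master init -q demo
cd demo
git config user.name "Workshop User"
git config user.email "workshop.user@example.com"
echo "# Manuscript" > manuscript.md
git add manuscript.md
git commit -q -m "Add manuscript draft"

# Register the server under the name "origin".
# Over HTTPS this would look like (placeholder names):
#   git remote add origin https://<user>@bitbucket.org/<user>/<repository>.git
git remote add origin "$workdir/server.git"

# Upload the master branch and remember origin/master as its upstream
git push -q -u origin master

# List the configured remotes
git remote -v
```

After the first push with -u, later uploads from the master branch only need a plain git push.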

Also, to use GIT from the RCSI network, we need a workaround for SSL verification
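
The text here does not say which workaround was used. One setting that is often used in this situation is http.sslVerify; the sketch below is an assumption, not necessarily what the demo did. It switches certificate checking off for a single repository only. This weakens security, so it is best limited to a trusted network and undone afterwards.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo

# Assumption: disable SSL certificate verification for this repository only
git config http.sslVerify false

# Check the current value
git config --get http.sslVerify

# To restore the default behaviour later:
#   git config --unset http.sslVerify
```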

The BitBucket landing page for the repository shows the list of files and when they were last changed

We can see the history of commits

The content of the last commit

BitBucket has an annotate feature which highlights when each line in the file was last changed
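
The history, last-commit and annotate views also have command-line counterparts. A small sketch on a throw-away repository; the file name, contents and commit messages are placeholders.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name "Workshop User"
git config user.email "workshop.user@example.com"

# Two commits, each adding one line to the same file
echo "# Manuscript" > manuscript.md
git add manuscript.md
git commit -q -m "Add title"
echo "Samples: description goes here" >> manuscript.md
git commit -q -am "Describe samples"

# History of commits, one line each
git log --oneline

# Content of the last commit (message and diff)
git show HEAD

# Annotate: for every line, the commit, author and date that last changed it
git blame manuscript.md
```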

Markdown rendering of the manuscript file

Collaborator clones repository

Collaborator adds text on bioinformatic analysis

Collaborator commits their change

Collaborator tries to push their change to the BitBucket server. This fails, because another change was pushed after they cloned

Collaborator needs to first pull from the server

The pull results in a merge between the two changes. They accept the default commit message for the merge

The pull is successful

Now they can push their change to the server
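
The whole collaboration scenario (clone, change, rejected push, pull, push) can be replayed offline. This is a sketch under stated assumptions: a local bare repository plays the role of the BitBucket server, two folders play the two people, and all names, file contents and commit messages are placeholders.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the server (assumption: local bare repository)
git -c init.defaultBranch=master init -q --bare server.git

# Owner creates the repository and pushes the first version
git -c init.defaultBranch=master init -q owner
cd owner
git config user.name "Owner"
git config user.email "owner@example.com"
printf '%s\n' "# Manuscript" "" "## Samples" "" "## Methods" "" \
    "## Bioinformatic analysis" > manuscript.md
git add manuscript.md
git commit -q -m "Add manuscript skeleton"
git remote add origin "$workdir/server.git"
git push -q -u origin master

# Collaborator clones (before the owner's last commit)
cd "$workdir"
git clone -q server.git collaborator

# Owner adds samples information and pushes it
cd "$workdir/owner"
printf '%s\n' "# Manuscript" "" "## Samples" "Sample description (placeholder)" "" \
    "## Methods" "" "## Bioinformatic analysis" > manuscript.md
git commit -q -am "Add samples information"
git push -q origin master

# Collaborator adds text on the bioinformatic analysis and commits
cd "$workdir/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo "Differential expression analysis (placeholder)" >> manuscript.md
git commit -q -am "Add text on bioinformatic analysis"

# The push is rejected: the server has a commit the collaborator does not have yet
if git push -q origin master 2>/dev/null; then
    echo "unexpected: push was accepted"
else
    echo "push rejected, need to pull first"
fi

# Pull merges the two changes; --no-edit accepts the default merge message
git pull -q --no-rebase --no-edit origin master

# Now the push goes through
git push -q origin master

# The history shows both changes and the merge
git log --oneline --graph
```

The merge is automatic here because the two changes touch different parts of the file. If both people had edited the same lines, the pull would stop with a conflict to resolve by hand.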

The server now shows a history for the file that includes both the collaborator's changes and my other simultaneous change, and shows that they have been merged together

When we annotate the file, we see the bioinformatic analysis text from the collaborator and the sample information from our change

Using branches and tags

  • Implicit branches:
    • The collaborator has their own temporary unnamed branch when working on the files at the same time as other work. The branches get merged when they pull and push.
    • Similarly, you can get temporary unnamed branches when working on two computers
  • Explicit branches:
    • You can create a named branch for changes that you want to keep separate from your main work (master branch)
    • The branches can optionally be merged later
  • Tags:
    • You can name a specific revision (snapshot), like a milestone, to keep track of it
    • For example to have a record of what specific revision was used for:
      • a paper submission
      • conference talk
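
Explicit branches and tags can be sketched on a throw-away repository as follows. The branch name figures, the tag name, file contents and commit messages are illustrative placeholders, and the branch name master is pinned so the commands work the same on any GIT version.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git -c init.defaultBranch=master init -q demo
cd demo
git config user.name "Workshop User"
git config user.email "workshop.user@example.com"
echo "# Manuscript" > manuscript.md
git add manuscript.md
git commit -q -m "Add manuscript"

# Explicit branch: keep the work on the figures separate from the main work
git checkout -q -b figures
echo "Figure 1 (placeholder)" > figures.md
git add figures.md
git commit -q -m "Add figure list"

# Meanwhile the main work continues on master
git checkout -q master
echo "## Methods" >> manuscript.md
git commit -q -am "Add methods heading"

# Merge the figures branch into master, accepting the default merge message
git merge -q --no-edit figures

# Tag: give this revision a name, for example the one shared with co-authors
git tag -a shared-with-coauthors -m "Revision shared with co-authors"

# History graph with branch and tag names
git log --oneline --graph --decorate
```

A tag stays on the revision it was created for, so it still points to the shared version after further commits are added.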

BitBucket example

The history graph shows a figures branch for work on the figures that is kept separate from the rest

The figures branch was merged with the rest of the work

We tagged the revision we shared with the other co-authors

Exercise 2

Exercise 2 (20 min)

  1. Create a BitBucket account
  2. Push your “christmas_repo” from exercise 1 to BitBucket
  3. Add more past_gifts and push to BitBucket
  4. Update recipients through the BitBucket web-interface, and pull to your local machine
  5. Collaborate in pairs

Wrap-up

Take home messages

  • Version control with GIT helps keep your file history organized
  • Lightweight: minimum effort required
  • If you are not comfortable with using the command line, download a GUI or use GIT from your IDE
  • Commit early and often
  • Write good commit messages - future you will appreciate it
  • Useful for backups and for collaboration

Thanks & Questions