Manuela Salvucci
2019-11-06
“Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.”
(Pro Git, Scott Chacon and Ben Straub, 2014)
In the centralized setup, there is a single (central) copy of the project and each user will apply changes to the central copy
In the distributed setup, each user has their own (full) copy of the project (a clone)
SVN, Perforce and CVS are examples of centralized version control systems
Git and Mercurial are examples of distributed version control systems
“Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise”
.gitignore: “special” file to list files and folders to intentionally not track
Prevents files/folders from showing when running git status -> less clutter
Ignored files can still be added explicitly by running git add -f (short for --force)
Rationale: not all files need to be version controlled
Check out https://www.gitignore.io/ to help identify files to ignore
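The points above can be tried out in a throwaway repository. The following is a minimal sketch, not part of the workshop material: the file names paper.md and paper.log and the *.log pattern are hypothetical choices made for illustration.

```shell
set -e
# Work in a temporary directory so no real project is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Hypothetical files: a manuscript we want to track and a log file we do not.
echo "# Summary" > paper.md
echo "temporary output" > paper.log

# List the pattern to ignore in .gitignore.
echo "*.log" > .gitignore

# paper.log is no longer listed among the untracked files.
git status --short

# An ignored file can still be staged explicitly with -f (--force).
git add -f paper.log
git status --short
```

After the last command, paper.log is staged even though it still matches a pattern in .gitignore.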
To demonstrate, we are going to work through an example: writing a manuscript.
“Markdown is a lightweight markup language with plain text formatting syntax”
(Wikipedia)
Markdown manuscript
---
title: 'HCP: A Matlab package to create beautiful heatmaps with richly annotated covariates'
authors:
  - name: Manuela Salvucci
    orcid: 0000-0001-9941-4307
    affiliation: 1
  - name: Jochen H. M. Prehn
    orcid: 0000-0003-3479-7794
    affiliation: 1
affiliations:
  - name: Centre for Systems Medicine, Department of Physiology and Medical Physics, Royal College of Surgeons in Ireland, Dublin, Ireland
    index: 1
date: 20 January 2019
bibliography: paper.bib
---
# Summary
A heatmap is a graphical technique that maps 2-dimensional matrices of numerical values to colors to provide an immediate
and intuitive visualization of the underlying patterns [@Eisen1998]. Heatmaps are often used in conjunction with cluster
analysis to re-order observations and/or features by similarity, thus rendering common and distinct patterns more apparent.
When generating these visualizations, it is often of interest to interpret the underlying patterns in the context of other
data sources. In the field of bioinformatics, heatmaps are frequently used to visualize high-throughput and high-dimensional
datasets, such as those derived from profiling biological samples with *-omic* technologies (whole genome sequencing,
transcriptomics and proteomics). Often, biological samples (for example, patient tumour samples) are characterized at
multiple *-omic* levels and it is of interest to contrast and compare patterns captured at the different molecular layers
along with their associations with other observable features (covariates). The concurrent display of continuous or
categorical covariates enriches the visualization with additional information such as group membership.
Markdown pdf
Two main approaches to get a git repository:
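The two approaches are usually git init (start a repository in a local directory) and git clone (copy an existing repository). A minimal sketch, assuming hypothetical directory names manuscript and manuscript_copy; to stay offline it clones a local repository, but a URL such as the workshop repo would be used in the same way.

```shell
set -e
# Work in a temporary directory so no real project is touched.
work_dir="$(mktemp -d)"
cd "$work_dir"

# Approach 1: initialise a repository in a new (or existing) directory.
git init --quiet manuscript

# Approach 2: clone an existing repository.
# With a remote this would look like:
#   git clone https://bitbucket.org/manuela_s/git_workshop
# Here the local repository created above is cloned instead
# (git warns that it is empty, which is expected).
git clone --quiet manuscript manuscript_copy

# The clone remembers where it came from as the remote "origin".
git -C manuscript_copy remote -v
```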
“It often happens that while working on one project, you need to use another project from within it. Perhaps it’s a library that a third party developed or that you’re developing separately and using in multiple parent projects. A common issue arises in these scenarios: you want to be able to treat the two projects as separate yet still be able to use one from within the other.”
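Git submodules are one way to handle this situation. The sketch below is an illustration under stated assumptions, not the workshop's own exercise: the repositories library and parent, the file helper.txt and the demo identity are hypothetical. Because the library is a local path, the example sets protocol.file.allow=always for that one command, which recent Git versions require for local submodule clones; with a remote URL this is not needed.

```shell
set -e
# Work in a temporary directory so no real project is touched.
work_dir="$(mktemp -d)"
cd "$work_dir"

# Hypothetical library repository containing a single commit.
git init --quiet library
cd library
echo "helper code" > helper.txt
git add helper.txt
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet -m "Add helper"
cd ..

# Parent project that uses the library as a submodule.
git init --quiet parent
cd parent
git -c protocol.file.allow=always submodule add "$work_dir/library" library

# The submodule is recorded in .gitmodules and pinned to a specific commit.
cat .gitmodules
git submodule status
```

The parent repository stores only the submodule's URL and the commit it points to, so the two projects keep separate histories.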
RStudio (R IDE)
MATLAB (MATLAB IDE)
PyCharm (Python IDE)
Comparison of key features in free plans from GitHub, BitBucket and GitLab
Get in touch: manuelasalvucci@rcsi.ie
Presentation, CheatSheet, Handout and Solution: https://bitbucket.org/manuela_s/git_workshop/downloads/
Workshop repo: https://bitbucket.org/manuela_s/git_workshop
Useful resources