CS 346 (W23)

Using Git

A Version Control System (VCS) is a software system designed to track changes to source code. It is meant to provide a canonical version of your project’s code and other assets, and to ensure that only desirable (and tested) changes are pushed into production. Common VCSs include Mercurial (hg), Subversion (SVN), Perforce, and Microsoft Team Foundation Server. We’ll be using Git, a very popular VCS, in this course.

Every VCS, including Git, lets you take a snapshot of your source code at any given point in time. The example below shows a project that starts with a single index.html file, adds about.html at a later time, and then finally makes some edits. The VCS tracks these changes, and provides functionality that we’ll discuss below.

A basic workflow when using version control

https://www.git-tower.com/learn/git/ebook/en/desktop-gui/basics/what-is-version-control

Why use Version Control?

A VCS provides some major benefits:

  • History: a VCS provides a long-term history of every file¹. This includes tracking when files were added or deleted, and every change that you’ve made. Changes are grouped together, so you can look at (for instance) the set of changes that introduced a feature.
  • Versions: the ability to version your code, and compare different versions. Did you break something? You can always unwind back to the “last good” change that was saved, or even compare your current code with the previously working version to identify an issue.
  • Collaboration: a VCS provides the necessary capabilities for multiple people to work on the same code simultaneously, while keeping their changes isolated. You can create branches where your changes are separate from other ongoing changes, and the VCS can help you merge changes together once they’re tested.

Installing Git

Git binaries can be installed from the Git home page or through a package manager (e.g. Homebrew on Mac). Although there are graphical clients that you can install, Git is primarily a command-line tool. Commands are of the form: git <command>.

You’ll also want to make sure that the git executable (git or git.exe) is in your path.
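
As a quick check, the following sketch tests whether the git executable can be found on your path. It assumes a POSIX-style shell (on Windows, Git Bash provides one); the variable name git_found is only for illustration.

```shell
# Check whether the git executable can be found on the PATH.
if command -v git >/dev/null 2>&1; then
    git_found=yes
    git --version    # prints the installed Git version
else
    git_found=no
    echo "git was not found on your PATH" >&2
fi
```

If the check fails, revisit the installation step or add Git's install directory to your path.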

Concepts

Version control is modeled around the concept of a changeset: a grouping of files that together represent a change to the system (e.g. a feature that you’ve implemented may impact multiple source files). A VCS is designed to track changes to sets of files.

Git is designed around these core concepts:

  • Repository: The location of the canonical version of your source code.
  • Working Directory: A copy of your repository, where you will make your changes before saving them in the repository.
  • Staging Area: A logical collection of changes from the working directory that you want to collect and work on together (e.g. it might be a feature that resulted in changes to multiple files that you want to save as a single change).

A repository can be local or remote:

  • A local repository is where you might store projects that you don’t need to share with anyone else (e.g. these notes are in a local git repository on my computer).
  • A remote repository is set up on a central server, where multiple users can access it (e.g. GitLab and GitHub do this by offering free hosting for remote repositories).

Git works by operating on a set of files (aka a changeset): we git add files in the working directory to add them to the changeset; we git commit to save the changeset to the local repository. We use git push and git pull to keep the local and remote repositories synchronized.
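
To make the three areas concrete, here is a minimal sketch that follows one file from the working directory, through the staging area, and into the local repository. The temporary directory, the file contents, and the "Demo User" identity are placeholders chosen for this example.

```shell
set -e
demo=$(mktemp -d)                      # throwaway directory for the demo
cd "$demo"
git init -q .
git config user.name "Demo User"       # placeholder identity, stored in this repo only
git config user.email "demo@example.com"

echo "hello" > index.html
before=$(git status --short)           # "?? index.html": untracked, working directory only

git add index.html
staged=$(git status --short)           # "A  index.html": added to the staging area

git commit -q -m "Add index.html"
after=$(git status --short)            # empty: nothing left to stage or commit

printf '%s\n' "$before" "$staged"
```

git status --short is a condensed form of git status; the two-letter code at the start of each line tells you which area a change currently sits in.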

Git Diagram

https://support.nesi.org.nz/hc/en-gb/articles/360001508515-Git-Reference-Sheet

Local Workflow

To create a local repository that will not need to be shared:

  1. Create a repository. Create a directory, and then use the git init command to initialize it. This will create a hidden .git directory (where Git stores information about the repository).
$ mkdir repo
$ cd repo
$ git init
Initialized empty Git repository in ./repo/.git/
$ ls -a
.    ..   .git
  2. Make any changes that you want to your repository. You can add or remove files, or make changes to existing files.
$ vim file1.txt
$ ls -a
.         ..        .git      file1.txt
  3. Stage the changes that you want to keep. Use the git add command to indicate which files or changes you wish to keep. This adds them to the “staging area”. git status will show you what changes you have pending.
$ git add file1.txt 
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
new file:   file1.txt
  4. Commit your staging area. git commit stores the staged changes in your local repository as a single changeset and assigns that changeset a unique identifier (a hash). The -m argument lets you specify a commit message. If you don’t provide one here, your editor will open so that you can type in a commit message. Commit messages are mandatory, and should describe the purpose of this change.
$ git commit -m "Added a new file"
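
The steps above can be repeated as often as you like. The following sketch runs the cycle twice in a throwaway repository and then inspects the result; the directory, the file contents, and the "Demo User" identity are placeholders for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway location for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

# First pass: create, stage, commit.
echo "first line" > file1.txt
git add file1.txt
git commit -q -m "Added a new file"

# Second pass: edit, review, stage, commit.
echo "second line" >> file1.txt
git diff --stat                            # summary of edits that are not yet staged
git add file1.txt
git commit -q -m "Extended file1.txt"

git log --oneline                          # one line per commit, newest first
commit_count=$(git rev-list --count HEAD)  # number of commits reachable from HEAD
commit_id=$(git rev-parse --short HEAD)    # abbreviated identifier of the latest commit
```

The identifier printed by git log is what Git uses to refer to each saved changeset.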

Remote Workflow

A remote workflow is almost the same, except that you start by making a local copy of a repository from a remote system.

  1. Clone a remote repository. This creates a new local repository which is a copy of a remote repository. It also establishes a link between them so that you can manually push new changes to the remote repo, or pull new changes that someone else has placed there.
# create a copy of the CS 346 public repository
$ git clone https://git.uwaterloo.ca/j2avery/cs346.git ./cs346

Making changes and saving/committing them is the same as the local workflow (above).

  2. Push to a remote repository to save any local changes to the remote system.
$ git push
  3. Pull from the remote repository to get a copy of any changes that someone else may have saved remotely since you last checked.
$ git pull
  4. Check your work. git status will show you the state of your repository; git log will show you a history of changes.
# status when local and remote repositories are in sync
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

# condensed history of a sample repository
$ git log --oneline
b750c10 (HEAD -> master, origin/master, origin/HEAD) Update readme.md
fcc065c Deleted unused jar file
d12a838 Added readme
5106558 Added gitignore
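
You can try the whole remote workflow on one machine. In this sketch a bare repository in a temporary directory stands in for the server, and two clones named alice and bob (hypothetical teammates) exchange changes through it. A real project would clone from a URL on GitLab or a similar host instead.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the remote server in this demo.
git init -q --bare remote.git

# Alice clones the (still empty) remote, commits, and pushes.
git clone -q remote.git alice 2>/dev/null
cd "$work/alice"
git config user.name "Alice"               # placeholder identity
git config user.email "alice@example.com"
echo "# Project" > readme.md
git add readme.md
git commit -q -m "Added readme"
git push -q -u origin HEAD                 # first push also links the branch to the remote

# Bob clones after Alice's first push, so he starts with her commit.
cd "$work"
git clone -q remote.git bob

# Alice saves another change to the remote.
cd "$work/alice"
echo "More detail" >> readme.md
git commit -q -a -m "Update readme.md"
git push -q

# Bob pulls to pick up the change he does not have yet.
cd "$work/bob"
git pull -q
bob_commits=$(git rev-list --count HEAD)
bob_last_line=$(tail -n 1 readme.md)
git log --oneline
```

After the pull, Bob's history contains both of Alice's commits.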

Branching

The biggest challenge when working with multiple people on the same code is that you all may want to make changes to the code at the same time. Git is designed to simplify this process.

Git uses branches to isolate changes from one another. You can think of your source code as a tree, with one main trunk. By default, everyone in Git is working from the “trunk”, typically named master or main (you can see this in the git status output above).

https://www.nobledesktop.com/learn/git/git-branches

A branch is a fork in the tree, where we “split off” work and diverge from one of the commits (typically we split from a point where everything is working as expected). Once we have our feature implemented and tested, we can merge our changes back into the master branch.

Notice that there is nothing preventing multiple users from doing this. Because we only merge changes back into master when they’re tested, the trunk should be relatively stable code².

We have a lot of branching commands:

$ git status          # see the current branch
On branch master

$ git branch test     # create a branch named test (prints nothing on success)

$ git checkout test   # switch to it
Switched to branch 'test'

$ git checkout master # switch back to master
Switched to branch 'master'

$ git branch -d test  # delete the branch
Deleted branch test (was 09e1947).

When you branch, you inherit changes from your starting branch. Any changes that you make on that branch are isolated until you choose to merge them.
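
The isolation is easy to see for yourself. In this sketch, a file committed on a branch named feature (a hypothetical name) exists while that branch is checked out and is absent again after switching back to the trunk. The script reads the trunk's name from Git rather than assuming master or main.

```shell
set -e
repo=$(mktemp -d)                          # throwaway location for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)     # "master" or "main", depending on configuration

# Commit a new file on a separate branch.
git checkout -q -b feature
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "Start feature"
if [ -f feature.txt ]; then on_feature=present; else on_feature=absent; fi

# Back on the trunk, the file is gone from the working directory.
git checkout -q "$trunk"
if [ -f feature.txt ]; then on_trunk=present; else on_trunk=absent; fi

echo "on feature: $on_feature, on $trunk: $on_trunk"
```

Nothing is lost when you switch away: the file is stored in the repository and reappears when you check out feature again.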

A typical workflow for adding a feature would be:

  1. Create a feature branch for that feature.
  2. Make changes on your branch only. Test everything.
  3. Code review it with the team.
  4. Switch back to master and git merge from your feature branch to the master branch. If there are no conflicts with other changes on the master branch, your changes will be automatically merged by Git. If your changes conflict (e.g. multiple people changed the same file and are trying to merge all of their changes), then Git may ask you to manually merge them.
$ git checkout -b test # create the branch and switch to it
Switched to a new branch 'test'

$ vim file1.md # make some changes
$ git add file1.md
$ git commit -m "Committing changes to file1.md"

$ git checkout master # switch to master
$ git merge test # merge changes from test
Updating 09e1947..ebb5838
Fast-forward
 file1.md                   | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+), 18 deletions(-)

$ git branch -d test # remove branch (optional)
Deleted branch test (was ebb5838).

Merging Code

The example above is a trivial case, demonstrating a merge that happens very soon after the branch was created. However, it’s more likely that branches will be created and worked on for a long period of time before you merge back to main.

When you merge, Git examines your copy of each file, and attempts to apply any other changes that may have been committed to main since you created the branch. (If there are multiple people working on the project, it’s not unusual for multiple changes to be made to the same file.) In many cases, as long as there are no conflicts, Git will merge the changes together. However, if Git is unable to do so (e.g. you and a colleague both changed the same file and your changes overlap), then you will be prompted to manually merge the changes together.

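When this happens, Git writes both versions of the conflicting section into the source file, delimited by conflict markers (lines starting with <<<<<<<, =======, and >>>>>>>). You have to manually edit the file to keep what you want, stage it with git add, and then git commit to complete the merge.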
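
Here is a minimal sketch of a conflict and its resolution. Two branches change the same line of a hypothetical greeting.txt, the merge stops, and the conflict is resolved by rewriting the file, staging it, and committing. The branch name feature and the "Demo User" identity are placeholders.

```shell
set -e
repo=$(mktemp -d)                          # throwaway location for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "Hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
trunk=$(git symbolic-ref --short HEAD)     # "master" or "main", depending on configuration

# Change the line on a feature branch...
git checkout -q -b feature
echo "Hello from the feature branch" > greeting.txt
git commit -q -a -m "Change greeting on feature"

# ...and change the same line differently on the trunk.
git checkout -q "$trunk"
echo "Hello from the trunk" > greeting.txt
git commit -q -a -m "Change greeting on trunk"

# Both branches edited the same line, so the merge stops with a conflict.
if git merge feature >/dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
fi
cat greeting.txt                           # both versions, separated by conflict markers
markers=$(grep -c '^<<<<<<<' greeting.txt)

# Resolve: write the intended content, then stage and commit to finish the merge.
echo "Hello from both branches" > greeting.txt
git add greeting.txt
git commit -q -m "Merge feature, resolving greeting conflict"
second_parent=$(git rev-parse --short HEAD^2)   # a merge commit has two parents
feature_tip=$(git rev-parse --short feature)
```

In practice you would open the file in an editor and choose between (or combine) the two versions instead of overwriting it with echo.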

Pull Requests (PRs)

One way to avoid merge issues is to review changes before they are merged into main (this also lets you review the code, manually run tests, etc.). The standard mechanism for this is a Pull Request (PR). A PR is simply a request to another developer (possibly the person responsible for maintaining the main branch) to git pull your feature branch and review it before merging.

We will not require PRs in this course, but you might find them useful within your team.

GitLab also calls these Merge Requests.
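
Before anyone can review your branch, it has to exist on the remote. The sketch below publishes a feature branch so that a request could be opened against it. It uses a bare repository in a temporary directory as a stand-in for the GitLab project, and the branch name feature-login is hypothetical; the request itself is then created in the hosting service's web interface.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git              # stand-in for the hosted project
git clone -q remote.git local 2>/dev/null
cd "$work/local"
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "base" > file1.md
git add file1.md
git commit -q -m "Initial commit"
git push -q -u origin HEAD                 # publish the trunk

# Do the work on a feature branch and publish that branch too.
git checkout -q -b feature-login
echo "login form" >> file1.md
git commit -q -a -m "Add login form"
git push -q -u origin feature-login        # the branch is now visible to reviewers

# Confirm that the remote has the branch.
remote_branch=$(git ls-remote --heads origin feature-login | grep -c 'refs/heads/feature-login$')
```

The -u option links the local branch to its remote counterpart, so later git push and git pull commands on that branch need no extra arguments.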

Creating a Merge Request in GitLab

Best Practices

These are suggestions for working with Git effectively.

  • Work iteratively. Learn to solve a problem in small steps: define the interface, write tests against that interface, and get the smallest functionality tested and working.
  • Commit often! Once you have something working (even partly working), commit it! This gives you the freedom to experiment and always revert back to a known-good version.
  • Branch as needed. Think of a branch as an efficient way to go down an alternate path with your code. Need to make a major change and not sure how it will work out? Branch and work on it without impacting your main branch.
  • Store your projects in private, online repositories. Keep them private so that you don’t share them unless it’s appropriate. Being online provides a remote backup and makes it easy to add someone to your project later.
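
As an illustration of why frequent commits give you freedom to experiment, this sketch commits a working file, breaks it, and then discards the uncommitted edit to get back to the committed version. The file name notes.txt and its contents are made up for the example.

```shell
set -e
repo=$(mktemp -d)                          # throwaway location for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "working version" > notes.txt
git add notes.txt
git commit -q -m "Save a known-good version"

# An experiment that did not work out.
echo "broken experiment" > notes.txt
git diff --stat                            # shows that notes.txt differs from the last commit

# Throw away the uncommitted edit and restore the committed content.
git checkout -- notes.txt
restored=$(cat notes.txt)
echo "$restored"
```

Note that this permanently discards the uncommitted edit, so use it only for changes you are sure you do not want.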

https://xkcd.com/1597


  1. Versioning is useful for more than just source code. These course notes, for instance, are in a git repo, along with source code, images and other assets.

  2. Unless your changes conflict, but that’s why we do integration testing!