A Version Control System (VCS) is a software system designed to track changes to source code. It is meant to provide a canonical version of your project’s code and other assets, and to ensure that only desirable (and tested) changes are pushed into production. Common version control systems include Mercurial (hg), Subversion (SVN), Perforce, and Microsoft Team Foundation Server. We’ll be using Git, a very popular VCS, in this course.
All VCSs, including Git, let you take a snapshot of your source code at any given point in time. For example, consider a project that starts with a single index.html file, adds about.html at a later time, and then finally makes some edits. The VCS tracks each of these changes, and provides functionality that we’ll discuss below.
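The snapshot idea above can be reproduced in a throwaway repository. This is a minimal sketch: the temporary directory, the file contents, the commit messages and the placeholder author identity (exported so that commits work even on a machine where git config was never run) are all made up for illustration.

```shell
#!/bin/sh
# Sketch: record three snapshots of a toy website in a throwaway repository.
set -e

# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

SITE=$(mktemp -d)              # temporary directory standing in for the project
cd "$SITE"
git init -q

# Snapshot 1: the project starts with a single index.html
echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Add index.html"

# Snapshot 2: about.html is added later
echo "<h1>About</h1>" > about.html
git add about.html
git commit -q -m "Add about.html"

# Snapshot 3: finally, an edit to an existing file
echo "<p>Welcome!</p>" >> index.html
git add index.html
git commit -q -m "Edit index.html"

# One line per snapshot, newest first
git log --oneline
```

Each line of the git log output is one snapshot you can inspect or return to later.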
Why use Version Control?
A VCS provides some major benefits:
- History: a VCS provides a long-term history of every file¹. This includes tracking when files were added or deleted, and every change that you’ve made. Changes are grouped together, so you can look at (for instance) the set of changes that introduced a feature.
- Versions: a VCS lets you version your code and compare different versions. Did you break something? You can always unwind back to the “last good” change that was saved, or even compare your current code with the previously working version to identify an issue.
- Collaboration: a VCS provides the necessary capabilities for multiple people to work on the same code simultaneously, while keeping their changes isolated. You can create branches where your changes are separate from other ongoing changes, and the VCS can help you merge changes together once they’re tested.
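The “last good” scenario from the Versions point can be sketched in a scratch repository. The file calc.txt, its contents, the commit messages and the author identity are hypothetical; HEAD~1 simply means “the commit before the current one”.

```shell
#!/bin/sh
# Sketch: compare against the last good commit and restore a file from it.
set -e

# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

REPO=$(mktemp -d)
cd "$REPO"
git init -q

echo "total = a + b" > calc.txt        # the known-good version
git add calc.txt
git commit -q -m "Working calculator"

echo "total = a - b" > calc.txt        # a later change that breaks something
git add calc.txt
git commit -q -m "Refactor calculator"

# History: every commit that touched this file
git log --oneline -- calc.txt

# Compare the previous (last good) commit with the current one
git diff HEAD~1 HEAD -- calc.txt

# Unwind: bring the file back as it was one commit ago, then record that as a new commit
git checkout HEAD~1 -- calc.txt
git commit -q -m "Restore working calculator"
```

An alternative is git revert HEAD, which records a new commit that undoes the most recent commit as a whole rather than restoring a single file.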
Git binaries can be installed from the Git home page or through a package manager (e.g. Homebrew on Mac). Although there are graphical clients that you can install, Git is primarily a command-line tool. Commands are of the form git <command> [options], for example git status.
You’ll also want to make sure that the git executable (git.exe on Windows) is in your path.
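A quick way to check this from a POSIX shell (macOS Terminal, Linux, or Git Bash on Windows) is sketched below; it relies only on the shell built-in command -v and on git --version.

```shell
#!/bin/sh
# Sketch: confirm that the git executable can be found on your PATH.
if command -v git >/dev/null 2>&1; then
    echo "git found at: $(command -v git)"
    git --version                  # prints the installed version, e.g. "git version 2.x.y"
else
    echo "git is not on your PATH; check your installation" >&2
    exit 1
fi
```

In the Windows Command Prompt, where git serves the same purpose.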
Version control is modeled around the concept of a changeset: a grouping of files that together represent a change to the system (e.g. a feature that you’ve implemented may impact multiple source files). A VCS is designed to track changes to sets of files.
Git is designed around these core concepts:
- Repository: The location of the canonical version of your source code.
- Working Directory: A copy of your repository, where you will make your changes before saving them in the repository.
- Staging Area: A logical collection of changes from the working directory that you want to collect and work on together (e.g. it might be a feature that resulted in changes to multiple files that you want to save as a single change).
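To see these three areas in action, the sketch below follows one hypothetical file, notes.txt, through them and prints git status --short after each step. The comments give the two-letter codes Git prints (first column: staging area, second column: working directory); the temporary directory and author identity are placeholders.

```shell
#!/bin/sh
# Sketch: one file moving from the working directory, to the staging area, to the repository.
set -e

# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

REPO=$(mktemp -d)
cd "$REPO"
git init -q

echo "hello" > notes.txt
git status --short        # "?? notes.txt": only in the working directory (untracked)

git add notes.txt
git status --short        # "A  notes.txt": in the staging area, waiting to be committed

git commit -q -m "Add notes"
git status --short        # no output: the working directory matches the repository

echo "more" >> notes.txt
git status --short        # " M notes.txt": modified in the working directory, not staged yet
```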
A repository can be local or remote:
- A local repository is where you might store projects that you don’t need to share with anyone else (e.g. these notes are in a local git repository on my computer).
- A remote repository is set up on a central server, where multiple users can access it (e.g. GitLab and GitHub offer free hosting for remote repositories).
Git works by operating on a set of files (aka a changeset): we git add files in the working directory to add them to the changeset, and we git commit to save the changeset to the local repository. We use git push and git pull to keep the local and remote repositories synchronized.
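The add/commit/push/pull cycle can be tried without a server. This is a minimal sketch under several assumptions: a local bare repository (git init --bare) stands in for GitLab or GitHub, two hypothetical developers, alice and bob, each have their own local repository, git remote add is used to link alice’s repository to the stand-in remote, and Git 2.28 or later is installed (for init -b main).

```shell
#!/bin/sh
# Sketch: two developers keep their local repositories in sync through a shared remote.
set -e

# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

WORK=$(mktemp -d)
# Assumption: a local bare repository plays the role of the remote server.
git init -q --bare -b main "$WORK/remote.git"

# Alice creates the project and publishes it
git init -q -b main "$WORK/alice"
cd "$WORK/alice"
echo "v1" > readme.txt
git add readme.txt                     # add the file to the changeset
git commit -q -m "Add readme"          # save the changeset to the local repository
git remote add origin "$WORK/remote.git"
git push -q -u origin main             # -u remembers origin/main for later push/pull

# Bob clones the remote, changes the file, and pushes his commit
git clone -q "$WORK/remote.git" "$WORK/bob"
cd "$WORK/bob"
echo "v2" >> readme.txt
git add readme.txt
git commit -q -m "Update readme"
git push -q

# Alice pulls Bob's commit into her local repository
cd "$WORK/alice"
git pull -q
cat readme.txt
```

After the final pull, both local repositories point at the same commit, and alice’s readme.txt contains both lines.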
To create a local repository that will not need to be shared:
- Create a repository. Create a directory, and then use the git init command to initialize it. This will create a hidden .git directory (where Git stores information about the repository).
    $ mkdir repo
    $ cd repo
    $ git init
    Initialized empty Git repository in ./repo/.git/
    $ ls -a
    .  ..  .git
- Make any changes that you want to your repository. You can add or remove files, or make changes to existing files.
    $ vim file1.txt
    $ ls -a
    .  ..  .git  file1.txt
- Stage the changes that you want to keep. Use the git add command to indicate which files or changes you wish to keep. This adds them to the “staging area”. git status will show you what changes you have pending.
    $ git add file1.txt
    $ git status
    On branch master
    No commits yet
    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
        new file:   file1.txt
- Commit your staging area. git commit stores these changes in your local repository as a single changeset and gives it a unique ID (a commit hash, rather than a sequential version number). The -m argument lets you specify a commit message. If you don’t provide one here, your editor will open so that you can type in a commit message. Commit messages are mandatory, and should describe the purpose of this change.
    $ git commit -m "Added a new file"
Remote Workflow
A remote workflow is almost the same, except that you start by making a local copy of a repository from a remote system.
- Clone a remote repository. This creates a new local repository which is a copy of a remote repository. It also establishes a link between them so that you can manually push new changes to the remote repo, or pull new changes that someone else has placed there.
    # create a copy of the CS 346 public repository
    $ git clone https://git.uwaterloo.ca/j2avery/cs346.git ./cs346
Making changes and saving/committing them is the same as the local workflow (above).
- Push to a remote repository to save any local changes to the remote system.
$ git push
- Pull from remote repository to get a copy of any changes that someone else may have saved remotely since you last checked.
$ git pull
- Status will show you the status of your repository; log will show you a history of changes.
    # status when local and remote repositories are in sync
    $ git status
    On branch master
    Your branch is up to date with 'origin/master'.
    nothing to commit, working tree clean

    # condensed history of a sample repository
    $ git log --oneline
    b750c10 (HEAD -> master, origin/master, origin/HEAD) Update readme.md
    fcc065c Deleted unused jar file
    d12a838 Added readme
    5106558 Added gitignore
The biggest challenge when working with multiple people on the same code is that you all may want to make changes to the code at the same time. Git is designed to simplify this process.
Git uses branches to isolate changes from one another. You can think of your source code as a tree, with one main trunk. By default, everyone in Git is working from the “trunk”, typically named main or master (the git status output above shows a trunk named master, Git’s traditional default; GitHub and GitLab name it main for new repositories).
A branch is a fork in the tree, where we “split off” work and diverge from one of the commits (typically a point where everything is working as expected). Once we have our feature implemented and tested, we can merge our changes back into the master branch.
Notice that there is nothing preventing multiple users from doing this. Because we only merge changes back into master when they’re tested, the trunk should be relatively stable code².
Git has a number of commands for working with branches:
    $ git status              # see the current branch
    On branch master
    $ git branch test         # create a branch named test
    $ git checkout test       # switch to it
    Switched to branch 'test'
    $ git checkout master     # switch back to master
    Switched to branch 'master'
    $ git branch -d test      # delete the branch
    Deleted branch test (was 09e1947).
When you branch, you inherit changes from your starting branch. Any changes that you make on that branch are isolated until you choose to merge them.
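Branch isolation is easy to observe in a scratch repository. In this sketch (hypothetical file and branch names, placeholder author identity, Git 2.28 or later for init -b), the trunk is named main explicitly so the script does not depend on your default branch setting; a commit made on the feature branch disappears from the working directory as soon as you switch back to main.

```shell
#!/bin/sh
# Sketch: a commit made on a branch stays on that branch until it is merged.
set -e

# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

REPO=$(mktemp -d)
cd "$REPO"
git init -q -b main
echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature        # create the branch and switch to it; it inherits app.txt
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "Start feature"
ls                                # lists app.txt and feature.txt

git checkout -q main              # back on the trunk: the feature commit is not here
ls                                # lists only app.txt
```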
A typical workflow for adding a feature would be:
- Create a feature branch for that feature.
- Make changes on your branch only. Test everything.
- Code review it with the team.
- Switch back to the master branch and git merge your feature branch into it. If there are no conflicts with other changes on the master branch, your changes will be merged automatically by Git. If your changes conflict (e.g. multiple people changed the same part of a file and are trying to merge all of their changes), then Git may ask you to merge them manually.
    $ git checkout -b test          # create a branch and switch to it
    Switched to a new branch 'test'
    $ vim file1.md                  # make some changes
    $ git add file1.md
    $ git commit -m "Committing changes to file1.md"
    $ git checkout master           # switch to master
    $ git merge test                # merge changes from test
    Updating 09e1947..ebb5838
    Fast-forward
     file1.md | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     1 file changed, 118 insertions(+), 18 deletions(-)
    $ git branch -d test            # remove the branch (optional)
    Deleted branch test (was ebb5838).
This is a trivial case, demonstrating a merge that happens very soon after the branch was created. However, it’s more likely that branches will be created and worked on for a long period of time before you merge back to main.
When you merge, Git examines your copy of each file and attempts to apply any other changes that may have been committed to main since you created the branch. (If there are multiple people working on the project, it’s not unusual for multiple changes to be made to the same file.) As long as the changes don’t conflict, Git will merge them together automatically. However, if Git is unable to do so (e.g. you and a colleague both changed the same lines of the same file), then you will be prompted to merge the changes manually.
When this happens, Git writes both versions of the conflicting lines into the file, separated by conflict markers (<<<<<<<, ======= and >>>>>>>). You have to edit the file to resolve the conflict, then git add it and git commit to complete the merge.
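A conflict is easiest to understand by causing one on purpose. In this sketch (hypothetical file config.txt and branch feature, placeholder author identity, Git 2.28 or later for init -b), both branches change the same line, so git merge stops and leaves both versions in the file between conflict markers; resolving means editing the file, staging it, and committing.

```shell
#!/bin/sh
# Sketch: produce a merge conflict on purpose, inspect the markers, and resolve it.
set -e

# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

REPO=$(mktemp -d)
cd "$REPO"
git init -q -b main
echo "title = Hello" > config.txt
git add config.txt
git commit -q -m "Add config"

# The feature branch changes the line one way...
git checkout -q -b feature
echo "title = Hello, world" > config.txt
git commit -q -am "Feature: longer title"

# ...while someone else changes the same line on main
git checkout -q main
echo "title = Hi" > config.txt
git commit -q -am "Main: shorter title"

# The merge stops because both sides edited the same line
if ! git merge feature; then
    echo "Conflict! Git left both versions in config.txt:"
    cat config.txt
fi

# Resolve: edit the file to the version you want, then stage and commit it
echo "title = Hello, world" > config.txt
git add config.txt
git commit -q -m "Merge feature, keeping the longer title"
```

If you would rather give up on a conflicted merge, git merge --abort returns the repository to its pre-merge state.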
Pull Requests (PRs)
One way to avoid merge issues is to review changes before they are merged into main (this also gives the team a chance to read the code, run tests manually, etc.). The standard mechanism for this is a Pull Request (PR). A PR is simply a request to another developer (possibly the person responsible for maintaining the main branch) to git pull your feature branch and review it before merging.
We will not force PRs in this course, but you might find them useful within your team.
GitLab also calls these Merge Requests.
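Underneath a PR, the Git operations are ordinary pushes, fetches and merges; the review page itself is provided by GitLab or GitHub, not by Git. A minimal sketch, assuming a local bare repository stands in for the hosting service, two hypothetical clones (dev and reviewer), a placeholder author identity, and Git 2.28 or later:

```shell
#!/bin/sh
# Sketch: the Git side of a pull request. A developer publishes a feature branch;
# a reviewer inspects it and merges it into main.
set -e

# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

WORK=$(mktemp -d)
# Assumption: a local bare repository plays the role of GitLab/GitHub.
git init -q --bare -b main "$WORK/remote.git"

# Developer: publish main, then do the work on a feature branch and push that branch
git init -q -b main "$WORK/dev"
cd "$WORK/dev"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git remote add origin "$WORK/remote.git"
git push -q -u origin main

git checkout -q -b feature
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q -u origin feature             # publish the branch for review

# Reviewer: get a copy and see what the branch would change
git clone -q "$WORK/remote.git" "$WORK/reviewer"
cd "$WORK/reviewer"
git log --oneline main..origin/feature     # commits on feature that main lacks
git diff --stat main...origin/feature      # files changed since the branch split off

# After review (and tests), merge into main and publish the result
git merge -q origin/feature
git push -q
```

main..origin/feature lists the commits reachable from the feature branch but not from main, while the three-dot form of git diff compares against the commit where the branch split off.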
These are suggestions for working with Git effectively.
- Work iteratively. Learn to solve a problem in small steps: define the interface, write tests against that interface, and get the smallest functionality tested and working.
- Commit often! Once you have something working (even partly working), commit it! This gives you the freedom to experiment and always revert back to a known-good version.
- Branch as needed. Think of a branch as an efficient way to go down an alternate path with your code. Need to make a major change and not sure how it will work out? Branch and work on it without impacting your main branch.
- Store your projects in private, online repositories. Keep them private so that you don’t share them unless it’s appropriate. Being online provides a remote backup and makes it easy to add someone to your project later.