Lecture 4: Git, GitHub, and Version Control

BAA1028 - Workflow & Data Management

Damien Dupré

For Today

1. Look back at Lecture 3 with new slides about Absolute vs. Relative paths.

2. Introduction to Command-Line Interface

3. Introduction to Git


Any questions?

Demonstration

Here is my own workflow with Git, GitHub, R, and RStudio!

Command-Line Interface

What is a command-line interface (CLI)?

A command-line interface (CLI) processes commands to a computer program in the form of lines of text. The program which handles the interface is called a command-line interpreter or command-line processor.

Operating systems implement a command-line interface in a shell for interactive access to operating system functions or services. Such access was primarily provided to users by computer terminals starting in the mid-1960s, and continued to be used throughout the 1970s and 1980s on VAX/VMS, Unix systems and personal computer systems including DOS, CP/M and Apple DOS.

See https://en.wikipedia.org/wiki/Command-line_interface

What is a command-line interface (CLI)?

Command-Line Interface

  • Text-based commands.
  • Efficient for automation.
  • Preferred by tech experts.

Graphical User Interface

  • Visual elements (icons, windows).
  • User-friendly.
  • Immediate visual feedback.
  • Common for everyday tasks.

Evolution of Computer Interfaces

  • 1969: UNIX introduces Command Line Interface (CLI)
  • 1981: MS-DOS popularizes Command Line Interface (CLI)
  • 1984: Apple Macintosh brings GUI to mainstream
  • 1985: Microsoft introduces Windows 1.0, a graphical user interface (GUI).
  • 1991: Linux introduces CLI to open source
  • 1995: Windows 95 merges GUI and CLI
  • 2001: Mac OS X merges Unix and GUI
  • 2004: Ubuntu simplifies Linux CLI
  • 2020s: CLI continues its indispensable role, supporting automation, scripting, and advanced tasks.

And what is a shell?

In computing, a shell is a computer program which exposes an operating system’s services to a human user or other program. In general, operating system shells use either a command-line interface (CLI) or graphical user interface (GUI), depending on a computer’s role and particular operation. It is named a shell because it is the outermost layer around the operating system.

See https://en.wikipedia.org/wiki/Shell_(computing)

Terminal vs Shell

Vocabulary:

  • Terminal: A program that runs a shell
  • Shell: A program that interprets commands
  • Bash: The most common shell on Linux, also available on macOS (where zsh has been the default since macOS Catalina)
  • PowerShell: The most common shell on Windows
  • Cmd: The legacy shell on Windows

My shell of choice is Bash

  • Linux/macOS users already have access to Bash
  • MS Windows users can use “Git Bash” after installing Git

Windows Terminal

A new terminal application for Windows 10

Okay, but why?!

  • Fast and efficient way to interact with your computer

  • Important part of your automation toolbox to create a reproducible data analysis pipeline

  • Accessing a remote server almost always requires some sort of command line skills

I also recommend “Top ten reasons to learn to use the command line: Expanding your reproducibility tools”

Important Commands

Description             Windows (cmd)   Linux, macOS (Bash)
Copy files, folders     copy            cp
Move files, folders     move            mv
List folder content     dir             ls
Create new folder       mkdir           mkdir
Change current folder   cd              cd
Show current path       echo %cd%       pwd
Locate a program        where           which
Danger zone: no undo!
Delete file(s)          del             rm
Delete folder(s)        rmdir           rm -r
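
The Bash column of this table can be tried safely in a throwaway folder. The sketch below is only an illustration: the folder and file names (reports, notes.txt, todo.txt) are made up, and command -v is used as the shell's built-in equivalent of which.

```shell
# Work in a throwaway folder so nothing important can be deleted.
sandbox="$(mktemp -d)"       # hypothetical scratch location
cd "$sandbox"
pwd                          # show current path

mkdir reports                # create new folder
echo "draft" > notes.txt     # create a small file to play with
cp notes.txt reports/        # copy the file into the folder
mv notes.txt todo.txt        # move (here: rename) the original file
ls                           # list folder content
ls reports                   # list the content of another folder

command -v ls                # locate a program (built-in equivalent of "which ls")

# Danger zone: no undo!
rm todo.txt                  # delete a file
rm -r reports                # delete a folder and everything inside it
ls                           # prints nothing: the sandbox is empty again
```

Deleted files do not go to a recycle bin, which is why the last two commands are only run inside the scratch folder.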

Important Commands

Almost all of these commands accept options and arguments, e.g., ls -la

Some programs also have subcommands, e.g., git:

  • git init
  • git add
  • git commit
  • git push

Each subcommand has its own set of options and arguments.
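
As a small illustration of how options change what a command does, the sketch below runs ls with and without options in a scratch folder. The file names are hypothetical.

```shell
demo="$(mktemp -d)"            # hypothetical scratch folder
cd "$demo"
touch visible.txt .hidden.txt  # names starting with a dot are hidden by default

ls        # no option: only visible.txt is listed
ls -a     # -a: also lists hidden entries (., .., .hidden.txt)
ls -la    # -l and -a combined: long format, including hidden entries

git --version   # "git" is the program, "--version" is an option
```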

Git

Automated Version Control (AVC)

  • A system that automatically manages changes to files, typically in the context of software development.
  • Keeps track of every modification to the code in a special kind of database.
  • If a mistake is made, developers can turn back the clock and compare earlier versions of the code to help fix the mistake while minimising disruption to all team members.

AVC: Why should I care?

Automated Version Control

  • Backup and restore: Changes are stored securely and can be restored at any point.
  • Collaboration: Multiple people can work on the same project at the same time.
  • Track changes: You can see who last modified something that might be causing a problem, who introduced an issue, when it was introduced, and more.
  • Explore alternatives: Safely experiment with new ideas in a branch, without affecting the main project.

Introduction to Git and GitHub

  • Git: A version control system that lets you manage and keep track of your source code history.
  • GitHub: A cloud-based hosting service that lets you manage Git repositories.
  • Benefits:
    • Track changes in your code across versions.
    • Collaborate with others on projects.
    • Backup your work on the cloud.

What is Git?

Git is a version control system that works like a time machine. Its checkpoints are called commits, and each commit has a unique identifier.

Git also has a “multiverse” feature called branching, which lets us create an alternate version of our code. A branch is a copy of the project, which is especially useful when working in groups: we can keep working without changing the original code. We can also bring together the changes made on different branches, which is called merging.

GitHub is nothing without Git

What is a Git repository?

  • A place where you can store your code, your files, and each file’s revision history.
  • Contains a .git folder at the root which does all the git magic behind the scenes.

Check Git

To check if Git is already installed, run the following in the Terminal / Command Prompt:

Linux/macOS

which git

If asked to install the Xcode command line tools, say yes! Right click to copy on a Terminal line

Windows

where git

Check the version of your Git if you have one

git --version

Downloading and Installing Git

If Git is not installed:

If the Windows installer hangs with the progress bar at 100%:

  • Close the installer with Task Manager
  • Press Ctrl + Alt + Delete;
  • Select Task Manager;
  • Find Git for Windows installer and close.

Setting up Git

The first step is to set some configuration variables, which help Git keep track of contributors and their contributions.

To set these, we run the git config command.

# git config command
git config --global user.name "Your Name"
git config --global user.email "Your Email"

Exercise 1: Setting up Git

In your terminal, enter your GitHub name and email:

git config --global user.name "Your Name"
git config --global user.email "Your Email"

Example:

git config --global user.name "damien-dupre"
git config --global user.email "damien.dupre@dcu.ie"

Warning: if done correctly, these commands produce no output.

Setting up Git

The global option makes sure that every project in that local machine will use that name and email address.

To see all the configuration values currently in effect, including the global ones, we can run the command below:

# displaying git global configuration command
git config --list

To see which settings file each value is defined in, we can run the command below:

# displaying setting files location command
git config --list --show-origin
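
To illustrate what the global option changes, here is a minimal sketch that sets a name and email for one repository only, leaving your global settings untouched. The scratch folder and the "Demo User" identity are made up for the example.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q

# Without --global, the values are stored in this repository's .git/config
# and take precedence over the global ones inside this repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

git config user.name              # value in effect here: Demo User
git config --local --list         # settings stored in this repository only
git config --list --show-origin   # every setting, with the file it comes from
```

This is handy when one project needs a different email address than the rest of your machine.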

Setting up Git

Once we set up the global variables, we will initialize the project. To do so, we have to go to that directory from the command line.

# changing directory command
cd folder/location

Once we are in the desired directory, we will initialize our project using git init or git clone

# Initialise a New Repo
git init
# Cloning an Existing Repo
git clone https://github.com/username/repository.git

Setting up Git

It will create a .git folder in the same directory, which is hidden by default. To see it, we need to run the ls -la command in the terminal.

git init or git clone commands initialize the project for us and make files that are necessary for keeping track of our changes. We can change the directory to see what files have been created by the git init or git clone commands.

cd .git
ls -la

This shows that Git has created several files to keep track of our changes.
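
The same inspection can be tried in a throwaway folder, so that no real project is affected. This is only a sketch: the folder is a temporary one, and the exact list of files inside .git may vary between Git versions.

```shell
project="$(mktemp -d)"   # hypothetical empty project folder
cd "$project"

git init -q              # creates the hidden .git folder
ls -la                   # .git now appears among the entries

ls .git                  # typically HEAD, config, objects, refs, ...
cat .git/HEAD            # the branch that HEAD currently refers to
```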

Exercise 2: Clone Repository

  1. Go to the GitHub repository of your website
  2. Find the git clone URL
  3. Be sure that your terminal path has moved to your “project” folder
  4. Clone this repository from your terminal

cd ~ # brings you back to your home folder
cd project # moves the terminal to the "project" folder
git clone https://github.com/username/repository.git # copies the website repository into the "project" folder

Git Add

After initializing the repository, we need to add files to the staging area with the git add command. The staging area is a temporary place to hold the changes that we want to include in the next commit.

# Basic command for adding files
git add filename.extension # adds a single file at a time, which is good practice
# Alternative options
git add --all # adds every change in the repository to the staging area
git add -A # shortcut for --all
git add . # adds every change in the current folder and below (same as -A when run from the repository root)
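
The difference between adding one file and adding everything can be observed with git status --short. The sketch below uses a scratch repository and two made-up files.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q

echo "<h1>Hello</h1>" > index.html
echo "body { margin: 0; }" > style.css

git add index.html      # stage a single file
git status --short      # index.html is staged (A), style.css is still untracked (??)

git add .               # stage everything in the current folder and below
git status --short      # both files are now staged (A)
```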

Git Commit

  • You can think of a commit as a snapshot of your work at a particular time
  • You can navigate between commits easily with git
  • This allows you to switch easily between different versions of your work
  • When you commit, rather than saving all the files in a project every time, git is efficient and only stores the files which have been changed between the previous commit and your current one
  • The commit also stores a reference to its parent commit

Git Commit

This means that Git has four main states that your files can be in:

  • Untracked: You’ve created a new file and not told git to keep track of it.
  • Modified: You’ve changed a file that git already has a record of, but have not told git to include these changes in your next commit. We say these files are in the working tree.
  • Staged: You’ve told git to include the file next time you do a commit. We say these files are in the staging area.
  • Committed: The file is saved in its present state in the most recent commit.
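
The four states above can be observed by walking one file through them. This is a minimal sketch in a scratch repository; report.txt and the demo identity are made up, and the identity is set locally so that your global settings are not touched.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "version 1" > report.txt
git status --short      # "?? report.txt"  -> untracked

git add report.txt
git status --short      # "A  report.txt"  -> staged

git commit -q -m "Add report"
git status --short      # no output        -> committed, nothing to report

echo "version 2" >> report.txt
git status --short      # " M report.txt"  -> modified, not yet staged
```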

Git Commit

The last step is to commit the changes using the git commit command.

# basic commit command
git commit -m "Commit Message"
# commit command with description along with commit Message
git commit -m "Commit Message" -m "Commit Description"

A commit message should be short and precise, like an email subject: what was changed, why, and what functionality it adds.
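
As an illustration of a short subject with a longer description, the sketch below makes one commit in a scratch repository and reads both parts back. The file, the messages, and the demo identity are invented for the example.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "<h1>Home</h1>" > index.html
git add index.html

# First -m: the subject line. Second -m: the longer description.
git commit -q -m "Add home page heading" \
              -m "The site had no visible title. This adds an h1 to index.html."

git log -1 --format=%s   # prints the subject only
git log -1 --format=%b   # prints the description only
```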

Exercise 3: First Command Line Commit

  1. Open the index.html file
  2. Modify 1 word in the text displayed in the website (title or content)
  3. Perform your first commit with the following code:
git add .
git commit -m "My first command line commit"

Working with remote repositories

  • git clone: creates a copy of the codebase on your local machine.
  • git push: pushes changes back to the remote repository.
  • git pull: pulls changes from the remote repository.

Working with remote repositories

GitHub

  • Git repository hosting service
  • Collaborate with others on codebase
  • Pull requests for code review and merging changes
  • Issue tracking and project management tools
  • GitHub Pages for hosting websites

Git and GitHub

Once your changes have been committed, it is time to send them to GitHub:

git push origin main

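This only works if your Git project folder is connected to a GitHub repository. A cloned repository already is; if yours is not (for example after git init), connect it first: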

git remote add origin https://github.com/username/repository.git
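
To practise these two commands without touching GitHub, a local "bare" repository can stand in for the remote. The sketch below is an illustration under that assumption: all paths are temporary, the demo identity is made up, and the branch name is read from Git rather than assumed to be main.

```shell
workspace="$(mktemp -d)"                      # hypothetical scratch area
git init -q --bare "$workspace/remote.git"    # local stand-in for a GitHub repository

git init -q "$workspace/project"
cd "$workspace/project"
git config user.name "Demo User"              # local identity, for this demo only
git config user.email "demo@example.com"

echo "My website" > README.md
git add README.md
git commit -q -m "Add README"

git remote add origin "$workspace/remote.git" # connect the project to the remote
git remote -v                                 # check where "origin" points

branch="$(git symbolic-ref --short HEAD)"     # main or master, depending on your setup
git push -q -u origin "$branch"               # -u remembers the remote branch for later pushes
```

After the first push with -u, a plain git push is enough.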

Exercise 4: First Push to GitHub

Just type:

git push

And observe the results on your GitHub repository

Git and GitHub

If you want to update your project with the version hosted on GitHub:

git pull origin main

This command pulls changes from the remote repository and merges them into your local branch.

This is useful when you are collaborating on a repository and working on different branches.
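
A pull can be rehearsed locally with two copies of the same project and a bare repository playing the role of GitHub. Everything in this sketch is hypothetical: the folder names desktop and laptop, the file, and the demo identity.

```shell
workspace="$(mktemp -d)"                  # hypothetical scratch area
cd "$workspace"
git init -q --bare remote.git             # local stand-in for GitHub

# First copy: create the project and push it
git init -q desktop
cd desktop
git config user.name "Demo User"          # local identity, for this demo only
git config user.email "demo@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git remote add origin "$workspace/remote.git"
branch="$(git symbolic-ref --short HEAD)" # main or master, depending on your setup
git push -q -u origin "$branch"

# Second copy: clone the project somewhere else
cd "$workspace"
git clone -q remote.git laptop

# New work is pushed from the first copy...
cd "$workspace/desktop"
echo "second line" >> notes.txt
git commit -q -am "Extend notes"
git push -q

# ...and pulled into the second copy
cd "$workspace/laptop"
git pull -q origin "$branch"
cat notes.txt                             # now contains both lines
```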

Git Information

To verify that Git is keeping track of our work, we can use the git log command.

# list the commits recorded so far
git log

It shows the commits that have been recorded in the .git folder. For each one, it gives the commit hash, the author name, the date with timestamp, and the commit message.

Git Information

The git log command shows some information about the commits that have been made. Here is the output for our first commit.

git log
# Below is the output
commit 1b12ec3c6bc7dfc48fa33c3ae655564a02700162 (HEAD -> main) # commit hash, which is unique for every commit
Author: Shuvo Barman <user@users.noreply.github.com> # Name of the author and email
Date:   Sun Dec 17 16:32:29 2023 -0330 # detailed time of the commit

    Initial commit # commit message

HEAD points to the commit currently checked out, usually through the current branch; in this case it’s main.
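
When the history grows, shorter views of the log are often easier to read. The sketch below builds a two-commit history in a scratch repository and shows a few common variations; the file, the messages, and the demo identity are made up.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Initial commit"
echo "two" >> file.txt
git commit -q -am "Add second line"

git log                           # full entries: hash, author, date, message
git log --oneline                 # one line per commit: short hash and subject
git log -1 --format='%h %an %s'   # latest commit only, with a custom format
git rev-list --count HEAD         # number of commits reachable from HEAD
```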

Git Environments

We can see which state each file is in with the git status command.

# Viewing status command
git status

If a file is in the Modified state, we have two options: stage it with the git add command, or discard the changes with the git restore command.

# restoring file command
git restore FILENAME
# Alternative options
git restore . # discards all unstaged changes in the current folder
git checkout . # older equivalent of git restore .
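
Here is a minimal sketch of discarding an unwanted edit with git restore, in a scratch repository. The file name, its content, and the demo identity are hypothetical. Note that the discarded edit cannot be recovered afterwards.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "original text" > page.txt
git add page.txt
git commit -q -m "Add page"

echo "accidental edit" > page.txt   # overwrite the file by mistake
git status --short                  # " M page.txt": modified, not staged

git restore page.txt                # throw the edit away
cat page.txt                        # back to: original text
```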

Ignoring Files

Git is used to keep track of files, but sometimes we don’t want it to track all of them.

There could be several reasons behind it. Such as:

  • Sensitive information (e.g. passwords, authentication tokens, API keys)
  • Personal notes (e.g. a to-do list for the project)
  • System files
  • Large files

We can achieve this by creating a new file called .gitignore. Inside this file, we can list file names, folder names, and patterns that Git will ignore. Note that Git doesn’t track empty folders.
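
A small .gitignore might look like the one created in this sketch. The ignored names (.env, data/) and the other files are made up for illustration, and the key shown is not a real one.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q

echo "API_KEY=not-a-real-key" > .env     # sensitive information
mkdir data
echo "1,2,3" > data/big.csv              # stands in for a large data file
echo "print('hi')" > analysis.R          # a file we do want to track

# One pattern per line: a file name and a folder name
printf '%s\n' ".env" "data/" > .gitignore

git status --short                       # lists .gitignore and analysis.R, but not .env or data/
git check-ignore -v .env data/big.csv    # shows which pattern ignores each file
```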

Ignoring Files

We can also use a global ignore file for files or patterns that we never want Git to track, in any project.

# global git ignore command
git config --global core.excludesfile [file]

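If a file that should be ignored is already tracked, remove everything from Git’s index (its cache) with the command below, then add and commit again. The files stay on disk.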

# cache clearing command
git rm -r --cached .
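
The sketch below shows one situation where this is needed: a file was committed before being listed in .gitignore. The names and the demo identity are hypothetical. Keep in mind that the file remains in the earlier commits, so a real password committed this way should still be changed.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "API_KEY=not-a-real-key" > .env
echo "hello" > README.md
git add .
git commit -q -m "Initial commit (tracks .env by mistake)"

echo ".env" > .gitignore     # too late on its own: .env is already tracked

git rm -r -q --cached .      # empty the index; files stay on disk
git add .                    # re-add everything, this time honouring .gitignore
git commit -q -m "Stop tracking ignored files"

git ls-files                 # tracked files: .gitignore and README.md
ls -a                        # .env is still present on disk
```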

Deleting

A file can be deleted in two ways: from the command line, or manually from the IDE interface. On the command line, we use the git rm command.

# delete file command
git rm FILENAME
# force the deletion, even if the file has uncommitted changes
git rm -f FILENAME

If we delete the file with the git rm command, the file is deleted and the deletion is automatically staged. If we delete it manually from the IDE interface, we have to stage the deletion ourselves.

Deleting

After deleting a file with the git rm command, we can restore it by first unstaging the deletion:

# unstage the deletion
git restore --staged FILENAME
# git restore -S FILENAME (short form)

After executing the above command, we have to execute the command below as well to bring the file back into the working folder.

git restore FILENAME
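
The whole delete-and-restore sequence can be followed with git status --short at each step. This is a sketch in a scratch repository, with a made-up file and demo identity.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "keep me" > draft.txt
git add draft.txt
git commit -q -m "Add draft"

git rm -q draft.txt              # delete the file and stage the deletion
git status --short               # "D  draft.txt": deletion is staged

git restore --staged draft.txt   # step 1: unstage the deletion
git status --short               # " D draft.txt": file still missing on disk

git restore draft.txt            # step 2: bring the file back
cat draft.txt                    # keep me
```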

Renaming

Renaming is tricky if we do it manually from the IDE interface. Git records two actions: the deletion of the old file and the creation of a new one. If we then try to undo the manual rename with the git restore command, Git brings back the old file and keeps the new one, so we end up with both files.

To do it from the command line we have to use git mv command

# file renaming command
git mv OLD_FILE_NAME NEW_FILE_NAME

To restore the renamed file to its previous name, we can use the same git mv command with the file names in the opposite order.
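
The difference between a manual rename and git mv shows up in git status --short. In this sketch, the plain mv command plays the role of a rename done by hand; the file names and demo identity are hypothetical.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "quarterly numbers" > report.txt
git add report.txt
git commit -q -m "Add report"

# Manual rename: Git sees a deleted file and a new untracked file
mv report.txt summary.txt
git status --short              # " D report.txt" and "?? summary.txt"
mv summary.txt report.txt       # undo the manual rename

# Rename with Git: recorded and staged as a single rename
git mv report.txt summary.txt
git status --short              # "R  report.txt -> summary.txt"
```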

Branches

To see the branches, we can use the git branch command. It will show all the available branches in our project.

# checking git branch command
git branch

To create a new branch, we take a copy of a snapshot from another branch and start working from there. To do so:

# create a new branch and switch to it
git switch -c NAME
# Alternative option
git checkout -b NAME # older equivalent of git switch -c

Branches

git merge command will merge the changes from one branch into the current branch.

# merge command syntax
git merge <branch>

Once we have merged a feature branch into the main branch, it’s a good idea to delete the feature branch. To do that, we use the command below.

# git branch delete command syntax
git branch --delete NAME
# Alternative options
git branch -d NAME # short form of --delete; refuses if the branch has unmerged changes
git branch -D NAME # deletes the branch even if it has unmerged changes

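This sequence of steps (branch, commit, merge, delete) is often called a feature-branch workflow; Git flow is a well-known, more elaborate model built on the same idea.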
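
Put together, the branch, commit, merge, and delete steps might look like the sketch below. It runs in a scratch repository; the branch name feature, the files, and the demo identity are made up, and the name of the main branch is read from Git rather than assumed.

```shell
repo="$(mktemp -d)"     # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"            # local identity, for this demo only
git config user.email "demo@example.com"

echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"
main_branch="$(git symbolic-ref --short HEAD)"   # main or master, depending on your setup

git switch -q -c feature        # create a branch and switch to it
echo "new idea" > idea.txt
git add idea.txt
git commit -q -m "Add idea"

git switch -q "$main_branch"    # back to the main branch
ls                              # idea.txt is not here yet

git merge -q feature            # bring the feature work into the main branch
ls                              # idea.txt is now present

git branch -d feature           # safe to delete: the branch is fully merged
git branch                      # only the main branch remains
```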

Best practices

When working alone:

  • Commit often
  • Use descriptive commit messages
  • Review code regularly
  • Use .gitignore to exclude files
  • Don’t commit data (only very small test data)
  • Don’t commit passwords

When working with a team:

  • Keep pull requests small and focused
  • Use “issues” to track work

References

Huge thanks to the following people who have generated and shared most of the content of this lecture:


Thanks for your attention and don’t hesitate to ask if you have any questions!
@damien_dupre
@damien-dupre
https://damien-dupre.github.io
damien.dupre@dcu.ie