BAA1028 - Workflow & Data Management
Here is my own workflow with Git, GitHub, R, and RStudio!
A command-line interface (CLI) processes commands to a computer program in the form of lines of text. The program which handles the interface is called a command-line interpreter or command-line processor.
Operating systems implement a command-line interface in a shell for interactive access to operating system functions or services. Such access was primarily provided to users by computer terminals starting in the mid-1960s, and continued to be used throughout the 1970s and 1980s on VAX/VMS, Unix systems and personal computer systems including DOS, CP/M and Apple DOS.
In computing, a shell is a computer program which exposes an operating system’s services to a human user or other program. In general, operating system shells use either a command-line interface (CLI) or graphical user interface (GUI), depending on a computer’s role and particular operation. It is named a shell because it is the outermost layer around the operating system.
Vocabulary:
My CLI of choice is Bash
A new terminal application for Windows 10
Fast and efficient way to interact with your computer
Important part of your automation toolbox to create a reproducible data analysis pipeline
Accessing a remote server almost always requires some sort of command line skills
Description | Win | Linux, macOS (Bash) |
---|---|---|
Copy files, folders | copy | cp (cp -r for folders) |
Move files, folders | move | mv |
List folder content | dir | ls |
Create new folder | mkdir | mkdir |
Change current folder | cd | cd |
Show current path | echo %cd% | pwd |
Locate a program | where | which |
Danger zone: no undo! | | |
Delete file(s) | del | rm |
Delete folder(s) | rmdir | rm -r |
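A minimal Bash sketch of the right-hand column, run inside a throwaway folder created with `mktemp -d` so your real files are never touched. The file and folder names (`notes.txt`, `archive`, `backup`) are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo="$(mktemp -d)"      # throwaway working folder
cd "$demo"               # change current folder
pwd                      # show current path

mkdir archive            # create new folder
echo "hello" > notes.txt # create a small file to play with
cp notes.txt archive/    # copy a file into the folder
cp -r archive backup     # copying a folder needs -r (recursive)
mv notes.txt archive/notes-moved.txt  # move (and rename) a file
ls -la archive           # list folder content, including hidden files

# Locate the program behind a command ("which" may be missing on minimal systems)
which ls 2>/dev/null || command -v ls

# Danger zone: these cannot be undone
rm archive/notes.txt     # delete a file
rm -r backup             # delete a folder and everything in it
```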
Almost all of these commands accept several options and arguments, e.g., ls -la
Some commands also have subcommands, e.g., git:
git init
git add
git commit
git push
Each subcommand has its own set of options and arguments.
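A small sketch of options, arguments, and subcommands side by side, run in a throwaway folder with hypothetical file names:

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
touch .hidden visible.txt

ls                   # no option: hidden files are not shown
ls -l -a             # two options: long format (-l) and all files (-a)
ls -la               # the same two options combined into one
ls -la visible.txt   # options followed by an argument (the file to list)

git init -q          # subcommand "init" with its own option -q (quiet)
git status --short   # subcommand "status" with its own option --short
```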
Git is a version control system that works like a time machine. Its checkpoints are called commits, and every commit is uniquely identified.
It also has a multiverse-like feature called branching, which lets us create an alternate version of our code. A branch is a copy of the project; it is very useful when working in groups, because it lets us work on the code with or without changing the original version. We can also combine the changes made on different branches, which is called merging.
Every Git project has a hidden .git folder at its root, which does all the Git magic behind the scenes.
To check whether Git is already installed, run the following in the Terminal / Command Prompt:
Check the version of your Git if you have one
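A sketch of this check in Bash; the exact version number printed depends on your installation.

```shell
#!/usr/bin/env bash
# Prints the installed version (something like "git version 2.x.y"), or a hint if Git is missing
if command -v git >/dev/null 2>&1; then
  git --version
else
  echo "Git is not installed"
fi
```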
If Git is not installed:
If the Windows installer hangs with the progress bar at 100%:
The first step is to set up some configuration variables that help Git keep track of who contributed what.
To set them up, we run the git config command.
In your terminal, type your GitHub name and email.
Note: if done correctly, there is no output to these commands.
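The two commands typically look like the lines below, where the name and email are placeholders for your own GitHub details. So that the sketch can be tried safely, it first points `HOME` at a temporary folder (a demo-only assumption: in real use, skip those two lines so the settings land in your actual `~/.gitconfig`).

```shell
#!/usr/bin/env bash
set -euo pipefail

export HOME="$(mktemp -d)"   # demo only: keep your real ~/.gitconfig untouched
unset XDG_CONFIG_HOME        # demo only: make sure --global writes into the temporary HOME

# Replace with your own GitHub name and email (placeholders)
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
# If done correctly, neither command prints anything
```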
The --global option makes sure that every project on this local machine uses that name and email address.
To see all the global configuration, we can run the command below:
To see where the settings files are defined, we can run the command below:
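A sketch of both inspection commands, again in a temporary `HOME` (demo-only assumption) with placeholder name and email, so the output only contains the demo settings plus any system-wide ones.

```shell
#!/usr/bin/env bash
set -euo pipefail

export HOME="$(mktemp -d)"   # demo only: a fresh, empty home folder
unset XDG_CONFIG_HOME
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

git config --global --list        # every global setting, as key=value
git config --list --show-origin   # every setting, prefixed by the file it comes from
```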
Once the global variables are set up, we initialize the project. To do so, we first go to the project's directory from the command line.
Once we are in the desired directory, we initialize our project using git init (to start a new repository) or git clone (to copy an existing one).
git init creates a .git folder in the current directory (git clone creates it inside the new, cloned folder); it is hidden by default. To see it, type ls -la in the terminal.
The git init and git clone commands initialize the project for us and create the files that are necessary for keeping track of our changes. We can change into the .git directory to see what files have been created.
It will show that Git has created several files to keep track of our changes.
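A minimal sketch of initializing a repository and peeking at the hidden folder; the `project` folder is created in a temporary location and its name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

project="$(mktemp -d)/project"   # hypothetical project folder
mkdir -p "$project"
cd "$project"

git init -q   # turn the folder into a Git repository
ls            # shows nothing: .git is hidden
ls -la        # the hidden .git folder now appears
ls .git       # files Git uses to track changes (HEAD, config, objects, refs, ...)
```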
cd ~ # brings you back to the home folder of your computer
cd project # moves the terminal into the "project" folder
git clone https://github.com/username/repository.git # copies the remote repository into a new "repository" folder inside "project"
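Cloning normally uses a GitHub URL like the one above, but `git clone` also accepts a local path, so the sketch below tries the same steps offline. It assumes a small stand-in repository (`website`, hypothetical) created on the fly in a temporary folder, with a local identity set only so it can commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

base="$(mktemp -d)"

# Stand-in for the repository that would normally live on GitHub
git init -q "$base/website"
git -C "$base/website" config user.name "Demo User"
git -C "$base/website" config user.email "demo@example.com"
echo "# My website" > "$base/website/README.md"
git -C "$base/website" add README.md
git -C "$base/website" commit -q -m "Initial commit"

mkdir "$base/project"
cd "$base/project"             # move into the "project" folder
git clone -q "$base/website"   # copies the repository into project/website
ls -la website                 # README.md plus the hidden .git folder
```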
After initializing the repository, we need to add the files to the staging area with the git add command. Staging is a temporary area where we store the files we want to include in the next commit.
# Basic command for adding files
git add filename.extension # adds a single file at a time, which is good practice
# Alternative options
git add --all # stages all changes (new, modified and deleted files) in the whole repository
git add -A # short form of --all
git add . # stages all changes in the current folder and its subfolders (same as -A when run from the repository root)
This means that Git has four main states that your files can be in:
The last step is to commit the changes using the git commit command.
# basic commit command
git commit -m "Commit Message"
# commit with a message and a longer description (each -m becomes a separate paragraph)
git commit -m "Commit Message" -m "Commit Description"
A commit message should be short and precise: what changes are being made, why, and what functionality they add. Write it like the subject line of an email.
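A minimal end-to-end sketch of staging and committing in a throwaway repository. The file names, the local `user.name`/`user.email` (set only so the demo can commit), and the messages are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"           # local identity so the demo can commit
git config user.email "demo@example.com"

echo "x <- 1" > analysis.R                 # hypothetical files
echo "# Notes" > notes.md

git add analysis.R                         # stage one file
git status --short                         # "A  analysis.R" (staged), "?? notes.md" (untracked)
git add -A                                 # stage everything else
git commit -q -m "Add analysis script and notes" -m "First version of the data analysis."
git log --oneline                          # one line per commit: short hash + message
```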
git clone: creates a copy of the codebase on your local machine.
git push: pushes changes back to the remote repository.
git pull: pulls changes from the remote repository.
Once your changes have been committed, it is time to send them to GitHub:
Make sure that your Git project folder is connected to a GitHub repository:
Just type:
Then check the results on your GitHub repository.
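The commands for these two steps are usually `git remote -v` (to check the connection) and `git push`. The sketch below runs them offline, assuming a local bare repository as a stand-in for GitHub; with a real GitHub repository, `origin` would point to its `https://github.com/...` URL instead. Folder names and the demo identity are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

base="$(mktemp -d)"
git init -q --bare "$base/github-standin.git"     # plays the role of the GitHub repository

git init -q "$base/project"
cd "$base/project"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main                                # name the branch "main", as on GitHub

git remote add origin "$base/github-standin.git"  # connect the folder to the remote
git remote -v                                     # check the connection (fetch and push URLs)
git push -q -u origin main                        # send the commits; -u remembers origin/main
```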
If you want to update your project with the version hosted on GitHub:
This command pulls changes from the remote repository and merges them into your local branch.
This is useful when you are collaborating on a repository and working on different branches.
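The usual command here is `git pull`. The sketch below simulates two collaborators offline, assuming a local bare repository as the shared "GitHub" copy; the names Alice and Bob and the file `data.csv` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

base="$(mktemp -d)"
git init -q --bare "$base/remote.git"              # stand-in for the GitHub repository
git -C "$base/remote.git" symbolic-ref HEAD refs/heads/main

# First collaborator: creates the project and pushes it
git init -q "$base/alice"
cd "$base/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > data.csv
git add data.csv
git commit -q -m "Add data"
git branch -M main
git remote add origin "$base/remote.git"
git push -q -u origin main

# Second collaborator: clones, changes, pushes
git clone -q "$base/remote.git" "$base/bob"
cd "$base/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "v2" > data.csv
git commit -q -am "Update data"
git push -q

# Back to the first collaborator: bring the project up to date
cd "$base/alice"
git pull -q      # fetch Bob's commit and merge it into main
cat data.csv     # prints "v2"
```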
To verify that Git is keeping track of our work, we can use the git log command.
It shows the commits recorded in the .git folder: the commit hash, author name and email, date with timestamp, and commit message.
For example, here is the git log output for our first commit:
git log
# Below is the output
commit 1b12ec3c6bc7dfc48fa33c3ae655564a02700162 (HEAD -> main) # commit hash, which is unique for every commit
Author: Shuvo Barman <user@users.noreply.github.com> # Name of the author and email
Date: Sun Dec 17 16:32:29 2023 -0330 # detailed time of the commit
Initial commit # commit message
HEAD always points to the current branch; in this case, it's main.
We can also see this with the git status command.
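A small sketch showing where `HEAD` points and how `git status --short` reports file states, in a throwaway repository with hypothetical file names and a demo identity.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "a" > report.qmd             # hypothetical file name
git add report.qmd
git commit -q -m "Initial commit"
git branch -M main

git status                        # first line: "On branch main" (where HEAD points)
echo "b" >> report.qmd            # modify the tracked file
touch new.R                       # create an untracked file
git status --short                # " M report.qmd" (modified), "?? new.R" (untracked)
git rev-parse --abbrev-ref HEAD   # prints "main"
```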
If a file is in the Modified state, we have two options for it: stage it with the git add command, or restore its last committed version with the git restore command.
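A sketch of the second option (it requires Git 2.23 or later, where `git restore` was introduced), in a throwaway repository with a hypothetical `analysis.R`.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'mean(x)\n' > analysis.R
git add analysis.R
git commit -q -m "Add analysis"

printf 'broken edit\n' > analysis.R   # the file is now in the Modified state

# Option 1 would be: git add analysis.R   (stage the change)
# Option 2: discard the change and go back to the committed version
git restore analysis.R
cat analysis.R                        # prints "mean(x)" again
```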
Even though Git is used to keep track of files, sometimes we don't want it to track all of them.
There could be several reasons for this, such as:
We can achieve this by creating a new file called .gitignore. Inside this file, we can add file names, folder names, or patterns (e.g., *.log) that Git will ignore. Note that Git doesn't track empty folders.
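A minimal `.gitignore` sketch; the ignored names (`.Rhistory`, `data/raw/`, `*.log`) are only examples of a file name, a folder, and a pattern. `git check-ignore -v` reports which rule matched a file.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q

mkdir -p data/raw empty-folder
touch analysis.R data/raw/big.csv .Rhistory

# One entry per line: a file name, a folder, or a wildcard pattern
cat > .gitignore <<'EOF'
.Rhistory
data/raw/
*.log
EOF

touch debug.log
git status --short --untracked-files=all   # lists only .gitignore and analysis.R
git check-ignore -v debug.log              # shows which .gitignore line matches the file
```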
We can also use a global ignore file for files or patterns that we never want Git to track, in any project.
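A sketch of a global ignore file set through the `core.excludesFile` setting. The file name `~/.gitignore_global` and its contents are assumptions (any path works), and `HOME` is redirected to a temporary folder so the demo does not change your real configuration.

```shell
#!/usr/bin/env bash
set -euo pipefail

export HOME="$(mktemp -d)"   # demo only: keep your real settings untouched
unset XDG_CONFIG_HOME

# A personal ignore file that applies to every repository on this machine
printf '.DS_Store\n.Rproj.user/\n' > "$HOME/.gitignore_global"
git config --global core.excludesFile "$HOME/.gitignore_global"

# Check it in a fresh repository
cd "$(mktemp -d)"
git init -q
touch .DS_Store notes.md
git status --short           # only "?? notes.md": .DS_Store is ignored
```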
If some files were already tracked before being listed in .gitignore, clear them from Git's cache using the command below.
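The command usually used for this is `git rm -r --cached .`, which removes everything from the index (Git's cache) while leaving the files on disk; re-adding then respects `.gitignore`. A sketch in a throwaway repository, with hypothetical file names:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
touch analysis.R .Rhistory
git add -A
git commit -q -m "Oops: .Rhistory committed"   # .Rhistory is now tracked

echo ".Rhistory" > .gitignore                  # too late: tracked files are not ignored
git rm -r -q --cached .                        # clear the cache (index); files stay on disk
git add -A                                     # re-add everything, now respecting .gitignore
git commit -q -m "Stop tracking ignored files"
git ls-files                                   # only .gitignore and analysis.R
```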
Deleting a file can be done in two ways: from the command line, or manually from the IDE interface. From the command line, we use the git rm command.
If we delete a file with the git rm command, Git deletes it and automatically stages the deletion. If we delete it manually from the IDE interface, we have to stage the deletion ourselves.
After deleting a file with the git rm command, if we want to restore it, we need to execute the command below.
After executing it, we have to execute the following command as well.
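In Git 2.23 and later, the two commands are most likely `git restore --staged <file>` (unstage the deletion) followed by `git restore <file>` (bring the file back); older tutorials use `git reset HEAD <file>` and `git checkout -- <file>` instead. A sketch with a hypothetical `data.csv`:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "id,value" > data.csv
git add data.csv
git commit -q -m "Add data"

git rm -q data.csv              # deletes the file and stages the deletion
git status --short              # "D  data.csv": the deletion is already staged

git restore --staged data.csv   # 1) unstage the deletion
git restore data.csv            # 2) bring the file back in the working folder
```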
Renaming is tricky if we do it manually from the IDE interface. Git tracks two actions: first the deletion of the original file, and second the creation of a new file. If we want to restore the manually renamed file using the git restore command, Git will keep both files.
To rename a file from the command line, we use the git mv command.
To restore the renamed file to its previous name, we can use the same git mv command, just swapping the order of the file names.
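A sketch of renaming with `git mv` and then undoing it by swapping the names, in a throwaway repository (file names hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "summary(cars)" > script.R
git add script.R
git commit -q -m "Add script"

git mv script.R analysis.R   # rename, staged as a single rename
git status --short           # "R  script.R -> analysis.R"

git mv analysis.R script.R   # same command, names swapped: back to the old name
git status --short           # nothing to show: the file is as committed
```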
To see the branches, we can use the git branch command. It shows all the available branches in our project.
To create a new branch, we take a copy of a snapshot from another branch and start working from there. To do so,
The git merge command merges the changes from another branch into the current branch.
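A sketch of the whole cycle: list branches, create one, commit on it, and merge it back. The branch name `new-plot` is hypothetical, and `git switch` needs Git 2.23 or later (`git checkout` does the same job in older versions).

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main

git branch new-plot      # create a branch from the current snapshot
git switch -q new-plot   # move to it (older Git: git checkout new-plot)
echo "plot(x)" > plot.R
git add plot.R
git commit -q -m "Add plot"
git branch               # lists both branches; * marks the current one

git switch -q main       # go back to main
git merge -q new-plot    # bring the commits of new-plot into main
ls                       # README.md and plot.R
```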
When we merge a feature branch into the main branch, it's a good idea to delete the feature branch. To do that, we use the command below:
# git branch delete command syntax
git branch --delete NAME
# Alternative options
git branch -d NAME # short form of --delete; refuses if NAME has commits not merged into the current branch (or its upstream)
git branch -D NAME # forces deletion, even if NAME has unmerged commits
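A sketch contrasting `-d` and `-D` with two hypothetical branches, one with nothing new and one with a commit that never reached `main`:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch -M main

git branch merged-feature        # hypothetical branches
git branch unmerged-feature
git switch -q unmerged-feature
git commit -q --allow-empty -m "Work not merged yet"
git switch -q main

git branch -d merged-feature     # fine: nothing in it is missing from main
if ! git branch -d unmerged-feature 2>/dev/null; then
  echo "refused: unmerged-feature has commits that are not in main"
fi
git branch -D unmerged-feature   # forced: those commits are no longer on any branch
```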
This sequence of steps is also called the git flow.
When working alone:
When working with a team:
Huge thanks to the following people, who generated and shared most of the content of this lecture:
Zoë Turner: Introduction to Git and GitHub Session - Prework for a computer
Miguel Xochicale: Introduction to git and GitHub
James Emberton, Amy Pike and Marion Weinzierl: Git for beginners
Joe Wallwork and Tom Meltzer: Intermediate Git and GitHub
Thanks for your attention and don’t hesitate to ask if you have any questions!
@damien_dupre
@damien-dupre
https://damien-dupre.github.io
damien.dupre@dcu.ie