Lecture 3: Publishing with GitHub Pages

BAA1028 - Workflow & Data Management

Damien Dupré

For Today

Requirements

1. Find these slides at:

https://damien-dupre.github.io/BAA1028/lecture_3

2. Homework Exercise done


Objectives

1. Host your ePortfolio on GitHub

2. Publish your ePortfolio with GitHub Pages


Any questions?

Then, it’s time to enter …

Host your Website on GitHub

GitHub

GitHub has many different functions. For now, we will only see how to use it to publish the HTML document output from our notebook file.

What is GitHub

Primarily used to collaborate on code development, it has become multi-purpose:

  • Version Control
  • File and Code Storage
  • Collaboration Projects
  • Social Media for Developers
  • Online Publication & Website Host
  • Automatic Actions

And even more that I am not aware of!

Exercise 1: Sign In or Sign Up

  1. Go to https://github.com,
  2. Click Sign In (if you already have an account) or Sign Up,
  3. If you are creating an account, fill in all the requested information.

Note: Your username will become extremely important in your future; firstname-lastname is usually a good choice.

Welcome to GitHub

How does GitHub Work?

The core principle of GitHub is a remote workspace (your profile) that holds one folder, called a Repository, for each project you are working on or have worked on (“Repo” for short).

Exercise 2: Your First Repository

Follow the steps below to create a Repository:

  1. In the upper-right corner of any page, use the + drop-down menu, and select New repository.

Exercise 2: Your First Repository

  1. Type a short, memorable name for your repository, like hello-world.
  2. Optionally, add a description of your repository. For example, My first repository on GitHub.

Exercise 2: Your First Repository

  1. Choose a PUBLIC repository visibility. For more information, see “About repositories” in the GitHub documentation,

  2. Tick ✅ Add a README file,

  3. Click Create repository.

Congratulations! You’ve successfully created your first repository, and initialized it with a README file.

Always Commit Changes

In GitHub, a commit is a saved change to a project’s source code or other files. When you make changes to a file in a GitHub repository, you create a new version of that file.

A commit contains a snapshot of the changes you’ve made to one or more files, along with a message that describes the changes. This message should be descriptive and clear, so that other developers can understand what changes you’ve made and why.

Every time you want a change to be recorded in your repository, you need to commit it.

Exercise 3: Your First Commit

When you created your new repository, you initialized it with a README file. README files are a great place to describe your project in more detail, or add some documentation such as how to install or use your project. The contents of your README file are automatically shown on the front page of your repository.

Follow the steps below to commit a change to the README file.

  1. In your repository’s list of files, click README.md.

Exercise 3: Your First Commit

  1. In the upper right corner of the file view, click on the pen icon ✏️ to open the file editor,
  2. In the text box, type some information about the project.

Exercise 3: Your First Commit

  1. Above the new content, click Preview to review the changes you made to the file.
  2. Click Commit changes….

Exercise 3: Your First Commit

  1. In the “Commit message” field, type a short, meaningful commit message that describes the change you made to the file,
  2. Below the commit message fields, decide whether to add your commit to the current branch or to a new branch. Select “Commit directly to the main branch” for now,

  3. Click Commit changes.

⚠️ Warning: For collaborative projects, never commit directly to the main branch.

Exercise 4: Add New Files to your Repository

  1. On your Repository page in GitHub, click Add file, then Upload files,

  2. Drop or choose all the files needed for your modified template website,

  3. In the commit box, write a commit message and commit your changes.

GitHub Pages

GitHub Pages is a web hosting service offered by GitHub that allows you to host static websites directly from a GitHub repository. This means you can use GitHub to store and version control your website’s code, and then host it for free using GitHub Pages.

Your website will then be published at a URL based on your GitHub username and repository name (e.g., username.github.io/repository).

Exercise 5: Turn on GitHub Pages

Turn on GitHub Pages for your project repository:

  1. Go to Settings and find Pages on the left pane,

  2. In Branch, select main instead of None and click Save,

  3. Click on Actions and wait until the “pages build and deployment” workflow finishes,

  4. When it’s done, go to https://username.github.io/repository/nameofyourfile.html.
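
As a small illustration of the address pattern above, the sketch below assembles the URL from its three parts. The function name `pages_url` and the example values are hypothetical, and the pattern is the one given on this slide for project repositories.

```python
def pages_url(username: str, repository: str, filename: str = "") -> str:
    """Build the address of a file published from a project repository."""
    # Pattern taken from the slide: username.github.io/repository/file
    return f"https://{username}.github.io/{repository}/{filename}"


# Hypothetical username, repository, and file name
print(pages_url("firstname-lastname", "hello-world", "index.html"))
# https://firstname-lastname.github.io/hello-world/index.html
```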

Keep your Projects Tidy

Project Management

Why do we care about project management?


Portability

The ability to move the project without breaking the code or needing to adapt it

  • you will change computers
  • you will reorganise your file structure
  • you will share your code with others

Reproducibility

The ability to rerun the entire process from scratch

  • not just for reviews
  • not just for best-practice analytics
  • also for future (or even present) you
  • and for your collaborators/helpers

Project Workflows

Portability

  • All necessary files should be contained in the project and referenced relatively
  • All necessary outputs are created by code in the project and stored in the project

Reproducibility

  • All code can be run in fresh sessions and produce the same output
  • Does not force other users to alter their own work setup

Portability

In your code, do not use:

import os
os.chdir('/path/to/your/directory')

Prefer:

# pip
python -m pip install pyprojroot

# conda
conda install -c conda-forge pyprojroot

Then:

from pyprojroot.here import here

here()

Portability

What’s wrong with os.chdir('/path/to/your/directory')?

  • It will only ever work for the user creating the file

  • It is not portable

    • Moving the folder/file will break the code
    • Collaborators will need to change any os.chdir path
  • Increases likelihood that work from other processes leaks into current work

Portability

The pyprojroot library:

If all files are contained in the project folder, reference them with the here() function from the pyprojroot library:

  • it builds paths starting from the project root
  • it allows several ways to indicate the project root folder
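
To make the idea of a project root concrete, here is a minimal sketch using only the standard library. It is not how pyprojroot is implemented; the function name `find_project_root` and the choice of `.git` as the marker are assumptions made for illustration.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Walk upwards from `start` until a folder containing `marker` is found."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"No folder containing {marker!r} above {start}")


# Hypothetical usage: build the path to a data file from the project root,
# whatever the current working directory happens to be.
# root = find_project_root(Path.cwd())
# data_file = root / "data" / "mydata.csv"
```

Because every path starts from the detected root, the same script keeps working when it is run from a subfolder or after the project has been moved.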

Self-Contained Projects

Contains all necessary files for your project, ePortfolio or any repository in general:

  • data
  • results
  • documentation
  • scripts
  • images
  • designs (css/sass)
  • tabs/topics

Folder/File structure

data

  • all raw data files, organised in meaningful ways
  • never, ever write back to this folder, read only
  • if using git, never commit to history, place in .gitignore

results

  • write all analysis results here; treat them as disposable, they can be overwritten
  • may also include figures etc if wanted

docs

  • documentation
  • Quarto files

src or py

  • if you write functions that are used in several places
  • this is the standard python folder for keeping these files that might be called in python scripts

scripts/analysis

  • files with full analysis pipelines
  • might import functions from the files in src or py
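
The folder layout above could be set up in one go with a short script. This is only a sketch: the function name `create_skeleton` is hypothetical, and the folder names are one choice among the alternatives listed on this slide (src rather than py, scripts rather than analysis).

```python
from pathlib import Path

# One possible choice of folder names taken from the slide
FOLDERS = ["data", "results", "docs", "src", "scripts"]


def create_skeleton(root: Path) -> list[Path]:
    """Create the project folders under `root` and return their paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes the script safe to run more than once
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Hypothetical usage:
# create_skeleton(Path("my-project"))
```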

Folder/File structure

README.md

  • markdown file describing the project content and intent
  • maybe also explains which files to look in for what
  • ideal to have if saving the folder to GitHub

LICENCE

  • dictates how code can be reused
  • not covering that in this series, ask me at need

File Naming

Organising files in data/, results/, docs/, and scripts/ requires some thought about how to name files for:

  • easy machine reading
  • easy human reading
  • easy understanding of file content
  • choosing the correct type of file to store

If you are using the py/ folder to store python-functions, these might need somewhat different naming conventions than the other folders, as these are functions you can use across the other files.

Here, naming should be thought of in terms of content rather than structural organisation.

File Naming

An important part of project management, code automation, and data analytics in general is to have your files read by a piece of code or software.

Machines are clever, but extremely pedantic.

Be consistent, be meticulous.

Some machines are more clever than others, so name files in a way that the “dumbest” of them can deal with.

File Naming

  • Don’t use white space
    • decide on a separator and use consistently
    • recommend the dash -
  • Use lowercase letters
    • certain machines care about capitalization
  • Use numbers smartly
    • numbers are awesome to use and can help organise files meaningfully
    • but needs some thinking about before implementing

File Naming

Naming - variables and filenames should have meaningful names in snake_case format, preferring all lower case.

File Naming

Machines will first list files starting with numbers (in ascending order), then the others in alphabetical order.

1_file.txt
2_file.txt
file_one.txt
file_three.txt
file_two.txt

But they won’t understand the difference between 1 and 10

10_file.txt
1_file.txt
2_file.txt
file_1.txt
file_10.txt
file_2.txt

‘zero-padding’ is a way of preserving file order

01_file.txt
02_file.txt
10_file.txt
file_01.txt
file_02.txt
file_10.txt
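
The ordering problem and its fix can be reproduced in a few lines of Python. This is a minimal sketch: the file names mirror the examples above, and a padding width of two digits is an assumption that suits fewer than 100 files.

```python
numbers = (1, 2, 10)

# Without padding, names are compared character by character
plain = [f"{n}_file.txt" for n in numbers]
print(sorted(plain))
# ['10_file.txt', '1_file.txt', '2_file.txt']

# With zero-padding to two digits, text order matches numeric order
padded = [f"{n:02d}_file.txt" for n in numbers]
print(sorted(padded))
# ['01_file.txt', '02_file.txt', '10_file.txt']
```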

File Naming

Using dates in file names may also ensure decent organisation, but be consistent. The YYYY-MM-DD format is recommended.

13-11-21_initial-submission-results.txt
22-01-03_revised-results.txt
2022-02-28_results.txt

vs

2021-11-13_initial-submission-results.txt
2022-02-28_results.txt
2022-03-01_revised-results.txt
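
One way to get consistent YYYY-MM-DD prefixes is to let the code generate them. The sketch below uses `date.isoformat()` from the standard library; the helper name `dated_name` is hypothetical and the labels are taken from the example above.

```python
from datetime import date


def dated_name(day: date, label: str, ext: str = "txt") -> str:
    """Prefix a file label with an ISO date (YYYY-MM-DD)."""
    return f"{day.isoformat()}_{label}.{ext}"


names = [
    dated_name(date(2022, 3, 1), "revised-results"),
    dated_name(date(2021, 11, 13), "initial-submission-results"),
    dated_name(date(2022, 2, 28), "results"),
]

# Plain alphabetical sorting now gives chronological order
for name in sorted(names):
    print(name)
```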

File Naming

Consider using different separators for different parts of the file name

This way you can use the file name itself, programmatically, if needed

2021-11-13_initial_submission_results.txt
2022-02-28_results.txt
2022-03-01_revised_results.txt
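
As a sketch of what using the file name programmatically can look like, the function below splits a name of the form shown above into its date and its description. The name `parse_name` is hypothetical, and the code assumes the date always comes first and is followed by an underscore.

```python
from datetime import date
from pathlib import Path


def parse_name(filename: str) -> tuple[date, str]:
    """Split 'YYYY-MM-DD_description.ext' into a date and a description."""
    stem = Path(filename).stem                   # drop the extension
    date_part, description = stem.split("_", 1)  # split at the first underscore only
    return date.fromisoformat(date_part), description


day, description = parse_name("2021-11-13_initial_submission_results.txt")
print(day.year, description)
# 2021 initial_submission_results
```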

File Naming

Optimising file names for computers is great, but ultimately it’s us humans who need to choose files to work with. Naming files so that the name makes the content obvious (or at least gives an idea of it) helps with such interactions.

2021-11-13_final-results.txt
2022-02-28_finalfinal-results.txt
2022-03-01_finished-results.txt

vs.

2021-11-13_first-submission-results.txt
2022-02-28_revision-round1-results.txt
2022-03-01_revision-round2-results.txt
2022-03-01_revision-round2-no-sex-results.txt

Image Types

Images from plots should use png or svg

  • .png supports transparency and has no quality loss upon re-saving

  • .svg can rescale to infinity without getting grainy

  • .jpg best for photos, quality loss on rescale, blurry edges and poor text rendering

Images can also sometimes be saved as PDF; PDF is a vector format, but older PDF versions and some export tools do not support transparency.

TIFF has fallen out of favour due to large file sizes, but is preferable to JPEG for photos.

What is a path?

A path is a string of characters used to uniquely identify a location in a directory structure. It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory.

The delimiting character is most commonly the slash (“/”), the backslash character (“\”), or colon (“:”), though some operating systems may use a different delimiter.

Resources can be represented by either absolute or relative paths

See https://en.wikipedia.org/wiki/Path_(computing)

Delimiting characters

Delimiting characters / or \ vary by operating system

  • MS Windows:
    • Use a colon : to specify the drive name (e.g., c:, d:, e:)
    • Folders and files are separated by a backslash character (\)
    • Example: J:\Work\PARI\PARI-F\data\pari-f_data_v0-1.dta
  • Linux/macOS:
    • No colon (:)
    • Use only the slash (/) character
    • Example: /E/syncwork/projects/confer/ps2021-10-ws-repro-research

Delimiting characters

  • Software with a Linux/UNIX background, such as R, LaTeX, and Python, behaves differently under MS Windows when it comes to specifying file/folder paths

  • That is, when specifying a path (e.g., in R or LaTeX) in MS Windows, these programs do not like the backslash character (\) (the backslash is used for “escaping” other characters)

Delimiting characters

Two solutions in MS Windows:

  1. Use the slash character / instead of \, e.g.:
import pandas
csv_file = pandas.read_csv('C:/Damien/myfolder/mysubfolder/mydata.csv')
  2. Escape the backslash character via \\, e.g.:
import pandas
csv_file = pandas.read_csv('C:\\Damien\\myfolder\\mysubfolder\\mydata.csv')
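
Beyond the two options above, Python also offers raw strings (prefix `r`) and the standard `pathlib` module. The sketch below only compares the resulting strings and paths, so it runs on any operating system; the path itself is the hypothetical one used on this slide.

```python
from pathlib import PureWindowsPath

escaped = 'C:\\Damien\\myfolder\\mysubfolder\\mydata.csv'
# In a raw string, backslashes are not treated as escape characters
raw = r'C:\Damien\myfolder\mysubfolder\mydata.csv'
print(raw == escaped)  # True

# pathlib treats / and \ as the same separator in Windows paths
forward = PureWindowsPath('C:/Damien/myfolder/mysubfolder/mydata.csv')
print(forward == PureWindowsPath(raw))  # True
print(forward.name)                     # mydata.csv
```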

Delimiting characters

Note: Many programming languages/statistical packages (R, Python, …) can dynamically create a full path that follows the rules of the respective operating system

import os
os.path.join("e:", "folder1", "folder2", "file")

returns: 'e:folder1\\folder2\\file' (in Windows)
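
Note that the result above has no separator after the drive letter: on Windows, a bare "e:" followed by a name denotes a path relative to the current directory on that drive. The sketch below uses `ntpath` and `posixpath`, the Windows and Linux/macOS flavours of `os.path`, so both behaviours can be checked on any machine; the folder names are the hypothetical ones from the example.

```python
import ntpath     # Windows rules, importable on every operating system
import posixpath  # Linux/macOS rules

# Drive letter without a separator: path relative to the current folder on e:
print(ntpath.join("e:", "folder1", "folder2", "file"))
# e:folder1\folder2\file

# Drive letter with a trailing separator: path from the root of e:
print(ntpath.join("e:\\", "folder1", "folder2", "file"))
# e:\folder1\folder2\file

# The same components joined with Linux/macOS rules
print(posixpath.join("folder1", "folder2", "file"))
# folder1/folder2/file
```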

A handy tool when working on Windows: Path Copy Copy, which copies file paths from Windows Explorer’s contextual menu

Mac users can right-click a file in Finder and hold the Option key to “Copy as Pathname”

Relative vs. Absolute Paths

An absolute path specifies a file or directory location from the root directory.

Examples:

  • Mac/Linux:
/Users/username/Documents/file.txt
  • Windows:
C:\Users\Username\Documents\file.txt

Relative vs. Absolute Paths

A relative path specifies a location relative to the current directory, which acts as a “fixed location” on your computer

Often, this “fixed location” is the so-called “working directory”

  • The dot . denotes the current working directory
  • The dot dot .. denotes the parent directory, i.e., it points upwards in the folder hierarchy
  • Finally, the tilde symbol ~ will bring you back to your home directory, e.g. cd ~

Examples:

subfolder/file.txt  # Inside a subfolder
./file.txt       # Current directory
../file.txt      # Parent directory

A relative path resolves to a different location depending on where the command is run.
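
To see how the same relative path leads to different files, the sketch below resolves it against two working directories. The helper name `resolve` and the directory names are hypothetical; `posixpath` is used so the output is identical on every operating system.

```python
import posixpath


def resolve(working_directory: str, relative_path: str) -> str:
    """Combine a working directory with a relative path and simplify . and .."""
    return posixpath.normpath(posixpath.join(working_directory, relative_path))


# Same relative path, two different working directories
print(resolve("/home/user/project", "../file.txt"))
# /home/user/file.txt
print(resolve("/home/user/project/analysis", "../file.txt"))
# /home/user/project/file.txt

# The single dot stays in the working directory
print(resolve("/home/user/project", "./file.txt"))
# /home/user/project/file.txt
```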

Relative vs. Absolute Paths

So, let’s assume the project “PARI-F” is located on drive J:, the full absolute path is J:\Work\PARI\PARI-F

  • The content of the project’s folder PARI-F is:
.
|-- analysis
|-- data
|-- doc
|-- pari-f.stpr
`-- report

Relative vs. Absolute Paths

All other file- or folder-related operations are defined relative to this working directory

The huge benefit: when you share your project with a colleague or move it to a new computer, you only have to define the working directory once, and everything else should work flawlessly

How to define a working directory?

import os; os.chdir("full-path-to-working-directory")

How to get information about the current working directory?

import os; os.getcwd() (cwd = current working directory); see below for an example

import os 
os.getcwd()

Checking Your Current Directory

From a terminal/command line:

  • Mac/Linux:

    pwd
  • Windows (Git Bash or WSL):

    pwd
  • Windows (Command Prompt):

    cd

Practical Examples

  • Moving into a subdirectory:

    cd Documents/Projects
  • Moving up one level:

    cd ..
  • Accessing a file in a parent directory:

    cat ../notes.txt

See here for more examples

References

Huge thanks to the following people who have generated and shared most of the content of this lecture:


Thanks for your attention and don’t hesitate to ask if you have any questions!
@damien_dupre
@damien-dupre
https://damien-dupre.github.io
damien.dupre@dcu.ie