Lecture 1: Introduction to ePortfolios

BAA1028 - Workflow & Data Management

Damien Dupré

Module Contact Details

Damien Dupré, PhD

  • email: damien.dupre@dcu.ie
  • phone: 00353 (0)1 700 6360
  • office: Q233 DCU Business School

Module Content

Knowledge

  • ePortfolio
  • Project Organisation

Skills

  • bash/zsh
  • Git & GitHub
  • Quarto
  • Markdown & HTML
  • Jupyter

Slides

Available on the module’s Loop page and, more importantly, online at:

https://damien-dupre.github.io -> courses -> BAA1028

What About You?

  • Who is using Windows, macOS, or Linux?
  • Who already has an account on GitHub, GitLab, or equivalent?
  • Who has a personal website on WordPress, Google Sites, or equivalent?
  • Who knows how to read and write .html files?
  • Which coding platform are you currently using to code: Spyder, JupyterLab, Jupyter Notebook, PyCharm, VS Code, Neovim, RStudio, Atom, Positron, other?
  • Who knows what Quarto is?
  • Who knows what bash/zsh is?
  • Do you know where your Notepad (Windows) or TextEdit (macOS) is?

Module Assessment

90% ePortfolio

A website that showcases your coding skills, achievements, experiences, and personal reflections. It should include various media types (documents, images, videos, audio clips, presentations, and web links) that can be tailored for different audiences. Deadline: April 13, 2025

10% Kubicle Courses

Creating an ePortfolio

ePortfolio Examples 1 to 12

What is an ePortfolio?

Definition

  • An ePortfolio is a digital collection of materials that showcases an individual’s achievements, skills, experiences, and learning.
  • It serves as both a personal and professional development tool.
  • Combines multimedia elements (text, images, videos, etc.) to enhance presentation.

ePortfolios are also about what you don’t include

  • Quality over quantity! Don’t throw in every student project
  • Avoid sharing irrelevant information such as data cleaning steps (just link to the file with the code) or personal information such as your phone number
  • Check for grammar errors and clarity

ePortfolio Platforms

  • Google Sites: Free and user-friendly while having limited design options.
  • Canva: Visually appealing templates.
  • WordPress: Customisable and scalable website for diverse purposes.
  • Loop Reflect/Mahara: Designed for education and collaboration.
  • LinkedIn: Professional networking with portfolio integration.

…Just to name a few.

These services offer a platform to design an ePortfolio/website and host it on their own servers.

Instead, we are going to create our ePortfolio/website manually and host it on a specific server.

ePortfolio Requirements

Technical Features

  • Published with GitHub Pages
  • Use of Git to upload files rather than manual upload
  • Use of Quarto rather than manually written HTML pages
  • Active code rather than inactive code (e.g., screenshots)
  • Tidy project folder organisation
  • Careful use of data, both databases and personal data

Components

  • Introduction: About me/profile section.
  • Evidence of Work: Projects, assignments, and achievements.
  • Reflection: Insights on learning experiences.
  • References/Contact: Networks and accessibility details.
  • CV: Link to an accessible PDF document.

Additional Criteria

Overall design, attention to details, and innovation.

Plagiarism


Do NOT copy other students’ work

Do NOT copy code found online without referencing it


Do NOT use full code templates

Exercise 1: Type of Files in an ePortfolio

  1. Go to the GitHub repository of one of the portfolios presented earlier,
  2. Download and unzip this repository on your own computer,
  3. List all the file extensions of the documents included in this folder and its subfolders.
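
Step 3 can also be checked with a few lines of Python instead of by hand. The sketch below is only one possible approach: the function name list_extensions and the folder name in the usage note are illustrative and not part of the exercise.

```python
from collections import Counter
from pathlib import Path


def list_extensions(folder):
    """Count the file extensions found in `folder` and all its subfolders."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Path.suffix keeps the last extension only (".gz" for "x.tar.gz");
            # files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

For example, list_extensions("my-downloaded-repo") would return a counter of extensions for a hypothetical unzipped repository folder of that name.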

Exercise 1: Type of Files in an ePortfolio

Which file extensions have you found?

Web and Interactive Content

•   .html / .htm: HyperText Markup Language files for web content.
•   .css: Cascading Style Sheets for web design.
•   .js: JavaScript files for interactive elements.

Text and Document Formats

•   .doc / .docx: Microsoft Word documents.
•   .pdf: Portable Document Format, widely used for static, professional documents.
•   .txt: Plain text files.
•   .rtf: Rich Text Format, compatible with various word processors.
•   .odt: OpenDocument Text, used by open-source tools like LibreOffice.

Presentation Formats

•   .ppt / .pptx: Microsoft PowerPoint presentations.
•   .odp: OpenDocument Presentation format.
•   .key: Apple Keynote presentations.

Spreadsheet Formats

•   .xls / .xlsx: Microsoft Excel spreadsheets.
•   .csv: Comma-separated values, often used for data sharing.
•   .ods: OpenDocument Spreadsheet format.

Image Formats

•   .jpg / .jpeg: Compressed image files.
•   .png: High-quality image files supporting transparency.
•   .gif: Animated or static image files.
•   .bmp: Bitmap image files.
•   .tiff / .tif: High-resolution image files.
•   .webp: Compressed image format for the web.
•   .ico: Icon image files, often used for branding or UI mockups.
•   .heic: High Efficiency Image Format, used in modern Apple devices.
•   .svgz: Compressed Scalable Vector Graphics.
•   .raw: Camera raw image files from DSLRs.

Audio Formats

•   .mp3: Compressed audio files.
•   .wav: High-quality audio files.
•   .ogg: Open-source audio file format.

Video Formats

•   .mp4: A widely-used video format compatible with most devices.
•   .avi: Video format with higher quality, but larger file size.
•   .mov: Apple QuickTime video format.
•   .wmv: Windows Media Video format.

Other Formats

•   .svg: Scalable Vector Graphics, ideal for logos and illustrations.
•   .md: Markdown files, often used in coding or minimalist documentation.
•   .log: Log files for tracking progress or issues.
•   .yml / .yaml: Data serialisation formats, often for configuration files.
•   .zip / .tar.gz / .7z: Compressed archive formats for distributing multiple files.
•   .dat: Generic data files.

Keep your Projects Tidy

Project Organisation

Keep your projects tidy: one folder per project, and all the project folders in one “Project” folder.

Exercise 2: Create your “project” folder

  1. Go to the root/home of your computer using your file explorer
  2. Create a “Project” folder
  3. In this “Project” folder, create a folder called “my_first_project”
  4. Print the path of “my_first_project” folder
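
The exercise is meant to be done in the file explorer, but the same result can be sketched in Python. This is a minimal illustration, assuming the home directory is the chosen location; the function name create_project is hypothetical.

```python
from pathlib import Path


def create_project(home, name="my_first_project"):
    """Create <home>/Project/<name> and return its absolute path."""
    project = Path(home) / "Project" / name
    # parents=True also creates "Project"; exist_ok=True makes reruns harmless.
    project.mkdir(parents=True, exist_ok=True)
    return project.resolve()
```

Calling print(create_project(Path.home())) creates the two folders under your home directory and prints the path asked for in step 4.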

Project Management

Why do we care about project management?


Portability

The ability to move the project without breaking code or needing adaptation

  • you will change computers
  • you will reorganise your file structure
  • you will share your code with others

Reproducibility

The ability to rerun the entire process from scratch

  • not just for reviews
  • not just for best-practice analytics
  • also for future (or even present) you
  • and for your collaborators/helpers

Project Workflows

Portability

  • All necessary files should be contained in the project and referenced relatively
  • All necessary outputs are created by code in the project and stored in the project

Reproducibility

  • All code can be run in fresh sessions and produce the same output
  • Does not force other users to alter their own work setup

What is a path?

A path is a string of characters used to uniquely identify a location in a directory structure. It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory.

The delimiting character is most commonly the slash (“/”), the backslash character (“\”), or colon (“:”), though some operating systems may use a different delimiter.

Resources can be represented by either absolute or relative paths

See https://en.wikipedia.org/wiki/Path_(computing)

Delimiting characters

Delimiting characters / or \ vary by operating system

  • MS Windows:
    • Use a colon : to specify the drive name (e.g., c:, d:, e:)
    • Folders and files are separated by a backslash character (\)
    • Example: J:\Work\PARI\PARI-F\data\pari-f_data_v0-1.dta
  • Linux/macOS:
    • No colon (:)
    • Use only the slash (/) character
    • Example: /E/syncwork/projects/confer/ps2021-10-ws-repro-research

Delimiting characters

  • Software with a Linux/UNIX background, such as R, LaTeX, and Python, behaves differently under MS Windows when it comes to specifying file/folder paths

  • That is, when specifying a path (e.g., in R or LaTeX) in MS Windows, these programs do not like the backslash character (\) (the backslash is used for “escaping” other characters)

Delimiting characters

Two solutions in MS Windows:

  1. Use the slash character / instead of \, e.g.:
import pandas
csv_file = pandas.read_csv('C:/Damien/myfolder/mysubfolder/mydata.csv')
  2. Escape the backslash character via \\, e.g.:
import pandas
csv_file = pandas.read_csv('C:\\Damien\\myfolder\\mysubfolder\\mydata.csv')
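
As a quick check that both spellings denote the same location, the sketch below compares them with pathlib, which applies Windows rules through PureWindowsPath on any operating system. The raw-string spelling (the r prefix) is a third Python option not mentioned above and is added here only for comparison.

```python
from pathlib import PureWindowsPath

forward = "C:/Damien/myfolder/mysubfolder/mydata.csv"
escaped = "C:\\Damien\\myfolder\\mysubfolder\\mydata.csv"
raw = r"C:\Damien\myfolder\mysubfolder\mydata.csv"  # raw string: no escaping

# The escaped and raw spellings are the very same Python string.
same_string = escaped == raw
# PureWindowsPath treats / and \ as equivalent separators.
same_path = PureWindowsPath(forward) == PureWindowsPath(escaped)
```

Both same_string and same_path evaluate to True.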

Delimiting characters

Note: Many programming languages/statistical packages (R, Python, …) can dynamically create a full path that follows the rules of the respective operating system

import os
os.path.join("e:", "folder1", "folder2", "file")

returns: 'e:folder1\\folder2\\file' (in Windows)
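
Note that the result above has no backslash after the drive letter: on Windows, a bare drive such as "e:" produces a path relative to the current folder on that drive. The sketch below illustrates this using the standard-library modules ntpath and posixpath, which apply Windows and Linux/macOS rules respectively on any machine; the variable names are illustrative.

```python
import ntpath      # Windows path rules, importable on any operating system
import posixpath   # Linux/macOS path rules

# A bare drive letter gives a path relative to the current folder on that drive.
drive_relative = ntpath.join("e:", "folder1", "folder2", "file")
# Adding the separator after the drive gives an absolute path.
absolute = ntpath.join("e:\\", "folder1", "folder2", "file")
# The same kind of call under Linux/macOS rules uses forward slashes.
posix_style = posixpath.join("folder1", "folder2", "file")
```

Here drive_relative is 'e:folder1\\folder2\\file', absolute is 'e:\\folder1\\folder2\\file', and posix_style is 'folder1/folder2/file'.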

A handy tool on Windows: Path Copy Copy, which copies file paths from Windows Explorer’s context menu

Mac users can right-click a file in Finder and hold the Option key to reveal “Copy as Pathname”

Relative vs. Absolute Paths

An absolute path specifies a file or directory location from the root directory.

Examples:

  • Mac/Linux:
/Users/username/Documents/file.txt
  • Windows:
C:\Users\Username\Documents\file.txt

Relative vs. Absolute Paths

A relative path specifies a location relative to a “fixed location” on your computer

Often, this “fixed location” is the so-called “working directory”

  • The dot . denotes the current working directory
  • The dot dot .. denotes the parent directory, i.e., it points upwards in the folder hierarchy
  • Finally, the tilde symbol ~ is shorthand for your home directory, e.g. cd ~ brings you back there

Examples:

subfolder/file.txt  # Inside a subfolder
./file.txt       # Current directory
../file.txt      # Parent directory

A relative path resolves to a different location depending on the directory the command is run from.
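
To make this concrete, the sketch below resolves the three examples against a working directory using Linux/macOS rules. The directories are hypothetical and the helper name resolve is illustrative; nothing is read from the file system.

```python
import posixpath


def resolve(working_directory, relative_path):
    """Return the absolute path a relative path points to (Linux/macOS rules)."""
    joined = posixpath.join(working_directory, relative_path)
    # normpath collapses "." and ".." without touching the file system.
    return posixpath.normpath(joined)


cwd = "/Users/username/Documents"
inside_subfolder = resolve(cwd, "subfolder/file.txt")
current_directory = resolve(cwd, "./file.txt")
parent_directory = resolve(cwd, "../file.txt")
# The same relative path from another working directory points elsewhere.
other_start = resolve("/Users/username/Desktop", "./file.txt")
```

With these inputs, parent_directory is '/Users/username/file.txt', while current_directory and other_start differ even though both were written as ./file.txt.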

Relative vs. Absolute Paths

So, let’s assume the project “PARI-F” is located on drive J: with the full absolute path J:\Work\PARI\PARI-F, and that this folder is set as the working directory

  • The content of the project’s folder PARI-F is:
.
|-- analysis
|-- data
|-- doc
|-- pari-f.stpr
`-- report

Relative vs. Absolute Paths

All other file- or folder-related operations are defined relative to this working directory

The huge benefit: when you share your project with a colleague or move it to a new computer, you only have to define the working directory once, and everything else should work flawlessly

How to define a working directory?

import os; os.chdir("full-path-to-working-directory")

How to get information about the current working directory?

import os; os.getcwd() (cwd = current working directory); see below for an example

import os 
os.getcwd()

Checking Your Current Directory

From a terminal/command line:

  • Mac/Linux:

    pwd
  • Windows (Git Bash or WSL):

    pwd
  • Windows (Command Prompt):

    cd

Practical Examples

  • Moving into a subdirectory:

    cd Documents/Projects
  • Moving up one level:

    cd ..
  • Accessing a file in a parent directory:

    cat ../notes.txt


Portability

In your code, do not use:

import os
os.chdir('/path/to/your/directory')

Prefer:

# pip
python -m pip install pyprojroot

# conda
conda install -c conda-forge pyprojroot

Then:

from pyprojroot.here import here

here()

Portability

What’s wrong with os.chdir('/path/to/your/directory')?

  • It will only ever work for the user creating the file

  • It is not portable

    • Moving the folder/file will break the code
    • Collaborators will need to change any os.chdir path
  • Increases likelihood that work from other processes leaks into current work

Portability

The pyprojroot library:

If all files are contained in the project folder, reference them with the here() function from the pyprojroot library:

  • builds paths relative to the project root
  • allows several ways to indicate the project root folder
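
The idea of locating a project root can be sketched with the standard library alone. The code below is not pyprojroot’s implementation, only an illustration of the principle; the function name find_project_root and the choice of a .git folder as the marker are assumptions.

```python
from pathlib import Path


def find_project_root(start, marker=".git"):
    """Walk upwards from `start` until a folder containing `marker` is found."""
    start = Path(start).resolve()
    for folder in [start, *start.parents]:
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"no folder containing {marker!r} above {start}")
```

A script anywhere inside the project could then build paths such as find_project_root(Path.cwd()) / "data" / "mydata.csv" (a hypothetical file), and the result stays valid when the whole project folder is moved.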

Self-Contained Projects

Contains all necessary files for your project, eportfolio or any repository in general:

  • data
  • results
  • documentation
  • scripts
  • images
  • designs (css/sass)
  • tabs/topics

Folder/File structure

data

  • all raw data files, organised in meaningful ways
  • never, ever write back to this folder, read only
  • if using Git, never commit this folder to history; list it in .gitignore

results

  • write all analysis results here and treat them as disposable, since they can be overwritten
  • may also include figures etc. if wanted

docs

  • documentation
  • Quarto files

src or py

  • for functions that you write and use in several places
  • this is the conventional Python folder for keeping files that might be imported in Python scripts

scripts/analysis

  • files with full analysis pipelines
  • might import functions from the files in src or py

Folder/File structure

README.md

  • markdown file describing the project content and intent
  • maybe also explains which files to look in for what
  • ideal to have if saving the folder to GitHub

LICENCE

  • dictates how code can be reused
  • not covered in this module; ask me if needed
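
The structure described above can be set up in one go. This is a minimal sketch, assuming the folder names data, results, docs, src, and scripts from the slides; the function name create_skeleton and the README placeholder text are illustrative. A licence file is deliberately left out because its content depends on your choice.

```python
from pathlib import Path

FOLDERS = ("data", "results", "docs", "src", "scripts")


def create_skeleton(project_root):
    """Create the folders listed above plus a starter README.md if missing."""
    root = Path(project_root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Placeholder text only; describe the project content and intent here.
        readme.write_text("# Project title\n", encoding="utf-8")
    return sorted(p.name for p in root.iterdir())
```

Running it twice is harmless: existing folders and an existing README.md are left untouched.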

File Naming

Organising files in data/, results/, docs/, and scripts/ requires some thought about how to name files for:

  • easy machine reading
  • easy human reading
  • easy understanding of file content
  • choosing the correct type of file to store

If you are using the py/ folder to store Python functions, these might need somewhat different naming conventions than the other folders, as these are functions you can use across the other files.

Here, names should reflect content rather than structural organisation.

File Naming

An important part of project management, code automation, and data analytics in general is to have your files read by a piece of code or software.

Machines are clever, but extremely pedantic.

Be consistent, be meticulous.

Some machines are more clever than others, so name files in a way that the “dumbest” of them can deal with.

File Naming

  • Don’t use white space
    • decide on a separator and use it consistently
    • the dash - is recommended
  • Use lowercase letters
    • certain machines care about capitalisation
  • Use numbers smartly
    • numbers are awesome to use and can help organise files meaningfully
    • but they need some thought before implementing

File Naming

Variables and file names should have meaningful names in snake_case format, preferring all lowercase.

File Naming

Machines will first list files starting with numbers (in ascending order), then the rest in alphabetical order.

1_file.txt
2_file.txt
file_one.txt
file_three.txt
file_two.txt

But they won’t understand the difference between 1 and 10

10_file.txt
1_file.txt
2_file.txt
file_1.txt
file_10.txt
file_2.txt

‘zero-padding’ is a way of preserving file order

01_file.txt
02_file.txt
10_file.txt
file_01.txt
file_02.txt
file_10.txt
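
The effect of zero-padding can be reproduced in Python, where sorted() orders strings character by character much like a file listing does. This is a small illustration with hypothetical file names; the :02d format pads numbers to two digits.

```python
unpadded = [f"{n}_file.txt" for n in (1, 2, 10)]
padded = [f"{n:02d}_file.txt" for n in (1, 2, 10)]

# Plain text sorting compares character by character, so "10" lands before "2".
unpadded_sorted = sorted(unpadded)
# With a fixed width of two digits, text order and numeric order agree.
padded_sorted = sorted(padded)
```

Here unpadded_sorted starts with 10_file.txt, whereas padded_sorted is 01_file.txt, 02_file.txt, 10_file.txt.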

File Naming

Using dates in file names can also keep files well organised, but be consistent. The YYYY-MM-DD format is recommended.

13-11-21_initial-submission-results.txt
22-01-03_revised-results.txt
2022-02-28_results.txt

vs

2021-11-13_initial-submission-results.txt
2022-02-28_results.txt
2022-03-01_revised-results.txt
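
One reason YYYY-MM-DD works well is that sorting the names as text also sorts them by date. The sketch below builds the three names from the second listing with date.isoformat(), which produces exactly this format, and sorts them.

```python
from datetime import date

names = [
    f"{date(2022, 3, 1).isoformat()}_revised-results.txt",
    f"{date(2021, 11, 13).isoformat()}_initial-submission-results.txt",
    f"{date(2022, 2, 28).isoformat()}_results.txt",
]
# Text order equals chronological order because the year comes first.
chronological = sorted(names)
```

The sorted list runs from 2021-11-13 to 2022-03-01, matching the second listing above.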

File Naming

Consider using different separators for different parts of the file name

This way you can use the file name itself, programmatically, if needed

2021-11-13_initial_submission_results.txt
2022-02-28_results.txt
2022-03-01_revised_results.txt
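
As an illustration of using a file name programmatically, the sketch below splits a name of the form date, underscore, description into its two parts. It assumes the first underscore always separates the date from the rest; the function name parse_name is hypothetical.

```python
from datetime import date
from pathlib import Path


def parse_name(filename):
    """Split '<YYYY-MM-DD>_<description>.<ext>' into a date and a description."""
    stem = Path(filename).stem
    # Split on the first underscore only; the description may contain more.
    date_part, description = stem.split("_", 1)
    return date.fromisoformat(date_part), description
```

For example, parse_name("2021-11-13_initial_submission_results.txt") returns the date 2021-11-13 and the description 'initial_submission_results'.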

File Naming

Optimising file names for computers is great, but ultimately it’s us humans who need to choose which files to work with. Naming files so that the name makes the content obvious (or at least gives an idea of it) helps with this.

2021-11-13_final-results.txt
2022-02-28_finalfinal-results.txt
2022-03-01_finished-results.txt

vs.

2021-11-13_first-submission-results.txt
2022-02-28_revision-round1-results.txt
2022-03-01_revision-round2-results.txt
2022-03-01_revision-round2-no-sex-results.txt

Image Types

Images from plots should use png or svg

  • .png supports transparency and has no quality loss upon re-saving

  • .svg can rescale to infinity without getting grainy

  • .jpg best for photos, quality loss on rescale, blurry edges and poor text rendering

Images can also sometimes be saved as pdf, a vector format, although transparency support depends on the PDF version and on the software used to create the file.

TIFF has fallen out of favour due to large file sizes, but is preferable to JPEG for photos.

References

Huge thanks to the following people who have generated and shared most of the content of this lecture:


Thanks for your attention and don’t hesitate to ask if you have any questions!
@damien_dupre
@damien-dupre
https://damien-dupre.github.io
damien.dupre@dcu.ie