class: center, middle, inverse, title-slide

.title[
# BAA1030 - Data Analytics and Story Telling
]
.subtitle[
## Lecture 7: Introduction to Python
]
.author[
### Damien Dupré - Dublin City University
]

---

# Module Objectives Part 2

In the first part of the module, we saw how to **design visualisations** and how to **use efficient software** to embed stories and visualisations (for example, Tableau).

While a dashboard solution would be defined as "No-Code", the second part of this module will teach you how to design visualisations and embed them along with the story using **Open-Source coding technologies**.

#### We will start from scratch:

- Introduction to Python
- Python for Visualisations
- Introduction to Quarto
- Introduction to GitHub and GitHub Pages

---

# Example of Student Submissions

<iframe width="1000" height="450" src="https://mathis-yannicopoulos.github.io/Individual-Assignment/Indvidual%20Assignment.html"></iframe>

- [Website](https://mathis-yannicopoulos.github.io/Individual-Assignment/Indvidual%20Assignment.html)
- [Repository](https://github.com/Mathis-Yannicopoulos/Individual-Assignment)

---

# Example of Student Submissions

<iframe width="1000" height="450" src="https://leospagni.github.io/UNICEF/UNICEF_Quarto_Project/UNICEF_report"></iframe>

- [Website](https://leospagni.github.io/UNICEF/UNICEF_Quarto_Project/UNICEF_report)
- [Repository](https://github.com/leospagni/UNICEF/tree/main/UNICEF_Quarto_Project)

---

# Example of Student Submissions

<iframe width="1000" height="450" src="https://shanemarioantao.github.io/Assignment/project/Unicef.html"></iframe>

- [Website](https://shanemarioantao.github.io/Assignment/project/Unicef.html)
- [Repository](https://github.com/ShaneMarioAntao/Assignment/tree/main/project)

---

# Python Introduction

Modern data analytics uses free and open-source computer languages:

* Proprietary languages (e.g., Matlab) and software (e.g., SPSS, Stata, SAS) are outdated
* Python and R are the main open-source languages for data analytics

While R is mostly used in academic research and public institutions, Python is, by far, the most used language in organisations.

#### So let's use Python!

<img src="https://i.programmerhumor.io/2023/06/programmerhumor-io-python-memes-backend-memes-6e51e8ccc5a8207.jpg" width="30%" style="display: block; margin: auto;" />

---
class: inverse, mline, center, middle

# 1. Python and its IDE

---

# What are Python and Python IDEs?

There are some key concepts you need to understand and remember:

* Python is the name of the language
* Python code can be written in various interfaces, called Integrated Development Environments (IDEs)

At its simplest, **Python is like a car’s engine** while **an IDE is like a car’s dashboard**.

.pull-left[
.center[Python: The engine]
<img src="https://raw.githubusercontent.com/damien-dupre/img/main/car_motor.jpeg" style="display: block; margin: auto;" />
]

.pull-right[
.center[IDE: The dashboard]
<img src="https://raw.githubusercontent.com/damien-dupre/img/main/car_dashboard.jpeg" style="display: block; margin: auto;" />
]

---

# What are Python and Python IDEs?

By default, Python and your Python IDE have to be installed separately.

Therefore, you should first download and install Python: [https://www.python.org/](https://www.python.org/)

Then, download and install your Python IDE.
Popular Python IDEs include:

- [PyCharm: feature-rich with intelligent code suggestions](https://www.jetbrains.com/pycharm/)
- [VS Code: lightweight and highly customisable](https://code.visualstudio.com/)
- [Spyder: designed for scientific computing](https://docs.spyder-ide.org/)

Alternatively, the [Anaconda](https://www.anaconda.com/download) distribution installs both for free:

- It includes Conda, a package manager that simplifies installing, updating, and managing libraries, avoiding dependency conflicts often encountered with pip.
- It comes pre-installed with over 1,500 Python libraries, such as NumPy, Pandas, Matplotlib, and TensorFlow, saving time and effort in setting up an environment.

---

# Which IDE to Use?

<img src="https://miro.medium.com/v2/resize:fit:1200/1*HmjIg5JOH3gXXCGX0JK95g.png" style="display: block; margin: auto;" />

Source: [Stack Overflow Survey 2024](https://survey.stackoverflow.co/2024/technology#3-integrated-development-environment)

---

# What are Python and Python IDEs?

While VS Code is the most used IDE, it is worth mentioning [Jupyter Notebook](https://jupyter.org/).

It offers a simple but interactive interface that allows you to write Python code, test it, and see the result below the instruction rather than in a separate console.

It lets you combine text in Markdown format (a lighter markup format than HTML), Python code, and HTML code for visualizations and animations in a single document.

<img src="https://python.sdv.u-paris.fr/img/jupyter-exemple.png" style="display: block; margin: auto;" />

However, we are not going to use any of these solutions...

---
class: clear

## .center[**Time to enter ...**]

--

<img src="https://raw.githubusercontent.com/damien-dupre/img/main/the_matrix.gif" width="150%" style="display: block; margin: auto;" />

---

# Google Colab

Instead, we are going to use Google Colab!

This solution has several advantages, particularly for users who want a hassle-free setup without worrying about local installation or hardware limitations.

It is entirely cloud-based, meaning there is no need to install or configure anything on your local machine.

Try it here: [https://colab.research.google.com/](https://colab.research.google.com/)

<img src="https://algotrading101.com/learn/wp-content/uploads/2021/05/Google-Colab-Guide-e1620759490851.jpg" width="50%" style="display: block; margin: auto;" />

---

# Code in Google Colab

Most of the Python code displayed in this lecture is included in these slides.
Rather than typing it manually, open these slides in another tab to copy-paste the code.

Two ways to access these slides:

- From Loop: Lectures > Lecture 7
- Or from the URL: https://damien-dupre.github.io/BAA1030/lectures/lecture_7

<img src="https://c.tenor.com/0heitU7-tg4AAAAC/copy-paste-paste.gif" width="50%" style="display: block; margin: auto;" />

---
class: inverse, mline, center, middle

# 2. Coding in Google Colab

---

# Google Colab

When you create a new notebook, you will see the following 3 windows (also called panes):

* **Console**: where the results are printed
* **Variable Environment**: where the objects are stored
* **Notebook**: where the code should be typed and saved

<img src="https://hutchdatascience.org/Intro_to_Python/images/colab.png" width="60%" style="display: block; margin: auto;" />

---

# Google Colab

## Python Console

Open it via `View` -> `Executed code history`.

You give it a single instruction, one line of Python code, and the console executes it for you.

## Notebook

In the central panel of the website, you will see Python code interspersed with plain text.

This is called a Python Notebook (other similar services include Jupyter Notebook and iPython Notebook). It has chunks of plain text and chunks of Python code, and the text helps us better understand the code we are writing.

---

# Variable Environment

.pull-left[

Open it by clicking on the <kbd>{x}</kbd> button on the left-hand panel.

Often, your code will store information in the Variable Environment, so that information can be reused.

For instance, we often load in data, store it in the Variable Environment, and use it throughout the rest of our Python code.

Note that I will mention the `Terminal` in the lecture. In Google Colab, the terminal can only be accessed if you pay for Google Colab Pro. However, this is something that we won't do, and we will find a free alternative to the `Terminal` in Google Colab.

]

.pull-right[
<img src="https://www.jcchouinard.com/wp-content/uploads/2022/06/image-46.png" style="display: block; margin: auto;" />
]

---
class: title-slide, middle

## Exercise

The first thing we will do is see the different ways we can run Python code.

- In the Python Notebook, create a chunk of Python code with `2+2`.
- Then, click the arrow button ▶️ to run the code.
---

# Running Python Code

There are three ways to run Python code:

1. Run a single code cell with its arrow button, as in the exercise.
2. Type something into the Python Console (Execution), such as `2+2`, and click the arrow button. The Python Console will run it and give you an output.
3. Run every single Python code chunk, from start to finish, via `Runtime` -> `Run all`.

Remember that **the order that you run your code matters** in programming.

Your final product would be the result of Option 3, in which you run every Python code chunk from start to finish. However, sometimes it is nice to try out smaller parts of your code via Options 1 or 2. But you will be at risk of running your code out of order!

---

# New Code Chunks

To create your own content in the notebook, click where you want to insert content, and then click on <kbd>+ Code</kbd> or <kbd>+ Text</kbd> to add Python code or text, respectively.

Python Notebook is great for data science work, because:

- It encourages reproducible data analysis, when you run your analysis from start to finish.
- It encourages excellent documentation, as you can have code, output from code, and prose combined together.

The version of Python used in this course and in Google Colab is Python 3, which is the currently supported version of Python.

Now, we will get to the basics of programming grammar.

---
class: inverse, mline, center, middle

# 3. The Basics of Python Code

---

# What are .py and .ipynb files?

**.py** is the extension for a Python script (a document including only Python code).

**.ipynb** is the extension for an IPython notebook document, which is used by Jupyter or Google Colab. These notebooks mix code cells (or code chunks) and markdown cells (basically text).

## How to Run Python Code?
In a Python notebook, click on the play arrow next to the code cell or:

- Press <kbd>Ctrl</kbd> & <kbd>Enter</kbd> (Win)
- Press <kbd>command</kbd> & <kbd>Enter</kbd> (Mac)

<img src="https://amitness.com/posts/images/colab-cell-hover.png" style="display: block; margin: auto;" />

---

# How to Run Python Code?

Inside a code cell, every character is read as Python code, except what follows a `#`, which is used for comments.

Example of non-active code:

``` python
# non active code for comments
```

Example of active code:

``` python
print("active", "code")
```

```
## active code
```

``` python
1 + 1  # everything after `#` is non active and is used for comments
```

```
## 2
```

---

# What are Python packages/libraries?

Python packages extend the functionality of Python. They are written by a worldwide community of Python users and can be downloaded for free from the internet.

A good analogy: **Python packages are like apps you can download onto a mobile phone**.

.pull-left[
.center[Python: A new phone]
<img src="https://raw.githubusercontent.com/damien-dupre/img/main/phone_design.jpeg" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
.center[Packages/Libraries: Apps you can download]
<img src="https://raw.githubusercontent.com/damien-dupre/img/main/phone_apps.jpeg" width="100%" style="display: block; margin: auto;" />
]

---

# What are Python packages/libraries?

Say you have purchased a new phone: to use Instagram, you need to **install the app once** and to **open the app** every time you want to use it.

The process is very similar for using a Python package. You need to:

#### 1. Install the package/library

.pull-left[
From the terminal using:

``` bash
pip install praise
```
]

.pull-right[
Or, in a Google Colab code cell using:

``` python
!pip install praise
```
]

#### 2. “Load” the package with the relevant function using `from` and `import`

``` python
from praise import praise
```

Note: Here, `praise` is the name of the package/library and `praise` is also the name of a function contained in the package/library.

#### 3. Once the package is loaded, you can use the function `praise()` from this package such as:

``` python
praise()
```

---

# What are Python packages/libraries?

To load all the functions contained in a package/library, import it with an explicit alias using `import` and `as`:

``` python
import praise as pr
```

Then functions can be used with the alias as a prefix:

``` python
pr.praise()
```

Over time, alias conventions have emerged for popular packages/libraries:

``` python
import numpy as np
import pandas as pd
import math as m
import matplotlib.pyplot as plt
```

---
class: title-slide, middle

## Live Demo

---
class: title-slide, middle

## Exercise

Open a new notebook in Google Colab. In this document:

- Create a 1st code cell to **install the package "praise"**

``` python
!pip install praise
```

- Create a 2nd code cell to **load the library praise** and the function **praise()**

``` python
from praise import praise
```

- Create a 3rd code cell to **run the function `praise()`** as it is, without arguments

``` python
praise()
```
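---

# Trying Both Import Styles

The two import styles shown earlier (`from ... import ...` and `import ... as ...`) can be tried without installing anything. This is a minimal sketch, assuming the built-in `math` module as a stand-in for `praise`, since it ships with Python and needs no `pip install`. The object names `root`, `same_root`, and `circle_area` are only illustrative.

```python
# Style 1: import a single function by name, then call it directly
from math import sqrt

root = sqrt(16)

# Style 2: import the whole module under an alias, then use the alias as a prefix
import math as m

same_root = m.sqrt(16)
circle_area = m.pi * 2 ** 2  # area of a circle of radius 2

print(root)
print(same_root)
print(circle_area)
```

Both styles reach the same function. The alias style keeps it visible where each function comes from, which helps once several packages are loaded in the same notebook.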
---

# Calling Functions

Functions are algorithms (or lines of code) which **transform data into something else**.

For example, the function `praise()`, without arguments, prints a random message.

Functions have **a name** and **parameters or arguments**, each of which expects some information.

``` python
function_name(argument_1=value_1, argument_2=value_2, ...)
```
<i class="fas fa-exclamation-triangle faa-flash animated faa-slow " style=" color:red;"></i>
However, arguments are usually matched by position without explicitly naming them:

``` python
function_name(value_1, value_2, ...)
```

---

# Calling Functions

For example, the function `arange()` (read "A-range" for array range) from the package `numpy` makes a sequence of numbers:

* The first argument `start` is the number starting the sequence
* The second argument `stop` is where the sequence ends; `stop` itself is not included

``` python
import numpy as np

np.arange(0, 10)
```

```
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

---

# Calling Functions

However, the function `arange()` has more arguments, for example `dtype`, which sets the type of the numbers:

``` python
np.arange(0, 10, dtype=float)
```

```
## array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
```

Note: These are fundamental types used in lists, tuples, and other structures:

- Integer (**int**): Whole numbers (e.g., 1, 2, -5)
- Floating-Point (**float**): Decimal numbers (e.g., 3.14, -0.001)
- String (**str**): Text data (e.g., "hello", 'Python')
- Boolean (**bool**): True or False values (True, False)

---

# Assign Values to Objects

An object is a box that **can include anything** (e.g., values, dataframes, figures, models, functions, ...) and **has a name** that you have to choose.

To create an object, you need to **assign something** to a name using the `=` operator.

``` python
x = 4
```

If you type the name of the object, Python will print out its content.

``` python
x
```

```
## 4
```

Then, you can use this object further in your code:

``` python
np.arange(x, 10)
```

```
## array([4, 5, 6, 7, 8, 9])
```

---

# Assign Values to Objects

It is very important to distinguish values and objects in Python:

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Type </th>
   <th style="text-align:left;"> Class </th>
   <th style="text-align:left;"> Example </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Integer (int) </td>
   <td style="text-align:left;"> Numeric Value </td>
   <td style="text-align:left;"> 1, 2, ... </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Floating-Point (float) </td>
   <td style="text-align:left;"> Decimal Value </td>
   <td style="text-align:left;"> 1.1, 2.8, ... </td>
  </tr>
  <tr>
   <td style="text-align:left;"> String (str) </td>
   <td style="text-align:left;"> Character Value </td>
   <td style="text-align:left;"> "one", "two", ... </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Word without quotes </td>
   <td style="text-align:left;"> Object Name </td>
   <td style="text-align:left;"> function name, data name, ... </td>
  </tr>
</tbody>
</table>

These types of values are then stored in different objects:

<img src="https://nustat.github.io/DataScience_Intro_python/Datasets/numpy_image.png" width="100%" style="display: block; margin: auto;" />

---

# Different Objects

All object assignments have the same form:

``` python
object_name = object_content
```

You want your object names to be descriptive, so you will need a convention for multiple words. I recommend **snake_case**, where you separate lower-case words with `_`.

``` python
numeric_value = 1

character_value = "one"

list_with_numeric_values = [1, 2]

list_with_character_values = ["one", "two"]
```

---
class: title-slide, middle

## Live Demo

---
class: title-slide, middle

## Exercise

In a single Python code chunk in Google Colab, **Copy, Paste, and Run** the following code:

``` python
import matplotlib.pyplot as plt

# Data
my_power = [0.5, 99.5]
my_knowledge = ["without python", "with python"]

# Create bar plot
plt.bar(my_knowledge, my_power)

# Add labels
plt.xlabel("Knowledge Level")
plt.ylabel("Power (%)")
plt.title("Bar Plot Example")

# Show plot
plt.show()
```
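---

# Values versus Object Names

The distinction between values and object names from the table earlier can be checked directly in a code cell. This is a small sketch using only built-in Python. The object names are illustrative and follow the snake_case convention, and `type()` is the built-in function that reports what kind of value an object holds.

```python
# Values of different types, stored in objects with snake_case names
numeric_value = 1
decimal_value = 1.1
character_value = "one"
list_with_numeric_values = [1, 2]

# type() reports what kind of value each object holds
print(type(numeric_value))
print(type(decimal_value))
print(type(character_value))
print(type(list_with_numeric_values))

# A word without quotes is an object name: Python looks up its content
copied_value = numeric_value

# The same word inside quotes is only text, not a reference to the object
just_text = "numeric_value"

print(copied_value)
print(just_text)
```

Forgetting the quotes around text, or adding quotes around an object name, is one of the most common beginner errors, so it is worth running this cell and comparing the last two printed lines.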
---
class: inverse, mline, center, middle

# 4. Access Data in Google Colab

---

# Open your Data as a Python Object (1)

Google Colab is a free remote computer: the computing is not run on your own computer.

To open data on Google Colab, you first need to `Upload` your file to this remote computer and then `read` the data in Python.

.pull-left[
.center[Step 1: Upload your File]
<img src="https://miro.medium.com/v2/resize:fit:1400/1*eLs1D3BI4_HLAabN5WUTPg.png" style="display: block; margin: auto;" />
]

.pull-right[
.center[Step 2: Read your File with Pandas]
<img src="https://miro.medium.com/v2/resize:fit:1400/1*EwbRL6__lKav4sKfjS8Ktw.png" style="display: block; margin: auto;" />
]

---

# Open your Data as a Python Object (2)

Once you have uploaded your data file from your computer to this free cloud computer, you can read it using the function `read_csv` in the `pandas` package/library.

The only required argument is the path to the uploaded file, which can be copied by right-clicking the file and choosing `Copy path`.

The result of `read_csv` has to be saved in an object, and you can print this object by running its name.

``` python
import pandas as pd

df = pd.read_csv("/content/titanic.csv")

df
```

---

# Open your Data as a Python Object (3)

If you use Python on your own computer, file paths differ according to your Operating System:

``` python
# Windows
my_file_object = pd.read_csv("C:/path/to/my/file.csv")

# macOS
my_file_object = pd.read_csv("/Users/path/to/my/file.csv")
```

Each of the following lines will generate an error:

``` python
# Incomplete path
my_file_object = pd.read_csv("/path/to/my/file.csv")

# Missing file extension
my_file_object = pd.read_csv("C:/path/to/my/file")

# Use of backward slash
my_file_object = pd.read_csv("C:\path\to\my\file.csv")
```

---
class: title-slide, middle

## Live Demo

---
class: title-slide, middle

## Exercise

1. Click on `Upload` to upload your "unicef_metadata.csv" file to your Google Colab
2. Use the function `read_csv` in the `pandas` package/library to read the file.
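---

# Backslashes in File Paths

The "use of backward slash" error on the file path slide comes from how Python reads text: inside a normal string, a backslash starts a special character. The sketch below illustrates this with a hypothetical Windows path, `C:\new\table.csv`, chosen only because `\n` and `\t` are well-known special characters. It relies on `PureWindowsPath` from the built-in `pathlib` module and does not need the file to exist.

```python
from pathlib import PureWindowsPath

# In a normal string, "\n" means "new line" and "\t" means "tab",
# so this path silently contains characters that are not part of any file name
broken_path = "C:\new\table.csv"

# Option 1: write the path with forward slashes
forward_path = "C:/new/table.csv"

# Option 2: prefix the string with r ("raw") so backslashes stay as typed
raw_path = r"C:\new\table.csv"

print("\n" in broken_path)  # is there a line break hidden in the path?
print("\n" in raw_path)     # same check on the raw string

# pathlib can convert a Windows-style path to forward slashes
print(PureWindowsPath(raw_path).as_posix())
```

When a path is copied from Windows Explorer, either replace each `\` with `/` or put an `r` in front of the opening quote before passing it to `read_csv`.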
---
class: inverse, mline, center, middle

# 5. Save Your Notebook

---

# Save Your Notebook

Your data file will be removed from Google Colab when your session ends, for example after you close the tab. That's OK: we can upload it and use it again every time we come back to the Notebook.

What is important to save is your Notebook with the code you wrote.

By clicking on `File` > `Save`, you will save it on your Google Drive.

A new folder called "Colab Notebooks" will be created and all your Notebooks will be saved in this folder.

---
class: title-slide, middle

## Exercise

Save the Notebook you were working on and find it in your Google Drive.
---

# Become an Expert in Python

Because Python is free, plenty of free learning materials are available online, including on YouTube or TikTok.

Social networks like LinkedIn or X, and content aggregators like Medium or Towards Data Science, offer posts of varying quality.

The site [Real Python](https://realpython.com/) generally provides very good posts, comprehensive and educational. Many free online books are also listed in the [Big Book of Python](https://www.bigbookofpython.com/).

There are well-crafted newsletters for regularly tracking developments in the data science ecosystem. For me, they are the primary source of fresh information.

- If you had to subscribe to only one newsletter, the most important to follow is Andrew Ng's, <a href="https://www.deeplearning.ai/the-batch/"><em>"The Batch"</em></a>.
- General newsletters from <a href="https://dataelixir.com/"><em>Data Elixir</em></a> and <a href="https://alphasignal.ai/">Alpha Signal</a> keep you updated with the latest news.
- In the field of data visualization, newsletters from <a href="https://blog.datawrapper.de/newsletter/"><em>DataWrapper</em></a> provide accessible content on the subject.

---

# How to solve errors

.pull-left[

#### 1. Look at your error
* If it's obvious, solve it by yourself
* If it's not obvious, paste the error in Google

#### 2. Code assistant AIs
* ChatGPT, GitHub Copilot, or any other can be very helpful
* Provide them with the code + error

#### 3. Look at the function
* Documentation available online

#### 4. Look at the web
* Google "python how to ..."
* Stack Overflow

]

.pull-right[
<img src="https://media.makeameme.org/created/me-when-i-99d07768c9.jpg" width="100%" style="display: block; margin: auto;" />
]

---
class: inverse, mline, left, middle

<img class="circle" src="https://github.com/damien-dupre.png" width="250px"/>

# Thanks for your attention and don't hesitate to ask if you have any questions!

[@damien-dupre](http://github.com/damien-dupre)

[https://damien-dupre.github.io](https://damien-dupre.github.io)

[damien.dupre@dcu.ie](mailto:damien.dupre@dcu.ie)