This is a repository of Python tutorials that give an overview of the Python programming language, commonly used Python standard libraries, the commonly used data science libraries numpy, pandas and matplotlib, as well as Anaconda/Miniconda and the conda package manager. The tutorials also cover IPython and the Jupyter ecosystem, focusing particularly on the JupyterLab 4 IDE. The tutorials are in the form of markdown files or interactive Python notebooks. Please feel free to star, share and fork this repository.
For data science it is recommended to install Python in a conda environment. This can be done using the Anaconda 2024.02-1 Data Science Python Distribution or the Miniconda bootstrap installer.
For Linux, Ubuntu will be used as an example distro but the procedure is the same on most other distros. The Mac Terminal and file system are Unix-based, so installation on Mac should be more or less identical to Linux.
A popular IDE that is not preinstalled with Anaconda, but is commonly used with it, is VSCode. These guides will primarily focus on using JupyterLab (which has better performance for larger notebooks) but VSCode can be used in its place if preferred. VSCode is available for Windows, Linux and Mac. The installation process is similar for each Operating System because VSCode is a general purpose code editor and a number of extensions need to be installed for each programming language:
GitHub Desktop may be used to download this repository of Python notebooks and create a local fork which you can work on:
Markdown uses simple syntax to format text and is commonly used on GitHub and within Interactive Python Notebooks. For mathematical equations it is supplemented using TeX:
These tutorials use markdown and notebook files and can be viewed in the browser (using Notebook Viewer) or in an IDE such as JupyterLab or VSCode:
The builtins module is automatically imported. It contains Python's fundamental classes. These classes are based around the object class, and the builtins module contains the functions which are used to invoke object-based data model methods:
- Object Class
- Immutable String Class
- Immutable Bytes Class
- Mutable ByteArray Class
- Immutable Integer Class
- Immutable Floating Point Class
- Immutable Boolean Class
- Immutable Tuple Class
- Mutable List Class
- Immutable FrozenSet Class
- Mutable Set Class
- Mutable Dictionary Class
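
As a minimal sketch of the relationship described above, the snippet below compares a few builtins functions and operators with the data model methods they invoke, and contrasts an immutable tuple with a mutable list. The variable names are illustrative only and are not taken from the tutorials.

```python
# Builtins functions and operators delegate to data model ("dunder") methods.
text = "hello"
length_via_function = len(text)        # builtins function
length_via_method = text.__len__()     # data model method it invokes

number = 3
sum_via_operator = number + 4          # the + operator
sum_via_method = number.__add__(4)     # data model method it invokes

as_string = str(number)                # invokes number.__str__()
as_repr = repr(text)                   # invokes text.__repr__()

# Immutable versus mutable: a tuple cannot be changed in place, a list can.
numbers_tuple = (1, 2, 3)
numbers_list = [1, 2, 3]
numbers_list.append(4)

try:
    numbers_tuple[0] = 10              # item assignment is not supported
except TypeError as error:
    tuple_error = type(error).__name__

print(length_via_function, sum_via_operator, as_repr, tuple_error)
```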
The Interactive Python Shell has a number of enhancements over the regular Python Shell. It can be used to run Python code and commonly used Shell commands that have been reimplemented as IPython magics. The Shell used for these commands will differ depending on the Operating System:
Python has a number of formatters that can be used to format code:
- Code Formatters on Windows (AutoPEP8, ISort, Black and Ruff)
- Code Formatters on Linux/Mac (AutoPEP8, ISort, Black and Ruff)
- JupyterLab Code Formatter Extension (AutoPEP8, ISort and Black)
Python code blocks are used to branch code in response to a condition, repeat a series of operations in a loop, handle errors and define custom functions:
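
The four kinds of code block mentioned above can be sketched together in a few lines. The function name `classify` and the sample values are hypothetical and chosen only for illustration.

```python
def classify(number):
    """Custom function: return a label describing the sign of number."""
    if number > 0:          # conditional block
        return "positive"
    elif number < 0:
        return "negative"
    else:
        return "zero"


labels = []
for value in (-2, 0, 5):    # loop block
    labels.append(classify(value))

try:                        # error handling block
    result = 1 / 0
except ZeroDivisionError:
    result = None

print(labels, result)
```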
The collections module contains a number of supplementary collections based around the collections seen in the Python builtins module. This includes the namedtuple, deque, Counter and defaultdict classes:
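
A brief, hedged sketch of the four classes named above follows. The data (a `Point` record, the word list) is made up for the example.

```python
from collections import Counter, defaultdict, deque, namedtuple

# namedtuple: an immutable tuple whose fields can be accessed by name
Point = namedtuple("Point", ["x", "y"])
point = Point(x=1, y=2)

# deque: a double-ended queue with fast appends at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)

# Counter: a dictionary subclass that counts hashable items
letter_counts = Counter("banana")

# defaultdict: a dictionary that creates missing values on first access
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

print(point, list(queue), letter_counts["a"], dict(groups))
```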
The itertools module contains supplementary iterators that are closely related to and extend those found in Python builtins:
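
As an illustration only, the sketch below uses a handful of itertools functions. The selection is arbitrary; the module contains many more.

```python
from itertools import accumulate, chain, count, islice, product

# chain: iterate over several iterables as if they were one
chained = list(chain([1, 2], [3, 4]))

# count is an infinite iterator, so islice is used to take the first 5 values
first_evens = list(islice(count(start=0, step=2), 5))

# product: Cartesian product, equivalent to nested for loops
pairs = list(product("ab", [1, 2]))

# accumulate: running totals
running_totals = list(accumulate([1, 2, 3, 4]))

print(chained, first_evens, pairs, running_totals)
```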
The math and complex math (cmath) modules contain commonly used mathematical functions, typically returning scalar values:
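
A small sketch of scalar functions from both modules is shown below. The input values are arbitrary.

```python
import cmath
import math

hypotenuse = math.hypot(3, 4)          # length of the hypotenuse of a 3-4-5 triangle
root = math.sqrt(16)                   # real square root
factorial = math.factorial(5)          # 5 * 4 * 3 * 2 * 1

# math.sqrt raises ValueError for negative input; cmath.sqrt returns a complex number
complex_root = cmath.sqrt(-4)
magnitude, phase = cmath.polar(1j)     # polar form of the imaginary unit

print(hypotenuse, root, factorial, complex_root, magnitude, phase)
```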
The statistics module is a functional module covering basic statistics:
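
For example, a few of the module's functions applied to a small made-up data set might look like this:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative sample values

mean = statistics.mean(data)           # arithmetic mean
median = statistics.median(data)       # middle value (average of the two middle values)
mode = statistics.mode(data)           # most common value
pstdev = statistics.pstdev(data)       # population standard deviation

print(mean, median, mode, pstdev)
```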
The random module is used to generate a random scalar number, often from a distribution:
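
The sketch below draws a few scalar values from different distributions. It assumes a dedicated `random.Random` instance with a fixed seed so that the results are reproducible; the seed value 0 is arbitrary.

```python
import random

rng = random.Random(0)                 # seeded generator, reproducible between runs

die = rng.randint(1, 6)                # integer from 1 to 6 inclusive
uniform = rng.random()                 # float in the interval [0.0, 1.0)
gaussian = rng.gauss(0.0, 1.0)         # normal distribution, mean 0 and sigma 1
colour = rng.choice(["red", "green", "blue"])

print(die, uniform, gaussian, colour)
```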
The datetime module is used to generate scalar date, times, datetimes and timedeltas (time differences):
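
As a short sketch, the classes mentioned above can be combined as follows. The dates and times are arbitrary examples.

```python
from datetime import date, datetime, time, timedelta

start = datetime(2024, 1, 1, 9, 30)
end = datetime(2024, 1, 3, 12, 0)

# Subtracting two datetimes gives a timedelta
elapsed = end - start

# Adding a timedelta to a date gives another date (2024 is a leap year)
day = date(2024, 2, 28) + timedelta(days=2)

meeting_time = time(9, 30)
iso_text = start.isoformat()

print(elapsed, day, meeting_time, iso_text)
```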
The Input Output (IO) module is used for reading and writing text files .txt and binary files .bin. The IO module is commonly used with the Comma Separated Values (CSV) Module for reading and writing comma separated files .csv, printed format files .prn and text delimited files .tab. The IO module is also commonly used with the JavaScript Object Notation (JSON) module for reading and writing JavaScript object notation files .json. The above are all examples of high level human-readable formats. Data can also be serialised using the pickle module which uses pickle files .pkl and serialised data can be stored in a database using the shelve module:
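
A hedged sketch of how these modules fit together is given below. To avoid touching the file system it assumes in-memory buffers (`io.StringIO`, `pickle.dumps`) in place of real `.csv`, `.json` and `.pkl` files; the sample rows are made up.

```python
import csv
import io
import json
import pickle

rows = [["name", "score"], ["Ada", "90"], ["Alan", "85"]]

# CSV: write rows to an in-memory text buffer, then read them back
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
rows_read = list(csv.reader(io.StringIO(csv_text, newline="")))

# JSON: human-readable text serialisation of dictionaries and lists
record = {"name": "Ada", "score": 90}
json_text = json.dumps(record)
record_from_json = json.loads(json_text)

# pickle: binary serialisation of (almost) arbitrary Python objects
pickled_bytes = pickle.dumps(record)
record_from_pickle = pickle.loads(pickled_bytes)

print(rows_read, json_text, record_from_pickle)
```

With real files, the same `csv`, `json` and `pickle` calls would typically be used on a file object returned by `open`.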
The Operating System module provides a portable Python interface to the Operating System. Many of its functions mirror common Shell commands and are used to navigate around the file system:
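
For instance, a few os functions alongside the Shell commands they loosely resemble are sketched below. The folder and file names are hypothetical.

```python
import os

current_directory = os.getcwd()              # similar to pwd
entries = os.listdir(current_directory)      # similar to ls or dir

# Paths are plain strings in the os module
joined = os.path.join("folder", "subfolder", "file.txt")
base, extension = os.path.splitext("report.csv")
home = os.path.expanduser("~")               # user profile or home directory

print(current_directory, len(entries), joined, base, extension, home)
```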
The Path and Library (pathlib) module is similar to the Operating System module but uses an Object Orientated Programming (OOP) approach to file paths and libraries within the user profile or home directory. File paths are returned as instances of the Path class, which have a number of useful attributes, and the / operator can be used for folder and file concatenation:
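
A minimal sketch of the Path class follows. The folder `Documents` and the file `tutorial.ipynb` are hypothetical names; the path is only constructed, not created on disk.

```python
from pathlib import Path

home = Path.home()                                   # user profile or home directory

# The / operator joins folder and file names into a new Path instance
notebook = home / "Documents" / "tutorial.ipynb"

file_name = notebook.name                            # final path component
stem = notebook.stem                                 # name without the extension
suffix = notebook.suffix                             # file extension
parent = notebook.parent                             # containing folder

print(notebook, file_name, stem, suffix, parent)
```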
The System module provides access to some variables used or maintained by the Python interpreter and to functions that interact strongly with the interpreter:
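
A few of the interpreter variables exposed by this module are sketched below. The values depend on the machine and Python installation, so none are shown here.

```python
import sys

version = sys.version_info        # named tuple: major, minor, micro, ...
platform = sys.platform           # for example "win32", "linux" or "darwin"
search_path = sys.path            # list of folders searched when importing modules
arguments = sys.argv              # command line arguments passed to the script
executable = sys.executable       # path of the running Python interpreter

print(version.major, version.minor, platform, executable)
```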
The Numeric Python library (numpy) is based upon the data structure of the N-Dimensional Array (ndarray). This data structure is a collection; however, unlike the builtins collections, all of its data model methods are configured for numeric operations. numpy also provides equivalents of the functions found in the math, datetime and random modules that operate element-wise on an ndarray:
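
The difference between a builtins collection and an ndarray can be sketched as follows, assuming numpy is installed (it is included with Anaconda). The values are arbitrary.

```python
import numpy as np

numbers_list = [1, 2, 3, 4]
values = np.array(numbers_list)

# For a list, * means repetition; for an ndarray it is numeric multiplication
list_repeated = numbers_list * 2
doubled = values * 2

# Element-wise counterpart of math.sqrt
roots = np.sqrt(values)

# Reshape to 2 rows and 2 columns, then broadcast a 1D array across each row
matrix = values.reshape(2, 2)
row_offsets = matrix + np.array([10, 20])

print(list_repeated, doubled, roots, row_offsets)
```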
The Python and Data Analysis library (pandas) builds upon the data structure of the ndarray, creating a Series, which is a 1D array with a column name, and a DataFrame, which is a grouping of Series analogous in form to an Excel spreadsheet. pandas can be used to programmatically manipulate the data stored in the DataFrame, analogous to any data operations that would be carried out manually in Excel:
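
As a rough sketch, assuming pandas and numpy are installed, a Series and a DataFrame can be created and manipulated as below. The column names and scores are invented for the example.

```python
import numpy as np
import pandas as pd

# A Series is a 1D array with a name
scores = pd.Series(np.array([90, 85, 70]), name="score")

# A DataFrame groups several columns, much like a spreadsheet
frame = pd.DataFrame({"name": ["Ada", "Alan", "Grace"], "score": [90, 85, 70]})

# Add a calculated column, as a formula column would in a spreadsheet
frame["passed"] = frame["score"] >= 75

mean_score = frame["score"].mean()
top_name = frame.sort_values("score", ascending=False).iloc[0]["name"]

print(frame)
print(mean_score, top_name)
```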
The matrix plotting library encompasses a large group of modules compartmentalising objects used for visual elements in a plot. As a user generally only the Python Plot Module is used which allows manipulation of the above objects using a simplified functional and object-orientated programming syntax:
seaborn is a wrapper library for matplotlib which greatly simplifies the code required to create plots that are commonly used for data visualisation of data stored in a DataFrame:
plotly is a Python plotting library that creates plots using the plotly.js JavaScript library. This allows plots to be displayed interactively in the cell output of an interactive Python notebook:
The Python Imaging Library contains the Image module which contains the Image data structure. This module is used for Image construction from an ndarray or an image file taken from another image manipulation program or camera: