diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..2942684
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+exercise_data.json
+docs/
+verify.sh
+venv
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..dfd3959
--- /dev/null
+++ b/README.md
@@ -0,0 +1,3 @@
+# MLOps & Interviews Insper Course
+
+MLOps & Interviews
diff --git a/active-handout.yml b/active-handout.yml
new file mode 100644
index 0000000..12b585d
--- /dev/null
+++ b/active-handout.yml
@@ -0,0 +1,44 @@
+
+theme:
+ name: active-handout-theme
+ locale: en
+
+docs_dir: content
+site_dir: docs
+
+extra_javascript:
+ - https://polyfill.io/v3/polyfill.min.js?features=es6
+ - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
+
+plugins:
+ - active-handout
+
+markdown_extensions:
+ - footnotes
+ - markdown.extensions.admonition
+ - pymdownx.tasklist:
+ custom_checkbox: true
+ - pymdownx.details
+ - pymdownx.tabbed:
+ alternate_style: true
+ - pymdownx.highlight:
+ anchor_linenums: true
+ pygments_lang_class: true
+ - pymdownx.inlinehilite
+ - pymdownx.superfences
+ - pymdownx.magiclink
+ - pymdownx.critic:
+ mode: view
+ - pymdownx.betterem:
+ smart_enable: all
+ - pymdownx.caret
+ - pymdownx.mark
+ - pymdownx.tilde
+ - pymdownx.smartsymbols
+ - pymdownx.emoji:
+ emoji_generator: !!python/name:pymdownx.emoji.to_svg
+ - attr_list
+ - pymdownx.tilde
+ - pymdownx.snippets:
+ base_path: content
+ check_paths: true
diff --git a/content/about.md b/content/about.md
new file mode 100644
index 0000000..921c6a2
--- /dev/null
+++ b/content/about.md
@@ -0,0 +1,49 @@
+# About this Course
+
+We will learn how to **Take Machine Learning Models to Production**!
+
+## Schedules
+
+- **Monday**: 09:45 AM - 11:45 AM
+- **Wednesday**: 09:45 AM - 11:45 AM
+- Office hours on **Mondays** 02:00 PM - 03:30 PM (Teams)
+
+## Deliverables
+
+Students will need to submit some assignments:
+
+- `APS`: practical activities developed during and after classes.
+- `INT`: some classes will focus on technical interviews in machine learning. These classes will require previous preparation by the students, who will interview each other during those classes.
+- `PRO`: in the last classes, students will have to apply the knowledge acquired in a project involving the deploy of an ML model.
+
+## Exams
+
+The interview assignments (`INT`) are like exams. Besides that, there will be no exams, nor classes during the intermediate and final assessment weeks of Insper's calendar.
+
+## Requirements.txt!
+
+In order for you to absorb the most from the course activities, it is mandatory that you have knowledge of:
+
+- Advanced computer programming
+- Cloud computing
+- Basic statistics
+
+The classes require a lot of autonomy. It's a good idea to have already taken the following Insper courses:
+
+- Cloud computing
+- Machine Learning
+
+## Final grade
+
+The final grade is calculated with the following formula:
+
+```
+NF = 0.35*APS + 0.30*INT + 0.35*PRO
+```
+
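+For example, a student with `APS = 7.0`, `INT = 6.0` and `PRO = 8.0` would get `NF = 0.35*7.0 + 0.30*6.0 + 0.35*8.0 = 7.05`.
+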
+Some conditions are required for approval:
+
+- Grade average greater than or equal to `5.0` in **APS**, **INT** and **PRO**.
+- A maximum of two assignments (interviews + APS) with a score of zero or not submitted.
+
+If any of these criteria is not met, the student will fail this course!
diff --git a/content/classes/01-intro/add_webhook.png b/content/classes/01-intro/add_webhook.png
new file mode 100644
index 0000000..24dfa05
Binary files /dev/null and b/content/classes/01-intro/add_webhook.png differ
diff --git a/content/classes/01-intro/aps01_part_1.md b/content/classes/01-intro/aps01_part_1.md
new file mode 100644
index 0000000..e124c9b
--- /dev/null
+++ b/content/classes/01-intro/aps01_part_1.md
@@ -0,0 +1,422 @@
+# Standards - Aps01 - Part 1
+
+What is the size of a Data Science team? Considering data analysts, data engineers, data scientists, machine learning engineers, it is not uncommon for the professional count to reach hundreds. Across industries, companies are building [larger data science teams](https://www.statista.com/statistics/1136560/data-scientists-company-employment/) more and more.
+
+So let's assume that the odds are high that you won't work alone on a data team. Imagine if each professional developed their models in a completely different way, without any:
+
+- Language standards
+- Libraries standards (which libraries and which versions)
+- Code organization standards
+- Concerns about the resources needed to deploy the models.
+
+It is certain that this team will have difficulties in generating business value from ML!
+
+In this activity, we will work on producing a **repository template**, defining standards that should be used in future projects. Let's assume that git is used for code versioning.
+
+
+## Accept assignment
+
+All assignment deliveries will be made using Git repositories. Access the link below to accept the invitation and start working on the first assignment.
+
+[Invitation link](https://classroom.github.com/a/Gg947NZN){ .ah-button }
+
+!!! important
+ You should have received a new private repository. Copy your repo address below. It will be used in the rest of the guide.
+
+ 
+
+!!! danger "Attention"
+ Please note that **APS 01** is divided into **two assignments**! The link to the second part will be available later in the **part 2** handout!
+
+## Configure assignment repository
+
+The supporting code for this activity is public in the repository [APS 01 MLOps](https://github.com/insper-classroom/mlops-aps01-marketing). In this guide we will configure your private repository to go along with this public repo.
+
+To get started, create a new folder for your delivery repository and initialize an empty repo:
+
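+For example (the folder name is just a suggestion):
+
+```console
+$ mkdir aps01
+$ cd aps01
+$ git init
+```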
+
+
+
+Now let's add the repository of your assignment and send the support code:
+
+!!! danger "Attention!"
+ In the next command, replace `your_private_repo_address` with the **URL** of your repository (SSH or https) created for this part of the activity.
+
+
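+A sketch of the expected commands, using the remote names `insper` and `aps` described below (replace `your_private_repo_address` with your own repository URL):
+
+```console
+$ git remote add insper https://github.com/insper-classroom/mlops-aps01-marketing
+$ git remote add aps your_private_repo_address
+```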
+
+
+With that you should already have your local repository configured and pointing to two remote repositories:
+
+- **insper**: this repo contains all support code for **aps01**. It is shared across the class and no one is allowed to push to it.
+- **aps**: this repo is yours alone and contains your work only. It will have only the modifications made by you.
+
+You can check that everything worked by running `git branch -avv`.
+
+Let's start by fetching the updates from the support repository:
+
+
+
+ ```console
+ $ git fetch insper
+ ```
+
+
+
+
+Let's then merge the updates into your local repository and push the new files to your private repo.
+
+
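+A sketch, assuming the support repository's default branch is `main` (check the output of `git branch -avv`):
+
+```console
+$ git merge insper/main
+$ git push aps main
+```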
+
+
+
+!!! Danger "Important!"
+ Remember to add your **env** folder (`mlops` in the example) to `.gitignore`
+
+## Task 01: Opening
+
+Check the content of the `aps01` repository. Install the notebook package of your preference and open the notebook.
+
+
+
+ ```console
+ $ pip install jupyter
+ ```
+
+
+
+
+You will notice that everything was done in a single notebook: data processing, analysis, model construction, etc.
+
+!!! exercise "Question"
+ Read and execute each command in the `everything.ipynb` notebook, trying to understand the function of each code created by the data scientist.
+
+!!! exercise text long "Question"
+    Explain, in general terms, what the model is predicting.
+
+ !!! answer "Answer"
+        To understand more about the data and model, access the links available at the end of the notebook.
+
+ - https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset
+ - https://www.kaggle.com/code/enesztrk/bank-credit-analysis-classification
+
+!!! exercise text long "Question"
+ Considering the `everything.ipynb` notebook, what is the **target variable** used in training the model?
+
+ !!! answer "Answer"
+ The target variable is `deposit`.
+
+There are those who defend developing production software inside notebooks. There is even the area of **NDD** ([Notebook-Driven Development](https://github.com/fastai/nbdev)). It works when done right, but let's stay away from these people and take a more classical approach!
+
+## Task 02: Organizing
+
+Now you must configure the repository according to some standards. Let's create specific folders for each type of resource used in the project.
+
+Assume that all the company's repositories should follow this organization pattern.
+
+!!! exercise "Question"
+ Let's organize the **data** resources. You must:
+
+ 1. Create a folder called `data`
+ 1. Move data files to this folder
+
+!!! exercise "Question"
+ For **notebooks**:
+
+ 1. Create a folder called `notebooks`
+ 1. Move notebook files to this folder
+
+## Task 03: Split notebook code
+
+All the code in this project is in a single notebook. We are going to split it according to the different functionalities provided.
+
+!!! exercise "Question"
+ Now you must:
+
+ 1. Create a folder called `src`
+    1. Create a file `src/process.py` with all necessary code for data preprocessing. This code can generate a separate file inside `data`.
+ 1. Create a folder called `models`
+ 1. Create a file `src/train.py` with all necessary code for model training. This code should export models to folder `models`.
+
+Leave in the notebook only code for data exploration.
+
+## Task 04: Prediction
+
+Once the training algorithm, features and hyperparameters have been chosen, the final model to be deployed can be trained with a more complete set of data (and not just `X_train`). We will ignore this fact for now!
+
+Also, when the model is in use (making predictions), the target variable is not needed or does not exist. That is, we need specific data and scripts for prediction.
+
+In this activity, consider that whenever training needs to be redone, there will be a `bank.csv` file with updated data in the `data` folder.
+
+!!! exercise "Question"
+ Let's simulate the prediction data. Now you must:
+
+ 1. Copy the `data/bank.csv` file to a new `data/bank_predict.csv` file. This new file must not have the **target** column
+ 1. Create a file `src/predict.py` with all necessary code for making predictions on file `data/bank_predict.csv`. You should use the **pickle** files of the models.
+ 1. Create a new column `y_pred` on file `data/bank_predict.csv` with the prediction of your model mapped to `"yes"` or `"no"`.
+
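+If you get stuck, here is a minimal sketch of what `src/predict.py` could look like. It assumes the encoder and model were pickled by `train.py` as `models/ohe.pkl` and `models/model.pkl`, and that the classifier outputs `1`/`0`; adapt the names to your own project.
+
+```python
+import pickle
+
+import pandas as pd
+
+# Load the artifacts exported by src/train.py (file names are assumptions)
+with open("models/ohe.pkl", "rb") as f:
+    ohe = pickle.load(f)
+with open("models/model.pkl", "rb") as f:
+    model = pickle.load(f)
+
+# Score the prediction data
+df = pd.read_csv("data/bank_predict.csv")
+X = ohe.transform(df)
+y_pred = model.predict(X)
+
+# Map 1/0 back to "yes"/"no" (skip the map if your model already outputs labels)
+df["y_pred"] = pd.Series(y_pred, index=df.index).map({1: "yes", 0: "no"})
+df.to_csv("data/bank_predict.csv", index=False)
+```
+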
+At this point, you have a repository:
+
+- With well-organized folders
+- With specific code files to train a model
+- With specific code files to use a model to make predictions
+
+## Task 05: Readme
+
+!!! exercise "Question"
+    Create a `README.md` with some basic information about the project
+
+## Task 06: Dependencies
+
+!!! exercise "Question"
+    Create a `requirements.txt` with all the libs used in the project.
+
+
+!!! exercise choice "Question"
+ Should you set lib versions?
+
+ - [X] Yes
+ - [ ] No
+
+ !!! answer "Answer"
+        In production deployment, it's a good idea to pin dependency versions to maintain stability and reliability. Besides that, in some companies you will run your code on a cluster (e.g., Spark) where all data scientists and machine learning engineers must use the same library versions.
+
+!!! info "Important!"
+ From now on, we will develop part 02 of APS 01!
+
+## Release APS01 Part 1!
+
+It looks like you have completed the activities for the first part of the APS, so it's time to do the release!
+
+In this APS, we will use an automatic correction server.
+
+### Webhook configuration
+
+Go to the activities repository on GitHub and access the settings (e.g., https://github.com/insper-classroom/24-2-mlops-aps01-pedrods/settings). In the left menu, choose the **Webhooks** option and then the **Add webhook** option.
+
+
+
+You will need to fill in:
+
+- Payload URL: `http://xxxx.com/yyy` **Go to Blackboard to get the URL!**
+- Content type: `application/json`
+- Secret: leave it empty!
+- SSL verification: check `Enable SSL verification`
+- Which events would you like to trigger this webhook?: Choose `"Let me select individual events"` and then:
+ - Check ONLY the OPTION:
+ - `Branch or tag creation`
+ - Uncheck the OPTION:
+ - `Pushes`
+- Finally, leave the `Active` option checked.
+
+
+[...]
+
+
+With this, your repository can now be tested automatically!
+
+### Test Release
+
+With the repository cloned on your machine, open the terminal and create and push any tag.
+
+!!! info "Info!"
+ We will (intentionally) launch a tag for a non-existent activity!
+
+Now, open the terminal in the root of the repository and type the following commands:
+
+
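+A sketch, assuming your private remote is named `aps`; the tag name below is arbitrary and refers to a non-existent activity on purpose:
+
+```console
+$ git tag -a test0.0.1 -m "testing the autograding webhook"
+$ git push aps test0.0.1
+```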
+
+
+!!! danger "Attention!"
+ Make sure you are sending the tag to the correct remote.
+
+ === "aps"
+
+ If `git branch -avv` returns something similar to:
+
+
+
+
+### Go to your repository
+Access the issues tab of your repository on GitHub. You should find a response from the test, informing you that the activity does not exist!
+
+
+
+Click on the issue to see an example of automatic feedback.
+
+
+
+!!! info "Info!"
+ The creation of the issue indicates that our tag creations are triggering the test server!
+
+### Now, for real!
+
+!!! tip "Tip!"
+ If you need to create a new tag, increase the last number:
+
+ - `aps1.1.2`
+ - `aps1.1.3`
+
+
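+A sketch, again assuming your private remote is named `aps` (start with `aps1.1.1` and increase the last number if needed):
+
+```console
+$ git tag -a aps1.1.1 -m "release aps01 part 1 for autograding"
+$ git push aps aps1.1.1
+```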
+
+
+!!! exercise "To do!"
+ Check the repository for issues!
+
+### Update `README.md`
+
+Now we will update the `README.md` to show the current status of the tests in your repository. Edit your `README.md` and add an API call at the beginning, providing your GitHub username.
+
+!!! danger "Attention!"
+ Access Blackboard to find the API arguments!
+
+```console
+## Test status
+
+
+```
+
+An Example:
+
+
diff --git a/content/classes/01-intro/aps01_part_2.md b/content/classes/01-intro/aps01_part_2.md
new file mode 100644
index 0000000..7155925
--- /dev/null
+++ b/content/classes/01-intro/aps01_part_2.md
@@ -0,0 +1,123 @@
+# Standards - Aps01 - Part 2
+
+Now that we've defined a repository standard, it would be nice to reuse it in new projects.
+
+For that we will use `cookiecutter` to define a **template repository**. Then, when a new ML project is started, we will just use our template to start it.
+
+## Accept assignment
+
+In order to do this, you will need to [**Accept the part 2 of the assignment**](https://classroom.github.com/a/fNFOMYFt). We will use this repository as our ML repository template for new projects.
+
+!!! danger "Attention"
+ Please note that **APS 01** is divided into **two assignments**!
+
+## Task 01: Create a template
+
+Then, clone the repository on your machine and create a folder structure similar to:
+
+
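+A sketch of one possible layout, inferred from the tasks below (names are illustrative):
+
+```
+your-template-repo/
+    cookiecutter.json
+    README.md
+    { {cookiecutter.directory_name} }/
+        .gitignore
+        README.md
+        data/
+            .gitkeep
+        models/
+            .gitkeep
+        notebooks/
+            .gitkeep
+        src/
+            .gitkeep
+```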
+
+!!! info "Info!"
+    The `.gitkeep` files are empty files created so that empty folders can be kept in the template
+
+!!! exercise "Question"
+    Create a `README.md` (root directory) with some basic information about the template repository
+
+!!! exercise "Question"
+ Create the `cookiecutter.json` with the content:
+
+ ```json
+ {
+ "directory_name": "project-name",
+ "author_name": "Your Name",
+ "compatible_python_versions": "^3.8"
+ }
+ ```
+
+!!! exercise "Question"
+ Create the `.gitignore` inside `{ {cookiecutter.directory_name} }` with the files to be ignored by default in future projects.
+
+!!! exercise "Question"
+ Create the `README.md` inside `{ {cookiecutter.directory_name} }` with the default **README** for future projects. Be creative!
+
+ 
+
+ Think about the useful information that is expected to be in the **README** of any and every project!
+
+!!! exercise "Question"
+ Create basic python files in `{ {cookiecutter.directory_name} }/src` folder.
+
+ These files would be generic files, extensively edited by developers who will use your template in many different projects. Just give a basic idea of the pattern you expect data scientists to follow when doing preprocessing or model training.
+
+!!! exercise "Question"
+ You can also leave some notebooks with basic code for exploratory data analysis.
+
+!!! exercise "Question"
+ Commit and push your changes to Github!
+
+
+## Task 02: Testing your template
+
+Install `cookiecutter`:
+
+
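+A sketch of the expected steps (replace the URL with the address of your own template repository):
+
+```console
+$ pip install cookiecutter
+$ cookiecutter https://github.com/your-github-user/your-template-repo
+```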
+
+Done! It should create the folder and file structure defined in the template.
+
+!!! danger "Remember!"
+ Delivering the assignment is the same as pushing to the `main` branch of your private repository of `aps01`!
+
+## Release APS01 Part 2!
+
+!!! exercise
+ In the repository created for part 2 of the APS, configure the webhook the same way as in part 1.
+
+
+!!! exercise
+    In the repository for part 2, submit the activity by creating an `aps1.2.x` tag, replacing `x` with any number.
+
+
+
+ ```console
+ $ git tag -a aps1.2.1 -m "release my template for autograding"
+
+ $ git push origin aps1.2.1
+ ```
+
+
+
+
+!!! exercise
+    Check the repository for issues and solve them!
+
+!!! exercise
+ Check the test status API.
+
+ Feel the Dopamine Release as you check that both APS1 activities are **green** and passing the tests!
+
+ 
+
+!!! info "Important!"
+    Each part of APS1 is configured in an all-or-nothing format. Each part contributes `5` to the grade (only if it passes all tests for that part). Therefore, the possible grades are `0`, `5` (only one part completed), or `10`.
+
+## References
+- https://cookiecutter.readthedocs.io/en/1.7.2/README.html
\ No newline at end of file
diff --git a/content/classes/01-intro/conf_webhook_1.png b/content/classes/01-intro/conf_webhook_1.png
new file mode 100644
index 0000000..56b70d5
Binary files /dev/null and b/content/classes/01-intro/conf_webhook_1.png differ
diff --git a/content/classes/01-intro/conf_webhook_2.png b/content/classes/01-intro/conf_webhook_2.png
new file mode 100644
index 0000000..4e390d5
Binary files /dev/null and b/content/classes/01-intro/conf_webhook_2.png differ
diff --git a/content/classes/01-intro/intro.md b/content/classes/01-intro/intro.md
new file mode 100644
index 0000000..508f2f4
--- /dev/null
+++ b/content/classes/01-intro/intro.md
@@ -0,0 +1,67 @@
+# Introduction
+
+## What is MLOps?
+
+**MLOps** refers to **Machine Learning Operations**, which is a core component of Machine Learning engineering. MLOps focuses on optimizing the process of deploying machine learning models into a production environment as well as sustaining and tracking those models once live.
+
+The goal of MLOps is to automate and standardize machine learning workflows to improve efficiency, productivity, and model performance monitoring. To achieve that, it demands collaboration between data scientists, machine learning engineers, IT teams, etc.
+
+In essence:
+
+!!! note "MLOps is all about..."
+ Taking Machine Learning Models to **production**!
+
+Let's rewrite that in a non-technical way!
+
+!!! example "MLOps is all about..."
+    Providing ways for ML to generate **value for the business** on an ongoing basis!
+
+## The Machine Learning Life Cycle
+
+!!! exercise choice "Question"
+ ML projects only involve data processing and model training.
+
+ - [ ] True
+ - [X] False
+
+ !!! answer "Answer"
+        It is normal to think that ML projects only involve data processing and model training, but there are other relevant concerns to take into consideration!
+
+Let's take a look at the ML life cycle!
+
+
+
+The machine learning life cycle begins with **planning**. At this initial stage, objectives are defined to identify the specific problem or task that machine learning could help solve. This involves understanding the business or project goals and what type of insights or predictions are needed from deployed models. Careful planning at the start helps set clear expectations, avoiding wasted effort on projects with inadequate data or unrealistic goals.
+
+It is also important to do **data collection and preprocessing**. This involves gathering large amounts of data from various sources such as databases, websites, APIs, and sensor readings. The data needs to be properly formatted, cleaned of any anomalies or inconsistencies, and have missing values imputed. Features also need to be extracted or engineered from the raw data. This processed data is then used to build machine learning models.
+
+Once the data is prepared, the next step is to divide it into training and test sets. The training set is used to **develop machine learning models** by exposing various algorithms to patterns in the labeled data. Popular algorithms such as decision trees, random forests, support vector machines, neural networks, etc. are applied to automatically learn from the training examples. Their parameters are tuned through iterative training to minimize error and optimize performance. The test set is used to evaluate how well the trained models generalize to new unseen data. The top performing models are selected for deployment.
+
+After a model has been selected, it needs to be **deployed** as a machine learning application or service. It can be integrated into existing software systems and utilized to make predictions on live data streams in a **production environment**. Once in operation, its performance also needs to be continuously **monitored** for accuracy and drifts over time. The data collection and model development cycles may need to be revisited to keep improving the capabilities of the learning systems. This marks the completion of one iteration of the machine learning life cycle.
+
+Take a look at a more realistic picture of the ML life cycle inside an average organization today:
+
+
+
+It involves many different people with completely different skill sets and who are often using entirely different tools.
+
+!!! danger ""
+ In this course, we will focus less on model development, assuming that this is already covered in other Insper courses, and **more** on **model deploying and monitoring**.
+
+## References
+
+- Image The Machine Learning Life Cycle: https://images.datacamp.com/image/upload/v1664812812/Machine_Learning_Lifecycle_2ffa5897a7.png
+- Introducing MLOps. Chapter 1.
+- Practical MLOps. Chapter 1.
+- POE and ChatGPT.
+
+
\ No newline at end of file
diff --git a/content/classes/01-intro/issue_test.png b/content/classes/01-intro/issue_test.png
new file mode 100644
index 0000000..9698e8a
Binary files /dev/null and b/content/classes/01-intro/issue_test.png differ
diff --git a/content/classes/01-intro/issues_list.png b/content/classes/01-intro/issues_list.png
new file mode 100644
index 0000000..8916a46
Binary files /dev/null and b/content/classes/01-intro/issues_list.png differ
diff --git a/content/classes/01-intro/ml_lifecycle.png b/content/classes/01-intro/ml_lifecycle.png
new file mode 100644
index 0000000..f6cb029
Binary files /dev/null and b/content/classes/01-intro/ml_lifecycle.png differ
diff --git a/content/classes/01-intro/readme_ex.png b/content/classes/01-intro/readme_ex.png
new file mode 100644
index 0000000..a4cdf14
Binary files /dev/null and b/content/classes/01-intro/readme_ex.png differ
diff --git a/content/classes/01-intro/realistic_lifecycle.png b/content/classes/01-intro/realistic_lifecycle.png
new file mode 100644
index 0000000..703b9be
Binary files /dev/null and b/content/classes/01-intro/realistic_lifecycle.png differ
diff --git a/content/classes/01-intro/repo_ex.png b/content/classes/01-intro/repo_ex.png
new file mode 100644
index 0000000..dbd8f6f
Binary files /dev/null and b/content/classes/01-intro/repo_ex.png differ
diff --git a/content/classes/01-intro/template_folder.png b/content/classes/01-intro/template_folder.png
new file mode 100644
index 0000000..70919b2
Binary files /dev/null and b/content/classes/01-intro/template_folder.png differ
diff --git a/content/classes/01-intro/test_status.svg b/content/classes/01-intro/test_status.svg
new file mode 100644
index 0000000..5f69205
--- /dev/null
+++ b/content/classes/01-intro/test_status.svg
@@ -0,0 +1,23 @@
+
\ No newline at end of file
diff --git a/content/classes/01-intro/test_status_pass.svg b/content/classes/01-intro/test_status_pass.svg
new file mode 100644
index 0000000..dd9d0f6
--- /dev/null
+++ b/content/classes/01-intro/test_status_pass.svg
@@ -0,0 +1,23 @@
+
\ No newline at end of file
diff --git a/content/classes/02-api/api.png b/content/classes/02-api/api.png
new file mode 100644
index 0000000..f2b97ff
Binary files /dev/null and b/content/classes/02-api/api.png differ
diff --git a/content/classes/02-api/api_deploy.md b/content/classes/02-api/api_deploy.md
new file mode 100644
index 0000000..a4c78fa
--- /dev/null
+++ b/content/classes/02-api/api_deploy.md
@@ -0,0 +1,503 @@
+# Model Deployment
+
+## Categories of Model Deployment
+
+A core choice you'll need to make, one that will impact both your customers and the engineers building your solution, is how the system **computes** and **provides its forecasts** to consumers: **online** or in **batches**.
+
+
+
+**Online** prediction is when predictions are generated and returned as soon as *requests* for these predictions are received by the service. This baseline choice between synchronous and asynchronous predictions will shape many subsequent design decisions.
+
+The main advantage of **online** prediction is that it makes it easier to provide a **real-time user experience**. Suppose you deployed an AI model for customer claims and that the model makes predictions for all customers overnight. During the day, things like messages sent to the call center (indicating that the customer is dissatisfied) or new orders placed (perhaps indicating that the customer is satisfied) may happen. Making the prediction closer to when it is needed makes it possible to use **newer information** and perhaps return a more reliable and **valuable forecast**.
+
+When online prediction is the choice to deploy a model, it is generally made available to other applications through **API calls**. In this handout, we are going to build an API to make predictions using the model from the last class.
+
+## When is this decision made?
+
+Remember the ML lifecycle from the last class:
+
+
+
+!!! exercise choice "Question"
+ In which of these phases should the decision to deploy in **batch** or **online** be made?
+
+ - [X] Plan
+ - [ ] Operations
+ - [ ] Model Deployment
+ - [ ] Data preparation
+ - [ ] Model Evaluation
+
+ !!! answer "Answer!"
+ This decision depends on how the model will be used. It is usually possible to have a vision of this during the planning phase. It's something that can be rethought and changed, but generally knowing the company's problem (and the target variable to be predicted) already gives us an idea which style of deployment will generate more value for the business.
+
+## What are APIs?
+
+APIs (Application Programming Interfaces) allow developers to access data and services. They enable platforms, applications and systems to connect and interact with each other.
+
+
+
+You can use APIs to:
+
+- Transcribe audio using Google API
+- Make an app that interacts with ChatGPT
+- Let an app send data to your ML models to make and return predictions
+- And so forth!
+
+Some useful links:
+
+- https://www.redhat.com/en/topics/api/what-are-application-programming-interfaces
+- https://aws.amazon.com/pt/what-is/api/
+
+## Build an API
+
+To construct our API, we are going to use **FastAPI**. Follow the handout steps and also make use of the official tutorial available at [FastAPI](https://fastapi.tiangolo.com/tutorial/).
+
+!!! info "Tip!"
+ Create a repository (public or private) in your own github account to store your API.
+
+ It is not necessary to submit the activity for this class.
+
+!!! danger ""
+ Use the environment (**conda** or **venv**) from the last class or create a new one for this class!
+
+### Install libs
+
+Let's install the necessary libraries:
+
+
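+A likely set of packages for this handout (`uvicorn` is one common server for running FastAPI apps; `pandas` and `scikit-learn` will be needed later to load and use the model):
+
+```console
+$ pip install fastapi uvicorn pandas scikit-learn
+```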
+
+### A simple API
+
+Copy and paste this code in the `src/main.py` file:
+
+```python
+from fastapi import FastAPI
+
+app = FastAPI()
+
+@app.get("/")
+async def root():
+ return "Model API is alive!"
+
+```
+
+Inside `src`, start the API with the command:
+
+
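+One common way is with `uvicorn` (an assumption; any ASGI server works, and the port matches the URLs used below):
+
+```console
+$ uvicorn main:app --reload --port 8900
+```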
+
+
+To test, go to **http://localhost:8900** in your browser!
+
+One of the wonders of FastAPI is the automatically generated documentation. Go to **http://localhost:8900/docs** in your browser. You will see something like:
+
+
+
+Click on **"Try it out"**!
+
+## An API that makes predictions
+
+In the root folder of today's class, create a new folder called `models`:
+
+
+
+ ```console
+ $ mkdir models
+ ```
+
+
+
+We are going to store in this folder the pickled models trained in the last class.
+
+!!! exercise
+ Copy the `ohe.pkl` and `model.pkl` files from the last class activity to the models folder of today's class.
+
+Now, your folder should have the following structure:
+
+
+
+Then, create the `src/model.py` file. In addition to importing the necessary libraries, this file must have two functions that open (and return) the models contained in the `ohe.pkl` and `model.pkl` files.
+
+```python
+def load_model():
+ # Your code here
+ pass
+
+
+def load_encoder():
+ # Your code here
+ pass
+```
+
+!!! danger ""
+ These functions will be imported and used in `main.py`!
+
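+One possible implementation, assuming the pickles live in the `models` folder at the project root and the API is started from inside `src` (adjust the paths to your layout):
+
+```python
+import pickle
+
+
+def load_model():
+    # Load the trained classifier saved in the last class
+    with open("../models/model.pkl", "rb") as f:
+        return pickle.load(f)
+
+
+def load_encoder():
+    # Load the fitted one-hot encoder saved in the last class
+    with open("../models/ohe.pkl", "rb") as f:
+        return pickle.load(f)
+```
+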
+Here is a new (and incomplete) version of `main.py`:
+
+```python
+from fastapi import FastAPI
+
+# loader functions that you programmed!
+from model import load_model, load_encoder
+
+
+app = FastAPI()
+
+
+@app.get("/")
+async def root():
+ """
+ Route to check that API is alive!
+ """
+ return "Model API is alive!"
+
+
+@app.post("/predict")
+async def predict():
+ """
+ Route to make predictions!
+ """
+ # Load the models
+ ohe = load_encoder()
+ model = load_model()
+
+ return {"prediction": "I can almost make predictions!"}
+```
+
+!!! exercise "Question"
+    Update your `main.py` with the code above and implement the `load_model` and `load_encoder` functions in `src/model.py`.
+
+To make predictions, the `predict` route needs to receive information about the client (**person**). When analyzing a row of the `X` table from the last class (before applying the encoder), an example of the necessary features using *JSON* format would be:
+
+```json
+{
+ "age": 42,
+ "job": "entrepreneur",
+ "marital": "married",
+ "education": "primary",
+ "balance": 558,
+ "housing": "yes",
+ "duration": 186,
+ "campaign": 2
+}
+```
+
+Let's represent the person/customer information using a class identified as **"Person"**. Here's an example with the first two fields:
+
+```python
+from pydantic import BaseModel
+
+class Person(BaseModel):
+ age: int
+ job: str
+```
+
+Now we can update the `predict` route to receive a person's information!
+
+```python
+@app.post("/predict")
+async def predict(person: Person):
+ """
+ Route to make predictions!
+ """
+ ohe = load_encoder()
+ model = load_model()
+
+ df_person = pd.DataFrame([person.dict()])
+
+ person_t = ohe.transform(df_person)
+ pred = model.predict(person_t)[0]
+
+ return {"prediction": str(pred)}
+```
+
+!!! exercise
+ Complete the remaining fields of the **Person** class in the `main.py` file. Remember to import BaseModel at the beginning of `main.py`!
+
+Return to **http://localhost:8900/docs** in your browser and test the `predict` route, adding the JSON content you saw earlier!
+
+
+
+!!! help "Tip!"
+    Scaling **online** ML systems is conceptually simple since the records to be scored can be distributed across several machines using a load balancer. But this is a problem for another day!
+
+### Improve route with example!
+
+Let's add an example to the code so that the documentation is already pre-populated with an example, making it easier for the user to test the route.
+
+```python
+from typing import Annotated
+from fastapi import FastAPI, Body
+
+@app.post("/predict")
+async def predict(
+ person: Annotated[
+ Person,
+ Body(
+ examples=[
+ {
+ "age": 42,
+ "job": "entrepreneur",
+ "marital": "married",
+ "education": "primary",
+ "balance": 558,
+ "housing": "yes",
+ "duration": 186,
+ "campaign": 2,
+ }
+ ],
+ ),
+ ],
+):
+ """
+ Route to make predictions!
+ """
+ ohe = load_encoder()
+ model = load_model()
+
+ person_t = ohe.transform(pd.DataFrame([person.dict()]))
+ pred = model.predict(person_t)[0]
+
+ return {"prediction": str(pred)}
+```
+
+### Call API from Python!
+
+If another application needs access to the API, it can simply make a request.
+
+See an example using route `/` (check if it is alive):
+
+```python
+import requests as req
+
+print(req.get("http://localhost:8900/").text)
+```
+
+And for the `predict` route:
+
+```python
+import requests as req
+
+data = {
+ "age": 42,
+ "job": "entrepreneur",
+ "marital": "married",
+ "education": "primary",
+ "balance": 558,
+ "housing": "yes",
+ "duration": 186,
+ "campaign": 2,
+}
+
+resp = req.post("http://localhost:8900/predict", json=data)
+print(f"Status code: {resp.status_code}")
+print(f"Response: {resp.text}")
+```
+
+### Add Authentication
+
+Without proper authentication, APIs would be vulnerable to unnecessary access attempts and even malicious attacks from unauthorized parties.
+
+For simplicity, let's assume there is only one valid token (`"abc123"`) as the full implementation of authentication would need database access and caching for performance.
+
+Let's add a **dependency** to the `predict` route. When the route is called, the function that resolves the dependency will extract the token from the header and check if it is valid:
+
+The function and the route (I removed the example for simplicity):
+
+```python
+def get_username_for_token(token):
+ if token == "abc123":
+ return "pedro1"
+ return None
+
+async def validate_token(credentials: HTTPAuthorizationCredentials = Depends(bearer)):
+ token = credentials.credentials
+
+ username = get_username_for_token(token)
+ if not username:
+ raise HTTPException(status_code=401, detail="Invalid token")
+
+ return {"username": username}
+
+@app.post("/predict")
+async def predict(person: Person,
+ user=Depends(validate_token)
+ ):
+    # Code suppressed
+ pass
+```
+
+The full code is:
+```python
+from fastapi import FastAPI, HTTPException, Depends, Body
+from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
+from pydantic import BaseModel
+from typing import Annotated
+from model import load_model, load_encoder
+import pandas as pd
+
+app = FastAPI()
+
+bearer = HTTPBearer()
+
+def get_username_for_token(token):
+ if token == "abc123":
+ return "pedro1"
+ return None
+
+
+async def validate_token(credentials: HTTPAuthorizationCredentials = Depends(bearer)):
+ token = credentials.credentials
+
+ username = get_username_for_token(token)
+ if not username:
+ raise HTTPException(status_code=401, detail="Invalid token")
+
+ return {"username": username}
+
+class Person(BaseModel):
+ age: int
+ job: str
+ marital: str
+ education: str
+ balance: int
+ housing: str
+ duration: int
+ campaign: int
+
+@app.get("/")
+async def root():
+ return "Model API is alive!"
+
+@app.post("/predict")
+async def predict(
+ person: Annotated[
+ Person,
+ Body(
+ examples=[
+ {
+ "age": 42,
+ "job": "entrepreneur",
+ "marital": "married",
+ "education": "primary",
+ "balance": 558,
+ "housing": "yes",
+ "duration": 186,
+ "campaign": 2,
+ }
+ ],
+ ),
+ ],
+ user=Depends(validate_token),
+):
+ ohe = load_encoder()
+ model = load_model()
+
+ person_t = ohe.transform(pd.DataFrame([person.dict()]))
+ pred = model.predict(person_t)[0]
+
+ return {
+ "prediction": str(pred),
+ "username": user["username"]
+ }
+
+```
+
+#### Python
+
+Call the API using Bearer Token Authentication from Python:
+
+```python
+import requests as req
+import time
+
+token = "abc123"
+
+headers = {"Authorization": f"Bearer {token}"}
+
+data = {
+ "age": 42,
+ "job": "entrepreneur",
+ "marital": "married",
+ "education": "primary",
+ "balance": 558,
+ "housing": "yes",
+ "duration": 186,
+ "campaign": 2,
+}
+
+resp = req.post("http://localhost:8900/predict",
+ json=data,
+ headers=headers)
+
+print(resp.status_code)
+print(resp.text)
+```
+
+!!! exercise
+ Edit and run the code above. Try a **valid** and an **invalid** token!
+
+### Loading Models at Startup
+
+A performance issue with AI APIs is the time required to load models. Notice that, the way we did it, the models are loaded every time the `predict` route is called.
+
+We can configure the application so that the models are loaded when the API starts:
+
+```python
+from contextlib import asynccontextmanager
+
+ml_models = {}
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+ ml_models["ohe"] = load_encoder()
+ ml_models["models"] = load_model()
+ yield
+ ml_models.clear()
+
+
+app = FastAPI(lifespan=lifespan)
+```
+
+So, the predict route would now have:
+```python
+ ohe = ml_models["ohe"]
+ model = ml_models["models"]
+```
+
+Rather than:
+```python
+ ohe = load_encoder()
+ model = load_model()
+```
+
+Especially for larger models, this can represent a good performance improvement.
+
+That is all for today!
+
+## References
+- Image: https://www.redhat.com/rhdc/managed-files/styles/wysiwyg_full_width/private/API-page-graphic.png?itok=RRsvST-
+- Introducing MLOps. Chapter 6.
+- Designing Machine Learning Systems. Chapter 7.
+
+
diff --git a/content/classes/02-api/api_docs.png b/content/classes/02-api/api_docs.png
new file mode 100644
index 0000000..e0cbdee
Binary files /dev/null and b/content/classes/02-api/api_docs.png differ
diff --git a/content/classes/02-api/folders_v1.png b/content/classes/02-api/folders_v1.png
new file mode 100644
index 0000000..67056a0
Binary files /dev/null and b/content/classes/02-api/folders_v1.png differ
diff --git a/content/classes/02-api/online_vs_batch.png b/content/classes/02-api/online_vs_batch.png
new file mode 100644
index 0000000..f43d94b
Binary files /dev/null and b/content/classes/02-api/online_vs_batch.png differ
diff --git a/content/classes/02-api/try_predict.png b/content/classes/02-api/try_predict.png
new file mode 100644
index 0000000..ef94beb
Binary files /dev/null and b/content/classes/02-api/try_predict.png differ
diff --git a/content/classes/03-batch/api_predict.png b/content/classes/03-batch/api_predict.png
new file mode 100644
index 0000000..378bf62
Binary files /dev/null and b/content/classes/03-batch/api_predict.png differ
diff --git a/content/classes/03-batch/aps02_sql.md b/content/classes/03-batch/aps02_sql.md
new file mode 100644
index 0000000..5656756
--- /dev/null
+++ b/content/classes/03-batch/aps02_sql.md
@@ -0,0 +1,147 @@
+# APS 02
+
+In this assignment, we are going to create a new version of the work from the [last class (Click!)](practicing.md).
+
+!!! info "What will change?!"
+ All data will be read and written from PostgreSQL.
+
+## Before starting
+
+### Accept assignment
+
+All assignment deliveries will be made using Git repositories. Access the link below to accept the invitation and start working on the second assignment.
+
+[Invitation link](https://classroom.github.com/a/HjoSiRHT){ .ah-button }
+
+## Configure assignment repository
+
+You must ensure that the repository has the required folder structure. Create it by hand or use your template, and then link it to the assignment repository.
+
+
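+For example, you could generate the structure from your part-2 template and then point it at the assignment repository (URLs and names below are placeholders):
+
+```console
+$ cookiecutter https://github.com/your-github-user/your-template-repo
+$ cd project-name
+$ git init
+$ git remote add origin your_assignment_repo_address
+```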
+
+## TASK 0: Create analytical table
+
+In DBeaver, try to create a *SQL query* that performs the necessary transformations on the data of the `item_sale` table, so that it is grouped and has the fields needed for *model training*, as in the last class.
+
+| | store_id | total_sales | year | month | day | weekday |
+|-----:|-----------:|--------------:|-------:|--------:|------:|----------:|
+| 0 | 5000 | 62895.6 | 2023 | 1 | 1 | 6 |
+| 1 | 5000 | 42351.1 | 2023 | 1 | 2 | 0 |
+...
+| 1636 | 5005 | 46246.3 | 2023 | 9 | 29 | 4 |
+
+!!! tip "Tip!"
+ Start with a simple query like:
+
+ ```sql
+ SELECT
+ store_id,
+ client_id,
+ product_id,
+ date_sale,
+ price
+ FROM sales.item_sale;
+ ```
+
+ Then make the necessary adjustments.
+
+!!! tip "Tip!"
+ At first, use **fixed dates**, for example, assuming your model will use data from `2023-06-01` to `2023-06-30`.
+
+!!! attention
+ Notice that the query must do all necessary transformations on the data.
+
+!!! exercise "Question"
+    The current day (today) will not have reliable sales data, as sales are still being made by customers. Adjust your query so that the data is:
+
+    - From the day before today
+    - Up to a `delta` in the past. Choose your `delta`, for example one year.
+
+!!! danger "Important!"
+    Now "today" cannot be hardcoded in the query anymore. Google for `CURRENT_DATE` + Postgres and make the necessary adjustments!
+
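+For example, in PostgreSQL the date window can be anchored on the current date (a sketch; adapt the interval to your chosen `delta`):
+
+```sql
+SELECT
+    store_id,
+    date_sale,
+    price
+FROM sales.item_sale
+WHERE date_sale < CURRENT_DATE
+  AND date_sale >= CURRENT_DATE - INTERVAL '1 year';
+```
+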
+## TASK 1: Query file for analytical table
+
+We will no longer have *CSV* in the `data` folder!
+
+!!! exercise "Question"
+ Create an `.env` file with the database access credentials.
+
+
+!!! exercise "Question"
+ Create a `data/train.sql` file containing the query from the previous task.
+
+!!! exercise "Question"
+ Change your program so that:
+
+ - The `data/train.sql` file is read as text.
+    - Database credentials are loaded from the `.env` file.
+ - Text query is executed to return a Pandas DataFrame in the model retraining step.
+
+!!! info
+ Search how to:
+
+ - Create a PostgreSQL database connection in Python.
+ - How to read a Pandas DataFrame from a query and connection.
+
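+A minimal sketch of this flow, assuming `sqlalchemy`, `psycopg2-binary` and `python-dotenv` are installed and the `.env` follows the variable names from the handout:
+
+```python
+import os
+
+import pandas as pd
+from dotenv import load_dotenv
+from sqlalchemy import create_engine
+
+# Load credentials from the .env file into environment variables
+load_dotenv()
+
+engine = create_engine(
+    f"postgresql://{os.getenv('DB_USERNAME')}:{os.getenv('DB_PASSWORD')}"
+    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_DATABASE')}"
+)
+
+# Read the query text and run it, returning a DataFrame for retraining
+with open("data/train.sql") as f:
+    query = f.read()
+
+df = pd.read_sql(query, engine)
+```
+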
+!!! warning "For now..."
+ For now, keep saving and reading the model in the `models` folder!
+
+!!! exercise text long "Question"
+ Why should we avoid using `*` in production queries?
+
+ Explain why making queries like this one is a bad practice:
+
+ ```sql
+    SELECT * FROM some_table
+ ```
+
+## TASK 2: Exporting predictions
+
+The way the data predicted by the model is saved depends a lot on how the model is used.
+
+In this activity, let's assume that:
+
+- The model always makes predictions from the current day to the next six days.
+- The predictions are stored in another `schema` called `sales_analytics`.
+- Old predictions are not stored. Whenever the prediction script is run, the table that stores predictions should be cleared, and only the predictions from today through the next week should be kept in the table.
+
+!!! bug "Challenge"
+ Try to create a query that generates a table with all the days and fields needed for prediction:
+
+ 
+
+    Then, save these lines in a new table `"scoring_ml_YOUR_INSPER_USERNAME"` on schema `sales_analytics`. In Python, iterate over these records, calling your model and storing its predictions!
+
+ !!! danger ""
+ Remember to delete old records from this table every time you make predictions.
+
+ !!! info "Important!"
+ When running the prediction, you'll need the data in the same format as the training.
+
+ | | store_id | year | month | day | weekday |
+ |---:|-----------:|-------:|--------:|------:|----------:|
+ | 0 | 5000 | 2023 | 8 | 2 | 2 |
+ | 0 | 5000 | 2023 | 8 | 3 | 3 |
+ | 0 | 5001 | 2023 | 8 | 2 | 2 |
+ | 0 | 5001 | 2023 | 8 | 3 | 3 |
+
+ You can build the previous table already in the expected format (with columns of day, month, year, weekday) or just leave it with the complete date and make transformations (using Python or SQL) just before making the predictions.
+
+!!! exercise "Question"
+ Change your prediction code to meet the requested requirements.
+
+
+## TASK 3: A new View!
+
+Let's pass the code from `data/train.sql` to a view on the database.
+
+!!! exercise "Question"
+    Create a view with the contents of `data/train.sql`. The view can be called `view_abt_train_YOUR_INSPER_USERNAME` and be on schema `sales_analytics`.
+
+!!! exercise "Question"
+ Change the query in `data/train.sql` to query the newly created view.
+
+ DO NOT use `"SELECT *"`!!!
+
+Submit the activity by the [**deadline**](../../deadlines.md)!
\ No newline at end of file
diff --git a/content/classes/03-batch/csv.png b/content/classes/03-batch/csv.png
new file mode 100644
index 0000000..cba13cb
Binary files /dev/null and b/content/classes/03-batch/csv.png differ
diff --git a/content/classes/03-batch/data_formats.md b/content/classes/03-batch/data_formats.md
new file mode 100644
index 0000000..84b7c25
--- /dev/null
+++ b/content/classes/03-batch/data_formats.md
@@ -0,0 +1,264 @@
+# Data Formats
+
+## JSON
+
+**JSON** (*JavaScript Object Notation*) is an open standard and language-independent file format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. For example:
+
+```json
+[
+ {
+ "age": 42,
+ "job": "entrepreneur",
+ "marital": "married",
+ "education": "primary",
+ "balance": 558,
+ "housing": "yes",
+ "duration": 186,
+ "campaign": 2
+ },
+ {
+ "age": 35,
+ "job": "teacher",
+ "marital": "single",
+ "education": "master's",
+ "balance": 1200,
+ "housing": "no",
+ "duration": 95,
+ "campaign": 1
+ },
+ {
+ "age": 28,
+ "job": "engineer",
+ "marital": "single",
+ "education": "bachelor's",
+ "balance": 3000,
+ "housing": "yes",
+ "duration": 240,
+ "campaign": 3
+ }
+]
+```
+
+## CSV
+
+CSV (*comma*-separated values) and TSV (*tab*-separated values) files are common types of plain text data files that are used to store **tabular data**. They provide a simple, lightweight and human readable format for exporting data.
+
+The data in CSV and TSV files is organized into records or rows that are separated by newline characters. Each row contains one or more fields or values that are separated by delimiter characters.
+
+
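+The same records from the JSON example above, written as CSV:
+
+```
+age,job,marital,education,balance,housing,duration,campaign
+42,entrepreneur,married,primary,558,yes,186,2
+35,teacher,single,master's,1200,no,95,1
+28,engineer,single,bachelor's,3000,yes,240,3
+```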
+
+Many programming languages and software tools (like *pandas*) provide functionality for easily reading and writing CSV and TSV files for routine data import/export tasks without the complexity of a full database.
+
+!!! tip "Tip!"
+    To highlight CSV and TSV files in *VS Code*, install the [**Rainbow CSV**](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) extension!
+
+While CSV (Comma-Separated Values) is a commonly used data format, it may have some disadvantages when used in machine learning scenarios:
+
+- **Lack of Standardized Schema**: CSV files do not have a standardized schema definition. Each CSV file represents data as rows and columns, but the interpretation of the columns and their data types is left to the user or application. Without a standardized schema, it becomes the responsibility of the Data Scientist to define and enforce the schema, which can lead to inconsistencies and data quality issues.
+
+- **Limited Data Type Support**: CSV only supports a few basic data types, such as strings and numbers. More complex data types commonly used in ML, such as datetime, categorical variables, or nested structures, require additional processing and transformation to fit into the CSV format. This can increase the complexity of data preprocessing and introduce potential errors or loss of information during conversion.
+
+- **Lack of Data Compression**: CSV files do not provide built-in data compression. As a result, they can occupy a lot of disk space.
+
+!!! Exercise text short "Question"
+
+ In ML, where large datasets are common, give some examples where increased storage requirements can become a concern.
+
+ !!! answer "Answer"
+ Big data or distributed systems.
+
+- **String Encoding Issues**: CSV files can encounter encoding issues, especially when dealing with non-ASCII characters or different encoding standards. Proper handling of string encoding is necessary to ensure data integrity in ML workflows.
+
+- **Limited Support for Missing or Null Values**: CSV does not have a standardized representation for missing or null values. Different applications or tools may handle missing values differently, leading to inconsistencies in data interpretation.
+
+- **Performance Overhead**: CSV files can have performance overhead, especially when dealing with large datasets. Parsing and processing CSV files can be slower compared to more optimized binary formats specifically designed for ML, such as Parquet or HDF5. The textual representation and parsing of CSV files can impact data loading and processing times, particularly in scenarios involving real-time or high-throughput ML.
+
+- **Lack of Metadata Support**: CSV format does not provide native support for metadata. In ML, metadata such as feature descriptions, data provenance, or annotations can be crucial for understanding and interpreting the data.
+
+## Parquet
+
+[Parquet](https://parquet.apache.org/docs/file-format/) files are a **columnar storage** file format designed for efficient data storage and processing in big data and machine learning applications. Unlike row-based formats like **CSV** or **JSON**, Parquet organizes data column-wise, allowing for *better compression*, *faster query execution*, and *improved performance*.
+
+
+
+This columnar storage format reduces disk I/O and memory footprint by reading only the necessary columns during ML model training or data analysis. Additionally, Parquet supports *predicate pushdown*, which means filtering operations can be pushed down to the storage layer, minimizing the amount of data that needs to be read.
+
+Parquet files are particularly relevant in ML because of their ability to handle *large datasets* efficiently. ML models often require massive amounts of data for training, and Parquet's compression and query optimization capabilities can significantly reduce storage costs, improve processing speed, accelerate model training, and enhance overall performance.
+
+!!! attention "To notice!"
+    Parquet is a valuable format in the ML ecosystem. Click [*Here*](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Read_data_various_sources/Pandas%20CSV%20vs.%20PyArrow%20parquet%20reading%20speed.ipynb) to see a Pandas CSV vs. PyArrow parquet reading speed performance comparison.
+
+!!! tip "Tip!"
+ To explore Parquet files, you can Install [**Microsoft Data Wrangler**](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler) extension!
+
+
+## Extra: protobuf
+
+!!! info "A word from protobuf developers!"
+ Protocol Buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data. It’s like JSON, except it’s smaller and faster
+
+Although protobuf is mainly used for communication between services and defining APIs in systems that need high performance, it is an interesting read, especially since we talked about JSON in this class.
+
+Some links about **Protocol Buffers**:
+
+- https://protobuf.dev/
+- https://protobuf.dev/getting-started/pythontutorial/
+- https://lab.wallarm.com/what/protobuf-vs-json/
+- https://auth0.com/blog/beating-json-performance-with-protobuf/
+- https://buf.build/blog/the-real-reason-to-use-protobuf
+
+
+## Exercises
+
+In this exercise section, we will compare reading different file formats using `pandas`.
+
+!!! exercise "Question"
+ Create a folder to store the files for the next exercises.
+
+!!! exercise "Question"
+ Download the following data files:
+
+ - https://mlops-material.s3.us-east-2.amazonaws.com/data_formats/ex_data.csv
+ - https://mlops-material.s3.us-east-2.amazonaws.com/data_formats/ex_data.json
+ - https://mlops-material.s3.us-east-2.amazonaws.com/data_formats/ex_data.parquet
+
+
+
+!!! info "Info!"
+ Understanding the contents of the files is not important for this activity. But feel free to explore if you want!
+
+!!! exercise "Question"
+ Consider the file `ex_data.json`. Use `pandas` to read it into a DataFrame.
+
+ Repeat the procedure `1000` times, storing in a list the time used for each reading.
+
+ !!! answer "Answer"
+
+ ```python
+ import pandas as pd
+
+ import time
+
+ read_times_json = []
+ for i in range(1000):
+ start_time = time.time()
+        df2 = pd.read_json("ex_data.json")
+ end_time = time.time()
+ read_times_json.append(end_time - start_time)
+ ```
+
+!!! exercise "Question"
+ Repeat the previous exercise, now for the files `ex_data.csv` and `ex_data.parquet`.
+
+ At the end, you should have three lists with the times for each format.
+
+ !!! answer "Answer"
+
+ **CSV**:
+
+ ```python
+ import pandas as pd
+
+ import time
+
+ read_times_csv = []
+ for i in range(1000):
+ start_time = time.time()
+        df2 = pd.read_csv("ex_data.csv")
+ end_time = time.time()
+ read_times_csv.append(end_time - start_time)
+ ```
+
+ **Parquet**:
+
+
+ ```python
+ import pandas as pd
+
+ import time
+
+ read_times_parquet = []
+ for i in range(1000):
+ start_time = time.time()
+        df2 = pd.read_parquet("ex_data.parquet")
+ end_time = time.time()
+ read_times_parquet.append(end_time - start_time)
+ ```
+
+!!! exercise "Question"
+ Create a DataFrame with three columns, where each column has the reading times for each format.
+
+ Then, calculate the descriptive statistics for each column of the DataFrame.
+
+ !!! answer "Answer"
+
+ ```python
+ df_times = pd.DataFrame(
+ {
+ "json": read_times_json,
+ "csv": read_times_csv,
+ "parquet": read_times_parquet,
+ }
+ )
+
+ df_times.describe()
+ ```
+
+!!! exercise "Question"
+ Use `plotly express` to plot three boxplots, comparing the times of the formats.
+
+ !!! answer "Answer"
+
+ **Install:**
+
+
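+        ```console
+        $ pip install plotly
+        ```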
+
+ **Python code:**
+
+ ```python
+ import plotly.express as px
+
+ fig = px.box(df_times, y=df_times.columns, title="Boxplot of Read Times")
+ fig.update_layout(xaxis_title="Data Formats", yaxis_title="Time")
+ fig.show()
+ ```
+
+!!! exercise text long "Question"
+ What are your conclusions?
+
+!!! info "Important!"
+ We used **pandas** to perform this test. If other libraries (such as **Spark**) are considered, it is expected that the use of `parquet` will be even more beneficial.
+
+ For example, when performing **column and row filters**, **pandas** generally needs to **read the entire file** into memory first before it can apply the filter.
+
+ On the other hand, **Apache Spark** does not need to read the entire `parquet` file to filter by a column, thanks to its ability to perform predicate pushdown and the efficient structure of the `parquet` format.
+
+!!! exercise text long "Question"
+    What does **predicate pushdown** mean?!
+
+ !!! answer "Answer"
+ Read [Here](./#parquet:~:text=Additionally%2C%20Parquet%20supports-,predicate%20pushdown,-%2C%20which%20means%20filtering).
+
+ When you run a query that includes a filter (predicate), predicate pushdown attempts to "push" the filter as close as possible to the data source, so that the filtering is done as early as possible in the data processing pipeline. This reduces the amount of data that is read, processed, and transferred between different stages of the system.
+
+## References
+
+- Introducing MLOps. Chapter 6.
+- Designing Machine Learning Systems. Chapter 7.
+- https://parquet.apache.org/docs/file-format/
+- https://en.wikipedia.org/wiki/JSON
+- POE
\ No newline at end of file
diff --git a/content/classes/03-batch/data_predict_output.png b/content/classes/03-batch/data_predict_output.png
new file mode 100644
index 0000000..454ba15
Binary files /dev/null and b/content/classes/03-batch/data_predict_output.png differ
diff --git a/content/classes/03-batch/db_tool.md b/content/classes/03-batch/db_tool.md
new file mode 100644
index 0000000..93cfd37
--- /dev/null
+++ b/content/classes/03-batch/db_tool.md
@@ -0,0 +1,27 @@
+# Database tool
+
+## Install
+
+**pgAdmin** and **DBeaver** are two popular open-source database administration and management tools for working with SQL databases like PostgreSQL. It is enough to install only one of them.
+
+!!! attention
+    These are the tools recommended by the professor, but you can install or use another one that you already know.
+
+- [*DBeaver - Click to Download!*](https://dbeaver.io/download/)
+
+
+
+- [*pgAdmin - Click to Download!*](https://www.pgadmin.org/download/)
+
+
+
+## Create connection
+
+After installing, create a connection to the database.
+
+!!! info
+ **Credentials** are available on Blackboard!
+
+## Schema
+
+Check the `sales` schema. It contains an `item_sale` table with lines like the ones contained in the *CSV* that served as input in our activity from the last class.
\ No newline at end of file
diff --git a/content/classes/03-batch/dbeaver.png b/content/classes/03-batch/dbeaver.png
new file mode 100644
index 0000000..8954dd7
Binary files /dev/null and b/content/classes/03-batch/dbeaver.png differ
diff --git a/content/classes/03-batch/dot_env.md b/content/classes/03-batch/dot_env.md
new file mode 100644
index 0000000..ca5be5b
--- /dev/null
+++ b/content/classes/03-batch/dot_env.md
@@ -0,0 +1,68 @@
+# `.env` File
+
+A `.env` is a **environment file**, a plain text file commonly used in software development projects. It serves as a configuration file that stores environment variables.
+
+!!! info "Info!"
+ Environment variables are key-value pairs that hold **sensitive** or environment-specific information, such as:
+
+ - API keys
+ - Access tokens
+ - Database credentials
+ - Other configuration settings.
+
+The `.env` file plays a crucial role in separating sensitive or environment-specific data from the source code.
+
+By storing such information in a separate file, developers can easily manage different configurations for various environments (e.g., development, staging, production) **without modifying the codebase**.
+
+!!! tip "Tip!"
+ Stop hardcoding configuration values in source code. Use `.env` instead!
+
+This approach **enhances security** and simplifies the deployment process.
+
+!!! danger "Never commit!"
+ `.env` files must remain outside of version control systems (e.g., github, gitlab, bitbucket), preventing accidental exposure!
+
+## Dot Example!
+
+It is recommended to create a `.env.example` file; this one must be **committed to the repository**. It must contain **all the environment variables** necessary to start the application or model, but with **dummy values**.
+
+This way, whoever is going to deploy the ML application will know what to configure so that the application starts successfully!
+
+A `.env.example` file:
+```console
+DB_HOST="1.2.3.4"
+DB_PORT=1122
+DB_USERNAME="some_username"
+DB_PASSWORD=abc123
+DB_DATABASE="some_db"
+GITHUB_TOKEN_ACCESS="ghp_123412341234123412341234123412341234"
+```
+
+## Reading `.env`
+
+Install the lib:
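+
+A minimal install command (the `load_dotenv` import used below comes from the `python-dotenv` package):
+
+```console
+$ pip install python-dotenv
+```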
+
+
+
+
+Then, environment variables can be read in Python with:
+
+```python
+import os
+from dotenv import load_dotenv
+
+# Reading .env and creating environment variables
+load_dotenv()
+
+# Reading environment variable
+host = os.getenv("DB_HOST")
+
+# Using environment variable
+print(host)
+```
\ No newline at end of file
diff --git a/content/classes/03-batch/ds_project_data_folder.png b/content/classes/03-batch/ds_project_data_folder.png
new file mode 100644
index 0000000..afcdf59
Binary files /dev/null and b/content/classes/03-batch/ds_project_data_folder.png differ
diff --git a/content/classes/03-batch/folder_structure.png b/content/classes/03-batch/folder_structure.png
new file mode 100644
index 0000000..c366fa7
Binary files /dev/null and b/content/classes/03-batch/folder_structure.png differ
diff --git a/content/classes/03-batch/get_data.py b/content/classes/03-batch/get_data.py
new file mode 100644
index 0000000..c6c871f
--- /dev/null
+++ b/content/classes/03-batch/get_data.py
@@ -0,0 +1,159 @@
+import sys
+import os
+import numpy as np
+import random
+import datetime
+import calendar
+import pandas as pd
+
+
+class Config:
+ stores = {
+ 5000: {
+ "avg_n": 100,
+ "avg_price": 350.0,
+ "std": 10.0,
+ "boost_weekday": [6, 7],
+ "boost_months": [5, 12],
+ },
+ 5001: {
+ "avg_n": 10,
+ "avg_price": 500.0,
+ "std": 20.0,
+ "boost_weekday": [7],
+ "boost_months": [5, 12],
+ },
+ 5002: {
+ "avg_n": 25,
+ "avg_price": 400.0,
+ "std": 10.0,
+ "boost_weekday": [7],
+ "boost_months": [4, 10, 12],
+ },
+ 5003: {
+ "avg_n": 200,
+ "avg_price": 220.0,
+ "std": 12.0,
+ "boost_weekday": [1, 3, 7],
+ "boost_months": [],
+ },
+ 5004: {
+ "avg_n": 140,
+ "avg_price": 415.0,
+ "std": 17.0,
+ "boost_weekday": [4, 6, 7],
+ "boost_months": [4, 10, 12],
+ },
+ 5005: {
+ "avg_n": 50,
+ "avg_price": 890.0,
+ "std": 15.0,
+ "boost_weekday": [6, 7],
+ "boost_months": [5, 12],
+ },
+ }
+ product_ids = np.random.randint(1000, 3000, size=30)
+
+
+def generate_day_sales(store_id, year, month, day):
+ config = Config.stores[store_id]
+ n_sales = np.random.poisson(lam=config["avg_n"], size=1)[0]
+
+ wk_day = datetime.date(year, month, day).weekday()
+
+ if wk_day in config["boost_weekday"]:
+ n_sales = int(n_sales * random.uniform(1.6, 1.7))
+
+ if month in config["boost_months"]:
+ n_sales = int(n_sales * random.uniform(1.45, 1.50))
+
+ stores = [store_id] * n_sales
+
+ products = random.choices(Config.product_ids, k=n_sales)
+
+ prices = np.random.normal(
+ loc=config["avg_price"], scale=config["std"], size=n_sales
+ )
+
+ dates = [f"{year}-{month:02d}-{day:02d}"] * n_sales
+
+ client_ids = np.random.randint(100000, 400000, size=n_sales)
+
+ df = pd.DataFrame(
+ {
+ "store_id": stores,
+ "date": dates,
+ "client_id": client_ids,
+ "product_id": products,
+ "price": prices,
+ }
+ )
+ return df
+
+
+def generate_predict_register(store_id, year, month, day):
+ return pd.DataFrame(
+ {
+ "store_id": [store_id],
+ "year": [year],
+ "month": [month],
+ "day": [day],
+ "weekday": [datetime.date(year, month, day).weekday()],
+ }
+ )
+
+
+def generate_data(year_from, month_from, day_from, year_to, month_to, day_to, type_):
+ df = None
+ for store in Config.stores:
+ for year in range(year_from, year_to + 1):
+ if year != year_from:
+ month_from = 1
+ for month in range(month_from, month_to + 1):
+ if year != year_from or month != month_from:
+ day_from = 1
+
+ if year != year_to or month != month_to:
+ day_to_ = calendar.monthrange(year, month)[1]
+ else:
+ day_to_ = day_to
+
+ for day in range(day_from, day_to_ + 1):
+ if type_ == "train":
+ new_df = generate_day_sales(
+ store_id=store, year=year, month=month, day=day
+ )
+ else:
+ new_df = generate_predict_register(
+ store_id=store, year=year, month=month, day=day
+ )
+ df = pd.concat([df, new_df])
+ return df
+
+
+if __name__ == "__main__":
+ print("Simulate data ingestion!")
+
+ out_type = sys.argv[-1]
+
+ if len(sys.argv) != 8 or out_type not in ["train", "predict"]:
+ print(
+            "USAGE: python get_data.py <year_from> <month_from> <day_from> <year_to> <month_to> <day_to> <train|predict>"
+ )
+ else:
+ date_args = sys.argv[1:-1]
+ date_args = [int(x) for x in date_args]
+ df = generate_data(*date_args, out_type)
+ st_date = "-".join(sys.argv[4:-1])
+ if out_type == "train":
+ file_name = f"{out_type}-{st_date}.csv"
+ else:
+ file_name = f"{out_type}-{st_date}.parquet"
+
+ file_path = os.path.join("../data/", file_name)
+ print(f"Saving to {file_path} file...")
+
+ if out_type == "train":
+ df.to_csv(file_path, index=False)
+ else:
+ df.to_parquet(file_path.replace(".csv", ".parquet"), index=False)
diff --git a/content/classes/03-batch/intro.md b/content/classes/03-batch/intro.md
new file mode 100644
index 0000000..f0914af
--- /dev/null
+++ b/content/classes/03-batch/intro.md
@@ -0,0 +1,110 @@
+# Batch Prediction
+
+## Remembering **Online** prediction
+
+In the last class, we saw how to deploy a model using RESTful APIs.
+
+!!! exercise choice "Question"
+    Select the sentence that best explains when to use *online deployment*.
+
+ - [ ] For asynchronous workflows where the model takes a long time to generate each prediction
+ - [ ] When predictions are needed hours, days or years from now.
+ - [X] If you need instant or real-time prediction
+ - [ ] It doesn't matter the way you deploy!
+
+
+ !!! answer "Answer"
+        **Online** prediction is when predictions are generated and returned as soon as *requests* for these predictions are received by the service. We usually do it because instant or real-time predictions are needed!
+
+ 
+
+        Some examples where it could be useful:
+
+ - Making time-critical predictions like detecting fraud moments after a transaction occurs
+ - Provide real-time recommendations: movies, ads, products
+ - Run sentiment analysis during chatbot conversations.
+
+In the next classes, we will revisit this category of model deployment (**online**), making it more robust. For now, let's look at the **batch prediction** category!
+
+## What is batch prediction?
+
+In this deployment category, the trained model is applied to previously collected, static datasets stored in files or databases, rather than real-time streaming data.
+
+!!! info
+ The predictions are generated **periodically** or whenever **triggered**.
+
+ We will deal with scheduling in the next classes.
+
+Batch prediction is suitable for **non-real-time predictive tasks** like:
+
+- Price forecasting
+- Customer churn
+- Store assortment
+
+where results *aren't needed instantly*.
+
+!!! danger "Planning!"
+ Note that the deployment category to be used depends a lot on how the model will be used.
+
+    For example, a pricing model could either need **batch** predictions (suppose the price will be printed on a flyer) or real-time **online** predictions (if the price changes frequently and will be used on a website, varying according to customer behavior).
+
+ Whenever possible, align this clearly with the customer at the planning stage of the model lifecycle!
+
+## Doing batch prediction
+
+In the first class, when we saw how to standardize a data science project, there were specific folders for data storage, notebooks and source code.
+
+
+
+!!! exercise short "Question"
+ Would the `data.csv` file be used during training? How would you deal with data?
+
+ !!! answer "Answer"
+        If the model doesn't exist yet, we need data to train it! So, probably yes!
+
+!!! exercise short "Question"
+ During the phase of model construction, would this `data.csv` file change? Explain it.
+
+ !!! answer "Answer"
+        You may find that the data volume is not enough or that you need new features to achieve the design goals.
+
+        Assuming there is enough data, it would be transformed (feature engineering, feature selection) but would remain the same during training, in the model construction phase.
+
+!!! exercise long "Question"
+ After model deployment:
+
+ - Will the model need to be retrained?
+ - Will it be retrained in the same data?
+
+ Explain yourself!
+
+ !!! answer "Answer"
+ Yes for the first, no for the second!
+
+        It will be necessary to retrain if the model loses performance over time (it almost certainly will).
+
+ We'll deal with retraining in the next classes, but it's important to start thinking about it!
+
+!!! exercise long "Question"
+ What about when we deploy the model and need to predict with it?
+
+ Assuming that the model reads a batch of data from a file, would it be the same file used for training?
+
+ Justify your answer.
+
+ !!! answer "Answer"
+ Absolutely not! In the first class we did this for simplicity. We would like to make predictions on new data.
+
+!!! exercise long "Question"
+ Still on prediction in new data: assuming that the model **reads a batch of data** from a file called `predict.csv`, would this file remain the same (have the same records) every time the predict script is called?
+
+ Justify your answer.
+
+ !!! answer "Answer"
+        No. Assuming that the whole file has already been predicted, the next time the prediction script is called we would like to point it to a new file, or to have `predict.csv` contain new data.
+
+So, in order to do batch prediction, we need to worry about **getting data**!
+
+
+
+Let's talk about **data formats** and **data sources**. Advance to the next topic!
\ No newline at end of file
diff --git a/content/classes/03-batch/item_sale.png b/content/classes/03-batch/item_sale.png
new file mode 100644
index 0000000..3160657
Binary files /dev/null and b/content/classes/03-batch/item_sale.png differ
diff --git a/content/classes/03-batch/parquet.gif b/content/classes/03-batch/parquet.gif
new file mode 100644
index 0000000..b54641d
Binary files /dev/null and b/content/classes/03-batch/parquet.gif differ
diff --git a/content/classes/03-batch/pg_admin.webp b/content/classes/03-batch/pg_admin.webp
new file mode 100644
index 0000000..1143f06
Binary files /dev/null and b/content/classes/03-batch/pg_admin.webp differ
diff --git a/content/classes/03-batch/practicing.md b/content/classes/03-batch/practicing.md
new file mode 100644
index 0000000..2ffedce
--- /dev/null
+++ b/content/classes/03-batch/practicing.md
@@ -0,0 +1,384 @@
+# Practicing
+
+## Create folder
+
+Use cookiecutter and your template, created in [**class 01**](../01-intro/aps01_part_2.md#task-01-create-a-template), to create a new folder/project for today's class. You can use any name for the folder/project (I chose `p03-batch`).
+
+!!! danger "Attention"
+ If you didn't do this part of the activity (APS 01 part 2) and do not have a template, create the folders manually.
+
+
+
+ ```console
+ $ cookiecutter https://github.com/macielcalebe/template-ds-maciel.git --checkout main
+ You've downloaded /home/calebe/.cookiecutters/template-ds-maciel before. Is it okay to delete and re-download it? [y/n] (y): y
+ [1/3] directory_name (project-name): p03-batch
+ [2/3] author_name (Your Name): Maciel
+ [3/3] compatible_python_versions (^3.8):
+ ```
+
+
+
+
+Let's check if the folders were created correctly.
+
+
+
+ ```console
+ $ cd p03-batch/
+ $ ls
+ data models notebooks README.md src
+ ```
+
+
+
+
+!!! info "Important"
+    This is not a repository (yet). We just created the folder structure of an ML project!
+
+## Introduction
+
+In this task, we are going to create a model for forecasting sales for six stores of a company. The model will make predictions in batch, for now triggered manually from the terminal.
+
+## Task 1: Generating *Train* data
+
+??? "Click Here to see a script that generates data!"
+ Copy this python script to the path `src/get_data.py`
+ ```python
+ import sys
+ import os
+ import numpy as np
+ import random
+ import datetime
+ import calendar
+ import pandas as pd
+ import itertools
+
+
+ class Config:
+ stores = {
+ 5000: {
+ "avg_n": 100,
+ "avg_price": 350.0,
+ "std": 10.0,
+ "boost_weekday": [6, 7],
+ "boost_months": [5, 12],
+ },
+ 5001: {
+ "avg_n": 10,
+ "avg_price": 500.0,
+ "std": 20.0,
+ "boost_weekday": [7],
+ "boost_months": [5, 12],
+ },
+ 5002: {
+ "avg_n": 25,
+ "avg_price": 400.0,
+ "std": 10.0,
+ "boost_weekday": [7],
+ "boost_months": [4, 10, 12],
+ },
+ 5003: {
+ "avg_n": 200,
+ "avg_price": 220.0,
+ "std": 12.0,
+ "boost_weekday": [1, 3, 7],
+ "boost_months": [],
+ },
+ 5004: {
+ "avg_n": 140,
+ "avg_price": 415.0,
+ "std": 17.0,
+ "boost_weekday": [4, 6, 7],
+ "boost_months": [4, 10, 12],
+ },
+ 5005: {
+ "avg_n": 50,
+ "avg_price": 890.0,
+ "std": 15.0,
+ "boost_weekday": [6, 7],
+ "boost_months": [5, 12],
+ },
+ }
+ product_ids = np.random.randint(1000, 3000, size=30)
+
+
+ def generate_day_sales(store_id, date):
+ config = Config.stores[store_id]
+ year, month, day = date.year, date.month, date.day
+ n_sales = np.random.poisson(lam=config["avg_n"])
+
+ if date.weekday() in config["boost_weekday"]:
+ n_sales = int(n_sales * random.uniform(1.6, 1.7))
+
+ if month in config["boost_months"]:
+ n_sales = int(n_sales * random.uniform(1.45, 1.50))
+
+ stores = np.full(n_sales, store_id)
+ products = np.random.choice(Config.product_ids, size=n_sales)
+ prices = np.random.normal(
+ loc=config["avg_price"], scale=config["std"], size=n_sales
+ )
+ dates = np.full(n_sales, date.strftime("%Y-%m-%d"))
+ client_ids = np.random.randint(100000, 400000, size=n_sales)
+
+ return pd.DataFrame(
+ {
+ "store_id": stores,
+ "date": dates,
+ "client_id": client_ids,
+ "product_id": products,
+ "price": prices,
+ }
+ )
+
+
+ def generate_predict_register(store_id, date):
+ return pd.DataFrame(
+ {
+ "store_id": [store_id],
+ "year": [date.year],
+ "month": [date.month],
+ "day": [date.day],
+ "weekday": [date.weekday()],
+ }
+ )
+
+
+ def generate_data(year_from, month_from, day_from, year_to, month_to, day_to, type_):
+ dates = pd.date_range(
+ start=f"{year_from}-{month_from:02d}-{day_from:02d}",
+ end=f"{year_to}-{month_to:02d}-{day_to:02d}",
+ )
+ store_ids = list(Config.stores.keys())
+ combinations = itertools.product(store_ids, dates)
+
+ dfs = []
+ for store_id, date in combinations:
+ if type_ == "train":
+ dfs.append(generate_day_sales(store_id, date))
+ else:
+ dfs.append(generate_predict_register(store_id, date))
+
+ return pd.concat(dfs, ignore_index=True)
+
+
+ if __name__ == "__main__":
+ print("Simulate data ingestion!")
+
+ out_type = sys.argv[-1]
+
+ if len(sys.argv) != 8 or out_type not in ["train", "predict"]:
+            print("USAGE: python get_data.py <year_from> <month_from> <day_from> <year_to> <month_to> <day_to> <train|predict>")
+ else:
+ date_args = sys.argv[1:-1]
+ date_args = [int(x) for x in date_args]
+ df = generate_data(*date_args, out_type)
+ st_date = "-".join(sys.argv[4:-1])
+ if out_type == "train":
+ file_name = f"{out_type}-{st_date}.csv"
+ else:
+ file_name = f"{out_type}-{st_date}.parquet"
+
+ file_path = os.path.join("../data/", file_name)
+ print(f"Saving to {file_path} file...")
+
+ if out_type == "train":
+ df.to_csv(file_path, index=False)
+ else:
+ df.to_parquet(file_path.replace(".csv", ".parquet"), index=False)
+ ```
+
+Let's use this script to simulate ingesting data that will be used whenever the model needs it:
+
+- be trained
+- make prediction
+
+!!! exercise "Question"
+ Copy the script source code and place it in file path `src/get_data.py`.
+
+
+
+ ```console
+ $ python3 get_data.py help
+ Simulate data ingestion!
+    USAGE: python get_data.py <year_from> <month_from> <day_from> <year_to> <month_to> <day_to> <train|predict>
+ ```
+
+
+
+
+We will use as training data the period from the beginning of 2022 until the first day of August 2023.
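+
+A call along these lines should do it (note the zero-padded month and day: those strings are reused in the output file name):
+
+```console
+$ python3 get_data.py 2022 01 01 2023 08 01 train
+Simulate data ingestion!
+Saving to ../data/train-2023-08-01.csv file...
+```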
+
+
+
+
+
+This will create a `train-2023-08-01.csv` file in the `data` folder containing the sales data for each of the company's stores.
+
+Each line in this file represents a sale made to a customer.
+
+
+
+## Task 2: Processing train file
+
+The model will predict the total to be sold **per store** in one day.
+
+So we need to process the input data to change its granularity. The expected result is a DataFrame where each line represents the total sales of a store in one day:
+
+| | store_id | total_sales | year | month | day | weekday |
+|-----:|-----------:|--------------:|-------:|--------:|------:|----------:|
+| 0 | 5000 | 62895.6 | 2023 | 1 | 1 | 6 |
+| 1 | 5000 | 42351.1 | 2023 | 1 | 2 | 0 |
+| 2 | 5000 | 37377.4 | 2023 | 1 | 3 | 1 |
+| 3 | 5000 | 31385.5 | 2023 | 1 | 4 | 2 |
+...
+| 1636 | 5005 | 46246.3 | 2023 | 9 | 29 | 4 |
+| 1637 | 5005 | 43698.2 | 2023 | 9 | 30 | 5 |
+
+Notice the feature `weekday`. It represents the day of the week, going from `0` to `6`.
+
+!!! exercise choice "Question"
+ Which of the variables is our target variable?
+
+ - [ ] store_id
+ - [X] total_sales
+ - [ ] year
+ - [ ] month
+ - [ ] day
+ - [ ] weekday
+
+
+ !!! answer "Answer"
+ **total_sales**, that's what we are predicting!
+
+!!! exercise "Question"
+ Construct a python script that does the required granularity change and adds the `weekday` feature.
+
+    Save the result in the `../data` folder (relative to `src`) using the **parquet** file format. A possible sketch is shown right after this question.
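+
+One possible sketch of this processing step, assuming the raw CSV columns shown above and the file name used later in Task 3 (adapt paths and names to your own project):
+
+```python
+import sys
+
+import pandas as pd
+
+# Read the raw sales CSV generated by get_data.py (path passed as argument)
+df = pd.read_csv(sys.argv[1], parse_dates=["date"])
+
+# Change the granularity: one row per store per day, with the total sold
+daily = (
+    df.groupby(["store_id", "date"], as_index=False)["price"]
+    .sum()
+    .rename(columns={"price": "total_sales"})
+)
+
+# Derive the date features expected by the model
+daily["year"] = daily["date"].dt.year
+daily["month"] = daily["date"].dt.month
+daily["day"] = daily["date"].dt.day
+daily["weekday"] = daily["date"].dt.weekday
+
+daily = daily.drop(columns=["date"])
+
+# e.g. ../data/train-2023-08-01.csv -> ../data/train-2023-08-01.parquet
+out_path = sys.argv[1].replace(".csv", ".parquet")
+daily.to_parquet(out_path, index=False)
+```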
+
+## Task 3: Training the model
+
+Train a model using `RandomForestRegressor` or any other of your preference.
+
+```python
+from sklearn.ensemble import RandomForestRegressor
+
+model = RandomForestRegressor(n_estimators=100, random_state=195)
+model.fit(X_train, y_train)
+```
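+
+After fitting, the model also needs to be persisted so the prediction step can load it later. A minimal sketch using `pickle` (the folder and file name below just follow the template and the naming convention used in this example):
+
+```python
+import pickle
+
+# Persist the fitted model so the batch prediction script can load it later
+with open("../models/model-2023-08-01.pickle", "wb") as f:
+    pickle.dump(model, f)
+```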
+
+!!! exercise "Question"
+ Construct a python script `src/train.py` that receives as argument the path of preprocessed training file:
+
+
+
+ ```console
+ $ python3 train.py ../data/train-2023-08-01.parquet
+ Training model!
+    Saving to ../models/model-2023-08-01.pickle file...
+ ```
+
+
+
+## Task 4: Simulate prediction data
+
+Now that the model is trained, we can use it to make predictions for future dates, obtaining an estimate of the revenue of each store.
+
+Let's use our script to simulate ingesting the data for prediction. You can imagine that some system task would generate a file containing the batch of data that the model must use to make predictions. The script will simulate this task.
+
+Then, when it was time for the model to make predictions, the model would read the lines from this file and generate predictions of total sales.
+
+!!! exercise "Question"
+    Call the python script `src/get_data.py` in order to generate prediction data (an example call is shown right below):
+
+
+
+
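+For example, generating prediction rows for the two months right after the training window (these dates are only illustrative; use the range requested in class):
+
+```console
+$ python3 get_data.py 2023 08 02 2023 09 30 predict
+Simulate data ingestion!
+Saving to ../data/predict-2023-09-30.parquet file...
+```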
+
+Done! Whenever there is a new file that must be predicted, just call the `predict.py` script, informing which model to use and the path of the file with the data! Then, the model will read this batch of information and perform the prediction.
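+
+A minimal sketch of what such a `predict.py` could look like (the argument order, feature list and output naming below are hypothetical conventions, not a required interface):
+
+```python
+import pickle
+import sys
+
+import pandas as pd
+
+# Hypothetical usage:
+# python predict.py ../models/model-2023-08-01.pickle ../data/predict-2023-09-30.parquet
+model_path, data_path = sys.argv[1], sys.argv[2]
+
+with open(model_path, "rb") as f:
+    model = pickle.load(f)
+
+# Read the batch of rows to be scored
+df = pd.read_parquet(data_path)
+
+# Same feature columns used for training (see Task 2)
+features = ["store_id", "year", "month", "day", "weekday"]
+df["total_sales_pred"] = model.predict(df[features])
+
+# Store the predictions next to the input data
+out_path = data_path.replace("predict-", "predictions-")
+df.to_parquet(out_path, index=False)
+print(f"Predictions saved to {out_path}")
+```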
+
+## Extra questions!
+
+### Categorical variables
+
+!!! exercise text long "Question"
+ Are there categorical variables in the training data? If yes, which ones?
+
+ Should we use OneHotEncoder? Explain.
+
+!!! exercise "Question"
+ Create scenarios and python code to evaluate the previous question.
+
+### OOT validation
+
+How do we check if a model is good? Although for now we haven't focused much on the construction details of the models, this is an important topic worth discussing!
+
+!!! info
+    We say that a model is good if it is performing well on **unseen data**.
+
+It is common to use `train_test_split` to generate `X_test` and `y_test`. This is **out-of-sample** (**OOS**) validation, where the data is split randomly.
+
+**Out-of-time** (**OOT**) validation refers to evaluating the performance of a trained model on data that falls **outside the time period** used for training. This concept is particularly relevant in scenarios where the data is **time-dependent** or exhibits **temporal** patterns (like our sales data).
+
+!!! example "OOT Example!"
+    If your training data go from January to July, use data from January to May for training and hold out June and July to check model performance!
+
+!!! tip "Tip!"
+ Once you decide the model is good enough, you can retrain with the whole base (January to July) and deploy this new version of the model!
+
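+A minimal sketch of an OOT split for our sales data, assuming the preprocessed DataFrame from Task 2 (the column names and the chosen window are just one possibility):
+
+```python
+import pandas as pd
+from sklearn.ensemble import RandomForestRegressor
+
+df = pd.read_parquet("../data/train-2023-08-01.parquet")
+
+# Out-of-time split following the example above:
+# fit on January-May 2023, validate on June-July 2023
+train_mask = (df["year"] == 2023) & (df["month"] <= 5)
+oot_mask = (df["year"] == 2023) & (df["month"].isin([6, 7]))
+
+features = ["store_id", "year", "month", "day", "weekday"]
+X_train, y_train = df.loc[train_mask, features], df.loc[train_mask, "total_sales"]
+X_oot, y_oot = df.loc[oot_mask, features], df.loc[oot_mask, "total_sales"]
+
+model = RandomForestRegressor(n_estimators=100, random_state=195)
+model.fit(X_train, y_train)
+
+# R^2 on data from a period the model has never seen
+print("OOT score:", model.score(X_oot, y_oot))
+```
+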
+!!! exercise text long "Question"
+    Explain when it is a good idea to use OOT validation and what the consequences would be if you don't!
\ No newline at end of file
diff --git a/content/classes/03-batch/predict_dates.png b/content/classes/03-batch/predict_dates.png
new file mode 100644
index 0000000..53e2306
Binary files /dev/null and b/content/classes/03-batch/predict_dates.png differ
diff --git a/content/classes/03-batch/spark_sql.png b/content/classes/03-batch/spark_sql.png
new file mode 100644
index 0000000..27fe31b
Binary files /dev/null and b/content/classes/03-batch/spark_sql.png differ
diff --git a/content/classes/03-batch/sql.md b/content/classes/03-batch/sql.md
new file mode 100644
index 0000000..2cf21d4
--- /dev/null
+++ b/content/classes/03-batch/sql.md
@@ -0,0 +1,17 @@
+# SQL
+
+Machine learning models can take *input data* directly from a *SQL database* like [PostgreSQL](https://www.postgresql.org/).
+
+Some reasons for that:
+
+- SQL databases are commonly used for storing large volumes of **structured data** that models can be trained and scored on. Integrating with SQL allows models to leverage *existing data infrastructure*.
+
+- Models can *query only the new/updated rows* as needed rather than processing all the data each time.
+
+- SQL databases allow joining model *input features from multiple related tables* as needed. This can help models make better predictions by incorporating more context.
+
+- *Storing predictions* back in SQL allows those results to then be queried, analyzed, and accessed by other downstream applications and processes.
+
+- Analytics engines for large-scale data processing, like Apache *Spark*, have a *SQL API*.
+
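+As a minimal sketch of reading model input straight from PostgreSQL (assuming the `sales.item_sale` table used in this class, credentials stored in a `.env` file as described on the `.env` page, and the `SQLAlchemy` + `psycopg2-binary` packages installed):
+
+```python
+import os
+
+import pandas as pd
+from dotenv import load_dotenv
+from sqlalchemy import create_engine
+
+# Load DB credentials from environment variables (never hardcode them!)
+load_dotenv()
+
+engine = create_engine(
+    f"postgresql://{os.getenv('DB_USERNAME')}:{os.getenv('DB_PASSWORD')}"
+    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_DATABASE')}"
+)
+
+# Read the model input directly from the database
+df = pd.read_sql("SELECT store_id, date, price FROM sales.item_sale", engine)
+print(df.head())
+```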
+
diff --git a/content/classes/03-batch/task_train_csv.png b/content/classes/03-batch/task_train_csv.png
new file mode 100644
index 0000000..36d0ccc
Binary files /dev/null and b/content/classes/03-batch/task_train_csv.png differ
diff --git a/content/classes/04-int01/faq.md b/content/classes/04-int01/faq.md
new file mode 100644
index 0000000..2ab180a
--- /dev/null
+++ b/content/classes/04-int01/faq.md
@@ -0,0 +1,56 @@
+# Interview 01
+
+## FAQ
+
+1. *Can I do the interview from home?*
+
+ No. The interviews are face-to-face and synchronous. They must be done during class time.
+
+1. *What if I can't make it to class?*
+
+ See the rules for activities that were not delivered [Here](../../about.md#final-grade).
+
+1. *How will the dynamics of the interviews be?*
+
+ The interviews will be in pairs. The interviewer will ask questions and the interviewee must answer them.
+
+1. *Who will play the role of interviewer?*
+
+ You and your partner will alternate, sometimes being the interviewer, sometimes being the interviewee.
+
+1. *Should the interviewer create questions on the fly?*
+
+    No. There will be support material to guide the interviews, containing: simpler steps (such as greetings), the interviewer's questions, and a rubric to evaluate the responses.
+
+1. *Who will evaluate responses?*
+
+ Whoever is playing the role of interviewer.
+
+1. *Will the interviews be in English?*
+
+    The support material (which will guide the dynamics) will be in English. You and your partner can agree on whether you will speak English, Portuguese or any other language.
+
+1. *Will I need to write the answers?*
+
+ At first, no. You may need to draw on the walls!
+
+1. *Will I be able to google during the interview?*
+
+ No.
+
+## Interview subject
+
+We will focus mainly on the subjects studied in the previous classes. For example, Interview 01 will focus on:
+
+- Introduction to MLOps
+- ML life cycle
+- Categories of Model Deployment
+- APIs
+- Online prediction
+- Batch prediction
+
+Use the available references and do extra research to delve deeper into the subjects.
+
+In addition, some interview questions will go beyond the skills practiced in MLOps classes, so that you recall and re-evaluate skills acquired in other courses.
+
+For example, in the Ciência dos Dados final project you used a lot of communication skills. There may be questions that require, say, explaining a concept to a manager.
\ No newline at end of file
diff --git a/content/classes/04-int01/int01.md b/content/classes/04-int01/int01.md
new file mode 100644
index 0000000..2ebdf67
--- /dev/null
+++ b/content/classes/04-int01/int01.md
@@ -0,0 +1,13 @@
+# Interview 01
+
+Read the [FAQ](faq.md) before proceeding.
+
+Then, find a colleague to do the interview with you.
+
+Decide with your colleague who will be Interviewer 1 or Interviewer 2.
+
+Then, just access the form corresponding to your role!
+
+- [**Interviewer One**](https://forms.gle/c1BfbYmeRZ4VQQU47): [Click HERE if you are the Interviewer One](https://forms.gle/c1BfbYmeRZ4VQQU47)
+
+- [**Interviewer Two**](https://forms.gle/FDeygEppNwc4sWvc8): [Click HERE if you are the Interviewer Two](https://forms.gle/FDeygEppNwc4sWvc8)
diff --git a/content/contributions.md b/content/contributions.md
new file mode 100644
index 0000000..e69de29
diff --git a/content/css/termynal.css b/content/css/termynal.css
new file mode 100644
index 0000000..a4dc9d6
--- /dev/null
+++ b/content/css/termynal.css
@@ -0,0 +1,109 @@
+/**
+ * termynal.js
+ *
+ * @author Ines Montani
+ * @version 0.0.1
+ * @license MIT
+ */
+
+ :root {
+ --color-bg: #252a33;
+ --color-text: #eee;
+ --color-text-subtle: #a2a2a2;
+}
+
+[data-termynal] {
+ width: 750px;
+ max-width: 100%;
+ background: var(--color-bg);
+ color: var(--color-text);
+ /* font-size: 18px; */
+ font-size: 15px;
+ /* font-family: 'Fira Mono', Consolas, Menlo, Monaco, 'Courier New', Courier, monospace; */
+ font-family: 'Roboto Mono', 'Fira Mono', Consolas, Menlo, Monaco, 'Courier New', Courier, monospace;
+ border-radius: 4px;
+ padding: 75px 45px 35px;
+ position: relative;
+ -webkit-box-sizing: border-box;
+ box-sizing: border-box;
+}
+
+[data-termynal]:before {
+ content: '';
+ position: absolute;
+ top: 15px;
+ left: 15px;
+ display: inline-block;
+ width: 15px;
+ height: 15px;
+ border-radius: 50%;
+ /* A little hack to display the window buttons in one pseudo element. */
+ background: #d9515d;
+ -webkit-box-shadow: 25px 0 0 #f4c025, 50px 0 0 #3ec930;
+ box-shadow: 25px 0 0 #f4c025, 50px 0 0 #3ec930;
+}
+
+[data-termynal]:after {
+ content: 'bash';
+ position: absolute;
+ color: var(--color-text-subtle);
+ top: 5px;
+ left: 0;
+ width: 100%;
+ text-align: center;
+}
+
+a[data-terminal-control] {
+ text-align: right;
+ display: block;
+ color: #aebbff;
+}
+
+[data-ty] {
+ display: block;
+ line-height: 2;
+}
+
+[data-ty]:before {
+ /* Set up defaults and ensure empty lines are displayed. */
+ content: '';
+ display: inline-block;
+ vertical-align: middle;
+}
+
+[data-ty="input"]:before,
+[data-ty-prompt]:before {
+ margin-right: 0.75em;
+ color: var(--color-text-subtle);
+}
+
+[data-ty="input"]:before {
+ content: '$';
+}
+
+[data-ty][data-ty-prompt]:before {
+ content: attr(data-ty-prompt);
+}
+
+[data-ty-cursor]:after {
+ content: attr(data-ty-cursor);
+ font-family: monospace;
+ margin-left: 0.5em;
+ -webkit-animation: blink 1s infinite;
+ animation: blink 1s infinite;
+}
+
+
+/* Cursor animation */
+
+@-webkit-keyframes blink {
+ 50% {
+ opacity: 0;
+ }
+}
+
+@keyframes blink {
+ 50% {
+ opacity: 0;
+ }
+}
diff --git a/content/deadlines.md b/content/deadlines.md
new file mode 100644
index 0000000..a911af9
--- /dev/null
+++ b/content/deadlines.md
@@ -0,0 +1,9 @@
+# Agenda
+
+Due date for each project
+
+| Start Date | Project | Due Date |
+|------------|-------------------------------------|----------|
+| Aug-05 | [Aps01](classes/01-intro/aps01_part_1.md) | Aug-15 |
+| Aug-13 | [Aps02](classes/03-batch/aps02_sql.md) | Aug-27 |
+| Sep-04 | [Aps03](classes/07-lambda/sa_lambda_function.md) | Oct-06 |
\ No newline at end of file
diff --git a/content/equipe/arnaldo.jpeg b/content/equipe/arnaldo.jpeg
new file mode 100644
index 0000000..42500a1
Binary files /dev/null and b/content/equipe/arnaldo.jpeg differ
diff --git a/content/equipe/carareto.jpeg b/content/equipe/carareto.jpeg
new file mode 100644
index 0000000..35be91a
Binary files /dev/null and b/content/equipe/carareto.jpeg differ
diff --git a/content/equipe/rogerio.jpeg b/content/equipe/rogerio.jpeg
new file mode 100644
index 0000000..40b978b
Binary files /dev/null and b/content/equipe/rogerio.jpeg differ
diff --git a/content/index.md b/content/index.md
new file mode 100644
index 0000000..75c9b26
--- /dev/null
+++ b/content/index.md
@@ -0,0 +1,61 @@
+# Camada Física da Computação
+
+This page contains all the materials for the 2024-b edition of the Camada Física da Computação course.
+
+## Course Structure
+
+The course is structured into 9 projects:
+
+- **Project 1**: Loop-back
+- **Project 2**: Simple Client-Server
+- **Project 3**: Complete Client-Server
+- **Project 4**:
+- **Project 5**: Software UART
+- **Project 6**: DTMF
+- **Project 7**: Audio transmission
+- **Project 8**:
+- **Project 9**: Scientific article
+
+Each project lasts one week.
+
+### Class dynamics
+
+For each project, students will take part in:
+
+- Lectures;
+- Guided hands-on activities for developing the project.
+
+### Assessments
+
+- For each project there is a QUIZ assessment, and at the end of the semester the final assessment (AF) takes place in the week defined in the academic calendar.
+
+### Grading criteria
+
+- [Grading criteria](criterios.md)
+
+## Main topics covered:
+
+
+
+## Repository
+
+The course's GitHub repository can be accessed at:
+
+- [Github Camada Física](https://github.com/Insper/camadafisica).
+
+## Classes
+
+**Computer Lab** - Room 404
+
+- **Thursday** 13:30
+- **Friday** 15:45
+
+Office hours:
+
+- **Monday** xx:00 - room xx
+
+## Team
+
+!!! people "Current team"
+    - ![Rodrigo](equipe/carareto.jpeg) **Rodrigo Carareto** *Professor*
+    - ![Arnaldo](equipe/arnaldo.jpeg) **Arnaldo Alves Viana Junior** *Assistant Professor*
+    - ![Rogério](equipe/rogerio.jpeg) **Rogério Cuenca** *Lab Technician*
diff --git a/content/js/custom.js b/content/js/custom.js
new file mode 100644
index 0000000..58f321a
--- /dev/null
+++ b/content/js/custom.js
@@ -0,0 +1,113 @@
+function setupTermynal() {
+ document.querySelectorAll(".use-termynal").forEach(node => {
+ node.style.display = "block";
+ new Termynal(node, {
+ lineDelay: 500
+ });
+ });
+ const progressLiteralStart = "---> 100%";
+ const promptLiteralStart = "$ ";
+ const customPromptLiteralStart = "# ";
+ const termynalActivateClass = "termy";
+ let termynals = [];
+
+ function createTermynals() {
+ document
+ .querySelectorAll(`.${termynalActivateClass} .highlight`)
+ .forEach(node => {
+ const text = node.textContent;
+ const lines = text.split("\n");
+ const useLines = [];
+ let buffer = [];
+ function saveBuffer() {
+ if (buffer.length) {
+ let isBlankSpace = true;
+ buffer.forEach(line => {
+ if (line) {
+ isBlankSpace = false;
+ }
+ });
+ dataValue = {};
+ if (isBlankSpace) {
+ dataValue["delay"] = 0;
+ }
+ if (buffer[buffer.length - 1] === "") {
+                            // A last single <br> won't have effect
+ // so put an additional one
+ buffer.push("");
+ }
+                        const bufferValue = buffer.join("<br>");
+ dataValue["value"] = bufferValue;
+ useLines.push(dataValue);
+ buffer = [];
+ }
+ }
+ for (let line of lines) {
+ if (line === progressLiteralStart) {
+ saveBuffer();
+ useLines.push({
+ type: "progress"
+ });
+ } else if (line.startsWith(promptLiteralStart)) {
+ saveBuffer();
+ const value = line.replace(promptLiteralStart, "").trimEnd();
+ useLines.push({
+ type: "input",
+ value: value
+ });
+ } else if (line.startsWith("// ")) {
+ saveBuffer();
+ const value = "💬 " + line.replace("// ", "").trimEnd();
+ useLines.push({
+ value: value,
+ class: "termynal-comment",
+ delay: 0
+ });
+ } else if (line.startsWith(customPromptLiteralStart)) {
+ saveBuffer();
+ const promptStart = line.indexOf(promptLiteralStart);
+ if (promptStart === -1) {
+ console.error("Custom prompt found but no end delimiter", line)
+ }
+ const prompt = line.slice(0, promptStart).replace(customPromptLiteralStart, "")
+ let value = line.slice(promptStart + promptLiteralStart.length);
+ useLines.push({
+ type: "input",
+ value: value,
+ prompt: prompt
+ });
+ } else {
+ buffer.push(line);
+ }
+ }
+ saveBuffer();
+ const div = document.createElement("div");
+ node.replaceWith(div);
+ const termynal = new Termynal(div, {
+ lineData: useLines,
+ noInit: true,
+ lineDelay: 500
+ });
+ termynals.push(termynal);
+ });
+ }
+
+ function loadVisibleTermynals() {
+ termynals = termynals.filter(termynal => {
+ if (termynal.container.getBoundingClientRect().top - innerHeight <= 0) {
+ termynal.init();
+ return false;
+ }
+ return true;
+ });
+ }
+ window.addEventListener("scroll", loadVisibleTermynals);
+ createTermynals();
+ loadVisibleTermynals();
+}
+
+async function main() {
+ setupTermynal()
+}
+
+main()
diff --git a/content/js/tabs.js b/content/js/tabs.js
new file mode 100644
index 0000000..8f16c63
--- /dev/null
+++ b/content/js/tabs.js
@@ -0,0 +1,12 @@
+let longExercises = document.querySelectorAll(".admonition.long textarea");
+
+longExercises.forEach((el) => {
+ el.addEventListener("keydown", (evt) => {
+ if (evt.keyCode == 9) {
+ evt.preventDefault();
+            el.setRangeText("    ");
+ el.selectionStart += 4;
+ }
+ });
+});
+
diff --git a/content/js/termynal.js b/content/js/termynal.js
new file mode 100644
index 0000000..134fc31
--- /dev/null
+++ b/content/js/termynal.js
@@ -0,0 +1,263 @@
+/**
+ * termynal.js
+ * A lightweight, modern and extensible animated terminal window, using
+ * async/await.
+ *
+ * @author Ines Montani
+ * @version 0.0.1
+ * @license MIT
+ */
+
+ 'use strict';
+
+ /** Generate a terminal widget. */
+ class Termynal {
+ /**
+ * Construct the widget's settings.
+ * @param {(string|Node)=} container - Query selector or container element.
+ * @param {Object=} options - Custom settings.
+ * @param {string} options.prefix - Prefix to use for data attributes.
+ * @param {number} options.startDelay - Delay before animation, in ms.
+ * @param {number} options.typeDelay - Delay between each typed character, in ms.
+ * @param {number} options.lineDelay - Delay between each line, in ms.
+ * @param {number} options.progressLength - Number of characters displayed as progress bar.
+ * @param {string} options.progressChar – Character to use for progress bar, defaults to █.
+ * @param {number} options.progressPercent - Max percent of progress.
+ * @param {string} options.cursor – Character to use for cursor, defaults to ▋.
+ * @param {Object[]} lineData - Dynamically loaded line data objects.
+ * @param {boolean} options.noInit - Don't initialise the animation.
+ */
+ constructor(container = '#termynal', options = {}) {
+ this.container = (typeof container === 'string') ? document.querySelector(container) : container;
+ this.pfx = `data-${options.prefix || 'ty'}`;
+ this.originalStartDelay = this.startDelay = options.startDelay
+ || parseFloat(this.container.getAttribute(`${this.pfx}-startDelay`)) || 600;
+ this.originalTypeDelay = this.typeDelay = options.typeDelay
+ || parseFloat(this.container.getAttribute(`${this.pfx}-typeDelay`)) || 90;
+ this.originalLineDelay = this.lineDelay = options.lineDelay
+ || parseFloat(this.container.getAttribute(`${this.pfx}-lineDelay`)) || 1500;
+ this.progressLength = options.progressLength
+ || parseFloat(this.container.getAttribute(`${this.pfx}-progressLength`)) || 40;
+ this.progressChar = options.progressChar
+ || this.container.getAttribute(`${this.pfx}-progressChar`) || '█';
+ this.progressPercent = options.progressPercent
+ || parseFloat(this.container.getAttribute(`${this.pfx}-progressPercent`)) || 100;
+ this.cursor = options.cursor
+ || this.container.getAttribute(`${this.pfx}-cursor`) || '▋';
+ this.lineData = this.lineDataToElements(options.lineData || []);
+ this.loadLines()
+ if (!options.noInit) this.init()
+ }
+
+ loadLines() {
+ // Load all the lines and create the container so that the size is fixed
+ // Otherwise it would be changing and the user viewport would be constantly
+ // moving as she/he scrolls
+ const finish = this.generateFinish()
+ finish.style.visibility = 'hidden'
+ this.container.appendChild(finish)
+ // Appends dynamically loaded lines to existing line elements.
+ this.lines = [...this.container.querySelectorAll(`[${this.pfx}]`)].concat(this.lineData);
+ for (let line of this.lines) {
+ line.style.visibility = 'hidden'
+ this.container.appendChild(line)
+ }
+ const restart = this.generateRestart()
+ restart.style.visibility = 'hidden'
+ this.container.appendChild(restart)
+ this.container.setAttribute('data-termynal', '');
+ }
+
+ /**
+ * Initialise the widget, get lines, clear container and start animation.
+ */
+ init() {
+ /**
+ * Calculates width and height of Termynal container.
+ * If container is empty and lines are dynamically loaded, defaults to browser `auto` or CSS.
+ */
+ const containerStyle = getComputedStyle(this.container);
+ this.container.style.width = containerStyle.width !== '0px' ?
+ containerStyle.width : undefined;
+ this.container.style.minHeight = containerStyle.height !== '0px' ?
+ containerStyle.height : undefined;
+
+ this.container.setAttribute('data-termynal', '');
+ this.container.innerHTML = '';
+ for (let line of this.lines) {
+ line.style.visibility = 'visible'
+ }
+ this.start();
+ }
+
+ /**
+     * Start the animation and render the lines depending on their data attributes.
+ */
+ async start() {
+ this.addFinish()
+ await this._wait(this.startDelay);
+
+ for (let line of this.lines) {
+ const type = line.getAttribute(this.pfx);
+ const delay = line.getAttribute(`${this.pfx}-delay`) || this.lineDelay;
+
+ if (type == 'input') {
+ line.setAttribute(`${this.pfx}-cursor`, this.cursor);
+ await this.type(line);
+ await this._wait(delay);
+ }
+
+ else if (type == 'progress') {
+ await this.progress(line);
+ await this._wait(delay);
+ }
+
+ else {
+ this.container.appendChild(line);
+ await this._wait(delay);
+ }
+
+ line.removeAttribute(`${this.pfx}-cursor`);
+ }
+ this.addRestart()
+ this.finishElement.style.visibility = 'hidden'
+ this.lineDelay = this.originalLineDelay
+ this.typeDelay = this.originalTypeDelay
+ this.startDelay = this.originalStartDelay
+ }
+
+ generateRestart() {
+ const restart = document.createElement('a')
+ restart.onclick = (e) => {
+ e.preventDefault()
+ this.container.innerHTML = ''
+ this.init()
+ }
+ restart.href = '#'
+ restart.setAttribute('data-terminal-control', '')
+ restart.innerHTML = "restart ↻"
+ return restart
+ }
+
+ generateFinish() {
+ const finish = document.createElement('a')
+ finish.onclick = (e) => {
+ e.preventDefault()
+ this.lineDelay = 0
+ this.typeDelay = 0
+ this.startDelay = 0
+ }
+ finish.href = '#'
+ finish.setAttribute('data-terminal-control', '')
+ finish.innerHTML = "fast →"
+ this.finishElement = finish
+ return finish
+ }
+
+ addRestart() {
+ const restart = this.generateRestart()
+ this.container.appendChild(restart)
+ }
+
+ addFinish() {
+ const finish = this.generateFinish()
+ this.container.appendChild(finish)
+ }
+
+ /**
+ * Animate a typed line.
+ * @param {Node} line - The line element to render.
+ */
+ async type(line) {
+ const chars = [...line.textContent];
+ line.textContent = '';
+ this.container.appendChild(line);
+
+ for (let char of chars) {
+ const delay = line.getAttribute(`${this.pfx}-typeDelay`) || this.typeDelay;
+ await this._wait(delay);
+ line.textContent += char;
+ }
+ }
+
+ /**
+ * Animate a progress bar.
+ * @param {Node} line - The line element to render.
+ */
+ async progress(line) {
+ const progressLength = line.getAttribute(`${this.pfx}-progressLength`)
+ || this.progressLength;
+ const progressChar = line.getAttribute(`${this.pfx}-progressChar`)
+ || this.progressChar;
+ const chars = progressChar.repeat(progressLength);
+ const progressPercent = line.getAttribute(`${this.pfx}-progressPercent`)
+ || this.progressPercent;
+ line.textContent = '';
+ this.container.appendChild(line);
+
+ for (let i = 1; i < chars.length + 1; i++) {
+ await this._wait(this.typeDelay);
+ const percent = Math.round(i / chars.length * 100);
+ line.textContent = `${chars.slice(0, i)} ${percent}%`;
+ if (percent>progressPercent) {
+ break;
+ }
+ }
+ }
+
+ /**
+ * Helper function for animation delays, called with `await`.
+ * @param {number} time - Timeout, in ms.
+ */
+ _wait(time) {
+ return new Promise(resolve => setTimeout(resolve, time));
+ }
+
+ /**
+ * Converts line data objects into line elements.
+ *
+ * @param {Object[]} lineData - Dynamically loaded lines.
+ * @param {Object} line - Line data object.
+ * @returns {Element[]} - Array of line elements.
+ */
+ lineDataToElements(lineData) {
+ return lineData.map(line => {
+ let div = document.createElement('div');
+             div.innerHTML = `<span ${this._attributes(line)}>${line.value || ''}</span>`;
+
+ return div.firstElementChild;
+ });
+ }
+
+ /**
+ * Helper function for generating attributes string.
+ *
+ * @param {Object} line - Line data object.
+ * @returns {string} - String of attributes.
+ */
+ _attributes(line) {
+ let attrs = '';
+ for (let prop in line) {
+ // Custom add class
+ if (prop === 'class') {
+ attrs += ` class=${line[prop]} `
+ continue
+ }
+ if (prop === 'type') {
+ attrs += `${this.pfx}="${line[prop]}" `
+ } else if (prop !== 'value') {
+ attrs += `${this.pfx}-${prop}="${line[prop]}" `
+ }
+ }
+ return attrs;
+ }
+ }
+
+ /**
+ * HTML API: If current script has container(s) specified, initialise Termynal.
+ */
+ if (document.currentScript.hasAttribute('data-termynal-container')) {
+ const containers = document.currentScript.getAttribute('data-termynal-container');
+ containers.split('|')
+ .forEach(container => new Termynal(container))
+ }
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..db79107
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,43 @@
+INHERIT: active-handout.yml
+
+site_name: Camada Física da Computação
+
+extra:
+ custom_variables:
+ repo_aps: https://github.com/insper/camadafisica
+ repo_aps_git: https://github.com/insper/camadafisica.git
+
+extra_javascript:
+ - js/tabs.js
+ - js/termynal.js
+ - js/custom.js
+
+extra_css:
+ - css/termynal.css
+
+nav:
+ - 'Home': index.md
+ - about.md
+ - deadlines.md
+ - contributions.md
+ - Classes:
+ - "01 - Introduction":
+ - classes/01-intro/intro.md
+ - classes/01-intro/aps01_part_1.md
+ - classes/01-intro/aps01_part_2.md
+ - "02 - Deploy: first try!":
+ - classes/02-api/api_deploy.md
+ - "03 - Batch prediction":
+ - "Part 1":
+ - classes/03-batch/intro.md
+ - classes/03-batch/data_formats.md
+ - classes/03-batch/practicing.md
+ - "Part 2":
+ - classes/03-batch/sql.md
+ - classes/03-batch/db_tool.md
+ - classes/03-batch/dot_env.md
+ - classes/03-batch/aps02_sql.md
+ - "04 - Interview 01":
+ - classes/04-int01/faq.md
+ - classes/04-int01/int01.md
+
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..5ce135f
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,3 @@
+mkdocs
+pymdown-extensions
+git+https://github.com/insper-education/active-handout-plugins-py.git@latest
\ No newline at end of file