
Commit

Merge pull request #5 from hsf-training/anil-changes-hsf-training-databases-basics

Anil changes hsf training databases basics
michmx authored Oct 27, 2023
2 parents 462a858 + dca2183 commit c915e83
Showing 5 changed files with 388 additions and 36 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -116,4 +116,4 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
<!-- prettier-ignore-end -->
<!-- ALL-CONTRIBUTORS-LIST:END -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
218 changes: 208 additions & 10 deletions _episodes/02-sql-basics.md
@@ -1,7 +1,7 @@
---
title: "MySQL Basics"
teaching: x
exercises: x
questions:
- ""
- ""
@@ -12,15 +12,213 @@ keypoints:
- ""
- ""
---
## SQL Commands

Here are some of the core SQL commands that you'll use in MySQL:

- `CREATE DATABASE`: Create a new database.
- `CREATE TABLE`: Create a new table.
- `INSERT INTO`: Insert new records into a table.
- `SELECT`: Retrieve data from a database.
- `UPDATE`: Modify existing data in a table.
- `DELETE`: Remove data from a table.

## Setting up for SQL commands
In the terminal, run the following command:
~~~bash
docker exec -it metadata bash -c "mysql -uroot -pmypassword"
~~~
You will then see the MySQL command prompt, ``mysql>``. All the SQL commands in this episode must be typed at this prompt.


## Creating a database
We will first create a database named ``metadata`` in our MySQL server.
```sql
CREATE DATABASE metadata;
```

To work with a specific database, you can use the USE command. For instance, to select the "metadata" database:
```sql
USE metadata;
```
~~~
Database changed
~~~
{: .output}

## Creating a table

In SQL, the CREATE TABLE command is used to define a new table within a database. When creating a table, you specify the structure of the table, including the names of its columns, the data types for each column, and any constraints that ensure data integrity.

Here's a breakdown of the components of the ``CREATE TABLE <table_name> (<column_name> <data_type> <constraints>)`` command:

- ``<table_name>``: This is the name of the table you're creating. It should be meaningful and reflect the type of data the table will store.

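- ``<column_name>``: The name of a column. A table usually has several columns, each defined as a ``<column_name> <data_type> <constraints>`` entry and separated from the next by a comma.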

- ``<data_type>``: This defines the kind of data that a column in your table can hold. Choosing the right data type is crucial because it determines how the data will be stored and processed. Some commonly used data types are:
  - INT (Integer): This data type is used for whole numbers.
  - VARCHAR(n) (Variable Character): This is used for storing variable-length character strings. The (n) represents the maximum length of the string, ensuring that the stored data does not exceed a specified limit.
  - TEXT: The TEXT data type is used for storing longer text or character data whose length can vary a lot. Unlike VARCHAR, you do not declare a maximum length for a TEXT column, which makes it convenient for larger, free-form text.

- ``<constraints>``: You can apply constraints to columns. Common constraints include:
  - NOT NULL: This ensures that a value must be provided for the column in every row.
  - UNIQUE: This guarantees that each value in the column is unique across all rows in the table.
  - PRIMARY KEY: Designates a column as the primary key, providing a unique identifier for each row.
  - AUTO_INCREMENT: Strictly an attribute rather than a constraint; MySQL automatically fills this column with the next integer value when a row is inserted without one.

By combining these elements, you define the table's structure and ensure data consistency and uniqueness. This structured approach to table creation is fundamental to relational databases and is a key part of database design. Keep in mind that a database can contain multiple tables.

Now, let's create our "dataset" table in the "metadata" database.

```sql
CREATE TABLE dataset (
id INT AUTO_INCREMENT PRIMARY KEY,
filename VARCHAR(255) NOT NULL UNIQUE,
run_number INT NOT NULL,
total_event INT NOT NULL,
collision_type TEXT,
data_type TEXT,
collision_energy INT NOT NULL
);
```
You can inspect the table and its columns with the command:
```sql
SHOW COLUMNS FROM dataset;
```
~~~
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int          | NO   | PRI | NULL    | auto_increment |
| filename         | varchar(255) | NO   | UNI | NULL    |                |
| run_number       | int          | NO   |     | NULL    |                |
| total_event      | int          | NO   |     | NULL    |                |
| collision_type   | text         | YES  |     | NULL    |                |
| data_type        | text         | YES  |     | NULL    |                |
| collision_energy | int          | NO   |     | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+
~~~
{: .output}
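To see the NOT NULL and UNIQUE constraints in action without a running MySQL server, the sketch below uses Python's built-in sqlite3 module as an offline stand-in. This is an assumption for illustration only: SQLite is not MySQL, it spells the auto-increment key as INTEGER PRIMARY KEY, and it does not enforce VARCHAR lengths, but it rejects duplicate and missing values in the same spirit. The helper try_insert is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "metadata" database.
# INTEGER PRIMARY KEY plays the role of INT AUTO_INCREMENT PRIMARY KEY here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        filename VARCHAR(255) NOT NULL UNIQUE,
        run_number INT NOT NULL,
        total_event INT NOT NULL,
        collision_type TEXT,
        data_type TEXT,
        collision_energy INT NOT NULL
    )
""")

def try_insert(values):
    """Hypothetical helper: insert one row, return None on success or the error text."""
    try:
        conn.execute(
            "INSERT INTO dataset (filename, run_number, total_event, "
            "collision_type, data_type, collision_energy) VALUES (?, ?, ?, ?, ?, ?)",
            values,
        )
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

ok = try_insert(("expx.myfile1.root", 100, 1112, "pp", "data", 11275))
# Same filename again: violates UNIQUE on filename.
duplicate = try_insert(("expx.myfile1.root", 101, 10, "pp", "data", 11275))
# Missing run_number: violates NOT NULL.
missing = try_insert(("expx.myfile3.root", None, 10, "pp", "data", 11275))

print(ok)         # None
print(duplicate)  # an error message mentioning the UNIQUE constraint
print(missing)    # an error message mentioning the NOT NULL constraint
```

In MySQL, the same two rejected inserts should fail with a duplicate-entry error and a "column cannot be null" error respectively, leaving only the first record in the table.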

## Inserting records into a table
You can use the INSERT INTO command to add records to a table. This command has the structure ``INSERT INTO <table_name> (<column_names>) VALUES (<column_values>)``, where the column names and values are comma-separated lists given in matching order.
Here's an example of inserting data into the "dataset" table:
```sql
INSERT INTO dataset (filename, run_number, total_event, collision_type, data_type, collision_energy)
VALUES ("expx.myfile1.root", 100, 1112, "pp", "data", 11275);
```


```sql
INSERT INTO dataset (filename, run_number, total_event, collision_type, data_type, collision_energy)
VALUES ("expx.myfile2.root", 55, 999, "pPb", "mc", 1127);
```
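Because id is declared AUTO_INCREMENT, neither INSERT above supplies it; the database assigns 1 and 2 in insertion order, which matches the SELECT output in the next section. The sketch below checks this with the same two records, again using Python's sqlite3 module as an assumed offline stand-in for MySQL (its INTEGER PRIMARY KEY assigns ids in a similar way for a fresh table).

```python
import sqlite3

# Offline stand-in for the MySQL table created above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, filename VARCHAR(255) NOT NULL UNIQUE, "
    "run_number INT NOT NULL, total_event INT NOT NULL, collision_type TEXT, "
    "data_type TEXT, collision_energy INT NOT NULL)"
)

records = [
    ("expx.myfile1.root", 100, 1112, "pp", "data", 11275),
    ("expx.myfile2.root", 55, 999, "pPb", "mc", 1127),
]

assigned_ids = []
for record in records:
    # id is not listed among the columns, so the database generates it.
    cur = conn.execute(
        "INSERT INTO dataset (filename, run_number, total_event, collision_type, "
        "data_type, collision_energy) VALUES (?, ?, ?, ?, ?, ?)",
        record,
    )
    # lastrowid holds the id generated for the row just inserted.
    assigned_ids.append(cur.lastrowid)

print(assigned_ids)  # [1, 2]
```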

## Searching records in a table

The SELECT command allows you to retrieve records from a table. To retrieve all records from the "dataset" table, you can use:
```sql
SELECT * FROM dataset;
```
~~~
+----+-------------------+------------+-------------+----------------+-----------+------------------+
| id | filename          | run_number | total_event | collision_type | data_type | collision_energy |
+----+-------------------+------------+-------------+----------------+-----------+------------------+
|  1 | expx.myfile1.root |        100 |        1112 | pp             | data      |            11275 |
|  2 | expx.myfile2.root |         55 |         999 | pPb            | mc        |             1127 |
+----+-------------------+------------+-------------+----------------+-----------+------------------+
~~~
{: .output}
You can select specific columns by listing them after the SELECT keyword:
```sql
SELECT filename FROM dataset;
```
~~~
+-------------------+
| filename          |
+-------------------+
| expx.myfile1.root |
| expx.myfile2.root |
+-------------------+
2 rows in set (0.00 sec)
~~~
{: .output}

### Searching with conditions
To filter data based on certain conditions, you can use the WHERE clause. This allows you to filter rows based on conditions that you specify. For example, to select filenames where the "collision_type" is 'pp':
```sql
SELECT filename FROM dataset WHERE collision_type='pp';
```
In addition, you can use the logical operators AND and OR to combine multiple conditions in the WHERE clause.
```sql
SELECT filename FROM dataset WHERE run_number > 50 AND collision_type='pp';
```

{: .source}
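The difference between AND and OR is easiest to see on the two example records. This sketch loads them into an in-memory SQLite database (a stand-in for MySQL, assumed only so the example runs on its own) and runs both versions of the condition; the hypothetical helper filenames() appends ORDER BY id so the result order is predictable.

```python
import sqlite3

# Offline stand-in: SQLite in memory, loaded with the two example records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE, "
    "run_number INT NOT NULL, total_event INT NOT NULL, collision_type TEXT, "
    "data_type TEXT, collision_energy INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO dataset (filename, run_number, total_event, collision_type, "
    "data_type, collision_energy) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("expx.myfile1.root", 100, 1112, "pp", "data", 11275),
        ("expx.myfile2.root", 55, 999, "pPb", "mc", 1127),
    ],
)

def filenames(where):
    """Hypothetical helper: filenames matching a fixed WHERE clause, ordered by id."""
    # The clause is pasted into the SQL for brevity; never do this with user input.
    rows = conn.execute(f"SELECT filename FROM dataset WHERE {where} ORDER BY id")
    return [row[0] for row in rows]

both = filenames("run_number > 50 AND collision_type = 'pp'")
either = filenames("run_number > 50 OR collision_type = 'pp'")
print(both)    # ['expx.myfile1.root']
print(either)  # ['expx.myfile1.root', 'expx.myfile2.root']
```

With AND, only expx.myfile1.root satisfies both conditions; with OR, expx.myfile2.root also qualifies because its run_number of 55 is greater than 50.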

> ## SELECT with multiple conditions
>
> Get the filenames of the records where total_event > 1000 and data_type is "data".
>
> > ## Solution
> >
> > ```sql
> > SELECT filename FROM dataset WHERE total_event > 1000 AND data_type='data';
> > ```
> > {: .source}
> >
> > ~~~
> > +-------------------+
> > | filename          |
> > +-------------------+
> > | expx.myfile1.root |
> > +-------------------+
> > 1 row in set (0.00 sec)
> > ~~~
> > {: .output}
> {: .solution}
{: .challenge}
## UPDATE
The UPDATE command is used to change existing records. For example, to update the "collision_type" and "collision_energy" of a specific record, you can use:
```sql
UPDATE dataset
SET collision_type = 'PbPb', collision_energy = 300
WHERE filename = 'expx.myfile1.root';
```
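To confirm what an UPDATE changed, you can check how many rows it affected and read the records back. A minimal sketch with the same data, using Python's sqlite3 module as an offline stand-in for MySQL (an assumption for illustration):

```python
import sqlite3

# Offline stand-in: SQLite in memory, loaded with the two example records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE, "
    "run_number INT NOT NULL, total_event INT NOT NULL, collision_type TEXT, "
    "data_type TEXT, collision_energy INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO dataset (filename, run_number, total_event, collision_type, "
    "data_type, collision_energy) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("expx.myfile1.root", 100, 1112, "pp", "data", 11275),
        ("expx.myfile2.root", 55, 999, "pPb", "mc", 1127),
    ],
)

# The same UPDATE as above; rowcount reports how many rows the statement modified.
cur = conn.execute(
    "UPDATE dataset SET collision_type = 'PbPb', collision_energy = 300 "
    "WHERE filename = 'expx.myfile1.root'"
)
changed = cur.rowcount

rows = conn.execute(
    "SELECT filename, collision_type, collision_energy FROM dataset ORDER BY id"
).fetchall()
print(changed)  # 1
print(rows)     # [('expx.myfile1.root', 'PbPb', 300), ('expx.myfile2.root', 'pPb', 1127)]
```

Only the row matched by the WHERE clause changes; expx.myfile2.root keeps its original values.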
> ## Update on a condition
>
> Update the total_event of the file "expx.myfile2.root" to 800.
>
> > ## Solution
> >
> > ```sql
> > UPDATE dataset
> > SET total_event = 800
> > WHERE filename = 'expx.myfile2.root';
> > ```
> > {: .source}
> >
> > ~~~
> > Query OK, 1 row affected (0.00 sec)
> > Rows matched: 1  Changed: 1  Warnings: 0
> > ~~~
> > {: .output}
> {: .solution}
{: .challenge}
## DELETE
The DELETE command removes records from a table. Without a WHERE clause it removes every row, so always double-check the condition. To delete the record with a specific filename, you can use:
```sql
DELETE FROM dataset
WHERE filename = 'expx.myfile2.root';
```
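A minimal sketch of the same DELETE on the two example records, again using Python's sqlite3 module as an assumed offline stand-in for MySQL, reports how many rows were removed and lists what remains:

```python
import sqlite3

# Offline stand-in: SQLite in memory, loaded with the two example records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE, "
    "run_number INT NOT NULL, total_event INT NOT NULL, collision_type TEXT, "
    "data_type TEXT, collision_energy INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO dataset (filename, run_number, total_event, collision_type, "
    "data_type, collision_energy) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("expx.myfile1.root", 100, 1112, "pp", "data", 11275),
        ("expx.myfile2.root", 55, 999, "pPb", "mc", 1127),
    ],
)

# The WHERE clause limits the deletion to a single record.
cur = conn.execute("DELETE FROM dataset WHERE filename = 'expx.myfile2.root'")
deleted = cur.rowcount

remaining = [row[0] for row in conn.execute("SELECT filename FROM dataset ORDER BY id")]
print(deleted)    # 1
print(remaining)  # ['expx.myfile1.root']
```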
82 changes: 74 additions & 8 deletions _episodes/04-mysql-and-python.md
@@ -1,7 +1,7 @@
---
title: "MySQL and Python"
teaching: x
exercises: x
questions:
- ""
- ""
@@ -12,15 +12,81 @@ keypoints:
- ""
- ""
---
## Why Python with SQL?

## Installing the MySQL connector
To use MySQL from Python, you can install mysql-connector-python, a popular MySQL driver for Python, with pip, the Python package manager. Open a terminal or command prompt and run:
```bash
pip install mysql-connector-python
```
On systems where pip refers to Python 2, you might need to use pip3 instead:
```bash
pip3 install mysql-connector-python
```

## Connecting to the MySQL server

```python
import mysql.connector

# Establish a connection to the MySQL server
connection = mysql.connector.connect(
host="localhost", user="root", password="mypassword"
)

# Create a cursor to execute SQL commands
cursor = connection.cursor()
```
## Creating a Database

We will first create a database named "metadata" in our MySQL server.



```python
cursor.execute("CREATE DATABASE metadata")
```

To work with a specific database, you can use the USE command. For instance, to select the "metadata" database:
```python
cursor.execute("USE metadata")
```



## Create a table

Tables are used to structure and store data. Here's how you can create a table named "dataset" within the "metadata" database:

```python
# Define the CREATE TABLE SQL command

create_table_query = """
CREATE TABLE dataset (
id INT AUTO_INCREMENT PRIMARY KEY,
filename VARCHAR(255) NOT NULL UNIQUE,
run_number INT NOT NULL,
total_event INT NOT NULL,
collision_type TEXT,
data_type TEXT,
collision_energy INT NOT NULL
)
"""
# Execute the CREATE TABLE command
cursor.execute(create_table_query)
```
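Once the table exists, the same cursor drives every other statement. The sketch below shows the general DB-API pattern (execute with parameters, commit, fetch) using Python's built-in sqlite3 module as an offline stand-in, chosen only because it runs without a MySQL server. Two differences to keep in mind when translating back to mysql.connector: it uses %s placeholders where sqlite3 uses ?, and it does not autocommit by default, so changes need connection.commit().

```python
import sqlite3

# Stand-in for mysql.connector.connect(...): an in-memory SQLite database.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

cursor.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE, "
    "run_number INT NOT NULL, total_event INT NOT NULL, collision_type TEXT, "
    "data_type TEXT, collision_energy INT NOT NULL)"
)

# Parameterized query: values are passed separately, never pasted into the SQL string.
# With mysql.connector the placeholders would be %s instead of ?.
cursor.execute(
    "INSERT INTO dataset (filename, run_number, total_event, collision_type, "
    "data_type, collision_energy) VALUES (?, ?, ?, ?, ?, ?)",
    ("expx.myfile1.root", 100, 1112, "pp", "data", 11275),
)
connection.commit()  # make the insert permanent

cursor.execute("SELECT filename, run_number FROM dataset")
rows = cursor.fetchall()
print(rows)  # [('expx.myfile1.root', 100)]

# Release resources when done.
cursor.close()
connection.close()
```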



## Insert into table