- Tristan CHEVET
- Emilie ROUX
- Norbet ASTIER
- Massimo BACCI
In this project:
- Reading a CSV file and loading it into a dataset
- Shuffling the dataset
- Creating a training set and a test set
- Running a Chi² test on the test set
- Running a Principal Component Analysis (PCA) on the dataset
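The first three steps can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `parseCsv`, `shuffle` and `split`, the 20% test ratio and the sample CSV are all assumptions made for the example.

```javascript
// Hypothetical helpers for illustration only; names and defaults are assumptions.

// Parse a small CSV string (no quoted fields) into an array of row arrays.
function parseCsv(text) {
  return text
    .trim()
    .split('\n')
    .map((line) => line.split(',').map((cell) => cell.trim()));
}

// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(rows, random = Math.random) {
  const result = [...rows];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Split rows into a test set (the first testRatio share) and a training set (the rest).
function split(rows, testRatio = 0.2) {
  const testSize = Math.round(rows.length * testRatio);
  return {
    testSet: rows.slice(0, testSize),
    trainSet: rows.slice(testSize),
  };
}

const csv = 'a,DOG\nb,CAT\nc,GIRAFFE\nd,DOG\ne,CAT';
const {trainSet, testSet} = split(shuffle(parseCsv(csv)), 0.2);
console.log(trainSet.length, testSet.length); // 4 1
```

Shuffling before splitting matters because CSV files are often sorted by class; without it the test set could contain a single class.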
The Chi² test makes sure our test set is not biased.
To measure a model's accuracy reliably, the classes must be evenly distributed in the test set.
Therefore we run the Chi² test on 2 lists, OBSERVED and EXPECTED.
The Chi² test needs a null hypothesis; here we used « The EXPECTED and OBSERVED values are independent ».
If the null hypothesis is rejected, the test set is biased.
If the Chi² test fails to reject the null hypothesis, the test set is evenly distributed.
Example:
We have a dataset of 30 DOG, 30 CAT and 30 GIRAFFE.
The null hypothesis is « The EXPECTED and OBSERVED values are independent ».
OBSERVED is the actual number of samples of each class in the test set, so [5,15,10].
EXPECTED is the number of samples of each class we want in the test set, so [10,10,10].
Here the measured accuracy is not going to be trustworthy because the test set contains three times as many CAT as DOG.
So, provided the significance level is not too strict for such a small sample, the Chi² test will reject the null hypothesis.
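The example above can be checked by hand with a plain Pearson goodness-of-fit computation. This sketch does not use `@stdlib/stats-chi2test`, and the function names are hypothetical. It relies on the fact that with 3 categories there are 2 degrees of freedom, for which the p-value has the closed form exp(-x / 2).

```javascript
// Pearson goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
function chiSquareStatistic(observed, expected) {
  return observed.reduce(
    (sum, o, i) => sum + ((o - expected[i]) ** 2) / expected[i],
    0,
  );
}

// Valid for 2 degrees of freedom only (3 categories): p = exp(-x / 2).
function pValueTwoDegreesOfFreedom(statistic) {
  return Math.exp(-statistic / 2);
}

const observed = [5, 15, 10]; // DOG, CAT, GIRAFFE in the test set
const expected = [10, 10, 10];

const statistic = chiSquareStatistic(observed, expected);
const pValue = pValueTwoDegreesOfFreedom(statistic);

console.log(statistic); // 5
console.log(pValue.toFixed(3)); // 0.082
```

Under this computation the null hypothesis is rejected at the 10% significance level (critical value 4.605) but not at the 5% level (critical value 5.991), so the threshold used by the project decides whether this test set counts as biased.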
Read more here: https://towardsdatascience.com/chi-square-statistic-chi-squared-distribution-2499084b5da8
PCA is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables, the principal components.
In this project we used the famous Iris dataset.
We analysed 4 columns:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
With the help of the pca-js library, we found that 2 principal components are enough to describe the dataset, which reduces its dimension from 4 to 2.
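To illustrate why 2 components can be enough, here is a minimal sketch that does not call pca-js. It swaps in a hand-written power iteration with deflation to estimate the two largest eigenvalues of the covariance matrix, and all function names are hypothetical. It uses only nine rows of the Iris dataset (three per species), so the percentage it prints is indicative and differs from the full-dataset result.

```javascript
// Nine rows of the Iris dataset (three per species):
// sepal length, sepal width, petal length, petal width, all in cm.
const rows = [
  [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
  [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
  [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9], [7.1, 3.0, 5.9, 2.1],
];

// Sample covariance matrix of the columns.
function covariance(data) {
  const n = data.length;
  const d = data[0].length;
  const mean = Array.from({length: d}, (_, j) => data.reduce((s, r) => s + r[j], 0) / n);
  return Array.from({length: d}, (_, a) =>
    Array.from({length: d}, (_, b) =>
      data.reduce((s, r) => s + (r[a] - mean[a]) * (r[b] - mean[b]), 0) / (n - 1)));
}

// Power iteration: dominant eigenvalue and eigenvector of a symmetric
// positive semi-definite matrix.
function dominantEigen(matrix, iterations = 1000) {
  let v = matrix.map(() => 1 / Math.sqrt(matrix.length));
  let value = 0;
  for (let k = 0; k < iterations; k++) {
    const w = matrix.map((row) => row.reduce((s, x, j) => s + x * v[j], 0));
    value = Math.hypot(...w);
    v = w.map((x) => x / value);
  }
  return {value, vector: v};
}

// Deflation: remove the component already found so the next one becomes dominant.
function deflate(matrix, {value, vector}) {
  return matrix.map((row, i) => row.map((x, j) => x - value * vector[i] * vector[j]));
}

const cov = covariance(rows);
const first = dominantEigen(cov);
const second = dominantEigen(deflate(cov, first));

// The total variance is the trace of the covariance matrix.
const totalVariance = cov.reduce((s, row, i) => s + row[i], 0);
const explained = (first.value + second.value) / totalVariance;

console.log(`Variance explained by 2 components: ${(explained * 100).toFixed(1)}%`);
```

When the share of variance explained by the first two components is close to 100%, dropping the remaining two loses very little information.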
To run the project, you first need to install Node.js: https://nodejs.org/en/download/
You then need to install all the required libraries with the following command:
With npm:
npm install
To start the project, run the following command:
With npm:
npm start
To test the project, run the following command:
With npm:
npm test
To check the code style, run the following command:
With npm:
npm run code-style
When a pull request is opened, we use GitHub Actions (the «pull_request_git.yml» file) to make sure the code:
- Is not merged directly
- Passes all required checks (the code-style check and npm test), so that mistakes in the code are caught
ramda:
@stdlib/stats-chi2test (to check that the test set is evenly distributed):
https://www.npmjs.com/package/@stdlib/stats-chi2test
pca-js:
https://www.npmjs.com/package/pca-js
chai:
gulp:
mocha:
xo: