This was my first data manipulation project written in plain Python. I wrote it in my first semester of college, and three semesters later I decided to rewrite the code to see how much I had evolved.
The main goal was to rewrite the code using concepts I have learned since then, such as software engineering, clean code, English, and data structures, and to make the code as efficient as possible.
-
Below I list the differences between the versions and their functions, so that the optimization of the code can be seen in depth. The file "data_manipulation.py" in folder V1 is the first version and is written in Portuguese. The file "data_manipulation.py" in folder V2 is the optimized code and is written in English. The file "data_manipulation.ipynb" in folder V3 is the latest version; it uses libraries such as Pandas and was written in Jupyter.
-
The file "original_database.csv" is the original dataset, found at "https://www.kaggle.com/ahsen1330/us-police-shootings". It was downloaded in 2020 and may have been updated since then.
-
The file "ajusted_database.csv" is a copy of "original_database.csv" with some corrections: the original file had some unfilled fields, and these caused many errors in the application.
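This README does not say exactly which corrections were made. As a rough illustration only, the sketch below assumes the fix was to replace empty cells with a placeholder value. The function name `fill_missing_fields` and the placeholder `"Unknown"` are hypothetical and are not taken from the project code.

```python
import csv
from typing import Iterable


def fill_missing_fields(lines: Iterable[str], placeholder: str = "Unknown") -> list[list[str]]:
    """Parse CSV lines and return the header plus rows in which every
    empty cell has been replaced by `placeholder` (hypothetical helper)."""
    rows = list(csv.reader(lines))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    fixed = [header]
    for row in body:
        # Pad short rows so each record has one cell per column.
        row = row + [""] * (len(header) - len(row))
        # Keep non-blank cells, replace blank ones with the placeholder.
        fixed.append([cell if cell.strip() else placeholder for cell in row])
    return fixed


# Example usage (the UTF-8 encoding is an assumption):
# with open("original_database.csv", newline="", encoding="utf-8") as src:
#     rows = fill_missing_fields(src)
# with open("ajusted_database.csv", "w", newline="", encoding="utf-8") as dst:
#     csv.writer(dst).writerows(rows)
```

A placeholder keeps every row in the dataset; dropping incomplete rows would be the other common choice, at the cost of losing records.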
-
The file "invented_database.csv" is a file with the same columns as the original dataset, but with dummy data created by me.
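How the dummy data was created is not described here. One possible approach, shown as a hypothetical sketch, is to reuse the header of the original file and generate placeholder values. `read_header`, `make_dummy_rows` and the value format are assumptions for illustration.

```python
import csv
import random
from typing import Iterable


def read_header(lines: Iterable[str]) -> list[str]:
    """Return the first CSV row, i.e. the column names."""
    return next(csv.reader(lines))


def make_dummy_rows(header: list[str], n_rows: int, seed: int = 0) -> list[list[str]]:
    """Return the header followed by `n_rows` rows of placeholder values."""
    rng = random.Random(seed)  # seeded, so the same dummy data is produced every run
    rows = [list(header)]
    for row_id in range(1, n_rows + 1):
        # Hypothetical placeholder format: "<column>_<row number>_<random 0-99>"
        rows.append([f"{column}_{row_id}_{rng.randint(0, 99)}" for column in header])
    return rows


# Example usage, reusing the column names of the original file:
# with open("original_database.csv", newline="") as src:
#     header = read_header(src)
# with open("invented_database.csv", "w", newline="") as dst:
#     csv.writer(dst).writerows(make_dummy_rows(header, 50))
```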
-
Each function has a brief comment describing what it does.
-
PLEASE SEE OPTION 10 IN V1 AND V2: IT SHOWS THE BIGGEST GAIN IN PRODUCTIVITY AND HOW MUCH MY CODE HAS EVOLVED.
Many values are hard-coded, but they can easily be replaced with user input. The code was written to adapt easily to future updates. From V1 to V2 there is a clear optimization as the code moves from a style built on standalone functions to one closer to object-oriented programming (OOP). From V2 to V3 you can see how libraries such as Pandas give you more productivity with fewer lines of code.
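To illustrate both ideas (swapping a hard-coded value for user input, and grouping the operations in a class), here is a hypothetical sketch. `ShootingsDataset`, `ask_or_default` and the column name `state` are assumptions for illustration, not the project's actual code.

```python
import csv
from collections import Counter
from typing import Iterable


class ShootingsDataset:
    """Hypothetical OOP-style wrapper: rows are loaded once, each report is a method."""

    def __init__(self, rows: Iterable[dict[str, str]]) -> None:
        self.rows = list(rows)

    @classmethod
    def from_csv(cls, lines: Iterable[str]) -> "ShootingsDataset":
        # csv.DictReader accepts an open file or any iterable of text lines.
        return cls(csv.DictReader(lines))

    def count_where(self, column: str, value: str) -> int:
        """Number of rows whose `column` equals `value`."""
        return sum(1 for row in self.rows if row.get(column) == value)

    def frequency(self, column: str) -> Counter:
        """How often each distinct value of `column` appears."""
        return Counter(row.get(column) for row in self.rows)


def ask_or_default(prompt: str, default: str, interactive: bool = False) -> str:
    """Return the hard-coded default, or ask the user when interactive=True."""
    if not interactive:
        return default
    answer = input(f"{prompt} [{default}]: ").strip()
    return answer or default  # an empty answer keeps the default


# Example usage (hypothetical file and column names):
# with open("ajusted_database.csv", newline="") as src:
#     data = ShootingsDataset.from_csv(src)
# state = ask_or_default("State", "CA", interactive=True)
# print(data.count_where("state", state))
```

Keeping the rows on one object means each report is a short method call on data that is already loaded, and turning a hard-coded value into an input only changes the `interactive` flag.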
The application was developed in JetBrains' PyCharm IDE (https://www.jetbrains.com/pt-br/pycharm/), but you can use any IDE you prefer. It was developed with Python 3.10.6 and uses libraries such as Matplotlib (https://matplotlib.org/) and Pandas (https://pandas.pydata.org/). V1 and V2 can be run in PyCharm, but V3 needs to be run in Jupyter Notebook (https://jupyter.org/). I recommend using Anaconda (https://www.anaconda.com/products/distribution).
Gabriel Carvalho