- Created a machine learning model that predicts whether a Pokemon will be competitively viable
- Scraped over 1000 rows of data from multiple pages on Serebii.net
- Pulled and transformed additional data from the established open-source Pokemon Battle simulator Pokemon Showdown (via their respective APIs and GitHub repositories)
- Performed in-depth analysis to understand how features are related and connected with viability
- Tested Logistic Regression, Decision Tree, Gaussian Naive Bayes, K-Nearest Neighbors, Random Forest, Support Vector Machine, and Gradient-Boosted Tree algorithms with Stratified K-Fold Cross Validation to find which models to explore further
- Optimized Logistic Regression, Decision Tree, Random Forest, and Gradient-Boosted Tree models with Cross-Validated Grid Search to arrive at the best model
- Analyzed the best model's Permutation Importance, Partial Dependence, and Shapley Values to further understand the connections between the features and viability
Python: >= 3.9
Tcl, TypeScript
Dependencies:
beautifulsoup4
joblib
matplotlib
numpy
pandas
requests
scipy
scrapy
seaborn
scikit-learn
tqdm
- Scraped Serebii.net pages:
- smogon/pokemon-showdown (GitHub)
Used Scrapy to scrape from the Serebii.net pages (see Resources Used) to obtain the following data on each Pokemon:
name
primary type
secondary type
primary ability
secondary ability
hidden ability
hp
attack
defense
special attack
special defense
speed
For Pokemon that were not scraped from the National Pokedex page, an additional field `alternate` was added and set to `True`.
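The resulting record for each Pokemon can be sketched as a plain mapping; the field names below are illustrative, not necessarily the ones the actual spider uses:

```python
# Fields scraped for each Pokemon (names are illustrative)
FIELDS = [
    "name", "type1", "type2", "ability1", "ability2", "hidden_ability",
    "hp", "attack", "defense", "special_attack", "special_defense", "speed",
]

def make_record(values, from_national_dex=True):
    """Assemble one scraped row; forms found outside the National
    Pokedex page are flagged with alternate=True."""
    record = dict(zip(FIELDS, values))
    if not from_national_dex:
        record["alternate"] = True
    return record

# Mega Charizard X only appears on a forms page, so it gets flagged
mega_x = make_record(
    ["Charizard-Mega-X", "Fire", "Dragon", "Tough Claws", None, None,
     78, 130, 111, 130, 85, 100],
    from_national_dex=False,
)
```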
Showdown_API.ipynb
Add_Tiers.ipynb
The scrape from Serebii.net was incomplete, as it missed the alternate forms of many Pokemon. These alternate forms were added manually.
Data was also pulled from the Pokemon Showdown Battle Simulator API. This information had to be heavily transformed:
- Created a column for whether the Pokemon had any special tags (i.e., whether the Pokemon was specifically labelled)
  - Tags included Legendary and Mythical Pokemon
- Created a column that represented whether the Pokemon was a Final Evolution (transformed from the Pokemon Showdown API's `evos` column)
- Created a column that allowed easy conversion from the Pokemon's official name to the Pokemon's working name in the Pokemon Showdown database
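These transformations can be sketched with pandas on toy rows shaped like Showdown dex entries; only the `evos` key matches the real API, while `tags` and the derived column names are illustrative:

```python
import pandas as pd

# Toy rows mimicking Pokemon Showdown dex entries
raw = pd.DataFrame([
    {"name": "Charmander", "evos": ["Charmeleon"], "tags": []},
    {"name": "Charizard", "evos": [], "tags": []},
    {"name": "Moltres", "evos": [], "tags": ["Sub-Legendary"]},
])

# Final evolution: the Pokemon has no further evolutions
raw["final_evolution"] = raw["evos"].apply(lambda e: len(e) == 0)
# Special tag: any label such as Legendary or Mythical
raw["special_tag"] = raw["tags"].apply(bool)
# Working name: Showdown IDs are lowercase and stripped of punctuation
raw["working_name"] = raw["name"].str.lower().str.replace(
    r"[^a-z0-9]", "", regex=True)
```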
Additional data on the tiering of the Pokemon was pulled directly from a file in the Pokemon Showdown GitHub repository (see Resources Used).
The tier of the Pokemon is a ranking of how viable the Pokemon is.
This was transformed from a tier ranking (Uber, OU, UU, etc.) into a binary One-versus-All (OvA) label: Viable versus Not Viable.
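The binarization amounts to a simple lookup. Which tiers count as "Viable" below is an assumption for illustration; the project's actual cutoff may sit elsewhere in the tier ladder:

```python
# Tiers treated as "Viable" here are an assumption for illustration;
# the project's real cutoff may differ
VIABLE_TIERS = {"Uber", "OU", "UUBL", "UU"}

def viability_label(tier: str) -> str:
    """Collapse the full Smogon tier ladder into a binary label."""
    return "Viable" if tier in VIABLE_TIERS else "Not Viable"
```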
I looked at the distribution of stats, as well as the relationship between a Pokemon's types, stats, their status as a legendary, and their viability. Below are some visualization highlights:
Data_Prep.py
Model_Selection.py
Model_Tuning.py
Model_Creation.py
A preprocessing pipeline was built that dropped the unused features, dropped outliers, min-max scaled the numerical features (or standard scaled them in the case of the Gaussian Naive Bayes model), ordinally encoded boolean features, and one-hot encoded the types. The encoding of the Pokemon's type had to be done specially because both `type1` and `type2` had to be encoded into the same set of columns.
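The shared type encoding can be sketched by one-hot encoding each type slot separately, then summing the aligned dummy frames so a type in either slot lands in the same `type_*` column (a sketch, not the pipeline's exact code):

```python
import pandas as pd

df = pd.DataFrame({
    "type1": ["Fire", "Water", "Grass"],
    "type2": ["Flying", None, "Poison"],
})

# One-hot encode each slot, then add the aligned dummy frames so a
# type present in either slot ends up as a 1 in one shared column
t1 = pd.get_dummies(df["type1"], prefix="type", dtype=int)
t2 = pd.get_dummies(df["type2"], prefix="type", dtype=int)
types = t1.add(t2, fill_value=0).astype(int)
```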
I tried 7 different models and evaluated them using F1 score. I chose F1 because of the imbalanced nature of the data (around 17% of all fully evolved Pokemon were viable). However, I also tracked each model's accuracy, precision, and recall.
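A toy example of why F1 rather than accuracy, assuming 1 marks the minority "Viable" class:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 0, 1, 0, 0, 1, 0]   # 1 = Viable (the minority class)
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]   # one miss, one false alarm

# Accuracy looks healthy even when the minority class is handled
# poorly; F1, the harmonic mean of precision and recall, is stricter
acc = accuracy_score(y_true, y_pred)   # 6/8 = 0.75
p = precision_score(y_true, y_pred)    # 2 TP / (2 TP + 1 FP) = 2/3
r = recall_score(y_true, y_pred)       # 2 TP / (2 TP + 1 FN) = 2/3
f1 = f1_score(y_true, y_pred)          # harmonic mean of p and r = 2/3
```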
Here are the models:
- Logistic Regression: Baseline for prediction. Ended up scoring surprisingly well.
- Decision Tree: A typical tree-based model. It served as a baseline for the other tree-based models.
- Gaussian Naive Bayes (GNB): A simple mathematical model. Because GNB assumes independent features, I was not confident in its predictive power for this problem.
- K-Nearest Neighbors (KNN): I thought that a good predictor of a Pokemon's viability could be other Pokemon similar to it. Interestingly, the KNN model improved significantly with standard scaling as opposed to min-max scaling.
- Random Forest: An ensemble of Decision Trees, included as a step up from the single Decision Tree.
- Support Vector Machine: I tried an SVM, but I was skeptical that it could work because the EDA made it evident that many unviable Pokemon are similar to viable Pokemon stat-wise.
- Gradient-Boosted Tree (from `sklearn`): Because of the success of the Decision Tree and the Random Forest, I added in a simple Gradient-Boosted Tree, since they're typically powerful right out of the box.
Before tuning, here are the F1 scores of the models (F1 is the harmonic mean of precision and recall, ranges from 0 to 1, and changes slightly with each evaluation):

| Logistic Regression | Decision Tree | Naive Bayes | Random Forest | K-Nearest Neighbor | Support Vector Machine | Gradient-Boosted Tree |
|---|---|---|---|---|---|---|
| 0.643 | 0.632 | 0.571 | 0.690 | 0.472 | 0.575 | 0.670 |
It's interesting to note that while nearly all the models had a much higher precision than recall, the Naive Bayes model had the reverse: a very good recall score of 0.702.
After several runs of the Stratified K-Fold Cross Validation, finding the mean F1 score for each model on each run, I decided to continue tuning the Logistic Regression, Decision Tree, Random Forest, and Gradient-Boosted Tree models, using Cross-Validated Grid Search.
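The selection-then-tuning loop can be sketched like this, on synthetic data with a similar class imbalance; the parameter grid is illustrative, not the one actually searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the preprocessed Pokemon features,
# with roughly the same ~17% positive rate
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.83], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
    scoring="f1",   # tune against the same metric used for selection
    cv=cv,
)
grid.fit(X, y)
best_model = grid.best_estimator_
```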
Scores post-tuning on validation data:
| Logistic Regression | Decision Tree | Random Forest | Gradient-Boosted Tree |
|---|---|---|---|
| 0.769 | 0.700 | 0.800 | 0.737 |
Ultimately, the Random Forest model outperformed the other models, not only during testing and validation, but also in that it was less likely to be horribly wrong (less variation in F1 scores).
After creating the Random Forest model and training it on the entire dataset, I used several different techniques to evaluate how the model made its decision. Here are some highlights from that process.
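Permutation importance, for example, can be computed directly with scikit-learn; synthetic data stands in for the real features here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle one feature at a time and measure the drop in F1;
# a large drop means the model leans heavily on that feature
result = permutation_importance(model, X, y, scoring="f1",
                                n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
```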
Ultimately, I succeeded in creating a model that could semi-reliably predict the viability of a Pokemon (the F1 scores of each model were prone to changing across multiple successive runs of Stratified K-Fold Cross Validation). However, in collecting data and processing it for the model, I had deliberately omitted several factors:
- Ability: a Pokemon's best ability has a huge impact on its viability. A bad ability can render a great Pokemon useless, while an amazing ability can render a terrible Pokemon incredibly powerful
- Movepool: some Pokemon are not great stat-wise, but have access to amazing moves that allow them to provide great utility or have enormous power.
- Context: The viability of a Pokemon is judged within the context of the metagame: the other Pokemon it has to play with and against.
These factors were omitted for simplicity in data collection and for time reasons. Their omission means that any model, no matter how well tuned, will never be able to perfectly predict the viability of Pokemon with the given dataset. However, the models performed noticeably better than baseline, meaning that the given data (stats, legendary status, and typing) still plays a significant role in the viability of Pokemon. This goes against common consensus in the competitive Pokemon community, where movepool and abilities are often favored over stats.
To continue this project further, additional data (such as the missing factors mentioned above) could be collected and a new model created that incorporates these features. There could also be additional experimentation with different ways to encode the "type" feature. Finally, that model could be productionized into a tool that helps Pokemon players easily determine the strength of different Pokemon at the start of a generation, when such information is not yet common knowledge.
- My father wrote the Tcl script to convert the TypeScript file from Pokemon Showdown into JSON so that it could be easily imported into Python.
- ChatGPT for various odds and ends as well as general understanding
- PlayingNumbers/ds_salary_proj (GitHub) by Ken Jee as a README template