abdulazizalmass/ML_Datasets
Machine Learning Datasets

The repo comes loaded with the following datasets:

  1. Santa/NoSanta
  2. Dogs/Cats
  3. Human/Horses
  4. SportsClassification
  5. Smile/noSmile
  6. Food-5k
  7. NIH malaria
  8. Cyclone_Wildfire_Flood_Earthquake_Database
  9. Breast Cancer dataset (idc)
  10. spatial_envelope_256x256_static_8outdoorcategories
  11. Facial Expression

All you need to do is clone the repo. Feel free to fork it and add more datasets; more will be added soon. You can also run the script exploreDataset.py to get insight into a dataset, for example:

python exploreDataset.py --datasetDir Cyclone_Wildfire_Flood_Earthquake_Database --channels 3
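
The source of exploreDataset.py is not reproduced in this README, so the following is only a minimal sketch of how per-directory totals like the `[INFO]` lines below could be produced. The function names `count_images` and `report`, the list of image extensions, and the output formatting are assumptions for illustration; the `--channels` option is not modeled.

```python
import os

# Assumed set of image extensions; the real script may use a different list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}


def count_images(root):
    """Return {directory: number of images in it and in all of its subdirectories}."""
    totals = {}
    # Walk bottom-up so each child's total is known before its parent is visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        own = sum(
            1
            for name in filenames
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
        nested = sum(totals.get(os.path.join(dirpath, d), 0) for d in dirnames)
        totals[dirpath] = own + nested
    return totals


def report(root):
    """Print one line per directory, mimicking the [INFO] lines in this README."""
    totals = count_images(root)
    for dirpath in sorted(totals):
        print(f"[INFO] Total images of {dirpath} is {totals[dirpath]:,}")
```

Calling `report("Cyclone_Wildfire_Flood_Earthquake_Database")` from the repo root would print one line for the dataset directory and one for each class subdirectory.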

Santa/NoSanta

Dataset collected by Adrian Rosebrock

[INFO] Total images of datasets/Santa is 922
[INFO] Total images of datasets/Santa/not_santa is 461
[INFO] Total images of datasets/Santa/santa is 461

Sample training curve output

Dogs/Cats

Dataset from Kaggle

[INFO] Total images of datasets/cats_and_dogs is 3,000
[INFO] Total images of datasets/cats_and_dogs/train is 2,000
[INFO] Total images of datasets/cats_and_dogs/train/dogs is 1,000
[INFO] Total images of datasets/cats_and_dogs/train/cats is 1,000
[INFO] Total images of datasets/cats_and_dogs/validation is 1,000
[INFO] Total images of datasets/cats_and_dogs/validation/dogs is 500
[INFO] Total images of datasets/cats_and_dogs/validation/cats is 500

Sample curve output from training cats vs dogs dataset

Human/Horses

Dataset from Kaggle

[INFO] Total images of datasets/horse-or-human is 1,283
[INFO] Total images of datasets/horse-or-human/train is 1,027
[INFO] Total images of datasets/horse-or-human/train/humans is 527
[INFO] Total images of datasets/horse-or-human/train/horses is 500
[INFO] Total images of datasets/horse-or-human/validation is 256
[INFO] Total images of datasets/horse-or-human/validation/humans is 128
[INFO] Total images of datasets/horse-or-human/validation/horses is 128

Sample training curve output

SportsClassification:

22 types of sports in a total of 14,405 images (dataset from this link). The types of sports are [Swimming, Badminton, Wrestling, Olympic Shooting, Cricket, Football, Tennis, Hockey, Ice Hockey, Kabaddi, WWE, Gymnasium, Weight lifting, Volleyball, Table tennis, Baseball, Formula 1, Moto GP, Chess, Boxing, Fencing, Basketball]. Note that the per-class counts listed below add up to 14,362 images.

[INFO] Total images of datasets/SportsClassification is 14,362
[INFO] Total images of datasets/SportsClassification/gymnastics is 711
[INFO] Total images of datasets/SportsClassification/wrestling is 601
[INFO] Total images of datasets/SportsClassification/football is 784
[INFO] Total images of datasets/SportsClassification/cricket is 665
[INFO] Total images of datasets/SportsClassification/baseball is 731
[INFO] Total images of datasets/SportsClassification/ice_hockey is 707
[INFO] Total images of datasets/SportsClassification/wwe is 667
[INFO] Total images of datasets/SportsClassification/basketball is 490
[INFO] Total images of datasets/SportsClassification/table_tennis is 705
[INFO] Total images of datasets/SportsClassification/volleyball is 703
[INFO] Total images of datasets/SportsClassification/motogp is 668
[INFO] Total images of datasets/SportsClassification/fencing is 624
[INFO] Total images of datasets/SportsClassification/hockey is 569
[INFO] Total images of datasets/SportsClassification/swimming is 683
[INFO] Total images of datasets/SportsClassification/chess is 476
[INFO] Total images of datasets/SportsClassification/tennis is 714
[INFO] Total images of datasets/SportsClassification/badminton is 928
[INFO] Total images of datasets/SportsClassification/boxing is 704
[INFO] Total images of datasets/SportsClassification/weight_lifting is 572
[INFO] Total images of datasets/SportsClassification/kabaddi is 452
[INFO] Total images of datasets/SportsClassification/formula1 is 676
[INFO] Total images of datasets/SportsClassification/shooting is 532

Sample training curve output

Smile/noSmile dataset:

Dataset from this link

[INFO] Total images of datasets/SMILES is 13,165
[INFO] Total images of datasets/SMILES/smiling is 3,690
[INFO] Total images of datasets/SMILES/notsmiling is 9,475

Sample training curve output
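
The counts above show that the two SMILES classes are unbalanced. As a rough illustration, the class shares and a set of inverse-frequency weights can be worked out directly from those numbers. The weighting formula (total / (number of classes × class count)) is a common heuristic, the same one scikit-learn calls "balanced"; it is an assumption here and not something this repo prescribes.

```python
# Class counts copied from the [INFO] lines for the SMILES dataset.
counts = {"smiling": 3690, "notsmiling": 9475}

total = sum(counts.values())
num_classes = len(counts)

# Inverse-frequency ("balanced") weights: total / (num_classes * class_count).
class_weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label, n in counts.items():
    share = n / total
    print(f"{label}: {n:,} images ({share:.1%}), weight {class_weights[label]:.2f}")
```

With these weights, each class contributes the same total weight, so the minority class (smiling) is not drowned out during training.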

Food5K:

Dataset from Kaggle

[INFO] Total images of datasets/food-5k is 5,000
[INFO] Total images of datasets/food-5k/train is 3,000
[INFO] Total images of datasets/food-5k/train/_noFood is 1,500
[INFO] Total images of datasets/food-5k/train/food is 1,500
[INFO] Total images of datasets/food-5k/evaluation is 1,000
[INFO] Total images of datasets/food-5k/validation is 1,000
[INFO] Total images of datasets/food-5k/validation/_noFood is 500
[INFO] Total images of datasets/food-5k/validation/food is 500

Sample training curve output

NIH malaria dataset:

Dataset from this link

[INFO] Total images of datasets/NIHmalaria is 27,558
[INFO] Total images of datasets/NIHmalaria/Parasitized is 13,779
[INFO] Total images of datasets/NIHmalaria/Uninfected is 13,779

Sample training curve output

Cyclone_Wildfire_Flood_Earthquake_Database

The dataset is collected by Gautam Kumar

[INFO] Total images of Cyclone_Wildfire_Flood_Earthquake_Database is 4,428
[INFO] Total images of Cyclone_Wildfire_Flood_Earthquake_Database/Flood is 1,073
[INFO] Total images of Cyclone_Wildfire_Flood_Earthquake_Database/Wildfire is 1,077
[INFO] Total images of Cyclone_Wildfire_Flood_Earthquake_Database/Earthquake is 1,350
[INFO] Total images of Cyclone_Wildfire_Flood_Earthquake_Database/Cyclone is 928
sample_disaster

Breast Cancer dataset(idc):

The original dataset is from Kaggle; however, Adrian Rosebrock did a great job of formatting the data so that it is ready for training, and this formatted data is the version included in the repo.

[INFO] Total images of idc is 277,524
[INFO] Total images of idc/training is 199,818
[INFO] Total images of idc/training/0 is 143,065
[INFO] Total images of idc/training/1 is 56,753
[INFO] Total images of idc/testing is 55,505
[INFO] Total images of idc/testing/0 is 39,711
[INFO] Total images of idc/testing/1 is 15,794
[INFO] Total images of idc/validation is 22,201
[INFO] Total images of idc/validation/0 is 15,962
[INFO] Total images of idc/validation/1 is 6,239

Sample training curve output
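
To make the idc split proportions explicit, the following small sketch recomputes them from the counts listed above. Nothing here comes from the repo's own code; the variable names are illustrative, and the folder names `0` and `1` are used as-is without assuming what each label means.

```python
# Per-split, per-class counts copied from the [INFO] lines for idc.
splits = {
    "training": {"0": 143065, "1": 56753},
    "testing": {"0": 39711, "1": 15794},
    "validation": {"0": 15962, "1": 6239},
}

total = sum(sum(classes.values()) for classes in splits.values())

# Fraction of the whole dataset in each split, and share of class "1" within it.
split_fraction = {}
class1_share = {}
for name, classes in splits.items():
    size = sum(classes.values())
    split_fraction[name] = size / total
    class1_share[name] = classes["1"] / size
    print(
        f"{name}: {size:,} images ({split_fraction[name]:.1%} of total), "
        f"class 1 share {class1_share[name]:.1%}"
    )
```

The three splits work out to roughly 72%, 20%, and 8% of the 277,524 images, with class `1` making up a little over 28% of each split.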

spatial_envelope_256x256_static_8outdoorcategories

This dataset contains 8 outdoor scene categories: coast, mountain, forest, open country, street, inside city, tall buildings, and highways. It is originally from this link.

[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories is 2,688
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/forest is 328
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/highway is 260
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/coast is 360
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/insidecity is 308
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/tallbuilding is 356
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/street is 292
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/mountain is 374
[INFO] Total images of spatial_envelope_256x256_static_8outdoorcategories/opencountry is 410

sample

FacialExpression

Dataset from Kaggle

[INFO] Total images of FacialExpression is 35,887
[INFO] Total images of FacialExpression/Happy is 8,989
[INFO] Total images of FacialExpression/Sad is 6,077
[INFO] Total images of FacialExpression/Fear is 5,121
[INFO] Total images of FacialExpression/Surprise is 4,002
[INFO] Total images of FacialExpression/Neutral is 6,198
[INFO] Total images of FacialExpression/Angry is 4,953
[INFO] Total images of FacialExpression/Disgust is 547

sample
