recommended open datasets
- AWS Public Datasets - large datasets and references to current use cases
- Chars74K Dataset - character recognition in natural images
- Fast.AI Datasets - image classification/localization and NLP datasets
- Frontal Face Image Dataset - image recognition dataset for face recognition
- ImageNet - an image database organized by the WordNet hierarchy
- MNIST Dataset - handwritten digits
- Movie Review Dataset - movie review datasets, especially for sentiment analysis
- NLP Datasets - NLP datasets by @niderhoff
- SMS Spam Dataset
- Stanford Large Network Datasets (SNAP) - large network datasets
- Twitter Sentiment Analysis Dataset
- UC Irvine Machine Learning Repository
- Yahoo! Webscope Datasets - a variety of "scientifically useful" datasets
- YouTube-8M Dataset - large-scale labeled video dataset
- Awesome Public Datasets - HUGE list of open datasets across a number of domains and studies
- Cool Datasets - well, as it says, cool datasets
- Data for Everyone - assortment of data collections by Figure Eight (previously CrowdFlower)
- Data Is Plural - spreadsheet from the Data Is Plural newsletter
- FiveThirtyEight Data - datasets from FiveThirtyEight stories
- Kaggle Datasets - variety of datasets related to data competitions at Kaggle.com
- KDnuggets Datasets - some datasets identified by KDnuggets
- LA Times Data Desk - repos and analysis of LA Times stories
- MovieLens - movie ratings and tag applications datasets
- r/datasets - sharing and discussion about available datasets
- Open Data Inception - collection of open data portals worldwide
- ProPublica Datasets - some free, some paid datasets from ProPublica stories
- Robb Seaton's 100+ Interesting Datasets for Statistics - wide variety of datasets
- Data.gov - U.S. federal government open datasets