Extract information about all games published on Steam through its Web API and store it in JSON format.
I used this code to generate this dataset: 'Steam Games Dataset'.
- Python 3.8
- requests and argparse.
pip3 install requests argparse
Start generating data simply with:
python SteamGamesScraper.py
The first time the script runs, it creates the file 'appplist.json' with all the IDs that Steam provides (>140K). Later runs read that file instead of requesting the full list again. If you want to get new IDs, simply delete the file 'appplist.json'.
Only data for games is saved. DLCs, music, tools, etc. are ignored and their IDs are added to the file 'discarted.json' so that they are not requested again in future runs. You can delete that file to request those IDs again.
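The caching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in SteamGamesScraper.py: the function name `load_or_fetch_applist` and the `fetch` callback are hypothetical, and the real script may store the list in a different shape.

```python
import json
import os
from typing import Callable, Dict, List


def load_or_fetch_applist(path: str, fetch: Callable[[], List[Dict]]) -> List[Dict]:
    """Return the cached app list if `path` exists; otherwise fetch and cache it."""
    if os.path.exists(path):
        # A previous run already saved the list: reuse it.
        with open(path, "r", encoding="utf-8") as fin:
            return json.load(fin)
    # No cache yet: ask for the list (hypothetical callback, e.g. a Web API call).
    apps = fetch()
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(apps, fout)
    return apps
```

Deleting the cache file makes the next call fall through to `fetch` again, which matches the "delete the file to get new IDs" behaviour.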
Finally, all games are stored in the file 'games.json', provided that:
- The game has already been released.
- The 'developers' field is not empty.
- A price is included if the game is not free.
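As a rough sketch, the three conditions above could be checked like this. The function `should_store` is hypothetical and works on a record shaped like the entries in 'games.json'; the scraper itself checks the raw Steam response, so its actual test may differ.

```python
from typing import Any, Dict


def should_store(app: Dict[str, Any]) -> bool:
    """Apply the three storage conditions to one app record."""
    # Released: 'coming_soon' must be present and false.
    released = not app.get("release_date", {}).get("coming_soon", True)
    # Developers: the list must exist and be non-empty.
    has_developers = bool(app.get("developers"))
    # Price: free games pass; paid games must carry a price.
    has_price = bool(app.get("is_free", False)) or app.get("price") is not None
    return released and has_developers and has_price
```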
The format is this:
{
  "906850": {
    "name": "...",
    "release_date": {
      "coming_soon": false,
      "date": "..."
    },
    "required_age": 0,
    "is_free": false,
    "price": 0.99,
    "detailed_description": "...",
    "supported_languages": "...",
    "reviews": "...",
    "header_image": "...",
    "website": "...",
    "support_url": "...",
    "support_email": "...",
    "windows": true,
    "mac": false,
    "linux": false,
    "metacritic_score": 0,
    "metacritic_url": "...",
    "achievements": 0,
    "recommendations": 0,
    "notes": "",
    "packages": [
      {
        "title": "...",
        "description": "...",
        "subs": [
          {
            "text": "...",
            "description": "...",
            "price": 0.99
          }
        ]
      }
    ],
    "developers": [
      "..."
    ],
    "publishers": [
      "..."
    ],
    "categories": [
      "..."
    ],
    "genres": [
      "..."
    ],
    "screenshots": [
      "..."
    ],
    "movies": [
      "..."
    ]
  },
  ...
}
In the file 'ParseExample.py' you can see a simple example of how to parse the information.
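For orientation, here is a minimal sketch of reading the output file. It is not the content of 'ParseExample.py'; the helper names `load_games` and `free_games` are made up for this illustration, and only fields shown in the format above are used.

```python
import json
from typing import Any, Dict, List


def load_games(path: str = "games.json") -> Dict[str, Dict[str, Any]]:
    """Load the dataset: a dict keyed by app ID (as a string)."""
    with open(path, "r", encoding="utf-8") as fin:
        return json.load(fin)


def free_games(dataset: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of the games flagged as free."""
    return [game["name"] for game in dataset.values() if game.get("is_free")]


if __name__ == "__main__":
    games = load_games()
    print(f"{len(games)} games loaded, {len(free_games(games))} of them free")
```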
To change the output file, use the parameter '-o' / '-outfile':
python SteamGamesScraper.py -o output.json
Steam can reject your requests, or even ban your IP, if it considers that you are making too many requests. That's why the script waits 5.0 seconds between requests by default. You can change this with the parameter '-s' / '-sleep':
python SteamGamesScraper.py -s 2.0
It is not recommended to set the wait time below 5.0 seconds.
When Steam denies a request, it is retried up to four times by default. You can change the number of retries with '-r' / '-retries':
python SteamGamesScraper.py -r 10
Although it is not recommended, you can make the script retry indefinitely by setting the value to 0.
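The interaction between the wait time and the retry count can be pictured with the sketch below. It is an illustration under assumptions, not the script's actual loop: `request_with_retries` and `do_request` are hypothetical names, a denied request is modelled as `None`, and the retry count is treated as the maximum number of attempts.

```python
import time
from typing import Callable, Optional


def request_with_retries(do_request: Callable[[], Optional[dict]],
                         retries: int = 4,
                         sleep: float = 5.0) -> Optional[dict]:
    """Call `do_request` until it returns data or the attempts run out.

    A value of 0 for `retries` means "keep trying forever".
    """
    attempt = 0
    while retries == 0 or attempt < retries:
        result = do_request()
        if result is not None:
            return result
        attempt += 1
        # Wait before asking again so the server is not flooded.
        time.sleep(sleep)
    return None
```

With `retries=0` the loop only ends on success, which is why that setting is discouraged.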
Games that have not yet been released are added to the file 'notreleased.json' and will not be checked again. If you want to ignore this list, you can set the parameter '-d' / '-released' to False, or delete the file.
At the end of the scan, or when you press Ctrl + C, all data is saved. You can also enable auto-save to run every X new entries with '-a' / '-autosave':
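The skip lists ('discarted.json' and 'notreleased.json') can be thought of as sets of IDs removed from the work queue. The sketch below assumes each file holds a flat JSON array of integer IDs, which may not match the real layout; `load_id_list` and `pending_ids` are hypothetical helpers.

```python
import json
import os
from typing import Iterable, List, Set


def load_id_list(path: str) -> Set[int]:
    """Return the IDs stored in `path`, or an empty set if the file is missing."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as fin:
        return set(json.load(fin))


def pending_ids(all_ids: Iterable[int], skip_files: Iterable[str]) -> List[int]:
    """Drop every ID that appears in any of the skip files."""
    skipped: Set[int] = set()
    for path in skip_files:
        skipped |= load_id_list(path)
    return [app_id for app_id in all_ids if app_id not in skipped]
```

Deleting a skip file simply makes `load_id_list` return an empty set, so those IDs are requested again.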
python SteamGamesScraper.py -a 100
A backup file will also be generated with the previous data.
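One way to picture auto-save with a backup is shown below. This is only a sketch: the '.bak' suffix, `save_with_backup`, and `maybe_autosave` are assumptions for illustration, and the script's own backup name and trigger logic may differ.

```python
import json
import os
import shutil


def save_with_backup(data: dict, path: str) -> None:
    """Write `data` to `path`, keeping the previous file as `path + '.bak'`."""
    if os.path.exists(path):
        # Preserve the previous data before overwriting (backup name is hypothetical).
        shutil.copyfile(path, path + ".bak")
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(data, fout, indent=2)


def maybe_autosave(data: dict, path: str, new_entries: int, autosave: int) -> bool:
    """Save when auto-save is on and `new_entries` is a positive multiple of it."""
    if autosave > 0 and new_entries > 0 and new_entries % autosave == 0:
        save_with_backup(data, path)
        return True
    return False
```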
Code released under MIT License.