GoScrape is a small open-source project I created to make it easy to bulk download demo files for the FPS CS:GO from the popular CS:GO fansite hltv.org.
GoScrape is on PyPI, so you can install it with pip:

```bash
pip install goscrape
```
GoScrape consists of two main commands.
| command | description |
|---|---|
| events | Used as the first step to create a JSON lookup file containing structured information about CS:GO esports events in a given time frame and, if specified, links to the associated matches and demo files. |
| fetch | Builds on top of the events command: it bulk downloads the demo files listed in the JSON lookup file produced by `events`. Alternatively, a single event ID can be given to download only the demo files for that event. |
The `events` command accepts the following arguments:

| argument | datatype | description | notes |
|---|---|---|---|
| STARTDATE | string | the start date from which event data should be gathered | required; formatted as 'YYYY-MM-DD' |
| ENDDATE | string | the end date up to which event data should be gathered | required; formatted as 'YYYY-MM-DD' |
| STORAGEPATH | string | the directory or file path where the resulting JSON should be stored | optional (default is the current working directory) |
| MATCHES | boolean | whether match information and demo file URLs should be scraped as well | optional (True if present); must be set if the resulting JSON file will be used with the fetch command |
| EVENT TYPE | enum | which type of events should be pulled (Online, Lan, ...) | optional (default is online) |
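Both date arguments must be strings in 'YYYY-MM-DD' format. The following is a minimal sketch of how such a date range could be checked before running `events`, using only the standard library. `parse_iso_date` and `parse_date_range` are hypothetical helpers that are not part of GoScrape, and rejecting a STARTDATE that falls after ENDDATE is an assumption about what counts as a sensible range.

```python
from datetime import date
from typing import Tuple


def parse_iso_date(value: str) -> date:
    """Parse a strict 'YYYY-MM-DD' string (hypothetical helper, not part of GoScrape)."""
    parsed = date.fromisoformat(value)  # may raise ValueError on malformed input
    # Newer Python versions also accept forms such as '20220420';
    # round-tripping enforces the exact 'YYYY-MM-DD' layout described above.
    if parsed.isoformat() != value:
        raise ValueError(f"expected 'YYYY-MM-DD', got {value!r}")
    return parsed


def parse_date_range(start: str, end: str) -> Tuple[date, date]:
    """Validate STARTDATE/ENDDATE strings (assumes start must not be after end)."""
    start_date, end_date = parse_iso_date(start), parse_iso_date(end)
    if start_date > end_date:
        raise ValueError(f"STARTDATE {start} is after ENDDATE {end}")
    return start_date, end_date
```

For example, `parse_date_range("2022-04-20", "2022-04-21")` returns the two dates as `datetime.date` objects, while swapping the arguments raises a `ValueError`.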
The objects in the resulting JSON are keyed by their event ID and look something like this:
```json
{
  "6475": {
    "event_data": {
      "entity": "event",
      "event_id": "6475",
      "event_url": "https://www.hltv.org/events/6475/iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_encoded": "iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_full": "IEM Dallas 2022 Oceania Open Qualifier 2",
      "nr_of_teams": "8+",
      "prize": "Other",
      "event_type": "Online",
      "location": "Oceania (Online)",
      "event_start": "2022-04-20",
      "event_end": "2022-04-21"
    },
    "matches": [
      {
        "entity": "match",
        "teams": ["Paradox", "Aftershock"],
        "date_time": "2022-04-21 10:00:00",
        "match_url": "https://www.hltv.org//matches/2355881/paradox-vs-aftershock-iem-dallas-2022-oceania-open-qualifier-2",
        "demo_id": "71497",
        "demo_url": "https://www.hltv.org/download/demo/71497"
      }
    ]
  }
}
```
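Because the lookup file is plain JSON keyed by event ID, it is also easy to inspect outside of GoScrape. The following is a minimal sketch, assuming the structure of the example output. It lists the demo URLs for each event. `load_lookup` and `collect_demo_urls` are hypothetical helpers, not part of the package. Events scraped without the MATCHES flag are assumed to have no `"matches"` key.

```python
import json
from typing import Dict, List


def load_lookup(path: str) -> Dict[str, dict]:
    # The lookup file is a single JSON object keyed by event ID
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def collect_demo_urls(lookup: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each event ID to the demo URLs of its matches (hypothetical helper)."""
    result = {}
    for event_id, event in lookup.items():
        result[event_id] = [
            match["demo_url"]
            # Events without a "matches" key (assumed when MATCHES was not set) map to []
            for match in event.get("matches", [])
            if match.get("demo_url")  # skip matches that have no demo
        ]
    return result
```

Applied to the example output, `collect_demo_urls` returns `{"6475": ["https://www.hltv.org/download/demo/71497"]}`. The path passed to `load_lookup` is whatever file the events command wrote under STORAGEPATH.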
The `fetch` command accepts the following arguments:

| argument | datatype | description | notes |
|---|---|---|---|
| EVENT ID | string or int | the ID of the event whose demo files should be downloaded | mutually exclusive with LOOKUP FILE; exactly one of the two is required |
| LOOKUP FILE | string | the file path of the lookup file generated by the events command that should be used for downloading demos | mutually exclusive with EVENT ID; exactly one of the two is required |
| STORAGEPATH | string | the directory to which the demo files should be written | optional (default is the current working directory) |
| MULTIPROCESSING | boolean | whether multiprocessing should be used to speed up downloading | optional (True if present) |
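Conceptually, the MULTIPROCESSING flag trades sequential downloads for concurrent ones. The following sketch illustrates that idea with a thread pool from the standard library instead of separate processes, since downloads are mostly I/O-bound. It is not GoScrape's implementation. `download_all`, its parameters, and the `download` callable are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def download_all(
    urls: List[str],
    download: Callable[[str], T],
    parallel: bool = True,
    max_workers: int = 4,
) -> List[T]:
    """Run `download` for every URL, optionally concurrently (hypothetical helper).

    `download` stands in for whatever fetches a single demo, e.g. a function
    that saves it under STORAGEPATH and returns the local path.
    """
    if not parallel:
        # Sequential fallback, analogous to omitting the MULTIPROCESSING flag
        return [download(url) for url in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order even though calls run concurrently
        return list(pool.map(download, urls))
```

Keeping the actual download function injectable makes the concurrency logic easy to test without touching the network.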
Neither this tool nor I have any affiliation with HLTV. I originally built this CLI to help me download demos for scientific research, and I made it publicly available because I thought it might benefit others as well. If you download a lot of demos, the tool automatically adds a sleep time to avoid a temporary Cloudflare ban.
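The automatic sleep time can be pictured as a simple throttle. The following is a minimal sketch, assuming a pause after every fixed number of downloads. `throttled`, `batch_size` and `pause_seconds` are hypothetical names, and the default values are illustrative rather than GoScrape's actual settings.

```python
import time
from typing import Callable, Iterable


def throttled(
    urls: Iterable[str],
    download: Callable[[str], object],
    batch_size: int = 10,
    pause_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Download URLs one by one, pausing after every `batch_size` downloads.

    batch_size and pause_seconds are illustrative values, not GoScrape's
    real settings. Returns the number of pauses taken.
    """
    pauses = 0
    for count, url in enumerate(urls, start=1):
        download(url)
        if count % batch_size == 0:
            # Back off periodically so the request rate stays low
            sleep(pause_seconds)
            pauses += 1
    return pauses
```

The `sleep` parameter defaults to `time.sleep` but can be swapped for a stub, which keeps the throttling logic testable without actually waiting.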
Changelog:

- Resolved an issue where the package failed to retrieve the file name of the fetched demo file
- Bug fixes and improvements
- Bug fixes for multiprocessed downloading
- Initial release
Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
If you experience any issues, please message me or open an issue here.