Python Web Scraper
- Scrape all search results for a keyword passed as an argument.
- Results can be saved as .csv and .json.
- Also collects data on the users who uploaded the content included in the search results.
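The exact file layout the scraper writes is not documented here. As a rough illustration of what saving the same results in both formats involves, the sketch below uses only the standard library. `save_results` is a hypothetical helper, not part of `default-scraper`, and it assumes each result is a flat dict.

```python
import csv
import json
from pathlib import Path


def save_results(records, basename):
    """Write a list of flat dicts to <basename>.csv and <basename>.json (hypothetical helper)."""
    # Collect every key seen across records, keeping first-seen order,
    # because some records may lack fields that others have.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    csv_path = Path(f"{basename}.csv")
    json_path = Path(f"{basename}.json")

    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as empty cells

    with json_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English captions readable in the file.
        json.dump(records, f, ensure_ascii=False, indent=2)

    return csv_path, json_path
```

CSV flattens everything to strings, while JSON keeps numbers and nested values intact, which is one reason to offer both.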
pip install default-scraper
or
pip install git+https://github.com/Seongbuming/crawler.git
from default_scraper.instagram.parser import InstagramParser
USERNAME = ""  # Instagram account username
PASSWORD = ""  # Instagram account password
KEYWORD = ""  # keyword to search for
parser = InstagramParser(USERNAME, PASSWORD, KEYWORD, False)
parser.run()
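Hard-coding a password in a script makes it easy to leak. A minimal variation, assuming you prefer to supply the values through environment variables (the variable names and both helper functions below are hypothetical), keeps the same `InstagramParser` call:

```python
import os


def load_instagram_config():
    """Read login details and the search keyword from environment variables.

    The variable names are hypothetical; choose whatever suits your setup.
    """
    username = os.environ.get("INSTAGRAM_USERNAME", "")
    password = os.environ.get("INSTAGRAM_PASSWORD", "")
    keyword = os.environ.get("INSTAGRAM_KEYWORD", "")
    if not (username and password and keyword):
        raise ValueError(
            "Set INSTAGRAM_USERNAME, INSTAGRAM_PASSWORD and INSTAGRAM_KEYWORD"
        )
    return username, password, keyword


def run_instagram():
    # Imported lazily so load_instagram_config() works without the package.
    from default_scraper.instagram.parser import InstagramParser

    username, password, keyword = load_instagram_config()
    # Same positional arguments as the README example; the meaning of the
    # fourth argument (False) is not documented there.
    parser = InstagramParser(username, password, keyword, False)
    parser.run()
```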
Run the following command to scrape content from Instagram:
python main.py --platform instagram --keyword {KEYWORD} [--output_file OUTPUT_FILE] [--all]
Use the --all or -a option to also scrape unstructured fields.
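The command-line options above map naturally onto `argparse`. The sketch below is a hypothetical reconstruction for illustration, not the project's actual `main.py`; it mainly shows how `--all` and its short form `-a` behave as a boolean switch.

```python
import argparse


def build_arg_parser():
    """Hypothetical reconstruction of the documented main.py options."""
    parser = argparse.ArgumentParser(description="Scrape search results.")
    # Only the two platforms documented in this README are listed here.
    parser.add_argument("--platform", required=True,
                        choices=["instagram", "googleplay_review"])
    parser.add_argument("--keyword", required=True,
                        help="Search keyword (Instagram) or app ID (Google Play)")
    parser.add_argument("--output_file", default=None)
    # store_true: the flag takes no value and defaults to False.
    parser.add_argument("-a", "--all", action="store_true",
                        help="Also scrape unstructured fields")
    return parser
```

For example, `build_arg_parser().parse_args(["--platform", "instagram", "--keyword", "cats", "-a"])` yields a namespace whose `all` attribute is `True`.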
from default_scraper.googleplay.review.parser import GooglePlayReviewParser
APP_ID = ""  # Google Play app ID (package name, e.g. "com.example.app")
parser = GooglePlayReviewParser(APP_ID)
parser.run()
Run the following command to scrape reviews from Google Play, passing the app ID as the keyword:
python main.py --platform googleplay_review --keyword {APP_ID} [--output_file OUTPUT_FILE]
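To drive the same commands from another Python script (for example, to scrape several keywords or app IDs in turn), a small wrapper can build the argument list. `build_command` and `run_scraper` are hypothetical helpers; they assume `main.py` is in the current working directory.

```python
import subprocess
import sys


def build_command(platform, keyword, output_file=None, scrape_all=False):
    """Assemble the documented main.py command line as an argument list."""
    cmd = [sys.executable, "main.py", "--platform", platform, "--keyword", keyword]
    if output_file is not None:
        cmd += ["--output_file", output_file]
    if scrape_all:
        # --all is documented for the Instagram platform.
        cmd.append("--all")
    return cmd


def run_scraper(platform, keyword, output_file=None, scrape_all=False):
    # Passing a list (not a shell string) avoids quoting problems when the
    # keyword contains spaces or special characters.
    cmd = build_command(platform, keyword, output_file, scrape_all)
    return subprocess.run(cmd, check=True)
```

With `check=True`, a non-zero exit from `main.py` raises `subprocess.CalledProcessError` instead of failing silently.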
- Structured fields (Instagram posts)
pk
id
taken_at
media_type
code
comment_count
user
like_count
caption
accessibility_caption
original_width
original_height
images
- Some fields may be missing depending on Instagram's response data.
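Because some fields may be absent, code that consumes the output should not assume every key is present. A minimal sketch, assuming the JSON output is a list of objects keyed by the field names above (the exact output layout is not documented here), fills missing fields with `None`. `normalize_posts` and `load_posts` are hypothetical helpers.

```python
import json

# Structured Instagram fields listed above.
INSTAGRAM_FIELDS = [
    "pk", "id", "taken_at", "media_type", "code", "comment_count", "user",
    "like_count", "caption", "accessibility_caption", "original_width",
    "original_height", "images",
]


def normalize_posts(posts):
    """Return one dict per post containing every structured field.

    Fields absent from a post are set to None instead of raising KeyError.
    """
    return [{field: post.get(field) for field in INSTAGRAM_FIELDS} for post in posts]


def load_posts(path):
    # Assumes the JSON file holds a list of post objects.
    with open(path, encoding="utf-8") as f:
        return normalize_posts(json.load(f))
```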
- Google Play review fields
review_id
author
review_text
rating
writed_time
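As an example of working with the review fields, the sketch below averages the `rating` column. It assumes the reviews were saved as CSV with a header row using the field names above; `average_rating` is a hypothetical helper.

```python
import csv
from statistics import mean


def average_rating(csv_file):
    """Average the `rating` column of a review CSV (an open text file).

    Rows with an empty or missing rating are skipped; returns None if no
    row has a rating.
    """
    ratings = [
        float(row["rating"])
        for row in csv.DictReader(csv_file)
        if row.get("rating")
    ]
    return mean(ratings) if ratings else None


# Usage with a file on disk (path is hypothetical):
# with open("reviews.csv", newline="", encoding="utf-8") as f:
#     print(average_rating(f))
```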
- Support for scraping more platforms is planned.