Goko integration #48

Open
wants to merge 61 commits into
base: master
d76205b
added Dark Ages cards and types Ruins, Looter, Shelter, Knight
michaeljb Apr 3, 2013
4c67812
added Dark Ages cards and types Ruins, Looter, Shelter, Knight
michaeljb Apr 3, 2013
a3069af
end of file whitespace
michaeljb Apr 3, 2013
a7a8e06
included michaeljb's dark ages csv, converted it to the .js , modifie…
ftlftw May 1, 2013
17f3191
Implemented goko log scraping, puts it into .all.tar.bz2 and .bz2.tar…
ftlftw May 2, 2013
03f837e
Slowing down and delaying updates, goko downloads are very slow. That…
ftlftw May 2, 2013
abba606
add in goko scraping in another place, here in the background tasks …
ftlftw May 2, 2013
fca369f
tar goko games as their downloaded, to avoid endless argument list.
ftlftw May 2, 2013
0bfb12a
Slow down updates.
ftlftw May 2, 2013
60edb8a
Bugfixes for goko log scraping
ftlftw May 3, 2013
106fad5
test_dominioncards now tests correctly with DA.
ftlftw May 3, 2013
4bc06fd
Adding feodum scoring to game.py and test_game
ftlftw May 4, 2013
b689060
Putting knights back in card list, easier parsing despite not being c…
ftlftw May 4, 2013
c6336a9
working on parsing. Not done yet.
ftlftw May 9, 2013
3b0a976
Working on parsing. Not done yet.
ftlftw May 9, 2013
7152089
Some fixes to make all the old parse_game tests work.
ftlftw May 9, 2013
eff3812
Fixing feodum scoring
ftlftw May 14, 2013
7bc6e91
baseline parsing of goko games works, some games parse but many don't…
ftlftw May 14, 2013
6334987
Continuing to work on parsing, adding more test cases.
ftlftw May 22, 2013
0a19147
test games
ftlftw May 22, 2013
287e145
Minor changes to test cases
ftlftw May 22, 2013
6dd9db1
adding one more test case
ftlftw May 22, 2013
22c0b28
Merging work from two different places on test cases
ftlftw May 22, 2013
c7d8f8d
Mining village and pstone working now. Modified card_list.csv to repo…
ftlftw May 22, 2013
dfebcde
Added multiple nested possession-outpost test
ftlftw May 23, 2013
131f41a
Improved tracking of thieved, noble-briganded, and rogued cards. Grav…
ftlftw May 24, 2013
e586cd9
adding more test cases for parsing
ftlftw May 24, 2013
192dfe9
Prepared for tests of many variable-coin cards. Framework only, not i…
ftlftw May 25, 2013
8447264
Merge branch 'master' of https://github.com/ftlftw/dominionstats
ftlftw May 25, 2013
2ba045f
redoing how to track cards in progress - a 'done_resolving' check
ftlftw May 27, 2013
ceb9a61
Continuing to work on variable coin cards
ftlftw May 28, 2013
5dee770
BoM as self-trashing cards
ftlftw May 28, 2013
eca7db8
Extra BoM/MV test
ftlftw May 28, 2013
7599271
OK, everything except city and diadem is parsed and tested.
ftlftw May 29, 2013
d056a69
Working on action-tracking for diadem
ftlftw May 29, 2013
35a9dd0
Done with parsing!!!
ftlftw May 29, 2013
7049aa7
starting to work on s3
ftlftw May 30, 2013
c7f221f
JoAT renaming...
ftlftw May 30, 2013
5a53add
Improved scraping for speed. More bugfixes, going through update script.
ftlftw May 31, 2013
4cc3e3d
Background tasks IN DEBUG MODE, CELERY COMMENTED OUT
ftlftw May 31, 2013
55f0bb0
Next I need to update test_game
ftlftw May 31, 2013
86c7384
cleanup
ftlftw May 31, 2013
eaf7340
Added goko tests to test_game, fixed iso start_decks
ftlftw May 31, 2013
86b803e
In goko, Supply line includes EVERY_SET_CARDS
ftlftw Jun 1, 2013
fda1e1e
Leaderboard loading. Fix to supply storing.
ftlftw Jun 3, 2013
c1020b2
Can't figure out supply_win... working on annotate_game next
ftlftw Jun 4, 2013
6e2575d
You know, nutki already made pretty-print, and I'm feeling pretty cod…
ftlftw Jun 4, 2013
ee294d8
Goal stats should not include adventures or bots. Trueskill should no…
ftlftw Jun 4, 2013
2b99c54
Goal stats should not include adventures or bots. Trueskill should no…
ftlftw Jun 4, 2013
6b6140a
fixed goals so AIs dont get listed, fixed some importing bugs.
ftlftw Jun 7, 2013
02f3db1
text changes to weppage
ftlftw Jun 7, 2013
bce6191
Continuing to debug. This bug was with BoM as Death Cart, trashing a …
ftlftw Jun 8, 2013
597afd0
Guilds
ftlftw Jun 14, 2013
aa55ad2
fixing order in card_list, adding herald on-play ability to parsing
ftlftw Jun 17, 2013
86946de
Game search, extra AI
ftlftw Jun 19, 2013
c4ca843
guilds bugfixes
ftlftw Jun 21, 2013
21ec547
Adding last few AIs
ftlftw Jun 21, 2013
5fd6a68
committing test game
ftlftw Jun 21, 2013
3ab32d5
adding test game
ftlftw Jun 21, 2013
708484a
Removing my s3 info, replacing with mccllstr's
ftlftw Jun 21, 2013
774928c
minor update so that multi-day imports work
ftlftw Jun 21, 2013
Implemented goko log scraping, puts it into .all.tar.bz2 and .bz2.tar files like the former iso ones.
ftlftw committed May 2, 2013
commit 17f3191231e4788475d9158d388d0bdd4e90affa
68 changes: 54 additions & 14 deletions scrape.py
@@ -6,6 +6,7 @@
 import datetime
 import glob
 import logging
+import shutil
 import os
 import os.path
 import subprocess
@@ -14,19 +15,27 @@
 import time
 import urllib
 import utils
+import re

 # if the size of the game log is less than this assume we got an error page
 SMALL_FILE_SIZE = 5000

-DEBUG = False
+DEBUG = True

 GOOD = 0
 MISSING = 1
 ERROR = 2
 DOWNLOADED = 3
 REPACKAGED = 4

+# make I should just adopt the isotropic format for consistency?
+CR_SOURCE = 5
+GOKO_SOURCE = 6
+ISO_SOURCE = 7
+
+GOKO_LOG_RE = re.compile('"(log.\w+.\w+.txt)"', re.MULTILINE)
+
 # Councilroom format is more similar to old isotropic format.
+GOKO_FORMAT = '%(year)d%(month)02d%(day)02d/'
 ISOTROPIC_FORMAT = '%(year)d%(month)02d/%(day)02d/all.tar.bz2'
 COUNCILROOM_FORMAT = '%(year)d%(month)02d%(day)02d/%(year)d%(month)02d%(day)02d.all.tar.bz2'
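
The `GOKO_LOG_RE` pattern added in this hunk can be tried on a small input. This is a minimal sketch in Python 3 (the diff itself targets Python 2). The sample index page is made up for illustration, since the real Goko directory listing is not shown in this PR, and `GOKO_LOG_RE_STRICT` and `extract_log_names` are hypothetical names. The pattern in the diff leaves its dots unescaped, so each dot matches any character; the stricter variant escapes them.

```python
import re

# Pattern as written in the diff, with a raw-string prefix added.
# The unescaped dots match any character, not only a literal '.'.
GOKO_LOG_RE = re.compile(r'"(log.\w+.\w+.txt)"', re.MULTILINE)

# Hypothetical stricter variant (not in the diff): dots are escaped.
GOKO_LOG_RE_STRICT = re.compile(r'"(log\.\w+\.\w+\.txt)"')


def extract_log_names(index_html, pattern=GOKO_LOG_RE_STRICT):
    """Return the quoted log file names found in a directory index page."""
    return pattern.findall(index_html)


# Made-up index page; the real listing format is an assumption.
SAMPLE_INDEX = '''
<a href="log.50a1b2c3.1367500000.txt">log.50a1b2c3.1367500000.txt</a>
<a href="logXabcXdefXtxt">not a log</a>
'''

if __name__ == '__main__':
    # The strict pattern finds only the real-looking log name.
    print(extract_log_names(SAMPLE_INDEX))
    # The pattern from the diff also accepts the second, dot-free name.
    print(extract_log_names(SAMPLE_INDEX, GOKO_LOG_RE))
```

Whether the looser pattern ever causes a false match depends on what else appears quoted in the real listing, which this sketch cannot confirm.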

@@ -39,6 +48,13 @@ def IsotropicGamesCollectionUrl(cur_date):
     host = 'http://dominion.isotropic.org/gamelog/'
     return host + FormatDate(ISOTROPIC_FORMAT, cur_date)

+def GokoGamesCollectionUrl(cur_date):
+    host = 'http://dominionlogs.goko.com/'
+    return host+FormatDate(GOKO_FORMAT, cur_date)
+
+def GokoSingleGameUrl(cur_date, cur_game):
+    return GokoGamesCollectionUrl(cur_date)+cur_game
+
 def CouncilroomGamesCollectionUrl(cur_date):
     host = 'http://councilroom.com/static/scrape_data/'
     return host + FormatDate(COUNCILROOM_FORMAT, cur_date)
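
To see what the new URL helpers produce for a given day, here is a minimal Python 3 sketch. `FormatDate` is called in the diff but its body is not shown, so `format_date` below is a hypothetical stand-in that fills the `%(year)d`-style templates from a `datetime.date`. The snake_case function names are also illustrative, not the PR's.

```python
import datetime

GOKO_FORMAT = '%(year)d%(month)02d%(day)02d/'
ISOTROPIC_FORMAT = '%(year)d%(month)02d/%(day)02d/all.tar.bz2'


def format_date(fmt, cur_date):
    """Hypothetical stand-in for FormatDate, whose body is not in this diff."""
    return fmt % {
        'year': cur_date.year,
        'month': cur_date.month,
        'day': cur_date.day,
    }


def goko_games_collection_url(cur_date):
    """Directory URL that lists one day's Goko logs."""
    return 'http://dominionlogs.goko.com/' + format_date(GOKO_FORMAT, cur_date)


def goko_single_game_url(cur_date, cur_game):
    """URL of one log file inside that day's directory."""
    return goko_games_collection_url(cur_date) + cur_game


def isotropic_games_collection_url(cur_date):
    """URL of the single per-day archive in the older isotropic layout."""
    return ('http://dominion.isotropic.org/gamelog/'
            + format_date(ISOTROPIC_FORMAT, cur_date))


if __name__ == '__main__':
    day = datetime.date(2013, 5, 2)
    print(goko_games_collection_url(day))
    print(goko_single_game_url(day, 'log.abc.123.txt'))  # made-up file name
    print(isotropic_games_collection_url(day))
```

The difference in shape is the point: the Goko URL names a directory of individual logs, while the isotropic and councilroom URLs name one archive per day, which is why the Goko source needs its own bundling step.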
@@ -51,25 +67,50 @@ def RemoveSmallFileIfExists(fn):

 def download_date(str_date, cur_date, saved_games_bundle):
     urls_by_priority = [
-        CouncilroomGamesCollectionUrl(cur_date),
-        IsotropicGamesCollectionUrl(cur_date),
+        (CR_SOURCE, CouncilroomGamesCollectionUrl(cur_date)),
+        (GOKO_SOURCE, GokoGamesCollectionUrl(cur_date)),
+        (ISO_SOURCE, IsotropicGamesCollectionUrl(cur_date))
         ]

-    for url in urls_by_priority:
+    for (source, url) in urls_by_priority:
         if DEBUG:
             print 'getting', saved_games_bundle, 'at', url

-        contents = urllib.urlopen(url).read()
+        try:
+            contents = urllib.urlopen(url).read()
+        except IOError:
+            contents = "0"
+
         if len(contents) > SMALL_FILE_SIZE:
             if DEBUG:
                 print 'yay, success from', url, 'no more requests for', \
                     str_date, 'needed'
-            open(saved_games_bundle, 'w').write(contents)
+            if source == CR_SOURCE or source == ISO_SOURCE:
+                open(saved_games_bundle, 'w').write(contents)
+            elif source == GOKO_SOURCE:
+                games = re.findall(GOKO_LOG_RE, contents)
+                bundle_goko_games(cur_date, games, saved_games_bundle)
             return True
         elif DEBUG:
             print 'request to', url, 'failed to find large file'
     return False

+def bundle_goko_games(cur_date, games, saved_games_bundle):
+    directory_name = tempfile.mkdtemp()
+    for cur_game in games:
+        url = GokoSingleGameUrl(cur_date, cur_game)
+        game_text = urllib.urlopen(url).read()
+        open(os.path.join(directory_name,cur_game),'w').write(game_text)
+
+    try:
+        subprocess.check_call(["tar", "-cjf", saved_games_bundle, "-C" ,
+                               directory_name] + games)
+    except subprocess.CalledProcessError, e:
+        # Not handling this, just re-raise
+        logging.warning("Unexpected return from tar compressing goko output >>{msg}<<".format(msg=e.output))
+        raise
+    shutil.rmtree(directory_name)
+
 def unzip_date(directory, filename):
     os.chdir(directory)
     cmd = 'tar -xjvf %s >/dev/null 2>/dev/null'%filename
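
The control flow in this hunk (try each source in priority order, skip unreachable or too-small responses, then bundle individual Goko logs into one archive) can be sketched without network access. This is an illustrative Python 3 version, not the PR's code. The fetcher is passed in as a parameter so that a fake can be used, `first_large_response`, `bundle_games` and `fake_fetch` are hypothetical names, and the archive is built with the standard `tarfile` module instead of the external `tar` command the diff calls.

```python
import io
import tarfile

SMALL_FILE_SIZE = 5000  # same threshold as in the diff


def first_large_response(urls_by_priority, fetch, min_size=SMALL_FILE_SIZE):
    """Return (source, contents) for the first URL whose body exceeds
    min_size, or None if every source fails or is too small."""
    for source, url in urls_by_priority:
        try:
            contents = fetch(url)
        except IOError:
            continue  # an unreachable source is treated like a small file
        if len(contents) > min_size:
            return source, contents
    return None


def bundle_games(game_texts):
    """Pack {file name: bytes} into an in-memory .tar.bz2 and return it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:bz2') as tar:
        for name, data in sorted(game_texts.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def fake_fetch(url):
    """Made-up fetcher for the demo: 'a' is down, 'b' is tiny, 'c' is big."""
    if url == 'a':
        raise IOError('source unreachable')
    return {'b': b'x' * 10, 'c': b'y' * 6000}[url]


if __name__ == '__main__':
    result = first_large_response([(5, 'a'), (6, 'b'), (7, 'c')], fake_fetch)
    print(result[0], len(result[1]))
    archive = bundle_games({'log.a.1.txt': b'game one', 'log.b.2.txt': b'game two'})
    with tarfile.open(fileobj=io.BytesIO(archive), mode='r:bz2') as tar:
        print(tar.getnames())
```

One detail in the diff as written: `shutil.rmtree(directory_name)` runs only when `tar` succeeds, because the `except` branch re-raises first. A `try`/`finally` around the bundling step would be one way to remove the temporary directory in both cases.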
@@ -95,12 +136,12 @@ def repackage_archive(filename):
     Game archives are distributed as .tar.bz2 (a bzip2-compressed tar
     archive). For speed of serving, we repackage them as .bz2.tar (a
-    tar archive of bzip2-compressed HTML files). The .bz2.tar file is
+    tar archive of bzip2-compressed HTML or text files). The .bz2.tar file is
     a good bit larger, but an individual file can be extracted,
     decompressed, and served to a client in tenths of a second instead
     of tens of seconds. At the same time, storage space is still
     dramatically smaller than a raw folder of uncompressed (or even
-    compressed) HTML files.
+    compressed) HTML or text files.
     """

     orig_dir = os.getcwd()
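
The trade-off in the docstring above (a `.bz2.tar` is larger, but one game can be read without decompressing the whole day) can be shown with a small in-memory sketch. This is Python 3 using the standard `tarfile` and `bz2` modules, whereas the PR shells out to `tar` and `bzip2`. `make_tar_bz2`, `repackage` and `read_one` are hypothetical names, and the file names are made up.

```python
import bz2
import io
import tarfile


def make_tar_bz2(files):
    """Build a .tar.bz2 (one bzip2 stream around the whole tar) in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:bz2') as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def repackage(tar_bz2_bytes):
    """Convert .tar.bz2 bytes into .bz2.tar bytes: an uncompressed tar
    whose members are individually bzip2-compressed files."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bz2_bytes), mode='r:bz2') as src, \
            tarfile.open(fileobj=out, mode='w') as dst:
        for member in src.getmembers():
            if not member.isfile():
                continue
            compressed = bz2.compress(src.extractfile(member).read())
            info = tarfile.TarInfo(name=member.name + '.bz2')
            info.size = len(compressed)
            dst.addfile(info, io.BytesIO(compressed))
    return out.getvalue()


def read_one(bz2_tar_bytes, name):
    """Extract and decompress a single game; other members stay compressed."""
    with tarfile.open(fileobj=io.BytesIO(bz2_tar_bytes), mode='r:') as tar:
        return bz2.decompress(tar.extractfile(name + '.bz2').read())


if __name__ == '__main__':
    original = make_tar_bz2({
        'game-1.html': b'<html>iso game</html>',  # made-up contents
        'log.abc.123.txt': b'goko game log',
    })
    repacked = repackage(original)
    print(read_one(repacked, 'log.abc.123.txt'))
```

In the first layout, reading any one file means decompressing the stream up to that file; in the second, a reader can seek to the member and decompress only that member, which is the speed-up the docstring describes.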
@@ -118,7 +159,7 @@ def repackage_archive(filename):

     # Compress all the game*.html files
     os.chdir(directory_name)
-    game_files = glob.glob("game*.html")
+    game_files = glob.glob("game*.html")+glob.glob("log*.txt")
     if len(game_files) > 0:
         try:
             subprocess.check_call(["bzip2"] + game_files)
@@ -134,7 +175,7 @@ def repackage_archive(filename):
     # Tar the results back to the directory where the original file
     # came from
     dest_filename = repackage_filename(source_filename)
-    game_files = glob.glob("game*.html.bz2")
+    game_files = glob.glob("game*.html.bz2")+glob.glob("log*.txt.bz2")
     try:
         subprocess.check_call(["tar", "--remove", "-cf", dest_filename+".part"] + game_files)
     except subprocess.CalledProcessError, e: #(retcode, cmd, output=output)
@@ -163,11 +204,10 @@ def scrape_date(str_date, cur_date, passive=False):

     if passive:
         return_code = MISSING
-
     elif not download_date(str_date, cur_date, saved_games_bundle):
         return_code = ERROR
-
-    return_code = DOWNLOADED
+    else:
+        return_code = DOWNLOADED

     # Repackage an existing file, if found
     if utils.at_least_as_big_as(saved_games_bundle, SMALL_FILE_SIZE) and \