Major update with refactoring of DB connects etc.
The parse and score scripts have had their DB connection code refactored, and parse has had its choosePGN function rewritten.

Large tidy-up of supplementary files:
INIT_* scripts now install Docker and download the relevant images.

run-* scripts have been tidied and made single-purpose for convenience.
Deleted the old DEPLOY script as it was deprecated.

Minor edits to db.chess.analysis that I forgot to add to last commit.
Nicholas Harding committed Mar 31, 2014
1 parent d9200b6 commit 6a7ebda
Showing 13 changed files with 75 additions and 91 deletions.
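The heart of this commit is the connection refactor in parsePGN.pl: rather than opening a fresh DBI handle around every query, the script now opens one handle up front, sets an SQLite busy timeout so concurrent writers back off gracefully, and declares its SQL strings once for reuse inside the daemon loop. A minimal sketch of that pattern, condensed from the parsePGN.pl hunks below and assuming the usual DBD::SQLite driver; the database path and checksum value are illustrative, not taken from the repo:

use strict;
use warnings;
use DBI;

my $database = '/data/chessAnalysis.db';   # illustrative path
my $driver   = 'SQLite';
my $timeout  = 3000;                       # ms to wait on a locked database

# One shared handle for the whole daemon, opened once.
my $dbh = DBI->connect(
    "dbi:$driver:dbname=$database", "", "",
    { sqlite_use_immediate_transaction => 1 }
) or die $DBI::errstr;
$dbh->sqlite_busy_timeout($timeout);

# SQL declared once and reused on every pass of the loop.
my $sql_selectfile = "select fid,completed from files WHERE checksum = ?";

my $selectfile = $dbh->prepare($sql_selectfile) or die $DBI::errstr;
$selectfile->execute('d41d8cd98f00b204e9800998ecf8427e') or die "SQL Error: $DBI::errstr\n";   # illustrative checksum
my $gameFromDB = $selectfile->fetchrow_hashref;   # undef if this file has never been seen
$selectfile->finish;

$dbh->disconnect;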
10 changes: 0 additions & 10 deletions DEPLOY.sh

This file was deleted.

16 changes: 9 additions & 7 deletions INIT_HEAD.sh
@@ -2,14 +2,16 @@
sudo sh -c "echo deb http://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
sudo apt-get update
sudo apt-get install -y --force-yes lxc-docker
sudo apt-get install -y sqlite3
sudo apt-get install -y git

sudo docker pull hardingnj/parsepgn
export HOSTPATH=/glusterfs/users/nharding/

export HOSTLOC=/glusterfs/users/nharding/
git clone https://github.com/hardingnj/chess $HOSTPATH

#mkdir $HOSTLOC/chessDB
#rm $HOSTLOC/chessDB/chessAnalysis.db
#sudo apt-get install sqlite3
# IF SPECIFIED THEN DELETE AND REGEN DB FILE
#mkdir $HOSTPATH/chessDB
#rm $HOSTPATH/chessDB/chessAnalysis.db
#curl -O https://raw.githubusercontent.com/hardingnj/chess/master/schema.sql
#sqlite3 $HOSTLOC/chessDB/chessAnalysis.db < schema.sql

docker run -d -t -name parsepgn -v ${HOSTLOC}/pgn_data:/pgn:ro -v ${HOSTLOC}/chessDB:/data parsepgn $@
#sqlite3 $HOSTPATH/chessDB/chessAnalysis.db < schema.sql
2 changes: 0 additions & 2 deletions FROM_UBUNTU.sh → INIT_NODE.sh
@@ -4,5 +4,3 @@ sudo apt-get update
sudo apt-get install -y --force-yes lxc-docker

sudo docker pull hardingnj/scorepgn
HOSTLOC=/glusterfs/users/nharding/
sudo docker run -d -t -name scorepgn -v ${HOSTLOC}/chessDB:/data hardingnj/scorepgn --hashsize 1600
4 changes: 2 additions & 2 deletions R/db.chess.analysis.R
@@ -4,5 +4,5 @@ dbfile <- 'chessAnalysis.db'
library("RSQLite")
drv <- dbDriver("SQLite")
con <- dbConnect(drv, dbfile);
chess.data <- dbGetQuery(con, "Select * from games where id=112")
aa<-apply(t(chess.data[,15:22]), 2, strsplit, ',')
chess.data <- dbGetQuery(con, "Select * from games where id < 100")
#aa<-apply(t(chess.data[,15:22]), 2, strsplit, ',')
12 changes: 10 additions & 2 deletions README.md
@@ -5,9 +5,17 @@ chess
This is a personal project with the aim of using a strong chess engine to empirically evaluate moves in a large number of chess games (potentially over 2 million). The 3 main purposes of this are: 1. Identify objectively the strongest players in history. 2. Establish a baseline of performance expectation to identify cheating in chess. 3. Use information about the type of move to identify general weaknesses in human play, e.g. backward moves or knight moves being harder to find.

# About the code
The code is designed to be as platform-independent as possible and makes substantive use of Docker, the open-source container engine. In fact this code is as much an excuse to explore the functionality of Docker as it is to address the problem outlined above. Docker is particularly useful in this instance because the code must run on separate machines/VMs for the project to be in any way feasible. The code is broken into 3 component parts: a parse script that converts PGN data into a database form, an evaluation script that evaluates games found in the database and records the results, and an SQL server that enables each script to update a database shared across machines. Each of these components exists in a separate Docker container. Both the parse and the eval containers are linked to a parent SQL server container.
The code is designed to be as platform-independent as possible and makes substantive use of Docker, the open-source container engine. In fact this code is as much an excuse to explore the functionality of Docker as it is to address the problem outlined above. Docker is particularly useful in this instance because the code must run on separate machines/VMs for the project to be in any way feasible. The code is broken into 2 component parts: a parse script that converts PGN data into a database form, and an evaluation script that evaluates games found in the database and records the results. Each of these components exists in a separate Docker container.

# Acknowledgements
Much of this code has been inspired by and borrowed from several sources on the internet, most notably:
* Ben Schwartz at http://txt.fliglio.com/2013/11/creating-a-mysql-docker-container/ for the implementation of a docker sql server.
* Ralph Schuler at http://ralphschuler.ch/about for the perl/stockfish interface
* Ralph Schuler at http://ralphschuler.ch/about for the perl/stockfish interface.
* Chris Cooper for valuable assistance with SQLite, and being an all round top dude.

# Supplementary files
- *INIT_NODE.sh*: Installs Docker and downloads the hardingnj/scorepgn image from the Docker repo.
- *INIT_HEAD.sh*: Installs Docker and sqlite3, and downloads the hardingnj/parsepgn image from the Docker repo.
- *KILL.sh*: Stops and removes all Docker containers, running or not.
- *pgn2fen.sh*: Utility script to parse a PGN file into FEN. Might get its own container someday.
- *run-XXX-YYY*: Short convenience scripts to run the Docker containers on different systems.
1 change: 0 additions & 1 deletion parsePGN/build.sh
@@ -1,3 +1,2 @@
#!/bin/sh
docker build -t parsepgn .
echo Done docker image tagged parsepgn
92 changes: 36 additions & 56 deletions parsePGN/parsePGN.pl
@@ -26,25 +26,27 @@
'pgndir:s'
) or die "Bad options passed";

our $database = $cfg{dbpath};
our $driver = $cfg{driver} // 'SQLite';
our $timeout = $cfg{timeout} // 3000;
print Dump(\%cfg);
my $database = $cfg{dbpath};
my $driver = $cfg{driver} // 'SQLite';
my $timeout = $cfg{timeout} // 3000;

my %hash = ( "1-0" => 1, "0-1" => 0, "1/2-1/2" => 2, "0.5-0.5" => 2);
# LOGIC
# DIR IS PASSED ON CL
# loop:
# choose 1 at random. slice off. Flock.
# if previously looked at, get another. Unflock and repeat.
# found one... process as normal.
# unflock
# sleep

my $dbh = DBI->connect(
"dbi:$driver:dbname=$database",
"",
"",
{ sqlite_use_immediate_transaction => 1, }
) or die $DBI::errstr;
$dbh->sqlite_busy_timeout($timeout);
my $sql_selectgame = "select id from games WHERE white = ? AND black = ? AND year = ? AND result = ? AND algebraic_moves = ?";
my $sql_selectplayer = "select given_name, surname, pid from players WHERE surname = ?";
my $sql_selectfile = "select fid,completed from files WHERE checksum = ?";

# ie infinite loop. This is run as daemon
while(1) {
# this function returns the pgn file, and its fid.
my $pgnfile = choosePGN($cfg{pgndir});
my $pgnfile = choosePGN($cfg{pgndir}); # This function calls the database

unless (defined $pgnfile) { sleep($cfg{sleeptime}) and next; }

@@ -74,19 +76,13 @@
$month = undef if $month =~ m/\?\?/;
$day = undef if $day =~ m/\?\?/;

my $dbh = DBI->connect(
"dbi:$driver:dbname=$database",
"",
""
) or die $DBI::errstr;
# look for duplicates
my $sql_selectgame = "select id from games WHERE white = ? AND black = ? AND year = ? AND result = ? AND algebraic_moves = ?";
my $selectgame = $dbh->prepare($sql_selectgame) or die $DBI::errstr;

$selectgame->execute($white, $black, $year, $result, $moves) or die "SQL Error: $DBI::errstr\n";
my $h = $selectgame->fetchrow_hashref;
my $gameToParse = $selectgame->fetchrow_hashref;
$selectgame->finish;

unless (defined $h) {
unless (defined $gameToParse) {
$dbh->do(
'INSERT INTO games (white, black, event, site, result, year, month, day, round, algebraic_moves, fileid) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)',
undef,
@@ -95,19 +91,17 @@
}
else {
# also should update if any of the fields are null in the original.
print "appears to be duplicate of record $h->{id}. Skipping...".$/;
print "appears to be duplicate of record $gameToParse->{id}. Skipping...".$/;
}
$selectgame->finish;
$dbh->disconnect;
}
# if gets here pgn was parsed ok...
my $dbh = DBI->connect("dbi:$driver:dbname=$database", "", "") or die $DBI::errstr;
$dbh->do(
'UPDATE files SET completed = ? WHERE fid = ?',
undef,
1, $pgnfile->{id}
) or die $DBI::errstr;
$dbh->disconnect;

# now sleep to give processor a break and ensure parsing doesn't get too far ahead.
sleep($cfg{sleeptime});
}
exit 127;
@@ -136,80 +130,66 @@ sub return_player_id {
@playername{qw/given_name initials surname suffix/} = parseName2($clean);

# define and initialize database
my $dbh = DBI->connect( "dbi:$driver:dbname=$database", "", "") or die $DBI::errstr;
$dbh->sqlite_busy_timeout($timeout);
my $sql_selectplayer = "select given_name, surname, pid from players WHERE surname = ?";
my $selectplayer = $dbh->prepare($sql_selectplayer) or die $DBI::errstr;
$selectplayer->execute($playername{surname}) or die "SQL Error: $DBI::errstr\n";

# loop through records, is given name identical? If yes then return id.
# row is a array of length=3 with given name/surname/pid
my %record;
while (@record{qw/given_name surname pid/} = $selectplayer->fetchrow_array) {
# is given name identical?
# is given name identical? This loop will only be entered if the surname already matched a record.
if ($record{given_name} eq $playername{given_name}) {
$selectplayer->finish;
$dbh->disconnect;
return $record{pid};
}
}
# else add new record, return id.
print "adding new player record: $playername{surname}".$/;
$selectplayer->finish;
my $pid = $dbh->do(
'INSERT INTO players (given_name, surname) VALUES (?, ?)',
undef,
$playername{given_name}, $playername{surname}
) or die $DBI::errstr;
my $id = $dbh->last_insert_id("", "", "players", "");
print "id: $id".$/;
$selectplayer->finish;
$dbh->disconnect;
print "adding new player record: $playername{surname}, id: $id".$/;
return $id;
}

sub choosePGN {
my $searchdir = shift;

my @PGNfiles = File::Find::Rule->file()->name('*.PGN', '*.pgn')->in($searchdir);
print "in choose PGN. Found @PGNfiles in $searchdir".$/;

my ($chosen_file, $file_id);
print "In choose PGN. Found @PGNfiles in $searchdir".$/;

while(@PGNfiles){
$chosen_file = splice @PGNfiles, int(rand($#PGNfiles + 1)), 1;
my $chosen_file = splice @PGNfiles, int(rand($#PGNfiles + 1)), 1;
my $md5 = md5_hex(do { local $/; IO::File->new($chosen_file)->getline });
print "selected: $chosen_file".$/;

# Declare/initialize database connection
my $dbh = DBI->connect( "dbi:$driver:dbname=$database", "", "") or die $DBI::errstr;
$dbh->sqlite_busy_timeout($timeout);
my $sql_selectfile = "select fid,completed from files WHERE checksum = ?";
# see if this file is in database
my $selectfile = $dbh->prepare($sql_selectfile) or die $DBI::errstr;
$selectfile->execute($md5) or die "SQL Error: $DBI::errstr\n";
my $h = $selectfile->fetchrow_hashref;
my $gameFromDB = $selectfile->fetchrow_hashref;
$selectfile->finish;

if(!defined $h) {
if(!defined $gameFromDB) {
# ie not seen before
$dbh->do(
'INSERT INTO files (checksum, filename) VALUES (?, ?)',
undef,
$md5, $chosen_file
) or die $DBI::errstr;
$file_id = $dbh->last_insert_id("", "", "files", "");
return { filepath => $chosen_file, id => $dbh->last_insert_id("", "", "files", "") };
}
elsif(!$h->{completed}){
elsif(!$gameFromDB->{completed}){
warn "Restarting parsing of $chosen_file, as did not complete previously.";
$file_id = $h->{fid};
return { filepath => $chosen_file, id => $gameFromDB->{fid} };
}
else {
warn "I have previously successfully parsed $chosen_file before w/checksum $md5.";
$chosen_file = undef;
}
$selectfile->finish;
$dbh->disconnect;
last if defined $file_id;
}
return undef if not defined $chosen_file;
my %result = (filepath => $chosen_file, id => $file_id);
return \%result;
# if we have looked at all files, but nothing is new then return undef.
warn "No unprocessed files found to parse.";
return undef;
}
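For reference, the rewritten choosePGN now returns either a hashref with filepath and id keys, or undef once every PGN file in the directory has been handled, so the caller no longer tracks the file id separately. A minimal sketch of how the daemon loop consumes that return value; the directory and sleep interval are illustrative:

my $pgndir    = '/pgn';   # illustrative mount point
my $sleeptime = 60;       # illustrative pause between passes

my $pgnfile = choosePGN($pgndir);
if (defined $pgnfile) {
    print "parsing $pgnfile->{filepath} (fid $pgnfile->{id})".$/;
}
else {
    sleep($sleeptime);    # nothing new to do; try again later
}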
7 changes: 0 additions & 7 deletions parsePGN/run-container.sh

This file was deleted.

5 changes: 5 additions & 0 deletions parsePGN/run-parsepgn-local.sh
@@ -0,0 +1,5 @@
#!/bin/sh
# This is a simple bash script to run the parse container on a local CPU
# - rm command kills as soon as complete.
# PATH IS THE LOCATION ON THE HOST OF THE PGN FILES
docker run -d -t -v ${HOME}/pgn_data:/pgn:ro -v ${HOME}/chessDB:/data parsepgn $@
6 changes: 6 additions & 0 deletions parsePGN/run-parsepgn-osdc.sh
@@ -0,0 +1,6 @@
#!/bin/sh
# This is a simple bash script to run the parse container.
# - rm command kills as soon as complete.
# PATH IS THE LOCATION ON THE HOST OF THE PGN FILES
HOSTPATH=/glusterfs/users/nharding/pgn_data
docker run -d -t -v ${HOSTPATH}/pgn_data:/pgn:ro -v ${HOSTPATH}/chessDB:/data hardingnj/parsepgn $@
4 changes: 0 additions & 4 deletions scorePGN/run-client.sh

This file was deleted.

3 changes: 3 additions & 0 deletions scorePGN/run-scorepgn-local.sh
@@ -0,0 +1,3 @@
#!/bin/sh
# -t for tag? -i for interactive
docker run -d -t -v ${HOME}/chessDB:/data scorepgn $@
4 changes: 4 additions & 0 deletions scorePGN/run-scorepgn-osdc.sh
@@ -0,0 +1,4 @@
#!/bin/sh
# -t for tag? -i for interactive
export HOSTPATH=/glusterfs/users/nharding/
docker run -d -t -v ${HOSTPATH}/chessDB:/data hardingnj/scorepgn $@
