Croupier is a Python project for building creature datasets from Gatherer, the official Magic card database. You can download the images and information for the creature types you want. The project aims to provide fun datasets for toy machine learning projects.
- `card_metadata.py`: Retrieves a card type and populates the file `card_database.csv` with the following information:
  - id: card id used by the Gatherer database
  - type: card type (e.g. creature)
  - subtype: card subtype (e.g. elf, goblin, etc.)
  - url: URL of the card page in Gatherer
- `card_retriever.py`: Given a card URL, retrieves the whole card information into two sources:
  - A CSV storing the card information such as description, abilities, mana cost, etc.
  - An image of the card
- `card_image_processing.py`: Takes the directory of creature folders that contain card images and crops each card to capture the monster image.
  - The cropped images are stored in `./sample_data/crop_img/`
- `./sample_data`: A folder that includes a mini-dataset curated to exemplify the format of the files.
  - `card_database.csv`: metadata (`card_id`, `URL`) for cards of various creature types
  - `card_information.csv`: features with the text of each card
  - `img/card_subtype`: each folder contains the card images (e.g. `img/goblin/`)
  - `crop_img/`: a folder that contains all cropped images; files follow the name structure `[card_id]_[creature_type].[img_ext]`
Before you start: make sure you have installed geckodriver for Selenium.
You can curate your own database via the command-line interface. In a nutshell, it is a two-step process:
- Use `card_metadata.py` to create your `card_database.csv`, populating the file with the meta-information (`card_id`, `URL`) of all creature cards of the same kind.
- Use `card_retriever.py` to extract the atomic information for each card of a given kind.
Let's start by creating the `card_database.csv` with all Elf cards using the command line:
python card_metadata.py Elf
If we want to add another creature type, we run the command again, and the program appends the new metadata to the file. The important thing is that you have the metadata for the cards you want to retrieve before moving to the next step. During execution, you should see something like the video below.
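The append behaviour can be pictured with a minimal sketch. It is not the project's actual code: the function name `append_metadata` and the column order in `FIELDNAMES` are assumptions based on the fields listed for `card_database.csv`.

```python
import csv
from pathlib import Path

# Column names come from the card_database.csv description; the order is an assumption.
FIELDNAMES = ["id", "type", "subtype", "url"]


def append_metadata(csv_path, rows):
    """Append card metadata rows to csv_path, writing the header only once."""
    path = Path(csv_path)
    # A missing or empty file still needs its header row.
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling the function once per creature type leaves a single file with one header and the rows of every type, which is the state the second step expects.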
Let's move to the second step. To get all card features and images for Elf creatures, `card_retriever.py` requires only the `creature_type` argument. On the command line it looks as follows:
python card_retriever.py Elf
It is advisable to check that all paths specified in `config.py` are set correctly. To do so, use the option `--number [int]` to indicate how many cards you want to download; a good first test is to download a single card:
python card_retriever.py -n 1 Elf
python card_retriever.py --number 1 Elf
You should then see new lines in the file `sample_data/card_information.csv` and the card images in the folder `sample_data/img/elf/`. Note: you can change the directory path in the `gatherer_croupier/config.py` file.
The cards downloaded are randomly selected when `card_retriever.py` uses the `-n` option. If you want to download a specific card such as "Shessra, Death's Whisper", you can provide its id using the `--card_id` option. Note: Gatherer gives the card id in the card's URL.
python card_retriever.py Elf --card_id 527518
Or get the Fiendslayer Paladin card using the short option `-i` instead of `--card_id`:
python card_retriever.py -i 430547 Knight
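The options described above can be mirrored with a small `argparse` sketch. This illustrates the documented interface only and is not the real `card_retriever.py`: the helper names `build_parser` and `select_card_ids`, the `seed` parameter, and the choice to give `--card_id` priority over `--number` are all assumptions.

```python
import argparse
import random


def build_parser():
    """Build a parser with the positional creature type and the two documented options."""
    parser = argparse.ArgumentParser(description="Sketch of a card_retriever.py-style CLI")
    parser.add_argument("creature_type", help="creature type, e.g. Elf")
    parser.add_argument("-n", "--number", type=int, default=None,
                        help="how many cards to download")
    parser.add_argument("-i", "--card_id", type=int, default=None,
                        help="Gatherer id of one specific card")
    return parser


def select_card_ids(available_ids, number=None, card_id=None, seed=None):
    """Pick the ids to download: one specific id, a random sample, or everything."""
    available_ids = list(available_ids)
    if card_id is not None:
        if card_id not in available_ids:
            raise ValueError(f"card id {card_id} not found in the metadata")
        return [card_id]
    if number is not None:
        rng = random.Random(seed)  # seed is only here to make the sketch reproducible
        return rng.sample(available_ids, min(number, len(available_ids)))
    return available_ids
```

With this layout, `build_parser().parse_args(["-i", "430547", "Knight"])` yields `creature_type="Knight"` and `card_id=430547`, matching the command shown above.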
Notice that you always need to provide the creature type as an argument on the command line, for three reasons:
- Retrieving: it gets the card's URL from `card_database.csv`, filtering by creature type.
- Labelling: it saves the card images using the name structure `[card_id]_[creature_type].[image_extension]`.
- Storing: it uses the creature type to organize and save images under the directory `data/img/[creature_type]`.
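The three roles of the creature type can be sketched as three small helpers. The function names are hypothetical, and lowercasing the creature type is an assumption inferred from the `sample_data/img/elf/` folder produced by the `Elf` argument.

```python
from pathlib import Path


def urls_for_type(metadata_rows, creature_type):
    """Retrieving: keep the URLs whose subtype matches the creature type."""
    wanted = creature_type.lower()
    return [row["url"] for row in metadata_rows if row["subtype"].lower() == wanted]


def image_name(card_id, creature_type, extension):
    """Labelling: build [card_id]_[creature_type].[image_extension]."""
    return f"{card_id}_{creature_type.lower()}.{extension}"


def image_dir(base_dir, creature_type):
    """Storing: images go under [base_dir]/[creature_type]."""
    return Path(base_dir) / creature_type.lower()
```

For example, `image_name(527518, "Elf", "png")` gives `527518_elf.png`, and `image_dir("data/img", "Elf")` points at `data/img/elf`.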
The card image sizes are not standardized, and the dimensions vary by edition. Cards from old expansions tend to be smaller than those from new ones. The cropping script is a simple routine that crops the card image based on proportions, which change when the design changes. The following table shows the distribution of dimensions in a sample containing the complete collection of Elf, Knight, Zombie, and Goblin creatures.
Dimension | Number of Cards | Card id Example |
---|---|---|
223x310 | 713 | 202435 |
265x370 | 702 | 509387 |
223x311 | 353 | 439626 |
266x370 | 13 | 534669 |
222x310 | 11 | 205423 |
226x311 | 3 | 221568 |
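A tally like the one in the table can be produced with a few lines of Python. This is a minimal sketch that assumes the `(width, height)` pairs have already been read from the image files (for instance from Pillow's `Image.size`); the function name `tally_dimensions` is hypothetical.

```python
from collections import Counter


def tally_dimensions(sizes):
    """Count how often each (width, height) pair occurs, most common first."""
    counts = Counter(f"{width}x{height}" for width, height in sizes)
    return counts.most_common()
```

Each returned pair holds a dimension label such as `223x310` and the number of cards with that size.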
In `card_image_processing.py`, the proportions are encoded in the `PROP_SIZE` dictionary.
python card_image_processing.py
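Proportion-based cropping can be sketched as a lookup keyed by card dimensions. The values and layout of `EXAMPLE_PROP_SIZE` below are invented for illustration and do not reproduce the real `PROP_SIZE` dictionary; the function name `crop_box` is also hypothetical.

```python
# Hypothetical proportions: fractions of the card width/height for
# (left, top, right, bottom). The real values live in PROP_SIZE.
EXAMPLE_PROP_SIZE = {
    (223, 310): (0.08, 0.12, 0.92, 0.55),
    (265, 370): (0.08, 0.11, 0.92, 0.56),
}


def crop_box(width, height, prop_size=EXAMPLE_PROP_SIZE):
    """Return the (left, top, right, bottom) pixel box for a card of this size."""
    try:
        left, top, right, bottom = prop_size[(width, height)]
    except KeyError:
        raise KeyError(f"no proportions registered for {width}x{height}") from None
    # Scale the fractions to pixels for this particular card size.
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))
```

The resulting tuple has the shape Pillow's `Image.crop` expects, so a card with an unregistered size fails loudly instead of being cropped with the wrong proportions.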
The image-processing script iterates through the `./sample_data/img` folder, extracts the artwork from each card, and stores all cropped images, regardless of creature type, in the `./sample_data/crop_img` folder, keeping the same name structure: `[CARD_ID]_[CREATURE_TYPE].[IMG_EXTENSION]`. There are two image extensions: PNG and JPEG.
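The folder traversal can be sketched with `pathlib`. This only plans the source-to-target mapping and does not crop anything; the function name `plan_crops` is hypothetical, and treating `.jpg` as an alias for JPEG is an assumption.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpeg", ".jpg"}  # ".jpg" is an assumed alias for JPEG


def plan_crops(img_root, crop_root):
    """Map every card image under img_root/<creature_type>/ to a flat target path."""
    img_root, crop_root = Path(img_root), Path(crop_root)
    plan = []
    for source in sorted(img_root.glob("*/*")):
        if source.is_file() and source.suffix.lower() in IMAGE_EXTENSIONS:
            # The file name already carries the creature type, so the
            # target folder can be flat without name collisions.
            plan.append((source, crop_root / source.name))
    return plan
```

Because the card id and creature type are part of the file name, images from different creature folders can share one output folder.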
Artists (from left to right, top to bottom): Randy Vargas, Todd Lockwood, Bram Sola, Miguel Mercado, Kieran Yanner, Luca Zontini, April Prime, Wayne Wu, Wayne Reynolds, Wayne Reynolds, Volkan Baga, Daarken, Kieran Yanner, Wayne Reynolds, Michael Komarck, Izzy, Jason Felix, Nils Hamm, Josh Hass, and Christopher Burdett.
Croupier also extracts card features such as card name, mana cost, card text, and others via the `card_retriever.py` command. The following information is stored in the `card_information.csv` file.
column | description |
---|---|
id | Gatherer's unique id for each card (aka card_id). |
CARD_NAME | The name of the card creature. |
MANA_COST_SYMBOL | The mana cost to cast the card, broken down by mana type (e.g. 2-Red-Black). |
MANA_COST_CONVERTED | The total mana cost to cast the card, regardless of mana type. |
TYPES | The card type, which can be more complex than just "creature", for example "Elf Knight". |
CARD_TEXT | The text that includes the card effects and abilities. |
FLAVOR_TEXT | The decorative text related to the card. |
POWER_TOUGHNESS | The attack and defense points (e.g. 4/2) of the creature. |
EXPANSION | The expansion to which the card belongs. |
RARITY | The card rarity (e.g. common, uncommon, rare). |
CARD_NUMBER | The number of the card in the expansion set. |
ARTIST | The artist who created the card image. |
COMMUNITY_RATING | A five-star community rating, with an indicator of how many votes the card has. |
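As a usage illustration, the rows of `card_information.csv` can be loaded with the standard `csv` module. This is a sketch under assumptions: the helper names and the derived `POWER` and `TOUGHNESS` keys are hypothetical, and non-numeric values (some creatures have `*` instead of a number) are mapped to `None`.

```python
import csv
import io


def parse_power_toughness(value):
    """Split a value such as "4/2" into integers; non-numeric parts become None."""
    def to_int(part):
        try:
            return int(part.strip())
        except ValueError:
            return None

    power, _, toughness = value.partition("/")
    return to_int(power), to_int(toughness)


def load_cards(handle):
    """Read card rows from an open CSV file and add numeric POWER/TOUGHNESS keys."""
    rows = []
    for row in csv.DictReader(handle):
        row["POWER"], row["TOUGHNESS"] = parse_power_toughness(
            row.get("POWER_TOUGHNESS") or ""
        )
        rows.append(row)
    return rows


# Tiny in-memory sample with made-up cards, standing in for the real file.
sample = io.StringIO(
    "id,CARD_NAME,POWER_TOUGHNESS\n"
    "1,Example Elf,4/2\n"
    "2,Example Goblin,*/3\n"
)
cards = load_cards(sample)
```

With a real file, the same call would be `load_cards(open(path, newline="", encoding="utf-8"))`, where `path` points at your `card_information.csv`.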
All card images, information, and symbols are trademarks of © Wizards of the Coast. Thanks to Andrew Gioia for creating the symbols in SVG format and making them available in the Keyrune repository. Finally, the files that do not belong to either of the previous sources are under the MIT license.