The MLB Gameday API is a collection of XML files hosted on an MLB server. As you would expect, a file is retrieved through a request that is made to a specific URL. Rather than attempt to explain the URL structure without a reference point, a sample URL is provided below and then described in detail.
http://gd2.mlb.com/components/game/mlb/year_2011/month_07/day_09/gid_2011_07_09_tbamlb_nyamlb_1/boxscore.xml
A request made with this URL asks for the boxscore.xml Gameday file from the Tampa Bay Rays vs. New York Yankees game on July 9, 2011. Astute fans will recognize this as the game in which Derek Jeter recorded his 3,000th hit. The boxscore.xml file is one of the Gameday API files and, as you might imagine, it describes the box score for the game in detail. If a Gameday URL has been constructed properly, the last path component will always be a Gameday API file. As you may have inferred from the URL above, the Gameday API files are organized in a hierarchy of directories on the MLB server. Now that you've seen a URL and understand how the files are organized, let's take a closer look at the URL structure.
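To make the hierarchy concrete, here is a minimal Python sketch that splits the sample URL into its scheme, hostname, and path components using only the standard library. The helper name split_gameday_url is hypothetical and chosen for illustration; it is not part of any MLB tooling.

```python
from urllib.parse import urlsplit

SAMPLE_URL = (
    "http://gd2.mlb.com/components/game/mlb/year_2011/month_07/day_09/"
    "gid_2011_07_09_tbamlb_nyamlb_1/boxscore.xml"
)


def split_gameday_url(url):
    """Return (scheme, hostname, list of path components) for a Gameday URL."""
    parts = urlsplit(url)
    # Drop the empty string produced by the leading "/" in the path.
    segments = [segment for segment in parts.path.split("/") if segment]
    return parts.scheme, parts.netloc, segments


scheme, host, segments = split_gameday_url(SAMPLE_URL)
print(scheme, host)
# Indent each component by its depth to show the directory hierarchy.
for depth, segment in enumerate(segments):
    print("  " * depth + segment)
```

The last element of the returned list is the Gameday API file name, and every element before it is a directory on the server.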
The Gameday API URL structure may seem overwhelming at first glance. However, with the knowledge of how API files are organized on the server and an understanding of the individual URL components, a Gameday URL will quickly become intuitive and self-documenting. The following is an overview of the URL components.
The HTTP protocol is used to retrieve a Gameday API file.
The hostname is gd2.mlb.com, which forms the base of every Gameday API URL.
The /components relative path is a directory that contains information that is all but irrelevant to the Gameday API files you are interested in. It has the game sub-directory, which is described below.
Notes
- If you haven't had the opportunity to read the MLB Advanced Media, L.P. copyright notice for the Gameday data, please do so. You can find it in the components directory.
The /components/game relative path is a directory that, much like its parent, contains information that is all but irrelevant to the Gameday API files you are interested in. It does have the mlb sub-directory, which is the starting point for locating Gameday API files.
The /components/game/mlb relative path is a directory that contains several files and year sub-directories ranging from 1909 to the present (and a few from the future).
Notes
- Years outside the range 2008 to present may have incomplete game data and may lack PITCHf/x data, which was introduced in 2007.
- If your goal is to find the most complete game data set, focus on 2008 to present.
The /components/game/mlb/year_YYYY relative path is the root directory for all games in a particular year. Each year directory contains batters, pitchers, media, and mobile sub-directories as well as one sub-directory for each month of the year.
Notes
- 'YYYY' represents the four digit year.
- Only the months during the season contain data (includes spring training).
The /components/game/mlb/year_YYYY/month_MM relative path is a directory that contains a sub-directory for each day in the month.
Notes
- 'MM' represents the two digit month (single digit month is prefixed with a zero).
The /components/game/mlb/year_YYYY/month_MM/day_DD relative path is a directory that contains a sub-directory for each game played on that day as well as various files and directories describing the games played. These additional files and directories will be documented because they are part of the overall MLB Gameday API; however, the game sub-directories contain the set of files necessary to analyze a single game.
Notes
- 'DD' represents the two digit day (single digit day is prefixed with a zero).
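The year, month, and day components can be assembled from a calendar date. The following is a minimal sketch, assuming the zero-padding rules described in the notes above; day_directory is an illustrative name, not an official function.

```python
from datetime import date


def day_directory(game_date):
    """Build the relative path of the directory holding one day's games."""
    # {:02d} pads single digit months and days with a leading zero.
    return "/components/game/mlb/year_{:04d}/month_{:02d}/day_{:02d}".format(
        game_date.year, game_date.month, game_date.day
    )


print(day_directory(date(2011, 7, 9)))
```

Using a date object rather than raw strings means an impossible date such as month 13 is rejected before a URL is ever built.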
The /components/game/mlb/year_YYYY/month_MM/day_DD/gid_* relative path is a directory known as the game data directory. This is where the Gameday API files are stored for a particular game. It contains several top-level files which describe game data as well as sub-directories that organize batter, pitcher, inning, media, notification, and premium information data files. The Gameday API files will be listed below with a link to a complete file reference.
Now that you know where the Gameday files are stored for a game, you need to know how the game data directory path component is structured. Rather than scroll back to the top, the sample game data directory component from our example is provided below.
/gid_2011_07_09_tbamlb_nyamlb_1
- You can refer to this relative path as the game ID path or GID path.
- It always takes the form /gid_YYYY_MM_DD_atcasc_htchsc_d
  - YYYY -- four digit year (same as year component)
  - MM -- two digit month (same as month component)
  - DD -- two digit day (same as day component)
  - atc -- three-letter away team code
  - asc -- three-letter away team sport code
  - htc -- three-letter home team code
  - hsc -- three-letter home team sport code
  - d -- one digit game number (either 1 or 2)
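The GID form lends itself to a regular expression. The sketch below is one possible parser, assuming every team code and sport code is exactly three lowercase letters as described in the list above; GameId and parse_gid are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class GameId(NamedTuple):
    year: int
    month: int
    day: int
    away_team: str
    away_sport: str
    home_team: str
    home_sport: str
    game_number: int


# Assumption: codes are three lowercase letters; the leading "/" is optional.
GID_PATTERN = re.compile(
    r"^/?gid_(\d{4})_(\d{2})_(\d{2})"
    r"_([a-z]{3})([a-z]{3})"
    r"_([a-z]{3})([a-z]{3})"
    r"_(\d)$"
)


def parse_gid(gid):
    """Split a GID path component into its named fields."""
    match = GID_PATTERN.match(gid)
    if match is None:
        raise ValueError("not a recognizable GID path: {!r}".format(gid))
    year, month, day, atc, asc, htc, hsc, number = match.groups()
    return GameId(int(year), int(month), int(day), atc, asc, htc, hsc, int(number))


game = parse_gid("/gid_2011_07_09_tbamlb_nyamlb_1")
print(game)
```

Returning a named tuple keeps the fields self-describing, so later code can refer to game.home_team rather than a positional index.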
This may seem like a lot to remember, but keep in mind that the year, month, and day placeholders are exactly the same as the year, month, and day path values from earlier in the URL. The game number simply identifies the game number between the two teams on the particular date. A value of 2 identifies it as the second game of a doubleheader while all other games have a value of 1. Therefore, a value of 1 does NOT imply the first game of a doubleheader. This leaves the section identifying the teams participating in the game. Let's start with the home and away team codes.
As mentioned, the home and away team codes are three-letter unique identifiers. Each team in Major League Baseball has been assigned a team code, and a listing of all team information as it pertains to the MLB Gameday API is available separately. The team code is generally derived from the city name of each team (e.g. the Boston Red Sox team code is bos and the Cincinnati Reds team code is cin). In some cases, the city name consists of multiple words or the city is home to multiple teams, which doesn't allow for a similar derivation. In both cases, the team code remains a three-letter identifier. However, the last letter is either an a or an n, which identifies the American or National League. A few examples will help clarify.
City Name with Multiple Words
- kca -- Kansas City Royals
- sln -- St. Louis Cardinals
City with Multiple Teams
- nya -- New York Yankees
- nyn -- New York Mets
- cha -- Chicago White Sox
- chn -- Chicago Cubs
With this knowledge, you now understand how we are able to determine that the Tampa Bay Rays were the visitors and the New York Yankees the home team from the tba and nya in our sample URL. This does NOT imply that the game takes place at the home team's stadium (Spring Training games are also recorded). However, information about the game location can be found in specific Gameday API files.
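As a small illustration, the team codes mentioned so far can be kept in a lookup table. The table below contains only the codes named in this primer and is deliberately incomplete; describe_matchup is a hypothetical helper, and unknown codes are passed through unchanged rather than guessed.

```python
# Only the team codes named in this primer; not a complete listing.
TEAM_NAMES = {
    "bos": "Boston Red Sox",
    "cin": "Cincinnati Reds",
    "kca": "Kansas City Royals",
    "sln": "St. Louis Cardinals",
    "nya": "New York Yankees",
    "nyn": "New York Mets",
    "cha": "Chicago White Sox",
    "chn": "Chicago Cubs",
    "tba": "Tampa Bay Rays",
}


def describe_matchup(away_code, home_code):
    """Render a readable matchup from the away and home team codes."""
    # Fall back to the raw code when a team is missing from the table.
    away = TEAM_NAMES.get(away_code, away_code)
    home = TEAM_NAMES.get(home_code, home_code)
    return "{} (away) vs. {} (home)".format(away, home)


print(describe_matchup("tba", "nya"))
```

The labels say away and home rather than naming a venue because, as noted above, the home team designation does not guarantee the game was played at the home team's stadium.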
The last piece of information to understand is the three-letter home and away team sport code. For all Major League Baseball games, both the away team and home team sport code will have a value of mlb. However, analysis of games between 2011 and 2013 has shown six additional team sport codes (all applied to the away team). They are as follows:
- afa -- Unknown
- aax -- Farm System?
- bbc -- College Baseball
- int -- International
- aaa -- Mexican League
- min -- Minor League Baseball
In an effort to provide as complete a reference to the MLB Gameday API as possible, there are instances where additional values are documented for academic, if not popular, use cases. More likely than not, you are interested in Major League Baseball games and will not have to worry about values other than mlb. This concludes the overview of Gameday URL components. Next up is an overview of the game data directory contents.
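If you only care about Major League Baseball games, the sport codes give a simple filter. The sketch below assumes the GID layout described earlier (three-letter team code followed by three-letter sport code); sport_codes and is_mlb_game are illustrative names, and the second GID in the usage lines is made up purely to exercise the non-MLB branch.

```python
# Additional sport codes observed in 2011-2013 games, as listed above.
OTHER_SPORT_CODES = {
    "afa": "Unknown",
    "aax": "Farm System?",
    "bbc": "College Baseball",
    "int": "International",
    "aaa": "Mexican League",
    "min": "Minor League Baseball",
}


def sport_codes(gid):
    """Return the (away, home) sport codes from a GID path component."""
    # Fields: ["gid", YYYY, MM, DD, away team + sport, home team + sport, number]
    fields = gid.lstrip("/").split("_")
    away, home = fields[4], fields[5]
    # The sport code is the last three of the six letters.
    return away[3:], home[3:]


def is_mlb_game(gid):
    """True when both teams carry the mlb sport code."""
    return sport_codes(gid) == ("mlb", "mlb")


print(is_mlb_game("/gid_2011_07_09_tbamlb_nyamlb_1"))
# Made-up GID with a placeholder away team, for illustration only.
print(is_mlb_game("gid_2012_03_01_xxxbbc_nyamlb_1"))
```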
As mentioned, the game data directory is where the Gameday API files are stored for a particular game. This section lists the game data directory contents. The following is a description of the files that may be included in a game data directory. More often than not, the majority of files will be present. However, there are instances where certain files are not included (e.g. emailSource.xml is not included for Spring Training games).
atv_feed_selection.xml
atv_game_events.xml
atv_preview.xml
atv_preview_noscores.xml
atv_runScoringPlays.xml
badges.xml
bench.xml
benchO.xml
bis_boxscore.xml
boxscore.xml
emailSource.xml
eventLog.xml
game.xml
game_events.xml
gamecenter.xml
gameday_Syn.xml
linescore.xml
miniscoreboard.xml
players.xml
plays.xml
rawboxscore.xml
batters/
inning/
media/
notifications/
onbase/
pitchers/
premium/
The majority of the Gameday API files are in XML format, although there are a few JSON and PLIST files. Because the JSON files have XML counterparts and the majority of files are XML, the JSON files have not been properly documented. The same goes for the PLIST files, which have XML counterparts in the atv_* files. PLIST files are used by the MLB.TV application on the Apple TV. Below is a list of the JSON and PLIST files you may come across.
boxscore.json
game_events.json
linescore.json
plays.json
feed_selection.plist
game_events.plist
preview.plist
runScoringPlays.plist
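Comparing the two listings suggests a naming pattern: each JSON file shares its base name with an XML file, and each PLIST file matches an XML file with an atv_ prefix. The sketch below encodes that pattern; it is inferred from the file names shown in this primer rather than from any official specification, and xml_counterpart is a hypothetical helper.

```python
import os


def xml_counterpart(filename):
    """Guess the XML file that corresponds to a JSON or PLIST file name."""
    stem, extension = os.path.splitext(filename)
    if extension == ".json":
        # e.g. boxscore.json -> boxscore.xml
        return stem + ".xml"
    if extension == ".plist":
        # e.g. preview.plist -> atv_preview.xml
        return "atv_" + stem + ".xml"
    raise ValueError("expected a .json or .plist file name: {!r}".format(filename))


for name in ("boxscore.json", "runScoringPlays.plist"):
    print(name, "->", xml_counterpart(name))
```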
Using the knowledge from the game data directory contents, you are now able to construct a relative path from the game data directory to the desired Gameday API file. In our example URL, the last path component was /boxscore.xml, which is a top-level file in the game data directory. This could just as easily have been the relative path /inning/inning_all.xml, which would have made a request to the inning_all.xml Gameday API file. It is important to remember that the last path component will always be the name of a Gameday API file. Of course, the location of a particular file depends on where it is stored and on its relative path from the game data directory.
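Putting the pieces together, a complete URL can be assembled from a date, the two team codes, a game number, and a file path relative to the game data directory. This is a minimal sketch under the URL structure described in this primer; gameday_url and its parameter names are assumptions made for illustration, and the function only builds a string without contacting the server.

```python
from datetime import date

BASE_URL = "http://gd2.mlb.com/components/game/mlb"


def gameday_url(game_date, away_team, home_team, file_path,
                game_number=1, away_sport="mlb", home_sport="mlb"):
    """Assemble a full Gameday URL for one file of one game."""
    ymd = (game_date.year, game_date.month, game_date.day)
    # The GID repeats the date, then team code + sport code for each side.
    gid = "gid_{:04d}_{:02d}_{:02d}_{}{}_{}{}_{}".format(
        *ymd, away_team, away_sport, home_team, home_sport, game_number
    )
    day_dir = "year_{:04d}/month_{:02d}/day_{:02d}".format(*ymd)
    # Tolerate a leading "/" on the relative file path.
    return "{}/{}/{}/{}".format(BASE_URL, day_dir, gid, file_path.lstrip("/"))


print(gameday_url(date(2011, 7, 9), "tba", "nya", "boxscore.xml"))
print(gameday_url(date(2011, 7, 9), "tba", "nya", "/inning/inning_all.xml"))
```

The sport codes default to mlb because, as noted earlier, that is the value for all Major League Baseball games; the other codes can still be supplied explicitly.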
This concludes our overview of the MLB Gameday API. Although far from an exhaustive description, this primer should give you enough information to start looking at games of interest and to get you moving in the right direction. There is an enormous amount of data provided by MLB, and understanding its content and structure is no small feat. However, by using the Gameday API file references found on this site, you will shorten the learning curve and have more time for what interests you: analyzing a great set of data. Feel free to send me an email or tweet with any questions, concerns, or comments regarding this primer or any of the pages on the site. If you're familiar with GitHub and would like to contribute, fork the repository and send along a pull request. Thank you for taking the time to check out the site. Dig in, and have fun!