diff --git a/.gitignore b/.gitignore index c9227fc..fecb965 100644 --- a/.gitignore +++ b/.gitignore @@ -3,3 +3,4 @@ virtualenv/ .idea/ *.log out/ +.vs/ diff --git a/FanGraphs/exceptions.py b/FanGraphs/exceptions.py index 58cfc34..791f8ae 100644 --- a/FanGraphs/exceptions.py +++ b/FanGraphs/exceptions.py @@ -1,8 +1,37 @@ #! python3 # FanGraphs/exceptions.py +""" +The warnings and exceptions used by modules in the package. +================================================================================ +""" -class InvalidFilterQuery(Exception): + +class FilterUpdateIncapabilityWarning(Warning): + + def __init__(self): + """ + Raised when the filter queries cannot be updated. + This usually occurs when no filter queries have been configured since the last update. + """ + self.message = "No filter query configurations to update" + super().__init__(self.message) + + +class UnknownBrowserException(Exception): + + def __init__(self, browser): + """ + Raised when the browser name given is not recognized. 
+ + :param browser: The name of the browser used + """ + self.browser = browser + self.message = f"No browser named '{self.browser}' was recognized" + super().__init__(self.message) + + +class InvalidFilterQueryException(Exception): def __init__(self, query): """ @@ -15,7 +44,7 @@ def __init__(self, query): super().__init__(self.message) -class InvalidFilterOption(Exception): +class InvalidFilterOptionException(Exception): def __init__(self, query, option): """ @@ -27,3 +56,15 @@ def __init__(self, query, option): self.query, self.option = query, option self.message = f"No option '{self.option}' could be found for query '{self.query}'" super().__init__(self.message) + + +class InvalidQuickSplitException(Exception): + + def __init__(self, quick_split): + """ + + :param quick_split: The name of the quick split used + """ + self.quick_split = quick_split + self.message = f"No quick split '{self.quick_split}' could be found" + super().__init__(self.message) diff --git a/FanGraphs/leaders.py b/FanGraphs/leaders.py index 9a42f58..871f1e0 100644 --- a/FanGraphs/leaders.py +++ b/FanGraphs/leaders.py @@ -2,433 +2,917 @@ # FanGraphs/leaders.py """ - +Web scraper for the **Leaders** tab of the `FanGraphs website`_. +Each page which is covered has its own class for scraping it. +Below is each covered page with its corresponding class: + +- `Major League Leaderboards`_: :py:class:`MajorLeagueLeaderboards` +- `Splits Leaderboards`_: :py:class:`SplitsLeaderboards` +- `Season Stat Grid`_: :py:class:`SeasonStatGrid` +- `60-Game Span Leaderboards`_: :py:class:`GameSpanLeaderboards` +- `KBO Leaders`_: :py:class:`InternationalLeaderboards` +- `Combined WAR Leaderboards`_: :py:class:`WARLeaderboards` + +.. _FanGraphs website: https://fangraphs.com +.. _Major League Leaderboards: https://fangraphs.com/leaders.aspx +.. _Splits Leaderboards: https://fangraphs.com/leaders/splits-leaderboards +.. _Season Stat Grid: https://fangraphs.com/leaders/season-stat-grid +.. 
_60-Game Span Leaderboards: https://www.fangraphs.com/leaders/special/60-game-span +.. _KBO Leaders: https://www.fangraphs.com/leaders/international +.. _Combined WAR Leaderboards: https://www.fangraphs.com/warleaders.aspx +================================================================================ """ import csv import datetime import os -from urllib.request import urlopen import bs4 -from lxml import etree -from selenium.common import exceptions -from selenium.webdriver.common.action_chains import ActionChains -from selenium import webdriver -from selenium.webdriver.common.by import By -from selenium.webdriver.firefox.options import Options -from selenium.webdriver.support import expected_conditions -from selenium.webdriver.support.ui import WebDriverWait +from playwright.sync_api import sync_playwright import FanGraphs.exceptions -def compile_options(): - """ - Modifies Selenium WebDriver Options for ideal browser usage. - Creates directory *out/* for exported files. - - :returns: Selenium WebDriver Options Object - :rtype: selenium.webdriver.firefox.options.Options - """ - options = Options() - options.headless = True - os.makedirs("out", exist_ok=True) - preferences = { - "browser.download.folderList": 2, - "browser.download.manager.showWhenStarting": False, - "browser.download.dir": os.path.abspath("out"), - "browser.helperApps.neverAsk.saveToDisk": "text/csv" - } - for pref in preferences: - options.set_preference(pref, preferences[pref]) - return options - - class MajorLeagueLeaderboards: """ Parses the FanGraphs Major League Leaderboards page. Note that the Splits Leaderboard is not covered. Instead, it is covered by :py:class:`SplitsLeaderboards` - """ - def __init__(self): - """ - .. py:attribute:: address - The base URL address of the Major League Leaderboards page + .. 
py:attribute:: address + The base URL address of the Major League Leaderboards page - :type: str - :value: https://fangraphs.com/leaders.aspx + :type: str + :value: https://fangraphs.com/leaders.aspx + """ - .. py:attribute:: tree - The ``lxml`` element tree for parsing the webpage HTML. + __selections = { + "group": "#LeaderBoard1_tsGroup", + "stat": "#LeaderBoard1_tsStats", + "position": "#LeaderBoard1_tsPosition", + "type": "#LeaderBoard1_tsType" + } + __dropdowns = { + "league": "#LeaderBoard1_rcbLeague_Input", + "team": "#LeaderBoard1_rcbTeam_Input", + "single_season": "#LeaderBoard1_rcbSeason_Input", + "split": "#LeaderBoard1_rcbMonth_Input", + "min_pa": "#LeaderBoard1_rcbMin_Input", + "season1": "#LeaderBoard1_rcbSeason1_Input", + "season2": "#LeaderBoard1_rcbSeason2_Input", + "age1": "#LeaderBoard1_rcbAge1_Input", + "age2": "#LeaderBoard1_rcbAge2_Input" + } + __dropdown_options = { + "league": "#LeaderBoard1_rcbLeague_DropDown", + "team": "#LeaderBoard1_rcbTeam_DropDown", + "single_season": "#LeaderBoard1_rcbSeason_DropDown", + "split": "#LeaderBoard1_rcbMonth_DropDown", + "min_pa": "#LeaderBoard1_rcbMin_DropDown", + "season1": "#LeaderBoard1_rcbSeason1_DropDown", + "season2": "#LeaderBoard1_rcbSeason2_DropDown", + "age1": "#LeaderBoard1_rcbAge1_DropDown", + "age2": "#LeaderBoard1_rcbAge2_DropDown" + } + __checkboxes = { + "split_teams": "#LeaderBoard1_cbTeams", + "active_roster": "#LeaderBoard1_cbActive", + "hof": "#LeaderBoard1_cbHOF", + "split_seasons": "#LeaderBoard1_cbSeason", + "rookies": "#LeaderBoard1_cbRookie" + } + __buttons = { + "season1": "#LeaderBoard1_btnMSeason", + "season2": "#LeaderBoard1_btnMSeason", + "age1": "#LeaderBoard1_cmdAge", + "age2": "#LeaderBoard1_cmdAge" + } + address = "https://fangraphs.com/leaders.aspx" - :type: lxml.etree._ElementTree + def __init__(self, browser="chromium"): + """ + :param browser: The name of the browser to use (Chromium, Firefox, WebKit) - .. 
py:attribute:: browser - The ``selenium`` automated Firefox browser for navigating webpage. + .. py:attribute:: page + The generated synchronous ``Playwright`` page for browser automation. - :type: selenium.webdriver.firefox.webdriver.WebDriver + :type: playwright.sync_api._generated.Page + .. py:attribute:: soup + The ``BeautifulSoup4`` HTML parser for scraping the webpage. + + :type: bs4.BeautifulSoup """ - self.__selections = { - "group": "LeaderBoard1_tsGroup", - "stat": "LeaderBoard1_tsStats", - "position": "LeaderBoard1_tsPosition", - "type": "LeaderBoard1_tsType" - } - self.__dropdowns = { - "league": "LeaderBoard1_rcbLeague_Input", - "team": "LeaderBoard1_rcbTeam_Input", - "single_season": "LeaderBoard1_rcbSeason_Input", - "split": "LeaderBoard1_rcbMonth_Input", - "min_pa": "LeaderBoard1_rcbMin_Input", - "season1": "LeaderBoard1_rcbSeason1_Input", - "season2": "LeaderBoard1_rcbSeason2_Input", - "age1": "LeaderBoard1_rcbAge1_Input", - "age2": "LeaderBoard1_rcbAge2_Input" - } - self.__dropdown_options = { - "league": "LeaderBoard1_rcbLeague_DropDown", - "team": "LeaderBoard1_rcbTeam_DropDown", - "single_season": "LeaderBoard1_rcbSeason_DropDown", - "split": "LeaderBoard1_rcbMonth_DropDown", - "min_pa": "LeaderBoard1_rcbMin_DropDown", - "season1": "LeaderBoard1_rcbSeason1_DropDown", - "season2": "LeaderBoard1_rcbSeason2_DropDown", - "age1": "LeaderBoard1_rcbAge1_DropDown", - "age2": "LeaderBoard1_rcbAge2_DropDown" - } - self.__checkboxes = { - "split_teams": "LeaderBoard1_cbTeams", - "active_roster": "LeaderBoard1_cbActive", - "hof": "LeaderBoard1_cbHOF", - "split_seasons": "LeaderBoard1_cbSeason", - "rookies": "LeaderBoard1_cbRookie" - } - self.__buttons = { - "season1": "LeaderBoard1_btnMSeason", - "season2": "LeaderBoard1_btnMSeason", - "age1": "LeaderBoard1_cmdAge", - "age2": "LeaderBoard1_cmdAge" + os.makedirs("out", exist_ok=True) + + self.__play = sync_playwright().start() + browsers = { + "chromium": self.__play.chromium, + "firefox": 
self.__play.firefox, + "webkit": self.__play.webkit } - self.address = "https://fangraphs.com/leaders.aspx" + browser_ctx = browsers.get(browser.lower()) + if browser_ctx is None: + raise FanGraphs.exceptions.UnknownBrowserException(browser.lower()) + self.__browser = browser_ctx.launch( + downloads_path=os.path.abspath("out") + ) + self.page = self.__browser.new_page( + accept_downloads=True + ) + self.page.goto(self.address, timeout=0) - response = urlopen(self.address) - parser = etree.HTMLParser() - self.tree = etree.parse(response, parser) + self.soup = None + self.__refresh_parser() - self.browser = webdriver.Firefox( - options=compile_options() + def __refresh_parser(self): + """ + Re-initializes the ``bs4.BeautifulSoup`` object stored in :py:attr:`soup`. + Called when a page refresh is expected + """ + self.soup = bs4.BeautifulSoup( + self.page.content(), features="lxml" ) - self.browser.get(self.address) - def list_queries(self): + @classmethod + def list_queries(cls): """ Lists the possible filter queries which can be used to modify search results. :return: Filter queries which can be used to modify search results - :type: list + :rtype: list """ queries = [] - queries.extend(list(self.__selections)) - queries.extend(list(self.__dropdowns)) - queries.extend(list(self.__checkboxes)) + queries.extend(list(cls.__selections)) + queries.extend(list(cls.__dropdowns)) + queries.extend(list(cls.__checkboxes)) return queries - def list_options(self, query): + def list_options(self, query: str): """ Lists the possible options which the filter query can be configured to. 
- :param query: + :param query: The filter query :return: Options which the filter query can be configured to :rtype: list - :raises MajorLeagueLeaderboards.InvalidFilterQuery: Argument ``query`` is invalid + :raises FanGraphs.exceptions.InvalidFilterQueryException: Argument ``query`` is invalid """ query = query.lower() if query in self.__checkboxes: options = ["True", "False"] elif query in self.__dropdown_options: - xpath = "//div[@id='{}']//div//ul//li".format( - self.__dropdown_options.get(query) - ) - elems = self.tree.xpath(xpath) - options = [e.text for e in elems] + elems = self.soup.select(f"{self.__dropdown_options[query]} li") + options = [e.getText() for e in elems] elif query in self.__selections: - xpath = "//div[@id='{}']//div//ul//li//a//span//span//span".format( - self.__selections.get(query) - ) - elems = self.tree.xpath(xpath) - options = [e.text for e in elems] + elems = self.soup.select(f"{self.__selections[query]} li") + options = [e.getText() for e in elems] else: - raise FanGraphs.exceptions.InvalidFilterQuery(query) + raise FanGraphs.exceptions.InvalidFilterQueryException(query) return options - def current_option(self, query): + def current_option(self, query: str): """ Retrieves the option which the filter query is currently set to. 
:param query: The filter query being retrieved of its current option :return: The option which the filter query is currently set to :rtype: str - :raises MajorLeagueLeaderboards.InvalidFilterQuery: Argument ``query`` is invalid + :raises FanGraphs.exceptions.InvalidFilterQueryException: Argument ``query`` is invalid """ query = query.lower() if query in self.__checkboxes: - xpath = "//input[@id='{}']".format( - self.__checkboxes.get(query) - ) - elem = self.tree.xpath(xpath)[0] + elem = self.soup.select(self.__checkboxes[query])[0] option = "True" if elem.get("checked") == "checked" else "False" elif query in self.__dropdowns: - xpath = "//input[@id='{}']".format( - self.__dropdowns.get(query) - ) - elem = self.tree.xpath(xpath)[0] + elem = self.soup.select(self.__dropdowns[query])[0] option = elem.get("value") elif query in self.__selections: - xpath = "//div[@id='{}']//div//ul//li//a[@class='{}']//span//span//span".format( - self.__selections.get(query), - "rtsLink rtsSelected" - ) - elem = self.tree.xpath(xpath)[0] - option = elem.text + elem = self.soup.select(f"{self.__selections[query]} .rtsLink.rtsSelected")[0] + option = elem.getText() else: - raise FanGraphs.exceptions.InvalidFilterQuery(query) + raise FanGraphs.exceptions.InvalidFilterQueryException(query) return option - def configure(self, query, option): + def configure(self, query: str, option: str, *, autoupdate=True): """ - Sets a filter query to a specified option. + Configures a filter query ``query`` to a specified option ``option``. 
:param query: The filter query to be configured :param option: The option to set the filter query to + :param autoupdate: If ``True``, any form submission buttons attached to the filter query will be clicked + :raises FanGraphs.exceptions.InvalidFilterQueryException: Argument ``query`` is invalid """ query, option = query.lower(), str(option).lower() if query not in self.list_queries(): - raise FanGraphs.exceptions.InvalidFilterQuery(query) - while True: - try: - if query in self.__checkboxes: - self.__config_checkbox(query, option) - elif query in self.__dropdowns: - self.__config_dropdown(query, option) - elif query in self.__selections: - self.__config_selection(query, option) - if query in self.__buttons: - self.__submit_form(query) - except exceptions.ElementClickInterceptedException: - self.__close_ad() - continue - break - response = urlopen(self.browser.current_url) - parser = etree.HTMLParser() - self.tree = etree.parse(response, parser) - - def __config_checkbox(self, query, option): - """ - Sets a checkbox-class filter query to an option + raise FanGraphs.exceptions.InvalidFilterQueryException(query) + self.__close_ad() + if query in self.__selections: + self.__configure_selection(query, option) + elif query in self.__dropdowns: + self.__configure_dropdown(query, option) + elif query in self.__checkboxes: + self.__configure_checkbox(query, option) + else: + raise FanGraphs.exceptions.InvalidFilterQueryException(query) + if query in self.__buttons and autoupdate: + self.__click_button(query) + self.__refresh_parser() - :param query: The checkbox-class filter query to be configured + def __configure_selection(self, query, option): + """ + Configures a selection-class filter query ``query`` to an option ``option`` + + :param query: The selection-class filter query to be configured :param option: The option to set the filter query to + :raises FanGraphs.exceptions.InvalidFilterOptionException: Argument ``option`` is invalid """ - current = 
self.current_option(query).lower() - if option == current: - return - elem = self.browser.find_element_by_xpath( - self.__checkboxes.get(query) - ) + options = [o.lower() for o in self.list_options(query)] + try: + index = options.index(option) + except ValueError: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + self.page.click("#LeaderBoard1_tsType a[href='#']") + elem = self.page.query_selector_all( + f"{self.__selections[query]} li" + )[index] elem.click() - def __config_dropdown(self, query, option): + def __configure_dropdown(self, query, option): """ - Sets a dropdown-class filter query to an option + Configures a dropdown-class filter query ``query`` to an option ``option`` :param query: The dropdown-class filter query to be configured :param option: The option to set the filter query to + :raises FanGraphs.exceptions.InvalidFilterOptionException: Argument ``option`` is invalid """ options = [o.lower() for o in self.list_options(query)] - index = options.index(option) - dropdown = self.browser.find_element_by_id( - self.__dropdowns.get(query) + try: + index = options.index(option) + except ValueError: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + self.page.hover( + self.__dropdowns[query] ) - dropdown.click() - elem = self.browser.find_elements_by_css_selector( - "div[id='{}'] div ul li".format( - self.__dropdown_options.get(query) - ) + elem = self.page.query_selector_all( + f"{self.__dropdown_options[query]} > div > ul > li" )[index] elem.click() - def __config_selection(self, query, option): + def __configure_checkbox(self, query, option): """ - Sets a selection-class filter query to an option + Configures a checkbox-class filter query ``query`` to an option ``option``. 
- :param query: The selection-class filter query to be configured + :param query: The checkbox-class filter query to be configured :param option: The option to set the filter query to """ - def open_pitch_type_sublevel(): - pitch_type_elem = self.browser.find_element_by_css_selector( - "div[id='LeaderBoard1_tsType'] div ul li a[href='#']" + options = [o.lower() for o in self.list_options(query)] + if option not in options: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + if option != self.current_option(query).lower(): + self.page.click(self.__checkboxes[query]) + + def __click_button(self, query): + """ + Clicks the button element which is attached to the search query. + + :param query: The filter query which has an attached form submission button + """ + self.page.click( + self.__buttons[query] + ) + + def __close_ad(self): + """ + Closes the ad which may interfere with clicking other page elements. + """ + elem = self.page.query_selector(".ezmob-footer-close") + if elem: + elem.click() + + def quit(self): + """ + Terminates the underlying ``Playwright`` browser context. + """ + self.__browser.close() + self.__play.stop() + + def reset(self): + """ + Navigates to the webpage corresponding to :py:attr:`address`. + """ + self.page.goto(self.address) + self.__refresh_parser() + + def export(self, path=""): + """ + Uses the **Export Data** button on the webpage to export the current leaderboard. + The data will be exported as a CSV file and the file will be saved to *out/*. + The file will be saved to the filepath ``path``, if specified. 
+ Otherwise, the file will be saved to the filepath *./out/%d.%m.%y %H.%M.%S.csv* + + :param path: The path to save the exported data to + """ + if not path or os.path.splitext(path)[1] != ".csv": + path = "out/{}.csv".format( + datetime.datetime.now().strftime("%d.%m.%y %H.%M.%S") ) - pitch_type_elem.click() - options = [o.lower() for o in self.list_options(query)] - index = options.index(option) - elem = self.browser.find_elements_by_css_selector( - "div[id='{}'] div ul li".format( - self.__selections.get(query) + self.__close_ad() + with self.page.expect_download() as down_info: + self.page.click("#LeaderBoard1_cmdCSV") + download = down_info.value + download_path = download.path() + os.rename(download_path, path) + + +class SplitsLeaderboards: + """ + Parses the FanGraphs Splits Leaderboards page. + + .. py:attribute:: address + The base URL address which corresponds to the Splits Leaderboards page + + :type: str + :value: https://fangraphs.com/leaders/splits-leaderboards + """ + __selections = { + "group": [ + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(1)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(2)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(3)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(4)" + ], + "stat": [ + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(6)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(7)" + ], + "type": [ + "#root-buttons-stats > div:nth-child(1)", + "#root-buttons-stats > div:nth-child(2)", + "#root-buttons-stats > div:nth-child(3)" + ] + } + __dropdowns = { + "time_filter": "#root-menu-time-filter > .fg-dropdown.splits.multi-choice", + "preset_range": "#root-menu-time-filter > .fg-dropdown.splits.single-choice", + "groupby": ".fg-dropdown.group-by" + } + __splits = { + "handedness": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(1)", + "home_away": ".fgBin:nth-child(1) 
> .fg-dropdown.splits.multi-choice:nth-child(2)", + "batted_ball": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(3)", + "situation": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(4)", + "count": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(5)", + "batting_order": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(1)", + "position": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(2)", + "inning": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(3)", + "leverage": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(4)", + "shifts": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(5)", + "team": ".fgBin:nth-child(3) > .fg-dropdown.splits.multi-choice:nth-child(1)", + "opponent": ".fgBin:nth-child(3) > .fg-dropdown.splits.multi-choice:nth-child(2)", + } + __quick_splits = { + "batting_home": ".quick-splits > div:nth-child(1) > div:nth-child(2) > .fgButton:nth-child(1)", + "batting_away": ".quick-splits > div:nth-child(1) > div:nth-child(2) > .fgButton:nth-child(2)", + "vs_lhp": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(1)", + "vs_lhp_home": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(2)", + "vs_lhp_away": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(3)", + "vs_lhp_as_lhh": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(4)", + "vs_lhp_as_rhh": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(5)", + "vs_rhp": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(1)", + "vs_rhp_home": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(2)", + "vs_rhp_away": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(3)", + "vs_rhp_as_lhh": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(4)", + 
"vs_rhp_as_rhh": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(5)", + "pitching_as_sp": ".quick-splits > div:nth-child(2) > div:nth-child(1) .fgButton:nth-child(1)", + "pitching_as_rp": ".quick-splits > div:nth-child(2) > div:nth-child(1) .fgButton:nth-child(2)", + "pitching_home": ".quick-splits > div:nth-child(2) > div:nth-child(2) > .fgButton:nth-child(1)", + "pitching_away": ".quick-splits > div:nth-child(2) > div:nth-child(2) > .fgButton:nth-child(2)", + "vs_lhh": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(1)", + "vs_lhh_home": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(2)", + "vs_lhh_away": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(3)", + "vs_lhh_as_rhp": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(4)", + "vs_lhh_as_lhp": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(5)", + "vs_rhh": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(1)", + "vs_rhh_home": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(1)", + "vs_rhh_away": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(1)", + "vs_rhh_as_rhp": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(1)", + "vs_rhh_as_lhp": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(1)" + } + __switches = { + "split_teams": "#stack-buttons > div:nth-child(2)", + "auto_pt": "#stack-buttons > div:nth-child(3)" + } + + address = "https://fangraphs.com/leaders/splits-leaderboards" + + def __init__(self, *, browser="chromium"): + """ + :param browser: The name of the browser to use (Chromium, Firefox, WebKit) + + .. py:attribute:: page + The generated synchronous ``playwright`` page for browser automation. + + :type: playwright.sync_api._generated.Page + + .. 
py:attribute:: soup + The ``BeautifulSoup4`` HTML parser for scraping the webpage. + + :type: bs4.BeautifulSoup + """ + os.makedirs("out", exist_ok=True) + + self.__play = sync_playwright().start() + browsers = { + "chromium": self.__play.chromium, + "firefox": self.__play.firefox, + "webkit": self.__play.webkit + } + browser_ctx = browsers.get(browser.lower()) + if browser_ctx is None: + raise FanGraphs.exceptions.UnknownBrowserException(browser.lower()) + self.__browser = browser_ctx.launch() + self.page = self.__browser.new_page() + self.page.goto(self.address, timeout=0) + self.page.wait_for_selector(".fg-data-grid.undefined") + + self.soup = None + self.__refresh_parser() + + self.configure_filter_group("Show All") + self.configure("auto_pt", "False", autoupdate=True) + + def __refresh_parser(self): + """ + Re-initializes the ``bs4.BeautifulSoup`` object stored in :py:attr:`soup`. + Called when a page refresh is expected + """ + self.soup = bs4.BeautifulSoup( + self.page.content(), features="lxml" + ) + + @classmethod + def list_queries(cls): + """ + Lists the possible filter queries which can be used to modify search results. + + :return: Filter queries which can be used to modify search results + :rtype: list + """ + queries = [] + queries.extend(list(cls.__selections)) + queries.extend(list(cls.__dropdowns)) + queries.extend(list(cls.__splits)) + queries.extend(list(cls.__switches)) + return queries + + def list_options(self, query: str): + """ + Lists the possible options which the filter query can be configured to. 
+ + :param query: The filter query + :return: Options which the filter query can be configured to + :rtype: list + :raises FanGraphs.exceptions.InvalidFilterQueryException: Argument ``query`` is invalid + """ + query = query.lower() + if query in self.__selections: + elems = [ + self.soup.select(s)[0] + for s in self.__selections[query] + ] + options = [e.getText() for e in elems] + elif query in self.__dropdowns: + selector = f"{self.__dropdowns[query]} ul li" + elems = self.soup.select(selector) + options = [e.getText() for e in elems] + elif query in self.__splits: + selector = f"{self.__splits[query]} ul li" + elems = self.soup.select(selector) + options = [e.getText() for e in elems] + elif query in self.__switches: + options = ["True", "False"] + else: + raise FanGraphs.exceptions.InvalidFilterQueryException(query) + return options + + def current_option(self, query: str): + """ + Retrieves the option(s) which the filter query is currently set to. + + Most dropdown- and split-class filter queries can be configured to multiple options. + For those filter classes, a list is returned, while other filter classes return a string. 
+ + - Selection-class: ``str`` + - Dropdown-class: ``list`` + - Split-class: ``list`` + - Switch-class: ``str`` + + :param query: The filter query whose current option is retrieved + :return: The option(s) which the filter query is currently set to + :rtype: str or list + :raises FanGraphs.exceptions.InvalidFilterQueryException: Argument ``query`` is invalid + """ + query = query.lower() + option = [] + if query in self.__selections: + for sel in self.__selections[query]: + elem = self.soup.select(sel)[0] + if "isActive" in elem.get("class"): + option = elem.getText() + break + elif query in self.__dropdowns: + elems = self.soup.select( + f"{self.__dropdowns[query]} ul li" ) - )[index] + for elem in elems: + if "highlight-selection" in elem.get("class"): + option.append(elem.getText()) + elif query in self.__splits: + elems = self.soup.select( + f"{self.__splits[query]} ul li" + ) + for elem in elems: + if "highlight-selection" in elem.get("class"): + option.append(elem.getText()) + elif query in self.__switches: + elem = self.soup.select(self.__switches[query]) + option = "True" if "isActive" in elem[0].get("class") else "False" + else: + raise FanGraphs.exceptions.InvalidFilterQueryException(query) + return option + + def configure(self, query: str, option: str, *, autoupdate=False): + """ + Configures a filter query ``query`` to a specified option ``option``. 
+ + :param query: The filter query to be configured + :param option: The option to set the filter query to + :param autoupdate: If ``True``, :py:meth:`update` will be called following configuration + :raises FanGraphs.exceptions.InvalidFilterQueryException: Argument ``query`` is invalid + """ + self.__close_ad() + query = query.lower() + if query in self.__selections: + self.__configure_selection(query, option) + elif query in self.__dropdowns: + self.__configure_dropdown(query, option) + elif query in self.__splits: + self.__configure_split(query, option) + elif query in self.__switches: + self.__configure_switch(query, option) + else: + raise FanGraphs.exceptions.InvalidFilterQueryException(query) + if autoupdate: + self.update() + self.__refresh_parser() + + def __configure_selection(self, query: str, option: str): + """ + Configures a selection-class filter query ``query`` to an option ``option`` + + :param query: The selection-class filter query to be configured + :param option: The option to set the filter query to + :raises FanGraphs.exceptions.InvalidFilterOptionException: Argument ``option`` is invalid + """ + options = self.list_options(query) try: - elem.click() - except exceptions.ElementNotInteractableException: - open_pitch_type_sublevel() - elem.click() + index = options.index(option) + except ValueError: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + self.page.click(self.__selections[query][index]) - def __submit_form(self, query): + def __configure_dropdown(self, query: str, option: str): """ - Clicks the button element which submits the search query. + Configures a dropdown-class filter query ``query`` to an option ``option``. 
- :param query: The filter query which has an attached form submission button + :param query: The dropdown-class filter query to be configured + :param option: The option to set the filter query to + :raises FanGraphs.exceptions.InvalidFilterOptionException: Argument ``option`` is invalid """ - elem = self.browser.find_element_by_id( - self.__buttons.get(query) - ) + options = self.list_options(query) + try: + index = options.index(option) + except ValueError: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + self.page.hover(self.__dropdowns[query]) + elem = self.page.query_selector_all(f"{self.__dropdowns[query]} ul li")[index] elem.click() - def __close_popup(self): - pass + def __configure_split(self, query: str, option: str): + """ + Configures a split-class filter query ``query`` to an option ``option``. + Split-class filter queries are separated from dropdown-class filter queries. + This is solely because of the CSS selectors used. + + :param query: The split-class filter query to be configured + :param option: The option to configure the filter query to + :raises FanGraphs.exceptions.InvalidFilterOptionException: Argument ``option`` is invalid + """ + options = self.list_options(query) + try: + index = options.index(option) + except ValueError: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + self.page.hover(self.__splits[query]) + elem = self.page.query_selector_all(f"{self.__splits[query]} ul li")[index] + elem.click() + + def __configure_switch(self, query: str, option: str): + """ + Configures a switch-class filter query ``query`` to an option ``option``. 
+ + :param query: The switch-class filter query to be configured + :param option: The option to configure the filter query to + :raises FanGraphs.exceptions.InvalidFilterOptionException: Argument ``option`` is invalid + """ + options = self.list_options(query) + if option not in options: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + if option != self.current_option(query): + self.page.click(self.__switches[query]) def __close_ad(self): """ Closes the ad which may interfere with clicking other page elements. """ - elem = self.browser.find_element_by_class_name( - "ezmob-footer-close" - ) + elem = self.page.query_selector(".ezmob-footer-close") + if elem and elem.is_visible(): + elem.click() + + def update(self): + """ + Clicks the **Update** button of the page. + All configured filters are submitted and the page is refreshed. + + :raises FanGraphs.exceptions.FilterUpdateIncapabilityWarning: No filter query configurations to update + """ + selector = "#button-update" + elem = self.page.query_selector(selector) + if elem is None: + raise FanGraphs.exceptions.FilterUpdateIncapabilityWarning() + self.__close_ad() elem.click() + self.__refresh_parser() - def quit(self): + def list_filter_groups(self): """ - Calls the ``quit()`` method of :py:attr:`browser`. + Lists the possible groups of filter queries which can be used + + :return: Names of the groups of filter queries + :rtype: list """ - self.browser.quit() + elems = self.soup.select(".fgBin.splits-bin-controller div") + groups = [e.getText() for e in elems] + return groups - def reset(self): + def configure_filter_group(self, group="Show All"): """ - Calls the ``get()`` method of :py:attr:`browser`, passing :py:attr:`address`. 
+ Configures the available filters to the specified group of filters + + :param group: The name of the group of filters """ - self.browser.get(self.address) - response = urlopen(self.browser.current_url) - parser = etree.HTMLParser() - self.tree = etree.parse(response, parser) + selector = ".fgBin.splits-bin-controller div" + elems = self.soup.select(selector) + options = [e.getText() for e in elems] + try: + index = options.index(group) + except ValueError: + raise Exception + self.__close_ad() + elem = self.page.query_selector_all(selector)[index] + elem.click() + + def reset_filters(self): + """ + Resets filters to the original option(s). + This does not affect the following filter queries: + + - ``group`` + - ``stat`` + - ``type`` + - ``groupby`` + - ``preset_range`` + - ``auto_pt`` + - ``split_teams`` + """ + selector = "#stack-buttons div[class='fgButton small']:nth-last-child(1)" + elem = self.page.query_selector(selector) + if elem is None: + return + self.__close_ad() + elem.click() + + def list_quick_splits(self): + """ + Lists all the quick splits which can be used. + Quick splits allow for the configuration of multiple filter queries at once. - def export(self, name=""): + :return: All available quick splits + :rtype: list """ - Exports the current leaderboard as a CSV file. - The file will be saved to *./out*. - By default, the name of the file is **FanGraphs Leaderboard.csv**. - If ``name`` is not specified, the file will be the formatted ``datetime.datetime.now()``. + return list(self.__quick_splits) - :param name: The filename to rename the saved file to + def configure_quick_split(self, quick_split: str, autoupdate=True): + """ + Invokes the configuration of a quick split. + All filter queries affected by :py:meth:`reset_filters` are reset prior to configuration. + This action is performed by the FanGraphs API and cannot be prevented. 
+ + :param quick_split: The quick split to invoke + :param autoupdate: If ``True``, :py:meth:`update` will be called + :raises FanGraphs.exceptions.InvalidQuickSplitException: Argument ``quick_split`` is invalid + """ + quick_split = quick_split.lower() + try: + selector = self.__quick_splits[quick_split] + except KeyError: + raise FanGraphs.exceptions.InvalidQuickSplitException(quick_split) + self.__close_ad() + self.page.click(selector) + if autoupdate: + self.update() + + def export(self, path="", *, size="Infinity", sortby="", reverse=False): + """ + Scrapes and saves the data from the table of the current leaderboards. + The data will be exported as a CSV file and the file will be saved to *out/*. + The file will be saved to the filepath ``path``, if specified. + Otherwise, the file will be saved to the filepath *out/%d.%m.%y %H.%M.%S.csv*. + + *Note: This is a 'manual' export of the data. + In other words, the data is scraped from the table. + This is unlike other forms of export where a button is clicked. 
+ Thus, there will be no record of a download when the data is exported.* + + :param path: The path to save the exported file to + :param size: The maximum number of rows of the table to export + :param sortby: The table header to sort the data by + :param reverse: If ``True``, the organization of the data will be reversed + :return: """ - if not name or os.path.splitext(name)[1] != ".csv": - name = "{}.csv".format( + self.page.hover(".data-export") + self.__close_ad() + self.__expand_table(size=size) + if sortby: + self.__sortby(sortby.title(), reverse=reverse) + if not path or os.path.splitext(path)[1] != ".csv": + path = "{}.csv".format( datetime.datetime.now().strftime("%d.%m.%y %H.%M.%S") ) - while True: - try: - WebDriverWait(self.browser, 20).until( - expected_conditions.element_to_be_clickable( - (By.ID, "LeaderBoard1_cmdCSV") - ) - ).click() - break - except exceptions.ElementClickInterceptedException: - self.__close_ad() - os.rename( - os.path.join("out", "FanGraphs Leaderboard.csv"), - os.path.join("out", name) - ) + with open(os.path.join("out", path), "w", newline="") as file: + writer = csv.writer(file) + self.__write_table_headers(writer) + self.__write_table_rows(writer) + def __expand_table(self, *, size="Infinity"): + """ + Expands the data table to the appropriate number of rows -class SplitsLeaderboards: + :param size: The maximum number of rows the table should have. + The number of rows is preset (30, 50, 100, 200, Infinity). + """ + selector = ".table-page-control:nth-child(3) select" + dropdown = self.page.query_selector(selector) + dropdown.click() + elems = self.soup.select(f"{selector} option") + options = [e.getText() for e in elems] + size = "Infinity" if size not in options else size + index = options.index(size) + option = self.page.query_selector_all(f"{selector} option")[index] + option.click() - def __init__(self): - pass + def __sortby(self, sortby, *, reverse=False): + """ + Sorts the data by the appropriate table header. 
+ + :param sortby: The table header to sort the data by + :param reverse: If ``True``, the organization of the data will be reversed + """ + selector = ".table-scroll thead tr th" + elems = self.soup.select(selector) + options = [e.getText() for e in elems] + index = options.index(sortby) + option = self.page.query_selector_all(selector)[index] + option.click() + if reverse: + option.click() + + def __write_table_headers(self, writer: csv.writer): + """ + Writes the data table headers to the CSV file. + + :param writer: The ``csv.writer`` object + """ + elems = self.soup.select(".table-scroll thead tr th") + headers = [e.getText() for e in elems] + writer.writerow(headers) + + def __write_table_rows(self, writer: csv.writer): + """ + Iterates through the rows of the data table and writes the data in each row to the CSV file. + + :param writer: The ``csv.writer`` object + """ + row_elems = self.soup.select(".table-scroll tbody tr") + for row in row_elems: + elems = row.select("td") + items = [e.getText() for e in elems] + writer.writerow(items) + + def reset(self): + """ + Navigates to the webpage corresponding to :py:attr:`address`. + """ + self.page.goto(self.address) + self.__refresh_parser() + + def quit(self): + """ + Terminates the underlying ``playwright`` browser context. + """ + self.__browser.close() + self.__play.stop() class SeasonStatGrid: """ - Scrapes the FanGraphs Season Stat Grid webpage + Scrapes the FanGraphs Season Stat Grid webpage. + + .. py:attribute:: address + The base URL address of the Season Stat Grid page + + :type: str + :value: https://fangraphs.com/leaders/season-stat-grid """ - def __init__(self): - """ - .. 
py:attribute:: address - The base URL address of the Season Stat Grid page - :type: str - :value: https://fangraphs.com/season-stat-grid + __selections = { + "stat": [ + "div[class*='fgButton button-green']:nth-child(1)", + "div[class*='fgButton button-green']:nth-child(2)" + ], + "type": [ + "div[class*='fgButton button-green']:nth-child(4)", + "div[class*='fgButton button-green']:nth-child(5)", + "div[class*='fgButton button-green']:nth-child(6)" + ] + } + __dropdowns = { + "start_season": ".row-season > div:nth-child(2)", + "end_season": ".row-season > div:nth-child(4)", + "popular": ".season-grid-controls-dropdown-row-stats > div:nth-child(1)", + "standard": ".season-grid-controls-dropdown-row-stats > div:nth-child(2)", + "advanced": ".season-grid-controls-dropdown-row-stats > div:nth-child(3)", + "statcast": ".season-grid-controls-dropdown-row-stats > div:nth-child(4)", + "batted_ball": ".season-grid-controls-dropdown-row-stats > div:nth-child(5)", + "win_probability": ".season-grid-controls-dropdown-row-stats > div:nth-child(6)", + "pitch_type": ".season-grid-controls-dropdown-row-stats > div:nth-child(7)", + "plate_discipline": ".season-grid-controls-dropdown-row-stats > div:nth-child(8)", + "value": ".season-grid-controls-dropdown-row-stats > div:nth-child(9)" + } + address = "https://fangraphs.com/leaders/season-stat-grid" + + def __init__(self, *, browser="chromium"): + """ + :param browser: The name of the browser to use (Chromium, Firefox, WebKit) - .. py:attribute:: browser - The ``selenium`` automated Firefox browser for navigating the webpage. + .. py:attribute:: page + The generated synchronous ``playwright`` page for browser automation. - :type: selenium.webdriver.firefox.webdriver.WebDriver + :type: playwright.sync_api._generated.Page .. py:attribute:: soup The ``BeautifulSoup4`` HTML parser for scraping the webpage. 
:type: bs4.BeautifulSoup """ - self.__selections = { - "stat": [ - "div[class*='fgButton button-green']:nth-child(1)", - "div[class*='fgButton button-green']:nth-child(2)" - ], - "type": [ - "div[class*='fgButton button-green']:nth-child(4)", - "div[class*='fgButton button-green']:nth-child(5)", - "div[class*='fgButton button-green']:nth-child(6)" - ] - } - self.__dropdowns = { - "start_season": ".row-season > div:nth-child(2)", - "end_season": ".row-season > div:nth-child(4)", - "popular": ".season-grid-controls-dropdown-row-stats > div:nth-child(1)", - "standard": ".season-grid-controls-dropdown-row-stats > div:nth-child(2)", - "advanced": ".season-grid-controls-dropdown-row-stats > div:nth-child(3)", - "statcast": ".season-grid-controls-dropdown-row-stats > div:nth-child(4)", - "batted_ball": ".season-grid-controls-dropdown-row-stats > div:nth-child(5)", - "win_probability": ".season-grid-controls-dropdown-row-stats > div:nth-child(6)", - "pitch_type": ".season-grid-controls-dropdown-row-stats > div:nth-child(7)", - "plate_discipline": ".season-grid-controls-dropdown-row-stats > div:nth-child(8)", - "value": ".season-grid-controls-dropdown-row-stats > div:nth-child(9)" - } - self.address = "https://fangraphs.com/leaders/season-stat-grid" + os.makedirs("out", exist_ok=True) - self.browser = webdriver.Firefox( - options=compile_options() - ) - self.browser.get(self.address) - # Wait for JavaScript to render - WebDriverWait( - self.browser, 5 - ).until(expected_conditions.presence_of_element_located( - (By.ID, "root-season-grid") - )) + self.__play = sync_playwright().start() + browsers = { + "chromium": self.__play.chromium, + "firefox": self.__play.firefox, + "webkit": self.__play.webkit + } + browser_ctx = browsers.get(browser.lower()) + if browser_ctx is None: + raise FanGraphs.exceptions.UnknownBrowserException(browser.lower()) + self.__browser = browser_ctx.launch() + self.page = self.__browser.new_page() + self.page.goto(self.address) + 
self.page.wait_for_selector(".fg-data-grid.undefined") self.soup = None self.__refresh_parsers() def __refresh_parsers(self): """ - Re-initializes :py:attr:`soup` if a page reload is expected + Re-initializes the ``bs4.BeautifulSoup`` object stored in :py:attr:`soup`. + Called when a page refresh is expected """ self.soup = bs4.BeautifulSoup( - self.browser.page_source, features="lxml" + self.page.content(), features="lxml" ) - def list_queries(self): + @classmethod + def list_queries(cls): """ Lists the possible filter queries which can be sued to modify search results. @@ -436,8 +920,8 @@ def list_queries(self): :type: list """ queries = [] - queries.extend(list(self.__selections)) - queries.extend(list(self.__dropdowns)) + queries.extend(list(cls.__selections)) + queries.extend(list(cls.__dropdowns)) return queries def list_options(self, query: str): @@ -445,7 +929,7 @@ def list_options(self, query: str): Lists the possible options which the filter query can be configured to. :param query: The filter query - :return: Options which ``query`` can be configured to + :return: Options which the filter query can be configured to :rtyp: list :raises FanGraphs.exceptions.InvalidFilterQuery: Argument ``query`` is invalid """ @@ -462,7 +946,7 @@ def list_options(self, query: str): ) options = [e.getText() for e in elems] else: - raise FanGraphs.exceptions.InvalidFilterQuery(query) + raise FanGraphs.exceptions.InvalidFilterQueryException(query) return options def current_option(self, query: str): @@ -487,7 +971,7 @@ def current_option(self, query: str): ) option = elems[0].getText() if elems else "None" else: - raise FanGraphs.exceptions.InvalidFilterQuery(query) + raise FanGraphs.exceptions.InvalidFilterQueryException(query) return option def configure(self, query: str, option: str): @@ -500,145 +984,122 @@ def configure(self, query: str, option: str): :raises FanGraphs.exceptions.InvalidFilterOption: Filter ``query`` cannot be configured to ``option`` """ query = 
query.lower() - while True: - try: - if query in self.__selections: - self.__configure_selection(query, option) - elif query in self.__dropdowns: - self.__configure_dropdown(query, option) - else: - raise FanGraphs.exceptions.InvalidFilterQuery(query) - break - except exceptions.ElementClickInterceptedException: - self.__close_ad() + self.__close_ad() + if query in self.__selections: + self.__configure_selection(query, option) + elif query in self.__dropdowns: + self.__configure_dropdown(query, option) + else: + raise FanGraphs.exceptions.InvalidFilterQueryException(query) self.__refresh_parsers() def __configure_selection(self, query: str, option: str): """ - Configures a selection-class filter query to the option. + Configures a selection-class filter query to a specified option. :param query: The filter query :param option: The option to configure ``query`` to :raises FanGraphs.exceptions.InvalidFilterOption: Filter ``query`` cannot be configured to ``option`` """ options = self.list_options(query) - if option not in options: - raise FanGraphs.exceptions.InvalidFilterOption(query, option) - index = options.index(option) - elem = self.browser.find_element_by_css_selector( - self.__selections[query][index] - ) - elem.click() + try: + index = options.index(option) + except ValueError: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + self.page.click(self.__selections[query][index]) def __configure_dropdown(self, query: str, option: str): """ - Configures a dropdown-class filter query to the option. + Configures a dropdown-class filter query to a specified option. 
:param query: The filter query :param option: The option to configure ``query`` to :raises FanGraphs.exceptions.InvalidFilterOption: Filter ``query`` cannot be configured to ``option`` """ options = self.list_options(query) - if option not in options: - raise FanGraphs.exceptions.InvalidFilterOption(query, option) - index = options.index(option) - dropdown = self.browser.find_element_by_css_selector( - self.__dropdowns[query] - ) - dropdown.click() - elem = self.browser.find_elements_by_css_selector( - f"{self.__dropdowns[query]} li" - )[index] try: - elem.click() - except exceptions.ElementNotInteractableException: - actions = ActionChains(self.browser) - actions.move_to_element(elem).perform() - WebDriverWait(self.browser, 5).until( - expected_conditions.element_to_be_clickable( - (By.CSS_SELECTOR, f"{self.__dropdowns[query]} li") - ) - ).click() + index = options.index(option) + except ValueError: + raise FanGraphs.exceptions.InvalidFilterOptionException(query, option) + self.page.hover(self.__dropdowns[query]) + elem = self.page.query_selector_all(f"{self.__dropdowns[query]} ul li")[index] + elem.click() def __close_ad(self): """ Closes the ad which may interfere with interactions with other page elements """ - elem = self.browser.find_element_by_class_name( - "ezmob-footer-close" - ) - elem.click() + elem = self.page.query_selector(".ezmob-footer-close") + if elem and elem.is_visible(): + elem.click() - def export(self, name="", *, size="Infinity", sortby="Name", reverse=False): + def export(self, path="", *, size="Infinity", sortby="Name", reverse=False): """ - Exports the current leaderboard as a CSV file. - The file will be saved to *./out*. 
- If ``name`` is not specified, the file will take the following format: - ``datetime.datetime.now().strftime("%d.%m.%y %H.%M.%S")`` - - :param name: The filename to rename the exported file to - :param size: The number of rows, preset to 30, 50, 100, 200, or Infinity + Scrapes and saves the data from the table of the current leaderboards. + The data will be exported as a CSV file and the file will be saved to *out/*. + The file will be saved to the filepath ``path``, if specified. + Otherwise, the file will be saved to the filepath *out/%d.%m.%y %H.%M.%S.csv*. + + *Note: This is a 'manual' export of the data. + In other words, the data is scraped from the table. + This is unlike other forms of export where a button is clicked. + Thus, there will be no record of a download when the data is exported.* + + :param path: The path to save the exported file to + :param size: The maximum number of rows of the table to export :param sortby: The table header to sort the data by - :param reverse: If ``True``, the ordering of the data is reversed - """ - while True: - try: - self.__expand_table(size=size) - break - except exceptions.ElementClickInterceptedException: - self.__close_ad() - continue + :param reverse: If ``True``, the organization of the data will be reversed + """ + self.__close_ad() + self.__expand_table(size=size) self.__sortby(sortby.title(), reverse=reverse) - if not name or os.path.splitext(name)[1] != ".csv": - name = "{}.csv".format( + if not path or os.path.splitext(path)[1] != ".csv": + path = "{}.csv".format( datetime.datetime.now().strftime("%d.%m.%y %H.%M.%S") ) - with open(os.path.join("out", name), "w", newline="") as file: + with open(os.path.join("out", path), "w", newline="") as file: writer = csv.writer(file) self.__write_table_headers(writer) self.__write_table_rows(writer) def __expand_table(self, *, size="Infinity"): """ - Sets the data table size to the specified number of rows. 
+ Expands the data table to the appropriate number of rows. - :param size: The number of rows, preset to 30, 50, 100, 200 or Infinity + :param size: The maximum number of rows the table should have. + The number of rows is preset (30, 50, 100, 200, Infinity). """ selector = ".table-page-control:nth-child(3) select" - dropdown = self.browser.find_element_by_css_selector(selector) + dropdown = self.page.query_selector(selector) dropdown.click() elems = self.soup.select(f"{selector} option") options = [e.getText() for e in elems] size = "Infinity" if size not in options else size index = options.index(size) - option = self.browser.find_elements_by_css_selector( - f"{selector} option" - )[index] + option = self.page.query_selector_all(f"{selector} option")[index] option.click() def __sortby(self, sortby, *, reverse=False): """ - Sorts the data in the table to the specified table header + Sorts the data by the appropriate table header. :param sortby: The table header to sort the data by - :param reverse: If ``True``, the ordering of the data will be reversed + :param reverse: If ``True``, the organization of the data will be reversed """ selector = ".table-scroll thead tr th" elems = self.soup.select(selector) options = [e.getText() for e in elems] index = options.index(sortby) - option = self.browser.find_elements_by_css_selector( - selector - )[index] + option = self.page.query_selector_all(selector)[index] option.click() if reverse: option.click() def __write_table_headers(self, writer: csv.writer): """ - Writes the data table headers + Writes the data table headers to the CSV file. 
- :param writer: The csv.writer object + :param writer: The ``csv.writer`` object """ selector = ".table-scroll thead tr th" elems = self.soup.select(selector) @@ -647,9 +1108,9 @@ def __write_table_headers(self, writer: csv.writer): def __write_table_rows(self, writer: csv.writer): """ - Writes each row of the data table + Iterates through the rows of the data table and writes the data in each row to the CSV file. - :param writer: The csv.writer object + :param writer: The ``csv.writer`` object """ selector = ".table-scroll tbody tr" row_elems = self.soup.select(selector) @@ -660,16 +1121,17 @@ def __write_table_rows(self, writer: csv.writer): def reset(self): """ - Calls the ``get()`` method of :py:attr:`browser`, passing :py:attr:`address`. + Navigates to the webpage corresponding to :py:attr:`address`. """ - self.browser.get(self.address) + self.page.goto(self.address) self.__refresh_parsers() def quit(self): """ - Calls the ``quit()`` method of :py:attr:`browser` + Terminates the underlying ``playwright`` browser context. """ - self.browser.quit() + self.__browser.close() + self.__play.stop() class GameSpanLeaderboards: @@ -678,7 +1140,7 @@ def __init__(self): pass -class KBOLeaderboards: +class InternationalLeaderboards: def __init__(self): pass diff --git a/FanGraphs/projections.py b/FanGraphs/projections.py deleted file mode 100644 index e69de29..0000000 diff --git a/FanGraphs/tests/__init__.py b/FanGraphs/tests/__init__.py new file mode 100644 index 0000000..9fcacb0 --- /dev/null +++ b/FanGraphs/tests/__init__.py @@ -0,0 +1,2 @@ +#! python3 +# tests/__init__.py diff --git a/FanGraphs/tests/test_leaders.py b/FanGraphs/tests/test_leaders.py new file mode 100644 index 0000000..66fb809 --- /dev/null +++ b/FanGraphs/tests/test_leaders.py @@ -0,0 +1,601 @@ +#! 
python3 +# tests/test_leaders.py + +import bs4 +from playwright.sync_api import sync_playwright +import pytest +import requests + + +class TestMajorLeagueLeaderboards: + """ + Tests the attributes and methods in :py:class:`FanGraphs.leaders.MajorLeagueLeaderboards`. + The docstring in each test identifies the attribute(s)/method(s) being tested. + """ + + __selections = { + "group": "#LeaderBoard1_tsGroup", + "stat": "#LeaderBoard1_tsStats", + "position": "#LeaderBoard1_tsPosition", + "type": "#LeaderBoard1_tsType" + } + __dropdowns = { + "league": "#LeaderBoard1_rcbLeague_Input", + "team": "#LeaderBoard1_rcbTeam_Input", + "single_season": "#LeaderBoard1_rcbSeason_Input", + "split": "#LeaderBoard1_rcbMonth_Input", + "min_pa": "#LeaderBoard1_rcbMin_Input", + "season1": "#LeaderBoard1_rcbSeason1_Input", + "season2": "#LeaderBoard1_rcbSeason2_Input", + "age1": "#LeaderBoard1_rcbAge1_Input", + "age2": "#LeaderBoard1_rcbAge2_Input" + } + __dropdown_options = { + "league": "#LeaderBoard1_rcbLeague_DropDown", + "team": "#LeaderBoard1_rcbTeam_DropDown", + "single_season": "#LeaderBoard1_rcbSeason_DropDown", + "split": "#LeaderBoard1_rcbMonth_DropDown", + "min_pa": "#LeaderBoard1_rcbMin_DropDown", + "season1": "#LeaderBoard1_rcbSeason1_DropDown", + "season2": "#LeaderBoard1_rcbSeason2_DropDown", + "age1": "#LeaderBoard1_rcbAge1_DropDown", + "age2": "#LeaderBoard1_rcbAge2_DropDown" + } + __checkboxes = { + "split_teams": "#LeaderBoard1_cbTeams", + "active_roster": "#LeaderBoard1_cbActive", + "hof": "#LeaderBoard1_cbHOF", + "split_seasons": "#LeaderBoard1_cbSeason", + "rookies": "#LeaderBoard1_cbRookie" + } + __buttons = { + "season1": "#LeaderBoard1_btnMSeason", + "season2": "#LeaderBoard1_btnMSeason", + "age1": "#LeaderBoard1_cmdAge", + "age2": "#LeaderBoard1_cmdAge" + } + + address = "https://fangraphs.com/leaders.aspx" + + @classmethod + def setup_class(cls): + with sync_playwright() as p: + browser = p.chromium.launch() + page = browser.new_page() + 
page.goto(cls.address, timeout=0) + cls.soup = bs4.BeautifulSoup( + page.content(), features="lxml" + ) + browser.close() + + def test_address(self): + """ + Class attribute ``MajorLeagueLeaderboards.address``. + """ + res = requests.get(self.address) + assert res.status_code == 200 + + @pytest.mark.parametrize( + "selectors", + [__selections, __dropdown_options] + ) + def test_list_options(self, selectors: dict): + elem_count = { + "group": 3, "stat": 3, "position": 13, "type": 19, + "league": 3, "team": 31, "single_season": 151, "split": 67, + "min_pa": 60, "season1": 151, "season2": 151, "age1": 45, "age2": 45, + "split_teams": 2, "active_roster": 2, "hof": 2, "split_seasons": 2, + "rookies": 2 + } + for query, sel in selectors.items(): + elems = self.soup.select(f"{sel} li") + assert len(elems) == elem_count[query], query + assert all([isinstance(e.getText(), str) for e in elems]), query + + def test_current_option_selections(self): + """ + Instance method ``MajorLeagueLeaderboards.current_option``. + + Uses the selectors in: + + - ``MajorLeagueLeaderboards.__selections`` + """ + elem_text = { + "group": "Player Stats", "stat": "Batting", "position": "All", + "type": "Dashboard" + } + for query, sel in self.__selections.items(): + elem = self.soup.select(f"{sel} .rtsLink.rtsSelected") + assert len(elem) == 1, query + assert isinstance(elem[0].getText(), str), query + assert elem[0].getText() == elem_text[query] + + def test_current_option_dropdowns(self): + """ + Instance method ``MajorLeagueLeaderboards.current_option``. 
+ + Uses the selectors in: + + - ``MajorLeagueLeaderboards.__dropdowns`` + """ + elem_value = { + "league": "All Leagues", "team": "All Teams", "single_season": "2020", + "split": "Full Season", "min_pa": "Qualified", "season1": "2020", + "season2": "2020", "age1": "14", "age2": "58" + } + for query, sel in self.__dropdowns.items(): + elem = self.soup.select(sel)[0] + assert elem.get("value") is not None, query + assert elem_value[query] == elem.get("value") + + @pytest.mark.parametrize( + "selectors", + [__selections, __dropdowns, __dropdown_options, + __checkboxes, __buttons] + ) + def test_configure(self, selectors: dict): + """ + Private instance method ``MajorLeagueLeaderboards.__configure_selection``. + Private instance method ``MajorLeagueLeaderboards.__configure_dropdown``. + Private instance method ``MajorLeagueLeaderboards.__configure_checkbox``. + Private instance method ``MajorLeagueLeaderboards.__click_button``. + + :param selectors: CSS selectors + """ + for query, sel in selectors.items(): + elems = self.soup.select(sel) + assert len(elems) == 1, query + + def test_expand_sublevel(self): + """ + Statement in private instance method ``MajorLeagueLeaderboards.__configure_selection``. + """ + elems = self.soup.select("#LeaderBoard1_tsType a[href='#']") + assert len(elems) == 1 + + def test_export(self): + """ + Instance method ``MajorLeagueLeaderboards.export``. + """ + elems = self.soup.select("#LeaderBoard1_cmdCSV") + assert len(elems) == 1 + + +class TestSplitsLeaderboards: + """ + Tests the attributes and methods in :py:class:`FanGraphs.leaders.SplitsLeaderboards`. + The docstring in each test identifies the attribute(s)/method(s) being tested. 
+ """ + + __selections = { + "group": [ + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(1)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(2)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(3)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(4)" + ], + "stat": [ + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(6)", + ".fgBin.row-button > div[class*='button-green fgButton']:nth-child(7)" + ], + "type": [ + "#root-buttons-stats > div:nth-child(1)", + "#root-buttons-stats > div:nth-child(2)", + "#root-buttons-stats > div:nth-child(3)" + ] + } + __dropdowns = { + "time_filter": "#root-menu-time-filter > .fg-dropdown.splits.multi-choice", + "preset_range": "#root-menu-time-filter > .fg-dropdown.splits.single-choice", + "groupby": ".fg-dropdown.group-by" + } + __splits = { + "handedness": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(1)", + "home_away": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(2)", + "batted_ball": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(3)", + "situation": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(4)", + "count": ".fgBin:nth-child(1) > .fg-dropdown.splits.multi-choice:nth-child(5)", + "batting_order": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(1)", + "position": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(2)", + "inning": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(3)", + "leverage": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(4)", + "shifts": ".fgBin:nth-child(2) > .fg-dropdown.splits.multi-choice:nth-child(5)", + "team": ".fgBin:nth-child(3) > .fg-dropdown.splits.multi-choice:nth-child(1)", + "opponent": ".fgBin:nth-child(3) > .fg-dropdown.splits.multi-choice:nth-child(2)", + } + __quick_splits = { + "batting_home": ".quick-splits > div:nth-child(1) > 
div:nth-child(2) > .fgButton:nth-child(1)", + "batting_away": ".quick-splits > div:nth-child(1) > div:nth-child(2) > .fgButton:nth-child(2)", + "vs_lhp": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(1)", + "vs_lhp_home": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(2)", + "vs_lhp_away": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(3)", + "vs_lhp_as_lhh": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(4)", + "vs_lhp_as_rhh": ".quick-splits > div:nth-child(1) > div:nth-child(3) > .fgButton:nth-child(5)", + "vs_rhp": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(1)", + "vs_rhp_home": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(2)", + "vs_rhp_away": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(3)", + "vs_rhp_as_lhh": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(4)", + "vs_rhp_as_rhh": ".quick-splits > div:nth-child(1) > div:nth-child(4) > .fgButton:nth-child(5)", + "pitching_as_sp": ".quick-splits > div:nth-child(2) > div:nth-child(1) .fgButton:nth-child(1)", + "pitching_as_rp": ".quick-splits > div:nth-child(2) > div:nth-child(1) .fgButton:nth-child(2)", + "pitching_home": ".quick-splits > div:nth-child(2) > div:nth-child(2) > .fgButton:nth-child(1)", + "pitching_away": ".quick-splits > div:nth-child(2) > div:nth-child(2) > .fgButton:nth-child(2)", + "vs_lhh": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(1)", + "vs_lhh_home": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(2)", + "vs_lhh_away": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(3)", + "vs_lhh_as_rhp": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(4)", + "vs_lhh_as_lhp": ".quick-splits > div:nth-child(2) > div:nth-child(3) > .fgButton:nth-child(5)", + "vs_rhh": 
".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(1)", + "vs_rhh_home": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(2)", + "vs_rhh_away": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(3)", + "vs_rhh_as_rhp": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(4)", + "vs_rhh_as_lhp": ".quick-splits > div:nth-child(2) > div:nth-child(4) > .fgButton:nth-child(5)" + } + __switches = { + "split_teams": "#stack-buttons > div:nth-child(2)", + "auto_pt": "#stack-buttons > div:nth-child(3)" + } + + address = "https://fangraphs.com/leaders/splits-leaderboards" + + @classmethod + def setup_class(cls): + """ + Initializes the ``bs4.BeautifulSoup`` object using ``playwright``. + """ + with sync_playwright() as p: + browser = p.chromium.launch() + page = browser.new_page() + page.goto(cls.address, timeout=0) + page.wait_for_selector(".fg-data-grid.undefined") + cls.soup = bs4.BeautifulSoup( + page.content(), features="lxml" + ) + browser.close() + + def test_address(self): + """ + Class attribute ``SplitsLeaderboards.address``. + """ + res = requests.get(self.address) + assert res.status_code == 200 + + def test_list_options_selections(self): + """ + Instance method ``SplitsLeaderboards.list_options``. + + Uses the selectors in: + + - ``SplitsLeaderboards.__selections`` + """ + elem_count = { + "group": 4, "stat": 2, "type": 3 + } + for query, sel_list in self.__selections.items(): + elems = [self.soup.select(s)[0] for s in sel_list] + assert len(elems) == elem_count[query] + assert all([e.getText() for e in elems]) + + @pytest.mark.parametrize( + "selectors", + [__dropdowns, __splits] + ) + def test_list_options(self, selectors: dict): + """ + Instance method ``SplitsLeaderboards.list_options``. 
+ + Uses the selectors in: + + - ``SplitsLeaderboards.__dropdowns`` + - ``SplitsLeaderboards.__splits`` + + :param selectors: CSS selectors + """ + elem_count = { + "time_filter": 10, "preset_range": 12, "groupby": 5, + "handedness": 4, "home_away": 2, "batted_ball": 15, + "situation": 7, "count": 11, "batting_order": 9, "position": 12, + "inning": 10, "leverage": 3, "shifts": 3, "team": 32, + "opponent": 32, + } + for query, sel in selectors.items(): + elems = self.soup.select(f"{sel} li") + assert len(elems) == elem_count[query] + + def test_current_option_selections(self): + """ + Instance method ``SplitsLeaderboards.current_option``. + + Uses the selectors in: + + - ``SplitsLeaderboards.__selections`` + """ + elem_text = { + "group": "Player", "stat": "Batting", "type": "Standard" + } + for query, sel_list in self.__selections.items(): + elems = [] + for sel in sel_list: + elem = self.soup.select(sel)[0] + assert elem.get("class") is not None + elems.append(elem) + active = ["isActive" in e.get("class") for e in elems] + assert active.count(True) == 1, query + text = [e.getText() for e in elems] + assert elem_text[query] in text + + @pytest.mark.parametrize( + "selectors", + [__dropdowns, __splits, __switches] + ) + def test_current_option(self, selectors: dict): + """ + Instance method ``SplitsLeaderboards.current_option``. + + Uses the selectors in: + + - ``SplitsLeaderboards.__dropdowns`` + - ``SplitsLeaderboards.__splits`` + - ``SplitsLeaderboards.__switches`` + + :param selectors: CSS selectors + """ + for query, sel in selectors.items(): + elems = self.soup.select(f"{sel} li") + for elem in elems: + assert elem.get("class") is not None + + def test_configure_selection(self): + """ + Private instance method ``SplitsLeaderboards.__configure_selection``. 
+ """ + for query, sel_list in self.__selections.items(): + for sel in sel_list: + elems = self.soup.select(sel) + assert len(elems) == 1, query + + @pytest.mark.parametrize( + "selectors", + [__dropdowns, __splits, __switches] + ) + def test_configure(self, selectors: dict): + """ + Private instance method ``SplitsLeaderboards.__configure_dropdown``. + Private instance method ``SplitsLeaderboards.__configure_split``. + Private instance method ``SplitsLeaderboards.__configure_switch``. + + :param selectors: CSS Selectors + """ + for query, sel in selectors.items(): + elems = self.soup.select(sel) + assert len(elems) == 1, query + + def test_update(self): + """ + Instance method ``SplitsLeaderboards.update``. + """ + elems = self.soup.select("#button-update") + assert len(elems) == 0 + + def test_list_filter_groups(self): + """ + Instance method ``SplitsLeaderboards.list_filter_groups``. + """ + elems = self.soup.select(".fgBin.splits-bin-controller div") + assert len(elems) == 4 + options = ["Quick Splits", "Splits", "Filters", "Show All"] + assert [e.getText() for e in elems] == options + + def test_configure_filter_group(self): + """ + Instance method ``SplitsLeaderboards.configure_filter_group``. + """ + groups = ["Quick Splits", "Splits", "Filters", "Show All"] + elems = self.soup.select(".fgBin.splits-bin-controller div") + assert len(elems) == 4 + assert [e.getText() for e in elems] == groups + + def test_reset_filters(self): + """ + Instance method ``SplitsLeaderboards.reset_filters``. + """ + elems = self.soup.select("#stack-buttons .fgButton.small:nth-last-child(1)") + assert len(elems) == 1 + + def test_configure_quick_split(self): + """ + Instance method ``SplitsLeaderboards.configure_quick_split``. + """ + for qsplit, sel in self.__quick_splits.items(): + elems = self.soup.select(sel) + assert len(elems) == 1, qsplit + + def test_expand_table(self): + """ + Private instance method ``SplitsLeaderboards.__expand_table``. 
+ """ + elems = self.soup.select(".table-page-control:nth-child(3) select") + assert len(elems) == 1 + options = ["30", "50", "100", "200", "Infinity"] + assert [e.getText() for e in elems[0].select("option")] == options + + def test_sortby(self): + """ + Private instance method ``SplitsLeaderboards.__sortby``. + """ + elems = self.soup.select(".table-scroll thead tr th") + assert len(elems) == 24 + + def test_write_table_headers(self): + """ + Private instance method ``SplitsLeaderboards.__write_table_headers``. + """ + elems = self.soup.select(".table-scroll thead tr th") + assert len(elems) == 24 + + def test_write_table_rows(self): + """ + Private instance method ``SplitsLeaderboards.__write_table_rows``. + """ + elems = self.soup.select(".table-scroll tbody tr") + assert len(elems) == 30 + for elem in elems: + assert len(elem.select("td")) == 24 + + +class TestSeasonStatGrid: + """ + Tests the attributes and methods in :py:class:`FanGraphs.leaders.SeasonStatGrid`. + The docstring in each test identifies the attribute(s)/method(s) being tested. 
+ """ + __selections = { + "stat": [ + "div[class*='fgButton button-green']:nth-child(1)", + "div[class*='fgButton button-green']:nth-child(2)" + ], + "type": [ + "div[class*='fgButton button-green']:nth-child(4)", + "div[class*='fgButton button-green']:nth-child(5)", + "div[class*='fgButton button-green']:nth-child(6)" + ] + } + __dropdowns = { + "start_season": ".row-season > div:nth-child(2)", + "end_season": ".row-season > div:nth-child(4)", + "popular": ".season-grid-controls-dropdown-row-stats > div:nth-child(1)", + "standard": ".season-grid-controls-dropdown-row-stats > div:nth-child(2)", + "advanced": ".season-grid-controls-dropdown-row-stats > div:nth-child(3)", + "statcast": ".season-grid-controls-dropdown-row-stats > div:nth-child(4)", + "batted_ball": ".season-grid-controls-dropdown-row-stats > div:nth-child(5)", + "win_probability": ".season-grid-controls-dropdown-row-stats > div:nth-child(6)", + "pitch_type": ".season-grid-controls-dropdown-row-stats > div:nth-child(7)", + "plate_discipline": ".season-grid-controls-dropdown-row-stats > div:nth-child(8)", + "value": ".season-grid-controls-dropdown-row-stats > div:nth-child(9)" + } + address = "https://fangraphs.com/leaders/season-stat-grid" + + @classmethod + def setup_class(cls): + with sync_playwright() as p: + browser = p.chromium.launch() + page = browser.new_page() + page.goto(cls.address, timeout=0) + page.wait_for_selector(".fg-data-grid.undefined") + cls.soup = bs4.BeautifulSoup( + page.content(), features="lxml" + ) + browser.close() + + def test_address(self): + """ + Class attribute ``SeasonStatGrid.address`` + """ + res = requests.get(self.address) + assert res.status_code == 200 + + def test_list_options_selections(self): + """ + Instance method ``SeasonStatGrid.list_options``. 
+ + Uses the following class attributes: + + - ``SeasonStatGrid.__selections`` + """ + elem_count = { + "stat": 2, "group": 3, "type": 3 + } + for query, sel_list in self.__selections.items(): + elems = [self.soup.select(s)[0] for s in sel_list] + assert len(elems) == elem_count[query] + assert all([e.getText() for e in elems]) + + def test_list_options_dropdowns(self): + """ + Instance method ``SeasonStatGrid.list_options``. + + Uses the following class attributes: + + - ``SeasonStatGrid.__dropdowns`` + """ + elem_count = { + "start_season": 71, "end_season": 71, "popular": 6, + "standard": 20, "advanced": 17, "statcast": 8, "batted_ball": 24, + "win_probability": 10, "pitch_type": 25, "plate_discipline": 25, + "value": 11 + } + for query, sel in self.__dropdowns.items(): + elems = self.soup.select(f"{sel} li") + assert len(elems) == elem_count[query], query + assert all([e.getText() for e in elems]) + + def test_current_option_selections(self): + """ + Instance method ``SeasonStatGrid.current_option``. + + Tests the following class attributes: + + - ``SeasonStatGrid.__selections`` + """ + selector = "div[class='fgButton button-green active isActive']" + elems = self.soup.select(selector) + assert len(elems) == 2 + + def test_current_options_dropdowns(self): + """ + Instance method ``SeasonStatGrid.current_option``. + + Uses the following class attributes: + + - ``SeasonStatGrid.__dropdowns`` + """ + for query, sel in self.__dropdowns.items(): + elems = self.soup.select( + f"{sel} li[class$='highlight-selection']" + ) + if query in ["start_season", "end_season", "popular", "value"]: + assert len(elems) == 1, query + assert elems[0].getText() is not None + else: + assert len(elems) == 0, query + + def test_configure_selection(self): + """ + Private instance method ``SeasonStatGrid.__configure_selection``. 
+ """ + for query, sel_list in self.__selections.items(): + for sel in sel_list: + elems = self.soup.select(sel) + assert len(elems) == 1, query + + def test_configure_dropdown(self): + """ + Private instance method ``SeasonStatGrid.__configure_dropdown``. + """ + for query, sel in self.__dropdowns.items(): + elems = self.soup.select(sel) + assert len(elems) == 1, query + + def test_expand_table(self): + """ + Private instance method ``SeasonStatGrid.__expand_table`` + """ + elems = self.soup.select(".table-page-control:nth-child(3) select") + assert len(elems) == 1 + options = ["30", "50", "100", "200", "Infinity"] + assert [e.getText() for e in elems[0].select("option")] == options + + def test_write_table_headers(self): + """ + Private instance method ``SeasonStatGrid.__write_table_headers``. + """ + elems = self.soup.select(".table-scroll thead tr th") + assert len(elems) == 12 + + def test_write_table_rows(self): + """ + Private instance method ``SeasonStatGrid.__write_table_rows``. + """ + elems = self.soup.select(".table-scroll tbody tr") + assert len(elems) == 30 + for elem in elems: + assert len(elem.select("td")) == 12 diff --git a/README.md b/README.md index 3dfaf6e..d0927fd 100644 --- a/README.md +++ b/README.md @@ -1,30 +1,99 @@ # FanGraphs-Export -![GitHub last commit](https://img.shields.io/github/last-commit/JLpython-py/FanGraphs-Export) -![GitHub last commit (branch)](https://img.shields.io/github/last-commit/JLpython-py/FanGraphs-Export/development) +
+ + +
-![GitHub milestone](https://img.shields.io/github/milestones/progress/JLpython-py/FanGraphs-Export/1) -![GitHub tag (latest SemVer)](https://img.shields.io/github/v/tag/JLpython-py/FanGraphs-Export) -![GitHub](https://img.shields.io/github/license/JLpython-py/FanGraphs-Export) + -![FanGraphs logo](https://i.pinimg.com/originals/4d/00/4d/4d004da06c49d287031664203af77f85.png) + -FanGraphs is a popular website among the baseball community. +FanGraphs is a popular website among the baseball analytics community. The website is best known for its vast coverage of statistics. -The `FanGraphs` package contains various modules for scraping and exporting data from various webpages. -There are not, by any means, intentions to cover the entire website. -The sheer amount of data makes it too difficult of a task. -However, there are plans to cover the most popular webpages. +The `FanGraphs` package contains various modules for scraping and exporting data from the most popular webpages. ## Installation ## Dependencies +- Python >= 3.6 +- BeautifulSoup4 4.9.3 +- lxml 4.6.3 +- Playwright 1.9.2 +- Pytest 6.2.2 +- Requests 2.25.1 + +To install all the necessary packages, run: + +```commandline +pip install -r requirements.txt +``` + +*Note: Per the [Playwright documentation](https://playwright.dev/python/docs/intro/), the browser binaries must be installed. +To install the browser binaries, run:* + +```commandline +playwright install +``` + ## Documentation ## Basic Usage +Each covered group of FanGraphs pages (e.g. Leaders, Projections) has its own module. +Each webpage within a group has its own class covering that page. 
+ +FanGraphs webpage groups: + +- [Leaders](#Leaders) + +### Leaders + +FanGraphs Leaders pages: + +- [MajorLeagueLeaderboards](https://fangraphs.com/leaders.aspx) +- [SplitsLeaderboards](https://fangraphs.com/leaders/splits-leaderboards) +- [SeasonStatGrid](https://fangraphs.com/leaders/season-stat-grid) + +```python +from FanGraphs import leaders +mll = leaders.MajorLeagueLeaderboards() +splits = leaders.SplitsLeaderboards() +ssg = leaders.SeasonStatGrid() +``` + ## Tests +To run all tests, run: + +```commandline +pytest FanGraphs +``` + +To run the tests for a specific module, run: + +```commandline +pytest FanGraphs/test_module_name +``` + +For example, for testing the `FanGraphs.leaders` module: + +```commandline +pytest FanGraphs/test_leaders +``` + ## License diff --git a/functional_tests.py b/functional_tests.py deleted file mode 100644 index 9f0b822..0000000 --- a/functional_tests.py +++ /dev/null @@ -1,261 +0,0 @@ -#! python3 -# functional_tests.py - -import csv -import os -import random -import unittest -from urllib.request import urlopen - -from FanGraphs import exceptions -from FanGraphs import leaders - - -class TestExceptions(unittest.TestCase): - - def test_major_league_leaderboards(self): - parser = leaders.MajorLeagueLeaderboards() - - with self.assertRaises( - exceptions.InvalidFilterQuery - ): - parser.list_options("nonexistent query") - - with self.assertRaises( - exceptions.InvalidFilterQuery - ): - parser.current_option("nonexistent query") - - with self.assertRaises( - exceptions.InvalidFilterQuery - ): - parser.configure("nonexistent query", "nonexistent option") - - parser.quit() - - def test_season_stat_grid(self): - parser = leaders.SeasonStatGrid() - - with self.assertRaises( - exceptions.InvalidFilterQuery - ): - parser.list_options("nonexistent query") - - with self.assertRaises( - exceptions.InvalidFilterQuery - ): - parser.current_option("nonexistent query") - - with self.assertRaises( - exceptions.InvalidFilterQuery - ): - 
parser.configure("nonexistent query", "nonexistent option") - - with self.assertRaises( - exceptions.InvalidFilterOption - ): - parser.configure("Stat", "nonexistent option") - - parser.quit() - - -class TestMajorLeagueLeaderboards(unittest.TestCase): - - parser = leaders.MajorLeagueLeaderboards() - - @classmethod - def setUpClass(cls): - cls.base_url = cls.parser.browser.current_url - - @classmethod - def tearDownClass(cls): - cls.parser.quit() - for file in os.listdir("out"): - os.remove(os.path.join("out", file)) - os.rmdir("out") - - def test_init(self): - res = urlopen(self.parser.address) - self.assertEqual(res.getcode(), 200) - - self.assertTrue(self.parser.tree) - - self.assertTrue( - os.path.exists(os.path.join(os.getcwd(), "out")) - ) - - self.assertTrue(self.parser.browser) - - def test_list_queries(self): - queries = self.parser.list_queries() - self.assertIsInstance(queries, list) - self.assertTrue( - all([isinstance(q, str) for q in queries]) - ) - - def test_list_options(self): - query_count = { - "group": 3, "stat": 3, "position": 13, - "league": 3, "team": 31, "single_season": 151, "split": 67, - "min_pa": 60, "season1": 151, "season2": 151, "age1": 45, "age2": 45, - "split_teams": 2, "active_roster": 2, "hof": 2, "split_seasons": 2, - "rookies": 2 - } - for query in query_count: - options = self.parser.list_options(query) - self.assertIsInstance(options, list) - self.assertTrue( - all([isinstance(o, str) for o in options]) - or all([isinstance(o, bool) for o in options]) - ) - self.assertEqual( - len(options), - query_count[query], - (query, len(options)) - ) - - def test_current_option(self): - query_options = { - "group": "Player Stats", "stat": "Batting", "position": "All", - "league": "All Leagues", "team": "All Teams", "single_season": "2020", - "split": "Full Season", "min_pa": "Qualified", "season1": "2020", - "season2": "2020", "age1": "14", "age2": "58", "split_teams": "False", - "active_roster": "False", "hof": "False", "split_seasons": 
"False", - "rookies": "False" - } - for query in query_options: - option = self.parser.current_option(query) - self.assertEqual( - option, - query_options[query], - (query, option) - ) - - def test_configure(self): - queries = [ - "group", "stat", "position", "league", "team", "single_season", - "split", "min_pa", "season1", "season2", "age1", "age2", - "split_teams", "active_roster", "hof", "split_seasons", "rookies" - ] - for query in queries: - option = random.choice(self.parser.list_options(query)) - self.parser.configure(query, option) - if query not in ["season1", "season2", "age1", "age2"]: - current = self.parser.current_option(query) - self.assertEqual( - option, - current, - (query, option, current) - ) - self.parser.reset() - - def test_reset(self): - self.parser.browser.get("https://google.com") - self.parser.reset() - self.assertEqual( - self.parser.browser.current_url, - self.base_url - ) - - def test_export(self): - self.parser.export("test.csv") - self.assertTrue( - os.path.exists(os.path.join("out", "test.csv")) - ) - - -class TestSeasonStatGrid(unittest.TestCase): - - parser = leaders.SeasonStatGrid() - - @classmethod - def setUpClass(cls): - cls.base_url = cls.parser.browser.current_url - - @classmethod - def tearDownClass(cls): - for file in os.listdir("out"): - os.remove(os.path.join("out", file)) - os.rmdir("out") - cls.parser.quit() - - def test_init(self): - self.assertEqual( - urlopen(self.parser.address).getcode(), 200 - ) - self.assertTrue(os.path.exists("out")) - self.assertTrue(self.parser.browser) - self.assertTrue(self.parser.soup) - - def test_list_queries(self): - self.assertEqual( - len(self.parser.list_queries()), 13 - ) - - def test_list_options(self): - option_count = { - "stat": 2, "type": 3, "start_season": 71, "end_season": 71, - "popular": 6, "standard": 20, "advanced": 17, "statcast": 8, - "batted_ball": 24, "win_probability": 10, "pitch_type": 25, - "plate_discipline": 25, "value": 11 - } - for query in option_count: - 
self.assertEqual( - len(self.parser.list_options(query)), option_count[query], - query - ) - - def test_current_option(self): - current_options = { - "stat": "Batting", "type": "Normal", "start_season": "2011", - "end_season": "2020", "popular": "WAR", "standard": "None", - "advanced": "None", "statcast": "None", "batted_ball": "None", - "win_probability": "None", "pitch_type": "None", - "plate_discipline": "None", "value": "WAR" - } - for query in current_options: - self.assertEqual( - self.parser.current_option(query), current_options[query], - query - ) - - def test_configure(self): - self.parser.reset() - queries = self.parser.list_queries() - for query in queries: - option = self.parser.list_options(query)[-1] - self.parser.configure(query, option) - if query not in ["end_season"]: - self.assertEqual( - self.parser.current_option(query), option, - query - ) - self.parser.reset() - - def test_export(self): - self.parser.reset() - self.parser.export("test.csv", size="30") - self.assertTrue( - os.path.exists(os.path.join("out", "test.csv")) - ) - with open(os.path.join("out", "test.csv")) as file: - reader = csv.reader(file) - data = list(reader) - self.assertEqual( - len(data), 31 - ) - self.assertTrue( - all([len(r) == 12 for r in data]) - ) - - def test_reset(self): - self.parser.browser.get("https://google.com") - self.parser.reset() - self.assertEqual( - self.parser.browser.current_url, - self.base_url - ) - - -if __name__ == "__main__": - unittest.main() diff --git a/requirements.txt b/requirements.txt index 6f8a9ee..2d78683 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,10 +1,23 @@ +atomicwrites==1.4.0 +attrs==20.3.0 beautifulsoup4==4.9.3 certifi==2020.12.5 chardet==4.0.0 -cssselect==1.1.0 +colorama==0.4.4 +greenlet==1.0.0 idna==2.10 -lxml==4.6.2 +iniconfig==1.1.1 +lxml==4.6.3 +packaging==20.9 +playwright==1.9.2 +pluggy==0.13.1 +py==1.10.0 +pyee==8.1.0 +pyparsing==2.4.7 +pytest==6.2.2 requests==2.25.1 selenium==3.141.0 -soupsieve==2.2 
+soupsieve==2.2.1 +toml==0.10.2 +typing-extensions==3.7.4.3 urllib3==1.26.3 diff --git a/tests/test_leaders.py b/tests/test_leaders.py deleted file mode 100644 index 2ed76d8..0000000 --- a/tests/test_leaders.py +++ /dev/null @@ -1,479 +0,0 @@ -#! python3 -# tests/leaders.py - -import unittest -from urllib.request import urlopen - -import bs4 -from lxml import etree -from selenium import webdriver -from selenium.webdriver.common.by import By -from selenium.webdriver.firefox.options import Options -from selenium.webdriver.support import expected_conditions -from selenium.webdriver.support.ui import WebDriverWait - - -@unittest.SkipTest -class TestMajorLeagueLeaderboards(unittest.TestCase): - - @classmethod - def setUpClass(cls): - cls.address = "https://fangraphs.com/leaders.aspx" - cls.response = urlopen(cls.address) - cls.parser = etree.HTMLParser() - cls.tree = etree.parse(cls.response, cls.parser) - - def test_selections_ids(self): - ids = [ - "LeaderBoard1_tsGroup", - "LeaderBoard1_tsStats", - "LeaderBoard1_tsPosition", - "LeaderBoard1_tsType" - ] - for i in ids: - elems = self.tree.xpath( - f"//div[@id='{i}']" - ) - self.assertEqual( - len(elems), 1, len(elems) - ) - - def test_dropdowns_ids(self): - ids = [ - "LeaderBoard1_rcbLeague_Input", - "LeaderBoard1_rcbTeam_Input", - "LeaderBoard1_rcbSeason_Input", - "LeaderBoard1_rcbMonth_Input", - "LeaderBoard1_rcbMin_Input", - "LeaderBoard1_rcbSeason1_Input", - "LeaderBoard1_rcbSeason2_Input", - "LeaderBoard1_rcbAge1_Input", - "LeaderBoard1_rcbAge2_Input" - ] - for i in ids: - elems = self.tree.xpath( - f"//input[@id='{i}']" - ) - self.assertEqual( - len(elems), 1, len(elems) - ) - - def test_dropdown_options_ids(self): - ids = [ - "LeaderBoard1_rcbLeague_DropDown", - "LeaderBoard1_rcbTeam_DropDown", - "LeaderBoard1_rcbSeason_DropDown", - "LeaderBoard1_rcbMonth_DropDown", - "LeaderBoard1_rcbMin_DropDown", - "LeaderBoard1_rcbSeason1_DropDown", - "LeaderBoard1_rcbSeason2_DropDown", - "LeaderBoard1_rcbAge1_DropDown", - 
"LeaderBoard1_rcbAge2_DropDown" - ] - for i in ids: - elems = self.tree.xpath( - f"//div[@id='{i}']" - ) - self.assertEqual( - len(elems), 1, len(elems) - ) - - def test_checkboxes_ids(self): - ids = [ - "LeaderBoard1_cbTeams", - "LeaderBoard1_cbActive", - "LeaderBoard1_cbHOF", - "LeaderBoard1_cbSeason", - "LeaderBoard1_cbRookie" - ] - for i in ids: - elems = self.tree.xpath( - f"//input[@id='{i}']" - ) - self.assertEqual( - len(elems), 1, len(elems) - ) - - def test_buttons_ids(self): - ids = [ - "LeaderBoard1_btnMSeason", - "LeaderBoard1_cmdAge" - ] - for i in ids: - elems = self.tree.xpath( - f"//input[@id='{i}']" - ) - self.assertEqual( - len(elems), 1, len(elems) - ) - - def test_base_url(self): - self.assertEqual( - urlopen("https://fangraphs.com/leaders.aspx").getcode(), - 200 - ) - - def test_list_options_dropdown_options_ids(self): - ids = [ - "LeaderBoard1_rcbLeague_DropDown", - "LeaderBoard1_rcbTeam_DropDown", - "LeaderBoard1_rcbSeason_DropDown", - "LeaderBoard1_rcbMonth_DropDown", - "LeaderBoard1_rcbMin_DropDown", - "LeaderBoard1_rcbSeason1_DropDown", - "LeaderBoard1_rcbSeason2_DropDown", - "LeaderBoard1_rcbAge1_DropDown", - "LeaderBoard1_rcbAge2_DropDown" - ] - for i in ids: - elems = self.tree.xpath( - f"//div[@id='{i}']//div//ul//li" - ) - self.assertTrue(elems) - elem_text = [e.text for e in elems] - self.assertTrue( - all([isinstance(t, str) for t in elem_text]) - ) - - def test_list_options_selections_ids(self): - ids = [ - "LeaderBoard1_tsGroup", - "LeaderBoard1_tsStats", - "LeaderBoard1_tsPosition", - "LeaderBoard1_tsType" - ] - for i in ids: - elems = self.tree.xpath( - f"//div[@id='{i}']//div//ul//li//a//span//span//span" - ) - self.assertTrue(elems) - elem_text = [e.text for e in elems] - self.assertTrue( - all([isinstance(t, str) for t in elem_text]) - ) - - def test_current_option_checkbox_ids(self): - ids = [ - "LeaderBoard1_cbTeams", - "LeaderBoard1_cbActive", - "LeaderBoard1_cbHOF", - "LeaderBoard1_cbSeason", - "LeaderBoard1_cbRookie" - 
] - for i in ids: - elems = self.tree.xpath( - f"//input[@id='{i}']" - ) - self.assertEqual( - len(elems), 1, len(elems) - ) - - def test_current_option_dropdowns_ids(self): - ids = [ - "LeaderBoard1_rcbLeague_Input", - "LeaderBoard1_rcbTeam_Input", - "LeaderBoard1_rcbSeason_Input", - "LeaderBoard1_rcbMonth_Input", - "LeaderBoard1_rcbMin_Input", - "LeaderBoard1_rcbSeason1_Input", - "LeaderBoard1_rcbSeason2_Input", - "LeaderBoard1_rcbAge1_Input", - "LeaderBoard1_rcbAge2_Input" - ] - for i in ids: - elems = self.tree.xpath( - f"//input[@id='{i}']" - ) - self.assertEqual( - len(elems), 1 - ) - self.assertIsNotNone( - elems[0].get("value") - ) - - def test_current_option_selections_ids(self): - ids = [ - "LeaderBoard1_tsGroup", - "LeaderBoard1_tsStats", - "LeaderBoard1_tsPosition", - "LeaderBoard1_tsType" - ] - for i in ids: - elems = self.tree.xpath( - f"//div[@id='{i}']//div//ul//li//a[@class='rtsLink rtsSelected']//span//span//span" - ) - self.assertEqual( - len(elems), 1 - ) - - def test_config_dropdown_ids(self): - ids = [ - "LeaderBoard1_rcbLeague_Input", - "LeaderBoard1_rcbTeam_Input", - "LeaderBoard1_rcbSeason_Input", - "LeaderBoard1_rcbMonth_Input", - "LeaderBoard1_rcbMin_Input", - "LeaderBoard1_rcbSeason1_Input", - "LeaderBoard1_rcbSeason2_Input", - "LeaderBoard1_rcbAge1_Input", - "LeaderBoard1_rcbAge2_Input" - ] - for i in ids: - elems = self.tree.xpath("//@id") - self.assertIn(i, elems) - self.assertEqual( - elems.count(i), 1, elems.count(i) - ) - - def test_config_dropdown_options_ids(self): - ids = [ - "LeaderBoard1_rcbLeague_DropDown", - "LeaderBoard1_rcbTeam_DropDown", - "LeaderBoard1_rcbSeason_DropDown", - "LeaderBoard1_rcbMonth_DropDown", - "LeaderBoard1_rcbMin_DropDown", - "LeaderBoard1_rcbSeason1_DropDown", - "LeaderBoard1_rcbSeason2_DropDown", - "LeaderBoard1_rcbAge1_DropDown", - "LeaderBoard1_rcbAge2_DropDown" - ] - for i in ids: - elems = self.tree.xpath( - f"//div[@id='{i}']//div//ul//li" - ) - self.assertTrue(elems) - - def 
test_config_selection_ids(self): - ids = [ - "LeaderBoard1_tsGroup", - "LeaderBoard1_tsStats", - "LeaderBoard1_tsPosition", - "LeaderBoard1_tsType" - ] - for i in ids: - elems = self.tree.xpath( - f"//div[@id='{i}']//div//ul//li" - ) - self.assertTrue(elems) - - def test_submit_form_id(self): - ids = [ - "LeaderBoard1_btnMSeason", - "LeaderBoard1_cmdAge" - ] - for i in ids: - elems = self.tree.xpath("//@id") - self.assertIn(i, elems) - self.assertEqual( - elems.count(i), 1, elems.count(i) - ) - - def test_export_id(self): - self.assertIn( - "LeaderBoard1_cmdCSV", - self.tree.xpath("//@id") - ) - - -class TestSeasonStatGrid(unittest.TestCase): - - options = Options() - options.headless = True - browser = webdriver.Firefox(options=options) - - @classmethod - def setUpClass(cls): - cls.address = "https://www.fangraphs.com/leaders/season-stat-grid" - cls.browser.get(cls.address) - WebDriverWait( - cls.browser, 5 - ).until(expected_conditions.presence_of_element_located( - (By.ID, "root-season-grid") - )) - cls.soup = bs4.BeautifulSoup( - cls.browser.page_source, features="lxml" - ) - - @classmethod - def tearDownClass(cls): - cls.browser.quit() - - def test_base_address(self): - self.assertEqual( - urlopen(self.address).getcode(), 200 - ) - - def test_selections_selectors(self): - selectors = { - "stat": [ - "div[class*='fgButton button-green']:nth-child(1)", - "div[class*='fgButton button-green']:nth-child(2)" - ], - "type": [ - "div[class*='fgButton button-green']:nth-child(4)", - "div[class*='fgButton button-green']:nth-child(5)", - "div[class*='fgButton button-green']:nth-child(6)" - ] - } - for cat in selectors: - for sel in selectors[cat]: - elems = self.soup.select(sel) - self.assertEqual( - len(elems), 1, (cat, sel) - ) - - def test_dropdown_selectors(self): - selectors = { - "start_season": ".row-season > div:nth-child(2)", - "end_season": ".row-season > div:nth-child(4)", - "popular": ".season-grid-controls-dropdown-row-stats > div:nth-child(1)", - 
"standard": ".season-grid-controls-dropdown-row-stats > div:nth-child(2)", - "advanced": ".season-grid-controls-dropdown-row-stats > div:nth-child(3)", - "statcast": ".season-grid-controls-dropdown-row-stats > div:nth-child(4)", - "batted_ball": ".season-grid-controls-dropdown-row-stats > div:nth-child(5)", - "win_probability": ".season-grid-controls-dropdown-row-stats > div:nth-child(6)", - "pitch_type": ".season-grid-controls-dropdown-row-stats > div:nth-child(7)", - "plate_discipline": ".season-grid-controls-dropdown-row-stats > div:nth-child(8)", - "value": ".season-grid-controls-dropdown-row-stats > div:nth-child(9)" - } - for cat in selectors: - elems = self.soup.select(selectors[cat]) - self.assertEqual( - len(elems), 1, cat - ) - - def test_list_options_selections(self): - selectors = { - "stat": [ - "div[class*='fgButton button-green']:nth-child(1)", - "div[class*='fgButton button-green']:nth-child(2)" - ], - "type": [ - "div[class*='fgButton button-green']:nth-child(4)", - "div[class*='fgButton button-green']:nth-child(5)", - "div[class*='fgButton button-green']:nth-child(6)" - ] - } - for cat in selectors: - elems = [ - self.soup.select(sel)[0] - for sel in selectors[cat] - ] - options = [e.getText() for e in elems] - self.assertEqual( - len(options), len(selectors[cat]) - ) - - def test_list_options_dropdowns(self): - selectors = { - "start_season": ".row-season > div:nth-child(2)", - "end_season": ".row-season > div:nth-child(4)", - "popular": ".season-grid-controls-dropdown-row-stats > div:nth-child(1)", - "standard": ".season-grid-controls-dropdown-row-stats > div:nth-child(2)", - "advanced": ".season-grid-controls-dropdown-row-stats > div:nth-child(3)", - "statcast": ".season-grid-controls-dropdown-row-stats > div:nth-child(4)", - "batted_ball": ".season-grid-controls-dropdown-row-stats > div:nth-child(5)", - "win_probability": ".season-grid-controls-dropdown-row-stats > div:nth-child(6)", - "pitch_type": ".season-grid-controls-dropdown-row-stats > 
div:nth-child(7)", - "plate_discipline": ".season-grid-controls-dropdown-row-stats > div:nth-child(8)", - "value": ".season-grid-controls-dropdown-row-stats > div:nth-child(9)" - } - elem_count = { - "start_season": 71, "end_season": 71, "popular": 6, - "standard": 20, "advanced": 17, "statcast": 8, "batted_ball": 24, - "win_probability": 10, "pitch_type": 25, "plate_discipline": 25, - "value": 11 - } - for cat in selectors: - elems = self.soup.select( - f"{selectors[cat]} li" - ) - self.assertEqual( - len(elems), elem_count[cat] - ) - self.assertTrue( - all([e.getText() for e in elems]) - ) - - def test_current_option_selections(self): - selector = "div[class='fgButton button-green active isActive']" - elems = self.soup.select(selector) - self.assertEqual( - len(elems), 2 - ) - - def test_current_options_dropdowns(self): - selectors = { - "start_season": ".row-season > div:nth-child(2)", - "end_season": ".row-season > div:nth-child(4)", - "popular": ".season-grid-controls-dropdown-row-stats > div:nth-child(1)", - "standard": ".season-grid-controls-dropdown-row-stats > div:nth-child(2)", - "advanced": ".season-grid-controls-dropdown-row-stats > div:nth-child(3)", - "statcast": ".season-grid-controls-dropdown-row-stats > div:nth-child(4)", - "batted_ball": ".season-grid-controls-dropdown-row-stats > div:nth-child(5)", - "win_probability": ".season-grid-controls-dropdown-row-stats > div:nth-child(6)", - "pitch_type": ".season-grid-controls-dropdown-row-stats > div:nth-child(7)", - "plate_discipline": ".season-grid-controls-dropdown-row-stats > div:nth-child(8)", - "value": ".season-grid-controls-dropdown-row-stats > div:nth-child(9)" - } - for cat in selectors: - elems = self.soup.select( - f"{selectors[cat]} li[class$='highlight-selection']" - ) - if cat in ["start_season", "end_season", "popular", "value"]: - self.assertEqual( - len(elems), 1, cat - ) - self.assertIsNotNone( - elems[0].getText() - ) - else: - self.assertEqual( - len(elems), 0, cat - ) - - def 
test_expand_table_dropdown_selector(self): - selector = ".table-page-control:nth-child(3) select" - elems = self.soup.select(selector) - self.assertEqual( - len(elems), 1 - ) - - def test_expand_table_dropdown_options_selectors(self): - options = ["30", "50", "100", "200", "Infinity"] - selector = ".table-page-control:nth-child(3) select option" - elems = self.soup.select(selector) - self.assertEqual( - len(elems), 5 - ) - self.assertEqual( - [e.getText() for e in elems], options - ) - - def test_sortby_option_selectors(self): - selector = ".table-scroll thead tr th" - elems = self.soup.select(selector) - self.assertEqual( - len(elems), 12 - ) - - def test_write_table_headers_selector(self): - selector = ".table-scroll thead tr th" - elems = self.soup.select(selector) - self.assertEqual( - len(elems), 12 - ) - - def test_write_table_rows_selector(self): - selector = ".table-scroll tbody tr" - elems = self.soup.select(selector) - self.assertEqual( - len(elems), 30 - ) - for elem in elems: - item_elems = elem.select("td") - self.assertEqual(len(item_elems), 12) - - -if __name__ == "__main__": - unittest.main()