Adding support for header mappings in form submit. #327

Closed
wants to merge 6 commits into from
45 changes: 30 additions & 15 deletions README.md
Expand Up @@ -86,7 +86,7 @@ Based on latest (pre) release.
| authentication | Configure HTTP authentication. `basic` or `digest`. Use this with username and password fields. | False | | string |
| username | The username for accessing the url. | False | | string |
| password | The password for accessing the url. | False | | string |
| headers | The headers for the requests. | False | | template - list |
| headers | The headers for the requests. | False | | template - list |
| params | The query params for the requests. | False | | template - list |
| method | The method for the request. Either `POST` or `GET`. | False | GET | string |
| payload | Optional payload to send with a POST request. | False | | string |
Expand All @@ -109,18 +109,14 @@ Configure the sensors that will scrape the data.
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- |
| unique_id | Will be used as entity_id and enables editing the entity in the UI | False | | string |
| name | Friendly name for the sensor | False | | string |
| select | CSS selector used for retrieving the value of the sensor. Only required when `select_list` is not provided. | True | | string/template |
| select_list | CSS selector for multiple values of multiple elements which will be returned as csv. Only required when `select` is not provided. | True | | string/template |
| attribute | Attribute from the selected element to read as value | False | | string |
| value_template | Defines a template applied on the result of the selector to extract the value. For binary sensors, the sensor is on if the template evaluates as True | False | | string/template |
| | Shared fields from the [Selector](#selector). | True | | |
| attributes | See [Sensor attributes](#sensor-attributes) | False | | list |
| unit_of_measurement | Defines the units of measurement of the sensor | False | | string |
| device_class | Sets the device_class for [sensors](https://www.home-assistant.io/integrations/sensor/) or [binary sensors](https://www.home-assistant.io/integrations/binary_sensor/) | False | | string |
| state_class | Defines the state class of the sensor, if any. (measurement, total or total_increasing) (not for binary_sensor) | False | None | string |
| icon | Defines the icon or a template for the icon of the sensor. The value of the selector (or value_template when given) is provided as input for the template. For binary sensors, the value is parsed in a boolean. | False | | string/template |
| picture | Contains a path to a local image and will set it as entity picture | False | | string |
| force_update | Sends update events even if the value hasn’t changed. Useful if you want to have meaningful value graphs in history. | False | False | boolean |
| on_error | See [On-error](#on-error) | False | | |

### Refresh button

Expand All @@ -135,18 +131,14 @@ Configure a refresh button to manually trigger scraping.

Configure the attributes on the sensor that can be set with additional scraping values.

| name | description | required | default | type |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- |
| name | Name of the attribute (will be slugified) | True | | string |
| select | CSS selector used for retrieving the value of the attribute. Only required when `select_list` is not provided. | True | | string/template |
| select_list | CSS selector for multiple values of multiple elements which will be returned as csv. Only required when `select` is not provided. | True | | string/template |
| attribute | Attribute from the selected element to read as value | False | | string |
| value_template | Defines a template applied on the result of the selector to extract the value | False | | string/template |
| on_error | See [On-error](#on-error) | False | | |
| name | description | required | default | type |
| -------------- | --------------------------------------------- | -------- | ------- | --------------- |
| name | Name of the attribute (will be slugified) | True | | string |
| | Shared fields from the [Selector](#selector). | True | | |

### Form-submit

Configure the form-submit functionality which enables you to submit a (login) form before scraping a site. More details on how this works [can be found on the wiki.](https://github.com/danieldotnl/ha-multiscrape/wiki/Form-submit-functionality)
Configure the form-submit functionality which enables you to submit a (login) form before scraping a site. More details on how this works [can be found on the wiki](https://github.com/danieldotnl/ha-multiscrape/wiki/Form-submit-functionality).

| name | description | required | default | type |
| ----------------- | --------------------------------------------------------------------------------------------------------- | -------- | ------- | ------------------- |
Expand All @@ -156,6 +148,29 @@ Configure the form-submit functionality which enables you to submit a (login) fo
| input_filter | A list of input fields that should not be submitted with the form | False | | string - list |
| submit_once | Submit the form only once on startup instead of each scan interval | False | False | boolean |
| resubmit_on_error | Resubmit the form after a scraping error is encountered | False | True | boolean |
| header_mappings | See [Header Mappings](#header-mappings) | False | | list |

### Header Mappings

Configure the headers to forward from the [Form-submit](#form-submit) response to the requests that scrape the main page for sensor data. Each mapping scrapes a value from the form-submit response and sends it under the given header name. A common use case is populating an `X-Login-Token` header with a token returned by the login.

| name | description | required | default | type |
| -------------- | --------------------------------------------- | -------- | ------- | --------------- |
| name | Name of the header | True | | string |
| | Shared fields from the [Selector](#selector). | True | | |
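
To make the shape of this option concrete, here is a minimal sketch of a `form_submit` block with one header mapping, written as the Python dict the YAML would parse into. The URL, credentials, form selector and the `meta[name='login-token']` selector are hypothetical; only the option names come from the tables in this README. The `check_header_mappings` helper is illustrative and not part of the integration, and its rule (a name plus at least one of `select`, `select_list` or `value_template`) is an assumption read from the Selector table, not the actual schema.

```python
# Hypothetical configuration: every URL, credential and CSS selector here is made up.
form_submit = {
    "resource": "https://example.com/login",
    "select": "form#login",
    "input": {"username": "me", "password": "secret"},
    "header_mappings": [
        {
            "name": "X-Login-Token",
            "select": "meta[name='login-token']",
            "attribute": "content",
        },
    ],
}

SELECTOR_KEYS = ("select", "select_list", "value_template")


def check_header_mappings(mappings):
    """Return a list of problems found in a header_mappings list (illustrative only)."""
    problems = []
    for index, mapping in enumerate(mappings):
        if not mapping.get("name"):
            problems.append(f"mapping {index}: 'name' is required")
        if not any(key in mapping for key in SELECTOR_KEYS):
            problems.append(f"mapping {index}: needs one of {', '.join(SELECTOR_KEYS)}")
    return problems


problems = check_header_mappings(form_submit["header_mappings"])
print(problems or "header_mappings look valid")
```

In YAML the same structure would sit under `form_submit:` with `header_mappings:` as a list of `name` plus selector fields.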


### Selector

Shared fields used by several of the configurations above. They define how a value is extracted from the page.

| name | description | required | default | type |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- |
| select | CSS selector used for retrieving the value. Only required when neither `select_list` nor `value_template` is provided. | False | | string/template |
| select_list | CSS selector matching multiple elements, whose values are returned as CSV. Only required when neither `select` nor `value_template` is provided. | False | | string/template |
| attribute | Attribute from the selected element to read as value. | False | | string |
| value_template | Defines a template applied to extract the value from the result of the selector (if provided) or from the raw page (if no selector is provided). | False | | string/template |
| on_error | See [On-error](#on-error) | False | | |
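
The fallback described in the `value_template` row can be sketched as follows. This is a simplified illustration, not the integration's code: `resolve_value` is a hypothetical helper, `selected` stands for whatever `select` or `select_list` returned, and a plain Python callable stands in for a Home Assistant template. Returning the source unchanged when no template is given is an assumption.

```python
def resolve_value(raw_page, selected=None, value_template=None):
    """Pick the template input as the Selector table describes.

    `selected` is the text returned by a selector (None when no selector is
    configured); `value_template` is a callable standing in for a template.
    """
    source = selected if selected is not None else raw_page
    if value_template is None:
        return source
    return value_template(source)


page = '{"token": "abc123"}'

# With a selector: the template receives the selector result.
print(resolve_value(page, selected=" 21.5 °C ", value_template=lambda v: v.split()[0]))

# Without a selector: the template receives the raw page.
print(resolve_value(page, value_template=lambda v: v.split('"')[3]))
```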

### On-error

Expand Down
10 changes: 9 additions & 1 deletion custom_components/multiscrape/__init__.py
Expand Up @@ -119,13 +119,21 @@ async def _async_process_config(hass: HomeAssistant, config) -> bool:

http = create_http_wrapper(config_name, conf, hass, file_manager)

scraper = create_scraper(config_name, conf, hass, file_manager)

form_submit_config = conf.get(CONF_FORM_SUBMIT)
form_submitter = None
if form_submit_config:
form_http = create_http_wrapper(config_name, form_submit_config, hass, file_manager)
parser = conf.get(CONF_PARSER)
form_submitter = create_form_submitter(
config_name, form_submit_config, hass, form_http, file_manager, parser
config_name,
form_submit_config,
hass,
form_http,
scraper,
file_manager,
parser,
)

scraper = create_scraper(config_name, conf, hass, file_manager)
Expand Down
1 change: 1 addition & 0 deletions custom_components/multiscrape/const.py
Expand Up @@ -31,6 +31,7 @@
CONF_FORM_INPUT_FILTER = "input_filter"
CONF_FORM_SUBMIT_ONCE = "submit_once"
CONF_FORM_RESUBMIT_ERROR = "resubmit_on_error"
CONF_FORM_HEADER_MAPPINGS = "header_mappings"
CONF_LOG_RESPONSE = "log_response"
DEFAULT_PARSER = "lxml"

Expand Down
2 changes: 2 additions & 0 deletions custom_components/multiscrape/coordinator.py
Expand Up @@ -69,6 +69,8 @@ async def get_content(self) -> str:
if self._form_submitter:
try:
result = await self._form_submitter.async_submit(resource)
form_headers = self._form_submitter.scrape_header_mappings()
self._http.set_form_headers(form_headers)

if result:
_LOGGER.debug(
Expand Down
25 changes: 24 additions & 1 deletion custom_components/multiscrape/form.py
Expand Up @@ -2,6 +2,10 @@
import logging
from urllib.parse import urljoin

from homeassistant.const import CONF_NAME
from .const import CONF_FORM_HEADER_MAPPINGS


from bs4 import BeautifulSoup

from homeassistant.core import HomeAssistant
Expand All @@ -16,19 +20,23 @@
)
from .file import LoggingFileManager
from .http import HttpWrapper
from .selector import Selector


_LOGGER = logging.getLogger(__name__)


def create_form_submitter(config_name, config, hass, http, file_manager, parser):
def create_form_submitter(config_name, config, hass, http, scraper, file_manager, parser):
"""Create a form submitter instance."""
resource = config.get(CONF_RESOURCE)
select = config.get(CONF_FORM_SELECT)
input_values = config.get(CONF_FORM_INPUT)
input_filter = config.get(CONF_FORM_INPUT_FILTER)
resubmit_error = config.get(CONF_FORM_RESUBMIT_ERROR)
submit_once = config.get(CONF_FORM_SUBMIT_ONCE)
header_mapping_selectors = {}
for header_mapping_conf in config.get(CONF_FORM_HEADER_MAPPINGS) or []:
header_mapping_selectors[header_mapping_conf.get(CONF_NAME)] = Selector(hass, header_mapping_conf)

return FormSubmitter(
config_name,
Expand All @@ -41,6 +49,8 @@ def create_form_submitter(config_name, config, hass, http, file_manager, parser)
input_filter,
submit_once,
resubmit_error,
header_mapping_selectors,
scraper,
parser,
)

Expand All @@ -60,6 +70,8 @@ def __init__(
input_filter,
submit_once,
resubmit_error,
header_mapping_selectors,
scraper,
parser,
):
"""Initialize FormSubmitter class."""
Expand All @@ -74,6 +86,8 @@ def __init__(
self._input_filter = input_filter
self._submit_once = submit_once
self._resubmit_error = resubmit_error
self._header_mapping_selectors = header_mapping_selectors
self._scraper = scraper
self._parser = parser
self._should_submit = True

Expand Down Expand Up @@ -150,11 +164,20 @@ async def async_submit(self, main_resource):
if self._submit_once:
self._should_submit = False

await self._scraper.set_content(response.text)

if not self._form_resource:
return response.text
else:
return None

def scrape_header_mappings(self):
"""Scrape the configured header mappings from the last form response."""
result = {}
for name, selector in self._header_mapping_selectors.items():
result[name] = self._scraper.scrape(selector, name)
return result

def _determine_submit_resource(self, action, main_resource):
resource = main_resource
if action and self._form_resource:
Expand Down
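
The flow added to `form.py` (load the form response into the scraper, then scrape one value per configured header) can be sketched in isolation. `StubScraper` is a hypothetical stand-in that matches with a regular expression instead of a CSS selector, and the response body is invented; only the `set_content` / `scrape(selector, name)` call shape is taken from the diff above.

```python
import asyncio
import re


class StubScraper:
    """Stand-in for the integration's scraper; it 'selects' with a regex, not CSS."""

    def __init__(self):
        self._content = None

    async def set_content(self, content):
        self._content = content

    def scrape(self, selector, name):
        match = re.search(selector, self._content)
        if match is None:
            raise ValueError(f"no value found for header {name!r}")
        return match.group(1)


def scrape_header_mappings(scraper, selectors):
    """Build a {header name: scraped value} dict from {header name: selector}."""
    return {name: scraper.scrape(selector, name) for name, selector in selectors.items()}


async def main():
    scraper = StubScraper()
    # Hypothetical login response body.
    await scraper.set_content('<meta name="login-token" content="abc123">')
    selectors = {"X-Login-Token": r'name="login-token" content="([^"]+)"'}
    return scrape_header_mappings(scraper, selectors)


headers = asyncio.run(main())
print(headers)
```

The stub raises when a selector finds nothing; how the real scraper behaves in that case (for example via `on_error`) is not shown in this diff.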
7 changes: 7 additions & 0 deletions custom_components/multiscrape/http.py
Expand Up @@ -77,6 +77,7 @@ def __init__(
self._params_renderer = params_renderer
self._headers_renderer = headers_renderer
self._data_renderer = data_renderer
self._form_headers = None

def set_authentication(self, username, password, auth_type):
"""Set http authentication."""
Expand All @@ -86,11 +87,17 @@ def set_authentication(self, username, password, auth_type):
self._auth = (username, password)
_LOGGER.debug("%s # Authentication configuration processed", self._config_name)

def set_form_headers(self, form_headers):
"""Set the headers scraped from the form response; they are added to each request."""
self._form_headers = form_headers

async def async_request(self, context, resource, method=None, request_data=None):
"""Execute a HTTP request."""
data = request_data or self._data_renderer()
method = method or self._method or "GET"
headers = self._headers_renderer(None)
if self._form_headers:
headers.update(self._form_headers)
params = self._params_renderer(None)

_LOGGER.debug(
Expand Down
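
The merge performed in `async_request` can be illustrated with a small standalone function. `merge_headers` is a hypothetical name, and copying the configured headers before updating them is a defensive choice made for this sketch; the diff does not show whether `_headers_renderer` returns a fresh dict on every call.

```python
def merge_headers(configured, form_headers):
    """Return configured headers overlaid with headers scraped from the form response."""
    merged = dict(configured or {})  # copy, so the caller's dict is not mutated
    if form_headers:
        merged.update(form_headers)
    return merged


configured = {"User-Agent": "multiscrape", "X-Login-Token": "stale"}
merged = merge_headers(configured, {"X-Login-Token": "abc123"})
print(merged)
```

As in the diff, a scraped header with the same name as a configured one takes precedence.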
24 changes: 15 additions & 9 deletions custom_components/multiscrape/schema.py
Expand Up @@ -43,6 +43,7 @@
from .const import CONF_FORM_SELECT
from .const import CONF_FORM_SUBMIT
from .const import CONF_FORM_SUBMIT_ONCE
from .const import CONF_FORM_HEADER_MAPPINGS
from .const import CONF_LOG_RESPONSE
from .const import CONF_ON_ERROR
from .const import CONF_ON_ERROR_DEFAULT
Expand Down Expand Up @@ -90,15 +91,6 @@
vol.Optional(CONF_TIMEOUT, default=DEFAULT_TIMEOUT): cv.positive_int,
}

FORM_SUBMIT_SCHEMA = {
**HTTP_SCHEMA,
vol.Optional(CONF_FORM_SELECT): cv.string,
vol.Optional(CONF_FORM_INPUT): vol.Schema({cv.string: cv.string}),
vol.Optional(CONF_FORM_INPUT_FILTER, default=[]): cv.ensure_list,
vol.Optional(CONF_FORM_SUBMIT_ONCE, default=False): cv.boolean,
vol.Optional(CONF_FORM_RESUBMIT_ERROR, default=True): cv.boolean,
}

INTEGRATION_SCHEMA = {
**HTTP_SCHEMA,
vol.Optional(CONF_PARSER, default=DEFAULT_PARSER): cv.string,
Expand Down Expand Up @@ -128,6 +120,20 @@
vol.Optional(CONF_ON_ERROR): vol.Schema(ON_ERROR_SCHEMA),
}

FORM_HEADERS_MAPPING_SCHEMA = {vol.Required(CONF_NAME): cv.string, **SELECTOR_SCHEMA}

FORM_SUBMIT_SCHEMA = {
**HTTP_SCHEMA,
vol.Optional(CONF_FORM_SELECT): cv.string,
vol.Optional(CONF_FORM_INPUT): vol.Schema({cv.string: cv.string}),
vol.Optional(CONF_FORM_INPUT_FILTER, default=[]): cv.ensure_list,
vol.Optional(CONF_FORM_SUBMIT_ONCE, default=False): cv.boolean,
vol.Optional(CONF_FORM_RESUBMIT_ERROR, default=True): cv.boolean,
vol.Optional(CONF_FORM_HEADER_MAPPINGS, default=[]): vol.All(
cv.ensure_list, [vol.Schema(FORM_HEADERS_MAPPING_SCHEMA)]
),
}

SENSOR_ATTRIBUTE_SCHEMA = {vol.Required(CONF_NAME): cv.string, **SELECTOR_SCHEMA}

SENSOR_SCHEMA = {
Expand Down