Form variables #374

Merged 38 commits on Aug 2, 2024
Changes from 13 commits
Commits
2b93062
[Initial merge with master after manual rebase] Adding support for he…
jeremicmilan Feb 3, 2024
5abc00b
init fixups
jeremicmilan Apr 6, 2024
355750b
init -> form refactor fixup
jeremicmilan Apr 6, 2024
776fc1d
coordinator fixup
jeremicmilan Apr 6, 2024
a07ef50
schema fixup
jeremicmilan Apr 6, 2024
25e954a
Adding docs for public functions.
jeremicmilan Apr 6, 2024
cb99aa7
header_mapping -> variables initial rename, no functionality change. …
jeremicmilan May 26, 2024
186a93f
rearranging
jeremicmilan May 26, 2024
d7b2b94
transitioning from header mappings to variables.
jeremicmilan May 26, 2024
852e95e
Updating README
jeremicmilan May 27, 2024
06f8164
README feedback
jeremicmilan Jun 8, 2024
e3940f4
Merge branch 'danieldotnl:master' into master
jeremicmilan Jun 8, 2024
274a40a
bumping pytest-homeassistant-custom-component version
jeremicmilan Jun 8, 2024
22ee3de
Merging value and variables
jeremicmilan Jun 8, 2024
1f55b35
storing form variables in coordinator instead of http.
jeremicmilan Jun 8, 2024
8766124
Reusing _form_variables stored in request manager (which is a part of…
jeremicmilan Jun 8, 2024
0198e35
Merge branch 'master' into feature/form-variables
jeremicmilan Jun 8, 2024
9455f09
removing unused _variables in Scraper
jeremicmilan Jun 9, 2024
d990f71
Removing unused _scraper in content request manager
jeremicmilan Jun 9, 2024
29877e7
Merge branch 'master' into feature/form-variables
jeremicmilan Jun 12, 2024
a54807a
Merge branch 'master' into feature/form-variables
jeremicmilan Jun 12, 2024
e0e02e2
Adding a getter for form variables in coordinator
jeremicmilan Jun 12, 2024
bab22ae
Returning None by default if value_template is None
jeremicmilan Jun 12, 2024
c765894
get_form_variables() -> form_variables property
jeremicmilan Jun 14, 2024
f94c3f3
Merge branch 'master' into feature/form-variables
jeremicmilan Jul 5, 2024
7edada1
Making changes in service.py safe (not cause errors if coordinator is…
jeremicmilan Jul 5, 2024
c04c74c
raising on error in _render
jeremicmilan Jul 5, 2024
6e28727
Moving create_scraper into create_form_submitter
jeremicmilan Jul 5, 2024
2da36f3
scraper is not always initialized in the form submit.
jeremicmilan Jul 5, 2024
3ead484
README: variables can be used in sensor adn attribute configuration
jeremicmilan Jul 6, 2024
5bbe14e
Merge branch 'danieldotnl:master' into feature/form-variables
jeremicmilan Jul 9, 2024
315ede3
reworded
jeremicmilan Jul 9, 2024
c6ebd01
Fixing a bug where value was overriden with introduction of variables…
jeremicmilan Jul 9, 2024
f39f540
fixing logs
jeremicmilan Jul 10, 2024
43cdfbb
Creating file manager for service to ease debugging.
jeremicmilan Jul 10, 2024
357e534
Fixing templates for services, as multiscrpae has custom handling of …
jeremicmilan Jul 10, 2024
bf51d63
removing debug log
jeremicmilan Jul 10, 2024
142f644
Removing unneeded local variable.
jeremicmilan Jul 11, 2024
65 changes: 49 additions & 16 deletions README.md
@@ -86,7 +86,7 @@ Based on latest (pre) release.
| authentication | Configure HTTP authentication. `basic` or `digest`. Use this with username and password fields. | False | | string |
| username | The username for accessing the url. | False | | string |
| password | The password for accessing the url. | False | | string |
| headers | The headers for the requests. | False | | template - list |
| headers | The headers for the requests. | False | | template - list |
| params | The query params for the requests. | False | | template - list |
| method | The method for the request. Either `POST` or `GET`. | False | GET | string |
| payload | Optional payload to send with a POST request. | False | | string |
@@ -109,18 +109,14 @@ Configure the sensors that will scrape the data.
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- |
| unique_id | Will be used as entity_id and enables editing the entity in the UI | False | | string |
| name | Friendly name for the sensor | False | | string |
| select | CSS selector used for retrieving the value of the sensor. Only required when `select_list` is not provided. | True | | string/template |
| select_list | CSS selector for multiple values of multiple elements which will be returned as csv. Only required when `select` is not provided. | True | | string/template |
| attribute | Attribute from the selected element to read as value | False | | string |
| value_template | Defines a template applied on the result of the selector to extract the value. For binary sensors, the sensor is on if the template evaluates as True | False | | string/template |
| | See [Selector](#Selector) fields | True | | |
| attributes | See [Sensor attributes](#sensor-attributes) | False | | list |
| unit_of_measurement | Defines the units of measurement of the sensor | False | | string |
| device_class | Sets the device_class for [sensors](https://www.home-assistant.io/integrations/sensor/) or [binary sensors](https://www.home-assistant.io/integrations/binary_sensor/) | False | | string |
| state_class | Defines the state class of the sensor, if any. (measurement, total or total_increasing) (not for binary_sensor) | False | None | string |
| icon | Defines the icon or a template for the icon of the sensor. The value of the selector (or value_template when given) is provided as input for the template. For binary sensors, the value is parsed as a boolean. | False | | string/template |
| picture | Contains a path to a local image and will set it as entity picture | False | | string |
| force_update | Sends update events even if the value hasn’t changed. Useful if you want to have meaningful value graphs in history. | False | False | boolean |
| on_error | See [On-error](#on-error) | False | | |

### Refresh button

@@ -135,18 +131,14 @@ Configure a refresh button to manually trigger scraping.

Configure the attributes on the sensor that can be set with additional scraping values.

| name | description | required | default | type |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- |
| name | Name of the attribute (will be slugified) | True | | string |
| select | CSS selector used for retrieving the value of the attribute. Only required when `select_list` is not provided. | True | | string/template |
| select_list | CSS selector for multiple values of multiple elements which will be returned as csv. Only required when `select` is not provided. | True | | string/template |
| attribute | Attribute from the selected element to read as value | False | | string |
| value_template | Defines a template applied on the result of the selector to extract the value | False | | string/template |
| on_error | See [On-error](#on-error) | False | | |
| name | description | required | default | type |
| ---- | ----------------------------------------- | -------- | ------- | ------ |
| name | Name of the attribute (will be slugified) | True | | string |
| | See [Selector](#Selector) fields | True | | |

### Form-submit

Configure the form-submit functionality which enables you to submit a (login) form before scraping a site. More details on how this works [can be found on the wiki.](https://github.com/danieldotnl/ha-multiscrape/wiki/Form-submit-functionality)
Configure the form-submit functionality which enables you to submit a (login) form before scraping a site. More details on how this works [can be found on the wiki](https://github.com/danieldotnl/ha-multiscrape/wiki/Form-submit-functionality).

| name | description | required | default | type |
| ----------------- | --------------------------------------------------------------------------------------------------------- | -------- | ------- | ------------------- |
@@ -156,6 +148,47 @@ Configure the form-submit functionality which enables you to submit a (login) fo
| input_filter | A list of input fields that should not be submitted with the form | False | | string - list |
| submit_once | Submit the form only once on startup instead of each scan interval | False | False | boolean |
| resubmit_on_error | Resubmit the form after a scraping error is encountered | False | True | boolean |
| variables | See [Form Variables](#Form-Variables) | False | | list |

### Form Variables

Configure the variables that will be scraped from the [`form_submit`](#form-submit) response. You can then use those values in the `value_template` of a header or a selector in the main configuration. A common use case is to populate the `X-Login-Token` header with a token returned by the login.

| name | description | required | default | type |
| ---- | -------------------------------- | -------- | ------- | ------ |
| name | Name of the variable | True | | string |
| | See [Selector](#Selector) fields | True | | |

Example:

```yaml
multiscrape:
  - resource: "https://somesiteyouwanttoscrape.com"
    form_submit:
      submit_once: True
      resource: "https://authforsomesiteyouwanttoscrape.com"
      input:
        email: "<email>"
        password: "<password>"
      variables:
        - name: token
          value_template: "{{ ... }}"
    headers:
      X-Login-Token: "{{ token }}"
    sensor: ...
```
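
The flow in the example above can be sketched in Python: values scraped from the form-submit response are collected into a dictionary and then substituted into the header templates of the main request. This is a simplified illustration and not the integration's code. `render_headers` and the regex-based substitution are hypothetical stand-ins for Home Assistant's Jinja template rendering, and the token value is made up.

```python
import re

# Hypothetical stand-in for Jinja rendering: replaces "{{ name }}" with
# the matching entry in `variables`. Unknown names are left untouched.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_headers(headers: dict[str, str], variables: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` with form variables substituted in."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables.get(name, match.group(0)))

    return {key: _PLACEHOLDER.sub(substitute, value) for key, value in headers.items()}


# Variables as they might look after the form-submit response was scraped
# (assumed value, for illustration only).
form_variables = {"token": "abc123"}
rendered = render_headers({"X-Login-Token": "{{ token }}"}, form_variables)
print(rendered)
```

In the sketch, a header that refers to a variable that was never scraped keeps its placeholder text. How the integration itself handles that case is not shown here.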

### Selector

Used to configure scraping options.

| name | description | required | default | type |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- |
| select | CSS selector used for retrieving the value of the attribute. Only required when `select_list` or `value_template` is not provided. | False | | string/template |
| select_list | CSS selector for multiple values of multiple elements which will be returned as csv. Only required when `select` or `value_template` is not provided. | False | | string/template |
| attribute | Attribute from the selected element to read as value. | False | | string |
| value_template | Defines a template applied to extract the value from the result of the selector (if provided) or raw page (if selector not provided) | False | | string/template |
| on_error | See [On-error](#on-error) | False | | |
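
The relationship between `select` and `value_template` in the table above can be read as the following sketch. The names are hypothetical and both options are modelled as plain Python callables rather than a CSS selector and a Jinja template. Raising an error when neither is given is an assumption made for the sketch.

```python
from typing import Callable, Optional


def extract_value(
    page: str,
    select: Optional[Callable[[str], str]] = None,
    value_template: Optional[Callable[[str], str]] = None,
) -> str:
    """Apply the selector if given, then the template if given."""
    if select is None and value_template is None:
        # Assumption: at least one of the two must be configured.
        raise ValueError("either select or value_template is required")
    # Without a selector, the template receives the raw page.
    value = select(page) if select is not None else page
    if value_template is not None:
        value = value_template(value)
    return value


# Selector result is passed on to the template.
print(extract_value("  21.5  ", select=str.strip, value_template=lambda v: v + " C"))
# No selector: the template works on the raw page.
print(extract_value("raw page", value_template=str.upper))
```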

### On-error

@@ -176,7 +209,7 @@ Multiscrape also offers a `get_content` and a `scrape` service. `get_content` re
`scrape` does what it says. It scrapes a website and provides the sensors and attributes.

Both services accept the same configuration as you would provide in your configuration yaml (as described above), with a small but important caveat: if the service input contains templates, those are automatically parsed by Home Assistant when the service is called. That is fine for templates like `resource` and `select`, but templates that need to be applied to the scraped data itself (like `value_template`) cannot be parsed when the service is called. Therefore you need to slightly alter the syntax and add a `!` in the middle. E.g. `{{` becomes `{!{` and `%}` becomes `%!}`. Multiscrape will then understand that this string needs to be handled as a template after the service has been called.\
*If someone has a better solution, please let me know!*
_If someone has a better solution, please let me know!_
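
The `!` escaping can be undone with a few string replacements, sketched below. `unescape_service_template` is a hypothetical helper and not the integration's function. The README only states the `{{` and `%}` cases, so the mappings for `}}` and `{%` are assumed by analogy.

```python
def unescape_service_template(text: str) -> str:
    """Turn the `!`-escaped delimiters back into regular Jinja delimiters."""
    replacements = {
        "{!{": "{{",  # stated in the README
        "%!}": "%}",  # stated in the README
        "}!}": "}}",  # assumed by analogy
        "{!%": "{%",  # assumed by analogy
    }
    for escaped, plain in replacements.items():
        text = text.replace(escaped, plain)
    return text


print(unescape_service_template("{!{ value | float }!}"))
```

Because Home Assistant does not recognise the escaped delimiters, the string passes through the service call untouched and can be rendered afterwards against the scraped data.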

To call one of those services, go to 'Developer tools' in Home Assistant and then to 'services'. Find the `multiscrape.get_content` or `multiscrape.scrape` services and go to yaml mode. There you enter your configuration.
Example:
52 changes: 22 additions & 30 deletions custom_components/multiscrape/__init__.py
@@ -6,38 +6,27 @@

import voluptuous as vol
from homeassistant.config_entries import ConfigEntry
from homeassistant.const import CONF_NAME

from homeassistant.const import Platform
from homeassistant.const import SERVICE_RELOAD, CONF_RESOURCE, CONF_RESOURCE_TEMPLATE
from homeassistant.const import (CONF_NAME, CONF_RESOURCE,
CONF_RESOURCE_TEMPLATE, SERVICE_RELOAD,
Platform)
from homeassistant.core import HomeAssistant

from homeassistant.exceptions import HomeAssistantError
from homeassistant.helpers import discovery
from homeassistant.helpers.reload import async_integration_yaml_config
from homeassistant.helpers.reload import async_reload_integration_platforms
from homeassistant.helpers.reload import (async_integration_yaml_config,
async_reload_integration_platforms)
from homeassistant.util import slugify

from .service import setup_config_services, setup_integration_services

from .const import CONF_FORM_SUBMIT
from .const import CONF_LOG_RESPONSE
from .const import CONF_PARSER
from .const import COORDINATOR
from .const import DOMAIN
from .const import PLATFORM_IDX
from .const import SCRAPER
from .const import SCRAPER_DATA
from .const import SCRAPER_IDX
from .coordinator import (
create_multiscrape_coordinator,
)
from .coordinator import create_content_request_manager
from .const import (CONF_FORM_SUBMIT, CONF_LOG_RESPONSE, CONF_PARSER,
COORDINATOR, DOMAIN, PLATFORM_IDX, SCRAPER, SCRAPER_DATA,
SCRAPER_IDX)
from .coordinator import (create_content_request_manager,
create_multiscrape_coordinator)
from .file import LoggingFileManager
from .form import create_form_submitter
from .http import create_http_wrapper
from .schema import COMBINED_SCHEMA, CONFIG_SCHEMA # noqa: F401
from .scraper import create_scraper
from .service import setup_config_services, setup_integration_services

_LOGGER = logging.getLogger(__name__)
PLATFORMS = [Platform.SENSOR, Platform.BINARY_SENSOR, Platform.BUTTON]
@@ -117,22 +106,25 @@ async def _async_process_config(hass: HomeAssistant, config) -> bool:
file_manager = LoggingFileManager(folder)
await hass.async_add_executor_job(file_manager.create_folders)

http = create_http_wrapper(config_name, conf, hass, file_manager)

form_submit_config = conf.get(CONF_FORM_SUBMIT)
form_submitter = None
if form_submit_config:
form_http = create_http_wrapper(config_name, form_submit_config, hass, file_manager)
parser = conf.get(CONF_PARSER)
form_http = create_http_wrapper(config_name, form_submit_config, hass, file_manager)
form_scraper = create_scraper(config_name, conf, hass, file_manager)
form_submitter = create_form_submitter(
config_name, form_submit_config, hass, form_http, file_manager, parser
config_name,
form_submit_config,
hass,
form_http,
form_scraper,
file_manager,
parser,
)

http = create_http_wrapper(config_name, conf, hass, file_manager)
scraper = create_scraper(config_name, conf, hass, file_manager)

request_manager = create_content_request_manager(
config_name, conf, hass, http, form_submitter
)
request_manager = create_content_request_manager(config_name, conf, hass, http, form_submitter, scraper)
coordinator = create_multiscrape_coordinator(
config_name,
conf,
1 change: 1 addition & 0 deletions custom_components/multiscrape/const.py
@@ -31,6 +31,7 @@
CONF_FORM_INPUT_FILTER = "input_filter"
CONF_FORM_SUBMIT_ONCE = "submit_once"
CONF_FORM_RESUBMIT_ERROR = "resubmit_on_error"
CONF_FORM_VARIABLES = "variables"
CONF_LOG_RESPONSE = "log_response"
DEFAULT_PARSER = "lxml"
