diff --git a/R/api.R b/R/api.R index 08c231b..3e8c801 100644 --- a/R/api.R +++ b/R/api.R @@ -23,7 +23,10 @@ #' #' @returns #' `s()` returns a `selenider_element` object. -#' `ss()` returns a `selenider_elements` object. +#' `ss()` returns a `selenider_elements` object. Note that this is not a list, +#' and you should be careful with the functions that you use with it. See the +#' advanced usage vignette for more details: +#' `vignette("advanced-usage", package = "selenider")`. #' #' @seealso #' * [find_element()] and [find_elements()] diff --git a/R/cache.R b/R/cache.R index 188485a..6f310f5 100644 --- a/R/cache.R +++ b/R/cache.R @@ -38,8 +38,8 @@ #' #' @seealso #' * [find_element()] and [find_elements()] to select elements. -#' * [element_list()], [find_each_element()] and [find_all_elements()] if you -#' want to iterate over an element collection. +#' * [as.list.selenider_elements()], [find_each_element()] and +#' [find_all_elements()] if you want to iterate over an element collection. #' #' @examplesIf selenider::selenider_available(online = FALSE) #' html <- " diff --git a/R/elem_filter.R b/R/elem_filter.R index 420a9e7..7ec953b 100644 --- a/R/elem_filter.R +++ b/R/elem_filter.R @@ -35,7 +35,7 @@ #' #' @seealso #' * [find_elements()] and [ss()] to get elements to filter. -#' * [is_present()] and other conditions for predicates for HTML elements. +#' * [is_present()] and other conditions for predicates on HTML elements. #' (If you scroll down to the *See also* section, you will find the rest). #' #' @examplesIf selenider::selenider_available(online = FALSE) diff --git a/R/find_elements.R b/R/find_elements.R index b8140c0..d4d7a11 100644 --- a/R/find_elements.R +++ b/R/find_elements.R @@ -16,15 +16,20 @@ #' `xpath`), the first element which satisfies every condition will be found. #' #' @returns -#' A `selenider_elements` object. +#' A `selenider_elements` object. Note that this is not a list, and you should +#' be careful with the functions that you use with it. 
See the advanced usage +#' vignette for more details: +#' `vignette("advanced-usage", package = "selenider")`. #' #' @seealso #' * [ss()] to quickly select multiple elements without specifying the session. -#' * [find_element()] to select multiple elements. +#' * [find_element()] to select a single element. #' * [selenider_session()] to begin a session. #' * [elem_children()] and family to select elements using their relative #' position in the DOM. #' * [elem_filter()] and [elem_find()] for filtering element collections. +#' * [as.list.selenider_elements()] to convert a `selenider_elements` object +#' to a list. #' #' @examplesIf selenider::selenider_available(online = FALSE) #' html <- " diff --git a/R/get_actual_element.R b/R/get_actual_element.R index 9b8ea46..8ad2517 100644 --- a/R/get_actual_element.R +++ b/R/get_actual_element.R @@ -25,7 +25,7 @@ #' * [s()], [ss()], [find_element()] and [find_elements()] to select selenider #' elements. #' * [elem_cache()] and [elem_cache()] to cache these values. -#' * The [Chrome Devtools Protocol documentation](https://chromedevtools.github.io/devtools-protocol/tot/) `r # nolint` +#' * The [Chrome Devtools Protocol documentation](https://chromedevtools.github.io/devtools-protocol/tot/) #' for the operations that can be performed using a backend node id. Note #' that this requires the [chromote::ChromoteSession] object, which can be #' retrieved using `$driver`. diff --git a/R/session-options.R b/R/session-options.R index 9ee20d5..ebdedf7 100644 --- a/R/session-options.R +++ b/R/session-options.R @@ -61,7 +61,8 @@ chromote_options <- function(headless = TRUE, #' @rdname chromote_options #' #' @param client_options A [selenium_client_options()] object. -#' @param server_options A [selenium_server_options()] object. +#' @param server_options A [selenium_server_options()] object, or `NULL` if you +#' don't want one to be created. 
#' #' @export selenium_options <- function(client_options = selenium_client_options(), diff --git a/README.md b/README.md index 7e7308f..2e84fff 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,3 @@ - # selenider @@ -10,6 +9,7 @@ coverage](https://codecov.io/gh/ashbythorpe/selenider/branch/main/graph/badge.svg)](https://app.codecov.io/gh/ashbythorpe/selenider?branch=main) [![CRAN status](https://www.r-pkg.org/badges/version/selenider)](https://CRAN.R-project.org/package=selenider) + Traditionally, automating a web browser is often unreliable, especially @@ -59,7 +59,7 @@ concise yet expressive code that is easy to read and easy to write: ## Installation -``` r +```r # Install selenider from CRAN install.packages("selenider") @@ -73,7 +73,7 @@ Additionally, you must install [selenium](https://ashbythorpe.github.io/selenium-r/). We recommend chromote, as it is quicker and easier to get up and running. -``` r +```r # Either: install.packages("chromote") @@ -91,7 +91,7 @@ is recommended. ## Usage -``` r +```r library(selenider) ``` @@ -99,7 +99,7 @@ The following code navigates to the [R project website](https://www.r-project.org/), finds the link to the CRAN mirror list, checks that the link is correct, and clicks the link element. -``` r +```r open_url("https://www.r-project.org/") s(".row") |> @@ -113,25 +113,25 @@ s(".row") |> Now that we’re in the mirror list page, let’s find the link to every CRAN mirror in the UK. 
-``` r +```r s("dl") |> find_elements("dt") |> elem_find(has_text("UK")) |> find_element(xpath = "./following-sibling::dd") |> find_elements("tr") |> + find_each_element("a") |> elem_expect(has_at_least(1)) |> as.list() |> lapply( \(x) x |> - find_element("a") |> elem_attr("href") ) #> [[1]] #> [1] "https://www.stats.bris.ac.uk/R/" -#> +#> #> [[2]] #> [1] "https://cran.ma.imperial.ac.uk/" -#> +#> #> [[3]] #> [1] "https://anorien.csc.warwick.ac.uk/CRAN/" ``` diff --git a/_pkgdown.yml b/_pkgdown.yml index cc50fb5..95d09a2 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -69,6 +69,7 @@ articles: contents: - unit-testing - with-rvest + - advanced-usage - title: Testing selenider desc: An article used to test selenider contents: diff --git a/man/chromote_options.Rd b/man/chromote_options.Rd index e168d31..a849c69 100644 --- a/man/chromote_options.Rd +++ b/man/chromote_options.Rd @@ -56,7 +56,8 @@ this to \code{FALSE}.} \item{client_options}{A \code{\link[=selenium_client_options]{selenium_client_options()}} object.} -\item{server_options}{A \code{\link[=selenium_server_options]{selenium_server_options()}} object.} +\item{server_options}{A \code{\link[=selenium_server_options]{selenium_server_options()}} object, or \code{NULL} if you +don't want one to be created.} \item{version}{The version of Selenium server to use.} diff --git a/man/elem_cache.Rd b/man/elem_cache.Rd index 52e06fc..e15089a 100644 --- a/man/elem_cache.Rd +++ b/man/elem_cache.Rd @@ -85,7 +85,7 @@ elem_click(button) \seealso{ \itemize{ \item \code{\link[=find_element]{find_element()}} and \code{\link[=find_elements]{find_elements()}} to select elements. -\item \code{\link[=element_list]{element_list()}}, \code{\link[=find_each_element]{find_each_element()}} and \code{\link[=find_all_elements]{find_all_elements()}} if you -want to iterate over an element collection. 
+\item \code{\link[=as.list.selenider_elements]{as.list.selenider_elements()}}, \code{\link[=find_each_element]{find_each_element()}} and +\code{\link[=find_all_elements]{find_all_elements()}} if you want to iterate over an element collection. } } diff --git a/man/elem_filter.Rd b/man/elem_filter.Rd index f8f638e..922457b 100644 --- a/man/elem_filter.Rd +++ b/man/elem_filter.Rd @@ -93,7 +93,7 @@ ss("button") |> \seealso{ \itemize{ \item \code{\link[=find_elements]{find_elements()}} and \code{\link[=ss]{ss()}} to get elements to filter. -\item \code{\link[=is_present]{is_present()}} and other conditions for predicates for HTML elements. +\item \code{\link[=is_present]{is_present()}} and other conditions for predicates on HTML elements. (If you scroll down to the \emph{See also} section, you will find the rest). } } diff --git a/man/find_elements.Rd b/man/find_elements.Rd index 89192df..7c494af 100644 --- a/man/find_elements.Rd +++ b/man/find_elements.Rd @@ -44,7 +44,10 @@ find_elements(x, ...) \item{name}{The name attribute of the element you want to select.} } \value{ -A \code{selenider_elements} object. +A \code{selenider_elements} object. Note that this is not a list, and you should +be careful with the functions that you use with it. See the advanced usage +vignette for more details: +\code{vignette("advanced-usage", package = "selenider")}. } \description{ Find every available HTML element using a CSS selector, an XPath, or a @@ -88,10 +91,12 @@ s("#outer-div") |> \seealso{ \itemize{ \item \code{\link[=ss]{ss()}} to quickly select multiple elements without specifying the session. -\item \code{\link[=find_element]{find_element()}} to select multiple elements. +\item \code{\link[=find_element]{find_element()}} to select a single element. \item \code{\link[=selenider_session]{selenider_session()}} to begin a session. \item \code{\link[=elem_children]{elem_children()}} and family to select elements using their relative position in the DOM. 
\item \code{\link[=elem_filter]{elem_filter()}} and \code{\link[=elem_find]{elem_find()}} for filtering element collections. +\item \code{\link[=as.list.selenider_elements]{as.list.selenider_elements()}} to convert a \code{selenider_elements} object +to a list. } } diff --git a/man/s.Rd b/man/s.Rd index c89546a..b1d0adc 100644 --- a/man/s.Rd +++ b/man/s.Rd @@ -22,7 +22,10 @@ ss(css = NULL, xpath = NULL, id = NULL, class_name = NULL, name = NULL) } \value{ \code{s()} returns a \code{selenider_element} object. -\code{ss()} returns a \code{selenider_elements} object. +\code{ss()} returns a \code{selenider_elements} object. Note that this is not a list, +and you should be careful with the functions that you use with it. See the +advanced usage vignette for more details: +\code{vignette("advanced-usage", package = "selenider")}. } \description{ Both \code{s()} and \code{ss()} allow you to select elements without specifying a diff --git a/vignettes/advanced-usage.Rmd b/vignettes/advanced-usage.Rmd new file mode 100644 index 0000000..3e4636b --- /dev/null +++ b/vignettes/advanced-usage.Rmd @@ -0,0 +1,229 @@ +--- +title: "Advanced usage of selenider" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Advanced usage of selenider} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +available <- selenider::selenider_available() +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = available +) +``` +```{r, eval = !available, include = FALSE} +message("Selenider is not available") +``` +selenider exposes some advanced features to allow for more complex automation. + +## Customizing the session creation + +```{r} +library(selenider) +``` + +[selenider_session()] is really just a wrapper around either +`chromote::ChromoteSession$new()`, or `selenium::selenium_server()` and +`selenium::SeleniumSession$new()`. 
selenider exposes arguments to these +functions (plus some additional options) via the `options` argument. + +The most common argument that you are going to want to use is `headless` in +`chromote_options()`: it allows you to run chromote in non-headless mode, +meaning that the browser you are controlling will be displayed: + +```{r eval=FALSE} +session <- selenider_session( + "chromote", + options = chromote_options(headless = FALSE) +) +``` + +Managing selenium options is a bit more complex, since you can provide +options to the client (`selenium_client_options()`) and server +(`selenium_server_options()`). One cool thing you can do is pass `NULL` into +the `server_options` parameter of `selenium_options()` to stop selenider +from creating its own server. This is useful if you have created a server +manually (using docker, for example): + +```{r eval=FALSE} +session <- selenider_session( + "selenium", + options = selenium_options( + server_options = NULL, # Stop selenider from creating a server + client_options = selenium_client_options( + host = "localhost", # Use the host and port of your manually created server + port = 4444L + ) + ) +) +``` + +## Accessing the underlying session + +While selenider provides a high-level interface, sometimes you need to access +the underlying `chromote::ChromoteSession` or `selenium::SeleniumSession` to +perform more advanced tasks. The `driver` field of a `selenider_session()` +can be used to do this. 
+ +This is especially useful for chromote, since much of the configuration is +done after the session is created: + +```{r eval=FALSE} +session <- selenider_session() + +chromote_session <- session$driver + +chromote_session$Browser$setDownloadBehavior( + behavior = "allow", + downloadPath = "" +) +``` + +## Accessing underlying elements + +Much like you can access the underlying chromote/selenium session behind a +selenider session, you can access the chromote/selenium element represented by +a `selenider_element`/`selenider_elements` object using `get_actual_element()` +and `get_actual_elements()`, respectively. + +If you are using chromote, the [backendNodeId](https://chromedevtools.github.io/devtools-protocol/tot/DOM/#type-BackendNodeId) +of the element is returned, while in selenium's case, the element is returned +as a `selenium::WebElement`. It's important to note that the element in this +form is no longer lazy, so should be used as soon as possible to avoid errors +as the page changes. + +## Element collections + +Let's use selenider to get every link element in the R Project's website. + +```{r} +open_url("https://www.r-project.org/") + +links <- ss("a") + +links +``` + +But what actually is `links`? In some ways, it acts like a list: + +```{r} +links[[1]] + +links[1:2] + +length(links) +``` + +But assuming it is a list in all scenarios can result in surprising behavior: + +```{r} +names(links) +``` + +To reveal why this is, let's emulate adding a new link to the page using +JavaScript. + +```{r} +execute_js_expr(" + const link = document.createElement('a'); + link.href = 'https://ashbythorpe.github.io/selenider/'; + link.innerText = 'Selenider'; + document.body.appendChild(link); +") +``` + +Now let's look at `links` again: + +```{r} +links + +links[[length(links)]] +``` + +`links` has been updated to include the new link! + +### A lazy list + +The core reason behind this strange behavior is selenider's promise of +*laziness*. 
This means that elements are only ever collected from the page right +before they are used by an *eager* function (`print()`, `elem_text()`, +`elem_click()`, etc.). The only thing a selenider element actually stores is +the *path* to an element (i.e. the set of steps you specified to reach the +element), rather than the element itself. + +This property offers an array of benefits when compared with the eager approach. +It offers a far more suitable representation of a constantly-changing webpage, +and as such side-steps many common errors encountered during web automation. +It also powers selenider's automatic waiting feature. + +The element collection, then, is a generalisation of this concept to sets of +elements. A `selenider_elements` object stores the path to its elements, but +not the elements themselves. It therefore cannot be represented by a list; for one +thing, as seen above, it is necessarily unaware of its length. + +For all of the advantages of lazy elements, this choice of structure does come +with some caveats. The major one is that many list operations will not work on +an element collection; in fact, you should assume that any operation that works +on a list will not work on a `selenider_elements` object. This is in part due +to the fact that R does not natively support custom iterators. + +### So, what *can* I do? +selenider provides an API for working with element collections. All of the +methods below preserve the laziness of the element collection, meaning that +none of them will actually fetch any elements from the page until the resulting +element is used. + +* `elems[[x]]` and `elems[x]` work with *numeric* indices, including negative + numbers, allowing you to filter elements by position. +* `elem_filter()` and `elem_find()` allow you to filter an element collection + or find a single element based on a condition. 
+* `elem_flatten()` allows you to combine multiple elements or element collections + into a single collection. +* `find_each_element()` and `find_all_elements()` allow you to easily find + children of all the elements in a collection. + +As seen before, `length()` can be used on element collections to get the number +of elements. This is *not* lazy, meaning you shouldn't rely on this value to +remain accurate after it is called. + +However, sometimes you want to perform more complex operations on a set of +elements. One common example is iteration, either in a for loop or using +`lapply()`/`purrr::map()`. Iteration is an operation that goes against the idea +of a lazy collection: how do you iterate over a set that is constantly changing? + +In this situation, if you are willing to sacrifice some of the lazy properties +of an element collection, use `as.list()`. This function, when called on an +element collection `elems`, converts it to the following: + +```r +list(elems[[1]], elems[[2]], ..., elems[[n]]) +``` + +where `n` is `length(elems)`. + +Notably, the elements of the list are still lazy, since `[[` preserves laziness +on element collections. However, the length of the list is not, since the call +to `length()` is not lazy. + +Since this is an actual list, it supports a much wider range of operations. +For example, in selenider's README, `as.list()` is used to iterate over a +collection of links to find their hyperlinks. Take a look at +`as.list.selenider_elements()` for more examples. + +## Forcing eager behaviour + +Sometimes it may be desirable to avoid the lazy behaviour of selenider's +elements. This is usually for performance reasons: you may have an element +represented by a long, complex set of steps, which needs to be used many times. +By default, selenider will follow the path every time the element is used, +which can end up being very slow, and may be redundant if you know the element's +position is unlikely to change. 
+ +`elem_cache()` can be used to force an element or set of elements to be +retrieved from the DOM and stored, creating an "eager" element. Note the caveat +in the docs: further elements created using this element will not also be +eager, but will use this eager element as a starting point. diff --git a/vignettes/selenider.Rmd b/vignettes/selenider.Rmd index 2373c02..1334f4a 100644 --- a/vignettes/selenider.Rmd +++ b/vignettes/selenider.Rmd @@ -30,15 +30,23 @@ library(selenider) To use selenider, you must first start a session with `selenider_session()`. If you don't do this, it is done automatically for you, but you may want to change some of the options from their defaults (the backend, for example). Here, we use chromote as a backend (the default), and we set the timeout -to 10 seconds (the default is 4). +to 10 seconds (the default is 4). Finally, we'll use [chromote_options()] to set options that are +specific to chromote (here we want to disable headless mode, which will allow us to see the browser). -```{r} +```{r eval=FALSE} session <- selenider_session( "chromote", - timeout = 10 + timeout = 10, + options = chromote_options(headless = FALSE) ) ``` +```{r include=FALSE} +session <- selenider_session() +``` + + + The session, once created, will be set as the *local session* inside the current environment, meaning that in this case, it can be accessed anywhere in this script, and will be closed automatically when the script finishes running.