Commit

Merge branch 'master' into additional-requests
Adjust the code in line with the refactoring of
ResponseData into HttpResponse in this PR:
#30
BurnzZ committed Mar 28, 2022
2 parents 396ab8e + 256a0c3 commit f769147
Showing 21 changed files with 692 additions and 200 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.rst
@@ -11,6 +11,13 @@ TBR
``web_poet.HttpClient``.
* Introduced ``web_poet.Meta`` to pass arbitrary information
inside a Page Object.
* removed support for Python 3.6
* added support for Python 3.10
* Backward Incompatible Change:

* ``ResponseData`` is now ``HttpResponse``, which has new, specific
attribute types such as ``HttpResponseBody`` and
``HttpResponseHeaders``.


0.1.1 (2021-06-02)
42 changes: 16 additions & 26 deletions docs/advanced/additional_requests.rst
@@ -53,24 +53,18 @@ A simple ``GET`` request
}
# Simulates clicking on a button that says "View All Images"
response: web_poet.ResponseData = await self.http_client.get(
response: web_poet.HttpResponse = await self.http_client.get(
f"https://api.example.com/v2/images?id={item['product_id']}"
)
page = web_poet.WebPage(response)
item["images"] = page.css(".product-images img::attr(src)").getall()
item["images"] = response.css(".product-images img::attr(src)").getall()
return item
There are a few things to take note of in this example:

* A ``GET`` request can be done via :class:`~.HttpClient`'s
:meth:`~.HttpClient.get` method.
* We're now using the ``async/await`` syntax.
* The response is of type :class:`~.ResponseData`.

* Though in order to use :meth:`~.ResponseShortcutsMixin.css`
`(and other shortcut methods)` we'll need to feed it into
:class:`~.WebPage`.
* The response is of type :class:`~.HttpResponse`.

As the example suggests, we're performing an additional request that allows us
to extract more images in a product page that might not otherwise be possible.
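
The pattern described above (await an additional request inside the extraction method, then query the response directly instead of wrapping it in a page object first) can be sketched with standard-library stand-ins. This is only an illustrative sketch, not web-poet's implementation: `StubResponse`, `StubHttpClient`, and `image_sources` are hypothetical names, the HTML is canned, and a crude regular expression stands in for the CSS shortcut used in the docs.

```python
import asyncio
import re
from dataclasses import dataclass


@dataclass
class StubResponse:
    """Hypothetical stand-in for an HTTP response object."""
    url: str
    text: str

    def image_sources(self) -> list:
        # Crude regex extraction, for illustration only; it stands in
        # for the CSS/XPath shortcuts mentioned in the docs.
        return re.findall(r'<img[^>]*\ssrc="([^"]+)"', self.text)


class StubHttpClient:
    """Hypothetical client that returns canned HTML instead of downloading."""

    async def get(self, url: str) -> StubResponse:
        await asyncio.sleep(0)  # yield control, as a real download would
        html = '<div class="product-images"><img src="/a.png"><img src="/b.png"></div>'
        return StubResponse(url=url, text=html)


async def to_item(http_client: StubHttpClient) -> dict:
    item = {"product_id": "123"}
    # The additional request is awaited inside the extraction method.
    response = await http_client.get(
        f"https://api.example.com/v2/images?id={item['product_id']}"
    )
    # The response is queried directly, with no wrapper page object.
    item["images"] = response.image_sources()
    return item


item = asyncio.run(to_item(StubHttpClient()))
```

Because the response itself exposes the query method, the intermediate wrapping step from the removed lines has no counterpart here.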
@@ -110,7 +104,7 @@ Thus, additional requests inside the Page Object is typically needed for it:
}
# Simulates "scrolling" through a carousel that loads related product items
response: web_poet.ResponseData = await self.http_client.post(
response: web_poet.HttpResponse = await self.http_client.post(
url="https://www.api.example.com/related-products/",
headers={
'Host': 'www.example.com',
@@ -123,15 +117,12 @@ Thus, additional requests inside the Page Object is typically needed for it:
}
),
)
second_page = web_poet.WebPage(response)
related_product_ids = self.parse_related_product_ids(second_page)
item["related_product_ids"] = related_product_ids
item["related_product_ids"] = self.parse_related_product_ids(response)
return item
@staticmethod
def parse_related_product_ids(page: web_poet.WebPage) -> List[str]:
return page.css("#main .related-products ::attr(product-id)").getall()
def parse_related_product_ids(response: web_poet.HttpResponse) -> List[str]:
return response.css("#main .related-products ::attr(product-id)").getall()
Here's the key takeaway in this example:

@@ -171,12 +162,11 @@ Let's modify the example in the previous section to see how it can be done:
self.create_request(page_num=page_num)
for page_num in range(2, default_pagination_limit)
]
responses: List[web_poet.ResponseData] = await self.http_client.batch_requests(*requests)
pages = map(web_poet.WebPage, responses)
responses: List[web_poet.HttpResponse] = await self.http_client.batch_requests(*requests)
related_product_ids = [
product_id
for page in pages
for product_id in self.parse_related_product_ids(page)
for response in responses
for product_id in self.parse_related_product_ids(response)
]
item["related_product_ids"].extend(related_product_ids)
@@ -200,17 +190,17 @@ Let's modify the example in the previous section to see how it can be done:
)
@staticmethod
def parse_related_product_ids(page: web_poet.WebPage) -> List[str]:
return page.css("#main .related-products ::attr(product-id)").getall()
def parse_related_product_ids(response: web_poet.HttpResponse) -> List[str]:
return response.css("#main .related-products ::attr(product-id)").getall()
The key takeaways for this example are:

* A :class:`~.Request` can be instantiated to represent a Generic HTTP Request.
It only contains the HTTP Request information for now and isn't executed yet.
This is useful for creating factory methods to help create them without any
download execution at all.
* :class:`~.HttpClient`' has a :meth:`~.HttpClient.batch_requests` method that
can process a series of :class:`~.Request` instances.
* :class:`~.HttpClient` has a :meth:`~.HttpClient.batch_requests` method that
can process a list of :class:`~.Request` instances asynchronously together.

* Note that it can accept different types of :class:`~.Request` that might
not be related *(e.g. a mixture of* ``GET`` *and* ``POST`` *requests)*.
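
To make the takeaways above concrete, here is a minimal standard-library sketch of the idea: request objects that only describe a download, plus a batch method that runs them concurrently. It is an assumption-laden illustration rather than web-poet's actual code: `StubRequest`, `StubResponse`, `StubHttpClient`, and the search URL are hypothetical, and `asyncio.gather` is swapped in as one plausible way to process the requests together.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class StubRequest:
    """Hypothetical request description: data only, nothing is downloaded."""
    url: str
    method: str = "GET"


@dataclass
class StubResponse:
    """Hypothetical response carrying the URL and a fake body."""
    url: str
    body: str


class StubHttpClient:
    async def _download(self, request: StubRequest) -> StubResponse:
        await asyncio.sleep(0.01)  # pretend network latency
        return StubResponse(url=request.url, body=f"{request.method} {request.url}")

    async def batch_requests(self, *requests: StubRequest) -> List[StubResponse]:
        # gather() runs the downloads concurrently and preserves input order.
        return await asyncio.gather(*(self._download(r) for r in requests))


def create_request(page_num: int) -> StubRequest:
    # Factory method: builds the request without executing it.
    return StubRequest(
        url=f"https://www.api.example.com/related-products/?page={page_num}"
    )


async def main() -> List[StubResponse]:
    requests = [create_request(n) for n in range(2, 5)]
    # A batch may mix unrelated request types, e.g. GET and POST.
    requests.append(StubRequest(url="https://www.api.example.com/search", method="POST"))
    return await StubHttpClient().batch_requests(*requests)


responses = asyncio.run(main())
```

Keeping request creation separate from execution is what lets the factory method be called in a plain list comprehension, with the actual downloads deferred to a single awaited call.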
@@ -246,7 +236,7 @@ This can be set using:

.. code-block:: python
def request_implementation(r: web_poet.Request) -> web_poet.ResponseData:
def request_implementation(r: web_poet.Request) -> web_poet.HttpResponse:
...
from web_poet import request_backend_var
@@ -273,7 +263,7 @@ an :class:`~.HttpClient` instance:

.. code-block:: python
def request_implementation(r: web_poet.Request) -> web_poet.ResponseData:
def request_implementation(r: web_poet.Request) -> web_poet.HttpResponse:
...
from web_poet import HttpClient
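
The two hunks above touch two ways of supplying the request implementation: through a context variable and through the client instance. The general mechanism can be sketched with the standard library's `contextvars` module. Everything named here (`backend_var`, `StubClient`, its `downloader` parameter, `backend_a`, `backend_b`) is hypothetical, and the precedence rule shown is an assumption made for illustration, not a statement about how web-poet resolves the two.

```python
import contextvars
from typing import Callable, Optional

# Hypothetical context variable holding the download function.
backend_var: contextvars.ContextVar = contextvars.ContextVar("backend")


class StubClient:
    def __init__(self, downloader: Optional[Callable[[str], str]] = None):
        self._downloader = downloader

    def fetch(self, url: str) -> str:
        # Assumption for this sketch: an explicitly injected downloader
        # wins; otherwise use whatever the context variable holds.
        downloader = self._downloader or backend_var.get()
        return downloader(url)


def backend_a(url: str) -> str:
    return f"A:{url}"


def backend_b(url: str) -> str:
    return f"B:{url}"


# Option 1: set the implementation through the context variable.
token = backend_var.set(backend_a)
via_context = StubClient().fetch("https://example.com")

# Option 2: pass the implementation directly to the client instance.
via_argument = StubClient(downloader=backend_b).fetch("https://example.com")

# Restore the previous (unset) state of the context variable.
backend_var.reset(token)
```

A context variable suits frameworks that configure the backend once per task, while direct injection is convenient in tests where each client needs its own stub.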
19 changes: 9 additions & 10 deletions docs/advanced/meta.rst
@@ -10,11 +10,11 @@ data entirely depending on the needs of the developer.

If you can recall from the previous basic tutorials, one essential requirement of
Page Objects that inherit from :class:`~.WebPage` or :class:`~.ItemWebPage` would
be :class:`~.ResponseData`. This holds the HTTP response information that the
be :class:`~.HttpResponse`. This holds the HTTP response information that the
Page Object is trying to represent.

In order to standardize how to pass arbitrary information inside Page Objects,
we'll need to use :class:`~.Meta` similar to how we use :class:`~.ResponseData`
we'll need to use :class:`~.Meta` similar to how we use :class:`~.HttpResponse`
as a requirement to instantiate Page Objects:

.. code-block:: python
@@ -24,15 +24,15 @@ as a requirement to instantiate Page Objects:
@attr.define
class SomePage(web_poet.ItemWebPage):
# ResponseData is inherited from ItemWebPage
# The HttpResponse attribute is inherited from ItemWebPage
meta: web_poet.Meta
response = web_poet.ResponseData(...)
response = web_poet.HttpResponse(...)
meta = web_poet.Meta({"arbitrary_value": 1234, "cool": True})
page = SomePage(response=response, meta=meta)
However, similar to :class:`~.ResponseData`, developers using :class:`~.Meta`
However, similar to :class:`~.HttpResponse`, developers using :class:`~.Meta`
shouldn't care about how they are being passed into Page Objects. This will
depend on the framework that would use **web-poet**.
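
As a rough illustration of the idea, a page object can treat such a mapping as optional configuration and fall back to defaults when a key is absent. The sketch below uses only the standard library; `StubMeta`, `StubPage`, `page_numbers_to_fetch`, and the default of 3 pages are hypothetical and merely mirror the shape of the docs' example, under the assumption that the metadata behaves like a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import List


class StubMeta(dict):
    """Hypothetical stand-in: a plain mapping of arbitrary values."""


@dataclass
class StubPage:
    html: str
    meta: StubMeta = field(default_factory=StubMeta)

    def page_numbers_to_fetch(self) -> List[int]:
        # Fall back to a default when the caller supplied no limit.
        max_pages = self.meta.get("max_pages", 3)
        return list(range(2, max_pages + 1))


# The framework (or a test) decides what goes into the mapping.
default_page = StubPage(html="<html></html>")
limited_page = StubPage(html="<html></html>", meta=StubMeta({"max_pages": 5}))
```

The page object never needs to know where the values came from, which matches the point that passing them in is the framework's concern.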

@@ -109,11 +109,10 @@ Let's try an example wherein :class:`~.Meta` is able to control how
for page_num in range(2, max_pages + 1)
]
responses = await http_client.batch_requests(*requests)
pages = [self] + list(map(web_poet.WebPage, responses))
return [
product_url
for page in pages
for product_url in self.parse_product_urls(page)
for response in responses
for product_url in self.parse_product_urls(response)
]
@staticmethod
@@ -122,8 +121,8 @@ Let's try an example wherein :class:`~.Meta` is able to control how
return web_poet.Request(url=next_page_url)
@staticmethod
def parse_product_urls(page):
return page.css("#main .products a.link ::attr(href)").getall()
def parse_product_urls(response: web_poet.HttpResponse):
return response.css("#main .products a.link ::attr(href)").getall()
From the example above, we can see how :class:`~.Meta` is able to arbitrarily
limit the pagination behavior by passing an optional **max_pages** info. Take
11 changes: 0 additions & 11 deletions docs/api_reference.rst
@@ -6,19 +6,8 @@ Page Inputs
===========

.. automodule:: web_poet.page_inputs

.. autoclass:: ResponseData
:show-inheritance:
:members:
:undoc-members:
:inherited-members:
:no-special-members:

.. autoclass:: Meta
:show-inheritance:
:members:
:no-special-members:


Pages
=====
1 change: 1 addition & 0 deletions docs/conf.py
@@ -192,4 +192,5 @@
intersphinx_mapping = {
'python': ('https://docs.python.org/3', None, ),
'scrapy': ('https://docs.scrapy.org/en/latest', None, ),
'parsel': ('https://parsel.readthedocs.io/en/latest/', None, ),
}