Merge branch 'master' into additional-requests

Adjust the code in line with the refactoring of ResponseData into HttpResponse in this PR: #30
scrapinghub · Mar 28, 2022 · f769147
2 parents 396ab8e + 256a0c3
commit f769147

Showing 21 changed files with 692 additions and 200 deletions.
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -11,6 +11,13 @@ TBR
   ``web_poet.HttpClient``.
 * Introduced ``web_poet.Meta`` to pass arbitrary information
   inside a Page Object.
+* removed support for Python 3.6
+* added support for Python 3.10
+* Backward Incompatible Change:
+
+    * ``ResponseData`` is now ``HttpResponse``, which has new,
+      specific attribute types such as ``HttpResponseBody`` and
+      ``HttpResponseHeaders``.
 
 
 0.1.1 (2021-06-02)

diff --git a/docs/advanced/additional_requests.rst b/docs/advanced/additional_requests.rst
@@ -53,24 +53,18 @@ A simple ``GET`` request
             }
 
             # Simulates clicking on a button that says "View All Images"
-            response: web_poet.ResponseData = await self.http_client.get(
+            response: web_poet.HttpResponse = await self.http_client.get(
                 f"https://api.example.com/v2/images?id={item['product_id']}"
             )
-            page = web_poet.WebPage(response)
-
-            item["images"] = page.css(".product-images img::attr(src)").getall()
+            item["images"] = response.css(".product-images img::attr(src)").getall()
             return item
 
 There are a few things to take note in this example:
 
     * A ``GET`` request can be done via :class:`~.HttpClient`'s
       :meth:`~.HttpClient.get` method.
     * We're now using the ``async/await`` syntax.
-    * The response is of type :class:`~.ResponseData`.
-
-        * Though in order to use :meth:`~.ResponseShortcutsMixin.css`
-          `(and other shortcut methods)` we'll need to feed it into
-          :class:`~.WebPage`.
+    * The response is of type :class:`~.HttpResponse`.
 
 As the example suggests, we're performing an additional request that allows us
 to extract more images in a product page that might not otherwise be possible.
@@ -110,7 +104,7 @@ Thus, additional requests inside the Page Object is typically needed for it:
             }
 
             # Simulates "scrolling" through a carousel that loads related product items
-            response: web_poet.responseData = await self.http_client.post(
+            response: web_poet.HttpResponse = await self.http_client.post(
                 url="https://www.api.example.com/related-products/",
                 headers={
                     'Host': 'www.example.com',
@@ -123,15 +117,12 @@ Thus, additional requests inside the Page Object is typically needed for it:
                     }
                 ),
             )
-            second_page = web_poet.WebPage(response)
-
-            related_product_ids = self.parse_related_product_ids(second_page)
-            item["related_product_ids"] = related_product_ids
+            item["related_product_ids"] = self.parse_related_product_ids(response)
             return item
 
         @staticmethod
-        def parse_related_product_ids(page: web_poet.WebPage) -> List[str]:
-            return page.css("#main .related-products ::attr(product-id)").getall()
+        def parse_related_product_ids(response: web_poet.HttpResponse) -> List[str]:
+            return response.css("#main .related-products ::attr(product-id)").getall()
 
 Here's the key takeaway in this example:
 
@@ -171,12 +162,11 @@ Let's modify the example in the previous section to see how it can be done:
                 self.create_request(page_num=page_num)
                 for page_num in range(2, default_pagination_limit)
             ]
-            responses: List[web_poet.ResponseData] = await self.http_client.batch_requests(*requests)
-            pages = map(web_poet.WebPage, responses)
+            responses: List[web_poet.HttpResponse] = await self.http_client.batch_requests(*requests)
             related_product_ids = [
                 product_id
-                for page in pages
-                for product_id in self.parse_related_product_ids(page)
+                for response in responses
+                for product_id in self.parse_related_product_ids(response)
             ]
 
             item["related_product_ids"].extend(related_product_ids)
@@ -200,17 +190,17 @@ Let's modify the example in the previous section to see how it can be done:
             )
 
         @staticmethod
-        def parse_related_product_ids(page: web_poet.WebPage) -> List[str]:
-            return page.css("#main .related-products ::attr(product-id)").getall()
+        def parse_related_product_ids(response: web_poet.HttpResponse) -> List[str]:
+            return response.css("#main .related-products ::attr(product-id)").getall()
 
 The key takeaways for this example are:
 
     * A :class:`~.Request` can be instantiated to represent a Generic HTTP Request.
       It only contains the HTTP Request information for now and isn't executed yet.
       This is useful for creating factory methods to help create them without any
       download execution at all.
-    * :class:`~.HttpClient`' has a :meth:`~.HttpClient.batch_requests` method that
-      can process a series of :class:`~.Request` instances.
+    * :class:`~.HttpClient` has a :meth:`~.HttpClient.batch_requests` method that
+      can process a list of :class:`~.Request` instances asynchronously together.
 
         * Note that it can accept different types of :class:`~.Request` that might
           not be related *(e.g. a mixture of* ``GET`` *and* ``POST`` *requests)*.
@@ -246,7 +236,7 @@ This can be set using:
 
 .. code-block:: python
 
-    def request_implementation(r: web_poet.Request) -> web_poet.ResponseData:
+    def request_implementation(r: web_poet.Request) -> web_poet.HttpResponse:
         ...
 
     from web_poet import request_backend_var
@@ -273,7 +263,7 @@ an :class:`~.HttpClient` instance:
 
 .. code-block:: python
 
-    def request_implementation(r: web_poet.Request) -> web_poet.ResponseData:
+    def request_implementation(r: web_poet.Request) -> web_poet.HttpResponse:
         ...
 
     from web_poet import HttpClient
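
Reviewer note on the ``batch_requests`` wording changed in the hunks above: the idea of building request objects first and only executing them later, as one batch, can be pictured with a small standard-library sketch. This is only an illustration under stated assumptions, not web-poet's implementation. ``FakeRequest``, ``FakeResponse``, ``fake_download`` and ``run_demo`` are hypothetical names invented here, and ``asyncio.gather`` is an assumed way to run the downloads concurrently.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRequest:
    # Hypothetical stand-in for web_poet.Request: it only carries request data.
    url: str
    method: str = "GET"


@dataclass
class FakeResponse:
    # Hypothetical stand-in for web_poet.HttpResponse.
    url: str
    body: bytes


async def fake_download(request: FakeRequest) -> FakeResponse:
    # Pretend to do network I/O; a real backend would perform the HTTP call here.
    await asyncio.sleep(0.01)
    body = f"{request.method} {request.url}".encode()
    return FakeResponse(url=request.url, body=body)


async def batch_requests(*requests: FakeRequest) -> List[FakeResponse]:
    # Run every download concurrently; gather() returns results in input order.
    return list(await asyncio.gather(*(fake_download(r) for r in requests)))


def run_demo() -> List[FakeResponse]:
    # Mixing GET and POST in one batch, as the documentation text allows.
    requests = [
        FakeRequest("https://api.example.com/related-products/?page=2"),
        FakeRequest("https://api.example.com/related-products/?page=3", method="POST"),
    ]
    return asyncio.run(batch_requests(*requests))
```

Because the request objects are plain data until the batch is awaited, factory methods such as ``create_request`` in the documented example can build them without triggering any download.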

diff --git a/docs/advanced/meta.rst b/docs/advanced/meta.rst
@@ -10,11 +10,11 @@ data entirely depending on the needs of the developer.
 
 If you can recall from the previous basic tutorials, one essential requirement of
 Page Objects that inherit from :class:`~.WebPage` or :class:`~.ItemWebPage` would
-be :class:`~.ResponseData`. This holds the HTTP response information that the
+be :class:`~.HttpResponse`. This holds the HTTP response information that the
 Page Object is trying to represent.
 
 In order to standardize how to pass arbitrary information inside Page Objects,
-we'll need to use :class:`~.Meta` similar on how we use :class:`~.ResponseData`
+we'll need to use :class:`~.Meta` similar on how we use :class:`~.HttpResponse`
 as a requirement to instantiate Page Objects:
 
 .. code-block:: python
@@ -24,15 +24,15 @@ as a requirement to instantiate Page Objects:
 
     @attr.define
     class SomePage(web_poet.ItemWebPage):
-        # ResponseData is inherited from ItemWebPage
+        # The HttpResponse attribute is inherited from ItemWebPage
         meta: web_poet.Meta
 
-    response = web_poet.ResponseData(...)
+    response = web_poet.HttpResponse(...)
     meta = web_poet.Meta({"arbitrary_value": 1234, "cool": True})
 
     page = SomePage(response=response, meta=meta)
 
-However, similar with :class:`~.ResponseData`, developers using :class:`~.Meta`
+However, similar with :class:`~.HttpResponse`, developers using :class:`~.Meta`
 shouldn't care about how they are being passed into Page Objects. This will
 depend on the framework that would use **web-poet**.
 
@@ -109,11 +109,10 @@ Let's try an example wherein :class:`~.Meta` is able to control how
                 for page_num in range(2, max_pages + 1)
             ]
             responses = await http_client.batch_requests(*requests)
-            pages = [self] + list(map(web_poet.WebPage, responses))
             return [
                 product_url
-                for page in pages
-                for product_url in self.parse_product_urls(page)
+                for response in responses
+                for product_url in self.parse_product_urls(response)
             ]
 
         @staticmethod
@@ -122,8 +121,8 @@ Let's try an example wherein :class:`~.Meta` is able to control how
             return web_poet.Request(url=next_page_url)
 
         @staticmethod
-        def parse_product_urls(page):
-            return page.css("#main .products a.link ::attr(href)").getall()
+        def parse_product_urls(response: web_poet.HttpResponse):
+            return response.css("#main .products a.link ::attr(href)").getall()
 
 From the example above, we can see how :class:`~.Meta` is able to arbitrarily
 limit the pagination behavior by passing an optional **max_pages** info. Take

diff --git a/docs/api_reference.rst b/docs/api_reference.rst
@@ -6,19 +6,8 @@ Page Inputs
 ===========
 
 .. automodule:: web_poet.page_inputs
-
-.. autoclass:: ResponseData
-   :show-inheritance:
    :members:
    :undoc-members:
-   :inherited-members:
-   :no-special-members:
-
-.. autoclass:: Meta
-   :show-inheritance:
-   :members:
-   :no-special-members:
-
 
 Pages
 =====

diff --git a/docs/conf.py b/docs/conf.py
@@ -192,4 +192,5 @@
 intersphinx_mapping = {
     'python': ('https://docs.python.org/3', None, ),
     'scrapy': ('https://docs.scrapy.org/en/latest', None, ),
+    'parsel': ('https://parsel.readthedocs.io/en/latest/', None, ),
 }