Suggested upgrade to the SerpApiClient.get_html() method #69

gutoarraes · 2024-06-14T05:19:37Z

My suggestion to fix open issue #66.

Since the object returned from requests.get isn't necessarily HTML, I approached this by accessing the JSON returned from get_dictionary and performing another requests.get on the raw_html_file.

Also added an assertion in test_google_search.py to confirm there is a <html> tag present in the returned data.
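A rough sketch of this first approach, for readers following along. It is not the code from the PR: the function name `html_from_search_json` and the injected `fetch` callable are hypothetical, and the `search_metadata` key is an assumption about where `raw_html_file` sits in the JSON.

```python
from typing import Callable, Optional


def html_from_search_json(result: dict, fetch: Callable[[str], str]) -> Optional[str]:
    """Return the archived HTML linked from a search result dict, or None.

    `result` stands in for what get_dictionary() returns. The nesting under
    "search_metadata" is an assumption; only "raw_html_file" is named in the PR.
    `fetch` is a hypothetical callable that downloads a URL and returns its body.
    """
    url = result.get("search_metadata", {}).get("raw_html_file")
    if url is None:
        # No archive link in the JSON, so there is nothing to download.
        return None
    return fetch(url)
```

In real use `fetch` could be something like `lambda url: requests.get(url).text`. Passing it in keeps the sketch testable without network access. The cost of this approach is two HTTP round trips per call.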


gutoarraes · 2024-06-15T00:38:14Z

Current get_html method returns:

Result with suggested changes:

I'm not sure if the <html> tag is a definitive requirement for all HTML pages. Therefore, the assertion I added to tests/test_google_search.py may not be appropriate. I'd appreciate it if anyone could confirm or deny this.
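On the question above: the HTML standard allows the `<html>` start tag to be omitted in some documents, so a strict check for that exact tag can reject valid HTML. A looser check is one option. The sketch below is only an illustration; the helper name `looks_like_html` and the choice of markers are assumptions, not what the test suite uses.

```python
import re

# Markers that commonly appear near the top of an HTML document.
# Matching any one of them is a heuristic, not a validity check.
_HTML_MARKERS = re.compile(
    r"<!doctype\s+html|<html[\s>]|<head[\s>]|<body[\s>]",
    re.IGNORECASE,
)


def looks_like_html(text: str) -> bool:
    """Return True if `text` contains at least one typical HTML marker."""
    return bool(_HTML_MARKERS.search(text))
```

A test could then assert `looks_like_html(data)` instead of looking for the literal `<html>` string. This also accepts upper-case tags and tags carrying attributes.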

hartator · 2024-06-17T18:54:56Z

@gutoarraes Sorry about that!

You should be able to grab the HTML directly from just the ID. No need to download the JSON first. The URL should be always structured like this: https://serpapi.com/searches/5b50d58a304bda2fca30bac9.json?api_key=f0776d63edb8dd607154f126eb2c07049ef4bc06caf6f621d5065be2458a607d.

Do you want to update your PR in this way or should we do it? Thank you!

gutoarraes · 2024-06-18T01:31:54Z

Hey @hartator, thanks for clarifying! If you wouldn't mind I'd like to give it a crack.

hartator · 2024-06-18T03:52:49Z

Cool, thank you!

Meant https://serpapi.com/searches/5b50d58a304bda2fca30bac9.html?api_key=f0776d63edb8dd607154f126eb2c07049ef4bc06caf6f621d5065be2458a607d instead of https://serpapi.com/searches/5b50d58a304bda2fca30bac9.json?api_key=f0776d63edb8dd607154f126eb2c07049ef4bc06caf6f621d5065be2458a607d in my earlier comment of course.
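The URL shape described here can be sketched as a small helper. The function name `search_archive_url` and its `fmt` parameter are hypothetical. Only the `https://serpapi.com/searches/<id>.<ext>?api_key=<key>` pattern comes from the comments above.

```python
from urllib.parse import quote, urlencode


def search_archive_url(search_id: str, api_key: str, fmt: str = "html") -> str:
    """Build the archive URL for a past search from its ID.

    Follows the pattern quoted in this thread. `fmt` selects the extension;
    "html" and "json" are the two forms mentioned here.
    """
    if fmt not in ("html", "json"):
        raise ValueError(f"unsupported format: {fmt!r}")
    # Escape the ID and the key so unexpected characters cannot alter the URL.
    path = f"/searches/{quote(search_id, safe='')}.{fmt}"
    return f"https://serpapi.com{path}?{urlencode({'api_key': api_key})}"
```

For example, `search_archive_url("SEARCH_ID", "API_KEY")` yields the `.html` form, so the HTML can be fetched with a single request and no JSON download.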

gutoarraes · 2024-06-18T12:28:19Z

I was cracking my head a bit about this, only to realize I had already changed the PR but hadn't updated the description here. I removed the JSON lookup and simply appended .html to the path, which yields the same result.

Let me know if there is a better approach, but I believe this may be what you're talking about.

ilyazub

@gutoarraes Thank you for your help!

Let's fix the test for get_html.

serpapi/serp_api_client.py

tests/test_google_search.py

gutoarraes

Thank you @ilyazub for the suggestions. I've applied everything and it's now ready for review.

Will start looking for my next contribution. Cheers!

ilyazub

Works and looks good. Thank you, @gutoarraes.

gutoarraes added 4 commits June 14, 2024 00:58

get_html() returns html of page

59d86b2

added assertion to get_html() test

fb5659d

added assertion to get_html() test

299d53b

Merge branch 'augusto-get-html' of https://github.com/gutoarraes/goog…

240dfa1

…le-search-results-python into augusto-get-html

gutoarraes marked this pull request as ready for review June 15, 2024 00:38

ilyazub reviewed Jun 18, 2024


serpapi/serp_api_client.py (outdated, resolved)

tests/test_google_search.py (outdated, resolved)

tests/test_google_search.py (resolved)

gutoarraes added 2 commits June 18, 2024 19:07

adds html as output in the params_dict of get_html

3db723f

broaden get_html test

6816b81

gutoarraes force-pushed the augusto-get-html branch from 643c415 to 6816b81 Compare June 18, 2024 23:19

gutoarraes commented Jun 18, 2024


ilyazub approved these changes Jun 19, 2024


ilyazub merged commit 264be6d into serpapi:master Jun 19, 2024

gutoarraes deleted the augusto-get-html branch June 20, 2024 00:39

ilyazub mentioned this pull request Jun 28, 2024

get_html() Returns JSON Instead of HTML #66

Closed
