Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested upgrade to the SerpApiClient.get_html() method #69

Merged
merged 6 commits into from
Jun 19, 2024

Conversation

gutoarraes
Copy link
Contributor

My suggestion to fix the open issue 66.

Since the returned object from requests.get isn't necessarily an HTML, I approached this by accessing the JSON returned from get_dictionary and performing another requests.get on the raw_html_file.

Also added an assertion in test_google_search.py to confirm there is a <html> tag present in the returned data.

@gutoarraes
Copy link
Contributor Author

app

  • Current get_html method returns:

before

  • Result with suggested changes:

after

I'm not sure if the <html> tag is a definitive requirement for all HTML pages. Therefore, the assertion I added to tests/test_google_search.py may not be appropriate. Appreciate if anyone can confirm/deny.

@gutoarraes gutoarraes marked this pull request as ready for review June 15, 2024 00:38
@hartator
Copy link
Collaborator

@gutoarraes Sorry about that!

You should be able to grad directly the HTML from just the ID. No need to download the JSON first. The URL should be always structured like this: https://serpapi.com/searches/5b50d58a304bda2fca30bac9.json?api_key=f0776d63edb8dd607154f126eb2c07049ef4bc06caf6f621d5065be2458a607d.

Do you want to update your PR in this way or should we do it? Thank you!

@gutoarraes
Copy link
Contributor Author

Hey @hartator, thanks for clarifying! If you wouldn't mind I'd like to give it a crack.

@gutoarraes
Copy link
Contributor Author

I was cracking my head a bit about this just to realize I had already changed the PR but didn't update the description here. I removed the accessing of the JSON and simply appendended .html to the path which yields the same result.

Let me know if there is a better approach, but I believe this may be what you're talking about.

Copy link
Contributor

@ilyazub ilyazub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gutoarraes Thank you for your help!

Let's fix the test for get_html.

serpapi/serp_api_client.py Outdated Show resolved Hide resolved
tests/test_google_search.py Outdated Show resolved Hide resolved
tests/test_google_search.py Show resolved Hide resolved
Copy link
Contributor Author

@gutoarraes gutoarraes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ilyazub for the suggestions. I've applied everything and it's now ready for review.

Will start looking for the my next contribution. Cheers!

Copy link
Contributor

@ilyazub ilyazub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works and looks good. Thank you, @gutoarraes.

@ilyazub ilyazub merged commit 264be6d into serpapi:master Jun 19, 2024
@gutoarraes gutoarraes deleted the augusto-get-html branch June 20, 2024 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants