This repository has been archived by the owner on Sep 10, 2021. It is now read-only.

Qri console not sure if the object is NoneType or SoupNode #518

Open
TheWorldEndsWithUs opened this issue Jul 4, 2019 · 3 comments

@TheWorldEndsWithUs

Platform
How were you accessing the qri frontend?
[ ] app.qri.io
[ ] electron app
[x] dev webapp

Version
What version of the Qri frontend are you using?
0.8.2
Describe the bug
I'm not sure whether this is a bug or I'm just confused. Based on the console output, the same object is reported as both a NoneType and a SoupNode.
To Reproduce
Steps to reproduce the behavior:
I posted a GIF below to help shed some light on the issue. If I change the method called on a soup node from body() to contents(), the compiler says one object is a NoneType and the other is a SoupNode.
Expected behavior
Both items should have the same type.

Screenshots
[GIF: Jul 4 2019 4_03 PM - Edited]


@dustmop
Contributor

dustmop commented Jul 8, 2019

It's hard to say exactly what is going on without knowing the URL being scraped and the content it returns. Given the class names "_e296pg", "_qgfkoz", and "_dwmetq", it's possible that the names are generated anew for each response, which means "_dwmetq" will sometimes exist in the page and sometimes won't. In the former case the result will have the type SoupNode, whereas in the latter case it will be None.

https://godoc.org/github.com/qri-io/starlib/bsoup
The correct method is "contents", not "body", and it returns the list of children.
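The None-versus-SoupNode behaviour described above can be reproduced outside Qri. The sketch below is plain Python using the standard-library `html.parser`, not the bsoup API; `find_by_class`, `ClassFinder`, and the sample markup (including the class "_x1y2z3") are hypothetical. The lookup returns a tag name when the class is present and None when it is not, so the caller has to check for None before going further (in a Qri script, that check would presumably go before calling contents()).

```python
from html.parser import HTMLParser
from typing import Optional


class ClassFinder(HTMLParser):
    """Record the tag name of the first element carrying a given CSS class (hypothetical helper)."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.found: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.found is not None:
            return  # only the first match matters
        # attrs is a list of (name, value) pairs; value can be None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = tag


def find_by_class(html: str, class_name: str) -> Optional[str]:
    """Return the tag name of the first element with class_name, or None if absent."""
    parser = ClassFinder(class_name)
    parser.feed(html)
    parser.close()
    return parser.found


# Two hypothetical responses: the second uses a regenerated class name.
pages = [
    '<div class="_e296pg"><span class="_dwmetq">Science</span></div>',
    '<div class="_e296pg"><span class="_x1y2z3">Science</span></div>',
]
for page in pages:
    node = find_by_class(page, "_dwmetq")
    if node is None:
        print("class not present in this response")
    else:
        print("found <%s>" % node)
```

Guarding on None makes the "sometimes SoupNode, sometimes None" case explicit instead of letting a later method call fail on a missing node.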

@dustmop
Contributor

dustmop commented Jul 8, 2019

I just saw that additional information had been posted in our Discord, so I'm adding it here so that the issue has the full context. The script being used is at https://pastebin.com/VrqQ7Pwt, and the URL in question is https://www.khanacademy.org/science. From some testing, the server is returning different responses, but they all seem to use the "_dwmetq" class in the HTML. It's possible that, after seeing many requests during your script development, the server began sending back rate-limited responses with different bodies.

We need better auditing tools in our HTTP library, such as a way to capture and reuse HTTP responses, or at least a way to tell when a server response has changed significantly.
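One hedged reading of the "capture and reuse responses / tell when a response has changed significantly" idea, written as plain Python rather than against Qri's HTTP library. `ResponseAudit`, its 0.9 similarity threshold, and the use of `difflib.SequenceMatcher` are assumptions for illustration, not anything Qri provides. It keeps the last body seen per URL so it can be replayed, and flags a new body whose similarity to the previous one drops below the threshold, such as a rate-limit page replacing the real HTML.

```python
import difflib
from typing import Dict


class ResponseAudit:
    """Remember the last body seen per URL and report significant changes (hypothetical helper)."""

    def __init__(self, threshold: float = 0.9) -> None:
        # Similarity ratio below which a change counts as "significant" (assumed value).
        self.threshold = threshold
        self.bodies: Dict[str, str] = {}

    def record(self, url: str, body: str) -> bool:
        """Store body for url; return True if it differs significantly from the previous body."""
        previous = self.bodies.get(url)
        self.bodies[url] = body
        if previous is None or previous == body:
            return False  # first capture, or identical response
        ratio = difflib.SequenceMatcher(None, previous, body).ratio()
        return ratio < self.threshold

    def replay(self, url: str) -> str:
        """Return the last captured body so a script can be rerun without hitting the server."""
        return self.bodies[url]
```

Replaying a captured body during script development would also cut down on the repeated requests that may have triggered the rate limiting in the first place.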

@TheWorldEndsWithUs
Author

I figured out it was my error. As you stated above, the page was inconsistent throughout.
