Converter Fails to Fetch Answer #21

Open
jggouvea opened this issue Oct 14, 2020 · 3 comments

@jggouvea

$ ../software-git/quora-backup/converter.py answers-en answers-en-ready
Found 2503 answers
Filename: 2015-01-18 What-are-some-of-the-worst-baby-names.html
Traceback (most recent call last):
  File "../software-git/quora-backup/converter.py", line 216, in
    print('[WARNING] Failed to locate answer on page (Source URL was %s)' % url, file=sys.stderr)
NameError: name 'url' is not defined

@t3nsor
Owner

t3nsor commented Oct 30, 2020

The crash bug should be fixed by 1032cbe.
If you want me to look into why it failed to locate the answer, you will need to send me the HTML file.
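
For anyone reading the traceback above: the NameError means the warning path referenced `url` before that name was ever bound. The following is only a minimal sketch of that failure mode and one way to guard against it. It is not the code from converter.py, nor the change made in 1032cbe. The helpers `find_source_url`, `find_answer`, `missing_answer_warning` and `convert`, and the HTML patterns they match, are hypothetical.

```python
import re
import sys


def find_source_url(page_html):
    """Hypothetical helper: return a canonical link if present, else None."""
    match = re.search(r'<link rel="canonical" href="([^"]+)"', page_html)
    return match.group(1) if match else None


def find_answer(page_html):
    """Hypothetical helper: return the answer body if present, else None."""
    match = re.search(r'<div class="answer">(.*?)</div>', page_html, re.DOTALL)
    return match.group(1) if match else None


def missing_answer_warning(url):
    """Format the warning without assuming a source URL was ever found."""
    shown = url if url is not None else 'unknown'
    return '[WARNING] Failed to locate answer on page (Source URL was %s)' % shown


def convert(page_html):
    # Bind `url` before any path that may report an error, so the warning
    # below can never raise NameError.
    url = find_source_url(page_html)
    answer = find_answer(page_html)
    if answer is None:
        print(missing_answer_warning(url), file=sys.stderr)
        return None
    return answer
```

With this shape, a page with neither a URL nor an answer produces a warning on stderr and is skipped, so the remaining files can still be processed.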

@InvincibleJuggernaut

It seems that didn't fix the problem. The HTML files generated by the crawler contain only the first few lines of each answer.
I have attached one of the HTML files below. (GitHub doesn't seem to support the .html format, so I have attached a .docx file containing the HTML code.)
html.docx

@t3nsor
Owner

t3nsor commented Dec 29, 2020

It looks like Quora has changed their page format, so now the answer content is initially loaded in a structured format but JavaScript is required to actually render it as HTML. So the converter in its current form will not work.
I will think about how to address this. I am going to get a copy of my answer archive using the GDPR tool and then see whether there is still a need for the converter.
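
One possible direction, sketched here under loud assumptions: if the structured content is embedded in the page as JSON inside a `<script>` element, it could be pulled out without running any JavaScript. The thread does not say how Quora actually embeds the data, so the `application/json` script type, the class `ScriptJSONCollector` and the function `extract_embedded_json` are all hypothetical. Only the standard library `html.parser` and `json` modules are used.

```python
import json
from html.parser import HTMLParser


class ScriptJSONCollector(HTMLParser):
    """Collect parsed JSON from <script type="application/json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the structured data lives in a JSON-typed script tag.
        if tag == 'script' and dict(attrs).get('type') == 'application/json':
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'script' and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads(''.join(self._buffer)))
            except ValueError:
                pass  # the script body was not valid JSON; skip it


def extract_embedded_json(page_html):
    """Return a list of every JSON payload found in the page."""
    parser = ScriptJSONCollector()
    parser.feed(page_html)
    parser.close()
    return parser.payloads
```

Turning such a payload back into readable HTML would still require knowing its schema, which is not described in this thread.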
