Converter Fails to Fetch Answer #21

jggouvea · 2020-10-14T02:15:40Z

$ ../software-git/quora-backup/converter.py answers-en answers-en-ready
Found 2503 answers
Filename: 2015-01-18 What-are-some-of-the-worst-baby-names.html
Traceback (most recent call last):
File "../software-git/quora-backup/converter.py", line 216, in <module>
print('[WARNING] Failed to locate answer on page (Source URL was %s)' % url, file=sys.stderr)
NameError: name 'url' is not defined

t3nsor · 2020-10-30T00:50:45Z

The crash bug should be fixed by 1032cbe.
If you want me to look into why it failed to locate the answer, you will need to send me the HTML file.
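For readers hitting the same traceback: a `NameError` like this usually means the warning path refers to a variable that was never assigned on that code path. The sketch below is not the change made in 1032cbe; it only illustrates one defensive way to build such a warning. The names `describe_failure` and `report_failure` are hypothetical and do not come from converter.py.

```python
import sys


def describe_failure(filename, url=None):
    """Build the warning text without assuming a source URL is known.

    `url` defaults to None so the message can still be produced when the
    caller never found a source URL for the file (hypothetical helper).
    """
    source = url if url is not None else "<unknown>"
    return "[WARNING] Failed to locate answer in %s (Source URL was %s)" % (
        filename,
        source,
    )


def report_failure(filename, url=None):
    """Print the warning to stderr instead of crashing."""
    print(describe_failure(filename, url), file=sys.stderr)
```

With this shape, a file whose source URL could not be determined produces a warning and the run continues with the next file.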

InvincibleJuggernaut · 2020-11-18T08:13:14Z

That doesn't seem to have fixed the problem. In fact, the HTML files generated by the crawler contain only the first few lines of each article.
I have attached the HTML file below. (GitHub doesn't seem to support the .html format, so I have attached a .docx file containing the HTML code.)
html.docx

t3nsor · 2020-12-29T21:30:17Z

It looks like Quora has changed their page format, so now the answer content is initially loaded in a structured format but JavaScript is required to actually render it as HTML. So the converter in its current form will not work.
I will think about how to address this. I am going to get a copy of my answer archive using the GDPR tool and then see whether there is still a need for the converter.
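As a rough illustration of what "structured format" implies for a converter, the sketch below turns a JSON description of an answer into plain HTML paragraphs without running any JavaScript. The schema used here (a top-level `sections` list whose items hold `spans` with a `text` field) is an assumption made up for the example, not Quora's documented format, and `render_structured_answer` is a hypothetical helper rather than part of quora-backup.

```python
import html
import json


def render_structured_answer(raw_json):
    """Render a structured answer as HTML paragraphs.

    Assumed (hypothetical) schema:
    {"sections": [{"spans": [{"text": "..."}, ...]}, ...]}
    """
    doc = json.loads(raw_json)
    paragraphs = []
    for section in doc.get("sections", []):
        # Join the text of every span in the section into one paragraph.
        text = "".join(span.get("text", "") for span in section.get("spans", []))
        # Escape the text so that it is safe to embed in HTML.
        paragraphs.append("<p>%s</p>" % html.escape(text))
    return "\n".join(paragraphs)
```

A real converter would also need to locate the structured payload inside the saved page and handle formatting such as links, lists, and images, which this sketch ignores.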
