Segmentation fault accessing attributes #135

Open

wooque opened this issue Oct 21, 2024 · 1 comment

wooque commented Oct 21, 2024

Here is a script that reproduces the crash:

import urllib.request

import selectolax

with urllib.request.urlopen(
    "https://rhodes-ltd-339.myshopify.com"
) as response:
    data = response.read()

html = data.decode("utf-8")
parser = selectolax.lexbor.LexborHTMLParser(html)

for elem in parser.head.iter():
    print("tag", elem.tag)
    print("attributes", elem.attributes)

print("done")

It crashes with a segmentation fault when trying to access the attributes of the third comment.
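To confirm a crash like this without taking down the calling process, the reproduction can be run in a child interpreter. This is only a minimal sketch, assuming a POSIX system where a child killed by a signal reports a negative return code; `run_isolated` and `died_from_segfault` are hypothetical helper names, not part of selectolax.

```python
import signal
import subprocess
import sys


def run_isolated(source: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run Python source in a fresh interpreter so a crash stays in the child."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def died_from_segfault(result: subprocess.CompletedProcess) -> bool:
    """Report whether the child was killed by SIGSEGV.

    Assumption: POSIX semantics, where a child killed by signal N
    has returncode == -N.
    """
    return result.returncode == -signal.SIGSEGV
```

Passing the reproduction script as `source` and checking `died_from_segfault(...)` should distinguish a segmentation fault from an ordinary Python exception, which would give a positive return code and a traceback in `stderr`.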

mxnurmi commented Nov 7, 2024 (edited)

Commenting to add another case where the lexbor backend causes a segmentation fault but the modest backend works:

Causes segmentation fault:

import selectolax
parser = selectolax.lexbor.LexborHTMLParser("")
for node in parser.root.traverse():
    parent = node.parent.attributes.get("anything")

print("done")

Works as expected:

import selectolax
parser = selectolax.parser.HTMLParser("")
for node in parser.root.traverse():
    parent = node.parent.attributes.get("anything")

print("done")

In lexbor, the issue seems to be that when the parser generates HTML elements that are missing from the input, the parents of those generated elements do not have .attributes in some cases.
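Until this is fixed, one possible workaround is to read `.attributes` only on nodes that look like real elements. The sketch below is untested against the crashing inputs and rests on an assumption: that non-element nodes (text, comments, the document) report a `tag` starting with `-`, such as `-comment`. `safe_attributes` is a hypothetical helper, not part of selectolax.

```python
from typing import Any, Dict, Optional


def safe_attributes(node: Optional[Any]) -> Dict[str, Optional[str]]:
    """Return a copy of node.attributes, or {} for nodes that are not elements.

    Assumption: non-element nodes expose a tag that starts with "-"
    (for example "-text" or "-comment"), so .attributes is never
    touched on them.
    """
    if node is None:
        return {}
    tag = getattr(node, "tag", None)
    if not tag or tag.startswith("-"):
        return {}
    return dict(node.attributes)
```

In the loop above, this would replace `node.parent.attributes.get("anything")` with `safe_attributes(node.parent).get("anything")`.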
