Segmentation fault accessing attributes #135

Open
wooque opened this issue Oct 21, 2024 · 1 comment
wooque commented Oct 21, 2024

Here is a script that reproduces the crash:

import urllib.request

import selectolax

with urllib.request.urlopen(
    "https://rhodes-ltd-339.myshopify.com"
) as response:
    data = response.read()

html = data.decode("utf-8")
parser = selectolax.lexbor.LexborHTMLParser(html)

for elem in parser.head.iter():
    print("tag", elem.tag)
    print("attributes", elem.attributes)

print("done")

It crashes with a segmentation fault when accessing the attributes of the third comment in the head.

mxnurmi commented Nov 7, 2024

Here is another case where the Lexbor parser causes a segmentation fault but the Modest parser works:

Causes segmentation fault:

import selectolax
parser = selectolax.lexbor.LexborHTMLParser("")
for node in parser.root.traverse():
    parent = node.parent.attributes.get("anything")

print("done")

Works as expected:

import selectolax
parser = selectolax.parser.HTMLParser("")
for node in parser.root.traverse():
    parent = node.parent.attributes.get("anything")

print("done")

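With Lexbor, the issue seems to be that when the parser generates HTML elements on its own (the input here is an empty string), the parents of those generated elements do not have .attributes in some cases, and accessing it crashes the interpreter.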
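Since a segmentation fault kills the interpreter and cannot be caught with try/except, it can help to run the snippet in a child process and inspect its exit status. The following is only a sketch using the standard library; the helper names run_isolated and segfaulted are made up for illustration, and the exit-status check assumes a POSIX system, where a child killed by a signal reports the negated signal number.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter so a crash cannot take down the caller."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def segfaulted(result: subprocess.CompletedProcess) -> bool:
    """POSIX assumption: a child killed by SIGSEGV has returncode == -SIGSEGV."""
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # The Lexbor snippet from this comment, unchanged.
    snippet = '''
import selectolax
parser = selectolax.lexbor.LexborHTMLParser("")
for node in parser.root.traverse():
    parent = node.parent.attributes.get("anything")
print("done")
'''
    result = run_isolated(snippet)
    print("return code:", result.returncode)
    print("segfault:", segfaulted(result))
```

The same helper can be pointed at the Modest variant to compare the two backends without risking the calling process.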
