bsoup needs lots of improvements #155

Open
dustmop opened this issue Nov 30, 2021 · 1 comment

dustmop commented Nov 30, 2021

bsoup was written a long time ago, based mainly on the reference Python implementation, without much regard for how easy it would be for developers to use. It was also built when Starlark was much younger and still lacked key features such as arbitrary attribute support.

Some things that should be fixed:

  • Printing nodes should work. Perhaps they could display as an expandable tree.
  • contents() returns weird results.
  • get_text() is not recursive.
  • There is no method to get a node's tag name.
  • parent.div should work, returning a div child node of parent. child() would then be unnecessary.
  • Rename parseHtml to bsoup().

Also the docs need lots of work.
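
As a rough illustration of the behaviour the list above asks for, here is a minimal Python sketch. It is not bsoup's code or API: the Node class and the get_text_shallow and pretty methods are hypothetical names made up for this example, and the attribute lookup only searches direct children, which is an assumption about what parent.div should mean.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical element node for illustration; not the bsoup implementation."""

    name: str  # tag name, e.g. "div"
    children: List[Union["Node", str]] = field(default_factory=list)

    def get_text(self) -> str:
        # Recurse into child elements so text nested at any depth is included.
        parts = []
        for child in self.children:
            parts.append(child if isinstance(child, str) else child.get_text())
        return "".join(parts)

    def get_text_shallow(self) -> str:
        # Only direct string children: the non-recursive behaviour.
        return "".join(c for c in self.children if isinstance(c, str))

    def __getattr__(self, tag: str) -> "Node":
        # Called only when normal attribute lookup fails, so parent.div
        # resolves to the first direct child element named "div".
        if tag.startswith("_"):
            raise AttributeError(tag)
        for child in self.__dict__.get("children", []):
            if isinstance(child, Node) and child.name == tag:
                return child
        raise AttributeError(f"<{self.name}> has no child <{tag}>")

    def pretty(self, depth: int = 0) -> str:
        # Render the subtree as an indented outline, one node per line.
        pad = "  " * depth
        lines = [f"{pad}<{self.name}>"]
        for child in self.children:
            if isinstance(child, str):
                lines.append(f"{pad}  {child!r}")
            else:
                lines.append(child.pretty(depth + 1))
        return "\n".join(lines)


# A tiny page whose text lives two levels below the root.
page = Node("html", [Node("body", [Node("div", ["Hello, ", Node("b", ["world"])])])])
```

With this tree, page.get_text() returns "Hello, world", while page.get_text_shallow() returns an empty string because the root has no direct text children. page.body.div.name gives "div", and print(page.pretty()) shows the nesting as an indented outline.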

@dustmop dustmop self-assigned this Nov 30, 2021
@GeoffBarrett

This issue is old, but I am also seeing problems with get_text(). When I call it on the entire page contents, it returns an empty string, presumably because of the lack of recursion described in this issue.
