The futex facility returned an unexpected error code #15
Rdrview is failing to handle pages that use the Windows-1252 encoding: #15. Just like GB2312, this encoding is only supported via iconv, not directly by libxml2. Since iconv conversions need to read files from disk, initialize the conversion descriptor before setting up the sandbox.
The problem is indeed with the encoding of the document. To handle a new
encoding, iconv needs to read files from disk, but this is forbidden by the
security sandbox. My crude solution is to just pre-read the files for any
encodings we might need, before setting up the sandbox. I've added cp1252 to
that list now, which should fix your problem.
The previous patch added support for the code page 1252 encoding. Include a test for it in our check script, using the page provided by the reporter in #15.
Got it.
It did. The error message is gone. Funny thing: I've been using these "readability" scripts written in Python and other scripting languages for years, and I had never thought about the security implications of parsing possibly malicious "foreign" HTML that gets "stuffed" into my system. Now I remember there's even a page in the ArchWiki on how to run Firefox "sandboxed" in Firejail. The only other command-line program that I feed loads of external HTML to is Pandoc, but it's written in Haskell and, so I hear, should be secure. So, as an end user rather than a developer, I'm now beginning to understand better why a "simple" C program parsing "foreign" HTML has to be sandboxed somehow to shield the system from an attack. It makes sense.
Who knows what could be put in an HTML page. This is all "obvious" to you, of course, but not to an end user. Could you add just one or two lines to the README about these security implications, so that end users can somehow understand where and why these security-related error messages come from? I did notice the "security" section,
but took it for a message for programmers. And what is this "security sandbox" anyway? Is it a feature of the kernel itself? Is it some tightening/tuning made by my Linux distro? Or some big library, like glib? PS: I found https://en.wikipedia.org/wiki/Seccomp, which you could add to the README.
Real browsers have their own sandboxes like rdrview, but far more complicated. This means that, once in a while, a security researcher will find a way to bypass them. You can set up your system to sandbox them further if you want, but it isn't always practical. Rdrview's sandbox is very simple and tight, so I don't think it's likely to be bypassable, unless there are bugs in the kernel.
A sandbox of some sort is a good idea for code written in any language. It means that you only need to audit a small fraction of my code to confirm that it's not doing anything stupid or malicious; without a sandbox you would have to read the whole thing. C and C++ just have the additional issue of memory bugs, but I'm not sure if the risk is big in practice, for something like rdrview.
For rdrview, yes, but it doesn't hurt to be careful. For high value targets with huge codebases like Firefox or Chrome, the risk is much bigger.
You can't assume that in general: security is usually in the hands of the developer of the program. But it's rare to parse html in C these days. I think some distros are moving towards doing some sandboxing themselves (via AppArmor or the like) but we aren't there yet.
I guess I assumed that most people who are willing to build a command-line tool from source would know this stuff already, or research it on their own.
Seccomp is a service provided by the kernel that the program itself needs to set up and request. You can't do sandboxing in a library, because it runs with the same privileges as the program, so it could be bypassed easily.
I like to keep the readme short, just quick installation instructions. Most of the usage information is in the man page, which I prefer to keep small too so that users can actually read it and there are no surprises. There is an endless amount of extra information that could be added, about how rdrview works, or about different ways to use it (like the w3m shortcuts you mentioned in the other post). I might add some of that in time, but in the meantime it might be more practical if I just start a wiki here on Github, and you can write your findings yourself for other users.
Thanks for taking the time to give this feedback and these insights.
Just before you close this issue, and in case you're curious, here's why someone who's not a developer is so interested in this. I've been super proficient with "modern" browsers like Firefox and qutebrowser for years. I can bend their customizations to my needs, hack user.js and twist them to be keyboard-driven to my liking, without touching the GUI or pressing buttons. But given the way the "modern" web seems to be going, I'm actually in the process of trashing those "modern" tools completely and pushing hard to go back to simple tools, also very inspired by Gemini this year. In my case, as a heavy command-line and Unix user, w3m (and elinks, lynx, newsbeuter, etc.) is not just a gimmick or an occasional tool for browsing the web. I actually use these daily as my main tools. Command-line HTML parsing is my Firefox, and its associates (youtube-dl, tmux, newsbeuter, readability, pandoc, mpv, vim, git, etc.) are my desktop. They suck in, parse and consume thousands of lines of HTTP/HTML daily, coming from the "evil" outside internet. One of my Raspberry Pis swallows thousands of RSS feeds and HTML pages every day, parses, trims and filters that soup with rdrview and many other tools, and produces simple EPUB ebooks that I can read offline later, away from the internet. And it's not only for work or serious stuff, not just reading or dumping "simple" websites for productive or professional use. It's even for fun and time-wasters, which other people need a $1000 Mac desktop or an expensive smartphone for. I'm talking about heavy, JavaScript-infested stuff like YouTube, Twitter, Facebook, Reddit, etc. Pure entertainment, with help from youtube-dl, invidious, nitter.net, teddit.net, etc. Many people just can't believe that you can also consume that on the command line with tools that were written some 30 or 40 years ago.
I got an error from rdrview: "The futex facility returned an unexpected error code." How can I fix it, please?
@Phantasimay sorry for the long delay, I haven't been paying much attention to rdrview. If you still care about this, I would need to see the url for the page that's giving you trouble.
For me, I get this error on every website, including the example from this repository's README.
The rdrview sandbox has been broken again for more than a year: #15 (comment). I can reproduce this myself since my last Ubuntu upgrade. The culprit this time is the futex() syscall, which should have been obvious given the error message. It seems harmless, so add it to the whitelist. Take this chance to also allow fstat(). It's not really needed, but it does get called for some reason, so allowing it makes the strace output cleaner.
@cameronj86 Sorry for the delay, I just made a release that should fix this.
Got it up and running, thanks! (Did additionally have to install
I get the message
The futex facility returned an unexpected error code
on pages like this one: http://www.softpanorama.org/Admin/Monitoring/sar.shtml
Using the flag "--disable-sandbox", rdrview does work and does the job.
The only "abnormal" things I notice are that this page is served over HTTP, not HTTPS, that it is .shtml, not .html,
and that its document charset is cp1252 (Latin-1), not Unicode.
Is this expected?