The futex facility returned an unexpected error code #15

Open
m040601 opened this issue Jan 25, 2021 · 9 comments
Comments

@m040601

m040601 commented Jan 25, 2021

I get the message

The futex facility returned an unexpected error code

on pages like this, http://www.softpanorama.org/Admin/Monitoring/sar.shtml

With the "--disable-sandbox" flag, rdrview works and does the job.

The only "abnormal" things I notice are that this page is served over http rather than https, that it is .shtml rather than .html,
and that its document charset is cp1252 Latin1, not Unicode.

Is this expected?

eafer added a commit that referenced this issue Jan 25, 2021
Rdrview is failing to handle pages that use the Windows-1252 encoding:

  #15

Just like GB2312, this encoding is only supported via iconv, not
directly by libxml2. Since iconv conversions need to read files from
disk, initialize the conversion descriptor before setting up the
sandbox.
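
As a rough illustration of the ordering this commit message describes (not rdrview's actual code), the sketch below opens the iconv descriptor first and performs the conversion later. The helper names `open_cp1252_converter` and `convert_to_utf8` are made up for this example, and it assumes that opening the descriptor is the step that needs file access, which is typically the case with glibc, where charset modules are loaded at that point.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: open the descriptor early, while the process is
 * still allowed to read files from disk. Returns (iconv_t)-1 on failure. */
iconv_t open_cp1252_converter(void)
{
    return iconv_open("UTF-8", "WINDOWS-1252");
}

/* Hypothetical helper: convert `inlen` bytes of Windows-1252 text to UTF-8
 * with an already-open descriptor. Returns the number of bytes written to
 * `out`, or (size_t)-1 on error (for example, if `out` is too small). */
size_t convert_to_utf8(iconv_t cd, const char *in, size_t inlen,
                       char *out, size_t outcap)
{
    char *inp = (char *)in;   /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

A caller would invoke `open_cp1252_converter()` before entering the sandbox and `convert_to_utf8()` afterwards, then release the descriptor with `iconv_close()`.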
@eafer
Owner

eafer commented Jan 25, 2021 via email

eafer added a commit that referenced this issue Jan 26, 2021
The previous patch added support for page 1252 encoding. Include a test
for it in our check script, using the page provided by the reporter at

  #15
@m040601
Author

m040601 commented Jan 26, 2021

The problem is indeed with the encoding of the document.

Got it.

...which should fix your problem

It did. The error message is gone.

Funny thing. I've been using these "readability" scripts written in python and other scripting languages for years.

And I had never thought about the security implications of parsing eventually "foreign evil" html that you "stuff" into my system.
But that's what "real" browsers have to do all the time, right?

Now I remember, there's even a page in the ArchWiki on how to run Firefox "sandboxed" in Firejail.

The only other command line program that I have fed loads of external foreign html to on my system is Pandoc. But it is written in Haskell and so, I hear, should be secure.

So, as an end user, not a developer, I'm now beginning to understand better why a "simple" C program parsing "foreign" html has to be somehow "sandboxed" to shield the system from an attack.

But it makes sense.
Just like you should never do

curl https://github.com/some_script.sh | sh

Who knows what could be put in an html page.
But this is a very improbable attack vector, right?
Modern linux systems shield me as an end user from C programs parsing "crazy" html, right?

This is all "obvious" to you of course. But not for an end user.

Could you add just one or two lines to the README about these security implications? Just so that end users can somehow understand where and why these security related error messages come from.

I did notice the "security" section,

This tool is young and written in C, so it's reasonable to wonder about the potential for memory issues. To be safe, all HTML parsing happens inside a sandboxed subprocess. Seccomp is used for this purpose on Linux, Pledge on OpenBSD, and Capsicum on FreeBSD.

but took it for a message for programmers.

And what is this "security sandbox" anyway? Is it a feature of the kernel itself? Or is it some tightening/tuning made by my linux distro? Or some big library? glib?

PS: Found this

https://en.wikipedia.org/wiki/Seccomp

that you could add to the README

@eafer
Owner

eafer commented Jan 26, 2021

And I had never thought about the security implications of parsing eventually "foreign evil" html that you "stuff" into my system.
But that's what "real" browsers have to do all the time, right?

Real browsers have their own sandboxes like rdrview, but far more complicated. This means that, once in a while, a security researcher will find a way to bypass them. You can set up your system to sandbox them further if you want, but it isn't always practical.

Rdrview's sandbox is very simple and tight, so I don't think it's likely to be bypassable, unless there are bugs in the kernel.

So, as an end user, not a developer, I'm now beginning to understand better why a "simple" C program parsing "foreign" html has to be somehow "sandboxed" to shield the system from an attack.

A sandbox of some sort is a good idea for code written in any language. It means that you only need to audit a small fraction of my code to confirm that it's not doing anything stupid or malicious; without a sandbox you would have to read the whole thing. C and C++ just have the additional issue of memory bugs, but I'm not sure if the risk is big in practice, for something like rdrview.

But this is a very improbable attack vector, right?

For rdrview, yes, but it doesn't hurt to be careful. For high value targets with huge codebases like Firefox or Chrome, the risk is much bigger.

Modern linux systems shield me as an end user from C programs parsing "crazy" html, right?

You can't assume that in general: security is usually in the hands of the developer of the program. But it's rare to parse html in C these days. I think some distros are moving towards doing some sandboxing themselves (via AppArmor or the like) but we aren't there yet.

This is all "obvious" to you of course. But not for an end user.

I guess I assumed that most people who are willing to build a command-line tool from source would know this stuff already, or research it on their own.

And what is this "security sandbox" anyway? Is it a feature of the kernel itself? Or is it some tightening/tuning made by my linux distro? Or some big library? glib?

Seccomp is a service provided by the kernel, which the program needs to set up and request. You can't do sandboxing in a library because it runs with the same privileges as the program, so it could be bypassed easily.
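
To make "set up and request" concrete, here is a minimal sketch of a process asking the Linux kernel for a seccomp filter through `prctl()` and then being killed by the kernel when it steps outside the whitelist. It is not rdrview's code: the function names are invented for the example, the whitelist is deliberately tiny, and a real filter should also check the architecture field, which is skipped here to keep the sketch short.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the kernel to install a whitelist for the calling process:
 * write() and exit_group() pass, every other syscall is fatal.
 * NOTE: no architecture check, which a production filter needs. */
static int install_filter(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from the data the kernel hands us. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Required so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if the kernel killed a sandboxed child for calling getpid(),
 * 0 if the child survived the call, -1 if the sandbox could not be set up. */
int sandbox_blocks_getpid(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_filter() != 0)
            _exit(2);          /* filter not available on this system */
        syscall(SYS_getpid);   /* not on the whitelist */
        _exit(0);              /* reached only if the filter did nothing */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSYS)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -1;
    return 0;
}
```

The enforcement happens in the kernel, not in the process: once the filter is installed, the child cannot remove it, which is the property a library-only sandbox cannot offer.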

that you could add to the README

I like to keep the readme short, just quick installation instructions. Most of the usage information is in the man page, which I prefer to keep small too so that users can actually read it and there are no surprises. There is an endless amount of extra information that could be added, about how rdrview works, or about different ways to use it (like the w3m shortcuts you mentioned in the other post). I might add some of that in time, but in the meantime it might be more practical if I just start a wiki here on GitHub, and you can write your findings yourself for other users.

@m040601
Author

m040601 commented Jan 27, 2021

Thanks for taking the time for giving this feedback and insights.
I learned more with it today, than in years of browsing with command line apps.

... But it's rare to parse html in C these days....
.... I guess I assumed that most people who are willing to build a command-line tool from source would know this stuff already, or research it on their own....

Just before you close this issue, and in case you're curious, and so that you can understand why someone who's not a developer is so much interested in this.

I've been super proficient in "modern" browsers like Firefox/qutebrowser for years. I can bend their customizations to my needs. Hack user.js and twist them to be keyboard based to my liking. Without touching the GUI or pressing buttons.

But, the way the "modern" web seems to be going, I am actually in the process of dumping those "modern" tools completely, and pushing hard to go back to simple tools. Very inspired also by Gemini this year.

In my personal case, as a heavy command line and unix user, w3m (and elinks, lynx, newsbeuter and others) are not just a gimmick or an occasional tool for browsing the web. I actually use them daily as my main tools.

Command line html parsing is my firefox.

Its associates (youtube-dl, tmux, newsbeuter, readability, pandoc, mpv, vim, git etc) are my desktop. They suck in, parse and consume thousands of http/html lines daily, coming from the outside "evil" internet. Daily.

One of my Raspberry Pis swallows thousands of rss and html documents daily, parses, trims and filters that soup with rdrview and many others, and produces simple epub ebooks that I can read offline later, away from the internet.

And it's not only for work or serious stuff. Not just reading/dumping "simple" websites for productive/professional use.

It's even for fun and time wasters. To do the same, other people need a $1000 desktop Mac or an expensive smartphone.

I'm talking about heavy javascript-infested stuff like youtube, twitter, facebook, reddit etc. Pure entertainment. With help from youtube-dl, invidious, nitter.net, teddit.net etc. Many just can't believe that you can also consume that on the command line with tools that were written some 30 or 40 years ago.

eafer pushed a commit that referenced this issue Sep 12, 2021
Mirrors 1a7b95a that relates to #15 but for koi8-r.
@Phantasimay

I got an error from rdrview:

The futex facility returned an unexpected error code.

How do I fix it, please?

@eafer
Owner

eafer commented Mar 2, 2024

@Phantasimay sorry for the long delay, I haven't been paying much attention to rdrview. If you still care about this, I would need to see the url for the page that's giving you trouble.

@cameronj86

@Phantasimay sorry for the long delay, I haven't been paying much attention to rdrview. If you still care about this, I would need to see the url for the page that's giving you trouble.

For me, I get this response for every website, including this GitHub repo, which is the example in the README:

./rdrview 'https://github.com/eafer/rdrview'
The futex facility returned an unexpected error code.

eafer added a commit that referenced this issue Oct 28, 2024
The rdrview sandbox has been broken again for more than a year:

  #15 (comment)

I can reproduce this myself since my last Ubuntu upgrade. The culprit
this time is the futex() syscall, which should have been obvious given
the error message. It seems harmless, so add it to the whitelist.

Take this chance to also allow fstat(). It's not really needed, but it
does get called for some reason, so allowing it makes the strace output
cleaner.
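
For readers wondering what "add it to the whitelist" amounts to, the sketch below builds a seccomp-style BPF whitelist from a plain array, so that allowing `futex()` and `fstat()` is a matter of adding two entries. This is an illustration only and not the code from this commit: the array contents, `build_whitelist_filter` and the toy checker `filter_allows` are all invented here, and the sketch assumes a Linux target where `SYS_futex` and `SYS_fstat` are defined.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Hypothetical whitelist; rdrview's real list is not reproduced here. */
static const int allowed_syscalls[] = {
    SYS_read, SYS_write, SYS_exit_group,
    SYS_futex,  /* the kind of addition the commit message describes */
    SYS_fstat,  /* not strictly needed, keeps strace output clean */
};
#define N_ALLOWED (sizeof(allowed_syscalls) / sizeof(allowed_syscalls[0]))

/* Fill `out` (room for n + 3 instructions) with: load syscall number,
 * one equality test per allowed syscall, then KILL, then ALLOW.
 * Returns the number of instructions written. */
size_t build_whitelist_filter(const int *allowed, size_t n,
                              struct sock_filter *out)
{
    size_t i, len = 0;

    out[len++] = (struct sock_filter)BPF_STMT(
        BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr));
    for (i = 0; i < n; i++) {
        /* On a match, skip the remaining tests and the KILL instruction. */
        out[len++] = (struct sock_filter)BPF_JUMP(
            BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)allowed[i],
            (unsigned char)(n - i), 0);
    }
    out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    return len;
}

/* Toy interpreter for the program above, for checking only: walks the
 * jump chain with `nr` as the loaded value and reports the verdict. */
int filter_allows(const struct sock_filter *prog, size_t len, int nr)
{
    size_t pc = 1; /* instruction 0 is the load of the syscall number */

    while (pc < len) {
        const struct sock_filter *ins = &prog[pc];

        if (BPF_CLASS(ins->code) == BPF_RET)
            return ins->k == SECCOMP_RET_ALLOW;
        pc += 1 + (ins->k == (unsigned int)nr ? ins->jt : ins->jf);
    }
    return 0;
}
```

With this layout, a syscall missing from the array falls through every test and reaches the KILL instruction, which is how a single omitted entry can break a program after a system upgrade changes which syscalls the C library makes.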
@eafer
Owner

eafer commented Oct 28, 2024

@cameronj86 Sorry for the delay, I just made a release that should fix this.

@cameronj86

Got it up and running, thanks!

(Did additionally have to install libseccomp-dev and libxml2-dev fyi)
