Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.12.2] Regression in 1.8.4 related to OpenOS exception handling #3727

Open
RealityAnomaly opened this issue Jul 20, 2024 · 10 comments
Open
Labels
help needed Seeking help from the community - reproduction notes, debugging, etc.

Comments

@RealityAnomaly
Copy link

In 1.8.4, when I cause an exception to be thrown in OpenOS, occasionally the console will be spammed with text, which boils down to the following error: /lib/process.lua:63: in function </lib/process.lua:59>attempt to yield across a C-call boundary.

image

This does not happen in 1.8.3, but it requires a fresh install of OpenOS, for unknown reasons. This commit 9d4f7ea looks potentially suspicious as a cause for the problem as it was introduced in 1.8.4.

@ff66theone
Copy link
Contributor

I agree with you. It is certainly caused by resolving GHSA-54j4-xpgj-cq4g, after all it is something solved in OC 1.8.4 and it affected also pcall / xpcall. Knowing these functions are used for error handling, it wouldn't surprise me if it was the cause, but the problem is that it resolved a security issue allowing to DDoS a server by throwing an exception...

@ff66theone
Copy link
Contributor

And also, can you tell me a bit more about the circumstances (MC version, progamm running...), so I can attempt to reproduce ?

@asiekierka asiekierka added the help needed Seeking help from the community - reproduction notes, debugging, etc. label Jan 5, 2025
@asiekierka
Copy link
Contributor

I'm going to need a reproduction case.

@kcinnaJlol
Copy link

the following (as a standalone script) reproduces a similar issue, however here the error loop is caused by a yield in a call to gpu.copy, not gpu.set (as in the original error)
additionally, the error loop stops after a while

local function explode()
  error(("\n"):rep(100))
end

local magicTable = setmetatable({}, {
    __tostring = explode,
})
tostring(magicTable)

@asiekierka
Copy link
Contributor

image

It appears that this isn't strictly speaking a regression; the same "C-call yielding" crash occurs in 1.8.3 and below, but these did not implement xpcall() according to Lua's specification (which allows the error handler to be called recursively). This is more akin to a bugfix which uncovered a bug.

@JasonS05
Copy link

JasonS05 commented Jan 18, 2025

Seems this still isn't completely resolved. a46e77b causes a regression of its own. Specifically, the error message is lost since msg on line 79 of /lib/process.lua is not defined anywhere, since the original definition as a function argument on line 67 is now in a different scope. To fix this, the following changes must be made:

  • change line 71 to table.pack(debug.traceback(), msg)
  • on line 78 replace result[2] with result[2][1]
  • on line 79 replace msg with result[2][2]

Since lines 87 and 89 use result, the value of which is modified by my change to line 71, there's a chance this fix will cause another regression. I simply don't know enough about OpenOS to say one way or the other, and don't have the energy right now to find out. Examining the code from prior to a46e77b suggests that in case of an error, result[2] should be 128, so adding result[2] = 128 between lines 83 and 84 may also be in order.

For sanitation purposes, I also suggest that line 69 be somehow modified to ensure that it necessarily returns a number. Otherwise code like error({reason="terminated", code="this is a string, not a number"}) will cause problems. The simplest fix is probably to just modify the condition on line 68 by adding and type(msg.code) == "number" to the end.

@JasonS05
Copy link

JasonS05 commented Jan 20, 2025

Ok, I found another issue. My fix doesn't address what happens in an out of memory error. To properly report what happens, it appears better to fix it a different way, not like I described above. Replace lines 60 to 84 of the code, which in 1.8.7 looks like:

    -- pcall code so that we can remove it from the process list on exit
    local result =
    {
      xpcall(function(...)
          init = init or function(...) return ... end
          return code(init(...))
        end,
        function(msg)
          if type(msg) == "table" and msg.reason == "terminated" then
            return msg.code or 0
          end
          return debug.traceback()
        end, ...)
    }

    if not result[1] and type(result[2]) ~= "number" then
      -- run exception handler
      xpcall(function()
          local stack = result[2]:gsub("^([^\n]*\n)[^\n]*\n[^\n]*\n","%1")
          io.stderr:write(string.format("%s:\n%s", msg or "", stack))
        end,
        function(msg)
          io.stderr:write("process library exception handler crashed: ", tostring(msg))
        end)
    end

with this version:

    -- pcall code so that we can remove it from the process list on exit
    local traceback = "" -- <-- additional line
    local result =
    {
      xpcall(function(...)
          init = init or function(...) return ... end
          return code(init(...))
        end,
        function(msg)
          if type(msg) == "table" and msg.reason == "terminated" then
            return msg.code or 0
          end
          traceback = debug.traceback() -- <-- changed "return" to "traceback ="
          return msg -- <-- additional line
        end, ...)
    }

    if not result[1] and type(result[2]) ~= "number" then
      -- run exception handler
      xpcall(function()
          local stack = traceback:gsub("^([^\n]*\n)[^\n]*\n[^\n]*\n","%1") -- <-- changed "result[2]" to "traceback"
          io.stderr:write(string.format("%s:\n%s", result[2] or "", stack)) -- <-- changed "msg" to "result[2]"
        end,
        function(msg)
          io.stderr:write("process library exception handler crashed: ", tostring(msg))
        end)

      result[2] = 128 -- <-- additional line
    end

If I'm not mistaken, this should return behavior to 1.8.6 in all important situations, while retaining the fix for the issue from 1.8.4.

@asiekierka asiekierka reopened this Jan 20, 2025
@asiekierka
Copy link
Contributor

Do you have test scripts to reproduce the bugs in behaviour, so it doesn't happen again?

@JasonS05
Copy link

JasonS05 commented Jan 20, 2025

Here:

error("err")

Intended output:

/home/test.lua:1: err:
stack traceback:
        <traceback>

with exit code 128.

Actual output (1.8.7):

:
stack traceback:
        <traceback>

with exit code 2.

The other two cases that need to be handled correctly are os.exit(<number>), which should produce the right exit code, and out of memory, which should have any output that indicates that OOM happened. These two are in fact satisfied in 1.8.7, but my original fix broke OOM reporting (but did not affect os.exit(<number>)). If I'm not mistaken, my second fix handles all three cases correctly.

On a technical level, when OOM happens, the original xpcall that executed the program returns false, "out of memory" without invoking the callback function. With my original fix, this crashed the exception handler without causing the string "out of memory" to make its way to the user.

@JasonS05
Copy link

JasonS05 commented Jan 20, 2025

Ok, I don't know if this is related to this thread or not, but I found a possible error handling bug with some really cursed code. Here's the output:

Image

And here's the code that produced it (in test.lua):

local x = 0
local y = 0

xpcall(error, function()
  x = x + 1
  y = 0
  xpcall(error, function()
    y = y + 1
    
    error()
  end)
  print("x: ", x)
  --print("y: ", y)
  error()
end)

print(x)
print(y)

The code does have a lot of extraneous stuff but the output seems very sensitive to normally inconsequential details so I'm hesitant to remove the fluff.

In order to make the error, it needs to be run like so:

./test.lua | less

And even like this it only produces the error maybe 1/2 to 1/3 of the time.

Note that this result is produced both with my second fix and on a fresh install of OpenOS 1.8.7.

Update: after running it a bunch more times it randomly added attempt to yield across a C-call boundary to the end of the error message:

Image

Update 2: using this instead reliably produces the problem 100% of the time:

./test.lua | lua

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help needed Seeking help from the community - reproduction notes, debugging, etc.
Projects
None yet
Development

No branches or pull requests

5 participants