Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CVE-2023-36250] Arbitrary code exection possible through CSV import #750

Open
matthijskooijman opened this issue Dec 24, 2023 · 12 comments

Comments

@matthijskooijman
Copy link
Member

At Debian, someone noticed that a CVE report exists. It's a bit vague and I'm a bit surprised that no report was opened about it by the one who created the CVE in the first place, but it seems something to check.

Here is the original report for which the CVE was created: https://github.com/BrunoTeixeira1996/CVE-2023-36250/blob/main/README.md

I haven't looked at the code yet, but it seems the problem is the CSV/TSV parser interprets formulas in the import, and some of these formulas could potentially do more than just math. This sounds like it might really a problem in the library we use for parsing, or maybe a configuration problem in our code with that library.

Anyway, no more time now, but I wanted to at least have an issue to track this.

@GeraldJansen
Copy link
Contributor

GeraldJansen commented Dec 27, 2023

Hamster only has a TSV exporter and no TSV importer. It is not Hamster itself that is executing the example formula in the CVE report, but rather some spreadsheet program which gets automatically called to open the saved TSV file. The problem, if any, would seem to be the call to gtk.show_uri() in the method on_report_chosen in overview.py. We could simply remove that call and avoid automatically opening the saved TSV file.

Personally I find the vulnerability report rather far-fetched. Note that it only occurs when an interactive user chooses the TSV export format from the GUI, not when hamster export tsv is used from the command line. Given that there are few Hamster users and only a fraction of them would ever use a tsv export, it seems extremely unlikely that any malicious hacker would ever look our way.

@matthijskooijman
Copy link
Member Author

Seems you are indeed right, looking more closely the original report shows a TSV export being opened in some external spreadsheet program. So if this problem actually exists, it would exist in the external program, not hamster. And I expect spreadsheet problems will interpret innocent formulas like 3+3, and maybe render hyperlinks, but not allow executing arbitrary commands.

To confirm, I exported the following activity with "=3+3" in the activity field:

image

Which opens up in gnumeric, which still shows the original formula in the field:

image

So hamster is indeed not interpreting anything.

So I think no action required on our part.

I'll file an issue with the CVE repository to see if the original reporter has any additional comments, and see if we can get the CVE report marked as invalid or something like that. Let's wait for a response for a few days, and close this issue next week or so.

@matthijskooijman
Copy link
Member Author

Posted: BrunoTeixeira1996/CVE-2023-36250#1

@matthijskooijman
Copy link
Member Author

I just gave this a little more thought, what we could maybe do is add quoting of string values, which could ensure values are taken literally and not interpreted at all. On the other hand, that does change the format away from a simple and clean tab-separated value format...

@BrunoTeixeira1996
Copy link

Personally I find the vulnerability report rather far-fetched. Note that it only occurs when an interactive user chooses the TSV export format from the GUI, not when hamster export tsv is used from the command line.

Sorry but this makes no sense at all ... So for your understanding a security vulnerability is not a vulnerability if there's interaction? I created the PoC using my hamster use case (GUI) so that still represents a vulnerability because HAMSTER is responsible for the creation of the TSV file meaning that HAMSTER is responsible for escaping all fields since these will be exporter to some spreadsheet software.

Given that there are few Hamster users and only a fraction of them would ever use a tsv export, it seems extremely unlikely that any malicious hacker would ever look our way.

This again makes no sense at all. If a software is out there and has 0 users and still has plenty of vulnerabilities, those vulnerabitlies are still valid, the risk might be lower but they are still there ...

@BrunoTeixeira1996
Copy link

I just gave this a little more thought, what we could maybe do is add quoting of string values, which could ensure values are taken literally and not interpreted at all. On the other hand, that does change the format away from a simple and clean tab-separated value format...

This is the approach to take ... This is a security vulnerability and its not the spreadsheet's fault.
You should escape special chars only meaning that if any user uses some of the following then, that symbol would be converted into something (like a string):

  • Equals to (=)
  • Plus (+)
  • Minus (-)
  • At (@)
  • Tab (0x09)
  • Carriage return (0x0D)

Lets say your user uses something like =3+3 in the activity tab. You could convert = to the literal string equal making this impossible to exploit ...
This is just an example

@GeraldJansen
Copy link
Contributor

what we could maybe do is add quoting of string values, which could ensure values are taken literally and not interpreted at all.

Unfortunately this could break things for users who, like me, post-process exported TSV files to make customized activity reports. I routinely use this technique to prepare invoices. While I could easily adjust my scripts to unquote the fields, other users might not appreciate the unexpected change in behavior. The same goes for the quoting of special characters proposed by @BrunoTeixeira1996 .

I continue to think the vulnerability, if any, is in the spreadsheet that executes formulae upon loading TSV files. A TSV file completely equivalent to that exported by Hamster, with a field containing say =3+3, could easily be created by any text editor. Should we report all text editors as having this vulnerability? Clearly, that "makes no sense at all".

Therefore, my preference would be to remove the gtk.show_uri() from Hamster so that the saved TSV file is not automatically opened by a spreadsheet program. I will prepare a small pull request for this.

@BrunoTeixeira1996
Copy link

I continue to think the vulnerability, if any, is in the spreadsheet that executes formulae upon loading TSV files. A TSV file completely equivalent to that exported by Hamster, with a field containing say =3+3, could easily be created by any text editor. Should we report all text editors as having this vulnerability? Clearly, that "makes no sense at all".

The vulnerability is not on the spreadsheet ... The problem here is that hamster uses a field as a formula without escaping it, knowing that that entry would be exported to some kind of spreadsheet that will execute the formula.

CSV Injection, also known as Formula Injection, occurs when websites embed untrusted input inside CSV files. With that being said, if your software is embeding untrusted input your software is vulnerable.

You should consider read about how CVE's are made being that MITRE already approved the veracity of this one, making this a valid CVE from a valid vulnerability. The score is 7.8 (HIGH) because is command injection however is not critical because there's need to be user interaction.

@matthijskooijman
Copy link
Member Author

@BrunoTeixeira1996, Thanks for your additional explanations! I see your point that hamster is producing output that can be, for some consumers, not what hamster intended (and potentially have unintended side effects).

I guess the crux here is about the definition of the "TSV" format, which is, I think, vague and not well-specified. In its minimal specification, which is what @GeraldJansen is referring to and I think might be how hamster intends the format, TSV is just arbitrary strings, with no special meaning, separated by tabs. In this interpretation, no injection problem can exist (except that tabs cannot be allowed in the field values).

In practice, spreadsheet programs can import TSV values and do assign special meaning to the values contained in the format. As you say, a = is commonly used to start a formula, at least in programs that aim to be somewhat compatible with MS office, but AFAIU there is no well-defined specification for this - different programs will have different ways to interpret data. At the same time, I expect different programs might have different ways to escape such values as well. So it seems to me that it's hard to 1) decide when and what values to escape, and 2) how to escape them. I suspect there is no strategy that will work for all data consumers, so a solution is always a partial solution.

The owasp link you provided is useful, and agrees that mitigation is hard. It has three recommended steps for escaping, which would escape =1+"2" like "'=1+""2"""", which IMHO completely screws up the format for any consumer except spreadsheet programs...

Another thing to consider is that a spreadsheet program should typically treat files it opens as untrusted input, and not allow arbitrary code execution based on that input, at least not without the user marking such input as trusted. Especially when the input comes from a vague format like TSV, but I suspect this latter distinction is not usually made. Such limitations in consumers are indeed acknowledged by the OWASP page, but it (correctly) identifies a risk with this approach saying "... by exploiting the user’s tendency to ignore security warnings in spreadsheets that they downloaded from their own website.".

In summary, I'm not quite convinced that this is really a problem in Hamster, though I agree that Hamster could be modified to reduce the (already quite small) impact of this problem. However, I'm not convinced that changing the current TSV export is the way forward, since it breaks simpler consumers.

One middle ground would be to not escape/quote all values, but only when they begin with one of the potentially dangerous prefixes listed by OWASP (and since most values will not start with one of these values, existing consumers will not be affected in most of the cases). I'm not a big fan of this, since this list of potentially dangerous prefixes seems arbitrary and prone to be incomplete.

Another middle ground would be to split into two different export formats - "Simple TSV" like now, or "Spreadsheet TSV" with escaping for typical spreadsheet consumers. I guess this would be elegant, but probably also more work which I might rather invest into other improvements personally...

@BrunoTeixeira1996
Copy link

BrunoTeixeira1996 commented Dec 28, 2023

One middle ground would be to not escape/quote all values, but only when they begin with one of the potentially dangerous prefixes listed by OWASP (and since most values will not start with one of these values, existing consumers will not be affected in most of the cases). I'm not a big fan of this, since this list of potentially dangerous prefixes seems arbitrary and prone to be incomplete.

I agree that this is prone to be incomplete but i guess it helps creating a little small layer of security.
Do you think escaping would cause damage? @GeraldJansen converting =3+3 to equal3plus3 would break some of your work? or do you think that would be ok?

Another middle ground would be to split into two different export formats - "Simple TSV" like now, or "Spreadsheet TSV" with escaping for typical spreadsheet consumers. I guess this would be elegant, but probably also more work which I might rather invest into other improvements personally...

This is not easy to implement and people could still use the Simple TSV to bypass the escaping mechanism.

@GeraldJansen
Copy link
Contributor

converting =3+3 to equal3plus3 would break some of your work? or do you think that would be ok?

That would not be a problem for me personally. However, it is hard to say how users might be using some special characters. For example, I use a '/' in the category field to separate client and project. For sure '+' and '-' could be used casually by users in non-formulae. The description field could contain anything, including newlines, single and double quotes or whatever. Sanitation could be complex.

Hamster is not a website subject to untrusted input from third parties. It is single-user desktop software whose input comes from the user themself. The risk factor here is perhaps non-zero, but it is pretty darn close to zero.

I already mentioned that any editor could be used to create a TSV file. Consider also
echo '=3+3'>some.tsv && gnumeric some.tsv.
Does that mean bash maintainers are responsible for sanitizing any text passed through echo because it could potentially be a security threat? No.

As long as we do not automatically open the TSV exported by Hamster using a spreadsheet program, the spreadsheet program can not be considered a downstream component of Hamster and the CVE should not be applicable to subsequent Hamster versions.

@matthijskooijman Please let me know if you are interested in a PR to remove the gtk.open_uri() call. Otherwise, I'll bow out of this discussion as I feel it has already taken more energy than it merits.

@rhertzog
Copy link
Contributor

rhertzog commented Dec 29, 2023

You should consider read about how CVE's are made being that MITRE already approved the veracity of this one, making this a valid CVE from a valid vulnerability. The score is 7.8 (HIGH) because is command injection however is not critical because there's need to be user interaction.

I'm sorry but MITRE doesn't do any serious validation, there are plenty of examples of completely ill-founded CVE filed by people who want to get credits for discovering CVE that are reporting minor issues as security issues. The score is even more prone to issues because the impact is quite often (and this example is a good one) completely overrated.

Recent discussion on this: https://lwn.net/Articles/944209/

A good spreasheet software like libreoffice, doesn't open CSV/TSV automatically, it proposes a "textual import" screen where the checkbox "Evaluate formulas" is disabled by default. So to me the issue in in the spreadsheet software.

On a more practical level, if we decide to do something here I would suggest a minimal fix of replacing any leading \s*= with an UTF-8 character that is visually close like or . I would not care about any other edge case for Excel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants