Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove libc dependencies from src/enclave #151

Open
20 of 25 tasks
davidchisnall opened this issue May 4, 2020 · 20 comments
Open
20 of 25 tasks

Remove libc dependencies from src/enclave #151

davidchisnall opened this issue May 4, 2020 · 20 comments
Labels
area: sgx-lkl Core SGX-LKL functionality enhancement p1 Medium priority
Milestone

Comments

@davidchisnall
Copy link
Contributor

davidchisnall commented May 4, 2020

As part of the relayering work, we should ensure that nothing in src/enclave depends on libc init. Some of these changes are simple and mechanical:

  • Change malloc to oe_malloc so that we don't need musl's version of malloc to be working before we start.
  • Remove use of C asserts and use sgxlkl_fail.

Others are more complex.

We can't remove a small set of things that we expect to exist in a freestanding C environment (e.g. memset / memcpy), because the compiler will emit these even if we don't.

Checklist:

  • enclave_event_channel.c
    • calloc / free
    • __assert_fail
  • enclave_init.c
    • __init_libc / __init_tls / __libc / __libc_start_init. This can't be fixed until we can start LKL without any other libc dependencies.
    • malloc.
    • strdup
    • wg_get_device / wg_key_to_base64 / wgu_add_peers / wgu_list_devices
  • enclave_mem.c
    • ___errno_location. We should not be using errno anywhere at this layer.
    • mprotect should be an explicit call to the host mprotect.
    • lseek / read are both used in the syscall replacements. These functions should be moved into src/lkl and should call the relevant LKL functions directly, not go via libc.
  • enclave_oe.c
    • __init_tls / _dlstart_c should be removed after we no longer depend on libc for this layer.
  • enclave_util.c
    • snprintf This is used only for tracing. The tracing is LKL-specific and so should be in src/lkl. This particular usage can be replaced by strcpy / memcpy because it's just appending to a buffer.
  • sgxlkl_app_config.c
    • ___errno_location. We should not be using errno anywhere at this layer.
    • fprintf / __stderr_FILE should be sgxlkl_verbose / sgxlkl_fail.
    • malloc / free
    • strdup
    • snprintf
    • strerror (We shouldn't be calling anything that sets errno, so this should go away once the other fixes happen)
  • mpmc_queue.c
    • assert (Replaced with SGXLKL_ASSERT)

The full list of dependencies can be seen by running the following command:

$ cd build_musl/sgxlkl/enclave
$ for I in *.o ; do echo $I ; nm -g  $I | grep ' U ' ; done
@davidchisnall davidchisnall added enhancement area: sgx-lkl Core SGX-LKL functionality labels May 4, 2020
@davidchisnall
Copy link
Contributor Author

@wintersteiger , FYI: please keep this in mind when replacing the config data. I hope that we'll get a lot of the fixes in sgxlkl_app_config.c for free when we move to using JSON and @mikbras's JSON library.

@prp
Copy link
Member

prp commented May 4, 2020

Note that we have SGXLKL_ASSERT for assertions.

@wintersteiger
Copy link

My work is mostly between OE and LKL, i.e. before LKL starts up. Can I use all of OE's musl at those points in the code or just the *alloc functions?

The new json library also depends on libc (e.g. string functions), but I think we can work around them within reasonable effort.

Is anybody emotionally attached to sgxlkl_app_config_t? It would simplify all of this quite a bit if we just removed it and kept only one config structure (e.g. half-initialized *disk_config_t's being passed around to mount filesystems etc).

@prp
Copy link
Member

prp commented May 4, 2020

@wintersteiger conceptually it makes sense to me to have a struct that contains the full app_config, the way it has passed attestation. Or do I misunderstand your proposal?

@wintersteiger
Copy link

@prp basically, yes. I started with putting the entire configuration into the json configs, I think there will be very few settings that we don't want attested, so I don't think we'll need a separate config for that purpose. And all of the shared memory should go into a separate struct, not into sgxlkl_enclave.

@prp
Copy link
Member

prp commented May 4, 2020

@wintersteiger ok, I see. So the question is if the parsed JSON app_config goes into a single app_config struct or into many separate structs/variables for the different SGX-LKL components.

@wintersteiger
Copy link

Yes, I think we should have a single struct that corresponds to the json config, otherwise it will be very tricky for users/reviewers to see what is in fact attested and what is not. I don't think we'll need an additional unattested config, but if it turns out we do, we should keep it separate (as opposed to the current merge of host, enclave, and app configs).

@davidchisnall
Copy link
Contributor Author

I agree, having a separate attested config makes a lot of sense, I don't have strong opinions as to whether we want attested vs nonattested config or a load of foo_config_attested and bar_config_unattested things.

We don't want to depend on OE's libc, not least because it also pulls in a load of oesyscalls dependencies, which increase our attack surface, but also because (assuming we can make it work at all) we'll end up with two copies of a load of libc things.

I'm happy for us to depend on things that a freestanding C environment would provide.

@prp prp added the p1 Medium priority label May 5, 2020
@prp prp added this to the Milestone 1 milestone May 5, 2020
@davidchisnall
Copy link
Contributor Author

OE core provides a number of these functions that we can use:

Functions expected in any freestanding C implementation:

memcmp, memcpy, memmove memset.

oe_-prefixed versions of some useful things:

oe_strlen, oe_snprinf, oe_strchr, oe_strchrnul, oe_strcmp, oe_strcspn, oe_strdup, oe_strlcat, oe_strlcpy, oe_strlen, oe_strncmp, oe_strrchr, oe_strspn, oe_strstr, oe_strtok_r, oe_strtoul.

We should add a compatibility header that #defines their non-oe_ versions to call these and -include it when building all of the enclave files.

@wintersteiger
Copy link

These are used in the new json parser without an oe_* version: strcpy, strtoll, strtod, strtoull.

Would it simplify things if we built src/enclave without sgxlkl-musl and only with oe-musl?

@SeanTAllen
Copy link
Contributor

SeanTAllen commented May 11, 2020

malloc to oe_malloc is going to be a little tricky in a couple places. I'm still tracking it down.

When making some updates from calloc to oe_calloc, strdup to oe_strdup, and malloc to oe_malloc, some of the test suite segfaults. I spent a couple hours today looking at it and it's unclear to me at the moment why. I thought I had missed something important where perhaps memory allocated with oe_malloc was being freed using free either inside or outside of the src/enclave tree. However, that doesn't appear to be the case (although its possible I missed something that is non-obvious).

I'm opening a PR for the changes related to malloc/free/calloc/strdup that are definitely contained to src/enclave and don't cause any segfaults. Baby steps to closing the ticket. Baby steps.

SeanTAllen added a commit that referenced this issue May 11, 2020
This commit removes some of the malloc/calloc/free/strdup libc usage
in src/enclave. Not all usage as been removed yet. There are other
cases that I have left because when changed, the test suite segfaults
for non-obvious reasons that need to be looked into.

This commit addresses a small part of #151.
SeanTAllen added a commit that referenced this issue May 12, 2020
This commit removes some of the malloc/calloc/free/strdup libc usage
in src/enclave. Not all usage as been removed yet. There are other
cases that I have left because when changed, the test suite segfaults
for non-obvious reasons that need to be looked into.

This commit addresses a small part of #151.
@SeanTAllen
Copy link
Contributor

SeanTAllen commented May 14, 2020

Status update...

This didn't end up where I expected it to. There's a 151_2 branch for this repo, plus a corresponding one for sgx-lkl-musl. It contains the additional calloc, malloc, strdup changes for this issue. However, a couple applications in the test suite (for example dotnet) segfault.

They only segfault in HW mode. Never in SW mode.

The initial thought was "we don't have enough memory and are getting back NULL from a malloc/calloc and that is the issue". The code on the branch contains NULL checks after the new oe_malloc and oe_calloc calls (many of the replaced calls made no such checks). That didn't turn anything up.

The next thought was "we are running out of memory somewhere else because of this change". Given that could happen in a number of locations, I increased various memory limits. You can see these in the branches as well. Same problem.

What exists now on the branches also includes a lot of printf style debugging so we can trace through where things blow up.

Here is what that shows:

  • This is related to thread local storage handling in HW mode. If you turn off thread local storage, there will be no segfault.

I tested this by changing https://github.com/lsds/sgx-lkl/blob/oe_port/src/enclave/enclave_init.c#L209

Setting

libc.user_tls_enabled = sgxlkl_enclave->mode == SW_DEBUG_MODE ? 1 : sgxlkl_enclave->fsgsbase;

so that it is either:

ON:

libc.user_tls_enabled = 1;

OFF:

libc.user_tls_enabled = 0;

results in no segfault when off, and segfault when on.

The printf debugging will show the different code paths that are taken. I've gotten to the point where I know that we get to https://github.com/lsds/sgx-lkl-musl/blob/oe_port/src/env/__init_tls.c#L50 and then segfault.

My working assumption is that assumptions about memory layout related to the app_config that is used for configuration are being made. However, that is just an assumption. There's a lot of nuance in the thread local storage code that is spread out over a number of files that would need to be investigated.

I'm setting aside work on this for now. @davidchisnall believes that when #155 is completed, we can do away with the problematic code entirely and replace with a user space implementation of pthreads. If that turns out to be correct then there's not a lot of point in figuring out the specific issue. For that reason, I'm pausing work on the additional malloc/calloc/strdup => oe_malloc/oe_calloc/oe_strdup changes for the time being.

See https://github.com/lsds/sgx-lkl/pull/234/files for additional context on this without having to check out the branches.

@wintersteiger
Copy link

(Just in case you haven't come across it before: thread locals may be reinitialized upon ecalls. For some reason this was the default behavior when I last looked at it, but it may not be anymore. It's called the "TCS policy"; see e.g. intel/linux-sgx#283)

@davidchisnall
Copy link
Contributor Author

@wintersteiger, I believe this is not an issue with the recent OE, which ensures that FS base is preserved across ecalls.

@SeanTAllen
Copy link
Contributor

malloc changes are now passing CI. there was a fix in OE that addressed the problem.

davidchisnall pushed a commit that referenced this issue May 26, 2020
SeanTAllen added a commit that referenced this issue May 27, 2020
This is part of the work on #151. We're removing direct libc dependencies
from src/enclave.
SeanTAllen added a commit that referenced this issue May 27, 2020
This is part of the work on #151. We're removing direct libc dependencies
from src/enclave.
SeanTAllen added a commit that referenced this issue May 27, 2020
As part of work on #151, this commit breaks enclave_util.c apart into
the parts only needed by lkl & sgx-lkl-musl and parts that are also
used by the enclave.

All parts that aren't needed for the enclave have been moved to lkl_util.c

This removes the libc dependency on snprintf from enclave_util and advances
towards our goal of removing libc dependencies from src/enclave.
SeanTAllen added a commit that referenced this issue May 27, 2020
Removes libc dependencies except for memset which would be emitted
by the compiler.

This is part of the work for #151
davidchisnall pushed a commit that referenced this issue May 28, 2020
As part of work on #151, this commit breaks enclave_util.c apart into
the parts only needed by lkl & sgx-lkl-musl and parts that are also
used by the enclave.

All parts that aren't needed for the enclave have been moved to lkl_util.c

This removes the libc dependency on snprintf from enclave_util and advances
towards our goal of removing libc dependencies from src/enclave.
SeanTAllen added a commit that referenced this issue May 28, 2020
As part of the work on #151, we are removing libc dependencies from src/enclave.
SeanTAllen added a commit that referenced this issue May 28, 2020
As part of the work on #151, we are removing libc dependencies from src/enclave.
@davidchisnall davidchisnall added the needs-triage Bug does not yet have a priority assigned label Jul 28, 2020
@SeanTAllen SeanTAllen removed their assignment Jul 29, 2020
@SeanTAllen SeanTAllen removed the needs-triage Bug does not yet have a priority assigned label Jul 29, 2020
@SeanTAllen SeanTAllen self-assigned this Jul 29, 2020
@SeanTAllen
Copy link
Contributor

I'm as assignee so I don't lose track of checking what of this has been done once #604 is merged.

@SeanTAllen SeanTAllen assigned SeanTAllen and unassigned SeanTAllen Aug 3, 2020
@SeanTAllen
Copy link
Contributor

Updated to reflect changes from #604 and associated work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: sgx-lkl Core SGX-LKL functionality enhancement p1 Medium priority
Projects
None yet
Development

No branches or pull requests

4 participants