Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: handle symlinks in a centralized way #9800

Closed
wants to merge 2 commits into from

Conversation

humitos
Copy link
Member

@humitos humitos commented Dec 12, 2022

Simplifies symlinks hanlding to centralize the security logic in one place only (safe_open); reducing the complexity of this code, making it easier to maintain and more secure.

Besides, it improves the UX for users to understand why their build is failing. It clearly communicates the error using our internal pattern for known exception managed at Celery.on_failure.

These are the highlights 🔆 of this work:

  • Always follow symlinks
  • Always copy symlink as-is. Only resolve them when uploading to storage.
  • Always check symlinks are inside DOCROOT and base_path if passed
  • Hard fail the build if there are symlinks in the output that are outside
    DOCROOT
  • Centralize all the symlink checks in safe_open (used to open configuration
    files while building and to open files for uploading them)
  • Clearly communicate the error to the user
  • Move upload to storage function inside Celery.execute handler to allow us
    raising UnsupportedSymlinkFileError and using on_failure to clearly
    communicate the error to the user

I'm having issues running the tests right now, but I'll come back to this PR to update them soon.

NOTE this PR is done over rel branch, but we should switch if to main when the other commit gets merged here.

Hard fail the build if there is symlinks in the output.
Simplifies symlinks hanlding to centralize the security logic in one place
only (`safe_open`); reducing the complexity of this code, making it easier to
maintain and more secure.

Besides, it improves the UX for users to understand why their build is failing.
It clearly communicates the error using our internal pattern for known exception
managed at `Celery.on_failure`.

These are the highlights of this work:

- Always follow symlinks
- Always copy symlink as-is. Only resolve them when uploading to storage.
- Always check symlinks are inside DOCROOT and `base_path` if passed
- Hard fail the build if there are symlinks in the output that are outside
  DOCROOT
- Centralize all the symlink checks in `safe_open` (used to open configuration
  files while building and to open files for uploading them)
- Clearly communicate the error to the user
- Move upload to storage function inside `Celery.execute` handler to allow us
  raising `UnsupportedSymlinkFileError` and using `on_failure` to clearly
  communicate the error to the user
@humitos humitos requested a review from a team as a code owner December 12, 2022 18:49
Comment on lines -90 to -97
# Don't follow symlinks when uploading to storage.
if filepath.is_symlink():
log.info(
"Skipping symlink upload.",
path_resolved=str(filepath.resolve()),
)
continue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should follow symlinks when uploading files (github pages doesn't for example), and with our plans to test rclone, rclone doesn't support something like "follow symlinks but only inside this path", and we also need to check that the symlinks aren't outside the version path not just the docroot path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove this check from here to allow communicating this decision, "do not follow symlinks", to the user clearly. Otherwise, with this check we are just ignoring symlinks without telling this to the user, which lead to unexpected behavior from a user perspective.

# TODO: this `relative_path` can be improved to remove the first part:
# /symlink-security-exploit/artifacts/latest/generic/passwd.txt
# This is shown to the user currently.
relative_path = str(path).replace(str(docroot.resolve()), '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the Path.relative_to method that should be used instead of replace.

)

if path.is_symlink():
symlink_path = str(path.readlink().resolve())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve already follows symlinks, readlink isn't needed here.

Comment on lines -103 to -117
from_dir = Path(from_dir)
to_dir = Path(to_dir)
if from_dir.is_symlink() or to_dir.is_symlink():
log.info(
"Not copying directory, one of the paths is a symlink.",
from_dir=from_dir,
from_dir_resolved=from_dir.resolve(),
to_dir=to_dir,
to_dir_resolved=to_dir.resolve(),
)
return False

_assert_path_is_inside_docroot(from_dir)
_assert_path_is_inside_docroot(to_dir)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these checks are needed, since copytree will follow the symlink when doing the copy of the initial directory (there is a test for this as well), what it won't do is follow any symlinks inside that directory. And the check for _assert_path_is_inside_docroot is also useful, it will help us to prevent any directory traversal vulnerabilities being exploited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copytree will follow the symlink when doing the copy of the initial directory (there is a test for this as well)

What is that initial directory you refer to here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directory itself that you are going to copy (from_dir)

resolved_path=path.resolve(),
)
return None
_assert_path_is_inside_docroot(path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rmtree fails hard (OSError) when trying to delete a dir that points to a symlink, but _assert_path_is_inside_docroot is still useful as mentioned in my other comment.


time_before_store_build_artifacts = timezone.now()
# Store build artifacts to storage (local or cloud storage)
self.store_build_artifacts(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably do this change in another PR, doesn't look related to symlinks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to move this call inside execute since we cannot raise exceptions from on_success and be able to manage them inside on_failure since it's not called when the code is already inside on_success.

@humitos
Copy link
Member Author

humitos commented Jan 23, 2023

I think there are some good ideas implemented on this PR. However, this code has been changing due to #9842 and #9888

We can come back to this once we have those settled down for a while.

@humitos humitos added the Status: blocked Issue is blocked on another issue label Jan 23, 2023
@humitos
Copy link
Member Author

humitos commented Jan 31, 2023

I'm closing this PR. It was useful to show my proposal about symlinks. Besides, the part related to store build artifacts was already implemented and merged from another PR.

@humitos humitos closed this Jan 31, 2023
@humitos humitos deleted the humitos/hard-fail-symlinks branch January 31, 2023 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: blocked Issue is blocked on another issue
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants