Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Breaking] Remove ResumableFileSlot and rely on high ulimits #1582

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

allada
Copy link
Member

@allada allada commented Feb 9, 2025

ResumableFileSlot became to difficult to manage, instead of managing resources this way, we use much higher
DEFAULT_OPEN_FILE_PERMITS, set ulimit (when unix) and warn user if the limits are likely too low.

Breaking: Removed idle_file_descriptor_timeout_millis from config.

closes: #1288, #513, #1298, #527


This change is Reviewable

@allada allada force-pushed the remove-local-fs-limit branch from b74b3fc to 43044cc Compare February 9, 2025 04:43
@allada allada force-pushed the remove-local-fs-limit branch from 43044cc to d7a3655 Compare February 9, 2025 04:52
@allada allada force-pushed the remove-local-fs-limit branch from d7a3655 to d81c645 Compare February 9, 2025 04:55
@allada allada force-pushed the remove-local-fs-limit branch from d81c645 to f2a2385 Compare February 9, 2025 04:56
@allada allada force-pushed the remove-local-fs-limit branch from f2a2385 to c8c51ba Compare February 9, 2025 05:00
@allada allada force-pushed the remove-local-fs-limit branch from c8c51ba to a9302dc Compare February 9, 2025 05:08
Copy link
Member Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 0 of 21 files reviewed, and pending CI: Web Platform Deployment / ubuntu-24.04, pre-commit-checks, and 1 discussions need to be resolved (waiting on @aaronmondal)


a discussion (no related file):
+@aaronmondal


nativelink-worker/src/running_actions_manager.rs line 995 at r1 (raw file):

                            "Could not kill process",
                        );
                    } else {

fyi: This branch is the success case and the error was wrong, and it shouldn't be an ERROR.


nativelink-store/tests/filesystem_store_test.rs line 1313 at r1 (raw file):

#[serial]
#[nativelink_test]
async fn update_with_whole_file_slow_path_when_low_file_descriptors() -> Result<(), Error> {

fyi: Removed this as it is an outlier and we mostly rely on ulimit now.


nativelink-worker/tests/running_actions_manager_test.rs line 458 at r1 (raw file):

            arguments: vec!["touch".to_string(), "./some/path/test.txt".to_string()],
            output_files: vec!["some/path/test.txt".to_string()],
            environment_variables: vec![EnvironmentVariable {

fyi: sneaking these in on this PR.


nativelink-worker/tests/running_actions_manager_test.rs line 1478 at r1 (raw file):

        assert_eq!(poll!(&mut run_action_fut), Poll::Pending);
        tokio::task::yield_now().await;
        match fs::metadata(&process_started_file).await {

fyi: This should fix this flaky test by waiting for the process to start. The test was fine before, but gave exit code 127 or 9 depending on the state of the process during the kill event.


nativelink-util/src/buf_channel.rs line 375 at r1 (raw file):

                .recv()
                .await
                .err_tip(|| "During next read of buf_channel::take()")?;

fyi: Fixed typo.

@allada allada force-pushed the remove-local-fs-limit branch from bc8b804 to 72fc743 Compare February 9, 2025 18:32
@allada allada force-pushed the remove-local-fs-limit branch from 72fc743 to 511f002 Compare February 9, 2025 19:03
@allada allada force-pushed the remove-local-fs-limit branch from 511f002 to af64864 Compare February 9, 2025 21:22
@allada allada force-pushed the remove-local-fs-limit branch from af64864 to a73bcab Compare February 9, 2025 21:58
@allada allada force-pushed the remove-local-fs-limit branch from a73bcab to a47c20e Compare February 9, 2025 21:58
@allada allada force-pushed the remove-local-fs-limit branch from a47c20e to c2298d4 Compare February 9, 2025 22:17
ResumableFileSlot became to difficult to manage, instead of
managing resources this way, we use much higher
DEFAULT_OPEN_FILE_PERMITS, set ulimit (when unix) and warn
user if the limits are likely too low.

Breaking: Removed idle_file_descriptor_timeout_millis from config.

closes: TraceMachina#1288, TraceMachina#513, TraceMachina#1298, TraceMachina#527
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apparently creating too many threads for tracing-subscriber
2 participants