forked from bacalhau-project/bacalhau
[pull] main from bacalhau-project:main #11
Status: Open

`pull` wants to merge 327 commits into DeCenter-AI:main from bacalhau-project:main (base: main), +120,860 −148,764.
For some reason, timed-out executions were marked as cancelled instead of failed, which is wrong. This also resulted in the compute node calling `OnCancelComplete` on the requester node, which is a no-op. As a result, the requester node would only mark the execution as failed once the housekeeper kicked in, which has a buffer of 2 minutes, instead of as soon as the failure was reported by the compute node.

Previously, this job would be marked as failed after 2 to 2.5 minutes:

```
bacalhau docker --timeout 10 run ubuntu sleep 120
```

With this change it will be marked as failed in ~10 seconds.
## Example API and CLI usage when on <= 1.3.2

```
→ curl 127.0.0.1:1234/api/v1/requester/nodes
"This endpoint is deprecated. See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information"
```

## Example CLI usage when on 1.4

```
→ bacalhau create job.yaml
Command "create" is deprecated, Please use `job run` to create jobs.
See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information.

→ echo hi | bacalhau create
Command "create" is deprecated, Please use `job run` to create jobs.
See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information.

→ bacalhau id
Command "id" is deprecated, Please use `agent node` to inspect bacalhau nodes.
See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information.

→ bacalhau get
Command "get" is deprecated, Please use `job get` to download results of a job.
See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information.

→ bacalhau list
Command "list" is deprecated, Please use `job list` to list jobs.
See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information.

→ bacalhau logs
Command "logs" is deprecated, Please use `job logs` to follow job logs.
See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information.

→ bacalhau validate
Command "validate" is deprecated, Please use `job validate` to validate jobs.
See the migration guide at https://docs.bacalhau.org/v/v.1.4.0/references/cli-reference/command-migration for more information.
```
Closes #4146, #4141, #4119, and #4143

---------

Authored-by: frrist <[email protected]>
Co-authored-by: frrist <[email protected]>
Bumps [github.com/hashicorp/go-retryablehttp](https://github.com/hashicorp/go-retryablehttp) from 0.7.5 to 0.7.7. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/hashicorp/go-retryablehttp/blob/main/CHANGELOG.md">github.com/hashicorp/go-retryablehttp's changelog</a>.</em></p> <blockquote> <h2>0.7.7 (May 30, 2024)</h2> <p>BUG FIXES:</p> <ul> <li>client: avoid potentially leaking URL-embedded basic authentication credentials in logs (<a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/158">#158</a>)</li> </ul> <h2>0.7.6 (May 9, 2024)</h2> <p>ENHANCEMENTS:</p> <ul> <li>client: support a <code>RetryPrepare</code> function for modifying the request before retrying (<a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/216">#216</a>)</li> <li>client: support HTTP-date values for <code>Retry-After</code> header value (<a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/138">#138</a>)</li> <li>client: avoid reading entire body when the body is a <code>*bytes.Reader</code> (<a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/197">#197</a>)</li> </ul> <p>BUG FIXES:</p> <ul> <li>client: fix a broken check for invalid server certificate in go 1.20+ (<a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/210">#210</a>)</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/1542b31176d3973a6ecbc06c05a2d0df89b59afb"><code>1542b31</code></a> v0.7.7</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/defb9f441dcf67a2a56fae733482836ea83349ac"><code>defb9f4</code></a> v0.7.7</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/a99f07beb3c5faaa0a283617e6eb6bcf25f5049a"><code>a99f07b</code></a> Merge pull request <a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/158">#158</a> from 
dany74q/danny/redacted-url-in-logs</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/8a28c574da4098c0612fe1c7135f1f6de113d411"><code>8a28c57</code></a> Merge branch 'main' into danny/redacted-url-in-logs</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/86e852df43aa0d94150c4629d74e5116d1ff3348"><code>86e852d</code></a> Merge pull request <a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/227">#227</a> from hashicorp/dependabot/github_actions/actions/chec...</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/47fe99e6460cddc5f433aad2b54dcf32281f8a53"><code>47fe99e</code></a> Bump actions/checkout from 4.1.5 to 4.1.6</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/490fc06be0931548d3523a4245d15e9dc5d9214d"><code>490fc06</code></a> Merge pull request <a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/226">#226</a> from testwill/ioutil</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/f3e9417dbfcd0dc2b4a02a1dfdeb75f1e636b692"><code>f3e9417</code></a> chore: remove refs to deprecated io/ioutil</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/d969eaa9c97860482749df718a35b4a269361055"><code>d969eaa</code></a> Merge pull request <a href="https://redirect.github.com/hashicorp/go-retryablehttp/issues/225">#225</a> from hashicorp/manicminer-patch-2</li> <li><a href="https://github.com/hashicorp/go-retryablehttp/commit/2ad8ed4a1d9e632284f6937e91b2f9a1d30e8298"><code>2ad8ed4</code></a> v0.7.6</li> <li>Additional commits viewable in <a href="https://github.com/hashicorp/go-retryablehttp/compare/v0.7.5...v0.7.7">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/hashicorp/go-retryablehttp&package-manager=go_modules&previous-version=0.7.5&new-version=0.7.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/bacalhau-project/bacalhau/network/alerts).

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Walid Baruni <[email protected]>
This PR does the following:

- Run `swag fmt` on `pkg/publicapi/orchestrator/`; this formats all API comments on endpoints.
- Generate and update the swagger spec: `go generate pkg/swaggger/generate.sh`
- Build the API client: `make build-python-apiclient`
- Fix the SDK, meaning all files under `python/bacalhau_sdk`
- Update README.md for the `python/` folder.

Closes #4164
This PR aims at the following:

- Fix the Makefile to correctly release the Python API client

Closes #4202
This PR aims at the following:

- Fix Ruff formatting, and hence the Bacalhau Airflow build

Closes #4239
This PR aims at the following:

- Add support for pagination at the JobStore layer for job histories

Closes #4220
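The PR description above doesn't show the JobStore API itself, but offset-style pagination over a history list can be sketched as below. All names (`HistoryEntry`, `page`) are illustrative assumptions, not the actual bacalhau implementation.

```go
package main

import "fmt"

// HistoryEntry is a hypothetical stand-in for a job history record.
type HistoryEntry struct{ SeqNum int }

// page returns up to limit entries starting at offset, the offset for the
// next page, and whether more entries remain (a NextToken-style contract).
func page(entries []HistoryEntry, offset, limit int) (out []HistoryEntry, next int, more bool) {
	if offset >= len(entries) {
		return nil, offset, false
	}
	end := offset + limit
	if end > len(entries) {
		end = len(entries)
	}
	return entries[offset:end], end, end < len(entries)
}

func main() {
	entries := []HistoryEntry{{1}, {2}, {3}, {4}, {5}}
	first, next, more := page(entries, 0, 2)
	fmt.Println(len(first), next, more) // 2 2 true
	last, _, more2 := page(entries, 4, 2)
	fmt.Println(len(last), more2) // 1 false
}
```

A caller would keep requesting pages with the returned offset until `more` is false.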
This PR improves the visibility of queued jobs by adding a new history event and a dedicated job state, and enables updating the job state from running back to queued if necessary.

### This change includes:

- Decoupling adding history events from updating job and execution states by exposing first-class methods to add events
- Making sure we use transactions in many places we had missed
- Refactoring the structure of `models.JobHistory` so that events are not coupled with state updates, and we can add events even if the job/execution state remains the same. This plays nicely with our intention to add more granular events for job execution
- Dropping unhelpful and too-granular history events

### Issues

To maintain backward compatibility with v1.4 and to avoid `nil` exceptions due to missing `JobState` fields, we are still keeping those fields and adding an undefined value for them at the API layer. Older clients will still see `Undefined` as the job state, but that is better than a `nil` exception. When `v1.5` is released, we will set that version as the min version, and all requests from older clients will be rejected with a message asking them to update.

### Example Output

#### Old Format

```
→ bacalhau job describe j-ae5403b8-0934-4d86-a6e0-31e2f981f021
ID            = j-ae5403b8-0934-4d86-a6e0-31e2f981f021
Name          = j-ae5403b8-0934-4d86-a6e0-31e2f981f021
Namespace     = default
Type          = batch
State         = Completed
Count         = 1
Created Time  = 2024-06-21 09:18:10
Modified Time = 2024-06-21 09:18:10
Version       = 0

Summary
Completed = 1

Job History
 TIME                 REV.  STATE      TOPIC       EVENT
 2024-06-21 09:18:10  1     Pending    Submission  Job submitted
 2024-06-21 09:18:10  2     Running
 2024-06-21 09:18:10  3     Completed

Executions
 ID          NODE ID   STATE      DESIRED  REV.  CREATED    MODIFIED   COMMENT
 e-6403a485  QmPLPUUj  Completed  Stopped  6     4h37m ago  4h37m ago  Accepted job

Execution e-6403a485 History
 TIME                 REV.  STATE              TOPIC  EVENT
 2024-06-21 09:18:10  1     New
 2024-06-21 09:18:10  2     AskForBid
 2024-06-21 09:18:10  3     AskForBidAccepted         Requesting Node Accepted job
 2024-06-21 09:18:10  4     AskForBidAccepted
 2024-06-21 09:18:10  5     BidAccepted
 2024-06-21 09:18:10  6     Completed

Standard Output
heelo
```

#### New Format

```
ID            = j-65ba9fe0-f6d0-4d23-bd5c-e2f7480db30b
Name          = j-65ba9fe0-f6d0-4d23-bd5c-e2f7480db30b
Namespace     = default
Type          = batch
State         = Completed
Count         = 1
Created Time  = 2024-07-24 07:20:49
Modified Time = 2024-07-24 07:20:51
Version       = 0

Summary
Completed = 1

Job History
 TIME                 TOPIC         EVENT
 2024-07-24 09:20:49  Submission    Job submitted
 2024-07-24 09:20:51  State Update  Running
 2024-07-24 09:20:51  State Update  Completed

Executions
 ID          NODE ID     STATE      DESIRED  REV.  CREATED  MODIFIED  COMMENT
 e-42188026  n-6e8998ad  Completed  Stopped  6     42s ago  40s ago

Execution e-42188026 History
 TIME                 TOPIC       EVENT
 2024-07-24 09:20:49  Scheduling  Requested execution on n-6e8998ad
 2024-07-24 09:20:51  Execution   Running
 2024-07-24 09:20:51  Execution   Completed successfully

Standard Output
hello
```

#### New Format with Queueing

```
ID            = j-333dbef1-65d6-46e7-af93-c763ab7c08f6
Name          = j-333dbef1-65d6-46e7-af93-c763ab7c08f6
Namespace     = default
Type          = batch
State         = Queued
Message       = Job queued. not enough nodes to run job. requested: 1, available: 0, suitable: 0.
Count         = 1
Created Time  = 2024-07-24 07:22:11
Modified Time = 2024-07-24 07:22:11
Version       = 0

Summary

Job History
 TIME                 TOPIC       EVENT
 2024-07-24 09:22:11  Submission  Job submitted
 2024-07-24 09:22:11  Queueing    Job queued. not enough nodes to run job. requested: 1, available: 0, suitable: 0.

Executions
 ID  NODE ID  STATE  DESIRED  REV.  CREATED  MODIFIED  COMMENT
```

Closes #3991
Closes #4101
This PR aims at the following:

- Introduce a Live Table Writer
- Refactor the Printer and split out the Job Progress Printer logic
- Get history events and print all job progress in table format

Closes #4233
Recently, our CircleCI plan got downgraded; this PR makes sure we only use Large instances.
# Bacalhau NCL (NATS Client Library)

## Overview

The NCL (NATS Client Library) is an internal library for Bacalhau, designed to provide reliable, scalable, and efficient communication between orchestrator and compute nodes. It leverages NATS for messaging and implements an event-driven architecture with support for asynchronous communication, granular event logging, and robust state management.

## Key Components

1. **EnvelopedRawMessageSerDe**: Handles serialization and deserialization of RawMessages with versioning and CRC checks.
2. **MessageSerDeRegistry**: Manages serialization and deserialization of different message types.
3. **Publisher**: Handles asynchronous message publishing.
4. **Subscriber**: Manages message consumption and processing.
5. **MessageHandler**: Interface for processing received messages.
6. **MessageFilter**: Interface for filtering incoming messages.
7. **Checkpointer**: Interface for managing checkpoints in message processing.

## Technical Details

### Message Flow

1. **Publishing**:
   - The publisher accepts a `Message` struct through its `Publish` method.
   - The `MessageSerDeRegistry` serializes the `Message` into a `RawMessage` using the appropriate `MessageSerDe` for the message type.
   - The `EnvelopedRawMessageSerDe` serializes the `RawMessage` into a byte slice with an envelope containing a version byte and a CRC checksum.
   - The serialized message is published to NATS using the configured subject.
2. **Subscribing**:
   - The subscriber sets up a NATS subscription for specified subjects.
   - When a message is received, it's passed to the `processMessage` method.
   - The `EnvelopedRawMessageSerDe` deserializes the raw bytes into a `RawMessage`. The envelope version helps determine the deserialization method, and the CRC checksum is used to verify the message integrity.
   - The message filter is applied to determine if the message should be processed.
   - The `MessageSerDeRegistry` deserializes the `RawMessage` into a `Message` using the appropriate `MessageSerDe` for the message type.
   - The deserialized `Message` is passed to each configured `MessageHandler`.

### Serialization/Deserialization (SerDe) Flow

1. **Message to bytes (for sending)**: `Message` -> `MessageSerDe.Serialize()` -> `RawMessage` -> `EnvelopedRawMessageSerDe.Serialize()` -> `[]byte`
2. **Bytes to Message (when receiving)**: `[]byte` -> `EnvelopedRawMessageSerDe.Deserialize()` -> `RawMessage` -> `MessageSerDe.Deserialize()` -> `Message`

### EnvelopedRawMessageSerDe

The `EnvelopedRawMessageSerDe` adds a version byte and a CRC checksum to each serialized `RawMessage`. The envelope structure is as follows:

```
+------------------+---------------+--------------------+
| Version (1 byte) | CRC (4 bytes) | Serialized Message |
+------------------+---------------+--------------------+
```

This allows for future extensibility, backward compatibility, and data integrity verification.

### MessageSerDeRegistry

The `MessageSerDeRegistry` manages the serialization and deserialization of different message types. It allows registering custom message types with unique names and provides methods for serializing and deserializing messages.

Key methods:

- `Register(name string, messageType any, serde MessageSerDe) error`
- `Serialize(message *Message) (*RawMessage, error)`
- `Deserialize(rawMessage *RawMessage) (*Message, error)`

### Publisher

The `publisher` struct handles message publishing. It supports asynchronous publishing and can be configured with options like message serializer, MessageSerDeRegistry, and destination subject or prefix.

Key method:

- `Publish(ctx context.Context, message *Message) error`

### Subscriber

The `subscriber` struct manages message consumption. It sets up NATS subscriptions, processes incoming messages, and routes them to the appropriate message handlers.

Key methods:

- `Subscribe(subjects ...string) error`
- `Close(ctx context.Context) error`

## Usage Within Bacalhau

This library is designed to be used internally within the Bacalhau project. It should be integrated into the orchestrator and compute node components to handle all inter-node communication.

Example integration points:

1. Job assignment from orchestrator to compute nodes
2. Status updates from compute nodes to orchestrator
3. Heartbeat messages for health monitoring

**This PR only migrated heartbeats to NCL**

## Remaining Work

A lot of work remains to support progress checkpointing, event sequencing for optimistic concurrency, demuxing of subscribers, and migration of existing Bacalhau internal APIs. Those will be handled in future PRs.

## Reference

https://www.notion.so/expanso/Reliable-Orchestration-Design-397083a957794668adb15322553e6652?pvs=4
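The envelope framing described above (one version byte, a 4-byte CRC, then the payload) can be sketched as follows. This is an illustrative assumption, not the actual `EnvelopedRawMessageSerDe`: the real implementation may differ in byte order, CRC polynomial, and error handling.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// envelopeVersion is a hypothetical version byte for this sketch.
const envelopeVersion = 0x01

// encode frames a payload as: version byte | CRC-32 of payload | payload.
func encode(payload []byte) []byte {
	buf := make([]byte, 0, 5+len(payload))
	buf = append(buf, envelopeVersion)
	buf = binary.BigEndian.AppendUint32(buf, crc32.ChecksumIEEE(payload))
	return append(buf, payload...)
}

// decode validates the version and CRC, returning the inner payload.
func decode(b []byte) ([]byte, error) {
	if len(b) < 5 {
		return nil, fmt.Errorf("envelope too short: %d bytes", len(b))
	}
	if b[0] != envelopeVersion {
		return nil, fmt.Errorf("unsupported envelope version %d", b[0])
	}
	payload := b[5:]
	if crc32.ChecksumIEEE(payload) != binary.BigEndian.Uint32(b[1:5]) {
		return nil, fmt.Errorf("crc mismatch: message corrupted")
	}
	return payload, nil
}

func main() {
	enc := encode([]byte(`{"type":"heartbeat"}`))
	dec, err := decode(enc)
	fmt.Println(string(dec), err)
}
```

The version byte lets future envelope formats coexist with old ones, and the CRC catches corruption before the payload ever reaches a `MessageSerDe`.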
Fix wrong validation of docker env variables, which assumed that `--env` accepts a pair of key values, when it actually accepts a string in the format `key=value`.

### Existing Behaviour

```
# job with single env var
→ bacalhau docker run --env FOO=bar ubuntu:latest printenv
building docker engine spec: invalid docker engine param: 'EnvironmentVariables' ([FOO=bar]) must contain an even number of elements to represent environment variable key-value pairs

# job with two env vars
→ bacalhau docker run --env FOO=bar --env name=walid ubuntu:latest printenv
HOSTNAME=e9e235618726
FOO=bar
name=walid
HOME=/root
```

### After the fix

```
→ bacalhau docker run --env FOO=bar ubuntu:latest printenv -f
Job successfully submitted. Job ID: j-fb02dd3c-0251-4c74-b7f1-a207745434a8
Waiting for logs... (Enter Ctrl+C to exit at any time, your job will continue running):

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=809627b2cb7d
FOO=bar
HOME=/root
```

### Note:

Both the client and compute nodes need to be updated. Otherwise the job will fail server-side validation in the compute node.

Fixes #4227
https://bacalhauproject.slack.com/archives/C056T5GUYN7/p1721907201581259
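The corrected interpretation of `--env` described above can be sketched as below: each flag value is a single `key=value` string, not a pair of list elements. `parseEnv` is illustrative, not the actual bacalhau code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnv turns --env flag values ("key=value" strings) into a map,
// rejecting any value that lacks the key=value shape.
func parseEnv(flags []string) (map[string]string, error) {
	out := make(map[string]string, len(flags))
	for _, f := range flags {
		key, value, ok := strings.Cut(f, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid env var %q: expected key=value", f)
		}
		out[key] = value
	}
	return out, nil
}

func main() {
	env, err := parseEnv([]string{"FOO=bar", "name=walid"})
	fmt.Println(env["FOO"], env["name"], err) // bar walid <nil>
}
```

Using `strings.Cut` also keeps any `=` inside the value intact, so `A=b=c` parses as key `A`, value `b=c`.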
This change replaces `repo.version` with `system_metadata.yaml` based on the criteria defined in #4218. Further, it includes a migration that does the following:

- Updates the repo version to `4` and sets it in `system_metadata.yaml`
- Moves the installationID to `system_metadata.yaml` if one is present in the config
- Deletes the `update.json` file; its replacement is `system_metadata.yaml`

Additionally, the update check has been modified to use an `UpdateStore` interface for reading and writing the `LastUpdateTime` field from the `system_metadata.yaml` file. The FsRepo satisfies this interface.

Lastly, the UpdateCheckStatePath field has been removed from the config. A migration for this is not possible, as we no longer permit users to set the field.

---------

Co-authored-by: frrist <[email protected]>
This PR aims at the following:

- Create a buildkite-hosted-agent for bacalhau
- Create a bacalhau-golang pipeline
- Add buildkite scripts for different components

Closes #4297
This PR aims at the following:

- Send testsuite executions to Buildkite Portal for better insights

Closes #4300
# Watcher Library

## Overview

The Watcher Library is an internal component of the Bacalhau project that provides a robust event watching and processing system. It's designed to efficiently store, retrieve, and process events. The library ensures events are stored in a durable, ordered manner, allowing for consistent and reliable event processing. It supports features like checkpointing, filtering, and long-polling, while maintaining the ability to replay events from any point in the event history.

## Key Features

1. **Ordered Event Processing**: Events are processed in the exact order they were created, ensuring consistency and predictability in event handling.
2. **Durability**: Events are stored persistently in BoltDB, ensuring they survive system restarts or crashes.
3. **Replayability**: The system allows replaying events from any point in history, facilitating data recovery, debugging, and system reconciliation.
4. **Concurrency**: Multiple watchers can process events concurrently, improving system throughput.
5. **Filtering**: Watchers can filter events based on object types and operations, allowing for targeted event processing.
6. **Checkpointing**: Watchers can save their progress and resume from where they left off, enhancing reliability and efficiency.
7. **Long-polling**: Efficient event retrieval with support for long-polling, reducing unnecessary network traffic and database queries.
8. **Garbage Collection**: Automatic cleanup of old events to manage storage while maintaining the ability to replay from critical points.
9. **Flexible Event Iteration**: Different types of iterators for various use cases, including the ability to start from the oldest event, the latest event, or any specific point in the event history.

## Key Components

1. **Registry**: Manages multiple watchers and provides methods to create and manage watchers.
2. **Watcher**: Represents a single event watcher that processes events sequentially.
3. **EventStore**: Responsible for storing and retrieving events, with BoltDB as the default implementation.
4. **EventHandler**: Interface for handling individual events.
5. **Serializer**: Handles the serialization and deserialization of events.

## Core Concepts

### Event

An `Event` represents a single occurrence in the system. It has the following properties:

- `SeqNum`: A unique, sequential identifier for the event.
- `Operation`: The type of operation (Create, Update, Delete).
- `ObjectType`: The type of object the event relates to.
- `Object`: The actual data associated with the event.
- `Timestamp`: When the event occurred.

### EventStore

The `EventStore` is responsible for persisting events and providing methods to retrieve them. It uses BoltDB as the underlying storage engine and supports features like caching, checkpointing, and garbage collection.

### Registry

The `Registry` manages multiple watchers. It's the main entry point for components that want to subscribe to events.

### Watcher

A `Watcher` represents a single subscriber to events. It processes events sequentially and can be configured with filters and checkpoints.

### EventIterator

An `EventIterator` defines the starting position for reading events. There are four types of iterators:

1. **TrimHorizonIterator**: Starts from the oldest available event.
2. **LatestIterator**: Starts from the latest available event.
3. **AtSequenceNumberIterator**: Starts at a specific sequence number.
4. **AfterSequenceNumberIterator**: Starts after a specific sequence number.

## Usage

Here's how you typically use the Watcher library within Bacalhau:

1. Create an EventStore:

```go
db, _ := bbolt.Open("events.db", 0600, nil)
store, _ := boltdb.NewEventStore(db)
```

2. Create a Registry:

```go
registry := watcher.NewRegistry(store)
```

3. Implement an EventHandler:

```go
type MyHandler struct{}

func (h *MyHandler) HandleEvent(ctx context.Context, event watcher.Event) error {
	// Process the event
	return nil
}
```

4. Start watching for events:

```go
watcher, _ := registry.Watch(ctx, "my-watcher", &MyHandler{},
	watcher.WithFilter(watcher.EventFilter{
		ObjectTypes: []string{"Job", "Execution"},
		Operations:  []watcher.Operation{watcher.OperationCreate, watcher.OperationUpdate},
	}),
)
```

5. Store events:

```go
store.StoreEvent(ctx, watcher.OperationCreate, "Job", jobData)
```

## Configuration

### Watch Configuration

When creating a watcher, you can configure it with various options:

- `WithInitialEventIterator(iterator EventIterator)`: Sets the starting position for watching if no checkpoint is found.
- `WithFilter(filter EventFilter)`: Sets the event filter for watching.
- `WithBufferSize(size int)`: Sets the size of the event buffer.
- `WithBatchSize(size int)`: Sets the number of events to fetch in each batch.
- `WithInitialBackoff(backoff time.Duration)`: Sets the initial backoff duration for retries.
- `WithMaxBackoff(backoff time.Duration)`: Sets the maximum backoff duration for retries.
- `WithMaxRetries(maxRetries int)`: Sets the maximum number of retries for event handling.
- `WithRetryStrategy(strategy RetryStrategy)`: Sets the retry strategy for event handling.

Example:

```go
watcher, err := registry.Watch(ctx, "my-watcher", &MyHandler{},
	watcher.WithInitialEventIterator(watcher.TrimHorizonIterator()),
	watcher.WithFilter(watcher.EventFilter{
		ObjectTypes: []string{"Job", "Execution"},
		Operations:  []watcher.Operation{watcher.OperationCreate, watcher.OperationUpdate},
	}),
	watcher.WithBufferSize(1000),
	watcher.WithBatchSize(100),
	watcher.WithMaxRetries(3),
	watcher.WithRetryStrategy(watcher.RetryStrategyBlock),
)
```

### EventStore Configuration (BoltDB)

The BoltDB EventStore can be configured with various options:

- `WithEventsBucket(name string)`: Sets the name of the bucket used to store events.
- `WithCheckpointBucket(name string)`: Sets the name of the bucket used to store checkpoints.
- `WithEventSerializer(serializer watcher.Serializer)`: Sets the serializer used for events.
- `WithCacheSize(size int)`: Sets the size of the LRU cache used to store events.
- `WithLongPollingTimeout(timeout time.Duration)`: Sets the timeout duration for long-polling requests.
- `WithGCAgeThreshold(threshold time.Duration)`: Sets the age threshold for event pruning.
- `WithGCCadence(cadence time.Duration)`: Sets the interval at which garbage collection runs.
- `WithGCMaxRecordsPerRun(max int)`: Sets the maximum number of records to process in a single GC run.
- `WithGCMaxDuration(duration time.Duration)`: Sets the maximum duration for a single GC run.

Example:

```go
store, err := boltdb.NewEventStore(db,
	boltdb.WithEventsBucket("myEvents"),
	boltdb.WithCheckpointBucket("myCheckpoints"),
	boltdb.WithCacheSize(1000),
	boltdb.WithLongPollingTimeout(10*time.Second),
)
```

## Best Practices

1. Use meaningful watcher IDs to easily identify different components subscribing to events.
2. Implement error handling in your `EventHandler` to ensure robust event processing.
3. Use appropriate filters to minimize unnecessary event processing.
4. Regularly checkpoint your watchers to enable efficient restarts.
5. Monitor watcher stats to ensure they're keeping up with event volume.

## Troubleshooting

1. If a watcher is falling behind, consider increasing the batch size or optimizing the event handling logic.
2. For performance issues, check the BoltDB file size and consider tuning the garbage collection parameters.

## Future Improvements

1. Enhanced monitoring and metrics.
This PR aims at the following:

- Send failure test results to Buildkite and fail the pipeline

Closes #4300
This PR introduces two key features:

## 1. Enhanced Execution Environment Variables

Added support for passing rich job metadata to execution engines via environment variables, including:

- `BACALHAU_PARTITION_INDEX`: Current partition index (0 to N-1)
- `BACALHAU_PARTITION_COUNT`: Total number of partitions
- `BACALHAU_JOB_ID`: Unique job identifier
- `BACALHAU_JOB_NAME`: User-provided job name
- `BACALHAU_JOB_NAMESPACE`: Job namespace
- `BACALHAU_JOB_TYPE`: Job type (Batch/Service)
- `BACALHAU_EXECUTION_ID`: Unique execution identifier
- `BACALHAU_NODE_ID`: ID of the executing compute node

This allows jobs to:

- Be partition-aware and handle their specific partition's work
- Access their execution context
- Track node assignment

## 2. Test Suite for Partition Scheduling

Added a comprehensive test suite that validates:

- Environment variable propagation to executors
- Partition scheduling behavior:
  - Unique partition indices
  - Node distribution
  - Retry behavior
  - Service job continuous execution

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

**New Features**
- Added environment variable management for job executions
- Enhanced support for system and task-level environment variables
- Improved job partitioning and execution context handling

**Bug Fixes**
- Fixed potential nil slice access in job task retrieval
- Added validation for environment variable naming conventions

**Improvements**
- Streamlined executor and job handling interfaces
- Added utility functions for environment variable manipulation
- Enhanced test coverage for job execution scenarios

**Technical Enhancements**
- Refactored execution context management
- Improved error handling in task and job validation
- Added robust environment variable sanitization and merging capabilities

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
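A job can use the partition variables listed above to pick its own slice of the work. The sketch below is illustrative: the environment variable names come from this PR, but `shard` and the round-robin split are hypothetical choices a job author might make, not something bacalhau imposes.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// shard returns the items assigned to this partition under a simple
// round-robin split: item i belongs to partition i mod count.
func shard(items []string, index, count int) []string {
	var mine []string
	for i, it := range items {
		if i%count == index {
			mine = append(mine, it)
		}
	}
	return mine
}

func main() {
	// Inside an execution these are injected by the compute node.
	index, _ := strconv.Atoi(os.Getenv("BACALHAU_PARTITION_INDEX"))
	count, _ := strconv.Atoi(os.Getenv("BACALHAU_PARTITION_COUNT"))
	if count == 0 {
		count = 1 // not partitioned: process everything
	}
	files := []string{"a.csv", "b.csv", "c.csv", "d.csv"}
	fmt.Println(shard(files, index, count))
}
```

Because every execution sees the same ordered item list and its own unique index, the partitions are disjoint and together cover all items.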
This PR makes execution information available to storage providers in preparation for enabling partitioned inputs.

https://linear.app/expanso/issue/ENG-520/partitioned-s3-input-source

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Release Notes

- **Refactor**
  - Updated storage preparation methods across multiple packages to include execution context.
  - Modified method signatures to support more comprehensive input handling.
  - Enhanced flexibility in the storage preparation process.
- **Testing**
  - Updated test suites to incorporate mock execution contexts.
  - Improved test coverage for storage-related functionality.

These changes represent a significant architectural refinement in how storage and execution contexts are managed throughout the system, focusing on more robust and context-aware storage preparation mechanisms.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR introduces configurable partitioning strategies for S3 input sources, enabling distributed job executions to efficiently process subsets of S3 objects. When a job is created with multiple executions (N > 1), each execution is assigned a unique partition index (0 to N-1) and will only process its designated subset of objects based on the configured partitioning strategy.

## Motivation

- Enable parallel processing of large S3 datasets across multiple job executions
- Allow users to control how objects are distributed based on their data organization patterns
- Provide deterministic object distribution for reproducible results

## Features

- Multiple partitioning strategies:
  - `none`: No partitioning, all objects available to all executions (default)
  - `object`: Partition by complete object key using consistent hashing
  - `regex`: Partition using regex pattern matches from object keys
  - `substring`: Partition based on a specific portion of object keys
  - `date`: Partition based on dates found in object keys
- Hash-based partitioning using FNV-1a ensures:
  - Deterministic assignment of objects to partitions
  - Distribution based on the chosen strategy and input data patterns
- Robust handling of edge cases:
  - Fallback to partition 0 for unmatched objects
  - Proper handling of directories and empty paths
  - Unicode support for substring partitioning

## Example Usage

Basic object partitioning:

```yaml
source:
  type: s3
  params:
    bucket: mybucket
    key: data/*
    partition:
      type: object
```

Regex partitioning with capture groups (single-quoted so YAML does not reject `\d` as an invalid escape sequence):

```yaml
source:
  type: s3
  params:
    bucket: mybucket
    key: data/*
    partition:
      type: regex
      pattern: 'data/(\d{4})/(\d{2})/.*\.csv'
```

Date-based partitioning:

```yaml
source:
  type: s3
  params:
    bucket: mybucket
    key: logs/*
    partition:
      type: date
      dateFormat: "2006-01-02"
```

## Testing

- Unit tests covering all partitioning strategies
- Integration tests with actual S3 storage
- Edge case handling and error scenarios
- Distribution analysis with various input patterns

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

## Release Notes

**New Features**
- Added S3 Object Partitioning system with support for multiple partitioning strategies (Object, Regex, Substring, Date)
- Enhanced storage and compute modules to support execution-level context

**Improvements**
- Refined method signatures across multiple packages to include execution context
- Updated error handling and message formatting in various storage and compute modules
- Improved flexibility in resource calculation and bidding strategies

**Bug Fixes**
- Updated volume size calculation methods to handle more complex input scenarios
- Enhanced validation for storage and partitioning configurations

**Documentation**
- Added comprehensive documentation for S3 Object Partitioning system
- Improved inline documentation for new features and method changes
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Refactor** - Removed environment variable tracking for `NODE_ID`, `JOB_NAME`, and `JOB_NAMESPACE` in job execution. - Updated test cases to remove node-specific verifications and focus on partition-specific outputs. - Adjusted expected outputs in tests to reflect changes in environment variable priorities. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
) This PR removes the dependency on kubectl's template package by providing a simplified implementation for CLI help text formatting. Binary size reduced from `92M` to `81M`. The new implementation: - Removes i18n support to reduce complexity - Drops markdown processing (blackfriday) dependency - Keeps the core formatting functionality for command help text and examples - Maintains heredoc support for clean multiline strings - Includes comprehensive tests The result is a lighter, more focused package that handles just what we need for CLI help formatting while removing a heavy dependency. Original inspiration from kubectl is credited in the package documentation. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Refactor** - Removed internationalization (i18n) support from CLI commands - Updated template handling to use a custom local package - Simplified string definitions for command descriptions and examples - Reduced external dependencies related to Kubernetes libraries - **Chores** - Cleaned up module dependencies in `go.mod` - Removed unused Kubernetes-related packages - **New Features** - Introduced a new `templates` utility package for CLI text formatting <!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Documentation** - Updated terminology from "consistent hashing" to "deterministic hashing" in S3 object partitioning documentation - Added guidance on date format verification for timezone scenarios in best practices section <!-- end of auto-generated comment: release notes by coderabbit.ai -->
# Add `bacalhau license inspect` Command

## Summary

Added a new CLI command `bacalhau license inspect` that allows users to inspect and validate Bacalhau license files. The command supports both offline validation and multiple output formats (table, JSON, YAML).

## Features

- New `bacalhau license inspect` command with the following capabilities:
  - Validates license file authenticity using RSA public key verification
  - Displays license details including product name, license ID, customer ID, validity period, and capabilities
  - Supports offline validation using embedded JWKS public keys
  - Multiple output formats: table (default), JSON, and YAML
  - Includes metadata field in JSON/YAML output formats

## Implementation Details

- Added `inspect.go` implementing the license inspection command
- Integrated with existing license validation framework
- Added `NewOfflineLicenseValidator` with hardcoded JWKS verification keys for offline validation
- Comprehensive test coverage including:
  - Unit tests for various license scenarios
  - Integration tests for CLI functionality
  - Tests for different output formats
  - Invalid license handling

## Usage Examples

```bash
# Basic inspection
bacalhau license inspect license.json

# JSON output
bacalhau license inspect license.json --output=json

# YAML output
bacalhau license inspect license.json --output=yaml
```

## Example output

```
Product      = Bacalhau
License ID   = e66d1f3a-a8d8-4d57-8f14-00722844afe2
Customer ID  = test-customer-id-123
Valid Until  = 2045-07-28
Version      = v1
Capabilities = max_nodes=1
Metadata     = {}
```

## Test Coverage

- Unit tests covering:
  - Valid/invalid license validation
  - Various output formats
  - Error handling scenarios
  - Offline validation
- Integration tests verifying CLI functionality

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

## Release Notes

**New Features**
- Added a new `license` command to the CLI for inspecting and validating local license information.
- Introduced functionality to inspect license details with support for JSON and YAML output formats.
- Added new test files for various license scenarios, including valid and invalid licenses.

**Testing**
- Enhanced test coverage for license validation and inspection, including offline validation scenarios.
- Added integration tests for local license validation scenarios.

**Improvements**
- Implemented offline license validation.
- Refined error messaging for license-related operations.

**Configuration**
- Updated configuration files to include new settings for orchestrator and API.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
# MetricRecorder

MetricRecorder is a helper for recording OpenTelemetry metrics with consistent attribute handling and aggregation capabilities. It simplifies the process of recording latencies, counters, and gauges while maintaining a clean API.

## Features

- Aggregates metrics internally until explicitly published
- Perfect for loops: automatically sums up latencies for each sub-operation type
- Reduces the number of metrics published to just the totals
- Records operation latencies with sub-operation tracking
- Supports counters with increment-by-one and increment-by-n operations
- Handles gauge measurements
- Manages attributes consistently across all measurements
- Built-in error type recording using OpenTelemetry semantic conventions
- Not thread-safe by design (one recorder per goroutine)

## Usage

### Basic Usage

```go
// Create a new recorder with base attributes
recorder := NewMetricRecorder(attribute.String("operation", "process"))

// Ensure metrics are published when done
defer recorder.Done(ctx, totalDurationHistogram)

// Record latency for specific operations
recorder.Latency(ctx, dequeueHistogram, "dequeue")

// Count operation
recorder.Count(ctx, operationCounter)

// Record gauge values
recorder.Gauge(ctx, queueSizeGauge, float64(queueSize))
```

### Tracking Sub-Operations

```go
func ProcessJob(ctx context.Context, job *Job) (err error) {
	// Create recorder with base attributes
	recorder := NewMetricRecorder(
		attribute.String("job_type", job.Type),
		attribute.String("priority", job.Priority),
	)
	// Records total duration when done
	defer recorder.Done(ctx, jobTotalDurationHist)
	// Wrapped in a closure so the final value of err is recorded,
	// not the nil it holds when the defer statement is registered
	defer func() { recorder.Error(err) }()

	// Each Latency() call measures time since the previous operation
	if err = validateJob(job); err != nil {
		return err
	}
	recorder.Latency(ctx, jobStepHist, "validation")

	if err = processJob(job); err != nil {
		return err
	}
	recorder.Latency(ctx, jobStepHist, "processing")

	cleanup(job)
	recorder.Latency(ctx, jobStepHist, "cleanup")
	return nil
}
```

### Aggregating Metrics in Loops

```go
func ProcessBatch(ctx context.Context, items []Item) (err error) {
	recorder := NewMetricRecorder(attribute.String("operation", "batch_process"))
	defer recorder.Done(ctx, batchDurationHist)
	defer func() { recorder.Error(err) }()

	for _, item := range items {
		// These latencies are automatically summed by operation type
		if err = validate(item); err != nil {
			return err
		}
		recorder.Latency(ctx, stepHist, "validation")

		if err = unmarshall(item); err != nil {
			return err
		}
		recorder.Latency(ctx, stepHist, "unmarshalling")

		if err = process(item); err != nil {
			return err
		}
		recorder.Latency(ctx, stepHist, "processing")
	}

	// When Done() is called:
	// - "validation" latency will be the total time spent in validation across all items
	// - "unmarshalling" latency will be the total time spent in unmarshalling across all items
	// - "processing" latency will be the total time spent in processing across all items
	return nil
}
```

### Recording Errors

```go
if err := process(msg); err != nil {
	// Records error type using OpenTelemetry semantic conventions
	recorder.Error(err)
	return err
}
```

### Adding Attributes

```go
// Add attributes at creation
recorder := NewMetricRecorder(
	attribute.String("service", "processor"),
	attribute.String("version", "1.0"),
)

// Add attributes later
recorder.AddAttributes(attribute.Int("retry_count", retryCount))
```

### Recording Different Metric Types

```go
// Record latency since last operation
recorder.Latency(ctx, processHistogram, "process")

// Increment counter by 1
recorder.Count(ctx, requestCounter)

// Increment counter by specific value
recorder.CountN(ctx, bytesProcessedCounter, bytesProcessed)

// Set gauge value
recorder.Gauge(ctx, activeWorkersGauge, float64(workerCount))

// Record specific duration
recorder.Duration(ctx, customDurationHist, measureDuration)
```

## Important Notes

1. **Thread Safety**: MetricRecorder is not thread-safe. Create separate recorders for each goroutine if you need to record metrics from multiple goroutines.
2. **Lifecycle Management**:
   - The recorder starts timing when created
   - Metrics are aggregated internally until `Done()` is called
   - Call `Done()` to publish all aggregated metrics
   - Use `defer recorder.Done(ctx, histogram)` right after creation
3. **Attribute Handling**:
   - Base attributes are set at creation
   - Additional attributes can be added later
   - All attributes are included in every metric recording
   - Final attributes can be added when calling `Done()`
4. **Aggregation Behavior**:
   - Latencies and counts are aggregated internally
   - Gauges and direct durations are published immediately
   - All aggregated metrics are published when `Done()` is called

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

**New Features**
- Added standard histogram bucket boundaries for various measurements.
- Enhanced metric recording capabilities with more flexible telemetry tracking.

**Dependency Updates**
- Updated multiple dependencies, including OpenTelemetry, gRPC, and crypto libraries.
- Upgraded test and instrumentation packages.

**Improvements**
- Simplified error checking mechanisms.
- Improved logging and telemetry setup.
- Enhanced metric recorder with more robust attribute management.

**Refactoring**
- Removed telemetry-related functionality from some existing components.
- Restructured metric recording approach.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added comprehensive telemetry metrics for job store operations - Enhanced observability with detailed performance tracking - Introduced metrics for database operation duration, data read/write, and store size - **Improvements** - Updated method signatures to support context-based telemetry - Implemented detailed error logging and performance monitoring - Added granular tracking for various database operations - **Performance** - Integrated OpenTelemetry for advanced monitoring - Added instrumentation for job retrieval, creation, and deletion processes <!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR also reduces provisioned disk space and deploys the Ops Agent for monitoring disk utilization. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced monitoring capabilities with Ops Agent integration - Added firewall rules for Ops Agent traffic - **Bug Fixes** - Adjusted disk size configurations in production and staging environments - **Chores** - Updated Bacalhau software version from v1.6.0 to v1.6.1 - Added service account labels and IAM role bindings for improved resource management <!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced telemetry and metrics tracking for job scheduling processes - Added detailed metrics for job executions, node matching, and retry attempts - **Improvements** - Improved error handling and observability in scheduler components - Expanded metric recording capabilities with new histogram and count tracking methods - **Performance** - Refined job scheduling metrics to provide more granular insights into system operations <!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added a metrics planner to track and monitor performance of job scheduling and planning processes - Introduced detailed telemetry metrics using OpenTelemetry for tracking plan processing, executions, and events - **Improvements** - Enhanced state update process with more granular error tracking and performance monitoring - Added flexibility in metric recording with new `DoneWithoutTotalDuration` method - **Telemetry** - Implemented comprehensive metrics tracking for plan processing, including duration, counts, and event distributions <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Bumps [next](https://github.com/vercel/next.js) from 14.2.15 to 14.2.21. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/vercel/next.js/releases">next's releases</a>.</em></p> <blockquote> <h2>v14.2.21</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>Upgrade React from 14898b6a9 to 178c267a4e: <a href="https://redirect.github.com/vercel/next.js/pull/74115">vercel/next.js#74115</a></li> <li>Fix unstable_allowDynamic when used with pnpm: <a href="https://redirect.github.com/vercel/next.js/pull/73765">vercel/next.js#73765</a></li> </ul> <h3>Misc Changes</h3> <ul> <li>chore(docs): add missing search: '' on remotePatterns: <a href="https://redirect.github.com/vercel/next.js/pull/73927">vercel/next.js#73927</a></li> <li>chore(docs): update version history of next/image: <a href="https://redirect.github.com/vercel/next.js/pull/73926">vercel/next.js#73926</a></li> </ul> <h3>Credits</h3> <p>Huge thanks to <a href="https://github.com/unstubbable"><code>@unstubbable</code></a>, <a href="https://github.com/ztanner"><code>@ztanner</code></a>, and <a href="https://github.com/styfle"><code>@styfle</code></a> for helping!</p> <h2>v14.2.20</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>Fix fetch cloning bug (<a href="https://redirect.github.com/vercel/next.js/pull/73532">vercel/next.js#73532</a>)</li> </ul> <h3>Credits</h3> <p>Huge thanks to <a href="https://github.com/wyattjoh"><code>@wyattjoh</code></a> for helping!</p> <h2>v14.2.19</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. 
It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>ensure worker exits bubble to parent process (<a href="https://redirect.github.com/vercel/next.js/issues/73433">#73433</a>)</li> <li>Increase max cache tags to 128 (<a href="https://redirect.github.com/vercel/next.js/issues/73125">#73125</a>)</li> </ul> <h3>Misc Changes</h3> <ul> <li>Update max tag items limit in docs (<a href="https://redirect.github.com/vercel/next.js/issues/73445">#73445</a>)</li> </ul> <h3>Credits</h3> <p>Huge thanks to <a href="https://github.com/ztanner"><code>@ztanner</code></a> and <a href="https://github.com/ijjk"><code>@ijjk</code></a> for helping!</p> <h2>v14.2.18</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>Fix: (third-parties) sendGTMEvent not queueing events before GTM init (<a href="https://redirect.github.com/vercel/next.js/issues/68683">#68683</a>) (<a href="https://redirect.github.com/vercel/next.js/issues/72111">#72111</a>)</li> <li>Ignore error pages for cache revalidate (<a href="https://redirect.github.com/vercel/next.js/issues/72412">#72412</a>) (<a href="https://redirect.github.com/vercel/next.js/issues/72484">#72484</a>)</li> </ul> <h3>Credits</h3> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/vercel/next.js/commit/2655f6efd3379cf68205b3b5c5173399294f7731"><code>2655f6e</code></a> v14.2.21</li> <li><a href="https://github.com/vercel/next.js/commit/8803d2b46e66ad831aec7979f2b7f0243057e48f"><code>8803d2b</code></a> Backport (v14): Upgrade React from 14898b6a9 to 178c267a4e (<a href="https://redirect.github.com/vercel/next.js/issues/74115">#74115</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/6e35243eae7f876177462b32763b3749d84c7d03"><code>6e35243</code></a> chore(docs): add missing <code>search: ''</code> on <code>remotePatterns</code> (<a href="https://redirect.github.com/vercel/next.js/issues/73925">#73925</a>) (<a href="https://redirect.github.com/vercel/next.js/issues/73927">#73927</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/54919d2f2829e112a24ab2559a1e71e0d3ab859d"><code>54919d2</code></a> chore(docs): update version history of <code>next/image</code> (<a href="https://redirect.github.com/vercel/next.js/issues/73926">#73926</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/049a6907af00488a607e958a863fe42328d8cd6a"><code>049a690</code></a> Backport: Fix <code>unstable_allowDynamic</code> when used with pnpm (<a href="https://redirect.github.com/vercel/next.js/issues/73765">#73765</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/663fa9cb295730b5adfce8f968593f3838228d16"><code>663fa9c</code></a> Fix SWC and React versions for <code>14-2-1</code> branch (<a href="https://redirect.github.com/vercel/next.js/issues/73791">#73791</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/ed78a4aa673034719d5664536a80d326eebac7e1"><code>ed78a4a</code></a> v14.2.20</li> <li><a href="https://github.com/vercel/next.js/commit/530421d3a2cf00e94a8d68ef5b093bb866f13f14"><code>530421d</code></a> [backport] Fix/dedupe fetch clone (<a 
href="https://redirect.github.com/vercel/next.js/issues/73532">#73532</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/cbc62adabae6517b0f848a43a777af9b161cfd97"><code>cbc62ad</code></a> v14.2.19</li> <li><a href="https://github.com/vercel/next.js/commit/92280dc4359c319aaac86e9795d74477b77f09ff"><code>92280dc</code></a> [backport] Update max tag items limit in docs (<a href="https://redirect.github.com/vercel/next.js/issues/73445">#73445</a>)</li> <li>Additional commits viewable in <a href="https://github.com/vercel/next.js/compare/v14.2.15...v14.2.21">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=14.2.15&new-version=14.2.21)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jamil Shamy <[email protected]>
…airflow (#4809) Bumps [virtualenv](https://github.com/pypa/virtualenv) from 20.25.0 to 20.26.6. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/pypa/virtualenv/releases">virtualenv's releases</a>.</em></p> <blockquote> <h2>20.26.6</h2> <!-- raw HTML omitted --> <h2>What's Changed</h2> <ul> <li>release 20.26.5 by <a href="https://github.com/gaborbernat"><code>@gaborbernat</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2766">pypa/virtualenv#2766</a></li> <li>Fix <a href="https://redirect.github.com/pypa/virtualenv/issues/2768">#2768</a>: Quote template strings in activation scripts by <a href="https://github.com/y5c4l3"><code>@y5c4l3</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2771">pypa/virtualenv#2771</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/y5c4l3"><code>@y5c4l3</code></a> made their first contribution in <a href="https://redirect.github.com/pypa/virtualenv/pull/2771">pypa/virtualenv#2771</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pypa/virtualenv/compare/20.26.5...20.26.6">https://github.com/pypa/virtualenv/compare/20.26.5...20.26.6</a></p> <h2>20.26.5</h2> <!-- raw HTML omitted --> <h2>What's Changed</h2> <ul> <li>release 20.26.4 by <a href="https://github.com/gaborbernat"><code>@gaborbernat</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2761">pypa/virtualenv#2761</a></li> <li>Use uv over pip by <a href="https://github.com/gaborbernat"><code>@gaborbernat</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2765">pypa/virtualenv#2765</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pypa/virtualenv/compare/20.26.4...20.26.5">https://github.com/pypa/virtualenv/compare/20.26.4...20.26.5</a></p> <h2>20.26.4</h2> <!-- raw HTML omitted --> <h2>What's Changed</h2> <ul> <li>release 20.26.3 by <a 
href="https://github.com/gaborbernat"><code>@gaborbernat</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2742">pypa/virtualenv#2742</a></li> <li>Fix whitespace around backticks in changelog by <a href="https://github.com/edmorley"><code>@edmorley</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2751">pypa/virtualenv#2751</a></li> <li>Test latest Python 3.13 by <a href="https://github.com/hugovk"><code>@hugovk</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2752">pypa/virtualenv#2752</a></li> <li>Fix typo in Nushell activation script by <a href="https://github.com/edmorley"><code>@edmorley</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2754">pypa/virtualenv#2754</a></li> <li>GitHub Actions: Replace deprecated macos-12 with macos-13 by <a href="https://github.com/hugovk"><code>@hugovk</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2756">pypa/virtualenv#2756</a></li> <li>Fix <a href="https://redirect.github.com/pypa/virtualenv/issues/2728">#2728</a>: Activating venv create unwanted console output by <a href="https://github.com/ShootGan"><code>@ShootGan</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2748">pypa/virtualenv#2748</a></li> <li>Upgrade bundled wheels by <a href="https://github.com/gaborbernat"><code>@gaborbernat</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2760">pypa/virtualenv#2760</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/ShootGan"><code>@ShootGan</code></a> made their first contribution in <a href="https://redirect.github.com/pypa/virtualenv/pull/2748">pypa/virtualenv#2748</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pypa/virtualenv/compare/20.26.3...20.26.4">https://github.com/pypa/virtualenv/compare/20.26.3...20.26.4</a></p> <h2>20.26.3</h2> <!-- raw HTML omitted --> <h2>What's Changed</h2> <ul> <li>release 20.26.2 
by <a href="https://github.com/gaborbernat"><code>@gaborbernat</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2724">pypa/virtualenv#2724</a></li> <li>Bump embeded wheels by <a href="https://github.com/gaborbernat"><code>@gaborbernat</code></a> in <a href="https://redirect.github.com/pypa/virtualenv/pull/2741">pypa/virtualenv#2741</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pypa/virtualenv/compare/20.26.2...20.26.3">https://github.com/pypa/virtualenv/compare/20.26.2...20.26.3</a></p> <h2>20.26.2</h2> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/pypa/virtualenv/blob/main/docs/changelog.rst">virtualenv's changelog</a>.</em></p> <blockquote> <h2>v20.26.6 (2024-09-27)</h2> <p>Bugfixes - 20.26.6</p> <pre><code>- Properly quote string placeholders in activation script templates to mitigate potential command injection - by :user:`y5c4l3`. (:issue:`2768`) <h2>v20.26.5 (2024-09-17)</h2> <p>Bugfixes - 20.26.5 </code></pre></p> <ul> <li>Upgrade embedded wheels: setuptools to <code>75.1.0</code> from <code>74.1.2</code> - by :user:<code>gaborbernat</code>. (:issue:<code>2765</code>)</li> </ul> <h2>v20.26.4 (2024-09-07)</h2> <p>Bugfixes - 20.26.4</p> <pre><code>- no longer create `()` output in console during activation of a virtualenv by .bat file. 
(:issue:`2728`) - Upgrade embedded wheels: <ul> <li>wheel to <code>0.44.0</code> from <code>0.43.0</code></li> <li>pip to <code>24.2</code> from <code>24.1</code></li> <li>setuptools to <code>74.1.2</code> from <code>70.1.0</code> (:issue:<code>2760</code>)</li> </ul> <h2>v20.26.3 (2024-06-21)</h2> <p>Bugfixes - 20.26.3 </code></pre></p> <ul> <li> <p>Upgrade embedded wheels:</p> <ul> <li>setuptools to <code>70.1.0</code> from <code>69.5.1</code></li> <li>pip to <code>24.1</code> from <code>24.0</code> (:issue:<code>2741</code>)</li> </ul> </li> </ul> <h2>v20.26.2 (2024-05-13)</h2> <p>Bugfixes - 20.26.2</p> <pre><code>- ``virtualenv.pyz`` no longer fails when zipapp path contains a symlink - by :user:`HandSonic` and :user:`petamas`. (:issue:`1949`) - Fix bad return code from activate.sh if hashing is disabled - by :user:'fenkes-ibm'. (:issue:`2717`) <h2>v20.26.1 (2024-04-29)</h2> <p>Bugfixes - 20.26.1 </code></pre></p> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/pypa/virtualenv/commit/ec04726d065372ffad9920998aef1ce41252a61d"><code>ec04726</code></a> release 20.26.6</li> <li><a href="https://github.com/pypa/virtualenv/commit/86dddeda7c991f8529e1995bbff280fb7b761972"><code>86ddded</code></a> Fix <a href="https://redirect.github.com/pypa/virtualenv/issues/2768">#2768</a>: Quote template strings in activation scripts (<a href="https://redirect.github.com/pypa/virtualenv/issues/2771">#2771</a>)</li> <li><a href="https://github.com/pypa/virtualenv/commit/6bb3f6226c18d69bb6cfa3475b6d46dd463bb530"><code>6bb3f62</code></a> [pre-commit.ci] pre-commit autoupdate (<a href="https://redirect.github.com/pypa/virtualenv/issues/2769">#2769</a>)</li> <li><a href="https://github.com/pypa/virtualenv/commit/220d49c2e3ade2ed24f5712ab5a23895cde2e04c"><code>220d49c</code></a> Bump pypa/gh-action-pypi-publish from 1.10.1 to 1.10.2 (<a 
href="https://redirect.github.com/pypa/virtualenv/issues/2767">#2767</a>)</li> <li><a href="https://github.com/pypa/virtualenv/commit/cf340c83c2828a92def78c77b3e037a2baa4d557"><code>cf340c8</code></a> Merge pull request <a href="https://redirect.github.com/pypa/virtualenv/issues/2766">#2766</a> from pypa/release-20.26.5</li> <li><a href="https://github.com/pypa/virtualenv/commit/f3172b4da576b88275a14d2e7bbeb98b8f958a05"><code>f3172b4</code></a> release 20.26.5</li> <li><a href="https://github.com/pypa/virtualenv/commit/22b9795eb6bed0c17d0415c5513eca099a0a11ad"><code>22b9795</code></a> Use uv over pip (<a href="https://redirect.github.com/pypa/virtualenv/issues/2765">#2765</a>)</li> <li><a href="https://github.com/pypa/virtualenv/commit/35d8269aba12a1e3c60183a2082b2c4d0cc1192f"><code>35d8269</code></a> [pre-commit.ci] pre-commit autoupdate (<a href="https://redirect.github.com/pypa/virtualenv/issues/2764">#2764</a>)</li> <li><a href="https://github.com/pypa/virtualenv/commit/ee77feb77ccb3c5deefa318630c59315bcfda521"><code>ee77feb</code></a> [pre-commit.ci] pre-commit autoupdate (<a href="https://redirect.github.com/pypa/virtualenv/issues/2763">#2763</a>)</li> <li><a href="https://github.com/pypa/virtualenv/commit/c5160566293ed098ca30e0856dbf44588dd5c3a3"><code>c516056</code></a> Update README.md</li> <li>Additional commits viewable in <a href="https://github.com/pypa/virtualenv/compare/20.25.0...20.26.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=virtualenv&package-manager=pip&previous-version=20.25.0&new-version=20.26.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/bacalhau-project/bacalhau/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jamil Shamy <[email protected]>
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Updated indirect dependencies: - Upgraded `github.com/MicahParks/jwkset` to version 0.8.0 - Upgraded `golang.org/x/time` to version 0.9.0 <!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR fixes critical issues with BoltDB transaction handling:

1. Panics during context cancellation of long-running transactions
2. Potential transaction leaks in early-return scenarios

This issue is reproducible when running a job across hundreds of nodes, where transactions can take a long time and the context times out. Why transactions take that long is being investigated in a separate issue.

Problem 1 - Transaction Panics:
- The current implementation attempts to roll back transactions in a separate goroutine when the context is cancelled
- This creates a race condition, as the transaction may still be in use by the main operation
- It results in panics when the cursor becomes invalid during the concurrent rollback: "panic: runtime error: invalid memory address or nil pointer dereference"
- The issue occurs because BoltDB transactions are not thread-safe, yet we were accessing them concurrently

Problem 2 - Transaction Leaks:
- The current defer pattern only rolls back on error: `if err != nil { _ = tx.Rollback() }`
- However, some operations return early without an error (e.g., stopping an already stopped job)
- This leaves transactions open when they should be cleaned up
- It can lead to resource leaks and database locks

Solution:

1. For transaction panics:
   - Remove the concurrent transaction rollback on context cancellation
   - Add explicit context-cancellation checks at the start of operations
   - Keep transaction operations in a single goroutine

2. For transaction leaks:
   - Always roll back in a defer (our txContext.Rollback() safely handles already-committed transactions)
   - Simplify the defer pattern to:
   ```go
   defer tx.Rollback() // Safe to call multiple times or after commit
   // ... operations ...
   return tx.Commit()
   ```

This aligns better with BoltDB's design, which expects transactions to be handled in a single goroutine, and ensures proper cleanup in all code paths. The changes improve reliability by preventing both race conditions and resource leaks.
Example stack trace of the fixed panic:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x1041dae40]

goroutine 308 [running]:
go.etcd.io/bbolt.(*Cursor).seek(0x14000e1ea00, {0x14000d0d258, 0x4, 0x8})
        /Users/walid/.go/pkg/mod/go.etcd.io/[email protected]/cursor.go:159 +0x70
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Refactor**
  - Simplified transaction context management by removing synchronization mechanisms.
  - Streamlined error handling in database operations, including unconditional rollbacks.
- **Bug Fixes**
  - Added explicit context cancellation checks in database update and view operations.
- **Tests**
  - Updated test suite to verify context handling in database transactions.
  - Enhanced test cases for canceled and timeout contexts.
  - Improved transaction context tests to ensure proper rollback behavior under cancellation.
  - Added rollback expectations in various test cases to reinforce transactional integrity.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Currently, the scheduler can re-approve already approved executions because it only looks at the compute state (AskForBidAccepted) without considering the desired state (Running) set by previous schedulers. Additionally, some compute states that should be set by the scheduler (like BidAccepted/BidRejected) are defined but never used, leading to incomplete state transitions.

Changes:
- Update getApprovalStatuses to consider both compute and desired states when approving executions
- Ensure the scheduler sets the compute states it owns (BidAccepted/BidRejected)
- Clean up the execution canceller watcher, as its functionality is now handled by the scheduler

The key issue was that state transitions required tracking both the compute state (what is happening) and the desired state (what we want to happen). An execution could be in the AskForBidAccepted compute state but already have the Running desired state, meaning it was already approved but the compute node hadn't acknowledged it yet. Checking both states prevents duplicate approvals, and allowing the scheduler to update the compute states it owns prevents inconsistencies between the actual and desired state. Additionally, compute states like BidAccepted were defined but never set by anyone; this fix ensures the scheduler, which makes bid acceptance decisions, properly sets them.

Example of the fixed behavior:
1. Execution starts in (New, Pending)
2. Message handler sets (AskForBidAccepted, Pending)
3. Scheduler approves by setting (BidAccepted, Running)
4. Compute node completes with (Completed, Stopped)

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
## Release Notes

- **New Features**
  - Enhanced execution state management with detailed tracking of desired and compute states.
  - Improved logging and traceability for job and execution state transitions.
- **Bug Fixes**
  - Updated state transition logic to accurately represent execution statuses.
- Refined handling of job and execution states across different scheduler types. - **Refactoring** - Replaced "stopped" execution states with more precise "cancelled" and "failed" states. - Simplified and generalized execution filtering and grouping mechanisms. - Removed execution canceller watcher, integrating its functionality into existing schedulers. - **Testing** - Added comprehensive test cases for new state management scenarios. - Updated test utilities to support more granular execution state tracking. These changes improve the system's ability to track and manage job and execution states with greater precision and clarity. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
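The two-state approval check described above can be sketched as follows; the enum names and `execution` shape here are illustrative stand-ins for Bacalhau's real models, not its actual types:

```go
package main

import "fmt"

// Hypothetical state enums; the real types live in Bacalhau's models
// package and carry more values than shown here.
type computeState int
type desiredState int

const (
	askForBidAccepted computeState = iota
	bidAccepted
)

const (
	pending desiredState = iota
	running
)

type execution struct {
	compute computeState
	desired desiredState
}

// needsApproval returns true only for executions the scheduler has not
// yet approved. Checking the compute state alone would re-approve an
// execution whose desired state is already Running.
func needsApproval(e execution) bool {
	return e.compute == askForBidAccepted && e.desired == pending
}

func main() {
	fresh := execution{askForBidAccepted, pending}
	alreadyApproved := execution{askForBidAccepted, running}
	fmt.Println(needsApproval(fresh), needsApproval(alreadyApproved)) // true false
}
```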
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Refactor** - Updated logging configuration to set log level earlier in the initialization process - Streamlined log configuration logic to ensure consistent log level application <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
This PR refactors the LoggingPlanner to provide more structured and comprehensive debugging information while keeping the code maintainable. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced logging capabilities for job state transitions and execution lifecycle - Added detailed logging for job events, executions, and evaluations - **Improvements** - Refined log message structure to provide more context about job states and revisions - Introduced configurable log levels for better debugging and tracing <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…4827)

This PR refactors devstack to support two key features:

1. Allow compute nodes to join an existing orchestrator:
   - Added --computes flag (alias for --compute-nodes) to specify the number of compute nodes
   - Added --orchestrators flag (alias for --requester-nodes) to specify the number of orchestrator nodes
   - Added --hybrids flag (alias for --hybrid-nodes) to specify hybrid nodes
   - When no orchestrator nodes are specified and an orchestrator address is provided via the -c flag, devstack will run compute-only nodes that connect to the external orchestrator

2. Use the test configuration as a base:
   - Devstack now uses NewTestConfig() as its base configuration
   - All configuration can be overridden using -c flags (same as bacalhau serve)
   - Node-specific settings are layered on top of the base configuration
   - Maintains backward compatibility with existing devstack flags

This allows for:
```bash
# Run orchestrator node
bacalhau devstack --orchestrators 1

# Run compute nodes connecting to an existing orchestrator
bacalhau devstack --computes 3 -c Compute.Orchestrators=127.0.0.1:4222

# Run both with a custom config
bacalhau devstack --computes 3 --orchestrators 1 -c Compute.AllowListedLocalPaths=/tmp
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Updated CLI command-line flags for devstack configuration with more intuitive naming.
  - Enhanced configuration setup process with more flexible option handling.
  - Introduced a new package for organizing related functionalities.
- **Refactor**
  - Simplified devstack node configuration terminology.
  - Improved configuration management in devstack and utility functions.
  - Streamlined node setup logic in devstack configuration.
- **Chores**
  - Updated method signatures to support more dynamic configuration options.
  - Maintained backward compatibility with existing flags.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced telemetry and metrics tracking for publishing, requesting, and subscribing operations - Added detailed performance metrics for async publishing, message processing, and request handling - Introduced new metric attributes and outcome constants for better categorization - **Improvements** - Improved error handling and observability across messaging components - Added latency and performance tracking for various messaging operations - Enhanced metric recording capabilities with new methods for attribute management <!-- end of auto-generated comment: release notes by coderabbit.ai -->
# New Command: `bacalhau agent license inspect`

This PR introduces a new command that allows users to inspect the license information of a Bacalhau orchestrator node without exposing the license itself. The command provides a secure way to verify license status, capabilities, and metadata while maintaining the confidentiality of the underlying license file.

### Features
- New command: `bacalhau agent license inspect`
- Supports multiple output formats (default, JSON, YAML)
- Displays key license information:
  - Product name
  - License ID
  - Customer ID
  - Expiration date
  - Version
  - Capabilities
  - Custom metadata

### Add License Manager for Node Orchestration
Introduce a LicenseManager component that handles license validation for orchestrator nodes. Key features:
- Validates node count against licensed limits
- Simple API for license verification and capability checks

Usage example:
```golang
// Initialize the license manager (config is the bacalhau License config).
// If any issue other than expiry-date validity appears during
// initialization, this fails.
licenseManager, err := licensing.NewLicenseManager(config)

// Get the current license claims; nil if no license is configured
// for the orchestrator.
licenseClaims := licenseManager.License()

// These helper functions can then be called on the licenseClaims struct:
licenseClaims.IsExpired()        // returns only a boolean
licenseClaims.MaxNumberOfNodes() // returns only a number
```

### Security Considerations
- Does not expose the raw license file or cryptographic material
- Only returns parsed, necessary information for verification
- Maintains license confidentiality while providing essential details

### License File Structure
The license file, fed to the orchestrator node config, uses a JSON format to support future extensibility and dynamic configuration.
The structure is intentionally simple:
```json
{
  "license": "your_license_token_here"
}
```
We chose JSON format for the license file because:
- It allows for easy addition of future configuration options
- It provides a structured way to include additional metadata if needed

### Configuration
To add a license to an orchestrator, you need to configure your orchestrator node. Here's a sample configuration:
```yaml
NameProvider: "uuid"
API:
  Port: 1234
Orchestrator:
  Enabled: true
  Auth:
    Token: "your_secret_token_here"
  License:
    LocalPath: "/path/to/your/license.json"
Labels:
  label1: label1Value
  label2: label2Value
```
Key configuration points:
- The `Orchestrator.License.LocalPath` field specifies the path to your license file
- If no license is configured, the command will return an error message saying that no license was configured
- If the license is expired, the inspect command will return the same license details, but will note that it is expired

### Example Usage
```bash
# Default format
$ bacalhau agent license inspect

# JSON format
$ bacalhau agent license inspect --output=json

# YAML format
$ bacalhau agent license inspect --output=yaml
```

### Example Output
For `bacalhau agent license inspect`:
```bash
Product      = Bacalhau
License ID   = 2d58c7c9-ec29-45a5-a5cd-cb8f7fee6678
Customer ID  = test-customer-id-123
Valid Until  = 2045-07-28
Version      = v1
Expired      = false
Capabilities = max_nodes=1
Metadata     = someMetadata=valueOfSomeMetadata
```

For `bacalhau agent license inspect --output=yaml`:
```yaml
capabilities:
  max_nodes: "1"
customer_id: test-customer-id-123
exp: 2384889682
iat: 1736889682
iss: https://expanso.io/
jti: 2d58c7c9-ec29-45a5-a5cd-cb8f7fee6678
license_id: 2d58c7c9-ec29-45a5-a5cd-cb8f7fee6678
license_type: standard
license_version: v1
metadata:
  someMetadata: valueOfSomeMetadata
product: Bacalhau
sub: test-customer-id-123
```

### Documentation
- Added command documentation with usage examples
- Included field descriptions in the help text

Linear:
https://linear.app/expanso/issue/ENG-498/license-path-configuration <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added a new CLI command to inspect agent license information. - Introduced a new API endpoint to retrieve agent license details. - Implemented license configuration support for orchestrator nodes. - **Configuration** - Added a new configuration option for specifying local license file path. - Enhanced configuration to support orchestrator settings with license metadata. - **API Enhancements** - Created a new method to retrieve license information via API client. - Updated Swagger documentation to include license-related endpoints. - **Testing** - Added comprehensive integration tests for license inspection scenarios, including expired licenses. - Included test cases for various license configuration states and error handling. - Enhanced tests for validating license output formats and error messages. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Walid Baruni <[email protected]>
Existing timeout values are too aggressive and cause many timeouts and retries on larger networks.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Configuration Updates**
  - Increased the default acknowledgment wait time from 5 to 30 seconds for publisher configurations
  - Extended the default processing timeout from 5 to 30 seconds for subscriber configurations

These changes provide more generous timeout and acknowledgment periods, potentially improving system resilience and message processing reliability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit Release Notes: - **New Features** - Added execution rate limiting for job scheduling. - Introduced configurable maximum executions per evaluation. - Implemented backoff mechanism for execution limits. - Added method to check for pending work in plans. - **Configuration Updates** - Added new system configuration parameters for rate limiting. - Introduced execution limit and backoff duration settings. - **Performance Improvements** - Enhanced job scheduling with controlled execution rates. - Implemented rate limiter for batch, daemon, and ops job types. - **Testing** - Added comprehensive test coverage for rate limiting scenarios. - Verified rate limit behavior across different job types and node configurations. - Enhanced test suite for `Plan`, `BatchServiceJobScheduler`, `DaemonJobScheduler`, and `OpsJobScheduler` to validate rate limiting functionality. - Introduced new tests for handling various scheduling conditions and rate limits. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
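The execution rate limiting summarized above can be sketched as follows. The constant names, the `plan` shape, and the deferred-work flag are illustrative assumptions, not the real configuration keys or scheduler types:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical limits; the real values come from system configuration.
const (
	maxExecutionsPerEval = 2
	retryBackoff         = 5 * time.Second // delay before the follow-up evaluation
)

// plan is a stand-in for a scheduler plan that may carry deferred work.
type plan struct {
	executions []string
	deferred   bool // true when a follow-up evaluation is needed
}

// schedule admits at most maxExecutionsPerEval executions per
// evaluation and flags the plan as having pending work, so the rest is
// picked up by a later evaluation after a backoff instead of all
// executions being scheduled at once.
func schedule(pending []string) plan {
	p := plan{}
	for i, exec := range pending {
		if i >= maxExecutionsPerEval {
			p.deferred = true // re-evaluate after retryBackoff
			break
		}
		p.executions = append(p.executions, exec)
	}
	return p
}

func main() {
	p := schedule([]string{"e1", "e2", "e3", "e4"})
	fmt.Println(len(p.executions), p.deferred) // 2 true
}
```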
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced telemetry metrics for message handling - Added new performance tracking metrics for message processing - **Refactor** - Updated method signatures to support metric recording - Restructured metrics tracking from worker-related to message-handling focused - **Chores** - Improved observability of message processing operations - Reorganized constant attributes for better metric management <!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Updated Bacalhau software version from v1.6.1 to v1.6.2 in production and staging environments. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Bumps [github.com/samber/lo](https://github.com/samber/lo) from 1.47.0 to 1.49.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/samber/lo/releases">github.com/samber/lo's releases</a>.</em></p> <blockquote> <h2>v1.49.0</h2> <h2>What's Changed</h2> <ul> <li>feat: add SampleBy and SamplesBy by <a href="https://github.com/bramvandewalle"><code>@bramvandewalle</code></a> in <a href="https://redirect.github.com/samber/lo/pull/516">samber/lo#516</a></li> <li>feat: Add IsNotNil by <a href="https://github.com/haoxins"><code>@haoxins</code></a> in <a href="https://redirect.github.com/samber/lo/pull/523">samber/lo#523</a></li> <li>feat: Implement ChunkMap Function (<a href="https://redirect.github.com/samber/lo/issues/533">#533</a>) by <a href="https://github.com/oswaldom-code"><code>@oswaldom-code</code></a> in <a href="https://redirect.github.com/samber/lo/pull/538">samber/lo#538</a></li> <li>feat: Add NewThrottle by <a href="https://github.com/Lee-Minjea"><code>@Lee-Minjea</code></a> in <a href="https://redirect.github.com/samber/lo/pull/427">samber/lo#427</a></li> <li>feat: adding FilterSliceToMap by <a href="https://github.com/samber"><code>@samber</code></a> in <a href="https://redirect.github.com/samber/lo/pull/581">samber/lo#581</a></li> <li>feat: add <code>Product</code> and <code>ProductBy</code> functions by <a href="https://github.com/JohnDevitt"><code>@JohnDevitt</code></a> in <a href="https://redirect.github.com/samber/lo/pull/566">samber/lo#566</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/bramvandewalle"><code>@bramvandewalle</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/516">samber/lo#516</a></li> <li><a href="https://github.com/oswaldom-code"><code>@oswaldom-code</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/538">samber/lo#538</a></li> <li><a 
href="https://github.com/Lee-Minjea"><code>@Lee-Minjea</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/427">samber/lo#427</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/samber/lo/compare/v1.48.0...v1.49.0">https://github.com/samber/lo/compare/v1.48.0...v1.49.0</a></p> <h2>v1.48.0</h2> <h2>What's Changed</h2> <h3>Feature</h3> <ul> <li>feat: add (Min|Max)Index(By) by <a href="https://github.com/aria3ppp"><code>@aria3ppp</code></a> in <a href="https://redirect.github.com/samber/lo/pull/569">samber/lo#569</a></li> <li>feat: add UniqMap by <a href="https://github.com/nicklaus-dev"><code>@nicklaus-dev</code></a> in <a href="https://redirect.github.com/samber/lo/pull/527">samber/lo#527</a></li> <li>feat: add CrossJoin function by <a href="https://github.com/JohnDevitt"><code>@JohnDevitt</code></a> in <a href="https://redirect.github.com/samber/lo/pull/567">samber/lo#567</a></li> <li>feat: Implement CoalesceOrEmptySlice Function by <a href="https://github.com/chg1f"><code>@chg1f</code></a> in <a href="https://redirect.github.com/samber/lo/pull/542">samber/lo#542</a></li> <li>feat: adding WithoutNth by <a href="https://github.com/samber"><code>@samber</code></a> in <a href="https://redirect.github.com/samber/lo/pull/575">samber/lo#575</a></li> <li>feat: deprecate lo.Reverse and move it to lom.Reverse by <a href="https://github.com/samber"><code>@samber</code></a> in <a href="https://redirect.github.com/samber/lo/pull/576">samber/lo#576</a></li> <li>feat: deprecate lo.Shuffle and move it to lom.Shuffle by <a href="https://github.com/samber"><code>@samber</code></a> in <a href="https://github.com/samber/lo/commit/699707a0db372bc44ca5619b6ca61c15f5dc1de6#comments">https://github.com/samber/lo/commit/699707a0db372bc44ca5619b6ca61c15f5dc1de6#comments</a></li> <li>feat: adding lo.BufferWithContext by <a href="https://github.com/samber"><code>@samber</code></a> in <a 
href="https://redirect.github.com/samber/lo/pull/580">samber/lo#580</a></li> <li>feat: add SliceToSet by <a href="https://github.com/nicklaus-dev"><code>@nicklaus-dev</code></a> in <a href="https://redirect.github.com/samber/lo/pull/514">samber/lo#514</a></li> <li>feat: add WithoutBy by <a href="https://github.com/nicklaus-dev"><code>@nicklaus-dev</code></a> in <a href="https://redirect.github.com/samber/lo/pull/515">samber/lo#515</a></li> <li>feat: add lom.Fill by <a href="https://github.com/samber"><code>@samber</code></a></li> </ul> <h3>Fix</h3> <ul> <li>fix: change examples for MapKeys and MapValues by <a href="https://github.com/luxcgo"><code>@luxcgo</code></a> in <a href="https://redirect.github.com/samber/lo/pull/341">samber/lo#341</a></li> <li>fix: order of GroupBy and PartitionBy by <a href="https://github.com/liyishuai"><code>@liyishuai</code></a> in <a href="https://redirect.github.com/samber/lo/pull/572">samber/lo#572</a></li> </ul> <h3>Refactor</h3> <ul> <li>refactor RandomString function by <a href="https://github.com/pigwantacat"><code>@pigwantacat</code></a> in <a href="https://redirect.github.com/samber/lo/pull/524">samber/lo#524</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/luxcgo"><code>@luxcgo</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/341">samber/lo#341</a></li> <li><a href="https://github.com/haoxins"><code>@haoxins</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/522">samber/lo#522</a></li> <li><a href="https://github.com/muya"><code>@muya</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/521">samber/lo#521</a></li> <li><a href="https://github.com/NathanBaulch"><code>@NathanBaulch</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/519">samber/lo#519</a></li> <li><a href="https://github.com/jiz4oh"><code>@jiz4oh</code></a> 
made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/535">samber/lo#535</a></li> <li><a href="https://github.com/guyareco2"><code>@guyareco2</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/537">samber/lo#537</a></li> <li><a href="https://github.com/pigwantacat"><code>@pigwantacat</code></a> made their first contribution in <a href="https://redirect.github.com/samber/lo/pull/524">samber/lo#524</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/samber/lo/commit/8a634a81bf916d5032e5565b9a5c2fb379c35f1e"><code>8a634a8</code></a> bump v1.49.0</li> <li><a href="https://github.com/samber/lo/commit/2bdcacae5e8b8bce66b1049bd4ab8c11c0926529"><code>2bdcaca</code></a> doc: add example for Product and ProductBy</li> <li><a href="https://github.com/samber/lo/commit/1c1dfd9d295b6206abf8dd5d58b2233ce6ebff1e"><code>1c1dfd9</code></a> feat: add Product and ProductBy functions (<a href="https://redirect.github.com/samber/lo/issues/566">#566</a>)</li> <li><a href="https://github.com/samber/lo/commit/124d3004ded9bdca19d68ea28f701710561ae7fa"><code>124d300</code></a> feat: adding FilterSliceToMap (<a href="https://redirect.github.com/samber/lo/issues/581">#581</a>)</li> <li><a href="https://github.com/samber/lo/commit/19d8355ae84e216be311b742a30eba5bd62434c6"><code>19d8355</code></a> Merge branch 'Lee-Minjea-feat/throttle'</li> <li><a href="https://github.com/samber/lo/commit/5cd32660f67539bce86825c9f096f0ea3c38164b"><code>5cd3266</code></a> feat: adding ExampleNewThrottleBy and ExampleNewThrottleByWithCount</li> <li><a href="https://github.com/samber/lo/commit/fdd886509abb4fba371ca5673ce384880486d463"><code>fdd8865</code></a> fix: test case fixed</li> <li><a href="https://github.com/samber/lo/commit/27638ea5b726b3cb126549db614b4e9028992d2c"><code>27638ea</code></a> feat: adding NewThrottle (<a 
href="https://redirect.github.com/samber/lo/issues/396">#396</a>)</li> <li><a href="https://github.com/samber/lo/commit/d5876775c8ba6237cc0dae3b981f7751feac172f"><code>d587677</code></a> feat: faster ChunkEntries</li> <li><a href="https://github.com/samber/lo/commit/28a4d94ffe45405b3d049b5ea8e4065c2a91cb9c"><code>28a4d94</code></a> Merge branch 'oswaldom-code-feat-adding-chunk-map'</li> <li>Additional commits viewable in <a href="https://github.com/samber/lo/compare/v1.47.0...v1.49.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/samber/lo&package-manager=go_modules&previous-version=1.47.0&new-version=1.49.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Walid Baruni <[email protected]>
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced Docker image workflows now support multi-platform builds and streamlined tagging for efficient image management. - Refined container startup processes for Docker-in-Docker environments, including improved execution checks and progress indicators. - **Documentation** - Added comprehensive guides for both base and Docker-in-Docker images, with clear usage examples, troubleshooting tips, and best practice instructions. - **Chores** - Performed several internal optimizations to improve overall build consistency and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
# Add plain encoding option to S3 Publisher

## Description

This PR adds support for publishing job results without gzip compression through a new `Encoding` type. This enables more efficient data pipelines where subsequent jobs need to access individual files from previous job results.

Previously, all job results were automatically gzip-compressed before uploading to S3. While this is efficient for storage and download, it requires downloading and decompressing the entire archive to access any file. This can be inefficient for workflows like map-reduce where jobs only need specific files from previous results.

### Changes

- Added a new `Encoding` type with `EncodingGzip` (default) and `EncodingPlain` options
- Updated validation to check for valid encoding values
- Maintained gzip compression as the default behavior for backwards compatibility

## Benefits and Tradeoffs

### Benefits

- Enables efficient access to individual files from job results
- Better support for data pipelines where jobs consume partial results
- No decompression overhead when accessing results

### Tradeoffs

- More S3 PUT requests (one per file vs. one archive)
- Higher storage costs (no compression)
- Higher network costs for full result downloads
- Cannot use `bacalhau job get` with pre-signed URLs when using plain encoding (requires individual file URLs)

## Usage Recommendations

- Use plain encoding when:
  - Subsequent jobs need to access individual files
  - Results will be frequently accessed by file
  - Building data pipelines with partial result access
- Keep the default gzip encoding when:
  - Results are typically accessed as a complete set
  - Storage/transfer costs are a concern
  - Using `bacalhau job get` functionality

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Enhanced S3 publishing now supports flexible encoding options, allowing for compressed archive uploads (gzip) or individual file uploads (plain).
  - New test cases added to validate different encoding scenarios for both publishing and downloading.
  - Improved handling of publisher specifications with a focus on encoding types.
- **Bug Fixes**
  - Addressed error handling for invalid encoding values in publisher specifications.
- **Documentation**
  - Updated test cases to reflect changes in encoding handling and improve overall test coverage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
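To illustrate the shape of the change, here is a minimal sketch of an `Encoding` type with validation. The constant names `EncodingGzip` and `EncodingPlain` come from the PR description above; the string values, `Validate` method, and its signature are assumptions for illustration, not the actual bacalhau implementation.

```go
package main

import "fmt"

// Encoding selects how the S3 publisher uploads job results.
// Names mirror the PR description; values are illustrative assumptions.
type Encoding string

const (
	EncodingGzip  Encoding = "gzip"  // default: one compressed archive
	EncodingPlain Encoding = "plain" // one S3 object per result file
)

// Validate rejects unknown encoding values; an empty value falls back
// to the gzip default for backwards compatibility.
func (e Encoding) Validate() error {
	switch e {
	case "", EncodingGzip, EncodingPlain:
		return nil
	default:
		return fmt.Errorf("invalid encoding: %q", e)
	}
}

func main() {
	fmt.Println(Encoding("plain").Validate()) // valid
	fmt.Println(Encoding("zip").Validate())   // rejected
}
```

Keeping the zero value equivalent to gzip is what preserves backwards compatibility for existing job specs that never mention an encoding.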
Fixes race conditions when multiple dind containers are started at the same time, which is usually the case with docker-compose.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Chores**
  - Enhanced the container startup process for improved reliability.
  - Introduced a random initial delay to help prevent simultaneous startup issues.
  - Implemented a robust retry mechanism with better error handling and cleanup for smoother operation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>