-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: enhance distributed consistent hashing configuration and er… #1980
Conversation
WalkthroughThis pull request introduces significant enhancements to Permify's distributed systems and load balancing capabilities. The changes primarily focus on improving the consistent hashing mechanism, adding more granular configuration options for distributed systems, and refactoring the balancer implementation. The version has been updated from v1.2.7 to v1.2.8, reflecting these substantial architectural improvements. Changes
Possibly related PRs
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (25)
pkg/balancer/picker.go (3)
4-7
: Use structured logging instead oflog
package
Consider using a more structured logging approach (likeslog
fromgolang.org/x/exp/slog
) or the logger used elsewhere in the code for consistency and better log correlation.
19-35
: Consider checking for collisions in randomIndex usage
TherandomIndex
function handles error scenarios gracefully by returning 0 on failures. However, be aware that returning 0 might inadvertently bias the picking logic toward the zero index in repeated failures. Consider logging these collisions at higher severity or introducing a retry logic if preventing random-index collisions is essential.🧰 Tools
🪛 golangci-lint (1.62.2)
[warning] 20-20: redefines-builtin-id: redefinition of the built-in function max
(revive)
52-52
: Wrap the error instead of using %v
Switch to using the Go 1.13+ error wrapping style%w
to preserve the original error details.- return balancer.PickResult{}, fmt.Errorf("failed to get closest members: %v", err) + return balancer.PickResult{}, fmt.Errorf("failed to get closest members: %w", err)🧰 Tools
🪛 golangci-lint (1.62.2)
52-52: non-wrapping format verb for fmt.Errorf. Use
%w
to format errors(errorlint)
pkg/balancer/builder.go (6)
4-5
: Avoid excessive imports
Most of these imports look necessary; ensure that any unused imports (especiallyfmt
if purely used for errors) are trimmed based on static analysis or compile-time usage.Also applies to: 8-8, 10-14
24-30
: Document usage constraints
Add a doc comment or example usage for each config field (PartitionCount, ReplicationFactor, Load, PickerWidth), clarifying any upper or lower limit constraints. This can be invaluable when diagnosing runtime misconfigurations.
33-68
: Validate non-zero config fields before applying fallbacks
While default values are applied here, consider proactively logging or warning when zero (or invalid) config values are overridden. This helps detect user misconfiguration earlier.
70-72
: Consider returning an error
NewBuilder
always returns a builder without error. If, in the future, invalid hashing functions or usage constraints become relevant, returning an error might be useful.
97-99
: Preserve domain naming
The builder’sName()
approach is straightforward. If multiple specialized builders are introduced later, consider adding domain-specific suffixes or clarifications to avoid collisions.
100-114
: Consider dynamic config updates
Build
is only called once, butParseConfig
might be called repeatedly. Ensure that any dynamic updates inParseConfig
are gracefully handled by the constructed balancer.pkg/balancer/balancer.go (3)
1-2
: File-level overview
This file introduces a top-levelBalancer
controlling subConns, hashing config, and states. The struct is large; consider grouping logically related fields into smaller, self-contained types or sub-structs to enhance readability.
197-248
: Consolidate transient failure handling
UpdateSubConnState
transitions to transient failure in multiple places. Ensure consistency in the approach (error logging, last error tracking) and consider grouping repeated logic into a helper method.
250-250
: ImplementClose
or add a TODO
An emptyClose()
can confuse maintainers. If no teardown is required, add a comment clarifying that no resources are held. Otherwise, implement close logic for subConns or the hashing ring if needed.pkg/consistent/consistent.go (4)
83-84
: Consider storing the interface directly instead of a pointer to the interfaceCurrently, the code uses
map[string]*Member
formembers
. Storing a pointer to an interface is unusual in Go and can cause confusion. It’s more standard to store the interface type directly, as Go already manages interface references internally:- members map[string]*Member + members map[string]Member
201-203
: Avoid copying a pointer to an interface—prefer direct usageRetrieving
currentMember := *c.ring[currentHash]
creates a new interface value by dereferencing the pointer to an interface. This can be misleading. Consider removing the pointer indirection and storing the interface itself. This simplifies usage and avoids partial copies.
194-195
: Reconsider panicking on insufficient capacityWhen partition distribution fails, the code panics, which can abruptly terminate the application. Returning an error or providing a configurable fallback may be more appropriate for production systems needing graceful error handling.
430-445
: Pre-allocate capacity when building the member hash listAn optional optimization is to pre-allocate
memberHashes
since you already know its final size (len(c.members)
). This might reduce the overhead of repeated allocations:- var memberHashes []uint64 + memberHashes := make([]uint64, 0, len(c.members))🧰 Tools
🪛 golangci-lint (1.62.2)
430-430: Consider pre-allocating
memberHashes
(prealloc)
pkg/balancer/builder_test.go (1)
8-26
: Good coverage for JSON output; consider flexible matchingThe test strictly compares the entire JSON string, including ordering and formatting. Slight reformatting breaks the test. Consider parsing and comparing individual fields or using a library that handles ordering/whitespace differences. For example:
var got map[string]interface{} json.Unmarshal([]byte(jsonString), &got) Expect(got["loadBalancingConfig"]).To(Equal(...))pkg/consistent/consistent_test.go (5)
32-38
: Consider using a more robust hash function for testing.The current hash function implementation is overly simplistic and may not provide good distribution characteristics. Consider using a well-known hash function implementation for testing.
- hasher = func(data []byte) uint64 { - var hash uint64 - for _, b := range data { - hash = hash*31 + uint64(b) - } - return hash - } + hasher = func(data []byte) uint64 { + const fnvPrime = uint64(1099511628211) + const fnvOffset = uint64(14695981039346656037) + hash := fnvOffset + for _, b := range data { + hash *= fnvPrime + hash ^= uint64(b) + } + return hash + }
40-46
: Document the rationale behind configuration values.The configuration values (271, 20, 1.25, 1) appear to be magic numbers. Consider adding comments explaining the reasoning behind these specific values.
config = Config{ Hasher: hasher, + // Use a prime number for better distribution PartitionCount: 271, + // Balance between redundancy and overhead ReplicationFactor: 20, + // Allow 25% more load than average Load: 1.25, + // Single picker for testing simplicity PickerWidth: 1, }
60-85
: Add more edge cases to member management tests.While the basic operations are covered, consider adding tests for:
- Adding nil or invalid members
- Removing non-existent members
- Adding members with duplicate string representations
- Concurrent member operations
87-119
: Enhance partition management test coverage.Consider adding tests for:
- Partition redistribution after member removal
- Partition stability during member changes
- Key consistency across member changes
- Edge cases with minimum and maximum loads
121-128
: Expand error handling test coverage.The error handling tests only cover requesting more members than available. Consider adding tests for:
- Invalid key types
- Zero or negative member requests
- Nil keys
- Operations on empty hash ring
internal/engines/balancer/balancer.go (1)
125-125
: Consider keeping debug level for request forwarding logs.Changing the log level from debug to info might lead to excessive logging in production. Request forwarding is typically a debug-level concern unless needed for audit purposes.
- slog.InfoContext(ctx, "Forwarding request with key to the underlying client") + slog.DebugContext(ctx, "Forwarding request with key to the underlying client")pkg/cmd/serve.go (1)
137-140
: LGTM! Consider updating documentation.The new distributed hashing configuration flags provide essential parameters for fine-tuning the consistent hashing mechanism. These parameters will help in optimizing data distribution, fault tolerance, and load balancing.
Consider adding documentation that explains:
- Recommended values or ranges for each parameter
- Impact of each parameter on system behavior
- Trade-offs between different parameter combinations
pkg/cmd/config.go (1)
105-108
: Add new distributed configuration flags to the table output.The new distributed hashing configuration flags are properly integrated into the configuration system. However, they should also be added to the configuration table output for consistency and visibility.
Add the following entries to the
data
slice in theconf()
function:[]string{"distributed.enabled", fmt.Sprintf("%v", cfg.Distributed.Enabled), getKeyOrigin(cmd, "distributed-enabled", "PERMIFY_DISTRIBUTED_ENABLED")}, []string{"distributed.address", cfg.Distributed.Address, getKeyOrigin(cmd, "distributed-address", "PERMIFY_DISTRIBUTED_ADDRESS")}, []string{"distributed.port", cfg.Distributed.Port, getKeyOrigin(cmd, "distributed-port", "PERMIFY_DISTRIBUTED_PORT")}, +[]string{"distributed.partition_count", fmt.Sprintf("%v", cfg.Distributed.PartitionCount), getKeyOrigin(cmd, "distributed-partition-count", "PERMIFY_DISTRIBUTED_PARTITION_COUNT")}, +[]string{"distributed.replication_factor", fmt.Sprintf("%v", cfg.Distributed.ReplicationFactor), getKeyOrigin(cmd, "distributed-replication-factor", "PERMIFY_DISTRIBUTED_REPLICATION_FACTOR")}, +[]string{"distributed.load", fmt.Sprintf("%v", cfg.Distributed.Load), getKeyOrigin(cmd, "distributed-load", "PERMIFY_DISTRIBUTED_LOAD")}, +[]string{"distributed.picker_width", fmt.Sprintf("%v", cfg.Distributed.PickerWidth), getKeyOrigin(cmd, "distributed-picker-width", "PERMIFY_DISTRIBUTED_PICKER_WIDTH")},
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pkg/pb/base/v1/openapi.pb.go
is excluded by!**/*.pb.go
📒 Files selected for processing (24)
cmd/permify/permify.go
(2 hunks)docs/api-reference/apidocs.swagger.json
(1 hunks)docs/api-reference/openapiv2/apidocs.swagger.json
(1 hunks)internal/config/config.go
(1 hunks)internal/engines/balancer/balancer.go
(4 hunks)internal/info.go
(1 hunks)pkg/balancer/balancer.go
(1 hunks)pkg/balancer/balancer_test.go
(1 hunks)pkg/balancer/builder.go
(1 hunks)pkg/balancer/builder_test.go
(1 hunks)pkg/balancer/errors.go
(0 hunks)pkg/balancer/hashring.go
(0 hunks)pkg/balancer/manager.go
(0 hunks)pkg/balancer/manager_test.go
(0 hunks)pkg/balancer/picker.go
(1 hunks)pkg/balancer/picker_test.go
(0 hunks)pkg/balancer/queue.go
(0 hunks)pkg/balancer/queue_test.go
(0 hunks)pkg/cmd/config.go
(1 hunks)pkg/cmd/flags/serve.go
(1 hunks)pkg/cmd/serve.go
(2 hunks)pkg/consistent/consistent.go
(1 hunks)pkg/consistent/consistent_test.go
(1 hunks)proto/base/v1/openapi.proto
(1 hunks)
💤 Files with no reviewable changes (7)
- pkg/balancer/errors.go
- pkg/balancer/manager.go
- pkg/balancer/queue_test.go
- pkg/balancer/queue.go
- pkg/balancer/manager_test.go
- pkg/balancer/picker_test.go
- pkg/balancer/hashring.go
✅ Files skipped from review due to trivial changes (5)
- pkg/balancer/balancer_test.go
- proto/base/v1/openapi.proto
- internal/info.go
- docs/api-reference/openapiv2/apidocs.swagger.json
- docs/api-reference/apidocs.swagger.json
🧰 Additional context used
🪛 golangci-lint (1.62.2)
pkg/balancer/balancer.go
80-80: Error return value is not checked
(errcheck)
114-114: SA1019: b.clientConn.NewSubConn is deprecated: please be aware that in a future version, SubConns will only support one address per SubConn.
(staticcheck)
139-139: Error return value is not checked
(errcheck)
145-145: SA1019: b.clientConn.RemoveSubConn is deprecated: use SubConn.Shutdown instead.
(staticcheck)
pkg/balancer/picker.go
[warning] 20-20: redefines-builtin-id: redefinition of the built-in function max
(revive)
52-52: non-wrapping format verb for fmt.Errorf. Use %w
to format errors
(errorlint)
pkg/consistent/consistent.go
[high] 118-118: G115: integer overflow conversion int -> uint64
(gosec)
[high] 251-251: G115: integer overflow conversion uint64 -> int
(gosec)
[high] 372-372: G115: integer overflow conversion uint64 -> int
(gosec)
430-430: Consider pre-allocating memberHashes
(prealloc)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Test with Coverage
🔇 Additional comments (17)
pkg/balancer/picker.go (2)
14-16
: Ensure thread safety or clarify concurrency model
Thepicker
struct maintains a reference to*consistent.Consistent
, but there's no mention of concurrency management. Confirm that all associated methods are safe for concurrent access or document the expected usage pattern.
38-71
: Verify stable fallback in casemembers
is empty or has unexpected types
The code gracefully handles empty slices (len(members) == 0
) and type assertions. Ensure that any future changes to theconsistent.Consistent
implementation continue to provide a consistent type formembers[index]
. Additionally, consider adding unit tests covering these fallback paths.🧰 Tools
🪛 golangci-lint (1.62.2)
52-52: non-wrapping format verb for fmt.Errorf. Use
%w
to format errors(errorlint)
pkg/balancer/builder.go (4)
15-21
: Naming consistency
TheName
andKey
constants are well-defined, but ensure that references to them are consistent throughout the codebase; for example, references outside builder code should use the same constants.
75-83
: Reuse the subConn name concept elsewhere
ConsistentMember
includes bothSubConn
and aname
. Consider ensuring that subConn naming is standardized across the entire balancer package to help debug and log consistently.
84-89
: Lock usage
The embeddedsync.Mutex
inbuilder
is a good step toward concurrency safety. Verify that all read/write operations onconfig
andhasher
consistently acquire the mutex to avoid race conditions.
91-95
: Interface clarity
Merging multiple balancer interfaces into a singleBuilder
interface is neat. Maintain thorough doc comments clarifying the responsibilities for each embedded interface method.pkg/balancer/balancer.go (5)
3-7
: Imported logging package
You’re usinglog/slog
for structured logging, which is consistent with other areas. Keep in mind that older loggers in sibling files might benefit from adoptingslog
for uniformity.
16-49
: Thread safety
Many fields here (likesubConnStates
,config
, etc.) can be updated frequently. Confirm that external concurrency or internal synchronization is used when accessing/modifying them.
51-67
: Graceful fallback in resolver errors
The logic transitions toTransientFailure
upon resolver errors if there are no active addresses. This is desirable. Test coverage for repeated resolver errors should confirm that the state transitions remain consistent over time.
69-195
: Deprecation warning withNewSubConn
/RemoveSubConn
Per static analysis hints,NewSubConn
andRemoveSubConn
are marked as deprecated in future gRPC versions. Gather more info on the recommended approach—specifically,SubConn.Shutdown()
—and consider upgrading to ensure forward compatibility.🧰 Tools
🪛 golangci-lint (1.62.2)
80-80: Error return value is not checked
(errcheck)
114-114: SA1019: b.clientConn.NewSubConn is deprecated: please be aware that in a future version, SubConns will only support one address per SubConn.
(staticcheck)
139-139: Error return value is not checked
(errcheck)
145-145: SA1019: b.clientConn.RemoveSubConn is deprecated: use SubConn.Shutdown instead.
(staticcheck)
136-151
: Check error return value fromconsistent.Remove
Ifconsistent.Remove
returns an error or a boolean status, consider capturing it for logging. If it never fails, confirm that’s documented inconsistent
package.🧰 Tools
🪛 golangci-lint (1.62.2)
139-139: Error return value is not checked
(errcheck)
145-145: SA1019: b.clientConn.RemoveSubConn is deprecated: use SubConn.Shutdown instead.
(staticcheck)
cmd/permify/permify.go (1)
17-17
: xxhash usage looks correctRegistering the balancer with
xxhash.Sum64
is a suitable approach for performance and consistency. Ensurexxhash
is pinned to a safe and well-tested version, especially if it’s critical to replicating state across nodes.internal/engines/balancer/balancer.go (3)
48-48
: LGTM! Enhanced TLS configuration check.The updated TLS check is more comprehensive by including the
Enabled
flag along with certificate paths.
58-68
: LGTM! Well-structured balancer configuration.The balancer configuration is properly structured with clear field names and appropriate error handling.
121-121
: Document the rationale for increasing the timeout.The timeout has been doubled from 2 to 4 seconds. Please add a comment explaining why this increase was necessary.
pkg/cmd/flags/serve.go (1)
567-593
: LGTM! Well-structured flag bindings.The new flag bindings for distributed configuration follow the established pattern and include both command-line flags and environment variables.
pkg/cmd/serve.go (1)
457-461
: Discuss the removal of caching in distributed mode.The removal of caching for the distributed checker could potentially impact performance. Please clarify:
- Why was caching removed in distributed mode?
- What are the performance implications?
- Is there an alternative caching strategy being implemented?
If caching is still beneficial in distributed mode, consider keeping it with appropriate cache invalidation strategies across nodes.
partitionCount: uint64(config.PartitionCount), | ||
ring: make(map[uint64]*Member), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Consider adding basic validation for PartitionCount range
While the current usage shows safe values (default: 271), adding basic validation would improve robustness:
if config.PartitionCount <= 0 {
config.PartitionCount = DefaultPartitionCount
}
This simpler validation is sufficient because:
- PartitionCount represents virtual partitions in a hash ring where extremely large values would be impractical
- All observed usage patterns show small, reasonable values
- The conversion between int and uint64 is safe for the typical use case
🔗 Analysis chain
Check for potential integer overflows when converting between int and uint64
The static analysis flags lines where PartitionCount
is converted from int
to uint64
(line 118), and where uint64
is converted back to int
(lines 251, 372). Although these conversions are common in this context, consider validating the range of PartitionCount
to avoid unintentional overflows if values exceed math.MaxInt64
or if negative values are passed:
if config.PartitionCount < 0 || config.PartitionCount > math.MaxInt64 {
panic("PartitionCount out of acceptable range")
}
Also applies to: 251-251, 372-372
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
# Find Config struct definition and PartitionCount field
ast-grep --pattern 'type Config struct {
$$$
PartitionCount $_
$$$
}'
# Find all occurrences of PartitionCount validation or assignments
rg "PartitionCount" -A 2 -B 2
# Look for any existing validation patterns
rg "if.*PartitionCount.*[<>=]" -A 2 -B 2
Length of output: 9516
🧰 Tools
🪛 golangci-lint (1.62.2)
[high] 118-118: G115: integer overflow conversion int -> uint64
(gosec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
pkg/balancer/picker_test.go (3)
39-43
: Consider using a simpler hash function for testing.While SHA-256 is cryptographically secure, it might introduce unnecessary computational overhead for testing purposes. Consider using a simpler hash function like FNV-1a or xxHash for better test performance.
- // Custom hasher using SHA-256 for consistent hashing - hasher = func(data []byte) uint64 { - hash := sha256.Sum256(data) - return binary.BigEndian.Uint64(hash[:8]) // Use the first 8 bytes as the hash - } + // Simple hasher using FNV-1a for testing + hasher = func(data []byte) uint64 { + h := fnv.New64a() + h.Write(data) + return h.Sum64() + }
47-52
: Document configuration parameters.The consistent hashing configuration parameters would benefit from documentation explaining their purpose and impact.
c = consistent.New(consistent.Config{ - Hasher: hasher, - PartitionCount: 100, - ReplicationFactor: 2, - Load: 1.5, + // Function used to generate hash values for keys + Hasher: hasher, + // Number of partitions in the hash ring + PartitionCount: 100, + // Number of virtual nodes per member for better distribution + ReplicationFactor: 2, + // Maximum load factor before rebalancing + Load: 1.5, })
65-102
: Add test cases for edge cases and boundary conditions.While the current test cases cover basic functionality, consider adding tests for:
- Maximum width exceeded
- Single member in the ring
- Member removal during pick operation
- Concurrent picks
- Invalid key types
Example test case for maximum width:
It("should handle width larger than available members", func() { p := &picker{ consistent: c, width: 5, // More than available members } result, err := p.Pick(balancer.PickInfo{Ctx: testCtx}) Expect(err).To(MatchError("failed to get closest members: not enough members to satisfy the request")) Expect(result.SubConn).To(BeNil()) })
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/balancer/picker_test.go
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Test with Coverage
- GitHub Check: Analyze (go)
- GitHub Check: Test with Coverage
🔇 Additional comments (2)
pkg/balancer/picker_test.go (2)
15-30
: LGTM! Well-structured mock types for testing.The new
mockConnection
andtestMember
types are well-designed and provide a clean interface for testing the consistent hashing mechanism.
Line range hint
1-102
: Implementation aligns well with PR objectives.The changes successfully enhance the distributed consistent hashing configuration and error handling:
- More granular configuration options through
consistent.Config
- Comprehensive error handling test coverage
- Clear separation of concerns in the test structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/balancer/builder.go (1)
33-68
: Consider extracting default value logic to avoid duplication.The default value logic is duplicated in both
ServiceConfigJSON
andParseConfig
methods. Consider extracting it to a separate method to maintain DRY principles and ensure consistency.+// applyDefaults applies default values to zero or invalid fields +func (c *Config) applyDefaults() { + if c.PartitionCount <= 0 { + c.PartitionCount = consistent.DefaultPartitionCount + } + if c.ReplicationFactor <= 0 { + c.ReplicationFactor = consistent.DefaultReplicationFactor + } + if c.Load <= 0 { + c.Load = consistent.DefaultLoad + } + if c.PickerWidth < 1 { + c.PickerWidth = consistent.DefaultPickerWidth + } +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/balancer/builder.go
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Test with Coverage
- GitHub Check: Analyze (go)
- GitHub Check: Test with Coverage
🔇 Additional comments (4)
pkg/balancer/builder.go (4)
4-21
: LGTM! Well-organized imports and constants.Good practice to define package-level constants for the balancer name and hash key. This centralizes string literals and improves maintainability.
24-30
: LGTM! Well-documented configuration struct.The Config struct is well-documented with clear field descriptions and appropriate JSON tags.
75-114
: LGTM! Well-structured builder implementation.The builder implementation shows good practices:
- Proper thread safety with mutex
- Clean interface composition
- Clear member representation
116-149
: LGTM! Good error handling and logging practices.The ParseConfig implementation shows good practices:
- Proper error wrapping
- Structured logging with both raw and parsed config
- Thread-safe configuration updates
…ror handling
Summary by CodeRabbit
Release Notes for Permify v1.2.8
New Features
Improvements
Version Update