fix: sync WebSocket max message size fix from v0.1.12 #1678

Merged 3 commits on Jun 19, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,6 +2,8 @@ aider_stdlib_map.md

 # Local tools and notes
 .local-tools/
+*.local/
+*.local

 # Release artifacts (downloaded binaries)
 release-artifacts/
147 changes: 63 additions & 84 deletions CLAUDE.local.md
@@ -1,95 +1,74 @@
 - contract states are commutative monoids, they can be "merged" in any order to arrive at the same result. This may reduce some potential race conditions.

-## Transport Layer Key Management Issues (2025-01-06)
+## Important Testing Notes

-### Problem
-Integration test `test_put_contract` was failing with "Failed to decrypt packet" errors after v0.1.5 deployment to production. The same decryption failures were affecting the River app in production.
+### Always Use Network Mode for Testing
+- **NEVER use local mode for testing** - it uses very different code paths
+- Local mode bypasses critical networking components that need to be tested
+- Always test with `freenet network` to ensure realistic behavior

-### Root Cause
-The transport layer was incorrectly handling symmetric key establishment for gateway connections:
+## Quick Reference - Essential Commands

-1. **Gateway connection key misuse**: Gateway was using different keys for inbound/outbound when it should use the same client key for both directions
-2. **Client ACK encryption error**: Client was encrypting its final ACK response with the gateway's key instead of its own key
-3. **Packet routing overflow**: When existing connection channels became full, packets were misrouted to new gateway connection handlers instead of waiting
+### River Development
+```bash
+# Publish River (use this, not custom scripts)
+cd ~/code/freenet/river && RUST_MIN_STACK=16777216 cargo make publish-river-debug

-### Key Protocol Rules
-- **Gateway connections**: Use the same symmetric key (client's key) for both inbound and outbound communication
-- **Peer-to-peer connections**: Use different symmetric keys for each direction (each peer's own inbound key)
-- **Connection establishment**: Only initial gateway connections and explicit connect operations should create new connections
-- **PUT/GET/SUBSCRIBE/UPDATE operations**: Should only use existing active connections, never create new ones
-
-### Fixes Applied
-
-#### 1. Gateway Connection Key Fix (`crates/core/src/transport/connection_handler.rs:578-584`)
-```rust
-// For gateway connections, use the same key for both directions
-let inbound_key = outbound_key.clone();
-let outbound_ack_packet = SymmetricMessage::ack_ok(
-    &outbound_key,
-    outbound_key_bytes.try_into().unwrap(),
-    remote_addr,
-)?;
-```
-
-#### 2. Client ACK Response Fix (`crates/core/src/transport/connection_handler.rs:798-811`)
-```rust
-// Use our own key to encrypt the ACK response (same key for both directions with gateway)
-outbound_packets
-    .send((
-        remote_addr,
-        SymmetricMessage::ack_ok(
-            &inbound_sym_key, // Use our own key, not the gateway's
-            inbound_sym_key_bytes,
-            remote_addr,
-        )?
-        .prepared_send(),
-    ))
-    .await
-    .map_err(|_| TransportError::ChannelClosed)?;
-```
-
-#### 3. Packet Sending Consistency (`crates/core/src/transport/connection_handler.rs:740-747`)
-```rust
-let packet_to_send = our_inbound.prepared_send();
-outbound_packets
-    .send((remote_addr, packet_to_send.clone()))
-    .await
-    .map_err(|_| TransportError::ChannelClosed)?;
-sent_tracker
-    .report_sent_packet(SymmetricMessage::FIRST_PACKET_ID, packet_to_send);
+# Verify River build time (CRITICAL - only way to confirm new version is served)
+curl -s http://127.0.0.1:50509/v1/contract/web/BcfxyjCH4snaknrBoCiqhYc9UFvmiJvhsp5d4L5DuvRa/ | grep -o 'Built: [^<]*' | head -1
 ```

-### Testing
-- Created specialized transport tests in `crates/core/src/transport/test_gateway_handshake.rs`
-  - `test_gateway_handshake_symmetric_key_usage()`: Verifies gateway connections use same key for both directions
-  - `test_peer_to_peer_different_keys()`: Verifies peer-to-peer connections use different keys
-- Both specialized tests pass, confirming the transport layer fixes work correctly
+### Freenet Management
+```bash
+# Start Freenet
+./target/release/freenet network > freenet-debug.log 2>&1 &

-### Root Cause Analysis Complete
+# Check status
+ps aux | grep freenet | grep -v grep | grep -v tail | grep -v journalctl

-#### PUT Operation Connection Creation Issue
-**Location**: `crates/core/src/node/network_bridge/p2p_protoc.rs:242-291`
-
-**Problem**: PUT/GET/SUBSCRIBE/UPDATE operations create new connections when no existing connection is found, violating the protocol rule that these operations should only use existing active connections.
-
-**Behavior**: When `NetworkBridge.send()` is called and no existing connection exists:
-1. System logs warning: "No existing outbound connection, establishing connection first"
-2. Creates new connection via `NodeEvent::ConnectPeer`
-3. Waits up to 5 seconds for connection establishment
-4. Attempts to send message on newly created connection
-
-**Channel Overflow Root Cause**: Channels fill up due to throughput mismatch:
-- **Fast UDP ingress**: Socket receives packets quickly
-- **Slow application processing**: `peer_connection_listener` processes one message at a time sequentially
-- **Limited buffering**: 100-packet channel buffer insufficient for high-throughput scenarios
-- **No flow control**: System creates new connections instead of implementing proper backpressure
-
-**Cascade Effect**: Channel overflow → packet misrouting → wrong connection handlers → decryption failures → new connection creation
-
-#### Required Fix
-The network bridge should fail gracefully or retry with existing connections instead of creating new ones for PUT/GET/SUBSCRIBE/UPDATE operations. Only initial gateway connections and explicit CONNECT operations should establish new connections.
+# Monitor logs
+tail -f freenet-debug.log
+```

-### Next Steps
-1. Modify PUT/GET operation handling to use only existing connections
-2. Implement proper backpressure handling for full channels instead of creating new connections
-3. Test that integration test `test_put_contract` passes after the fix
+## Detailed Documentation Files
+
+### Current Active Debugging
+- **Directory**: `freenet-invitation-bug.local/` (consolidated debugging)
+  - `README.md` - Overview and quick commands
+  - `river-notes/` - River-specific debugging documentation
+  - `contract-test/` - Minimal Rust test to reproduce PUT/GET issue
+
+### River Invitation Bug (2025-01-18)
+- **Status**: CONFIRMED - Contract operations hang on live network, work in integration tests
+- **Root Cause**: Freenet node receives WebSocket requests but never responds
+- **Test Directory**: `freenet-invitation-bug.local/live-network-test/`
+- **Confirmed Findings**:
+  - River correctly sends PUT/GET requests via WebSocket
+  - Raw WebSocket test: Receives binary error response from server
+  - freenet-stdlib test: GET request sent but never receives response (2min timeout)
+  - Integration test `test_put_contract` passes when run in isolation
+  - Issue affects both PUT and GET operations
+- **Current Investigation**: Systematically debugging why Freenet node doesn't respond to contract operations
+- **See**: `freenet-invitation-bug.local/river-notes/invitation-bug-analysis-update.md`
+
+### Historical Analysis (Reference Only)
+- **Transport Layer Issues**: See lines 3-145 in previous version of this file (archived)
+- **River Testing Procedures**: See lines 97-145 in previous version of this file (archived)
+
+### CI Tools
+- **GitHub CI Monitoring**: `~/code/agent.scripts/wait-for-ci.sh [PR_NUMBER]`
+
+### Testing Tools
+- **Puppeteer Testing Guide**: `puppeteer-testing-guide.local.md` - Essential patterns for testing Dioxus apps with MCP Puppeteer tools
+
+## Key Code Locations
+- **River Room Creation**: `/home/ian/code/freenet/river/ui/src/components/room_list/create_room_modal.rs`
+- **River Room Synchronizer**: `/home/ian/code/freenet/river/ui/src/components/app/freenet_api/room_synchronizer.rs`
+- **River Room Data**: `/home/ian/code/freenet/river/ui/src/room_data.rs`
+
+## Organization Rules
+1. **Check this file first** for command reference and active debugging directories
+2. **Use standard commands** instead of creating custom scripts
+3. **Verify River build timestamps** after publishing
+4. **Create timestamped .local directories** for complex debugging sessions
+5. **Update this index** when adding new debugging directories or tools
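
The first line of CLAUDE.local.md notes that contract states are commutative monoids that can be merged in any order. A minimal Rust sketch of that property follows, assuming a hypothetical `State` type that wraps a set of IDs and uses set union as its merge. Neither the type nor its methods come from freenet-core; they only illustrate the algebraic laws.

```rust
use std::collections::BTreeSet;

/// Hypothetical contract state: a grow-only set of message IDs.
/// (Illustrative only; not a freenet-core type.)
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct State {
    ids: BTreeSet<u64>,
}

impl State {
    /// Build a state from a slice of IDs; duplicates collapse.
    pub fn from_ids(ids: &[u64]) -> Self {
        State {
            ids: ids.iter().copied().collect(),
        }
    }

    /// Merge is set union, which is associative and commutative,
    /// with the empty set (`State::default()`) as the identity.
    pub fn merge(&self, other: &State) -> State {
        State {
            ids: self.ids.union(&other.ids).copied().collect(),
        }
    }
}
```

Because union satisfies all three laws, two nodes that receive the same updates in different orders still converge on the same state, which is the reason the note gives for fewer race conditions.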
8 changes: 4 additions & 4 deletions apps/freenet-ping/Cargo.lock

Generated file; diff not rendered.

5 changes: 5 additions & 0 deletions ci_test_log.txt
@@ -0,0 +1,5 @@
+{
+  "message": "Not Found",
+  "documentation_url": "https://docs.github.com/rest",
+  "status": "404"
+}
5 changes: 4 additions & 1 deletion crates/core/src/client_events/websocket.rs
@@ -325,7 +325,10 @@ async fn websocket_commands(
         }
     };

-    ws.on_upgrade(on_upgrade)
+    // Increase max message size to 100MB to handle contract uploads
+    // Default is ~64KB which is too small for WASM contracts
+    ws.max_message_size(100 * 1024 * 1024)
+        .on_upgrade(on_upgrade)
 }

 async fn websocket_interface(
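
For reference, the value passed to `max_message_size` in the hunk above works out to 104,857,600 bytes (100 MiB). The sketch below only spells out that arithmetic. The `fits_in_message` helper is hypothetical and not part of this PR, and it assumes a payload exactly at the limit is still accepted, which the diff does not state.

```rust
/// The limit from the diff above, in bytes: 100 * 1024 * 1024.
pub const MAX_MESSAGE_SIZE: usize = 100 * 1024 * 1024;

/// Hypothetical helper: returns true if a payload of `len` bytes
/// is at or below the limit. The inclusive boundary is an assumption.
pub fn fits_in_message(len: usize) -> bool {
    len <= MAX_MESSAGE_SIZE
}
```

A 5 MiB WASM contract passes this check, while anything one byte over the limit does not.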
18 changes: 14 additions & 4 deletions crates/core/tests/operations.rs
@@ -630,11 +630,13 @@ async fn test_multiple_clients_subscription() -> TestResult {
     }
     .boxed_local();

-    let test = tokio::time::timeout(Duration::from_secs(180), async {
-        // Wait for nodes to start up
-        tokio::time::sleep(Duration::from_secs(20)).await;
+    let test = tokio::time::timeout(Duration::from_secs(600), async {
+        // Wait for nodes to start up - CI environments need more time
+        tokio::time::sleep(Duration::from_secs(40)).await;

         // Connect first client to node A's websocket API
+        tracing::info!("Starting WebSocket connections after 40s startup wait");
+        let start_time = std::time::Instant::now();
         let uri_a = format!(
             "ws://127.0.0.1:{}/v1/contract/command?encodingProtocol=native",
             ws_api_port_a
@@ -655,6 +657,10 @@ async fn test_multiple_clients_subscription() -> TestResult {
         let mut client_api_node_b = WebApi::start(stream3);

         // First client puts contract with initial state (without subscribing)
+        tracing::info!(
+            "Client 1: Starting PUT operation (elapsed: {:?})",
+            start_time.elapsed()
+        );
         make_put(
             &mut client_api1_node_a,
             wrapped_state.clone(),
@@ -666,10 +672,14 @@ async fn test_multiple_clients_subscription() -> TestResult {
         // Wait for put response
         loop {
             let resp =
-                tokio::time::timeout(Duration::from_secs(60), client_api1_node_a.recv()).await;
+                tokio::time::timeout(Duration::from_secs(120), client_api1_node_a.recv()).await;
             match resp {
                 Ok(Ok(HostResponse::ContractResponse(ContractResponse::PutResponse { key }))) => {
                     assert_eq!(key, contract_key, "Contract key mismatch in PUT response");
+                    tracing::info!(
+                        "Client 1: PUT completed successfully (elapsed: {:?})",
+                        start_time.elapsed()
+                    );
                     break;
                 }
                 Ok(Ok(other)) => {
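
The test changes above bound each wait with a deadline and report elapsed time measured from an `Instant`. A std-only sketch of that pattern follows. It uses `std::sync::mpsc::Receiver::recv_timeout` in place of `tokio::time::timeout`, and `spawn_responder` is a hypothetical stand-in for a node that answers after a delay. It is not the test's actual code.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Wait for one message, giving up after `limit`.
/// On success, also returns how long the wait took.
pub fn recv_with_deadline<T>(
    rx: &mpsc::Receiver<T>,
    limit: Duration,
) -> Result<(T, Duration), RecvTimeoutError> {
    let start = Instant::now();
    let value = rx.recv_timeout(limit)?;
    Ok((value, start.elapsed()))
}

/// Hypothetical stand-in for a node: sends one reply after `delay`.
pub fn spawn_responder(delay: Duration) -> mpsc::Receiver<&'static str> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        // The receiver may already be gone if the caller timed out.
        let _ = tx.send("PutResponse");
    });
    rx
}
```

Calling `recv_with_deadline(&spawn_responder(delay), limit)` returns the reply and the measured wait when `delay` is well under `limit`, and `RecvTimeoutError::Timeout` when it is not. Raising the limit, as the PR does for CI, trades slower failure detection for fewer spurious timeouts.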
11 changes: 7 additions & 4 deletions scripts/deploy-to-gateways.sh
@@ -175,21 +175,24 @@ compile_for_target() {
     # Create cross-compiled directory
     mkdir -p "$CROSS_BINARIES_DIR"

-    local compile_output
-    local compile_result
+    local compile_output=""
+    local compile_result=0

     # Download from GitHub workflow artifacts for both architectures
     show_progress "Downloading binary from GitHub workflow" "start"

-    # Get the latest successful workflow run for main branch with timestamp
-    local run_info=$(gh run list --repo freenet/freenet-core --workflow cross-compile.yml --branch main --status success --limit 1 --json databaseId,createdAt --jq '.[0]')
+    # Get the latest successful workflow run (any branch) with timestamp
+    local run_info=$(gh run list --repo freenet/freenet-core --workflow cross-compile.yml --status success --limit 1 --json databaseId,createdAt,headBranch --jq '.[0]')

     if [ -z "$run_info" ]; then
         compile_output="Failed to find successful workflow run"
         compile_result=1
     else
         local run_id=$(echo "$run_info" | jq -r '.databaseId')
         local created_at=$(echo "$run_info" | jq -r '.createdAt')
+        local branch=$(echo "$run_info" | jq -r '.headBranch')
+
+        log_verbose "Using workflow run $run_id from branch $branch"

         # Check if artifact is older than 12 hours
         local current_time=$(date +%s)
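
The hunk is cut off right after the "older than 12 hours" comment, so the script's actual check is not visible here. As an illustration only, a 12-hour staleness test on Unix timestamps could look like the sketch below. `MAX_AGE_SECS` and `is_stale` are hypothetical names, and both the strict comparison and the handling of future timestamps are assumptions rather than the script's behavior.

```rust
/// Threshold named in the script comment: 12 hours, in seconds.
pub const MAX_AGE_SECS: u64 = 12 * 60 * 60;

/// Hypothetical check: true if an artifact created at `created_at`
/// (Unix seconds) is strictly older than the threshold at `now`.
/// A creation time in the future counts as age zero.
pub fn is_stale(created_at: u64, now: u64) -> bool {
    now.saturating_sub(created_at) > MAX_AGE_SECS
}
```

Using `saturating_sub` avoids an underflow panic if clock skew puts `created_at` slightly ahead of `now`.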