diff --git a/.gitignore b/.gitignore
index 2f2fd2771..af6161062 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,8 @@ aider_stdlib_map.md
 
 # Local tools and notes
 .local-tools/
+*.local/
+*.local
 
 # Release artifacts (downloaded binaries)
 release-artifacts/
diff --git a/CLAUDE.local.md b/CLAUDE.local.md
index ab3de5c02..0b8fa34c3 100644
--- a/CLAUDE.local.md
+++ b/CLAUDE.local.md
@@ -1,95 +1,74 @@
 - contract states are commutative monoids, they can be "merged" in any order to arrive at the same result. This may reduce some potential race conditions.
 
-## Transport Layer Key Management Issues (2025-01-06)
+## Important Testing Notes
 
-### Problem
-Integration test `test_put_contract` was failing with "Failed to decrypt packet" errors after v0.1.5 deployment to production. The same decryption failures were affecting the River app in production.
+### Always Use Network Mode for Testing
+- **NEVER use local mode for testing** - it uses very different code paths
+- Local mode bypasses critical networking components that need to be tested
+- Always test with `freenet network` to ensure realistic behavior
 
-### Root Cause
-The transport layer was incorrectly handling symmetric key establishment for gateway connections:
+## Quick Reference - Essential Commands
 
-1. **Gateway connection key misuse**: Gateway was using different keys for inbound/outbound when it should use the same client key for both directions
-2. **Client ACK encryption error**: Client was encrypting its final ACK response with the gateway's key instead of its own key
-3. **Packet routing overflow**: When existing connection channels became full, packets were misrouted to new gateway connection handlers instead of waiting
+### River Development
+```bash
+# Publish River (use this, not custom scripts)
+cd ~/code/freenet/river && RUST_MIN_STACK=16777216 cargo make publish-river-debug
 
-### Key Protocol Rules
-- **Gateway connections**: Use the same symmetric key (client's key) for both inbound and outbound communication
-- **Peer-to-peer connections**: Use different symmetric keys for each direction (each peer's own inbound key)
-- **Connection establishment**: Only initial gateway connections and explicit connect operations should create new connections
-- **PUT/GET/SUBSCRIBE/UPDATE operations**: Should only use existing active connections, never create new ones
-
-### Fixes Applied
-
-#### 1. Gateway Connection Key Fix (`crates/core/src/transport/connection_handler.rs:578-584`)
-```rust
-// For gateway connections, use the same key for both directions
-let inbound_key = outbound_key.clone();
-let outbound_ack_packet = SymmetricMessage::ack_ok(
-    &outbound_key,
-    outbound_key_bytes.try_into().unwrap(),
-    remote_addr,
-)?;
-```
-
-#### 2. Client ACK Response Fix (`crates/core/src/transport/connection_handler.rs:798-811`)
-```rust
-// Use our own key to encrypt the ACK response (same key for both directions with gateway)
-outbound_packets
-    .send((
-        remote_addr,
-        SymmetricMessage::ack_ok(
-            &inbound_sym_key, // Use our own key, not the gateway's
-            inbound_sym_key_bytes,
-            remote_addr,
-        )?
-        .prepared_send(),
-    ))
-    .await
-    .map_err(|_| TransportError::ChannelClosed)?;
-```
-
-#### 3. Packet Sending Consistency (`crates/core/src/transport/connection_handler.rs:740-747`)
-```rust
-let packet_to_send = our_inbound.prepared_send();
-outbound_packets
-    .send((remote_addr, packet_to_send.clone()))
-    .await
-    .map_err(|_| TransportError::ChannelClosed)?;
-sent_tracker
-    .report_sent_packet(SymmetricMessage::FIRST_PACKET_ID, packet_to_send);
+# Verify River build time (CRITICAL - only way to confirm new version is served)
+curl -s http://127.0.0.1:50509/v1/contract/web/BcfxyjCH4snaknrBoCiqhYc9UFvmiJvhsp5d4L5DuvRa/ | grep -o 'Built: [^<]*' | head -1
 ```
 
-### Testing
-- Created specialized transport tests in `crates/core/src/transport/test_gateway_handshake.rs`
-- `test_gateway_handshake_symmetric_key_usage()`: Verifies gateway connections use same key for both directions
-- `test_peer_to_peer_different_keys()`: Verifies peer-to-peer connections use different keys
-- Both specialized tests pass, confirming the transport layer fixes work correctly
+### Freenet Management
+```bash
+# Start Freenet
+./target/release/freenet network > freenet-debug.log 2>&1 &
 
-### Root Cause Analysis Complete
+# Check status
+ps aux | grep freenet | grep -v grep | grep -v tail | grep -v journalctl
 
-#### PUT Operation Connection Creation Issue
-**Location**: `crates/core/src/node/network_bridge/p2p_protoc.rs:242-291`
-
-**Problem**: PUT/GET/SUBSCRIBE/UPDATE operations create new connections when no existing connection is found, violating the protocol rule that these operations should only use existing active connections.
-
-**Behavior**: When `NetworkBridge.send()` is called and no existing connection exists:
-1. System logs warning: "No existing outbound connection, establishing connection first"
-2. Creates new connection via `NodeEvent::ConnectPeer`
-3. Waits up to 5 seconds for connection establishment
-4. Attempts to send message on newly created connection
-
-**Channel Overflow Root Cause**: Channels fill up due to throughput mismatch:
-- **Fast UDP ingress**: Socket receives packets quickly
-- **Slow application processing**: `peer_connection_listener` processes one message at a time sequentially
-- **Limited buffering**: 100-packet channel buffer insufficient for high-throughput scenarios
-- **No flow control**: System creates new connections instead of implementing proper backpressure
-
-**Cascade Effect**: Channel overflow → packet misrouting → wrong connection handlers → decryption failures → new connection creation
-
-#### Required Fix
-The network bridge should fail gracefully or retry with existing connections instead of creating new ones for PUT/GET/SUBSCRIBE/UPDATE operations. Only initial gateway connections and explicit CONNECT operations should establish new connections.
+# Monitor logs
+tail -f freenet-debug.log
+```
 
-### Next Steps
-1. Modify PUT/GET operation handling to use only existing connections
-2. Implement proper backpressure handling for full channels instead of creating new connections
-3. Test that integration test `test_put_contract` passes after the fix
\ No newline at end of file
+## Detailed Documentation Files
+
+### Current Active Debugging
+- **Directory**: `freenet-invitation-bug.local/` (consolidated debugging)
+  - `README.md` - Overview and quick commands
+  - `river-notes/` - River-specific debugging documentation
+  - `contract-test/` - Minimal Rust test to reproduce PUT/GET issue
+
+### River Invitation Bug (2025-01-18)
+- **Status**: CONFIRMED - Contract operations hang on live network, work in integration tests
+- **Root Cause**: Freenet node receives WebSocket requests but never responds
+- **Test Directory**: `freenet-invitation-bug.local/live-network-test/`
+- **Confirmed Findings**:
+  - River correctly sends PUT/GET requests via WebSocket
+  - Raw WebSocket test: Receives binary error response from server
+  - freenet-stdlib test: GET request sent but never receives response (2min timeout)
+  - Integration test `test_put_contract` passes when run in isolation
+  - Issue affects both PUT and GET operations
+- **Current Investigation**: Systematically debugging why Freenet node doesn't respond to contract operations
+- **See**: `freenet-invitation-bug.local/river-notes/invitation-bug-analysis-update.md`
+
+### Historical Analysis (Reference Only)
+- **Transport Layer Issues**: See lines 3-145 in previous version of this file (archived)
+- **River Testing Procedures**: See lines 97-145 in previous version of this file (archived)
+
+### CI Tools
+- **GitHub CI Monitoring**: `~/code/agent.scripts/wait-for-ci.sh [PR_NUMBER]`
+
+### Testing Tools
+- **Puppeteer Testing Guide**: `puppeteer-testing-guide.local.md` - Essential patterns for testing Dioxus apps with MCP Puppeteer tools
+
+## Key Code Locations
+- **River Room Creation**: `/home/ian/code/freenet/river/ui/src/components/room_list/create_room_modal.rs`
+- **River Room Synchronizer**: `/home/ian/code/freenet/river/ui/src/components/app/freenet_api/room_synchronizer.rs`
+- **River Room Data**: `/home/ian/code/freenet/river/ui/src/room_data.rs`
+
+## Organization Rules
+1. **Check this file first** for command reference and active debugging directories
+2. **Use standard commands** instead of creating custom scripts
+3. **Verify River build timestamps** after publishing
+4. **Create timestamped .local directories** for complex debugging sessions
+5. **Update this index** when adding new debugging directories or tools
\ No newline at end of file
diff --git a/apps/freenet-ping/Cargo.lock b/apps/freenet-ping/Cargo.lock
index 32182971c..6e5311916 100644
--- a/apps/freenet-ping/Cargo.lock
+++ b/apps/freenet-ping/Cargo.lock
@@ -1247,7 +1247,7 @@ dependencies = [
 
 [[package]]
 name = "freenet"
-version = "0.1.8"
+version = "0.1.11"
 dependencies = [
  "aes-gcm",
  "ahash",
@@ -1326,7 +1326,7 @@ dependencies = [
 
 [[package]]
 name = "freenet-ping-app"
-version = "0.1.0"
+version = "0.1.11"
 dependencies = [
  "anyhow",
  "chrono",
@@ -1350,7 +1350,7 @@ dependencies = [
 
 [[package]]
 name = "freenet-ping-contract"
-version = "0.1.0"
+version = "0.1.11"
 dependencies = [
  "freenet-ping-types",
  "freenet-stdlib",
@@ -1359,7 +1359,7 @@ dependencies = [
 
 [[package]]
 name = "freenet-ping-types"
-version = "0.1.0"
+version = "0.1.11"
 dependencies = [
  "chrono",
  "clap",
diff --git a/ci_test_log.txt b/ci_test_log.txt
new file mode 100644
index 000000000..b5435a605
--- /dev/null
+++ b/ci_test_log.txt
@@ -0,0 +1,5 @@
+{
+  "message": "Not Found",
+  "documentation_url": "https://docs.github.com/rest",
+  "status": "404"
+}
diff --git a/crates/core/src/client_events/websocket.rs b/crates/core/src/client_events/websocket.rs
index 87a12501e..fb7ace0ac 100644
--- a/crates/core/src/client_events/websocket.rs
+++ b/crates/core/src/client_events/websocket.rs
@@ -325,7 +325,10 @@ async fn websocket_commands(
         }
     };
 
-    ws.on_upgrade(on_upgrade)
+    // Increase max message size to 100MB to handle contract uploads
+    // Default is ~64KB which is too small for WASM contracts
+    ws.max_message_size(100 * 1024 * 1024)
+        .on_upgrade(on_upgrade)
 }
 
 async fn websocket_interface(
diff --git a/crates/core/tests/operations.rs b/crates/core/tests/operations.rs
index 7356ed538..90c2552ba 100644
--- a/crates/core/tests/operations.rs
+++ b/crates/core/tests/operations.rs
@@ -630,11 +630,13 @@ async fn test_multiple_clients_subscription() -> TestResult {
     }
     .boxed_local();
 
-    let test = tokio::time::timeout(Duration::from_secs(180), async {
-        // Wait for nodes to start up
-        tokio::time::sleep(Duration::from_secs(20)).await;
+    let test = tokio::time::timeout(Duration::from_secs(600), async {
+        // Wait for nodes to start up - CI environments need more time
+        tokio::time::sleep(Duration::from_secs(40)).await;
 
         // Connect first client to node A's websocket API
+        tracing::info!("Starting WebSocket connections after 40s startup wait");
+        let start_time = std::time::Instant::now();
         let uri_a = format!(
             "ws://127.0.0.1:{}/v1/contract/command?encodingProtocol=native",
             ws_api_port_a
@@ -655,6 +657,10 @@ async fn test_multiple_clients_subscription() -> TestResult {
         let mut client_api_node_b = WebApi::start(stream3);
 
         // First client puts contract with initial state (without subscribing)
+        tracing::info!(
+            "Client 1: Starting PUT operation (elapsed: {:?})",
+            start_time.elapsed()
+        );
         make_put(
             &mut client_api1_node_a,
             wrapped_state.clone(),
@@ -666,10 +672,14 @@ async fn test_multiple_clients_subscription() -> TestResult {
         // Wait for put response
         loop {
             let resp =
-                tokio::time::timeout(Duration::from_secs(60), client_api1_node_a.recv()).await;
+                tokio::time::timeout(Duration::from_secs(120), client_api1_node_a.recv()).await;
             match resp {
                 Ok(Ok(HostResponse::ContractResponse(ContractResponse::PutResponse { key }))) => {
                     assert_eq!(key, contract_key, "Contract key mismatch in PUT response");
+                    tracing::info!(
+                        "Client 1: PUT completed successfully (elapsed: {:?})",
+                        start_time.elapsed()
+                    );
                     break;
                 }
                 Ok(Ok(other)) => {
diff --git a/scripts/deploy-to-gateways.sh b/scripts/deploy-to-gateways.sh
index 9372e7d57..416ad948e 100755
--- a/scripts/deploy-to-gateways.sh
+++ b/scripts/deploy-to-gateways.sh
@@ -175,14 +175,14 @@ compile_for_target() {
     # Create cross-compiled directory
     mkdir -p "$CROSS_BINARIES_DIR"
 
-    local compile_output
-    local compile_result
+    local compile_output=""
+    local compile_result=0
 
     # Download from GitHub workflow artifacts for both architectures
     show_progress "Downloading binary from GitHub workflow" "start"
 
-    # Get the latest successful workflow run for main branch with timestamp
-    local run_info=$(gh run list --repo freenet/freenet-core --workflow cross-compile.yml --branch main --status success --limit 1 --json databaseId,createdAt --jq '.[0]')
+    # Get the latest successful workflow run (any branch) with timestamp
+    local run_info=$(gh run list --repo freenet/freenet-core --workflow cross-compile.yml --status success --limit 1 --json databaseId,createdAt,headBranch --jq '.[0]')
 
     if [ -z "$run_info" ]; then
         compile_output="Failed to find successful workflow run"
@@ -190,6 +190,9 @@ compile_for_target() {
     else
         local run_id=$(echo "$run_info" | jq -r '.databaseId')
         local created_at=$(echo "$run_info" | jq -r '.createdAt')
+        local branch=$(echo "$run_info" | jq -r '.headBranch')
+
+        log_verbose "Using workflow run $run_id from branch $branch"
 
         # Check if artifact is older than 12 hours
         local current_time=$(date +%s)