EOF Error when running jobs with dev private key #515

Open
PBillingsby opened this issue Feb 18, 2025 · 2 comments

@PBillingsby
Contributor

Describe the bug

Before the release of Lilypad v2.12, I could run jobs from the CLI with my regular development private key. After updating to v2.12, running jobs with that key fails with an EOF error. I checked the resource provider (RP) logs, and the job never reaches the RP.

I created a new development private key, retried the same command, and it worked fine. This suggests that something changed in how private keys are handled in v2.12, possibly affecting previously used keys.

Note: The severity of this bug is somewhere between an annoyance and blocking all Lilypad usage for the user (if that were their only private key).

Reproduction

1. Use an existing development private key (generated before v2.12).
2. Run the following job command:

   lilypad run cowsay:v0.0.4 -i Message="moo" --web3-private-key $WEB3_PRIVATE_KEY

3. Observe the following error:

   2025-02-18T11:28:27-05:00 ERR ../runner/work/lilypad/lilypad/pkg/jobcreator/controller.go:338 > 🟢 JC failed to download results="error downloading results for deal: unexpected EOF"

4. Generate a new development private key.
5. Run the same command with the new key; the job runs successfully.

Logs

lilypad run cowsay:v0.0.4 -i Message="moo" --web3-private-key $WEB3_PRIVATE_KEY

⠀⠀⠀⠀⠀⠀⣀⣤⣤⢠⣤⣀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⢴⣿⣿⣿⣿⢸⣿⡟⠀⠀⠀⠀⠀    ██╗     ██╗██╗  ██╗   ██╗██████╗  █████╗ ██████╗
⠀⠀⣰⣿⣦⡙⢿⣿⣿⢸⡿⠀⠀⠀⠀⢀⠀    ██║     ██║██║  ╚██╗ ██╔╝██╔══██╗██╔══██╗██╔══██╗
⠀⢰⣿⣿⣿⣿⣦⡙⣿⢸⠁⢀⣠⣴⣾⣿⡆    ██║     ██║██║   ╚████╔╝ ██████╔╝███████║██║  ██║
⠀⣛⣛⣛⣛⣛⣛⣛⠈⠀⣚⣛⣛⣛⣛⣛⣛    ██║     ██║██║    ╚██╔╝  ██╔═══╝ ██╔══██║██║  ██║
⠀⢹⣿⣿⣿⣿⠟⣡⣿⢸⣮⡻⣿⣿⣿⣿⡏    ███████╗██║███████╗██║   ██║     ██║  ██║██████╔╝
⠀⠀⢻⣿⡟⣩⣾⣿⣿⢸⣿⣿⣌⠻⣿⡟⠀    ╚══════╝╚═╝╚══════╝╚═╝   ╚═╝     ╚═╝  ╚═╝╚═════╝ v2.12.0
⠀⠀⠀⠉⢾⣿⣿⣿⣿⢸⣿⣿⣿⡷⠈⠀⠀
⠀⠀⠀⠀⠀⠈⠙⠛⠛⠘⠛⠋⠁⠀ ⠀⠀⠀   Decentralized Compute Network  https://lilypad.tech

🌟  Lilypad submitting job
2025-02-18T11:28:26-05:00 WRN ../runner/work/lilypad/lilypad/cmd/lilypad/utils.go:63 > failed to get GPU info: gpuFillInfo not implemented on darwin
2025-02-18T11:28:26-05:00 INF ../runner/work/lilypad/lilypad/pkg/web3/sdk.go:209 > Connected to arbitrum-sepolia-rpc.publicnode.com
2025-02-18T11:28:26-05:00 INF ../runner/work/lilypad/lilypad/pkg/jobcreator/run.go:27 > Public Address: 0x765fEB3FB358867453B26c715a29BDbbC10Be772
2025-02-18T11:28:27-05:00 ERR ../runner/work/lilypad/lilypad/pkg/jobcreator/controller.go:338 > 🟢 JC failed to download results="error downloading results for deal: unexpected EOF"

Screenshots

System Info

- Apple M1 Pro
- arm64 CPU binary
- Funded wallet (for original dev and new dev wallet)

Severity

Annoyance

@walkerlj0
Contributor

walkerlj0 commented Feb 18, 2025

I had a similar issue when running jobs with my old Job Creator wallet:

lilypad run github.com/noryev/module-sdxl-ipfs:ae17e969cadab1c53d7cabab1927bb403f02fd2a -i prompt="a technology decentralized network"

⠀⠀⠀⠀⠀⠀⣀⣤⣤⢠⣤⣀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⢴⣿⣿⣿⣿⢸⣿⡟⠀⠀⠀⠀⠀    ██╗     ██╗██╗  ██╗   ██╗██████╗  █████╗ ██████╗
⠀⠀⣰⣿⣦⡙⢿⣿⣿⢸⡿⠀⠀⠀⠀⢀⠀    ██║     ██║██║  ╚██╗ ██╔╝██╔══██╗██╔══██╗██╔══██╗
⠀⢰⣿⣿⣿⣿⣦⡙⣿⢸⠁⢀⣠⣴⣾⣿⡆    ██║     ██║██║   ╚████╔╝ ██████╔╝███████║██║  ██║
⠀⣛⣛⣛⣛⣛⣛⣛⠈⠀⣚⣛⣛⣛⣛⣛⣛    ██║     ██║██║    ╚██╔╝  ██╔═══╝ ██╔══██║██║  ██║
⠀⢹⣿⣿⣿⣿⠟⣡⣿⢸⣮⡻⣿⣿⣿⣿⡏    ███████╗██║███████╗██║   ██║     ██║  ██║██████╔╝
⠀⠀⢻⣿⡟⣩⣾⣿⣿⢸⣿⣿⣌⠻⣿⡟⠀    ╚══════╝╚═╝╚══════╝╚═╝   ╚═╝     ╚═╝  ╚═╝╚═════╝ v2.12.0
⠀⠀⠀⠉⢾⣿⣿⣿⣿⢸⣿⣿⣿⡷⠈⠀⠀
⠀⠀⠀⠀⠀⠈⠙⠛⠛⠘⠛⠋⠁⠀ ⠀⠀⠀   Decentralized Compute Network  https://lilypad.tech

🌟  Lilypad submitting job
2025-02-18T15:18:08-06:00 WRN ../runner/work/lilypad/lilypad/cmd/lilypad/utils.go:63 > failed to get GPU info: gpuFillInfo not implemented on darwin
2025-02-18T15:18:08-06:00 INF ../runner/work/lilypad/lilypad/pkg/web3/sdk.go:209 > Connected to arbitrum-sepolia-rpc.publicnode.com
2025-02-18T15:18:08-06:00 INF ../runner/work/lilypad/lilypad/pkg/jobcreator/run.go:27 > Public Address: 0xEC26a7C56e05D208ad93586D108981200406Eca5
2025-02-18T15:18:09-06:00 INF ../runner/work/lilypad/lilypad/pkg/module/utils.go:149 > updating cached git repo=/tmp/lilypad/data/repos/noryev/module-sdxl-ipfs
∙∙∙ Deal agreed. Running job...2025-02-18T15:20:13-06:00 ERR ../runner/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_arm64.s:1223 > websocket error error="websocket: close 1006 (abnormal closure): unexpected EOF"
💌  Deal agreed. Running job...
🤔  Results submitted. Awaiting verification...
🤔  Results submitted. Awaiting verification...
✅  Results accepted. Downloading result...
🆔  Data ID: QmQFQfnKD8xU17ZjsUtqN9VT16kHmKqzEDR7untADuExBU

🍂 Lilypad job completed, try 👇
    open /tmp/lilypad/data/downloaded-files/Qmb2tpLYX3xhWjhLHTzYzB2pS1iVj1jvZcShEBNeuvZqkx
    cat /tmp/lilypad/data/downloaded-files/Qmb2tpLYX3xhWjhLHTzYzB2pS1iVj1jvZcShEBNeuvZqkx/stdout
    cat /tmp/lilypad/data/downloaded-files/Qmb2tpLYX3xhWjhLHTzYzB2pS1iVj1jvZcShEBNeuvZqkx/stderr
➜  ~ open /tmp/lilypad/data/downloaded-files/Qmb2tpLYX3xhWjhLHTzYzB2pS1iVj1jvZcShEBNeuvZqkx

The job ran anyway, but I saw a similar error in the output: websocket error error="websocket: close 1006 (abnormal closure): unexpected EOF"

@bgins
Contributor

bgins commented Feb 25, 2025

We discussed this issue earlier today and have a theory about it.

The CLI run command receives and handles messages from previous job runs that were abandoned. The intent is likely to finish the work and download results for jobs that did not complete while the run process was inactive.

It may be that these private keys were involved in a job that went bad somehow. When we run another job with them, the CLI attempts to continue the previous job, runs into the bad state, and is unable to proceed with the next job. We still need to investigate the failure case: why did the job reach a bad state?

We may also want to remove the "pick up old jobs" functionality. The UX has for the most part been confusing, and it might be better for the CLI run command to only focus on one job at a time.
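As a rough illustration of the "one job at a time" idea, the run command could ignore any message that does not belong to the deal it just created. This is a minimal sketch under stated assumptions, not the actual implementation: `DealEvent`, its fields, the state names, and `filterForDeal` are all hypothetical, and the real message types in the codebase may look quite different.

```go
package main

import "fmt"

// DealEvent is a hypothetical stand-in for a message the run command
// receives; the real Lilypad types may differ.
type DealEvent struct {
	DealID string
	State  string
}

// filterForDeal keeps only the events that belong to the current deal, so
// leftovers from earlier, abandoned jobs are never handled.
func filterForDeal(events []DealEvent, currentDealID string) []DealEvent {
	var kept []DealEvent
	for _, ev := range events {
		if ev.DealID == currentDealID {
			kept = append(kept, ev)
		}
	}
	return kept
}

func main() {
	// Illustrative data: one stale event from an old deal, two from the new one.
	events := []DealEvent{
		{DealID: "old-deal", State: "ResultsAccepted"},
		{DealID: "new-deal", State: "DealAgreed"},
		{DealID: "new-deal", State: "ResultsSubmitted"},
	}
	for _, ev := range filterForDeal(events, "new-deal") {
		fmt.Printf("%s: %s\n", ev.DealID, ev.State)
	}
}
```

With a filter like this, a stale deal tied to an older key would be skipped instead of blocking the new job, though it would also mean abandoned jobs are never resumed from the CLI.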

4 participants
@bgins @PBillingsby @walkerlj0 and others