ESP8266 Crashes at seemingly random intervals after mining starts #24
@matteocrippa |
nope, will try during a night this week |
Nothing urgent for me... I would like to be able to help...
But my field is more Python...
On Tue, Mar 19, 2024, 22:42, Matteo Crippa ***@***.***> wrote:
|
I know why the crash is occurring. I ran the miner through a stratum proxy and watched its output. When the proxy logs
2024-03-20 14:03:41,301 INFO proxy client_service.handle_event # New job 201819d for prevhash 78501535, clean_jobs=False
the miner crashes and restarts. |
@wmikrut |
Good catch, so it is probably running out of memory when it calculates the coinbase.
As a test, increase the delay at line 32 in main.cpp. |
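For context on what "calculating the coinbase" involves: in Stratum v1 the miner rebuilds the coinbase transaction for each job by concatenating coinb1 and coinb2 (from mining.notify) around extranonce1 (from the mining.subscribe response) and extranonce2 (chosen by the miner), then hashes the result. Below is a minimal sketch of the concatenation step. It assumes the fields are held as hex std::string values. buildCoinbaseHex is a hypothetical name and is not LeafMiner's code. Sizing the buffer once avoids the repeated reallocations that piecemeal appends can cause on a small, fragmented heap.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from LeafMiner): assemble the coinbase
// transaction as hex for one job, per Stratum v1:
//   coinbase = coinb1 + extranonce1 + extranonce2 + coinb2
std::string buildCoinbaseHex(const std::string& coinb1,
                             const std::string& extranonce1,
                             const std::string& extranonce2,
                             const std::string& coinb2) {
    std::string coinbase;
    // Reserve the final size once so the appends below do not reallocate.
    coinbase.reserve(coinb1.size() + extranonce1.size() +
                     extranonce2.size() + coinb2.size());
    coinbase += coinb1;
    coinbase += extranonce1;
    coinbase += extranonce2;
    coinbase += coinb2;
    // Two hex characters per byte: an odd length means a malformed field.
    assert(coinbase.size() % 2 == 0);
    return coinbase;
}
```

The result is still hex, so it occupies twice as many bytes as the binary transaction until it is decoded for hashing.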
I'm not sure it's an out-of-memory condition. I loaded your Job context with breakpoints, and the whole process completes every time. I don't think it's a WDT issue either: the WDT was disabled in main, and when I re-enabled it I saw the same result. Could it be as simple as a stack overflow? |
And did version 10 work? |
I compiled all the way back to v5. v5 does not crash on new work notifications. |
@wmikrut |
I've been running it for 15 minutes now with no crashes. I'll keep digging and see if I can spot where it's binding up. |
OK, great. In the meantime, could you send me the working version, without automatic updates, as a *.bin file? |
Just pushed a few changes. After a quick test it seems much more stable for me on a Wemos D1. Give 0.0.13 a try (or just wait for the auto-update, which forces a reboot). |
Greeaaattt |
It is definitely more stable, and I think I can now see a potential memory issue that would be easy to fix. Every so often the stratum server assigns a new job with param 9 (clean_jobs) set to false. Normally this is fine, because hardware miners are fast and jump from job to job. The ESP8266, however, is much slower and can't handle a lot of queued work. I've seen the chip freeze completely after 3 to 6 notifications of queued work. I've added some quick code so that when work is already queued, additional jobs are skipped. When clean_jobs = true comes down, everything is reset and starts over. |
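The queue-limiting rule described above can be expressed as a small bounded queue. This is a hedged illustration, not the code from the fork. Job, JobQueue and maxQueued are hypothetical names, and clean_jobs is assumed to be already parsed from the ninth mining.notify parameter. With maxQueued = 1, it behaves the way single-core devices are handled in v0.0.14: extra jobs are skipped, and clean_jobs = true replaces the current job.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical job record; a real miner would also keep prevhash, coinb1,
// coinb2, the merkle branch, version, nbits and ntime from mining.notify.
struct Job {
    std::string id;
};

// Keeps at most maxQueued jobs. clean_jobs=true drops all queued work and
// starts over with the new job; clean_jobs=false is skipped when full.
class JobQueue {
public:
    explicit JobQueue(std::size_t maxQueued) : maxQueued_(maxQueued) {
        assert(maxQueued_ > 0);
    }

    // Returns true if the job was queued, false if it was skipped.
    bool onNotify(const Job& job, bool cleanJobs) {
        if (cleanJobs) {
            jobs_.clear();          // old work is stale: reset everything
            jobs_.push_back(job);
            return true;
        }
        if (jobs_.size() >= maxQueued_) {
            return false;           // work already queued: skip this job
        }
        jobs_.push_back(job);
        return true;
    }

    std::size_t size() const { return jobs_.size(); }
    const Job* current() const { return jobs_.empty() ? nullptr : &jobs_.front(); }

private:
    std::size_t maxQueued_;
    std::deque<Job> jobs_;
};
```

Bounding the queue caps the memory that queued jobs can consume, whatever the pool sends.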
Definitely much more stable with the limit on queued jobs. After an hour it was still running. It stopped only because my Wi-Fi router is garbage and the connection dropped. The program detected this and reconnected, but it never re-subscribed or re-authorized on the new connection, so I kept getting the "Connection is not subscribed" error. A future item: call subscribe(), authorize() and difficulty() on every new connection. |
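The reconnect fix suggested here amounts to replaying the session handshake on every new socket. Below is a minimal sketch. It assumes the difficulty() step corresponds to the mining.suggest_difficulty request that some pools accept, and that the agent, user and password strings need no JSON escaping. handshakeRequests is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: the requests to (re)send, in order, every time a new
// TCP connection to the pool or proxy is opened. Each Stratum v1 request is
// one JSON object terminated by '\n'. nextId is advanced for each request.
std::vector<std::string> handshakeRequests(const std::string& agent,
                                           const std::string& user,
                                           const std::string& password,
                                           double suggestedDiff,
                                           int& nextId) {
    char diff[32];
    std::snprintf(diff, sizeof diff, "%.8g", suggestedDiff);

    std::vector<std::string> out;
    out.push_back("{\"id\":" + std::to_string(nextId++) +
                  ",\"method\":\"mining.subscribe\",\"params\":[\"" + agent + "\"]}\n");
    out.push_back("{\"id\":" + std::to_string(nextId++) +
                  ",\"method\":\"mining.authorize\",\"params\":[\"" + user +
                  "\",\"" + password + "\"]}\n");
    // Assumption: "difficulty()" maps to the mining.suggest_difficulty extension.
    out.push_back("{\"id\":" + std::to_string(nextId++) +
                  ",\"method\":\"mining.suggest_difficulty\",\"params\":[" +
                  diff + "]}\n");
    return out;
}
```

Because the sequence is rebuilt for every connection, a dropped link can no longer leave the miner on an unsubscribed socket.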
I forked the project so you can look at some of the code I am playing with on my dev branch. |
I prepared a branch |
What caught my attention was line 83 in current.cpp. It was firing every time new work was sent down from the proxy with clean_jobs = false. It was only a guess that the new Job was somehow allocating memory and/or stack space. |
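If per-notify allocation is the suspect, one way to test that guess is to keep the current job in a single, statically sized slot and overwrite it in place, so that handling mining.notify never touches the heap. This is a hedged sketch. JobSlot, assignJob and the field sizes are assumptions: prevhash is 64 hex characters, and the job ids seen in this thread are 6 to 7 characters long. Variable-length fields such as coinb2 and the merkle branch would need limits chosen for the pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size job slot; sizes include the terminating NUL.
struct JobSlot {
    char id[17];
    char prevhash[65];
    bool cleanJobs;
};

// True if src (plus its NUL terminator) fits in the array dst.
template <std::size_t N>
bool fitsIn(const char (&)[N], const char* src) {
    return std::strlen(src) < N;
}

// Overwrites the slot in place. All fields are validated first, so a
// malformed notify is rejected and leaves the current job untouched.
bool assignJob(JobSlot& slot, const char* id, const char* prevhash, bool clean) {
    if (!fitsIn(slot.id, id) || !fitsIn(slot.prevhash, prevhash)) {
        return false;
    }
    std::strcpy(slot.id, id);            // safe: lengths checked above
    std::strcpy(slot.prevhash, prevhash);
    slot.cleanJobs = clean;
    return true;
}
```

If free heap stays flat across clean_jobs = false notifications with a slot like this, allocation can be ruled out as the cause.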
We can close this issue. |
I've been running v14 for 30 minutes now.
Not a single error!
21:49:24.498 > [I] Miner: [0] > [247b563] > 0x00188f0f - diff 0.000043553264
21:49:24.498 > [I] Network: >>> {"id":232,"method":"mining.submit","params":["wmikrut.ex1","247b563","745449","65fe4245","00188f0f"]}
21:49:24.514 >
21:49:24.639 > [I] Network: <<< [mining.submit] {"error": null, "id": 232, "result": true}
21:49:24.639 > [I] Network: Share accepted
21:49:24.639 > [I] Current: Hash accepted: 200
That's a lot of work for the 8266!
Can't wait to let it run overnight.
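The accepted submit in the log above follows the Stratum v1 mining.submit layout: worker name, job id, extranonce2, ntime and nonce, with the last three as hex strings. Below is a minimal sketch that rebuilds that exact request line. submitRequest is a hypothetical name, and the inputs are assumed to need no JSON escaping.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a mining.submit request (without the trailing
// '\n' that terminates it on the wire).
// params = [worker, job_id, extranonce2, ntime, nonce]
std::string submitRequest(int id, const std::string& worker,
                          const std::string& jobId,
                          const std::string& extranonce2,
                          const std::string& ntime,
                          const std::string& nonce) {
    return "{\"id\":" + std::to_string(id) +
           ",\"method\":\"mining.submit\",\"params\":[\"" + worker + "\",\"" +
           jobId + "\",\"" + extranonce2 + "\",\"" + ntime + "\",\"" +
           nonce + "\"]}";
}
```

The pool's reply carries the same id (232 here), which is how the miner matches the "result": true response to this share.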
On Fri, Mar 22, 2024 at 4:41 PM Matteo Crippa ***@***.***> wrote:
I prepared a branch, v/0.0.14, with some changes based on the feedback you shared, but haven't had time to test it yet.
In general, any single-core device will skip enqueuing any extra job and will just replace the current one when clean_jobs is true.
I also added logic to try to force a session reset if the connection is failing.
|
Can we close this? |
Not yet, please.
On Sat, Mar 30, 2024, 22:16, Matteo Crippa ***@***.***> wrote:
|
I tried to flash an ESP8266MOD with LeafMiner 0.0.13 and it was not stable. It mined for several minutes and then stopped mining. I flashed the firmware this way:
|
Hello. After rebooting, it connects and starts mining, and I can see shares being submitted to vkbit.com. However, if I simply ping its IP address, it responds only intermittently, so I'm not sure whether it is disconnecting from and reconnecting to the network. |
@ffrediani that's correct: the web GUI is available only at the first boot or after you erase the flash. Keeping a web UI up and running would kill the already limited performance of an ESP8266. If you want, there's a branch, v0.0.16, with some patches, but it's still a work in progress. |
@matteocrippa thanks. Even without pinging it or trying to access the web interface, I can see that it disconnects from and reconnects to the Wi-Fi access point every once in a while. Is this expected? If so, and it happens while it is solving a block, it may not be able to submit a share. I can try 0.0.16, but I couldn't find a .bin on GitHub and can't compile it myself. Is there a particular URL I can download it from? |
Yes, it seems that after mining for a while the ESP8266 disconnects or crashes and doesn't recover until it is rebooted. Do you think 0.0.16 has any chance of avoiding this behavior? |
0.0.16 has some patches in that direction, but I haven't had time to fully test it. It's certainly more stable than 0.0.15 on the ESP8266. For now you can only install it by building it manually, until it's released via CI. |
Well, at the moment there is not much to lose, as 0.0.15 on the ESP8266 (NodeMCU) crashes every once in a while and only a reboot brings it back to life. If anyone can make the binary available somewhere, I can upgrade it here. |
@matteocrippa is there any way to prioritize the processes when required (the one that runs the web interface over the mining one)? It could reduce the hashrate as needed to reply to the web client for the short period while it is being accessed, so that the web interface could still be shown for troubleshooting and upgrades. |
There's no plan for such a feature for solo mining; it's already something of a success that an ESP8266 can (barely) mine. |
It is not a feature. A feature is something extra or new, whereas prioritizing processes, if it is possible on the ESP8266, is something that already exists. I managed to compile 0.0.16 in PlatformIO and upload it, and I am watching the serial monitor. If I find anything before the crash, I will share it. |
0.0.16 is available as binary too if you need. |
Hi. I have been using 0.0.16 for about 2 days, left running while connected to the PC with a serial monitor. It had issues about twice, but it recovered by rebooting and starting over, so the mining process didn't die and stop altogether. How could being connected to the serial monitor change its behavior so that it didn't die once? Is there any particular debugging or data collection I can do to find out more? |
Maybe it's the serial console: the TX buffer isn't read by the PC and overflows? What happens when the serial buffer is full? |
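The ESP8266 has a single core, and the Arduino core schedules the sketch cooperatively. In practice, therefore, "prioritizing" the web server would mean the mining loop voluntarily giving up the CPU at regular points. Below is a minimal host-side sketch of that idea. The names (mineCooperatively, hashNonce, service) are hypothetical, and LeafMiner is not necessarily structured this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical sketch of cooperative time-slicing on a single core.
// hashNonce stands in for the real double-SHA-256 check and returns true
// when a nonce meets the target; service() stands in for whatever else
// must run (Wi-Fi, a web handler, serial output).
struct SliceResult {
    bool found;
    uint32_t nonce;
    uint32_t serviceCalls;
};

SliceResult mineCooperatively(uint32_t start, uint32_t count, uint32_t batch,
                              const std::function<bool(uint32_t)>& hashNonce,
                              const std::function<void()>& service) {
    assert(batch > 0);
    SliceResult r{false, 0, 0};
    uint32_t done = 0;
    while (done < count) {
        const uint32_t n = std::min<uint32_t>(batch, count - done);
        for (uint32_t i = 0; i < n; ++i) {
            const uint32_t nonce = start + done + i;
            if (hashNonce(nonce)) {
                r.found = true;
                r.nonce = nonce;
                return r;
            }
        }
        done += n;
        service();  // hand the CPU to other work between batches
        ++r.serviceCalls;
    }
    return r;
}
```

On the device, the service step would typically mean returning from loop() or calling yield(). The batch size is the knob: smaller batches keep a web handler responsive but cost some hashrate in overhead.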
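Whether the serial link can stall the miner isn't settled in this thread. One way to take it out of the equation is to make logging non-blocking: queue log text in a fixed-size ring and, from the main loop, write only as many bytes as the UART can accept right now. On the ESP8266 Arduino core, Serial.availableForWrite() reports that number. Lines that don't fit are dropped and counted rather than waited on. LogRing and its capacity are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-blocking log buffer: a fixed-capacity byte ring.
// push() never waits; a line that does not fit is dropped and counted.
template <std::size_t Capacity>
class LogRing {
    static_assert(Capacity > 0, "capacity must be positive");

public:
    bool push(const std::string& line) {
        if (line.size() > Capacity - size_) {
            ++dropped_;
            return false;
        }
        for (char c : line) {
            buf_[(head_ + size_) % Capacity] = c;
            ++size_;
        }
        return true;
    }

    // Moves up to maxBytes into out (e.g. the free space the UART reports).
    std::size_t drain(std::string& out, std::size_t maxBytes) {
        std::size_t n = 0;
        while (n < maxBytes && size_ > 0) {
            out.push_back(buf_[head_]);
            head_ = (head_ + 1) % Capacity;
            --size_;
            ++n;
        }
        return n;
    }

    std::size_t dropped() const { return dropped_; }
    std::size_t size() const { return size_; }

private:
    char buf_[Capacity] = {};
    std::size_t head_ = 0;
    std::size_t size_ = 0;
    std::size_t dropped_ = 0;
};
```

A non-zero dropped() count would also show whether logging volume was ever higher than the port could carry.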
Hi. I connected to the serial console again to watch it and left it running for a while. After a seemingly random amount of time it simply stopped mining, without throwing any errors. Below are the last few messages I saw in the serial console. Let me know if there is anything else I can collect or do to get more information that identifies the problem.
[D] Network: <<< len: 37
[D] Network: <<< len: 37
[D] Current: Job: 6211ad is cleaned and replaced with 6242b1
To add to that, I managed to catch a different error on the serial console. This time, however, it recovered by rebooting and starting over:
[I] Job: Random value: 1964889275
[I] Network: <<< [mining.notify] {"id":null,"method":"mining.notify","params":["624857","ce1c8a9a80ef4ed23a53a2fb74117cd532dbc4ed000164fa0000000000000000","02000000010000000000000000000000000000000000000000000000000000000000000000ffffffff1503afd60c564b4249542e434f4d","ffffffff038fdd17000000000016001427e04fe106a94b902e0f35ce87e0531b0ff47f3395398d1200000000160014e55dea92fb04b1d68e692a016f29989966d538eb0000000000000000266a24aa21a9ed181cd5416c2d26099b6873b2514ef6ae3d0d9e0438f2a4bec2d813958d05fdaa00000000",["eb85016e05b7149817abc553d40fb18f3f7a1fc547a07ef9f015253d53ca36b5","ab96f6c6e5a76f22b77a1e58dc4a28aa381fa727b21f60ac02b29b2a8a50e079","ec5bc63184fef71824c363b89ab350e63cfd0b43e60b0fa0b09947a96323044b","fba15909307f25533d3f723f4fa8463bd30c21684e9da0ed014a8badcae97744"],"20000000","170331db","662fdb00",true]}
[D] Current: Job: 62483c is cleaned and replaced with 624857
3ffffa40: 0000003c 00000000 40104919 000000ff
3ffffd10: 3f
0x3fff20b8, len 40, room 8
~ld
[I] Main: Compiled: Apr 27 2024 17:14:19
[I] Configuration: pool_password: x
[I] AutoUpdate: Connecting to MySSID...
[D] AutoUpdate: Remote Version: 0.0.16 |
I managed to catch another error, which also recovered with a reboot:
[I] Current: Hash accepted: 821
[D] Network: <<< len: 37
[D] Current: Job: 663d05 is cleaned and replaced with 6662c1
User exception (panic/abort/assert)
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
Panic core_esp8266_main.cpp:191 __yield
[I] Configuration: lcd_on_start: off |
This seems mostly related to the job change. By the way, it's good that it reboots and doesn't stay stuck.
@matteocrippa yes, most of the time it reboots, which is fine for keeping it going. However, in one of the reports above it simply stops, shows no errors, and only recovers after a power-off/power-on. I just got back to the PC and again it had stopped without any errors. The last message was "Job: Random value: 62289229", and that's it. Strangely, it happens mostly (but not only) when it is connected directly to a power supply, where it's not possible to get serial output. |
Hello @matteocrippa, I have been running 0.0.16 on the serial console, and most of the time it just gets stuck with no error shown: it stops mining without a reboot or anything else, and only a power-off/power-on recovers it. Anyway, I have already flashed 0.0.17 to the ESP8266 and I'm testing it. Will provide feedback soon. |
I'm working locally on 0.0.17 and will push the changes to the remote if there is any relevant improvement. Sadly, nothing so far. |
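The panic above is raised in __yield() in the ESP8266 Arduino core, which aborts when yield() is called from a context that is not allowed to yield, such as a network or timer callback. That this is what happens during the job switch is an assumption and is not confirmed in this thread. If it is, one mitigation is to keep the receive path trivial: it only records the pending job, and the mining loop swaps it in between batches. Below is a single-threaded host sketch with hypothetical names (PendingJob, JobMailbox). It uses C++17 std::optional.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical pending-job record parsed from mining.notify.
struct PendingJob {
    std::string id;
    bool cleanJobs;
};

// The network path only calls post(); it never yields or switches work.
// The mining loop calls take() between batches and switches jobs there.
// Host version is single-threaded; whether the device needs extra locking
// depends on where post() is called from.
class JobMailbox {
public:
    // clean_jobs=true always replaces what is pending;
    // clean_jobs=false is dropped if a job is already waiting.
    bool post(PendingJob job) {
        if (pending_ && !job.cleanJobs) {
            return false;
        }
        pending_ = std::move(job);
        return true;
    }

    // Returns the pending job (if any) and empties the mailbox.
    std::optional<PendingJob> take() {
        std::optional<PendingJob> out = std::move(pending_);
        pending_.reset();  // a moved-from optional is still engaged
        return out;
    }

private:
    std::optional<PendingJob> pending_;
};
```

This keeps the "skip when busy, replace on clean_jobs" policy while moving all job switching into a context where yielding is allowed.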
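When the board stops mining without any error and only a power cycle helps, an application-level watchdog is one hedged option: record the last time something made progress and restart when that timestamp is too old. It only helps if the main loop is still running (for example, stuck waiting on a dead connection); a hard hang is what the hardware watchdog is for. ProgressWatchdog, the timeout and the notion of "progress" are assumptions. On the device, now would come from millis() and the restart would be ESP.restart().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical progress watchdog. Call feed() whenever the miner makes
// progress (new job, accepted share, finished batch) and check expired()
// from the main loop. Unsigned 32-bit subtraction keeps the elapsed time
// correct when a millisecond counter such as millis() wraps around.
class ProgressWatchdog {
public:
    ProgressWatchdog(uint32_t timeoutMs, uint32_t now)
        : timeoutMs_(timeoutMs), last_(now) {}

    void feed(uint32_t now) { last_ = now; }

    bool expired(uint32_t now) const {
        return static_cast<uint32_t>(now - last_) > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t last_;
};
```

Logging the reason before restarting would also show, when the serial console is attached, whether the stall happens in the network path or in the mining loop.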
I managed to capture the crash
|
Hi Matteo. Regarding the crash you shared: yes, I got similar ones, with the "CUT HERE" section for the exception decoder. With these, it reboots and keeps going. |
After a short period of time the ESP8266 crashes and restarts.
Mine on a pool and wait a few minutes to reproduce the issue.
I see that shares are being submitted and can verify that those shares are making it through my local stratum proxy to the pool.
Here are 10 minutes of runtime from the PlatformIO serial monitor with the exception decoder.
I am running a local stratum proxy and I definitely see valid shares being submitted to the pool.
Could this be an issue with parsing a network response?
Each time it fails, it seems to be in network_listen() at src\network/network.cpp:416.
I have experienced this with v11 and v12.
esp8266.log
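On the question of parsing network responses: Stratum v1 sends one JSON object per line, but a single TCP read may deliver half a line or several lines at once. A long mining.notify, like the one captured on the serial console earlier in the thread, makes a split more likely. Below is a minimal sketch of a bounded line splitter that assembles complete lines before any JSON parsing. LineSplitter and the length limit are hypothetical and are not taken from network.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch (not network.cpp): buffers received bytes until '\n'
// and caps the line length, so an oversized or malformed message is dropped
// instead of growing the heap without bound.
class LineSplitter {
public:
    explicit LineSplitter(std::size_t maxLine) : maxLine_(maxLine) {
        buf_.reserve(maxLine_);  // one allocation up front
    }

    // Feeds one chunk as received; returns every line it completed.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        std::vector<std::string> lines;
        for (std::size_t i = 0; i < len; ++i) {
            const char c = data[i];
            if (c == '\n') {
                if (tooLong_) {
                    ++dropped_;          // discard the oversized line
                } else if (!buf_.empty()) {
                    lines.push_back(buf_);
                }
                buf_.clear();
                tooLong_ = false;
            } else if (c == '\r') {
                continue;                // tolerate CRLF line endings
            } else if (buf_.size() < maxLine_) {
                buf_.push_back(c);
            } else {
                tooLong_ = true;         // skip until the next '\n'
            }
        }
        return lines;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::size_t maxLine_;
    std::string buf_;
    bool tooLong_ = false;
    std::size_t dropped_ = 0;
};
```

Only complete lines reach the JSON parser, so a message split across two reads is never parsed half-finished.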