"Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname" #6

Open
AkshayGujare opened this issue Nov 22, 2024 · 3 comments


AkshayGujare commented Nov 22, 2024

Product Being Used
Device: Onion Omega2 APSoC DRAM: 128 MB
Carrier Board: Onion Omega2+
Firmware Version
Firmware:
UBoot Version: Unable to retrieve as the device gets stuck during boot.
The details below were captured from the same device while it was still working (see the detailed log Onion_Omega_2+_Normal_Working_log.txt):
Image Name: MIPS OpenWrt Linux-4.14.81
Linux version 4.14.81 (root@c34dd7d43957) (gcc version 7.3.0 (OpenWrt GCC 7.3.0 r0+7498-730f2344ba)) #0 Fri Nov 25 04:10:54 2022

Description of the Issue and Context
The issue occurs after running custom scripts (main_app.py with main program and check_system_status.sh for auto reboot) through /etc/rc.local.
The device gets stuck in bootloader mode after running for 8–9 days with the following conditions:
An orange LED remains constantly ON.
The device hostname is not available, and the usual Onion Omega2 UBoot Version log does not appear during boot.
Internet connection was working previously, but the device is currently not connected to the internet.
Holding reset for 10–15 seconds resets the device only momentarily; the orange LED then returns to a steady ON state.
Log after reset


[Onion Omega ASCII-art banner: W H A T W I L L Y O U I N V E N T ?]
Board: Onion Omega2 APSoC DRAM: 128 MB
relocate_code Pointer at: 87f60000

21-11-2024 14:32:08.90 [RX] - flash manufacture id: c2, device id 20 19
find flash: MX25L25635E
*** Warning - bad CRC, using default environment

Expected Behavior
The device should boot normally, display the hostname, and show the Onion Omega2 UBoot Version during startup, along with the other logs mentioned in the attached file Onion_Omega2+_NotWorking.txt.

Observed Behavior
The device fails to boot correctly, getting stuck in bootloader mode with a steady orange LED and no hostname availability.
Only a partial boot log is available after resetting.

Steps to Reproduce the Issue
Configure the device to execute main_app.py and check_system_status.sh via /etc/rc.local.
Let the device run for up to 8–9 days while performing normal operations.

Reboot the device.
Observe the device failing to boot properly and entering bootloader mode with a constant orange LED.

What's Been Attempted to Resolve the Issue (Without Success)
Reset the device for 10–15 seconds.
Result: The orange LED remains steady, and the boot fails as before.
Verified the /etc/rc.local configuration for any errors.
Attempted to re-establish an internet connection without success.

Help Requested
Steps to recover the device from its current stuck boot-mode state.
Assistance in diagnosing the issue with the bootloader or firmware.
Guidance to prevent this issue in the future.
#############################################

Normal Working Log (Before Failure)
Onion_Omega2+_Normal_Working_log.txt

Not Working Log (after Failure)
Onion_Omega2+_NotWorking.txt

Also attaching log images of working and not working device below

Working_Device

Not_Working_Device

@greenbreakfast
Contributor

Hi @AkshayGujare, sorry to hear about the issue you're experiencing and thanks for the detailed bug report.

It's very odd that the bootloader doesn't seem to start. Let's focus on 1) trying to recover the "stuck" devices, and 2) avoiding this issue recurring.

Recovering a "stuck" device

Are you able to get to the bootloader command line on a "stuck" device? This involves powering on the Omega while asserting the reset at the same time.
See this docs section for more details.

Avoiding this issue

Can you describe what the main_app.py and check_system_status.sh programs do from a high level?
Do they do a lot of writes to disk? How frequent are the writes?

A few other users have reported file system instability when running programs that frequently write to the flash storage. To get around this, we recommend moving any file writes to the /tmp directory (which is actually in RAM, not on the flash). Data that should persist indefinitely should then be copied from /tmp to the flash filesystem (anything else on /) at some longer interval, perhaps daily. Cron is a solid tool for this copy job.
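As a rough illustration of that pattern, here is a minimal Python sketch: frequent appends go to a file under /tmp, and a separate, infrequent step copies the file to flash in one pass. The function names and the example paths are hypothetical, not taken from main_app.py, and the staging-file-plus-rename step is just one reasonable way to avoid leaving a half-written copy on flash if power is lost mid-copy.

```python
import os
import shutil
import tempfile


def append_record(line, tmp_path):
    """Append one record to a file on the RAM-backed /tmp filesystem."""
    with open(tmp_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def persist_copy(tmp_path, persistent_path):
    """Copy the /tmp file to flash in a single pass.

    The data is first written to a staging file in the destination
    directory and then renamed over the old copy, so the persistent
    file is replaced in one step rather than rewritten in place.
    """
    directory = os.path.dirname(persistent_path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, staging = tempfile.mkstemp(dir=directory, prefix=".staging-")
    os.close(fd)
    try:
        shutil.copyfile(tmp_path, staging)
        os.replace(staging, persistent_path)
    except BaseException:
        # Clean up the staging file if the copy or rename failed.
        if os.path.exists(staging):
            os.remove(staging)
        raise
```

In this sketch the logging loop would call something like append_record(row, "/tmp/modbus_log.csv") for every packet, while a small script run from cron once a day would call persist_copy("/tmp/modbus_log.csv", "/root/modbus_log.csv"). Both paths are placeholders. Anything still in /tmp is lost on reboot, so the copy interval sets how much data can be lost.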

Please let me know on both points!

@AkshayGujare
Author

Hi @greenbreakfast, thanks for responding to my issue.

I am unable to access the bootloader command line on the "stuck" device. After a reset, I can only retrieve the log from the port, as mentioned below.

I followed the steps to activate Web Recovery Mode, but the U-Boot version is not displayed. After resetting for 10-15 seconds and repeatedly pressing the Enter/Space bar keys to exit boot mode, I still see the same log as shown in the previously mentioned image ("################## ONION OMEGA2+ ISSUE - Not Working Device") and do not see the menu option.


[Onion Omega ASCII-art banner: W H A T W I L L Y O U I N V E N T ?]

Board: Onion Omega2 APSoC DRAM: 128 MB
relocate_code Pointer at: 87f60000
flash manufacture id: c2, device id 20 19
find flash: MX25L25635E
*** Warning - bad CRC, using default environment

Every 2 seconds, a data packet is captured from Modbus to the Onion Omega2+ for a total of 32 packets.
Each 105-byte packet is appended sequentially to an Excel sheet; all 32 packets are written within 3 seconds.
The JSON data string, approximately 3147 bytes in size, is published via MQTT every 5 minutes.
Additionally, print statements from main_app.py are used to display logs of the time intervals between the 5-minute periods.
The main_app.py script is responsible for retrieving data from Modbus, writing it to an Excel file, erasing the data once processed, reading the Excel parameters, and connecting to MQTT to publish the data.
The check_system_status.sh script ensures system reliability by automatically rebooting the system if main_app.py becomes corrupted or data logging gets stuck.
Both scripts have been thoroughly tested.
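To put rough numbers on how often the flash is written, the figures above can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes one batch of 32 packets is appended per 5-minute publish cycle, which the description does not state explicitly, and it counts only the raw 105-byte payloads. If the Excel file is saved after every append, the actual bytes written could be much higher, depending on the library used.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def appends_per_day(packets_per_cycle=32, cycle_seconds=300):
    """Number of append operations per day.

    Assumption (not confirmed in the thread): one batch of
    `packets_per_cycle` packets is written per `cycle_seconds` cycle.
    """
    cycles_per_day = SECONDS_PER_DAY // cycle_seconds
    return packets_per_cycle * cycles_per_day


def payload_bytes_per_day(packet_bytes=105, packets_per_cycle=32, cycle_seconds=300):
    """Raw payload bytes appended per day, ignoring file-format overhead."""
    return packet_bytes * appends_per_day(packets_per_cycle, cycle_seconds)


if __name__ == "__main__":
    per_day = appends_per_day()
    print(per_day)                  # 9216 appends per day
    print(per_day * 9)              # 82944 appends over a 9-day run
    print(payload_bytes_per_day())  # 967680 payload bytes per day
```

Under that assumption, the device performs on the order of nine thousand separate writes to /root/ per day, even though the payload itself is under 1 MB per day.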

In my case, I am frequently writing data to /root/. I will modify this and test the setup by writing the frequent data to /tmp/ instead and will provide an update on the results.
Meanwhile, could you please guide me on how to recover the "stuck" device?

@greenbreakfast
Contributor

Hi @AkshayGujare
Yes, please modify the programs to use the /tmp directory for frequent data writes. That should resolve the issue.

Regarding recovering stuck devices:

How many stuck devices do you currently have?
I'd like to first confirm if the bootloader can be accessed on a stuck device.

On a working device, the bootloader menu can be enabled by powering on the device while holding the FW_RST pin (GPIO38) active. This reset pin is active-high, and this is the pin used by the reset button on the Omega2 Docks.

Keep in mind pressing the enter or space keys will not activate the bootloader menu.

Please try this first on a working device, and then try it on a "stuck" device. Report back how it goes.
