Fixes for Realtek RTL8720CF #46

Open · wants to merge 3 commits into master

@prokoma (Contributor) commented Dec 1, 2024

Hi, I successfully liberated a TP-Link Tapo P100 Mini smart plug and I want to share the modifications I had to make to get it working with LibreTiny and ESPHome.

First, I had issues with the initial flashing: I used the UF2 image, but the device couldn't boot after flashing. With ltchiptool uf2 dump, I found out that the partition table dump didn't start at zero, and I traced this back to collect_data, which skipped the block starting at offset zero.
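As a hypothetical illustration of this class of bug (ltchiptool's actual collect_data is not shown in this thread), the sketch below merges UF2 data blocks into contiguous regions. One common way for a region at offset 0 to vanish in Python is a truthiness check such as `if start:`, because `0` is falsy; comparing against `None` keeps it. The function name and block format are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def collect_regions(blocks: List[Tuple[int, bytes]]) -> Dict[int, bytes]:
    """Merge (address, data) blocks into contiguous regions keyed by start address.

    Hypothetical helper for illustration; blocks are assumed sorted by address.
    """
    regions: Dict[int, bytes] = {}
    start: Optional[int] = None
    buf = bytearray()
    for addr, data in blocks:
        # "start is not None" rather than "if start:", so a region at 0 is not lost
        if start is not None and addr == start + len(buf):
            buf += data  # contiguous with the current region: extend it
            continue
        if start is not None:
            regions[start] = bytes(buf)  # close the previous region
        start, buf = addr, bytearray(data)
    if start is not None:
        regions[start] = bytes(buf)
    return regions
```

For example, `collect_regions([(0, b"\x00" * 4), (4, b"\x01" * 4)])` yields a single 8-byte region at offset 0.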

But it still didn't boot, so I tried passing only one image to uf2ota (i.e. using the SINGLE scheme instead of DUAL) and was confused that the flasher didn't write anything. It turned out that the flasher skips parts with the SINGLE scheme. Fixing that also fixed LibreTiny's default config, which uses DUAL for OTA1 but SINGLE for the partition table and bootloader; both of those were being skipped, so the chip didn't boot.

Thank you for all your hard work on this project!

@kuba2k2 (Member) commented Dec 1, 2024

Hi,

The SINGLE scheme is used for devices which have a separate "download" partition for OTA - for now it's only BK7231. Realtek chips, as well as ESP chips for example, use a DUAL scheme, which means that two application images are available at any point.
I'm not sure why the bootloader image wasn't flashed before - maybe this was an error on my side, and I never noticed it because I didn't have boot up problems.
To make ltchiptool flash the bootloader/partition table image, you could try passing two copies of the same image file while generating the image, much like the app image itself, but with the same partition name for both.

I wonder why you encountered issues with booting. It's possible that TP-LINK used their own firmware keys (or, more likely, different partition offsets).


I see you're also working on implementing OTA on this chip. My main problem was having to generate the app image hash on-device. The issue here is that the app header has a counter value. In order to flash an OTA update, the 2nd partition needs to have the counter value higher than the 1st partition. But this same value is also checksummed using the app hash key.

When LibreTiny generates an app image, it would need to set this value to a number higher than what's already on the device. But it can't possibly know the number - even if we used the build timestamp, it wouldn't allow for downgrading.
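The constraint can be sketched as follows, assuming the header hash is a keyed MAC over the counter. HMAC-SHA256 and HASH_KEY below are stand-ins chosen for illustration, not the actual AmebaZ2 header format, and the real hash covers more of the header than just the counter. Because the counter is part of the hashed data, it cannot be bumped after the build without the key, which is why the hash would have to be generated on-device.

```python
import hashlib
import hmac
from dataclasses import dataclass

HASH_KEY = b"example-hash-key"  # hypothetical; real devices use their own keys


@dataclass
class AppHeader:
    counter: int  # OTA2 must carry a higher value than OTA1 to be preferred
    digest: bytes


def sign(counter: int, key: bytes = HASH_KEY) -> bytes:
    # Stand-in hash: HMAC-SHA256 over the 4-byte little-endian counter only
    return hmac.new(key, counter.to_bytes(4, "little"), hashlib.sha256).digest()


def is_valid(h: AppHeader, key: bytes = HASH_KEY) -> bool:
    # Constant-time comparison of the stored digest against a fresh one
    return hmac.compare_digest(h.digest, sign(h.counter, key))
```

Patching `counter` in a built image without recomputing `digest` makes `is_valid` fail, so whoever picks the counter value must also hold the key.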

I had two solutions in mind:

  1. When flashing OTA updates, clear the header of the other app partition. That works, but doesn't allow for rollback. Normally this isn't a problem, but the current API is made this way. I would even remove this feature.
  2. Keeping the counter and hash values as 0xFF. The device could then flash the app, calculate the two values, and write them to flash without even erasing (I think that's possible). It's way harder and probably pointless, anyway.
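Option 2 relies on how typical SPI NOR flash behaves: an erased byte reads 0xFF, and programming can only clear bits (1 to 0), while setting a bit back to 1 needs a sector erase. The model below is a simplification written for illustration, not a driver API. It shows why counter and hash fields left as 0xFF in the image could later be programmed in place without erasing.

```python
ERASED = 0xFF  # value of an erased NOR flash byte


def program(flash: bytearray, offset: int, data: bytes) -> None:
    """Model NOR programming: each bit may go 1 -> 0, never 0 -> 1, without an erase."""
    for i, value in enumerate(data):
        current = flash[offset + i]
        if (current & value) != value:
            raise ValueError(
                f"byte {offset + i:#x} needs an erase: {current:#04x} -> {value:#04x}"
            )
        flash[offset + i] = value  # equals current & value, i.e. only bits were cleared
```

The catch is that this works once per erase: a field that has already been programmed can only be changed further by clearing more bits.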

Another problem (and the reason why OTA was never implemented on AmebaZ2) is partition tables. BK7231 has a fixed app offset and mostly fixed download offset - so it's not an issue.

It is, however, an issue on Realtek chips. AmebaZ has a fixed OTA1 offset, but OTA2 offset is stored on the flash. AmebaZ2 has both offsets changeable.

Manufacturers often change the offsets. They can also vary between chip types and are also different for devboards. You won't believe how many times I found myself flashing an OTA update without getting the desired results, just to find out that it was written to an offset that wasn't even used by the bootloader.

An obvious solution would be to flash the partition table along with the app (in UF2 images). It is implemented this way on AmebaZ2. There is normally no problem with that, but what if someone wants to change the partition table in an OTA update (i.e. over the air rather than via ltchiptool)? Two things can happen:

  • the new offset can be larger than the current app's end address; this should be fine
  • the new offset can be smaller and overlap the currently running app. This is a problem. LibreTiny (uf2ota) has measures to avoid that (it marks some regions as "protected"), but they make such an update impossible.
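The "protected region" check boils down to an interval-overlap test. This is a hypothetical sketch (the names and offsets are made up, and uf2ota's actual implementation may differ) of refusing a write that would touch the currently running app.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), end exclusive


def overlaps(a: Range, b: Range) -> bool:
    # Half-open intervals overlap iff each starts before the other ends
    return a[0] < b[1] and b[0] < a[1]


def check_write(target: Range, protected: List[Range]) -> None:
    """Raise if the target range intersects any protected region."""
    for region in protected:
        if overlaps(target, region):
            raise PermissionError(
                f"write {target[0]:#x}-{target[1]:#x} hits protected "
                f"{region[0]:#x}-{region[1]:#x}"
            )
```

This mirrors the two cases above: a new offset past the running app's end passes, while one that overlaps it is rejected, which is exactly what makes a shrinking-offset update impossible.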

Another problem is flash wear out. There is no point in flashing the partition table and the bootloader every single time an OTA update is applied. Some people like to keep their devices up to date, which might mean flashing new versions even several times a month. Yeah, sure, flash chips can (theoretically) withstand like 100,000 erase cycles, but we have seen some cheaper devices fail prematurely because of this issue.

I tried to find some middle ground here, but I got caught up in some other things and never got the opportunity to get back to this. I'm open to any ideas.


Fixing OTA issues is planned for the LibreTiny v2.0 refactor, along with many more things. See issue libretiny#(insert issue number here). I have described most of the issues I would like to address there. I wanted to make partition offsets (and keys) configurable separately from the board type; currently, offsets differ depending on whether you select bw15, generic rtl8720cf, or wbr3.
However, I can't estimate when I will be able to work on this update.

@prokoma (Contributor, Author) commented Dec 2, 2024

Hi,

thank you for your quick reply!

The LibreTiny AmbZ2 builder generates a UF2 in which the bootloader and partition table are stored in the SINGLE scheme. So either that has to be fixed, or ltchiptool has to be updated to also flash the SINGLE parts; otherwise the partition table and bootloader aren't updated, which can result in the chip not booting. I agree with you that we shouldn't flash the bootloader and partition table every time, to reduce wear on the flash.

Maybe something like this would be better then:

```python
UF2OTA=[
    # same OTA images for flasher and device
    f"{image_firmware_is},{image_firmware_is}=device:ota1,ota2;flasher:ota1,ota2",
    # having flashed an application image, update the bootloader and partition table (incl. keys)
    f"{image_bootloader},{image_bootloader}=flasher:boot,boot",
    f"{image_part_table},{image_part_table}=flasher:part_table,part_table",
    # clearing headers of the "other" OTA image (hence the indexes are swapped)
    f"{image_ota_clear},{image_ota_clear}=device:ota2,ota1;flasher:ota2,ota1",
],
```
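Each entry pairs two images with, per target (device OTA or flasher), the two partitions listed for it. A hypothetical parser for that syntax (not ltchiptool's own code; the file names in the usage are placeholders for the f-string variables) makes it visible that entries listing only `flasher:` are never written by a device OTA update.

```python
from typing import Dict, Tuple


def parse_uf2ota_entry(entry: str) -> Tuple[Tuple[str, str], Dict[str, Tuple[str, str]]]:
    """Split "img1,img2=target:part1,part2;target:part1,part2" into its pieces."""
    images, _, target_specs = entry.partition("=")
    img1, img2 = images.split(",")
    targets: Dict[str, Tuple[str, str]] = {}
    for spec in target_specs.split(";"):
        name, _, parts = spec.partition(":")
        part1, part2 = parts.split(",")
        targets[name] = (part1, part2)  # partitions kept in the order listed
    return (img1, img2), targets
```

For the bootloader entry, the parsed targets contain only `"flasher"`, so nothing in a device-side OTA would touch that partition.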

The OTA implementation I hacked together works, but it is not ready for release.

I chose the simpler approach (which is also used in the SDK), where I corrupt the header of the other image. Instead of erasing it entirely, it is possible to reversibly manipulate the signature, which allows for rollback. The header doesn't contain any constant magic bytes that could be used to detect its presence, so I use the first 4 bytes of the known public key to implement lt_ota_is_valid. I may refactor this to load the key from the JSON at compile time.
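A sketch of both tricks under made-up assumptions: the header is taken to contain the public key at KEY_OFFSET and a 32-byte signature at SIG_OFFSET, and KEY_PREFIX stands in for the first 4 bytes of the known key. Inverting the signature bytes is just one possible reversible manipulation (the comment doesn't say which one is used); applying it twice restores the header, which is what makes rollback possible.

```python
KEY_PREFIX = bytes.fromhex("0a0b0c0d")  # hypothetical first 4 bytes of the public key
KEY_OFFSET = 0x00  # hypothetical offset of the public key in the header
SIG_OFFSET = 0x20  # hypothetical offset of the signature
SIG_LEN = 32


def has_header(img: bytes) -> bool:
    """No magic bytes exist, so detect a header by the known key prefix instead."""
    return img[KEY_OFFSET:KEY_OFFSET + len(KEY_PREFIX)] == KEY_PREFIX


def toggle_signature(img: bytearray) -> None:
    """Invert the signature bytes; calling this again undoes it (rollback)."""
    for i in range(SIG_OFFSET, SIG_OFFSET + SIG_LEN):
        img[i] ^= 0xFF
```

On real flash, inverting bytes involves 0-to-1 transitions, so each toggle means erasing and rewriting that sector.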

The bigger issue is that the signature doesn't cover the whole image, just the header of the first sub-image. That means that if the OTA is interrupted, the signature can be valid while the rest is corrupted. If this happens to the first image (which has priority when the serial numbers are equal), the device is bricked and needs UART flashing. The solution is to write the signature after the rest of the image has been written, but I don't see an obvious way to do that in the current code, because writing to the flash is handled by your uf2ota library, which calls the FAL directly without any callbacks.
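The desired ordering can be sketched independently of uf2ota: write every block except those starting inside the header, then write the header last, so an interruption leaves an image without a valid signature rather than a valid signature over incomplete data. This is a hypothetical model: `write_image` is not an existing API, erasing is omitted, and `fail_after` simulates a power loss. Wiring it into uf2ota would need the kind of hook described as missing above.

```python
from typing import Iterable, List, Optional, Tuple

Block = Tuple[int, bytes]  # (offset within the partition, data)


def write_image(flash: bytearray, blocks: Iterable[Block], header_size: int,
                fail_after: Optional[int] = None) -> None:
    """Write blocks in order, but defer any block starting inside the header to the end.

    fail_after simulates losing power after that many block writes (erase omitted).
    """
    deferred: List[Block] = []
    written = 0

    def put(offset: int, data: bytes) -> None:
        nonlocal written
        if fail_after is not None and written >= fail_after:
            raise RuntimeError("power lost")
        flash[offset:offset + len(data)] = data
        written += 1

    for offset, data in blocks:
        if offset < header_size:
            deferred.append((offset, data))  # header (and its signature) goes last
        else:
            put(offset, data)
    for offset, data in deferred:
        put(offset, data)
```

If power is lost before the final write, the header area is still erased, so the bootloader has nothing valid to pick in that partition.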

Also, it may not be safe to overwrite the beginning of the currently booted image (I don't know whether there are any interrupt handlers etc. there), so we shouldn't include the empty header in the UF2, but handle it in the OTA code with proper locking and other measures. So I propose the final config to be:

```python
UF2OTA=[
    # same OTA images for flasher and device
    f"{image_firmware_is},{image_firmware_is}=device:ota1,ota2;flasher:ota1,ota2",
    # having flashed an application image, update the bootloader and partition table (incl. keys)
    f"{image_bootloader},{image_bootloader}=flasher:boot,boot",
    f"{image_part_table},{image_part_table}=flasher:part_table,part_table",
    # clearing headers of the "other" OTA image (hence the indexes are swapped)
    f"{image_ota_clear},{image_ota_clear}=flasher:ota2,ota1",
],
```

Regarding OTA modification of the partition table, I personally don't need this feature. For initial flashing I can just connect via UART and later use OTA for upgrades. Can you give me some examples of when this would be useful? Maybe only if you wanted a single UF2 to be flashable to multiple different boards already running LibreTiny, but I don't know how common that is.

I am using ESPHome, where you build a firmware for the specific device every time you change a setting, so there is no concept of universal firmwares. I am also not an embedded or IoT developer and just wanted to try this as a fun weekend project to learn more about this stuff.

@prokoma (Contributor, Author) commented Dec 2, 2024

I was thinking about it some more and had another idea. We can pre-corrupt the OTA image during build (only for the device scheme), flash the corrupted image and then repair it in the lt_ota_switch function. This would require minimal changes to the code and would be safe against interruptions during the OTA upgrade. I'll probably implement this in my fork.
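The pre-corruption idea can be sketched as a pair of transforms. The XOR mask, the header length and the function names are hypothetical: the build produces a clean image for the flasher plus a copy with a scrambled header for device OTA, and the switch step applies the same XOR to restore it. Until that repair runs, an interrupted (or never activated) upload is simply not bootable.

```python
from typing import Tuple

XOR_MASK = 0xA5  # hypothetical mask; any non-zero value changes every byte


def xor_bytes(data: bytes, mask: int = XOR_MASK) -> bytes:
    """XOR each byte with the mask; applying it twice returns the input."""
    return bytes(b ^ mask for b in data)


def build_variants(image: bytes, header_len: int) -> Tuple[bytes, bytes]:
    """Return (flasher_image, device_image); only the device copy is scrambled."""
    return image, xor_bytes(image[:header_len]) + image[header_len:]


def repair_header(device_image: bytes, header_len: int) -> bytes:
    """The on-device repair a switch function would perform before rebooting."""
    return xor_bytes(device_image[:header_len]) + device_image[header_len:]
```

The trade-off is that the header block is written twice: once scrambled during the upload and once repaired at switch time.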

@kuba2k2 (Member) commented Dec 3, 2024

Hi,

  1. Re "Maybe something like this would be better then:": yes, that's what I meant. I think this was just a mistake: I forgot to set the DUAL scheme for the bootloader and partition table.

  2. Erasing the current image could indeed cause problems if the operation also erased the code being executed in place (XIP). You're right that the empty header shouldn't be included for the device scheme.
  3. The ability to change partition tables would be useful if the current app partition is too small (e.g. you're adding more components over the air). But you're probably right that it is a niche use case: UART flashing is necessary for now anyway, so if the flashing tool updates the partition table, that will suffice.
    • About "universal" firmwares: yes, it would be good to support such a feature. It would be used, for example, by ESPHome-Kickstart (a GPIO scanning tool that lets you "adopt" a device in HA). But I still don't know how to support partition table switching in OTA updates, so it's better to leave that feature out for now.
    • There is one way I can think of: UF2 has a built-in partition table. It used to rely only on the device's own table, which wrote firmware to the previously used offsets. That worked on BK7231 (always the same offsets) and would have worked on AmebaZ2 (OTA1/2 code is the same), but didn't work on AmebaZ (firmware is position-dependent and can't simply be reflashed to a different location). Perhaps this idea should be reintroduced? Or perhaps another time, in the future... That solution still failed if the existing partition was too small to fit the new app.
  4. About pre-corrupting the image: I meant something similar by setting the keys to 0xFF, which would make the bootloader ignore this application unless it's manually fixed in the OTA activation code.
    Your method with a simple XOR is much easier. However, it still requires erasing/writing the first flash block twice.
    • Since UF2 is written sequentially, how about leaving the 1st block until the end of the process? That would fix the case where OTA is interrupted. However, it would make "activation" pretty pointless, because writing the UF2 itself makes the new image "valid". So flashing a UF2 without activation (and rebooting) would boot either of the two apps, depending on which one has the higher number.
  5. Maybe a proper solution would be to bring back erasing the "other app" header? Of course, we would need to make sure that it can be erased safely. The last step of UF2 writing would then 1) write the new app's header, 2) erase the old app's header.
    • This would properly support the bootloader's choice of which app to boot, but wouldn't support the "activation" mechanism. Maybe it should just be dropped? After all, what's the point of having to activate the app separately anyway?
    • Currently, BK7231's activation is a no-op, unless you want to "revert" the OTA update (that's pointless too, as it only allows to cancel the pending firmware switch before rebooting).
    • On AmebaZ, it is actually necessary to flip a bit in the system partition; maybe that should be part of the UF2 process (the last step, called by uf2ota through some kind of callback), i.e. without requiring that method to be called separately.
    • Then "activation" would be optional (per family). It would only be implemented on AmebaZ.
    • AmebaZ2 would simply write the app headers (write new, erase old) as the last part of the UF2.
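The proposed last step for AmebaZ2 can be modeled to show why it sidesteps the counter problem raised earlier. In this hypothetical model a header is reduced to its counter (None meaning erased), and the boot rule (highest valid counter wins, OTA1 wins ties) follows the behavior described in this thread. Once the old header is erased, the new image boots whatever its counter is; if power is lost between the two steps, both headers are intact and one of the two complete apps still boots.

```python
from typing import Dict, Optional

Headers = Dict[str, Optional[int]]  # partition name -> counter; None = erased header


def boot_choice(headers: Headers) -> Optional[str]:
    """Pick the valid header with the highest counter; ota1 wins ties."""
    best: Optional[str] = None
    for name in ("ota1", "ota2"):
        counter = headers.get(name)
        if counter is None:
            continue  # erased header: not bootable
        if best is None or counter > headers[best]:
            best = name  # strict ">" keeps ota1 on a tie
    return best


def finish_update(headers: Headers, new: str, old: str, new_counter: int,
                  interrupted: bool = False) -> None:
    """Proposed final UF2 step: 1) write the new app's header, 2) erase the old one."""
    headers[new] = new_counter
    if interrupted:
        return  # simulate power loss between the two steps
    headers[old] = None
```

With the old header erased, the build no longer needs to know the on-device counter, so a fixed value such as 0 would do.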

@prokoma (Contributor, Author) commented Dec 16, 2024

Hi, to move this forward a little bit, I've separated the first commit to #48. That one is clearly a bug with an obvious fix, whereas this OTA stuff needs more discussion.

You've proposed a few times in this thread to remove the rollback functionality:

> When flashing OTA updates, clear the header of the other app partition. That works, but doesn't allow for rollback. Normally this isn't a problem, but the current API is made this way. I would even remove this feature.

> Currently, BK7231's activation is a no-op, unless you want to "revert" the OTA update (that's pointless too, as it only allows to cancel the pending firmware switch before rebooting).

I agree with you, because it complicates things and its usefulness is limited. It would only help if the firmware could detect that it is bootlooping and switch to the other image, but this would require firmware developers to implement it, and the firmware would have to work well enough to reach this check in the first place. It can also be mostly replaced by a safe mode built into the firmware, like the one in ESPHome.

Then the next decision is whether we want to put the cleverness into the UF2 file or into the OTA functions inside the firmware. From your comments I see that you lean more toward the UF2 direction, and I don't disagree. My current solution is suboptimal, because it doubles the size of the UF2 file and also erases the first image block twice, but it was easy for me to implement.

I don't know Python very well, and moving the logic into UF2 would require some larger changes to uf2tool; if I implemented that, it would probably look out of place in your clean and well-thought-out code (at least to my eyes 🙃).

@prokoma (Contributor, Author) commented Dec 17, 2024

After merging libretiny-eu/libretiny#307 it should be possible to flash RTL8720CF using the UF2 file produced by libretiny. ESPHome should also work, albeit without OTA.
