Simplify foc fpu math #372
Conversation
mcpwm_foc.c
Outdated
@@ -4088,8 +4116,8 @@ static void svm(float alpha, float beta, uint32_t PWMHalfPeriod,
 	// sector 1-2
 	case 1: {
 		// Vector on-times
-		uint32_t t1 = (alpha - ONE_BY_SQRT3 * beta) * PWMHalfPeriod;
-		uint32_t t2 = (TWO_BY_SQRT3 * beta) * PWMHalfPeriod;
+		uint32_t t1 = (alpha - beta13) * PWMHalfPeriod;
This is outside the scope of this PR, but I notice that all these values are truncated. I'm not sure if it's worth the performance penalty to add 0.5f to the end to get a rounding-ish behavior.
Tested, motor runs fine. RFM.
Outside the scope of this PR, but before minimizing the number of divisions by making the code harder to read, I think we should tackle the issue with redundant loads and eliminate the
Do you feel that is the result? I put the original, pedantic math in comments so that the reader could quickly understand that's where the simplified forms came from.
I hadn't realized
I'm afraid
Yes, the fact that we need to write the original math in a comment instead of the code makes the code harder to follow and debug.
I guess I see it the other way around. I find it a lot easier to understand

This gets especially painful when working with long equations and with variable names which are somewhat obscure, such as a hypothetical

Just my personal theory on things, but I see the comment as the intent, and the contract that the code must deliver. The code's responsibility is to deliver the promise the comment makes. If the two disagree, then the code as a whole is buggy. This does mean that the comments need to be faithful and clear.
Interesting. When I saw the proposal I was really happy because it made the math much clearer for me. Dividing by (2/3 * Vbus) is not clear to me at all, even after having seen it many times in the code.
This is exactly how I feel about this PR. In general I don't like higher CPU performance if it comes at a cost of code readability. The way I see it, if the code is more readable it invites people in and opens the door to improving the algorithms in the future to achieve higher motor performance. If I can get 5% more torque it would be huge, but 5% better CPU usage? I don't care.
In general I would need a lot more data to feel confident about a merge. Maybe we should have a template so we can show the hardware that has been used for testing, the motor characteristics, and the voltage/current/torque/rpm/power of the test, such that we have an idea of the coverage of such testing. I watched my dyno run fine for a while, until it blew when it entered FW, for example. It's very common that "testing" is just spinning a motor at no load; that's not enough when you're changing core stuff. Same about removing volatile declarations... gotta be real careful to make that call and get some good testing coverage.
My suggestion would be as follows:

/* Voltage normalization such that 1 corresponds to the absolute maximum
   voltage the inverter can make, including overmodulation. Without
   overmodulation the max value is sqrt(3)/2 = 0.866 */

This actually explains where the factor comes from, instead of a factor 2/3 just appearing. It also tells you what it means for the normalized amplitudes.
I think that's what @ElwinBoots and I are saying. As we're going through the code implementing https://vesc-project.com/node/3257, we find that it is in general difficult to read and understand some of the basic mathematical underpinnings. We're not so much concerned with the nitty-gritty of the implementation as the why and what it all means. Stuff like

It was a similar process with https://vesc-project.com/node/3225. Catching those bugs made a night and day difference for my 20kW motor, but it was harder to do because the math is not well documented. We have to read individual lines of code to understand the gist of what's going on, and that's not as efficient as we'd like it to be.

Do you guys see a happy compromise? We both find this new form a lot more readable, but I certainly appreciate why you'd prefer the old one.
I don't think available CPU can help with max current/torque, as unfortunately that's a limit of the FET switching losses. However, max eRPM is going to be driven in part by how quickly the code can run. Eliminating 50 operations per loop and helping the system avoid cache misses gives a little more stability, so we can push the motors to a little higher eRPM.

This isn't completely academic. In our FRF testing we are seeing the controller crash and reboot when trying to use the trace sample functionality when running at 30kHz. This is indicative of loops consuming too many resources, possibly running past their margins and triggering a watchdog.

While the impact is minor, I wouldn't say it's trivial. We have a saying in aviation: "take care of the grams and the kilograms take care of themselves". I feel like similar approaches in embedded dev pay off.
I agree! That's why I limited the PR to only reshuffling the math. We can validate that by eye and with light testing, because we'll know instantly if, for instance, the

Short of having some kind of hardware-in-the-loop testing as part of the PR, I don't really know how to accomplish this. There are so many variables and edge cases which pop up even after the extensive testing done before a release, such as https://vesc-project.com/comment/8408#comment-8408.

When I was doing autopilots, our goal was an automated build which pushed the code to hardware, and then the hardware ran a script which tested a variety of configurations. Maybe something similar to this could happen in VESC?
Comments like that would be super super appreciated, even links pointing to literature.
Yes, I know it's unsafe to run the foc loop at >60kHz; my builds usually limit the max switching frequency to avoid RTOS crashes, as it might end up in a bricked board, but that kHz limit changes with each version as we add more tasks to the code. In my experience 60kHz is more than enough, but I don't use ultra low inductance motors... It's not too hard to ask the RTOS how it is doing (% of idling time) to estimate at runtime what the max available loop frequency is.
I hear you there, I had the same goal for vesc. We built a motor model into the vesc firmware to be able to run a virtual motor:

My end goal was to set up a continuous integration that could test each commit by running it on real hardware (stm32f4 discovery), making it drive the built-in virtual motor. I couldn't push this to the end as I got swamped with dayjob work... I'm not even sure if the virtual motor is working in the latest build.
In general readability should always go before CPU efficiency, unless it makes a significant difference. In this case the maximum interrupt rate is 30 kHz, so 50 cycles per interrupt makes a difference of 50 / (168000000 / 30000) * 100 = 0.9%. I wouldn't sacrifice any readability for that.

That being said, there are some things I like about this PR that make it more readable, such as the voltageNormalize (which I would write as voltage_normalize to be consistent with (most of) the rest of the code). The new version of the mod signs is also as readable as before, but faster, so that is nice.

A small comment: in general it is better to write out floating point constants with decimals, like 2.0 instead of 2. In most cases that does not matter and they will be used as floats, but I think it is good to make that extra clear.

The SVM function is a bit less clear now and I don't know if it even makes a meaningful performance difference, as the values are used very few times through the nested if and switch statements. beta23 is also only used in 4 out of 6 cases, where it is used only once, so if anything the pre-calculation makes it slower. I also don't like the auto-generated comment in the beginning of the function, as it only takes space unless the parameters actually are described there.
About the motor state being volatile: The main reason for that is that it is read from threads in several places, and if you, for example, run a loop that cannot alter the state and wait for something to change it, the optimization can mess things up by assuming that the state will be unchanged in that loop, as it is unaware of interrupts. The detect functions do this a lot, so I think it is better to keep it volatile to not run into surprises.
Another thing: the terminal command last_adc_duration can be used while the motor is running to measure how long the interrupt takes to finish. That gives some indication about the effectiveness of the optimizations. Duty cycle mode usually runs the slowest, and using encoders also slows things down, so testing under those conditions is useful.
I haven't done any A/B testing yet, so any performance gains are purely hypothetical for now. 1% for marginal changes in the code sounds almost too good to be true. Thanks for the tip on
This sounds like a stylistic question. It seems opinions differ on whether the original or the resulting code is more readable, but since none of us are programming in assembly, it comes down to preferences for how abstract we like the code. Obviously the project maintainer's preferences are the most important, so we'll cater to those.

Would you like to highlight the sections you agree with and the ones you think need improvement? We can merge in only the uncontroversial changes, and then I can test the speed changes for the "less readable" optimizations. If it's still something worthwhile to consider, I'll open a new PR.
Usually yes, but in this case it is intentionally an integer, for the case where some optimizations/compilers will use a faster bit shift of the exponent in IEEE 754 format. By using the exact
I suspect that the compiler will resolve this for us. It's usually smart enough not to compile in the number if it doesn't need it, but not smart enough to realize that multiple math expressions evaluate to identical results.

As an aside, how important is it that SVM() pathway execution time be identical? I suspect it isn't here, since we can guarantee that in a complete 6-step cycle we will wind up executing all pathways, but I'd love your opinion on it.
Yup, agreed. Part of the reason this is a draft is because I've been trying to figure out what the SVM is doing. I'd like to precisely document it, but it's not completely clear yet what the basic assumptions are. For instance, why are all the calls to it passing negatives? And why the complex branching execution path vs. the clearer

If you can fill in the blanks, perhaps by giving the paper where that SVM methodology is described, I will happily get that added.
What currently has a higher prio than the ADC interrupt? We could look at those individual sections and see if they affect anything which is calculated during the ADC interrupt service handler.
(force-pushed from 92cd71e to 257eb85)
/* mod_alpha_sign = 2/3*sign(ia) - 1/3*sign(ib) - 1/3*sign(ic) */
/* mod_beta_sign = 1/sqrt(3)*sign(ib) - 1/sqrt(3)*sign(ic) */
const float mod_alpha_filter_sgn = (1.0 / 3.0) * (2 * SIGN(ia_filter) - SIGN(ib_filter) - SIGN(ic_filter));
const float mod_beta_filter_sgn = ONE_BY_SQRT3 * (SIGN(ib_filter) - SIGN(ic_filter));
Identical to the above, only here we have [-2/sqrt(3), 0, 2/sqrt(3)] as the three possible outputs.
Nice explanation Benjamin! Funny coincidence that just today I looked into why those inputs to the svm function are negative. My conclusion was that the function calculates rising edge moments (so a high value would be a low voltage). From what I saw, the setting of the center aligned PWM expects the duty cycle instead (high value is high voltage). So you inverted this by inverting the inputs. I guess this matches what you describe above. This does make the sector output incorrect, but I don't think that is being used. Maybe we can clean svm up in some future release.
If that's the case, then all volatiles should be safe to remove, as they cannot be modified outside the ADC handler.
Yes... and no. FPUs are magic, but they can have a cost which isn't always obvious.
While it's worth knowing, none of this is a super compelling reason to use
Thanks. I will update the PR. @vedderb, I'm still a little unclear on how I should change this PR to move it forward. When you have a sec, could you flag which changes you'd like me to keep and which changes you'd like to defer/decline?
`/((2.0/3.0) * v_bus)` is the same as `* (1.5 / v_bus)`. Abstracting this out saves at least one division per loop, and in some cases as many as three. Each STM32F4 FPU division takes 14 cycles.
(force-pushed from 257eb85 to 2cb9b3f)
/* mod_alpha_sign = 2/3*sign(ia) - 1/3*sign(ib) - 1/3*sign(ic) */
/* mod_beta_sign = 1/sqrt(3)*sign(ib) - 1/sqrt(3)*sign(ic) */
const float mod_alpha_filter_sgn = (1.0 / 3.0) * (2.0 * SIGN(ia_filter) - SIGN(ib_filter) - SIGN(ic_filter));
This line is basically taking a truth table as input and outputs values in the set [-4/3, -2/3, 2/3, 4/3]. This could be moved into a function and turned into a simple if() hierarchy.
mcpwm_foc.c
Outdated
@@ -4127,8 +4154,8 @@ static void svm(float alpha, float beta, uint32_t PWMHalfPeriod,
 	// sector 3-4
 	case 3: {
 		// Vector on-times
-		uint32_t t3 = (TWO_BY_SQRT3 * beta) * PWMHalfPeriod;
-		uint32_t t4 = (-alpha - ONE_BY_SQRT3 * beta) * PWMHalfPeriod;
+		uint32_t t3 = 2.0 * beta13 * PWMHalfPeriod;
This could also be left as TWO_BY_SQRT3. The reason to use 2.0 * beta13 is to let the compiler know we don't need beta anymore. This isn't a very strong reason, and if it's found that there's a readability preference for the old format then let's go back to that.
No. There are loops that assume the state will change as they go. For example, there are a few places that wait for the motor to stop. If the compiler thinks that nothing in that loop can affect the state, it might optimize away those checks and loop forever, as the loop does nothing that can affect e.g. the rpm in the state. In most cases that is not a problem, but I have seen it happen, especially when changing compiler version or compiler settings.

I think everything except the changes in the SVM function is fine. The reason is that, as it is now, it is relatively easy to look at the svm function, draw some figures and see exactly what the steps are and where the calculations come from. It was quite easy for me to draw the figures above because of that, and I think that has some value.
(force-pushed from 0f47264 to 80adea0)
Okay, they've been taken out.
Of interest: https://gist.github.com/kubark42/333bfce67e309474fe1801eb2356f7be

Compared to using

However, I think it is a good example of how compilers sometimes do unexpected and screwy things when floats are involved.
This simplifies the math, eliminating FPU operations, and in particular FPU divisions, on the critical ADC interrupt handler pathway. There should be no effect aside from more compact, more streamlined assembly code.
Before
After
Example output
Note there is one less division and one less operation; there's no access of the program counter; and 1.5 is able to be loaded by a vmov.f32 instruction (which likely helps the ART accelerator optimize predictive caching at run-time). It's not much of a difference, but it's free for the taking.

This also isn't the complete story, since voltageNormalize is reused further down in the code, saving at least two additional divisions. Divisions take 14 cycles per operation, so this alone is a net savings of >50 cycles per ADC interrupt.

Before

After