Fixed width formatting #133
Comments
@jagerber48 Yes, going below 7 characters is very hard, and probably not worth supporting. Just the bare minimum (sign, one digit, decimal, 'e', exponent sign, 2-digit exponent) uses 7 characters. So there won't be very many general-use cases for fewer than 7 characters, and it's probably fine to just assert that this is not supported. I think it would even be OK to require 8 or more characters.
@newville thanks for checking this out and responding. Yes... I think if the target length is >= 8 then you can always represent the number (assuming -99<=exp<=99) in scientific notation with length-6 sig figs. The trick is just figuring out exactly when you need to switch between fixed point and scientific notation (which I think has been worked out in All of that said, I think there are use cases around (I recall reading them on e.g. stack overflow) where people want numbers formatted to a fixed width, and I think, naively, they're thinking about fixed point number. That is they have numbers like 123, 0.23, 0.001, and 1 and they want to see them listed as
Here the target length is 6, the minimum length that can be used to display all numbers in fixed point format. For this, if the target length was e.g. 4 then 0.001 would overflow no matter how it is represented. I'll consider which of the following should be in
I haven't explicitly tested this, but looking at the link you provided, it looks like
So the two formatting algorithms make different decisions, at least on this case, about which to switch from fixed_point to scientific notation. Maybe changing I plan to also include a function that accepts a list of input numbers (or number/uncertainty pairs) to format as well as format settings (like Neither of these functions provide a guarantee that the target length will be hit, but I will include an example either in the docstring or readthedocs or both showing that if you selected An almost exact
However, I don't think I'll provide something like this in |
Right, I think it is ambiguous at the 1e-4 level, giving the same precision with both formats. Anyway, thanks! |
Two issues I've been pondering while trying to think about a function that formats a collection/sequence of numbers or number/uncertainty pairs to have the same lengths.
Issue 1: Value formatting with thousandths separators
When thousandths separators are used it may not be possible to hit any sufficiently large width just by adding sig figs.
This could be compensated by left padding with zeros or spaces
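As a rough illustration of Issue 1, the sketch below uses the built-in format mini-language plus a hypothetical helper, `group_fraction`, that inserts a separator after every third fractional digit. It is not `sciform` code, and the helper name and the choice of a space as separator are assumptions made for the example. It shows that some widths are skipped as digits are added.

```python
def group_fraction(text, sep=" "):
    """Insert `sep` between groups of three digits after the decimal point."""
    whole, point, frac = text.partition(".")
    groups = [frac[i:i + 3] for i in range(0, len(frac), 3)]
    return whole + point + sep.join(groups)


value = 0.123456789
widths = []
for decimals in range(1, 8):
    # Format with a growing number of decimal places, then group them.
    text = group_fraction(f"{value:.{decimals}f}")
    widths.append(len(text))
    print(f"{decimals} decimals: {text!r} -> {len(text)} characters")
```

Going from 3 to 4 decimals adds both a digit and a separator, so the width jumps from 5 to 7 and a width of 6 can only be reached with padding, e.g. `text.rjust(6)`.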
Issue 2: Value/Uncertainty formatting even/odd issue
For value/uncertainty pairs the problem is worse
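A rough illustration of the even/odd problem, again using the built-in format mini-language as a stand-in for `sciform` (the helper `val_unc` and the `" ± "` separator are assumptions for this sketch). The value and the uncertainty share the same number of decimal places, so each extra digit lengthens the pair by two characters and the parity of the total length never changes.

```python
def val_unc(value, uncertainty, decimals):
    """Format a value/uncertainty pair with a shared number of decimal places."""
    return f"{value:.{decimals}f} ± {uncertainty:.{decimals}f}"


lengths_a = []
lengths_b = []
for decimals in range(1, 5):
    a = val_unc(12.3456, 0.4321, decimals)
    b = val_unc(123.456, 0.5432, decimals)
    lengths_a.append(len(a))
    lengths_b.append(len(b))
    print(f"{a!r} ({len(a)})   {b!r} ({len(b)})")
```

With at least one decimal place, the first pair always has an even length and the second always has an odd length, so no choice of digits alone makes them equal.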
Adding a sig fig always adds characters equally to both the value and the uncertainty, so it is impossible for certain value/uncertainty pairs to be made to have the same length. This problem could be addressed by left padding one of the value or the uncertainty, but that may not be possible given the behavior of
Another possibly even worse case is when doing value/uncertainty formatting with thousandths separators. I'm not sure what algorithm can be used to address both of these issues. I think in the worst case two value/uncertainty pairs might be off by 3 or 4 with respect to their target or neighboring numbers. Possibly you can always do better than 3 with a clever algorithm, but it's not obvious to me.
My knee-jerk reaction to this feature request was to not be too excited about it. Scientific number formatting should be driven by uncertainty orders of magnitude and significant figures. It should not be driven by string length. The Python built-in format specification mini-language has functionality for padding strings to a fixed width, but I'm tempted to say that a number formatting package shouldn't provide functionality for controlling string widths, and that modifying number formatting based on string width is an anti-pattern based on the philosophy above. Of course, the
Curious for your thoughts @newville. For the specific use case of output result formatting for
Hm, do you have a citation for that?
Hm, that's confusing. Is the point of If that is not the goal, then is it fair to say that I think I may have misunderstood the goal of the project. Good luck and all the best! |
@newville thanks for the feedback. My opinions in my last post possibly came off too strong. I'd say I'm on the fence about these issues. Thinking as a scientist, the character width of a number should never matter in the least. But, as you say, if In any case, so far If I could ask your opinion once more: In simple cases it is possible to format individual numbers to a specified character width (like
I don't think you've misunderstood. |
Said another way: If there was an obvious algorithm to format value and value/uncertainty pairs to specified lengths for all |
Thinking as a scientist, communication of non-trivial numerical results is the most important function we perform. When discussing communication of scientific results, I cannot think of any attribute X where I would ever agree with a statement like "X should never matter in the least". If the width of a string should "never matter in the least", then should the height of the string also not ever matter? How about a "numerical base"? How would you feel about "x = 14401.5 +/- 0x632"? When I look at the aim of this project, the formatting of numerical values to communicate scientific results seems to be the point. I must have misunderstood. Good luck and all the best. |
@newville my apologies. I don't want to waste your time. Leaving the philosophical discussion about string/number formatting aside, If you are able to spare any more time to this discussion I would really appreciate your feedback specifically on how
Perhaps the conclusion is that If you care to answer I have a question about your There is a sciform example using tabulate to accomplish a similar goal. Here I transcribe the table but with modifications to make the strings more "ragged".
Here |
It seems to me that you wrote pages of philosophical discussion. I do not know what you would like to set aside. I cannot wrap my head around the idea that someone would want to write and discuss software for formatting numbers to strings and then say But you said rather a lot of "should not" statements. The width of the output string is one of the fundamental characteristics that a formatting process must address. I'll have to leave it there. Best of luck with the project. You might do well to find people to work with. |
@newville Sorry if I've been unclear and come on overly strong in some of my statements. My main interest in this issue has been trying to do some brainstorming about the specific edge cases that have been bothering me and blocking me from including general character width formatting in
Thanks for the well wishes. @newville, I would love to find people to work with. I'm actively seeking feedback and collaboration for Going forward I'll continue an investigation into if/how it makes sense to include character-width formatting in Also, this arose out of the |
Collecting my thoughts on by-overall-width formatting for the moment. The first comment will have a lengthy history of how
Originally Free from the constraint of making an extension to the built-in FSML, I could reconsider from the ground up what exactly a value or value/uncertainty formatter should do. The main shortcoming of the built-in FSML is that you can't independently control the exponent mode (e.g. fixed point or scientific notation) and the number of significant figures displayed. If you use The next part of the built-in FSML I considered was the width control. After developing the Looking at the built-in FSML was strange through this lens. The width specifier in the built-in FSML is not parametrized in terms of decimal places at all, but it is rather trying to control the overall character-width of the string, including incidental characters like the sign symbols, decimal symbols, thousands/thousandths separators, exponent symbols, etc. At this point I elected to include the
So the idea was that However, importantly, one thing is lost by having I have so-far concluded that adding padding symbols between the sign symbol and the most significant digit to reach a certain overall length is out-of-scope for I am curious to learn more about use cases for controlling overall string width by controlling the number of decimal places occupied to the left and right of the most significant digit. Why do left/right/center string padding not suffice? The If instead users had a collection of numbers of similar order of magnitude that should be compared that they wanted to display in columns I would still argue that formatting by decimal places like !!!!! *By conservative extension I mean the |
After that long comment, more thoughts on the specific issue at hand. The previous comment was a long history of why I haven't included by-total-length formatting SO FAR. That doesn't mean I'm 100% opposed to including it ever. As evidence see #139. If it was very easy to include total-length formatting as a helper/wrapper around
How should these cases be addressed? Ideas:
On my local fork of See jagerber48/lmfit-py#1. See especially the diff on Old fit results:
New fit results:
If If |
Here is how the table might look using
Or with the variable table in fixed_point mode:
There seems to be generic demand for functions that format numbers but work very hard to preserve the overall width of the string. `sciform` could support a function like this. See e.g. lmfit gformat and an example usage. This was brought up here.
NOTICE (2024/02/05): This issue is on hold until some discussion can happen around how the edge cases described below can be handled. It would also be helpful (but not strictly necessary) to see example use cases that aren't covered by applying normal Python string left/right/center padding to the string (actually `FormattedNumber`) that is output by `sciform`.
It's not totally clear to me exactly how this could be done, and I'm worried there are some edge cases that can't perfectly satisfy the length requirements. For example, suppose we want to display -112345678 (10 characters) in 7 characters
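The 7-character case can be reproduced with the built-in format mini-language (used here only as a stand-in; this is not `sciform` output). The last line builds the shifted-exponent form by hand, purely for illustration.

```python
value = -112345678

one_sig_fig = f"{value:.0e}"   # no decimal symbol needed
two_sig_figs = f"{value:.1e}"  # the second digit also brings a decimal symbol

print(one_sig_fig, len(one_sig_fig))
print(two_sig_figs, len(two_sig_figs))

# Hand-built alternative: shift the exponent so that two sig figs fit
# without a decimal symbol.
shifted = f"{round(value / 10**7)}e+07"
print(shifted, len(shifted))
```

The three strings have 6, 8 and 7 characters respectively, so only the shifted form hits the target width exactly.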
We definitely can't do it in fixed point notation because there are 9 digits already before the decimal symbol, so we must use scientific notation. We see that with one sig fig we require 6 characters, but with 2 sig figs we require 8 characters because adding the second sig fig introduced a decimal symbol. The resolution is that we have to carefully choose the exponent and number of sig figs so that there is no decimal symbol, to get around this "characters jumping by 2" problem. However, it might be that this problem is unique to `length=7` when the exponent requires 2 digits to express (i.e. `-99 <= exp <= +99`).
What then should be the exact procedure for getting a fixed length representation? I suggest the following procedure, which would be relatively simple with `sciform`.
1. Start from a `Formatter` (with any settings whatsoever).
2. Choose an ordered list of `exp_mode` to try, e.g. `["fixed_point", "engineering", "scientific"]`.
3. Format the number with the `Formatter` but overriding (1) the `exp_mode` with the entries in the list above, (2) the `round_mode` to be `sig_fig` and (3) `ndigits` to vary from 1 up to `length`.
4. If the target length is hit with more than one `exp_mode`, then the ordered list is used to select a winner. If no mode combination can hit the target length, the mode combination which hits the shortest length with the most sig figs, and with precedence given to the `exp_mode` list, is selected.
This algorithm will probably miss the `-1.1e+08` -> `-11e+07` optimization described above, but I think this is an ok price to pay (one sig fig) for staying strictly in standard exponent modes like "scientific" (with `exp_val=AutoExpVal` unless otherwise specified) or "engineering".
The performance of the above algorithm is very poor; it's just brute force. But I think it would be good to start with a brute force approach and to get some tests/examples written. I think some simple optimizations could be done, like guessing close to the right number of sig figs using the magnitude of the number and a static analysis of the character "overhead" for each mode, then guessing +/- 2 sig figs or so to cover strange edge cases like the one described above. Note also that non-trivial upper/lower separators will introduce even more edge cases, making the need for a guess-and-check algorithm even more pressing.