Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should spectrum.average_photon_energy document and test how nan is handled? #2423

Closed
RDaxini opened this issue Mar 27, 2025 · 6 comments · Fixed by #2426
Closed

Should spectrum.average_photon_energy document and test how nan is handled? #2423

RDaxini opened this issue Mar 27, 2025 · 6 comments · Fixed by #2426

Comments

@RDaxini
Copy link
Contributor

RDaxini commented Mar 27, 2025

Is your feature request related to a problem? Please describe.
pvlib.spectrum.average_photon_energy docs indicate np.nan is returned for all-zero input
How nan values are handled when supplied as an input is neither documented nor tested.

Three examples:

import numpy as np
import pandas as pd
from pvlib.spectrum import average_photon_energy as calcape

test_all_nan = pd.DataFrame({100: [np.nan], 500: [np.nan], 1000: [np.nan]})
test_one_nan = pd.DataFrame({100: [1e-19], 500: [1e2], 1000: [np.nan]})
test_zero_nan = pd.DataFrame({100: [1e-19], 500: [1e2], 1000: [10]})

ape1 = calcape(test_all_nan)
ape2 = calcape(test_one_nan)
ape3 = calcape(test_zero_nan)

print(ape1, ape2, ape3)
0   NaN
dtype: float64 0   NaN
dtype: float64 0    2.3557
dtype: float64

In the case >= 1 nan value, ape=nan.
I was just handling a large spectral irradiance dataset and it took me a minute to realise that the reason for the resultant APE values being far fewer than the input spectral distributions was that there were occasional nan values dotted around for individual wavelengths at certain times throughout the year.

Describe the solution you'd like
Should the docs explain this behaviour and should the tests cover this? Is the current behaviour correct?

All-nan irradiance implies non-existence (different from zero) so it seems to make sense to return a nan value. In the case of nan at a single or even a few wavelengths seems more like a data quality issue. I lean towards leaving the behaviour as is, but documenting it. If documented, the user can then decide how to handle the nan-value cases, e.g. using a filling method. Seems out of scope for the function to offer this functionality. If documented, I think tests are required(?)

@RDaxini RDaxini changed the title should pvlib.spectrum.average_photon_energy document and/or test how np.nan is handled? should spectrum.average_photon_energy document and test how nan is handled? Mar 27, 2025
@RDaxini RDaxini changed the title should spectrum.average_photon_energy document and test how nan is handled? Should spectrum.average_photon_energy document and test how nan is handled? Mar 27, 2025
@AdamRJensen
Copy link
Member

I very much agree with your assessment! Document and test!

@cwhanse
Copy link
Member

cwhanse commented Mar 28, 2025

I agree with document behavior and confirm with testing, but in this case the nan is likely propagated to the output in this integration step. Handling nan for the user seems to be a workable option. scipy.integrate.trapezoid doesn't handle nan natively, so we could instead drop the nan's and the integration should work. If we do this, we should document the implied filling of nan's with linear interpolation.

@RDaxini
Copy link
Contributor Author

RDaxini commented Mar 28, 2025

Handling nan for the user seems to be a workable option.

@cwhanse are you suggesting adding a nan_policy argument, e.g. as used in this scipy.stats function?
I am wondering whether it is scope for this parameter calculation function to deal with data quality issues from the user. I don't think automatically nan values without any user input or control would be a good idea, but I'd be open to an optional nan_policy argument similar to scipy's policy linked above.

Past discussion/context: I'm reminded of #2300, it was concluded the user could/should easily handle the question of wavelength range outside of the function.

@cwhanse
Copy link
Member

cwhanse commented Mar 29, 2025

@RDaxini I was thinking of something like nan_policy = 'raise', 'omit' but I admit 'omit' seems to be a slippery slope - at what point does the integration, although a number, become meaningless? My motivation was to just fill irradiance at perhaps 10s of wavelengths, so that calculation doesn't stop if a few wavelengths were missing.

But I'm also OK if the function expects the user to handle NaNs upstream in their code.

@RDaxini
Copy link
Contributor Author

RDaxini commented Mar 31, 2025

I admit 'omit' seems to be a slippery slope - at what point does the integration, although a number, become meaningless?

I was wondering about the same thing...

I'll open a PR soon with updated docs and tests, and the expectation that the user handles nan values themselves upstream. We can always revisit the nan-handling aspect of the PR.

@RDaxini RDaxini added this to the v0.12.1 milestone Mar 31, 2025
@echedey-ls
Copy link
Contributor

But I'm also OK if the function expects the user to handle NaNs upstream in their code.

+1 for simplicity

There would be many things to take into account otherwise: interpolation kind, check that certain conditions met some criteria for the exceptions...

As someone recently said in his own issue, shit in -> shit out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants