
Fixed GP fitness again #301

Merged
merged 1 commit into from
Jan 9, 2025

Conversation

jmafoster1
Contributor

The fitness function would still, very rarely, give a fitness that was a complex number. I think this was because some of the predicted values from candidate functions would evaluate sqrt(-1), which would then give complex distances, and thus a complex fitness. I now return float("inf") if the dtype of the predicted values is not the same as the dtype of the expected values, which should hopefully fix the problem in a robust way.
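A minimal sketch of the guard described above, assuming a fitness function shaped like the one discussed later in this thread (the names `fitness`, `y_expected`, and `y_estimates` are illustrative, not the project's actual identifiers):

```python
import numpy as np
import pandas as pd

def fitness(y_expected: pd.Series, y_estimates: np.ndarray) -> float:
    """Normalised RMSE, guarded against complex-valued predictions."""
    # If a candidate expression evaluated e.g. sqrt(-1), y_estimates will be
    # complex128 rather than float64; such candidates get the worst fitness.
    if np.asarray(y_estimates).dtype != y_expected.dtype:
        return float("inf")
    sqerrors = (y_expected - y_estimates) ** 2
    return float(
        np.sqrt(sqerrors.sum() / len(y_expected))
        / (y_expected.max() - y_expected.min())
    )
```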


github-actions bot commented Jan 9, 2025

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ PYTHON black 36 0 0.98s
✅ PYTHON pylint 36 0 5.87s

See detailed report in MegaLinter reports



codecov bot commented Jan 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.02%. Comparing base (f22af96) to head (1cd053e).
Report is 3 commits behind head on main.

@@           Coverage Diff           @@
##             main     #301   +/-   ##
=======================================
  Coverage   97.02%   97.02%           
=======================================
  Files          29       29           
  Lines        1849     1849           
=======================================
  Hits         1794     1794           
  Misses         55       55           
Files with missing lines Coverage Δ
...stimation/genetic_programming_regression_fitter.py 98.86% <100.00%> (ø)

Continue to review full report in Codecov by Sentry.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7bf3f4c...1cd053e. Read the comment docs.

@jmafoster1 jmafoster1 marked this pull request as ready for review January 9, 2025 09:28
@jmafoster1 jmafoster1 requested a review from f-allian January 9, 2025 09:28
@f-allian
Contributor

f-allian commented Jan 9, 2025

@jmafoster1 Can you explain in a bit more detail what the problem is here? It doesn't make sense for any of the predicted values to be complex, so I think hard-coding a condition to eliminate them is probably not the best way of resolving it.

@jmafoster1
Contributor Author

The problem is that candidate expressions are generated at random in GP, so there is the possibility of evaluating the square root of a negative number during GP. We cannot prevent this unless we remove the sqrt operator from the set of operators. Some versions of sqrt raise an exception if you try to evaluate a negative (which we catch in the fitness method), but the one I've been using returns an instance of np.complex128. This PR addresses that by giving candidate expressions an infinite fitness (i.e. really bad) if they produce an output that's of a different type to the observed output. Does that make sense?
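The differing `sqrt` behaviours described above can be seen directly. A small illustration, not project code:

```python
import math
import cmath
import numpy as np

# math.sqrt raises an exception, which a fitness method can catch...
try:
    math.sqrt(-1)
except ValueError as e:
    print("math.sqrt:", e)  # math domain error

# ...cmath.sqrt and numpy's sqrt on a complex input silently return
# a complex result rather than raising.
print(cmath.sqrt(-1))        # 1j
print(np.sqrt(complex(-1)))  # 1j, an np.complex128 scalar

# numpy's sqrt on a negative *real* input returns nan instead.
with np.errstate(invalid="ignore"):
    print(np.sqrt(-1.0))     # nan
```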

@f-allian
Contributor

f-allian commented Jan 9, 2025

> The problem is that candidate expressions are generated at random in GP, so there is the possibility of evaluating the square root of a negative number during GP. We cannot prevent this unless we remove the sqrt operator from the set of operators. Some versions of sqrt raise an exception if you try to evaluate a negative (which we catch in the fitness method), but the one I've been using returns an instance of np.complex128. This PR addresses that by giving candidate expressions an infinite fitness (i.e. really bad) if they produce an output that's of a different type to the observed output. Does that make sense?

This doesn't make much sense to me. If, for whatever strange reason, your y_estimates is yielding an array of complex numbers then your current formula for nrmse isn't appropriate. You would instead have to calculate the magnitude of the sum of squares, i.e:

Edit: I originally wrote

nrmse = np.abs(sqerrors.sum() / len(self.df)) / (self.df[self.outcome].max() - self.df[self.outcome].min())

but what I meant was:

sqerrors = np.abs(self.df[self.outcome] - y_estimates) ** 2
nrmse = np.sqrt(sqerrors.sum() / len(self.df)) / (self.df[self.outcome].max() - self.df[self.outcome].min())

Does that make sense?
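For reference, the magnitude-based variant suggested above can be written as a self-contained function. The `df`/`outcome`/`y_estimates` names follow the snippet in the comment; this is a sketch of the suggestion, not the project's code:

```python
import numpy as np
import pandas as pd

def nrmse_abs(df: pd.DataFrame, outcome: str, y_estimates: np.ndarray) -> float:
    """NRMSE that stays real even for complex predictions, via |error|**2."""
    # |a - b| ** 2 is real and non-negative whether the errors are real or complex.
    sqerrors = np.abs(df[outcome] - y_estimates) ** 2
    return float(
        np.sqrt(sqerrors.sum() / len(df))
        / (df[outcome].max() - df[outcome].min())
    )
```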

@jmafoster1
Contributor Author

My point here is that if it's returning an array of complex numbers, then the candidate expression is wrong, so should be assigned infinite fitness (we are minimising here, so fitness infinity is infinitely bad).

@f-allian
Contributor

f-allian commented Jan 9, 2025

> My point here is that if it's returning an array of complex numbers, then the candidate expression is wrong, so should be assigned infinite fitness (we are minimising here, so fitness infinity is infinitely bad).

Sorry, had a typo in my above comment (see above).

If some candidate expressions are wrong/complex dtypes, can you not filter them out instead and avoid doing all of this?

@jmafoster1
Contributor Author

Unfortunately not. Every individual in the population must have a fitness value assigned to it. Better individuals will persist across generations of the population, with poorer individuals being filtered out (based on fitness value). However, in this case, there is no easy way to generate guaranteed valid individuals (i.e. individuals which will always produce real values). The best we can do is give invalid individuals very poor fitness values so that they (hopefully) do not persist for long. It's a fairly standard practice in GP.
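A toy illustration of that standard practice, in plain Python rather than a GP toolkit (all names here are made up for the example):

```python
import math

def evaluate(individual) -> float:
    """Fitness = the candidate's value if real and finite, else infinity.

    Lower is better, so math.inf is the worst possible fitness.
    """
    try:
        value = individual()  # evaluate the candidate expression
    except (ValueError, OverflowError):
        return math.inf  # invalid individuals still get *a* fitness
    if isinstance(value, complex) or not math.isfinite(value):
        return math.inf  # complex or nan/inf output: treat as invalid
    return value

# Every individual gets a fitness; selection then filters the bad ones out.
population = [lambda: 3.0, lambda: math.sqrt(-1), lambda: complex(0, 1)]
fitnesses = [evaluate(ind) for ind in population]
survivors = [ind for ind, f in zip(population, fitnesses) if f < math.inf]
```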

@jmafoster1
Contributor Author

The ideal situation would be to do this in a strongly typed language, so we could guarantee that every individual was at least valid, but that's just a limitation of doing it in Python.

@f-allian
Contributor

f-allian commented Jan 9, 2025

> Unfortunately not. Every individual in the population must have a fitness value assigned to it. Better individuals will persist across generations of the population, with poorer individuals being filtered out (based on fitness value). However, in this case, there is no easy way to generate guaranteed valid individuals (i.e. individuals which will always produce real values). The best we can do is give invalid individuals very poor fitness values so that they (hopefully) do not persist for long. It's a fairly standard practice in GP.

It sounds like you've thought it through, but I can't quite agree with this approach. Assigning specific fitness values to a selected group of individuals is fine, and sounds like some form of regularisation. But it sounds like the fitness function/model you're employing is probably not well-constrained. I'll approve this PR, but it might be worth coming back to in the future, IMO.

@jmafoster1 jmafoster1 merged commit 641107f into main Jan 9, 2025
22 checks passed
@jmafoster1 jmafoster1 deleted the jmafoster1/fix-gp-fitness branch January 9, 2025 13:03
@jmafoster1
Contributor Author

Thanks Farhad. Yes, I'm not really a fan. DEAP has lots of limitations and weird workarounds like this, but it's the most established and best documented toolkit for genetic algorithms that I've found so far.
