Releases: mad-lab-fau/tpcp
v2.0.0 - Major scorer rework
[2.0.0] - 2024-10-24
Added
- The global cache helper now supports algorithms with multiple action methods by specifying the name of the action method you want to cache. (#118)
- The global disk cache helper should now be able to cache the action methods of algorithm classes defined in the main script. (#118)
- There are new builtin `FloatAggregator` and `MacroFloatAggregator` that should cover many of the use cases that previously required custom aggregators. (#118)
- Scorers now support passing a `final_aggregator`. This is called after all scoring and aggregation happens and makes it possible to implement complicated "meta" aggregation that depends on the results of all scores of all datapoints. Note that we are not sure yet whether this should be treated as an escape hatch whose overuse is an anti-pattern, or whether it is exactly the other way around. We need to experiment in a couple of real-life applications to figure this out. (#120)
- Dataset classes now have a proper `__eq__` implementation. (#120)
Changed
- Relatively major overhaul of how aggregators in scoring functions work. Before, aggregators were classes that were initialized with the value of a score. Now they are instances of a class that is called with the value of a score. This change makes it possible to create "configurable" aggregators that receive their configuration at initialization time. (#118)
  This comes with a couple of breaking changes:
  - The most "user-facing" one is that the `NoAgg` aggregator is now called `no_agg`, indicating that it is an instance of a class and not a class itself.
  - All custom aggregators need to be rewritten, but you will likely find that they are much simpler now. (See the reworked examples for custom aggregators.)
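The new callable-instance style can be sketched in plain Python. This is not the tpcp API, just an illustration of the pattern; the class name and trimming logic here are made up:

```python
# Sketch of the new aggregator style: configuration at __init__,
# aggregation via __call__. Names are hypothetical, not tpcp's API.
class TrimmedMeanAggregator:
    def __init__(self, trim_ratio: float = 0.1):
        # Configuration is stored on the instance ...
        self.trim_ratio = trim_ratio

    def __call__(self, values: list) -> float:
        # ... and the instance is later *called* with the score values.
        values = sorted(values)
        k = int(len(values) * self.trim_ratio)
        trimmed = values[k : len(values) - k] or values
        return sum(trimmed) / len(trimmed)

# You pass a configured *instance* (like the new built-in no_agg),
# not the class itself:
agg = TrimmedMeanAggregator(trim_ratio=0.25)
```

Because configuration happens at construction time, two differently configured aggregators are simply two instances of the same class.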
Fixed
- Fixed a massive performance regression in version 0.34.1 affecting people that had tensorflow or torch installed, but did not use it in their code. The reason was that we imported the two modules in the global scope, which made importing tpcp very slow. This was particularly noticeable with multiprocessing, as the module was imported in every worker process. We now only import the module within the clone function, and only if you had imported it before. (#118)
- The custom hash function now has a different way of hashing functions and classes defined in local scopes. This should prevent strange pickling errors from just using tpcp normally. (#118)
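The "only if you had imported it before" check is a generic pattern that can be sketched with `sys.modules` (this is an illustration of the idea, not tpcp's actual code):

```python
import sys


def maybe_clone_tensor(obj):
    # Look the module up instead of importing it: sys.modules only
    # contains "torch" if the *user's* code already imported it, so
    # this function never pays the slow import cost itself.
    torch = sys.modules.get("torch")
    if torch is not None and isinstance(obj, torch.Tensor):
        return obj.clone()
    return obj
```

For non-torch objects (or when torch was never imported) the object passes through untouched, so the check is essentially free.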
Removed
- `score` functions implemented directly as a method on the pipeline class are no longer supported. Score functions now need to be independent functions that take a pipeline instance as their first argument. For this reason, it is also no longer supported to pass `None` as argument to `scoring` in any validate or optimize method. (#120)
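The new calling convention can be illustrated with a self-contained sketch (plain Python, no tpcp import; the class name, `run` method, and score keys are made up for illustration):

```python
# New style: the score function is a free function whose first argument
# is the pipeline instance, instead of a method on the pipeline class.
class DoublingPipeline:
    def run(self, datapoint):
        self.result_ = datapoint * 2  # stand-in for a real algorithm
        return self


def score(pipeline, datapoint):
    # Independent of the pipeline class, so it can be swapped freely
    # and passed explicitly to validate/optimize methods.
    pipeline.run(datapoint)
    return {"doubled": pipeline.result_}
```

Keeping scoring outside the pipeline class means the same pipeline can be evaluated with different score functions without subclassing.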
v1.0.1 - Resolved install issues with UV
[1.0.1] - 2024-10-18
Fixes the names of the optional dependency groups. This should resolve install issues when using uv as the package manager.
v1.0.0 - Cross-Validation improved!
[1.0.0] - 2024-07-03
Note: This is a major version bump because we have quite substantial breaking changes. The 1.0 should not signal that we are now feature complete, though the core APIs have been mostly stable for quite some time now.
BREAKING CHANGE
- Instead of the (annoying) `mock_label` and `group_label` arguments, all functions that take a cv-splitter as input can now take an instance of the new `DatasetSplitter` class, which elegantly handles grouping and stratification and also removes the need of forwarding the `mock_label` and `group_label` arguments to the underlying optimizer. The use of the `mock_label` and `group_label` arguments has been removed without deprecation. (#114)
- All classes and methods that have "grid-search"- or "cross-validate"-like output (`GridSearch`, `GridSearchCv`, `cross_validate`, `validate`) have updated names for all their output attributes. In most cases the output naming has switched from a single underscore to a double underscore to separate the different parts of the output name, making it easier to programmatically access the output. (#117)
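The double-underscore scheme makes results easy to slice programmatically. A sketch with made-up result names (the actual tpcp attribute names may differ):

```python
# Made-up result keys that follow a "<part>__<part>__<part>" scheme.
results = {
    "test__agg__accuracy": 0.90,
    "test__single__accuracy": [0.85, 0.95],
    "train__agg__accuracy": 0.99,
}

# Splitting on the double underscore gives structured access, e.g.
# selecting all test-set entries:
test_results = {k: v for k, v in results.items() if k.split("__")[0] == "test"}
```

With single underscores this split would be ambiguous whenever a score name itself contains an underscore; the double underscore removes that ambiguity.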
v0.34.1 - Fix Torch and Tensorflow support
Fixed
- The torch hasher was not working at all. This is hopefully fixed now.
- The tensorflow clone method did not work. Switched to specialized implementation that hopefully works.
v0.34.0 - Some smaller improvements
[0.34.0] - 2024-06-28
Added
- Dataset classes are now generic and allow you to provide the group-label tuple type as a generic parameter. This allows for better type checking and IDE support. (#113)
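The idea can be sketched with standard `typing` generics. Note that `Dataset` and `GaitLabel` below are stand-ins for illustration, not the real tpcp classes:

```python
from typing import Generic, NamedTuple, TypeVar

GroupLabelT = TypeVar("GroupLabelT")


class Dataset(Generic[GroupLabelT]):
    # Stand-in: the group-label tuple type is carried as a type
    # parameter, so type checkers and IDEs can infer attribute
    # access like `ds.group_label.participant`.
    def __init__(self, label: GroupLabelT) -> None:
        self._label = label

    @property
    def group_label(self) -> GroupLabelT:
        return self._label


class GaitLabel(NamedTuple):
    participant: str
    trial: str


ds: Dataset[GaitLabel] = Dataset(GaitLabel("p01", "trial_1"))
```

Without the generic parameter, `group_label` would be typed as a plain tuple and the field names would be invisible to the type checker.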
Changed/Fixed
- The snapshot utilities are much more robust now and raise appropriate errors when the stored dataframes have unsupported properties. (#112)
v0.33.1 - Less caching warnings
Fewer caching warnings. Closes #111.
v0.33.0 - Some more TypedIterator stuff and some QoL improvements
[0.33.0] - 2024-05-23
Added
- `custom_hash`: the internally used hashing method based on pickle is now part of the public API via `tpcp.misc`.
- `DummyOptimize` now allows ignoring the warning that it usually throws.
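A rough sketch of what a pickle-based hash looks like. The real `custom_hash` in `tpcp.misc` builds on this idea with extra handling for objects that plain pickling gets wrong (e.g. locally defined functions):

```python
import hashlib
import pickle


def pickle_hash(obj) -> str:
    # Serialize the object, then hash the resulting bytes: objects
    # that pickle to the same bytes get the same digest, which is
    # how structural equality can be checked cheaply.
    return hashlib.md5(pickle.dumps(obj)).hexdigest()
```

This is only a sketch; prefer the public `custom_hash` for anything tpcp-related, since it is the hash the library itself uses internally.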
Changed
- Relatively large rework of the TypedIterator. We recommend rereading the example.
v0.32.0 - Better snapshots
[0.32.0] - 2024-04-17
- The snapshot plugin now supports a new command line argument `--snapshot-only-check` that will fail the test if no snapshot file is found. This is useful for CI/CD pipelines, where you want to ensure that all snapshots are up to date.
- The snapshot plugin is now installed automatically when you install tpcp. There is no need to set it up in the conftest file anymore.
v0.31.2 - More Typed Iterator fixes
[0.31.2] - 2024-02-01
Fixed
- TypedIterator does not run into a RecursionError anymore, when attributes with the wrong name are accessed.
v0.31.1 - Fix agg in typed iterator
[0.31.1] - 2024-02-01
Fixed
- TypedIterator now skips aggregation when no values are provided