Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to simplify input/output descriptors and decorators #6124

Draft
wants to merge 1 commit into
base: branch-24.12
Choose a base branch
from

Conversation

dantegd
Copy link
Member

@dantegd dantegd commented Oct 25, 2024

This PR aims to refactor our descriptors and decorators to simplify them to make them significantly easier to test, debug and mantain.

@github-actions github-actions bot added the Cython / Python Cython or Python issue label Oct 25, 2024
@dantegd
Copy link
Member Author

dantegd commented Oct 25, 2024

@betatim @wphicks @divyegala this reflects our POC after our initial discussions, will be applying them to a real estimator alongside finishing some todos so we can see the full design in action and discuss any necessary aspects remaining.

self.intercept_ = CumlArray.zeros(self.n_features_in_,
dtype=self.dtype)

# do awesome C++ fitting here :)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One weird thing while playing with this a bit: when I add a print(f"{self.coef_=}") here I get the following:

Traceback (most recent call last):
  File "/home/coder/cuml/../ff.py", line 9, in <module>
    e.fit(X, y)
  File "/home/coder/.conda/envs/rapids/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/.conda/envs/rapids/lib/python3.12/site-packages/cuml/sample/estimator.py", line 32, in wrapper
    result = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/.conda/envs/rapids/lib/python3.12/site-packages/cuml/sample/estimator.py", line 115, in fit
    print(f"{self.coef_=}")
             ^^^^^^^^^^
  File "base.pyx", line 337, in cuml.internals.base.Base.__getattr__
AttributeError: coef_. Did you mean: '_coef_'?

I have stared at this for quite a while but can't work out what is going on??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realised about 2min after leaving the office: it is because _is_fitted doesn't get set until fit returns. Maybe something to improve as it makes for a tedious to debug thing :D - I'll ponder a suggestion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the same behavior a scikit-learn, no?

Copy link
Member

@betatim betatim Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. I was trying to access the coef_ attribute within fit.

In scikit-learn these are normal attributes, so once they are set you can use them. Right now we define __getattribute__ which uses _is_fit. I think it is a bit weird to have code like this fail, mostly because it makes you question your sanity and because the exception doesn't contain a clue (we get to see the AttributeError from __getattr__ not __getattribute__ :():

self.foo_ = 42
print(self.foo_) # Error, `foo_` doesn't exist!

Maybe we can get around the need to checking _is_fit and using __getattribute__ by recording inside the DynamicDescriptor if it has been set or not:

class DynamicDescriptor:
  def __init__(self, attribute_name):
    self.set = False
    self.attribute_name = f"{name}"

  def __get__(self, obj, objtype=None):
    if obj is None:
       return self
    if not self.set:
      raise AttributeError(f"{obj.__class__.__name__} object has no attribute {self.attribute_name}")
    else:
      if GlobalSettings().is_internal
        return self.raw
      else:
        return self.raw.to_output(obj._input_type)

  def __set__(self, obj, value):
    self.set = True
    # we can even store the value inside the descriptor?!
    self.raw = value

This might need a bit of tweaking to make the message in the exception look right ("'Estimator' object has no attribute 'foo_'").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython / Python Cython or Python issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants