Categorical_Method LHS Input Encoding Error #265

Open
HiddenBao opened this issue Aug 15, 2023 · 4 comments

Comments

@HiddenBao

  • Operating System: Windows 11 OS
  • Python version: 3.9.17
  • summit version used: 0.8.9

Description

I am testing a planned experimental setup in which I use LHS as the strategy to create the initial experiments for a domain containing both categorical and continuous variables. When I run the code, I get an error while generating the experimental conditions. I am working in JupyterLab. I have removed some of the names from the file path, but everything else is unchanged.

What I Did

import cython
import summit
from summit.benchmarks import ExperimentalEmulator
from summit.domain import *
from summit.utils.dataset import DataSet
from summit.strategies import SOBO, MultitoSingleObjective, LHS
import numpy as np
import pandas as pd
import pkg_resources
import pathlib

DATA_PATH = pathlib.Path("F:/Python Programs/NKData")
input_df = pd.read_csv(DATA_PATH / 'BoundariesV2.csv')

domain = Domain()
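for idx, row in input_df.iterrows():
    # Columns are read by position, so use .iloc for explicit positional access.
    name = row.iloc[0]
    description = row.iloc[5]
    data_type = row['Type']

    if data_type == 'Categorical':
        # Levels are stored as one comma-separated string; strip stray whitespace.
        levels = [level.strip() for level in row.iloc[2].split(',')]

        domain += CategoricalVariable(
            name=name,
            description=description,
            levels=levels
        )
    elif data_type == 'Continuous':
        # Lower and upper bounds of the input variable.
        bounds = [row.iloc[3], row.iloc[4]]

        domain += ContinuousVariable(
            name=name,
            description=description,
            bounds=bounds
        )
    elif data_type == 'Objective':
        bounds = [row.iloc[3], row.iloc[4]]
        # Whether the objective is maximized or minimized, as stored in the CSV.
        maximize = row.iloc[6]

        domain += ContinuousVariable(
            name=name,
            description=description,
            bounds=bounds,
            is_objective=True,
            maximize=maximize
        )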

domain
categorical_method: str = "one-hot"
StartStrat = LHS(domain, random_state = np.random.RandomState(808), categorical_method=categorical_method)
StartExp = StartStrat.suggest_experiments(10)
StartExp

Output

Name             | Type                           | Description                                             | Values
-----------------|--------------------------------|---------------------------------------------------------|------------
Temperature      | continuous, input              | Reaction temperature in degrees Celsius (ºC)            | [40.0,80.0]
Catalyst_Amount  | continuous, input              | Catalyst amounts in molar equivalents (Equiv.)          | [0.01,1.0]
Starting_Reagent | continuous, input              | 2-Methylimidozole amounts in molar equivalents (Equiv.) | [1.1,2.0]
Solvent          | continuous, input              | Solvent amount in milliliters (mL)                      | [0.1,0.35]
Time             | continuous, input              | Duration of reaction in hours (hr)                      | [2.0,24.0]
Base             | continuous, input              | Base amount in molar equivalents (Equiv.)               | [1.0,5.0]
Catalyst_Type    | categorical, input             | Catalyst Types                                          | 3 levels
Main_Product     | continuous, maximize objective | LCAP of Main Product                                    | [0.0,1.0]
Main_Impurity    | continuous, minimize objective | LCAP of Main Impurity                                   | [0.0,1.0]

Error I get from running the very last cell

AttributeError                            Traceback (most recent call last)
Cell In[4], line 4
      2 categorical_method: str = "one-hot"
      3 StartStrat = LHS(domain, random_state = np.random.RandomState(808), categorical_method=categorical_method)
----> 4 StartExp = StartStrat.suggest_experiments(10)
      5 StartExp

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\random.py:286, in LHS.suggest_experiments(self, num_experiments, criterion, exclude, **kwargs)
    284 design = DataSet.from_df(design)
    285 design[("strategy", "METADATA")] = "LHS"
--> 286 return self.transform.un_transform(
    287     design, categorical_method=self.categorical_method
    288 )

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\base.py:324, in Transform.un_transform(self, ds, **kwargs)
    318 # Categorical variables using one-hot encoding
    319 elif (
    320     isinstance(variable, CategoricalVariable)
    321     and categorical_method == "one-hot"
    322 ):
    323     # Get one-hot encoder
--> 324     enc = self.encoders[variable.name]
    326     # Get array to be transformed
    327     one_hot_names = [f"{variable.name}_{l}" for l in variable.levels]

AttributeError: 'Transform' object has no attribute 'encoders'
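
For readers following the traceback: the error is raised when `un_transform` reads `self.encoders`, which suggests that the attribute is only created on some other code path that never ran here. That is an inference from the traceback, not something confirmed in this thread. The sketch below reproduces that general failure pattern with a hypothetical `ToyTransform` class; none of its names or logic are taken from summit.

```python
class ToyTransform:
    """Hypothetical stand-in for a transform class; not summit's actual code."""

    def __init__(self, levels_by_name):
        # Only the level lists are stored here; no `encoders` attribute yet.
        self.levels_by_name = levels_by_name

    def transform(self, rows):
        # The forward pass is the only place that creates `self.encoders`.
        self.encoders = {
            name: {level: i for i, level in enumerate(levels)}
            for name, levels in self.levels_by_name.items()
        }
        return [
            {name: self.encoders[name][value] for name, value in row.items()}
            for row in rows
        ]

    def un_transform(self, rows):
        # The inverse pass reads `self.encoders`, so it fails with an
        # AttributeError if transform() has never been called on this object.
        decoders = {
            name: {i: level for level, i in enc.items()}
            for name, enc in self.encoders.items()
        }
        return [
            {name: decoders[name][code] for name, code in row.items()}
            for row in rows
        ]


def try_un_transform_first():
    """Call un_transform() on a fresh object and return the error message."""
    fresh = ToyTransform({"Catalyst_Type": ["A", "B", "C"]})
    try:
        fresh.un_transform([{"Catalyst_Type": 0}])
    except AttributeError as err:
        return str(err)
    return None
```

If summit behaves in a similar way, any strategy that calls the inverse transform without first calling the forward transform would hit this error whenever `categorical_method="one-hot"` is set.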

I apologise for any formatting issues; I am new to this, but I would greatly appreciate any help or advice on workarounds. Thank you for providing this library, it is amazing.
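
On the question of workarounds, one option is to build the initial design outside the strategy so that the failing un-transform step is never reached. The sketch below is a plain-Python Latin hypercube sampler and is not summit's `LHS` implementation: it draws one point per stratum in each dimension, scales continuous variables to their bounds, and maps the remaining unit-interval samples onto category levels. The function names, the dict-based inputs and the level names `"A"`, `"B"`, `"C"` are assumptions made for illustration.

```python
import random


def latin_hypercube(n, dims, rng):
    """Return n points in the unit cube with one point per stratum per dimension."""
    columns = []
    for _ in range(dims):
        # One uniform draw inside each of the n equal-width strata...
        column = [(i + rng.random()) / n for i in range(n)]
        # ...then shuffle so strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n)]


def suggest_initial_experiments(continuous, categorical, n, seed=808):
    """Scale unit-cube samples to bounds and map them onto category levels.

    continuous:  dict of name -> (lower, upper)
    categorical: dict of name -> list of levels
    """
    rng = random.Random(seed)
    names = list(continuous) + list(categorical)
    points = latin_hypercube(n, len(names), rng)
    experiments = []
    for point in points:
        row = {}
        for name, u in zip(names, point):
            if name in continuous:
                lower, upper = continuous[name]
                row[name] = lower + u * (upper - lower)
            else:
                levels = categorical[name]
                # Clamp the index in case rounding pushes u up to 1.0.
                index = min(int(u * len(levels)), len(levels) - 1)
                row[name] = levels[index]
        experiments.append(row)
    return experiments


# Example inputs: bounds taken from the domain above, level names are placeholders.
example_continuous = {"Temperature": (40.0, 80.0), "Catalyst_Amount": (0.01, 1.0)}
example_categorical = {"Catalyst_Type": ["A", "B", "C"]}
example_design = suggest_initial_experiments(
    example_continuous, example_categorical, n=10
)
```

Each entry of `example_design` is a dict with one value per input variable. The rows would still need to be converted into a summit `DataSet` by hand before being passed to another strategy.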

@marcosfelt
Collaborator

I am so sorry that I did not see this. If it is still relevant for me to take a look, please respond @HiddenBao

I will take a look later this week.

@HiddenBao
Author

No worries! I briefly went back and rewrote the code without the .csv file to see whether that was the issue, but I get the same error.

What I Did

import summit
from summit.benchmarks import ExperimentalEmulator
from summit.domain import *
from summit.utils.dataset import DataSet
from summit.strategies import LHS, MTBO
import numpy as np  # needed for np.random.RandomState below

domain = Domain()

domain += CategoricalVariable(
    name = "Catalyst", 
    description = "Test",
    levels = [
        "A",
        "B",
        "C",
        "D"
    ],
)

domain += ContinuousVariable(
    name = "Temperature",
    description = "Test",
    bounds = [40, 80]
)

domain += ContinuousVariable(
    name = "Catalyst_Amount",
    description = "Test",
    bounds = [0.01, 1.0]
)

domain += ContinuousVariable(
    name = "Reagent",
    description = "Test",
    bounds = [1.1, 2.0]
)

domain += ContinuousVariable(
    name = "Solvent",
    description = "Test",
    bounds = [0.1, 0.35]
)

domain += ContinuousVariable(
    name = "Time",
    description = "Test",
    bounds = [2.0, 24]
)

domain += ContinuousVariable(
    name = "Base",
    description = "Test",
    bounds = [1.0, 5.0]
)


domain += ContinuousVariable(
    name = "Main_Product",
    description = "Test",
    bounds = [0, 1],
    is_objective = True,
    maximize = True
)

domain += ContinuousVariable(
    name = "Main_Impurity",
    description = "Test",
    bounds = [0, 1],
    is_objective = True,
    maximize = False
)
domain

The domain is created without problems as far as I can tell. I then tried running the transform with the LHS and MTBO strategies using the following.

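strategy = LHS(domain,
                 random_state = np.random.RandomState(808),
                 categorical_method="one-hot"
                )
# Note: this calls StartStrat (the LHS object from the original code), not `strategy`.
StartExp = StartStrat.suggest_experiments(10)
StartExp

strategy = MTBO(domain,
                 random_state = np.random.RandomState(808),
                 categorical_method="one-hot"
                )
# Note: same call as above, so LHS.suggest_experiments runs again (see the traceback).
StartExp = StartStrat.suggest_experiments(10)
StartExp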

And both still return the same error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[15], line 5
      1 strategy = MTBO(domain,
      2                  random_state = np.random.RandomState(808),
      3                  categorical_method="one-hot"
      4                 )
----> 5 StartExp = StartStrat.suggest_experiments(10)
      6 StartExp

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\random.py:286, in LHS.suggest_experiments(self, num_experiments, criterion, exclude, **kwargs)
    284 design = DataSet.from_df(design)
    285 design[("strategy", "METADATA")] = "LHS"
--> 286 return self.transform.un_transform(
    287     design, categorical_method=self.categorical_method
    288 )

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\base.py:324, in Transform.un_transform(self, ds, **kwargs)
    318 # Categorical variables using one-hot encoding
    319 elif (
    320     isinstance(variable, CategoricalVariable)
    321     and categorical_method == "one-hot"
    322 ):
    323     # Get one-hot encoder
--> 324     enc = self.encoders[variable.name]
    326     # Get array to be transformed
    327     one_hot_names = [f"{variable.name}_{l}" for l in variable.levels]

AttributeError: 'Transform' object has no attribute 'encoders'
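
The last line of the traceback shows that the one-hot columns are named `<variable>_<level>`, for example `Catalyst_A` to `Catalyst_D` for the domain above. As a rough illustration of what the un-transform step has to do with those columns, the sketch below encodes a level into one-hot columns and decodes it again. It uses plain dicts and hypothetical helper functions; the library itself looks up a stored encoder object per variable, and how that object works is not shown in this thread.

```python
def one_hot_names(name, levels):
    """Column names in the `<variable>_<level>` pattern seen in the traceback."""
    return [f"{name}_{level}" for level in levels]


def encode(name, levels, value):
    """Return a dict of one-hot columns with a single 1 for `value`."""
    if value not in levels:
        raise ValueError(f"{value!r} is not a level of {name}")
    columns = one_hot_names(name, levels)
    return {col: int(level == value) for col, level in zip(columns, levels)}


def decode(name, levels, row):
    """Recover the level whose one-hot column holds the largest value."""
    columns = one_hot_names(name, levels)
    best = max(range(len(levels)), key=lambda i: row[columns[i]])
    return levels[best]


catalyst_levels = ["A", "B", "C", "D"]
encoded = encode("Catalyst", catalyst_levels, "C")
decoded = decode("Catalyst", catalyst_levels, encoded)
```

Decoding only needs the variable name and its levels, both of which are already available on the domain, which may be relevant when deciding where the encoders should be created.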

Thank you again for creating this library, it is great and extremely useful. I greatly appreciate all the work that has been done for it.

@marcosfelt
Collaborator

This definitely seems like a bug - I will take a look this weekend!

@marcosfelt
Collaborator

Can confirm that I can reproduce the bug. Going to look into a fix now
