Update README.md

Bin-Cao · May 21, 2024 · 74f6a89
1 parent 713b5c2
commit 74f6a89
Showing 1 changed file with 134 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -123,3 +123,137 @@ Contribution and suggestions are always welcome. In addition, we are also lookin
 
 - 3.Entropy-based approach (熵索函数)
 
+
+``` javascript
+Signature:
+Bgolearn.fit(
+    data_matrix,
+    Measured_response,
+    virtual_samples,
+    Mission='Regression',
+    Classifier='GaussianProcess',
+    noise_std=None,
+    Kriging_model=None,
+    opt_num=1,
+    min_search=True,
+    CV_test=False,
+    Dynamic_W=False,
+    seed=42,
+)
+Docstring:
+================================================================
+PACKAGE: Bayesian global optimization-learn (Bgolearn) package.
+Author: Bin CAO <[email protected]> 
+Guangzhou Municipal Key Laboratory of Materials Informatics, Advanced Materials Thrust,
+Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511400, Guangdong, China
+================================================================
+Please feel free to open an issue on GitHub:
+https://github.com/Bin-Cao/Bgolearn
+or
+contact Mr. Bin Cao ([email protected])
+if you have any problems, comments, or suggestions about using the code.
+==================================================================
+Thank you for choosing Bgolearn for material design. 
+Bgolearn is developed to facilitate the application of machine learning in research.
+
+Bgolearn is designed for optimizing single-target material properties. 
+The BgoKit package is being developed to facilitate multi-task design.
+
+If you need to perform multi-target optimization, here are two suggestions:
+1. Multiple tasks can be converted into a single task using domain knowledge.
+For example, in the simplest case you can use a weighted linear combination, that is, y = w*y1 + y2...
+
+2. Multiple tasks can be optimized using Pareto fronts.
+Bgolearn returns two arrays based on your dataset:
+the first array is an evaluation score for each virtual sample,
+while the second array is the recommended data considering only the currently optimized target.
+
+The first array is crucial for multi-task optimization.
+For instance, in a two-task optimization scenario, you can evaluate each candidate twice, once for each target.
+Then, plot the score of target 1 for each sample on the x-axis and the score of target 2 on the y-axis.
+The trade-off is then resolved by selecting the samples that lie on the front of the banana-shaped curve (the Pareto front).
+
+I am delighted to invite you to participate in the development of Bgolearn. 
+If you have any issues or suggestions, please feel free to contact me at [email protected].
+================================================================
+Reference:
+Documentation: https://bgolearn.netlify.app/
+================================================================
+
+:param data_matrix: data matrix of the training dataset, X.
+
+:param Measured_response: response of the training dataset, y.
+
+:param virtual_samples: designed virtual samples.
+
+:param Mission: str, default 'Regression', the optimization task, either 'Regression' or 'Classification'.
+
+:param Classifier: the classifier used if Mission == 'Classification'.
+        if the user does not supply one, Bgolearn will call a preset classifier.
+        default, Classifier = 'GaussianProcess', i.e., Gaussian Process Classifier.
+        five different classifiers are preset in Bgolearn:
+        'GaussianProcess' --> Gaussian Process Classifier (default)
+        'LogisticRegression' --> Logistic Regression
+        'NaiveBayes' --> Naive Bayes Classifier
+        'SVM' --> Support Vector Machine Classifier
+        'RandomForest' --> Random Forest Classifier
+
+:param noise_std: float or ndarray of shape (n_samples,), default=None
+        Value added to the diagonal of the kernel matrix during fitting.
+        This can prevent a potential numerical issue during fitting, by
+        ensuring that the calculated values form a positive definite matrix.
+        It can also be interpreted as the variance of additional Gaussian
+        measurement noise on the training observations.
+
+        if noise_std is None, a noise value will be estimated by maximum likelihood
+        on the training dataset.
+
+:param Kriging_model (default None):
+        str, Kriging_model = 'SVM', 'RF', 'AdaB', 'MLP'
+        The following machine learning models are implemented: Support Vector Machine (SVM),
+        Random Forest (RF), AdaBoost (AdaB), and Multi-Layer Perceptron (MLP).
+        The estimation uncertainty is determined by bootstrap sampling.
+    or
+        a user-defined callable Kriging model that has an attribute <fit_pre>.
+        if the user does not supply one, Bgolearn will call a preset Kriging model.
+        attribute <fit_pre>:
+        input -> xtrain, ytrain, xtest;
+        output -> predicted mean and std of xtest
+
+        e.g. (take GaussianProcessRegressor in sklearn):
+        from sklearn.gaussian_process import GaussianProcessRegressor
+        from sklearn.gaussian_process.kernels import RBF
+
+        class Kriging_model(object):
+            def fit_pre(self,xtrain,ytrain,xtest):
+                # instantiate and fit the model
+                kernel = RBF()
+                model = GaussianProcessRegressor(kernel=kernel).fit(xtrain,ytrain)
+                # predicted mean and std of xtest
+                mean,std = model.predict(xtest,return_std=True)
+                return mean,std
+
+        e.g. (MultiModels estimations):
+        import numpy as np
+        from sklearn.svm import SVR
+
+        class Kriging_model(object):
+            def fit_pre(self,xtrain,ytrain,xtest):
+                # fit each model and predict on xtest;
+                # model_1, model_2, model_3 can be changed to any ML models you desire
+                pre_1 = SVR(C=10).fit(xtrain,ytrain).predict(xtest) # model_1
+                pre_2 = SVR(C=50).fit(xtrain,ytrain).predict(xtest) # model_2
+                pre_3 = SVR(C=80).fit(xtrain,ytrain).predict(xtest) # model_3
+                # mean and std across the three model predictions
+                stacked_array = np.vstack((pre_1,pre_2,pre_3))
+                mean = np.mean(stacked_array, axis=0)
+                std = np.sqrt(np.var(stacked_array, axis=0))
+                return mean, std
+
+:param opt_num: the number of recommended candidates for next iteration, default 1. 
+
+:param min_search: default True -> searching the global minimum ;
+                           False -> searching the global maximum.
+
+:param CV_test: 'LOOCV' or an int, default False (the test is skipped)
+        if CV_test = 'LOOCV', leave-one-out cross-validation (LOOCV) will be applied,
+        elif CV_test = int, e.g., CV_test = 10, 10-fold cross-validation will be applied.
+
+:return: 1: array; potential of each candidate. 2: array/float; recommended candidate(s).
+File:      ~/miniconda3/lib/python3.9/site-packages/Bgolearn/BGOsampling.py
+Type:      method
+```
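
The Pareto-front suggestion in the docstring above (score each candidate once per target, then keep the samples on the front of the curve) could be sketched roughly as follows. This is only an illustrative sketch, not part of Bgolearn: the helper `pareto_front_mask` and the score values are hypothetical, the two score arrays stand in for the first array returned by two separate `Bgolearn.fit` runs, and it assumes that a larger score is better for both targets.

```python
import numpy as np

def pareto_front_mask(score_1, score_2):
    """Return a boolean mask marking the non-dominated candidates.

    Hypothetical helper (not a Bgolearn function). Assumes that a larger
    score is better for both targets; negate the scores otherwise.
    """
    scores = np.column_stack((score_1, score_2))
    mask = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        # Candidate i is dominated if some candidate is at least as good
        # on both targets and strictly better on at least one of them.
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False
    return mask

# Made-up scores for five virtual samples, one array per target.
score_target_1 = np.array([0.9, 0.5, 0.2, 0.6, 0.1])
score_target_2 = np.array([0.1, 0.5, 0.9, 0.3, 0.05])

front = pareto_front_mask(score_target_1, score_target_2)
front_indices = np.flatnonzero(front)  # indices of samples on the front
print(front_indices)
```

The pairwise loop is quadratic in the number of candidates, which is usually acceptable for a few thousand virtual samples; which point on the front to pick remains a domain decision.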