-
Built on top of dcs_fn_ransac_estimator_2.py
-
Previous version does not chekc if the f1 score improves every iteration during inductive generalization.
If the resampling in the while loop doesn't yeild better seed training samples 5 times stright, the iteration terminates. Max iteration is empriricaly defined as squared root of sample set size, but it depends on actual data distribution. Stopping signal is returned if count reaches max iterations.
-
def find_inliers(estimator, X_all, X, y, verbose = False):
This part has major difference from previous RANSAC1 confidence metric. Before we catch 10 and 90 percentile confidence interval from all unlabeled data. However, when unlabeled data are few and sparse. It is misleading that the 90 percnetile of sampled unlabeled data distribution is the same as 90 percentile of gobal confidence landscape of the whole feature space. In face np.percentile grabs the nearest
point to be the 90 percentile sample, even if that sample has negatvie confidence score. By estimating the gobal confidence score distribution, there's no longer sign issue for the confidence interval.