ZJU-ACES-ISE/AutoReconciler

ISSTA_2023

This repository contains the generator of the "synthesized data" used in the ISSTA 2023 submission paper.

Dependencies

  • numpy
  • pandas
  • gplearn
  • sympy
# install dependencies
python setup.py install
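To confirm that the four dependencies listed above are available after installation, a small check such as the following can be run. It is a minimal sketch that relies only on the standard library's `importlib.util.find_spec`; the helper name `missing_dependencies` is hypothetical and is not part of this repository.

```python
import importlib.util

# The dependencies listed in this README (top-level module names).
REQUIRED = ("numpy", "pandas", "gplearn", "sympy")


def missing_dependencies(modules=REQUIRED):
    """Return the names of modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means every listed package can be imported; anything reported as missing should be installed before running the generator.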

Usage

SimulatedDataset API

import numpy as np

class SimulatedDataset:
    def __init__(self, n_features, support, size, seed: np.random.RandomState,
                 n_confused_column=0, n_ext_column=0, with_column_names=False,
                 init_depth=(2, 2), init_method='half and half'):
        """
        Initialize a synthesized dataset.

        :param n_features: the number of features (columns) in the assertion
        :param support: the support of the generated assertion (e.g. 0.8)
        :param size: the number of records (rows)
        :param seed: the random state used to generate the data
        :param n_confused_column: the number of additional columns filled with obfuscated values
        :param n_ext_column: the number of additional (extra) data columns
        :param with_column_names: whether to use realistic column names (either generated or taken from real data after desensitization)
        :param init_depth: the initial tree depth range, given as (min, max)
        :param init_method: the initialization method, either 'half and half' or 'full'
        """

    def to_csv(self, path):
        """
        Save the dataset to a CSV file.

        :param path: the path of the output file
        """
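The `init_depth` and `init_method` parameters share their names and values with the tree-initialization options of gplearn (a listed dependency), which suggests the assertion is built as a random expression tree. The sketch below illustrates that idea under this assumption only; it is not the repository's implementation. The function set `FUNCTIONS`, the terminal names `x0`, `x1`, ..., and the helpers `init_tree`, `tree_depth` and `to_infix` are all hypothetical. With `'full'`, every branch reaches the sampled maximum depth; for `'half and half'`, the sketch simplifies by choosing either `'full'` or the shallower `'grow'` strategy at random for each tree.

```python
import random
from typing import Tuple, Union

# Hypothetical function set (name -> arity); the repository's actual set is not documented here.
FUNCTIONS = {"add": 2, "sub": 2, "mul": 2, "neg": 1}

# A tree is either a terminal name such as "x0" or a tuple (function_name, child, ...).
Tree = Union[str, tuple]


def _build(rng: random.Random, terminals, depth: int, max_depth: int, method: str) -> Tree:
    """Recursively build a tree; the root sits at depth 0."""
    if depth == max_depth:
        return rng.choice(terminals)  # maximum depth reached: place a terminal
    if method == "grow" and depth > 0 and rng.random() < 0.5:
        return rng.choice(terminals)  # 'grow' may stop early below the root
    name = rng.choice(sorted(FUNCTIONS))
    children = [_build(rng, terminals, depth + 1, max_depth, method)
                for _ in range(FUNCTIONS[name])]
    return (name, *children)


def init_tree(rng: random.Random, n_features: int,
              init_depth: Tuple[int, int] = (2, 2),
              init_method: str = "half and half") -> Tree:
    """Sample one random expression tree over the terminals x0 .. x{n_features - 1}."""
    if init_method == "half and half":
        method = rng.choice(["full", "grow"])  # simplification: one strategy per tree
    elif init_method == "full":
        method = "full"
    else:
        raise ValueError(f"unknown init_method: {init_method!r}")
    max_depth = rng.randint(init_depth[0], init_depth[1])  # inclusive (min, max)
    terminals = [f"x{i}" for i in range(n_features)]
    return _build(rng, terminals, 0, max_depth, method)


def tree_depth(tree: Tree) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if isinstance(tree, str):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])


def to_infix(tree: Tree) -> str:
    """Render a tree in function-call notation, e.g. add(x0, mul(x1, x2))."""
    if isinstance(tree, str):
        return tree
    name, *children = tree
    return f"{name}({', '.join(to_infix(c) for c in children)})"


if __name__ == "__main__":
    rng = random.Random(0)
    tree = init_tree(rng, n_features=3, init_depth=(2, 2), init_method="half and half")
    print(to_infix(tree), "depth:", tree_depth(tree))
```

Representing trees as nested tuples keeps the sketch dependency-free; a real generator would more likely reuse gplearn's own program representation.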

Code Sample

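import numpy as np

# assumes SimulatedDataset has been imported from this repository

if __name__ == '__main__':
    for i in range(50):  # generate 50 datasets, one per random seed
        dataset = SimulatedDataset(n_features=3, support=0.8, size=10000,
                                   n_confused_column=1, n_ext_column=1,
                                   with_column_names=False,
                                   init_depth=(2, 2), init_method='half and half',
                                   seed=np.random.RandomState(i))
        # file name encodes the parameters: x = n_features, sup = support,
        # cx = n_confused_column, ex = n_ext_column, followed by the seed index
        dataset.to_csv(f"./x_3_sup_0.8_cx_1_ex_1_{i}.csv")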
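The README does not define `support`. Assuming it carries its usual data-mining meaning, the fraction of records on which the generated assertion holds, `support=0.8` with `size=10000` would mean that about 8,000 of the 10,000 rows satisfy the assertion. The toy sketch below illustrates that reading only: the formula `y == x0 * x1 + x2`, the column names, the noise, and the helpers `make_rows` and `measure_support` are hypothetical and do not reflect how `SimulatedDataset` actually generates its data.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, int]


def make_rows(size: int, support: float, rng: random.Random) -> List[Row]:
    """Toy generator: the assertion y == x0 * x1 + x2 holds on round(size * support) rows."""
    n_hold = round(size * support)
    rows = []
    for i in range(size):
        x0, x1, x2 = (rng.randint(1, 100) for _ in range(3))
        y = x0 * x1 + x2
        if i >= n_hold:
            y += rng.randint(1, 10)  # a non-zero offset breaks the assertion on the remaining rows
        rows.append({"x0": x0, "x1": x1, "x2": x2, "y": y})
    rng.shuffle(rows)  # mix satisfying and violating rows
    return rows


def measure_support(rows: List[Row], assertion: Callable[[Row], bool]) -> float:
    """Fraction of rows for which the assertion holds."""
    return sum(1 for row in rows if assertion(row)) / len(rows)


if __name__ == "__main__":
    rows = make_rows(size=10000, support=0.8, rng=random.Random(0))
    print(measure_support(rows, lambda r: r["y"] == r["x0"] * r["x1"] + r["x2"]))  # prints 0.8
```

Integer columns are used so that the equality check is exact; with floating-point data a tolerance would be needed when measuring support.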
