bnlearn - Library for Causal Discovery using Bayesian Learning


bnlearn is a Python package for causal discovery by learning the graphical structure of Bayesian networks, with parameter learning, inference, and sampling methods. Because probabilistic graphical models can be difficult to use, bnlearn (this package) is built on top of the pgmpy package and contains the most-wanted pipelines. Navigate to the API documentation for more detailed information.

⭐️ Star this repo if you like it ⭐️

Read the Medium blog for more details.


On the documentation pages you can find detailed information about the workings of bnlearn, with many examples.

Installation

It is advisable to create a new environment (e.g., with Conda):

```bash
conda create -n env_bnlearn python=3.10
conda activate env_bnlearn
```

Install bnlearn from PyPI:

```bash
pip install bnlearn
```

Install bnlearn from the GitHub source:

```bash
pip install git+https://github.com/erdogant/bnlearn
```
The following functions are available after installation:

```python
# Import library
import bnlearn as bn

# Structure learning
bn.structure_learning.fit()

# Compute edge strength with the test statistic
bn.independence_test(model, df, test='chi_square', prune=True)

# Parameter learning
bn.parameter_learning.fit()

# Inference
bn.inference.fit()

# Make predictions
bn.predict()

# Based on a DAG, draw as many samples as you want
bn.sampling()

# Load well-known examples to play around with, or load your own .bif file
bn.import_DAG()

# Load a simple dataframe of the sprinkler dataset
bn.import_example()

# Compare two graphs
bn.compare_networks()

# Plot graph
bn.plot()
bn.plot_graphviz()

# Make a directed graph undirected
bn.to_undirected()

# Convert to a one-hot data matrix
bn.df2onehot()

# Derive the topological ordering of the (entire) graph
bn.topological_sort()
```

See the examples below for the exact workings of these functions.
The following methods are also included:
  • Inference
  • Sampling
  • Comparing two networks
  • Loading .bif files
  • Converting directed to undirected graphs

Method overview

Learning a Bayesian network can be split into the following problems, all of which are implemented in this package for discrete, continuous, and mixed data sets:

  • Structure learning: Given the data: Estimate a DAG that captures the dependencies between the variables.

    • Structure learning can be performed in multiple ways:
      • Constraintsearch or PC
      • Exhaustivesearch
      • Hillclimbsearch
      • NaiveBayes
      • TreeSearch
        • Chow-Liu
        • Tree-augmented Naive Bayes (TAN)
      • Direct-LiNGAM (for continuous and hybrid datasets)
      • ICA-LiNGAM (for continuous and hybrid datasets)
  • Parameter learning: Given the data and DAG: Estimate the (conditional) probability distributions of the individual variables.

  • Inference: Given the learned model: Determine the exact probability values for your queries.

Examples

A structured overview of all examples is available on the documentation pages.

Structure learning
Parameter learning
Inferences
Sampling
Complete examples
Plotting
Various

Various basic examples

```python
import bnlearn as bn

# Example dataframe sprinkler_data.csv can be loaded with:
df = bn.import_example()
# df = pd.read_csv('sprinkler_data.csv')

# df looks like this:
#      Cloudy  Sprinkler  Rain  Wet_Grass
# 0         0          1     0          1
# 1         1          1     1          1
# 2         1          0     1          1
# 3         0          0     1          1
# 4         1          0     1          1
# ..      ...        ...   ...        ...
# 995       0          0     0          0
# 996       1          0     0          0
# 997       0          0     1          0
# 998       1          1     0          1
# 999       1          0     1          1

# Learn the structure
model = bn.structure_learning.fit(df)
# Compute edge strength with the chi-square test statistic
model = bn.independence_test(model, df)
# Plot
G = bn.plot(model)
```

  • Choosing various method types (`methodtype`) and scoring types (`scoretype`):

```python
model_hc_bic  = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
model_hc_k2   = bn.structure_learning.fit(df, methodtype='hc', scoretype='k2')
model_hc_bdeu = bn.structure_learning.fit(df, methodtype='hc', scoretype='bdeu')
model_ex_bic  = bn.structure_learning.fit(df, methodtype='ex', scoretype='bic')
model_ex_k2   = bn.structure_learning.fit(df, methodtype='ex', scoretype='k2')
model_ex_bdeu = bn.structure_learning.fit(df, methodtype='ex', scoretype='bdeu')
model_cl      = bn.structure_learning.fit(df, methodtype='cl', root_node='Wet_Grass')
model_tan     = bn.structure_learning.fit(df, methodtype='tan', root_node='Wet_Grass', class_node='Rain')
```

Example: Parameter Learning

```python
import bnlearn as bn

# Import dataframe
df = bn.import_example()

# As an example, set CPD=False, which returns an "empty" DAG without parameters
model = bn.import_DAG('sprinkler', CPD=False)

# Now learn the parameters of the DAG from df
model_update = bn.parameter_learning.fit(model, df)

# Make plot
G = bn.plot(model_update)
```

Example: Inference

```python
import bnlearn as bn

model = bn.import_DAG('sprinkler')
query = bn.inference.fit(model, variables=['Rain'], evidence={'Cloudy': 1, 'Sprinkler': 0, 'Wet_Grass': 1})
print(query)
print(query.df)

# Let's try another inference
query = bn.inference.fit(model, variables=['Rain'], evidence={'Cloudy': 1})
print(query)
print(query.df)
```

References

Contributors

Setting up and maintaining bnlearn has been possible thanks to its users and contributors.

Citation

Please cite bnlearn in your publications if it has been useful for your research. See the repository sidebar for citation information.

Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • If you wish to buy me a coffee for this work, it is very much appreciated :)