Integrate pgmpy for Bayesian networks capabilities #47

Open
ceteri opened this issue Dec 22, 2020 · 6 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@ceteri
Collaborator

ceteri commented Dec 22, 2020

Integrate pgmpy for statistical inference in Bayesian networks.

Depends on: #26

@ceteri ceteri added the enhancement New feature or request label Dec 22, 2020
@ceteri ceteri added this to the Release 0.2.x milestone Dec 22, 2020
@ceteri ceteri added the good first issue Good for newcomers label Mar 5, 2021
@ceteri ceteri modified the milestones: Release 0.3.x, Release 0.4.x Mar 7, 2021
@Ankush-Chander
Collaborator

Hey @ceteri,

I need some pointers to understand this requirement better.

Thanks in advance.

@ceteri
Collaborator Author

ceteri commented Apr 25, 2021

Thank you @Ankush-Chander!
Here's an idea, if this seems reasonable as an approach?

pgmpy implements several kinds of modeling, sampling, and inference, although our shortest path is probably to focus on discrete Bayesian networks. This is also one of the top-requested features to add to kglab from our ongoing survey.

Next steps are:

  1. Build an example Discrete Bayesian model in pgmpy which produces known results – which we can use to verify the integration later
    • for example, using one of the examples given in their documentation
    • or, ideally, based on data in the recipe progressive example that we use
  2. Represent the data from this model in an RDF graph
  3. Develop a new class method for kglab.KnowledgeGraph or probably even better for kglab.Subgraph that loads the pgmpy model data from the KG
  4. Verify results from above, to use as a unit test
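
As a rough illustration of what "known results" could mean for steps 1 and 4, here is a minimal sketch that computes a posterior by brute-force enumeration in plain Python, without pgmpy. The network (Rain -> WetGrass <- Sprinkler), its probabilities, and all names are hypothetical and are not the example used in this thread. The point is only that a tiny network has an exact answer that a later pgmpy-based integration could be checked against.

```python
# Hypothetical toy network: Rain -> WetGrass <- Sprinkler, all variables binary.
# The probabilities below are made up for illustration only.
P_RAIN = {1: 0.2, 0: 0.8}
P_SPRINKLER = {1: 0.1, 0: 0.9}
# P(WetGrass = 1 | Rain, Sprinkler), keyed by (rain, sprinkler)
P_WET_GIVEN = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}


def joint(rain, sprinkler, wet):
    """Joint probability of one full assignment, using the chain rule of the network."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain(wet):
    """P(Rain | WetGrass = wet), obtained by summing out Sprinkler and normalizing."""
    scores = {r: sum(joint(r, s, wet) for s in (0, 1)) for r in (0, 1)}
    total = sum(scores.values())
    return {r: p / total for r, p in scores.items()}


# Baseline that an integration test could compare against.
baseline = posterior_rain(wet=1)
```

A unit test for the integration could then assert that the posterior returned by the library-backed model matches `baseline` within a small tolerance.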

We can also decide whether to have some additional wrappers for pgmpy and its results. On the one hand, it's great to wrap results into pandas dataframes and other conveniences for data science workflows. On the other hand, it's probably better to allow people to simply use pgmpy operations on the model directly. The latter approach is how we've handled integration of PyTorch, PyVis, etc., i.e., not to intermediate unless there are pain points that need to be corrected (as in SPARQL queries).

How does that sound as an approach?

@Ankush-Chander
Collaborator

Hey @ceteri

I tried to follow the steps above, but I couldn't find any widely accepted standard RDF representation of Bayesian networks. I'll need your help with that.

Once we settle on one, we can give users a pathway from a standard BN RDF file to a KG to a pgmpy model. The rest of the operations can be done directly with pgmpy.

Thanks

@ceteri
Collaborator Author

ceteri commented May 7, 2021

Hi @Ankush-Chander, good point! The way I described it above, moving from RDF => pgmpy wouldn't work directly, and there's no standard representation.

What I should have described better:

  1. Choose a simple example Bayesian network problem
  2. Build a solution for it in pgmpy, so we have a known baseline to test against
  3. At that point, I'll represent in RDF (as idiomatic as possible; this becomes simpler after RDF-star is available)
  4. Then we can scope how best to use the Subgraph classes to transform into pgmpy
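
To make steps 3 and 4 a little more concrete, here is a minimal sketch of pulling a network structure out of triples. It uses plain Python tuples in place of an RDF library, and the namespace, the `influences` predicate, and the helper name are all hypothetical, since no standard vocabulary exists. It only shows the shape of the transformation, not how kglab or its Subgraph classes would do it.

```python
# Hypothetical namespace and predicate; nothing here is a standard vocabulary.
EX = "http://example.org/bn#"

# Triples as (subject, predicate, object) tuples, standing in for an RDF graph.
triples = [
    (EX + "Rain", EX + "influences", EX + "WetGrass"),
    (EX + "Sprinkler", EX + "influences", EX + "WetGrass"),
    (EX + "Rain", EX + "label", "Rain"),  # unrelated triple, ignored below
]


def edges_from_triples(triples, predicate):
    """Return (parent, child) pairs of local names for triples using `predicate`."""

    def local(iri):
        # Keep only the fragment after '#', e.g. '...bn#Rain' -> 'Rain'.
        return iri.rsplit("#", 1)[-1]

    return [(local(s), local(o)) for s, p, o in triples if p == predicate]


# Directed edge list describing the network structure.
edges = edges_from_triples(triples, EX + "influences")
```

The resulting list of (parent, child) pairs is the kind of structure a Bayesian network library would need to build its graph. The conditional probability tables would need their own representation, which this sketch leaves open.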

If the selected example problem can involve the "progressive example" of recipes used in the tutorial, that would be ideal, although it isn't necessary for building out a first integration. The priority is that the initial test case be simple. We can always construct recipe examples later :)

Does that describe the problem better?

The intention for this is to illustrate how to use a completely different graph technology (Bayesian networks) on graph data, which can complement the other approaches we have with NetworkX, RDFlib, pslpython, PyTorch, etc.

Many thanks,
Paco

@ceteri ceteri removed this from the Release 0.4.x milestone May 10, 2021
@Ankush-Chander
Collaborator

Hey @ceteri,

It took a while to get my head around Bayesian inference.

Here's the test example.

P.S.: The original cancer model, although simple, made some very gloomy assumptions, so I had to choose something positive :)
I hope it's simple enough for our purpose.

3. At that point, I'll represent in RDF (as idiomatic as possible; this becomes simpler after RDF-star is available)

Any pointers on step 3 will be helpful for me to continue.

Thanks in advance,
Ankush

@ceteri
Collaborator Author

ceteri commented May 19, 2021

Wonderful, thank you @Ankush-Chander !

Now I get to wrangle with some RDF representation, hopefully with not too much reification required :)


2 participants