Add pyAgrum package and MIIC algorithm? #115

Open
bdatko opened this issue Jun 7, 2024 · 15 comments

Comments

@bdatko

bdatko commented Jun 7, 2024

I think pyAgrum would be a great addition to the list of algorithms. As far as I can tell, benchpress does not yet include a comparison using the Multivariate Information-based Inductive Causation (MIIC) algorithm, which pyAgrum implements. The library also offers a scikit-learn interface for learning classifiers, which should help with integration into benchpress.

@felixleopoldo
Owner

Hi, that sounds like a good idea. In pyAgrum they call the useMIIC function on a learner object (link and link), but it's not totally clear how to pass arguments to the algorithm, such as the choice of score or test function. Do you have some sample usage?
MIIC also seems to be implemented here. Do you know which one to prefer?

@bdatko
Author

bdatko commented Jun 8, 2024

@felixleopoldo useMIIC is part of their lower-level API, but there is a convenience class, pyAgrum.skbn.BNClassifier, whose default learningMethod is MIIC. The other choices for learningMethod are: Chow-Liu, NaiveBayes, Tree-augmented NaiveBayes, MIIC + (MDL or NML), Greedy Hill Climb, Tabu. You can use scoringType in the initializer of pyAgrum.skbn.BNClassifier to pick your flavor: AIC, BIC, BD, BDeu, K2, Log2.

There are examples of using pyAgrum.skbn.BNClassifier in this notebook titled Learning classifiers. Shown below is a call using MIIC (cell 7 from the linked notebook):

#we use now another method to learn the BN (MIIC)
BNTest= skbn.BNClassifier(learningMethod = 'MIIC', prior= 'Smoothing', priorWeight = 0.5,
                          discretizationStrategy = 'quantile', usePR = True, significant_digit = 13)

xTrain, yTrain = BNTest.XYfromCSV(filename = 'res/creditCardTest.csv', target = 'Class')

More examples using BNClassifier can be found in the notebook titled Comparing classifiers (including Bayesian networks) with scikit-learn.

I have only used pyAgrum because I don't know R, so I have never directly compared the two. pyAgrum is a Python wrapper around the aGrum C++ library, so its MIIC implementation is written in C++. This looks similar to how the original authors of MIIC provide a C++ implementation wrapped in R, but I don't know for sure.

Let me know if you need any more help. =)

@felixleopoldo
Owner

Thanks. It seems like they refer to the Bayesian network as a classifier, where one variable is specified as the target? It would be nice if you could show how to do the following two steps:

  1. Learn the graph of a Bayesian network from a CSV data file (in the Benchpress data format) with the relevant parameters for structure learning
  2. Write the adjacency matrix representation of the graph to a CSV file following Benchpress graph format

@bdatko
Author

bdatko commented Jun 12, 2024

  1. Learn the graph of a Bayesian network from a CSV data file (in the Benchpress data format) with the relevant parameters for structure learning

I hope the example below demos what you need.

  2. Write the adjacency matrix representation of the graph to a CSV file following Benchpress graph format

From what I know, there isn't a convenient writer to save the adjacency matrix to CSV, so shown below is a small helper that saves the matrix in the format for benchpress.

The example assumes you have the following installed in your environment: pyAgrum, pandas, scikit-learn. You will need all three to run the example below.

import csv
from pathlib import Path

import pandas as pd
import pyAgrum.skbn as skbn
from pyAgrum import BayesNet


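def adjacency_to_csv(bn: BayesNet, *, to_file: str) -> None:
    """Write the adjacency matrix of bn to a CSV file, with node names as the header row."""
    # map node ids to names; the header assumes ids run from 0 to bn.size() - 1
    id_to_name = {bn.idFromName(name): name for name in bn.names()}

    # newline="" lets the csv module control line endings (avoids blank rows on Windows)
    with Path(to_file).open(mode="w", encoding="utf-8", newline="") as csvfile:
        writer = csv.writer(csvfile)
        # write header
        writer.writerow(id_to_name[col_id] for col_id in range(bn.size()))
        # write rows
        adj_mat = bn.adjacencyMatrix()
        writer.writerows(adj_mat)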


data = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"
).dropna()

data.to_csv("fully_obs_titanic.csv", index=False)

classifier = skbn.BNClassifier(learningMethod="MIIC", scoringType="BIC")
xdata, ydata = classifier.XYfromCSV(filename="fully_obs_titanic.csv", target="survived")
classifier.fit(xdata, ydata)

adjacency_to_csv(classifier.bn, to_file="resulting_adjacency.csv")

Here is the resulting adjacency matrix:

❯ cat resulting_adjacency.csv
survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,1,0,0,0,0,1,0,0,0,0,0,0,0,0
0,0,1,1,0,0,0,0,0,0,1,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
0,1,0,0,0,0,0,0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,1,1,0,0,0,1,0,0,0,0,0

I ran this example with the following environment:

Python 3.11.7
numpy               1.26.4
pandas              2.2.2
pyAgrum             1.14.0
scikit-learn        1.5.0
scipy               1.13.1
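
To sanity-check a matrix like the one above, it can help to read the CSV back as a list of directed edges. The following is a minimal sketch using only the standard library; `edges_from_adjacency_csv` is a hypothetical helper (not part of pyAgrum or Benchpress), and it assumes that a 1 in row i, column j means an edge from node i to node j.

```python
import csv
import io


def edges_from_adjacency_csv(text):
    """Return (tail, head) name pairs for every cell equal to 1.

    Assumes the first row is a header of node names and that rows follow
    the same node order as the header (row = tail, column = head).
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    edges = []
    for tail, row in zip(header, body):
        for head, cell in zip(header, row):
            if cell.strip() == "1":
                edges.append((tail, head))
    return edges


# Small made-up matrix, only to illustrate the format.
sample = "a,b,c\n0,1,0\n0,0,1\n0,0,0\n"
edges = edges_from_adjacency_csv(sample)
print(edges)
```

For a file on disk, the same function could be called with the file's contents, for example `Path("resulting_adjacency.csv").read_text()`.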

@felixleopoldo
Owner

Thanks a lot. So for the target variable (survived), can we just choose the first one in the order?

@bdatko
Author

bdatko commented Jun 12, 2024

For the fit method of BNClassifier you can specify any column in the CSV file as the target, see here. Shown below is the relevant snippet about the target:

Fits the model to the training data provided. The two possible uses of this function are fit(X,y) and fit(data=…, targetName=…). Any other combination will raise a ValueError

  • targetName (str) – specifies the name of the targetVariable in the csv file. Warning: Raises ValueError if either X or y is not None. Raises ValueError if data is None.
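
The rule quoted above (either fit(X, y) or fit(data=…, targetName=…), anything else raises ValueError) can be paraphrased as a small check. This is only an illustrative sketch of the documented behaviour, not pyAgrum's code; `fit_call_style` is a hypothetical stand-in that runs without pyAgrum installed.

```python
def fit_call_style(X=None, y=None, data=None, targetName=None):
    """Classify a fit(...) call according to the documented rule.

    Hypothetical helper: it mirrors the quoted documentation, it is not
    taken from pyAgrum itself.
    """
    if data is None and targetName is None:
        # Array-style call: both X and y are required.
        if X is None or y is None:
            raise ValueError("fit(X, y) needs both X and y")
        return "fit(X, y)"
    # File-style call: X and y must not be given.
    if X is not None or y is not None:
        raise ValueError("data/targetName cannot be combined with X or y")
    if data is None or targetName is None:
        raise ValueError("fit(data=..., targetName=...) needs both arguments")
    return "fit(data=..., targetName=...)"


array_style = fit_call_style(X=[[0, 1]], y=[1])
file_style = fit_call_style(data="fully_obs_titanic.csv", targetName="survived")
print(array_style)
print(file_style)
```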

@felixleopoldo
Owner

Ok!

@phwuil

phwuil commented Jun 18, 2024

Hi @felixleopoldo , many thanks to @bdatko for this "issue".

Actually, BNClassifier is based on the BNLearner class. If you want to test the learning algorithms of pyAgrum, you should use BNLearner.
MIIC is a "constraint-based" method based on mutual information. There is no score, but one can apply corrections (MDL/NML). Of course, you can also add priors for the parameter estimation.

import pyAgrum as gum
learner=gum.BNLearner("test.csv") # MIIC is used by default (some score-based algorithms are also implemented)
learner.useMDLCorrection() # for small datasets
learner.useSmoothingPrior() # smoothing (default weight=1) for parameters
bn=learner.learnBN() # learning

Thanks again to @bdatko. Please tell me if you need some other snippets :-)
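
Since BNLearner returns a plain network rather than a classifier, the Benchpress-style adjacency CSV can also be assembled from node names and arcs directly. Below is a rough sketch under stated assumptions: arcs are (tail_id, head_id) pairs, ids run from 0 to n-1 in the same order as the names, and `adjacency_rows` and `to_csv_text` are hypothetical helpers. Plain Python data stands in for the learned network so the sketch runs without pyAgrum.

```python
import csv
import io


def adjacency_rows(names, arcs):
    """Build a header row plus a 0/1 matrix where matrix[tail][head] = 1."""
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for tail, head in arcs:
        matrix[tail][head] = 1
    return [list(names)] + matrix


def to_csv_text(rows):
    """Serialize rows as CSV text with Unix line endings."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up network A -> B -> C, standing in for a learned structure.
names = ["A", "B", "C"]
arcs = {(0, 1), (1, 2)}
csv_text = to_csv_text(adjacency_rows(names, arcs))
print(csv_text)
```

With a learned network, the names and arcs would presumably come from the BayesNet itself (for instance via `bn.names()`, `bn.idFromName(...)` and `bn.arcs()`); check the pyAgrum documentation for the exact id ordering before relying on this.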

@felixleopoldo
Owner

Hi @phwuil,
thanks for the snippet. Could you show how MIIC could be run on continuous data too?

@phwuil

phwuil commented Jun 20, 2024

Hi @felixleopoldo, thank you for that. pyAgrum is mainly about discrete variables. However, there are 2 solutions for continuous data:
1- automatic discretization
2- CLG (experimental Python model)

1- automatic discretization with pyAgrum.skbn.BNDiscretizer

import pyAgrum as gum
import pyAgrum.skbn as skbn

filename="test.csv"
# BNDiscretizer has many options 
disc=skbn.BNDiscretizer()
template=disc.discretizedBN(filename)

# template contains all the (discrete) variables
# that will be used for the learning
learner=gum.BNLearner(filename,template)
learner.useMDLCorrection()
learner.useSmoothingPrior()
bn=learner.learnBN()
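
To make "automatic discretization" a bit more concrete: one common strategy is quantile binning (the BNClassifier example earlier in this thread passes discretizationStrategy = 'quantile'). The sketch below uses only the standard library and is not pyAgrum's implementation; `quantile_bins` is a hypothetical helper and the values are made up.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using quantile cut points."""
    # n_bins - 1 cut points that split the data into roughly equal-sized groups
    cuts = quantiles(values, n=n_bins)
    # a value's bin index is the number of cut points less than or equal to it
    labels = [bisect_right(cuts, v) for v in values]
    return labels, cuts


# Made-up continuous column, e.g. a fare-like variable.
fares = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels, cuts = quantile_bins(fares, n_bins=4)
print(cuts)
print(labels)
```

Each continuous column is replaced by its bin index, after which a discrete learner can treat it like any other categorical variable.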

@phwuil

phwuil commented Jun 20, 2024

2- CLG: new CLG implementation in pyAgrum 1.14.0
pyAgrum.CLG tutorial

import pyAgrum.clg as gclg

filename="test.csv"
# no hybrid learning: pure CLG data
learner = gclg.CLGLearner(filename)
clg = learner.learnCLG()

@felixleopoldo
Owner

felixleopoldo commented Jun 24, 2024

OK. There is a new pyagrum branch, where you can try pyagrum by running:
snakemake --cores all --use-singularity --configfile workflow/rules/structure_learning_algorithms/pyagrum/pyagrum.json --rerun-incomplete
If you know any data scenario where it performs well, let me know!

@phwuil

phwuil commented Jun 24, 2024

Hi @felixleopoldo, thank you for this. I have to admit that I did not know about it before it was pointed out to me by @bdatko. Thanks to both of you.
So I will have to learn how to use it. :-) (If you have THE good reference to help, please tell me :-) !)

@felixleopoldo
Owner

I see, no worries :) If you mean the main reference to Benchpress, it is here. It is not mentioned there, but you can also run it under WSL on Windows.

@felixleopoldo
Owner

Hi, I have added pyagrum to benchpress.
You may try the example in the docs by:

snakemake --cores all --use-singularity --configfile workflow/rules/structure_learning_algorithms/pyagrum/config.json

Feel free to update it with more of the parameters/functionalities from pyagrum.


3 participants