Custom Dataset format #34

Open
arita37 opened this issue Sep 4, 2022 · 5 comments
Labels
question Further information is requested

Comments

@arita37

arita37 commented Sep 4, 2022

Hello,

If we want to use this on our own dataset (tabular CSV), what format should the dataset be in?

Thanks

@felixleopoldo
Owner

Hi,

The data should still be CSV files, where the first line gives the labels of the variables. For multinomial data, the second line should contain the number of states of each variable.

Have a look at this example for continuous data and this example for multinomial data.
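As a rough illustration of the layout described above, here is a minimal Python sketch that writes a header line of variable labels, an optional second line with the number of states (multinomial data only), and then the data rows. The helper name `write_dataset`, the labels, and the values are hypothetical and only serve the example; the linked example files remain the authoritative reference.

```python
import csv
import io


def write_dataset(rows, labels, n_states=None):
    """Return CSV text: labels first, optional state counts second, then data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(labels)  # first line: variable labels
    if n_states is not None:
        if len(n_states) != len(labels):
            raise ValueError("need one state count per variable")
        writer.writerow(n_states)  # second line (multinomial only): number of states
    writer.writerows(rows)
    return buf.getvalue()


# Continuous data: header followed directly by the observations.
continuous_csv = write_dataset([[0.12, -1.3], [0.5, 0.7]], ["X1", "X2"])

# Multinomial data: header, number of states per variable, then the observations.
multinomial_csv = write_dataset([[0, 2], [1, 0]], ["A", "B"], n_states=[2, 3])
```

To write a file to disk instead, the returned text can be saved with `open(path, "w").write(...)`.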

Just let me know if you run into trouble or have any questions.

@arita37
Author

arita37 commented Sep 5, 2022 via email

@felixleopoldo
Owner

felixleopoldo commented Sep 5, 2022

For continuous data, it is recommended to standardize first. For multinomial data, some algorithms may require enumeration of e.g. k states from 0 to k-1. Also, have a look at the Benchpress paper for a further description of the data format.
In general, you might have to transform your data to meet the assumptions of the algorithms you are using. You can do pairwise scatterplots of the data using the ggally_ggpairs module.
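
The two preprocessing steps mentioned above could be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the standardization uses the sample standard deviation (the thread does not say which variant is expected), and states are numbered in sorted order, which is an arbitrary choice.

```python
import statistics


def standardize(column):
    """Rescale a continuous column to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mean) / sd for x in column]


def enumerate_states(column):
    """Map the k distinct states of a column to the integers 0..k-1."""
    codes = {state: i for i, state in enumerate(sorted(set(column)))}
    return [codes[value] for value in column], len(codes)


z = standardize([1.0, 2.0, 3.0])
encoded, k = enumerate_states(["low", "high", "mid", "low"])
```

The value `k` returned by `enumerate_states` is the number of states of the variable, which is what the second line of a multinomial data file is described as containing.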

I hope this helps.

felixleopoldo added the question label Sep 5, 2022
@arita37
Author

arita37 commented Sep 5, 2022

Sure, thanks.
Should the column names be in JSON format?

@felixleopoldo
Owner

No, it should be a CSV file. Just look at the other data files here and you will see the structure.
Here is a real data JSON config example that you may also look at. If you don't know the true graph, graph_id should be set to null.
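
For the `graph_id` remark, a small Python sketch of how a null value ends up in a JSON file: Python's `None` is serialized as JSON `null`. Only `graph_id` comes from this thread; the `data_id` key and the file name `my_data.csv` are placeholders assumed for illustration, so the actual key names and nesting should be taken from the linked config example.

```python
import json

# Hypothetical data entry; only "graph_id" is confirmed by the thread.
data_entry = {
    "graph_id": None,          # true graph unknown, becomes null in JSON
    "data_id": "my_data.csv",  # assumed key and placeholder file name
}

config_text = json.dumps(data_entry, indent=2)
print(config_text)
```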
