Custom Dataset format #34

arita37 · 2022-09-04T13:23:06Z

Hello,

If we want to use this on our own dataset (tabular CSV),

what format should the dataset be in?

thx

felixleopoldo · 2022-09-04T14:33:52Z

Hi,

Custom datasets are still CSV files, where the first line gives the labels of the variables. For multinomial data, the second line should contain the number of states of each variable.

Have a look at this example for continuous data and this example for multinomial data.
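To make the layout described above concrete, here is a minimal sketch in Python using only the standard library. The function name `write_multinomial_csv`, the labels `A`, `B`, `C`, and the sample values are hypothetical and made up for illustration. It also assumes that states are coded 0 to k-1, so that the number of states can be derived as the column maximum plus one; if your data is coded differently, supply the counts yourself.

```python
import csv
import io


def write_multinomial_csv(handle, labels, rows):
    """Write discrete data: labels, then state counts, then one sample per line."""
    # Assumption: states are coded 0..k-1, so k = max value in the column + 1.
    n_states = [max(column) + 1 for column in zip(*rows)]
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(labels)    # line 1: variable labels
    writer.writerow(n_states)  # line 2: number of states per variable
    writer.writerows(rows)     # remaining lines: the samples


# Hypothetical toy data with three variables and three samples.
buf = io.StringIO()
write_multinomial_csv(buf, ["A", "B", "C"], [[0, 1, 2], [1, 0, 0], [0, 1, 1]])
print(buf.getvalue())
```

For continuous data the same sketch applies without the `n_states` line: the header is followed directly by the samples.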

Just let me know if you run into trouble or have any questions

arita37 · 2022-09-05T00:34:27Z

Sure, thanks. Is any pre-processing required for a new dataset? Are there any constraints on the format? Thanks


felixleopoldo · 2022-09-05T06:27:47Z

For continuous data, it is recommended to standardize first. For multinomial data, some algorithms may require the states to be enumerated, e.g. k states coded from 0 to k-1. Also, have a look at the Benchpress paper for a further description of the data format.
In general, you might have to transform your data to meet the assumptions of the algorithms you are using. You can produce pairwise scatterplots of the data using the ggally_ggpairs module.
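The two pre-processing steps mentioned above could be sketched roughly as follows, again in plain Python. The helper names `standardize` and `enumerate_states` are hypothetical and not part of Benchpress. Two choices here are assumptions: the sample standard deviation is used for scaling, and category labels are mapped to integers in sorted order, which is arbitrary.

```python
import statistics


def standardize(column):
    """Rescale a numeric column to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # assumption: sample (n-1) standard deviation
    return [(x - mean) / sd for x in column]


def enumerate_states(column):
    """Map arbitrary category labels to integer codes 0..k-1; also return k."""
    # Assumption: codes follow the sorted order of the distinct labels.
    codes = {label: i for i, label in enumerate(sorted(set(column)))}
    return [codes[value] for value in column], len(codes)


# Hypothetical toy columns.
z = standardize([2.0, 4.0, 6.0])
states, k = enumerate_states(["low", "high", "mid", "low"])
print(z, states, k)
```

The returned `k` is the value that would go on the second line of a multinomial CSV file for that variable.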

I hope this helps.

arita37 · 2022-09-05T15:47:45Z

Sure, thanks
Should the column names be in JSON format?

felixleopoldo · 2022-09-05T16:43:01Z

No, it should be a CSV file. Just have a look at the other data files here and you will see the structure.
Here is a real data JSON config example that you may also look at. If you don't know the true graph, graph_id should be set to null.

felixleopoldo added the "question" label (Further information is requested) · Sep 5, 2022
