Logistic PCA and PPMI-based methods? #36
Which method to use depends on the research question. But if you start with exploration, an unsupervised approach is always a good starting point. Try the package clusteval, and make sure to use an appropriate metric for binary data, such as Hamming distance. Alternatively, you can use hypergeometric tests to find significantly overlapping features; in that case, try the HNet library. More details can be found in this blog. Perhaps SVD analysis is more appropriate than PCA (it is available as an option in the pca library). Or indeed your suggestion, logistic PCA.
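To make the two suggestions above concrete, here is a minimal sketch using plain SciPy rather than clusteval or HNet (whose APIs are not shown in this thread). The toy matrix `X`, the helper name `overlap_pvalue`, and the 5% noise level are assumptions made purely for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import hypergeom

# Toy binary data (assumption): 200 samples x 6 Yes/No features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6)).astype(bool)

# Make feature 1 a noisy copy of feature 0 so there is a real overlap to find.
X[:, 1] = X[:, 0]
flip = rng.random(200) < 0.05
X[flip, 1] = ~X[flip, 1]

# Pairwise Hamming distance between samples: the fraction of features that differ.
# This square matrix can be passed to any clustering routine that accepts
# precomputed distances.
D = squareform(pdist(X, metric="hamming"))

def overlap_pvalue(a, b):
    """One-sided hypergeometric test: P(overlap >= observed) under independence."""
    n_total = a.size             # population size
    n_a = int(a.sum())           # samples where feature a is 1
    n_b = int(b.sum())           # samples where feature b is 1
    k = int((a & b).sum())       # samples where both are 1
    return hypergeom.sf(k - 1, n_total, n_a, n_b)

p_related = overlap_pvalue(X[:, 0], X[:, 1])    # noisy copies of each other
p_unrelated = overlap_pvalue(X[:, 2], X[:, 3])  # independent columns
print(p_related, p_unrelated)
```

When many feature pairs are tested this way, the p-values should be corrected for multiple testing before calling any overlap significant.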
For some clarity: I tried running a Python implementation of logistic PCA, but it crashed twice while I was experimenting with the VES performance vs. personality study, which has Yes/No personality questions. Maybe the native implementation eats up too much memory. "Significant overlapping features" is one of the things I am looking for with PCA-like methods, but the data is overwhelmingly binary. Q: why is cluster evaluation useful in a binary-data dimensionality reduction + feature selection + regression task?
Also, secondary discovery:
Currently I am awaiting datasets in a "liked items by user" format, in which certain items are similar in nature.
Currently there are a few ways of reducing dimensionality:
What are the trade-offs and characteristics of each method? Are there other methods for a large number of binary data columns?
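For reference, one possible reading of a "PPMI-based" method on "liked items by user" data is sketched below: build an item-item co-occurrence matrix, convert it to positive pointwise mutual information, and factorize it with SVD. This is only an illustrative assumption about what such a pipeline could look like; the random `likes` matrix, the `ppmi` helper, and `k = 5` are all hypothetical.

```python
import numpy as np

def ppmi(counts):
    """Positive PMI of a co-occurrence count matrix: max(0, log(observed / expected))."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total          # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0          # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)

# Hypothetical data: 500 users x 40 items, 1 = user liked the item.
rng = np.random.default_rng(0)
likes = (rng.random((500, 40)) < 0.1).astype(float)

# Item-item co-occurrence: number of users who liked both items.
cooc = likes.T @ likes
np.fill_diagonal(cooc, 0)                 # ignore an item co-occurring with itself

M = ppmi(cooc)

# Keep the top-k singular directions as dense item embeddings.
k = 5
U, s, _ = np.linalg.svd(M)
item_vecs = U[:, :k] * np.sqrt(s[:k])

# A user can then be summarized by summing the vectors of the items they liked.
user_vecs = likes @ item_vecs
print(item_vecs.shape, user_vecs.shape)
```

Compared with logistic PCA, this route needs only a single SVD of an items-by-items matrix, so memory scales with the number of items rather than the number of users.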