check existing kaggle models #4
Comments
I'm going to check the pinned example (random forest) and the BERT fine-tuning model on Kaggle.
The pinned example (random forest) has been reproduced, and the public score for that model is 0.263. The Jupyter notebook has been uploaded as models/Leash_Tutorial_test.ipynb in this repository.
Is it affected by "LIMIT 30000" in the SQL code?
…On Tue, Jun 11, 2024 at 1:39 PM Peng Wang ***@***.***> wrote:
The pinned example (random forest) has been reproduced and the public
score for that model is 0.263. The jupyter notebook has been uploaded in
the models/Leash_Tutorial_test.ipynb in this repository.
Yes, you are right: in the tutorial the model was trained on only
30000+30000 samples. I will try training on the whole training dataset
and see how it performs.
…On Tue, Jun 11, 2024 at 1:59 PM Kai Wang ***@***.***> wrote:
is it affected by "LIMIT 30000" in the SQL code?
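To illustrate the point about "LIMIT 30000", here is a minimal sketch of how a per-class LIMIT caps the training set at a balanced subsample. It uses Python's built-in sqlite3 on a toy in-memory table rather than the tutorial's actual query or data. The table name `train`, the columns `id` and `binds`, and the helper `balanced_sample` are all assumptions made for illustration.

```python
import sqlite3

def balanced_sample(con, per_class):
    """Return up to `per_class` random rows for each value of `binds`.

    Hypothetical helper: the table and column names are assumptions.
    """
    query = """
        SELECT * FROM (SELECT id, binds FROM train WHERE binds = 0
                       ORDER BY random() LIMIT :n)
        UNION ALL
        SELECT * FROM (SELECT id, binds FROM train WHERE binds = 1
                       ORDER BY random() LIMIT :n)
    """
    return con.execute(query, {"n": per_class}).fetchall()

# Toy table standing in for the training data: 1000 rows, 50 of them positive.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (id INTEGER PRIMARY KEY, binds INTEGER)")
con.executemany(
    "INSERT INTO train (id, binds) VALUES (?, ?)",
    [(i, 1 if i % 20 == 0 else 0) for i in range(1000)],
)

# With LIMIT 30 per class, only 60 of the 1000 rows are ever used for training.
rows = balanced_sample(con, per_class=30)
n_pos = sum(binds for _, binds in rows)
n_neg = len(rows) - n_pos
print(len(rows), n_pos, n_neg)
```

Raising or removing the limit changes both the amount of training data and the class balance the model sees, which is why the score may differ when training on the full dataset.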
I have uploaded my notebook for BERT fine-tuning (using 60000 samples), and a current neural network model using all of the split data (230M training, 56M validation). The Morgan fingerprints for all of the split data are generated in chunks (500K each) and saved as NumPy array files.
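The chunked approach described above can be sketched roughly as follows. This is not the code from the uploaded notebook: `toy_featurize` is a placeholder that is NOT a Morgan fingerprint (a real pipeline would presumably call a cheminformatics library such as RDKit there), and the function names, file naming scheme, and bit width are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

def toy_featurize(smiles, n_bits=64):
    """Placeholder featurizer (NOT a Morgan fingerprint): sets bits from characters."""
    bits = np.zeros(n_bits, dtype=np.uint8)
    for i, ch in enumerate(smiles):
        bits[(ord(ch) * 31 + i) % n_bits] = 1
    return bits

def write_chunks(smiles_list, out_dir, chunk_size, featurize=toy_featurize):
    """Featurize `smiles_list` chunk by chunk and save each chunk as a .npy file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(smiles_list), chunk_size):
        chunk = smiles_list[start:start + chunk_size]
        arr = np.stack([featurize(s) for s in chunk])  # shape: (len(chunk), n_bits)
        path = out_dir / f"fp_chunk_{start // chunk_size:05d}.npy"
        np.save(path, arr)
        paths.append(path)
    return paths

def iter_chunks(paths):
    """Load the saved chunks one at a time so only one is in memory at once."""
    for path in paths:
        yield np.load(path)

# Tiny demo with a chunk size of 2 instead of 500K.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCC"]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_chunks(smiles, tmp, chunk_size=2)
    shapes = [a.shape for a in iter_chunks(paths)]
print(shapes)
```

Writing one file per chunk keeps peak memory bounded by the chunk size, and a training loop can stream the files back in order.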
Some people have shared their code using XGBoost and random forest for the prediction. We can borrow their code for both data processing and prediction, so that we do not need to start from scratch. Let's compile the information in this issue.
A pinned example is https://www.kaggle.com/code/andrewdblevins/leash-tutorial-ecfps-and-random-forest. We can reproduce it to learn how to process the data with Parquet and how to use 42 as the random seed, so that training/testing stays consistent across different models in the future.
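As a minimal sketch of why a fixed seed keeps training/testing consistent, the snippet below splits a list of row ids with Python's standard `random` module. The helper name `train_test_split_ids` and the 20% test fraction are assumptions for illustration, not taken from the pinned notebook. scikit-learn's `train_test_split` accepts a `random_state` argument that serves the same purpose.

```python
import random

SEED = 42  # the seed mentioned in this issue

def train_test_split_ids(ids, test_fraction=0.2, seed=SEED):
    """Shuffle `ids` with a seeded local RNG and split into (train, test)."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

ids = list(range(100))
train_a, test_a = train_test_split_ids(ids)
train_b, test_b = train_test_split_ids(ids)

# Same seed, same input order: both calls produce the identical split.
print(train_a == train_b, test_a == test_b)
```

As long as every model uses the same seed and the same input ordering, they are all evaluated on the same held-out rows, which makes their scores comparable.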