Ultrafast BERT #50
brunosan
started this conversation in
Related work
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
This paper just dropped:
The basic idea, if I understand correctly, is intuitive. They modify the loss to favor choosing a decreasing amount of nodes as learning progresses. So at first the model learns almost normally, but then learns to cluster semantics based on the input.
If this indeed the case, it stands to reason to be very useful for us, since geospatial mapping could naturally cluster those branches geographically, by sensor, time, ...
✅ Code license is MIT
Beta Was this translation helpful? Give feedback.
All reactions