🚨New DeepSeek Model Incoming🚨
but first they release the paper describing generative reward modeling (GRM) via Self-Principled Critique Tuning (SPCT)
looking forward to DeepSeek-GRM!
arxiv.org/abs/2504.02495
one trick they used was to replace scalar grades with critiquing the source
which, yeah, that does seem like it would help