Thinking Machines: Tinker LoRA training API
Thinking Machines announced their first product, telegraphed by a highly detailed blog post earlier this week that lent legitimacy to LoRA training
the idea: LoRA works really well for most companies, so Tinker makes it easy to train
thinkingmachines.ai/tinker/
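A rough sketch of what a hosted loop like this looks like. The client and method names below follow the announcement's examples, but treat the exact signatures (and the placeholder dataset) as assumptions, not the real API:

```python
# Hypothetical Tinker-style loop: you own the data and the training loop,
# the service owns the GPUs and the LoRA machinery. Names are assumptions.
import tinker

service = tinker.ServiceClient()
trainer = service.create_lora_training_client(
    base_model="meta-llama/Llama-3.1-8B",  # assumed model id
    rank=32,                               # LoRA rank is the main capacity knob
)

batches = [...]  # your tokenized examples, in whatever format the API expects

for batch in batches:
    trainer.forward_backward(batch, loss_fn="cross_entropy")
    trainer.optim_step(tinker.types.AdamParams(learning_rate=1e-4))
```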
The Blog! let’s break it down!
gist: LoRA is just as good as full fine-tuning (FullFT) as long as your dataset is small and you’re not doing pretraining-scale training
it works extremely well for RL, which makes sense: RL rewards carry very little information per episode, so low-rank capacity is plenty
thinkingmachines.ai/blog/lora/
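For context, the whole mechanism fits in one class. A minimal PyTorch sketch (my illustration, not the blog’s code) of a LoRA linear layer: the frozen base weight plus a trainable low-rank update, with B zero-initialized so step 0 is exactly the base model:

```python
# Minimal LoRA linear layer: base(x) + (alpha/rank) * x @ A^T @ B^T.
# Only A and B train; the base weight stays frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 32, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # FullFT would train these; LoRA freezes them
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(4096, 4096), rank=32)
out = layer(torch.randn(2, 4096))  # identical to the base layer at init
```

Since only A and B train, the optimizer state shrinks by the same factor, which is most of the practical win.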
LoRA works best when applied to all parts of the model
i.e. attention-only doesn’t work well (see the PEFT sketch below)
for MoE, that means you need training data that exercises all experts, which makes MoE quite a bit harder
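A sketch of “apply it everywhere” using Hugging Face PEFT (my example, not Tinker’s stack). The module names are Llama-style and are an assumption; check your model’s actual layer names:

```python
# Apply LoRA to attention AND MLP projections, not attention alone.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # MLP; skipping these hurts
    ],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a tiny fraction of total params
```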
Hyperparameters: Rank
the higher the rank, the bigger the capacity. And yes, it absolutely can approach FullFT, especially on small datasets
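To make “capacity” concrete, a back-of-envelope parameter count (hidden size 4096 is my assumption, roughly 8B-model scale):

```python
# A LoRA of rank r on a d_out x d_in weight adds r * (d_in + d_out)
# trainable params, vs d_in * d_out for fully fine-tuning that layer.
d_in = d_out = 4096  # assumed hidden size

full = d_in * d_out
for r in (1, 32, 256):
    lora = r * (d_in + d_out)
    print(f"rank {r:>3}: {lora:>9,} params ({lora / full:.2%} of full layer)")
# rank   1:     8,192 params (0.05% of full layer)
# rank  32:   262,144 params (1.56% of full layer)
# rank 256: 2,097,152 params (12.50% of full layer)
```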
Batch size: keep it modest; LoRA’s loss degrades faster than FullFT’s as batch size grows
Attention-only LoRA consistently underperforms MLP-only LoRA
RL with LoRA can match FullFT, even at ranks as low as 1
THIS IS HUGE
if you’ve been thinking about RL or RL environments, you should absolutely be thinking about LoRA. it would be idiotic not to consider it
also: LoRAs stack, so all these RL environments can be shareable in new ways (toy sketch below)
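A toy illustration of the stacking point (pure PyTorch; everything here is hypothetical): low-rank updates are additive deltas, so adapters trained on different environments can be merged onto one base weight.

```python
# LoRA updates are additive, so independently trained adapters can be
# stacked onto the same frozen base weight.
import torch

d, r = 4096, 1  # rank-1 adapters, as in the RL results above
W = torch.randn(d, d)  # stand-in for a frozen base weight

# Pretend these (B, A) pairs came from two different RL environments.
env1 = (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01)
env2 = (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01)

W_eff = W.clone()
for B, A in (env1, env2):
    W_eff += B @ A  # each adapter contributes an independent low-rank delta
```

PEFT exposes a similar merge operation (`PeftModel.add_weighted_adapter`) for combining named adapters, so the sharing story isn’t just theoretical.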