Bluesky Thread

the owners of TikTok scandalously release:

REER: a learning method that exposes the logic that led to a good result

Booty: when shaken properly elicits creative writing

ASS: (they couldn’t get an acronym to work, but i’m sure they tried)

huggingface.co/papers/2509....
Paper page - Reverse-Engineered Reasoning for Open-Ended Generation
seriously this is very cool! all jokes aside

with REER, they discover a more scalable alternative to RL. instead of teaching *why* behavior leads to a good result, it *deconstructs* what led to a correct result

it feels like the unsupervised RL that we need
The image is a diagram contrasting forward reasoning approaches with a new method called Reverse-Engineered Reasoning (REER).

Diagram Breakdown:
	1.	Left Side – Build Reasoning “forward”:
	•	User Request → arrows lead to two costly methods:
	•	Reinforcement Learning (RL): Illustrated with a maze, reward checkmarks, and X marks for failures.
	•	Costly Distillation: Shown with chain links and logos of AI companies/models.
	•	These represent standard approaches that struggle in open-ended domains due to lack of clear reward signals.
	2.	Middle – Deep Reasoning:
	•	A gear system and a human brain icon represent deep reasoning processes.
	3.	Right Side – Recover Reasoning “backward”:
	•	A hooded figure with a bug icon symbolizes REverse-Engineered Reasoning (REER).
	•	REER works backwards from Source QA Pairs (illustrated with icons for questions, answers, money, science, and communication).

⸻

Caption (verbatim):

Figure 1 (Left) Existing methods attempt to build deep reasoning “forwards” for a user request through trial-and-error (RL) or costly distillation, which falter in open-ended domains that lack clear, verifiable reward signals. (Right) We propose a third path for teaching deep reasoning, REverse-Engineered Reasoning (REER). REER works “backwards”, recovering plausible human-like thought process from known-good outputs in open-source Question-Answer (QA) pairs.
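To make the "backwards" idea concrete, here is a toy sketch of recovering a reasoning trace from a known-good answer. Everything in it is an assumption for illustration: the thread and caption do not describe the actual search procedure, `toy_score` is a hypothetical word-overlap stand-in for whatever model-based score a real system would use, and the candidate steps are hand-written rather than generated.

```python
def toy_score(question, trace, answer):
    """Hypothetical stand-in for 'how surprising is the answer given the
    question plus the trace'. Lower is better. It returns the fraction of
    answer words that do not appear in the question or the trace."""
    context = set((question + " " + " ".join(trace)).lower().split())
    answer_words = answer.lower().split()
    missing = sum(1 for w in answer_words if w not in context)
    return missing / len(answer_words)


def reverse_engineer(question, answer, candidate_steps, max_steps=3):
    """Greedy backward search: start from an empty trace and keep adding
    the candidate step that most lowers the score of the known answer."""
    trace = []
    best = toy_score(question, trace, answer)
    for _ in range(max_steps):
        scored = [
            (toy_score(question, trace + [step], answer), step)
            for step in candidate_steps
            if step not in trace
        ]
        if not scored:
            break
        new_best, step = min(scored)
        if new_best >= best:
            break  # no remaining step explains the answer any better
        trace.append(step)
        best = new_best
    return trace, best


# Illustrative QA pair and candidate thoughts (all made up for this sketch).
question = "write a short poem about the sea"
answer = "waves fold into grey silence at dusk"
candidates = [
    "pick imagery: waves and grey water",
    "set the time at dusk",
    "mention a bicycle",
    "end on silence",
]

trace, best = reverse_engineer(question, answer, candidates)
for step in trace:
    print("-", step)
```

The point of the sketch is the direction of the search: the answer is fixed, and the trace is what gets optimized. The irrelevant "mention a bicycle" step is never selected because it does nothing to explain the known answer.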
and it works!

they made a tiny 8B model that holds up well against many large MoE models
The image is a table labeled Table 1, showing a performance comparison of different models on LongBench (LB), HelloBench (HB), and WritingBench (WB). The note above the table states that DeepWriter-8B, an 8B model fine-tuned from scratch, shows competitive performance against leading proprietary models and significantly outperforms other open-source models in its class.

⸻

Table Content:

Model	Base Model	LB	HB-A	HB-B	WB-A	WB-B	WB-C	WB-D	WB-E	WB-F
GPT-4o	-	83.1	83.7	87.6	74.40	73.42	74.38	77.91	75.86	78.08
Claude 3.5	-	89.3	82.9	88.3	59.05	57.68	56.32	59.36	62.00	67.70
Claude 3.7	-	97.8	83.9	93.2	78.24	77.93	76.51	79.37	79.26	80.88
LongWriter-8B	Llama3.1-8b	76.5	80.1	82.6	57.97	53.92	49.08	52.08	52.99	52.08
DeepWriter-8B	Qwen3-8b	91.28	82.64	87.48	72.20	71.76	70.57	70.57	73.65	72.29


⸻

Key Observations:
	•	Claude 3.7 posts the highest score in every column, including LB (97.8) and HB-B (93.2).
	•	DeepWriter-8B beats LongWriter-8B in every column, by roughly 14 to 21 points on the WritingBench (WB) categories.
	•	GPT-4o is balanced across all tasks with solid performance.
	•	Claude 3.5 trails Claude 3.7 by roughly 13 to 20 points on the WB categories.
	•	On WB, DeepWriter-8B beats Claude 3.5 in every category but trails GPT-4o by about 2 to 7 points and Claude 3.7 by about 6 to 9 points. On LB it scores 91.28, ahead of GPT-4o (83.1) and Claude 3.5 (89.3) but behind Claude 3.7 (97.8).
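As a quick check on the WritingBench columns, the sketch below averages the six WB scores per model, using only the numbers from the table above. The unweighted mean is an assumption made here for comparison; the table itself does not report an aggregate.

```python
# WB-A through WB-F scores copied from Table 1 above.
wb_scores = {
    "GPT-4o": [74.40, 73.42, 74.38, 77.91, 75.86, 78.08],
    "Claude 3.5": [59.05, 57.68, 56.32, 59.36, 62.00, 67.70],
    "Claude 3.7": [78.24, 77.93, 76.51, 79.37, 79.26, 80.88],
    "LongWriter-8B": [57.97, 53.92, 49.08, 52.08, 52.99, 52.08],
    "DeepWriter-8B": [72.20, 71.76, 70.57, 70.57, 73.65, 72.29],
}

# Unweighted mean across the six WB categories (an assumption, see above).
wb_mean = {model: sum(s) / len(s) for model, s in wb_scores.items()}

# Print models from highest to lowest mean WB score.
for model, mean in sorted(wb_mean.items(), key=lambda kv: -kv[1]):
    print(f"{model:15s} {mean:6.2f}")
```

On this averaging, DeepWriter-8B lands between GPT-4o and Claude 3.5, and well above LongWriter-8B.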