Bluesky Thread

the owners of TikTok scandalously release:

September 09, 2025

REER: a learning method that exposes the logic that led to a good result

Booty: when shaken properly elicits creative writing

ASS: (they couldn’t get an acronym to work, but i’m sure they tried)

huggingface.co/papers/2509....

Paper page - Reverse-Engineered Reasoning for Open-Ended Generation

seriously this is very cool! all jokes aside

with REER, they discover a more scalable alternative to RL. instead of teaching *why* behavior leads to a good result, it *deconstructs* what led to a correct result

it feels like the unsupervised RL that we need

The image is a diagram contrasting forward reasoning approaches with a new method called Reverse-Engineered Reasoning (REER).

Diagram Breakdown:
1. Left Side – Build Reasoning “forward”:
• User Request → arrows lead to two costly methods:
• Reinforcement Learning (RL): Illustrated with a maze, reward checkmarks, and X marks for failures.
• Costly Distillation: Shown with chain links and logos of AI companies/models.
• These represent standard approaches that struggle in open-ended domains due to lack of clear reward signals.
2. Middle – Deep Reasoning:
• A gear system and a human brain icon represent deep reasoning processes.
3. Right Side – Recover Reasoning “backward”:
• A hooded figure with a bug icon symbolizes REverse-Engineered Reasoning (REER).
• REER works backwards from Source QA Pairs (illustrated with icons for questions, answers, money, science, and communication).

⸻

Caption (verbatim):

Figure 1 (Left) Existing methods attempt to build deep reasoning “forwards” for a user request through trial-and-error (RL) or costly distillation, which falter in open-ended domains that lack clear, verifiable reward signals. (Right) We propose a third path for teaching deep reasoning, REverse-Engineered Reasoning (REER). REER works “backwards”, recovering plausible human-like thought process from known-good outputs in open-source Question-Answer (QA) pairs.
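To make the "backwards" idea concrete, here is a minimal sketch of recovering a thought process from a known-good QA pair. Everything in it is an assumption for illustration, not taken from the thread or the caption: the greedy local search, the names `reverse_engineer_trace`, `toy_score`, and `toy_propose`, and the word-overlap score. A real system would presumably use a language model both to propose candidate steps and to score how well a trace explains the answer.

```python
import re
from typing import Callable, List

# Hypothetical signatures: neither comes from the paper or the thread.
Scorer = Callable[[str, List[str], str], float]    # lower = trace explains answer better
Proposer = Callable[[str, List[str], int], List[str]]  # candidate rewrites for step i


def reverse_engineer_trace(question: str, answer: str, init_trace: List[str],
                           propose: Proposer, score: Scorer,
                           rounds: int = 2) -> List[str]:
    """Greedy local search: rewrite one step at a time, keep only improvements."""
    trace = list(init_trace)
    best = score(question, trace, answer)
    for _ in range(rounds):
        for i in range(len(trace)):
            for candidate in propose(question, trace, i):
                trial = trace[:i] + [candidate] + trace[i + 1:]
                trial_score = score(question, trial, answer)
                if trial_score < best:
                    trace, best = trial, trial_score
    return trace


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def toy_score(question: str, trace: List[str], answer: str) -> float:
    """Toy stand-in for a model-based score: share of answer words
    that appear nowhere in the question or the trace."""
    context = set(_words(" ".join([question] + trace)))
    answer_words = _words(answer)
    missing = sum(1 for w in answer_words if w not in context)
    return missing / len(answer_words)


# Toy stand-in for a model that proposes rewritten reasoning steps.
POOL = [
    "Plan a keeper character who lit the lamp.",
    "Set the scene as a storm rolled in.",
    "Add a twist ending.",
]


def toy_propose(question: str, trace: List[str], i: int) -> List[str]:
    return list(POOL)


question = "Write a short story about a lighthouse."
answer = "The keeper lit the lamp as the storm rolled in."
init_trace = ["Think about the request.", "Draft something."]

trace = reverse_engineer_trace(question, answer, init_trace, toy_propose, toy_score)
print(trace)
print(toy_score(question, init_trace, answer), "->", toy_score(question, trace, answer))
```

The point of the sketch is the direction of the search: the answer is fixed and known to be good, and only the intermediate steps are changed until they account for it. No reward model or teacher model appears anywhere in the loop.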

and it works!

they made a tiny 8B model that holds up well against many large MoE models

The image is a table labeled Table 1, showing a performance comparison of different models on LongBench (LB), HelloBench (HB), and WritingBench (WB). The note above the table states that DeepWriter-8B, an 8B model fine-tuned from scratch, shows competitive performance against leading proprietary models and significantly outperforms other open-source models in its class.

⸻

Table Content:

| Model | Base Model | LB | HB-A | HB-B | WB-A | WB-B | WB-C | WB-D | WB-E | WB-F |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | - | 83.1 | 83.7 | 87.6 | 74.40 | 73.42 | 74.38 | 77.91 | 75.86 | 78.08 |
| Claude 3.5 | - | 89.3 | 82.9 | 88.3 | 59.05 | 57.68 | 56.32 | 59.36 | 62.00 | 67.70 |
| Claude 3.7 | - | 97.8 | 83.9 | 93.2 | 78.24 | 77.93 | 76.51 | 79.37 | 79.26 | 80.88 |
| LongWriter-8B | Llama3.1-8b | 76.5 | 80.1 | 82.6 | 57.97 | 53.92 | 49.08 | 52.08 | 52.99 | 52.08 |
| DeepWriter-8B | Qwen3-8b | 91.28 | 82.64 | 87.48 | 72.20 | 71.76 | 70.57 | 70.57 | 73.65 | 72.29 |

⸻

Key Observations:
• Claude 3.7 has the highest score in every column, most visibly on LB (97.8) and HB-B (93.2).
• DeepWriter-8B beats LongWriter-8B in every column, by roughly 14 to 21 points on each WritingBench (WB) category.
• GPT-4o is steady across tasks: 83.1 to 87.6 on LB and HB, and 73.42 to 78.08 on WB.
• Claude 3.5 trails Claude 3.7 by roughly 13 to 20 points on every WB category.
• Against the proprietary models, DeepWriter-8B beats GPT-4o and Claude 3.5 on LB (91.28 vs 83.1 and 89.3), beats Claude 3.5 on every WB category, and trails GPT-4o by roughly 2 to 7 points on WB.
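The comparisons in these observations can be checked directly against the table. A small sketch, using only the scores transcribed above; the helper names `gap` and `column_leaders` are made up for this example.

```python
# Scores copied from Table 1 as transcribed in the image description.
COLUMNS = ["LB", "HB-A", "HB-B", "WB-A", "WB-B", "WB-C", "WB-D", "WB-E", "WB-F"]
SCORES = {
    "GPT-4o":        [83.1, 83.7, 87.6, 74.40, 73.42, 74.38, 77.91, 75.86, 78.08],
    "Claude 3.5":    [89.3, 82.9, 88.3, 59.05, 57.68, 56.32, 59.36, 62.00, 67.70],
    "Claude 3.7":    [97.8, 83.9, 93.2, 78.24, 77.93, 76.51, 79.37, 79.26, 80.88],
    "LongWriter-8B": [76.5, 80.1, 82.6, 57.97, 53.92, 49.08, 52.08, 52.99, 52.08],
    "DeepWriter-8B": [91.28, 82.64, 87.48, 72.20, 71.76, 70.57, 70.57, 73.65, 72.29],
}


def gap(model: str, baseline: str) -> dict:
    """Per-column score difference (model minus baseline), rounded to 2 places."""
    return {
        col: round(a - b, 2)
        for col, a, b in zip(COLUMNS, SCORES[model], SCORES[baseline])
    }


def column_leaders() -> dict:
    """Name of the top-scoring model in each column."""
    return {
        col: max(SCORES, key=lambda m: SCORES[m][i])
        for i, col in enumerate(COLUMNS)
    }


print(column_leaders())
for baseline in ("LongWriter-8B", "GPT-4o", "Claude 3.5"):
    print(baseline, gap("DeepWriter-8B", baseline))
```

Positive numbers in the `gap` output mean DeepWriter-8B scored higher than the baseline in that column.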
