
The paper introduces Tongyi DeepResearch, an open agent that performs long-horizon web research end to end. The model has 30.5B parameters but activates only 3.3B per token.

Standard LLMs falter on multi-step browsing because they lack agentic habits and reliable long-context handling. Mid-training instills those habits in the base model by learning from large volumes of synthetic trajectories before reinforcement learning. Post-training starts with supervised demonstrations for stability, then moves to on-policy reinforcement learning with a simple correctness reward.

The agent runs a ReAct loop that alternates thinking and tool calls: Search, Visit, Python, Scholar, and File Parser. A running report serves as compressed memory, so each step conditions on that report and the latest observation rather than the full trajectory.

A data engine generates hard questions from knowledge graphs, injects uncertainty, verifies answers automatically, and removes dependence on human labels. Training spans three environments: prior world, local simulator, and real web, balancing stability, cost, and realism.

A Heavy Mode runs several agents in parallel and fuses their compressed reports to lift accuracy without blowing up the context window.

Paper: arxiv.org/abs/2510.24701
Paper title: "Tongyi DeepResearch Technical Report"
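The loop and Heavy Mode described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the stub tools, the fixed action `plan`, and the crude truncation-based `summarize` all stand in for model decisions and LLM summarization.

```python
# Hypothetical sketch: a ReAct-style loop with a running report as
# compressed memory, plus a Heavy Mode that fuses several agents'
# reports. Stub tools and the naive summarizer are assumptions.

TOOLS = {
    "Search": lambda q: f"search results for {q}",
    "Visit": lambda url: f"page text of {url}",
}

def summarize(report, observation, max_len=500):
    """Fold the latest observation into the compressed report.
    A real system would use an LLM; here we just merge and truncate."""
    merged = (report + " | " + observation).strip(" |")
    return merged[-max_len:]

def react_loop(question, plan):
    """plan: list of (tool_name, arg) pairs standing in for the
    model's think/act decisions at each step."""
    report = f"question: {question}"
    for tool_name, arg in plan:
        observation = TOOLS[tool_name](arg)      # act: call a tool
        report = summarize(report, observation)  # update compressed memory
    return report

def heavy_mode(question, plans):
    """Run several agents (sequentially here, in parallel in spirit)
    and fuse their compressed reports instead of full trajectories,
    keeping the final context small."""
    reports = [react_loop(question, p) for p in plans]
    return " || ".join(reports)
```

The key design point the sketch mirrors is that each step sees only the short report plus the newest observation, so context length stays bounded no matter how many steps the agent takes.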