New system automates pre-writing and report generation, producing Wikipedia-style content in PDF format.
Researchers at Stanford University have released Co-STORM, an AI system designed to automate the writing of long-form, factual research reports. The system outputs structured, referenced content based on user prompts and is free to use.
Co-STORM combines retrieval-augmented generation with pre-writing planning. The tool collects information from online sources, builds an outline using simulated expert dialogue, and generates full-length articles complete with citations. It achieves 99 percent factual accuracy, according to its creators.
The tool is live and free to access. Users provide a topic, and the system returns a full report with citations in PDF format. Example outputs include summaries of AI tools, historical figures, and scientific concepts.
Two-stage generation process
The model is based on an earlier system called STORM, described by authors Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, and Monica Lam. The system separates writing into two distinct stages:
Pre-writing: The AI identifies diverse viewpoints and simulates a back-and-forth in which a writer holding each viewpoint questions a topic expert. It formulates questions to guide the outline structure.
Writing: Based on the outline and sourced material, the AI generates a clean, cited PDF.
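The separation of the two stages can be sketched in a few lines of Python. This is an illustrative toy, not the STORM or Co-STORM implementation: the names `Outline`, `prewrite`, and `write` are hypothetical, and the `notes` dictionary stands in for material the real system would gather from online sources.

```python
from dataclasses import dataclass, field


@dataclass
class Outline:
    """Hypothetical container for the result of the pre-writing stage."""
    topic: str
    sections: list[str] = field(default_factory=list)


def prewrite(topic: str, notes: dict[str, list[str]]) -> Outline:
    """Stage 1: build an outline from collected notes (section -> cited snippets)."""
    # Keep only sections backed by at least one sourced snippet.
    return Outline(topic, [name for name, snippets in notes.items() if snippets])


def write(outline: Outline, notes: dict[str, list[str]]) -> str:
    """Stage 2: expand each outline section using only the collected notes."""
    parts = [outline.topic]
    for section in outline.sections:
        parts.append(section)
        parts.extend(notes[section])
    return "\n".join(parts)


# Toy input: one section with a cited snippet, one with no support.
notes = {"History": ["Founded in 1990 [1]."], "Criticism": []}
outline = prewrite("Example topic", notes)
print(write(outline, notes))
```

Because the outline is a separate object, it can be inspected or scored before any article text is generated, which is the property the two-stage design relies on.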
The team evaluated STORM using FreshWiki, a dataset of recent, high-quality Wikipedia articles. Compared with an outline-driven retrieval-augmented baseline, more of STORM’s articles were judged organized (a 25 percent absolute increase) and broad in coverage (a 10 percent increase), according to assessments from experienced Wikipedia editors.
Prompt engineering and question depth
The system's question-asking approach is central to its performance. Prompting the model directly to ask questions produced limited results, so the team introduced two refinements:
Perspective-guided prompting, where questions are generated from different angles.
Simulated multi-turn conversations, where questions evolve based on prior answers.
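The two refinements can be combined in one loop, sketched below. It is a minimal illustration under stated assumptions, not the authors' code: `simulate_conversations`, `toy_ask`, and `toy_answer` are hypothetical names, and the two toy functions are stand-ins for a language model that asks questions and a retrieval-backed expert that answers them.

```python
from typing import Callable

# One conversation is a list of (question, answer) turns.
History = list[tuple[str, str]]
AskFn = Callable[[str, str, History], str]   # (topic, perspective, history) -> question
AnswerFn = Callable[[str], str]              # question -> answer


def simulate_conversations(
    topic: str,
    perspectives: list[str],
    ask: AskFn,
    answer: AnswerFn,
    max_turns: int = 3,
) -> dict[str, History]:
    """Run one multi-turn conversation per perspective and return all transcripts."""
    transcripts: dict[str, History] = {}
    for perspective in perspectives:          # perspective-guided prompting
        history: History = []
        for _ in range(max_turns):            # multi-turn: each question sees prior answers
            question = ask(topic, perspective, history)
            history.append((question, answer(question)))
        transcripts[perspective] = history
    return transcripts


def toy_ask(topic: str, perspective: str, history: History) -> str:
    """Stand-in for a model call: follow up on the previous answer if there is one."""
    if not history:
        return f"[{perspective}] What matters most about {topic}?"
    return f"[{perspective}] Can you expand on this: {history[-1][1]}"


def toy_answer(question: str) -> str:
    """Stand-in for an expert that would answer from retrieved sources."""
    return f"(sourced answer to: {question})"


transcripts = simulate_conversations(
    "solar power", ["economist", "engineer"], toy_ask, toy_answer, max_turns=2
)
for perspective, turns in transcripts.items():
    print(perspective, len(turns))
```

Passing the conversation history back into `ask` is what lets later questions build on earlier answers. Without it, every turn would repeat the same opening question.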
The authors write: “STORM models the pre-writing stage by discovering diverse perspectives, simulating conversations... and curating the collected information to create an outline.”
This structure improves the system’s ability to generate balanced, multi-faceted content. The model relies on trusted online sources and is designed to reduce hallucination.
In the paper, the researchers acknowledge known risks such as source bias and over-association of unrelated facts. However, they argue that separating the pre-writing stage improves transparency and allows for faster iteration.
“Evaluating outline quality in the pre-writing stage is an effective way to prototype the report generation system,” the authors state.