Long-running Claude for scientific computing
In this article, we explore how to apply multi-day agentic coding workflows (test oracles, persistent memory, and orchestration patterns) to scientific computing tasks, even outside one’s own domain of expertise. The approach is aimed at researchers who want AI agents to take on substantial, well-defined pieces of scientific software development.
The Premise
Traditionally, scientists using AI agents have worked in a conversational loop, directing and checking each step themselves. Recent models are much better at long-horizon tasks, and this makes a different mode of work possible: the researcher specifies high-level objectives and lets agents work autonomously, which can substantially shorten the time needed to complete a project. This mode suits tasks with clear success criteria that need little human oversight, such as reimplementing numerical solvers, converting legacy scientific software, and debugging large codebases.
Case Study: The C Compiler Project
An illustrative example of this approach is Anthropic’s C compiler project, where Claude operated across approximately 2,000 sessions to develop a C compiler capable of compiling the Linux kernel. This article aims to replicate a similar pattern for scientific computing tasks using Claude Code, particularly focusing on implementing a differentiable version of a cosmological Boltzmann solver.
Understanding the Boltzmann Solver
A Boltzmann solver predicts the statistical properties of the afterglow of the Big Bang, known as the Cosmic Microwave Background (CMB). It evolves the coupled equations for perturbations in the various components of the Universe, including photons, baryons, neutrinos, and dark matter. Existing solvers such as CLASS and CAMB are essential tools in cosmology: researchers use them to constrain cosmological models with data from experiments such as Planck and the Simons Observatory.
Creating a differentiable version of a Boltzmann solver enables gradient-based inference methods, such as Hamiltonian Monte Carlo, which can drastically accelerate parameter estimation. I have a high-level understanding of the tools and the science involved, but not the expertise to complete this task efficiently on my own. For comparison, groups with that expertise have spent months or even years developing differentiable solvers in JAX, which is well suited to this purpose because of its automatic differentiation.
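To make the benefit of differentiability concrete, here is a minimal toy sketch (not a Boltzmann solver) in JAX. It integrates a one-parameter decay equation dy/dt = -k y with a hand-written fixed-step RK4 loop, and `jax.grad` differentiates a least-squares loss through every solver step, so the parameter can be fitted by plain gradient descent. The model, the synthetic data, the step size, and the learning rate are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision, as numerical solvers usually need


def solve_decay(k, y0=1.0, dt=0.01, n_steps=500):
    """Integrate dy/dt = -k*y with fixed-step RK4; return y at t = dt, 2*dt, ..., n_steps*dt."""
    def rhs(y):
        return -k * y

    def step(y, _):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * dt * k1)
        k3 = rhs(y + 0.5 * dt * k2)
        k4 = rhs(y + dt * k3)
        y_next = y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        return y_next, y_next  # (new carry, value recorded for this step)

    _, ys = jax.lax.scan(step, jnp.asarray(y0, dtype=jnp.float64), None, length=n_steps)
    return ys


k_true = 0.7
data = solve_decay(k_true)  # synthetic "observations" (toy stand-in for real data)


def loss(k):
    """Mean squared mismatch between the model and the data."""
    return jnp.mean((solve_decay(k) - data) ** 2)


grad_loss = jax.jit(jax.grad(loss))  # the gradient flows back through all 500 RK4 steps

k_fit = 0.2  # deliberately poor starting guess
learning_rate = 0.5
for _ in range(200):
    k_fit = k_fit - learning_rate * grad_loss(k_fit)

print(f"fitted k = {float(k_fit):.6f}")
```

A real differentiable Boltzmann solver applies the same idea to a far larger, stiff system, where the gradient of a likelihood with respect to cosmological parameters replaces this toy loss.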
Structuring the Task
Unlike the C compiler project, which can utilize many parallel agents, a Boltzmann solver is a deeply coupled pipeline. Small numerical errors can significantly affect downstream results, necessitating a different approach. Debugging requires tracing through the entire process and leveraging domain knowledge, making it more suitable for a single agent working sequentially.
Setting Up the Environment
We will use a High-Performance Computing (HPC) cluster running the SLURM job scheduler as our compute environment. However, the core concepts—such as maintaining a progress file, establishing a test oracle, and creating a clear agent prompt—are applicable regardless of the computing environment.
Drafting a Plan and Iterating
In this autonomous research paradigm, much of the effort should go into writing a set of instructions that clearly state the project’s deliverables and context. These instructions belong in a file named CLAUDE.md in the project’s root directory. Claude Code treats this file specially: it loads it into context automatically at the start of each session, so the agent can refer to it throughout the project.
For the cosmological Boltzmann solver project, the initial CLAUDE.md file outlines the overall plan and design decisions made after an initial attempt. The high-level goals include achieving full feature parity with the reference CLASS implementation while ensuring full differentiability and maintaining an accuracy target of 0.1% against CLASS.
Maintaining Memory Across Sessions
The progress file, named CHANGELOG.md in this project, serves as the agent’s long-term memory, much like lab notes. CLAUDE.md instructs Claude to record its progress in this file. A well-structured progress file should track:
- Current status
- Completed tasks
- Failed approaches and their reasons
- Accuracy tables at key checkpoints
- Known limitations
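One way to keep these sections consistent from session to session is to give each entry a fixed shape. The sketch below is a hypothetical helper (`ProgressEntry` and `append_entry` are illustrative names, not part of Claude Code or the original project) that renders one dated CHANGELOG.md entry in Markdown with the five sections listed above and appends it without rewriting earlier history.

```python
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class ProgressEntry:
    """One dated entry in the progress file (hypothetical structure)."""
    status: str
    completed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)           # what was tried and why it failed
    accuracy: dict[str, float] = field(default_factory=dict)  # quantity -> max relative error
    limitations: list[str] = field(default_factory=list)
    day: date = field(default_factory=date.today)

    def to_markdown(self) -> str:
        lines = [f"## {self.day.isoformat()}", f"Status: {self.status}"]

        def section(title: str, items: list[str]) -> None:
            if items:  # omit empty sections
                lines.append(f"### {title}")
                lines.extend(f"- {item}" for item in items)

        section("Completed", self.completed)
        section("Failed approaches", self.failed)
        if self.accuracy:
            lines += ["### Accuracy vs. reference", "| quantity | max rel. error |", "|---|---|"]
            lines.extend(f"| {name} | {err:.2e} |" for name, err in self.accuracy.items())
        section("Known limitations", self.limitations)
        return "\n".join(lines) + "\n\n"


def append_entry(path: Path, entry: ProgressEntry) -> None:
    """Append an entry at the end of the file; earlier entries are never modified."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry.to_markdown())
```

In practice the agent writes these entries itself; spelling out the same template in CLAUDE.md achieves the consistency without any code.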
Documenting failed approaches is crucial, as it prevents the agent from revisiting the same dead ends. For instance, an entry might read: “Tried using Tsit5 for the perturbation ODE; the system is too stiff. Switched to Kvaerno5.”
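The Tsit5 entry above records a typical stiffness failure. Tsit5 is an explicit Runge-Kutta method and Kvaerno5 an implicit (ESDIRK) one; both are solvers in the Diffrax library for JAX. The sketch below does not use them. It shows the same explicit-versus-implicit distinction with the simplest possible pair, forward and backward Euler, on the stiff test equation dy/dt = -λy; the values of λ and the step size are arbitrary illustrative choices.

```python
def forward_euler(lam: float, y0: float, dt: float, n_steps: int) -> float:
    """Explicit: y_{n+1} = y_n + dt * f(y_n), with f(y) = -lam * y."""
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-lam * y)  # multiplies y by (1 - lam*dt) each step
    return y


def backward_euler(lam: float, y0: float, dt: float, n_steps: int) -> float:
    """Implicit: y_{n+1} = y_n + dt * f(y_{n+1}); for this linear f it solves to y_n / (1 + lam*dt)."""
    y = y0
    for _ in range(n_steps):
        y = y / (1.0 + lam * dt)  # divides y by (1 + lam*dt) each step
    return y


lam, dt, n = 1000.0, 0.01, 50  # lam*dt = 10: the true solution decays almost instantly
print("forward :", forward_euler(lam, 1.0, dt, n))   # |1 - lam*dt| = 9 > 1, so the iterates blow up
print("backward:", backward_euler(lam, 1.0, dt, n))  # 1/(1 + lam*dt) < 1 for any dt > 0, so they decay
```

For a real stiff system such as the perturbation equations, each implicit step requires a linear or nonlinear solve instead of this closed form, so implicit solvers cost more per step but can take far larger stable steps.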
The Test Oracle
For effective autonomous scientific work, the agent needs a way to tell whether it is making progress. That signal can come from a reference implementation, quantifiable objectives, or an existing test suite. In our example, Claude is instructed to build, and continuously run, unit tests that use the CLASS C source code as the reference implementation.
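A minimal sketch of such an oracle check, assuming the reference values have already been produced by running CLASS and saved as arrays (how they are generated is project-specific and not shown here). The helper compares the solver’s output against the reference element-wise and fails if the worst relative error exceeds the 0.1% target from CLAUDE.md. The function names and the synthetic spectrum in the example test are illustrative assumptions.

```python
import numpy as np

RTOL = 1e-3  # 0.1% accuracy target against CLASS


def max_relative_error(ours, reference) -> float:
    """Worst-case |ours - reference| / |reference| over all elements (assumes no zeros in reference)."""
    ours = np.asarray(ours, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.max(np.abs(ours - reference) / np.abs(reference)))


def check_against_reference(ours, reference, rtol: float = RTOL) -> float:
    """Raise AssertionError (a pytest failure) if the worst relative error exceeds rtol."""
    err = max_relative_error(ours, reference)
    assert err <= rtol, f"max relative error {err:.2e} exceeds target {rtol:.0e}"
    return err


def test_synthetic_spectrum_within_target():
    # Stand-ins: in the real project `reference` would be CLASS output loaded from disk
    # and `ours` the JAX solver's output on the same multipole grid.
    ell = np.arange(2, 2501)
    reference = 1.0 / (ell * (ell + 1.0))
    ours = reference * (1.0 + 5e-4 * np.sin(ell))  # at most 0.05% off
    assert check_against_reference(ours, reference) < RTOL
```

`numpy.testing.assert_allclose(ours, reference, rtol=1e-3)` performs an equivalent pass/fail check; the explicit helper is shown because the worst-case error is also the number worth recording in the accuracy tables of the progress file.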
Using Git for Coordination
Git serves as an excellent tool for monitoring and coordinating the agent’s progress in a hands-off manner. The agent should commit and push changes after each meaningful unit of work. This practice ensures a recoverable history, makes progress visible, and prevents loss of work due to unforeseen issues.
Instructions in CLAUDE.md can specify: “Commit and push after every meaningful unit of work. Run pytest tests/ -x -q before every commit. Never commit code that breaks existing passing tests.”
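This rule can also be enforced mechanically instead of relying on the instructions alone. The sketch below is a hypothetical wrapper (`commit_if_green` is an illustrative name, not part of Claude Code or Git) that runs the test command quoted above and only stages, commits, and pushes when the tests pass. The command runner is injectable, so the logic can be exercised without touching a real repository.

```python
import subprocess
from typing import Callable, Sequence

TEST_CMD = ["pytest", "tests/", "-x", "-q"]  # the command quoted in CLAUDE.md


def run(cmd: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def commit_if_green(message: str, runner: Callable[[Sequence[str]], int] = run) -> bool:
    """Commit and push only if the test suite passes; return True on success."""
    if runner(TEST_CMD) != 0:
        return False  # never commit code that breaks existing passing tests
    for cmd in (["git", "add", "-A"],  # stages every change, including new files
                ["git", "commit", "-m", message],
                ["git", "push"]):
        if runner(cmd) != 0:
            return False  # stop at the first failing step (e.g. nothing to commit, push rejected)
    return True
```

A Git pre-commit hook that runs the same test command is a common alternative, because Git aborts the commit when the hook exits with a non-zero status.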
The Execution Loop
Start by iterating on the plan locally until a satisfactory version is encoded in CLAUDE.md. Then launch a Claude Code session inside a terminal multiplexer such as tmux on a compute node, point the agent at the codebase, and let it proceed autonomously. Because the session runs inside tmux, users can detach and check on progress later, even from a mobile device.
On an HPC cluster, the compute node for this session is requested through the SLURM scheduler.
Conclusion
Integrating AI agents like Claude into scientific computing workflows changes how researchers can approach complex computational tasks. With a clear plan in CLAUDE.md, a progress file for memory, a test oracle, and Git for coordination, scientists can focus on high-level objectives while the agent handles much of the coding and debugging. This can shorten project timelines and open up projects that would otherwise fall outside a researcher’s own expertise.
Note: This

