Long-running Claude for Scientific Computing

In this article, we explore how to apply multi-day agentic coding workflows (test oracles, persistent memory, and orchestration patterns) to scientific computing tasks, including tasks outside one’s own field of expertise. The approach is aimed at researchers who want AI agents to carry well-specified scientific software projects through to completion with minimal supervision.

The Premise

Traditionally, scientists using AI agents have worked in a conversational loop, supervising each step of the process. Recent models handle long-horizon tasks much better, and this makes a different way of working possible: the researcher specifies high-level objectives and lets agents work autonomously, which can shorten project timelines considerably. This mode suits tasks that have clear success criteria and need little human oversight, such as reimplementing numerical solvers, converting legacy scientific software, and debugging large codebases.

Case Study: The C Compiler Project

An illustrative example of this approach is Anthropic’s C compiler project, in which Claude worked across approximately 2,000 sessions to build a C compiler capable of compiling the Linux kernel. This article applies a similar pattern to scientific computing with Claude Code, using as its example the implementation of a differentiable cosmological Boltzmann solver.

Understanding the Boltzmann Solver

A Boltzmann solver predicts the statistical properties of the Cosmic Microwave Background (CMB), the afterglow of the Big Bang. It evolves coupled equations for the universe’s main components, including photons, baryons, neutrinos, and dark matter. Existing solvers such as CLASS and CAMB are essential tools in cosmology: researchers use them to constrain cosmological models with data from experiments such as Planck and the Simons Observatory.

A differentiable Boltzmann solver enables gradient-based inference methods, which can greatly accelerate parameter estimation. JAX is well suited to building one because of its automatic differentiation capabilities. While I have a high-level understanding of the tools and the science involved, I lack the expertise to complete this task efficiently on my own; even groups with the necessary expertise have taken months or years to develop differentiable solvers in JAX.
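Why differentiability pays off can be seen on a toy problem. In JAX the derivative would come from automatic differentiation (for example `jax.grad`); to keep this sketch dependency-free, it instead implements forward-mode automatic differentiation with dual numbers in plain Python. The model is a hypothetical one-parameter decay equation y' = -θy integrated with forward Euler, not a Boltzmann hierarchy, and the "observation" is synthetic, generated from the same model at θ = 0.7. Because the derivative is carried through every solver step, plain gradient descent can recover θ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Number val + der*eps with eps**2 = 0; `der` carries d(value)/d(theta)."""
    val: float
    der: float = 0.0

    def __add__(self, other: Dual | float) -> Dual:
        o = lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other: Dual | float) -> Dual:
        o = lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other: float) -> Dual:
        return lift(other) - self

    def __mul__(self, other: Dual | float) -> Dual:
        o = lift(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __neg__(self) -> Dual:
        return Dual(-self.val, -self.der)


def lift(x: Dual | float) -> Dual:
    """Treat plain floats as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def solve_decay(theta, y0: float = 1.0, t_end: float = 1.0, steps: int = 200):
    """Toy 'solver': forward-Euler solution of y' = -theta * y at t_end."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        y = y + h * (-theta * y)
    return y


def loss_and_grad(theta: float, y_obs: float) -> tuple[float, float]:
    """Squared residual and its exact derivative with respect to theta."""
    residual = solve_decay(Dual(theta, 1.0)) - y_obs  # seed d(theta)/d(theta) = 1
    loss = residual * residual
    return loss.val, loss.der


def fit_theta(y_obs: float, theta: float = 0.1, lr: float = 1.0, iters: int = 100) -> float:
    """Plain gradient descent on theta using the differentiated solver."""
    for _ in range(iters):
        _, grad = loss_and_grad(theta, y_obs)
        theta -= lr * grad
    return theta


y_obs = solve_decay(0.7)   # synthetic observation generated at theta = 0.7
print(fit_theta(y_obs))    # converges to ~0.7
```

A real differentiable solver applies the same idea to thousands of coupled equations, which is where a framework such as JAX (rather than hand-written dual numbers) becomes essential.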

Structuring the Task

Unlike the C compiler project, which could use many agents in parallel, a Boltzmann solver is a deeply coupled pipeline: small numerical errors in one stage can significantly distort results downstream. Debugging therefore means tracing through the entire pipeline with domain knowledge, which makes the task better suited to a single agent working sequentially.

Setting Up the Environment

We will use a High-Performance Computing (HPC) cluster running the SLURM job scheduler as our compute environment. However, the core concepts—such as maintaining a progress file, establishing a test oracle, and creating a clear agent prompt—are applicable regardless of the computing environment.

Drafting a Plan and Iterating

In this autonomous research paradigm, much of the effort goes into writing instructions that clearly state the project’s deliverables and context. These instructions belong in a file named CLAUDE.md in the project’s root directory. Claude Code treats this file specially: it loads it into context automatically, so the agent can refer back to it throughout the project.

For the cosmological Boltzmann solver project, CLAUDE.md lays out the overall plan together with design decisions made after a first attempt. The high-level goals are full feature parity with the reference CLASS implementation, full differentiability, and agreement with CLASS to within 0.1%.

Maintaining Memory Across Sessions

The progress file, conventionally named CHANGELOG.md, serves as the agent’s long-term memory across sessions, much like a lab notebook. CLAUDE.md instructs Claude to record its progress in this file. A well-structured progress file should track:

  • Current status
  • Completed tasks
  • Failed approaches and their reasons
  • Accuracy tables at key checkpoints
  • Known limitations
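One way to keep entries consistent from session to session is to fix their layout. The helper below is a hypothetical sketch: the function names and the Markdown layout are assumptions, not the format used in the article. It renders one dated entry with a section for each item in the list above and appends it, so earlier history is never overwritten.

```python
from __future__ import annotations

from datetime import date
from typing import Sequence


def progress_entry(
    day: date,
    status: str,
    completed: Sequence[str],
    failed: Sequence[tuple[str, str]],
    accuracy: Sequence[tuple[str, float]],
    limitations: Sequence[str],
) -> str:
    """Render one dated CHANGELOG.md entry (hypothetical layout)."""
    lines = [f"## {day.isoformat()}", f"Status: {status}", "", "### Completed"]
    lines += [f"- {item}" for item in completed]
    lines += ["", "### Failed approaches (and why)"]
    lines += [f"- {attempt}: {reason}" for attempt, reason in failed]
    lines += ["", "### Accuracy vs. reference", "| Quantity | Max relative error |", "|---|---|"]
    # The '%' format spec multiplies by 100, so 5e-4 renders as 0.050%.
    lines += [f"| {name} | {err:.3%} |" for name, err in accuracy]
    lines += ["", "### Known limitations"]
    lines += [f"- {item}" for item in limitations]
    return "\n".join(lines) + "\n"


def append_entry(path: str, entry: str) -> None:
    """Append rather than overwrite, so the agent's history is preserved."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("\n" + entry)
```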

Documenting failed approaches is crucial, as it prevents the agent from revisiting the same dead ends. For instance, an entry might read: “Tried using Tsit5 for the perturbation ODE; the system is too stiff. Switched to Kvaerno5.”
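Why an explicit solver fails on a stiff system can be seen on the scalar test equation y' = -k·y with large k. In the Diffrax library, Tsit5 is an explicit Runge-Kutta method and Kvaerno5 an implicit one; this sketch swaps in the simplest explicit/implicit pair, forward and backward Euler, rather than those solvers. Forward Euler multiplies y by (1 - hk) each step and is stable only for h < 2/k, so a stiff component forces tiny steps; backward Euler multiplies by 1/(1 + hk) and stays stable at any step size.

```python
def explicit_euler(k: float, y0: float, h: float, steps: int) -> float:
    """Forward Euler for y' = -k*y: y_{n+1} = (1 - h*k) * y_n."""
    y = y0
    for _ in range(steps):
        y = y + h * (-k * y)
    return y


def implicit_euler(k: float, y0: float, h: float, steps: int) -> float:
    """Backward Euler: solve y_{n+1} = y_n + h*(-k*y_{n+1}) for y_{n+1}."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * k)
    return y


k, h, steps = 1000.0, 0.01, 100          # h*k = 10, far above the explicit limit of 2
print(explicit_euler(k, 1.0, h, steps))  # factor -9 per step: blows up
print(implicit_euler(k, 1.0, h, steps))  # factor 1/11 per step: decays, like the true solution
```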

The Test Oracle

Autonomous scientific work only succeeds if the agent can tell whether it is making progress. That signal can come from a reference implementation, a quantifiable objective, or an existing test suite. In our example, CLAUDE.md instructs Claude to build unit tests against the CLASS C source, used as the reference implementation, and to run them continuously.
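A minimal sketch of such an oracle check, assuming the reference values have already been exported from CLASS to a plain two-column text file (multipole ℓ and value). The file layout and the function names are hypothetical. The check compares element-wise against the 0.1% target and reports the worst offender, which gives the agent a concrete place to start debugging.

```python
from __future__ import annotations

from typing import Sequence

# The 0.1% accuracy target from CLAUDE.md, expressed as a relative tolerance.
RTOL = 1e-3


def load_reference_table(path: str) -> tuple[list[int], list[float]]:
    """Read 'ell value' rows; blank lines and lines starting with '#' are skipped."""
    ells, values = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            ell, value = line.split()[:2]
            ells.append(int(float(ell)))
            values.append(float(value))
    return ells, values


def check_against_reference(
    computed: Sequence[float], reference: Sequence[float], rtol: float = RTOL
) -> tuple[bool, int, float]:
    """Return (passed, index of worst entry, worst relative error)."""
    if not reference or len(computed) != len(reference):
        raise ValueError("computed and reference must be non-empty and equal length")
    if any(r == 0 for r in reference):
        raise ValueError("zero reference value; relative error is undefined")
    errors = [abs(c - r) / abs(r) for c, r in zip(computed, reference)]
    worst = max(range(len(errors)), key=errors.__getitem__)
    return errors[worst] <= rtol, worst, errors[worst]
```

Inside a pytest test, the agent would load a reference table (say, a hypothetical `tests/reference/cl_tt.txt` produced by running CLASS at the same parameters), compute the same quantity with the new solver, and assert on `passed` with a failure message naming `ells[worst]`, so a failing run points straight at the worst multipole.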

Using Git for Coordination

Git is an excellent tool for monitoring and coordinating the agent’s progress hands-off. The agent should commit and push changes after each meaningful unit of work. This keeps a recoverable history, makes progress visible from anywhere, and prevents work from being lost if, for example, a session crashes or a job ends unexpectedly.

Instructions in CLAUDE.md can specify: “Commit and push after every meaningful unit of work. Run pytest tests/ -x -q before every commit. Never commit code that breaks existing passing tests.”
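The rule "never commit code that breaks existing passing tests" can also be enforced mechanically with a Git pre-commit hook, since Git aborts a commit when that hook exits with a non-zero status. The article relies on the CLAUDE.md instruction alone; this hook is an optional backstop and a hypothetical sketch. It runs the same pytest command and reports whether the commit should proceed. The test runner is injectable so the logic can be checked without pytest installed.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable

# The same command the CLAUDE.md instruction asks the agent to run before committing.
TEST_COMMAND = ["pytest", "tests/", "-x", "-q"]


def main(run: Callable[[list[str]], subprocess.CompletedProcess] = subprocess.run) -> int:
    """Return 0 to allow the commit, 1 to abort it."""
    try:
        result = run(list(TEST_COMMAND))
    except FileNotFoundError:
        print("pre-commit: pytest not found; commit aborted", file=sys.stderr)
        return 1
    if result.returncode != 0:
        print("pre-commit: tests failed; commit aborted", file=sys.stderr)
        return 1
    return 0
```

To use it, save the code as `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` first line, end the file with `sys.exit(main())`, and make it executable.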

The Execution Loop

It helps to iterate on the plan locally first, until CLAUDE.md encodes a version you are satisfied with. Then start a Claude Code session inside a terminal multiplexer such as tmux on a compute node, point the agent at the codebase, and let it proceed autonomously. Because the session runs inside tmux, you can detach and check on progress later, even from a mobile device.

On an HPC cluster, that compute node is requested through the SLURM scheduler, so the agent runs inside its own allocation rather than on a shared login node.
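A small sketch of requesting an interactive node for a multi-day run. It only builds an `srun` command line; the helper names are hypothetical, while `--nodes`, `--time`, `--partition`, `--gres`, and `--pty` are standard `srun` options. SLURM accepts time limits in the form `days-hours:minutes:seconds`, which is convenient for multi-day jobs.

```python
from __future__ import annotations


def slurm_time(hours: float) -> str:
    """Format a wall-clock limit as SLURM's 'days-hours:minutes:seconds'."""
    total_minutes = round(hours * 60)
    days, rest = divmod(total_minutes, 24 * 60)
    hh, mm = divmod(rest, 60)
    return f"{days}-{hh:02d}:{mm:02d}:00"


def interactive_node_command(
    hours: float, partition: str | None = None, gpus: int = 0
) -> list[str]:
    """Build an srun command that opens an interactive shell on one compute node."""
    cmd = ["srun", "--nodes=1", f"--time={slurm_time(hours)}"]
    if partition:
        cmd.append(f"--partition={partition}")
    if gpus:
        cmd.append(f"--gres=gpu:{gpus}")
    cmd += ["--pty", "bash"]  # --pty attaches the remote shell to this terminal
    return cmd


print(" ".join(interactive_node_command(48, gpus=1)))
# srun --nodes=1 --time=2-00:00:00 --gres=gpu:1 --pty bash
```

Once the shell opens on the compute node, start tmux there (for example `tmux new -s claude`) and launch Claude Code inside it. Partition names, GPU options, and the maximum allowed time limit are cluster-specific, and whether the tmux session survives a dropped connection or can be reattached by connecting to the node depends on the site’s configuration.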

Conclusion

Integrating AI agents like Claude into scientific computing workflows changes how researchers can approach complex tasks. With autonomous agents, scientists can concentrate on high-level objectives while the agent handles much of the coding and debugging, which shortens project timelines and opens new avenues for research.

Disclaimer: A Teams provides news and information for general awareness purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of any content. Opinions expressed are those of the authors and not necessarily of A Teams. We are not liable for any actions taken based on the information published. Content may be updated or changed without prior notice.