Contribute

Help build the benchmark that will define how long-horizon agents are evaluated.

Ready to contribute?

Get in touch to join the project. Weekly syncs Mon & Thu 5PM PT. Office hours Tue/Thu/Sat 9:30AM PT.

Get in Touch

Every capability leap needs a new benchmark.

2021

HumanEval

Function completion

~1 min per task

unlocked code LLMs

2023

SWE-bench

Real GitHub issues

~5–15 min per task

unlocked coding agents

2025

Terminal Bench

Multi-step terminal tasks

~5–15 min per task

unlocked terminal agents

2026

RALPHBench

Days-long agentic work

hours to days per task

unlocking autonomous agents

How to Contribute

Get access

Join the RALPHBench Slack and introduce yourself. Add your name, email, and affiliation to the RALPHBench Workspace and we'll add you to meeting invites. Schedule a quick call to brainstorm ideas if you'd like.

Weekly sync: Mon 5PM PT

Sprint sync: Thu 5PM PT

Get started

Read CONTRIBUTING.md on GitHub, then browse past meeting notes to find open ideas. AI-assisted coding is encouraged, but task ideas, instruction.md, and task.toml must be written by humans.

instruction.md + task.toml

PR deadline Feb 21

Submit & get merged

Pick a hard, real-world coding problem that takes a skilled human multiple hours. Submit a PR; tasks are reviewed weekly. You can also contribute engineering work, experiment runs, or paper writing.

3 tasks = authorship

Authorship

Contributors with 3 or more merged tasks receive authorship on the RALPHBench paper. Engineering contributions, experiment runs, and paper writing are also considered for authorship.

Contributors

XX+

Industry & Academia

Tasks needed (V1)

100

Open slots remaining

Open domains

Compilers, Build Systems, LSP/Tooling, Full-Stack Apps, Frameworks, Browsers, Graphics/Media, Backend SDKs

Related Projects

SkillsBench

Benchmark for evaluating agent skill acquisition and transfer

Terminal Bench

Multi-step terminal task benchmark for CLI agents

Harbor

Standardized agent-environment interface specification

SWE-bench

GitHub issue resolution benchmark for coding agents

Timeline

Jan '26: Project started

Mar 19: PR deadline

Mar 24: Merge deadline

Mar '26: V1 Release

May '26: V2 + NeurIPS