Contribute

Help build the benchmark that will define how long-horizon agents are evaluated.

Ready to contribute?

Get in touch to join the project. Weekly syncs Mon & Thu 5PM PT. Office hours Tue/Thu/Sat 9:30AM PT.

Every capability leap needs a new benchmark.

2021: HumanEval (function completion; ~1 min per task)
2023: SWE-bench (real GitHub issues; ~5–15 min per task)
2025: Terminal Bench (multi-step terminal tasks; ~5–15 min per task)
2026: RALPHBench (days-long agentic work; hours to days per task)

How to Contribute

01

Get access

Join the RALPHBench Slack and introduce yourself. Add your name, email, and affiliation to the RALPHBench Workspace and we'll add you to meeting invites. Schedule a quick call to brainstorm ideas if you'd like.

Weekly sync: Mon 5PM PT
Sprint sync: Thu 5PM PT
02

Get started

Read CONTRIBUTING.md on GitHub, then browse past meeting notes to find open ideas. AI-assisted coding is encouraged, but task ideas, instruction.md, and task.toml must be written by humans.

instruction.md + task.toml
PR deadline Feb 21
03

Submit & get merged

Pick a hard, real-world coding problem that takes a skilled human multiple hours. Submit a PR; tasks are reviewed weekly. You can also contribute engineering work, experiment runs, or paper writing.

3 tasks = authorship

Authorship

Contributors with 3 or more tasks merged receive authorship on the RALPHBench paper. Engineering contributions, experiment runs, and paper writing are also considered.

Contributors: XX+ (industry & academia)
Tasks needed (V1): 100 (open slots remaining)
Open domains
Compilers, Build Systems, LSP/Tooling, Full-Stack Apps, Frameworks, Browsers, Graphics/Media, Backend SDKs

Related Projects