How the number is made
Pickrate measures one thing: when an AI agent has a developer's job to do and could use you or a competitor, how often does it pick you? Here is exactly how that number is produced.
The trial
Every number is built from selection trials. One trial is a real, unbranded developer task ("add transactional email to a Node app") run on a pinned model (Claude, GPT, Gemini), repeated N times. We never run a task once and screenshot the answer — models are stochastic, so we run many and report a proportion with a confidence interval.
Two surfaces, scored honestly
- Coding-agent (objective): the model writes code, and we parse it for the package it actually imported or installed. Someone either imported your SDK or they didn't. No interpretation.
- Conversational (judged): the model recommends in prose, and a separate Claude judge classifies which tool was the primary recommendation versus merely mentioned, against a fixed schema. We weight the objective coding surface higher.
The numbers
Pick Rate
the headline
Share of trials where the agent picks you, weighted across surface, model, and task, with a Wilson confidence interval.
Default Rate
the default
Pick Rate on open-ended "what should I use" questions only. The unprompted default, and the most discriminating cut.
Shortlist Rate
known vs picked
Share of trials where you're named at all, even if not chosen. The gap to Pick Rate is the "known but not picked" signal.
Honesty
Prompts are unbranded — we ask "add payments," never "use Stripe," so we measure what an agent picks, not what it can use. Tasks are scored against a fixed competitor set. Model versions and the as-of date are pinned on every run. This measures real model behavior, not a synthetic proxy — but it is a measurement, not ground truth, and we report it as such.
Questions
What is a selection trial?
A selection trial is one real, unbranded developer task run on a pinned model and repeated N times. Models are stochastic, so we run many trials and report a proportion with a confidence interval rather than screenshotting a single answer.
How is the coding surface scored?
Objectively. The model writes code and we parse it for the package it actually imported or installed. Either your SDK was imported or it wasn't — no interpretation.
How is the conversational surface scored?
A separate Claude judge classifies which tool was the primary recommendation versus merely mentioned, against a fixed schema. We weight the objective coding surface higher.
Why are the prompts unbranded?
We ask 'add payments,' never 'use Stripe,' so we measure what an agent picks, not what it can use. Branded prompts test readiness, not Pick Rate.
Check your own Pick Rate
See how often agents pick your tool — free, no account needed.
Check your tool →