Methodology

How the number is made

Pickrate measures one thing: when an AI agent has a developer's job to do and could use you or a competitor, how often does it pick you? Here is exactly how that number is produced.

The trial

Every number is built from selection trials. One trial is a single run of a real, unbranded developer task ("add transactional email to a Node app") on a pinned model (Claude, GPT, Gemini). Each task is repeated N times per model. We never run a task once and screenshot the answer. Models are stochastic, so we run many trials and report a proportion with a confidence interval.
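As a rough illustration of the repeat-and-tally idea, here is a minimal Python sketch. The names `run_trials`, `pick_proportion`, and `fake_model` are hypothetical, and `fake_model` is a seeded random stand-in for a real pinned model call, not how any model is actually invoked.

```python
import random
from collections import Counter
from typing import Callable

def run_trials(run_once: Callable[[str], str], task: str, n: int) -> Counter:
    """Run the same unbranded task n times and tally which tool was picked."""
    picks: Counter = Counter()
    for _ in range(n):
        picks[run_once(task)] += 1
    return picks

def pick_proportion(picks: Counter, tool: str) -> float:
    """Share of runs in which `tool` was picked (0.0 if there were no runs)."""
    total = sum(picks.values())
    return picks[tool] / total if total else 0.0

# Hypothetical stand-in for a pinned model: picks a tool at random.
_rng = random.Random(0)

def fake_model(task: str) -> str:
    return _rng.choices(["tool_a", "tool_b", "tool_c"], weights=[5, 3, 2])[0]

picks = run_trials(fake_model, "add transactional email to a Node app", n=200)
share = pick_proportion(picks, "tool_a")
```

A single run would only ever give 0% or 100%; the proportion becomes meaningful only once n is large enough to put an interval around it.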

Two surfaces, scored honestly

Coding-agent (objective): the model writes code, and we parse it for the package it actually imported or installed. The model either imported your SDK or it didn't. No interpretation.
Conversational (judged): the model recommends in prose, and a separate Claude judge classifies which tool was the primary recommendation versus merely mentioned, against a fixed schema. We weight the objective coding surface higher.
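The coding-surface check could look something like the sketch below. It assumes Node-style output and a small, incomplete set of import and install patterns; the regexes and the helper names `package_root` and `packages_used` are illustrative assumptions, not the actual parser.

```python
import re

# Assumed patterns for common Node import and install forms (not exhaustive).
IMPORT_RE = re.compile(r"""(?:require\(\s*|from\s+|import\s+)['"]([^'"]+)['"]""")
INSTALL_RE = re.compile(r"(?:npm\s+(?:install|i)|yarn\s+add|pnpm\s+add)\s+([^\n;&|]+)")

def package_root(spec: str) -> str:
    """Reduce 'pkg/sub' or '@scope/pkg/sub' to the package name."""
    parts = spec.split("/")
    return "/".join(parts[:2]) if spec.startswith("@") else parts[0]

def packages_used(output: str) -> set[str]:
    """Collect package names that the generated output imports or installs."""
    found: set[str] = set()
    for spec in IMPORT_RE.findall(output):
        # Skip relative paths and explicit node: built-ins.
        if spec.startswith((".", "/", "node:")):
            continue
        found.add(package_root(spec))
    for args in INSTALL_RE.findall(output):
        for token in args.split():
            if token.startswith("-"):
                continue  # skip flags such as --save
            # Strip a trailing version: 'pkg@1.2.3' or '@scope/pkg@1.2.3'.
            name = token[0] + token[1:].split("@")[0]
            found.add(package_root(name))
    return found

sample = """
const { Resend } = require('resend');
import sgMail from '@sendgrid/mail';
import helper from './helper';
// npm install resend@3.2.0 --save
"""
used = packages_used(sample)
```

Scoring a trial then reduces to checking whether a tool's package name is in the returned set. A fuller version would also need to filter bare built-in modules such as `fs`, which this sketch would count as packages.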

The numbers

Pick Rate

the headline

Share of trials where the agent picks you, weighted across surface, model, and task, with a Wilson confidence interval.
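For reference, a Wilson score interval for a plain (unweighted) proportion can be computed as in this Python sketch. The function name and the default z = 1.96 (a 95% interval) are assumptions; how the interval is combined with the surface, model, and task weighting is not specified here and is not covered.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)

low, high = wilson_interval(40, 100)
```

For 40 picks in 100 trials this gives an interval of roughly 0.31 to 0.50. Unlike the simple normal approximation, the Wilson interval stays inside 0 to 1 and behaves sensibly when the pick count is near zero.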

Default Rate

the default

Pick Rate on open-ended "what should I use" questions only. The unprompted default, and the most discriminating cut.

Shortlist Rate

known vs picked

Share of trials where you're named at all, even if not chosen. The gap to Pick Rate is the "known but not picked" signal.
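The three rates can be read off the same set of trial records. The sketch below is unweighted for simplicity, and the `Trial` record, its field names, and `summarize` are hypothetical; it treats a picked tool as also counting toward the shortlist.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Trial:
    picked: Optional[str]                             # tool the agent chose, if any
    mentioned: set[str] = field(default_factory=set)  # every tool named in the answer
    open_ended: bool = False                          # True for "what should I use" prompts

def rate(hits: int, total: int) -> float:
    return hits / total if total else 0.0

def summarize(trials: list[Trial], tool: str) -> dict[str, float]:
    """Unweighted Pick, Default, and Shortlist rates for one tool."""
    open_trials = [t for t in trials if t.open_ended]
    pick = rate(sum(t.picked == tool for t in trials), len(trials))
    default = rate(sum(t.picked == tool for t in open_trials), len(open_trials))
    named = sum(tool in t.mentioned or t.picked == tool for t in trials)
    shortlist = rate(named, len(trials))
    return {
        "pick_rate": pick,
        "default_rate": default,
        "shortlist_rate": shortlist,
        "known_not_picked": shortlist - pick,  # gap between named and chosen
    }

trials = [
    Trial("a", {"a", "b"}, open_ended=True),
    Trial("b", {"a", "b"}, open_ended=True),
    Trial("b", {"b"}),
    Trial("a", {"a"}),
]
summary = summarize(trials, "a")
```

In this toy data, tool "a" is named in three of four trials but picked in only two, so the known-but-not-picked gap is 0.25.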

Honesty

Prompts are unbranded — we ask "add payments," never "use Stripe," so we measure what an agent picks, not what it can use. Tasks are scored against a fixed competitor set. Model versions and the as-of date are pinned on every run. This measures real model behavior, not a synthetic proxy — but it is a measurement, not ground truth, and we report it as such.
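One way to make these rules checkable is to record them with every run. The sketch below is an assumption about how that might look: `RunManifest`, its fields, the model version string, the date, and the competitor name "OtherPay" are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class RunManifest:
    as_of: date                      # pinned as-of date for the run
    model_versions: tuple[str, ...]  # pinned model version identifiers
    competitor_set: frozenset[str]   # fixed set the task is scored against
    prompt: str                      # the task text sent to the model

    def is_unbranded(self) -> bool:
        """True if the prompt names none of the tools in the competitor set."""
        text = self.prompt.lower()
        return not any(name.lower() in text for name in self.competitor_set)

manifest = RunManifest(
    as_of=date(2025, 1, 1),                   # placeholder date
    model_versions=("example-model-v1",),     # placeholder version string
    competitor_set=frozenset({"Stripe", "OtherPay"}),
    prompt="add payments to a checkout page",
)
ok = manifest.is_unbranded()
```

A frozen record like this keeps the model versions, date, and competitor set from drifting within a run, and a simple substring check can catch a prompt that names a brand.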

Questions

What is a selection trial?

A selection trial is a single run of a real, unbranded developer task on a pinned model. Each task is repeated N times per model. Models are stochastic, so we run many trials and report a proportion with a confidence interval rather than screenshotting a single answer.

How is the coding surface scored?

Objectively. The model writes code and we parse it for the package it actually imported or installed. Either your SDK was imported or it wasn't — no interpretation.

How is the conversational surface scored?

A separate Claude judge classifies which tool was the primary recommendation versus merely mentioned, against a fixed schema. We weight the objective coding surface higher.

Why are the prompts unbranded?

We ask "add payments," never "use Stripe," so we measure what an agent picks, not what it can use. Branded prompts test whether an agent can use a tool once told to (readiness), not Pick Rate.

Check your own Pick Rate

See how often agents pick your tool — free, no account needed.