How BioCosm estimates probability of FDA approval. Every adjustment has a stated reason. Nothing is a black box.
For each drug program in the pipeline, BioCosm answers one question: given what is publicly known, what is the probability this drug gets FDA approval from its current phase?
This is not a target quality score. It is not financial advice. It is a structured estimate of clinical trial success probability, anchored to historical base rates and adjusted for program-specific factors. The full reasoning lives in each drug’s writeup.
Every prediction starts from empirical phase-transition success rates published by Wong, Siah, and Lo (2019) and refined in subsequent literature. These are not guesses - they are measured success rates across tens of thousands of drug programs over decades, stratified by phase and therapeutic area.
Crucially, each published rate covers a single phase transition - the probability of advancing from one stage to the next (Phase 2 → Phase 3, Phase 3 → regulatory filing, filing → FDA approval), not the probability of approval outright. To answer our question - the cumulative probability of FDA approval from a drug’s current phase - we compose the remaining transitions by multiplying them together. A drug in Phase 2 must clear Phase 3, then file, then win approval, so its base likelihood of approval (LOA) is:
LOA(Phase 2) = P(Phase 2 → Phase 3) × P(Phase 3 → filing) × P(filing → approval)
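As a minimal sketch of this composition, the Python below multiplies the remaining transition rates from a drug's current phase. The numeric rates, the `TRANSITIONS` table, and the `base_loa` name are hypothetical placeholders chosen for illustration; they are not the published figures or BioCosm's actual table.

```python
from math import prod

# Hypothetical single-transition success rates, for illustration only.
# Each entry is (current stage, probability of clearing the next hurdle).
TRANSITIONS = [
    ("Phase 1", 0.60),  # P(Phase 1 -> Phase 2)
    ("Phase 2", 0.25),  # P(Phase 2 -> Phase 3)
    ("Phase 3", 0.60),  # P(Phase 3 -> regulatory filing)
    ("Filing", 0.85),   # P(filing -> FDA approval)
]


def base_loa(current_phase: str) -> float:
    """Cumulative likelihood of approval from current_phase.

    Multiplies every transition that still lies ahead of the drug.
    Raises ValueError if current_phase is not in the table.
    """
    stages = [name for name, _ in TRANSITIONS]
    start = stages.index(current_phase)
    return prod(rate for _, rate in TRANSITIONS[start:])


for stage, _ in TRANSITIONS:
    print(f"{stage}: {base_loa(stage):.1%}")
```

With these placeholder rates a Phase 2 drug comes out at roughly 0.25 × 0.60 × 0.85, and every earlier phase scores lower than the one after it because it has more factors left to multiply.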
Composing the transitions this way reproduces the well-known shape of historical drug-development odds: a Phase 1 oncology asset sits in the single digits to low double digits, a Phase 2 asset around 10-15%, and a Phase 3 asset around 45-55% - each stage carrying the survivorship of having already cleared the ones before it. Earlier-phase drugs correctly show lower cumulative approval odds than late-phase drugs, because they have more hurdles left to clear.
This composed LOA is the base rate the trained model below adjusts. It is the dominant term - the adjustments move the needle, but they don’t override the empirical floor.
How an estimate changes as a drug advances. Because the number is the cumulative chance of approval from the drug’s current phase, it is not fixed for life. When a drug moves up a phase, it clears one of the hurdles above, so we re-score it and log a new, higher, dated estimate; the earlier estimate is kept in the drug’s history, never erased. So a single drug can carry a trail like 12% (Phase 1) → 28% (Phase 2) → 61% (Phase 3), each stamped with the date it was made. Advancing a phase is not a “win” we score - the drug can still fail later.
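One way to picture the dated trail is an append-only log. The sketch below is an illustrative data structure, not BioCosm's storage format: `Estimate`, `DrugHistory`, and `rescore` are hypothetical names, and the rule that entries must arrive in date order is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Estimate:
    """One dated estimate; frozen so it cannot be edited after logging."""
    made_on: date
    phase: str
    probability: float


@dataclass
class DrugHistory:
    name: str
    estimates: list = field(default_factory=list)

    def rescore(self, made_on: date, phase: str, probability: float) -> None:
        """Append a new estimate. Earlier entries are kept, never overwritten."""
        if self.estimates and made_on < self.estimates[-1].made_on:
            raise ValueError("estimates must be logged in date order")
        self.estimates.append(Estimate(made_on, phase, probability))

    def current(self) -> Estimate:
        """The most recent estimate on record."""
        return self.estimates[-1]


history = DrugHistory("example-drug")
history.rescore(date(2024, 1, 5), "Phase 1", 0.12)
history.rescore(date(2025, 3, 1), "Phase 2", 0.28)
print(" -> ".join(f"{e.probability:.0%} ({e.phase})" for e in history.estimates))
```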
What we grade ourselves on. Only the final outcome - approved, or killed (a rejection, withdrawal, or failed pivotal trial) - and only against an estimate we made before that decision (an automatic leakage guard enforces this). We do not currently grade phase-to-phase advancement; that is a different question - the chance of clearing the next single hurdle, not eventual approval - and would need its own model. It is a candidate for a separate, faster-feedback scorecard later. The live results are on the track record.
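The leakage guard can be sketched as a date filter: an estimate only counts if it was logged strictly before the decision. The function name `gradable_estimate`, the tuple layout, and the choice to grade the most recent pre-decision estimate are all assumptions made for this illustration; the text above does not specify which earlier estimate is used.

```python
from datetime import date


def gradable_estimate(estimates, decision_date):
    """Pick the estimate to grade against a final outcome.

    estimates: list of (made_on, probability) tuples.
    Returns the latest estimate made strictly before decision_date,
    or None if every estimate was made on or after the decision.
    """
    prior = [e for e in estimates if e[0] < decision_date]
    if not prior:
        return None
    return max(prior, key=lambda e: e[0])


trail = [
    (date(2023, 1, 1), 0.12),
    (date(2024, 6, 1), 0.28),
    (date(2025, 2, 1), 0.90),  # logged after the decision: must be ignored
]
print(gradable_estimate(trail, decision_date=date(2025, 1, 15)))
```

Here the third estimate is excluded because it postdates the decision, so the outcome would be graded against the 0.28 estimate.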
The base rate gives every drug in a given disease area and stage the same number. It cannot tell two of them apart. To do that - to say this Phase 2 drug looks more promising than that one - we use a model that adjusts the base rate up or down based on ten specific, public facts about each drug.
The important part is how the weights were chosen. We did not hand-pick how much each fact should matter. We took roughly 4,500 real drug programs whose fate is already settled - approved or failed - and let a standard statistical model (a logistic regression) learn the weights from the historical record itself. The model reads the past and works out which facts actually separated the winners from the losers.
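To make "learning the weights from the record" concrete, here is a minimal logistic regression fitted by plain gradient descent on a toy dataset. It is a teaching sketch under stated assumptions: the single yes/no feature, the eight toy outcomes, the learning rate, and the `fit_logistic` name are all invented for illustration, and the real model may well be fitted with a standard library instead.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log-loss.

    rows: list of feature lists; labels: 1 = approved, 0 = failed.
    Returns [intercept, w_1, ..., w_k].
    """
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y  # gradient of log-loss with respect to the logit
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Toy record: programs with the feature succeed 3 times in 4,
# programs without it succeed 1 time in 4.
rows = [[1], [1], [1], [1], [0], [0], [0], [0]]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
intercept, weight = fit_logistic(rows, labels)
print(f"intercept={intercept:.3f}, weight={weight:.3f}")
```

The fitted weight is positive because the feature separated winners from losers in the toy record; nobody chose that number by hand.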
The ten facts
All ten are knowable when a trial is registered, so the model can score a drug that is still in progress. Roughly grouped:
One result the model surfaced is worth flagging honestly: more rigorous designs (randomized, blinded, with a comparator) are associated with lower approval in the historical data. This is a correlation, not cause and effect. It most likely reflects that some drugs reach approval through smaller, simpler early studies, while the large confirmatory trials are exactly where many drugs fail.
To avoid double-counting, the model uses the base rate as a fixed starting point and only learns the adjustments on top of it. Effects already captured in the base-rate table (such as biomarker-selected or orphan rows from the BIO/QLS and Thomas et al. cohorts) are not re-applied.
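A common way to treat a base rate as a fixed starting point is to enter it as an offset on the log-odds scale, so the learned terms can only shift it. The sketch below assumes that form; the formula, the function name `adjusted_probability`, and the feature names and weights are hypothetical and are not taken from the model itself.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def adjusted_probability(base_loa, features, weights):
    """Assumed form: logit(p) = logit(base_loa) + sum_i w_i * x_i.

    base_loa enters with a fixed coefficient of 1 (an offset), so only the
    weights on the program-specific facts are learned. Facts missing from
    `features` count as 0 and leave the base rate untouched.
    """
    shift = sum(w * features.get(name, 0.0) for name, w in weights.items())
    return sigmoid(logit(base_loa) + shift)


# Hypothetical weights for two placeholder facts.
weights = {"fact_a": 0.40, "fact_b": -0.30}
print(adjusted_probability(0.1275, {}, weights))               # no adjustment
print(adjusted_probability(0.1275, {"fact_a": 1.0}, weights))  # nudged up
print(adjusted_probability(0.1275, {"fact_b": 1.0}, weights))  # nudged down
```

Under this form, a drug with no distinguishing facts gets exactly its base rate back, which is what keeps effects already present in the base-rate table from being counted twice.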
MODEL v1 (LOGISTIC, JUNE 2026) - VALIDATED OUT-OF-SAMPLE
The model is not just trained, it is tested. Working from the ~4,500 resolved drug programs, we scored it on drugs it was never allowed to study while learning, using only facts known before each trial began. Among drugs at the same stage, it ranks an eventual approval above an eventual failure 0.61 to 0.69 of the time (0.50 would be a coin flip), and its percentages are well sized: when it says 30 percent, about 30 percent of those drugs are approved.
For context, the base rate alone - knowing only a drug’s disease area and stage - scores about 0.50 within a stage, a coin flip. So essentially all of the power to tell same-stage drugs apart comes from the ten learned facts. The full breakdown, including the calibration charts and the honest limits, is on the validation page.
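The "ranks approvals above failures" figure can be read as a pairwise win rate computed separately inside each stage. The sketch below shows that calculation on made-up records; the function names and the toy scores are illustrative assumptions, and counting ties as half a win is the usual convention rather than something stated above.

```python
from collections import defaultdict


def pairwise_rank_score(scores, labels):
    """Share of (approved, failed) pairs where the approved drug scores higher.

    Ties count as half a win. Returns None if either group is empty.
    """
    approved = [s for s, y in zip(scores, labels) if y == 1]
    failed = [s for s, y in zip(scores, labels) if y == 0]
    if not approved or not failed:
        return None
    wins = sum(
        1.0 if a > f else 0.5 if a == f else 0.0
        for a in approved
        for f in failed
    )
    return wins / (len(approved) * len(failed))


def within_stage_scores(records):
    """records: iterable of (stage, score, label). Returns {stage: score}."""
    by_stage = defaultdict(lambda: ([], []))
    for stage, score, label in records:
        by_stage[stage][0].append(score)
        by_stage[stage][1].append(label)
    return {stage: pairwise_rank_score(s, y) for stage, (s, y) in by_stage.items()}


records = [
    # Model scores that vary within Phase 2.
    ("Phase 2", 0.30, 1), ("Phase 2", 0.15, 1),
    ("Phase 2", 0.10, 0), ("Phase 2", 0.20, 0),
    # A base rate alone gives every Phase 3 drug the same number.
    ("Phase 3", 0.51, 1), ("Phase 3", 0.51, 0), ("Phase 3", 0.51, 0),
]
print(within_stage_scores(records))
```

In the toy Phase 3 group every drug has the same score, so every pair is a tie and the result is 0.5, which is why a base rate by itself cannot beat a coin flip within a stage.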
By comparison, the strongest published academic models reach about 0.78 to 0.81 - but they train on expensive, private industry databases costing tens of thousands of dollars a year. We reach 0.61 to 0.69 on entirely free, public data. The gap is mostly data access, not method.
Two honest notes. Phase 3 is the weakest stage (0.61) - there are fewer finished Phase 3 programs to learn from, and judging whether a long, slow program truly failed is genuinely murky. And the model learns from the past to judge the present: if the way trials are run keeps shifting, its lessons may fit today’s drugs a little less well over time.