An honest map of what is in the dataset, what is thin, and what is left out on purpose.
Coverage comes from two pipes, and understanding them explains everything else.
A few areas have their approved drugs but a thinner experimental pipeline, because they sit outside the active search set.
The number on the map counts distinct drug and diagnostic programs, not raw trial records. A single drug appears in public data under many names and many trial arms: "sotorasib", "AMG 510", and "sotorasib 960mg" all refer to the same molecule in different studies. BioCosm collapses all of those into one node through nightly deduplication.
That is why the count is a curated figure: tens of thousands of trial rows reduce to a few thousand real programs. It is also why the internal database is larger than what the map shows. The gap is made up of duplicates that were merged and noise that was removed, so the map stays a clean count of distinct programs rather than an inflated trial tally.
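To make the collapsing step concrete, here is a minimal sketch of how several raw names can reduce to one program. It is an illustration only, not BioCosm's actual pipeline: the alias table, the dose-stripping rule, and the function names are all assumptions made up for this example.

```python
import re

# Hypothetical alias table (an assumption for this sketch): maps a known
# code name to the canonical drug name.
ALIASES = {"amg 510": "sotorasib"}

# Matches a trailing dose with a unit, such as " 960mg" or " 960 mg".
# A bare number like the "510" in "AMG 510" has no unit, so it is left alone.
DOSE_SUFFIX = re.compile(r"\s+\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml)$")


def canonical_name(raw: str) -> str:
    """Reduce one raw trial-row name to a canonical program name."""
    name = raw.strip().lower()
    name = DOSE_SUFFIX.sub("", name)   # drop the dose, keep the molecule
    return ALIASES.get(name, name)     # resolve code names where known


def distinct_programs(rows):
    """Collapse raw trial-row names into a set of distinct programs."""
    return {canonical_name(row) for row in rows}


# Five raw rows, but only two distinct programs.
rows = ["sotorasib", "AMG 510", "sotorasib 960mg", "Sotorasib 960 mg", "adagrasib"]
programs = distinct_programs(rows)
print(len(rows), "rows ->", len(programs), "programs")
```

A real deduplication pass would need far more than a lookup table and one regular expression, but the shape is the same: normalize each name, resolve synonyms, then count what is left.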