Equilibrium binding constants: $K_d$ and $\Delta\Delta G$

How tightly do two molecules stick to each other? One number answers that question across all concentrations.

1  ·  The reaction we're describing

Much of molecular biology is really about just two partners finding each other and sticking. We write this as a reversible reaction:

$\mathrm{P} + \mathrm{L} \;\; \underset{k_\mathrm{off}}{\overset{k_\mathrm{on}}{\rightleftharpoons}} \;\; \mathrm{PL}$

Here $\mathrm{P}$ is a protein (or receptor, enzyme, transcription factor, etc.), $\mathrm{L}$ is whatever it binds (a small-molecule drug, a peptide, a DNA sequence, another protein…), and $\mathrm{PL}$ is the bound complex. The two arrows represent the forward (association) and reverse (dissociation) reactions, with rate constants $k_\mathrm{on}$ and $k_\mathrm{off}$.

Even though this is a two-way chemical process, at equilibrium there is a simple, deterministic relationship between the concentrations of all three species.

2  ·  From rates to an equilibrium constant

The rate of complex formation is proportional to $[\mathrm{P}][\mathrm{L}]$ (two molecules have to collide), and the rate of dissociation is proportional to $[\mathrm{PL}]$ (one complex falls apart). The net rate of change of the complex is:

$$\frac{d[\mathrm{PL}]}{dt} = k_\mathrm{on}[\mathrm{P}][\mathrm{L}] - k_\mathrm{off}[\mathrm{PL}]$$

At equilibrium, nothing changes on average, so $d[\mathrm{PL}]/dt = 0$ and the two rates balance: $k_\mathrm{on}[\mathrm{P}][\mathrm{L}] = k_\mathrm{off}[\mathrm{PL}]$. Rearranging:

$$K_d \;\equiv\; \frac{k_\mathrm{off}}{k_\mathrm{on}} \;=\; \frac{[\mathrm{P}][\mathrm{L}]}{[\mathrm{PL}]}$$
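As a numerical sanity check, the rate equation can be integrated directly to confirm that the concentrations settle at the ratio $k_\mathrm{off}/k_\mathrm{on}$. This is a minimal sketch using forward Euler; the rate constants and concentrations are hypothetical values chosen for illustration, and `simulate_binding` is a made-up helper, not part of any library.

```python
def simulate_binding(k_on, k_off, p_total, l_total, dt, n_steps):
    """Integrate d[PL]/dt = k_on[P][L] - k_off[PL] with forward Euler.

    Concentrations in M, k_on in 1/(M s), k_off in 1/s, dt in s.
    Returns the final (free P, free L, complex PL).
    """
    pl = 0.0  # start with no complex formed
    for _ in range(n_steps):
        p_free = p_total - pl
        l_free = l_total - pl
        pl += dt * (k_on * p_free * l_free - k_off * pl)
    return p_total - pl, l_total - pl, pl


# Hypothetical parameters: k_off / k_on = 1e-8 M, i.e. 10 nM.
k_on = 1.0e6    # 1/(M s)
k_off = 1.0e-2  # 1/s

# 1000 s of simulated time, roughly 100 relaxation times for these values.
p_free, l_free, pl = simulate_binding(
    k_on, k_off, p_total=1e-9, l_total=1e-7, dt=0.01, n_steps=100_000
)

kd_from_rates = k_off / k_on
kd_from_concentrations = p_free * l_free / pl
print(f"k_off/k_on   = {kd_from_rates:.3e} M")
print(f"[P][L]/[PL]  = {kd_from_concentrations:.3e} M")
```

Once the simulation has run for many multiples of the relaxation time, the two estimates agree regardless of the starting concentrations.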

$K_d$ is the dissociation constant. It has units of concentration (e.g. molar). Two useful interpretations:

  • Operational: $K_d$ is the free ligand concentration at which half of the protein is bound. Low $K_d$ ⇒ tight binding; high $K_d$ ⇒ weak binding. Typical biological values fall in the nM–μM range.
  • Thermodynamic: $K_d$ maps directly onto a free energy. The standard free energy of binding is $\Delta G^\circ = RT \ln K_d$ (with $K_d$ expressed relative to a 1 M standard state). $RT \approx 0.593$ kcal/mol at 298 K, so a factor of 10 in $K_d$ corresponds to 1.36 kcal/mol.
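The thermodynamic reading can be turned into a two-line conversion. The sketch below assumes $K_d$ is given in molar and uses 298 K by default; the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def delta_g_from_kd(kd_molar, temp_k=298.0):
    """Standard binding free energy in kcal/mol (1 M standard state)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kcal, temp_k=298.0):
    """Inverse conversion: Kd in molar from a binding free energy."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


dg_1nM = delta_g_from_kd(1e-9)
dg_10nM = delta_g_from_kd(1e-8)
print(f"dG(1 nM)  = {dg_1nM:.2f} kcal/mol")
print(f"dG(10 nM) = {dg_10nM:.2f} kcal/mol")
print(f"per decade: {dg_10nM - dg_1nM:.2f} kcal/mol")
```

Tighter binders have more negative $\Delta G^\circ$, and each tenfold change in $K_d$ moves it by about 1.36 kcal/mol.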

3  ·  The Langmuir isotherm: what $K_d$ predicts

The observable in most experiments is the fractional occupancy — the probability that a given protein molecule is bound. Conservation of mass ($[\mathrm{P}]_\mathrm{total} = [\mathrm{P}] + [\mathrm{PL}]$) plus the equilibrium relation gives:

$$\theta \;=\; \frac{[\mathrm{PL}]}{[\mathrm{P}]_\mathrm{total}} \;=\; \frac{[\mathrm{L}]}{K_d + [\mathrm{L}]}$$

This is the Langmuir isotherm. A single parameter $K_d$ predicts $\theta$ at every ligand concentration.
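
For completeness, the algebra connecting the two relations is short. Solve the equilibrium relation for the complex and substitute it into the conservation law (here $[\mathrm{L}]$ is the free ligand concentration):

$$[\mathrm{PL}] = \frac{[\mathrm{P}][\mathrm{L}]}{K_d} \quad\Rightarrow\quad [\mathrm{P}]_\mathrm{total} = [\mathrm{P}] + [\mathrm{PL}] = [\mathrm{P}]\left(1 + \frac{[\mathrm{L}]}{K_d}\right)$$

$$\theta \;=\; \frac{[\mathrm{PL}]}{[\mathrm{P}]_\mathrm{total}} \;=\; \frac{[\mathrm{P}][\mathrm{L}]/K_d}{[\mathrm{P}]\left(1 + [\mathrm{L}]/K_d\right)} \;=\; \frac{[\mathrm{L}]}{K_d + [\mathrm{L}]}$$

The free protein concentration cancels, which is why the occupancy depends only on $[\mathrm{L}]$ and $K_d$.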

Fractional occupancy $\theta([\mathrm{L}])$ for four hypothetical proteins with different $K_d$ values. Each curve crosses $\theta = 0.5$ at $[\mathrm{L}] = K_d$. On log-concentration axes, all Langmuir curves have the same shape; they are just shifted horizontally.

Why $K_d$ is so useful: once you know it, you can predict the binding fraction at any free ligand concentration without additional experiments, provided the 1:1 model holds. This is why biologists tolerate the simplifying assumption that binding is "just $K_d$": one number compresses an entire response curve, and that compression is often accurate enough to be predictive.
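
To see the prediction in action, the isotherm can be evaluated at a few multiples of $K_d$. The $K_d$ value below is hypothetical, and `fractional_occupancy` is just an illustrative name for the Langmuir formula.

```python
def fractional_occupancy(ligand_conc, kd):
    """Langmuir isotherm: theta = [L] / (Kd + [L]), with [L] the free ligand.

    Both arguments must use the same concentration units.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("need ligand_conc >= 0 and kd > 0")
    return ligand_conc / (kd + ligand_conc)


kd_nM = 100.0  # hypothetical dissociation constant
for multiple in (0.01, 0.1, 1, 10, 100):
    conc_nM = multiple * kd_nM
    theta = fractional_occupancy(conc_nM, kd_nM)
    print(f"[L] = {conc_nM:8.1f} nM   theta = {theta:.3f}")
```

Moving from about 9% to about 91% occupancy takes a 100-fold change in ligand concentration ($0.1 \times K_d$ to $10 \times K_d$), which is why titrations are laid out on a log scale.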

4  ·  $\Delta\Delta G$: comparing variants

Absolute $K_d$ values are often hard to compare across labs and assays because the measured value is sensitive to errors in active protein concentration, temperature, buffer, labeling, etc. What is usually far more reproducible is the ratio of $K_d$ values between a mutant and a reference (typically wild-type, WT) measured in the same assay, because many systematic errors cancel.

This ratio, expressed as free energy, is $\Delta\Delta G$:

$$\Delta\Delta G \;=\; \Delta G_\mathrm{mut} - \Delta G_\mathrm{WT} \;=\; RT \ln\!\frac{K_{d,\mathrm{mut}}}{K_{d,\mathrm{WT}}}$$

  • $\Delta\Delta G > 0$: the mutation weakens binding (destabilizes the complex).
  • $\Delta\Delta G < 0$: the mutation strengthens binding (stabilizes the complex).
  • $\Delta\Delta G \approx 0$: the mutation is functionally neutral for this interaction.

Why $\Delta\Delta G$ is the "right" learning signal (for ML people)

A predictor trained on absolute $\Delta G$ has to learn the baseline binding energy of the WT scaffold on top of any sequence-specific effects. That baseline depends on bulk properties (pI, hydrophobicity, overall fold stability) that may not generalize. $\Delta\Delta G$ strips those away and leaves the per-mutation contribution — which is usually additive or nearly so, and is the quantity models like FoldX, Rosetta, and ESM-based predictors actually try to capture.
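
To make the sign convention concrete, $\Delta\Delta G$ can be computed from a pair of $K_d$ values. The numbers below are hypothetical; because only the ratio enters, the two $K_d$ values can be in any shared unit.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ddg_from_kds(kd_mut, kd_wt, temp_k=298.0):
    """ddG = RT ln(Kd_mut / Kd_WT) in kcal/mol; the Kd units cancel."""
    return R_KCAL * temp_k * math.log(kd_mut / kd_wt)


# Hypothetical variants relative to a WT with Kd = 100 nM.
ddg_weaker = ddg_from_kds(kd_mut=500.0, kd_wt=100.0)   # 5x weaker binding
ddg_tighter = ddg_from_kds(kd_mut=20.0, kd_wt=100.0)   # 5x tighter binding
print(f"5x weaker:  ddG = {ddg_weaker:+.2f} kcal/mol")
print(f"5x tighter: ddG = {ddg_tighter:+.2f} kcal/mol")
```

A fivefold change in $K_d$ is a little under 1 kcal/mol in either direction, positive for the weaker binder and negative for the tighter one.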

5  ·  How we measure $K_d$

The standard recipe is a titration:

  1. Fix $[\mathrm{P}]$ at a small, known value (see pitfalls below).
  2. Vary $[\mathrm{L}]$ across ~2–3 orders of magnitude, spanning from well below to well above $K_d$.
  3. Wait for equilibrium.
  4. At each concentration, measure a signal proportional to $[\mathrm{PL}]$, e.g. fluorescence from a labeled ligand that accumulates on the protein.
  5. Fit the Langmuir equation to the signal-vs-concentration points to extract $K_d$.
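
Step 5 of the recipe can be sketched in a few lines. This is a dependency-free illustration, not the lab's fitting pipeline: it assumes the signal is `amplitude * theta` with no background term, uses synthetic noise-free data, and searches $K_d$ on a log-spaced grid. `fit_langmuir` and all the numbers are hypothetical.

```python
def langmuir(conc, kd):
    """Fractional occupancy [L] / (Kd + [L])."""
    return conc / (kd + conc)


def fit_langmuir(concs, signals, kd_lo, kd_hi, n_grid=2000):
    """Least-squares fit of signal = amplitude * [L] / (Kd + [L]).

    For each trial Kd on a log-spaced grid, the best amplitude has a
    closed form (linear least squares), so only Kd needs to be searched.
    Returns (kd_fit, amplitude_fit).
    """
    best = None
    for i in range(n_grid + 1):
        kd = kd_lo * (kd_hi / kd_lo) ** (i / n_grid)
        theta = [langmuir(c, kd) for c in concs]
        amplitude = (sum(s * t for s, t in zip(signals, theta))
                     / sum(t * t for t in theta))
        sse = sum((s - amplitude * t) ** 2 for s, t in zip(signals, theta))
        if best is None or sse < best[0]:
            best = (sse, kd, amplitude)
    return best[1], best[2]


# Synthetic titration (hypothetical values, not lab data).
true_kd, true_amplitude = 250.0, 1000.0      # nM, arbitrary signal units
concs = [25.0, 80.0, 250.0, 800.0, 2500.0]   # nM, brackets Kd on both sides
signals = [true_amplitude * langmuir(c, true_kd) for c in concs]

kd_fit, amplitude_fit = fit_langmuir(concs, signals, kd_lo=1.0, kd_hi=1e5)
print(f"Kd ~ {kd_fit:.0f} nM, amplitude ~ {amplitude_fit:.0f}")
```

In practice a nonlinear least-squares routine such as `scipy.optimize.curve_fit` would typically do this job and also return parameter uncertainties; the grid search is used here only to keep the sketch self-contained. Real data usually also need a background offset term.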

In the Fordyce lab, we run massively parallel versions of this recipe on microfluidic chips. Three of our most-used platforms:

  • MITOMI: Mechanically Induced Trapping of Molecular Interactions. A microfluidic valve briefly traps protein-ligand complexes to a surface, and fluorescence from the bound ligand is imaged. Used for transcription-factor / DNA binding.
  • STAMMP / STAMMPPING: Variants of MITOMI that add on-chip protein expression. You can go from DNA template to purified variant to measured $K_d$ in a single day for hundreds to thousands of variants.
  • MRBLE-pep: Microspheres with Ratiometric Barcode Lanthanide Encoding. Spectrally encoded hydrogel beads carry directly synthesized peptides; you titrate a labeled protein over a pool and measure per-bead fluorescence for each peptide simultaneously.

Assumptions to verify

  • Equilibrium has been reached. Incubation time must be long compared to $1/(k_\mathrm{on}[\mathrm{L}] + k_\mathrm{off})$. Otherwise your measurement depends on kinetics, not thermodynamics.
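  • You are not in the titration regime. If $[\mathrm{P}]_\mathrm{total} \gtrsim K_d$, the ligand pool is depleted by binding, and the apparent half-saturation point (in total ligand) shifts from $K_d$ to $K_d + [\mathrm{P}]_\mathrm{total}/2$, which approaches $[\mathrm{P}]_\mathrm{total}/2$ when $[\mathrm{P}]_\mathrm{total} \gg K_d$. In that limit the experiment is a stoichiometric titration: it measures protein concentration, not affinity. Rule of thumb: keep $[\mathrm{P}]_\mathrm{total} \lesssim K_d/10$.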
  • 1:1 binding stoichiometry. If $\mathrm{P}$ binds two $\mathrm{L}$'s (or vice versa), the one-parameter Langmuir is wrong; a Hill-function or two-site model is needed.
  • No cooperativity or oligomerization. If the protein dimerizes in a concentration-dependent way, the apparent $K_d$ shifts. Measure at multiple $[\mathrm{P}]$ to check.
  • Signal is proportional to $[\mathrm{PL}]$, not total ligand. If the label itself perturbs binding or if non-specific surfaces accumulate signal, you measure something other than specific occupancy.
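
The first two checks above can be put into numbers. The sketch below assumes simple 1:1 binding: the equilibration estimate additionally assumes ligand in excess (so the relaxation time is $1/(k_\mathrm{on}[\mathrm{L}] + k_\mathrm{off})$), and the depletion function solves the exact quadratic for the complex from total concentrations. Function names, rate constants, and concentrations are all hypothetical.

```python
import math


def equilibration_time(k_on, k_off, ligand_conc, n_time_constants=5.0):
    """Incubation time (s) covering n relaxation times, tau = 1/(k_on[L] + k_off)."""
    return n_time_constants / (k_on * ligand_conc + k_off)


def occupancy_with_depletion(p_total, l_total, kd):
    """Exact 1:1 occupancy from TOTAL concentrations (same units as kd).

    Solves (P_tot - x)(L_tot - x) = Kd * x for the complex x.
    """
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


# Slowest point of a titration is the lowest ligand concentration.
t_wait = equilibration_time(k_on=1e6, k_off=1e-3, ligand_conc=1e-10)
print(f"wait at least ~{t_wait / 60:.0f} min at the lowest [L]")

kd = 10.0  # nM, hypothetical
for p_total in (1.0, 100.0):  # Kd/10 versus 10*Kd
    theta = occupancy_with_depletion(p_total, l_total=kd, kd=kd)
    print(f"[P]total = {p_total:5.1f} nM: occupancy at L_total = Kd is {theta:.2f}")
```

With protein at $K_d/10$, occupancy at $[\mathrm{L}]_\mathrm{total} = K_d$ stays close to one half; with protein at $10 \times K_d$ it drops far below that, and half-saturation is only reached near $[\mathrm{L}]_\mathrm{total} = K_d + [\mathrm{P}]_\mathrm{total}/2$.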

6  ·  Example: real binding curves from our lab

Below are three real Langmuir isotherms from Nguyen et al. 2019, eLife. We measured calcineurin (a phosphatase) binding to a panel of synthetic PxIxIT-family peptides using MRBLE-pep. Each data point is the mean fluorescence signal across 28–50 MRBLE beads at that concentration; error bars are ±1 SD across beads.

Three PxIxIT peptide variants spanning a >4× range of $K_d$. Points are mean ± SD across ~30–50 MRBLE beads per (peptide, concentration) combination; curves are Langmuir fits. The tightest-binding variant, HPRIVITGPH, contains a V→R substitution at the second position of the PxIxIT motif that enhances affinity ~5× over the canonical PVIVIT sequence. Data from Nguyen et al. 2019 eLife Figure 3 source data 1.

A few things to notice in the real data:

  • On log concentration axes, all three curves have the same shape; each is shifted horizontally by $\log K_d$, so the spacing between two curves is the log of their $K_d$ ratio. This is the key visual signature of a 1:1 Langmuir system.
  • The tightest peptide's curve is starting to saturate by the top concentration (2 μM); the weakest is barely past its half-max. Good fits need data that bracket $K_d$ on both sides.
  • Bead-to-bead scatter (the error bars) is small for tight binders and grows for weaker ones, because the absolute signal is smaller and shot and background noise make up a larger share of it.

Per-chamber curves: looking at the full replicate structure

The summary plot above hides an important part of the experiment: the replicate structure. A microfluidic platform like STAMMP measures the same variant in many independent reaction chambers on the same chip, and each chamber produces its own complete binding curve. The plot below shows every chamber's raw curve for a single variant (WT MAX binding to the canonical CACGTG E-box) from Hastings et al. 2024, bioRxiv.

Thin light-blue lines: 17 individual microfluidic chambers, each with its own 6-point titration and Langmuir fit. Dark blue markers: median across chambers at each concentration (with inter-quartile range as error bars). Dark blue curve: Langmuir fit to the median points, yielding $K_d = 838$ nM for WT MAX · CACGTG. Per-chamber $K_d$ values span a tight range (IQR 820–905 nM); the scatter between chambers reflects natural device-level variability, and taking the median across chambers averages that out.

Why show every chamber? Two reasons:

  • Sanity check on the fit. If individual chamber curves were all over the place — non-monotonic, flat at low [DNA], or with wildly different plateaus — you'd immediately suspect a systematic problem (expression failed, device leaked, imaging drifted). The fact that all 17 chambers have visually similar shapes is itself evidence that the measurement is working.
  • Honest error bars. Fitting only the median hides how tight the replicate structure actually is. Reporting $K_d = 838 \pm 50$ nM based on 17 independent fits is much more defensible than $K_d = 838$ nM from a single curve.
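
Summarizing replicate fits takes only a few lines. The per-chamber values below are hypothetical placeholders (not the published data), and expressing the spread as $RT \cdot \mathrm{SD}(\ln K_d)$ is one reasonable choice under the assumption that chamber-to-chamber errors are roughly multiplicative.

```python
import math
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

# Hypothetical per-chamber Kd fits in nM (illustrative only).
chamber_kds_nM = [812.0, 825.0, 831.0, 838.0, 846.0, 858.0, 871.0, 889.0, 903.0]

median_kd = statistics.median(chamber_kds_nM)
q1, _, q3 = statistics.quantiles(chamber_kds_nM, n=4)  # quartile cut points

# Spread of ln(Kd); multiplying by RT expresses it as free-energy scatter.
sd_ln_kd = statistics.stdev([math.log(kd) for kd in chamber_kds_nM])
sd_dg_kcal = R_KCAL * 298.0 * sd_ln_kd

print(f"median Kd = {median_kd:.0f} nM (IQR {q1:.0f} to {q3:.0f} nM)")
print(f"chamber-to-chamber scatter ~ {sd_dg_kcal:.3f} kcal/mol")
```

Reporting the median with its inter-quartile range, rather than a single fitted number, keeps the replicate structure visible in the final result.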

7  ·  Common pitfalls

  1. Measuring at a single concentration. One fluorescence value can't distinguish "high-affinity but not fully saturated" from "low-affinity at partial occupancy." You need a titration that brackets $K_d$.
  2. Insufficient concentration range. If all your points are well below $K_d$, the signal is linear in $[\mathrm{L}]$, so only the ratio of plateau amplitude to $K_d$ is constrained and any sufficiently high $K_d$ fits equally well. If all points are well above $K_d$, everything looks saturated and $K_d$ is again unconstrained. Ideally cover ~$0.1\times K_d$ to ~$10\times K_d$.
  3. Ligand depletion (titration regime). If $[\mathrm{P}]$ is not negligible relative to $K_d$, the curve distorts; once $[\mathrm{P}] \gg K_d$ it becomes nearly a straight line up to $[\mathrm{L}] \approx [\mathrm{P}]$ followed by saturation, which is a stoichiometric titration rather than an affinity measurement. Fix by lowering $[\mathrm{P}]$.
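  4. Not at equilibrium. With short incubation times, the slowest-equilibrating points fall short of their equilibrium occupancy. These are typically the lowest ligand concentrations, where the relaxation rate $k_\mathrm{on}[\mathrm{L}] + k_\mathrm{off}$ is smallest. The symptom: the apparent $K_d$ depends on how long you wait. Always verify that increasing incubation time doesn't change the answer.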
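  5. Non-specific background. Labeled ligand stuck to tubing, beads, or chip surfaces adds signal that is not specific occupancy and biases the apparent $K_d$; a component that keeps growing with $[\mathrm{L}]$ makes the curve look unsaturated and inflates it. Always include ligand-only and protein-only controls.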
  6. Error propagation to $\Delta\Delta G$. $\Delta\Delta G$ inherits error from both $K_{d,\mathrm{mut}}$ and $K_{d,\mathrm{WT}}$. If both have ~30% error and the errors are independent, $\Delta\Delta G$ has ~0.25 kcal/mol uncertainty ($RT\sqrt{0.3^2 + 0.3^2}$). Reporting $\Delta\Delta G$ to two decimal places often overstates precision.
  7. Applying free-buffer $K_d$ to crowded cellular conditions. Cells contain competing binders, depleted co-factor pools, and volume exclusion. In-vitro $K_d$ is the right parameter for fitting mechanisms, but predicting in-cell occupancy requires more than one number.
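
The error-propagation point (pitfall 6) is easy to check numerically. The sketch below uses first-order propagation, treating the relative error of each $K_d$ as the standard deviation of $\ln K_d$ and assuming the two errors are independent; `ddg_uncertainty` is an illustrative helper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ddg_uncertainty(rel_err_mut, rel_err_wt, temp_k=298.0):
    """Approximate 1-sigma uncertainty of ddG in kcal/mol.

    sigma(ln Kd) ~ relative error of Kd (first order); independent
    errors on the mutant and WT Kd add in quadrature.
    """
    return R_KCAL * temp_k * math.sqrt(rel_err_mut ** 2 + rel_err_wt ** 2)


sigma_ddg = ddg_uncertainty(0.30, 0.30)
print(f"sigma(ddG) ~ {sigma_ddg:.2f} kcal/mol for 30% error on each Kd")
```

For 30% errors on both measurements this comes out near a quarter of a kcal/mol, so differences between variants smaller than that should be read with caution.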

8  ·  Key formulas in one place

Equilibrium constant: $\;\; K_d = k_\mathrm{off}/k_\mathrm{on} = [\mathrm{P}][\mathrm{L}]/[\mathrm{PL}]$

Langmuir isotherm: $\;\; \theta([\mathrm{L}]) = [\mathrm{L}]/(K_d + [\mathrm{L}])$

Binding free energy: $\;\; \Delta G^\circ = RT \ln K_d$

Relative binding free energy: $\;\; \Delta\Delta G = RT \ln (K_{d,\mathrm{mut}} / K_{d,\mathrm{WT}})$

Useful constant: $\;\; RT \approx 0.593$ kcal/mol at 298 K; so 10× in $K_d$ = 1.36 kcal/mol in $\Delta G$.