Medical Image Analysis  ‧  USC 2026

A Distortion-Aware Learned Similarity Metric

Classical similarity metrics conflate anatomical identity with acquisition distortion. We propose a metric that explicitly disentangles the two — tracking distortion severity ordinally while preserving identity discrimination across CT and MRI.

Spearman(α), distortion tracking: 0.71 (vs 0.52 for MSE)
Ratio mean, distortion / identity movement: 0.08 (vs 0.30 for MSE)
AUC benign/malignant, threshold classification: 0.999 (vs 0.97 for MSE)
mAP retention @α=0.8, robustness under distortion: 88% (vs 71% for MSE)
2 modalities: CT & MRI evaluated
93 test patients across two datasets
7 baselines: MSE, SSIM, NCC, MI, DINO, Grad + CW-SSIM
τ: trained threshold separating benign / malignant distortion

Disentangled Siamese Network

A shared ResNet50 encoder feeds two heads that map each image into two subspaces trained to be orthogonal: an anatomy head that encodes patient identity, and a noise head that encodes acquisition distortion level.

01 Shared ResNet50 encoder
Processes anchor, positive (distorted same-patient), and negative (different-patient) images through identical weights, extracting 2048-d feature vectors.
02 Anatomy head (128-d)
L2-normalised embeddings encoding patient identity. Same-patient pairs are pulled together via InfoNCE loss regardless of acquisition distortion level.
03 Noise head (64-d) + threshold
Encodes acquisition distortion severity. An alpha predictor outputs a_hat ∈ [0,1]. An orthogonality loss forces independence from the anatomy subspace.
04 Threshold objective (TAU=0.3)
The target shape_a(α) is 0 for α ≤ TAU (benign distortion, ignored) and rises for α > TAU (malignant distortion, detected). Classical metrics have no learned threshold of this kind.
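The InfoNCE objective named in step 02 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' training code: the temperature value, the helper names (`l2_normalise`, `info_nce`), and the single-positive, list-of-negatives setup are assumptions made for the example.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive.

    temperature=0.1 is an assumed value, not taken from the source.
    """
    a = l2_normalise(anchor)
    logits = [dot(a, l2_normalise(positive)) / temperature]
    logits += [dot(a, l2_normalise(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]

# Toy 3-d embeddings (the anatomy head described above is 128-d).
anchor = [1.0, 0.0, 0.0]
close_positive = [0.9, 0.1, 0.0]   # same patient, embedding nearly aligned
far_positive = [0.0, 1.0, 0.0]     # same patient, embedding pushed away
negatives = [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]

loss_close = info_nce(anchor, close_positive, negatives)
loss_far = info_nce(anchor, far_positive, negatives)
```

The loss is small when the positive sits close to the anchor and grows when it drifts toward the negatives, which is the pressure that pulls same-patient pairs together.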
Architecture overview
Inputs: Anchor A, Positive P, Negative N
ResNet50 (shared weights, 2048-d)
Anatomy head: z_anat (128-d)
Noise head: z_noise (64-d)
Loss terms
InfoNCE: same-patient cluster
Distortion regression: shape_a(α) target
Ordering: d_high > d_low
Noise regression: predict true α
Orthogonality: z_anat ⊥ z_noise
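Because z_anat and z_noise have different dimensions, one plausible way to express the orthogonality term is to penalise the batch cross-covariance between the two embeddings. The sketch below assumes that formulation; the source does not say which penalty is actually used, and `orthogonality_penalty` is a hypothetical name.

```python
def orthogonality_penalty(z_anat, z_noise):
    """Mean squared entry of the batch cross-covariance between two embedding sets.

    z_anat:  list of B vectors, each of dimension Da
    z_noise: list of B vectors, each of dimension Dn
    Assumed formulation; not confirmed by the source.
    """
    b = len(z_anat)
    da, dn = len(z_anat[0]), len(z_noise[0])
    mean_a = [sum(z[i] for z in z_anat) / b for i in range(da)]
    mean_n = [sum(z[j] for z in z_noise) / b for j in range(dn)]
    total = 0.0
    for i in range(da):
        for j in range(dn):
            cov = sum(
                (z_anat[k][i] - mean_a[i]) * (z_noise[k][j] - mean_n[j])
                for k in range(b)
            ) / b
            total += cov * cov
    return total / (da * dn)

# Toy batch of four samples: 2-d "anatomy" and 1-d "noise" embeddings.
z_anat = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
z_noise_independent = [[1.0], [1.0], [-1.0], [-1.0]]   # uncorrelated with z_anat
z_noise_leaky = [[1.0], [-1.0], [1.0], [-1.0]]         # copies the first anatomy dim

penalty_independent = orthogonality_penalty(z_anat, z_noise_independent)
penalty_leaky = orthogonality_penalty(z_anat, z_noise_leaky)
```

The penalty is zero when the noise embedding carries no linear information about the anatomy embedding and positive when it leaks.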
shape_a(α) = 0  for  α ≤ TAU=0.3  (benign)
shape_a(α) = expm1(K·(α−TAU)) / expm1(K·(1−TAU))  for  α > TAU
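The piecewise target above translates directly into a few lines of Python. TAU=0.3 comes from the text; the steepness K is not given in the source, so the value below is an assumption chosen only to make the sketch runnable.

```python
import math

TAU = 0.3   # threshold stated in the source
K = 3.0     # steepness; assumed value, the source does not specify K

def shape_a(alpha, tau=TAU, k=K):
    """Target distance for a distortion level alpha in [0, 1]."""
    if alpha <= tau:
        return 0.0   # benign distortion: target distance is zero
    # Exponential ramp, normalised so that shape_a(1.0) == 1.0.
    return math.expm1(k * (alpha - tau)) / math.expm1(k * (1.0 - tau))

alphas = (0.0, 0.3, 0.5, 0.8, 1.0)
targets = [shape_a(a) for a in alphas]
```

The targets stay at 0 up to TAU and then increase monotonically to 1 at α=1, which is consistent with the ordering term d_high > d_low.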
Results

Two stories, both won

Evaluated on GoldAtlas (prostate) and CFB-GBM (brain) with bootstrap confidence intervals over 100 resamples.
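A percentile bootstrap over 100 resamples can be sketched as below. The 95% confidence level, the resampling unit (one score per patient), and the example scores are all assumptions; the source states only that 100 resamples were used.

```python
import random

def bootstrap_ci(values, statistic, n_resamples=100, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic` over `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the results.
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_idx = int((1.0 - confidence) / 2.0 * (n_resamples - 1))
    upper_idx = int((1.0 + confidence) / 2.0 * (n_resamples - 1))
    return stats[lower_idx], stats[upper_idx]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical per-patient scores, for illustration only.
scores = [0.62, 0.71, 0.68, 0.74, 0.66, 0.70, 0.69, 0.73]
low, high = bootstrap_ci(scores, mean)
```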

Story 1: MR / CFB-GBM
Distortion awareness, Spearman(α)
ours: 0.688 (best)
MSE: 0.524
NCC: 0.382
DINO: 0.486
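Spearman(α) measures how monotonically a metric's distance follows the distortion level α. A dependency-free sketch of the computation is shown here; the distance values are hypothetical and the helper names are not from the source.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

alphas = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
monotone_distances = [0.00, 0.01, 0.05, 0.20, 0.45, 0.90]  # hypothetical
noisy_distances = [0.10, 0.05, 0.30, 0.20, 0.60, 0.50]     # hypothetical

rho_monotone = spearman(alphas, monotone_distances)
rho_noisy = spearman(alphas, noisy_distances)
```

A metric whose distance rises strictly with α scores 1.0 regardless of the curve's shape, while local reversals in the ordering pull the score down.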
Story 2: CT / CFB-GBM
Identity discrimination, AUC
ours: 0.9993 (best)
MSE: 0.9683
SSIM: 0.9491
NCC: 0.9789
Robustness: MR degradation at α=0.6
mAP retained under moderate distortion (clinically realistic)
ours: ~99% retained
MSE: ~75% retained
NCC: ~63% retained
Grad: ~62% retained
DINO: ~84% retained (but baseline mAP = 0.46)
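Retention is a relative figure, which is why the DINO caveat matters. Assuming "retained" means mAP under distortion divided by mAP on clean images (the source does not define it explicitly) and that 0.46 is DINO's clean mAP, the arithmetic can be sketched as follows.

```python
def map_retention(map_distorted, map_clean):
    """Fraction of clean-image mAP that survives distortion (assumed definition)."""
    return map_distorted / map_clean

dino_clean_map = 0.46   # baseline mAP reported in the source
dino_retention = 0.84   # approximate retention reported in the source

# Absolute mAP implied by the two reported numbers, under the assumptions above.
dino_distorted_map = dino_retention * dino_clean_map
```

Under these assumptions, DINO's roughly 84% retention corresponds to an absolute mAP of about 0.39, so a high retention percentage does not by itself imply strong retrieval.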
Datasets

Two domains, two modalities

Cross-domain evaluation on prostate and brain tumour datasets shows that the results are consistent across anatomies.

GoldAtlas
Prostate anatomy dataset. CT and T2-MRI volumes with expert segmentations. Used for distortion model development and baseline validation.
Patients: 30
Modalities: CT + MR
Resolution: 256²
CFB-GBM
Glioblastoma multiforme brain tumour dataset. CT and T1 post-Gd MRI. High patient-to-patient variability from tumour heterogeneity and contrast enhancement.
Test patients: 93
Modalities: CT + MR
MR/CT split: 53/40
Quickstart

Use the metric

Load a pretrained checkpoint and compute pairwise distances between medical images.

# Install dependencies (run in a shell)
pip install torch torchvision nibabel pydicom opencv-python

# Load pretrained model
import torch
import torch.nn.functional as F
from model import DisentangledSiamese

model = DisentangledSiamese()
model.load_state_dict(torch.load("best_mr_cfbgbm.pth", map_location="cpu"))
model.eval()

# Compute anatomical distance between two images
# Inputs are image tensors of shape (1, 3, 256, 256)
# Returns value in [0,1]: 0 = same anatomy, 1 = different anatomy
with torch.no_grad():
    z1, *_ = model.forward_once(img_tensor_1)
    z2, *_ = model.forward_once(img_tensor_2)
distance = 0.5 * (1.0 - F.cosine_similarity(z1, z2))
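The distance expression in the quickstart rescales cosine similarity from [-1, 1] to [0, 1]. The torch-free sketch below reproduces that mapping on toy vectors and uses it to rank a small gallery; `cosine_distance01`, `rank_gallery`, and the vectors are illustrative names and values, not part of the released code.

```python
import math

def cosine_distance01(u, v):
    """Map cosine similarity in [-1, 1] to a distance in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.5 * (1.0 - dot / (norm_u * norm_v))

def rank_gallery(query, gallery):
    """Return gallery keys sorted from nearest to farthest from the query."""
    return sorted(gallery, key=lambda name: cosine_distance01(query, gallery[name]))

query = [1.0, 0.0]
gallery = {
    "opposite": [-1.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "same_direction": [2.0, 0.0],
}
ranking = rank_gallery(query, gallery)
```

Vectors pointing the same way get distance 0, orthogonal vectors 0.5, and opposite vectors 1, so sorting by this distance gives a retrieval ranking.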
@article{yan2026distortionmetric,
  title = {A Distortion-Aware Learned Similarity Metric for Robust Medical Image Retrieval},
  author = {Yan, Z. and Li, R.},
  journal = {arXiv preprint},
  year = {2026}
}