Medical Image Analysis ‧ USC 2026

A Distortion-Aware Learned Similarity Metric

Classical similarity metrics conflate anatomical identity with acquisition distortion. We propose a learned metric that explicitly disentangles the two: its distance tracks distortion severity ordinally while still discriminating patient identity across CT and MRI.


Spearman(α), distortion tracking: 0.71 (vs 0.52 for MSE)

Ratio mean, distortion / identity movement: 0.08 (vs 0.30 for MSE)

AUC benign/malignant, threshold classification: 0.999 (vs 0.97 for MSE)

mAP retention @α=0.8, robustness under distortion: 88% (vs 71% for MSE)

Method

Disentangled Siamese Network

A shared ResNet50 encoder maps each image into two orthogonal embedding subspaces: an anatomy head that encodes patient identity, and a noise head that encodes acquisition distortion level.

Shared ResNet50 encoder

Processes anchor, positive (distorted same-patient), and negative (different-patient) images through identical weights, extracting 2048-d feature vectors.

Anatomy head (128-d)

L2-normalised embeddings encoding patient identity. Same-patient pairs are pulled together via InfoNCE loss regardless of acquisition distortion level.
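The InfoNCE pull described above can be illustrated for a single anchor. This is a minimal sketch in plain Python so the arithmetic is visible, not the project's training code: the function names and the temperature value are assumptions (the page does not state a temperature), and a real implementation would use batched tensor operations.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive similarity.

    temperature=0.1 is a hypothetical value, not taken from the source.
    """
    a = l2_normalise(anchor)
    candidates = [l2_normalise(positive)] + [l2_normalise(n) for n in negatives]
    # On unit vectors, cosine similarity reduces to a dot product.
    logits = [sum(x * y for x, y in zip(a, c)) / temperature for c in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

anchor = [1.0, 0.0, 0.0]
near = [0.9, 0.1, 0.0]                     # same patient, distorted
far = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # different patients

loss_good = info_nce(anchor, near, far)              # positive is close to the anchor
loss_bad = info_nce(anchor, far[0], [near, far[1]])  # positive is far from the anchor
```

The loss is small when the positive is the candidate most similar to the anchor and grows when a negative is closer, which is what drives same-patient embeddings together.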

Noise head (64-d) + threshold

Encodes acquisition distortion severity. An alpha predictor outputs a_hat ∈ [0,1]. Orthogonality loss forces independence from the anatomy subspace.
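The page does not give the form of the orthogonality loss. Because the two embeddings have different sizes (128-d and 64-d), a per-sample dot product is not defined; one common choice, shown here purely as an assumption, penalises the batch cross-covariance between the two embeddings. The function name is hypothetical and the example uses tiny dimensions for readability.

```python
def cross_covariance_penalty(z_anat, z_noise):
    """Mean squared entry of the batch cross-covariance matrix.

    z_anat:  list of B vectors (anatomy embeddings)
    z_noise: list of B vectors (noise embeddings)
    A value of 0 means no linear dependence between the two subspaces
    on this batch. This form is an assumption, not the authors' stated loss.
    """
    b = len(z_anat)
    da, dn = len(z_anat[0]), len(z_noise[0])
    mean_a = [sum(row[i] for row in z_anat) / b for i in range(da)]
    mean_n = [sum(row[j] for row in z_noise) / b for j in range(dn)]
    total = 0.0
    for i in range(da):
        for j in range(dn):
            cov = sum(
                (z_anat[k][i] - mean_a[i]) * (z_noise[k][j] - mean_n[j])
                for k in range(b)
            ) / b
            total += cov * cov
    return total / (da * dn)

z_anat = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
independent_noise = [[1.0], [1.0], [-1.0], [-1.0]]  # uncorrelated with z_anat
leaky_noise = [[1.0], [-1.0], [0.0], [0.0]]         # copies the first anatomy dimension

penalty_independent = cross_covariance_penalty(z_anat, independent_noise)
penalty_leaky = cross_covariance_penalty(z_anat, leaky_noise)
```

The penalty is zero for the uncorrelated batch and positive when the noise embedding leaks anatomy information.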

Threshold objective (TAU=0.3)

The target function shape_a(α) sets the target distance to 0 for distortion levels at or below TAU (benign, ignored) and to a rising value above TAU (malignant, detected). Classical metrics have no mechanism for learning such a threshold.

Architecture overview

Inputs: Anchor A, Positive P, Negative N

↓

ResNet50 (shared weights, 2048-d features)

↓

Anatomy head: z_anat, 128-d

Noise head: z_noise, 64-d

Loss terms

InfoNCE: same-patient cluster

Distortion regression: shape_a(α) target

Ordering: d_high > d_low

Noise regression: predict true α

Orthogonality: z_anat ⊥ z_noise

shape_a(α) = 0 for α ≤ TAU=0.3 (benign)
shape_a(α) = expm1(K·(α−TAU)) / expm1(K·(1−TAU)) for α > TAU
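The two-branch definition of shape_a can be written directly as a function. This is a sketch only: TAU=0.3 comes from the text, but the steepness K is not given on this page, so the value below is a hypothetical placeholder.

```python
import math

TAU = 0.3  # threshold from the text
K = 3.0    # hypothetical steepness; the source does not state K

def shape_a(alpha, tau=TAU, k=K):
    """Target distance for distortion level alpha in [0, 1]."""
    if alpha <= tau:
        return 0.0  # benign distortion: target distance is zero
    # Exponential ramp, normalised so that shape_a(1.0) == 1.0.
    return math.expm1(k * (alpha - tau)) / math.expm1(k * (1.0 - tau))

# Targets on a coarse grid of distortion levels.
grid = [0.0, 0.3, 0.45, 0.6, 0.8, 1.0]
targets = [shape_a(a) for a in grid]
```

The denominator normalises the ramp so the target reaches 1 at α=1, and the numerator is 0 at α=TAU, so the function is continuous at the threshold.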

Results

Two stories, both won

Evaluated on GoldAtlas (prostate) and CFB-GBM (brain) with bootstrap confidence intervals over 100 resamples.
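A bootstrap confidence interval of the kind mentioned above can be sketched as follows. Only the count of 100 resamples comes from the text; the percentile method, the 95% level, the function name, and the per-patient scores are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_ci(values, statistic=statistics.mean, n_resamples=100,
                 level=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` over `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_idx = int((1.0 - level) / 2.0 * (n_resamples - 1))
    upper_idx = int((1.0 + level) / 2.0 * (n_resamples - 1))
    return estimates[lower_idx], estimates[upper_idx]

# Hypothetical per-patient scores, not results from the paper.
scores = [0.60, 0.70, 0.65, 0.72, 0.68, 0.71, 0.66, 0.69]
ci_low, ci_high = bootstrap_ci(scores)
```

With only 100 resamples the interval endpoints are themselves noisy, so a larger count would usually be preferred when compute allows.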

Story 1 — MR / CFB-GBM

Distortion awareness — Spearman(α)

Method    Spearman(α)
ours      0.688 (best)
MSE       0.524
NCC       0.382
DINO      0.486
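Spearman(α) measures how monotonically a metric's distance follows the distortion level α. The sketch below computes it from first principles in plain Python; the function names and the example distances are hypothetical, and the exact evaluation protocol used for the scores above is not specified on this page.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied entries the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

alphas = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
distances = [0.01, 0.02, 0.05, 0.20, 0.50, 0.90]  # hypothetical metric outputs
rho_monotone = spearman(alphas, distances)
rho_reversed = spearman(alphas, distances[::-1])
```

Because only ranks matter, a metric scores highly as long as its distance increases with α, regardless of the shape of the increase.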

Story 2 — CT / CFB-GBM

Identity discrimination — AUC

Method    AUC
ours      0.9993 (best)
MSE       0.9683
SSIM      0.9491
NCC       0.9789
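AUC can be read as the probability that a randomly chosen positive pair receives a larger distance than a randomly chosen negative pair. The sketch below computes it by direct pairwise comparison; which pairs count as positive or negative is an assumption here, as are the function name and the example distances.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical distances, not values from the paper.
should_be_far = [0.8, 0.9, 0.4]   # pairs the metric should separate
should_be_near = [0.1, 0.2, 0.5]  # pairs the metric should keep close
auc_example = auc(should_be_far, should_be_near)
auc_perfect = auc([0.7, 0.9], [0.1, 0.3])
```

The quadratic pairwise loop is fine for a sketch; a rank-based formulation is the usual choice for large evaluation sets.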

Robustness — MR degradation at α=0.6

mAP retained under moderate distortion (clinically realistic)

Method    mAP retained
ours      ~99%
MSE       ~75%
NCC       ~63%
Grad      ~62%
DINO      ~84% (but baseline mAP = 0.46)
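Retention is a ratio, so a high percentage does not by itself imply a high absolute mAP. The sketch below assumes retention is defined as distorted mAP divided by undistorted mAP and that the 0.46 figure is DINO's undistorted mAP; both readings are assumptions, and the function names are hypothetical.

```python
def map_retention(map_clean, map_distorted):
    """Share of the undistorted mAP that survives distortion."""
    return map_distorted / map_clean

def implied_distorted_map(map_clean, retention):
    """Absolute mAP under distortion implied by a retention figure."""
    return map_clean * retention

# DINO: ~84% retained from a baseline mAP of 0.46 (approximate figures from the text).
dino_distorted = implied_distorted_map(0.46, 0.84)
```

Under these assumptions DINO's mAP at α=0.6 is roughly 0.39, which is why its retention figure carries the baseline caveat.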

Datasets

Two domains, two modalities

Cross-domain evaluation on prostate and brain tumour datasets confirms consistent results independent of anatomy.

🦠

GoldAtlas

Prostate anatomy dataset. CT and T2-MRI volumes with expert segmentations. Used for distortion model development and baseline validation.

Patients: 30

Modalities: CT + MR

Resolution: 256²

🧠

CFB-GBM

Glioblastoma multiforme brain tumour dataset. CT and T1 post-Gd MRI. High patient-to-patient variability from tumour heterogeneity and contrast enhancement.

Test patients: 93

Modalities: CT + MR

MR/CT split: 53/40

Quickstart

Use the metric

Load a pretrained checkpoint and compute pairwise distances between medical images.

# Install dependencies (run in a shell)
pip install torch torchvision nibabel pydicom opencv-python

# Load pretrained model
import torch
import torch.nn.functional as F  # required for F.cosine_similarity below
from model import DisentangledSiamese

model = DisentangledSiamese()
# map_location="cpu" lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load("best_mr_cfbgbm.pth", map_location="cpu"))
model.eval()

# Compute anatomical distance between two images
# Returns value in [0,1]: 0 = same anatomy, 1 = different anatomy
with torch.no_grad():  # inference only, no gradients needed
    z1, *_ = model.forward_once(img_tensor_1)  # input shape (1, 3, 256, 256)
    z2, *_ = model.forward_once(img_tensor_2)
distance = 0.5 * (1.0 - F.cosine_similarity(z1, z2))

@article{yan2026distortionmetric,
  title = {A Distortion-Aware Learned Similarity Metric for Robust Medical Image Retrieval},
  author = {Yan, Z. and Li, R.},
  journal = {arXiv preprint},
  year = {2026}
}
