MIA methods × model type × data domain (LLM / MLLM)
| Method category / subclass | Core signal / feature | Access / threat model | Model / modality | Data domain | Public evidence / notes |
|---|---|---|---|---|---|
| (1a) Logit / Loss-based — basic shadow / reference | loss / log-prob; target vs reference gap | Needs logits / likelihood (gray/white-box, or limited black-box plus a reference model) | LLM (text-only) | General data | Classic score-based / reference-based MIA; a large-scale evaluation on pretrained LLMs finds many settings near-random, especially in the “big data + few epochs” regime (Do Membership Inference Attacks Work on Large Language Models?); a minimal loss-gap sketch follows the table. |
| (1a) | -- | -- | LLM (text-only) | Clinical data | Strong evidence: a likelihood-ratio attack on a masked LM trained on medical notes reaches AUC ≈ 0.90 (Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks); clinical LMs (ClinicalBERT, GPT-2, etc.) show roughly 7% leakage under white/black-box MIA (Membership Inference Attack Susceptibility of Clinical Language Models); work in progress on an EHR QA LLM combines canonical loss with paraphrasing-based MIA (Exploring Membership Inference Vulnerabilities in Clinical Large Language Models). |
| (1a) | -- | -- | MLLM (multimodal) | General data | Multimodal score-based MIA uses text-image similarity / confidence: cosine threshold on CLIP (Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study); NeurIPS 2024 benchmark for VLLMs with MaxRényi-K% confidence metric (Membership Inference Attacks against Large Vision-Language Models). |
| (1a) | -- | -- | MLLM (multimodal) | Clinical data | No published systematic loss / log-prob MIA on “medical image + clinical text” multimodal models; clear gap. |
| (1b) Quantile / score-distribution regression | Models the non-member score distribution via quantile regression | Needs confidence / score (softmax prob, logit, etc.) | LLM | General data | NeurIPS 2023 work uses quantile regression to approximate non-member score quantiles, avoiding training multiple shadow models; it matches classic shadow attacks with far less compute (Scalable Membership Inference Attacks via Quantile Regression); a quantile-calibration sketch follows the table. |
| (1b) | -- | -- | LLM | Clinical data | No specific reports on clinical LLM / EHR; clinical work mostly uses simple threshold / likelihood-ratio. |
| (1b) | -- | -- | MLLM | General data | In principle extendable to multimodal confidence / similarity, but published experiments focus on single-modal classification/regression; no explicit MLLM results yet. |
| (1b) | -- | -- | MLLM | Clinical data | No public evidence. |
| (1c) Self-calibrated / reference-free (SPV-MIA) | Model self-generates reference via self-prompt; compare loss / probabilistic variation | Needs token-level probs / NLL (gray-box or black-box that returns per-token scores) | LLM | General data | Self-prompt Calibration (SPV-MIA, NeurIPS 2024) reaches AUC ≈0.9 on fine-tuned LLMs without real shadow data (Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration). |
| (1c) | -- | -- | LLM | Clinical data | No open replication of SPV-MIA on clinical LLM / EHR; clinical setups still use canonical loss-based. |
| (1c) | -- | -- | MLLM | General data | If API exposes token-level probs, can be extended, but VLLM work mostly uses confidence / similarity; no direct experiments reported. |
| (1c) | -- | -- | MLLM | Clinical data | No public evidence. |
| (1d) Neighbourhood / neighbour-text | Generate paraphrase / locally perturbed neighbours and compare score gaps | Needs logits / prob or at least loss / score (black-box + scores) | LLM | General data | Neighbourhood comparison (ACL 2023) replaces shadow data with synthetic neighbours; it is reference-free and matches or exceeds reference-based MIA on benchmarks (Membership Inference Attacks against Language Models via Neighbourhood Comparison); a neighbourhood-gap sketch follows the table. |
| (1d) | -- | -- | LLM | Clinical data | Neighbourhood MIA has not been applied systematically to clinical LLMs; Nemecek et al. tried paraphrasing-based perturbation on a clinical QA LLM (still canonical loss, not the full neighbourhood framework) (Exploring Membership Inference Vulnerabilities in Clinical Large Language Models). |
| (1d) | -- | -- | MLLM | General data | Neighbours could be formed from augmentations / prompt variants of the same image or text and combined with cross-modal consistency (see (3) & VL-MIA); most multimodal work still relies on cosine similarity / MaxRényi-K% rather than an explicit neighbourhood framework. |
| (1d) | -- | -- | MLLM | Clinical data | No public evidence. |
| (2) Representation-based (embedding / activation) | Geometry of embeddings/activations (distance, density, clusters, layer patterns) | Needs internal features (gray/white-box) | LLM | General data | Standard class in MIA surveys; also covered in recent large-model surveys (Membership Inference Attacks on Large-Scale Models: A Survey). Useful when logits are defended. |
| (2) | -- | -- | LLM | Clinical data | ClinicalBERT privacy eval shows state-of-the-art reference-based MIA had limited ability to distinguish pseudonymized vs non-pseudonymized text, implying embedding/reference MIA may understate clinical PII leakage (Using Membership Inference Attacks to Evaluate Privacy-Preserving Language Modeling Fails for Pseudonymizing Data; see also End-to-end pseudonymization of fine-tuned clinical BERT models). |
| (2) | -- | -- | MLLM | General data | LUMIA trains layer-wise linear probes on internal states across unimodal and multimodal tasks: an average AUC gain of +15.7% in unimodal settings, and AUC > 60% in 85.9% of multimodal settings (LUMIA: Linear Probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states). |
| (2) | -- | -- | MLLM | Clinical data | No LUMIA-like probing published for medical image+text models; if internal layers are accessible in medical VLMs, this is a natural attack path. |
| (3) Cross-modal / cross-query consistency | Multiple queries / modalities / augmentations of the same content; check output consistency, similarity, stability | Black-box; can rely only on generated output, logits not required | LLM | General data | For APIs that return only text, attackers can exploit robustness gaps across paraphrases / prompts to infer membership; statistical analyses warn that outcomes are highly sensitive to thresholds and query design (Do Membership Inference Attacks Work on Large Language Models?). |
| (3) | -- | -- | LLM | Clinical data | Nemecek et al. added paraphrasing-based MIA for clinical QA LLM, observing limited but measurable leakage—early exploration of multi-query consistency in clinical settings (Exploring Membership Inference Vulnerabilities in Clinical Large Language Models). |
| (3) | -- | -- | MLLM | General data | VLLM benchmarks emphasize text/image consistency and confidence (the VL-MIA pipeline plus the MaxRényi-K% metric) (Membership Inference Attacks against Large Vision-Language Models); some papers note that text log-prob alone may fail in multimodal settings due to modality interaction (e.g., LLaVA analyses). |
| (3) | -- | -- | MLLM | Clinical data | No system-level studies; medical multimodal (image report + image) would be well-suited for cross-modal consistency MIA, but remains open. |
| (4) Decoding / Perplexity / token dynamics | Token-level loss / perplexity / decoding trajectory (top-k flips, entropy, stepwise change) | Needs token-level output (may not need full logits) | LLM | General data | If token-level NLL / perplexity is exposed, many classic MIAs can be approximated without full logit access; recent statistical evaluations show AUC often barely above random in realistic LLM settings and highly sensitive to the chosen threshold (Do Membership Inference Attacks Work on Large Language Models? and related SoK); a token-statistics sketch follows the table. |
| (4) | -- | -- | LLM | Clinical data | Clinical LLMs: a token-NLL / perplexity black-box MIA on an EHR QA model (Llemr) using canonical loss plus paraphrasing shows limited but detectable leakage (Exploring Membership Inference Vulnerabilities in Clinical Large Language Models); the masked-LM setting also aligns with token-level likelihood statistics (Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks). |
| (4) | -- | -- | MLLM | General data | If text-side token probabilities are exposed (captioning / QA), LLM MIAs apply; VL-MIA includes token-level MIA on LLaVA pretraining data (Membership Inference Attacks against Large Vision-Language Models). Few works study multimodal token dynamics explicitly. |
| (4) | -- | -- | MLLM | Clinical data | No public evidence. |
| (5) Label-only MIA | Only generated tokens (semantic similarity, rewritten perplexity proxy, output stability) | Pure black-box (text output only, no logits) | LLM | General data | PETAL (USENIX Sec 2025) uses per-token semantic similarity to approximate perplexity; label-only MIA can match some logit-based attacks on pretrained LLMs (Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models); SoK work notes gains are limited in many realistic settings; a label-only similarity sketch follows the table. |
| (5) | -- | -- | LLM | Clinical data | Clinical LLM MIA mostly relies on token-NLL (needs probs); truly “text-only, no probability” label-only MIA is nearly absent in clinical literature. |
| (5) | -- | -- | MLLM | General data | In VLLMs one can query multiple prompts/views and use answer consistency or semantic distance; published work usually still needs some confidence/similarity (cosine or MaxRényi-K%), so strong label-only evidence is scarce. |
| (5) | -- | -- | MLLM | Clinical data | No public evidence. |
| (6) Multimodal / vision-language MIA | Image+text similarity, multimodal logits, token-level image detection, internal feature combos | Needs access to multimodal encoder/decoder logits / features or generated output (black/gray-box) | LLM | General data | Not applicable (requires multimodal inputs). |
| (6) | -- | -- | LLM | Clinical data | Not applicable. |
| (6) | -- | -- | MLLM | General data | Main line of multimodal MIA: ICCV 2023 CLIP cosine similarity with augmentation aggregation (Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study); NeurIPS 2024 VLLM benchmark with a token-level image-detection pipeline and the unified MaxRényi-K% metric (Membership Inference Attacks against Large Vision-Language Models); a CLIP-similarity sketch follows the table. |
| (6) | -- | -- | MLLM | Clinical data | No published system-level MIA on medical image + text multimodal models; major gap for healthcare multimodal privacy. |
| (7) Distillation-based MIA | Use knowledge distillation to build reference + attack features (loss/entropy/embedding, etc.) | Needs trainable reference model (attacker has compute and approximate data) | LLM | General data | Recent work brings distillation to LLM-based recommender MIA, fusing multi-source signals to boost attacks—shows “distill + MIA” is viable for complex systems (Distillation-based Membership Inference Attacks for LLM-based Recommender Systems). |
| (7) | -- | -- | LLM | Clinical data | No public reports in clinical LLM settings. |
| (7) | -- | -- | MLLM | General data | Could distill multimodal encoder/decoder to a small model for probing, but current multimodal privacy work targets the large model directly; no explicit “distillation-based MLLM MIA” papers yet. |
| (7) | -- | -- | MLLM | Clinical data | No public evidence. |
| (8) Internal-states / layerwise probing | Train a linear probe/classifier on intermediate activations to separate members from non-members | Needs internal states (gray/white-box; feasible for open LLM/MLLM) | LLM | General data | LUMIA trains layer-wise probes and obtains notable AUC gains (+15.7% on average) across many LLMs, analyzing which layers leak most (LUMIA: Linear Probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states); a layer-wise probing sketch follows the table. |
| (8) | -- | -- | LLM | Clinical data | No public LUMIA-style probing on clinical-tuned LLMs; technically feasible for open clinical-adapted LLMs (ClinicalBERT, ClinicalGPT, etc.). |
| (8) | -- | -- | MLLM | General data | LUMIA shows vision inputs can amplify leakage: ~85.9% of vision-related settings have AUC>60%, indicating the vision channel can be a leakage amplifier (LUMIA: Linear Probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states). |
| (8) | -- | -- | MLLM | Clinical data | No published LUMIA-style probing for medical vision+text models; if hospital-internal diagnostic VLMs expose intermediate layers, this attack would be highly relevant, but remains “theoretically feasible” only. |
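
Illustrative attack sketches (hedged)

The loss-gap sketch referenced in row (1a): score a candidate by the difference between its average token NLL under the target model and under a reference model. This is a minimal sketch assuming both models share a tokenizer; the checkpoint names and the decision threshold in the usage comment are placeholders, not values from the cited papers.

```python
# Minimal reference-calibrated loss MIA (row 1a). Assumes target and reference
# share a tokenizer; checkpoint names and threshold below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_nll(model, tokenizer, text, device="cpu"):
    """Average per-token negative log-likelihood of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def reference_mia_score(target, reference, tokenizer, text):
    """More negative scores (target fits the text much better than the
    reference does) suggest membership in the target's training data."""
    return sequence_nll(target, tokenizer, text) - sequence_nll(reference, tokenizer, text)

# Usage (hypothetical checkpoints and threshold):
# tok = AutoTokenizer.from_pretrained("gpt2")
# target = AutoModelForCausalLM.from_pretrained("./finetuned-clinical-gpt2")
# reference = AutoModelForCausalLM.from_pretrained("gpt2")
# is_member = reference_mia_score(target, reference, tok, "Patient presents with ...") < -0.5
```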
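The quantile-calibration sketch referenced in row (1b): instead of training shadow models, fit a regressor that predicts a low quantile of the non-member score distribution from cheap per-example features and use it as a per-example threshold. The features named in the comments (token count, mean token rarity) are illustrative assumptions, not the features used in the cited paper.

```python
# Quantile-regression calibration (row 1b). Per-example features are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_quantile_threshold(nonmember_features, nonmember_scores, alpha=0.05):
    """Fit the alpha-quantile of non-member scores (e.g. NLL) as a function of features."""
    qr = GradientBoostingRegressor(loss="quantile", alpha=alpha)
    qr.fit(nonmember_features, nonmember_scores)
    return qr

def predict_membership(qr, features, scores):
    """Flag as member any example whose observed score falls below its predicted
    non-member alpha-quantile (i.e. the target fits it unusually well)."""
    return scores < qr.predict(features)

# X_nonmem: (n, d) feature matrix (e.g. token count, mean token rarity)
# s_nonmem: (n,) NLL scores of known non-members under the target model
# qr = fit_quantile_threshold(X_nonmem, s_nonmem, alpha=0.05)
# member_flags = predict_membership(qr, X_test, s_test)
```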
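The neighbourhood-gap sketch referenced in row (1d): compare the candidate's loss against the mean loss of synthetic neighbours (paraphrases or local perturbations). `nll_fn` and `paraphrase_fn` are caller-supplied assumptions; any paraphrase or perturbation source would do.

```python
# Neighbourhood-comparison score (row 1d). nll_fn and paraphrase_fn are assumed callables.
import numpy as np

def neighbourhood_score(nll_fn, paraphrase_fn, text, n_neighbours=10):
    """Gap between the candidate's NLL and the mean NLL of its synthetic neighbours.

    nll_fn(text) -> float                 average token NLL under the target model
    paraphrase_fn(text, k) -> list[str]   k paraphrases / local perturbations of text

    A strongly negative gap (the candidate is much "easier" for the model than its
    close neighbours) suggests membership; near-zero gaps suggest non-membership.
    """
    neighbours = paraphrase_fn(text, n_neighbours)
    neighbour_nll = float(np.mean([nll_fn(t) for t in neighbours]))
    return nll_fn(text) - neighbour_nll
```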
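The layer-wise probing sketch referenced in row (8), equally applicable to the representation-based attacks in row (2): pool hidden states per layer and fit one linear probe per layer on labelled member / non-member examples, reporting per-layer AUC. This is a LUMIA-inspired sketch, not a reimplementation, and assumes gray/white-box access to hidden states.

```python
# Layer-wise linear probing on internal states (rows 2 and 8). LUMIA-inspired sketch,
# not a reimplementation; assumes access to hidden states of an open model.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def pooled_hidden_states(model, tokenizer, texts, device="cpu"):
    """Mean-pooled hidden state per layer for each text: list over layers of (n, d) arrays."""
    per_layer = None
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model(**enc, output_hidden_states=True)
        vecs = [h.mean(dim=1).squeeze(0).cpu().numpy() for h in out.hidden_states]
        if per_layer is None:
            per_layer = [[] for _ in vecs]
        for layer, v in zip(per_layer, vecs):
            layer.append(v)
    return [np.stack(layer) for layer in per_layer]

def layerwise_probe_auc(member_states, nonmember_states):
    """Fit one logistic-regression probe per layer; return its held-out AUC per layer."""
    aucs = []
    for X_m, X_n in zip(member_states, nonmember_states):
        X = np.concatenate([X_m, X_n])
        y = np.concatenate([np.ones(len(X_m)), np.zeros(len(X_n))])
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
    return aucs
```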
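The token-statistics sketch referenced in row (4): if per-token NLLs are exposed, summarize the decoding trajectory with a few scalar statistics. The particular statistics chosen here (mean NLL, mean over the hardest K% of tokens in the spirit of Min-K% Prob, and trajectory variance) are illustrative assumptions, not those of any single cited paper.

```python
# Token-level perplexity / dynamics features (row 4). Statistics are illustrative.
import numpy as np

def token_dynamics_features(token_nlls, k_frac=0.2):
    """Summarize a per-token NLL trajectory with simple membership-relevant statistics.

    token_nlls: 1-D sequence of per-token negative log-likelihoods from the target model.
    Returns the mean NLL (log-perplexity), the mean NLL over the k_frac hardest tokens
    (a Min-K%-Prob-style signal), and the variance of the trajectory.
    Lower values of the first two statistics suggest membership.
    """
    nll = np.asarray(token_nlls, dtype=float)
    k = max(1, int(len(nll) * k_frac))
    hardest = np.sort(nll)[-k:]  # the k tokens the model found hardest
    return {
        "mean_nll": float(nll.mean()),
        "hardest_k_mean": float(hardest.mean()),
        "nll_var": float(nll.var()),
    }
```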
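The label-only similarity sketch referenced in row (5): with text-only access, prompt the model with a prefix of the candidate record and compare its continuation to the true suffix via sentence-embedding similarity. This is a simplification inspired by PETAL's idea of substituting semantic similarity for token probabilities, not a reimplementation; the embedding model and split ratio are assumptions.

```python
# Label-only membership signal (row 5). Embedder choice and split ratio are assumptions.
from sentence_transformers import SentenceTransformer, util

def label_only_score(generate_fn, embedder, candidate_text, split_ratio=0.5):
    """generate_fn(prompt) -> str is the only access to the target model (text in, text out).

    Higher similarity between the generated continuation and the true suffix
    suggests the candidate was seen in training (memorized continuation)."""
    cut = int(len(candidate_text) * split_ratio)
    prefix, true_suffix = candidate_text[:cut], candidate_text[cut:]
    generated = generate_fn(prefix)
    emb = embedder.encode([generated, true_suffix], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# embedder = SentenceTransformer("all-MiniLM-L6-v2")
# score = label_only_score(lambda p: chat_api_complete(p), embedder, record_text)
```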
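The CLIP-similarity sketch referenced in row (6), which also illustrates the augmentation-consistency idea from (3): average image-text cosine similarity over random augmentations of the image, loosely following the augmentation-aggregation idea of the ICCV 2023 pilot study. The augmentations, checkpoint name, and any decision threshold are assumptions.

```python
# CLIP cosine-similarity MIA with augmentation aggregation (rows 3 and 6).
# Augmentations and checkpoint are placeholder choices.
import torch
from PIL import Image
from torchvision import transforms
from transformers import CLIPModel, CLIPProcessor

def augmented_clip_score(model, processor, image, caption, n_aug=8, device="cpu"):
    """Mean image-text cosine similarity over random crops/flips of a PIL image.
    Higher mean similarity suggests the (image, caption) pair was in training data."""
    aug = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(),
    ])
    sims = []
    for _ in range(n_aug):
        view = aug(image)
        inputs = processor(text=[caption], images=view, return_tensors="pt", padding=True).to(device)
        with torch.no_grad():
            out = model(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        sims.append((img * txt).sum(dim=-1).item())
    return sum(sims) / len(sims)

# model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
# processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# score = augmented_clip_score(model, processor, Image.open("cxr.png"), "Chest X-ray showing ...")
```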
Ranking (clinical focus)
| Rank | Most effective attack type (clinical) | Rationale |
|---|---|---|
| S1 | Loss / NLL-based MIA | Medical text is highly structured and fine-tuning sets are small, so overfitting leaves a detectable loss gap |
| S2 | Token-level perplexity / decoding MIA | Clinical models often expose token-level scores |
| A | Representation-based | Has evidence but weaker separability |
| B | Consistency-based / Label-only | Clinical text space is narrow; signal weaker |
| C | Quantile / SPV / Neighbourhood / Distillation | No clinical evidence |
| D | Multimodal MIA (medical) | Major gap; unstudied |