2025调研

论文调研

发布日期: 2025-08-28

更新日期: 2025-09-29

文章字数: 6.7k

阅读时长: 35 分

阅读次数:

AI声明：筛选、整理过程存在AI辅助，谨慎使用

USENIX 2024

错误注入和鲁棒性

DNN-GP: Diagnosing and Mitigating Model’s Faults Using Latent Concepts.
Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection.
Tossing in the Dark: Practical Bit-Flipping on Gray-box Deep Neural Networks for Runtime Trojan Injection.
Forget and Rewire: Enhancing the Resilience of Transformer-based Models against Bit-Flip Attacks.
大模型攻击与防御
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection.
REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models. 水印
Formalizing and Benchmarking Prompt Injection Attacks and Defenses.
Instruction Backdoor Attacks Against Customized LLMs.
安全ML
AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE.
Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions.
OblivGNN: Oblivious Inference on Transductive and Inductive Graph Neural Network.
MD-ML: Super Fast Privacy-Preserving Machine Learning for Malicious Security with a Dishonest Majority.
Accelerating Secure Collaborative Machine Learning with Protocol-Aware RDMA.
隐私推理
A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data.
Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models.
MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training.
Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference Attacks.
Property Existence Inference against Generative Models.
How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers.
Reconstructing training data from document understanding models.
Privacy Side Channels in Machine Learning Systems.
FaceObfuscator: Defending Deep Learning-based Privacy Attacks with Gradient Descent-resistant Features in Face Recognition.
后门
Neural Network Semantic Backdoor Detection and Mitigation: A Causality-Based Approach.
On the Difficulty of Defending Contrastive Learning against Backdoor Attacks.
Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models.
Xplain: Analyzing Invisible Correlations in Model Explanation.
Verify your Labels! Trustworthy Predictions and Datasets via Confidence Scores.
Digital Adversarial Attacks
More Simplicity for Trainers, More Opportunity for Attackers: Black-Box Attacks on Speaker Recognition Systems by Inferring Feature Extractor.
Transferability of White-box Perturbations: Query-Efficient Adversarial Attacks against Commercial DNN Services.
Adversarial Illusions in Multi-Modal Embeddings.
It Doesn’t Look Like Anything to Me: Using Diffusion Model to Subvert Visual Phishing Detectors.
Invisibility Cloak: Proactive Defense Against Visual Game Cheating.
对抗攻防
Correction-based Defense Against Adversarial Video Attacks via Discretization-Enhanced Video Compressive Sensing.
Rethinking the Invisible Protection against Unauthorized Image Usage in Stable Diffusion.
Splitting the Difference on Adversarial Training.
Machine Learning needs Better Randomness Standards: Randomised Smoothing and PRNG-based attacks.
PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses.
评估和最好的实践
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models
后门
UBA-Inf: Unlearning Activated Backdoor Attack with Influence-Driven Camouflage
越狱
LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks.
Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models.
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services.
Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction.
模型萃取和水印
SoK: All You Need to Know About On-Device ML Model Extraction - The Gap Between Research and Practice.
Unveiling the Secrets without Data: Can Graph Neural Networks Be Exploited through Data-Free Model Extraction Attacks?
ClearStamp: A Human-Visible and Robust Model-Ownership Proof based on Transposed Model Training.
DeepEclipse: How to Break White-Box DNN-Watermarking Schemes.
ModelGuard: Information-Theoretic Defense Against Model Extraction Attacks.

大模型滥用

Moderating Illicit Online Image Promotion for Unsafe User Generated Content Games Using Large Vision-Language Models.
Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text.
Prompt Stealing Attacks Against Text-to-Image Generation Models.
Quantifying Privacy Risks of Prompts in Visual Prompt Learning.
安全分析
Hijacking Attacks against Neural Network by Analyzing Training Data.
False Claims against Model Ownership Resolution.
Landscape More Secure Than Portrait? Zooming Into the Directionality of Digital Images With Security Implications.
Information Flow Control in Machine Learning through Modular Model Architecture.
物理对抗攻击
Devil in the Room: Triggering Audio Backdoors in the Physical World.
FraudWhistler: A Resilient, Robust and Plug-and-play Adversarial Example Detection Method for Speaker Recognition.
pi-Jack: Physical-World Adversarial Attack on Monocular Depth Estimation with Perspective Hijacking.
AE-Morpher: Improve Physical Robustness of Adversarial Objects against LiDAR-based Detectors via Object Reconstruction.

用户研究

“I Don’t Know If We’re Doing Good. I Don’t Know If We’re Doing Bad”: Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products.
Towards More Practical Threat Models in Artificial Intelligence Security.

USENIX 2025 cycle1

攻击 (Attacks)

越狱与提示工程 (Jailbreaking & Prompt Engineering)

PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Exposing the Guardrails: Reverse-Engineering and Jailbreaking Safety Filters in DALL·E Text-to-Image Pipelines
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators

数据投毒与后门 (Data Poisoning & Backdoors)

PoiSAFL: Scalable Poisoning Attack Framework to Byzantine-resilient Semi-asynchronous Federated Learning
Persistent Backdoor Attacks in Continual Learning
From Purity to Peril: Backdooring Merged Models From “Harmless” Benign Components

成员与属性推理 (Membership & Attribute Inference)

Enhanced Label-Only Membership Inference Attacks with Fewer Queries
Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses
Membership Inference Attacks Against Vision-Language Models
Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models

对抗性与物理攻击 (Adversarial & Physical Attacks)

Fighting Fire with Fire: Continuous Attack for Adversarial Android Malware Detection
Atkscopes: Multiresolution Adversarial Perturbation as a Unified Attack on Perceptual Hashing and Beyond
Invisible but Detected: Physical Adversarial Shadow Attack and Defense on LiDAR Object Detection

系统与硬件漏洞利用 (System & Hardware Exploits)

NeuroScope: Reverse Engineering Deep Neural Network on Edge Devices using Dynamic Analysis
BarraCUDA: Edge GPUs do Leak DNN Weights
Not so Refreshing: Attacking GPUs using RFM Rowhammer Mitigation
Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning
When Translators Refuse to Translate: A Novel Attack to Speech Translation Systems
Chimera: Creating Digitally Signed Fake Photos by Fooling Image Recapture and Deepfake Detectors

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computation)

DP-BREM: Differentially-Private and Byzantine-Robust Federated Learning with Client Momentum
LOHEN: Layer-wise Optimizations for Neural Network Inferences over Encrypted Data with High Performance or Accuracy
Task-Oriented Training Data Privacy Protection for Cloud-based Model Training
Arbitrary-Threshold Fully Homomorphic Encryption with Lower Complexity
zkGPT: An Efficient Non-interactive Zero-knowledge Proof Framework for LLM Inference
Distributed Private Aggregation in Graph Neural Networks
Phantom: Privacy-Preserving Deep Neural Network Model Obfuscation in Heterogeneous TEE and GPU System

鲁棒性与认证 (Robustness & Certification)

Robustifying ML-powered Network Classifiers with PANTS
AGNNCert: Defending Graph Neural Networks against Arbitrary Perturbations with Deterministic Certification
CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization
CertPHash: Towards Certified Perceptual Hashing via Robust Training

越狱与提示注入防御 (Jailbreak & Prompt Injection Defense)

StruQ: Defending Against Prompt Injection with Structured Queries
JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

水印与知识产权保护 (Watermarking & IP Protection)

THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models
AudioMarkNet: Audio Watermarking for Deepfake Speech Detection
Provably Robust Multi-bit Watermarking for AI-generated Text
LLMmap: Fingerprinting for Large Language Models
LightShed: Defeating Perturbation-based Image Copyright Protections

后门及通用防御 (Backdoor & General Defense)

Dormant: Defending against Pose-driven Human Image Animation
SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
Pretender: Universal Active Defense against Diffusion Finetuning Attacks
Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data

漏洞/分析 (Vulnerabilities/Analysis)

Towards Understanding and Enhancing Security of Proof-of-Training for DNN Model Ownership Verification
The Ghost Navigator: Revisiting the Hidden Vulnerability of Localization in Autonomous Driving
Revisiting Training-Inference Trigger Intensity in Backdoor Attacks
Evaluating LLM-based Personal Information Extraction and Countermeasures
Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models
SoK: On Gradient Leakage in Federated Learning
VoiceWukong: Benchmarking Deepfake Voice Detection
Analyzing the AI Nudification Application Ecosystem
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Watch the Watchers! On the Security Risks of Robustness-Enhancing Diffusion Models
From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
NOKEScam: Understanding and Rectifying Non-Sense Keywords Spear Scam in Search Engines

NDSS 2025

攻击 (Attacks)

成员推理与数据重建 (Membership Inference & Data Reconstruction)

A Method to Facilitate Membership Inference Attacks in Deep Learning Models.
Black-box Membership Inference Attacks against Fine-tuned Diffusion Models.
Passive Inference Attacks on Split Learning via Adversarial Regularization.
RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Adversarial Data Manipulation.
Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction.
URVFL: Undetectable Data Reconstruction Attack on Vertical Federated Learning.

对抗性与物理攻击 (Adversarial & Physical Attacks)

AlphaDog: No-Box Camouflage Attacks via Alpha Channel Oversight.
Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems.
On the Realism of LiDAR Spoofing Attacks against Autonomous Driving Vehicle at High Speed and Long Distance.
PhantomLiDAR: Cross-modality Signal Injection Attacks against LiDAR.
Revisiting Physical-World Adversarial Attack on Traffic Sign Recognition: A Commercial Systems Perspective.
L-HAWK: A Controllable Physical Adversarial Patch Against a Long-Distance Target.

后门与木马 (Backdoor & Trojan)

The Philosopher’s Stone: Trojaning Plugins of Large Language Models.
LADDER: Multi-Objective Backdoor Attack via Evolutionary Algorithm.

其他攻击 (Other Attacks)

I Know What You Asked: Prompt Leakage via KV-Cache Sharing in Multi-Tenant LLM Serving.
Automated Mass Malware Factory: The Convergence of Piggybacking and Adversarial Example in Android Malicious Software Generation.

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computing)

BumbleBee: Secure Two-party Inference Framework for Large Transformers.
Diffence: Fencing Membership Privacy With Diffusion Models.
Secure Transformer Inference Made Non-interactive.
A New PPML Paradigm for Quantized Models.
Defending Against Membership Inference Attacks on Iteratively Pruned Deep Neural Networks.
DLBox: New Model Training Framework for Protecting Training Data.
MingledPie: A Cluster Mingling Approach for Mitigating Preference Profiling in CFL.
Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models.
SHAFT: Secure, Handy, Accurate and Fast Transformer Inference.
SIGuard: Guarding Secure Inference with Post Data Privacy.

后门检测与防御 (Backdoor Detection & Defense)

CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models.
BARBIE: Robust Backdoor Detection Based on Latent Separability.
DShield: Defending against Backdoor Attacks on Graph Neural Networks via Discrepancy Learning.
PBP: Post-training Backdoor Purification for Malware Classifiers.
SafeSplit: A Novel Defense Against Client-Side Backdoor Attacks in Split Learning.

模型遗忘与审查 (Model Unlearning & Censorship)

Reinforcement Unlearning.
THEMIS: Regulating Textual Inversion for Personalized Concept Censorship.
TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents.

数据保护与内容审核 (Data Protection & Content Moderation)

Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution.
GAP-Diff: Protecting JPEG-Compressed Images from Diffusion-based Facial Customization.
Provably Unlearnable Data Examples.
SongBsAb: A Dual Prevention Approach against Singing Voice Conversion based Illegal Song Covers.
Try to Poison My Deep Learning Data? Nowhere to Hide Your Trajectory Spectrum!

通用防御与架构 (General Defense & Architecture)

CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling.
ASGARD: Protecting On-Device Deep Neural Networks with Virtualization-Based Trusted Execution Environments.
BitShield: Defending Against Bit-Flip Attacks on DNN Executables.
Density Boosts Everything: A One-stop Strategy for Improving Performance, Robustness, and Sustainability of Malware Detectors.
IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems.
Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing.
Revisiting Concept Drift in Windows Malware Detection: Adaptation to Real Drifted Malware with Minimal Samples.

漏洞/分析 (Vulnerabilities/Analysis)

Compiled Models, Built-In Exploits: Uncovering Pervasive Bit-Flip Attack Surfaces in DNN Executables.
Revisiting EM-based Estimation for Locally Differentially Private Protocols.
Understanding Data Importance in Machine Learning Attacks: Does Valuable Data Pose Greater Harm?
Do We Really Need to Design New Byzantine-robust Aggregation Rules?
On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks.
Safety Misalignment Against Large Language Models.
Towards Understanding Unsafe Video Generation.

CCS 2024

攻击 (Attacks)

隐私攻击 (Privacy Attacks)

Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack.
QueryCheetah: Fast Automated Discovery of Attribute Inference Attacks Against Query-Based Systems.
Membership Inference Attacks Against In-Context Learning.
SeqMIA: Sequential-Metric Based Membership Inference Attack.
PLeak: Prompt Leaking Attacks against Large Language Model Applications.
Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks.
A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability.

对抗性与物理攻击 (Adversarial & Physical Attacks)

Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence.
Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems.
SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems.
The Invisible Polyjuice Potion: an Effective Physical Adversarial Attack against Face Recognition.
Manipulative Interference Attacks.

数据投毒与后门 (Data Poisoning & Backdoors)

Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning.
Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses.
Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols.
BadMerging: Backdoor Attacks Against Model Merging.
Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense.

模型与系统利用 (Model & System Exploits)

Inbox Invasion: Exploiting MIME Ambiguities to Evade Email Attachment Detectors.
Optimization-based Prompt Injection Attack to LLM-as-a-Judge.
Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data.
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution.
“Modern problems require modern solutions”: Community-Developed Techniques for Online Exam Proctoring Evasion.
Not One Less: Exploring Interplay between User Profiles and Items in Untargeted Attacks against Federated Recommendation.
HyperTheft: Thieving Model Weights from TEE-Shielded Neural Networks via Ciphertext Side Channels.
DeepCache: Revisiting Cache Side-Channel Attacks in Deep Neural Networks Executables.

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computing)

Camel: Communication-Efficient and Maliciously Secure Federated Learning in the Shuffle Model of Differential Privacy.
S2NeRF: Privacy-preserving Training Framework for NeRF.
$DPM: $ Clustering Sensitive Data through Separation.
S-BDT: Distributed Differentially Private Boosted Decision Trees.
Cross-silo Federated Learning with Record-level Personalized Differential Privacy.
Membership Inference Attacks against Vision Transformers: Mosaic MixUp Training to the Defense.
Formal Privacy Proof of Data Encoding: The Possibility and Impossibility of Learnable Encryption.
Elephants Do Not Forget: Differential Privacy with State Continuity for Privacy Budget.
ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support.
Almost Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy.
Securing Floating-Point Arithmetic for Noise Addition.
Rhombus: Fast Homomorphic Matrix-Vector Multiplication for Secure Two-Party Inference.
Byzantine-Robust Decentralized Federated Learning.
Sparrow: Space-Efficient zkSNARK for Data-Parallel Circuits and Applications to Zero-Knowledge Decision Trees.
AirGapAgent: Protecting Privacy-Conscious Conversational Agents.
CoGNN: Towards Secure and Efficient Collaborative Graph Learning.
Computationally Secure Aggregation and Private Information Retrieval in the Shuffle Model.
Zero-Knowledge Proofs of Training for Deep Neural Networks.
zkLLM: Zero Knowledge Proofs for Large Language Models.
Securely Training Decision Trees Efficiently.
Poster: End-to-End Privacy-Preserving Vertical Federated Learning using Private Cross-Organizational Data Collaboration.
Poster: Protection against Source Inference Attacks in Federated Learning using Unary Encoding and Shuffling.

内容审核与模型安全 (Content Moderation & Model Safety)

A Causal Explainable Guardrails for Large Language Models.
Legilimens: Practical and Unified Content Moderation for Large Language Model Services.
Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies.
PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs).
SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models.
Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code.

鲁棒性与检测 (Robustness & Detection)

Training Robust ML-based Raw-Binary Malware Detectors in Hours, not Months.
SpecGuard: Specification Aware Recovery for Robotic Autonomous Vehicles from Physical Attacks.
VisionGuard: Secure and Robust Visual Perception of Autonomous Vehicles in Practice.
PhyScout: Detecting Sensor Spoofing Attacks via Spatio-temporal Consistency.
Alchemy: Data-Free Adversarial Training.
I Don’t Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors.
PhySense: Defending Physically Realizable Attacks for Autonomous Systems via Consistency Reasoning.
Fisher Information guided Purification against Backdoor Attacks.
Mithridates: Auditing and Boosting Backdoor Resistance of Machine Learning Pipelines.
ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models.
Poster: AuditVotes: A Framework towards Deployable Certified Robustness for GNNs.
Towards Proactive Protection against Unauthorized Speech Synthesis.

数据保护与审计 (Data Protection & Auditing)

A General Framework for Data-Use Auditing of ML Models.
MaskPrint: Take the Initiative in Fingerprint Protection to Mitigate the Harm of Data Breach.
Dye4AI: Assuring Data Boundary on Generative AI Services.
TabularMark: Watermarking Tabular Datasets for Machine Learning.
Beowulf: Mitigating Model Extraction Attacks Via Reshaping Decision Regions.
ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach.
Pulsar: Secure Steganography for Diffusion Models.
Demo: FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation.
Catch Me if You Can: Detecting Unauthorized Data Use in Training Deep Learning Models.
Poster: Solving the Free-rider Problem in Bittensor.
Poster: Enhance Hardware Domain Specific Large Language Model with Reinforcement Learning for Resilience.

漏洞/分析 (Vulnerabilities/Analysis)

Evaluations of Machine Learning Privacy Defenses are Misleading.
The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks.
Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility.
“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models.
Demystifying RCE Vulnerabilities in LLM-Integrated Apps.
Using AI Assistants in Software Development: A Qualitative Study on Security Practices and Concerns.
Analyzing Inference Privacy Risks Through Gradients In Machine Learning.
PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps.
Uncovering Gradient Inversion Risks in Practical Language Model Training.
Avara: A Uniform Evaluation System for Perceptibility Analysis Against Adversarial Object Evasion Attacks.
Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution.
Blind and Low-Vision Individuals’ Detection of Audio Deepfakes.
Privacy Analyses in Machine Learning.
Novel Privacy Attacks and Defenses Against Neural Networks.

CCS 2025 first cycle

攻击 (Attacks)

隐私与数据提取 (Privacy & Data Extraction)

Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
Differentiation-Based Extraction of Proprietary Data from Fine-tuned LLMs

数据投毒与对抗性攻击 (Data Poisoning & Adversarial Attacks)

Poisoning Attacks to Local Differential Privacy for Ranking Estimation
ControlLoc: Physical-World Hijacking Attack on Camera-based Perception in Autonomous Driving
On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling
One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
Busting the Paper Ballot: Voting Meets Adversarial Machine Learning

防御 (Defenses)

模型遗忘与隐私保护 (Unlearning & Privacy)

Split Unlearning
Rethinking Machine Unlearning in Image Generation Models
Anonymity Unveiled: A Practical Framework for Auditing Data Use in Deep Learning Models
LZKSA: Lattice-based special zero-knowledge proofs for secure aggregation’s input verification
Prototype Surgery: Tailoring Neural Prototypes via Soft Labels for Efficient Machine Unlearning
Secure Noise Sampling for Differentially Private Collaborative Learning
Founding Zero-Knowledge Proof of Training on Optimum Vicinity
Gibbon: Faster Secure Two-party Training of Gradient Boosting Decision Tree

后门与恶意软件防御 (Backdoor & Malware Defense)

Combating Concept Drift with Explanatory Detection and Adaptation for Android Malware Classification
PoisonSpot: Precise Spotting of Clean-Label Backdoors via Fine-Grained Training Provenance Tracking
Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model
FilterFL: Knowledge Filtering-based Data-Free Backdoor Defense for Federated Learning

安全与鲁棒性系统 (Secure & Robust Systems)

TensorShield: Safeguarding On-Device Inference by Shieldin g Critical DNN Tensors with TEE
Sylva: Tailoring Personalized Adversarial Defense in Pre-trained Models via Collaborative Fine-tuning
RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
SecAlign: Defending Against Prompt Injection with Preference Optimization
A Practical and Secure Byzantine Robust Aggregator
DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

漏洞/分析 (Vulnerabilities/Analysis)

Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble
Towards Backdoor Stealthiness in Model Parameter Space
What’s Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift

CCS 2025 second cycle

Adversarial Attacks & Robustness (对抗性攻击与模型鲁棒性)

这类攻击通过对输入数据（如图像、文本、传感器信号）进行微小、人难以察觉的扰动，来欺骗模型做出错误的判断。

Adversarial Observations in Weather Forecasting: 探讨在气象预报中引入对抗性观测数据的影响 (攻击)。
Asymmetry Vulnerability and Physical Attacks on Online Map Construction for Autonomous Driving: 针对自动驾驶地图构建的物理世界攻击 (攻击)。
Evaluating the robustness of a production malware detection system to transferable adversarial attacks: 评估恶意软件检测系统对可迁移对抗性攻击的鲁棒性 (评估/攻击)。
Threat from Windshield: Vehicle Windows as Involuntary Attack Sources on Automotive Voice Assistants: 利用挡风玻璃作为媒介，对车载语音助手进行物理攻击 (攻击)。
Adversarially Robust Assembly Language Model for Packed Executables Detection: 针对加壳可执行文件的对抗性鲁棒语言模型 (防御)。
Exact Robustness Certification of k-Nearest Neighbors: 对k近邻算法提供可证明的鲁棒性保证 (防御)。
Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement: 对神经网络中由对抗性攻击等引起的缺陷进行可证明的修复 (防御)。
Towards Real-Time Defense Against Object-Based LiDAR Attacks in Autonomous Driving: 针对自动驾驶中激光雷达的实时攻击防御 (防御)。

Privacy Attacks (隐私攻击)

这类攻击旨在从模型或系统中窃取敏感信息，例如训练数据、用户查询或个人身份信息。常见类型包括成员推断、模型提取和侧信道攻击。

Peekaboo, I See Your Queries: Passive Attacks Against DSSE Via Intermittent Observations: 通过间歇性观察对动态对称可搜索加密方案进行被动式攻击，窃取查询信息 (攻击)。
DCMI: A Differential Calibration Membership Inference Attack Against Retrieval-Augmented Generation: 针对检索增强生成（RAG）模型的成员推断攻击，判断特定数据是否被用于训练 (攻击)。
Can Personal Health Information Be Secured in LLM? Privacy Attack and Defense in the Medical Domain: 探讨大语言模型在医疗领域的隐私攻击与防御 (攻击/防御)。
Timing Attacks on Differential Privacy are Practical: 证明了对差分隐私机制的计时攻击在实践中是可行的 (攻击)。
Byte by Byte: Unmasking Browser Fingerprinting at the Function Level using V8 Bytecode Transformers: 通过分析V8字节码来揭示和增强浏览器指纹识别技术 (攻击)。
MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs: 利用侧信道攻击窃取混合专家模型（MoE）中的用户隐私 (攻击)。
Safeguarding Graph Neural Networks against Topology Inference Attacks: 防御旨在推断图结构（如社交网络关系）的拓扑推断攻击 (防御)。
You Can’t Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors: 通过系统向量缓解大语言模型中的提示词泄露问题 (防御)。
Mosformer: Maliciously Secure Three-Party Inference Framework for Large Transformers: 一个保护隐私的安全多方计算框架，用于Transformer模型的推理 (防御)。
THOR: Secure Transformer Inference with Homomorphic Encryption: 使用同态加密技术实现安全的Transformer模型推理，保护数据机密性 (防御)。
PLRV-O: Advancing Differentially Private Deep Learning via Privacy Loss Random Variable Optimization: 一种优化差分隐私深度学习效用的新方法 (防御)。
IOValve: Leakage-Free I/O Sandbox for Large-Scale Untrusted Data Processing: 为大规模不可信数据处理设计的无泄漏I/O沙箱 (防御)。
Zero-Knowledge AI Inference with High Precision

Data Poisoning & Backdoor Attacks (数据投毒与后门攻击)

这类攻击通过向训练数据中注入少量精心制作的“毒样本”，在模型中植入“后门”。模型在正常输入下表现正常，但在遇到包含特定触发器（trigger）的输入时，会产生攻击者预设的恶意行为。

VillainNet: Targeted Poisoning Attacks Against SuperNets Along the Accuracy-Latency Pareto Frontier: 针对超网（SuperNets）的精确投毒攻击 (攻击)。
The Phantom Menace in PET-Hardened Deep Learning Models: Invisible Configuration-Induced Attacks: 揭示了在参数高效微调（PET）模型中由配置引发的隐形攻击，类似于后门 (攻击)。
Cascading Adversarial Bias from Injection to Distillation in Language Models: 探讨对抗性偏见如何从注入阶段传播到模型蒸馏阶段，是一种偏见投毒 (攻击)。
On Hyperparameters and Backdoor-Resistance in Horizontal Federated Learning: 研究水平联邦学习中超参数对后门攻击抵抗性的影响 (评估/防御)。
Deep Learning from Imperfectly Labeled Malware Data: 研究在不完美标注的恶意软件数据上进行学习，这与投毒攻击场景相关 (评估/防御)。
Armadillo: Robust Single-Server Secure Aggregation for Federated Learning with Input Validation: 在联邦学习中抵抗投毒攻击的安全聚合协议 (防御)。
Sentry: Authenticating Machine Learning Artifacts on the Fly: 用于实时验证机器学习模型和数据真实性的框架，可抵御投毒和篡改 (防御)。

Prompt Injection & LLM Manipulation (提示注入与大模型操纵)

这类攻击主要针对基于大语言模型（LLM）的应用，特别是检索增强生成（RAG）系统。攻击者通过构造恶意提示词（Prompt）或污染外部知识库，来操纵模型的输出，使其泄露信息、产生有害内容或执行非预期任务。

FlippedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models: 对RAG系统进行黑盒攻击，以操纵其生成的观点和内容 (攻击)。
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search: 探索基于密集嵌入的检索系统的漏洞，这种系统是RAG的核心 (攻击)。
ImportSnare: Directed ”Code Manual” Hijacking in Retrieval-Augmented Code Generation: 针对RAG代码生成系统的“代码手册”劫持攻击 (攻击)。
Here Comes The AI Worm: Preventing the Propagation of Adversarial Self-Replicating Prompts Within GenAI Ecosystems: 探讨可自我复制的对抗性提示（AI蠕虫）在生成式AI生态系统中的传播与防御 (攻击/防御)。
Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection: 通过知识注入的方式来保护RAG代码生成系统的安全 (防御)。

System, Software & Hardware Vulnerabilities (系统、软件与硬件漏洞)

这类研究关注的是机器学习系统所依赖的底层软件、硬件或网络协议中的安全漏洞，而不仅是模型本身。

PickleBall: Secure Deserialization of Pickle-based Machine Learning Models: 关注Python Pickle格式在加载ML模型时的反序列化漏洞及安全措施 (攻击/防御)。
Denial of Sequencing Attacks in Ethereum Layer 2 Rollups: 针对以太坊二层扩容方案（Rollups）的拒绝服务攻击 (攻击)。
Automatic Discovery of User-exploitable Architectural Security Vulnerabilities in Closed-Source RISC-V CPUs: 自动发现闭源RISC-V处理器中的体系结构安全漏洞 (攻击/工具)。
Styled to Steal: The Overlooked Attack Surface in Email Clients: 揭示了电子邮件客户端中被忽视的攻击面 (攻击)。
Chekhov’s Gun: Uncovering Hidden Risks in macOS Application-Sandboxed PID-Domain Services: 发现macOS沙箱服务中的隐藏安全风险 (攻击)。
Deep Dive into In-app Browsers: Uncovering Hidden Pitfalls in Certificate Validation: 揭露应用内浏览器在证书验证方面的安全隐患 (攻击)。
Hardening Deep Neural Network Binaries against Reverse Engineering Attacks: 强化深度学习模型二进制文件以抵抗逆向工程攻击 (防御)。
CITesting: Systematic Testing of Context Integrity Violations in Cellular Core Networks: 对蜂窝网络核心网中上下文完整性破坏漏洞进行系统性测试 (测试/评估)。

Model/Data Integrity & Provenance Attacks (模型/数据完整性与溯源攻击)

这类攻击的目标是破坏用于验证模型或数据来源的机制，例如数字水印。

Removal Attack and Defense on AI Generated Content Latent-based Watermarking: 针对AIGC内容中基于潜在空间的水印的移除攻击与防御 (攻击/防御)。
PreferCare: Preference Dataset Copyright Protection in LLM Alignment by Watermark Injection and Verification: 通过水印保护用于LLM对齐的偏好数据集的版权 (防御)。

Security Auditing, Benchmarking & Measurement (安全审计、基准与测量)

这些论文不一定提出新的攻击或防御方法，而是专注于开发工具、基准（Benchmark）或进行大规模测量，以评估和理解现有系统的安全状况。

What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale: 对大规模共享的扩散模型进行概念审计，以发现潜在风险 (审计)。
The Odyssey of robots.txt Governance: Measuring Convention Implications of Web Bots in Large Language Model Services: 测量robots.txt协议对大模型网络爬虫的影响，评估其治理现状 (测量)。
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images: 为图像安全分类器提供一个包含真实和AI生成图像的基准测试集 (基准)。
YouthSafe: A Youth-Centric Safety Benchmark and Safeguard Model for Large Language Models: 面向青少年的大模型安全基准和保护模型 (基准/防御)。
Automatically Detecting Online Deceptive Patterns: 自动检测网络中的欺骗性模式（如暗模式） (工具/检测)。
OCR-APT: Reconstructing APT Stories from Audit Logs using Subgraph Anomaly Detection and LLMs: 利用审计日志和LLM来重构高级持续性威胁（APT）的攻击链 (工具/检测)。
Accountable Liveness

Comprehensive Defense Frameworks (综合性防御框架)

这类工作提供端到端或系统性的安全框架，旨在防御多种类型的攻击。

AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents: 为操作计算机的AI智能体（Agent）设计的端到端实时安全防御框架 (防御)。

S&P 2025

攻击 (Attacks)

提示工程与越狱 (Prompt Engineering & Jailbreaking)

Modifier Unlocked: Jailbreaking Text-to-Image Models Through Prompts.
Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-to-Image Generation Models.
On the Effectiveness of Prompt Stealing Attacks on In-the-Wild Prompts.
Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-Based Prompt Injection Attacks via the Fine-Tuning Interface.
Prompt Inversion Attack Against Collaborative Inference of Large Language Models.

数据投毒与后门 (Data Poisoning & Backdoors)

Preference Poisoning Attacks on Reward Model Learning.
Architectural Neural Backdoors from First Principles.
Practical Poisoning Attacks with Limited Byzantine Clients in Clustered Federated Learning.

模型与数据窃取 (Model & Data Extraction)

Codebreaker: Dynamic Extraction Attacks on Code Language Models.
Rigging the Foundation: Manipulating Pre-training for Advanced Membership Inference Attacks.
UnMarker: A Universal Attack on Defensive Image Watermarking.
CipherSteal: Stealing Input Data from TEE-Shielded Neural Networks with Ciphertext Side Channels.

其他攻击 (Other Attacks)

My Model is Malware to You: Transforming AI Models into Malware by Abusing TensorFlow APIs.
Make a Feint to the East While Attacking in the West: Blinding LLM-Based Code Auditors with Flashboom Attacks.
The Inadequacy of Similarity-Based Privacy Metrics: Privacy Attacks Against “Truly Anonymous” Synthetic Datasets.
EvilHarmony: Stealthy Adversarial Attacks Against Black-Box Speech Recognition Systems.
Investigating Physical Latency Attacks Against Camera-Based Perception.

防御 (Defenses)

后门与攻击检测 (Backdoor & Attack Detection)

Secure Transfer Learning: Training Clean Model Against Backdoor in Pre-Trained Encoder and Downstream Dataset.
Query Provenance Analysis: Efficient and Robust Defense Against Query-Based Black-Box Attacks.
BAIT: Large Language Model Backdoor Scanning by Inverting Attack Target.
PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning.
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks.
Lombard-VLD: Voice Liveness Detection Based on Human Auditory Feedback.

隐私保护与安全计算 (Privacy & Secure Computing)

GRID: Protecting Training Graph from Link Stealing Attacks on GNN Models.
SHARK: Actively Secure Inference Using Function Secret Sharing.
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity.
FairZK: A Scalable System to Prove Machine Learning Fairness in Zero-Knowledge.
PAC-Private Algorithms.
An Attack-Agnostic Defense Framework Against Manipulation Attacks Under Local Differential Privacy.
From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis.

鲁棒性与对抗防御 (Robustness & Adversarial Defense)

TSQP: Safeguarding Real-Time Inference for Quantization Neural Networks on Edge Devices.
Fight Fire with Fire: Combating Adversarial Patch Attacks using Pattern-randomized Defensive Patches.
Adversarial Robust ViT-Based Automatic Modulation Recognition in Practical Deep Learning-Based Wireless Systems.
EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations.
Spoofing Eavesdroppers with Audio Misinformation.

通用防御与审计 (General Defense & Auditing)

Edge Unlearning is Not “on Edge”! an Adaptive Exact Unlearning System on Resource-Constrained Devices.
Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models.
Watermarking Language Models for Many Adaptive Users.
Guardain: Protecting Emerging Generative AI Workloads on Heterogeneous NPU.

漏洞/分析 (Vulnerabilities/Analysis)

Understanding Users’ Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms.
On the (In)Security of LLM App Stores.
SoK: Decoding the Enigma of Encrypted Network Traffic Classifiers.
On the Conflict Between Robustness and Learning in Collaborative Machine Learning.
Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning.
SoK: Watermarking for AI-Generated Content.
From One Stolen Utterance: Assessing the Risks of Voice Cloning in the AIGC Era.

wangxh

https://blog.expecto.top/2025/08/28/2025-diao-yan/

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源 wangxh !

论文调研

本篇

2025调研

2025-08-28 wangxh

论文调研

A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations 阅读笔记

2025-04-03 wangxh

机器学习综述大语言模型

USENIX 2024

错误注入和鲁棒性

大模型攻击与防御

安全ML

隐私推理

后门

Digital Adversarial Attacks

对抗攻防

评估和最好的实践

后门

越狱

模型萃取和水印

大模型滥用

安全分析

物理对抗攻击

用户研究

USENIX 2025 cycle1

攻击 (Attacks)

越狱与提示工程 (Jailbreaking & Prompt Engineering)

数据投毒与后门 (Data Poisoning & Backdoors)

成员与属性推理 (Membership & Attribute Inference)

对抗性与物理攻击 (Adversarial & Physical Attacks)

系统与硬件漏洞利用 (System & Hardware Exploits)

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computation)

鲁棒性与认证 (Robustness & Certification)

越狱与提示注入防御 (Jailbreak & Prompt Injection Defense)

水印与知识产权保护 (Watermarking & IP Protection)

后门及通用防御 (Backdoor & General Defense)

漏洞/分析 (Vulnerabilities/Analysis)

NDSS 2025

攻击 (Attacks)

成员推理与数据重建 (Membership Inference & Data Reconstruction)

对抗性与物理攻击 (Adversarial & Physical Attacks)

后门与木马 (Backdoor & Trojan)

其他攻击 (Other Attacks)

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computing)

后门检测与防御 (Backdoor Detection & Defense)

模型遗忘与审查 (Model Unlearning & Censorship)

数据保护与内容审核 (Data Protection & Content Moderation)

通用防御与架构 (General Defense & Architecture)

漏洞/分析 (Vulnerabilities/Analysis)

CCS 2024

攻击 (Attacks)

隐私攻击 (Privacy Attacks)

对抗性与物理攻击 (Adversarial & Physical Attacks)

数据投毒与后门 (Data Poisoning & Backdoors)

模型与系统利用 (Model & System Exploits)

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computing)

内容审核与模型安全 (Content Moderation & Model Safety)

鲁棒性与检测 (Robustness & Detection)

数据保护与审计 (Data Protection & Auditing)

漏洞/分析 (Vulnerabilities/Analysis)

CCS 2025 first cycle

攻击 (Attacks)

隐私与数据提取 (Privacy & Data Extraction)

数据投毒与对抗性攻击 (Data Poisoning & Adversarial Attacks)

防御 (Defenses)

模型遗忘与隐私保护 (Unlearning & Privacy)

后门与恶意软件防御 (Backdoor & Malware Defense)

安全与鲁棒性系统 (Secure & Robust Systems)

漏洞/分析 (Vulnerabilities/Analysis)

CCS 2025 second cycle

Adversarial Attacks & Robustness (对抗性攻击与模型鲁棒性)

Privacy Attacks (隐私攻击)

Data Poisoning & Backdoor Attacks (数据投毒与后门攻击)

Prompt Injection & LLM Manipulation (提示注入与大模型操纵)

System, Software & Hardware Vulnerabilities (系统、软件与硬件漏洞)

Model/Data Integrity & Provenance Attacks (模型/数据完整性与溯源攻击)

Security Auditing, Benchmarking & Measurement (安全审计、基准与测量)

Comprehensive Defense Frameworks (综合性防御框架)

S&P 2025

攻击 (Attacks)

提示工程与越狱 (Prompt Engineering & Jailbreaking)

数据投毒与后门 (Data Poisoning & Backdoors)

模型与数据窃取 (Model & Data Extraction)

其他攻击 (Other Attacks)