2025调研


AI声明:筛选、整理过程存在AI辅助,谨慎使用

USENIX 2024

错误注入和鲁棒性

  • DNN-GP: Diagnosing and Mitigating Model’s Faults Using Latent Concepts.
  • Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection.
  • Tossing in the Dark: Practical Bit-Flipping on Gray-box Deep Neural Networks for Runtime Trojan Injection.
  • Forget and Rewire: Enhancing the Resilience of Transformer-based Models against Bit-Flip Attacks.

    大模型攻击与防御

  • An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection.
  • REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models. 水印
  • Formalizing and Benchmarking Prompt Injection Attacks and Defenses.
  • Instruction Backdoor Attacks Against Customized LLMs.

    安全ML

  • AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE.
  • Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions.
  • OblivGNN: Oblivious Inference on Transductive and Inductive Graph Neural Network.
  • MD-ML: Super Fast Privacy-Preserving Machine Learning for Malicious Security with a Dishonest Majority.
  • Accelerating Secure Collaborative Machine Learning with Protocol-Aware RDMA.

    隐私推理

  • A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data.
  • Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models.
  • MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training.
  • Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference Attacks.
  • Property Existence Inference against Generative Models.
  • How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers.
  • Reconstructing training data from document understanding models.
  • Privacy Side Channels in Machine Learning Systems.
  • FaceObfuscator: Defending Deep Learning-based Privacy Attacks with Gradient Descent-resistant Features in Face Recognition.

    后门

  • Neural Network Semantic Backdoor Detection and Mitigation: A Causality-Based Approach.
  • On the Difficulty of Defending Contrastive Learning against Backdoor Attacks.
  • Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models.
  • Xplain: Analyzing Invisible Correlations in Model Explanation.
  • Verify your Labels! Trustworthy Predictions and Datasets via Confidence Scores.

    Digital Adversarial Attacks

  • More Simplicity for Trainers, More Opportunity for Attackers: Black-Box Attacks on Speaker Recognition Systems by Inferring Feature Extractor.
  • Transferability of White-box Perturbations: Query-Efficient Adversarial Attacks against Commercial DNN Services.
  • Adversarial Illusions in Multi-Modal Embeddings.
  • It Doesn’t Look Like Anything to Me: Using Diffusion Model to Subvert Visual Phishing Detectors.
  • Invisibility Cloak: Proactive Defense Against Visual Game Cheating.

    对抗攻防

  • Correction-based Defense Against Adversarial Video Attacks via Discretization-Enhanced Video Compressive Sensing.
  • Rethinking the Invisible Protection against Unauthorized Image Usage in Stable Diffusion.
  • Splitting the Difference on Adversarial Training.
  • Machine Learning needs Better Randomness Standards: Randomised Smoothing and PRNG-based attacks.
  • PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses.

    评估和最好的实践

  • SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

    后门

  • UBA-Inf: Unlearning Activated Backdoor Attack with Influence-Driven Camouflage

    越狱

  • LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks.
  • Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models.
  • Malla: Demystifying Real-world Large Language Model Integrated Malicious Services.
  • Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction.

    模型萃取和水印

  • SoK: All You Need to Know About On-Device ML Model Extraction - The Gap Between Research and Practice.
  • Unveiling the Secrets without Data: Can Graph Neural Networks Be Exploited through Data-Free Model Extraction Attacks?
  • ClearStamp: A Human-Visible and Robust Model-Ownership Proof based on Transposed Model Training.
  • DeepEclipse: How to Break White-Box DNN-Watermarking Schemes.
  • ModelGuard: Information-Theoretic Defense Against Model Extraction Attacks.

大模型滥用

  • Moderating Illicit Online Image Promotion for Unsafe User Generated Content Games Using Large Vision-Language Models.
  • Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text.
  • Prompt Stealing Attacks Against Text-to-Image Generation Models.
  • Quantifying Privacy Risks of Prompts in Visual Prompt Learning.

    安全分析

  • Hijacking Attacks against Neural Network by Analyzing Training Data.
  • False Claims against Model Ownership Resolution.
  • Landscape More Secure Than Portrait? Zooming Into the Directionality of Digital Images With Security Implications.
  • Information Flow Control in Machine Learning through Modular Model Architecture.

    物理对抗攻击

  • Devil in the Room: Triggering Audio Backdoors in the Physical World.
  • FraudWhistler: A Resilient, Robust and Plug-and-play Adversarial Example Detection Method for Speaker Recognition.
  • pi-Jack: Physical-World Adversarial Attack on Monocular Depth Estimation with Perspective Hijacking.
  • AE-Morpher: Improve Physical Robustness of Adversarial Objects against LiDAR-based Detectors via Object Reconstruction.

用户研究

  • “I Don’t Know If We’re Doing Good. I Don’t Know If We’re Doing Bad”: Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products.
  • Towards More Practical Threat Models in Artificial Intelligence Security.

USENIX 2025 cycle1

攻击 (Attacks)

越狱与提示工程 (Jailbreaking & Prompt Engineering)

  • PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
  • PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
  • On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
  • Exposing the Guardrails: Reverse-Engineering and Jailbreaking Safety Filters in DALL·E Text-to-Image Pipelines
  • Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
  • Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
  • Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
  • Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators

数据投毒与后门 (Data Poisoning & Backdoors)

  • PoiSAFL: Scalable Poisoning Attack Framework to Byzantine-resilient Semi-asynchronous Federated Learning
  • Persistent Backdoor Attacks in Continual Learning
  • From Purity to Peril: Backdooring Merged Models From “Harmless” Benign Components

成员与属性推理 (Membership & Attribute Inference)

  • Enhanced Label-Only Membership Inference Attacks with Fewer Queries
  • Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses
  • Membership Inference Attacks Against Vision-Language Models
  • Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models

对抗性与物理攻击 (Adversarial & Physical Attacks)

  • Fighting Fire with Fire: Continuous Attack for Adversarial Android Malware Detection
  • Atkscopes: Multiresolution Adversarial Perturbation as a Unified Attack on Perceptual Hashing and Beyond
  • Invisible but Detected: Physical Adversarial Shadow Attack and Defense on LiDAR Object Detection

系统与硬件漏洞利用 (System & Hardware Exploits)

  • NeuroScope: Reverse Engineering Deep Neural Network on Edge Devices using Dynamic Analysis
  • BarraCUDA: Edge GPUs do Leak DNN Weights
  • Not so Refreshing: Attacking GPUs using RFM Rowhammer Mitigation
  • Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI
  • Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning
  • When Translators Refuse to Translate: A Novel Attack to Speech Translation Systems
  • Chimera: Creating Digitally Signed Fake Photos by Fooling Image Recapture and Deepfake Detectors

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computation)

  • DP-BREM: Differentially-Private and Byzantine-Robust Federated Learning with Client Momentum
  • LOHEN: Layer-wise Optimizations for Neural Network Inferences over Encrypted Data with High Performance or Accuracy
  • Task-Oriented Training Data Privacy Protection for Cloud-based Model Training
  • Arbitrary-Threshold Fully Homomorphic Encryption with Lower Complexity
  • zkGPT: An Efficient Non-interactive Zero-knowledge Proof Framework for LLM Inference
  • Distributed Private Aggregation in Graph Neural Networks
  • Phantom: Privacy-Preserving Deep Neural Network Model Obfuscation in Heterogeneous TEE and GPU System

鲁棒性与认证 (Robustness & Certification)

  • Robustifying ML-powered Network Classifiers with PANTS
  • AGNNCert: Defending Graph Neural Networks against Arbitrary Perturbations with Deterministic Certification
  • CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization
  • CertPHash: Towards Certified Perceptual Hashing via Robust Training

越狱与提示注入防御 (Jailbreak & Prompt Injection Defense)

  • StruQ: Defending Against Prompt Injection with Structured Queries
  • JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
  • SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

水印与知识产权保护 (Watermarking & IP Protection)

  • THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models
  • AudioMarkNet: Audio Watermarking for Deepfake Speech Detection
  • Provably Robust Multi-bit Watermarking for AI-generated Text
  • LLMmap: Fingerprinting for Large Language Models
  • LightShed: Defeating Perturbation-based Image Copyright Protections

后门及通用防御 (Backdoor & General Defense)

  • Dormant: Defending against Pose-driven Human Image Animation
  • SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
  • Pretender: Universal Active Defense against Diffusion Finetuning Attacks
  • Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
  • DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data

漏洞/分析 (Vulnerabilities/Analysis)

  • Towards Understanding and Enhancing Security of Proof-of-Training for DNN Model Ownership Verification
  • The Ghost Navigator: Revisiting the Hidden Vulnerability of Localization in Autonomous Driving
  • Revisiting Training-Inference Trigger Intensity in Backdoor Attacks
  • Evaluating LLM-based Personal Information Extraction and Countermeasures
  • Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models
  • SoK: On Gradient Leakage in Federated Learning
  • VoiceWukong: Benchmarking Deepfake Voice Detection
  • Analyzing the AI Nudification Application Ecosystem
  • We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
  • When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs
  • HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
  • Watch the Watchers! On the Security Risks of Robustness-Enhancing Diffusion Models
  • From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
  • Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
  • NOKEScam: Understanding and Rectifying Non-Sense Keywords Spear Scam in Search Engines

NDSS 2025

攻击 (Attacks)

成员推理与数据重建 (Membership Inference & Data Reconstruction)

  • A Method to Facilitate Membership Inference Attacks in Deep Learning Models.
  • Black-box Membership Inference Attacks against Fine-tuned Diffusion Models.
  • Passive Inference Attacks on Split Learning via Adversarial Regularization.
  • RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Adversarial Data Manipulation.
  • Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction.
  • URVFL: Undetectable Data Reconstruction Attack on Vertical Federated Learning.

对抗性与物理攻击 (Adversarial & Physical Attacks)

  • AlphaDog: No-Box Camouflage Attacks via Alpha Channel Oversight.
  • Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems.
  • On the Realism of LiDAR Spoofing Attacks against Autonomous Driving Vehicle at High Speed and Long Distance.
  • PhantomLiDAR: Cross-modality Signal Injection Attacks against LiDAR.
  • Revisiting Physical-World Adversarial Attack on Traffic Sign Recognition: A Commercial Systems Perspective.
  • L-HAWK: A Controllable Physical Adversarial Patch Against a Long-Distance Target.

后门与木马 (Backdoor & Trojan)

  • The Philosopher’s Stone: Trojaning Plugins of Large Language Models.
  • LADDER: Multi-Objective Backdoor Attack via Evolutionary Algorithm.

其他攻击 (Other Attacks)

  • I Know What You Asked: Prompt Leakage via KV-Cache Sharing in Multi-Tenant LLM Serving.
  • Automated Mass Malware Factory: The Convergence of Piggybacking and Adversarial Example in Android Malicious Software Generation.

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computing)

  • BumbleBee: Secure Two-party Inference Framework for Large Transformers.
  • Diffence: Fencing Membership Privacy With Diffusion Models.
  • Secure Transformer Inference Made Non-interactive.
  • A New PPML Paradigm for Quantized Models.
  • Defending Against Membership Inference Attacks on Iteratively Pruned Deep Neural Networks.
  • DLBox: New Model Training Framework for Protecting Training Data.
  • MingledPie: A Cluster Mingling Approach for Mitigating Preference Profiling in CFL.
  • Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models.
  • SHAFT: Secure, Handy, Accurate and Fast Transformer Inference.
  • SIGuard: Guarding Secure Inference with Post Data Privacy.

后门检测与防御 (Backdoor Detection & Defense)

  • CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models.
  • BARBIE: Robust Backdoor Detection Based on Latent Separability.
  • DShield: Defending against Backdoor Attacks on Graph Neural Networks via Discrepancy Learning.
  • PBP: Post-training Backdoor Purification for Malware Classifiers.
  • SafeSplit: A Novel Defense Against Client-Side Backdoor Attacks in Split Learning.

模型遗忘与审查 (Model Unlearning & Censorship)

  • Reinforcement Unlearning.
  • THEMIS: Regulating Textual Inversion for Personalized Concept Censorship.
  • TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents.

数据保护与内容审核 (Data Protection & Content Moderation)

  • Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution.
  • GAP-Diff: Protecting JPEG-Compressed Images from Diffusion-based Facial Customization.
  • Provably Unlearnable Data Examples.
  • SongBsAb: A Dual Prevention Approach against Singing Voice Conversion based Illegal Song Covers.
  • Try to Poison My Deep Learning Data? Nowhere to Hide Your Trajectory Spectrum!

通用防御与架构 (General Defense & Architecture)

  • CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling.
  • ASGARD: Protecting On-Device Deep Neural Networks with Virtualization-Based Trusted Execution Environments.
  • BitShield: Defending Against Bit-Flip Attacks on DNN Executables.
  • Density Boosts Everything: A One-stop Strategy for Improving Performance, Robustness, and Sustainability of Malware Detectors.
  • IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems.
  • Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing.
  • Revisiting Concept Drift in Windows Malware Detection: Adaptation to Real Drifted Malware with Minimal Samples.

漏洞/分析 (Vulnerabilities/Analysis)

  • Compiled Models, Built-In Exploits: Uncovering Pervasive Bit-Flip Attack Surfaces in DNN Executables.
  • Revisiting EM-based Estimation for Locally Differentially Private Protocols.
  • Understanding Data Importance in Machine Learning Attacks: Does Valuable Data Pose Greater Harm?
  • Do We Really Need to Design New Byzantine-robust Aggregation Rules?
  • On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks.
  • Safety Misalignment Against Large Language Models.
  • Towards Understanding Unsafe Video Generation.

CCS 2024

攻击 (Attacks)

隐私攻击 (Privacy Attacks)

  • Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack.
  • QueryCheetah: Fast Automated Discovery of Attribute Inference Attacks Against Query-Based Systems.
  • Membership Inference Attacks Against In-Context Learning.
  • SeqMIA: Sequential-Metric Based Membership Inference Attack.
  • PLeak: Prompt Leaking Attacks against Large Language Model Applications.
  • Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks.
  • A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability.

对抗性与物理攻击 (Adversarial & Physical Attacks)

  • Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence.
  • Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems.
  • SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems.
  • The Invisible Polyjuice Potion: an Effective Physical Adversarial Attack against Face Recognition.
  • Manipulative Interference Attacks.

数据投毒与后门 (Data Poisoning & Backdoors)

  • Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning.
  • Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses.
  • Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols.
  • BadMerging: Backdoor Attacks Against Model Merging.
  • Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense.

模型与系统利用 (Model & System Exploits)

  • Inbox Invasion: Exploiting MIME Ambiguities to Evade Email Attachment Detectors.
  • Optimization-based Prompt Injection Attack to LLM-as-a-Judge.
  • Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data.
  • SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution.
  • “Modern problems require modern solutions”: Community-Developed Techniques for Online Exam Proctoring Evasion.
  • Not One Less: Exploring Interplay between User Profiles and Items in Untargeted Attacks against Federated Recommendation.
  • HyperTheft: Thieving Model Weights from TEE-Shielded Neural Networks via Ciphertext Side Channels.
  • DeepCache: Revisiting Cache Side-Channel Attacks in Deep Neural Networks Executables.

防御 (Defenses)

隐私保护与安全计算 (Privacy & Secure Computing)

  • Camel: Communication-Efficient and Maliciously Secure Federated Learning in the Shuffle Model of Differential Privacy.
  • S2NeRF: Privacy-preserving Training Framework for NeRF.
  • $DPM: $ Clustering Sensitive Data through Separation.
  • S-BDT: Distributed Differentially Private Boosted Decision Trees.
  • Cross-silo Federated Learning with Record-level Personalized Differential Privacy.
  • Membership Inference Attacks against Vision Transformers: Mosaic MixUp Training to the Defense.
  • Formal Privacy Proof of Data Encoding: The Possibility and Impossibility of Learnable Encryption.
  • Elephants Do Not Forget: Differential Privacy with State Continuity for Privacy Budget.
  • ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support.
  • Almost Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy.
  • Securing Floating-Point Arithmetic for Noise Addition.
  • Rhombus: Fast Homomorphic Matrix-Vector Multiplication for Secure Two-Party Inference.
  • Byzantine-Robust Decentralized Federated Learning.
  • Sparrow: Space-Efficient zkSNARK for Data-Parallel Circuits and Applications to Zero-Knowledge Decision Trees.
  • AirGapAgent: Protecting Privacy-Conscious Conversational Agents.
  • CoGNN: Towards Secure and Efficient Collaborative Graph Learning.
  • Computationally Secure Aggregation and Private Information Retrieval in the Shuffle Model.
  • Zero-Knowledge Proofs of Training for Deep Neural Networks.
  • zkLLM: Zero Knowledge Proofs for Large Language Models.
  • Securely Training Decision Trees Efficiently.
  • Poster: End-to-End Privacy-Preserving Vertical Federated Learning using Private Cross-Organizational Data Collaboration.
  • Poster: Protection against Source Inference Attacks in Federated Learning using Unary Encoding and Shuffling.

内容审核与模型安全 (Content Moderation & Model Safety)

  • A Causal Explainable Guardrails for Large Language Models.
  • Legilimens: Practical and Unified Content Moderation for Large Language Model Services.
  • Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies.
  • PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs).
  • SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models.
  • Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code.

鲁棒性与检测 (Robustness & Detection)

  • Training Robust ML-based Raw-Binary Malware Detectors in Hours, not Months.
  • SpecGuard: Specification Aware Recovery for Robotic Autonomous Vehicles from Physical Attacks.
  • VisionGuard: Secure and Robust Visual Perception of Autonomous Vehicles in Practice.
  • PhyScout: Detecting Sensor Spoofing Attacks via Spatio-temporal Consistency.
  • Alchemy: Data-Free Adversarial Training.
  • I Don’t Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors.
  • PhySense: Defending Physically Realizable Attacks for Autonomous Systems via Consistency Reasoning.
  • Fisher Information guided Purification against Backdoor Attacks.
  • Mithridates: Auditing and Boosting Backdoor Resistance of Machine Learning Pipelines.
  • ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models.
  • Poster: AuditVotes: A Framework towards Deployable Certified Robustness for GNNs.
  • Towards Proactive Protection against Unauthorized Speech Synthesis.

数据保护与审计 (Data Protection & Auditing)

  • A General Framework for Data-Use Auditing of ML Models.
  • MaskPrint: Take the Initiative in Fingerprint Protection to Mitigate the Harm of Data Breach.
  • Dye4AI: Assuring Data Boundary on Generative AI Services.
  • TabularMark: Watermarking Tabular Datasets for Machine Learning.
  • Beowulf: Mitigating Model Extraction Attacks Via Reshaping Decision Regions.
  • ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach.
  • Pulsar: Secure Steganography for Diffusion Models.
  • Demo: FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation.
  • Catch Me if You Can: Detecting Unauthorized Data Use in Training Deep Learning Models.
  • Poster: Solving the Free-rider Problem in Bittensor.
  • Poster: Enhance Hardware Domain Specific Large Language Model with Reinforcement Learning for Resilience.

漏洞/分析 (Vulnerabilities/Analysis)

  • Evaluations of Machine Learning Privacy Defenses are Misleading.
  • The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks.
  • Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility.
  • “Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models.
  • Demystifying RCE Vulnerabilities in LLM-Integrated Apps.
  • Using AI Assistants in Software Development: A Qualitative Study on Security Practices and Concerns.
  • Analyzing Inference Privacy Risks Through Gradients In Machine Learning.
  • PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps.
  • Uncovering Gradient Inversion Risks in Practical Language Model Training.
  • Avara: A Uniform Evaluation System for Perceptibility Analysis Against Adversarial Object Evasion Attacks.
  • Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution.
  • Blind and Low-Vision Individuals’ Detection of Audio Deepfakes.
  • Privacy Analyses in Machine Learning.
  • Novel Privacy Attacks and Defenses Against Neural Networks.

CCS 2025 first cycle

攻击 (Attacks)

隐私与数据提取 (Privacy & Data Extraction)

  • Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
  • Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
  • Differentiation-Based Extraction of Proprietary Data from Fine-tuned LLMs

数据投毒与对抗性攻击 (Data Poisoning & Adversarial Attacks)

  • Poisoning Attacks to Local Differential Privacy for Ranking Estimation
  • ControlLoc: Physical-World Hijacking Attack on Camera-based Perception in Autonomous Driving
  • On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling
  • One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
  • Busting the Paper Ballot: Voting Meets Adversarial Machine Learning

防御 (Defenses)

模型遗忘与隐私保护 (Unlearning & Privacy)

  • Split Unlearning
  • Rethinking Machine Unlearning in Image Generation Models
  • Anonymity Unveiled: A Practical Framework for Auditing Data Use in Deep Learning Models
  • LZKSA: Lattice-based special zero-knowledge proofs for secure aggregation’s input verification
  • Prototype Surgery: Tailoring Neural Prototypes via Soft Labels for Efficient Machine Unlearning
  • Secure Noise Sampling for Differentially Private Collaborative Learning
  • Founding Zero-Knowledge Proof of Training on Optimum Vicinity
  • Gibbon: Faster Secure Two-party Training of Gradient Boosting Decision Tree

后门与恶意软件防御 (Backdoor & Malware Defense)

  • Combating Concept Drift with Explanatory Detection and Adaptation for Android Malware Classification
  • PoisonSpot: Precise Spotting of Clean-Label Backdoors via Fine-Grained Training Provenance Tracking
  • Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model
  • FilterFL: Knowledge Filtering-based Data-Free Backdoor Defense for Federated Learning

安全与鲁棒性系统 (Secure & Robust Systems)

  • TensorShield: Safeguarding On-Device Inference by Shieldin g Critical DNN Tensors with TEE
  • Sylva: Tailoring Personalized Adversarial Defense in Pre-trained Models via Collaborative Fine-tuning
  • RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models
  • SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
  • SecAlign: Defending Against Prompt Injection with Preference Optimization
  • A Practical and Secure Byzantine Robust Aggregator
  • DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

漏洞/分析 (Vulnerabilities/Analysis)

  • Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble
  • Towards Backdoor Stealthiness in Model Parameter Space
  • What’s Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift

CCS 2025 second cycle

Adversarial Attacks & Robustness (对抗性攻击与模型鲁棒性)

这类攻击通过对输入数据(如图像、文本、传感器信号)进行微小、人难以察觉的扰动,来欺骗模型做出错误的判断。

  • Adversarial Observations in Weather Forecasting: 探讨在气象预报中引入对抗性观测数据的影响 (攻击)。

  • Asymmetry Vulnerability and Physical Attacks on Online Map Construction for Autonomous Driving: 针对自动驾驶地图构建的物理世界攻击 (攻击)。

  • Evaluating the robustness of a production malware detection system to transferable adversarial attacks: 评估恶意软件检测系统对可迁移对抗性攻击的鲁棒性 (评估/攻击)。

  • Threat from Windshield: Vehicle Windows as Involuntary Attack Sources on Automotive Voice Assistants: 利用挡风玻璃作为媒介,对车载语音助手进行物理攻击 (攻击)。

  • Adversarially Robust Assembly Language Model for Packed Executables Detection: 针对加壳可执行文件的对抗性鲁棒语言模型 (防御)。

  • Exact Robustness Certification of k-Nearest Neighbors: 对k近邻算法提供可证明的鲁棒性保证 (防御)。

  • Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement: 对神经网络中由对抗性攻击等引起的缺陷进行可证明的修复 (防御)。

  • Towards Real-Time Defense Against Object-Based LiDAR Attacks in Autonomous Driving: 针对自动驾驶中激光雷达的实时攻击防御 (防御)。


Privacy Attacks (隐私攻击)

这类攻击旨在从模型或系统中窃取敏感信息,例如训练数据、用户查询或个人身份信息。常见类型包括成员推断、模型提取和侧信道攻击。

  • Peekaboo, I See Your Queries: Passive Attacks Against DSSE Via Intermittent Observations: 通过间歇性观察对动态对称可搜索加密方案进行被动式攻击,窃取查询信息 (攻击)。

  • DCMI: A Differential Calibration Membership Inference Attack Against Retrieval-Augmented Generation: 针对检索增强生成(RAG)模型的成员推断攻击,判断特定数据是否被用于训练 (攻击)。

  • Can Personal Health Information Be Secured in LLM? Privacy Attack and Defense in the Medical Domain: 探讨大语言模型在医疗领域的隐私攻击与防御 (攻击/防御)。

  • Timing Attacks on Differential Privacy are Practical: 证明了对差分隐私机制的计时攻击在实践中是可行的 (攻击)。

  • Byte by Byte: Unmasking Browser Fingerprinting at the Function Level using V8 Bytecode Transformers: 通过分析V8字节码来揭示和增强浏览器指纹识别技术 (攻击)。

  • MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs: 利用侧信道攻击窃取混合专家模型(MoE)中的用户隐私 (攻击)。

  • Safeguarding Graph Neural Networks against Topology Inference Attacks: 防御旨在推断图结构(如社交网络关系)的拓扑推断攻击 (防御)。

  • You Can’t Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors: 通过系统向量缓解大语言模型中的提示词泄露问题 (防御)。

  • Mosformer: Maliciously Secure Three-Party Inference Framework for Large Transformers: 一个保护隐私的安全多方计算框架,用于Transformer模型的推理 (防御)。

  • THOR: Secure Transformer Inference with Homomorphic Encryption: 使用同态加密技术实现安全的Transformer模型推理,保护数据机密性 (防御)。

  • PLRV-O: Advancing Differentially Private Deep Learning via Privacy Loss Random Variable Optimization: 一种优化差分隐私深度学习效用的新方法 (防御)。

  • IOValve: Leakage-Free I/O Sandbox for Large-Scale Untrusted Data Processing: 为大规模不可信数据处理设计的无泄漏I/O沙箱 (防御)。

  • Zero-Knowledge AI Inference with High Precision


Data Poisoning & Backdoor Attacks (数据投毒与后门攻击)

这类攻击通过向训练数据中注入少量精心制作的“毒样本”,在模型中植入“后门”。模型在正常输入下表现正常,但在遇到包含特定触发器(trigger)的输入时,会产生攻击者预设的恶意行为。

  • VillainNet: Targeted Poisoning Attacks Against SuperNets Along the Accuracy-Latency Pareto Frontier: 针对超网(SuperNets)的精确投毒攻击 (攻击)。

  • The Phantom Menace in PET-Hardened Deep Learning Models: Invisible Configuration-Induced Attacks: 揭示了在参数高效微调(PET)模型中由配置引发的隐形攻击,类似于后门 (攻击)。

  • Cascading Adversarial Bias from Injection to Distillation in Language Models: 探讨对抗性偏见如何从注入阶段传播到模型蒸馏阶段,是一种偏见投毒 (攻击)。

  • On Hyperparameters and Backdoor-Resistance in Horizontal Federated Learning: 研究水平联邦学习中超参数对后门攻击抵抗性的影响 (评估/防御)。

  • Deep Learning from Imperfectly Labeled Malware Data: 研究在不完美标注的恶意软件数据上进行学习,这与投毒攻击场景相关 (评估/防御)。

  • Armadillo: Robust Single-Server Secure Aggregation for Federated Learning with Input Validation: 在联邦学习中抵抗投毒攻击的安全聚合协议 (防御)。

  • Sentry: Authenticating Machine Learning Artifacts on the Fly: 用于实时验证机器学习模型和数据真实性的框架,可抵御投毒和篡改 (防御)。


Prompt Injection & LLM Manipulation (提示注入与大模型操纵)

这类攻击主要针对基于大语言模型(LLM)的应用,特别是检索增强生成(RAG)系统。攻击者通过构造恶意提示词(Prompt)或污染外部知识库,来操纵模型的输出,使其泄露信息、产生有害内容或执行非预期任务。

  • FlippedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models: 对RAG系统进行黑盒攻击,以操纵其生成的观点和内容 (攻击)。

  • GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search: 探索基于密集嵌入的检索系统的漏洞,这种系统是RAG的核心 (攻击)。

  • ImportSnare: Directed ”Code Manual” Hijacking in Retrieval-Augmented Code Generation: 针对RAG代码生成系统的“代码手册”劫持攻击 (攻击)。

  • Here Comes The AI Worm: Preventing the Propagation of Adversarial Self-Replicating Prompts Within GenAI Ecosystems: 探讨可自我复制的对抗性提示(AI蠕虫)在生成式AI生态系统中的传播与防御 (攻击/防御)。

  • Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection: 通过知识注入的方式来保护RAG代码生成系统的安全 (防御)。


System, Software & Hardware Vulnerabilities (系统、软件与硬件漏洞)

这类研究关注的是机器学习系统所依赖的底层软件、硬件或网络协议中的安全漏洞,而不仅是模型本身。

  • PickleBall: Secure Deserialization of Pickle-based Machine Learning Models: 关注Python Pickle格式在加载ML模型时的反序列化漏洞及安全措施 (攻击/防御)。

  • Denial of Sequencing Attacks in Ethereum Layer 2 Rollups: 针对以太坊二层扩容方案(Rollups)的拒绝服务攻击 (攻击)。

  • Automatic Discovery of User-exploitable Architectural Security Vulnerabilities in Closed-Source RISC-V CPUs: 自动发现闭源RISC-V处理器中的体系结构安全漏洞 (攻击/工具)。

  • Styled to Steal: The Overlooked Attack Surface in Email Clients: 揭示了电子邮件客户端中被忽视的攻击面 (攻击)。

  • Chekhov’s Gun: Uncovering Hidden Risks in macOS Application-Sandboxed PID-Domain Services: 发现macOS沙箱服务中的隐藏安全风险 (攻击)。

  • Deep Dive into In-app Browsers: Uncovering Hidden Pitfalls in Certificate Validation: 揭露应用内浏览器在证书验证方面的安全隐患 (攻击)。

  • Hardening Deep Neural Network Binaries against Reverse Engineering Attacks: 强化深度学习模型二进制文件以抵抗逆向工程攻击 (防御)。

  • CITesting: Systematic Testing of Context Integrity Violations in Cellular Core Networks: 对蜂窝网络核心网中上下文完整性破坏漏洞进行系统性测试 (测试/评估)。


Model/Data Integrity & Provenance Attacks (模型/数据完整性与溯源攻击)

这类攻击的目标是破坏用于验证模型或数据来源的机制,例如数字水印。

  • Removal Attack and Defense on AI Generated Content Latent-based Watermarking: 针对AIGC内容中基于潜在空间的水印的移除攻击与防御 (攻击/防御)。

  • PreferCare: Preference Dataset Copyright Protection in LLM Alignment by Watermark Injection and Verification: 通过水印保护用于LLM对齐的偏好数据集的版权 (防御)。


Security Auditing, Benchmarking & Measurement (安全审计、基准与测量)

这些论文不一定提出新的攻击或防御方法,而是专注于开发工具、基准(Benchmark)或进行大规模测量,以评估和理解现有系统的安全状况。

  • What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale: 对大规模共享的扩散模型进行概念审计,以发现潜在风险 (审计)。

  • The Odyssey of robots.txt Governance: Measuring Convention Implications of Web Bots in Large Language Model Services: 测量robots.txt协议对大模型网络爬虫的影响,评估其治理现状 (测量)。

  • UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images: 为图像安全分类器提供一个包含真实和AI生成图像的基准测试集 (基准)。

  • YouthSafe: A Youth-Centric Safety Benchmark and Safeguard Model for Large Language Models: 面向青少年的大模型安全基准和保护模型 (基准/防御)。

  • Automatically Detecting Online Deceptive Patterns: 自动检测网络中的欺骗性模式(如暗模式) (工具/检测)。

  • OCR-APT: Reconstructing APT Stories from Audit Logs using Subgraph Anomaly Detection and LLMs: 利用审计日志和LLM来重构高级持续性威胁(APT)的攻击链 (工具/检测)。

  • Accountable Liveness


Comprehensive Defense Frameworks (综合性防御框架)

这类工作提供端到端或系统性的安全框架,旨在防御多种类型的攻击。

  • AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents: 为操作计算机的AI智能体(Agent)设计的端到端实时安全防御框架 (防御)。

S&P 2025

攻击 (Attacks)

提示工程与越狱 (Prompt Engineering & Jailbreaking)

  • Modifier Unlocked: Jailbreaking Text-to-Image Models Through Prompts.
  • Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-to-Image Generation Models.
  • On the Effectiveness of Prompt Stealing Attacks on In-the-Wild Prompts.
  • Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-Based Prompt Injection Attacks via the Fine-Tuning Interface.
  • Prompt Inversion Attack Against Collaborative Inference of Large Language Models.

数据投毒与后门 (Data Poisoning & Backdoors)

  • Preference Poisoning Attacks on Reward Model Learning.
  • Architectural Neural Backdoors from First Principles.
  • Practical Poisoning Attacks with Limited Byzantine Clients in Clustered Federated Learning.

模型与数据窃取 (Model & Data Extraction)

  • Codebreaker: Dynamic Extraction Attacks on Code Language Models.
  • Rigging the Foundation: Manipulating Pre-training for Advanced Membership Inference Attacks.
  • UnMarker: A Universal Attack on Defensive Image Watermarking.
  • CipherSteal: Stealing Input Data from TEE-Shielded Neural Networks with Ciphertext Side Channels.

其他攻击 (Other Attacks)

  • My Model is Malware to You: Transforming AI Models into Malware by Abusing TensorFlow APIs.
  • Make a Feint to the East While Attacking in the West: Blinding LLM-Based Code Auditors with Flashboom Attacks.
  • The Inadequacy of Similarity-Based Privacy Metrics: Privacy Attacks Against “Truly Anonymous” Synthetic Datasets.
  • EvilHarmony: Stealthy Adversarial Attacks Against Black-Box Speech Recognition Systems.
  • Investigating Physical Latency Attacks Against Camera-Based Perception.

防御 (Defenses)

后门与攻击检测 (Backdoor & Attack Detection)

  • Secure Transfer Learning: Training Clean Model Against Backdoor in Pre-Trained Encoder and Downstream Dataset.
  • Query Provenance Analysis: Efficient and Robust Defense Against Query-Based Black-Box Attacks.
  • BAIT: Large Language Model Backdoor Scanning by Inverting Attack Target.
  • PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning.
  • DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks.
  • Lombard-VLD: Voice Liveness Detection Based on Human Auditory Feedback.

隐私保护与安全计算 (Privacy & Secure Computing)

  • GRID: Protecting Training Graph from Link Stealing Attacks on GNN Models.
  • SHARK: Actively Secure Inference Using Function Secret Sharing.
  • Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity.
  • FairZK: A Scalable System to Prove Machine Learning Fairness in Zero-Knowledge.
  • PAC-Private Algorithms.
  • An Attack-Agnostic Defense Framework Against Manipulation Attacks Under Local Differential Privacy.
  • From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis.

鲁棒性与对抗防御 (Robustness & Adversarial Defense)

  • TSQP: Safeguarding Real-Time Inference for Quantization Neural Networks on Edge Devices.
  • Fight Fire with Fire: Combating Adversarial Patch Attacks using Pattern-randomized Defensive Patches.
  • Adversarial Robust ViT-Based Automatic Modulation Recognition in Practical Deep Learning-Based Wireless Systems.
  • EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations.
  • Spoofing Eavesdroppers with Audio Misinformation.

通用防御与审计 (General Defense & Auditing)

  • Edge Unlearning is Not “on Edge”! an Adaptive Exact Unlearning System on Resource-Constrained Devices.
  • Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models.
  • Watermarking Language Models for Many Adaptive Users.
  • Guardain: Protecting Emerging Generative AI Workloads on Heterogeneous NPU.

漏洞/分析 (Vulnerabilities/Analysis)

  • Understanding Users’ Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms.
  • On the (In)Security of LLM App Stores.
  • SoK: Decoding the Enigma of Encrypted Network Traffic Classifiers.
  • On the Conflict Between Robustness and Learning in Collaborative Machine Learning.
  • Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning.
  • SoK: Watermarking for AI-Generated Content.
  • From One Stolen Utterance: Assessing the Risks of Voice Cloning in the AIGC Era.

文章作者: wangxh
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 wangxh !
  目录