AI声明:筛选、整理过程存在AI辅助,谨慎使用
USENIX 2024
错误注入和鲁棒性
- DNN-GP: Diagnosing and Mitigating Model’s Faults Using Latent Concepts.
- Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection.
- Tossing in the Dark: Practical Bit-Flipping on Gray-box Deep Neural Networks for Runtime Trojan Injection.
- Forget and Rewire: Enhancing the Resilience of Transformer-based Models against Bit-Flip Attacks.
大模型攻击与防御
- An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection.
- REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models. 水印
- Formalizing and Benchmarking Prompt Injection Attacks and Defenses.
- Instruction Backdoor Attacks Against Customized LLMs.
安全ML
- AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE.
- Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions.
- OblivGNN: Oblivious Inference on Transductive and Inductive Graph Neural Network.
- MD-ML: Super Fast Privacy-Preserving Machine Learning for Malicious Security with a Dishonest Majority.
- Accelerating Secure Collaborative Machine Learning with Protocol-Aware RDMA.
隐私推理
- A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data.
- Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models.
- MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training.
- Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference Attacks.
- Property Existence Inference against Generative Models.
- How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers.
- Reconstructing training data from document understanding models.
- Privacy Side Channels in Machine Learning Systems.
- FaceObfuscator: Defending Deep Learning-based Privacy Attacks with Gradient Descent-resistant Features in Face Recognition.
后门
- Neural Network Semantic Backdoor Detection and Mitigation: A Causality-Based Approach.
- On the Difficulty of Defending Contrastive Learning against Backdoor Attacks.
- Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models.
- Xplain: Analyzing Invisible Correlations in Model Explanation.
- Verify your Labels! Trustworthy Predictions and Datasets via Confidence Scores.
Digital Adversarial Attacks
- More Simplicity for Trainers, More Opportunity for Attackers: Black-Box Attacks on Speaker Recognition Systems by Inferring Feature Extractor.
- Transferability of White-box Perturbations: Query-Efficient Adversarial Attacks against Commercial DNN Services.
- Adversarial Illusions in Multi-Modal Embeddings.
- It Doesn’t Look Like Anything to Me: Using Diffusion Model to Subvert Visual Phishing Detectors.
- Invisibility Cloak: Proactive Defense Against Visual Game Cheating.
对抗攻防
- Correction-based Defense Against Adversarial Video Attacks via Discretization-Enhanced Video Compressive Sensing.
- Rethinking the Invisible Protection against Unauthorized Image Usage in Stable Diffusion.
- Splitting the Difference on Adversarial Training.
- Machine Learning needs Better Randomness Standards: Randomised Smoothing and PRNG-based attacks.
- PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses.
评估和最好的实践
- SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models
后门
- UBA-Inf: Unlearning Activated Backdoor Attack with Influence-Driven Camouflage
越狱
- LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks.
- Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models.
- Malla: Demystifying Real-world Large Language Model Integrated Malicious Services.
- Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction.
模型萃取和水印
- SoK: All You Need to Know About On-Device ML Model Extraction - The Gap Between Research and Practice.
- Unveiling the Secrets without Data: Can Graph Neural Networks Be Exploited through Data-Free Model Extraction Attacks?
- ClearStamp: A Human-Visible and Robust Model-Ownership Proof based on Transposed Model Training.
- DeepEclipse: How to Break White-Box DNN-Watermarking Schemes.
- ModelGuard: Information-Theoretic Defense Against Model Extraction Attacks.
大模型滥用
- Moderating Illicit Online Image Promotion for Unsafe User Generated Content Games Using Large Vision-Language Models.
- Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text.
- Prompt Stealing Attacks Against Text-to-Image Generation Models.
- Quantifying Privacy Risks of Prompts in Visual Prompt Learning.
安全分析
- Hijacking Attacks against Neural Network by Analyzing Training Data.
- False Claims against Model Ownership Resolution.
- Landscape More Secure Than Portrait? Zooming Into the Directionality of Digital Images With Security Implications.
- Information Flow Control in Machine Learning through Modular Model Architecture.
物理对抗攻击
- Devil in the Room: Triggering Audio Backdoors in the Physical World.
- FraudWhistler: A Resilient, Robust and Plug-and-play Adversarial Example Detection Method for Speaker Recognition.
- pi-Jack: Physical-World Adversarial Attack on Monocular Depth Estimation with Perspective Hijacking.
- AE-Morpher: Improve Physical Robustness of Adversarial Objects against LiDAR-based Detectors via Object Reconstruction.
用户研究
- “I Don’t Know If We’re Doing Good. I Don’t Know If We’re Doing Bad”: Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products.
- Towards More Practical Threat Models in Artificial Intelligence Security.
USENIX 2025 cycle1
攻击 (Attacks)
越狱与提示工程 (Jailbreaking & Prompt Engineering)
- PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
- PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
- On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
- Exposing the Guardrails: Reverse-Engineering and Jailbreaking Safety Filters in DALL·E Text-to-Image Pipelines
- Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
- Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
- Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
- Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators
数据投毒与后门 (Data Poisoning & Backdoors)
- PoiSAFL: Scalable Poisoning Attack Framework to Byzantine-resilient Semi-asynchronous Federated Learning
- Persistent Backdoor Attacks in Continual Learning
- From Purity to Peril: Backdooring Merged Models From “Harmless” Benign Components
成员与属性推理 (Membership & Attribute Inference)
- Enhanced Label-Only Membership Inference Attacks with Fewer Queries
- Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses
- Membership Inference Attacks Against Vision-Language Models
- Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models
对抗性与物理攻击 (Adversarial & Physical Attacks)
- Fighting Fire with Fire: Continuous Attack for Adversarial Android Malware Detection
- Atkscopes: Multiresolution Adversarial Perturbation as a Unified Attack on Perceptual Hashing and Beyond
- Invisible but Detected: Physical Adversarial Shadow Attack and Defense on LiDAR Object Detection
系统与硬件漏洞利用 (System & Hardware Exploits)
- NeuroScope: Reverse Engineering Deep Neural Network on Edge Devices using Dynamic Analysis
- BarraCUDA: Edge GPUs do Leak DNN Weights
- Not so Refreshing: Attacking GPUs using RFM Rowhammer Mitigation
- Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI
- Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning
- When Translators Refuse to Translate: A Novel Attack to Speech Translation Systems
- Chimera: Creating Digitally Signed Fake Photos by Fooling Image Recapture and Deepfake Detectors
防御 (Defenses)
隐私保护与安全计算 (Privacy & Secure Computation)
- DP-BREM: Differentially-Private and Byzantine-Robust Federated Learning with Client Momentum
- LOHEN: Layer-wise Optimizations for Neural Network Inferences over Encrypted Data with High Performance or Accuracy
- Task-Oriented Training Data Privacy Protection for Cloud-based Model Training
- Arbitrary-Threshold Fully Homomorphic Encryption with Lower Complexity
- zkGPT: An Efficient Non-interactive Zero-knowledge Proof Framework for LLM Inference
- Distributed Private Aggregation in Graph Neural Networks
- Phantom: Privacy-Preserving Deep Neural Network Model Obfuscation in Heterogeneous TEE and GPU System
鲁棒性与认证 (Robustness & Certification)
- Robustifying ML-powered Network Classifiers with PANTS
- AGNNCert: Defending Graph Neural Networks against Arbitrary Perturbations with Deterministic Certification
- CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization
- CertPHash: Towards Certified Perceptual Hashing via Robust Training
越狱与提示注入防御 (Jailbreak & Prompt Injection Defense)
- StruQ: Defending Against Prompt Injection with Structured Queries
- JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
- SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
水印与知识产权保护 (Watermarking & IP Protection)
- THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models
- AudioMarkNet: Audio Watermarking for Deepfake Speech Detection
- Provably Robust Multi-bit Watermarking for AI-generated Text
- LLMmap: Fingerprinting for Large Language Models
- LightShed: Defeating Perturbation-based Image Copyright Protections
后门及通用防御 (Backdoor & General Defense)
- Dormant: Defending against Pose-driven Human Image Animation
- SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
- Pretender: Universal Active Defense against Diffusion Finetuning Attacks
- Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
- DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data
漏洞/分析 (Vulnerabilities/Analysis)
- Towards Understanding and Enhancing Security of Proof-of-Training for DNN Model Ownership Verification
- The Ghost Navigator: Revisiting the Hidden Vulnerability of Localization in Autonomous Driving
- Revisiting Training-Inference Trigger Intensity in Backdoor Attacks
- Evaluating LLM-based Personal Information Extraction and Countermeasures
- Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models
- SoK: On Gradient Leakage in Federated Learning
- VoiceWukong: Benchmarking Deepfake Voice Detection
- Analyzing the AI Nudification Application Ecosystem
- We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
- When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs
- HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
- Watch the Watchers! On the Security Risks of Robustness-Enhancing Diffusion Models
- From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
- Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
- NOKEScam: Understanding and Rectifying Non-Sense Keywords Spear Scam in Search Engines
NDSS 2025
攻击 (Attacks)
成员推理与数据重建 (Membership Inference & Data Reconstruction)
- A Method to Facilitate Membership Inference Attacks in Deep Learning Models.
- Black-box Membership Inference Attacks against Fine-tuned Diffusion Models.
- Passive Inference Attacks on Split Learning via Adversarial Regularization.
- RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Adversarial Data Manipulation.
- Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction.
- URVFL: Undetectable Data Reconstruction Attack on Vertical Federated Learning.
对抗性与物理攻击 (Adversarial & Physical Attacks)
- AlphaDog: No-Box Camouflage Attacks via Alpha Channel Oversight.
- Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems.
- On the Realism of LiDAR Spoofing Attacks against Autonomous Driving Vehicle at High Speed and Long Distance.
- PhantomLiDAR: Cross-modality Signal Injection Attacks against LiDAR.
- Revisiting Physical-World Adversarial Attack on Traffic Sign Recognition: A Commercial Systems Perspective.
- L-HAWK: A Controllable Physical Adversarial Patch Against a Long-Distance Target.
后门与木马 (Backdoor & Trojan)
- The Philosopher’s Stone: Trojaning Plugins of Large Language Models.
- LADDER: Multi-Objective Backdoor Attack via Evolutionary Algorithm.
其他攻击 (Other Attacks)
- I Know What You Asked: Prompt Leakage via KV-Cache Sharing in Multi-Tenant LLM Serving.
- Automated Mass Malware Factory: The Convergence of Piggybacking and Adversarial Example in Android Malicious Software Generation.
防御 (Defenses)
隐私保护与安全计算 (Privacy & Secure Computing)
- BumbleBee: Secure Two-party Inference Framework for Large Transformers.
- Diffence: Fencing Membership Privacy With Diffusion Models.
- Secure Transformer Inference Made Non-interactive.
- A New PPML Paradigm for Quantized Models.
- Defending Against Membership Inference Attacks on Iteratively Pruned Deep Neural Networks.
- DLBox: New Model Training Framework for Protecting Training Data.
- MingledPie: A Cluster Mingling Approach for Mitigating Preference Profiling in CFL.
- Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models.
- SHAFT: Secure, Handy, Accurate and Fast Transformer Inference.
- SIGuard: Guarding Secure Inference with Post Data Privacy.
后门检测与防御 (Backdoor Detection & Defense)
- CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models.
- BARBIE: Robust Backdoor Detection Based on Latent Separability.
- DShield: Defending against Backdoor Attacks on Graph Neural Networks via Discrepancy Learning.
- PBP: Post-training Backdoor Purification for Malware Classifiers.
- SafeSplit: A Novel Defense Against Client-Side Backdoor Attacks in Split Learning.
模型遗忘与审查 (Model Unlearning & Censorship)
- Reinforcement Unlearning.
- THEMIS: Regulating Textual Inversion for Personalized Concept Censorship.
- TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents.
数据保护与内容审核 (Data Protection & Content Moderation)
- Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution.
- GAP-Diff: Protecting JPEG-Compressed Images from Diffusion-based Facial Customization.
- Provably Unlearnable Data Examples.
- SongBsAb: A Dual Prevention Approach against Singing Voice Conversion based Illegal Song Covers.
- Try to Poison My Deep Learning Data? Nowhere to Hide Your Trajectory Spectrum!
通用防御与架构 (General Defense & Architecture)
- CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling.
- ASGARD: Protecting On-Device Deep Neural Networks with Virtualization-Based Trusted Execution Environments.
- BitShield: Defending Against Bit-Flip Attacks on DNN Executables.
- Density Boosts Everything: A One-stop Strategy for Improving Performance, Robustness, and Sustainability of Malware Detectors.
- IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems.
- Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing.
- Revisiting Concept Drift in Windows Malware Detection: Adaptation to Real Drifted Malware with Minimal Samples.
漏洞/分析 (Vulnerabilities/Analysis)
- Compiled Models, Built-In Exploits: Uncovering Pervasive Bit-Flip Attack Surfaces in DNN Executables.
- Revisiting EM-based Estimation for Locally Differentially Private Protocols.
- Understanding Data Importance in Machine Learning Attacks: Does Valuable Data Pose Greater Harm?
- Do We Really Need to Design New Byzantine-robust Aggregation Rules?
- On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks.
- Safety Misalignment Against Large Language Models.
- Towards Understanding Unsafe Video Generation.
CCS 2024
攻击 (Attacks)
隐私攻击 (Privacy Attacks)
- Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack.
- QueryCheetah: Fast Automated Discovery of Attribute Inference Attacks Against Query-Based Systems.
- Membership Inference Attacks Against In-Context Learning.
- SeqMIA: Sequential-Metric Based Membership Inference Attack.
- PLeak: Prompt Leaking Attacks against Large Language Model Applications.
- Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks.
- A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability.
对抗性与物理攻击 (Adversarial & Physical Attacks)
- Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence.
- Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems.
- SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems.
- The Invisible Polyjuice Potion: an Effective Physical Adversarial Attack against Face Recognition.
- Manipulative Interference Attacks.
数据投毒与后门 (Data Poisoning & Backdoors)
- Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning.
- Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses.
- Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols.
- BadMerging: Backdoor Attacks Against Model Merging.
- Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense.
模型与系统利用 (Model & System Exploits)
- Inbox Invasion: Exploiting MIME Ambiguities to Evade Email Attachment Detectors.
- Optimization-based Prompt Injection Attack to LLM-as-a-Judge.
- Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data.
- SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution.
- “Modern problems require modern solutions”: Community-Developed Techniques for Online Exam Proctoring Evasion.
- Not One Less: Exploring Interplay between User Profiles and Items in Untargeted Attacks against Federated Recommendation.
- HyperTheft: Thieving Model Weights from TEE-Shielded Neural Networks via Ciphertext Side Channels.
- DeepCache: Revisiting Cache Side-Channel Attacks in Deep Neural Networks Executables.
防御 (Defenses)
隐私保护与安全计算 (Privacy & Secure Computing)
- Camel: Communication-Efficient and Maliciously Secure Federated Learning in the Shuffle Model of Differential Privacy.
- S2NeRF: Privacy-preserving Training Framework for NeRF.
- $DPM: $ Clustering Sensitive Data through Separation.
- S-BDT: Distributed Differentially Private Boosted Decision Trees.
- Cross-silo Federated Learning with Record-level Personalized Differential Privacy.
- Membership Inference Attacks against Vision Transformers: Mosaic MixUp Training to the Defense.
- Formal Privacy Proof of Data Encoding: The Possibility and Impossibility of Learnable Encryption.
- Elephants Do Not Forget: Differential Privacy with State Continuity for Privacy Budget.
- ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support.
- Almost Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy.
- Securing Floating-Point Arithmetic for Noise Addition.
- Rhombus: Fast Homomorphic Matrix-Vector Multiplication for Secure Two-Party Inference.
- Byzantine-Robust Decentralized Federated Learning.
- Sparrow: Space-Efficient zkSNARK for Data-Parallel Circuits and Applications to Zero-Knowledge Decision Trees.
- AirGapAgent: Protecting Privacy-Conscious Conversational Agents.
- CoGNN: Towards Secure and Efficient Collaborative Graph Learning.
- Computationally Secure Aggregation and Private Information Retrieval in the Shuffle Model.
- Zero-Knowledge Proofs of Training for Deep Neural Networks.
- zkLLM: Zero Knowledge Proofs for Large Language Models.
- Securely Training Decision Trees Efficiently.
- Poster: End-to-End Privacy-Preserving Vertical Federated Learning using Private Cross-Organizational Data Collaboration.
- Poster: Protection against Source Inference Attacks in Federated Learning using Unary Encoding and Shuffling.
内容审核与模型安全 (Content Moderation & Model Safety)
- A Causal Explainable Guardrails for Large Language Models.
- Legilimens: Practical and Unified Content Moderation for Large Language Model Services.
- Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies.
- PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs).
- SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models.
- Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code.
鲁棒性与检测 (Robustness & Detection)
- Training Robust ML-based Raw-Binary Malware Detectors in Hours, not Months.
- SpecGuard: Specification Aware Recovery for Robotic Autonomous Vehicles from Physical Attacks.
- VisionGuard: Secure and Robust Visual Perception of Autonomous Vehicles in Practice.
- PhyScout: Detecting Sensor Spoofing Attacks via Spatio-temporal Consistency.
- Alchemy: Data-Free Adversarial Training.
- I Don’t Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors.
- PhySense: Defending Physically Realizable Attacks for Autonomous Systems via Consistency Reasoning.
- Fisher Information guided Purification against Backdoor Attacks.
- Mithridates: Auditing and Boosting Backdoor Resistance of Machine Learning Pipelines.
- ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models.
- Poster: AuditVotes: A Framework towards Deployable Certified Robustness for GNNs.
- Towards Proactive Protection against Unauthorized Speech Synthesis.
数据保护与审计 (Data Protection & Auditing)
- A General Framework for Data-Use Auditing of ML Models.
- MaskPrint: Take the Initiative in Fingerprint Protection to Mitigate the Harm of Data Breach.
- Dye4AI: Assuring Data Boundary on Generative AI Services.
- TabularMark: Watermarking Tabular Datasets for Machine Learning.
- Beowulf: Mitigating Model Extraction Attacks Via Reshaping Decision Regions.
- ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach.
- Pulsar: Secure Steganography for Diffusion Models.
- Demo: FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation.
- Catch Me if You Can: Detecting Unauthorized Data Use in Training Deep Learning Models.
- Poster: Solving the Free-rider Problem in Bittensor.
- Poster: Enhance Hardware Domain Specific Large Language Model with Reinforcement Learning for Resilience.
漏洞/分析 (Vulnerabilities/Analysis)
- Evaluations of Machine Learning Privacy Defenses are Misleading.
- The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks.
- Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility.
- “Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models.
- Demystifying RCE Vulnerabilities in LLM-Integrated Apps.
- Using AI Assistants in Software Development: A Qualitative Study on Security Practices and Concerns.
- Analyzing Inference Privacy Risks Through Gradients In Machine Learning.
- PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps.
- Uncovering Gradient Inversion Risks in Practical Language Model Training.
- Avara: A Uniform Evaluation System for Perceptibility Analysis Against Adversarial Object Evasion Attacks.
- Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution.
- Blind and Low-Vision Individuals’ Detection of Audio Deepfakes.
- Privacy Analyses in Machine Learning.
- Novel Privacy Attacks and Defenses Against Neural Networks.
CCS 2025 first cycle
攻击 (Attacks)
隐私与数据提取 (Privacy & Data Extraction)
- Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
- Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
- Differentiation-Based Extraction of Proprietary Data from Fine-tuned LLMs
数据投毒与对抗性攻击 (Data Poisoning & Adversarial Attacks)
- Poisoning Attacks to Local Differential Privacy for Ranking Estimation
- ControlLoc: Physical-World Hijacking Attack on Camera-based Perception in Autonomous Driving
- On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling
- One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
- Busting the Paper Ballot: Voting Meets Adversarial Machine Learning
防御 (Defenses)
模型遗忘与隐私保护 (Unlearning & Privacy)
- Split Unlearning
- Rethinking Machine Unlearning in Image Generation Models
- Anonymity Unveiled: A Practical Framework for Auditing Data Use in Deep Learning Models
- LZKSA: Lattice-based special zero-knowledge proofs for secure aggregation’s input verification
- Prototype Surgery: Tailoring Neural Prototypes via Soft Labels for Efficient Machine Unlearning
- Secure Noise Sampling for Differentially Private Collaborative Learning
- Founding Zero-Knowledge Proof of Training on Optimum Vicinity
- Gibbon: Faster Secure Two-party Training of Gradient Boosting Decision Tree
后门与恶意软件防御 (Backdoor & Malware Defense)
- Combating Concept Drift with Explanatory Detection and Adaptation for Android Malware Classification
- PoisonSpot: Precise Spotting of Clean-Label Backdoors via Fine-Grained Training Provenance Tracking
- Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model
- FilterFL: Knowledge Filtering-based Data-Free Backdoor Defense for Federated Learning
安全与鲁棒性系统 (Secure & Robust Systems)
- TensorShield: Safeguarding On-Device Inference by Shieldin g Critical DNN Tensors with TEE
- Sylva: Tailoring Personalized Adversarial Defense in Pre-trained Models via Collaborative Fine-tuning
- RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models
- SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
- SecAlign: Defending Against Prompt Injection with Preference Optimization
- A Practical and Secure Byzantine Robust Aggregator
- DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy
漏洞/分析 (Vulnerabilities/Analysis)
- Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble
- Towards Backdoor Stealthiness in Model Parameter Space
- What’s Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift
CCS 2025 second cycle
Adversarial Attacks & Robustness (对抗性攻击与模型鲁棒性)
这类攻击通过对输入数据(如图像、文本、传感器信号)进行微小、人难以察觉的扰动,来欺骗模型做出错误的判断。
Adversarial Observations in Weather Forecasting: 探讨在气象预报中引入对抗性观测数据的影响 (攻击)。
Asymmetry Vulnerability and Physical Attacks on Online Map Construction for Autonomous Driving: 针对自动驾驶地图构建的物理世界攻击 (攻击)。
Evaluating the robustness of a production malware detection system to transferable adversarial attacks: 评估恶意软件检测系统对可迁移对抗性攻击的鲁棒性 (评估/攻击)。
Threat from Windshield: Vehicle Windows as Involuntary Attack Sources on Automotive Voice Assistants: 利用挡风玻璃作为媒介,对车载语音助手进行物理攻击 (攻击)。
Adversarially Robust Assembly Language Model for Packed Executables Detection: 针对加壳可执行文件的对抗性鲁棒语言模型 (防御)。
Exact Robustness Certification of k-Nearest Neighbors: 对k近邻算法提供可证明的鲁棒性保证 (防御)。
Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement: 对神经网络中由对抗性攻击等引起的缺陷进行可证明的修复 (防御)。
Towards Real-Time Defense Against Object-Based LiDAR Attacks in Autonomous Driving: 针对自动驾驶中激光雷达的实时攻击防御 (防御)。
Privacy Attacks (隐私攻击)
这类攻击旨在从模型或系统中窃取敏感信息,例如训练数据、用户查询或个人身份信息。常见类型包括成员推断、模型提取和侧信道攻击。
Peekaboo, I See Your Queries: Passive Attacks Against DSSE Via Intermittent Observations: 通过间歇性观察对动态对称可搜索加密方案进行被动式攻击,窃取查询信息 (攻击)。
DCMI: A Differential Calibration Membership Inference Attack Against Retrieval-Augmented Generation: 针对检索增强生成(RAG)模型的成员推断攻击,判断特定数据是否被用于训练 (攻击)。
Can Personal Health Information Be Secured in LLM? Privacy Attack and Defense in the Medical Domain: 探讨大语言模型在医疗领域的隐私攻击与防御 (攻击/防御)。
Timing Attacks on Differential Privacy are Practical: 证明了对差分隐私机制的计时攻击在实践中是可行的 (攻击)。
Byte by Byte: Unmasking Browser Fingerprinting at the Function Level using V8 Bytecode Transformers: 通过分析V8字节码来揭示和增强浏览器指纹识别技术 (攻击)。
MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs: 利用侧信道攻击窃取混合专家模型(MoE)中的用户隐私 (攻击)。
Safeguarding Graph Neural Networks against Topology Inference Attacks: 防御旨在推断图结构(如社交网络关系)的拓扑推断攻击 (防御)。
You Can’t Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors: 通过系统向量缓解大语言模型中的提示词泄露问题 (防御)。
Mosformer: Maliciously Secure Three-Party Inference Framework for Large Transformers: 一个保护隐私的安全多方计算框架,用于Transformer模型的推理 (防御)。
THOR: Secure Transformer Inference with Homomorphic Encryption: 使用同态加密技术实现安全的Transformer模型推理,保护数据机密性 (防御)。
PLRV-O: Advancing Differentially Private Deep Learning via Privacy Loss Random Variable Optimization: 一种优化差分隐私深度学习效用的新方法 (防御)。
IOValve: Leakage-Free I/O Sandbox for Large-Scale Untrusted Data Processing: 为大规模不可信数据处理设计的无泄漏I/O沙箱 (防御)。
Zero-Knowledge AI Inference with High Precision
Data Poisoning & Backdoor Attacks (数据投毒与后门攻击)
这类攻击通过向训练数据中注入少量精心制作的“毒样本”,在模型中植入“后门”。模型在正常输入下表现正常,但在遇到包含特定触发器(trigger)的输入时,会产生攻击者预设的恶意行为。
VillainNet: Targeted Poisoning Attacks Against SuperNets Along the Accuracy-Latency Pareto Frontier: 针对超网(SuperNets)的精确投毒攻击 (攻击)。
The Phantom Menace in PET-Hardened Deep Learning Models: Invisible Configuration-Induced Attacks: 揭示了在参数高效微调(PET)模型中由配置引发的隐形攻击,类似于后门 (攻击)。
Cascading Adversarial Bias from Injection to Distillation in Language Models: 探讨对抗性偏见如何从注入阶段传播到模型蒸馏阶段,是一种偏见投毒 (攻击)。
On Hyperparameters and Backdoor-Resistance in Horizontal Federated Learning: 研究水平联邦学习中超参数对后门攻击抵抗性的影响 (评估/防御)。
Deep Learning from Imperfectly Labeled Malware Data: 研究在不完美标注的恶意软件数据上进行学习,这与投毒攻击场景相关 (评估/防御)。
Armadillo: Robust Single-Server Secure Aggregation for Federated Learning with Input Validation: 在联邦学习中抵抗投毒攻击的安全聚合协议 (防御)。
Sentry: Authenticating Machine Learning Artifacts on the Fly: 用于实时验证机器学习模型和数据真实性的框架,可抵御投毒和篡改 (防御)。
Prompt Injection & LLM Manipulation (提示注入与大模型操纵)
这类攻击主要针对基于大语言模型(LLM)的应用,特别是检索增强生成(RAG)系统。攻击者通过构造恶意提示词(Prompt)或污染外部知识库,来操纵模型的输出,使其泄露信息、产生有害内容或执行非预期任务。
FlippedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models: 对RAG系统进行黑盒攻击,以操纵其生成的观点和内容 (攻击)。
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search: 探索基于密集嵌入的检索系统的漏洞,这种系统是RAG的核心 (攻击)。
ImportSnare: Directed ”Code Manual” Hijacking in Retrieval-Augmented Code Generation: 针对RAG代码生成系统的“代码手册”劫持攻击 (攻击)。
Here Comes The AI Worm: Preventing the Propagation of Adversarial Self-Replicating Prompts Within GenAI Ecosystems: 探讨可自我复制的对抗性提示(AI蠕虫)在生成式AI生态系统中的传播与防御 (攻击/防御)。
Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection: 通过知识注入的方式来保护RAG代码生成系统的安全 (防御)。
System, Software & Hardware Vulnerabilities (系统、软件与硬件漏洞)
这类研究关注的是机器学习系统所依赖的底层软件、硬件或网络协议中的安全漏洞,而不仅是模型本身。
PickleBall: Secure Deserialization of Pickle-based Machine Learning Models: 关注Python Pickle格式在加载ML模型时的反序列化漏洞及安全措施 (攻击/防御)。
Denial of Sequencing Attacks in Ethereum Layer 2 Rollups: 针对以太坊二层扩容方案(Rollups)的拒绝服务攻击 (攻击)。
Automatic Discovery of User-exploitable Architectural Security Vulnerabilities in Closed-Source RISC-V CPUs: 自动发现闭源RISC-V处理器中的体系结构安全漏洞 (攻击/工具)。
Styled to Steal: The Overlooked Attack Surface in Email Clients: 揭示了电子邮件客户端中被忽视的攻击面 (攻击)。
Chekhov’s Gun: Uncovering Hidden Risks in macOS Application-Sandboxed PID-Domain Services: 发现macOS沙箱服务中的隐藏安全风险 (攻击)。
Deep Dive into In-app Browsers: Uncovering Hidden Pitfalls in Certificate Validation: 揭露应用内浏览器在证书验证方面的安全隐患 (攻击)。
Hardening Deep Neural Network Binaries against Reverse Engineering Attacks: 强化深度学习模型二进制文件以抵抗逆向工程攻击 (防御)。
CITesting: Systematic Testing of Context Integrity Violations in Cellular Core Networks: 对蜂窝网络核心网中上下文完整性破坏漏洞进行系统性测试 (测试/评估)。
Model/Data Integrity & Provenance Attacks (模型/数据完整性与溯源攻击)
这类攻击的目标是破坏用于验证模型或数据来源的机制,例如数字水印。
Removal Attack and Defense on AI Generated Content Latent-based Watermarking: 针对AIGC内容中基于潜在空间的水印的移除攻击与防御 (攻击/防御)。
PreferCare: Preference Dataset Copyright Protection in LLM Alignment by Watermark Injection and Verification: 通过水印保护用于LLM对齐的偏好数据集的版权 (防御)。
Security Auditing, Benchmarking & Measurement (安全审计、基准与测量)
这些论文不一定提出新的攻击或防御方法,而是专注于开发工具、基准(Benchmark)或进行大规模测量,以评估和理解现有系统的安全状况。
What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale: 对大规模共享的扩散模型进行概念审计,以发现潜在风险 (审计)。
The Odyssey of robots.txt Governance: Measuring Convention Implications of Web Bots in Large Language Model Services: 测量
robots.txt协议对大模型网络爬虫的影响,评估其治理现状 (测量)。UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images: 为图像安全分类器提供一个包含真实和AI生成图像的基准测试集 (基准)。
YouthSafe: A Youth-Centric Safety Benchmark and Safeguard Model for Large Language Models: 面向青少年的大模型安全基准和保护模型 (基准/防御)。
Automatically Detecting Online Deceptive Patterns: 自动检测网络中的欺骗性模式(如暗模式) (工具/检测)。
OCR-APT: Reconstructing APT Stories from Audit Logs using Subgraph Anomaly Detection and LLMs: 利用审计日志和LLM来重构高级持续性威胁(APT)的攻击链 (工具/检测)。
Accountable Liveness
Comprehensive Defense Frameworks (综合性防御框架)
这类工作提供端到端或系统性的安全框架,旨在防御多种类型的攻击。
- AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents: 为操作计算机的AI智能体(Agent)设计的端到端实时安全防御框架 (防御)。
S&P 2025
攻击 (Attacks)
提示工程与越狱 (Prompt Engineering & Jailbreaking)
- Modifier Unlocked: Jailbreaking Text-to-Image Models Through Prompts.
- Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-to-Image Generation Models.
- On the Effectiveness of Prompt Stealing Attacks on In-the-Wild Prompts.
- Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-Based Prompt Injection Attacks via the Fine-Tuning Interface.
- Prompt Inversion Attack Against Collaborative Inference of Large Language Models.
数据投毒与后门 (Data Poisoning & Backdoors)
- Preference Poisoning Attacks on Reward Model Learning.
- Architectural Neural Backdoors from First Principles.
- Practical Poisoning Attacks with Limited Byzantine Clients in Clustered Federated Learning.
模型与数据窃取 (Model & Data Extraction)
- Codebreaker: Dynamic Extraction Attacks on Code Language Models.
- Rigging the Foundation: Manipulating Pre-training for Advanced Membership Inference Attacks.
- UnMarker: A Universal Attack on Defensive Image Watermarking.
- CipherSteal: Stealing Input Data from TEE-Shielded Neural Networks with Ciphertext Side Channels.
其他攻击 (Other Attacks)
- My Model is Malware to You: Transforming AI Models into Malware by Abusing TensorFlow APIs.
- Make a Feint to the East While Attacking in the West: Blinding LLM-Based Code Auditors with Flashboom Attacks.
- The Inadequacy of Similarity-Based Privacy Metrics: Privacy Attacks Against “Truly Anonymous” Synthetic Datasets.
- EvilHarmony: Stealthy Adversarial Attacks Against Black-Box Speech Recognition Systems.
- Investigating Physical Latency Attacks Against Camera-Based Perception.
防御 (Defenses)
后门与攻击检测 (Backdoor & Attack Detection)
- Secure Transfer Learning: Training Clean Model Against Backdoor in Pre-Trained Encoder and Downstream Dataset.
- Query Provenance Analysis: Efficient and Robust Defense Against Query-Based Black-Box Attacks.
- BAIT: Large Language Model Backdoor Scanning by Inverting Attack Target.
- PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning.
- DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks.
- Lombard-VLD: Voice Liveness Detection Based on Human Auditory Feedback.
隐私保护与安全计算 (Privacy & Secure Computing)
- GRID: Protecting Training Graph from Link Stealing Attacks on GNN Models.
- SHARK: Actively Secure Inference Using Function Secret Sharing.
- Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity.
- FairZK: A Scalable System to Prove Machine Learning Fairness in Zero-Knowledge.
- PAC-Private Algorithms.
- An Attack-Agnostic Defense Framework Against Manipulation Attacks Under Local Differential Privacy.
- From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis.
鲁棒性与对抗防御 (Robustness & Adversarial Defense)
- TSQP: Safeguarding Real-Time Inference for Quantization Neural Networks on Edge Devices.
- Fight Fire with Fire: Combating Adversarial Patch Attacks using Pattern-randomized Defensive Patches.
- Adversarial Robust ViT-Based Automatic Modulation Recognition in Practical Deep Learning-Based Wireless Systems.
- EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations.
- Spoofing Eavesdroppers with Audio Misinformation.
通用防御与审计 (General Defense & Auditing)
- Edge Unlearning is Not “on Edge”! an Adaptive Exact Unlearning System on Resource-Constrained Devices.
- Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models.
- Watermarking Language Models for Many Adaptive Users.
- Guardain: Protecting Emerging Generative AI Workloads on Heterogeneous NPU.
漏洞/分析 (Vulnerabilities/Analysis)
- Understanding Users’ Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms.
- On the (In)Security of LLM App Stores.
- SoK: Decoding the Enigma of Encrypted Network Traffic Classifiers.
- On the Conflict Between Robustness and Learning in Collaborative Machine Learning.
- Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning.
- SoK: Watermarking for AI-Generated Content.
- From One Stolen Utterance: Assessing the Risks of Voice Cloning in the AIGC Era.