Privacy Risks and Preservation Methods in Explainable AI: A Scoping Review
Explainable Artificial Intelligence (XAI) has emerged as a critical component of Trustworthy AI, aiming to bring transparency to complex, opaque models. The benefits of incorporating explanations into AI systems are clear: they enhance user trust, help developers debug, and support regulatory compliance. At the same time, providing these additional insights raises privacy concerns that urgently need to be addressed. A recent scoping review published on arXiv examines this tension between explainability and privacy, synthesizing findings from 57 studies selected out of 1,943 records published between January 2019 and December 2024.
The Privacy-Explainability Conflict
The review identifies three key research questions:
- What are the privacy risks of releasing explanations in AI systems?
- What methods have researchers employed to achieve privacy preservation in XAI systems?
- What constitutes a privacy-preserving explanation?
Key Findings
Privacy Risks in XAI
The study categorizes privacy risks into intentional and unintentional leakage. Intentional risks include attacks such as:
- Membership Inference: Adversaries determine whether a specific individual's data was used to train the model (a toy attack is sketched after this list).
- Model Inversion: Sensitive data or attributes are reconstructed from explanations.
- Model Extraction: The underlying model is reverse-engineered, threatening intellectual property.
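To make the first of these concrete, below is a minimal, self-contained sketch of an explanation-based membership-inference attack in the spirit of the gradient-based attacks such studies describe: a deliberately overfit target model is queried only for its input-gradient explanations, and a simple threshold on the explanation's norm separates training members from non-members better than chance. The setup, data, and names are illustrative assumptions, not taken from the reviewed paper.

```python
# Toy membership-inference attack that uses only the model's
# gradient-based explanations. Illustrative sketch, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

# Deliberately overfit regime: few points, many features, random labels,
# so the target model effectively memorizes its training set.
n, d = 20, 100
X_members = rng.normal(size=(n, d))        # records used to train the model
y_members = rng.integers(0, 2, size=n).astype(float)
X_nonmembers = rng.normal(size=(n, d))     # records the model never saw

# Target model: logistic regression fit by plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X_members @ w + b)))
    w -= 0.2 * (X_members.T @ (p - y_members)) / n
    b -= 0.2 * float(np.mean(p - y_members))

def explanation(x):
    """Input-gradient explanation: d p(y=1 | x) / d x = p(1 - p) * w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return p * (1.0 - p) * w

# Attacker's heuristic: memorized (member) points sit far from the decision
# boundary, so their explanation gradients tend to be smaller in norm.
norm_m = np.array([np.linalg.norm(explanation(x)) for x in X_members])
norm_n = np.array([np.linalg.norm(explanation(x)) for x in X_nonmembers])
threshold = np.median(np.concatenate([norm_m, norm_n]))
accuracy = 0.5 * (np.mean(norm_m < threshold) + np.mean(norm_n >= threshold))

print(f"mean explanation norm, members:     {norm_m.mean():.4f}")
print(f"mean explanation norm, non-members: {norm_n.mean():.4f}")
print(f"attack accuracy (0.5 = chance):     {accuracy:.2f}")
```

The attack succeeds here only because the model overfits its training points, which is exactly the link to the unintentional risks described next.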
Unintentional risks stem from training issues like overfitting or memorization, as well as explanation content that inadvertently reveals sensitive information.
Privacy Preservation Methods
The review highlights several techniques to mitigate these risks:
- Differential Privacy (DP): Adding calibrated noise to explanations or training data to obscure individual contributions (a minimal sketch follows this list).
- Cryptography: Using homomorphic encryption or secure multi-party computation to protect data during model training and inference.
- Anonymization: Techniques like k-anonymity to mask identifiable information in explanations.
- Federated Learning (FL): Training models on decentralized data to avoid centralized data exposure.
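As a concrete illustration of the first technique, the sketch below applies the standard Laplace mechanism to a feature-attribution vector before it is released. The sensitivity bound, the epsilon values, and the function name privatize_explanation are illustrative assumptions, not a method prescribed by the review.

```python
# Minimal sketch of differentially private explanation release:
# Laplace noise added to a feature-attribution vector.
import numpy as np

def privatize_explanation(attributions, epsilon=1.0, sensitivity=0.1, seed=None):
    """Release a feature-attribution vector under epsilon-differential privacy.

    `sensitivity` is the assumed L1-sensitivity of the explanation: an upper
    bound on how much the whole attribution vector can change (in L1 norm)
    when one training record is added or removed. In practice this bound has
    to be enforced, e.g. by clipping attributions to a fixed range.
    """
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon          # Laplace mechanism noise scale
    noise = rng.laplace(loc=0.0, scale=scale, size=np.shape(attributions))
    return np.asarray(attributions) + noise

# Example: a made-up SHAP-style attribution vector for a single prediction.
raw = np.array([0.42, -0.17, 0.08, 0.01, -0.30])
print("raw:      ", np.round(raw, 3))
print("eps = 1.0:", np.round(privatize_explanation(raw, epsilon=1.0, seed=1), 3))
print("eps = 0.1:", np.round(privatize_explanation(raw, epsilon=0.1, seed=1), 3))
```

Smaller epsilon means stronger privacy but noisier attributions, which is the privacy-utility trade-off the review repeatedly highlights.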
Characteristics of Privacy-Preserving XAI
The authors propose a checklist for privacy-preserving XAI, emphasizing resilience to attacks, prevention of both direct and indirect exposure of sensitive data, and compliance with regulations. They also underscore the need to balance privacy with utility and explainability.
Challenges and Future Directions
The review identifies open issues, such as improving the usability of XAI methods, developing privacy metrics, and addressing the trade-offs between privacy, explainability, and accuracy. Notably, the rise of generative AI and large language models (LLMs) introduces new challenges, as these systems are prone to memorizing training data and leaking sensitive information through explanations.
Conclusion
This scoping review sheds light on the complex relationship between privacy and explainability, both foundational to Trustworthy AI. By categorizing risks and preservation methods, it provides a roadmap for researchers and practitioners to develop XAI systems that are both transparent and privacy-compliant. The findings underscore the need for continued innovation to balance these competing demands, ensuring AI systems are not only interpretable but also secure.
For more details, you can access the full paper on arXiv.