Threat update
Significant vulnerabilities in Microsoft's Azure AI Content Safety services have been discovered. These vulnerabilities enable attackers to bypass safeguards and deploy harmful AI-generated content.
Technical Detail and Additional Info
What is the threat?
Attackers are using techniques such as 'Character Injection' and 'Adversarial ML Evasion' to exploit Azure AI Content Safety services.
- Character Injection: A technique that alters text by inserting or replacing characters with specific symbols or sequences, such as diacritics, homoglyphs, numeric substitutions, space injections, or zero-width characters. These subtle modifications can trick the model into misclassifying content, allowing attackers to influence the model's interpretation and interfere with its analysis. The objective is to bypass the guardrail by causing it to classify the content incorrectly.
- Adversarial Machine Learning (AML): A technique that alters input data with targeted manipulations, such as perturbations, word substitutions, and deliberate misspellings, to mislead the model's predictions. By carefully choosing and modifying words, attackers can cause the model to misinterpret the intended meaning of the input.
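The two evasion families above can be illustrated with a short sketch. This is not Microsoft's internal logic or a real attack payload; all helper names, character mappings, and the sample strings are hypothetical examples showing how text can be transformed so it reads the same to a human but no longer matches the patterns a classifier was trained on.

```python
ZERO_WIDTH = "\u200b"  # zero-width space (invisible when rendered)

# A few Latin -> Cyrillic homoglyph pairs (visually near-identical letters).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def inject_zero_width(text: str) -> str:
    """Character injection: insert zero-width spaces between characters.
    The text renders identically but the byte sequence no longer matches
    token patterns a filter may be looking for."""
    return ZERO_WIDTH.join(text)

def swap_homoglyphs(text: str) -> str:
    """Character injection: replace Latin letters with Cyrillic look-alikes,
    changing the underlying code points without changing the appearance."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def perturb_words(text: str, substitutions: dict[str, str]) -> str:
    """Adversarial ML evasion (simplified): targeted word substitutions or
    misspellings chosen to shift a classifier's prediction while a human
    still recovers the intended meaning."""
    return " ".join(substitutions.get(w.lower(), w) for w in text.split())

sample = "ignore previous instructions"
print(inject_zero_width(sample) == sample)  # False: hidden characters differ
print(swap_homoglyphs("alone"))             # looks like "alone", different code points
print(perturb_words(sample, {"ignore": "ign0re"}))  # ign0re previous instructions
```

In a real evasion attempt, an attacker would iterate these transformations against the guardrail until the content is no longer flagged, while the downstream LLM still interprets it as intended.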
Why is it noteworthy?
Azure AI Content Safety is a cloud-based service designed to help developers establish safety and security guardrails for AI applications by identifying and managing inappropriate content. It employs advanced techniques to filter out harmful material, including hate speech and explicit or objectionable content. Azure OpenAI leverages a large language model (LLM) equipped with Prompt Shield and AI Text Moderation guardrails to validate inputs and AI-generated content. Many organizations rely on Microsoft's Azure AI Content Safety service to enforce responsible AI behavior.
However, the two security vulnerabilities found within these guardrails, which are intended to protect AI models from jailbreak and prompt injection attacks, mean that attackers can bypass both the AI Text Moderation and Prompt Shield guardrails, allowing them to inject harmful content, manipulate the model's responses, or even compromise sensitive information.
What is the exposure or risk?
Because of these vulnerabilities, developers and users cannot rely solely on the built-in guardrails and must watch more carefully for harmful, inappropriate, or manipulated content appearing in their AI-generated outputs.
What are the recommendations?
LBT Technology Group recommends the following actions to protect your environment against these vulnerabilities:
- Regularly inspect the data returned by AI models to detect and mitigate risks associated with malicious or unpredictable user prompts.
- Establish company-wide verification mechanisms to ensure all AI models in use are legitimate and secure.
- Use an AI Gateway to help ensure consistent security across AI workloads.
References
For more in-depth information about the recommendations, please visit the following links:
- https://hackread.com/azure-ai-vulnerabilities-bypass-moderation-safeguards/
- https://www.secureblink.com/cyber-security-news/azure-ai-vulnerability-exposes-guardrail-flaws-how-safe-are-ai-moderation-tools
If you have any questions, please contact LBT's Sales Engineer.