Responsible AI Mitigation Layers

Generative AI is increasingly being used in many different kinds of systems to augment humans and inject intelligent behavior into existing and new apps. While this opens up a world of opportunity for new capabilities, it also creates a new set of risks due to the probabilistic nature and interaction using natural language prompts.

When building generative AI applications, it is important to address potential risks to ensure safe and responsible AI deployment. The first risk category is the overall quality of the application. The system must avoid errors that result in inaccurate or manipulated information. Another key area is robustness against adversarial attacks, such as jailbreak attempts where users manipulate the system to bypass restrictions, or new threats such as rapid injection attacks where hidden instructions are embedded in the data source. They must also mitigate existing risks, including harmful content such as inappropriate language, images, and insecure code. Interactions with these models use human-like natural language, making them vulnerable to attacks similar to social engineering targeting humans. This raises questions about when it is appropriate for systems to behave like humans and when that can lead to misleading or harmful experiences. Despite the variety of risks, we found that a common framework and layered defense mechanisms can effectively address these diverse challenges.

This blog post describes mitigation strategies used against attacks on generative AI systems. But before that, let’s take a quick look at the basic Responsible AI principles that guide these mitigation mechanisms.

Microsoft AI Principles.jpg

AI is a transformative horizontal technology, like the Internet, that will transform the way we interact and work with technology. We must use it responsibly and ethically. Microsoft’s commitment begins with the AI Principles that guide our approach to responsible AI development. our principles equity Ensures that AI systems allocate opportunities, resources, and information in a fair way to all users. Reliability and Safety and Privacy and Security We focus on protecting user data while building systems that perform well in a variety of situations, including those for which they were not originally designed. principle inclusiveness We emphasize designing AI to be accessible to people of all abilities. finally, transparency and responsibility Ensure AI systems are understandable, minimize misuse, and allow for human oversight and control. These six principles remain a solid foundation, providing a framework to guide implementation and adapt to new challenges as AI technology advances.

Generative AI development life cycle

If we look at the Generative AI lifecycle, we can divide it into five stages in the iterative process:

rule – Start by aligning roles and responsibilities and establishing requirements.
map – Once your use case is defined, you can try red teaming using tools such as: PyRiT To find out what is possible.
measure – Once you know that, figure out how you can measure the level of risk at scale.
alleviate – Next, you should try to reduce or eliminate these risks using various system scans.
work – Finally, once mitigation is in place, systems are put into operation to monitor risks and respond to incidents. This happens continuously as we continually update our systems to identify, respond to, and mitigate new risks in production.

Risk Mitigation Tier

Now let’s take a closer look at the mitigation layers. Most production applications require a mitigation plan that includes four layers of technical mitigation:

model
safety system
System messages and grounding
User experience layer.

that Model and safety system layer Typically a platform layer whose built-in mitigation features are commonly used across multiple applications. This is built into Azure for you. that next two layers This largely depends on the purpose and design of the application. This means that mitigation implementations can vary greatly from application to application. Additionally, while the underlying model in use is an important component of the system, it is not a complete system.

model

right choice basic model This is a critical step in developing effective AI applications. But how do you decide which model is best for your needs? that Azure AI Model Catalog We offer over 1,600 models along with tools to help you make the best choice. For example, it provides powerful benchmarking and evaluation capabilities so you can compare the performance of multiple models side-by-side to find the best model for your use case. You can also access the model card, which provides detailed information from the model provider. These resources will help you evaluate whether your model is appropriate for your application, while also highlighting potential risks that need to be mitigated and monitored throughout development.

safety system

For most applications, relying solely on the safety measures built into the model is not sufficient. Even with fine-tuning, Large Language Models (LLMs) are still error-prone and vulnerable to attacks such as jailbreaks. That’s why Microsoft takes a layered defense-in-depth approach to AI safety. We use an AI-based safety system that surrounds the model and monitors its inputs and outputs to help prevent attacks and identify mistakes. In some cases, they may be special models trained for safety. This advanced system, known as Azure AI Content Safety, has configurable filters that monitor for violence, hate, sexual language, and self-harm. Filters allow you to customize them based on your specific use case. For example, a gaming company might allow more lenient language in its input but restrict violent language in its output. This safety system is integrated directly into the Microsoft Copilot ecosystem, ensuring built-in protection. We’re also bringing this technology to developers through Azure AI, allowing them to create more secure AI applications from the start.

Azure AI content safety Three types of filters are integrated:

content filter – For harmful content, such as text and images containing violence or hate speech, you can adjust it based on severity level. This is always set to a medium threshold by default. For Azure OpenAI models, only customers approved for modified content filtering have full content filtering control, such as configuring only high-severity content filters or turning off content filters. Apply for modified content filters via: this form
security – Prompt Shield is a detection model that can be activated on model input to detect when a user is attempting to attack or manipulate an AI system to perform an action outside of its intended purpose or design.
quality – A detection model that can be activated to flag other types of dangerous inputs or outputs, such as protected or copyrighted material or code, or factually incorrect information where the model output does not match the source material provided.

Customers can also create custom blocklists to filter specific terms in their input or output.

System messages and grounding

that system message and ground layer This is where backend prompt engineering comes into play, allowing applications to effectively leverage the power of Large Language Models (LLMs). Although these models learn vast amounts of information, they have limitations and the embedded knowledge stops during training. They also lack access to personal or proprietary data that could be used to differentiate their applications. These limitations can be addressed by combining the general reasoning and language generation capabilities of LLM with specific business information. Retrieval Augmented Generation (RAG) is a widely used approach to implement this. RAG allows your model to discover or leverage data to provide accurate answers. For example, in Bing Chat, when a search is performed, the model scans the search results to produce a more accurate response. This ensures that the data is up-to-date and accurate, reducing the model’s reliance on existing knowledge. Also important is how to guide the model to use this data effectively. system message or meta promptIt has a huge impact on the tone, style, and scope of your model’s responses, shaping how it interacts within your application.

Even small changes to system messages can have a big impact. Using system messages is much more effective at telling your model what not to do and what it should do than simply telling it what not to do. These learnings from our experience building multiple co-pilots are integrated into the Azure AI Studio playground so you don’t have to start from scratch when building system messages.

user experience

User experience is the layer where many design innovations occur to build new ways to interact with AI systems. The guidelines below will help you build better user interfaces and experiences.

Be transparent about AI roles and limitations. – This helps users stay alert in detecting potential mistakes. Additionally, disclose the role AI plays in the process as it sets user expectations for using the system.
Ensure people stay informed – AI systems typically augment humans, so it is important to incorporate mechanisms for human feedback. This is especially true for high-risk use cases in finance, healthcare, and insurance.
Alleviating misuse and overdependence – Provide data citations and prepare pre-determined responses for further verification and viewing.
documentation – It is always useful to provide user guidance and best practices to system users.

I hope you’ve gotten a good overview of how to build better AI applications using the mechanisms and guidelines above, and you can explore the following resources to learn more:

Source link

model

safety system

System messages and grounding

user experience

Our Company

About Links

Useful Links

Newsletter

Laest News

Responsible AI Mitigation Layers

model

safety system

System messages and grounding

user experience

Poor video quality is costing your business more than you think—here’s how to fix it

The Ultimate AI Tool for Video Summarization and Analysis

You may also like

Leave a Comment Cancel Reply

Our Company

About Links

Useful Links

Newsletter

Laest News