Multimodal Public Preview Blog

We are excited to announce the public preview of the Azure AI Content Safety Multimodal API. The Multimodal API analyzes content that combines images and text, helping you keep your applications and services safe from harmful user-generated or AI-generated content.

The main goals of the “multimodal” feature are:

  • Detecting harmful content across multiple modalities: Analyze text and images (including emojis) to detect harmful, inappropriate, or unsafe content, including explicit content, hate speech, violence, self-harm, and sexual content, within text-image combinations.
  • Contextual analysis across text and visuals: Analyze text and visual elements together to understand context and detect subtle or implicit harmful content that might go unnoticed when text or images are examined alone.
  • Real-time moderation: Provide real-time detection and moderation to prevent harmful content from being created, shared, or distributed on multimodal platforms, stopping it before it reaches users.

By achieving these goals, multimodal detection supports a safer and more respectful user experience while still enabling creative and responsible content creation.


User Scenario

Multimodal harmful content detection involves analyzing and moderating content across multiple modes, including text, images, and video, to identify harmful, unsafe, or inappropriate material. This is especially important in scenarios where tools like DALL·E 3 generate visual content from text prompts. The biggest challenge lies in the variety and complexity of how harmful content can appear, sometimes subtly, in both text and generated images.

  1. Harmful images

User scenario: A user enters seemingly innocuous text into DALL·E 3 but ends up generating a subtly harmful image (e.g., one glorifying violence or containing hate symbols or discriminatory language).

Detection Mechanism: Multimodal detection evaluates the image after it has been generated, using models that recognize visual cues associated with hate speech, violence, and other harmful material.

Mitigation: Multimodal detection flags the generated image, prevents it from being shared, and prompts the user to revise the request.

  2. Text contained in the image

User scenario: A user requests an image containing text that promotes hate speech or misinformation (e.g., an image of a sign or banner bearing offensive language).

Detection Mechanism: The text within the generated image is extracted using Optical Character Recognition (OCR) and analyzed with NLP techniques to determine its meaning and intent.

Mitigation: If multimodal detection flags the content, the image can be rejected before it is displayed. A simple gating pattern is sketched below.
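As an illustration of that gate, here is a minimal Python sketch. It assumes the multimodal response mirrors the existing Azure AI Content Safety analyze responses, i.e. a `categoriesAnalysis` list of category/severity entries; the field names and the blocking threshold used here are assumptions, not a documented contract.

```python
def should_block(analysis: dict, threshold: int = 2) -> bool:
    """Return True if any category's severity meets the threshold.

    Assumes a response shaped like the existing text/image analyze APIs:
    {"categoriesAnalysis": [{"category": "Hate", "severity": 2}, ...]},
    with severity on the 0/2/4/6 scale described in the API section below.
    """
    return any(
        item.get("severity", 0) >= threshold
        for item in analysis.get("categoriesAnalysis", [])
    )

# Example: reject the image before display if any category scores >= 2.
analysis = {"categoriesAnalysis": [{"category": "Hate", "severity": 4}]}
if should_block(analysis):
    print("Image rejected: harmful content detected.")
```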

Multimodal Detection API

The multimodal API accepts both text and image inputs. It is designed to perform multi-class and multi-severity detection, allowing you to classify content across multiple categories and assign a severity score to each category. For each category, the system returns a severity level on a scale of 0, 2, 4, or 6. The higher the number, the more severe the content.
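For concreteness, here is a minimal sketch of calling the preview endpoint with Python's requests library. The route (`imageWithText:analyze`), the API version string, and the body fields (`text`, `image.content`, `enableOcr`, `categories`) are assumptions based on the preview at the time of writing; check the current documentation for the exact contract.

```python
import base64
import os

import requests

# Endpoint and key for your Azure AI Content Safety resource.
endpoint = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
api_key = os.environ["CONTENT_SAFETY_KEY"]

# Assumed preview route and API version -- confirm against the current docs.
url = f"{endpoint}/contentsafety/imageWithText:analyze?api-version=2024-09-15-preview"

# Base64-encode the image to be analyzed alongside its accompanying text.
with open("generated_image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {
    "text": "Caption or prompt that accompanies the image",
    "image": {"content": image_b64},
    # Assumed flag: ask the service to OCR and analyze text inside the image.
    "enableOcr": True,
    "categories": ["Hate", "SelfHarm", "Sexual", "Violence"],
}

resp = requests.post(
    url,
    headers={"Ocp-Apim-Subscription-Key": api_key, "Content-Type": "application/json"},
    json=body,
    timeout=30,
)
resp.raise_for_status()

# Each returned category carries a severity of 0, 2, 4, or 6.
print(resp.json())
```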

