Multimodal Public Preview Blog

We are excited to announce the public preview of the Azure AI Content Safety Multimodal API. The Multimodal API analyzes content that combines images and text, helping you keep your applications and services safe from harmful user-generated or AI-generated content.

The main goals of the “multimodal” feature are:

  • Detecting harmful content across multiple modalities: Analyze text and images (including emojis) to detect harmful, inappropriate, or unsafe content, including explicit content, hate speech, violence, self-harm, and sexual content, within text-image combinations.
  • Contextual analysis across text and visuals: Analyze text and visual elements together to understand context and detect subtle or implicit harmful content that might go unnoticed when text or images are examined alone.
  • Real-time moderation: Provide real-time detection and moderation to prevent harmful content from being created, shared, or distributed on multimodal platforms, stopping it before it reaches users.

By achieving these goals, multimodal detection supports a safer and more respectful user experience while still enabling creative and responsible content creation.


User Scenario

Multimodal harmful content detection involves analyzing and moderating content across multiple modes, including text, images, and video, to identify harmful, unsafe, or inappropriate material. This is especially important in scenarios where tools like DALL·E 3 generate visual content from text prompts. The biggest challenge lies in the variety and complexity of how harmful content can appear, sometimes subtly, in both text and generated images.

  1. Harmful images

User scenario: A user enters seemingly innocuous text into DALL·E 3 but ends up generating a subtly harmful image (e.g., one glorifying violence or containing hate symbols or discriminatory language).

Detection Mechanism: Multimodal detection evaluates the image after it has been generated, using models that recognize visual cues associated with hate speech, violence, and other harmful material.

Mitigation: Multimodal detection flags the generated image, prevents it from being shared, and prompts the user to revise the request.

  2. Text contained in the image

User scenario: A user requests an image containing text that promotes hate speech or misinformation (e.g., an image of a sign or banner bearing offensive language).

Detection Mechanism: The text within the generated image is extracted using Optical Character Recognition (OCR) and analyzed with NLP techniques to determine its meaning and intent.

Mitigation: If multimodal detection flags the content, the image can be rejected before it is displayed. A simple gating pattern is sketched below.
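As an illustration of that gate, here is a minimal Python sketch. It assumes the multimodal response mirrors the existing Azure AI Content Safety analyze responses, i.e. a `categoriesAnalysis` list of category/severity entries; the field names and the blocking threshold used here are assumptions, not a documented contract.

```python
def should_block(analysis: dict, threshold: int = 2) -> bool:
    """Return True if any category's severity meets the threshold.

    Assumes a response shaped like the existing text/image analyze APIs:
    {"categoriesAnalysis": [{"category": "Hate", "severity": 2}, ...]},
    with severity on the 0/2/4/6 scale described in the API section below.
    """
    return any(
        item.get("severity", 0) >= threshold
        for item in analysis.get("categoriesAnalysis", [])
    )

# Example: reject the image before display if any category scores >= 2.
analysis = {"categoriesAnalysis": [{"category": "Hate", "severity": 4}]}
if should_block(analysis):
    print("Image rejected: harmful content detected.")
```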

Multimodal Detection API

The multimodal API accepts both text and image inputs. It is designed to perform multi-class and multi-severity detection, allowing you to classify content across multiple categories and assign a severity score to each category. For each category, the system returns a severity level on a scale of 0, 2, 4, or 6. The higher the number, the more severe the content.
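For concreteness, here is a minimal sketch of calling the preview endpoint with Python's requests library. The route (`imageWithText:analyze`), the API version string, and the body fields (`text`, `image.content`, `enableOcr`, `categories`) are assumptions based on the preview at the time of writing; check the current documentation for the exact contract.

```python
import base64
import os

import requests

# Endpoint and key for your Azure AI Content Safety resource.
endpoint = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
api_key = os.environ["CONTENT_SAFETY_KEY"]

# Assumed preview route and API version -- confirm against the current docs.
url = f"{endpoint}/contentsafety/imageWithText:analyze?api-version=2024-09-15-preview"

# Base64-encode the image to be analyzed alongside its accompanying text.
with open("generated_image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {
    "text": "Caption or prompt that accompanies the image",
    "image": {"content": image_b64},
    # Assumed flag: ask the service to OCR and analyze text inside the image.
    "enableOcr": True,
    "categories": ["Hate", "SelfHarm", "Sexual", "Violence"],
}

resp = requests.post(
    url,
    headers={"Ocp-Apim-Subscription-Key": api_key, "Content-Type": "application/json"},
    json=body,
    timeout=30,
)
resp.raise_for_status()

# Each returned category carries a severity of 0, 2, 4, or 6.
print(resp.json())
```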

