Multimodal Public Preview Blog

September 24, 2024

We are excited to announce the public preview of the Azure AI Content Safety multimodal API. It analyzes data that combines image and text content, helping you keep your applications and services safe from harmful user-generated or AI-generated content.

The main goals of the multimodal feature are:

- Detecting harmful content across multiple modalities: Analyze text and images (including emojis) to identify harmful, inappropriate, or unsafe content, including explicit material, hate speech, violence, self-harm, and sexual content within text-image combinations.
- Contextual analysis across text and visuals: Understand context by analyzing text and visual elements together, detecting subtle or implicit harmful content that might go unnoticed when text or images are examined alone.
- Real-time moderation: Provide real-time detection and moderation to prevent the creation, sharing, or distribution of harmful content on multimodal platforms, stopping harmful content before it reaches users.

By achieving these goals, multimodal detection supports a safer and more respectful user experience while enabling creative and responsible content creation.

User Scenario

Multimodal harmful content detection involves analyzing and moderating content across multiple modes, including text, images, and video, to identify harmful, unsafe, or inappropriate material. This is especially important in scenarios where tools like DALL·E 3 generate visual content from text prompts. The biggest challenge lies in the variety and complexity of ways harmful content can appear, sometimes subtly, in both text and generated images.

Harmful images

- User scenario: A user enters seemingly innocuous text into DALL·E 3 but ends up generating a subtly harmful image (e.g., one glorifying violence, containing hate symbols, or depicting discrimination).
- Detection mechanism: Multimodal detection evaluates the image after it has been generated, using models that recognize visual cues associated with hate speech, violence, and other harmful material.
- Mitigation: Multimodal detection flags the generated image, prevents it from being shared, and prompts the user to revise the request.

Text contained in the image

- User scenario: A user requests an image containing text that promotes hate speech or misinformation (e.g., a sign or banner bearing offensive language).
- Detection mechanism: Optical Character Recognition (OCR) extracts the text embedded in the generated image, and NLP models analyze its meaning and intent for malicious content.
- Mitigation: If multimodal detection flags the content, the application can refuse to display the image.

Multimodal Detection API

The multimodal API accepts both text and image inputs. It is designed for multi-class, multi-severity detection: it classifies content across multiple categories and assigns a severity score to each. For every category, the system returns a severity level of 0, 2, 4, or 6; the higher the number, the more severe the content.
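The sketch below shows what a call to the multimodal endpoint might look like in Python. The endpoint path (imageWithText:analyze), the api-version string, and the request and response field names are assumptions based on the preview at the time of writing; verify them against the official Azure AI Content Safety documentation before relying on them.

```python
# Minimal sketch of calling the Azure AI Content Safety multimodal API.
# NOTE: the endpoint path, api-version, and payload schema below are
# assumptions for the public preview -- confirm them in the Azure docs.
import base64
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # your resource endpoint
API_KEY = "<your-content-safety-key>"
API_VERSION = "2024-09-15-preview"  # assumed preview api-version


def analyze_image_with_text(image_path: str, text: str) -> dict:
    """Submit an image plus accompanying text for multimodal harm analysis."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    url = f"{ENDPOINT}/contentsafety/imageWithText:analyze?api-version={API_VERSION}"
    payload = {
        "image": {"content": image_b64},   # base64-encoded image bytes
        "text": text,                      # accompanying text, e.g. a user caption
        "enableOcr": True,                 # also analyze text embedded in the image
        "categories": ["Hate", "SelfHarm", "Sexual", "Violence"],
    }
    headers = {
        "Ocp-Apim-Subscription-Key": API_KEY,
        "Content-Type": "application/json",
    }
    resp = requests.post(url, headers=headers, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()


# Example: reject the content if any category reaches severity 4 or above.
result = analyze_image_with_text("generated.png", "caption submitted by the user")
for item in result.get("categoriesAnalysis", []):
    if item["severity"] >= 4:
        print(f"Blocked: {item['category']} at severity {item['severity']}")
```

Because severities come back per category on the 0/2/4/6 scale, your application can apply different thresholds for different categories (for example, blocking Violence at severity 2 but Sexual only at 4) rather than making a single pass/fail decision.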