Multimodal Public Preview Blog – Microsoft Community Hub
by info.odysseyx@gmail.com, September 24, 2024

We are excited to announce the public preview of our multimodal model. Azure AI Content Safety Multimodal APIs analyze data that combines image and text content, helping you keep your applications and services safe from harmful user-generated or AI-generated content.

The main goals of the multimodal feature are:

- Detecting harmful content across multiple modalities: analyze and detect harmful, inappropriate, or unsafe content across text and images (including emojis). This includes identifying explicit content, hate speech, violence, self-harm, and sexual content within text-image combinations.
- Contextual analysis across text and visuals: analyze text and visual elements together to detect subtle or implicit harmful content that might go unnoticed when text or images are examined alone.
- Real-time moderation: provide real-time detection and moderation to stop harmful content from being created, shared, or distributed on multimodal platforms before it reaches users.

By achieving these goals, multimodal detection supports a safer, more respectful user experience while enabling creative and responsible content creation.

User Scenario

Multimodal harmful-content detection involves analyzing and moderating content across multiple modes (text, images, and video) to identify harmful, unsafe, or inappropriate material. This is especially important when tools like DALL·E 3 generate visual content from text prompts. The biggest challenge lies in the variety and complexity of ways harmful content can appear, sometimes subtly, in both the text and the generated images.
Harmful images

User scenario: A user enters seemingly innocuous text into DALL·E 3 but ends up generating subtly harmful images (e.g., imagery glorifying violence, hate symbols, or discriminatory messaging).
Detection mechanism: Multimodal detection evaluates the image content after it has been generated, using models that recognize visual cues associated with hate speech, violence, and other harmful material.
Mitigation: Multimodal detection flags the generated images, prevents sharing, and prompts the user to revise the request.

Text contained in an image

User scenario: A user requests an image containing text that promotes hate speech or misinformation (e.g., a sign or banner bearing offensive language).
Detection mechanism: Text within the generated image is extracted with Optical Character Recognition (OCR) and analyzed with NLP techniques to determine its meaning and intent.
Mitigation: The image can be blocked from display if harmful content is detected.

Multimodal Detection API

The multimodal API accepts both text and image inputs. It performs multi-class, multi-severity detection: content is classified across multiple harm categories, and each category is assigned a severity score. For each category, the system returns a severity level of 0, 2, 4, or 6; the higher the number, the more severe the content.
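As a sketch of how an application might use such an API, the Python below builds a combined text-plus-image request body and turns a multi-category, multi-severity response into a block/allow decision. The field names ("text", "image", "content") and the thresholding policy are illustrative assumptions for this sketch, not the official SDK or REST schema; consult the Azure AI Content Safety preview documentation for the exact request and response shapes.

```python
import base64

# The documented severity scale: 0, 2, 4, or 6 per category,
# where higher means more severe.
SEVERITY_LEVELS = (0, 2, 4, 6)

def build_multimodal_request(text, image_bytes):
    """Pair text with a base64-encoded image in one request body.

    The field names here are assumptions for illustration; the real
    preview API schema may differ.
    """
    return {
        "text": text,
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
    }

def moderate(categories_analysis, threshold=4):
    """Decide whether to block content from per-category severities.

    `categories_analysis` is a list of {"category": ..., "severity": ...}
    entries, mirroring the multi-class, multi-severity output described
    above. Content is blocked when any category reaches the threshold.
    """
    flagged = [c["category"] for c in categories_analysis
               if c["severity"] >= threshold]
    return {"blocked": bool(flagged), "flagged_categories": flagged}

# Example: one high-severity category triggers a block.
decision = moderate([
    {"category": "Hate", "severity": 4},
    {"category": "Violence", "severity": 0},
])
print(decision)  # {'blocked': True, 'flagged_categories': ['Hate']}
```

Choosing the threshold is a policy decision: a stricter service might block at severity 2, while a more permissive one blocks only at 6 and routes 2-4 to human review.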