Meta’s new Llama 3.2 models available on Azure AI
by info.odysseyx@gmail.com, September 25, 2024

In collaboration with Meta, Microsoft is excited to announce that Meta’s new Llama 3.2 models are now available in the Azure AI Model Catalog. Starting today, the Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct models – Llama’s first multimodal models – are ready to be deployed from the model catalog via managed compute. Inference through the Models-as-a-Service serverless API is coming soon. In addition, Llama 3.2 1B, 3B, 1B Instruct, and 3B Instruct are Meta’s first SLMs, built for local, on-device inference on mobile and edge devices, enabling low-cost agentic applications such as RAG and multilingual summarization while keeping data on the device.

We are excited to be one of the launch partners for this Meta release. The 3.2 release gives developers the latest Llama models for edge, mobile, and image-reasoning use cases. It combines Azure’s secure and scalable cloud infrastructure, Azure AI Content Safety, Azure AI Search, and Azure AI Studio tools such as Prompt Flow with Meta’s cutting-edge AI models to deliver powerful, customizable, and secure AI experiences.

Introducing Llama 3.2: A New Era of Vision and Lightweight AI Models

Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. To support image-recognition tasks, the Llama 3.2-Vision models package separately trained image-reasoning adapter weights alongside the core LLM weights; the adapter integrates with the language model through cross-attention and is used whenever an image is provided as input.

These models are designed for a variety of use cases, including image reasoning, multilingual summarization, and personalized on-device agents. They let developers build AI applications that prioritize user privacy, reduce dependency on the cloud, and deliver faster, more efficient processing. All models support long context lengths (up to 128K tokens) and are optimized for inference with grouped query attention (GQA).

Starting today, developers have access to the following models through managed compute inference:

Llama 3.2 11B Vision Instruct
Llama 3.2 90B Vision Instruct
Llama Guard 3 11B Vision

Fine-tuning is available for Llama 3.2 1B Instruct and 3B Instruct, with details about the rest of the collection coming soon. Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct will also be available soon on Models-as-a-Service via serverless API deployment.

Key features and benefits of Llama 3.2

Multimodal capabilities for image-reasoning applications: The Vision models (11B and 90B) in Llama 3.2 are the first Llama models to support multimodal tasks, incorporating image encoder representations into the language model. This helps developers bridge the gap between vision and language, building applications that analyze visual data and generate accurate insights; a sketch of such a request is shown below.
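The following is a minimal, illustrative sketch of what a multimodal chat request to a deployed Llama 3.2 Vision Instruct endpoint might look like. The endpoint URL, key, and OpenAI-style chat-completions payload shown here are assumptions for illustration only; the exact request format and connection details for your deployment are shown on its Consume page in Azure AI Studio.

```python
# Minimal sketch (not an official sample): send a text + image prompt to a
# deployed Llama 3.2 Vision Instruct endpoint. The endpoint URL, key, and
# payload schema below are assumptions -- check your deployment's Consume
# page in Azure AI Studio for the exact values and request format.
import base64
import requests

ENDPOINT = "https://<your-endpoint>.inference.ml.azure.com/v1/chat/completions"  # hypothetical
API_KEY = "<your-endpoint-key>"  # hypothetical

# Encode a local image as a base64 data URL so it can be sent inline.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the trend shown in this chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 512,
    "temperature": 0.2,
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```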
A lightweight model for mobile and edge applications: Llama 3.2’s 1B and 3B text-only models are ideal for edge applications, providing local, on-device inference so that sensitive information never leaves the device, significantly reducing the risk of data breaches or unauthorized access. These models enable fast, real-time responses for on-device agents, making them well suited to tasks such as message summarization, information retrieval, and multilingual support, all while maintaining user privacy.

System-level safety and customization: Llama 3.2 introduces Llama Guard 3, a safeguard designed to work alongside the models to support responsible innovation. It helps developers maintain compliance and trust while building AI solutions. Developers also have direct access to model weights and architecture, giving them full control over customization.

Llama Stack for smooth development: Llama 3.2 ships with the Llama Stack, a standardized interface that simplifies AI application development. The stack integrates with PyTorch and includes tools for fine-tuning, synthetic data generation, and agentic application development. The Llama Stack API lets developers manage Llama models with a streamlined experience from evaluation to deployment. See meta-llama/llama-stack: Model components for the Llama Stack API (github.com).

What sets Llama 3.2 apart

According to Meta, Llama 3.2 stands out for its combination of flexibility, privacy, and performance:

Deep customization: Developers have full control over weights and architecture, allowing them to tailor the models to their specific needs.
Infrastructure control: With the flexibility to deploy in any environment – on-premises, cloud, or virtualized – Llama 3.2 offers unmatched versatility.
Ironclad security: Processing data locally lets you maintain sovereignty over sensitive information, putting privacy first.
Complete transparency: Llama 3.2 provides full visibility into model behavior to support compliance and build trust.

Why use Llama 3.2 on Azure?

Developers using Meta Llama 3.2 models can work seamlessly with tools in Azure AI Studio, such as Azure AI Content Safety, Azure AI Search, and Prompt Flow, to implement ethical and effective AI practices. Llama 3.2 integrates with Azure AI and Models-as-a-Service and is backed by a robust support system. Key benefits include:

Enhanced security and compliance: Azure prioritizes data privacy and security, applying Microsoft’s comprehensive security protocols to protect customer data. With Llama 3.2 in Azure AI Studio, businesses can operate with confidence knowing their data stays within the security perimeter of the Azure cloud, improving both privacy and operational efficiency.

Content safety integration: Customers can integrate Meta Llama 3.2 models with the content filtering capabilities of Azure AI Content Safety to enable additional responsible AI practices. This integration facilitates the development of safer AI applications, ensuring that generated or processed content is monitored for compliance and ethical standards (see the sketch after this section).

Simplified evaluation of LLM flows: Azure AI’s Prompt Flow enables evaluation flows, which let developers measure how well an LLM’s output matches given standards and goals by computing metrics. This capability is useful for workflows built with Llama 3.2, such as Retrieval Augmented Generation (RAG) patterns; comprehensive evaluations can use metrics such as groundedness, which measures the relevance and accuracy of the model’s responses against the input sources.
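The content safety integration described above can be invoked directly from application code. Below is a minimal sketch of screening a user prompt before it is forwarded to a Llama 3.2 deployment, assuming the azure-ai-contentsafety Python package; the endpoint, key, and severity threshold are hypothetical placeholders rather than values from this announcement.

```python
# Minimal sketch (an assumption, not an official sample): screen user text with
# Azure AI Content Safety before forwarding it to a Llama 3.2 deployment.
# The endpoint, key, and severity threshold below are hypothetical placeholders.
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

safety_client = ContentSafetyClient(
    endpoint="https://<your-content-safety-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-content-safety-key>"),
)

def is_safe(user_text: str, max_severity: int = 2) -> bool:
    """Return True if no harm category exceeds the chosen severity threshold."""
    result = safety_client.analyze_text(AnalyzeTextOptions(text=user_text))
    return all(
        item.severity is None or item.severity <= max_severity
        for item in result.categories_analysis
    )

prompt = "Summarize this support ticket in two sentences."
if is_safe(prompt):
    # Forward the prompt to your Llama 3.2 endpoint here.
    print("Prompt passed content safety checks.")
else:
    print("Prompt blocked by content safety policy.")
```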
Simplified deployment and inference: By deploying Meta models through MaaS with a pay-as-you-go inference API, developers can take advantage of Llama 3.2 without managing the underlying infrastructure in their Azure environment.

These capabilities demonstrate Azure’s commitment to providing organizations with an environment where they can efficiently and responsibly leverage the full potential of AI technologies like Llama 3.2 to drive innovation while maintaining high standards of security and compliance.

Getting started with Meta Llama 3.2 on Azure AI

To get started with Azure AI Studio and deploy your first model, follow these steps:

Familiarize yourself: If you’re new to Azure AI Studio, review the documentation first to understand the basics and set up your first project.
Access the model catalog: Open the model catalog in AI Studio.
Find the model: Use the filter to select the Meta collection, or click the “View models” button on the MaaS announcement card.
Select a model: Open the Llama 3.2 model you want to use from the list.
Deploy the model: Click “Deploy” and select the managed compute option. A sketch of calling the resulting endpoint appears at the end of this post.

Frequently Asked Questions

How much does it cost to use Llama 3.2 models on Azure?
For managed compute deployments, you are billed based on the GPU SKUs used by your deployment, provided you have sufficient GPU quota. For models offered through MaaS, you are billed based on prompt and completion tokens. Pricing will be available soon in Azure AI Studio (on the Marketplace Offer Details tab when deploying the model) and on Azure Marketplace.

Do I need GPU capacity in my Azure subscription to use the Llama 3.2 models?
Yes. Models available through managed compute deployments require model-specific GPU capacity. When you deploy a model, the VMs eligible for the deployment are displayed. The 11B Vision Instruct and 90B Vision Instruct models, which will be available via serverless API (coming soon), do not require GPU capacity.

Llama 3.2 11B Vision Instruct and 90B Vision Instruct are listed on Azure Marketplace. Can I purchase and use these models directly from Azure Marketplace?
While Azure Marketplace enables purchase and billing of Llama 3.2, the purchase experience itself is only accessible through the model catalog. If you try to purchase a Llama 3.2 model from the Marketplace, you are redirected to Azure AI Studio.

Given that Llama 3.2 11B Vision Instruct and 90B Vision Instruct are billed through Azure Marketplace, will these purchases retire my Azure consumption commitment (aka MACC) when the models are offered through MaaS?

Is my inference data shared with Meta?
No. Microsoft does not share the contents of your inference requests or response data with Meta.

Are there rate limits for the Meta models on Azure?
Meta models have a limit of 200k tokens per minute and 1k requests per minute. If this is not sufficient, please contact Azure customer support.

Are the MaaS models available in all Azure subscription types?
Customers can use MaaS models in all Azure subscription types with a valid payment method, except the Cloud Solution Provider (CSP) program. Free or trial Azure subscriptions are not supported.

Can I fine-tune the Llama 3.2 models?
Fine-tuning is available for the Llama 3.2 1B Instruct and 3B Instruct models. Further details on the rest of the collection will follow soon.
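Once a model is deployed following the steps above, the endpoint can be called from application code. The following is a minimal sketch, assuming the azure-ai-inference Python package and a hypothetical endpoint URL and key; the exact connection details and supported parameters for your deployment are shown on its Consume page in Azure AI Studio.

```python
# Minimal sketch (an assumption, not an official sample): call a deployed
# Llama 3.2 text endpoint with the azure-ai-inference client. The endpoint URL
# and key below are hypothetical placeholders taken from your own deployment.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.<region>.inference.ml.azure.com",  # hypothetical
    credential=AzureKeyCredential("<your-endpoint-key>"),                  # hypothetical
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="In two sentences, explain why on-device inference helps protect user privacy."),
    ],
    max_tokens=256,
    temperature=0.2,
)

print(response.choices[0].message.content)
```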