
Phi-3.5-MoE model available in Azure AI Studio and GitHub



In August 2024 we introduced Phi-3.5-MoE to the Phi model family: a mixture-of-experts (MoE) model with 16 experts and 6.6 billion active parameters. The model was met with enthusiasm and praise from users who recognized its competitive performance, multilingual capabilities, robust safety measures, and its ability to outperform larger models while maintaining the efficiency characteristic of the Phi family.

Today we are proud to announce that the Phi-3.5-MoE model is now available via serverless API deployment in Azure AI Studio (Figure 1) and GitHub (Figure 2). By providing access to models through a serverless API, we aim to simplify the deployment process and reduce the overhead associated with infrastructure management. These advancements are a significant step toward making our cutting-edge deep learning models more accessible and easier to integrate into a variety of applications for users and developers around the world. Key benefits include:

  • Scalability: You can easily scale your usage based on demand without worrying about underlying hardware constraints. The Phi-3.5-MoE and other Phi-3.5 models are available in the East US 2, East US, North Central US, South Central US, West US 3, West US, and Sweden Central regions.
  • Cost-effective: At $0.00013 per 1K input tokens and $0.00052 per 1K output tokens, you pay only for the resources you use, ensuring cost-efficient operation (see the cost sketch after this list).
  • Ease of integration: Seamlessly integrate Phi-3.5-MoE into your existing workflows and applications with minimal effort.
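
As a rough illustration of the pay-per-token pricing above, the snippet below estimates the cost of a single request. The token counts are made-up example values, not measurements.

```python
# Rough cost estimate for one Phi-3.5-MoE serverless request at the listed rates.
# Token counts below are illustrative placeholders, not real measurements.
INPUT_PRICE_PER_1K = 0.00013   # USD per 1K input tokens
OUTPUT_PRICE_PER_1K = 0.00052  # USD per 1K output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${request_cost(2000, 500):.6f}")  # $0.000520
```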

Follow the quickstart guide in the Phi-3 Cookbook to deploy and use the Phi model family in Azure AI Studio and GitHub.
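
For reference, here is a minimal sketch of calling a deployed serverless endpoint with the azure-ai-inference Python package. The environment variable names are placeholders; use the endpoint URL and key shown for your own deployment, and consult the quickstart guide for the authoritative steps.

```python
# Minimal sketch: chat completion against a Phi-3.5-MoE serverless endpoint.
# AZURE_INFERENCE_ENDPOINT / AZURE_INFERENCE_KEY are placeholder variable names;
# substitute the endpoint URL and key from your Azure AI Studio deployment.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain mixture-of-experts models in two sentences."),
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```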

Figure 1: Deploy the Phi-3.5-MoE model using the serverless API in Azure AI Studio.

Figure 2: Phi-3.5-MoE Playground experience on GitHub.

While we celebrate the launch of Phi-3.5-MoE, we would like to take this opportunity to highlight the complexity of training such a model. Mixture-of-experts (MoE) models can scale efficiently without linearly increasing computation. For example, Phi-3.5-MoE has 42B total parameters spread across 16 expert blocks, but activates only 6.6B of them by selecting 2 experts per token. Using these parameters effectively has proven difficult: simply increasing the number of parameters improved quality only marginally, and it was hard to specialize each expert for a specific task. With traditional training methods, all 16 experts received similar training, which limited quality improvements across different tasks.
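
To make the "only 2 of 16 experts per token" idea concrete, here is a minimal, generic top-2 routing layer in PyTorch. It is an illustrative sketch of the mechanism, not the actual Phi-3.5-MoE implementation, and the hidden sizes are arbitrary placeholders.

```python
# Illustrative top-2 mixture-of-experts routing: each token activates only 2 of 16
# expert feed-forward blocks, so only a fraction of total parameters is used per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # routing scores per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (tokens, d_model)
        scores = self.router(x)                              # (tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # pick 2 experts per token
        weights = F.softmax(top_vals, dim=-1)                # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(8, 512)
print(Top2MoELayer()(tokens).shape)  # torch.Size([8, 512])
```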

To build a state-of-the-art MoE model, the Phi team developed a new training method, GRIN (GRadient-INformed) MoE, which improves parameter utilization and expert specialization. The Phi-3.5-MoE model trained with this method shows a clear pattern of expert specialization, with experts clustering around related tasks such as STEM, social sciences, and humanities. This approach achieved significantly higher quality gains than existing methods. As shown in Figure 3, the model uses different sets of parameters for different tasks; this specialization makes efficient use of the large parameter pool by activating only the most relevant parameters for each task.
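
The kind of routing analysis behind Figure 3 amounts to tallying, per task, how often each expert is selected. The sketch below is a hypothetical illustration of that bookkeeping, not the actual analysis pipeline used to produce the figure.

```python
# Hypothetical sketch: tally how often each expert is chosen per task, the kind of
# statistic an expert-specialization plot like Figure 3 is built from.
from collections import Counter

NUM_EXPERTS = 16

def routing_histogram(top_idx_per_token: list[tuple[int, int]]) -> list[float]:
    """top_idx_per_token: the two expert indices chosen for each token."""
    counts = Counter(e for pair in top_idx_per_token for e in pair)
    total = sum(counts.values())
    return [counts.get(e, 0) / total for e in range(NUM_EXPERTS)]

# Made-up routing decisions for two tasks; real data would come from model forward passes.
stem_tokens = [(3, 7), (3, 12), (7, 3), (3, 7)]
humanities_tokens = [(1, 9), (9, 14), (1, 9), (9, 1)]
print("STEM:      ", routing_histogram(stem_tokens))
print("Humanities:", routing_histogram(humanities_tokens))
```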

Figure 3: Expert routing patterns for different tasks. GRIN MoE training produces strong specialization across experts.

The model excels on real-world and academic benchmarks, outperforming several leading models on a variety of tasks, including mathematics, reasoning, multilingual tasks, and code generation. Figure 4 below shows the solution Phi-3.5-MoE generated for a Gaokao 2024 math problem: the model breaks the problem down, reasons through it step by step, and arrives at the correct answer.

Figure 4: Phi-3.5-MoE's answer to the Gaokao 2024 math problem.

The Phi-3.5-MoE model was evaluated against various academic benchmarks (see Figure 5). Compared to several open- and closed-source models, Phi-3.5-MoE outperforms recent models such as Mistral-Nemo-12B, Llama-3.1-8B, and Gemma-2-9B despite using fewer active parameters. It also performs similarly to, or slightly better than, Gemini-1.5-Flash, one of the most widely used closed models.

Figure 5: Phi-3.5-MoE evaluation results against several academic benchmarks.

We invite developers, data scientists, and AI enthusiasts to explore the specialized capabilities of Phi-3.5-MoE through Azure AI Studio. Whether you are developing innovative applications or enhancing existing solutions, Phi-3.5-MoE provides the flexibility and performance you need. For the latest information on the Phi model family, please visit the Phi open models page.




