
Phi-3.5 SLMs



The Phi-3 model collection is the latest in Microsoft's family of Small Language Models (SLMs). These models are designed to be cost-effective and to outperform models of similar or larger size across a wide range of benchmarks in language, reasoning, coding, and mathematics. The availability of the Phi-3 models expands the selection of high-quality models for Azure customers, giving them more practical choices when composing and building generative AI applications. Since the launch in April 2024, we've received a lot of valuable feedback from customers and community members on areas for improvement. Today, we're proud to announce Phi-3.5-mini, Phi-3.5-vision, and a new member of the Phi family: Phi-3.5-MoE, a Mixture-of-Experts (MoE) model. Phi-3.5-mini enhances multilingual support with a 128K context length. Phi-3.5-vision adds multi-frame image understanding and reasoning, improving performance on single-image benchmarks as well. Featuring 16 experts and 6.6B active parameters, Phi-3.5-MoE delivers high performance, reduced latency, multilingual support, and robust safety measures, outperforming larger models while maintaining the efficacy of the Phi models.

Quality vs. size graph of Phi-3.5 SLMs

Phi-3.5-MoE: Mixture of Experts

Phi-3.5-MoE is the latest addition to the Phi family of models. It consists of 16 experts, each with 3.8B parameters. With a total model size of 42B parameters, it activates 6.6B parameters when using two experts per token. This MoE model outperforms dense models of similar size in quality and performance, and it supports more than 20 languages. Like its Phi-3 counterparts, the MoE model uses a robust safety post-training strategy that combines open-source and proprietary synthetic instruction and preference datasets. The post-training process combines Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), leveraging both human-labeled and synthetic datasets, including datasets focused on helpfulness and harmlessness as well as multiple safety categories. Phi-3.5-MoE also supports context lengths of up to 128K, allowing it to handle many long-context tasks.
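To build intuition for why only a fraction of the parameters are active, here is a minimal, hypothetical sketch of top-2 routing over 16 experts. It is not the actual Phi-3.5-MoE implementation, and the layer sizes are toy values; it only illustrates how a router can dispatch each token to two experts while the remaining experts stay idle.

```python
import torch
import torch.nn as nn

class Top2MoELayer(nn.Module):
    """Illustrative mixture-of-experts feed-forward layer.

    Hypothetical simplification: a linear router scores 16 experts per
    token, the top 2 run, and their outputs are mixed by the normalized
    router weights. Because only the selected experts execute, a
    42B-parameter model can activate just ~6.6B parameters per token.
    """

    def __init__(self, d_model=64, d_ff=128, num_experts=16, top_k=2):
        super().__init__()  # toy dimensions; the real layers are far larger
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # mix over the chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):             # naive per-token dispatch
            for k in range(self.top_k):
                out[t] += weights[t, k] * self.experts[int(idx[t, k])](x[t])
        return out

layer = Top2MoELayer()
print(layer(torch.randn(4, 64)).shape)         # torch.Size([4, 64])
```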

To understand model quality, we compare Phi-3.5-MoE with a set of models on various benchmarks, as shown in Table 1.

Table 1: Phi-3.5-MoE model quality

Let’s take a closer look at some of the public benchmark datasets in different categories in the table below.

Table 2: Phi-3.5-MoE model quality for various features

With only 6.6B active parameters, Phi-3.5-MoE achieves a similar level of language understanding and mathematics as much larger models. Moreover, it outperforms larger models in reasoning ability and provides good capacity for fine-tuning on a variety of tasks. Table 3 highlights the multilingual capability of Phi-3.5-MoE on the multilingual MMLU, MEGA, and multilingual MMLU-pro datasets. Overall, we observe that even with just 6.6B active parameters, the model is highly competitive on multilingual tasks compared to models with far more active parameters.

Multilingual skills

Table 3: Phi-3.5-MoE multilingual benchmarks

The table below shows the multilingual MMLU scores for some of the supported languages.

Table 4: Phi-3.5-MoE multilingual MMLU benchmarks

Phi-3.5-mini

The Phi-3.5-mini model was further pre-trained on multilingual synthetic and high-quality filtered data. This was followed by a series of post-training steps including supervised fine-tuning (SFT), proximal policy optimization (PPO), and direct preference optimization (DPO), using a combination of human-labeled, synthetic, and translated datasets.
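For readers unfamiliar with DPO, the objective is compact enough to state directly. The sketch below shows the standard DPO loss from the literature (not Microsoft's internal training code): it pushes the policy to prefer the chosen response over the rejected one, relative to a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard Direct Preference Optimization loss.

    Each argument is the summed log-probability of a chosen or rejected
    response under the trained policy or the frozen reference model;
    beta controls how far the policy may drift from the reference.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```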

Model quality

When looking at the capabilities of language models, it is important to understand how they compare to each other. That is why we used our internal benchmark platform to put Phi-3.5-mini to the test alongside recent state-of-the-art models. Table 5 provides a quick overview of model quality on key benchmarks. Despite its compact size of only 3.8B parameters, this efficient model not only matches but often outperforms much larger models.

Table 5: Phi-3.5-mini model quality

Multilingual skills

Phi-3.5-mini is the latest update to the 3.8B model. It achieves significant gains in multilingual and multi-turn conversation quality and in reasoning ability through additional continued pre-training and post-training data. The model was trained on the following set of languages: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, and Ukrainian.
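As a quick way to try the multilingual chat behavior described above, here is a minimal sketch using the Hugging Face transformers pipeline. The checkpoint name is assumed to be the public microsoft/Phi-3.5-mini-instruct release; adjust it if your deployment differs, and note that older transformers versions may additionally require trust_remote_code=True.

```python
import torch
from transformers import pipeline

# Assumed public checkpoint name for Phi-3.5-mini's instruct variant.
chat = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# One of the supported languages (French) in a simple chat format.
messages = [
    {"role": "user",
     "content": "Réponds en français : qu'est-ce qu'un petit modèle de langage ?"},
]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```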

Table 6 below highlights the multilingual capability of Phi-3.5-mini, showing the mean language-specific scores on the multilingual MMLU, MGSM, MEGA, and multilingual MMLU-pro datasets.

Table 6: Phi-3.5-mini multilingual quality

Table 7 below shows the multilingual MMLU scores for some of the supported languages.

Table 7: Phi-3.5-mini multilingual MMLU quality for selected languages

Phi-3.5-mini shows significant improvement over Phi-3-mini in multilingual support. Arabic, Dutch, Finnish, Polish, Thai, and Ukrainian benefit the most, with performance improvements of 25-50%. To put this in perspective, Phi-3.5-mini delivers the best performance among sub-8B models in English and in several other languages. Note that the model uses a 32K vocabulary and is optimized for the higher-resource languages listed above, so it is not recommended for lower-resource languages without additional fine-tuning.

Long context

Supporting a 128K context length, Phi-3.5-mini excels at tasks such as summarizing long documents or meeting minutes, long-document QA, and information retrieval. Phi-3.5 outperforms the Gemma-2 family, which supports only an 8K context length, and it is competitive with much larger open-weight models such as Llama-3.1-8B-instruct, Mistral-7B-instruct-v0.3, and Mistral-Nemo-12B-instruct-2407. Table 8 shows results on several long-context benchmarks.

RULER: a retrieval-based benchmark for long-context understanding

RepoQA: a benchmark for long-context code understanding

Table 8: Phi-3.5-mini long-context benchmarks

Phi-3.5-mini-instruct, with 3.8B parameters, a 128K context length, and multilingual support, is the only model in its category. It is worth noting that it aims to support more languages while maintaining English performance across a variety of tasks. Because model capacity is limited, the model's knowledge of English may be stronger than its knowledge of other languages; for multilingual knowledge-intensive tasks, we recommend using the model in a Retrieval Augmented Generation (RAG) setting, as sketched below.
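To make the RAG recommendation concrete, here is a minimal sketch of the retrieval half of such a setup. The embedding model, the toy corpus, and the prompt format are all illustrative assumptions; any embedding model and vector store would serve the same purpose, with the resulting prompt then sent to Phi-3.5-mini.

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical toy corpus; in practice this is your external document store.
docs = [
    "Phi-3.5-mini supports a 128K token context length.",
    "Phi-3.5-MoE activates 6.6B of its 42B total parameters per token.",
    "Phi-3.5-vision accepts multi-frame image input.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_emb = embedder.encode(docs, convert_to_tensor=True)

def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the top-k passages and ground the answer in them."""
    q_emb = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_emb, top_k=top_k)[0]
    context = "\n".join(docs[h["corpus_id"]] for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Feed the resulting prompt to Phi-3.5-mini (e.g., via the pipeline above).
print(build_prompt("How many parameters does Phi-3.5-MoE activate?"))
```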

Phi-3.5-vision with multi-frame input

Phi-3.5-vision introduces cutting-edge capabilities for multi-frame image understanding and reasoning, developed based on valuable customer feedback. This innovation enables detailed image comparison, multi-image summarization/storytelling, and video summarization, which have a wide range of applications across many scenarios.

For example, see the model output for a multi-slide summary.

Phi-3.5-vision model output for slide summary

Notably, Phi-3.5-vision also shows significant performance gains on a number of single-image benchmarks. For example, it improves MMMU performance from 40.4 to 43.0 and MMBench performance from 80.5 to 81.9, and TextVQA, a document understanding benchmark, improves from 70.9 to 72.0.

The tables below show a detailed comparison on two popular multi-image/video benchmarks, demonstrating the improved performance. Note that Phi-3.5-vision is not optimized for multilingual use cases and is not recommended for multilingual scenarios without additional fine-tuning.

Table 9: Phi-3.5-vision workload benchmarks

Table 10: Phi-3.5-vision VideoMME benchmarks

Safety

The Phi-3 family of models was developed in accordance with the Microsoft Responsible AI Standard, a company-wide set of requirements based on six principles: accountability, transparency, fairness, reliability and safety, privacy and security, and inclusiveness. As with previous Phi-3 models, a multi-faceted safety evaluation and safety post-training approach was adopted, with additional steps taken to account for the multilingual capabilities of this release. Our approach to safety training and evaluation, including testing across multiple languages and risk categories, is described in the Phi-3 safety post-training paper. The Phi-3 models benefit from this approach, but developers should still apply responsible AI best practices, including mapping, measuring, and mitigating risks relevant to their specific use case and cultural and linguistic context.

Optimized inference with ONNX Runtime

ONNX Runtime provides optimized inference for the Phi family of models, and you can use it today to optimize Phi-3.5-mini for a variety of hardware targets. Expect updated ONNX variants of the latest Phi-3.5 models in the coming weeks.
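As a rough sketch of what optimized local inference can look like, the snippet below uses the onnxruntime-genai Python package against a local ONNX export of Phi-3.5-mini. The folder path is a placeholder, and the exact generation API varies between package versions, so treat this as an assumption-laden outline rather than a definitive recipe.

```python
import onnxruntime_genai as og

# Placeholder path to a local ONNX export of Phi-3.5-mini.
model = og.Model("./phi-3.5-mini-instruct-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(
    "<|user|>\nSummarize what a small language model is.<|end|>\n<|assistant|>\n"
)

generator = og.Generator(model, params)
while not generator.is_done():          # greedy token-by-token generation
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```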

More predictable output

We are bringing Guidance to the Phi-3.5-mini serverless endpoint provided by Azure AI Studio, so you can define the structure that fits your application and make the output more predictable. With Guidance, you can eliminate costly retries and can, for example, constrain the model to select from a predefined list (e.g., medical codes), restrict the output to direct quotes from the provided context, or follow any regular expression. Guidance steers the model token by token in the inference stack, reducing cost and latency by 30-50%, which makes it a unique and valuable addition to the Phi-3.5-mini serverless endpoint.
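The serverless endpoint integration is exposed through Azure AI Studio, but the underlying idea is easy to see with the open-source guidance package run locally. The sketch below assumes a transformers-loadable Phi-3.5-mini checkpoint; the selection list and regular expression are made-up examples.

```python
from guidance import models, select, gen

# Assumed local checkpoint; the Azure serverless endpoint offers the same
# constrained-decoding idea as a managed service.
lm = models.Transformers("microsoft/Phi-3.5-mini-instruct")

# Constrain output to a predefined list -- no retries, always valid.
lm += "Ticket: the server is down. Severity (low/medium/high): "
lm += select(["low", "medium", "high"], name="severity")
print(lm["severity"])

# Or force the output to match a regular expression.
lm += " Three-digit triage code: "
lm += gen(regex=r"[0-9]{3}", name="code")
print(lm["code"])
```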

Phi-3.5-mini emerges as a unique offering in the LLM field, boasting 3.8B parameters, a substantial 128K context length, and multilingual support. It represents a milestone in building efficient multilingual models, striking a delicate balance between broad language support and dense performance in English. Given the small model size, users can expect the model's knowledge density in English to be higher than in other languages. For multilingual, knowledge-intensive tasks, we recommend using Phi-3.5-mini in a Retrieval Augmented Generation (RAG) setting. This configuration can significantly improve the model's performance across languages by leveraging external data sources, alleviating the language-specific limitations of the compact architecture.

Phi-3.5-MoE, featuring a small team of 16 experts, delivers high-quality performance with reduced latency, supports multiple languages with a 128K context length, and includes strong safety measures. It outperforms larger models and can be customized for a variety of applications through fine-tuning, all while maintaining efficiency with 6.6B active parameters.

Phi-3.5-vision introduces advances in multi-frame image understanding and reasoning, improving performance on single-image benchmarks as well.

The Phi-3.5 model family pushes the boundaries of small language models and generative AI, providing cost-effective, high-performance options to the open-source community and Azure customers.




