Nearly 100x Compression with Minimal Quality Loss

October 8, 2024

As part of our ongoing effort to provide developers and organizations with advanced search tools, we're announcing the latest preview API for Azure AI Search. These enhancements are designed to optimize vector index size and to give you more granular control over, and insight into, the search indexes behind your Retrieval-Augmented Generation (RAG) applications.

Matryoshka Representation Learning (MRL)

MRL is a new technique that introduces another form of vector compression; it complements existing quantization methods and operates independently of them. MRL provides the flexibility to truncate embeddings with minimal semantic loss, offering a balance between vector size and information preservation. The technique works by training an embedding model so that information density increases toward the beginning of the vector. As a result, most of the key information is preserved even when only a prefix of the original vector is used, allowing shorter vector representations with little penalty to retrieval quality.

OpenAI has integrated MRL into its "text-embedding-3-small" and "text-embedding-3-large" models, adapting them for scenarios that require compressed embeddings while maintaining high retrieval accuracy. You can read more about the underlying research in the original paper [1], or learn about the latest OpenAI embedding models in OpenAI's announcement blog.

Storage Compression Comparison

Table 1.1 below highlights different configurations for vector compression, comparing standard uncompressed vectors against Scalar Quantization (SQ) and Binary Quantization (BQ), with and without MRL. The compression ratio shows how much the vector index size can be reduced, which translates directly into cost savings. More information about vector index size limits can be found in Service limits for tiers and SKUs – Azure AI Search | Microsoft Learn.

Table 1.1: Vector index size compression comparison

| Configuration | Compression ratio* |
| --- | --- |
| Uncompressed | 1x (baseline) |
| SQ | 4x |
| BQ | 28x |
| MRL + SQ (1/2 and 1/3 truncated dimensions, respectively)** | 8x–12x |
| MRL + BQ (1/2 and 1/3 truncated dimensions, respectively)** | 64x–96x |

Note: The compression ratio depends on the embedding size and the degree of truncation. For example, with "text-embedding-3-large", where 3072 dimensions are truncated to 1024 dimensions, binary quantization yields 96x compression.

*All compression methods listed above may achieve slightly lower ratios in practice due to overhead from the index data structure. See "Memory overhead of the selected algorithm" for more information.

**The compression gain from MRL depends on the truncation dimension you choose. To maintain embedding quality, we recommend truncating to 1/2 or 1/3 of the original dimensions (see Table 1.2 below).
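To make the note's arithmetic concrete, here is a minimal sketch of requesting an MRL-truncated embedding. The `dimensions` parameter of the OpenAI embeddings API returns the truncated (and renormalized) prefix of the full vector; the input text and dimension choice are illustrative, mirroring the 96x example above.

```python
# A minimal sketch: request an MRL-truncated embedding from OpenAI and
# work through the compression arithmetic from the note above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# text-embedding-3-large natively produces 3072 dimensions; the
# `dimensions` parameter asks the API for the truncated MRL prefix.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Azure AI Search supports vector compression.",
    dimensions=1024,  # 1/3 of the original 3072 dimensions
)
embedding = response.data[0].embedding
assert len(embedding) == 1024

# Compression arithmetic (ignoring index overhead):
full_float32 = 3072 * 4            # 12,288 bytes per uncompressed float32 vector
mrl_binary = 1024 // 8             # 128 bytes after binary quantization (1 bit/dim)
print(full_float32 / mrl_binary)   # -> 96.0, the 96x ratio cited above
```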
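On the index side, a configuration along the following lines enables BQ together with MRL truncation and oversampling-based rescoring. This is a hedged sketch, not a definitive definition: the service name, index name, and admin key are placeholders, and the property names (`truncationDimension`, `rerankWithOriginalVectors`, `defaultOversampling`) and API version are assumptions drawn from the preview REST API that may differ in your version.

```python
# A sketch of creating an Azure AI Search index that combines binary
# quantization (BQ) with MRL truncation. Property names and the API
# version are assumptions based on the preview REST API.
import requests

endpoint = "https://<your-service>.search.windows.net"  # placeholder
headers = {"Content-Type": "application/json", "api-key": "<admin-key>"}

index_definition = {
    "name": "docs-mrl-bq",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "embedding",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 3072,  # full text-embedding-3-large size
            "vectorSearchProfile": "compressed-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "hnsw-1", "kind": "hnsw"}],
        "compressions": [
            {
                "name": "bq-mrl",
                "kind": "binaryQuantization",
                "truncationDimension": 1024,        # MRL: keep the first 1/3
                "rerankWithOriginalVectors": True,  # rescore with full-precision vectors
                "defaultOversampling": 2.0,         # 2x oversampling, as in Table 1.2
            }
        ],
        "profiles": [
            {
                "name": "compressed-profile",
                "algorithm": "hnsw-1",
                "compression": "bq-mrl",
            }
        ],
    },
}

resp = requests.put(
    f"{endpoint}/indexes/docs-mrl-bq?api-version=2024-09-01-preview",
    headers=headers,
    json=index_definition,
)
resp.raise_for_status()
```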
Quality Maintenance

Table 1.2 provides a detailed look at how well quality is maintained when using MRL with quantization across different models and configurations. The results show the impact on average NDCG@10 across subsets of the MTEB dataset, demonstrating that even high levels of compression can retain up to 99% of retrieval quality, especially when combining BQ with MRL.

Table 1.2: Impact of MRL on average NDCG@10 across MTEB subsets

| Model name | Original dimensions | MRL dimensions | Quantization algorithm | No reranking (%Δ) | 2x oversampling + reranking (%Δ) |
| --- | --- | --- | --- | --- | --- |
| OpenAI text-embedding-3-small | 1536 | 512 | SQ | -2.00% (Δ = 1.155) | -0.0004% (Δ = 0.0002) |
| OpenAI text-embedding-3-small | 1536 | 512 | BQ | -15.00% (Δ = 7.5092) | -0.11% (Δ = 0.0554) |
| OpenAI text-embedding-3-small | 1536 | 768 | SQ | -2.00% (Δ = 0.8128) | -1.60% (Δ = 0.8128) |
| OpenAI text-embedding-3-small | 1536 | 768 | BQ | -10.00% (Δ = 5.0104) | -0.01% (Δ = 0.0044) |
| OpenAI text-embedding-3-large | 3072 | 1024 | SQ | -1.00% (Δ = 0.616) | -0.02% (Δ = 0.0118) |
| OpenAI text-embedding-3-large | 3072 | 1024 | BQ | -7.00% (Δ = 3.9478) | -0.58% (Δ = 0.3184) |
| OpenAI text-embedding-3-large | 3072 | 1536 | SQ | -1.00% (Δ = 0.3184) | -0.08% (Δ = 0.0426) |
| OpenAI text-embedding-3-large | 3072 | 1536 | BQ | -5.00% (Δ = 2.8062) | -0.06% (Δ = 0.0356) |

Table 1.2 compares the relative point difference in average NDCG@10 between the uncompressed index and indexes using different MRL dimensions (1/3 and 1/2 of the original dimensions) across OpenAI text embedding models.

Key Takeaways:

- 99% search quality with BQ + MRL + oversampling: Combining Binary Quantization (BQ) with oversampling and Matryoshka Representation Learning (MRL) maintained 99% of the original retrieval quality across the tested dataset and embedding combinations, even at up to 96x compression (see the configuration sketch above). This makes it ideal for reducing storage costs while preserving high retrieval performance.
- Flexible embedding truncation: Enabling MRL lets you truncate embeddings dynamically, balancing storage efficiency against retrieval quality while minimizing loss of accuracy.
- No observed latency impact: Our experiments also show that using MRL has no noticeable impact on latency, supporting efficient performance even at high compression rates.

For more information about how MRL works and how to implement it, see the MRL support documentation.

Targeted vector filtering

Filters can now be applied specifically to the vector components of a hybrid search query. This granular control allows filters to improve the relevance of vector search results without inadvertently affecting the keyword-based portion of the query (see the query sketch after this section).

Sub-scores

Sub-scores provide detailed scoring information for each recall set that contributes to a final search result. In hybrid search scenarios, where multiple factors such as vector similarity and textual relevance play a role, sub-scores offer transparency into how each component affects the overall ranking.

Text Split skill by tokens

The Text Split skill's by-token functionality improves your ability to process and manage large volumes of text by splitting it based on the number of tokens. This gives you more precise control over passage (chunk) length, enabling more targeted indexing and retrieval, especially for documents with extensive content (a skillset sketch follows below).
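Below is a hedged sketch of a hybrid query that targets a filter at the vector component only, and also requests sub-scores. The `filterOverride` property on the vector query and the `debug` parameter are assumptions based on the preview REST API; names and supported values may differ by API version, and the query assumes a vectorizer is configured on the index so text can be vectorized at query time.

```python
# A sketch of a hybrid query whose filter applies only to the vector leg.
# `filterOverride` and `debug` are assumed preview REST API properties.
import requests

endpoint = "https://<your-service>.search.windows.net"  # placeholder
headers = {"Content-Type": "application/json", "api-key": "<query-key>"}

query = {
    "search": "ergonomic office chair",  # keyword leg of the hybrid query
    "vectorQueries": [
        {
            "kind": "text",              # assumes an index-level vectorizer
            "text": "ergonomic office chair",
            "fields": "embedding",
            "k": 10,
            # Applies only to this vector query; the keyword leg is unaffected.
            "filterOverride": "category eq 'furniture'",
        }
    ],
    "debug": "vector",  # assumed flag for returning per-component sub-scores
    "top": 10,
}

resp = requests.post(
    f"{endpoint}/indexes/docs-mrl-bq/docs/search?api-version=2024-11-01-preview",
    headers=headers,
    json=query,
)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc["@search.score"])
```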
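And here is a hedged sketch of a Text Split skill entry configured to chunk by tokens rather than characters. The `unit` and `azureOpenAITokenizerParameters` properties (and their values) are assumptions based on the preview skillset API and may vary by version.

```python
# A sketch of a Text Split skill definition using token-based chunking.
# `unit` and `azureOpenAITokenizerParameters` are assumed preview properties.
split_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "description": "Chunk documents into ~512-token passages",
    "context": "/document",
    "textSplitMode": "pages",
    "maximumPageLength": 512,     # measured in tokens when a token unit is set
    "pageOverlapLength": 64,      # tokens shared between adjacent chunks
    "unit": "azureOpenAITokens",  # assumed enum value enabling token counting
    "azureOpenAITokenizerParameters": {
        "encoderModelName": "cl100k_base"  # tokenizer used for counting (assumed)
    },
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}
```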
If you have any questions or would like to share feedback, feel free to reach out through the Azure Search community, or get started with Azure AI Search.

References:

[1] Kusupati, A., Bhatt, G., Rege, A., Wallingford, M., Sinha, A., Ramanujan, V., Howard-Snyder, W., Chen, K., Kakade, S., Jain, P., & Farhadi, A. (2024). Matryoshka Representation Learning. arXiv preprint arXiv:2205.13147. https://arxiv.org/abs/2205.13147