
Nearly 100x Compression with Minimal Quality Loss

by info.odysseyx@gmail.com


As part of our ongoing effort to provide developers and organizations with advanced search tools, we're announcing the latest preview API for Azure AI Search. These enhancements are designed to optimize vector index size and give you more granular control over, and insight into, the search indexes that power Retrieval Augmented Generation (RAG) applications.

Matryoshka Representation Learning (MRL) is a new technique that introduces another form of vector compression; it complements existing quantization methods and operates independently of them. MRL provides the flexibility to truncate embeddings with minimal semantic loss, striking a balance between vector size and information preservation.

The technique works by training the embedding model so that information density is concentrated toward the beginning of the vector. As a result, most of the key information is preserved even when only a prefix of the original vector is used, allowing shorter vector representations with little performance penalty.

OpenAI has integrated MRL into its 'text-embedding-3-small' and 'text-embedding-3-large' models, adapting them for scenarios that require compressed embeddings while maintaining high retrieval accuracy. You can read more about the underlying research in the original paper [1], or learn about the latest OpenAI embedding models on the OpenAI blog.
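Truncating an MRL-trained embedding amounts to keeping a prefix of the vector and re-normalizing it. The following is a minimal NumPy sketch of that idea (not the service's implementation; the OpenAI embeddings API also exposes a `dimensions` parameter that can return the shortened vector directly):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep only the first `dims` components of an MRL-trained embedding
    and re-normalize to unit length so cosine/dot-product similarity
    remains meaningful."""
    head = np.asarray(vec, dtype=np.float32)[:dims]
    return head / np.linalg.norm(head)

# Stand-in for a 3072-dim text-embedding-3-large vector.
full = np.random.default_rng(0).standard_normal(3072).astype(np.float32)
short = truncate_embedding(full, 1024)
print(short.shape)  # (1024,)
```

Because MRL concentrates information at the start of the vector, the 1024-dim prefix retains most of the semantics of the full 3072-dim embedding.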

Storage Compression Comparison

Table 1.1 below highlights different configurations for vector compression, comparing standard uncompressed vectors, Scalar Quantization (SQ), and Binary Quantization (BQ), each with and without MRL. The compression ratio shows how efficiently the vector index size can be reduced to achieve significant cost savings. More information about vector index size limits can be found in "Service limits for tiers and SKUs – Azure AI Search" on Microsoft Learn.

Table 1.1: Vector index size compression comparison

Configuration                                   | Compression ratio*
Uncompressed                                    | 1x
SQ                                              | 4x
BQ                                              | 28x
MRL + SQ (1/2 and 1/3 truncated dimensions)**   | 8x–12x
MRL + BQ (1/2 and 1/3 truncated dimensions)**   | 64x–96x

Note: the compression ratio depends on the embedding size and the truncation. For example, with 'text-embedding-3-large' truncated from 3072 to 1024 dimensions, binary quantization yields 96x compression.
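The ratios in Table 1.1 follow from simple arithmetic: uncompressed vectors store 32 bits (float32) per component, while quantization reduces the bits per component and MRL reduces the number of components. A quick sketch of that calculation:

```python
def compression_ratio(original_dims, mrl_dims, bits_per_component):
    """Ratio of uncompressed float32 storage (32 bits/component) to
    compressed storage after MRL truncation plus quantization."""
    uncompressed_bits = original_dims * 32
    compressed_bits = mrl_dims * bits_per_component
    return uncompressed_bits / compressed_bits

# text-embedding-3-large (3072 dims) truncated to 1024 dims (1/3):
print(compression_ratio(3072, 1024, 1))  # BQ, 1 bit/component:  96.0
print(compression_ratio(3072, 1024, 8))  # SQ, 8 bits/component: 12.0
```

These are theoretical ceilings; as the footnotes below note, index data structures add overhead, so observed ratios can be slightly lower.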

*All compression methods listed above may yield slightly lower compression ratios in practice due to overhead from the index data structure. See "Memory overhead of the selected algorithm" for more information.

**The compression achieved with MRL depends on the truncation dimension. To maintain embedding quality, we recommend truncating to 1/2 or 1/3 of the original size (see below).

Quality Retention

Table 1.2 provides a detailed look at quality retention when using MRL with quantization across different models and configurations. The results show the impact on average NDCG@10 across subsets of the MTEB dataset, demonstrating that even at high compression levels retrieval quality can be maintained at up to 99%, especially for BQ with MRL.

Table 1.2: Impact of MRL on average NDCG@10 across MTEB subsets

Model name                    | Original dims | MRL dims | Quantization | No reranking (%Δ)     | 2x oversampling reranking (%Δ)
OpenAI text-embedding-3-small | 1536          | 512      | SQ           | -2.00% (Δ = 1.155)    | -0.0004% (Δ = 0.0002)
OpenAI text-embedding-3-small | 1536          | 512      | BQ           | -15.00% (Δ = 7.5092)  | -0.11% (Δ = 0.0554)
OpenAI text-embedding-3-small | 1536          | 768      | SQ           | -2.00% (Δ = 0.8128)   | -1.60% (Δ = 0.8128)
OpenAI text-embedding-3-small | 1536          | 768      | BQ           | -10.00% (Δ = 5.0104)  | -0.01% (Δ = 0.0044)
OpenAI text-embedding-3-large | 3072          | 1024     | SQ           | -1.00% (Δ = 0.616)    | -0.02% (Δ = 0.0118)
OpenAI text-embedding-3-large | 3072          | 1024     | BQ           | -7.00% (Δ = 3.9478)   | -0.58% (Δ = 0.3184)
OpenAI text-embedding-3-large | 3072          | 1536     | SQ           | -1.00% (Δ = 0.3184)   | -0.08% (Δ = 0.0426)
OpenAI text-embedding-3-large | 3072          | 1536     | BQ           | -5.00% (Δ = 2.8062)   | -0.06% (Δ = 0.0356)

Table 1.2 reports the relative point difference in average NDCG@10 against the uncompressed index when using MRL dimensions of 1/3 and 1/2 of the original dimension, across the OpenAI text embedding models.
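For reference, NDCG@10 (the metric reported above) rewards placing the most relevant documents near the top of the first ten results. A minimal sketch of the standard definition:

```python
import math

def ndcg_at_10(relevances):
    """NDCG@10 for a ranked list of graded relevance labels:
    DCG of the ranking divided by DCG of the ideal ordering."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:10]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_10([3, 2, 1, 0]))        # perfect ordering: 1.0
print(ndcg_at_10([2, 3, 1, 0]) < 1.0)  # one swap lowers the score: True
```

The "%Δ" columns in Table 1.2 are the drop in this score relative to the uncompressed index, averaged over the MTEB subsets.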

Key Takeaways:

  • 99% search quality with BQ + MRL + oversampling: Combining Binary Quantization (BQ) with Matryoshka Representation Learning (MRL) and oversampling maintains up to 99% of the original retrieval quality across the tested dataset and embedding combinations, even at up to 96x compression. It is ideal for reducing storage while preserving retrieval performance.
  • Flexible embedding truncation: Enabling MRL lets you truncate embeddings dynamically, balancing storage efficiency against retrieval quality while minimizing loss of accuracy.
  • No observed latency impact: Our experiments also show that using MRL has no noticeable impact on latency, supporting efficient performance even at high compression rates.
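The oversampling-with-reranking strategy from the first takeaway can be sketched as follows: retrieve extra candidates with cheap binary (Hamming) scoring, then rescore just those candidates with full-precision vectors. This is an illustrative NumPy sketch of the general technique, not Azure AI Search's internal implementation:

```python
import numpy as np

def search_with_oversampling(query, binary_index, full_vectors, k=10, oversample=2):
    """Retrieve k*oversample candidates by Hamming distance on 1-bit
    vectors, then rerank the candidates with full-precision dot products."""
    q_bits = query > 0                                   # binarize the query
    hamming = np.count_nonzero(binary_index != q_bits, axis=1)
    candidates = np.argsort(hamming)[: k * oversample]   # cheap first pass
    exact = full_vectors[candidates] @ query             # precise rescoring
    return candidates[np.argsort(-exact)][:k]

rng = np.random.default_rng(1)
docs = rng.standard_normal((100, 64)).astype(np.float32)
index = docs > 0                                         # 1 bit per component
query = rng.standard_normal(64).astype(np.float32)
top = search_with_oversampling(query, index, docs, k=5)
print(len(top))  # 5
```

The second, exact pass over a small candidate set is what recovers most of the quality lost to 1-bit scoring, which is why the "2x oversampling reranking" column in Table 1.2 shows much smaller deltas.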

For more information about how MRL works and how to implement it, see the MRL support documentation.

Targeted vector filtering lets filters be applied specifically to the vector components of a hybrid search query. This granular control allows filters to improve the relevance of vector search results without inadvertently influencing the keyword-based portion of the search.
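As a rough illustration of what such a query might look like, here is a hypothetical hybrid query body expressed as a Python dict (the field names `contentVector` and `category` are illustrative, and the exact preview request shape may differ):

```python
# Hypothetical hybrid search body: the OData `filter` is scoped to the
# vector leg via a filter-mode setting, leaving the keyword leg untouched.
query_body = {
    "search": "resilient distributed training",   # keyword leg
    "filter": "category eq 'whitepaper'",         # OData filter expression
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.01, -0.02, 0.03],        # query embedding (shortened for brevity)
            "fields": "contentVector",            # illustrative field name
            "k": 10,
        }
    ],
    "vectorFilterMode": "preFilter",              # apply filter before vector scoring
}
print(query_body["vectorFilterMode"])  # preFilter
```

A pre-filter restricts the vector search space before scoring; a post-filter would instead trim results after scoring, which can return fewer than k hits.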

Subscores provide detailed scoring information for each recall set that contributes to the final search result. In hybrid search scenarios, where multiple factors such as vector similarity and textual relevance play a role, subscores offer transparency into how each component affects the overall ranking.

The Text Split skill's by-token functionality improves your ability to process and manage large amounts of text by splitting it based on token counts. This gives you more precise control over passage (chunk) length, enabling more targeted indexing and searching, especially for documents with extensive content.
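The idea behind token-based chunking can be sketched in a few lines. This toy version splits on whitespace rather than using a real tokenizer (which the service uses), and adds a small overlap between chunks so context isn't cut at the boundary:

```python
def split_by_tokens(text, max_tokens=50, overlap=10):
    """Split text into chunks of at most `max_tokens` tokens, with
    `overlap` tokens repeated between consecutive chunks.
    Whitespace tokenization only; a real pipeline would use the
    embedding model's tokenizer."""
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
        start += max_tokens - overlap
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
chunks = split_by_tokens(doc, max_tokens=50, overlap=10)
print(len(chunks))  # 3
```

Sizing chunks by tokens rather than characters keeps each passage within the embedding model's context window regardless of word length.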

If you have any questions or would like to share your feedback, please reach out on the Azure Search community.

Get started with Azure AI Search

References:
[1] Kusupati, A., Bhatt, G., Rege, A., Wallingford, M., Sinha, A., Ramanujan, V., Howard-Snyder, W., Chen, K., Kakade, S., Jain, P., & Farhadi, A. (2024). Matryoshka Representation Learning. arXiv preprint arXiv:2205.13147. https://arxiv.org/abs/2205.13147




