RAG on PDF with text and embedded Images, with citations referencing image answering user query

In today’s era of Generative AI, customers can gain valuable insights from unstructured or structured data to create business value. By infusing AI into existing or new products, customers can create powerful applications that bring the power of AI to their users. for these people Generating AI Efficiently implement applications for working with customer data piece (Creating Search Augmentation) solutions are key to ensuring that LLM is provided with the right context of data based on user queries.

The customer has a PDF document with text and embedded pictures, which can be images or diagrams containing valuable information that they want to use as context for the LLM to answer specific user queries. Parsing these PDFs to implement an efficient RAG solution is challenging, especially if customers want to maintain the relationship between the text and extracted image context used to answer user queries. Additionally, if the images are not extracted and searchable, referencing them as part of a quote that answers a user query is also difficult. This blog post addresses the problem of extracting PDF content containing text and images as part of a RAG solution. Here, the relationship between the searchable text context and the extracted image is maintained so that the image can be retrieved as a reference within the citation. .

Below we outline a simple architecture for building RAG applications on PDF data. Here, image content extracted within the PDF is also searchable as part of the LLM output as part of the cited reference.

Solution Overview

Azure OpenAI service Provides REST API access to OpenAI’s powerful language models, including GPT-4o, GPT-4o mini, GPT-4 Turbo with Vision, GPT-4, GPT-3.5-Turbo, and the Embeddings model series. These models can be easily applied to specific tasks, including but not limited to content generation, summarization, image understanding, semantic retrieval, and natural language-to-code translation.

Azure AI Search Provides large-scale secure information discovery of user-owned content across existing and generative AI search applications. Information retrieval is fundamental to any app that displays text and vectors. Typical scenarios include catalog or document searches, data exploration, and providing more and more query results in prompts based on proprietary underlying data for conversation and co-pilot discovery.

Azure Blob storage Microsoft’s object storage solution for the cloud. Blob storage is optimized for storing large amounts of unstructured data. Unstructured data is data that does not follow a specific data model or definition, such as text or binary data.

Azure Functions A serverless solution that lets you write less code, maintain less infrastructure, and save money. Instead of worrying about deploying and maintaining servers, cloud infrastructure provides all the up-to-date resources you need to keep your applications running.

The solution leverages Azure OpenAI models for text generation and embedding, Azure AI Search for data-driven information discovery, and storage of raw PDF files and prepared data leveraged by Azure AI Search for efficient data discovery. Utilizes Azure Blob and utilizes Azure Functions. Used as a serverless component to prepare data to populate the Azure AI Search index.

Figure 1: Document data management

The document data management flow works as follows:

Raw PDF document files are uploaded to Azure Blob Storage.
Event triggers in Azure Blob call Azure functions to split large PDFs, extract text chunks, and map images to those text chunks.
After the Azure function prepares the data, it uploads the prepared data back to Azure Blob storage.
The index scheduler is then called to start the indexing process on the prepared data.
Prepared data is searched in Azure Blobs using Azure AI Search.
Azure AI Search vectorizes text using the Azure OAI embedding model to process chunks of text in parallel.
Azure AI Search indexes are populated with prepared data and vectorized chunks. We also use custom index fields to map related images to corresponding chunks of text.

Arch OverviewOYD-v2.png

Figure 2: Application runtime

The application runtime flow works as follows:

A user makes a query request through a client-side application.
The server-side AI chatbot application forwards the user’s query to Azure OAI. Note: This step is an ideal point to implement controls such as safety measures using: Azure AI content safety service.
Based on the user’s query, Azure OAI asks Azure AI Search to search for relevant text and images. Specifically, the responsibility for making requests to Azure AI Search moves from application code to the AIAO service itself.
Using the user’s query and relevant text retrieved from Azure AI Search, AIAO generates a response.
AIAO returns generated responses and associated metadata (e.g. citation data) to the server-side AI chatbot application.
The server-side AI chatbot application remaps the response data to generate a payload containing text and image URLs. This step is another great point to implement additional control before sending the payload back to the client-side application.
The server-side AI chatbot application sends responses to the user’s queries back to the client-side application.
The client-side application displays the generated response text, downloads the image from Azure Blob, and renders it in the user interface.

Note: Steps 9a and 9b are conceptual components of the reference architecture, but are not currently part of the deployable artifact. We welcome your feedback and could potentially expand our implementation to include these steps.

Figure 3: Azure Blob directory and file structure

Directory and file structures serve the following main purposes:

Azure feature: Retrieves raw PDF files and re-uploads prepared data. Event triggers are configured to receive events from: raw data directory.
Azure AI Search: Download prepared data to populate your index. Azure AI Search data sources are configured to retrieve data from: prepared_data directory.

Deployment and implementation details

In this section, we’ll take a closer look at the solution’s prerequisites and specific features. Let’s start with implementation and then outline the deployment process.

Implementation details

Before Azure AI Search can index raw PDF documents, they must first be preprocessed by Azure Functions. This function is configured to listen to Azure Blob Storage events and is triggered whenever a new document is uploaded. This function does the following:

Split a single PDF into chunks of text.: This involves breaking the PDF document into smaller chunks of text.
Create a JSON file.: Text chunks are organized into JSON files, which are then uploaded back to Azure Blob Storage. Each element in the JSON array represents a chunk of text.
Image extraction and mapping: Images from PDF are extracted and mapped to corresponding text chunks. In particular, images on a particular PDF page are linked to chunks of text on the same page.

Once the data is ready, the Azure AI Search indexer is activated to handle the actual ingestion and index population. During this process, Azure AI Search technology is used to map data to fields defined in the index. Once indexing is complete, the collected data is mapped to specified fields and ready to run queries.

deployment

After you have a comprehensive understanding of how this solution leverages Azure Functions and Azure AI Search, the next step is to deploy the solution and explore a demo application. This allows you to see the actual implementation and understand the real-world applications.

To deploy your solution, see: GitHub repositoryFollow the steps provided and complete the following sections. prerequisites and deployment.

Extend your deployment using your own documents

The provided repository contains small snippets of Azure AKS documentation, giving you a glimpse into the end-user experience. However, you may want to try it with your own PDF document. section Extend your distribution with your own articles It was created with this in mind. This provides a quick and simple way to integrate documents into your solution and start querying using an already working demo application.

By following these steps, you can use Azure AI Search to efficiently manage and query PDF documents to ensure a smooth and effective search experience.

organize

After testing your solution, you can delete the deployment to clean up all Azure resources created. To do this, follow these steps: cleanup part time job.

conclusion

This post demonstrated how to build a Retrieval-Augmented Generation (RAG) application with your own data using Azure OpenAI and Azure AI Search. By offloading AI search communications to Azure OpenAI, this solution not only enhances text-based queries, but also provides a powerful way to identify and retrieve relevant images based on user queries. This feature enriches query responses with relevant visual content whenever possible.

If you have feedback, questions, or suggestions to help us improve this solution, please submit them through our GitHub repository. We welcome all comments and look forward to hearing from you.

Source link

Solution Overview

Deployment and implementation details

Implementation details

deployment

Extend your deployment using your own documents

organize

conclusion

Our Company

About Links

Useful Links

Newsletter

Laest News

RAG on PDF with text and embedded Images, with citations referencing image answering user query

Solution Overview

Deployment and implementation details

Implementation details

deployment

Extend your deployment using your own documents

organize

conclusion

Using Structured Outputs in Azure OpenAI’s GPT-4o for consistent document data processing

Microsoft SharePoint Roadmap Pitstop September 2024 Microsoft 365

You may also like

Leave a Comment Cancel Reply

Our Company

About Links

Useful Links

Newsletter

Laest News