Using Structured Outputs in Azure OpenAI’s GPT-4o for consistent document data processing

by info.odysseyx@gmail.com, October 3, 2024

When using language models for AI-powered document processing, reliable and consistent data extraction is critical for downstream processing. This article outlines how the structured output capability of GPT-4o provides the most reliable and cost-effective solution to this challenge.

To get started with structured output for document processing right away, try out the Python samples on GitHub.

Key challenges to consistency in generating structured output

ISVs and startups building document data extraction solutions struggle with the complexity of ensuring that language models produce consistent output that follows a defined schema. The key challenges include:

- Limitations of inline JSON output. Although some models can generate JSON output, that output is still inconsistent: the model may produce responses that do not exactly follow the provided schema, and additional prompt engineering is required to resolve this.
- Complexity of prompts. Including a detailed inline JSON schema in the prompt increases the number of input tokens consumed. This is especially problematic for large, complex output structures.

Benefits of using the structured output feature in GPT-4o in Azure OpenAI

To overcome the limitations and inconsistencies of inline JSON output, structured output in GPT-4o provides the following:

- Strict schema compliance. Structured output dynamically constrains the model’s output to the JSON schema provided in the response format of the request. This ensures that the response is always correctly formatted for downstream processing.
- Reliability and consistency. Developers can combine structured output with libraries such as Pydantic to define exactly how the data should be constrained to a specific model. This minimizes post-processing and data validation.
- Cost optimization. Unlike an inline JSON schema, the structured output schema does not count toward the total input tokens consumed by a request to GPT-4o. This leaves more of the input token budget for the document data.

Let’s take a closer look at using structured output for document processing.

Understanding structured output for document processing

Introduced in September 2024, the structured output feature for Azure OpenAI GPT-4o models uses class models and JSON schemas to give requests the flexibility they need to produce consistent output. For document processing, this enables a more efficient approach to structured data extraction and document classification, which is especially useful when building document processing pipelines.

GPT-4o uses the JSON schema format to constrain the generated output to a JSON structure that matches the schema supplied with each request. These JSON structures can then be deserialized into model objects that other services or systems can easily process. This avoids the errors that often occur when the language model misinterprets an inline JSON structure.

Implementing consistent output using GPT-4o in Python

To simplify schema creation in Python, Pydantic is an ideal supporting library for building class models that define the desired output structure. Pydantic generates the JSON schema required for the request and provides built-in data validation.

The invoice data extraction example in this section demonstrates how structured output handles complex, nested class structures.
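Before the invoice example, the schema generation and deserialization steps described above can be seen in isolation with a minimal sketch that runs without calling Azure OpenAI. It assumes Pydantic v2 (`model_json_schema` and `model_validate_json`); the `LineItem` model and the sample JSON string are hypothetical and only stand in for a real model response.

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


# Hypothetical, simplified model used only for illustration.
class LineItem(BaseModel):
    description: Optional[str]
    quantity: Optional[float]
    unit_price: Optional[float]


# Pydantic derives the JSON schema from the class definition.
schema = LineItem.model_json_schema()
print(json.dumps(schema, indent=2))

# Made-up JSON shaped like a structured output response.
response_json = '{"description": "Widget", "quantity": 2, "unit_price": 9.5}'

# Deserialize and validate in one step.
item = LineItem.model_validate_json(response_json)
print(item.quantity * item.unit_price)  # 19.0

# JSON that does not match the schema (missing fields) is rejected.
try:
    LineItem.model_validate_json('{"description": "Widget"}')
except ValidationError as exc:
    print(f"{exc.error_count()} validation errors")
```

Because the fields are declared as `Optional[...]` without defaults, Pydantic marks them as required but nullable. This pairs naturally with an instruction such as "If a value is not present, provide null", which the invoice request in this section uses.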
The following class model defines the invoice structure to extract:

```python
from typing import Optional

from pydantic import BaseModel


class InvoiceSignature(BaseModel):
    type: Optional[str]
    name: Optional[str]
    is_signed: Optional[bool]


class InvoiceProduct(BaseModel):
    id: Optional[str]
    description: Optional[str]
    unit_price: Optional[float]
    quantity: Optional[float]
    total: Optional[float]
    reason: Optional[str]


class Invoice(BaseModel):
    invoice_number: Optional[str]
    purchase_order_number: Optional[str]
    customer_name: Optional[str]
    customer_address: Optional[str]
    delivery_date: Optional[str]
    payable_by: Optional[str]
    products: Optional[list[InvoiceProduct]]
    returns: Optional[list[InvoiceProduct]]
    total_product_quantity: Optional[float]
    total_product_price: Optional[float]
    product_signatures: Optional[list[InvoiceSignature]]
    returns_signatures: Optional[list[InvoiceSignature]]
```

Once you have a well-defined model, making a request to the Azure OpenAI chat completions endpoint is as simple as passing the model as the response_format of the request. The following request extracts data from an invoice:

```python
import os

from openai import AzureOpenAI

# Assumed environment variable names; use your own resource's endpoint and key.
openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-08-01-preview",  # structured output needs this API version or later
)

# document_markdown_content holds the document's text content (for example, Markdown
# produced by a document analysis step); how it is obtained is not covered here.
completion = openai_client.beta.chat.completions.parse(
    model="gpt-4o",  # in Azure OpenAI, this is the name of your GPT-4o deployment
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant that extracts data from documents.",
        },
        {
            "role": "user",
            "content": (
                "Extract the data from this invoice.\n"
                "- If a value is not present, provide null.\n"
                "- Dates should be in the format YYYY-MM-DD."
            ),
        },
        {
            "role": "user",
            "content": document_markdown_content,
        },
    ],
    response_format=Invoice,  # the Pydantic model defines the output schema
    max_tokens=4096,
    temperature=0.1,
    top_p=0.1,
)

# The SDK deserializes the JSON response into an Invoice instance.
# parsed is None if the model refused; message.refusal then explains why.
invoice = completion.choices[0].message.parsed
```

Best practices for leveraging structured output for document data processing

- Schema/model design. Use well-defined names for nested objects and properties so the GPT-4o model can more easily interpret how to extract key information from the document. Be specific with your terminology so the model determines the correct value for each field.
- Take advantage of prompt engineering. Continue to use input prompts to give the model direct instructions on how to work with the provided document, for example, by including definitions for domain jargon, acronyms, and synonyms that may appear in the document type.
- Use a library that generates JSON schemas. Libraries such as Pydantic for Python let you focus on building models and validating data without having to understand how to convert or build a JSON schema from scratch.
- Combine with GPT-4o vision capabilities. Processing document pages as images in structured output requests to GPT-4o can achieve higher accuracy and cost-effectiveness than processing document text alone.

Summary

Leveraging structured output in GPT-4o on Azure OpenAI provides the solution you need to ensure consistent and reliable output when processing documents. By conforming output to a JSON schema, this feature minimizes the potential for errors, reduces post-processing requirements, and optimizes token usage.

Key recommendations to take away from this guidance include:

- Evaluate structured output for your use case. We have provided a collection of samples on GitHub to guide you through potential scenarios, including extraction and classification. To evaluate the effectiveness of the technique, modify these samples to fit the needs of your specific document types. Get samples from GitHub.

Exploring this approach can help you further streamline your document processing workflow, increasing developer productivity and end-user satisfaction.

Learn more about document processing with Azure AI.

Thank you for taking the time to read this article. We’re sharing insights for ISVs and startups enabling document processing in their AI-powered solutions, based on real-world problems we encounter. We encourage you to continue learning with additional insights in this series.