Using Structured Outputs in Azure OpenAI’s GPT-4o for consistent document data processing

When using language models for AI-based document processing, ensuring the reliability and consistency of data extraction is critical for downstream processing.

This article outlines how GPT-4o’s structured output capabilities provide the most reliable and cost-effective solution to these challenges.

To get started with structured output for document processing right away, try out the Python samples on GitHub.

Key challenges to consistency in generating structured output

ISVs and startups building document data extraction solutions struggle to ensure that language models produce consistent output in line with a defined schema. The key challenges include:

Limitations of inline JSON output. Although some models have introduced the ability to generate JSON output, there are still inconsistencies in the output. The language model may produce responses that do not exactly follow the provided schema. Additional prompt engineering is required to resolve this.
Complexity of prompts. Including a detailed inline JSON schema within the prompt increases the number of input tokens consumed. This is especially problematic when you have large and complex output structures.
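To illustrate the extra work that inline JSON output can require, here is a minimal sketch of the kind of manual check a pipeline might run on a raw model response. The field names in `REQUIRED_FIELDS` and the `validate_inline_json` helper are hypothetical and exist only for this illustration.

```python
import json

# Hypothetical expected fields for an invoice (an assumption for this sketch).
REQUIRED_FIELDS = {
    "invoice_number": str,
    "customer_name": str,
    "total": (int, float),
}


def validate_inline_json(raw: str) -> list[str]:
    """Return the problems found in a response that is supposed to be JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif data[name] is not None and not isinstance(data[name], expected):
            problems.append(f"wrong type for field: {name}")
    # Report keys the schema does not define, such as renamed properties.
    for name in sorted(data.keys() - REQUIRED_FIELDS.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


# A response that drifts from the schema: renamed key and a stringified number.
drifted = '{"invoiceNumber": "A-1", "customer_name": "Contoso", "total": "12.5"}'
issues = validate_inline_json(drifted)
```

Every check of this kind is code that has to be written, maintained, and paired with retry logic when a response fails.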

Benefits of using the structured output feature in GPT-4o in Azure OpenAI

To overcome the limitations and inconsistencies of inline JSON output, GPT-4o’s structured output supports the following features:

Strict schema compliance. Structured output dynamically limits the model’s output to conform to the JSON schema provided in the response format of the request to GPT-4o. This ensures that the response is always formatted correctly for downstream processing.
Reliability and consistency. Developers can use additional libraries such as Pydantic combined with structured output to define exactly how to constrain data to specific models. This minimizes post-processing and data validation.
Cost optimization. Unlike an inline JSON schema, the schema supplied through structured output is not counted toward the total number of input tokens used in a request to GPT-4o. This leaves more input tokens available for the document data.

Let’s take a closer look at using structured output with document processing.

Understand structured output from document processing

Introduced in September 2024, the structured output feature for Azure OpenAI GPT-4o models uses class models and JSON schemas to give requests the flexibility they need to produce consistent output.

For document processing, this allows for a more efficient approach to structured data extraction and document classification. This is especially useful when building document processing pipelines.

GPT-4o uses the JSON schema to constrain the generated output to a JSON structure that is the same for every request. You can then deserialize these JSON structures into model objects that other services or systems can process directly. This eliminates errors that often occur when the language model misinterprets an inline JSON structure.
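As a rough sketch of what this looks like without any helper library, the snippet below builds a `response_format` value by hand and deserializes a stand-in response into a typed object. The `InvoiceHeader` class, the schema name, and the sample content are assumptions made for illustration. The schema follows the usual strict-mode conventions: every property is listed in `required`, and `additionalProperties` is false.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceHeader:
    invoice_number: Optional[str]
    customer_name: Optional[str]


# Hand-written schema for the response format of a request (illustrative only).
# Nullable values are expressed as a union with "null".
invoice_header_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "invoice_header",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": ["string", "null"]},
                "customer_name": {"type": ["string", "null"]},
            },
            "required": ["invoice_number", "customer_name"],
            "additionalProperties": False,
        },
    },
}


def to_invoice_header(content: str) -> InvoiceHeader:
    """Deserialize the JSON text of a response into a typed object."""
    return InvoiceHeader(**json.loads(content))


# Stand-in for the message content a schema-constrained request might return.
sample_content = '{"invoice_number": "INV-001", "customer_name": null}'
header = to_invoice_header(sample_content)
```

Because the response is constrained to the schema, the deserialization step needs no defensive handling for missing or renamed keys.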

Implement consistent output using GPT-4o in Python

To simplify schema creation in Python, Pydantic is an ideal supporting library for building the class models that define the desired output structure. Pydantic generates the JSON schema required for requests and provides built-in data validation.
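The following sketch shows both Pydantic capabilities on a small model, assuming Pydantic v2 (`model_json_schema` and `model_validate_json` are v2 method names). The `LineItem` class is a hypothetical example, not part of the invoice sample.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    description: Optional[str]
    quantity: Optional[float]


# Generate the JSON schema from the class definition.
# Fields without defaults are listed as required even though they accept null.
schema = LineItem.model_json_schema()

# Validate and deserialize JSON text into a model instance.
item = LineItem.model_validate_json('{"description": "Widget", "quantity": 2}')

# JSON that omits a required field is rejected with a ValidationError.
error_count = 0
try:
    LineItem.model_validate_json('{"description": "Widget"}')
except ValidationError as exc:
    error_count = exc.error_count()
```

Declaring fields as `Optional[...]` without a default keeps them required in the schema while still allowing the model to return null when a value is absent from the document.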

Below is an example of extracting data from an invoice using structured output to demonstrate the power of complex class structures.

from typing import Optional
from pydantic import BaseModel


class InvoiceSignature(BaseModel):
    type: Optional[str]
    name: Optional[str]
    is_signed: Optional[bool]


class InvoiceProduct(BaseModel):
    id: Optional[str]
    description: Optional[str]
    unit_price: Optional[float]
    quantity: Optional[float]
    total: Optional[float]
    reason: Optional[str]


class Invoice(BaseModel):
    invoice_number: Optional[str]
    purchase_order_number: Optional[str]
    customer_name: Optional[str]
    customer_address: Optional[str]
    delivery_date: Optional[str]
    payable_by: Optional[str]
    products: Optional[list[InvoiceProduct]]
    returns: Optional[list[InvoiceProduct]]
    total_product_quantity: Optional[float]
    total_product_price: Optional[float]
    product_signatures: Optional[list[InvoiceSignature]]
    returns_signatures: Optional[list[InvoiceSignature]]

Once you have a well-defined model, making a request to the Azure OpenAI chat completions endpoint is as simple as providing the model as the request's response format. This is shown below in a request to extract data from an invoice.

completion = openai_client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant that extracts data from documents.",
        },
        {
            "role": "user",
            "content": """Extract the data from this invoice.
            - If a value is not present, provide null.
            - Dates should be in the format YYYY-MM-DD.""",
        },
        {
            "role": "user",
            "content": document_markdown_content,
        }
    ],
    response_format=Invoice,
    max_tokens=4096,
    temperature=0.1,
    top_p=0.1
)
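Once the request returns, the parsed model is available on the first choice's message. The helper below is a minimal sketch of reading it defensively. `read_invoice` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins that mimic the shape of a completion so the snippet runs without calling the service.

```python
from types import SimpleNamespace


def read_invoice(completion):
    """Return the parsed model from a parse() completion, or raise if absent."""
    message = completion.choices[0].message
    # The model can decline a request; in that case no parsed output exists.
    if getattr(message, "refusal", None):
        raise ValueError(f"Model refused the request: {message.refusal}")
    if message.parsed is None:
        raise ValueError("No parsed output; the response may have been truncated.")
    return message.parsed


# Stand-in completion for illustration; a real one comes from the request above.
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                parsed={"invoice_number": "INV-001"}, refusal=None
            )
        )
    ]
)
invoice = read_invoice(fake_completion)
```

With a real completion, `invoice` would be an instance of the `Invoice` class, so downstream code can use attributes such as `invoice.invoice_number` directly.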

Best practices for leveraging structured output for document data processing

Schema/model design. Use well-defined names for nested objects and properties so that the GPT-4o model can more easily interpret how to extract this key information from the document. Be specific in your terminology so that the model determines the correct value for each field.
Take advantage of prompt engineering. Continue to use input prompts to give the model direct instructions on how to work with the provided document. For example, include definitions for domain jargon, acronyms, and synonyms that may appear in the document type.
Use a library that generates JSON schemas. Libraries like Pydantic for Python let you focus on building models and validating data without the complexity of converting to or building a JSON schema from scratch.
Combine with GPT-4o's vision capabilities. Processing document pages as images in requests to GPT-4o with structured output can achieve higher accuracy and cost-effectiveness than processing document text alone.
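For the vision recommendation, a page image is typically passed as a base64 data URL inside a user message. The helper below is a minimal sketch; `image_message` is a hypothetical name, and rendering document pages to image bytes is assumed to happen elsewhere.

```python
import base64


def image_message(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message that carries one page image as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            }
        ],
    }


# Placeholder bytes stand in for a rendered page image.
page_message = image_message(b"abc")
```

A message built this way can take the place of the text-only document message in the request, with the response format left unchanged.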

Summary

Leveraging structured output in GPT-4o on Azure OpenAI provides the solution you need to ensure consistent and reliable output when processing documents. This feature minimizes the potential for errors, reduces post-processing requirements, and optimizes token usage by conforming to the JSON schema.

Key recommendations to take away from this guidance include:

Evaluate structured output for your use case. We have provided a collection of samples on GitHub to guide you through potential scenarios, including extraction and classification. To evaluate the effectiveness of the technique, modify these samples to fit the needs of your specific document types. Get the samples from GitHub.

Exploring this approach can help you further streamline your document processing workflow, increasing developer productivity and end-user satisfaction.

Learn more about document processing with Azure AI.

Thank you for taking the time to read this article. We're sharing our insights for ISVs and startups that are enabling document processing in their AI-based solutions, based on real-world challenges we encounter. We encourage you to continue learning with the additional insights in this series.
