
Azure Logic Apps & AI Search for Source Document Processing



Credit: Technical content by Gia Mondragon, Principal PM, Azure AI Search

Introduction

When building Retrieval-Augmented Generation (RAG) applications, the retriever, such as Azure AI Search, plays an important role in ensuring that language models receive the most relevant results to ground their responses to end users. A key requirement is storing data representations, such as vectors, that can be matched semantically to a user's query through vector and hybrid search. Parsing, chunking, vectorizing, and indexing data is handled by the Azure AI Search feature called integrated vectorization. For supported data sources, this feature also enables automated data ingestion, enrichment, and processing.

However, many data sources are not directly integrated with AI Search but can be reached through the connectors available in Azure Logic Apps. Azure Logic Apps has introduced new connectors for unstructured data that cover every step required to process documents. Data extraction, file ingestion, parsing, chunking, vectorization, and indexing into Azure AI Search are now streamlined into one integrated flow. Additionally, Azure Logic Apps now offers templates with predefined indexing workflows for RAG-enabled AI Search indexes, simplifying the creation of these workflows for high-demand connectors. These templates include indexing pipelines for files in SharePoint Online, Azure Files, SFTP, and more.

How to get started

Prerequisites:

  • A data source with an Azure Logic Apps connector that supports unstructured data
  • Azure Logic App (Workflow Service Plan)
  • Azure AI Search service
  • Azure OpenAI service with a text embedding model deployed
  • An Azure Logic Apps built-in template, so you don’t have to create your own workflow. You can also build a workflow from scratch, but that is not covered in this blog post. This tutorial shows how to index files added to an Azure Files share after the workflow is created.

Create an Azure AI Search index

Currently, this integration requires an index created in Azure AI Search with at least the following schema. Later in this article, we explain how to update the workflow to map additional fields to each document chunk.

Azure AI Search Index: Minimum schema required for this integration

Note: The sample index definition below contains a vector field with 3072 dimensions, corresponding to the Azure OpenAI text-embedding-3-large model. If you use a different Azure OpenAI embedding model or different dimensions, adjust the index definition accordingly before creating the index.

{
  "name": "chunked-index",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "searchable": true,
      "retrievable": true,
      "key": true
    },
    {
      "name": "documentName",
      "type": "Edm.String",
      "searchable": true,
      "retrievable": true
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "retrievable": true
    },
    {
      "name": "embeddings",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "dimensions": 3072,
      "vectorSearchProfile": "vector-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [
      {
        "name": "vector-config",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        },
        "exhaustiveKnnParameters": null
      }
    ],
    "profiles": [
      {
        "name": "vector-profile",
        "algorithm": "vector-config"
      }
    ]
  }
}
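
With this minimal schema, the query embedding must be generated on the orchestrator or client side and passed to Azure AI Search at query time. As a rough sketch (not part of the original tutorial), a hybrid keyword-plus-vector query against this index over the REST API could look like the following; the service name, key, API version, and query vector are placeholders, and the vector would normally contain the full 3072 values returned by your embedding model:

POST https://<your-search-service>.search.windows.net/indexes/chunked-index/docs/search?api-version=2024-07-01
api-key: <query-or-admin-key>
Content-Type: application/json

{
  "search": "how do I reset my password?",
  "select": "documentName, content",
  "top": 5,
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [0.0123, -0.0456, 0.0789],
      "fields": "embeddings",
      "k": 5
    }
  ]
}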

Azure AI Search index: vectorization at query time

If you want Azure AI Search to vectorize the query text at query time, rather than doing this on the orchestrator side of your RAG application, use the following JSON definition for the index instead. You must change the Azure OpenAI endpoint to your own. Also, create a system-assigned managed identity for the AI Search service and follow the instructions to assign it the Cognitive Services OpenAI User role on the Azure OpenAI service.

{
  "name": "chunked-index",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "searchable": true,
      "retrievable": true,
      "key": true
    },
    {
      "name": "documentName",
      "type": "Edm.String",
      "searchable": true,
      "retrievable": true
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "retrievable": true
    },
    {
      "name": "embeddings",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "dimensions": 3072,
      "vectorSearchProfile": "vector-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [
      {
        "name": "vector-config",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        },
        "exhaustiveKnnParameters": null
      }
    ],
    "profiles": [
      {
        "name": "vector-profile",
        "algorithm": "vector-config",
        "vectorizer": "azureOpenAI-vectorizer"
      }
    ],
    "vectorizers": [
      {
        "name": "azureOpenAI-vectorizer",
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
          "resourceUri": "https://<your-openai-resource>.openai.azure.com",
          "deploymentId": "text-embedding-3-large",
          "modelName": "text-embedding-3-large"
        }
      }
    ]
  }
}
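
Because this version of the index includes a vectorizer, Azure AI Search can generate the query embedding itself. As a rough sketch (not part of the original tutorial), a query that relies on query-time vectorization could look like this over the REST API; the service name, key, and API version are placeholders:

POST https://<your-search-service>.search.windows.net/indexes/chunked-index/docs/search?api-version=2024-07-01
api-key: <query-or-admin-key>
Content-Type: application/json

{
  "search": "how do I reset my password?",
  "select": "documentName, content",
  "top": 5,
  "vectorQueries": [
    {
      "kind": "text",
      "text": "how do I reset my password?",
      "fields": "embeddings",
      "k": 5
    }
  ]
}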

Create an index from JSON in the Azure portal

Here’s how to create an index in the Azure portal using the above JSON template:

  • Go to your AI Search service, select Search management > Indexes, click + Add index, and choose Add index (JSON) from the drop-down menu.

Figure 1 – Create an Azure AI Search index from JSON using the Azure portal

  • Delete the JSON structure that appears on the right, copy and paste one of the JSON templates above into the canvas, adjusting it as needed, and then click Save.

Figure 2 – Copy the JSON template provided in this tutorial to create the index

  • The index created with the template is named chunked-index; this example uses it as the target index.
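
If you prefer to create the index programmatically instead of through the portal, a rough sketch (not part of the original tutorial) using the Azure AI Search REST API is shown below; the service name, admin key, and API version are placeholders, and the request body is the same JSON index definition used above:

PUT https://<your-search-service>.search.windows.net/indexes/chunked-index?api-version=2024-07-01
api-key: <admin-key>
Content-Type: application/json

{
  "name": "chunked-index",
  "fields": [ ... ],
  "vectorSearch": { ... }
}

Paste the full "fields" and "vectorSearch" sections from one of the JSON templates above into the request body.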

Import data from unstructured data sources using Azure Logic Apps workflow templates

  • Go to your Logic App resource, select Workflows > Workflows, and then click + Add > Add from template.

Figure 3 – Adding a workflow from a template in Azure Logic Apps

  • Search for Azure AI Search in the search bar and select a template that matches your data source. With any Azure Logic Apps connector that supports unstructured data, you should be able to import data into AI Search using the same chunking and embedding pattern, modifying the workflow as needed.
  • In this case, select “Azure Files: Azure OpenAI and Azure AI Search – Collect and index documents on a schedule using the RAG pattern.”

Figure 4 – Select the Azure Files RAG template

  • Choose Use this template.

Figure 5 – Review and select the workflow

  • Enter a name for your workflow, then click Next.

Figure 6 – Naming the workflow

  • Click Connect for each connection in the template configuration and add an existing endpoint for each service (in this case, Azure Files, the Azure AI Search service, and the Azure OpenAI service).

Figure 7 – Connector connection configuration

Examples of each connection configuration follow. Make sure you have at least Contributor access to the resources in order to establish the connections.

For the Azure Files connection: The Azure Storage account URI is located in the storage account under Settings > Endpoints > File service, and its domain ends with .file.core.windows.net. You can find the connection string in the storage account under Security + networking > Access keys > Connection string.

Copy the URI into the Storage account URI field and the connection string into the connection string field.

Figure 8 – Azure Files connection configuration

For the Azure AI Search connection: The Azure AI Search endpoint URL is located in the AI Search service under Overview > Essentials > Url, and its domain ends with .search.windows.net.

For the admin key setting, you can find the admin keys in the AI Search service under Settings > Keys.

Figure 9 – AI Search connection

For the Azure OpenAI connection: The Azure OpenAI endpoint URL is located in the Azure OpenAI service under Resource Management > Keys and Endpoint > Endpoint, and its suffix domain is .openai.azure.com. Copy KEY 1 into the authentication key configuration.

Figure 10 – Azure OpenAI connection

  • After configuring all service connections, click Next.

Figure 11 – Connection configuration completed

Enter the indexing configuration details. The steps below assume:

  • You have already created an index using one of the templates above.
  • Your Azure OpenAI instance has an embedding model deployment named “text-embedding-3-large”.

Indexing workflow configuration details:

  • AI Search index name: the name of the index you created as part of this tutorial.
  • OpenAI text embedding deployment identifier = text-embedding-3-large. This is the name of your Azure OpenAI embedding model deployment, not the model name (in this case they happen to be the same). A quick way to verify the deployment is sketched after Figure 12 below.

Figure 12 – Azure OpenAI embedding model name
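
As a rough sketch (not part of the original tutorial), you can confirm the deployment name and the embedding dimensions by calling the Azure OpenAI embeddings endpoint directly; the resource name, key, and API version are placeholders. For text-embedding-3-large, the embedding array in the response should contain 3072 values, matching the dimensions declared in the index definition:

POST https://<your-openai-resource>.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2024-02-01
api-key: <azure-openai-key>
Content-Type: application/json

{
  "input": "sample text to embed"
}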

  • Azure Files storage folder name: The name of the Azure Files file share where the files are located.

Figure 13 – Azure Files share name

  • Click Next, then Create.

Figure 14 – Review and create workflow details

  • Click Go to my workflow and wait for the initial run to complete. The workflow is triggered on a schedule and checks for new files added to the Azure Files share. After adding new files to the configured file share, verify that those files are reflected in the AI Search index (a quick way to check the document count is sketched after Figure 15 below).
  • As soon as you have the initial vectorized data in your index, you can use the index from this tutorial to chat with your data in your favorite RAG orchestrator, such as Azure AI Studio.
  • To use the Azure AI Search index in Azure AI Studio, go to Project Playground > Chat > Add your data > Add a new data source and follow the instructions to set up the index.
  • Note: If you created the index with the minimal JSON configuration in this tutorial, follow the instructions in the Azure AI Studio documentation exactly. However, if you used the index option that adds a vectorizer, you will need to remove the vector option from your AI Studio configuration because the index is already tied directly to the embedding model.

Figure 15 – Azure AI Studio chat playground “Add your data”
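
As a rough sketch (not part of the original tutorial), a quick way to confirm that the workflow has populated the index is to query the document count through the Azure AI Search REST API; the service name, key, and API version are placeholders:

GET https://<your-search-service>.search.windows.net/indexes/chunked-index/docs/$count?api-version=2024-07-01
api-key: <query-or-admin-key>

The count should grow as the workflow processes and chunks new files from the share.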

Additional considerations

For optimal AI Search relevance, we recommend using a hybrid approach that combines vector and keyword search with the semantic ranker. This method is generally more effective across many use cases. For more information, see Azure AI Search: Outperforming vector search with hybrid retrieval and ranking capabilities.
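
As a rough sketch (not part of the original tutorial), a hybrid query that also invokes the semantic ranker could look like the following. It assumes you have first added a semantic configuration to the index (the name my-semantic-config below is hypothetical and is not included in the templates above) and that you are using the vectorizer-enabled index so the query text can be embedded at query time; with the minimal index, supply a client-generated vector instead, as shown earlier:

POST https://<your-search-service>.search.windows.net/indexes/chunked-index/docs/search?api-version=2024-07-01
api-key: <query-or-admin-key>
Content-Type: application/json

{
  "search": "how do I reset my password?",
  "queryType": "semantic",
  "semanticConfiguration": "my-semantic-config",
  "select": "documentName, content",
  "top": 5,
  "vectorQueries": [
    {
      "kind": "text",
      "text": "how do I reset my password?",
      "fields": "embeddings",
      "k": 5
    }
  ]
}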

This tutorial focuses specifically on fixed-size chunking and text-only scenarios.

Start building RAG applications with low code using Azure AI Search and Azure Logic Apps.




