
RAFT Fine-Tuning with Azure OpenAI



Customization is key!

One of the most impactful applications of generative AI for business is creating customized natural language interfaces that use domain- and use-case-specific data to provide better, more accurate responses. That means answering questions about specific domains, such as banking, law, and healthcare.

We often talk about two ways to achieve this.

  1. Retrieval-Augmented Generation (RAG): Store the documents in a vector database, retrieve the most relevant ones at query time based on their semantic similarity to the question, and pass them to the LLM as context (a minimal sketch follows this list).
  2. Supervised Fine-Tuning (SFT): Train a base model on a set of prompts and responses that capture the domain-specific knowledge.
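
To make the RAG loop from item 1 concrete, here is a minimal sketch using the Azure OpenAI Python SDK for both embeddings and chat completion. The endpoint, key, deployment names (`text-embedding-3-small`, `gpt-4o`), and the tiny in-memory document list are illustrative assumptions, not part of the workshop code.

```python
import numpy as np
from openai import AzureOpenAI

# Assumed Azure OpenAI setup; endpoint, key, and deployment names are placeholders.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-06-01",
)

documents = [
    "Online banking lets you view balances and pay bills 24/7.",
    "A certificate of deposit locks funds for a fixed term at a fixed rate.",
]

def embed(texts):
    """Embed a list of texts with an Azure OpenAI embedding deployment."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def answer(question, k=1):
    """Retrieve the top-k most similar documents and use them as context for the LLM."""
    q_vec = embed([question])[0]
    scores = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n".join(documents[i] for i in scores.argsort()[::-1][:k])
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content

print(answer("What does online banking let me do?"))
```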

Most organizations experimenting with RAG aim to extend an LLM's knowledge with their internal knowledge base, but many fail to achieve the expected results without significant optimization. Likewise, curating a sufficiently large and high-quality dataset for fine-tuning is challenging. Both approaches have limitations: fine-tuning confines the model to the data it was trained on, leaving it susceptible to approximation and hallucination, while RAG grounds the model in retrieved documents but selects them purely by semantic proximity to the query, which can surface irrelevant documents and lead to poorly supported answers.

Ride the RAFT to the rescue!


Instead of choosing between RAG and fine-tuning, we can combine them! Think of RAG as an open-book exam: the model looks up relevant documents and generates an answer from them. Fine-tuning is like a closed-book exam: the model relies on the knowledge it memorized during training. As with real exams, the best results come from both studying and keeping your notes close at hand.


Retrieval Augmented Fine-Tuning (RAFT) is a powerful technique for preparing fine-tuning data for domain-specific, open-book settings such as in-domain RAG. It is a game changer for language models, combining the best parts of RAG and fine-tuning: it improves the model's ability to understand and use domain-specific knowledge, tailoring it to a specific domain. It sits at the sweet spot between RAG and domain-specific SFT.

How does it work?

RAFT has three stages:

  1. Prepare a dataset that teaches the model how to answer questions about the domain.
  2. Fine-tune the model on the prepared dataset.
  3. Evaluate the quality of the new, domain-adapted model.

The core of RAFT is the generation of training data, where each data point contains a question (Q), a set of documents (Dk), and a chain-of-thought-style answer (A).

Documents are divided into oracle documents (Do), which contain the answer, and distractor documents (Di), which do not. Fine-tuning on this mix teaches the model to distinguish between the two, producing a custom model that outperforms the original model with RAG or fine-tuning alone.
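
As a rough illustration of what a single RAFT training record can look like, here is one example in Python. The field names are our own choice for this sketch, not a fixed schema from the workshop.

```python
# Illustrative RAFT training record; field names are an assumption, not a fixed schema.
raft_example = {
    "question": "What is the minimum opening deposit for the everyday savings account?",
    "oracle_documents": [
        "The everyday savings account can be opened with a minimum deposit of $25."
    ],
    "distractor_documents": [
        "Wire transfers submitted after 5 p.m. are processed on the next business day.",
        "The mobile app supports fingerprint and face-recognition sign-in.",
    ],
    # Chain-of-thought-style answer that quotes the oracle document before concluding.
    "cot_answer": (
        "The context states: 'The everyday savings account can be opened with a "
        "minimum deposit of $25.' Therefore, the minimum opening deposit is $25."
    ),
}
```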

We use GPT-4o to generate the training data and fine-tune GPT-4o mini to create a cost-effective, fast model for our use case. This technique, called distillation, uses GPT-4o as the teacher model and GPT-4o mini as the student model.

In the next section of this blog, we’ll try it out for ourselves. If you’d like to follow along or see the reference code, check out: https://aka.ms/aoai-raft-workshop. We will build domain-specific models tailored to banking use cases, capable of answering questions about a bank’s online tools and accounts.

Notebook 1 – Generating RAFT training data

We start by collecting domain-specific documents; in our example, this is a banking PDF. Because the document contains several tables and charts, we use Azure OpenAI GPT-4o to convert each page into Markdown, extracting all of the content into a Markdown file for downstream processing. We then use GPT-4o (our teacher model) to generate synthetic question-document-answer triples that mix "golden" documents (highly relevant) with "distractors" (misleading), so the model learns to distinguish relevant from irrelevant information. RAFT also uses the chain-of-thought (CoT) process: incorporating CoT answers improves the model's ability to extract information and perform logical inference. This is particularly effective for tasks that require detailed, structured thinking, and it helps prevent overfitting and improves training robustness.
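
The sketch below shows one way the teacher step could look: asking GPT-4o to produce a question and a chain-of-thought answer grounded in a single Markdown chunk. The prompt wording, deployment name, and helper function are assumptions for illustration; the workshop notebooks may structure this differently.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-06-01",
)

def generate_qa(chunk: str) -> dict:
    """Ask the GPT-4o teacher model for a question and a chain-of-thought answer
    grounded in one Markdown chunk of the banking document."""
    prompt = (
        "You are generating fine-tuning data. Given the document below, write one "
        "question a customer might ask, then answer it step by step, quoting the "
        "relevant sentence before giving the final answer.\n\n"
        f"Document:\n{chunk}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # teacher model deployment name (assumed)
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return {"document": chunk, "qa": resp.choices[0].message.content}
```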


This data is then formatted for fine-tuning and split into training, validation, and test sets. The validation data is used during training, and the test set is used to measure performance at the end.
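
A minimal sketch of that formatting and splitting step is shown below, reusing the field names from the example record above and writing the chat-style JSONL that Azure OpenAI fine-tuning accepts. The split ratios and file names are assumptions.

```python
import json
import random

def to_chat_record(example: dict) -> dict:
    """Convert one RAFT example into the chat format used for Azure OpenAI fine-tuning."""
    context = "\n".join(example["oracle_documents"] + example["distractor_documents"])
    return {
        "messages": [
            {"role": "system", "content": "Answer using the supplied context."},
            {"role": "user", "content": f"{context}\n\n{example['question']}"},
            {"role": "assistant", "content": example["cot_answer"]},
        ]
    }

def split_and_write(examples, train_frac=0.8, valid_frac=0.1):
    """Shuffle the examples and write train/validation/test JSONL files."""
    random.shuffle(examples)
    n = len(examples)
    splits = {
        "train.jsonl": examples[: int(n * train_frac)],
        "valid.jsonl": examples[int(n * train_frac): int(n * (train_frac + valid_frac))],
        "test.jsonl": examples[int(n * (train_frac + valid_frac)):],
    }
    for name, rows in splits.items():
        with open(name, "w") as f:
            for row in rows:
                f.write(json.dumps(to_chat_record(row)) + "\n")
```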

Notebook 2 – RAFT fine-tuning

Now it’s time to teach the student! After preparing the training and validation data, the next step is to upload both files to Azure OpenAI and create a fine-tuning job. It’s surprisingly easy: selecting the model in AI Studio, uploading the training and validation data, and setting the training parameters takes only a few clicks. We choose GPT-4o mini as the student model. The lab also shows how to upload the data and trigger a fine-tuning job using the SDK (see the sketch below); the UI is an easy way to experiment, while the SDK approach is preferred for an LLMOps strategy aimed at productionization and deployment.
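
Here is a minimal sketch of the SDK path, assuming the JSONL files produced above and a resource that supports fine-tuning GPT-4o mini; the base model name and credentials are placeholders.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-06-01",
)

# Upload the prepared training and validation files.
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
valid_file = client.files.create(file=open("valid.jsonl", "rb"), purpose="fine-tune")

# Create the fine-tuning job; the base model name must match one your
# Azure OpenAI resource supports for fine-tuning.
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini",
    training_file=train_file.id,
    validation_file=valid_file.id,
)
print("Fine-tuning job id:", job.id)
```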


Once the fine-tuning job is running, you can monitor its progress and, when it’s complete, analyze the fine-tuned model in Azure OpenAI Studio. Finally, you can create a new deployment with the fine-tuned model, ready to use on specialized domain tasks.
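
A simple way to monitor from code is to poll the job until it reaches a terminal state, as sketched below; the job id is a placeholder, and creating the deployment itself is done separately (for example from Azure OpenAI Studio or the Azure management API).

```python
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-06-01",
)

job_id = "<your-fine-tuning-job-id>"  # id returned when the job was created

# Poll the job until it reaches a terminal state.
while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    print("status:", job.status)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

# On success, the fine-tuned model name can be used to create a deployment.
if job.status == "succeeded":
    print("fine-tuned model:", job.fine_tuned_model)
```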

Notebook 3 – Is our RAFT model really better than the base model? Let’s find out!

You can start by examining the built-in metrics returned by AI Studio, which show loss and accuracy. We want to see accuracy increasing while loss decreases.


But – we can do much more to measure the quality of our model. Remember the test dataset from the beginning? That’s why we prepared it!

There are many evaluation options, including AI Studio evaluations, but in our example we use the open-source library RAGAS, which evaluates a RAG pipeline with metrics such as answer relevancy, faithfulness, answer similarity, and answer correctness. These metrics judge the quality and accuracy of the generated answers using an LLM as a judge or an embedding model.
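
A minimal RAGAS sketch is shown below with a single hand-written row; in practice you would build the dataset from the held-out test split and the fine-tuned model's answers. Column and metric names follow the classic RAGAS API and can differ between versions, and the judge LLM and embedding configuration (for example, pointing RAGAS at Azure OpenAI) is omitted here.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    answer_similarity,
    faithfulness,
)

# Each row pairs a test question with the model's answer, the documents it was
# given, and the reference answer from the held-out test split (illustrative data).
rows = {
    "question": ["What is the minimum opening deposit for the everyday savings account?"],
    "answer": ["The minimum opening deposit is $25."],
    "contexts": [["The everyday savings account can be opened with a minimum deposit of $25."]],
    "ground_truth": ["$25 is the minimum deposit required to open the account."],
}

result = evaluate(
    Dataset.from_dict(rows),
    metrics=[answer_relevancy, faithfulness, answer_similarity, answer_correctness],
)
print(result)
```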

Figure: evaluation results, gpt-4o-mini vs. gpt-4o-mini-raft

You can likely improve these metrics further by tuning the training parameters or generating additional training data.

Are you ready to get started?

Acknowledgements:

  • This hands-on lab is heavily inspired by the excellent blogs of Cedric Vidal and Suraj Subramanian, and by their reference implementation.
  • Thanks to Liam Cavanagh for the inspiration to convert page content to Markdown using GPT-4o.
  • Credit to Cedric Vidal for helping review the blog and the workshop.
