Fine-Tuning Llama 3.1 8B on Azure AI Serverless with LoRA and RAFT, why it’s so easy & cost efficient

The Future of AI: LLM Distillation Made Easier

Part 2 – Fine-tuning Llama 3.1 8B on Azure AI Serverless

How Azure AI Serverless Fine-tuning, LoRA, RAFT, and the Azure AI Python SDK simplify fine-tuning of domain-specific models (🚀🔥 GitHub recipe repo).

By Cedric Vidal, Chief AI Advocate at Microsoft

Part of The Future of AI 🚀 series, started by Marco Casalaina with his blog post Exploring Multi-Agent AI Systems.

Fine-tuning settings for AI-based engines generated using Azure OpenAI DALL-E 3

In our previous blog post, we looked at how to generate a synthetic dataset using Llama 3.1 405B with RAFT. Today, we will learn how to fine-tune the Llama 3.1 8B model with the dataset we generated. This post walks you through a simplified fine-tuning process using Azure AI Fine-Tuning as a Service, emphasizing ease of use and cost-effectiveness. We also explain what LoRA is and why combining RAFT and LoRA offers unique benefits for efficient and low-cost model customization. Finally, we provide practical, step-by-step code examples to help you apply these concepts to your own projects.

The concepts and source code mentioned in this post are available in the GitHub recipe repo.

Azure AI takes the complexity out of the equation. Gone are the days when setting up GPU infrastructure, configuring Python frameworks, and mastering model fine-tuning techniques were hurdles. With Azure Serverless Fine-Tuning, you can bypass all that hassle entirely. Simply upload your dataset, adjust a few hyperparameters, and start the fine-tuning process. This ease of use democratizes AI development, making it accessible to a wider range of users and organizations.

Why Azure AI Serverless Fine-Tuning is a Game-Changer

Fine-tuning a model used to be a difficult task, for several reasons:

Technical requirements: Proficiency in Python and machine learning frameworks such as TensorFlow and PyTorch was a must.
Resource intensive: Setting up and managing GPU infrastructure required significant investment.
Time consuming: The process from setup to execution was often long.

Azure AI Fine-Tuning as a Service removes these barriers by providing an intuitive platform for fine-tuning models without worrying about the underlying infrastructure. With serverless capabilities, you simply upload your dataset, specify hyperparameters, and hit the “Fine-Tuning” button. This streamlined process enables rapid iteration and experimentation, significantly shortening the AI development cycle.

A llama taking a break in a workshop generated using Azure OpenAI DALL-E 3

LoRA: A Game Changer for Efficient Fine-Tuning

What is LoRA?

Low-Rank Adaptation (LoRA) is an efficient way to fine-tune large language models. Unlike traditional fine-tuning, which updates all of the model’s weights, LoRA keeps the original weights frozen and trains only a small set of additional low-rank weights held in an adapter. This focused approach significantly reduces the time and cost required for fine-tuning while maintaining the model’s performance.
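
To make the idea concrete, here is a minimal NumPy sketch of a linear layer with a LoRA adapter. It is an illustration under stated assumptions, not the implementation used by Azure AI: the layer sizes, the rank `r`, and the scaling factor `alpha` are arbitrary values chosen for the example, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; these are not Llama's real dimensions.
d_in, d_out, r, alpha = 64, 64, 4, 8

# Frozen pretrained weight: it is never updated during LoRA fine-tuning.
W = rng.normal(size=(d_out, d_in))

# Trainable adapter. B starts at zero, so before any training step the
# adapted layer behaves exactly like the base layer.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))


def lora_forward(x, W, A, B, alpha, r):
    """Compute x @ (W + (alpha / r) * B @ A).T without building the full update."""
    base = x @ W.T                # output of the frozen layer
    update = (x @ A.T) @ B.T      # low-rank path through the adapter
    return base + (alpha / r) * update


x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, alpha, r)
print(y.shape)
```

Only `A` and `B` would receive gradient updates during fine-tuning. After training, the product `B @ A` can be merged into `W`, so the adapter does not have to add latency at inference time.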

LoRA in Action

Because fine-tuning only adjusts a small number of weights through adapters, LoRA offers several advantages:

Selective weight updates: Computational requirements are reduced since only a small portion of the weights is trained.
Cost effectiveness: Lower computing requirements also reduce operating costs.
Speed: Fine-tuning takes less time, allowing for quicker deployments and iterations.

Illustration of LoRA Fine-tuning. This diagram shows a single attention block enhanced with LoRA. Each attention block in the model typically integrates its own LoRA module. SVG diagram generated using Azure OpenAI GPT-4o

Combining RAFT and LoRA: Why It’s So Effective

We looked at how Azure AI’s serverless fine-tuning uses LoRA, which can be very cheap and fast since it updates only a portion of the model weights.

By combining RAFT and LoRA, the model becomes an expert in the domain not by memorizing new underlying knowledge, but by learning to pay attention to the most useful quotes for answering a question, without having to absorb all the information about the domain. It’s like a librarian (see the RAG Hack session on RAFT): a librarian may not know the full contents of every book, but knows which books contain the answer to a given question.
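
As a rough illustration of what "paying attention to the most useful quotes" can look like in training data, the sketch below assembles one RAFT-style record: a question, a shuffled context that mixes the relevant (oracle) document with distractors, and an answer that first quotes the supporting passage. The function name, the field names, and the quote markers are assumptions made for this example; the actual dataset format is defined by the recipe repository.

```python
import random


def build_raft_example(question, oracle_doc, distractor_docs, quote, answer, seed=0):
    """Assemble one RAFT-style record: shuffled context plus a quote-grounded answer."""
    if quote not in oracle_doc:
        raise ValueError("the quote must come verbatim from the oracle document")
    docs = [oracle_doc, *distractor_docs]
    # Shuffle so the model has to locate the relevant document by itself.
    random.Random(seed).shuffle(docs)
    context = "\n".join(f"[Document {i}] {doc}" for i, doc in enumerate(docs, start=1))
    # Hypothetical quote markers: the answer cites its evidence before concluding.
    completion = f"##begin_quote##{quote}##end_quote## Therefore: {answer}"
    return {"question": question, "context": context, "completion": completion}


example = build_raft_example(
    question="How many weights does LoRA train?",
    oracle_doc="LoRA trains only a small set of adapter weights.",
    distractor_docs=["Llamas are native to South America.", "RAG retrieves documents."],
    quote="only a small set of adapter weights",
    answer="a small set held in an adapter.",
)
print(example["completion"])
```

Training on records like this teaches the model to pick the right passage out of noisy retrieval results, which is the librarian behavior described above.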

Another way to look at this is from an information theory perspective. Because LoRA trains only a small number of adapter weights, the amount of information those weights can store is limited, unlike full-weight fine-tuning, which updates every weight in the model.
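
A quick back-of-the-envelope calculation shows how small that capacity is. The sketch below compares the trainable parameters of one square weight matrix with those of a LoRA adapter attached to it. The hidden size of 4096 and the rank of 16 are assumptions picked for illustration; the rank actually used by the service is not stated in this post.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


d = 4096   # hidden size, assumed for illustration
rank = 16  # adapter rank, assumed for illustration

full = d * d                          # weights touched by full fine-tuning
lora = lora_param_count(d, d, rank)   # weights trained by LoRA
ratio = lora / full

print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.2%}")
```

With these numbers the adapter holds 131,072 trainable values against 16,777,216 in the full matrix, which is under 1% of the weights for that layer.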

LoRA may seem limiting, but it is actually a perfect fit when used with RAFT and RAG. You get the benefits of both RAG and fine-tuning: RAG provides access to a potentially unlimited number of reference documents, and RAFT with LoRA provides a model that is an expert at understanding the documents retrieved by RAG, at a fraction of the cost of full-weight fine-tuning.

The Importance of Azure AI Fine-Tuning API and AI Ops Pipeline Automation

Azure AI provides developers with serverless fine-tuning capabilities via APIs, making it simple to integrate the fine-tuning process into an automated AI operations (AI Ops) pipeline. Organizations can further streamline this process using the Azure AI Python SDK to seamlessly orchestrate the model training workflow. This includes systematic data processing, model versioning, and deployment. Automating these processes is critical because it ensures consistency, reduces human error, and accelerates the entire AI lifecycle from data preparation to model training, deployment, and monitoring. Organizations can leverage Azure AI’s serverless fine-tuning API and Python SDK to maintain an efficient, scalable, and agile AI Ops pipeline, ultimately driving faster innovation and more reliable AI systems.

Addressing Model Drift and Base Model Aging

One of the important aspects of machine learning, especially in fine-tuning, is to ensure that the model generalizes well to unseen data. This is the main goal of the evaluation phase.

However, as the domain evolves and documents are added or updated, the model inevitably begins to drift. The rate of this drift depends on how quickly the domain changes. It could be a month, six months, a year, or more.

Therefore, to maintain performance, it is essential to periodically refresh the model and re-run the distillation process.

Moreover, the AI field is dynamic, with new and improved foundational models being released frequently. To take advantage of these advances, there needs to be a streamlined process for re-distilling the latest models, measuring improvements and efficiently distributing updates to users.

Why Distillation Process Automation is Essential

Automation of the distillation process is critical. As new documents are added or existing documents are updated, model alignment with the domain can change over time. Setting up an automated end-to-end distillation pipeline ensures that the model is up to date and accurate. Regularly re-running distillation ensures that the model is aligned with the evolving domain, maintaining stability and performance.
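
One way an automated pipeline could decide when to re-run distillation is sketched below. It is a hypothetical policy, not part of the recipe repository: the `ModelState` fields, the `should_redistill` function, and the 0.8 score threshold are all assumptions introduced for this example. It checks the three triggers discussed above: a changed document corpus, a newer base model, and a drop in evaluation score.

```python
import hashlib
from dataclasses import dataclass


def corpus_fingerprint(documents):
    """Order-independent SHA-256 fingerprint of a collection of document texts."""
    digests = sorted(hashlib.sha256(d.encode("utf-8")).hexdigest() for d in documents)
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()


@dataclass
class ModelState:
    """What we recorded when the current fine-tuned model was produced."""
    corpus_fingerprint: str
    base_model: str
    eval_score: float


def should_redistill(state, documents, latest_base_model, min_score=0.8):
    """Return the reasons to re-run distillation (an empty list means up to date)."""
    reasons = []
    if corpus_fingerprint(documents) != state.corpus_fingerprint:
        reasons.append("corpus changed")
    if latest_base_model != state.base_model:
        reasons.append("newer base model")
    if state.eval_score < min_score:
        reasons.append("evaluation score below threshold")
    return reasons


docs = ["policy v1", "faq v1"]
state = ModelState(corpus_fingerprint(docs), base_model="base-a", eval_score=0.91)
print(should_redistill(state, docs + ["faq v2"], latest_base_model="base-a"))
```

A scheduled job could call a check like this and trigger the data generation, fine-tuning, and evaluation steps only when the returned list is not empty.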

Practical Steps: Fine-tuning Llama 3.1 8B with RAFT and LoRA

Now that we’ve covered the benefits, let’s walk through the actual steps using the raft-distillation-recipe repository on GitHub.

If you haven’t yet run the synthetic data generation step using RAFT, go back to the previous article in this blog series first.

Once you have your synthetic dataset in hand, you can move on to the fine-tuning notebook in the distillation recipe repository.

Here are some key code snippets that demonstrate how to use the Azure AI Python SDK to upload a dataset, subscribe to the Marketplace offer, and create and submit a fine-tuning job on the Azure AI Serverless platform.

Upload training data set

The following code checks if the training dataset already exists in the workspace and uploads it only if necessary. It incorporates the hash of the dataset into the file name to make it easy to detect if the file has been previously uploaded.

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

dataset_version = "1"
# The content hash in the name makes a previously uploaded file easy to detect.
train_dataset_name = f"{ds_name}_train_{train_hash}"
try:
    # Reuse the dataset if it is already registered in the workspace.
    train_data_created = workspace_ml_client.data.get(train_dataset_name, version=dataset_version)
    print(f"Dataset {train_dataset_name} already exists")
except Exception:
    # The lookup failed, so register the local file as a new data asset.
    print(f"Creating dataset {train_dataset_name}")
    train_data = Data(
        path=dataset_path_ft_train,
        type=AssetTypes.URI_FILE,
        description=f"{ds_name} training dataset",
        name=train_dataset_name,
        version=dataset_version,
    )
    train_data_created = workspace_ml_client.data.create_or_update(train_data)

# Reference the registered asset as the input of the fine-tuning job.
training_data = Input(
    type=train_data_created.type,
    path=f"azureml://locations/{workspace.location}/workspaces/{workspace._workspace_id}/data/{train_data_created.name}/versions/{train_data_created.version}",
)
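
The snippet above uses a `train_hash` value whose computation is not shown here. One plausible way to derive it, offered as an assumption rather than as what the notebook does, is to hash the file contents and keep a short prefix of the digest. The helper names `file_hash` and `versioned_dataset_name` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hash(path, length=8, chunk_size=1 << 20):
    """Return the first `length` hex characters of the file's SHA-256 digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def versioned_dataset_name(ds_name, split, path):
    """Build a name of the form '<ds_name>_<split>_<hash>' from the file contents."""
    return f"{ds_name}_{split}_{file_hash(path)}"
```

Because the name changes whenever the file contents change, an unchanged dataset maps to an existing asset and is not uploaded again, while a regenerated dataset gets a new name.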

Subscribe to Marketplace Offers

This step is only required when fine-tuning models from third-party vendors such as Meta or Mistral. If you are fine-tuning a Microsoft first-party model such as Phi-3, you can skip this step.

from azure.ai.ml.entities import MarketplaceSubscription
from azure.core.exceptions import ResourceExistsError

# Drop the last two path segments of the asset ID to get the version-less model ID.
model_id = "/".join(foundation_model.id.split("/")[:-2])
# Derive the subscription name from the model name, replacing dots and underscores with hyphens.
subscription_name = model_id.split("/")[-1].replace(".", "-").replace("_", "-")

print(f"Subscribing to Marketplace model: {model_id}")

marketplace_subscription = MarketplaceSubscription(
    model_id=model_id,
    name=subscription_name,
)

try:
    marketplace_subscription = workspace_ml_client.marketplace_subscriptions.begin_create_or_update(marketplace_subscription).result()
except ResourceExistsError:
    # An existing subscription is fine: the model can be fine-tuned right away.
    print(f"Marketplace subscription {subscription_name} already exists for model {model_id}")
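
To see what the two string manipulations produce, here is a small standalone sketch that applies them to a sample asset ID, assuming `/` is the path separator. The ID below is a hypothetical value written for this example, and `marketplace_names` is a helper name introduced here, not part of the SDK.

```python
def marketplace_names(asset_id):
    """Return (model_id, subscription_name) derived from a versioned asset ID."""
    # Remove the trailing "versions/<n>" pair of segments.
    model_id = "/".join(asset_id.split("/")[:-2])
    # Keep the last segment and normalize dots and underscores to hyphens.
    subscription_name = model_id.split("/")[-1].replace(".", "-").replace("_", "-")
    return model_id, subscription_name


# Hypothetical asset ID, for illustration only.
example_id = "azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B-Instruct/versions/2"
model_id, subscription_name = marketplace_names(example_id)
print(model_id)
print(subscription_name)
```

For this sample, the model ID loses its version suffix and the subscription name becomes `Meta-Llama-3-1-8B-Instruct`.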

Create a fine-tuning job using the model and data as input.

finetuning_job = CustomModelFineTuningJob(
    task=task,
    training_data=training_data,
    validation_data=validation_data,
    hyperparameters={
        "per_device_train_batch_size": "1",
        "learning_rate": str(learning_rate),
        "num_train_epochs": "1",
        "registered_model_name": registered_model_name,
    },
    model=model_to_finetune,
    display_name=job_name,
    name=job_name,
    experiment_name=experiment_name,
    outputs={"registered_model": Output(type="mlflow_model", name=f"ft-job-finetune-registered-{short_guid}")},
)
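
Note that the snippet above passes every hyperparameter value as a string. If your own code holds them as numbers, a small helper can do the conversion in one place. This is a convenience sketch: the helper name is hypothetical and the learning rate shown is an example value, not one taken from this post.

```python
def stringify_hyperparameters(**values):
    """Convert every hyperparameter value to a string, skipping unset (None) ones."""
    return {name: str(value) for name, value in values.items() if value is not None}


hyperparameters = stringify_hyperparameters(
    per_device_train_batch_size=1,
    learning_rate=2e-5,  # example value only
    num_train_epochs=1,
    warmup_steps=None,   # unset values are left out of the result
)
print(hyperparameters)
```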

Submit the fine-tuning job

The following snippet submits the previously created fine-tuning job to the Azure AI Serverless platform. If the submission is successful, the job details, including the Studio URL and registered model name, are printed. Any errors that occurred during the submission are also displayed.

try:
    print(f"Submitting job {finetuning_job.name}")
    created_job = workspace_ml_client.jobs.create_or_update(finetuning_job)
    print(f"Successfully created job {finetuning_job.name}")
    print(f"Studio URL is {created_job.studio_url}")
    print(f"Registered model name will be {registered_model_name}")
except Exception as e:
    print("Error creating job", e)
    raise e
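
Submitting the job returns immediately, so a pipeline usually needs to wait for it to finish. The sketch below is a generic polling loop, not code from the notebook. It takes a `get_status` callable so that it stays independent of the SDK, and the set of terminal status names is an assumption that you should check against the statuses your jobs actually report.

```python
import time

# Assumed names of the statuses after which a job no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=60.0, timeout_seconds=6 * 3600, sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in status {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Simulated job that completes on the third poll.
fake_statuses = iter(["Running", "Running", "Completed"])
final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0, sleep=lambda s: None)
print(final)
```

With the client from the snippets above, `get_status` could be something like `lambda: workspace_ml_client.jobs.get(created_job.name).status`.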

The full runnable code is available in the fine-tuning notebook mentioned earlier.

Join the conversation

Join our tech community on Discord to discuss fine-tuning techniques, RAFT, LoRA, and more. Whether you’re an experienced AI developer or just starting out, our community is here to support you. Share your experiences, ask questions, and collaborate with fellow AI enthusiasts. Join us and be part of the conversation!

What’s next?

This concludes the second part of our blog series on fine-tuning the Llama 3.1 8B model using RAFT and LoRA, leveraging the power of Azure AI Serverless Fine-Tuning. Today, we showed how these advanced techniques enable efficient and cost-effective model customization that precisely meets your domain requirements.

Integrating RAFT and LoRA turns your model into an expert that can effectively explore and interpret relevant information from extensive document repositories using RAG, while significantly reducing the time and cost associated with full-weight fine-tuning. This methodology accelerates the fine-tuning process and democratizes access to advanced AI capabilities.

With detailed steps and code snippets provided, you now have the tools to implement serverless fine-tuning within your AI development workflow. Leveraging automation in AI Ops helps you maintain and optimize model performance over time, helping you keep your AI solutions competitive in an ever-changing environment.

Stay tuned! We’ll cover the next topic in two weeks: deploying fine-tuned models.
