
Leveraging dynamic few-shot prompt with Azure OpenAI

by info.odysseyx@gmail.com


Few-shot prompting is a technique used in natural language processing (NLP) in which a model is given a small number of examples (or “shots”) in its prompt before it generates a response or completes a task. This approach is useful because it lets the model adapt to a new task from minimal data, without any retraining. Few-shot prompts allow users to obtain high-quality results without an extensive training dataset, saving time and computational resources. The examples also help the model generalize from a handful of demonstrations, producing more accurate and contextually relevant output.
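As an illustration, the sketch below builds a static few-shot prompt in the chat-message format that main.py also uses later in this article: each example becomes a user message followed by an assistant message, and the real question comes last. The helper name `build_few_shot_messages` and the two sample shots are hypothetical.

```python
from typing import Dict, List


def build_few_shot_messages(system: str, examples: List[Dict[str, str]], user_input: str) -> List[Dict[str, str]]:
    """Build a chat message list: system prompt, example pairs ("shots"), then the real question."""
    messages = [{"role": "system", "content": system}]
    for ex in examples:
        # each shot is a user turn (example input) followed by an assistant turn (expected output)
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    # the actual question goes last, so the model answers it in the demonstrated style
    messages.append({"role": "user", "content": user_input})
    return messages


# hypothetical shots for a tiny classification task
shots = [
    {"input": "Classify: 'I hope you have a great day!'", "output": "Greeting"},
    {"input": "Classify: 'To be, or not to be, that is the question.'", "output": "Quote"},
]
msgs = build_few_shot_messages("You are a helpful assistant.", shots, "Classify: 'Good morning, everyone!'")
for m in msgs:
    print(m["role"], "->", m["content"])
```

In a static prompt like this, every shot is sent with every request, which is what becomes costly as the example list grows.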

One challenge with few-shot prompts is that as the number of examples grows, the prompts can become too large, leading to inefficiency and potential performance issues. To address this, a dynamic few-shot prompt technique can be used. In this approach, a comprehensive list of examples is stored in a vector store, and the user’s input is matched against this vector store to identify the most relevant examples. By combining OpenAI embeddings with the vector store, the method optimizes for both size and relevance: only the most relevant examples are included in the prompt. This keeps the efficiency and effectiveness of few-shot learning while improving the model’s ability to generate accurate, context-sensitive responses.
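Relevance is typically scored with the cosine similarity between embedding vectors, a common choice for text embeddings (treat the exact metric as an assumption and check the vector store you use). Writing q for the embedded user input and e_i for the embedded input of stored example i (out of N), the selector keeps the set S_k(q) of the k highest-scoring examples:

```latex
\begin{aligned}
\operatorname{sim}(q, e_i) &= \frac{q \cdot e_i}{\lVert q \rVert \, \lVert e_i \rVert} \\
S_k(q) &\subseteq \{1, \dots, N\}, \quad |S_k(q)| = k, \quad
\operatorname{sim}(q, e_i) \ge \operatorname{sim}(q, e_j) \ \ \text{for all } i \in S_k(q),\ j \notin S_k(q)
\end{aligned}
```

For instance, with q = (1, 0) and e_i = (1, 1) the score is 1/√2 ≈ 0.707, while a vector pointing in the same direction as q scores 1 and an orthogonal one scores 0.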

Architecture

[Figure: architecture of the dynamic few-shot prompt solution]

The diagram above shows the overall architecture of the solution. Let’s break down each component.

  • Vector store – Holds the few-shot prompt examples. Each example is an input/output pair, indexed by the embedding of its input.
  • Embedding model – Converts text into vectors: the example inputs when they are indexed, and the user input so that it can be used to query the vector store.
  • GPT model – The chat completion model that generates the answers for the user.

Use Cases

To demonstrate how dynamic few-shot prompts can be used, consider a chat completion scenario that handles three types of tasks:

  • Display data in tabular format
  • Text classification
  • Summarize text

I’d like to provide a few examples for each task to better explain what the model should do.

One simple way to do this is to include all the examples for every task in the prompt itself. This strategy has some drawbacks:

  • Information overload: If there are too many examples, the model may become overwhelmed and have difficulty identifying the actual request.
  • Confusion: The model may become confused and produce responses that are irrelevant or off-topic, for example by imitating the format of an unrelated example.
  • Accuracy issues: If the model focuses on examples that do not match the request, the accuracy of its responses may decrease.
  • Cost: The more examples there are, the more tokens the model has to process, which increases the cost.
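The cost argument is simple arithmetic. Below is a minimal sketch that compares prompt sizes under hypothetical numbers (5 stored examples per task, about 60 tokens per input/output pair, 30 tokens of user input, k = 3, system message ignored); real token counts depend on the model's tokenizer.

```python
# Hypothetical sizing assumptions; real token counts depend on the model's tokenizer.
EXAMPLES_PER_TASK = 5    # examples stored per task type
TOKENS_PER_EXAMPLE = 60  # average size of one input/output pair
QUERY_TOKENS = 30        # size of the user input
K = 3                    # examples kept by a dynamic selector


def static_prompt_tokens(num_tasks: int) -> int:
    # every example of every task is sent with every request
    return num_tasks * EXAMPLES_PER_TASK * TOKENS_PER_EXAMPLE + QUERY_TOKENS


def dynamic_prompt_tokens(num_tasks: int) -> int:
    # only the top-k examples are sent, however many tasks are supported
    available = num_tasks * EXAMPLES_PER_TASK
    return min(K, available) * TOKENS_PER_EXAMPLE + QUERY_TOKENS


for tasks in (3, 10, 30):
    print(f"{tasks} tasks: static={static_prompt_tokens(tasks)}, dynamic={dynamic_prompt_tokens(tasks)}")
```

With these numbers, three task types already mean 930 versus 210 tokens per request, and thirty task types mean 9,030 versus 210: the static prompt grows with every task added, while the dynamic prompt stays at k examples.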

This strategy also does not scale well: supporting additional types of tasks requires more examples, and every one of them is sent with every request.

A dynamic few-shot prompt, on the other hand, includes only the examples that are most relevant to a given user input. For example, we may decide to include only the top 3 most relevant examples from the full list.

To implement this strategy, follow these steps:

  1. Define a list of examples.
  2. Index the list of examples in a vector store.
    • The index key is the embedded input of each example.
  3. Find the most relevant examples.
    • Embed the user input with the same embedding model used in the previous step.
    • Query the vector store with the embedded user input to find the most relevant examples.
  4. Add the examples to the prompt.
    • Add each example’s input as a user message and its output as an assistant message.
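The four steps above can be sketched end to end without any external service. In this minimal sketch the embedding model is replaced by a hypothetical word-count embedding over a vocabulary built from the example inputs, and the vector store is a plain Python list; both are stand-ins for text-embedding-ada-002 and InMemoryVectorStore, and the helper names (`embed`, `select_examples`, `build_messages`) are illustrative only.

```python
import math
import re
from typing import Dict, List

# Step 1: define a list of examples (a small hypothetical subset of the article's examples)
EXAMPLES = [
    {"input": "Can you provide the population of the top 5 most populous countries?", "output": "| Country | Population |"},
    {"input": "Can you provide the GDP of the top 5 largest economies?", "output": "| Country | GDP |"},
    {"input": "Classify the following text: 'I hope you have a great day!'", "output": "Greeting"},
    {"input": "Summarize the following paragraph: 'Remote work has become increasingly common.'", "output": "- Remote work is more common."},
]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


# toy embedding (assumption): word counts over a vocabulary built from the example inputs
VOCAB = sorted({w for ex in EXAMPLES for w in tokenize(ex["input"])})


def embed(text: str) -> List[float]:
    words = tokenize(text)
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Step 2: index each example by the embedding of its input
INDEX = [(embed(ex["input"]), ex) for ex in EXAMPLES]


def select_examples(user_input: str, k: int = 3) -> List[Dict[str, str]]:
    # Step 3: embed the user input with the same model and keep the k most similar examples
    query = embed(user_input)
    ranked = sorted(INDEX, key=lambda item: cosine(query, item[0]), reverse=True)
    return [ex for _, ex in ranked[:k]]


def build_messages(system: str, user_input: str, k: int = 3) -> List[Dict[str, str]]:
    # Step 4: add each selected example as a user/assistant pair, then the real question
    messages = [{"role": "system", "content": system}]
    for ex in select_examples(user_input, k):
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    messages.append({"role": "user", "content": user_input})
    return messages


selected = select_examples("Classify the following text: 'Good morning!'", k=1)
print(selected[0]["output"])
```

A word-count embedding only matches shared words; a real embedding model also captures meaning, which is why the article uses one.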

This solution uses:

  • SemanticSimilarityExampleSelector – a class from the langchain_core package that already implements most of the steps above:
    • Embeds the example inputs and indexes them in a vector store.
    • Defines the number of relevant examples to return.
    • Returns the relevant examples for a given user input.
  • InMemoryVectorStore
    • An in-memory implementation of VectorStore backed by a dictionary.
    • The langchain-community package provides other vector store implementations, such as Azure Search.
  • AzureOpenAIEmbeddings
    • Uses Azure OpenAI embedding models such as text-embedding-ada-002.
    • The langchain-community package provides other embedding implementations, such as Text2Vec.

Code implementation

The implementation of this solution is simple. It needs only two files, requirements.txt and main.py, whose contents are shown below.

requirements.txt

langchain-openai==0.1.21
azure-identity==1.17.1
numpy==2.0.1

main.py

# imports
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.vectorstores import InMemoryVectorStore

# constants
AZURE_OPENAI_ENDPOINT = ""
DEPLOYMENT_EMBEDDING = ""
DEPLOYMENT_CHAT = ""
OPENAI_API_VERSION = "2023-05-15"

SYSTEM_MESSAGE = 'You are a helpful assistant that uses previous interactions to determine how questions should be answered. You should try to use those same response formats.'
# the examples to be indexed in the vector store. Each example should be in the format `{"input": "...", "output": "..."}`
EXAMPLES_LIST = [
    # data examples
    {
        "input": "Can you provide the population of the top 5 most populous countries?",
        "output": '''
| Country       | Population (2023) |
|---------------|-------------------|
| China         | 1,425,000,000     |
| India         | 1,417,000,000     |
| United States | 332,000,000       |
| Indonesia     | 276,000,000       |
| Pakistan      | 231,000,000       |
'''
    },
    {
        "input": "What are the top 5 programming languages in 2023 by popularity?",
        "output": '''
| Rank | Programming Language | Popularity (%) |
|------|----------------------|----------------|
| 1    | Python               | 29.9           |
| 2    | JavaScript           | 19.1           |
| 3    | Java                 | 16.2           |
| 4    | C#                   | 8.3            |
| 5    | PHP                  | 6.1            |
'''
    },
    {
        "input": "Can you list the top 5 highest-grossing movies of all time?",
        "output": '''
| Rank | Movie Title             | Gross Revenue (USD) |
|------|-------------------------|---------------------|
| 1    | Avatar                  | $2.923 billion      |
| 2    | Avengers: Endgame       | $2.798 billion      |
| 3    | Titanic                 | $2.195 billion      |
| 4    | Star Wars: The Force Awakens | $2.068 billion |
| 5    | Avengers: Infinity War  | $2.048 billion      |
'''
    },
    {
        "input": "What are the top 5 universities in the world according to the 2023 QS World University Rankings?",
        "output": '''
| Rank | University                        | Location       |
|------|-----------------------------------|----------------|
| 1    | Massachusetts Institute of Technology (MIT) | USA  |
| 2    | University of Cambridge           | UK             |
| 3    | Stanford University               | USA            |
| 4    | University of Oxford              | UK             |
| 5    | Harvard University                | USA            |
'''
    },
    {
        "input": "Can you provide the GDP of the top 5 largest economies in 2023?",
        "output": '''
| Rank | Country       | GDP (USD Trillions) |
|------|---------------|---------------------|
| 1    | United States | 25.3                |
| 2    | China         | 17.7                |
| 3    | Japan         | 4.9                 |
| 4    | Germany       | 4.2                 |
| 5    | India         | 3.5                 |
'''
    },

    # Classification examples
    {
        "input": "Classify the following text: 'The quick brown fox jumps over the lazy dog.'",
        "output": "Sentence"
    },
    {
        "input": "Classify the following text: 'To be, or not to be, that is the question.'",
        "output": "Quote"
    },
    {
        "input": "Classify the following text: 'Once upon a time, in a land far, far away, there lived a young princess.'",
        "output": "Story"
    },
    {
        "input": "Classify the following text: 'E=mc^2 is a formula expressing the relationship between mass and energy.'",
        "output": "Scientific Statement"
    },
    {
        "input": "Classify the following text: 'I hope you have a great day!'",
        "output": "Greeting"
    },

    # Summarization examples
    {
        "input": "Summarize the following paragraph: 'The rapid advancement of technology has significantly impacted various industries. Automation and artificial intelligence are transforming the workforce, leading to increased efficiency but also raising concerns about job displacement. Companies are investing heavily in tech to stay competitive, while governments are grappling with the need to update regulations.'",
        "output": '''
- Technology is transforming industries through automation and AI.
- Increased efficiency comes with concerns about job displacement.
- Companies invest in tech; governments update regulations.
'''
    },
    {
        "input": "Summarize the following paragraph: 'Climate change is one of the most pressing issues of our time. Rising global temperatures are causing more frequent and severe weather events, such as hurricanes and wildfires. Efforts to mitigate climate change include reducing carbon emissions, transitioning to renewable energy sources, and protecting natural habitats.'",
        "output": '''
- Climate change leads to severe weather events.
- Mitigation efforts focus on reducing carbon emissions and using renewable energy.
- Protecting natural habitats is crucial.
'''
    },
    {
        "input": "Summarize the following paragraph: 'The education system is undergoing significant reforms to better prepare students for the future. Emphasis is being placed on critical thinking, problem-solving, and digital literacy. Schools are incorporating more technology into the classroom and offering courses that align with the demands of the modern workforce.'",
        "output": '''
- Education reforms focus on future readiness.
- Key skills: critical thinking, problem-solving, digital literacy.
- More technology and modern workforce-aligned courses in schools.
'''
    },
    {
        "input": "Summarize the following paragraph: 'The healthcare industry is seeing a shift towards personalized medicine. Advances in genetic research allow for treatments tailored to individual patients, improving outcomes and reducing side effects. This approach is particularly promising in the treatment of cancer and rare genetic disorders.'",
        "output": '''
- Healthcare is moving towards personalized medicine.
- Genetic research enables tailored treatments.
- Promising for cancer and rare genetic disorders.
'''
    },
    {
        "input": "Summarize the following paragraph: 'Remote work has become increasingly common, especially after the COVID-19 pandemic. Many companies have adopted flexible work policies, allowing employees to work from home or other locations. This shift has led to changes in workplace dynamics, with a greater emphasis on digital communication and collaboration tools.'",
        "output": '''
- Remote work is more common post-COVID-19.
- Companies adopt flexible work policies.
- Emphasis on digital communication and collaboration tools.
'''
    }
]

# get default azure credentials. You might need to run `az login` first.
credential = DefaultAzureCredential()

# token provider used to authenticate with Azure OpenAI. The clients call it before each request,
# so a fresh token is obtained instead of reusing a single short-lived token fetched at startup
def get_openai_token() -> str:
    return credential.get_token("https://cognitiveservices.azure.com/.default").token

# create openai client
client = AzureOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    azure_deployment=DEPLOYMENT_CHAT,
    azure_ad_token_provider=get_openai_token,
    api_version=OPENAI_API_VERSION
)

# create openai embeddings client
embedding = AzureOpenAIEmbeddings(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    deployment=DEPLOYMENT_EMBEDDING,
    azure_ad_token_provider=get_openai_token,
    openai_api_version=OPENAI_API_VERSION
)

# creates an example selector that uses semantic similarity to retrieve examples
example_selector = SemanticSimilarityExampleSelector.from_examples(
    EXAMPLES_LIST,          # list of examples to index
    embedding,              # embedding model to use
    InMemoryVectorStore,    # vector store to use
    k=3,                    # number of examples to retrieve
    input_keys=["input"],   # keys in the examples that contain the input text
)

while True:
    # gets user input
    user_input = input("Enter a sentence: ")

    # select the most relevant examples based on the user input
    res = example_selector.select_examples({"input": user_input})

    messages = [
        {"role": "system", "content": SYSTEM_MESSAGE},
    ]

    print('retrieved examples:')
    for ex in res:
        # adds each example input as a user message and the corresponding output as an assistant message
        messages.append({"role": "user", "content": ex['input']})
        messages.append({"role": "assistant", "content": ex['output']})
        print('question:{}\nanswer:{}'.format(ex['input'], ex['output']))

    # adds the user input as a user message
    messages.append({"role": "user", "content": user_input})

    # create a completion using the messages. On Azure OpenAI, `model` is the name of the chat deployment
    response = client.chat.completions.create(
        messages=messages,
        model=DEPLOYMENT_CHAT,
    )

    # print the response from the model
    print('response: ', response.choices[0].message.content)

Running the code

Prerequisites

  • Python 3.10
  • An Azure OpenAI deployment of a GPT model (gpt-4o is suggested)
  • An Azure OpenAI deployment of an embedding model (text-embedding-ada-002 is suggested)
  • Permission for your Azure identity to use the Azure OpenAI resource (for example, the Cognitive Services OpenAI User role)

Installing dependencies

pip install -r requirements.txt

Run

python main.py

Below are the results of running the code with several different types of input. Note that the “retrieved examples” section shows only the three most relevant examples for each input.

Enter a sentence: list me the biggest airplanes
retrieved examples:
question:Can you list the top 5 highest-grossing movies of all time?
answer:
| Rank | Movie Title             | Gross Revenue (USD) |
|------|-------------------------|---------------------|
| 1    | Avatar                  | $2.923 billion      |
| 2    | Avengers: Endgame       | $2.798 billion      |
| 3    | Titanic                 | $2.195 billion      |
| 4    | Star Wars: The Force Awakens | $2.068 billion |
| 5    | Avengers: Infinity War  | $2.048 billion      |
question:Can you provide the GDP of the top 5 largest economies in 2023?
answer:
| Rank | Country       | GDP (USD Trillions) |
|------|---------------|---------------------|
| 1    | United States | 25.3                |
| 2    | China         | 17.7                |
| 3    | Japan         | 4.9                 |
| 4    | Germany       | 4.2                 |
| 5    | India         | 3.5                 |
question:Can you provide the population of the top 5 most populous countries?
answer:
| Country       | Population (2023) |
|---------------|-------------------|
| China         | 1,425,000,000     |
| India         | 1,417,000,000     |
| United States | 332,000,000       |
| Indonesia     | 276,000,000       |
| Pakistan      | 231,000,000       |

response:  Certainly! Here are some of the biggest airplanes in terms of size and capacity:
| Rank | Aircraft                        | Description                                                                                               |
|------|---------------------------------|-----------------------------------------------------------------------------------------------------------|
| 1    | Antonov An-225 Mriya            | The largest cargo aircraft ever built, featuring 6 engines and a maximum takeoff weight of 640 tons       |
| 2    | Airbus A380                     | The largest passenger airliner, with a double-deck configuration and capacity for up to 853 passengers    |
| 3    | Boeing 747-8                    | One of the largest commercial aircraft, known as the "Queen of the Skies," with a capacity of 410-524 passengers |
| 4    | Lockheed C-5 Galaxy             | A large military transport aircraft used by the U.S. Air Force, with a maximum takeoff weight of 381 tons  |
| 5    | Boeing 777-9                    | The newest variant of the Boeing 777 series, featuring a longer fuselage and higher capacity, accommodating up to 426 passengers |
These aircraft are recognized for their impressive sizes and capabilities, serving various purposes from cargo transport to long-haul passenger flights.

Enter a sentence: classify the following text: F=ma is Newton's second law of motion
retrieved examples:
question:Classify the following text: 'E=mc^2 is a formula expressing the relationship between mass and energy.'
answer:Scientific Statement
question:Classify the following text: 'The quick brown fox jumps over the lazy dog.'
answer:Sentence
question:Classify the following text: 'To be, or not to be, that is the question.'
answer:Quote

response:  Scientific Statement

Enter a sentence: summarize the following text: Exercise offers numerous benefits that significantly enhance one’s quality of life. Regular physical activity helps maintain a healthy weight, reduces the risk of chronic diseases such as heart disease, diabetes, and certain cancers, and improves cardiovascular health. Additionally, exercise boosts mental health by reducing symptoms of depression and anxiety, enhancing mood, and improving sleep quality. It also increases energy levels, strengthens muscles and bones, and promotes better flexibility and balance. Overall, incorporating exercise into your daily routine can lead to a longer, healthier, and more fulfilling life. What kind of exercise do you enjoy the most?
retrieved examples:
question:Summarize the following paragraph: 'The healthcare industry is seeing a shift towards personalized medicine. Advances in genetic research allow for treatments tailored to individual patients, improving outcomes and reducing side effects. This approach is particularly promising in the treatment of cancer and rare genetic disorders.'
answer:
- Healthcare is moving towards personalized medicine.
- Genetic research enables tailored treatments.
- Promising for cancer and rare genetic disorders.
question:Summarize the following paragraph: 'The education system is undergoing significant reforms to better prepare students for the future. Emphasis is being placed on critical thinking, problem-solving, and digital literacy. Schools are incorporating more technology into the classroom and offering courses that align with the demands of the modern workforce.'
answer:
- Education reforms focus on future readiness.
- Key skills: critical thinking, problem-solving, digital literacy.
- More technology and modern workforce-aligned courses in schools.
question:Classify the following text: 'E=mc^2 is a formula expressing the relationship between mass and energy.'
answer:Scientific Statement

response:  - Exercise greatly enhances quality of life.
- Benefits: healthy weight, reduced chronic disease risk, improved cardiovascular and mental health, better sleep, increased energy, stronger muscles and bones, and improved flexibility and balance.
- Leads to a longer, healthier, and more fulfilling life.
- What kind of exercise do you enjoy the most?

Next Steps

Now that you’re familiar with the dynamic few-shot technique, here are some ways to take it further:

  • Use a different vector store
  • Use a different embedding model
  • Update examples/prompts to support new tasks.
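Swapping the embedding model mostly means passing a different object that provides the same methods: LangChain embedding classes such as AzureOpenAIEmbeddings implement `embed_documents` (used for the indexed examples) and `embed_query` (used for the user input). The sketch below is a hypothetical, deterministic hashing-based embedding that exposes those two methods. It is useful for offline tests of the selection logic, but it has no semantic understanding, so it does not replace a real embedding model.

```python
import math
import zlib
from typing import List


class HashingEmbeddings:
    """Hypothetical toy embedding model exposing the two methods of LangChain's
    Embeddings interface. In real code, subclass langchain_core.embeddings.Embeddings
    or use one of the provided implementations."""

    def __init__(self, dimensions: int = 64):
        self.dimensions = dimensions

    def _embed(self, text: str) -> List[float]:
        vector = [0.0] * self.dimensions
        for word in text.lower().split():
            # crc32 is deterministic across runs, unlike Python's built-in str hash
            vector[zlib.crc32(word.encode("utf-8")) % self.dimensions] += 1.0
        # normalize to unit length so dot products equal cosine similarities
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


emb = HashingEmbeddings(dimensions=16)
print(len(emb.embed_query("list me the biggest airplanes")))
```

In main.py, an object like this (ideally a subclass of langchain_core.embeddings.Embeddings) would be passed to SemanticSimilarityExampleSelector.from_examples in place of `embedding`; the rest of the code stays the same.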

Conclusion

The dynamic few-shot prompt technique offers clear advantages over static few-shot prompting. By leveraging vector stores and embedding models, it ensures that only the most relevant examples are included in the prompt, optimizing for relevance, size, and ultimately cost. It keeps the efficiency and effectiveness of few-shot learning while improving the model’s ability to generate accurate, context-sensitive responses. As a result, users can achieve high-quality results with minimal data, making the technique a powerful tool for a wide range of applications.
