Marc Alier (Ludo) - Home Page

Personal page and junk drawer (Calaix Desastre)

Embeddings, Context Uses and Self-Referencing in LLMs

Posted at — Aug 31, 2023


Generative AI models, especially Large Language Models (LLMs) like GPT and its successors, have become a pivotal force in the advancement of artificial intelligence. While these models have gained prominence for their capabilities in natural language processing (NLP)—including tasks such as sentiment analysis, machine translation, and content generation—their applications extend beyond the realm of NLP [11].

One of the key objectives of this paper is to explore the specific features and architecture of generative AI models that make them a foundational technology for the development of Smart Learning Applications. This exploration serves as a part of a broader inquiry into the capabilities, limitations, and potential applications of these models. While the growing field of AI offers options for training and fine-tuning models, thanks to open-source initiatives like Llama 2 and platforms like OpenAssistant, this paper takes a different angle. We seek to understand how the existing functionalities of readily available models can be effectively utilized to develop Smart Learning Applications without requiring extensive modifications.

By examining critical components such as embeddings and the model’s ability to understand context, this paper aims to provide a nuanced understanding that can guide the development of educational technology and open up new avenues for innovation.

1 Embeddings

At the heart of LLMs is the concept of “embedding.” In NLP, embeddings serve the purpose of transforming linguistic entities—whether they are words, phrases, or entire documents—into numerical vectors of fixed dimensions. Mikolov et al. introduced the Word2Vec model, a popular method for generating word embeddings, which has been foundational in the development of embeddings in LLMs [12]. This transformation is pivotal as it allows textual data to be represented in a manner that is both computationally efficient and semantically rich. Through the process of embedding, LLMs are equipped to discern intricate patterns and relationships in language.

1.1 Practical Applications of Embeddings

LLMs utilize embeddings for a myriad of tasks. From sentiment analysis and machine translation to the generation of content, the capabilities of LLMs are vast. The inherent numerical nature of embeddings facilitates operations that can deduce relationships, draw analogies, and discern nuances in language. In the realm of machine translation, embeddings have been instrumental in identifying equivalent terms across languages, ensuring that translations maintain their intended meaning and accuracy [13].
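The vector operations described above can be sketched with toy embeddings. The 4-dimensional vectors below are hand-picked illustrative values, not the output of a real model like Word2Vec, which learns hundreds of dimensions from data:

```python
import math

# Toy 4-dimensional "embeddings" with hand-picked illustrative values.
vectors = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.1, 0.8, 0.2],
    "man":   [0.1, 0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9, 0.1],
    "apple": [0.1, 0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: close to 1 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Related words end up closer in the vector space than unrelated ones.
print(cosine(vectors["king"], vectors["queen"]) >
      cosine(vectors["king"], vectors["apple"]))  # True

# Analogies become vector arithmetic: king - man + woman lands nearest to queen.
analogy = [k - m + w for k, m, w in
           zip(vectors["king"], vectors["man"], vectors["woman"])]
closest = max(vectors, key=lambda word: cosine(analogy, vectors[word]))
print(closest)  # queen
```

With real learned embeddings the same arithmetic recovers the classic king − man + woman ≈ queen analogy reported by Mikolov et al. [12].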

A notable example of the practical application of embeddings is provided by ChatPDF.com, a turnkey system that allows for the embedding of entire PDFs, spanning hundreds of pages. This system enables a chat interface with the document, offering an innovative approach to document interaction. ChatPDF.com also provides an API for developers, making it easier to integrate this embedding-friendly model into various applications [14].

Fig. 2: ChatPDF enables a chat conversation with the contents of a document. In the example, a conversation with the paper “Attention Is All You Need”.

2 The Context in LLMs Based on Transformers

In the realm of LLMs, particularly models like GPT-4 and BERT based on the transformer architecture, the term “context” denotes the immediate surrounding information that the model leverages to generate a response [15, 16]. For GPT-4, this context is derived from the preceding text in a conversation or document. Such context is indispensable as it provides the model with insights into the ongoing topic, the tone and style of the conversation, and any specific instructions or constraints.

An essential mechanism that enables this contextual understanding is the “attention” mechanism. In transformer architectures, attention allows the model to weigh different parts of the input text differently. This means that when generating a response, the model doesn’t treat all words or tokens in the context equally. Instead, it “attends” more to certain parts that are more relevant to the query or prompt at hand. For example, if the conversation is about climate change, words like “emissions,” “carbon,” and “temperature” might receive higher attention weights. This attention mechanism works in tandem with the context to produce more accurate and contextually appropriate responses.
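The weighting just described can be sketched as scaled dot-product attention for a single query. All token names and vector values below are illustrative, not taken from a real model:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Score each token by how well its key aligns with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # The output is the weighted sum of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

# Toy example: a climate-related query attends most to "emissions".
tokens = ["the", "emissions", "rose"]
keys   = [[0.1, 0.0], [0.9, 0.8], [0.3, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query  = [1.0, 1.0]

weights, output = attention(query, keys, values)
print(dict(zip(tokens, [round(w, 3) for w in weights])))
```

The token whose key best matches the query dominates the weighted sum, which is exactly how the model "attends" more to the relevant parts of the context.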

However, the understanding of context in LLMs extends beyond just the immediate preceding text. Given the vast datasets these models are trained on, they possess a comprehensive understanding of a multitude of topics. When presented with a specific context, the model delves into this extensive knowledge base, honing in on the relevant segments to craft an appropriate response.

2.1 Context as Ad Hoc Training

The proposition of utilizing context as a form of ad hoc training presents a novel approach to interacting with LLMs. Howard and Ruder introduced the idea of fine-tuning pre-trained models for specific tasks [17], to which ad hoc training through context offers a lightweight alternative.

By furnishing an LLM with a distinct context or a set of instructions, users have the capability to “guide” the model’s responses in real-time. This method of providing contextual guidance effectively acts as instantaneous training, molding the model’s behavior without necessitating alterations to its foundational architecture or weights.

In our exploration of the role of context in language models, we present two contrasting figures to highlight the difference that context can make in the responses generated by a chatbot.

Fig. 3 depicts a scenario where a chatbot, trained on data up to 2021, is asked to write a two-paragraph essay about the 2023 FIFA Women’s World Cup from the perspective of a 10-year-old girl. In this case, the chatbot lacks the specific context of the event and therefore produces a generic response based on its pre-2021 training, missing the specific details of the event.

Fig. 3: A screenshot of a chatbot interaction where the chatbot, trained on data up to 2021, is asked to write a two-paragraph essay about the 2023 FIFA Women’s World Cup from the perspective of a 10-year-old girl. The chatbot’s response lacks specific details about the event.

On the other hand, Fig. 4 shows the same chatbot, but this time it is provided with context in the form of a text snippet from the Wikipedia page about the 2023 FIFA Women’s World Cup. When asked the same question, the chatbot is able to generate a detailed response, discussing the sports event, the results, participating countries, and individuals involved, as if it had been specifically trained on that information.

Fig. 4: A screenshot of the same chatbot interaction, but this time the chatbot is provided with a text snippet from the Wikipedia page about the 2023 FIFA Women’s World Cup. When asked the same question, the chatbot generates a detailed response, discussing the sports event, the results, participating countries, and individuals involved, showcasing the power of contextual information.
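The contrast between Fig. 3 and Fig. 4 comes down to how the prompt is assembled. A minimal sketch (the function name and wording are ours, not from any particular library):

```python
def build_prompt(context_snippet, question):
    """Prepend retrieved context so the model can answer about events
    outside its training data."""
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context_snippet}\n\n"
        f"Question: {question}"
    )

# A snippet such as the Wikipedia excerpt used in Fig. 4:
snippet = ("The 2023 FIFA Women's World Cup was held in Australia and "
           "New Zealand from 20 July to 20 August 2023...")
prompt = build_prompt(snippet,
                      "Write a two-paragraph essay about the tournament "
                      "from the perspective of a 10-year-old girl.")
print(prompt)
```

Everything the model needs about the event travels inside the prompt itself; no retraining or fine-tuning is involved.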

3 Self-Referential Context and Programmability in LLMs

An often-overlooked aspect of context in LLMs is the self-referential nature of their responses. As an LLM generates a response, that output becomes part of the ongoing context for subsequent interactions. This dynamic feature enables a unique form of programmability and direction-following in LLMs.

One aspect of context in advanced Large Language Models (LLMs) like GPT-4 and GPT-3.5 that warrants closer attention is the self-referential nature of their generated responses. While earlier models in the GPT series have some ability to use context within a single interaction, this capability has been significantly enhanced in more recent versions. Advanced LLMs can incorporate their own generated text into the context for future interactions, as represented in Fig. 5: as the model generates a response, that newly generated content doesn’t just serve as an answer to a query; it also becomes part of the evolving context that informs subsequent responses.

Fig. 5: The generated response is incorporated into the ongoing context, influencing the LLM’s attention mechanism as the answer continues to be generated in a recursive manner.

This feature introduces a level of dynamic context updating, allowing the model to follow directions or carry out tasks in a more nuanced manner. For example, if an advanced LLM like GPT-4 or GPT-3.5 is asked to generate a recipe and then create a shopping list based on that recipe, the model can use the ingredients listed in its own generated recipe as context for compiling the shopping list. This not only shows the model’s ability to understand and maintain context but also highlights its capability to be “programmable” within the scope of a single interaction. This self-referential context updating makes advanced LLMs versatile tools for more complex, multi-step tasks and interactions.

To illustrate this, in Fig. 6 we asked GPT-4 to generate a recipe for spaghetti bolognese and then create a shopping list based on that recipe. The model first listed the ingredients and steps for the dish and then used this information to compile a shopping list. This shows that the model can use its own generated text as context for a subsequent task within the same interaction.

This example highlights a straightforward but important feature: the model’s ability to update its context dynamically. The shopping list isn’t just a separate output; it’s directly related to the recipe the model itself provided. This demonstrates that language models can follow directions from their own generated text, allowing for more nuanced and context-aware interactions.

Fig. 6: A screenshot of a chatbot interaction where the user asks GPT4 for a spaghetti bolognese recipe followed by a shopping list. The chatbot first generates the recipe and then uses it as context to create a shopping list, demonstrating the concept of Dynamic Context Updating.
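In API terms, this self-referential loop is simply message accumulation: each generated reply is appended to the conversation as an assistant message, so it becomes context for the next turn. A minimal sketch, with a stub in place of a real model call (all names here are ours):

```python
def run_turns(model_fn, system, user_turns):
    """Each reply is appended as an 'assistant' message, so it becomes
    context for the next user turn."""
    messages = [{"role": "system", "content": system}]
    for turn in user_turns:
        messages.append({"role": "user", "content": turn})
        reply = model_fn(messages)  # a real implementation would call a chat API here
        messages.append({"role": "assistant", "content": reply})
    return messages

# Stub model: reports how much context it received (a real call would hit the API).
def stub_model(messages):
    return f"(reply generated from {len(messages)} messages of context)"

history = run_turns(stub_model, "You are a cooking assistant.",
                    ["Give me a spaghetti bolognese recipe.",
                     "Now make a shopping list from that recipe."])
print(len(history))  # 5: one system, two user, two assistant messages
```

The second user turn ("make a shopping list from that recipe") only works because the first assistant reply is already sitting in the message list when the model is called again.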

In an educational setting, the self-referential context can be leveraged to enable a chatbot to review, grade, or provide feedback on student exercises. Typically, when presented with the text of an exercise and a student’s solution, both GPT-3.5 and GPT-4 tend to perform poorly in grading and offering feedback. However, the performance improves significantly when the chatbot is instructed to first solve the problem itself, then compare the student’s solution with its own generated solution, and finally provide a grade and feedback. The results using this approach are markedly better.

prompt = f"""
Determine if the student's solution is correct or not.

Question:
I'm building a solar power installation and I need \
help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations
as a function of the number of square feet.

Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""
# get_completion is a helper that sends the prompt to a chat completion API
response = get_completion(prompt)
print(response)

The code shown above presents an example of a prompt asking the chatbot to determine whether a student’s solution is correct or not. With this approach, the results will be poor. In the next fragment of code, however, the prompt carefully instructs the model to first solve the problem itself and only then consider the student’s solution, and the results are much improved (source: OpenAI Cookbook [18]).

prompt = f"""
Your task is to determine if the student's solution \
is correct or not.
To solve the problem do the following:
- First, work out your own solution to the problem.
- Then compare your solution to the student's solution \
and evaluate if the student's solution is correct or not.
Don't decide if the student's solution is correct until
you have done the problem yourself.

Use the following format:
Question:
---
question here
---
Student's solution:
---
student's solution here
---
Actual solution:
---
steps to work out the solution and your solution here
---
Is the student's solution the same as actual solution \
just calculated:
---
yes or no
---
Student grade:
---
correct or incorrect
---

Question:
---
I'm building a solar power installation and I need help \
working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations \
as a function of the number of square feet.
---
Student's solution:
---
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
---
Actual solution:
"""
response = get_completion(prompt)
print(response)

4 The Evolution of Contextual Understanding: The System Message

In March 2023, OpenAI introduced a groundbreaking feature in the API of their GPT-3.5 and GPT-4 models: the “system message” [18]. Prior to this update, the API allowed the model to receive prompts and context through an array of messages categorized as either “user” (messages written by the user) or “assistant” (previous responses generated by the model). The system message, however, is designed to be the first message in this array.

The system message serves a critical function: it outlines the role and behavioral parameters expected of the LLM. This allows for a more nuanced and directed interaction with the model, as it provides the LLM with guidelines on how to respond to subsequent prompts.

The introduction of the system message significantly enhances the user’s ability to guide the model’s behavior in real-time, effectively serving as an extension of the ad hoc training concept discussed earlier. It offers users a more refined tool for customizing the model’s responses, thereby elevating the level of interaction to a more dynamic and tailored experience.

The following fragment shows Python code which demonstrates how to use the system message feature to set the behavior of the model as a Socratic mentor.


import openai

...

# Define the system message to set the behavior of the model as a Socratic mentor
system_message = {
    'role': 'system',
    'content': 'You are a Socratic mentor. Engage in thoughtful dialogue, '
               'ask probing questions, and guide the user to deeper understanding.'
}

# Define a user message
user_message1 = {
    'role': 'user',
    'content': 'What is the meaning of life?'
}

# Define an assistant message (a previous response from the model, if any)
assistant_message1 = {
    'role': 'assistant',
    'content': 'The meaning of life is a deeply philosophical question. '
               'What do you think it is?'
}

# Define another user message
user_message2 = {
    'role': 'user',
    'content': 'I think it is to find happiness.'
}

# Combine all messages into a list, with the system message first
messages = [system_message, user_message1, assistant_message1, user_message2]

# Make a chat completion API call to GPT-4
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=messages
)

This code snippet illustrates how the system message can be used to guide the model’s behavior, making the interaction more dynamic and tailored to individual needs.

References

[12] Mikolov, T., et al. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).

[13] Vaswani, A., et al. “Attention is all you need.” Advances in neural information processing systems. 2017.

[14] “API Backend Documentation.” ChatPDF.com. Accessed August 25, 2023. https://www.chatpdf.com/docs/api/backend.

[15] Devlin, J., et al. “BERT: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).

[16] Howard, J., and Ruder, S. “Universal language model fine-tuning for text classification.” arXiv preprint arXiv:1801.06146 (2018).

[17] Suárez, Diego. “How to write ‘System’ Instructions for OpenAI’s GPT-4 Chat API.” Rootstrap Blog, April 25, 2023.

[18] OpenAI. “OpenAI Cookbook: Examples and guides for using the OpenAI API.” GitHub repository. Last modified August 22, 2023. https://github.com/openai/openai-cookbook. Accessed August 23, 2023.