Approaching Generative Artificial Intelligence: Recommendations and Lessons Learned from AgPal Chat

By Andy Fan, Rafael Moraes; Agriculture and Agri-Food Canada

Introduction

Born out of the winning entry of the inaugural Canadian Public Service Data Challenge, AgPal Chat is a generative Artificial Intelligence (AI) search tool that we have had the incredible opportunity to build. It provides helpful federal, provincial, and territorial agricultural information to Canadians in a conversational manner, and it is available on the AgPal.ca website as a new, complementary way to connect users with relevant Canadian agricultural information. It is the result of a cross-functional effort across industry, academia, and other government departments to improve service delivery to Canadians.

In this article, we focus on sharing our lessons learned from the technical and policy aspects of implementing AgPal Chat. Some of our key findings and recommendations include: the use of Retrieval-Augmented Generation (RAG) to enhance AI accuracy, the necessity of guardrails to ensure ethical and safe AI interactions, and the role of strong data governance and policy compliance in creating responsible AI systems.

Prompt Engineering

Prompt engineering is a fascinating and intricate field at the intersection of human expertise and AI. Its central goal is to fine-tune queries in a way that elicits the most accurate, unbiased, and relevant responses from AI systems, especially those based on language models. This practice is of utmost importance because, unlike traditional software interfaces, natural language systems rely heavily on the subtleties of human language, with all its nuances and complexities. Therefore, designing effective prompts is both an art and a science, one that requires a deep understanding of the underlying AI technology, as well as the particularities of human language and cognition.

It is also a continuous, iterative process that involves testing and refining prompts to ensure AI systems generate accurate, unbiased, and relevant responses. This ongoing adjustment is crucial to avoid introducing unintended biases, as even small changes in wording can significantly impact AI behavior. Regular assessment and careful balancing of technical and linguistic elements help maintain the reliability and impartiality of AI outputs.

It's important to recognize that each large language model (LLM) will have its own optimal prompt that elicits the best performance, as different models may respond differently to the same prompt due to variations in their architecture and training data. However, the process of discovering this optimal prompt remains consistent across models. It involves the same iterative cycle of experimentation, evaluation, and refinement to ensure the prompts guide the AI in producing accurate and unbiased outputs.
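
To illustrate what this iterative cycle can look like in practice, below is a minimal sketch of how candidate prompts might be scored against a small, hand-curated evaluation set. It is a simplified illustration rather than AgPal Chat's actual code: the call_llm function is a placeholder for whatever client your model provider exposes, and the prompts and evaluation questions are invented.

```python
# A minimal sketch of iterative prompt evaluation (illustrative only).
# "call_llm" is a placeholder for whatever client your model provider exposes.

def call_llm(system_prompt: str, user_message: str) -> str:
    """Placeholder: send the prompt and message to your LLM and return its reply."""
    raise NotImplementedError("Wire this up to your model provider's API.")

# Candidate system prompts to compare (invented examples).
CANDIDATE_PROMPTS = [
    "You are a helpful assistant. Answer concisely and cite your sources.",
    "You are a helpful assistant. Answer only from the provided context; "
    "if the answer is not in the context, say you do not know.",
]

# Small, hand-curated evaluation set: (question, phrases a good answer should contain).
EVAL_SET = [
    ("What programs support young farmers?", ["young", "program"]),
    ("What is the capital of France?", ["do not know"]),  # out-of-scope question
]

def score_prompt(system_prompt: str) -> float:
    """Return the fraction of evaluation questions answered acceptably."""
    hits = 0
    for question, expected_phrases in EVAL_SET:
        answer = call_llm(system_prompt, question).lower()
        if all(phrase in answer for phrase in expected_phrases):
            hits += 1
    return hits / len(EVAL_SET)

# Example usage (after wiring call_llm to a real model):
# for prompt in CANDIDATE_PROMPTS:
#     print(f"{score_prompt(prompt):.2f}  <-  {prompt[:60]}...")
```

Automated scores like these are only a starting point; in practice, human review of the generated answers remains essential before adopting a new prompt.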

Retrieval-Augmented Generation Technique

"Retrieval-Augmented Generation" or RAG is a framework that combines the retrieval of information from a knowledge source (like a database or a collection of documents) with the generative capabilities of a language model. Without it, there is a higher chance even fine-tuned LLMs will output “hallucinations” when asked about topics they have not seen extensively in their training set. To ensure that an AI provides more precise information, RAG must be implemented within the prompt construction process. Figure 1 below showcases an example of the RAG process. Open-source libraries like Langchain or Llama Index, as well as proprietary solutions (such as Azure Cognitive Search) can be employed if you want to use RAG without building it from scratch. In AgPal Chat’s case, we decided to build the RAG pattern ourselves. This led to a more flexible solution that fits our specific needs.


Figure 1 source: Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook

Description - Figure 1: Retrieval-Augmented Generation (RAG) process example

This is an image illustrating the Retrieval-Augmented Generation (RAG) process from freecodecamp. The diagram consists of several interconnected components:

  • Input (Query): A question from the user, such as "How do you evaluate the fact that OpenAI's CEO, Sam Altman, went through a sudden dismissal by the board in just three days, and then was rehired by the company, resembling a real-life version of 'Game of Thrones' in terms of power dynamics?"
  • Indexing: The system indexes documents into chunks/vectors using embeddings.
  • Retrieval: Relevant documents are retrieved based on the query. For example:
    • Chunk 1: "Sam Altman Returns to OpenAI as CEO, Silicon Valley Drama Resembles the ‘Zhen Huan’ Comedy."
    • Chunk 2: "The Drama Concludes? Sam Altman to Return as CEO of OpenAI, Board to Undergo Restructuring."
    • Chunk 3: "The Personnel Turmoil at OpenAI Comes to an End: Who Won and Who Lost?"
  • Generation:
    • Without RAG: The system provides a generic response without specific details, such as "I am unable to provide comments on future events. Currently, I do not have any information regarding the dismissal and rehiring of OpenAI’s CEO..."
    • With RAG: The system combines the context from the retrieved documents and prompts to generate a more detailed and relevant response, such as "...This suggests significant internal disagreements within OpenAI regarding the company's future direction and strategic decisions. All of these twists and turns reflect power struggles and corporate governance issues within OpenAI..."
  • Output: The final answer is generated based on the selected retrieval method (with or without RAG), showcasing the difference in detail and accuracy.

Here is how the RAG technique generally works:

  1. Retrieval Step: Given a chat history, the RAG system first retrieves relevant documents or pieces of information from a database or a corpus. This is usually done using a retrieval model or search algorithm optimized to quickly find the most relevant content from large collections of information.
  2. Augmentation: The retrieved documents are then used to augment the input to the generative model. This means that the language model receives both the chat history and the content of the retrieved documents as context.
  3. Generation Step: A generative language model then generates a response based on this augmented input. The model uses the additional information to produce more accurate, detailed, and contextually relevant answers.

RAG frameworks are particularly useful for tasks where a language model needs access to external information or must answer questions based on factual data that may not be stored within its parameters. Examples of such tasks include open-domain question answering and fact-checking. The retrieval step allows the system to pull in up-to-date or specific information that the language model alone, limited to its training data, would not have access to.
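
The sketch below walks through these three steps in a simplified form. It is illustrative only and not AgPal Chat's production implementation: the embed and call_llm functions are placeholders for whichever embedding model and LLM client you use, and the sample chunks are invented.

```python
# A simplified, from-scratch sketch of the retrieval, augmentation, and generation
# steps described above (illustrative only).

import math

def embed(text: str) -> list[float]:
    """Placeholder: return an embedding vector for the text."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its reply."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Indexing: pre-compute an embedding for each document chunk (done once, offline)."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def answer(question: str, index: list[tuple[str, list[float]]], top_k: int = 3) -> str:
    # 1. Retrieval: find the chunks most similar to the question.
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n\n".join(chunk for chunk, _ in ranked[:top_k])

    # 2. Augmentation: combine the retrieved context with the user's question.
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: let the LLM produce the final, grounded answer.
    return call_llm(prompt)

# Example usage (after wiring embed and call_llm to real models):
# index = build_index(["Program A offers funding for young farmers...",
#                      "Program B supports sustainable irrigation projects..."])
# print(answer("What funding is available for young farmers?", index))
```

In a production system, the index would typically live in a vector database and the similarity search would be optimized, but the overall flow of retrieve, augment, and generate stays the same.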

Guardrails

Guardrails are pre-defined rules or constraints that are put in place to prevent an AI system from generating inappropriate, biased, or unbalanced content. They work by guiding the generation process away from certain topics or phrases and by post-processing the AI's output to remove or revise problematic content. These guardrails are crucial for several reasons:

  1. Content Control: They prevent the generation of inappropriate, offensive, or harmful content. This includes hate speech, sexually explicit material, and other types of content that might not be suitable for all audiences.
  2. Ethical Guidelines: Guardrails help ensure that LLMs and chatbots abide by ethical guidelines. They can prevent the endorsement of illegal activities or those that might cause harm to users or third parties.
  3. Bias Mitigation: Despite best efforts, LLMs can sometimes perpetuate or even amplify biases present in their training data. Guardrails can be designed to identify and mitigate such biases, ensuring more fair and balanced interactions.
  4. Safety: By keeping the AI's behavior within certain limits, guardrails enhance user safety by not allowing the system to provide dangerous or incorrect information. This is particularly important in high-stakes domains like healthcare or legal advice, where incorrect information can have serious consequences.
  5. User Trust and Compliance: Ensuring that the system behaves predictably and within the bounds of socially acceptable norms helps to build user trust. Guardrails also help in compliance with various regulatory standards and legal requirements, which is essential for the deployment of chatbots in different industries.
  6. Prevention of Misuse: Guardrails are also important for preventing users from manipulating or 'tricking' the AI into behaving in unintended ways, such as generating malicious content or participating in deceptive practices.
  7. Maintaining Focus: They help the system stay on topic and relevant to the user's intent, improving user experience by preventing the chatbot from producing irrelevant or nonsensical responses.

To incorporate guardrails effectively, one must first identify the potential vulnerabilities that could lead the model to produce biased output and understand the context in which impartiality might be compromised. For example, if an AI model is generating objective news summaries, it should treat different entities and subjects objectively and not provide an opinion. Guardrails for this scenario could range from eliminating certain opinion-laden words to implementing more sophisticated sentiment analysis checks that flag excessively positive or negative language around specific topics. Lastly, ensuring that an AI tool responds solely to pertinent inquiries is a matter of both discrimination and focus within the guardrail system. The AI should discern between questions that it should answer and those that are irrelevant, inappropriate, or outside the scope of its functionality. Again, guardrails play a crucial role here. By giving clear instructions and examples of what constitutes a pertinent inquiry, the AI can deflect or refuse to answer questions that do not meet those criteria.

For example, in building AgPal Chat, only queries about Canadian agricultural information in the AgPal system would be pertinent. Guardrails were therefore set up to provide comprehensive and focused responses to agriculture-related questions, while avoiding or redirecting those concerning unrelated topics. A simplified example of the implementation was including a line in the system prompt stating, "Do not answer questions that are not related to data provided by the AgPal system".
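
As an illustration, such a scope-limiting guardrail could be expressed along the following lines. This is a simplified sketch and not the actual AgPal Chat system prompt; the message structure assumes a typical chat-style API.

```python
# A simplified sketch of a prompt-level guardrail (illustrative only; this is not
# the actual AgPal Chat system prompt).

SYSTEM_PROMPT = """You are AgPal Chat, an assistant for Canadian agricultural programs and services.
- Answer only using information provided by the AgPal system.
- Do not answer questions that are not related to data provided by the AgPal system;
  instead, politely explain that the question is out of scope.
- Remain neutral: do not express opinions or endorse specific programs."""

def build_messages(user_question: str, retrieved_context: str) -> list[dict]:
    """Assemble the chat messages, with the guardrail instructions placed first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{retrieved_context}\n\nQuestion: {user_question}"},
    ]
```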

In practice, guardrails can take many forms. Based on our experience in building AgPal Chat, we would recommend having at least:

  • Filtering systems that detect and block unwanted types of content.
  • Rate-limiting features to prevent spamming or abuse of the system.
  • Explicit prompting of "do not say" lists or behavior rules.
  • Review processes or human-in-the-loop mechanisms (e.g., logging and monitoring user prompts and answers); a simplified sketch combining several of these mechanisms appears at the end of this section.

Microsoft has a great example of a guardrail prompt (Figure 2):

Description - Figure 2: Metaprompt guardrail example

This is an image of a metaprompt guardrail example for an ice cream shop conversational agent from Microsoft.

The metaprompt consists of:

## This is a conversational agent whose code name is Dana:

  • Dana is a conversational agent at Gourmet Ice Cream, Inc.
  • Gourmet Ice Cream’s marketing team uses Dana to help them be more effective at their jobs.
  • Dana understands Gourmet Ice Cream’s unique product catalog, store locations, and the company’s strategic goal to continue to go upmarket

## On Dana’s profile and general capabilities:

  • Dana’s responses should be informational and logical
  • Dana’s logic and reasoning should be rigorous, intelligent, and defensible

## On Dana’s ability to gather and present information:

  • Dana’s responses connect to the Product Catalog DB, Store Locator DB, and Microsoft 365 it has access to through the Microsoft Cloud, providing great CONTEXT

## On safety:

  • Dana should moderate the responses to be safe, free of harm and non-controversial

The prompt consists of:

Write a tagline for our ice cream shop.

The Response consists of:

Scoops of heaven in the heart of Phoenix!

Implementing guardrails in AI systems is crucial but challenging. It requires careful design to ensure they handle diverse inputs while maintaining accuracy. Ongoing maintenance is also necessary to keep guardrails effective as language models and content evolve. Despite these challenges, guardrails are essential for ensuring safe and responsible AI interactions.
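
To make the earlier recommendations more concrete, here is a minimal sketch of how content filtering, rate limiting, and human-in-the-loop logging might be combined in front of the model call. The blocked terms, limits, and log destination are hypothetical and would need to be adapted to each system's context.

```python
# A minimal sketch combining several recommended guardrails: input filtering,
# rate limiting, and logging for scheduled human review (illustrative only).

import logging
import time
from collections import defaultdict

logging.basicConfig(filename="chat_audit.log", level=logging.INFO)

BLOCKED_TERMS = {"password", "credit card"}  # hypothetical "do not discuss" list
MAX_REQUESTS_PER_MINUTE = 10

_request_times: dict[str, list[float]] = defaultdict(list)

def allowed(user_id: str, message: str) -> bool:
    """Apply simple guardrails before the message ever reaches the LLM."""
    now = time.time()

    # Rate limiting: reject users who exceed the per-minute request budget.
    recent = [t for t in _request_times[user_id] if now - t < 60]
    if len(recent) >= MAX_REQUESTS_PER_MINUTE:
        _request_times[user_id] = recent
        logging.warning("Rate limit reached for user %s", user_id)
        return False
    _request_times[user_id] = recent + [now]

    # Content filtering: block messages containing unwanted terms.
    if any(term in message.lower() for term in BLOCKED_TERMS):
        logging.warning("Blocked message from user %s", user_id)
        return False

    # Human-in-the-loop: log every accepted prompt for scheduled review.
    logging.info("User %s asked: %s", user_id, message)
    return True
```

Output-side checks, such as scanning the model's reply before it is shown to the user, follow the same pattern.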

Data Management and Data Governance

The output of the retrieval-augmented generation pattern is directly correlated with the quality of the underlying data being used. AgPal Chat leverages the result of years of strong data management and governance practices from the AgPal team, who provided a foundation of high-quality, well-curated data on agricultural programs and services on the AgPal website. In this context, having good data management and governance practices can improve the accuracy and relevance of the generated texts by ensuring that the data sources are reliable, consistent, and up to date. Some recommendations to help realize the benefits of data management and governance include:

  1. Establishing a clear and comprehensive data strategy that defines the vision, goals, and principles of data management and governance.
  2. Implementing a robust and flexible data architecture that supports the integration, interoperability, and accessibility of different data sources.
  3. Adopting a data quality framework (see related TBS guidance) that ensures the validity, completeness, timeliness, and accuracy of data sources; a simplified example of such checks is shown after this list.
  4. Applying a data security model that protects the confidentiality, integrity, and availability of data sources and generated texts.
  5. Creating a data governance structure that assigns roles, responsibilities, and accountabilities for data management and governance.
  6. Monitoring and evaluating the data management and governance performance and outcomes, and making continuous improvements based on feedback and best practices.
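
As a small illustration of the data quality recommendation above, automated checks can flag incomplete or stale records before they are indexed for retrieval. The sketch below is hypothetical: the field names, thresholds, and sample record are assumptions, not the actual AgPal data model.

```python
# A minimal sketch of automated data quality checks (completeness and timeliness)
# run before records are indexed for retrieval (illustrative only; field names
# and thresholds are hypothetical).

from datetime import datetime, timedelta

REQUIRED_FIELDS = ["title", "description", "url", "last_updated"]
MAX_AGE = timedelta(days=365)

def quality_issues(record: dict) -> list[str]:
    """Return a list of data quality problems found in a single program record."""
    issues = []

    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing field: {field}")

    # Timeliness: flag records that have not been updated within the last year.
    last_updated = record.get("last_updated")
    if last_updated and datetime.now() - last_updated > MAX_AGE:
        issues.append("stale record: last updated more than a year ago")

    return issues

# Example usage with an invented record:
record = {"title": "Young Farmer Loan", "description": "",
          "url": "https://example.org/program", "last_updated": datetime(2022, 1, 15)}
print(quality_issues(record))  # ['missing field: description', 'stale record: ...']
```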

Policy Related Considerations

When building AI applications in a federal public sector context, beyond existing policies and guidelines (Directive on Automated Decision-making, Scope of the Directive, Guide on the use of Generative AI), some additional policy considerations should be taken into account to ensure these applications are built in a responsible and ethical manner. We have found that these considerations were crucial in helping shape our design and approach to developing AgPal Chat and would highly recommend consulting them during the design phase.

Broader policy considerations on this front include:

  • Compliance: Ensure that the chatbot’s design and deployment are in line with existing applicable policies and legislation, and follow best practices, guidance from other public authorities, and industry-specific regulations. Additionally, ensure that all internal or departmental policies and guidelines are adhered to. Beyond compliance, ensure that appropriate measures are in place to mitigate any legal and regulatory risks. This may involve seeking legal advice, implementing compliance processes, and staying up to date with developments in the legal and regulatory landscape.
  • Risk Evaluation: Evaluate and address potential cybersecurity threats, biases, violations of privacy, and the possibility of generating hallucinations or inaccurate information. If the system interfaces with the public, consider the public sentiment or current events that could impact the way that the tool may be perceived.
  • Stakeholder Engagement: Proactively engage with key stakeholders, such as legal counsel, privacy and security experts, Gender-based Analysis (GBA) Plus focal points, Diversity, Equity and Inclusion (DEI) representatives, other partners (e.g., Indigenous Communities), and internal process authorities (e.g., enterprise architecture, project governance) as early as possible to ensure a coordinated, compliant, and holistic approach.
  • Transparency: It is essential to notify users that they are communicating with an AI tool, rather than a human, to avoid any confusion or misunderstanding. Sharing additional information about the system such as a description of how it works, the data it is using, and the steps taken to ensure its quality can also be helpful to increase trust.
  • Bias and Discrimination Monitoring: Monitor the performance of the AI tools in guarding against bias or discrimination, ensuring that the technology is used responsibly and equitably. Capture the user queries and the system responses to review them on a scheduled basis across the lifecycle.
  • Education: Offer clear instructions to users on the optimal way to interact with the chatbot, including guidance on formatting their prompts or queries and what information should be included. Ensure that chatbot developers have access to training and resources to help them become skilled in using the technology and understand its capabilities, limitations, and best practices for its responsible use.
  • Iterative Development: Recognize the need for ongoing iteration to keep pace with regulatory and technological changes. Adopting an agile approach is one potential solution.
  • Sustainability: Ensure the design and implementation of AI tools are guided by a commitment to environmental sustainability to support long-term viability and mitigate any negative impacts on the environment or on people and communities.

Conclusion

Prompt engineering, as a discipline, sits at the critical juncture of ensuring that AI-powered systems provide users with responses that are not only accurate and factually correct but also impartial, ethical, and contextually relevant. The introduction of RAG has marked a significant step forward in achieving this, by providing a mechanism for AI to dynamically access and incorporate external information. This process enhances the reliability and factual basis of the AI's responses, particularly in situations where the AI must draw from a vast and ever-evolving pool of knowledge.

The implementation of ethical guardrails, strong data management practices, and compliance with existing policies, laws, and regulations can help AI systems better respect social norms and user trust, contributing to a more beneficial interaction for all parties involved.

Future research and improvements for AgPal Chat could focus on refining prompt engineering to improve contextual relevance, expanding the use of Retrieval-Augmented Generation (RAG) for more dynamic data integration, and improving the AI's scalability and efficiency, thereby enhancing service delivery to users while maintaining safety and reliability.

As AI continues to advance and become an integral part of personal and professional environments alike, the work put into prompt engineering will undeniably play a pivotal role in shaping the future of human-AI interactions. Ensuring that prompt engineering techniques continue to evolve alongside AI models will be vital in upholding the principles of accuracy, impartiality, and relevance. The proper application of prompt engineering and guardrails will enable AI to reach its full potential as a tool for enhancing human knowledge, decision-making, and productivity, without sacrificing ethical integrity or user trust.

Meet the Data Scientist

If you have any questions about our article or would like to discuss this further, we invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic, and discuss their findings.

Register for the Meet the Data Scientist event. We hope to see you there!

References