OpenAI: Prompt Engineering Strategies

The secret to getting the best answers from LLMs lies in how you ask the questions. Enter prompt engineering: the craft of writing precise, well-structured prompts that unlock a model's full potential. In this article, we'll walk through OpenAI's recommended prompting strategies, whether you're building smarter products, automating workflows, or extracting deeper insights from data.
What is Prompt Engineering?
Prompt engineering is the craft of designing and refining prompts to guide a model to produce the desired type, format, and content of output. It involves carefully constructing inputs to ensure the model generates responses that meet specific requirements.
Messages and Roles
When crafting prompts to influence the output of a large language model (LLM), it’s essential to define both a message and a role.
- Message: This is the input instruction given to the model, such as "write a short story about a robot learning to paint."
- Role: This influences how the model interprets the message. For example, if the role is set to "user," the model treats the input as if it came from an end user, like someone typing a query into ChatGPT. If the role is set to "developer" (previously known as the "system" role), the model prioritizes these instructions over user messages. This role can be used to define how the model should generally behave and respond to user inputs.
For instance, when using OpenAI’s API, prompt engineering can be applied to modify the model’s behavior using a developer message:
```javascript
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "developer",
      content: [
        {
          type: "text",
          text: `
            You are a helpful assistant that answers questions
            in the style of a medieval knight.
          `,
        },
      ],
    },
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "What is the purpose of a constructor in Java?",
        },
      ],
    },
  ],
  store: true,
});
```
This prompt generates a response in the requested style:
"Hark! A noble question thou hast posed! In the realm of Java, a constructor is akin to a blacksmith forging a new sword. It is a special method invoked to create and initialize an object, ensuring it is ready for battle—err, I mean, for use in thy code. Verily, without constructors, objects would be but hollow shells, lacking purpose and form."
RAG for Better Responses
Another technique to enhance model responses is Retrieval Augmented Generation (RAG). This involves providing the model with additional information outside its training data, such as a database, text document, or other resources. The model is then prompted to incorporate this external information to generate more relevant, detailed, and accurate responses.
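In practice, RAG boils down to fetching relevant text at request time and placing it in the prompt. Here is a minimal sketch of that pattern, assuming the Node.js OpenAI client from the example above and a hypothetical `fetchPolicyDoc` helper standing in for your own database or document lookup:

```javascript
// Minimal RAG sketch: inject retrieved text into the prompt at request time.
// `fetchPolicyDoc` is a hypothetical helper for your own data source.
const referenceText = await fetchPolicyDoc("refund-policy");

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "developer",
      content: "Answer using only the reference text provided in the user message.",
    },
    {
      role: "user",
      content: `Reference text:\n"""${referenceText}"""\n\nQuestion: What is the refund window?`,
    },
  ],
});
```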
6 Prompting Strategies for Better Results
(1) Write Clear Instructions
The model cannot infer your intentions, so it’s crucial to be explicit and detailed in your instructions. Specify output parameters such as length, complexity, format, and tone. Include necessary context to ensure the response aligns with your expectations.
- Bad Prompt: "Summarize these meeting notes."
- Better Prompt: "Summarize the meeting notes in one paragraph. Then, create a markdown list of the speakers and their key points. Finally, list any action items or next steps discussed."
Ask the Model to Adopt a Persona
You can specify a persona in the "developer" role to define how the model should respond to user inputs:
DEVELOPER
When I ask for help writing something, you will reply with a document that includes at least one humorous remark in every paragraph.
USER
Write a thank-you note to my coffee supplier for delivering the beans on time, which allowed us to serve our customers without interruption.
Based on OpenAI’s internal evaluations, the GPT-4.5 model performs particularly well with the following developer message:
DEVELOPER
You are a highly capable, thoughtful, and precise assistant. Your goal is to deeply understand the user's intent, ask clarifying questions when needed, think step-by-step through complex problems, provide clear and accurate answers, and proactively anticipate helpful follow-up information. Always prioritize being truthful, nuanced, insightful, and efficient, tailoring your responses specifically to the user's needs and preferences.
Use Delimiters to Clearly Indicate Distinct Parts of the Input
For complex tasks, it’s important to clearly separate different sections of the input. Delimiters like triple quotation marks, XML tags, or section titles can help the model understand what is being asked. For example:
USER
Summarize the text delimited by triple quotes with a haiku:
"""insert text here"""
Specify the Steps Required to Complete a Task
Breaking tasks into explicit steps can make it easier for the model to follow instructions:
DEVELOPER
Use the following step-by-step instructions to respond to user inputs.
Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says "Summary: ".
Step 2 - Translate the summary from Step 1 into French, with a prefix that says "Translation: ".
USER
"""insert text here"""
Provide Examples
Known as "few-shot prompting," this involves giving the model a few examples of a task before asking it to generate a response. While general instructions are usually more efficient, examples can be helpful when the desired style or format is difficult to describe explicitly.
DEVELOPER
Answer questions in a style similar to the following poem:
"The oak tree stands tall not because it avoids the storm,
but because it learns to bend with the wind.
The phoenix rises from ashes not by chance,
but through the fire of perseverance."
USER
Teach me about resilience.
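In the API, few-shot examples can also be supplied as prior user and assistant turns rather than inline in a single message, so the model sees a worked example of the desired style before the real query. A minimal sketch (the example exchange is illustrative):

```javascript
// Few-shot prompting: prior user/assistant turns act as worked examples.
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "developer",
      content: "Answer in the style of a short nature-metaphor poem.",
    },
    // One worked example (a "shot") showing the desired style.
    { role: "user", content: "Teach me about patience." },
    {
      role: "assistant",
      content:
        "The river carves the canyon not in a day,\nbut grain by grain, year after year.",
    },
    // The real query follows the example.
    { role: "user", content: "Teach me about resilience." },
  ],
});
```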
Specify the Desired Length of the Output
You can define the output length in terms of words, sentences, paragraphs, or bullet points. However, instructing the model to generate a specific number of words is less precise than requesting a certain number of paragraphs or bullet points.
DEVELOPER
Summarize the text delimited by triple quotes in 3 bullet points.
"""insert text here"""
(2) Provide Reference Text
Models sometimes invent answers, so providing reference text can help reduce inaccuracies. Just as a student uses notes to perform better on a test, a model can use reference text to provide more accurate responses.
Instruct the Model to Answer Using a Reference Text
If you provide the model with trusted information, you can instruct it to use that information to compose its answer:
DEVELOPER
Use the provided articles delimited by triple quotes to answer questions.
If the answer cannot be found in the articles, write "I could not find an answer."
USER
<insert articles, each delimited by triple quotes>
Question: <insert question here>
Since models have a limited context window, embeddings can be used to efficiently retrieve relevant information (see Strategy 5 below).
Instruct the Model to Answer with Citations from a Reference Text
If the input includes relevant knowledge, you can request that the model cite passages from the provided documents:
DEVELOPER
You will be provided with a document delimited by triple quotes and a question.
Your task is to answer the question using only the provided document and to cite the passage(s) used to answer the question.
If the document does not contain the necessary information, write: "Insufficient information."
Use the following format for citations ({"citation": …}).
USER
"""<insert document here>"""
Question: <insert question here>
(3) Split Complex Tasks into Simpler Subtasks
Complex tasks often have higher error rates, so breaking them into smaller, manageable subtasks can improve performance. This approach mirrors how software engineers modularize complex programming tasks.
Use Intent Classification to Identify Relevant Instructions
For tasks requiring multiple sets of instructions, it can be helpful to first classify the query type and then determine which instructions to use:
DEVELOPER
You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in JSON format with the keys: primary and secondary.
Primary categories: Billing, Technical Support, Account Management, or General Inquiry.
Billing secondary categories:
- Unsubscribe or upgrade
- Add a payment method
- Explanation for charge
- Dispute a charge
Technical Support secondary categories:
- Troubleshooting
- Device compatibility
- Software updates
Account Management secondary categories:
- Password reset
- Update personal information
- Close account
- Account security
General Inquiry secondary categories:
- Product information
- Pricing
- Feedback
- Speak to a human
USER
I need to reset my password.
Based on the classification, specific instructions can be provided to handle the query. For example, if the query involves "troubleshooting," the model can guide the user through a series of steps.
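One way to wire this up is a two-step call: classify first, then select the instruction set for a second request. A sketch, where `classificationInstructions` holds the prompt above and `instructionsByCategory` is a hypothetical lookup table; the `response_format` option, supported on newer models, asks Chat Completions to return a JSON object:

```javascript
// Step 1: classify the query into primary/secondary categories as JSON.
const classification = await openai.chat.completions.create({
  model: "gpt-4",
  response_format: { type: "json_object" },
  messages: [
    { role: "developer", content: classificationInstructions }, // the prompt above
    { role: "user", content: userQuery },
  ],
});
const { primary, secondary } = JSON.parse(
  classification.choices[0].message.content
);

// Step 2: route to a per-category instruction set (hypothetical lookup table).
const instructions = instructionsByCategory[primary][secondary];
const answer = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "developer", content: instructions },
    { role: "user", content: userQuery },
  ],
});
```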
Summarize or Filter Previous Dialogue
For long conversations, summarizing or filtering previous dialogue can help manage the model’s fixed context length. One approach is to summarize parts of the conversation once it reaches a certain length, and include the summary in the system message.
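A sketch of the summarize-and-replace approach, using a simple turn-count threshold (a token-based threshold would be more precise but needs a tokenizer):

```javascript
// Once the transcript grows past a threshold, fold older turns into a
// summary that lives in the developer message.
async function compactHistory(messages, maxTurns = 20) {
  if (messages.length <= maxTurns) return messages;

  const older = messages.slice(0, -10); // turns to compress
  const recent = messages.slice(-10);   // keep the latest turns verbatim

  const summary = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "developer",
        content: "Summarize this conversation in a short paragraph.",
      },
      { role: "user", content: JSON.stringify(older) },
    ],
  });

  return [
    {
      role: "developer",
      content: `Summary of earlier conversation: ${summary.choices[0].message.content}`,
    },
    ...recent,
  ];
}
```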
Summarize Long Documents Piecewise
To summarize a very long document, such as a book, you can break it into sections, summarize each section, and then recursively summarize the section summaries until a full summary is produced.
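A sketch of this recursive scheme, splitting on a fixed character budget for simplicity (a real implementation would split on chapter or section boundaries):

```javascript
// Recursively summarize: chunk the text, summarize each chunk, then
// summarize the concatenated summaries until the result fits in one chunk.
async function summarize(text, chunkSize = 8000) {
  if (text.length <= chunkSize) {
    const res = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: `Summarize:\n"""${text}"""` }],
    });
    return res.choices[0].message.content;
  }

  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }

  const partials = [];
  for (const chunk of chunks) {
    partials.push(await summarize(chunk, chunkSize));
  }
  return summarize(partials.join("\n\n"), chunkSize);
}
```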
(4) Give the Model Time to "Think"
Models perform better when they take time to reason through a problem rather than rushing to an answer. Asking for a "chain of thought" can guide the model toward more accurate responses.
Instruct the Model to Work Out Its Own Solution
Encouraging the model to reason through a problem before providing an answer can improve accuracy:
DEVELOPER
Determine if the student's solution is correct or not.
- First, work out your own solution to the problem.
- Then, compare your solution to the student's solution and evaluate if the student's solution is correct.
- Do not decide if the student's solution is correct until you have completed the problem yourself.
Use Inner Monologue to Hide Reasoning
For applications like tutoring, you may want to hide the model’s reasoning process from the user. Inner monologue involves instructing the model to structure its reasoning in a way that can be easily parsed and hidden from the user.
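One common implementation is to ask the model to wrap its private reasoning in a marker that the application strips before display. A sketch, where the `<scratchpad>` tags and the `ANSWER:` convention are assumptions of this example, not an OpenAI format:

```javascript
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "developer",
      content:
        "First reason step by step inside <scratchpad>...</scratchpad> tags, " +
        "then give the student-facing hint on a line starting with ANSWER:",
    },
    { role: "user", content: studentQuestion },
  ],
});

// Strip the hidden reasoning; show the user only what follows ANSWER:.
const raw = response.choices[0].message.content;
const visible = raw.split("ANSWER:").pop().trim();
```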
Ask the Model if It Missed Anything
If the model is extracting excerpts from a large document, follow-up queries can help ensure it doesn’t miss relevant information:
DEVELOPER
You will be provided with a document delimited by triple quotes.
Your task is to select excerpts relevant to the following question:
"What are the key milestones in the history of space exploration?"
Ensure that excerpts contain all necessary context.
If you find more relevant excerpts, include them without repeating previous ones.
(5) Use External Tools
Compensate for the model’s weaknesses by integrating external tools. For example, use embeddings-based search for efficient knowledge retrieval or offload tasks like calculations or API calls to specialized tools.
Use Embeddings-Based Search for Knowledge Retrieval
Embeddings-based search enables efficient knowledge retrieval by dynamically adding relevant information to the model’s input during runtime. An embedding converts words or phrases into numerical values mapped to points in a high-dimensional space, allowing a computer to understand their meanings and relationships. For example, the words "king" and "queen" would be close in this space because they share a similar context, while "king" and "dog" would be farther apart.
While embeddings-based search is a technical concept, it can be simplified: instead of manually selecting reference texts, you store a database of reference texts as embeddings. The system then automatically finds and retrieves the most relevant documents before constructing the prompt.
For example, imagine a system that stores legal texts as embeddings. When a user asks, "What are the key elements of a contract?":
- The system retrieves the most relevant document (e.g., “A valid contract must have offer, acceptance, and consideration”).
- The retrieved text is dynamically inserted into the input prompt before being processed by the model.
This approach eliminates the need to manually find and provide relevant reference texts for each user prompt. Instead, embeddings-based search automates the retrieval of the most relevant information, ensuring greater accuracy and efficiency in generating responses.
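A sketch of the retrieval step using the embeddings endpoint and cosine similarity, where the in-memory `docs` array (`[{ text, embedding }, ...]`, embedded ahead of time) stands in for a real vector store:

```javascript
// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Embed the user's question.
const { data } = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "What are the key elements of a contract?",
});
const queryEmbedding = data[0].embedding;

// Score every stored document and keep the closest match.
const best = docs.reduce((top, doc) =>
  cosine(doc.embedding, queryEmbedding) > cosine(top.embedding, queryEmbedding)
    ? doc
    : top
);
// `best.text` is then inserted into the prompt as reference text.
```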
Use Code Execution for Accurate Calculations or API Calls
Language models are not reliable for complex calculations. Instead, instruct the model to write code for tasks like arithmetic or API calls, and have your application execute it.
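A sketch of prompting for runnable code and extracting it from the reply; actually executing model-written code requires proper sandboxing, which is omitted here:

```javascript
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "developer",
      content:
        "When a calculation is needed, write JavaScript that computes it, " +
        "enclosed in triple backticks, instead of computing it yourself.",
    },
    { role: "user", content: "What is the sum of the first 500 prime numbers?" },
  ],
});

// Extract the fenced code block; run it only inside a proper sandbox.
const match = response.choices[0].message.content.match(
  /```(?:javascript)?\n([\s\S]*?)```/
);
const code = match ? match[1] : null;
```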
(6) Test Changes Systematically
Improving model performance requires systematic testing. Define a comprehensive test suite to evaluate whether prompt changes result in overall improvements.
Evaluate Model Outputs Against Gold-Standard Answers
Compare model outputs to known correct answers to measure performance:
DEVELOPER
You will be provided with text delimited by triple quotes that is supposed to answer a question.
Check if the following pieces of information are directly contained in the answer:
- The Eiffel Tower is located in Paris.
- It was completed in 1889.
For each point, perform the following steps:
1 - Restate the point.
2 - Provide a citation from the answer closest to this point.
3 - Explain whether someone reading the citation could directly infer the point.
4 - Write "yes" if the answer to 3 was yes, otherwise write "no."
Finally, provide a count of "yes" answers as {"count": <insert count here>}.
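A sketch of running such a model-graded check over a small test suite, where `graderInstructions` holds the prompt above and the gold-standard cases are illustrative:

```javascript
// Model-graded eval: for each test case, the grader prompt counts how many
// gold-standard facts the candidate answer contains.
const testCases = [
  {
    answer: '"""The Eiffel Tower, finished in 1889, stands in Paris."""',
    expectedCount: 2,
  },
  // ...more cases
];

for (const test of testCases) {
  const res = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "developer", content: graderInstructions }, // the prompt above
      { role: "user", content: test.answer },
    ],
  });

  // Pull the final {"count": N} object out of the grader's response.
  const jsonMatch = res.choices[0].message.content.match(/\{"count":\s*\d+\}/);
  const count = jsonMatch ? JSON.parse(jsonMatch[0]).count : -1;
  console.log(count === test.expectedCount ? "PASS" : "FAIL");
}
```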