Writing Prompts Is Not Prompt Engineering. Here's How To Use Both For Superior Results.

Prompt Engineering

Do you write your own prompts? Or do you engineer them? I am here today to explain, in detail, why one is not the other. Many people still think the two are the same thing, so I want to dispel that notion, draw a clear line between “writing” a prompt and “engineering” a prompt, and discuss the proper application of each.

This information applies no matter what model you are using, be it ChatGPT, Claude, Gemini, Mistral, et al.

Let’s start by defining exactly what “writing” a prompt means. Picture this: you have a master Word document or text file of prompts that you have painstakingly curated to achieve the very best results for each task they were created for.

You’re always tweaking them slightly and maybe even keeping track of how your changes improve the output. And maybe you’re even doing this across different models to see which models give your prompts the best results. If this sounds like your current prompt creation workflow, then you are in fact “writing” prompts. But you are not “engineering” them.

Now, this is not to say that human-written prompts are not effective, because they are. And this is not to say that human-written prompts do not have value, because they do. It is not my intention to condescend to the people who write them in any way. I simply wish to demonstrate that they are wholly different from prompts that have been “engineered.”

Which brings us to the definition of a prompt that has been “engineered,” rather than written by a human being.

In its simplest form, and I will expound upon this greatly below, a prompt that has been “engineered” is a prompt that has been generated by the AI model itself, following a clearly defined set of steps provided by the human being.

Just think about that for a second. It’s a minor distinction, to be sure, but it is a powerful one. One that you can use to test and improve your human-written prompts.

How? By “engineering” a prompt for a task that you already have a human-written prompt for, and then testing the performance of the two. More often than not, the optimal version will be a blend of the two (more on that at the end), because the “engineered” version of the prompt will always include some insight that makes the task slightly more efficient.

So how do you properly “engineer” a prompt? I have gotten the very best results by using the PARE method, which stands for Prime, Augment, Refresh and Evaluate. Most often this method will give you absolutely stunning results, but no matter what, it will always, and I mean always, lend a level of insight that no single human being could ever provide by themselves.

I’ll outline each of these steps below in the context of the following task example.

EXAMPLE

We are going to assume that I am blind, deaf, or have a mobility impairment that only allows me to eat at restaurants that provide certain accessibility options. How accessible any particular restaurant is, and to whom specifically, is often buried within the documentation on its website. So we want to write a prompt that will analyze the information that we give it about a restaurant, and then use that information to assign it an accessibility score from 0 to 10, based on our specific disability. We will gather the information for each restaurant ourselves by copying and pasting the documentation from its website. Top scorers will receive a call for a reservation.

Now, if we were to start this task as a human-written prompt we might begin by listing everything that we require from a restaurant for our specific disability. We might then do some research on Google to determine what kinds of restaurant accessibility options are available. And these would both be great starting points. But the good news is that they’re both built into the PARE method. Let’s have a look.

We’ll be using Google AI Studio’s Gemini 1.5 Pro model for this demonstration in order to count the number of tokens that we’ll be adding to the session memory with each step. The more relevant tokens about a topic there are in session memory, the more accurate the model’s responses to queries on that topic become. Google AI Studio is free, but you may use the model of your choice. For best results, I recommend generating the prompt with the same model that you’re going to use to execute it.
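If you would rather track token counts programmatically than read them off the AI Studio interface, here is a minimal sketch using the google-generativeai Python SDK. The API key and model id are placeholders, and your exact counts will differ from the ones quoted in this article.

```python
import google.generativeai as genai

# Placeholder API key -- create your own for free in Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Assumed model id for Gemini 1.5 Pro; swap in whatever model you use.
model = genai.GenerativeModel("gemini-1.5-pro")

# Count how many tokens a query would add to session memory
# before you ever send it.
query = (
    "Tell me everything that you know about restaurant accessibility "
    "in the United States as it relates to persons with a disability."
)
print(model.count_tokens(query).total_tokens)
```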


Step #1: Prime The Model


Priming the model involves pulling specific information from the model’s training data into your current session memory for discussion and further analysis. Using this method alone, for any query, will give you vastly superior results, but it is especially powerful as part of the PARE framework.

Priming the model asks:

A.) What do you know about TOPIC-REPLACE?

B.) What do you know about best practices for TOPIC-REPLACE?

Starting any query by asking a model what it knows about a topic does two things.

A.) It lets you know whether you will need to provide more data during the creation of the prompt because the model doesn’t already know the topic.

B.) And if the model does know the topic, it fills the session memory with the primary information relevant to your current query.

EXAMPLE

In the context of our current example, we might ask:

“Tell me everything that you know about restaurant accessibility in the United States as it relates to persons with a disability. Include best practices for identifying restaurants with wheelchair access.”

Token Count: This initial query put 500 of 1,048,576 possible tokens into our current session.
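If you prefer to script this step rather than type it into AI Studio, here is a minimal sketch using the google-generativeai Python SDK (API key and model id are placeholders). The chat session it creates is the “session memory” that the remaining PARE steps will keep filling.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model id

# The chat session holds the growing session memory for all four PARE steps.
chat = model.start_chat()

# Step #1: Prime the model with what it already knows about the topic.
prime = (
    "Tell me everything that you know about restaurant accessibility in the "
    "United States as it relates to persons with a disability. Include best "
    "practices for identifying restaurants with wheelchair access."
)
print(chat.send_message(prime).text)

# How many tokens of session memory have we accumulated so far?
print(model.count_tokens(chat.history).total_tokens)
```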


Step #2: Augment the Information Provided


Augmenting the model asks:

A.) What questions do you have for me?

Asking the model whether it has any questions helps it clarify the primed information and better tailor it to your needs. It produces more thorough results by filling in the knowledge gaps between the model’s primed response and your specific requirements.

Initiate this step just after asking a model what it knows about a topic.

EXAMPLE

In the context of our current example, we might ask:

“Great. Now ask me any questions that I might need to answer in order to locate restaurants in the United States that accommodate wheelchair users specifically.”

Token Count: Posing these questions put 711 of 1,048,576 possible tokens into our current session.

Five questions were provided and answered.

Token Count: Answering these questions put 1,159 of 1,048,576 possible tokens into our current session.
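Continuing the chat session from the Step #1 sketch above, the Augment exchange might look like the following. The answers are placeholders; substitute whatever is true for your own situation.

```python
# Step #2: Augment -- ask the model what it still needs to know from you.
augment = (
    "Great. Now ask me any questions that I might need to answer in order to "
    "locate restaurants in the United States that accommodate wheelchair "
    "users specifically."
)
print(chat.send_message(augment).text)

# Answer the model's questions in your own words (placeholder answers below).
answers = """
1. I use a manual wheelchair and cannot manage steps or narrow doorways.
2. Accessible restrooms are a hard requirement.
3. I usually dine with one companion.
4. I am only considering restaurants within my city.
5. Clear accessibility information on the website matters more to me than phone support.
"""
print(chat.send_message(answers).text)

# Session memory after the question-and-answer exchange.
print(model.count_tokens(chat.history).total_tokens)
```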


Step #3: Refresh Anything That Might Have Been Missed


Refreshing the model asks:

A.) What did I forget to ask?

B.) What have we overlooked on this topic?

Asking the model what you forgot to ask, or what you may have overlooked, is done after you have answered the model’s Augmentation questions (Step #2).

This step gives the model a chance to clarify, analyze, and add even more relevant text to the current session memory, while refreshing our own memory of the topic.

EXAMPLE

In the context of our current example, we might ask:

“Is there anything that we have forgotten to add? Or anything that we may have overlooked on this topic?”

Token Count: Answering these questions put 1,498 of 1,048,576 possible tokens into our current session.
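In the same scripted session, the Refresh step is a single follow-up message:

```python
# Step #3: Refresh -- give the model a chance to surface anything we missed.
refresh = (
    "Is there anything that we have forgotten to add? Or anything that we "
    "may have overlooked on this topic?"
)
print(chat.send_message(refresh).text)

# Session memory after the refresh.
print(model.count_tokens(chat.history).total_tokens)
```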


Step #4: Evaluate The Result and Generate The Prompt


Evaluating the result asks:

Did we fulfill all of the conditions required to engineer the prompt effectively?

Asking a model to evaluate its work at this point helps to refine responses, and is akin to a second and final augmentation prior to generating the prompt. This should be done following the model’s refresh response, especially if things were missed. Instead of starting a new session, just re-augment (Step #2) the newly refreshed information (Step #3) and you’re right back here at evaluation (Step #4). This helps to flesh out omissions while keeping your valuable session history intact.

EXAMPLE

In the context of our current example, we might now ask:

“Evaluate all of the information that you have provided so far and let me know if there is anything we might have missed.”

Token Count: Answering this query put 1,905 of 1,048,576 possible tokens into our current session.

And now, with 1,905 tokens of incredibly relevant data pulled and refined from training data into session memory, we can ask the final question.

EXAMPLE

In the context of our current example, we might now ask:

“Rewrite all of our current findings into a prompt for a large language model such as ChatGPT, Google Gemini, or Claude. The goal of the prompt is to help us evaluate a restaurant for how accessible it is to persons requiring a wheelchair for mobility. Write the prompt as directions for the language model to follow in the second person. Include relevant context and key criteria as discussed. Create a scoring rubric in the prompt that returns a score of zero to 10 for each restaurant submitted, where 10 is ‘completely accessible to wheelchair users’ and zero is ‘completely inaccessible to wheelchair users.’”

And below is the prompt that we receive. Could you have written that? Perhaps. But certainly not in the time that it took you to “engineer” it.
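For completeness, here is the tail end of the same scripted session: the Evaluate query, followed by the request that turns everything in session memory into the finished prompt. The output file name is just a placeholder.

```python
# Step #4: Evaluate the accumulated material.
evaluate = (
    "Evaluate all of the information that you have provided so far and let "
    "me know if there is anything we might have missed."
)
print(chat.send_message(evaluate).text)

# Finally, ask the model to engineer the prompt from the session memory.
generate = (
    "Rewrite all of our current findings into a prompt for a large language "
    "model such as ChatGPT, Google Gemini, or Claude. The goal of the prompt "
    "is to help us evaluate a restaurant for how accessible it is to persons "
    "requiring a wheelchair for mobility. Write the prompt as directions for "
    "the language model to follow in the second person. Include relevant "
    "context and key criteria as discussed. Create a scoring rubric in the "
    "prompt that returns a score of zero to 10 for each restaurant submitted."
)
engineered_prompt = chat.send_message(generate).text

# Save the engineered prompt for reuse (placeholder file name).
with open("restaurant_accessibility_prompt.txt", "w") as f:
    f.write(engineered_prompt)
```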


How To Use It


Now all you have to do is include a text file of a restaurant’s long and boring accessibility documentation along with the following prompt in order to determine immediately, on a scale of 0 to 10, whether a restaurant can accommodate your wheelchair needs.

And don’t forget to test the performance of your “engineered” prompts against your human-written prompts.
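Here is a rough sketch of that workflow, including a side-by-side run against a human-written prompt on the same input. The file names are placeholders, and a single run is only a starting point for a real comparison.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model id

# The engineered prompt saved earlier, plus the accessibility documentation
# copied and pasted from the restaurant's website (placeholder file names).
engineered_prompt = open("restaurant_accessibility_prompt.txt").read()
restaurant_docs = open("restaurant_accessibility_docs.txt").read()

# Score the restaurant with the engineered prompt.
result = model.generate_content(engineered_prompt + "\n\n" + restaurant_docs)
print("Engineered prompt:\n", result.text)

# Run your human-written prompt on the same documentation for comparison.
human_prompt = open("my_human_written_prompt.txt").read()
result = model.generate_content(human_prompt + "\n\n" + restaurant_docs)
print("Human-written prompt:\n", result.text)
```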


Bonus Tip


Provide the model with your human-written prompt at the Augment phase (Step #2) and ask it if there is anything that you might have missed. This adds your human-written prompt data into the current session memory, and gives you even more robust results.
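In the scripted session above, that is one extra message during the Augment phase. Your human-written prompt is read from a placeholder file here:

```python
# Bonus: add your human-written prompt to session memory at the Augment phase.
human_prompt = open("my_human_written_prompt.txt").read()  # placeholder file
bonus = (
    "Here is a prompt that I wrote by hand for this same task:\n\n"
    + human_prompt
    + "\n\nIs there anything that I might have missed?"
)
print(chat.send_message(bonus).text)
```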


Final Prompt Generated by Gemini 1.5 Pro via Google AI Studio


Prompt for Evaluating Restaurant Wheelchair Accessibility

Your Task: You will be presented with information about a restaurant, including its website, reviews, and potentially photos or virtual tours. Your role is to evaluate the restaurant’s accessibility for wheelchair users based on the Americans with Disabilities Act (ADA) requirements and best practices for inclusive design.

Evaluation Criteria (35 points possible):

Entrance and Parking:

  • Presence of designated accessible parking spaces with proper signage (2 points)
  • Availability of a ramp or level entryway without steps (2 points)
  • Width of doorways and entryways for wheelchair access (2 points)
  • Clear path of travel from parking to entrance free of obstacles (2 points)
  • Automatic door openers (2 points)

Interior Space:

  • Wide aisles and pathways for comfortable maneuvering (2 points)
  • Adequate space between tables and chairs for wheelchair access (2 points)
  • Tables at appropriate heights for wheelchair users (2 points)
  • Absence of steps or level changes within the dining area (2 points)
  • Accessible seating options with removable chairs (2 points)

Restrooms:

  • Presence of accessible restrooms with proper signage (2 points)
  • Adequate space for maneuvering a wheelchair within the restroom (2 points)
  • Grab bars and other assistive features near toilets and sinks (2 points)
  • Accessible sinks and hand dryers (2 points)
  • Doors with adequate clearance and easy-to-grasp handles (2 points)

Additional Considerations:

  • Availability of menus in braille or large print (1 point)
  • Staff training and awareness of disability etiquette (1 point)
  • Presence of assistive listening devices or other communication aids (1 point)
  • Information about accessibility features readily available online or upon request (1 point)
  • Positive reviews or mentions of wheelchair accessibility from previous customers (1 point)

Scoring:

  • 30-35 points: Completely accessible to wheelchair users, exceeding basic ADA requirements.
  • 23-29 points: Generally accessible with minor areas for improvement.
  • 15-22 points: Partially accessible but with significant barriers that could hinder the experience.
  • Below 15 points: Inaccessible or with major obstacles for wheelchair users.

Instructions:

  1. Review the information provided about the restaurant thoroughly.
  2. Evaluate each criterion based on the available information, using your judgment and knowledge of accessibility best practices.
  3. Assign a score for each criterion, calculate the total score, and convert the total to a 0-10 accessibility score (total points ÷ 3.5, rounded to the nearest whole number).
  4. Provide a brief summary of your findings, highlighting both strengths and weaknesses of the restaurant’s accessibility.
  5. If information is missing or unclear, note it in your evaluation and, if possible, suggest ways to obtain the necessary details.

Remember, the goal is to create a comprehensive and accurate assessment of the restaurant’s accessibility for wheelchair users. Your evaluation will help individuals with disabilities make informed decisions about their dining experiences.

Aaron Di Blasi
