The large language model (LLM) landscape has been abuzz with the concept of prompt engineering since the arrival of ChatGPT in late 2022. This involves crafting clever prompts, essentially carefully worded queries, to coax the best results from, or bypass the limitations of, LLMs and AI art or video generators. The internet is flooded with prompt-engineering guides, cheat sheets, and advice threads aimed at maximizing LLM potential.
In the commercial sphere, companies are scrambling to leverage LLMs for building product copilots, automating tedious tasks, and creating personal assistants. Austin Henley, a former Microsoft employee who interviewed people developing LLM-powered copilots, observes this widespread adoption: “Every business is trying to use it for virtually every use case they can imagine.”
However, a recent research wave suggests that prompt engineering might be best handled by the model itself, not a human engineer. This casts a shadow on the future of prompt engineering and raises questions about the longevity of prompt-engineer positions, at least in their current form.
The Enigmatic Success of Autotuned Prompts
Rick Battle and Teja Gollapudi, researchers at VMware, were baffled by the erratic and unpredictable nature of LLM performance in response to unconventional prompting techniques. For instance, users discovered that prompting models to explain their reasoning step-by-step (chain-of-thought) improved performance on various math and logic problems. Even more baffling, Battle found that feeding the model positive prompts like “this will be fun” or “you are as smart as ChatGPT” sometimes yielded better results.
Intrigued, Battle and Gollapudi embarked on a systematic exploration of how different prompt-engineering strategies impacted an LLM’s ability to solve elementary math problems. They tested three open-source language models with 60 unique prompt combinations each. The results were surprisingly inconsistent. Even chain-of-thought prompting offered mixed results, sometimes boosting and sometimes hindering performance. “The only real trend may be no trend,” they concluded. “What’s best for any given model, dataset, and prompting strategy is likely to be specific to the particular combination at hand.”
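The flavor of that sweep is easy to picture in code. Below is a minimal sketch, not the researchers’ actual setup: a few prompt framings (plain, chain-of-thought, and the “positive thinking” style mentioned above) are scored against a tiny set of math questions, with a hypothetical query_llm helper standing in for whatever model is under test.

```python
# Minimal sketch of scoring several prompt framings on math problems.
# query_llm is a hypothetical stand-in for a real model client.
PROMPT_VARIANTS = {
    "plain": "{question}\nAnswer:",
    "chain_of_thought": "{question}\nLet's think step by step.",
    "positive": "This will be fun! You are as smart as ChatGPT.\n{question}\nAnswer:",
}

MATH_PROBLEMS = [
    {"question": "What is 17 + 26?", "answer": "43"},
    {"question": "A train travels 60 km in 1.5 hours. What is its speed in km/h?", "answer": "40"},
]

def query_llm(prompt: str) -> str:
    """Hypothetical model call; wire this to a real LLM client."""
    return ""  # placeholder so the sketch runs end to end

def score_variant(template: str) -> float:
    """Fraction of problems whose expected answer appears in the model's reply."""
    correct = 0
    for item in MATH_PROBLEMS:
        reply = query_llm(template.format(question=item["question"]))
        if item["answer"] in reply:
            correct += 1
    return correct / len(MATH_PROBLEMS)

for name, template in PROMPT_VARIANTS.items():
    print(name, score_variant(template))
```

In the actual study, a sweep like this across models, datasets, and prompting strategies is what produced the “no trend” conclusion: no single framing won consistently.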
This research implies that the future of prompt engineering might bypass the current trial-and-error methods altogether.
Letting the Machine Do the Optimizing
A new wave of tools is emerging to automate prompt engineering, offering an alternative to the inconsistent results of manual methods. Given a few examples and a quantitative success metric, these tools can iteratively identify the optimal phrase to feed into the LLM. Battle and his collaborators discovered that in almost every case, the automatically generated prompt outperformed the best prompt found through trial and error. Additionally, the process was significantly faster, taking mere hours instead of days of searching.
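The core loop these tools run is straightforward to outline. The sketch below is a simplified illustration of that optimize-and-score cycle, not the specific tool Battle’s team used; query_llm is again a hypothetical stand-in for a real model client, and the proposal and scoring steps are deliberately minimal.

```python
# Simplified prompt-optimization loop: the model proposes rewrites of the
# current best instruction, each candidate is scored on labeled examples,
# and the winner seeds the next round. All helpers are illustrative.
EXAMPLES = [
    {"question": "What is 12 * 8?", "answer": "96"},
    {"question": "What is 144 / 12?", "answer": "12"},
]

def query_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real client."""
    return ""

def score(instruction: str) -> float:
    """Accuracy of an instruction over the labeled examples."""
    hits = sum(
        ex["answer"] in query_llm(f"{instruction}\n{ex['question']}")
        for ex in EXAMPLES
    )
    return hits / len(EXAMPLES)

def propose_rewrites(instruction: str, n: int = 4) -> list[str]:
    """Ask the model itself to suggest alternative phrasings."""
    meta = f"Rewrite this instruction so a model answers more accurately:\n{instruction}"
    return [query_llm(meta) for _ in range(n)]

def optimize(seed: str, rounds: int = 5) -> str:
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        for candidate in propose_rewrites(best):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
    return best

print(optimize("Solve the following math problem."))
```

The key design choice is that a human only supplies the examples and the metric; the search over wording is left entirely to the system.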
The optimal prompts generated by the algorithm were so bizarre that no human could have likely conceived them. “I literally could not believe some of the stuff that it generated,” Battle says. In one instance, the prompt was essentially a lengthy Star Trek reference: “Command, we need you to plot a course through this turbulence and locate the source of the anomaly. Use all available data and your expertise to guide us through this challenging situation.” Apparently, framing itself as Captain Kirk helped this particular LLM perform better on basic math problems.
Battle argues that optimizing prompts algorithmically makes perfect sense considering what language models truly are: models. “A lot of people anthropomorphize these things because they ‘speak English.’ No, they don’t,” Battle says. “It doesn’t speak English. It does a lot of math.” Based on his team’s findings, Battle proposes that manual prompt optimization might become obsolete. “You’re just sitting there trying to figure out what special magic combination of words will give you the best possible performance for your task,” Battle says, “But that’s where hopefully this research will come in and say ‘don’t bother.’ Just develop a scoring metric so that the system itself can tell whether one prompt is better than another, and then just let the model optimize itself.”
AI Art Gets a Tune-Up Too
Image generation algorithms can also benefit from automatically generated prompts. A team at Intel Labs, led by Vasudev Lal, embarked on a similar mission to optimize prompts for the image-generation model Stable Diffusion. “It seems more like a bug of LLMs and diffusion models, not a feature, that you have to do this expert prompt engineering,” Lal says. “So, we wanted to see if we can automate this kind of prompt engineering.”
Lal’s team developed a tool called NeuroPrompts that takes a basic user prompt, like “boy on a horse,” and automatically enhances it to generate a superior image. They began with a variety of prompts crafted by human prompt-engineering experts. Next, they trained a language model to refine basic prompts into these expert-level prompts. Finally, they employed reinforcement learning to optimize the prompts to produce more aesthetically pleasing images. To judge aesthetic appeal, they used yet another machine-learning model, PickScore, a recently developed image-evaluation tool.
Here, PickScore served as a judge, rating the images generated by prompts that had been progressively refined. Through this process of trial and error, refinement, and evaluation, NeuroPrompts learned to transform simple prompts into ones that produced superior results according to the established metric.
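In rough outline, that training loop looks something like the sketch below. All of the helper names (refine_prompt, generate_image, pickscore_reward, update_refiner) are hypothetical placeholders rather than the actual NeuroPrompts components, which are not detailed here.

```python
# Conceptual sketch of the refine -> generate -> score -> update loop.
# Every function body is a placeholder; the real system uses a finetuned
# prompt refiner, Stable Diffusion, PickScore, and an RL update.
def refine_prompt(basic_prompt: str) -> str:
    """Hypothetical refiner: an LM finetuned to produce expert-style prompts."""
    return basic_prompt  # placeholder

def generate_image(prompt: str):
    """Hypothetical call to an image-generation model such as Stable Diffusion."""
    return None  # placeholder

def pickscore_reward(image) -> float:
    """Hypothetical aesthetic score in the spirit of PickScore."""
    return 0.0  # placeholder

def update_refiner(basic_prompt: str, refined: str, reward: float) -> None:
    """Placeholder for the reinforcement-learning update applied to the refiner."""
    pass

def training_step(basic_prompt: str) -> float:
    refined = refine_prompt(basic_prompt)
    image = generate_image(refined)
    reward = pickscore_reward(image)
    update_refiner(basic_prompt, refined, reward)
    return reward

print(training_step("boy on a horse"))
```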
The results were impressive: NeuroPrompts’ automatically generated prompts surpassed the human-expert prompts they used as a starting point, at least based on the PickScore ratings. Lal wasn’t surprised. “Humans will only do it with trial and error,” Lal says. “But now we have this full machinery, the full loop that’s completed with this reinforcement learning.… This is why we are able to outperform human prompt engineering.”
It’s important to acknowledge that aesthetic quality is inherently subjective. Recognizing this, Lal and his team designed NeuroPrompts to offer users some control over the optimization process. The tool allows users to specify the original prompt (e.g., “boy on a horse”) alongside an artist to emulate, a desired style, and other modifiers.
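A toy illustration of how such controls might be folded into a starting prompt before any automatic enhancement follows; it is not the actual NeuroPrompts interface.

```python
# Illustrative only: assemble a starting prompt from user-specified controls.
def build_prompt(subject: str, artist: str = "", style: str = "",
                 modifiers: tuple[str, ...] = ()) -> str:
    parts = [subject]
    if artist:
        parts.append(f"in the style of {artist}")
    if style:
        parts.append(style)
    parts.extend(modifiers)
    return ", ".join(parts)

print(build_prompt("boy on a horse", artist="Claude Monet",
                   style="impressionist", modifiers=("soft lighting",)))
```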
Lal believes that as generative AI models, encompassing image generators and large language models, continue to evolve, the strange dependence on specific prompts might eventually disappear altogether. “I think it’s important that these kinds of optimizations are investigated and then ultimately they’re really incorporated into the base model itself so that you don’t really need a complicated prompt-engineering step,” he says.