5 reasons why your AI images don't meet your expectations
Tips for reducing the gap between idea and generation
One of the most common questions I get about generative AI is, “How do I get what I want from AI?” or “How do I get something that doesn’t look like it was created by AI?”
During these conversations, I've noticed a few common gaps or misconceptions that can be helpful to overcome, not only in generating images but also in anything that requires effective art direction through prompts (e.g. video).
If you’re not achieving the results you hoped for, here are a few things to consider:
1. Your definition of the token doesn’t match the generator’s definition.
Token alignment is the key to generation success.
Every LLM is built on a different set of training material, which means the “understanding” of any particular concept may shift with each generator. In many ways, this happens the same way it does with humans. Based on our personal experiences, we bring a different understanding to concepts like “playful” or “rich.” And that’s true for LLMs too.
This is why prompt layering can be a very important part of developing a prompt, especially for one you’d like to reuse. The technique ensures you understand how your AI partner defines the concept you’re trying to relay through tokens. It’s a tool for creating alignment and refining prompts, so your output is much more closely tied to the concept in your mind.
2. You identified the token/prompt with one tool, and are using it with another.
Related to the first point, if you port your prompts from one generator to the next, you probably won’t have much success creating consistency. Each model is trained on different material, which means the results are going to vary. If you intend to use prompts across generators, I encourage you to run prompt layering with each generator before settling on a single prompt, or to create a separate prompt per generator in your prompt library. For instance, you may use “coral” in one generator to suggest a pinkish-orange hue, but “salmon color” in another.
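One lightweight way to manage this is a small prompt library keyed by concept, with a tested variant per generator. A sketch of what an entry might look like (the generator labels and phrasings are illustrative assumptions, not tested prompts):

```
concept: warm pinkish-orange accent color
  generator A: "coral accents"
  generator B: "salmon-colored accents"

concept: playful mood
  generator A: "whimsical, lighthearted scene"
  generator B: "playful, candid energy"
```

The point is that the library stores one concept with several generator-specific phrasings, so you reach for the variant you’ve already verified rather than reusing a prompt verbatim.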
3. You are being too vague.
When we communicate our ideas to others, we often rely on them to fill in the gaps. This is why people enjoy working together over time: a shared language, with clear and mutually understood meanings, develops.
The same is true for generative AI. Sometimes, that's acceptable. For instance, when we need a simple landscape image for a presentation, we don't need to specify every detail. However, when we're launching a brand campaign, we want to have control over all the elements.
Prompting is a delicate process. Long or bloated prompts are no more effective than short ones; Midjourney’s Shorten command makes this visible by showing which tokens are being ignored or deprioritized. The goal is to find a sweet spot, which, in my experience, is around two to three lines maximum. Unfortunately, most generators don’t provide feedback on your prompt beyond the final generated image. This is another reason I consider Midjourney the most effective tool currently available for refining prompts: Shorten lets us see how a prompt is being read and refine it accordingly. But I hope all generators will eventually add features that help us align with the model directly, without having to reverse-engineer its interpretation from the images it produces.
4. Your token isn’t being prioritized the way you would prioritize it.
Sometimes, LLMs don’t understand our language, or our concepts are prioritized, or weighted, differently than we expected. Unfortunately, there’s very little transparency into this process beyond Midjourney’s Shorten command, which reveals the default weighting the model applies to your tokens. Run it on a long prompt and you may find that many tokens are weighted “0,” which may not have been your intent.
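To make that concrete, here is a purely hypothetical sketch of the kind of breakdown Shorten surfaces; the prompt, tokens, weights, and formatting are illustrative, not actual Midjourney output:

```
/shorten a beautiful, highly detailed, award-winning photo of a playful cat in a lush garden at sunset

photo (1.00)   cat (1.00)   garden (0.62)   sunset (0.30)
playful (0.12)   detailed (0.08)
beautiful (0)   highly (0)   award-winning (0)   lush (0)
```

Notice how several descriptors the author presumably cared about ("beautiful," "lush") can end up contributing nothing, while a couple of nouns dominate the generation.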
Weighting simply means concept (or token) prioritization. For instance, in “cat in the garden,” which is more important: cat or garden? Your generator is going to make some assumptions based on its training and the patterns it has learned around these two concepts. But you may not always want it to make those assumptions, which is why you may need to prioritize your concepts differently. Explicit prompt weighting is currently possible in Midjourney, but most other generators do not offer this capability at this time.
5. You didn’t specify token weight.
This is only possible in Midjourney right now, but it’s a critical feature I hope to see more generators adopt. Token weighting allows you to increase or decrease the priority of a token, or concept. If you want your lighting to be more intense, for example, you can increase its weight so that it’s more prominent in the final rendering. When you don’t specify weight, you get the generator’s default “choices.”
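In Midjourney, this is done with multi-prompt syntax: a double colon `::` splits the prompt into parts, and a number after the `::` sets that part’s relative weight. A sketch, reusing the earlier cat-and-garden example (the weight values themselves are illustrative):

```
/imagine prompt: a cat in a garden::2 dramatic rim lighting::4 soft morning fog::1
```

Here the lighting carries twice the weight of the subject, so it should read as more prominent in the result. A plain `::` with no number gives each part equal weight, and an unsplit prompt leaves all prioritization to the model’s defaults.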
Effective art direction has always required people to excel at communicating their ideas: from the individual components, to the priorities and relationships among them, down to how they’re executed. However, many of us have relied on the human on the receiving end to fill in the gaps. Sometimes this worked, and sometimes it didn’t. Directing AI is no different, except that it can feel quite different when the generator fills in the gaps in unexpected ways. Practicing complete art direction with your team will inevitably improve your AI generations as well.