<aside> <img src="/icons/magic-wand_gray.svg" alt="/icons/magic-wand_gray.svg" width="40px" />
One of the most common misunderstandings about AI image generation is this: people assume prompt words behave like precise commands. They don’t.
That confusion shows up in a few familiar forms: camera settings typed as if they adjust a virtual lens, lighting jargon stacked like a light rig, lens and film terms treated like gear choices.
The technical truth is simpler: diffusion models do not operate like cameras, film sets, or 3D render engines. They do not read a prompt and execute it like a checklist. They generate images by predicting what kinds of pixels usually go with certain words, patterns, and visual relationships.
</aside>
<aside> <img src="/icons/help-alternate_red.svg" alt="/icons/help-alternate_red.svg" width="40px" />
</aside>
You can, and if the terms are iconic enough they might work, but only when the underlying correlations are strong. Midjourney has no special training for these terms, so the results tend to be chaotic.
Remember, a diffusion model starts with noise, then gradually turns that noise into an image. At each step, it is making educated guesses based on patterns it learned during training.
So when you type a prompt, the model is not asking, “How do I set aperture, focal length, and lighting?” It is asking something closer to: “What images in my training data tended to appear alongside these words?”
That means prompts are not instructions in the strict sense. They are influences.
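If seeing the mechanics helps, here is a deliberately toy version of that loop, a minimal sketch rather than Midjourney’s actual architecture. Everything in it is made up for illustration: the “learned association” stands in for a trained network, and the prompt embedding is random numbers. The shape of the process is the point: start from noise, guess repeatedly, and let the prompt bias each guess instead of commanding it.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(image, prompt_embedding, t):
    """One toy denoising step. A real model runs a trained network
    here; this stand-in just nudges pixels toward a pattern tied
    to the prompt embedding."""
    learned_association = np.outer(prompt_embedding, prompt_embedding)
    guess = 0.9 * image + 0.1 * learned_association
    # Remaining noise shrinks as t falls from 1.0 toward 0.0.
    return guess + t * rng.normal(scale=0.05, size=image.shape)

prompt_embedding = rng.normal(size=8)   # fake text embedding
image = rng.normal(size=(8, 8))         # start from pure noise

for step in range(50):
    t = 1.0 - step / 50                 # noise level decays each step
    image = denoise_step(image, prompt_embedding, t)

# The prompt never issued a command. It biased fifty guesses.
```

Notice that the prompt shows up only as a gentle pull inside every step. That is what “influence” means here.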
When someone writes:
“85mm lens, f/1.4, ISO 100, shallow depth of field”
…it sounds wonderfully precise. But in Midjourney you’re prompting the diffusion model directly; you’re not talking to an LLM that might helpfully translate those words into useful tokens. Those words are not wired to a virtual camera. The model is not simulating optics. It is not calculating real exposure. It is not setting a sensor sensitivity value.
At best, it has learned haphazardly that phrases like “85mm” or “f/1.4” often appear near images with certain visual traits: tighter portraits, blurred backgrounds, a polished photography look, maybe a certain kind of subject framing.
So those terms can sometimes suggest a look, succeeding or failing more or less at random, if they are iconic enough. They do not enforce one. They certainly don’t promise the change you’d expect those words to deliver.
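You can see the “words, not settings” point for yourself with an open text encoder from the same family most diffusion systems use. This sketch assumes the Hugging Face transformers package and the public openai/clip-vit-base-patch32 checkpoint; Midjourney’s own encoder is private, but the principle carries over.

```python
# What does a text encoder actually receive? Subword tokens,
# not parsed camera settings. Assumes `pip install transformers`.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
tokens = tokenizer.tokenize("85mm lens, f/1.4, ISO 100, shallow depth of field")
print(tokens)
# Prints a list of text fragments. Nothing downstream treats
# "f/1.4" as an aperture; they are just characters whose embeddings
# happened to sit near certain kinds of photos during training.
```

There is no aperture field anywhere in that pipeline, only text fragments and the images they statistically kept company with.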
A real control layer is something different. A real control layer constrains structure or composition directly: it might lock in a pose, preserve edges, follow a depth map, or keep a reference layout. Midjourney has an experimental depth-guided layout feature called “retexture”, and that is the closest it offers to actual steering. Camera metadata in a text prompt is usually just a style hint dressed up as engineering.
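Retexture itself is not scriptable, but the open-source equivalent makes the distinction concrete. The sketch below is one way to do depth-guided generation with the diffusers library: the two checkpoints named are public, depth.png is a hypothetical precomputed depth map, and a CUDA GPU is assumed. Watch which input actually does the steering.

```python
# A real control layer: a ControlNet trained on depth maps constrains
# composition directly, while the text prompt only styles the result.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

depth_map = load_image("depth.png")  # hypothetical precomputed depth map

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The depth map pins down layout before the prompt gets a vote;
# the words only pick the surface style.
result = pipe("cozy cabin interior, warm lamplight", image=depth_map).images[0]
result.save("controlled.png")
```

That gap, structure enforced by a conditioning image versus style suggested by words, is the whole difference between a control layer and a prompt.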
The same problem shows up with lighting language.
Words like: