Multiple Subjects

✅ Verified up-to-date [Last edited: 3/20/2026]

Overview (TL;DR)

If you’re prompting for more than one object or character in a scene, and their details are blending together, try this formula:

[set up a generic scene using keywords] [add details by calling back to those keywords] [describe the rest of the image] [describe the vibe or aesthetics]

Example: Three different best friends sitting close together on a park bench. The friend in the middle is a cheerful blonde Caucasian woman wearing jeans and a green tank-top. The friend on the right is a serious African American man dressed in a tuxedo. The friend on the left is a laughing Indian woman wearing orange Hindi traditional robes. Stylish digital art by Krenz Cushart and Tom Bagshaw.
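If you build prompts in a script, the formula can be sketched as a small helper that joins the four parts in order. This is only an illustration: `build_prompt` and its parameter names are hypothetical and not part of Midjourney. The output is a plain string that you would paste into Midjourney yourself.

```python
def build_prompt(scene: str, details: list[str], rest: str = "", style: str = "") -> str:
    """Join the four template parts, in order, into one prompt string.

    scene   -> [set up a generic scene using keywords]
    details -> [add details by calling back to those keywords]
    rest    -> [describe the rest of the image]
    style   -> [describe the vibe or aesthetics]
    """
    parts = [scene, *details, rest, style]
    # Skip empty parts and make sure every remaining part ends with one period.
    sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
    return " ".join(sentences)


# Rebuild the example prompt from its parts (this example has no "rest" part).
prompt = build_prompt(
    scene="Three different best friends sitting close together on a park bench",
    details=[
        "The friend in the middle is a cheerful blonde Caucasian woman wearing jeans and a green tank-top",
        "The friend on the right is a serious African American man dressed in a tuxedo",
        "The friend on the left is a laughing Indian woman wearing orange Hindi traditional robes",
    ],
    style="Stylish digital art by Krenz Cushart and Tom Bagshaw",
)
print(prompt)
```

Keeping each subject's details in its own sentence, each starting with a callback such as "The friend in the middle", is the point of the formula. The helper just makes that structure explicit.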


How many subjects (objects or characters) can I get in the same image?


Here’s a general rule of thumb for how many subjects you can prompt for successfully, with individual details fairly intact (assuming you follow the suggestions in this FAQ).

In V6: 1-2 reliably, up to 3 with careful prompting
In V7: 1-3 reliably, up to 5 with careful prompting
In V8: 1-20, up to 40 with careful prompting (more if you don't hit the token limit).
- That count isn't just objects or characters. It covers subjects plus the details that you control.
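As a rough sketch, the rule of thumb can be written as a lookup table for checking a planned prompt against a version's limits. The names `SUBJECT_LIMITS` and `subject_budget` are hypothetical, and treating 20 as the "reliable" ceiling for V8 is an assumption, since the V8 line gives only a range.

```python
# Rule-of-thumb limits per version: (reliable, with careful prompting).
# Assumption: for V8, "1-20" is read as the reliable range.
SUBJECT_LIMITS = {
    "V6": (2, 3),
    "V7": (3, 5),
    "V8": (20, 40),
}


def subject_budget(version: str, subjects: int) -> str:
    """Classify a planned subject count against the rule of thumb."""
    reliable, careful = SUBJECT_LIMITS[version]
    if subjects <= reliable:
        return "reliable"
    if subjects <= careful:
        return "needs careful prompting"
    return "over the rule of thumb"


print(subject_budget("V6", 3))
print(subject_budget("V7", 3))
```

These are guidelines rather than hard limits, so treat the result as a hint about how carefully you need to structure the prompt.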

There's a special way to prompt multiple subjects…

…otherwise, they blend together.


Try this prompt template, explained in the 1️⃣ - 2️⃣ - 3️⃣ - 4️⃣ steps below...

**[set up a generic scene using keywords] [add details by calling back to those keywords] [describe the rest of the image] [describe the vibe or aesthetics]**

First, let me give you some vocabulary for this…

[set up a generic scene using keywords] - This is called setting up the compositional archetype.
[add details by calling back to those keywords] - This is called lexical anchoring.

You’ll need both for this to work.


1️⃣ Step #1: Compositional Archetype: [set up a generic scene using keywords]

For prompts where it makes sense to do so, set up the scene in generic terms using archetypes in the first statement. There’s a sweet spot for specificity here. It doesn’t have to be very long. You’ll add details in a moment. [Note: This isn't a rule. You don't have to do this. But if what you're doing isn't working, try this. It might help.]

✅ Good:	`Three friends sitting on a park bench.`
✅ Better:	`Three different friends sitting on a park bench.`	(Without "`different`", Midjourney gets to decide their general appearance, and they may appear similar.)
✅ Best, get specific:	`Three different best friends sitting close together on a park bench.`	(Without "`best friends`" and "`sitting close together`", we get a more generic vibe.)
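The Good / Better / Best progression can be sketched as a function that adds one piece of specificity at a time. The function `compositional_archetype` and its parameters are hypothetical, used here only to show which words change between the three versions.

```python
def compositional_archetype(count: str, subject: str, action: str, place: str,
                            distinct: bool = False) -> str:
    """Build the opening scene sentence from generic keywords."""
    words = [count]
    if distinct:
        # "different" asks for subjects that don't all look alike.
        words.append("different")
    words += [subject, action, place]
    return " ".join(words) + "."


good = compositional_archetype("Three", "friends", "sitting", "on a park bench")
better = compositional_archetype("Three", "friends", "sitting", "on a park bench",
                                 distinct=True)
best = compositional_archetype("Three", "best friends", "sitting close together",
                               "on a park bench", distinct=True)

for label, text in [("Good", good), ("Better", better), ("Best", best)]:
    print(f"{label}: {text}")
```

The keyword you choose for the subject (here "friends") is the one you call back to when you add details later, so keep it short and generic.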