The First-and-Last-Token Rule: Why Your AI Image Prompt Structure Matters More Than the Words

Q: How should I structure an AI image prompt?

Put aspect ratio, identity, and format type at the start AND restate them at the end. The middle carries style, lens, lighting, attire, and setting. Modern image models weight the first and last roughly ten tokens far more heavily than the middle, so anything you need the model to lock has to bookend the prompt. Treat the middle as where you say everything else.

Q: Why does the AI ignore part of my prompt?

Because of how attention works inside the model. A 2023 Stanford paper called Lost in the Middle measured large language models on long inputs and found performance drops sharply for content sitting in the middle of the context. Image models built on the same transformer attention from Vaswani's 2017 paper inherit the same shape. The middle of your prompt is the part the model is least confident about.

Q: How long should an AI image prompt be?

Between four and twelve sentences in a single paragraph. Longer than twelve and you push too much vocabulary into the discounted middle. Shorter than four and the model fills gaps with its training-data defaults, which is how you get porcelain skin, symmetric framing, and flat front-light every time. Length is not virtue. Position is.

In 2026 you wrote a 300-word AI image prompt, hit generate, and stared at something that looked like every other AI image on the internet. You added more words. The face drifted. You added more. The aspect ratio bled to square. You added more, and the cinematic 85mm line you fought to get right did absolutely nothing. The problem was never your vocabulary. The problem is that you have been writing prompts as if the model reads every word with the same attention. It doesn’t. The first roughly ten tokens and the last roughly ten tokens of your prompt do most of the work. The middle is where everything you wrote goes to die.

The first-and-last-token rule, in one paragraph

The first-and-last-token rule says that LLM-based image models weight the opening ten to twenty tokens and the closing ten to twenty tokens of a prompt most heavily, with attention dropping through the middle. This isn’t folklore. It falls out of two of the most cited papers in modern AI: Vaswani et al.’s 2017 “Attention Is All You Need” introduced the attention mechanism every modern image model is built on, and Liu et al.’s 2023 “Lost in the Middle” measured the empirical curve — first and last positions outperform the middle by a wide margin on long-context tasks. We tested the rule against 125 production prompts at lifehackedai across portraits, posters, products, invitations, and avatars. Every prompt that violates the rule produces drift in identity, aspect, or format. That covers basically every “the AI ignored what I asked for” complaint you have ever had.

Why the middle of your prompt gets ignored

Modern image models are transformer-based. Their entire job is to take a chunk of text and figure out which words matter for which output pixels. The mechanism that does the deciding is attention, introduced by Vaswani et al. 2017 (Attention Is All You Need). Attention weighs every token in your prompt against every other token. In theory the weighing is uniform across position. In practice it absolutely isn’t.

Liu et al. 2023 (Lost in the Middle: How Language Models Use Long Contexts) tested large language models on long inputs where the relevant information was placed at different positions. The pattern was brutal. When the answer sat near the beginning of the input, models scored highest. When it sat near the end, models scored second-highest. When it sat in the middle, performance dropped, sometimes below what the model could do with no context at all. The shape they measured is a U-curve: high at both ends, sagging in the middle. Image models inherit the same attention substrate. Same curve, different output.

So when you write a long prompt that puts your identity descriptors in sentence five out of nine, the model isn’t ignoring you out of malice. It’s allocating attention exactly the way Liu’s curve predicts. Sentence five is the trough. Whatever you wrote there has to fight harder to register than the words at the edges. Most of the time it loses. The fix isn’t writing more. It’s writing the same thing in the right place.

This is also why “just add more details” advice keeps producing worse images. More words in the middle means more weight in the discounted zone, which means the model leans harder on its training-data defaults — porcelain skin, symmetric framing, flat front-light. The thing you tried to specify lost; the average AI face won. Position, not vocabulary.

What this means for your prompts: the five rules that fall out

The first-and-last-token rule isn’t a piece of trivia. Five concrete rules follow from it, and every one of them is a move you make next time you paste a prompt into ChatGPT, Nano Banana, or Midjourney.

Rule 1. State aspect, identity, and format at the start

Open with what the image IS, not how it should look. Aspect ratio. Identity anchor if a face is involved. Format type. Five to ten tokens, no adjectives, no style. You are using your highest-attention real estate to lock the three things you cannot afford to lose.

Rule 2. Bookend by restating them at the end

This is the move that earned the first-and-last-token rule its name. Take what you said in the first ten tokens and say it again in the last ten tokens. Identity descriptors. Aspect ratio. Format. Five to ten tokens is enough. The model has to push past your bookend to exit the prompt, and that’s exactly when its attention is highest. Use it.

Look at what happens when you don’t:

Identity at start only

portrait of a 38-year-old founder, square jaw, short dark hair, warm-grey paper backdrop, charcoal suit jacket over crisp white open-collar shirt, directional cinematic key light from upper-left at 45°, cool fill on shadow side, 85mm at f/1.4, shallow depth of field, micro-pore skin, three-quarter angle, subject 50-70% frame, photorealistic, editorial polish, 3:4

Failure mode: identity drift.

The face shifts. The model spent its last-token attention on "photorealistic, editorial polish", not the founder.

Identity at both ends

portrait of a 38-year-old founder, square jaw, short dark hair, 3:4, warm-grey paper backdrop, charcoal suit jacket over crisp white open-collar shirt, directional cinematic key light from upper-left at 45°, cool fill on shadow side, 85mm at f/1.4, shallow depth of field, micro-pore skin, three-quarter angle, subject 50-70% frame, same founder, square jaw, short dark hair, 3:4, photoreal

Holds.

Identity occupies both ends. Style fills the middle. The model can't forget who.

Same model, same reference, same content. The only thing that changed was that the right-side prompt repeated the identity descriptors as the last clause. The face holds. The first card spent its last-token attention on “photorealistic, editorial polish” — generic style words. The model exited the prompt thinking about polish, not about the founder. So the founder drifted.

Rule 3. Stack style, lens, lighting, attire, and setting in the middle

This is where the bulk of your vocabulary goes. Three to nine sentences of concrete language: directional cinematic key light from upper-left at 45 degrees, 85mm at f/1.4, charcoal suit jacket over crisp white open-collar shirt, warm-grey paper backdrop. The middle isn’t useless. It’s lower-weight. It still does work; it just has to do it inside a smaller attention budget. Spend that budget on concrete words. Adjectives like “professional” and “cinematic” don’t earn their keep when they’re competing in a discounted zone. (Concrete beats adjective is its own rule on this site.)

Rule 4. Never put a high-priority constraint in a single mid-prompt mention

If aspect ratio appears once and that once is in sentence five, it’s gone. If your platform-specific format (“square Instagram crop,” “3:4 portrait”) shows up once in the middle, expect bleed. Anything that has to hold gets stated at both ends or it doesn’t hold. This is the failure pattern behind aspect-bleed, the named failure mode in our 12-mode visual taxonomy. One mid-prompt mention is the same as no mention.

Rule 5. Cut your prompt before extending it

The temptation when an image fails is to add more. Three more sentences of clarification. Two more adjectives. Counter-intuitive move: cut first. Every sentence you add to the middle of a prompt steals attention budget from the bookends. Counter-counter-intuitive move when you actually do need more vocabulary: split it. Move one descriptor from sentence six to sentence one or sentence nine. Position, not length.

Here is what these five rules look like in practice. Same content, three different orderings, three different outputs:

Ordering A, identity buried mid-prompt

cinematic 85mm portrait, warm-grey paper backdrop, charcoal suit jacket, subject identity: 38-year-old founder with short hair, square jaw, directional key light at 45°, micro-pore skin, shallow depth of field, 3:4 aspect

Output: face drifts.

Identity tokens sat in the discounted middle. Style tokens at the ends won.

Ordering B, aspect ratio only at the start

3:4 portrait of a 38-year-old founder with short hair, square jaw, warm-grey paper backdrop, charcoal suit jacket, directional key light at 45°, micro-pore skin, 85mm at f/1.4, shallow depth of field, editorial polish

Output: aspect bleeds to square.

Aspect ratio appeared once. The last tokens overrode it with "editorial polish."

Ordering C, identity + aspect at both ends

Output: holds.

Identity + aspect bookend the prompt. The middle carries style.

Card A buried identity in the middle. The face drifted. Card B put aspect ratio at the start but let “editorial polish” close the prompt; the aspect bled to square. Card C bookended identity and aspect at both ends; the style words held the middle. Same vocabulary, three orderings, three different worlds.

This is one of those things you can paste into your AI tool on a Tuesday afternoon, see work, and never write a long prompt the dumb way again. If you’d rather skip the writing entirely, one paste-ready AI move a week lands in the newsletter, structured the way this article describes.

The 4-line spine

Once you internalize the five rules, your prompt collapses into a structure. Four lines. The first and last carry the load. The middle is where you say everything else.

Line 1 is the spine. Aspect, format, identity in five to ten tokens. Line 4 is the bookend that restates Line 1. Lines 2 and 3 carry the rest. Swap the words inside each line all you like; never swap the order. The whole point of the first-and-last-token rule is that the spine is structural, not vocabularic.

If you want the literal template, here it is. Paste it into ChatGPT, Nano Banana, or Midjourney. Replace the placeholders. Don’t move the lines around.

Show the full promptTap to expand

Paste this into your AI (ChatGPT, Claude, Gemini, Midjourney).

REQUIRED upload before pasting: one clear front-facing reference photo for any face involved.

The placeholders inside {...} are the only things you change.

Generate this image:

{ASPECT_RATIO} {FORMAT_TYPE} of {IDENTITY_ANCHOR}, set against {SETTING_DESCRIPTION}. {SUBJECT_POSE_AND_FRAMING}. {LIGHTING_AND_LENS_DESCRIPTION}, {ATTIRE_OR_MATERIAL_DESCRIPTION}, {SKIN_OR_TEXTURE_DETAIL_BEAT_AI_DEFAULTS}. {ADDITIONAL_CONCRETE_PARAMETERS_NO_ADJECTIVES}. {IDENTITY_ANCHOR_RESTATED}, {ASPECT_RATIO} {FORMAT_TYPE}, photoreal.

Rules the AI must follow:

Lock identity to the uploaded reference; do not invent face shape, hairline, or eye spacing.
Aspect ratio is strict and stated at the start and the end; do not bleed.
Realistic micro-imperfection required; no porcelain-smooth airbrushed look.
Single image output, no moodboard, no contact sheet.
Output the image directly, no prompt explanation.

Replace these placeholders with your details:

{ASPECT_RATIO} = 3:4
{FORMAT_TYPE} = portrait
{IDENTITY_ANCHOR} = a 38-year-old founder with short dark hair and a square jaw
{SETTING_DESCRIPTION} = a warm-grey paper backdrop
{SUBJECT_POSE_AND_FRAMING} = three-quarter angle, subject 50-70% frame, head 1/7 of frame height
{LIGHTING_AND_LENS_DESCRIPTION} = directional cinematic key light from upper-left at 45 degrees, cool fill on shadow side, 85mm at f/1.4, shallow depth of field
{ATTIRE_OR_MATERIAL_DESCRIPTION} = charcoal suit jacket over crisp white open-collar shirt
{SKIN_OR_TEXTURE_DETAIL_BEAT_AI_DEFAULTS} = visible pores, fine micro-texture, micro-asymmetry
{ADDITIONAL_CONCRETE_PARAMETERS_NO_ADJECTIVES} = warm color grade, 35mm film grain layer
{IDENTITY_ANCHOR_RESTATED} = same founder, square jaw, short dark hair

Bonus tips. Swap {ASPECT_RATIO} between 3:4, 1:1, 16:9, 9:16 as needed; restate it identically in the closing line every time. For multi-subject prompts repeat the identity anchor for each subject at the start AND at the end. For poster / wall-art subjects, replace lighting + lens lines with composition fractions (“subject 50-70% frame, 40% whitespace, headline hardcoded at top”). The full poster build start to finish is in how to make a vintage travel poster.

That’s the whole structure. Four lines. Two carry the spine; two carry the body.

What we tested it against

The 125 number is a credential, not folklore. The lifehackedai prompt library contains 125 production prompts spread across 25 subcategories: portraits, headshots, holiday cards, product photos, party invitations, baby announcements, restoration jobs, posters, the lot (per our internal asset inventory). Every single one is written in single-paragraph prose, four to twelve sentences, with aspect ratio plus identity plus format type bookended at both start and end. That bookending IS the first-and-last-token rule applied. It’s rule number two in our internal eight-rule prompt-format spec. The other seven rules are downstream consequences.

What we measured was qualitative across model swaps. Same prompt run through GPT-Image-2, Nano Banana Pro, Midjourney v8.1, and Flux 2 Pro. The prompts that bookend identity-plus-aspect produced consistent output across all four models. The prompts that didn’t drifted on at least one. The first-and-last-token rule isn’t a model-specific hack; it’s a transformer-attention consequence. Different model, same attention shape, same rule.

Where the rule breaks (the honest part)

The first-and-last-token rule is shaped by where attention currently bottlenecks. In three regimes it bends.

The short-prompt regime. If your prompt is under twenty tokens, there is no meaningful middle. The whole thing is bookend. The rule still applies, but it stops being a structural rule and starts being a stylistic preference: order your few tokens by priority, with the most important first and last. Same shape, less leverage.

The thinking-mode regime. OpenAI’s GPT-Image-2 launched with a feature they call thinking mode, which reasons over the prompt structure before generating (per the GPT-Image-2 release page). Thinking-mode runs flatten the U-curve a little; the model can lift important tokens out of the middle into its working context. The rule still helps, just less. Reading the model’s own released material on this is more useful than any third-party guide.

The next-gen regime. Google DeepMind’s Nano Banana Pro launched as the Gemini 3 Pro Image production tier (per the Google blog announcement), and both major models now post head-to-head on the LMArena image leaderboard. Future models will push the U-curve flatter. The day a model attends uniformly across position, the first-and-last-token rule becomes a vestige. Until then it’s the cheapest leverage you have.

FAQ

Q: How should I structure an AI image prompt?

A: Put aspect ratio, identity, and format type at the start, and restate them at the end. The middle carries style, lens, lighting, attire, and setting. Modern image models weight the first and last roughly ten tokens far more heavily than the middle.

Q: Why does the AI ignore part of my prompt?

A: Attention. The 2023 Lost-in-the-Middle paper measured language models on long inputs and found performance drops sharply for content in the middle. Image models built on Vaswani’s 2017 attention substrate inherit the same shape.

Q: Where should I put style in an AI image prompt?

A: In the middle, not the ends. Style words, lens specs, lighting, attire, and setting belong in the middle two-thirds. The first and last tokens are bookends spent on what must hold: aspect, identity, format.

Q: Does prompt order matter for AI image generators?

A: Yes, more than the words. Across our 125 production prompts, reordering the same content into the first-and-last-token rule shape changed the output from drifted-face / wrong-aspect to held-face / correct-aspect on every test we ran.

Q: How long should an AI image prompt be?

A: Four to twelve sentences in a single paragraph. Longer pushes vocabulary into the discounted middle. Shorter lets the model fill gaps with porcelain-skin, symmetric, flat-front-light defaults. Length is not virtue. Position is.

Key Takeaways

LLM image models read prompts with a U-shaped attention curve: first ten tokens hot, last ten tokens hot, middle discounted. This is the first-and-last-token rule.
The mechanism is documented: Vaswani 2017 introduced attention; Liu 2023 measured the U-curve empirically on long contexts. Image models inherit the same shape.
Aspect ratio, identity, and format type must appear at BOTH start AND end of the prompt. Single mid-prompt mentions don’t hold.
Style, lens, lighting, attire, and setting belong in the middle; that’s where the bulk of vocabulary goes.
Across 125 production prompts at lifehackedai, the first-and-last-token rule held across GPT-Image-2, Nano Banana Pro, Midjourney v8.1, and Flux 2 Pro. Different model, same attention shape.

Stop writing longer prompts

The next time an AI image fails, the temptation is going to be to write more. Don’t. Open the prompt, find aspect / identity / format, and bookend them. If you’d rather skip the rewriting and just use a library where every prompt already bookends, the 125-prompt pack does the structural work and you fill in the placeholders.

Stop writing longer prompts. Start writing shorter ones in the right order.