Published 2026-06-12

How to Write AI Prompts That Actually Work

Short answer: modern AI image and video models (Midjourney V8.1, Veo 3.1, Kling 3.0, Nano Banana Pro, GPT Image 2) are natural-language models. They want a clear, descriptive sentence about a scene — not a pile of keywords. Describe the subject, the medium, the light, the mood and the framing in plain English, keep it consistent, and avoid contradictions. That single shift fixes most "why does this look wrong?" results.

This guide explains the rule in detail, breaks down the anatomy of a strong prompt, and gives you per-model tips. If you'd rather skip the theory, GoldenPrompts builds studio-grade prompts for you with a few clicks — but it helps to understand what's happening under the hood.

The one rule: describe a scene, don't dump keywords

Older AI art workflows often used "keyword soup" — long lists like 8k, ultra detailed, masterpiece, trending, cinematic, bokeh, 85mm. Current first-party guides for models such as FLUX.2, GPT Image 2, Nano Banana Pro and Seedream 5.0 Lite emphasize clear natural-language instructions. Treat labels such as "masterpiece" or "8K" as descriptions, not guaranteed quality controls. A coherent scene description is easier to test and refine than a comma-separated list.

Keyword soup (weak):

woman, red dress, studio, 85mm, f1.8, cinematic, 8k, masterpiece, dramatic lighting

Natural language (strong):

A confident woman in a flowing red evening dress, photographed in a dim studio with a single soft key light from the left. Shallow depth of field, warm cinematic color, calm and elegant mood.

Same ingredients — but the second one tells a coherent story the model can render without guessing.
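If you want a quick way to spot keyword soup in your own drafts, the difference between the two prompts above can be approximated in code. This is only a rough heuristic sketch: the function name, the three-word cutoff and the 70% threshold are illustrative assumptions, not something any model vendor documents.

```python
def looks_like_keyword_soup(prompt: str,
                            max_fragment_words: int = 3,
                            threshold: float = 0.7) -> bool:
    """Return True when most comma-separated fragments are very short.

    Hypothetical heuristic: the cutoffs are arbitrary starting points.
    """
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    if len(fragments) < 4:
        # Too few fragments to call it a keyword list.
        return False
    short = sum(1 for f in fragments if len(f.split()) <= max_fragment_words)
    return short / len(fragments) >= threshold


weak = ("woman, red dress, studio, 85mm, f1.8, cinematic, 8k, "
        "masterpiece, dramatic lighting")
strong = ("A confident woman in a flowing red evening dress, photographed "
          "in a dim studio with a single soft key light from the left. "
          "Shallow depth of field, warm cinematic color, calm and elegant mood.")

print(looks_like_keyword_soup(weak))    # True
print(looks_like_keyword_soup(strong))  # False
```

A check like this cannot judge quality; it only flags drafts that are worth rewriting as sentences.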

The anatomy of a great prompt

Think of a prompt as answering a few simple questions in order. You don't need all of them every time, but the more of these you cover clearly, the more control you get:

Subject — who or what is in frame, and what are they doing? Be specific ("a silver-haired barista", not "a person").
Medium / style — a photograph, a film still, a 3D render, anime, a comic panel?
Lighting — soft window light, hard noon sun, neon, golden hour, single softbox.
Mood / color — warm and nostalgic, cold and clinical, moody teal-and-orange.
Composition / framing — full body, close-up, wide establishing shot, low angle.
Camera feel — shallow depth of field, motion blur, a slow dolly push (for video).
Environment — the location and background that sets the scene.

A good rule: one choice per dimension. Don't ask for both "golden hour" and "studio softbox": the two contradict each other, and the model tends to average them into muddy, inconsistent lighting.
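The anatomy above maps naturally onto a small data structure with one slot per dimension, which makes the "one choice per dimension" rule hard to break by accident. The sketch below is a minimal illustration; the class name `ScenePrompt`, its field names and the sentence order are assumptions made for this example, not a format any model requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScenePrompt:
    """One slot per dimension, so each dimension holds a single choice."""
    subject: str                      # required, must not be empty
    medium: Optional[str] = None
    lighting: Optional[str] = None
    mood: Optional[str] = None
    framing: Optional[str] = None
    camera_feel: Optional[str] = None
    environment: Optional[str] = None

    def render(self) -> str:
        # Open with medium + subject + environment, then one short
        # sentence for each remaining dimension that was filled in.
        opening = f"{self.medium} of {self.subject}" if self.medium else self.subject
        if self.environment:
            opening += f", {self.environment}"
        details = [self.lighting, self.framing, self.mood, self.camera_feel]
        sentences = [opening] + [d for d in details if d]
        return " ".join(s[0].upper() + s[1:].rstrip(".") + "." for s in sentences)


prompt = ScenePrompt(
    subject="a silver-haired barista pouring latte art",
    medium="a photograph",
    lighting="soft window light from the right",
    mood="warm and nostalgic mood",
    framing="close-up framing",
    camera_feel="shallow depth of field",
    environment="in a small corner cafe",
)
print(prompt.render())
```

Leaving a field empty simply drops that sentence, which matches the advice that you don't need every dimension every time.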

Image vs. video prompts

For images, focus on a single frozen moment: subject, light, framing, mood.

For video, add motion and keep it stable. Describe what moves and how the camera moves, for example "she slowly turns toward the window as the camera pushes in", and add consistency cues (such as "consistent identity, no flicker") so the model doesn't morph faces or drift between frames. Keep one clear action; piling on five movements creates chaos.
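The video advice can be sketched as a small helper that insists on exactly one main action, one camera move, and a short set of stability cues. The function name, the default cue wording and the sentence layout are assumptions for illustration; adjust them to whatever your model responds to best.

```python
from typing import Sequence

# Hypothetical default cues; reword them for the model you actually use.
DEFAULT_STABILITY_CUES = ("consistent face and identity across frames", "no flicker")


def _sentence(text: str) -> str:
    """Capitalize the first letter and end with a single period."""
    text = text.strip().rstrip(".")
    return text[0].upper() + text[1:] + "."


def build_video_prompt(scene: str,
                       actions: Sequence[str],
                       camera_move: str,
                       stability_cues: Sequence[str] = DEFAULT_STABILITY_CUES) -> str:
    # Enforce the "one clear action" rule instead of silently merging several.
    if len(actions) != 1:
        raise ValueError(f"expected exactly one main action, got {len(actions)}")
    parts = [_sentence(scene), _sentence(f"{actions[0]} as {camera_move}")]
    if stability_cues:
        parts.append(_sentence(", ".join(stability_cues)))
    return " ".join(parts)


print(build_video_prompt(
    scene="A woman stands in a sunlit kitchen",
    actions=["she slowly turns toward the window"],
    camera_move="the camera pushes in",
))
```

Passing two or more actions raises an error on purpose, so you have to decide which movement matters before the prompt is built.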

Per-model tips

These models change often, so treat this as direction rather than gospel:

Midjourney V8.1 — loves evocative, well-composed descriptions and art direction. Use its parameters (aspect ratio, stylize, style raw, --no) for control rather than stuffing the text. Great for stylized, painterly and editorial looks.
Google Veo 3.1 & Nano Banana Pro — strong at photorealism and coherent video (Veo 3.1 even renders synced audio). Be concrete about scene, light and camera motion; describe the shot like a cinematographer.
GPT Image 2 & FLUX.2 — use clear, literal scene descriptions. Spell out what belongs in frame and phrase exclusions in the main instruction unless the interface you use documents a separate negative-prompt control.
Kling 3.0, Runway Gen-4.5 & Seedance 2.0 — capable video models; keep one main action, define the camera move, and emphasize temporal stability (no flicker, consistent identity). (OpenAI's Sora app was sunset in 2026 — these are the live alternatives.)
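For the Midjourney tip, keeping parameters separate from the descriptive text can look like the sketch below. The flags `--ar`, `--stylize`, `--style raw` and `--no` are the ones Midjourney has documented for earlier versions; whether each behaves identically in V8.1 is an assumption here, and the helper function itself is hypothetical.

```python
from typing import Optional, Sequence


def with_midjourney_params(prompt: str,
                           aspect_ratio: Optional[str] = None,
                           stylize: Optional[int] = None,
                           raw: bool = False,
                           exclude: Sequence[str] = ()) -> str:
    """Append parameters after the description instead of stuffing the text."""
    parts = [prompt.strip()]
    if aspect_ratio:
        parts.append(f"--ar {aspect_ratio}")
    if stylize is not None:
        parts.append(f"--stylize {stylize}")
    if raw:
        parts.append("--style raw")
    if exclude:
        # --no takes a comma-separated list of things to leave out.
        parts.append("--no " + ", ".join(exclude))
    return " ".join(parts)


print(with_midjourney_params(
    "A watercolor fox in a misty forest at dawn",
    aspect_ratio="16:9",
    stylize=250,
    raw=True,
    exclude=["text", "watermark"],
))
```

The description stays readable on its own, and each control can be changed without touching the wording of the scene.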

Across all of them: don't put hard camera specs (85mm, f/1.8, ISO) in the prompt unless you mean them. They often fight your other choices and can degrade fine detail like eyes and hands. Describe the effect you want instead, such as "shallow depth of field".
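A simple way to catch stray camera specs before you submit a prompt is a pattern check like the one below. The regular expressions are rough assumptions that cover common spellings (85mm, f1.8, f/1.8, ISO 400); they will miss unusual formats and may flag legitimate phrases such as "35mm film", so treat the result as a reminder rather than a verdict.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for common camera-spec spellings.
CAMERA_SPEC_PATTERNS = {
    "focal length": re.compile(r"\b\d{2,3}\s?mm\b", re.IGNORECASE),
    "aperture": re.compile(r"\bf/?\d+(?:\.\d+)?\b", re.IGNORECASE),
    "ISO": re.compile(r"\bISO\s?\d{2,6}\b", re.IGNORECASE),
}


def find_camera_specs(prompt: str) -> List[Tuple[str, str]]:
    """Return (label, matched text) pairs for every camera spec found."""
    found = []
    for label, pattern in CAMERA_SPEC_PATTERNS.items():
        for match in pattern.finditer(prompt):
            found.append((label, match.group(0)))
    return found


print(find_camera_specs("woman, red dress, studio, 85mm, f1.8, cinematic"))
# [('focal length', '85mm'), ('aperture', 'f1.8')]
print(find_camera_specs("Shallow depth of field, warm cinematic color"))
# []
```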

The most common mistakes

Contradictions — two lighting setups, two locations, two moods at once.
Too many ingredients — 15 modifiers dilute the result. Four to seven strong choices usually beat a dozen weak ones.
Faces too far / too small — if the face occupies few pixels, eyes and teeth degrade. Frame closer or upscale.
Relying on negatives to fix a broken prompt — negatives remove things; they can't create coherence. Fix the positive prompt first.
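The first two mistakes are mechanical enough to check before you generate anything. The sketch below assumes you keep your choices grouped by dimension; the function name and the dictionary layout are hypothetical, and the limit of seven comes from the "four to seven strong choices" guideline above.

```python
from typing import Dict, List


def lint_choices(choices: Dict[str, List[str]], max_total: int = 7) -> List[str]:
    """Warn about competing choices in one dimension and about overload."""
    warnings = []
    for dimension, picks in choices.items():
        if len(picks) > 1:
            warnings.append(
                f"{dimension}: {len(picks)} competing choices ({', '.join(picks)})"
            )
    total = sum(len(picks) for picks in choices.values())
    if total > max_total:
        warnings.append(
            f"{total} modifiers in total; consider trimming to {max_total} or fewer"
        )
    return warnings


draft = {
    "lighting": ["golden hour", "studio softbox"],
    "mood": ["calm"],
    "framing": ["close-up"],
}
for warning in lint_choices(draft):
    print(warning)
# lighting: 2 competing choices (golden hour, studio softbox)
```

A check like this only sees how many choices you made, not whether they work together, so it complements reading the prompt as a scene rather than replacing it.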

Negative prompts, briefly

A negative prompt can mean either a dedicated negative field or a plain-language exclusion in the main instruction. Use a separate field only when the current interface or API documents one; Midjourney's --no parameter is one verified example. Otherwise, describe the clean result you want and state important exclusions directly, such as "clean background, no text." Controls differ between providers and wrappers, so test against the exact surface you use.
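The two meanings of "negative prompt" can be handled with one small routing step, sketched below. The function name, the `has_negative_field` flag and the `negative_prompt` key are assumptions for illustration; check the documentation of the exact interface or API you use before relying on a dedicated field.

```python
from typing import Dict, Optional, Sequence


def apply_exclusions(prompt: str,
                     exclusions: Sequence[str],
                     has_negative_field: bool) -> Dict[str, Optional[str]]:
    """Route exclusions to a separate field or phrase them in the main text."""
    if not exclusions:
        return {"prompt": prompt, "negative_prompt": None}
    if has_negative_field:
        # Only use this branch when the surface documents such a field.
        return {"prompt": prompt, "negative_prompt": ", ".join(exclusions)}
    # Otherwise state the exclusions in plain language in the main instruction.
    inline = " ".join(f"No {item}." for item in exclusions)
    return {"prompt": f"{prompt.rstrip()} {inline}", "negative_prompt": None}


result = apply_exclusions(
    "A bright living room with a clean background.",
    ["text", "people"],
    has_negative_field=False,
)
print(result["prompt"])
# A bright living room with a clean background. No text. No people.
```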

From vague idea to finished prompt (worked example)

Idea: "a cozy real-estate photo of a living room."

Finished prompt:

A bright, modern living room photographed in warm late-afternoon light streaming through large windows. Natural materials — oak floor, linen sofa, a fiddle-leaf fig in the corner. Clean straight verticals, true-to-life materials and reflections, realistic scale. Calm, inviting editorial mood. No people.

Notice: one light source, one mood, correct architecture cues (straight verticals, real scale), and a clear "no people" instruction. That's a prompt a model can render cleanly for a real-estate listing.

FAQ

Do I need to learn prompt engineering to get good results?

No. Understanding the basics helps, but tools like GoldenPrompts assemble a complete, professional prompt from a few clicks, so you get studio-grade results without writing anything by hand.

Why do my AI images have weird eyes or hands?

Usually because the prompt contradicts itself (e.g., conflicting camera or lighting cues) or the face is too small in frame. Use a clean, consistent description and frame the subject closer.

Should my prompt be in English?

Usually, yes. Most leading models tend to follow English prompts most reliably, even if your interface is in another language. GoldenPrompts always outputs the prompt in English for this reason.

What's the difference between an image prompt and a video prompt?

An image prompt describes a single moment; a video prompt adds motion and camera movement and emphasizes stability so identity and detail stay consistent across frames.

Which AI model should I use?

It depends on your goal — Midjourney V8.1 for stylized art, Veo 3.1, Kling 3.0 or Runway Gen-4.5 for video, and Nano Banana Pro or GPT Image 2 for stills. OpenAI ended the Sora web/app experiences on April 26, 2026, and says its API will end on September 24, 2026, so do not start a long-term workflow on it.

Want the prompt written for you? GoldenPrompts has specialized ateliers for people & models, real estate & interiors, and characters — click your choices and copy a studio-grade English prompt. Free to start: 24 hours of everything, no card.