TheGuyWhoDidAThing

SD Guide

Nov 4th, 2024

So, almost daily I get questions in my inbox or as comments on the images I post: What's your prompt? Catbox the image so I can read the metadata! What LoRA is this? And so on.

I hate to break it to you all, but even if I gave you the prompt, checkpoint, LoRAs involved, and the seed, you're simply not going to get the same result. It'll be in the same ballpark, but it just won't have the same level of detail these images have.

Don't worry, I'm not on a high horse here. I've written this post to explain my process in detail so you too can get the same kind of results! But before we begin, please read the following statement like 20 times:

THE PROMPT IS NOT IMPORTANT. THE PROMPT WILL NOT GIVE YOU WHAT YOU NEED.

Obviously that's a pretty hyperbolic statement, and it's not entirely true, but I'm trying to make a point here.

Everywhere I go for discourse on image generation, I see this hyperfocus on engineering the perfect prompt to get what you want. I'm sure you remember that less than a year ago people were popping up as Prompt Engineers, and for a fee they'd write you the perfect prompt. If you just whisper the right incantation into the mystery box, all your desires will be made manifest.

Well, much like The Secret, I think that's a load of hogwash. All you're doing is creating a hypernested series of statements that effectively act like a random number generator that then feeds into ANOTHER random number generator that then produces an image. Sure, the broad strokes of what you're looking for are there, but I've seen dozens of images with prompts that include things like "view from below" and "laying on stomach" where the result is an image with standard framing of a subject standing and facing the viewer. The prompt engineer will tell you that including those contradictory statements nudges Stable Diffusion to give you what you actually want, but it's all just noise. Like, literally the noise SD uses to generate an image.

Now, I'm just some hack with a kinda powerful graphics card and a lot of free time. I am not an expert, and I won't claim to be. All I have is my own experience to draw from. And since so many people ask me how I do it, I must be onto something here. So what's the secret? It's pretty simple:

There is no one-click, set-and-forget prompt.

Stop thinking of SD as a magic free-images button. Stop thinking in terms of prompts and variables. Start thinking visually. SD can be used as a tool to help you create something you want to see, but it's not the only tool you should be using. If you're just generating images of conventionally attractive women (or men) in generic or high fantasy/sci-fi settings, you'll probably have a pretty easy time. But if you're reading this, you probably want something outside the 'norm.' And since SD is effectively a pattern recognition engine, the further from baseline you stray, the less SD understands. So you have to guide it along the way, keeping it on course.

Basically, the best metaphor I can think of is to treat this like you're tending to a bonsai tree. You probably have an idea of how you'd like the tree to look once it's fully grown, but you have to tend to it as it grows to nudge it into the specific shape you want. As the tree grows, you may find that it's not exactly what you had first envisioned when you planted it, but you adapt and wind up with a really lovely tree. It just takes time and effort to get there.

Below, I've laid out the steps of the workflow I've developed over the past few months. Nothing is set in stone; this is just a guideline for folks who are just getting into this, or for people who are frustrated that they aren't getting the same kind of results that I do.

TL;DR: Below is a list of some Checkpoints and LoRAs that I can recommend, and here's the gist of my workflow: Generate an image at a low resolution. Edit that image to get closer to what you want to see, then reprocess it in img2img multiple times, slowly increasing the resolution. Edit the resulting images to help guide the process, adding and removing LoRAs along the way. Touch up the final image by reprocessing specific parts like hands and faces, then edit the final result like you would a photograph.

CHECKPOINTS:
NextPhoto
CyberRealistic
NoSkinnyChicks

LoRAs:
BiggerGirlsV5
SyModel
HyperFusion550K
Cellulite

NB: My guide is written with SD 1.5/1.6, Automatic1111, and Photoshop in mind. If you have different tools at your disposal, the approach should be largely the same. Most of the image editing terms I'll use will be Photoshop specific, but things like GIMP or other full-suite image editors should be able to do most of the same things. Also, I'm an AMD wonk, so some things just don't work for me at all (like inpainting). My approach will work for NVIDIA users as well, but you might be able to skip a few things here or there.

-prompt design
Let's get prompts out of the way first. I've found that keeping the prompt as simple and direct as possible gives me the best results. Basically, I write a concise sentence that describes the image I'm going for, like I was describing it for a low-vision reader, and then follow it up with specific elements that I want included. Let's take this one for example:

"a beautiful woman with a huge belly walks through a quarry, obese, gigantic belly, belly_overhang, navel focus, blue hair, quarry, depth of field, golden hour,((pear_shaped_body)) <lora:symix-preview-4-5:.7>"

The first part is the core of the prompt. The sentence is concise, declaring a subject (beautiful woman) with relevant details (huge belly) and the setting concept (walks through a quarry). After that, I include tags to further refine what I'm looking for. Important elements from the first sentence are repeated (huge belly, quarry) for emphasis, and the rest are additional details or tags related to the LoRA I am using (Symix in this example).

That's it. I rarely exceed the first 75-token limit, and when I do I've never surpassed 150. What you'll find is that the checkpoint and the seed are doing the heavy lifting. We've defined the core elements of the image, and now we spin the wheel a few times until we find something close to what we're looking for.
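
If you want to sanity-check that token budget outside the UI, here's a rough sketch using Python and the CLIP tokenizer from the transformers library. A1111 already shows this counter in the corner of the prompt box, so this is just the same idea in script form; the model name is the standard SD 1.5 text encoder.

# Rough sketch: count how many CLIP tokens a prompt uses.
# Note: A1111 strips the ()/<lora:...> syntax before counting, so its number
# can differ slightly from this plain-text count.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = ("a beautiful woman with a huge belly walks through a quarry, obese, "
          "gigantic belly, belly_overhang, navel focus, blue hair, quarry, "
          "depth of field, golden hour, pear_shaped_body")

# add_special_tokens=False leaves out the begin/end markers, which don't count
# against the 75-token chunk.
token_count = len(tokenizer(prompt, add_special_tokens=False).input_ids)
print(f"{token_count} tokens")  # stay under 75 and you're in a single chunk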

-seed fishing
Keep your resolution low, ideally around 512x512. Play around with aspect ratios, though, as different framing can result in different images from the same seed. Run a batch and see what you get. I often do 4 at a time, as it's a decent trade-off of time vs. output. Keep in mind that these are all going to be garbage images. All you're looking for is an interesting idea/shape. Every single detail will be off, faces will likely be smeary messes, and forget about hands or feet. Once you have a base image that sparks your interest, you'll be pulling out those fine details later.
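
For what it's worth, here's roughly what seed fishing looks like as a script using the diffusers library instead of the A1111 txt2img tab. The checkpoint path is just a placeholder; the point is low resolution, small batches, and noting which seed produced anything interesting.

# Hedged sketch of "seed fishing" with diffusers (I use A1111's txt2img tab;
# the checkpoint filename here is a placeholder for any SD 1.5 checkpoint).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "checkpoints/nextphoto.safetensors", torch_dtype=torch.float16
).to("cuda")

prompt = ("a beautiful woman with a huge belly walks through a quarry, obese, "
          "gigantic belly, quarry, golden hour")

# Low resolution, throwaway quality: all we want is a composition worth keeping,
# and the seed number that made it.
for seed in [101, 102, 103, 104]:
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, width=512, height=512, num_inference_steps=25,
                 generator=generator).images[0]
    image.save(f"seed_{seed}.png")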

-img2img
This is where the actual magic happens. Get used to the interface here, as you'll be living in this tab until you have an image you like.

If the base image you have is already roughly the shape you want, you're good to go! But since I can't leave well enough alone, I usually start tweaking things in Photoshop from the start. Play around with selecting different parts of the body and reshaping them with the Liquify tool. Maybe the belly could hang a little lower, or the breasts could be a bit bigger. Feel free to experiment and remember that this is just the base image. It's OK if your edits look like shit; you're just trying to get the basic shape across to the tool. Upscaling through img2img will almost always iron out jagged selections and smeared details.

A brief thought on morphing: if you edit different body parts to be larger, you have to remember shadows. Create a multiply layer under the thing you're editing, pick a color from the thing you're casting a shadow on, set the brush to the softest feathering, and paint shadows in. You don't have to be super precise with it, but thinking about where the light is coming from in the image and the shape of the thing the shadow is falling on will go a long way toward preserving perspective when generating an exaggerated figure.

--scaling
Your mileage may vary, but I also find I have a better time if I scale the image in PS and import it into img2img rather than use the built-in scaling in SD. Don't get me wrong, the SD upscaler works perfectly fine, but since I'm always on the edge of running out of VRAM, I've found things go a bit better this way. But feel free to experiment and see what works best for you. Again, I'm on AMD's Radeon VII, so I've got plenty of VRAM, but memory management is much more important since I can't use xFormers. If you're on NVIDIA, you're probably not going to need to worry about this until you start getting to 1024x1024+ resolution. If you do use Photoshop specifically, use the Image Size function and select Preserve Details 2.0. That uses some solid algorithmic fuckery that creates detail out of nothing. It's not a perfect upscaler, but it's great for our purposes here.

For most steps in this process, I tend to upscale the base image by 15-25% each time. My experience has been that if I just scale an image up to the target resolution in one pass, the final result looks 'looser.' Like, that overly smooth, vaguely fake look that a lot of generated images seem to have? I think that comes from trying to jump straight to the end. By taking a stair-step approach to the process, you wind up with a little noise in each image that tends to get interpreted as sub-pixel detail when you upscale again. This leads to more natural looking skin, cleaner details in the background, better fabric on clothing, etc.

At each step of the process, you'll see your image start to resolve itself. Skin will start looking better, light and shadow will start behaving a bit more realistically, and much finer details will start appearing. Keep your denoising strength anywhere between 0.4 and 0.55; any higher and you'll see significant shifts in the composition of your image, while any lower will most likely not change enough. Of course, this is just a rule of thumb. Sometimes halfway through the process I'll generate a batch of 4 images with a high denoising strength just to see if I get anything new and interesting. Again, just play around and see what happens.
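
If it helps to see the loop written out, here's a rough sketch of the stair-step process using the diffusers library and PIL. I do the resizing in Photoshop with Preserve Details 2.0 and run everything through A1111, so treat this as an illustration only: a plain Lanczos resize stands in for the Photoshop step, and the paths and numbers are placeholders.

# Hedged sketch of the stair-step img2img loop (diffusers + PIL).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_single_file(
    "checkpoints/nextphoto.safetensors", torch_dtype=torch.float16
).to("cuda")

prompt = "a beautiful woman with a huge belly walks through a quarry, golden hour"
image = Image.open("base_512.png").convert("RGB")

scale_per_pass = 1.2   # ~15-25% bigger each round, not one big jump
denoise = 0.45         # 0.4-0.55: enough change to add detail, not enough to drift

for step in range(5):
    # Round the new dimensions down to multiples of 8, which SD expects.
    w = int(image.width * scale_per_pass) // 8 * 8
    h = int(image.height * scale_per_pass) // 8 * 8
    image = image.resize((w, h), Image.LANCZOS)

    # (This is where you'd pause and edit the image by hand before resubmitting.)
    image = pipe(prompt=prompt, image=image, strength=denoise,
                 generator=torch.Generator("cuda").manual_seed(42)).images[0]
    image.save(f"pass_{step}_{w}x{h}.png")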

--feedback loop
At this point in the process, you'll want to reconsider your prompt and LoRAs. When you generated your seed image, you were painting with a broad brush, and you wanted big deviations from baseline reality. Most of the LoRAs you're going to use are intended for line art, so if you keep reprocessing your image with full-strength LoRAs, you're going to wind up with very flat, smooth images. By the time you run your first upscale, you'll probably want to decrease the LoRA strength to 0.4-0.6. I'm not an expert, but the way I think about it is that the strength controls how hard the LoRA's learned changes get applied on top of the checkpoint: at 1.0 the LoRA pulls the image strongly toward its training data, while at 0.4 it only nudges it. Since SD is basically starting with a noise-covered image and resolving down from that, we want the LoRA to guide the overall shape of the subject, but then get out of the way and let SD and your checkpoint of choice fill in the details. (Again, I don't claim to know exactly how the tools work internally; I just know that thinking about them this way has helped me a lot. If you know better, do let me know.)
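
If you're scripting this with diffusers rather than A1111, lowering the weight looks something like the sketch below; in A1111 you'd simply edit the <lora:name:weight> tag in the prompt instead. The checkpoint and LoRA filenames are placeholders.

# Hedged sketch: load a LoRA and dial its influence down between passes.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_single_file(
    "checkpoints/nextphoto.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("loras/symix.safetensors")  # placeholder LoRA file

prompt = "a beautiful woman with a huge belly walks through a quarry"
image = Image.open("pass_1.png").convert("RGB")

# Early pass: strong LoRA influence to hold the exaggerated shape together.
image = pipe(prompt=prompt, image=image, strength=0.5,
             cross_attention_kwargs={"scale": 0.7}).images[0]

# Later pass: dial the LoRA back and let the checkpoint fill in fine detail.
image = pipe(prompt=prompt, image=image, strength=0.45,
             cross_attention_kwargs={"scale": 0.4}).images[0]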

You should also think about your LoRAs holistically. A perfect example of this is the Cellulite LoRA. I generally don't even add it to my prompt until I'm near the end of the process. Adding that kind of detail early on can sometimes mess with your overall composition at lower resolutions, and since we're gonna reprocess multiple times, little details are going to change radically between each iteration. I'll add it in the last two or three iterations, once we're at a medium resolution (like one of your dimensions has reached 800+). Conversely, the more conceptual LoRAs (like HyperBottomHeavy or Symix) that are responsible for overall composition become less important once we have a mid-range-resolution image. At that size, SD has a better time recognizing exaggerated shapes for what they are, rather than guessing that a huge belly is actually a pair of thighs pressed together. So we can safely lower their weights to keep the broad strokes in place, but let SD and your detail LoRAs do the heavy lifting.

Also, look at your prompt again. If you followed the guidelines at the beginning of this article, you probably don't have much extraneous stuff in there. But maybe the image drifted a little and something you like showed up randomly, like a pair of sunglasses on the head, or a shirt starting to look more like a jacket. If you want to make sure those elements become clear and stick around to the end, add them to your prompt. Conversely, maybe one of your ideas didn't materialize in the seed image. Remove it; otherwise SD might try to add it back in at a later step, and it might not look great. Just be mindful of all the elements under your control at each step. As I said at the beginning, you're pruning branches and guiding growth. There just isn't a perfect one-click solution to any of this stuff.

Finally, don't forget you can edit your images before putting them back into img2img. If your latest iteration is looking good overall but something is off (a breast is randomly the wrong size, a hand has appeared somewhere, an extra belly button showed up, etc.), just tweak it in PS before sending it back through.

-finishing
After three to five upscales, you're probably close to 1000+ pixels on at least one dimension of your image. If you've done a good job tending your prompt, LoRAs, denoising strength, and any other tweaks, you probably have a pretty reasonable looking image. But I'm also willing to bet the face looks wrong, the hands are probably still suggestions at best, and other small details are weird. Don't worry! We're not done just yet.

At this point, many of you are probably eyeing the inpainting tool. Unfortunately, this guide won't help you with that. Since I've been working with AMD architecture and only a baseline understanding of all the dependencies for both SD and A1111, inpainting has literally never worked for me. If it works for you, there are tons of useful guides and YouTube tutorials that do a fantastic job of explaining inpainting. Go follow those, and you can skip to the next step.

For the rest of you, here's how I deal with it.

Take your latest iteration and drop it into Photoshop. I usually then immediately upscale the image to 200% with Preserve Details 2.0 and take stock of what needs to be fixed. 99 times out of 100, the face will need serious work. Using the rectangular marquee tool, select the face and head. Be mindful of the hair as well; try to select as much of it as you reasonably can without selecting half of your subject. We're going to completely reprocess the head here, so if your subject has long or complicated hair, the new version might not line up with your original image. Once you've selected a reasonable target, copy it and paste it into your img2img box.

AN IMPORTANT NOTE: When you paste this, you'll probably notice a thin white outline around your pasted image. To the best of my understanding, this is because the dimensions of the selection you made are not exactly divisible by 8. SD works in multiples of 8, so you can't have a dimension that isn't cleanly divisible by 8. To counter this, SD will add blank pixels around the image to bump it up to the next cleanly divisible size. It's never a big deal, but it's something to be aware of. Just make sure to include a little negative space in your selection so you can cleanly remove it later.
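
If you want to see the math behind that padding, here's a tiny illustration; the selection dimensions are just example numbers.

# Quick illustration: SD works on dimensions that are multiples of 8, so an
# arbitrary crop gets padded up to the next multiple.
def next_multiple_of_8(n: int) -> int:
    return ((n + 7) // 8) * 8

w, h = 453, 510                                       # e.g. a freehand face selection
print(next_multiple_of_8(w), next_multiple_of_8(h))   # -> 456 512
print(next_multiple_of_8(w) - w, next_multiple_of_8(h) - h)  # padding: 3 and 2 px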

From here, I'll usually set Resize By to anything between 1.25 and 3.0. It just depends on the size of your original selection and how much detail you want/need. In my experience, 512x512 is normally enough to get a pretty good looking face, but obviously you can set it to whatever you want. As long as one of your dimensions is above 400 pixels, you're probably going to get something useful.

Next, rewrite your prompt. Since we're dealing with a pretty specific subject (the human face and head), a lot of the details you included in your prompt are probably unrelated. Remove them, and add any details you want to see in the face and hair (things like red lips, eyeshadow, smile/frown, etc.). Also, double check your LoRAs. Some you absolutely won't need (HyperBreasts isn't gonna help much with faces), while others will be more useful. Make sure to swap them out and play with their weights. If you're looking for a more expressive face beyond smiles or placid expressions, search Civitai for 'expressions' or 'emotions' and play with some of those. I can personally recommend Expression Helper Realistic and Emotion Puppeteer, but new stuff is being added all the time. They all work differently, so experiment and see what works for you.

Play around with the denoising, and run a batch of at least a couple of images. You don't want a ton of denoising, since you want to keep the basic head shape/size similar to your source (we're going to have to reattach it to the base image later), so 0.4-0.45 is generally enough (even if you're using expression LoRAs). At these lower resolutions, you should be able to process a batch of options to choose from relatively quickly.

Once you have a face you like, copy that back into Photoshop. UH OH, IT'S WAY TOO BIG NOW. That's because we upscaled the image to give SD more room to add detail. You'll need to downscale the new head/face back to its original size. Luckily we can do this with math, rather than freehanding the scale tool and hoping for the best. The formula is simple: just divide 1 by the Resize By number. So, if you set it to 1.25, you'll need to downsize the new head to 80%. 1.5 is 66.67%, 2 is 50%, etc. You can either calculate this yourself and type in the target percentage, or if you're using Photoshop you can literally type 100/1.25 (or whatever your Resize By number was) into the scale field and it'll calculate it for you.
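
For the script-minded, here's the same round trip sketched with PIL and diffusers, including the divide-by-Resize-By step at the end. In practice I do all of this between Photoshop and A1111, so the crop box, paths, prompt, and values here are placeholders.

# Hedged sketch of the face round trip: crop, upscale, reprocess, shrink back, paste.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_single_file(
    "checkpoints/nextphoto.safetensors", torch_dtype=torch.float16
).to("cuda")

full = Image.open("latest_pass.png").convert("RGB")
box = (600, 120, 920, 480)                 # left, top, right, bottom of the head
face = full.crop(box)

resize_by = 2.0                            # give SD more room to add facial detail
big = face.resize((int(face.width * resize_by) // 8 * 8,
                   int(face.height * resize_by) // 8 * 8), Image.LANCZOS)

face_prompt = "close-up of a woman's face, blue hair, subtle smile, golden hour"
new_face = pipe(prompt=face_prompt, image=big, strength=0.4,
                generator=torch.Generator("cuda").manual_seed(7)).images[0]

# Downscale by 1 / resize_by (e.g. 2.0 -> 50%) and paste back at the original spot.
new_face = new_face.resize(face.size, Image.LANCZOS)
full.paste(new_face, box[:2])
full.save("latest_pass_fixed_face.png")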

You're not quite done yet, though. Two things will likely still be wrong: there's probably a white box around the edges of your new face, and it isn't exactly lined up with your original face. From the IMPORTANT NOTE above: since the image was padded, it's going to be offset by a few pixels. Nudge it up and to the left a couple of times and it should be just fine. Then either use a mask to paint out the edges of the pasted image, or if you do crosswords in pen, just use the eraser.

Repeat this section for hands, feet, and anything else that needs retouching. Rewrite your prompt for each thing, keeping it simple and concise, like "A fat woman's hand resting on her hip, hand, fat arm, [any other relevant details you want]."

-Post Processing

I am not a photo editing educator, so I don't think I'm qualified to discuss color theory, framing, or anything else really. All I can say is that once I have an image that looks pretty good, I'll play around with Curves, Levels, Color Balance, Hue/Saturation, Color Lookup, and/or Camera Raw. There are literally thousands of hours of tutorials and education on YouTube, and if you care at all about images that look good, I recommend watching a couple. PixImperfect is a powerhouse for PS education specifically, and his vids are short and to the point. If you don't want to mess around with all that, I still recommend you add noise to your image. If you're using PS, Camera Raw has a pretty good noise function that somewhat simulates film grain. Put a light dusting of that on your image and it'll go a long way to paper over weird little imperfections.

If you're not using PS, here's a good recipe that most image editors should be able to reproduce: Add a new layer, fill it with 50% grey, and set that layer to Overlay (or equivalent). Many image editors have an Add Noise function; just apply that to the new layer (if you have the choice, monochrome usually works best, but look at the vibe of your image and decide if low-light noise makes more sense). Turn the fill or opacity down to anywhere between 5-20%, and add a slight blur to the layer. If everything worked correctly, you should have a pretty reasonable facsimile of film grain.
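
If you'd rather script it, here's a rough approximation of that same recipe in Python (numpy + PIL): a 50% grey layer with monochrome noise, blurred slightly, blended in Overlay mode at low opacity. The numbers are starting points, not gospel.

# Hedged sketch of the film-grain recipe.
import numpy as np
from PIL import Image, ImageFilter

def add_grain(img, amount=0.06, opacity=0.15, blur_px=0.6, seed=0):
    rng = np.random.default_rng(seed)
    base = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0

    # 50% grey + monochrome gaussian noise, blurred a touch so it reads as grain.
    noise = 0.5 + rng.normal(0.0, amount, size=base.shape[:2]).astype(np.float32)
    noise_img = Image.fromarray(np.clip(noise * 255, 0, 255).astype(np.uint8), "L")
    noise_img = noise_img.filter(ImageFilter.GaussianBlur(blur_px))
    grain = (np.asarray(noise_img, dtype=np.float32) / 255.0)[..., None]

    # Overlay blend: darkens where the base is dark, lightens where it's light.
    overlay = np.where(base < 0.5, 2 * base * grain, 1 - 2 * (1 - base) * (1 - grain))

    # Low opacity, like dropping the grain layer's fill to 5-20%.
    out = base * (1 - opacity) + overlay * opacity
    return Image.fromarray((np.clip(out, 0, 1) * 255).astype(np.uint8))

add_grain(Image.open("final.png")).save("final_grain.png")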

Finally, I generally resize my image down to 75-80% of its size with a 'sharper' resampling function. If your tool doesn't have that function, you can skip this. I find it just tightens things up a little bit, but YMMV.

And that's it. Now you've got a pretty good looking FaT oF sHe, and it only took an hour or two. Just remember that my word is not gospel, and my process is evolving all the time. New checkpoints/LoRAs, different scalers, new approaches to prompts, etc. Just play around and have fun with it. If it ever feels like work, just stop. No one should put in this much effort if they don't enjoy the process.