Day 92 of 100 Days Agentic Engineer Challenge: Step-by-Step Guide to Consistent AI Actor Generation
There are two types of AI videos: shorts that contain 6 to 10 scenes of about 5 seconds each, and movies with AI actors and lipsync, where a scene can run for a few minutes. The first is easy to achieve, but the second is a challenge and requires many steps and tools. Below I explain how to do it and what to use.
Step-by-Step Guide to Consistent AI Actor Generation
Here is a step-by-step process inspired by the Imagine Art Films course on Skool.
1. Create a Master Image
🔗 Flux Realism
- Generate a highly realistic and original image of your character using an AI tool.
- Make sure the image is high-quality and accurately captures the character’s essence.
2. Upscale the Original Image
🔗 Real-ESRGAN
- Use an upscaling tool like Real-ESRGAN on Replicate to enhance resolution.
- Avoid face enhancement features to preserve the original look.
3. Create Character Poses
🔗 Consistent Character
- Generate multiple images (preferably 1024x1024) of the character in varied poses, angles, and styles.
- Make sure the outputs are realistic and diverse to build a richer dataset.
Optional Tools:
4. Prepare the Image Dataset
- Name each image sequentially (e.g., 1.png, 2.png, etc.).
- Organize the images in a dedicated folder for easy access.
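The sequential naming can be scripted. Here is a minimal sketch, assuming the pose images already sit in one folder. The function name `number_images` and the list of accepted extensions are my own choices for illustration, not part of any tool mentioned above.

```python
from pathlib import Path

# Extensions treated as images; adjust to match your own files (assumption).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def number_images(folder: str) -> list[Path]:
    """Rename every image in `folder` to 1.ext, 2.ext, ... in file-name order."""
    root = Path(folder)
    images = sorted(
        (p for p in root.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES),
        key=lambda p: p.name,
    )

    # Pass 1: move everything to temporary names so that an existing
    # "2.png" is never overwritten by a file that is about to become "2.png".
    staged = []
    for index, path in enumerate(images, start=1):
        temp = path.with_name(f"__tmp_{index}{path.suffix.lower()}")
        path.rename(temp)
        staged.append((index, temp))

    # Pass 2: move the temporary files to their final sequential names.
    renamed = []
    for index, temp in staged:
        final = temp.with_name(f"{index}{temp.suffix}")
        temp.rename(final)
        renamed.append(final)
    return renamed
```

Calling `number_images("dataset")` renames the files in place, so it is worth running it on a copy of the folder first.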
5. Generate Captions (Optional)
- 🔗 Single Image Captioning
- 🔗 Batch Captioning (requires OpenAI API key)
- Create AI-generated captions for each image to add descriptive context.
- Save each caption in a .txt file named to match its image (e.g., 1.txt for 1.png).
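If you write or paste captions by hand, a small helper keeps the file names paired correctly. This is only a sketch: `save_caption` and `images_missing_captions` are hypothetical helper names, and it assumes PNG images with the caption text already available as a string.

```python
from pathlib import Path


def save_caption(image_path: str, caption: str) -> Path:
    """Write `caption` next to the image, e.g. 1.png -> 1.txt."""
    caption_path = Path(image_path).with_suffix(".txt")
    caption_path.write_text(caption.strip() + "\n", encoding="utf-8")
    return caption_path


def images_missing_captions(folder: str) -> list[str]:
    """Return the names of PNG images in `folder` that have no matching .txt file."""
    root = Path(folder)
    return sorted(
        p.name for p in root.glob("*.png") if not p.with_suffix(".txt").exists()
    )
```

Running `images_missing_captions("dataset")` before zipping is a quick way to spot images that were skipped during captioning.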
6. Compile the Final Dataset
- Zip all image and caption files into a single archive.
- Name the zip file appropriately for easy identification.
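The archive can be built with Python's standard `zipfile` module. The sketch below assumes a flat folder of images and caption files; the function name `build_dataset_zip` and the extension list are illustrative choices of mine.

```python
import zipfile
from pathlib import Path

# File types that belong in the dataset archive (assumption; adjust as needed).
DATASET_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".txt"}


def build_dataset_zip(folder: str, zip_path: str) -> list[str]:
    """Zip all images and captions in `folder` and return the archived file names."""
    root = Path(folder)
    members = sorted(
        (p for p in root.iterdir() if p.is_file() and p.suffix.lower() in DATASET_SUFFIXES),
        key=lambda p: p.name,
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in members:
            # arcname=path.name stores files at the top level of the archive,
            # without the parent folder in the path.
            archive.write(path, arcname=path.name)
    return [p.name for p in members]
```

For example, `build_dataset_zip("dataset", "actor_anna_v1.zip")` produces an archive whose name identifies the actor and dataset version.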
7. Train the LoRA Model
🔗 Flux LoRa Trainer
- Upload your zip archive to a training platform like Replicate.
- Set parameters such as training steps (recommended: 2000–3000 for better results).
- Make sure the model visibility is set to private if needed.
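Training can also be started from code instead of the web form. The sketch below is an assumption-heavy illustration: it uses the `replicate` Python client's `trainings.create` call, and the input keys `input_images`, `trigger_word`, and `steps` are what I believe the Flux LoRA trainer on Replicate expects, so check the trainer's page before relying on them. The trainer version string and the destination model are placeholders you must supply yourself.

```python
# Step range recommended in the guide above.
RECOMMENDED_STEPS = range(2000, 3001)


def build_training_input(dataset_zip_url: str, trigger_word: str, steps: int = 2000) -> dict:
    """Assemble the trainer input; the key names are assumptions, not a confirmed schema."""
    if steps <= 0:
        raise ValueError("steps must be a positive integer")
    if not trigger_word.strip():
        raise ValueError("trigger_word must not be empty")
    return {
        "input_images": dataset_zip_url,  # URL of the uploaded zip archive
        "trigger_word": trigger_word,     # token used in prompts to invoke the actor
        "steps": steps,
    }


def start_training(trainer_version: str, destination: str, training_input: dict):
    """Start a training run on Replicate.

    Requires `pip install replicate` and the REPLICATE_API_TOKEN environment
    variable. `trainer_version` ("owner/name:version-id") and `destination`
    ("your-username/your-model") are placeholders, not real values.
    """
    import replicate  # imported lazily so the helper above works without it

    return replicate.trainings.create(
        version=trainer_version,
        destination=destination,
        input=training_input,
    )
```

The destination model has to exist before training starts, and as far as I know its visibility (public or private) is a setting on that model rather than a training parameter.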
8. Test and Refine the Model
- Generate sample images using your new LoRA model.
- Evaluate the results and adjust prompts or parameters as needed to fine-tune output quality.
A Few Improvements:
- For ultra-realistic images, you can explore other models available on Civitai or Hugging Face. You can also try Flux 1.1 Pro Ultra in Raw mode.
- Additional upscaling options include Tensorpix and Topaz Labs.
- For training, you can also use Black Forest Labs’ Flux Pro Finetuning service.
This process focuses solely on generating consistent images of digital actors: it involves no video, no audio, and no lipsync. It takes multiple steps, and the whole process must be repeated for each new actor.
Before starting, you’ll need a script to define the scenario for generating the images.
I’m planning to automate this entire workflow and integrate it into my AI video platform: Pixonaut.art.
