How to Generate Images From Text Prompts with Python and TensorFlow

In this tutorial you will learn how to generate images from text prompts using Python, VQGAN, and neural networks. You can create some very interesting machine learning generated artwork using this software. The possibility are limitless when it comes to the different types of images you can generate. This article will be using Ubuntu 22.04 for the operating system. However any Ubuntu based version of Linux should work. Keep in mind you will want to make sure to have a graphics card with at least 6GB of VRAM. If not you will have to generate lower resolution images. If creating machine learning generated artwork sounds interesting to you continue reading this tutorial below.

Step 1: Install Required Dependencies

The first thing to do is install Anaconda. You can find the latest version and instructions HERE. This tutorial has been tested on Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, and Ubuntu 22.04.

Step 2: Create the Conda Environment

conda create --name vqgan python=3.9
conda activate vqgan

Next you will set up the Conda environment. This is where you will run VQGAN from.

Step 3: Install Pytorch in the new environment:

Note: This installs the CUDA version of Pytorch for Nvidia graphics cards. If you are using an AMD graphics card, read our AMD section at the bottom of the article.

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Step 4: Install other required Python packages:

pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops torch_optimizer

Step 5: Clone the required VQGAN repositories:

git clone 'https://github.com/nerdyrodent/VQGAN-CLIP'
cd VQGAN-CLIP
git clone 'https://github.com/openai/CLIP'
git clone 'https://github.com/CompVis/taming-transformers'
pip install taming-transformers && pip install CLIP
pip install setuptools==59.5.0

Now you need to clone the git repository’s. After cloning install taming-transformers and CLIP using pip. Finally install setuptools==59.5.0. This is required as the latest version of setuptools will not work with VQGAN.

Step 6: Download at least 1 VQGAN pretrained model

mkdir checkpoints

curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1' #ImageNet 16384

Choose at least one of the above pretrained models, and download it using the curl command. I recommend downloading both of the models so that you can try each to see which works best.

Pretrained Model Information

Visit https://github.com/CompVis/taming-transformers#overview-of-pretrained-models to learn more about VQGAN pre-trained models, including download links. The model .yaml and .ckpt files need to be the checkpoints directory.

Generating VQGAN image from basic text prompt

You have now finished installing VQGAN. You are ready to begin generating images. To generate an image from text, specify your text prompt as shown in the example below.

python generate.py -p "A illustration of a pineapple in a fruit bowl"

Generating VQGAN image from multiple prompts

You are also able to generate images from split text prompts. In the below example you can see I am using four different descriptors to generate the image.

python generate.py -p "A painting of a pineapple in a fruit bowl | psychedelic | surreal:0.5 | weird:0.25"

A painting of a pineapple in a fruit bowl

Additionally you can use an input image as one of your split prompt inputs. This will use the supplied image as a sample during the generation process.

python generate.py -p "A picture of a bathroom with a portrait of Van Gogh" -ip "samples/VanGogh.jpg | samples/Bedroom.png"

A picture of a bathroom with a portrait of Van Gogh

Generating VQGAN “story mode” images

Story mode allows you to create a story from multiple text prompts using the carrot symbol. This will generate you a .mp4 video file. For example:

python generate.py -p "A painting of a apple|photo:-1 ^ a painting of a banana ^ a painting of a grape ^ a painting of a watermelon ^ a photograph of strawberry" -cpe 1500 -zvid -i 6000 -zse 10 -vl 20 -zsc 1.005 -opt Adagrad -lr 0.15 -se 6000

VQGAN Feedback loop animation’s

You are able to generate multiple images into a video. The script makes slight changes to each image creating a warping effect. The “150” at the end of the line is the number of frames. “blackhole.png” is your output filename. This will produce a .mp4 file named “video.mp4” by default. Example below:

./zoom.sh "A painting of a green firetruck spinning through a black hole" blackhole.png 150

The ImageMagick package is requred to generate mp4 animated loop videos. If you don’t have it installed, you can install it using the below command.

sudo apt install imagemagick

Generating multiple random images

You can also supply multiple phrases to be used at random to generate multiple images. There is a random list of words in the “./zoom.sh” file. Edit this file if you want to change the words used in the random phrase generator. Here is an example of the random images put into a collage.

chmod +x ./zoom.sh
./zoom.sh

multiple image example — Left to right #1-#9

#1 ‘A pencil art sketch of a criticizing pickle and a menu in the style of strange colors and Futurism’
#2 ‘A painting of a wild hotel and a flower in the style of Constructionist and Edgar Degas’
#3 ‘A spray painting of a awaiting computer and a bedroom in the style of Edgar Degas and Art Nouveau’
#4 ‘A photograph of a benefiting AR-15 and a pickle in the style of Modern art and Edgar Degas’
#5 ‘A sculpture of a undertaking computer and a figurine in the style of Pop Art and Picasso’
#6 ‘A painting of a tree on a dresser in the style of Surreal Art and Claude Monet’
#7 ‘A pencil art sketch of a touching statue and a AR-15 in the style of Surreal Art and Claude Monet’
#8 ‘A pencil art sketch of a adding table and a fish in the style of Surreal Art and Art Nouveau’
#9 ‘An illustration of a raining lamp and a spanner in the style of Pop Art and Michelangelo Caravaggio’

Advanced settings

There are many advanced flags you can use in the VQGAN program. To view all of the options from the terminal use the -h flag.

python generate.py -h

usage: generate.py [-h] [-p PROMPTS] [-ip IMAGE_PROMPTS] [-i MAX_ITERATIONS] [-se DISPLAY_FREQ]
[-s SIZE SIZE] [-ii INIT_IMAGE] [-in INIT_NOISE] [-iw INIT_WEIGHT] [-m CLIP_MODEL]
[-conf VQGAN_CONFIG] [-ckpt VQGAN_CHECKPOINT] [-nps [NOISE_PROMPT_SEEDS ...]]
[-npw [NOISE_PROMPT_WEIGHTS ...]] [-lr STEP_SIZE] [-cuts CUTN] [-cutp CUT_POW] [-sd SEED]
[-opt {Adam,AdamW,Adagrad,Adamax,DiffGrad,AdamP,RAdam,RMSprop}] [-o OUTPUT] [-vid] [-zvid]
[-zs ZOOM_START] [-zse ZOOM_FREQUENCY] [-zsc ZOOM_SCALE] [-cpe PROMPT_FREQUENCY]
[-vl VIDEO_LENGTH] [-ofps OUTPUT_VIDEO_FPS] [-ifps INPUT_VIDEO_FPS] [-d]
[-aug {Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} [{Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} ...]]
[-cd CUDA_DEVICE]

optional arguments:
  -h, --help            show this help message and exit
  -p PROMPTS, --prompts PROMPTS
                        Text prompts
  -ip IMAGE_PROMPTS, --image_prompts IMAGE_PROMPTS
                        Image prompts / target image
  -i MAX_ITERATIONS, --iterations MAX_ITERATIONS
                        Number of iterations
  -se DISPLAY_FREQ, --save_every DISPLAY_FREQ
                        Save image iterations
  -s SIZE SIZE, --size SIZE SIZE
                        Image size (width height) (default: [512, 512])
  -ii INIT_IMAGE, --init_image INIT_IMAGE
                        Initial image
  -in INIT_NOISE, --init_noise INIT_NOISE
                        Initial noise image (pixels or gradient)
  -iw INIT_WEIGHT, --init_weight INIT_WEIGHT
                        Initial weight
  -m CLIP_MODEL, --clip_model CLIP_MODEL
                        CLIP model (e.g. ViT-B/32, ViT-B/16)
  -conf VQGAN_CONFIG, --vqgan_config VQGAN_CONFIG
                        VQGAN config
  -ckpt VQGAN_CHECKPOINT, --vqgan_checkpoint VQGAN_CHECKPOINT
                        VQGAN checkpoint
  -nps [NOISE_PROMPT_SEEDS ...], --noise_prompt_seeds [NOISE_PROMPT_SEEDS ...]
                        Noise prompt seeds
  -npw [NOISE_PROMPT_WEIGHTS ...], --noise_prompt_weights [NOISE_PROMPT_WEIGHTS ...]
                        Noise prompt weights
  -lr STEP_SIZE, --learning_rate STEP_SIZE
                        Learning rate
  -cuts CUTN, --num_cuts CUTN
                        Number of cuts
  -cutp CUT_POW, --cut_power CUT_POW
                        Cut power
  -sd SEED, --seed SEED
                        Seed
  -opt, --optimiser {Adam,AdamW,Adagrad,Adamax,DiffGrad,AdamP,RAdam,RMSprop}
                        Optimiser
  -o OUTPUT, --output OUTPUT
                        Output file
  -vid, --video         Create video frames?
  -zvid, --zoom_video   Create zoom video?
  -zs ZOOM_START, --zoom_start ZOOM_START
                        Zoom start iteration
  -zse ZOOM_FREQUENCY, --zoom_save_every ZOOM_FREQUENCY
                        Save zoom image iterations
  -zsc ZOOM_SCALE, --zoom_scale ZOOM_SCALE
                        Zoom scale
  -cpe PROMPT_FREQUENCY, --change_prompt_every PROMPT_FREQUENCY
                        Prompt change frequency
  -vl VIDEO_LENGTH, --video_length VIDEO_LENGTH
                        Video length in seconds
  -ofps OUTPUT_VIDEO_FPS, --output_video_fps OUTPUT_VIDEO_FPS
                        Create an interpolated video (Nvidia GPU only) with this fps (min 10. best set to 30 or 60)
  -ifps INPUT_VIDEO_FPS, --input_video_fps INPUT_VIDEO_FPS
                        When creating an interpolated video, use this as the input fps to interpolate from (>0 & <ofps)
  -d, --deterministic   Enable cudnn.deterministic?
  -aug, --augments {Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} [{Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} ...]
                        Enabled augments
  -cd CUDA_DEVICE, --cuda_device CUDA_DEVICE
                        Cuda device to use

AMD GPU Instructions

If you have an AMD graphics card you are able to use ROCm instead of CUDA. You can check if your card supports TensorFlow here: https://github.com/RadeonOpenCompute/ROCm#supported-gpus

Install ROCm accordng to the instructions and don’t forget to add the user to the video group as detailed in the link: https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1/page/How_to_Install_ROCm.html

The usage and set up instructions above are the same, except for the line where you install Pytorch. Instead of pip install torch==1.9.0+cu111 ..., use the one or two lines which are displayed here (select Pip -> Python-> ROCm): https://pytorch.org/get-started/locally/

Troubleshooting

RuntimeError: CUDA out of memory

For example:

RuntimeError: CUDA out of memory. Tried to allocate 150.00 MiB (GPU 0; 23.70 GiB total capacity; 21.31 GiB already allocated; 78.56 MiB free; 21.70 GiB reserved in total by PyTorch)

Your request doesn’t fit into your GPU’s VRAM. Reduce the image size of the image by editing “generate.py”.

What is the best optimizing agent?

The Adam agent is usually a good general purpose agent to use. If you would like more information see the Pytorch Optimizer and Optim articles.

Can I download and use all pre-trained models simultaneously?

Yes, all you need to do is set everything to true in the download_models.sh file.

Errors during video generation

Try installing ffmpeg:

sudo apt install ffmpeg

If you are running VQGAN with Ananconda try:

conda install -c conda-forge ffmpeg

Related resources

View more image examples on the FormatSwap Twitter.

Check our our The Best Mechanical Gaming Keyboards to Purchase in 2023 article.

Read our Deep Learning Image Style Transfer Tutorial Using Neural Style Pt.

View our other Machine Learning Tutorials.

Learn How to Create a Mapped Network Drive in Windows 10.

Click here to learn How to Install MySQL on Ubuntu 22.04 LTS.

View all of our available online tools at Formatswap.com.

Questions

Feel free to leave a comment below if you have any further questions. Thank you for reading the tutorial.