6 Steps to Make an AI Music Video (Beginner’s Guide)

I just made this music video entirely using AI:

The videos, the song, the lyrics, everything is AI.

This took me about 30 minutes to make.

In this post, I will show you exactly how to create your own music video with AI.

If you prefer a video tutorial, I just made one.

Let’s go!

Disclaimer: This post contains affiliate links, at no extra cost to you.

Step 1: Creating a Song

As the first step, we need a song.

If you have one already, that’s good.

But if you don’t, in this step, I will show you how to create one.

First, head over to ElevenLabs and sign up. This is the best AI music generator (with vocals) that I’ve tested.

Then choose Music from the options on the left.

After this, you will see a view similar to ChatGPT, but this one is for creating music.

Here’s my prompt that I used to create a song:

Create an upbeat, inspiring pop song about turning imagination into reality — the feeling of starting something new and bringing your ideas to life.

Style: Modern pop with light electronic and lo-fi elements, warm synths, and a steady beat (around 100–110 BPM).

Mood: Positive, creative, motivational, and hopeful.

Vocals: Male or female pop voice with a friendly, confident tone.

Lyrics theme:

Beginning a new creative journey

Believing in yourself

The joy of seeing ideas come alive

Example lyrical ideas:

“It starts with a spark in my mind”

“Let’s make it real, together we shine”

“Every dream begins with one small try”

Structure:
Verse → Pre-Chorus → Chorus → Verse → Bridge → Final Chorus

Feel free to come up with your own, copy the one from me, or use ChatGPT to create your prompt.
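If you plan to try several variations, it can help to keep the parts of the prompt separate and join them only at the end. Here is a minimal Python sketch of that idea. The function name build_song_prompt and its parameters are made up for illustration; it only produces text that you paste into the music generator and does not call any API.

```python
def build_song_prompt(summary, style, mood, vocals, themes, structure):
    """Join the labeled parts of a song prompt into one block of text."""
    parts = [
        summary,
        f"Style: {style}",
        f"Mood: {mood}",
        f"Vocals: {vocals}",
        "Lyrics theme:\n" + "\n".join(themes),
        "Structure:\n" + " → ".join(structure),
    ]
    # Blank lines between parts keep each label easy to spot and edit.
    return "\n\n".join(parts)


prompt = build_song_prompt(
    summary="Create an upbeat, inspiring pop song about turning imagination into reality.",
    style="Modern pop with light electronic and lo-fi elements (around 100–110 BPM).",
    mood="Positive, creative, motivational, and hopeful.",
    vocals="Male or female pop voice with a friendly, confident tone.",
    themes=["Beginning a new creative journey", "Believing in yourself"],
    structure=["Verse", "Pre-Chorus", "Chorus", "Verse", "Bridge", "Final Chorus"],
)
print(prompt)
```

Swapping one argument (say, the mood) gives you a new prompt without retyping the rest.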

Then hit “Generate”. This starts creating the song.

You can play the song while it’s still being generated! This is because the AI puts it together in parts, so you can play the first parts while the rest are still being created.

Notice how it also creates the vocals for the song.

If you’re not happy with the song, you can either re-create it from scratch, or you can edit it by parts.

For example, you can choose any section of your song, and ask the AI to include/remove styles from it.

Here I’m tweaking the vocal track:

You can edit your lyrics on the spot too.

Then the AI just renders the song again with the new lyrics.

You can also include/exclude styles from your entire track instead of just touching it one part at a time.

Whenever you change something, hit Generate to create a new version of the track with your edits in place.

Oh, and don’t forget that you can expand any section of the song too.

Once you’ve made your track and it’s ready for a video, download it.

Step 2: Creating the Character

Then, let’s create our artist for the music video.

The best place to do this is called OpenArt.

This place hosts all the best AI image/video generator models. You’ll need it in all the remaining steps of this tutorial.

Once you’ve signed up, choose “Image” on the left.

Then pick the “Seedream 4.0” AI image generator model from the list.

Then describe to the AI what kind of character you want to create.

Here’s my prompt:

A young American female singer performing in a softly glowing, futuristic studio bathed in pastel lights. She stands at a sleek microphone surrounded by warm synth tones visualized as gentle waves of color in the air. The mood is uplifting, dreamy, and modern — a blend of pop and light electronic aesthetics. Her outfit is stylish but understated — soft fabrics, subtle metallic accents, and warm tones (peach, gold, lavender). The lighting is cinematic and warm, evoking sunrise energy. Background elements include abstract shapes or light trails suggesting creativity and imagination coming to life. The overall vibe is bright, inspiring, and emotionally open — no darkness, no heavy rock or metal influence, no acoustic-only setting. Render in modern pop / lo-fi electronic style, with smooth gradients, gentle bokeh, and a clean, futuristic atmosphere.

I recommend creating at least 2–4 images, not just one.

That’s because some of the results might not be what you were looking for.

I think this is the best image out of the bunch, so I will choose her as the character.

Step 3: Scene Images

Now that we have our artist, it’s time to put her into different scenes for the video.

  • In some scenes, she is singing the song.
  • In some scenes, she’s just doing actions silently.

To create the scenes so that the character remains consistent, write your prompt…

And drag and drop the character’s main image into the image generator.

This way it’s able to keep a consistent look across the scenes with minimal changes.

Remember to change the aspect ratio of your images too. It’s 1:1 by default, so you’ll get squares, which is not ideal in most music videos. Instead, you can choose 16:9 for wide-screen videos.
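If you ever need the actual pixel dimensions for a given aspect ratio (for example, when setting up your video editor project), the arithmetic is simple. This is a small sketch; the helper name height_for_width and the widths used below are my own examples, not sizes taken from OpenArt.

```python
def height_for_width(width, ratio_w=16, ratio_h=9):
    """Return the pixel height that gives the requested aspect ratio."""
    return round(width * ratio_h / ratio_w)


print(height_for_width(1920))        # height of a 16:9 wide-screen frame
print(height_for_width(1024, 1, 1))  # height of a 1:1 square frame
```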

That being said, here’s the prompt for the first scene:

The singer is sitting in the misty forest during the sunrise and is looking down eyes closed
Style: vibrant yet warm, cinematic lensing

Here’s the result:

That’s amazing, isn’t it?

Here’s another scene prompt:

A young female singer in a dim, softly glowing creative space at dusk. She sits by a keyboard and sketchbook, gentle light from a window outlining her silhouette. Soft pastel reflections hint at imagination awakening. A few glowing music notes or soundwave lines float in the air, subtle and dreamy.
 Style: cinematic, warm synth tones, early-evening glow, intimate and inspirational.

Here are the results for that:

And so on.

You can create as many scenes and prompts as you like.

Step 4: Lip-Sync

Once you have your scenes, it’s time to do the lip sync.

To do this, I recommend using OpenArt again.

There are a couple of AI lip-syncing tools out there that make your character sing the song you created.

However, note that right now most of these AI tools only accept 10–30 seconds of audio.

So you need to split the song into parts before the lip-sync.

You can use a free tool like AudioTrimmer.com to do this.

To split the song, choose the start and end in the editor and hit “Crop”.

Then download your file.

Just do this as many times as you need for your song.
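To plan the cuts before opening the trimmer, you can work out the start and end times up front. Below is a rough Python sketch that splits a song into clips no longer than a chosen limit. The function name split_into_clips is hypothetical, and the 30-second default is just the upper end of the range mentioned above. In practice you will probably want to nudge each cut to the end of a lyric line rather than use exact intervals.

```python
def split_into_clips(total_seconds, max_clip_seconds=30.0):
    """Return (start, end) times covering the song in clips of limited length."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    total = float(total_seconds)
    clips = []
    start = 0.0
    while start < total:
        # The last clip is shortened so it ends exactly where the song ends.
        end = min(start + max_clip_seconds, total)
        clips.append((start, end))
        start = end
    return clips


# A 55-second song with a 30-second limit needs two clips.
for start, end in split_into_clips(55):
    print(f"crop from {start:.0f}s to {end:.0f}s")
```

Each pair of numbers is what you would set as the start and end in the trimming tool.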

Also, remember that you don’t need to create lip-sync for everything.

It’s fine if the artist is not singing in every part of the video.

To do a lip-sync, head over to OpenArt.

Choose “Video” on the left and pick “Lip-Sync Video”.

Then choose “Creatify Aurora” from the models.

Then drop your audio clip and character image into the view.

Hit create.

After a couple of minutes, you’ll see your first result.

Here’s what the first 12 seconds look like:

Isn’t that amazing? It looks exactly like she is singing there.

Step 5: Effects & Non-Vocal Scenes

Now you can also create those scenes for your music video where your artist is not singing the song.

To do this, once again, feel free to use OpenArt.

First, you need to create images for your scenes.

To do this consistently:

  1. Give it a prompt.
  2. Drop an image of your character for the AI.

Then pick one of the images.

For example, I like this image a lot:

Then go to “Video” and choose “Image to Video”.

Choose one of the AI video generator models, drop your image into the view, give it a prompt, and hit create.

Here’s what one of the clips looks like:

That’s amazing, huh?

Now, all you need to do is repeat Step 4 and Step 5 until you have scenes for all parts of your music video.

Step 6: Edit Them Together

Now that you have your song and scenes, it’s time to put those together.

This is possible with any basic video editing tool.

I recommend doing the following:

  1. First drop the song onto the timeline.
  2. Then drop in the lip-synced videos.
  3. Make sure each lip-synced video lines up with the part of the song it was made from.
  4. Then drop the non-vocal scenes into the places where you think they fit.
  5. Enjoy it!
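Before you start dragging clips around, it can help to know which parts of the song still have no footage. Here is a small Python sketch that lists those gaps, given where each lip-synced clip sits on the timeline. The function name find_gaps and the clip times in the example are made up for illustration; use your own timings.

```python
def find_gaps(song_seconds, lip_sync_clips):
    """Return (start, end) ranges of the song not covered by a lip-sync clip."""
    gaps = []
    cursor = 0.0  # how far into the song we already have footage
    for start, end in sorted(lip_sync_clips):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < song_seconds:
        gaps.append((cursor, float(song_seconds)))
    return gaps


# Hypothetical layout: a 55-second song with two lip-synced clips.
clips = [(0, 12), (20, 35)]
for start, end in find_gaps(55, clips):
    print(f"needs a non-vocal scene from {start:.0f}s to {end:.0f}s")
```

The ranges it prints are the places where the non-vocal scenes from Step 5 go.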

Here’s my video:

Can you imagine that this was made entirely by AI, in just 30 minutes or so?

This is crazy.

About the Expenses

Right now, if you make AI music videos, you’re among the first people on the planet to have one.

But as I showed in this guide, it’s going to take some effort and also cost quite a bit.

Based on my credit usage, I would say that creating this 55-second video cost me roughly $50.

So if you do a full 3–4 minute video, expect to spend hundreds of dollars on it!
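To get a ballpark figure for your own video, you can scale my numbers by length. This sketch assumes the cost grows in proportion to the video length, which is only an approximation, since retries and different models change the credit usage. The function name estimate_cost is hypothetical; the 55 seconds and $50 are the figures from my video above.

```python
def estimate_cost(video_seconds, reference_seconds=55, reference_cost=50.0):
    """Scale a known cost linearly to a different video length."""
    return video_seconds * reference_cost / reference_seconds


for minutes in (3, 4):
    cost = estimate_cost(minutes * 60)
    print(f"{minutes}-minute video: about ${cost:.0f}")
```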

Another option is to wait a year or two until these tools perhaps cost something like $9/month with unlimited creations.

But by then, the AI music space will probably be completely saturated too.

Thanks for reading. I hope you enjoyed it!