Creating AI-Generated Music Videos: A Fun Experiment
I recently experimented with creating AI-generated music videos by combining several AI tools, and published the results in this YouTube playlist. They turned out quite interesting - check out this example:
Here’s a quick overview of the process:
1. Image Generation with Flux-dev
First, I used Flux-dev on Replicate to generate anime-style images. Here’s an example prompt I used:
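Generation can be driven from Python with Replicate's client library. Below is a minimal sketch, not my exact script: `build_flux_input` is a hypothetical helper, and the prompt shown is a placeholder rather than the one I actually used. The API call only runs when a `REPLICATE_API_TOKEN` is configured, since it requires network access and credits.

```python
import os

def build_flux_input(prompt: str) -> dict:
    # Only "prompt" is set here; Flux-dev accepts further parameters
    # (aspect ratio, number of outputs, ...), left at defaults in this sketch.
    return {"prompt": prompt}

# Only call the API when a token is configured (network + paid credits required)
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate

    output = replicate.run(
        "black-forest-labs/flux-dev",
        input=build_flux_input("anime-style illustration of a neon city at dusk"),
    )
    print(output)  # URL(s) of the generated image(s)
```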
2. Music Generation with Suno
For the music, I used Suno to generate the audio tracks. While they don’t officially offer an API yet, there is an unofficial API available (though I haven’t tested it).
3. Image to Video with KLING 1.6
I then used KLING 1.6 to transform the static images into dynamic videos, adding subtle movements and transitions.
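KLING is also available through Replicate, so the image-to-video step can follow the same pattern. This is a hedged sketch: the model slug (`kwaivgi/kling-v1.6-standard`) and the input field names (`prompt`, `start_image`) are assumptions from memory, and `build_kling_input` is a hypothetical helper - check the model page for the actual schema.

```python
import os

def build_kling_input(prompt: str, image_url: str) -> dict:
    # "prompt" and "start_image" are assumed input field names;
    # verify against the model's schema on Replicate before using.
    return {"prompt": prompt, "start_image": image_url}

# Only call the API when a token is configured (network + paid credits required)
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate

    output = replicate.run(
        "kwaivgi/kling-v1.6-standard",  # assumed slug for KLING 1.6
        input=build_kling_input(
            "subtle camera pan, gentle wind in the hair",
            "https://example.com/generated-image.png",
        ),
    )
    print(output)  # URL of the generated video
```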
4. Combining Audio and Video
Finally, I wrote a simple Python script using moviepy to combine the generated audio and video. The script loops the video for the duration of the audio track:
from moviepy.editor import VideoFileClip, ImageClip, AudioFileClip
from fastapi import UploadFile

# ... rest of the code ...

async def create_video(
    self,
    audio_file: UploadFile,
    visual_file: UploadFile,
    bottom_crop_percent: int = 0,
) -> str:
    """
    Create a video from an audio file and either an image or video file.

    Args:
        audio_file: The uploaded audio file
        visual_file: The uploaded image or video file
        bottom_crop_percent: Percentage to crop from bottom (default: 0)

    Returns:
        str: Path to the generated video file
    """
    # (elided) save the uploads to disk, yielding audio_path / visual_path,
    # and derive is_image / output_path from the file extension

    # Load the audio file and measure its length
    audio_clip = AudioFileClip(audio_path)
    audio_duration = audio_clip.duration

    # Create the video clip, looping it if it is shorter than the audio
    if is_image:
        visual_clip = ImageClip(visual_path).set_duration(audio_duration)
    else:
        visual_clip = VideoFileClip(visual_path)
        if visual_clip.duration < audio_duration:
            visual_clip = visual_clip.loop(duration=audio_duration)

    # Combine and save
    final_clip = visual_clip.set_audio(audio_clip)
    final_clip.write_videofile(
        output_path,
        codec='libx264',
        audio_codec='aac',
    )
    return output_path
    # ... rest of the code ...
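The looping step above boils down to simple duration arithmetic. As a standalone illustration (with a hypothetical helper name), this is how many times a clip must repeat to cover the audio:

```python
import math

def loops_needed(video_duration: float, audio_duration: float) -> int:
    # Number of times the video must play (at least once) to cover the audio
    return max(1, math.ceil(audio_duration / video_duration))

print(loops_needed(12.0, 30.0))  # a 12s clip loops 3 times for a 30s track
print(loops_needed(30.0, 12.0))  # a clip longer than the audio plays once
```

moviepy handles this internally when you call `loop(duration=...)`; the helper just makes the underlying math explicit.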
Reflections and Future Improvements
It’s striking how easy it’s becoming to create cool things with AI, without needing any ML skills, just basic programming skills.
Regarding the experiment, an interesting next step would be to automatically generate image prompts based on song lyrics. This could create a more cohesive narrative in the music videos. The process would look something like this:
- Split lyrics into time-based chunks
- For each chunk, generate:
- Text-to-image prompts for scene composition
- Image-to-video prompts for camera movements and transitions
- Use these prompts to create a sequence of scenes that match the song’s narrative
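The chunking step above could be sketched roughly like this. Everything here is hypothetical: the helper name, the window size, and the hard-coded prompt templates (a real pipeline would likely ask an LLM to write the prompts from each chunk).

```python
def chunk_lyrics(timed_lines, window=10.0):
    """Group (seconds, lyric_line) pairs into fixed-length time windows,
    producing one prompt pair per window."""
    buckets = {}
    for t, line in timed_lines:
        buckets.setdefault(int(t // window), []).append(line)
    return [
        {
            "start": k * window,
            "text": " ".join(lines),
            # Placeholder templates; an LLM could generate these instead
            "image_prompt": f"anime-style scene: {' '.join(lines)}",
            "video_prompt": "slow camera pan, subtle motion",
        }
        for k, lines in sorted(buckets.items())
    ]

chunks = chunk_lyrics([(0.0, "city lights"), (4.0, "fading fast"), (12.0, "alone again")])
for c in chunks:
    print(c["start"], "->", c["image_prompt"])
```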
This could lead to even more engaging and contextually relevant music videos!
Feel free to check out the example videos and let me know what you think!