Hi, welcome to Build it yourself.
In this tutorial, we will build an AI video generation tool.
With this tool, you can create talking-head videos: the AI takes an audio clip and lip-syncs the face in your video to it. Look at this example video.
Disclaimer: This tool creates synthetic media, so please use it judiciously.
You need two things before you start:
- A base video – record yourself for 10-15 seconds without speaking (.mp4 format)
- An audio clip of the voice you want to use (.wav format)
Note: don’t use spaces in your file names. For example, if the file name is “my audio.wav”, change it to “my-audio.wav”.
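If you have several files to rename, the rule above can be sketched in a few lines of Python. This is only an illustration: the function name safe_file_name is made up for this example, and it expects a bare file name rather than a full path.

```python
import re
from pathlib import Path


def safe_file_name(name: str) -> str:
    """Return the file name with each run of whitespace replaced by one hyphen."""
    path = Path(name)
    # Only the stem is rewritten; the extension (for example .wav) is kept as-is.
    stem = re.sub(r"\s+", "-", path.stem.strip())
    return stem + path.suffix


print(safe_file_name("my audio.wav"))  # my-audio.wav
```

A name that has no spaces comes back unchanged, so it is safe to run on every file.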
If your video is not in .mp4 format, you can convert it with any free tool available on the internet.
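One such free tool is ffmpeg, a command-line converter. The sketch below is an assumption on my part, not something the notebook requires: it builds an ffmpeg command for a hypothetical file called my-video.mov and runs it only if ffmpeg is installed and the file exists.

```python
import shutil
import subprocess
from pathlib import Path


def build_mp4_command(source: str) -> list[str]:
    """Build an ffmpeg command that re-encodes `source` to an .mp4 file next to it."""
    if Path(source).suffix.lower() == ".mp4":
        raise ValueError("The file is already in .mp4 format")
    target = Path(source).with_suffix(".mp4")
    return [
        "ffmpeg",
        "-i", source,       # input file in any format ffmpeg can read
        "-c:v", "libx264",  # H.264 video, widely supported in .mp4
        "-c:a", "aac",      # AAC audio
        str(target),
    ]


# "my-video.mov" is a placeholder name; replace it with your own file.
command = build_mp4_command("my-video.mov")
print(" ".join(command))

# Run the conversion only if ffmpeg is installed and the input file exists.
if shutil.which("ffmpeg") and Path("my-video.mov").exists():
    subprocess.run(command, check=True)
```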
AI video generation tool
Once you have both items ready, you need to go to this Google Colab Notebook.
Colab is an online notebook where you can run code in the browser and see the output.
Google Colab also provides a Tesla T4 GPU in its free plan, so it’s very easy to run AI models in these notebooks.
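If you want to confirm that a GPU is actually attached before you start, a check along these lines can be run in a notebook cell. It is a minimal sketch that assumes the nvidia-smi command is present on GPU runtimes; the function name gpu_summary is made up for this example.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return the GPU name(s) reported by nvidia-smi, or a message if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return "No NVIDIA GPU tools found"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return "nvidia-smi is installed but no GPU is visible"
    return result.stdout.strip()


print(gpu_summary())
```

If no GPU shows up, check the runtime type in Colab's Runtime menu.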
Step 1 – Installation
In this step, you will download and install all the dependencies that are required to run the AI model.
Click the play icon next to the cell to run it. It will take around 12 minutes to install all the requirements.
Step 2 – Add the pre-trained AI model
AI is all about models, and each model is trained to do a specific task. For example, a text-to-speech model turns text into audio.
Here we will use a model that syncs the audio to a face in a video.
In this step, we will download the pre-trained model. It will take around 40 seconds to do so.
Step 3 – Upload your video and audio
In the left sidebar you will see a folder icon. Click on it.
It will show 2 folders – sample_data and video-retalking. Click on the arrow next to video-retalking to see its sub-folders.
Click on examples. It further has 2 subfolders – audio and face.
Click on audio and you will see a couple of existing audio files. To add your file, click on the 3 dots on the right side. The first option is Upload.
Upload the file from your computer. It may take a few minutes, depending on the size of the file. I suggest you keep the duration to 15 seconds.
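To check the length of a .wav file before uploading it, Python's built-in wave module is enough. The sketch below is only an illustration: it writes a 2-second silent clip to a temporary folder so that it runs on its own, and the names wav_duration_seconds and MAX_SECONDS are made up for this example. Point the function at your own file instead.

```python
import os
import tempfile
import wave

MAX_SECONDS = 15  # the duration suggested in this tutorial


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Create a 2-second silent demo clip (mono, 16-bit, 16 kHz).
demo_path = os.path.join(tempfile.mkdtemp(), "my-audio.wav")
with wave.open(demo_path, "wb") as demo:
    demo.setnchannels(1)
    demo.setsampwidth(2)
    demo.setframerate(16000)
    demo.writeframes(b"\x00\x00" * 16000 * 2)

duration = wav_duration_seconds(demo_path)
verdict = "OK" if duration <= MAX_SECONDS else "consider trimming"
print(f"{duration:.1f} s - {verdict}")
```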
Similarly, click on face and you will see a few existing video files. To add your file, click on the 3 dots on the right side. The first option is Upload.
Upload the file from your computer. It will take slightly longer than uploading an audio file.
Step 4 – Select your files
Once you have uploaded both files, it’s time to select them. Run the next cell and it will show 2 dropdowns: one to choose the video and the other to choose the audio.
You know what to do – select your files.
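If your file does not appear in a dropdown, it can help to list what is actually in the two folders. The sketch below is not the notebook's own code. It assumes the folder paths match the sidebar layout described in Step 3, and list_media is a name made up for this example.

```python
from pathlib import Path


def list_media(folder: str, extension: str) -> list[str]:
    """Return the sorted names of files in `folder` that end with `extension`."""
    directory = Path(folder)
    if not directory.is_dir():
        return []
    return sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == extension
    )


# Folder paths follow the sidebar layout in this tutorial; adjust them if yours differ.
videos = list_media("video-retalking/examples/face", ".mp4")
audios = list_media("video-retalking/examples/audio", ".wav")
print("Videos:", videos)
print("Audio:", audios)
```

An empty list usually means the upload went to a different folder or the file has a different extension.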
Step 5 – Check your files
Once you have selected both files, you can preview them to confirm that they were uploaded successfully.
Run the cell and it will display both your video and audio files.
It’s time for real action.
Step 6 – Action
Now we will run the model and pass our input audio and video to it. The model lip-syncs the input video to the audio and gives us an output video.
The process takes some time, so you can take a short tea or lunch break after running the cell.
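For the curious, the cell most likely boils down to one command that receives the two files you selected. The sketch below only builds and prints such a command. It is an assumption, not the notebook's code: the script name inference.py, the flags --face, --audio and --outfile, and the folder layout should all be checked against the actual cell.

```python
from pathlib import PurePosixPath


def build_inference_command(video_name: str, audio_name: str, output_path: str) -> list[str]:
    """Build the command line for one lip-sync run (nothing is executed here)."""
    # ASSUMPTION: script name, flags and folders are illustrative; compare with the notebook.
    return [
        "python3", "inference.py",
        "--face", str(PurePosixPath("examples/face") / video_name),
        "--audio", str(PurePosixPath("examples/audio") / audio_name),
        "--outfile", output_path,
    ]


# Placeholder file names; the output folder name is the one given in this tutorial.
command = build_inference_command("my-video.mp4", "my-audio.wav", "result/output.mp4")
print(" ".join(command))
```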
Step 7 – Check the output
Once the model has finished, the cell will display the output so you can check it.
You can download it from the “result” folder in the left sidebar. The file is named output.mp4.
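You can also check for the file and download it from a cell instead of the sidebar. This is a minimal sketch: the relative path result/output.mp4 is an assumption based on the folder and file names above and depends on the notebook's working directory, and describe_output is a name made up for this example. The download helper from google.colab only exists inside Colab, so the import is guarded.

```python
from pathlib import Path


def describe_output(path: str) -> str:
    """Report whether the output file exists and how large it is."""
    output = Path(path)
    if not output.is_file():
        return f"{path} not found yet"
    size_mb = output.stat().st_size / (1024 * 1024)
    return f"{path} is ready ({size_mb:.1f} MB)"


# ASSUMPTION: adjust the path to wherever your notebook writes the result.
OUTPUT_PATH = "result/output.mp4"
print(describe_output(OUTPUT_PATH))

try:
    from google.colab import files  # only importable inside Colab
except ImportError:
    files = None

# Trigger a browser download when running in Colab and the file exists.
if files is not None and Path(OUTPUT_PATH).is_file():
    files.download(OUTPUT_PATH)
```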
You just used a generative AI model to create a talking head video.
Pretty cool, right?
Unleash your creativity and build your own AI video generation tool.
Thanks for reading to the end.
If you want me to simplify any AI topic, please let me know on LinkedIn – https://www.linkedin.com/in/nikhileshtayal/