I’ve recorded thousands of hours of video over the years of my career and I can tell you that it takes a lot of preparation, work and energy. I can also tell you that if you use an AI avatar video generator like HeyGen, you hardly need any of the above, and that scares the hell out of me.
With the advent of high-quality generative video, these AI video avatars are popping up everywhere. I haven't paid much attention to them, mainly because I like being in front of the camera and have done it for TV and social video. Still, I know that not everyone likes the spotlight and would happily hand the duties over to an avatar, and when I got a glimpse of the apparent quality of HeyGen's avatars, I was intrigued enough to give it a try. Now I honestly wish I hadn't.
HeyGen, which you can use on mobile or desktop, is a simple and powerful platform for creating AI avatars that can speak to the camera for you, based on scripts you supply. They’re useful for video presentations, social media, interactive avatars, training videos, and basically anything where an attractive human face can help sell the topic or information.
HeyGen lets you create digital twins that can appear in relatively static videos or videos where the avatar is on the move. For my test, I chose the 'Still' option.
Creating a different me
There are some rules to creating your avatar and I think following them as I did resulted in the slightly off-putting quality of my digital twin.
HeyGen recommends that you start the process by recording a video of yourself with a professional camera or one of the best smartphones; either way, the video must be at least 1080p. If you use the free version like I did, you'll notice that the final videos are only 720p. Upgrade later and you can start producing full HD video avatars (more on the pricing structure later).
There are even more guidelines, like using a “nice background,” avoiding harsh shadows and background noise, and a few that are essential for selling the digital twin version of you. HeyGen asked that I look directly (but not creepily, I assume) at the camera, make normal (open to interpretation) gestures below chest height, and take pauses between sentences. The last bit is actually good advice for making real videos. I have a habit of stream-of-consciousness speaking and forget to pause and create clear soundbites for editing.
Here, however, the pauses aren't about what you say, at least not in the training video. They seem to help the system learn how your face and mouth behave when you're talking and when you're not.
In any case, I could say whatever I wanted to the camera, as long as the recording lasted at least two minutes. More footage improves the quality of the videos your avatar later generates.
Training to be me
I set up my iPhone 16 Pro Max and a few lights and filmed myself in my home office talking nonsense for two minutes, all the while making sure to take one-second pauses and keep my gestures from getting too wild. After dropping the file onto my MacBook Air, I uploaded the video. At that moment it became clear that, as a non-paying user, I was transferring almost all rights to the video to HeyGen. Not optimal at all, but I wasn't about to pay $24 a month for the basic plan just to take back control of my image.
The HeyGen system took a long time to process the video and prepare my digital twin. Once it was done, I was able to create my first three-minute video. Paying customers can create videos of five minutes or longer, depending on the service level they choose. Paying also gets you faster video processing.
To create a video, I selected the video format: portrait or landscape. I shot my training video in portrait, but that didn't seem to matter. I also had to provide a script, which I could type or paste into a field that accepts up to 2,000 characters.
For someone who writes for a living, I struggled surprisingly hard with the script, and I eventually settled on a short monologue from Hamlet. After checking the script length, the system went to work and slowly generated my first HeyGen Digital Twin video. I accidentally left a few blank spaces at the end of my script, and as a result, about half of the video is the digital me silently vamping for the camera. It's disturbing.
Nothing is real
I followed this up with a tight TikTok video revealing that the video viewers were watching wasn't actually me. In my third video, the last of my free monthly allotment, I told a joke: "Have you ever played casual tennis? It's the same as regular tennis, but without the racket. Ha ha ha ha ha ha ha ha!" As you might have guessed, the punchline doesn't really land, and because my digital twin never smiles and "laughs" in a completely humorless way, none of it is even remotely funny.
In all these videos, I was struck by the audio quality. It has the essence of my voice, but it's also not my voice: too robotic and emotionless. At least it's well synchronized with the mouth. The visuals, on the other hand, are almost perfect. My digital twin looks just like me, or at least like a very emotionless version of me who loves Tim Cook keynote-style hand gestures. To be honest, I didn't know what to do with my hands when I originally recorded my training video, because I was afraid that if I didn't rein in my often wild gestures, they would look bizarre on my digital twin. I was wrong. This overly controlled twin is the bizarre one.
Just no
On TikTok, someone wrote: “Nobody likes this. Nobody wants this.” When I posted the video to Threads, reactions ranged from shock to dismay. People noticed my “distracting” hand gestures, called it “creepy,” and worried that such videos represented the “death of truth.”
But here's the thing. While the AI-generated video is concerning, there's nothing in it that I didn't write or copy and paste. Yes, my digital twin is way past creepy and deep into unnervingly accurate, but at least it does what I want. The real concern is this: if you have a good two-minute video of someone else speaking, can you upload it and have them say whatever you want? Possibly.
HeyGen gets credit for creating an effective, hassle-free digital twin video generator. It's far from perfect, though, and could be vastly improved if users could also train it on emotions (the right looks for 'funny', 'sad', 'angry', you get the idea) and a wider variety of facial expressions (a smile or two would be nice). Until then, these digital twins will remain our emotionless doppelgängers, waiting to do our video bidding.