
Soul App Getting Ready to Use Realistic Avatars for Social Interaction

AI-powered video animation is generally considered the domain of content creation and publishing platforms. Most social networking apps haven’t considered the potential of video animation, more specifically interactive avatars, in digital connections. But Soul App, a popular social networking platform, intends to change the status quo.

A sought-after social networking app in China, Soul recently presented its research, titled “Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation,” at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025.

Outside the realm of hardcore content creation, talking head animation is still in its infancy. At this time, Replika, a chatbot app, is one of just a handful of companies using live avatars for one-on-one conversations with an AI. But at best, Replika’s animations can be termed rudimentary, as the emphasis is on the interaction more than on the visuals.

In comparison, Soul App is going for high-fidelity talking head animation that can accurately capture facial expressions and body movements, including accessory movements. So, Soul is an outlier in this sense, but then the company has been doing things differently from the get-go.

Although it is a social networking platform, Soul does not rely on offline connections to build the user’s online circle. Instead, the app uses common interests as the central theme. This created the need for a system that could analyze data and come up with relevant matches based on it.

Soul’s management quickly gauged how artificial intelligence could become an integral part of this process. Soon enough, the company debuted its homegrown Lingxi engine that compiled user responses and content preferences to connect them with like-minded individuals on the platform.

This first tryst with AI motivated Soul App’s team to explore the technology further. In recent years, the social networking platform has added speech synthesis, music generation, and voice call models to its already impressive portfolio of AI tools. But Soul App’s most notable release has been its self-developed large language model, Soul X, launched in 2023.

All of these were milestones in the journey towards offering Soul’s users AI-powered lifelike personas, multimodal understanding, and multilingual communication. The idea was to allow users to enjoy interactions with people who shared their interests, as well as human-like emotional companionship and interactions with AI entities.

Late in 2024, Soul App debuted its end-to-end voice call large model that enabled users to have realistic voice calls with AI partners. So, the next step was naturally to give these AI-powered entities a visual avatar.  Although the progression is logical, the question remains: why the interest in talking head animation?

Thus far, these high-fidelity animations have only been seen in video content used in academia, customer support, employee training, and brand building. 

However, a study conducted to understand how talking heads impact the quality and efficacy of instructional videos found that these captivate user attention by increasing social presence (the feeling of being spoken to/tutored personally). There is also some evidence to suggest that when used for brand building, talking heads instill confidence and trust by humanizing the brand.

So, it can be postulated that when used in social networking, realistic talking head animations would enhance user engagement and make for more emotionally fulfilling interactions. That said, it’s also important to state here that Soul App has steadfastly pursued its goal of “Social + AI” over the last few years.

It was never about replacing human connections. Instead, Soul App wanted AI to work as a catalyst for better human-human interactions, and even as an interlocutor of sorts, while always being available for one-on-one chats.

Soul’s CTO, Tao Ming, captured both the company’s goals and the expected impact of realistic talking head animation when he said that, given the complexity of communication, face-to-face human interaction is crucial not just for exchanging information but for communicating effectively.

He added that for human-computer interactions to be equally effective, they need to adopt a similar approach. Soul App’s research presented at CVPR 2025 is a distinctive step in this direction. For starters, it competently tackles some of the prevailing issues with existing diffusion models.

These are notorious for their extensive use of computational resources, which slows down the output. Furthermore, despite their heavy compute demands, they fall short when it comes to accurately capturing and presenting facial expressions and body movement.

To deal with these issues, Soul App’s researchers reverse-engineered the key steps of diffusion-based models. These were then restructured with the inclusion of one-step diffusion techniques and large language models. This resulted in the splitting of talking head tasks into two distinct modules:

  1. FMLG (Facial Motion Latent Generation): An autoregressive language model captures facial expressions, leveraging large-scale learning and efficient diversity sampling to generate accurate and varied facial motion.
  2. ETM (Efficient Temporal Module): A one-step diffusion process creates realistic body muscle and accessory movements.

Together, these two modules not only deliver exceptional video generation efficiency but also offer conspicuous improvements in the depiction of facial-body coordination, micro-movements, and naturalness. Simply put, the animation can react quickly and in a more human-like manner than was ever possible.
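The two-module split described above can be illustrated with a toy sketch. Everything here is hypothetical: the function names, latent dimensions, and weights are stand-ins for illustration, not the paper's actual architecture. The point is the shape of the pipeline: an autoregressive stage generates facial-motion latents frame by frame from audio features, and a single-pass refinement stage (standing in for one-step diffusion) then adjusts the whole sequence at once, rather than running many iterative denoising steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual latent sizes are not public here.
AUDIO_DIM, MOTION_DIM, FRAMES = 32, 16, 8

def fmlg_step(audio_feat, prev_motion, W_a, W_m):
    """Autoregressive facial-motion step: the next latent depends on the
    current audio feature and the previously generated motion latent."""
    return np.tanh(audio_feat @ W_a + prev_motion @ W_m)

def etm_refine(motion_seq, W_r):
    """One-step 'diffusion-style' refinement: a single correction pass
    over the whole sequence instead of many iterative denoising steps."""
    noise_estimate = np.tanh(motion_seq @ W_r)
    return motion_seq - 0.1 * noise_estimate

# Toy random weights and a fake audio stream stand in for learned parameters.
W_a = rng.standard_normal((AUDIO_DIM, MOTION_DIM)) * 0.1
W_m = rng.standard_normal((MOTION_DIM, MOTION_DIM)) * 0.1
W_r = rng.standard_normal((MOTION_DIM, MOTION_DIM)) * 0.1
audio = rng.standard_normal((FRAMES, AUDIO_DIM))

# Module 1 (FMLG analogue): generate facial-motion latents frame by frame.
motion = np.zeros((FRAMES, MOTION_DIM))
prev = np.zeros(MOTION_DIM)
for t in range(FRAMES):
    prev = fmlg_step(audio[t], prev, W_a, W_m)
    motion[t] = prev

# Module 2 (ETM analogue): refine the full sequence in one pass.
refined = etm_refine(motion, W_r)
print(refined.shape)  # (8, 16)
```

The contrast in the sketch mirrors the efficiency argument: the autoregressive loop runs once per frame (streaming-friendly), while the refinement touches the sequence only once, which is what distinguishes one-step diffusion from the many-step sampling that makes conventional diffusion models slow.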

Soul is known to rapidly roll out its AI accomplishments on the platform by integrating them into the app’s features. At this time, AI is being used to power several of Soul App’s features, including AI Partners, the gaming scenario Werewolf Awakening, and others.

As far as “Teller Talking Head” research is concerned, it will likely be made a part of Soul X, which will take the model’s multimodal capabilities to another level. When it comes to usage scenarios, there is a very good possibility that talking heads will become a part of Soul App’s virtual companion feature in the near future. After all, giving users the feeling of talking to a virtual being that can emote and respond as well as the regular John or Jane has been the goal all along! 
