January 26, 2022
BEIJING – The Beijing Winter Olympics will kick off on Feb 4. An AI TV anchor fluent in sign language will broadcast the events live on a CCTV app, ensuring that Chinese audiences with hearing difficulties can enjoy the Games.
The AI sign language anchor, supported by the world’s largest sign language database with some 200,000 entries, is expected to bring “warmth” to its audience through precise, real-time interpretation.
The database was developed by a team of leading faculty members and students with hearing difficulties from the Technical College for the Deaf at Tianjin University of Technology, the country’s top college devoted to those with hearing difficulties.
Yuan Tiantian, vice-dean and professor at the college, discussed the technologies with China Daily reporter Yang Cheng, along with her insights on boosting the country’s support for people with hearing difficulties.
Following are excerpts from the interview:

Q: The country’s first AI sign language anchor will serve viewers during the Winter Olympics and the Winter Paralympics, and the world’s biggest corpus behind it was created by your team. Could you explain the key technologies of the database? Over the past six years of research, what difficulties did you face? Were there any shining moments you could share with us?
Yuan: Let me first explain two basic technologies – “sign language recognition” and “sign language generation” – the two key concepts I use to introduce this field to outsiders.
“Sign language generation” refers to the technology that produces sign language for the audience.
Our technology, however, is in the area of sign language recognition – recognizing sign language, with its particular word order, from people with hearing difficulties, and then converting it into the word order of written Chinese.
Here I have to stress that their word order is different from ours. For example, they place the predicate at the end of the sentence, while in modern Chinese we put it in the middle, between the subject and the object.
When they write down messages, their word order can therefore sometimes mislead us.
So the AI anchor needs to express itself in the audience’s word order.
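The word-order transformation Yuan describes can be sketched as a toy reordering (a minimal illustration only: the subject-object-predicate assumption and the token examples are ours, not data or code from the team):

```python
# Toy sketch of the word-order transformation: the sign language order is
# assumed here to be subject-object-predicate (predicate last), which is
# rearranged into subject-predicate-object for written Chinese.
# The example tokens are invented, not drawn from the team's corpus.

def sov_to_svo(tokens):
    """Move the final token (the predicate) between subject and object."""
    if len(tokens) < 3:
        return list(tokens)  # nothing to reorder
    subject, *objects, predicate = tokens
    return [subject, predicate, *objects]

print(sov_to_svo(["I", "rice", "eat"]))  # ['I', 'eat', 'rice']
```

Real sign languages are far richer than a single reordering rule, which is part of why the recognition problem is hard.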
The sign language recognition and generation technologies complement each other.
We handle the sign language recognition part: we process information expressed in the word order of people with hearing difficulties and transfer it into ours.
Sign language generation then converts our word order back into theirs, driving the AI anchor to display the correct sign language.
The international research community has concluded that sign language recognition is much more difficult than generation across the various technologies involved.
Our process involves transforming video of sign language movements into text in the signers’ word order.
We analyze at least three channels: facial expression, limbs and hand gestures.
We also conducted research on action recognition and intention analysis. A limb has 18 points to analyze, a hand 21 points, and a face more than 100 points. All of these challenge AI and algorithms.
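The three channels and their point counts can be tallied in a simple sketch (illustrative only: the two-hand assumption, the dictionary layout and the per-frame feature arithmetic below are our own, not the team’s actual pipeline):

```python
# Illustrative tally of the keypoint counts Yuan cites:
# 18 limb points, 21 points per hand, and "more than 100" facial points
# (100 is used here as a lower bound). Assuming two hands is our addition.

KEYPOINTS = {
    "limb": 18,        # body/limb skeleton points
    "hand": 21 * 2,    # 21 points per hand, two hands assumed
    "face": 100,       # more than 100 facial points; 100 used here
}

def features_per_frame(coords_per_point: int = 2) -> int:
    """Size of a raw feature vector for one video frame,
    assuming (x, y) coordinates for every keypoint."""
    total_points = sum(KEYPOINTS.values())
    return total_points * coords_per_point

print(features_per_frame())  # 160 points x 2 coords = 320 features per frame
```

Even this lower-bound count of 160 points per frame, tracked across every frame of a video, hints at why the recognition side is computationally demanding.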
Many domestic and international companies hoped to join forces with us, but when they discovered that sign language recognition is much more difficult than sign language generation, they retreated. Voice recognition, by comparison, is easier.
As for the shining moments in our studies, what I really want to express is that it was the support and contributions of a great number of people with hearing difficulties that impressed us all.
They contributed their sign language to build up the database, turning it into the world’s largest and helping it keep growing.
Since older people prefer sign language, while the younger generation tends to prefer voice recognition, seniors made great contributions to the database.
When we told them we needed to build it, many elderly people, despite their age and health conditions, enthusiastically demonstrated their sign language for us…we have been extremely moved.
It was the kindness and warmth of this group of enthusiastic people, who volunteered to provide their sign language across different fields, including some very kind elderly people, that strengthened our confidence and gave us the strength to push ahead with these challenging studies.

Q: Could you tell us about the advantages of the AI anchor? How does it differ from a human sign language interpreter? Why was an AI-supported TV anchor chosen for the upcoming Winter Olympics?
Yuan: The AI TV anchor can translate long texts continuously, with minimal loss of information.
Research indicates, and sign language users agree, that human sign language interpreters can lose some information.
But an AI anchor backed by our database can overcome this major barrier.
Statistics indicate that in specific scenarios, the accuracy of sign language recognition can exceed 97 percent, while that of sign language generation can be even higher.
CCTV announced in November 2021 that it hoped to use the AI anchor in its TV broadcasts.
To learn the particular language of journalism, we studied the TV sign language anchor on the program Common Interest.
We then teamed up with CCTV to broadcast on its app. The broadcast received widespread recognition, so CCTV hoped to feature the AI anchor on its app again during the Winter Olympics.
We also upgraded our database for the Winter Olympics’ particular scenarios.
The database is constantly being upgraded.

Q: As far as we know, your college has set an ambitious target of expanding the corpus to 1 million entries, and after the Winter Olympics the research team expects the technology to better serve the medical and transportation sectors. Is this correct?
Yuan: Yes. We expect it to help deaf people with shopping, transportation and other aspects of daily life.
For example, during the fifth World Intelligence Congress in Tianjin last year, many visitors were attracted by our exhibits…some said they hoped to apply the technology in shopping malls.
The live broadcast of the Winter Olympics is by no means our only goal.
As teachers of this group at the college, we have profound insight into their needs, and we will never cease our efforts.
Some research institutions can deliver results very quickly, but we are not that kind…in fact, we need much more time to obtain results.
Q: Are there similar research results in Western countries? What advantages and disadvantages does your team’s research have?
Yuan: I think computer vision (CV) plus natural language processing (NLP) is among the hottest areas in global research circles.
Amid the fourth industrial revolution, AI-supported language recognition has seen rapid development.
We all hope the AI technologies we are developing can work 24/7 to help people in need, for whom AI development is shaping a new future.
Compared with Western countries, our technologies still lag behind and we face a funding shortage.
However, we have greater dedication to the people who need us.
We now have a team of 200 to 300 members working on sentence transformation between the two languages for the Winter Olympics project, and we expect the number to grow.
Our computer science team comprises up to 50 members from our own college and the university’s computer science college, all devoted to helping this group.
We also hope more companies will join forces with us to support disabled people.