Race heats up for Hangeul AI

From Naver to Google, global tech giants are working on Korean-language AI models to take advantage of the K-culture boom.

Jie Ye-eun

The Korea Herald

June 19, 2023

SEOUL - The following series is part of The Korea Herald’s “Hello Hangeul” project, which consists of interviews, in-depth analyses, videos and various other forms of content that shed light on the stories of people who are learning the Korean language and the correlation between Korea’s soft power and the rise of its language within the league of world languages. – Ed.

Amid the immense popularity of Korean-made content from music to TV shows, an increasing number of people around the world are eager to learn Hangeul, the Korean alphabet. And the world’s most sophisticated artificial intelligence models are no exception.

Korea itself may not be a sizable market, but the potential of Hangeul and related AI services seems almost unlimited, according to experts here.

“Korean content has played a crucial role in promoting the value of Korean culture and language to the world, prompting renewed interest in Hangeul among AI developers around the world,” said Kim Se-hyun, a technical director of the Korea Artificial Intelligence Association.

Park Jin-ho, a Korean language and literature professor at Seoul National University, also offered a rosy outlook for Korean-language AI services.

“In line with the Korean culture boom, their fans around the world would turn to the services to better understand Hangeul,” Park said.

Experts predicted that Korean-made AI models will be widely used in Southeast Asia, China and Japan, as those countries are geographically closer to Korea and their people are more interested in Korean culture. But they agreed that much of the technological progress could happen in the US, the world’s largest AI market, which means fierce competition among AI developers at home and abroad.

Fiercer competition

An increasing number of big tech companies are investing in learning Hangeul, as the demand for Korean-language-based AI services grows larger and faster.

Last month, US tech giant Google chose Korean and Japanese as the first foreign languages for its AI-based chatbot Bard in a bid to renew its competition with ChatGPT, backed by its archrival Microsoft. It was an unexpected announcement considering Korea is one of the few markets where Google is not a dominant search engine. In a country with 51 million people, Google’s market share stands at about 30 percent.

During the developers’ day event, Google CEO Sundar Pichai explained that Korean was the most appropriate language for its program development, saying “From the point of view of an English speaker, Korean and Japanese are quite difficult.”

German AI firm DeepL also launched a Korean translation service in January, making Korean the company’s 31st language and its fourth Asian language after Chinese, Japanese and Indonesian. The company also focused on the potential of the Korean language.

“We’ve been getting a lot of requests for Korean language support … We were surprised to see more-than-expected interest from users,” DeepL founder and CEO Jarek Kutylowski said during a press conference in Seoul last month.

Industry watchers say it is very natural for global companies to seek business expansion in Korea by taking advantage of the recent K-culture boom. But they remained skeptical about their commercial success here.

“Korea is one of the numerous markets where they offer services. It will be challenging for them to develop AI models more specifically designed for the Korean language,” an industry official said on condition of anonymity.

Boosting home advantage

Korean tech giants Naver and Kakao are also going all-out to secure a competitive edge against their global rivals by developing more advanced AI models that outpace ChatGPT when it comes to Korean language capabilities.

Naver, the operator of the nation’s No. 1 web portal, plans to launch its hyperscale AI model, called HyperClova X, this summer. It will be the third of its kind after those in the US and China and the largest one specialized in Hangeul, the company said.

HyperClova X has been trained in both Korean and English, but its biggest strengths are its ability to better understand social and cultural context and its linguistic edge, which comes from the sizable database compiled by Naver, the nation’s dominant portal site.

“We pin hopes on building our AI-integrated service ecosystem in regions such as Japan, Southeast Asia and the Middle East, where our Hangeul-based services are already in high demand,” a Naver official said. “The growth potential seems adequately high considering the Hallyu craze.”

Kakao, the operator of South Korea’s No. 1 messenger app KakaoTalk, also plans to unveil the upgraded version of its Korean-language AI model, KoGPT, in the third quarter. The company’s AI model, developed by its AI unit, Kakao Brain, has been primarily trained on Korean texts. The company said it boasts competitiveness in communicating in the Korean language more efficiently and accurately.

Tricky language to learn

Experts agree that Korean is a tricky language even for AI models to learn because its grammar structure is very different from that of English, the most common language in AI research.

The Korean language capabilities of global AI models have drastically improved in recent years, especially for popular services like translation, summarization and answering simple questions. But they still lag behind their English-based services.

“The Korean language is grammatically different from other languages. Sentence structures and expressions are considered more complicated. Learning the language is also linked to the understanding of Korea’s unique culture and characteristics,” said Kim of the AI association.

Currently, most generative AI models, which are mainly used in English-speaking countries, rely on tokenization, a technique that breaks a piece of text into smaller units called tokens, such as words, characters or subwords. While it may be a suitable system for English, it is not completely applicable to the Korean language, he added.

That’s why Naver and Kakao have decided to develop their own tokenization methods suited to Korean morphemes to overcome these limitations, according to Park, the SNU professor.
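To make the idea of token granularity concrete, the short Python sketch below splits one Korean phrase three ways: by whitespace, by syllable block, and by individual letters (jamo). It is only an illustration and does not represent the methods used by Naver, Kakao or any other company named here. The function names and the sample phrase are hypothetical, and the letter-level split relies on standard Unicode NFD normalization rather than on real morpheme analysis.

```python
import unicodedata


def word_tokens(text):
    """Coarsest split: one token per whitespace-separated chunk."""
    return text.split()


def char_tokens(text):
    """One token per Hangeul syllable block, with spaces dropped."""
    composed = unicodedata.normalize("NFC", text)
    return [ch for ch in composed if not ch.isspace()]


def jamo_tokens(text):
    """Finest split: decompose each syllable block into its letters (jamo).

    Unicode NFD normalization maps a precomposed Hangeul syllable to its
    sequence of conjoining jamo code points.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return [ch for ch in decomposed if not ch.isspace()]


sample = "한글을 배워요"  # hypothetical sample phrase: "(I) learn Hangeul"

print(len(word_tokens(sample)))  # 2 whitespace-separated chunks
print(len(char_tokens(sample)))  # 6 syllable blocks
print(len(jamo_tokens(sample)))  # 15 individual letters
```

In this sample, the first chunk, “한글을,” fuses the noun “한글” with the object particle “을,” so a whitespace split never sees the noun on its own. That is one simple reason a tokenizer tuned for English word boundaries may fit Korean poorly.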

“It is crucial to fully understand the characteristics of a language to develop a new AI model, not to mention the Korean language, which has many irregular predicate elements,” the professor said. “I hope that local firms can come up with a successful Korean-language AI model, so it can be shared with other non-Korean speaking firms.”
