GPT takes another giant step forward

GPT creates new possibilities and GPT-4 takes a new giant step forward by being able to recognise images and analyse them, says the writer.

Zhang Zhouxiang

China Daily


A response by ChatGPT, an AI chatbot developed by OpenAI, is seen on its website in this illustration picture taken Feb 9, 2023. [Photo/Agencies]

March 21, 2023

BEIJING – On March 15, OpenAI released its latest technology, GPT-4, which it says marks major progress compared with the previous GPT-3 in three areas: creativity, visual input and longer context, with the amount of text it can take as input expanding from 3,000 to 25,000 words.

This represents another advance for GPT, the acronym for Generative Pre-trained Transformer, a deep learning technology that uses artificial neural networks. Ever since the first computer was invented, making machines understand human intentions has been the major challenge for computer engineers and software developers. Each innovative solution produces a new generation of machines.

The first programmable general-purpose electronic digital computer, the Electronic Numerical Integrator and Computer (ENIAC), was developed by the United States during World War II. It required scientists to insert paper clips into holes. When the disk operating system was invented, users had to type in commands. Now the mainstream input method is simply to click on-screen buttons. Over the past eight decades, computers have advanced so much that they can now create a virtual world for users, but progress in input has been so slow that audio input can still only be used as an auxiliary method. Even audio input did not emerge until the fast development of AI technology over the past 10 years.

Now GPT creates new possibilities. GPT technology works by having a machine learn from large amounts of human language so that it can model and understand it, and GPT-3 is already adept at understanding text. A simple test is to tell ChatGPT to “calculate the root of the age of the host of The Daily Show”. To do so, it must determine who hosts the program, find his age, then do the calculation.
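The three steps in that test can be made concrete with a minimal sketch. It does not show how GPT works internally; it only spells out the chain of lookups the request implies. The function name, the lookup tables and every value in them are hypothetical placeholders, and "root" is assumed to mean square root.

```python
import math

# Hypothetical lookup tables standing in for facts the model would have to recall.
# The show, the host and the age are made-up placeholders, not real data.
SHOW_HOSTS = {"Example Show": "Example Host"}
HOST_AGES = {"Example Host": 49}


def root_of_host_age(show: str) -> float:
    """Resolve the request in the three steps the text describes."""
    host = SHOW_HOSTS[show]   # step 1: determine who hosts the program
    age = HOST_AGES[host]     # step 2: find the host's age
    return math.sqrt(age)     # step 3: do the calculation (square root assumed)


print(root_of_host_age("Example Show"))  # 7.0
```

The point of the sketch is that the question cannot be answered in one step: each lookup depends on the result of the one before it.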

GPT-4 takes a new giant step forward by being able to recognize images and analyze them. For example, there are now apps that help the visually impaired by linking them via video chat to sighted volunteers, who help them recognize items when necessary; apps based on GPT-4 could hopefully do this for them in the near future.

In addition, some smart home appliances can now operate on audio commands, but in the future the user might just need to come home and point to the air conditioner, a gesture that an AI based on GPT-4 could pick up in order to turn it on. When the user leaves home, all he or she needs to do is wave goodbye to the camera at the door, and the door will automatically shut and lock, and all unnecessary appliances in the house will be turned off.

Such scenarios are not that far off with the development of GPT-4.
