As the industry moves from standalone apps to large-model application ecosystems, large-model vendors have reached the stage of demanding computing power directly from chips.
The opportunity in on-device AI has brought together two industries that previously had little intersection.
Zhang Li, Senior Director of Ecosystem Development at MediaTek's Wireless Communications Business Unit, told reporters at a developer conference that some large-model vendors, having made the transition from apps to large-model application ecosystems, have reached the stage where they need computing power from both cloud chips and device chips.
Recently, MediaTek launched the "Dimensity AI Pioneer Program" and released developer solutions to help large-model vendors implement on-device AI technology and build innovative on-device generative AI applications. In Zhang Li's view, generative AI and chips will soon be strongly correlated.
Over the past year, as generative AI has moved from the cloud to the device, chip companies such as MediaTek have begun to sense new opportunities. Unlike large server clusters in the cloud, smart terminals have long been constrained by their small size and limited computing power, but they hold a unique advantage in privacy. The industry is gradually recognizing the potential of on-device AI scenarios such as AI smartphones, AI PCs, and smart cars. More importantly, once large models run on smartphones, model vendors can deploy AI locally and avoid expensive cloud computing costs.
However, the models that can currently run on mobile devices top out at around 7 billion parameters, and the application scenarios they support remain limited. An explosion of hit AI applications will require both on-device computing power and ecosystem support. Beyond MediaTek, the industry is also exploring miniaturized models suited to on-device operation.
Large-model companies knock on chipmakers' doors
The high cost of cloud computing for large models is one of the forces driving their on-device deployment.
Reports suggest that ChatGPT must respond to over 200 million requests per day, and that its power consumption may exceed 500,000 kilowatt-hours per day. A senior executive at a large-model application vendor also told reporters that Sora remains closed to the public mainly because of high computational costs.
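The reported figures imply a per-request energy cost that can be checked with simple arithmetic. A quick sketch, using only the numbers quoted above (both are reported estimates, not official disclosures):

```python
# Back-of-envelope check of the figures reported above (estimates, not official).
requests_per_day = 200_000_000   # reported ChatGPT daily requests
kwh_per_day = 500_000            # reported daily power consumption in kWh

# Convert kWh to Wh, then divide across requests.
wh_per_request = kwh_per_day * 1000 / requests_per_day
print(f"~{wh_per_request:.1f} Wh per request")  # ~2.5 Wh per request
```

At roughly 2.5 Wh per request, a single query consumes on the order of a quarter of a typical smartphone battery's capacity, which illustrates why offloading inference to the device is attractive at scale.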
By contrast, using distributed on-device computing power to run inference on personal devices such as phones and computers, without relying on a network connection, is seen as a way to replace part of the cloud's computing power and reduce the cost of using large-model services. However, large models by nature demand heavy computation, which is precisely where terminals such as phones fall short. At present, MediaTek and Qualcomm mobile chips can support large language models of up to roughly ten billion parameters, while mainstream models running in the cloud often have hundreds of billions.
"Today's mobile computing power can support a large model with 7 billion parameters; at the upper end, it can reach just over 10 billion parameters," a chip industry insider told reporters.
One reason terminals, represented by mobile phones, are currently ill-suited to carrying high-compute chips is power consumption. Yang Lei, Product Director at Anmou Technology, pointed out that most PCs, tablets, smartphones, and smart glasses are battery-powered, and power consumption together with battery capacity determines a device's battery life. A high-performance GPU often draws hundreds of watts, making it better suited to the cloud; a mobile phone's power consumption generally stays under 10 watts.
With computing power limited, carrying large models on the device faces multiple challenges. Li Yanji, Deputy General Manager of MediaTek's Wireless Communications Business Unit, told reporters that large-model vendors want to run their models efficiently on-device but face two types of problems: one is operational efficiency, including power consumption and speed; the other is memory usage that may be too high. Directly porting cloud-trained models onto mobile devices runs into both problems, and large-model vendors care greatly about solutions that optimize for them.
"There are many difficulties when collaborating with large-model vendors. For example, fitting 7B (7 billion) and 13B models into small devices like smartphones is a great challenge. We must use Neuron Studio in our development kit to quantize and compress them, and produce the best and smallest network structure," Li Junnan, Technical Planning Director of MediaTek's Wireless Communications Business Unit, told reporters.
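The memory arithmetic behind that quantization work is straightforward. A minimal illustrative sketch (not MediaTek's actual tooling) of why bit-width reduction is essential before a 7B or 13B model can fit in a phone's memory budget:

```python
# Illustrative sketch: approximate weight-storage footprint of a model at
# different quantization bit widths (weights only; activations and KV cache
# add further overhead).
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Bytes = params * bits / 8; returned in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit weights: {model_size_gb(7e9, bits):.1f} GB")
```

At 16-bit weights a 7B model needs about 14 GB just for parameters, beyond most phones' RAM; 4-bit quantization brings it down to roughly 3.5 GB, which is why aggressive compression is a precondition for on-device deployment.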
Driven by this demand for computing power, chip makers have moved closer to large-model vendors. "Without chip support, both the device side and the cloud side face the same challenges," Zhang Li told reporters, adding that large-model applications will tend to start from the chip level and explore ways to create new experiences for mobile platform users.
When will popular apps appear?
Although the concepts of AI phones and AI PCs have been proposed, no hit applications built on these smart terminals have yet emerged.
The reporter learned that on-device computing power is currently one of the factors limiting the deployment of large-model applications on phones. "At present, mobile AI functionality is still fairly limited: it can support image editing, but holding a large-model voice conversation while disconnected from the network is not yet feasible. Image-based models do not require many parameters, while voice-based models are much larger," a chip industry insider told reporters.
Zhang Li, for his part, told reporters that hit on-device applications will emerge, and that this is not strictly tied to chip process nodes or capabilities. For a time, the mobile market seemed immovable: app vendors were at a loss because the rules had already been set, competition and traffic were tight, and many developers saw no new openings. Generative AI gives developers more tools, and technology drives innovation in user experience. Given that, there is no need to worry about whether breakout products will appear; the only uncertainty is when.
To bring AI applications to phones and other devices, the computing-power demands that large-model applications place on chips are one factor; raising overall on-device computing power and improving the performance of small models are two others. The industry expects mobile computing power to keep improving.
Yang Lei believes flagship phone chips can now reach 40-50 TOPS of computing power and mid-range phones 10-20 TOPS, while entry-level phones carry no dedicated AI capability. He predicts that as semiconductor processes evolve, flagship phones will reach 100 TOPS and entry-level phones will rise to 5-10 TOPS. Within two years, phones are expected to have the hardware computing power to deploy large AI models locally.
To meet the requirements of distributed on-device operation, large models are also moving toward miniaturization.
In April this year, Meta released two open-source models in the Llama 3 series, with 8B and 70B parameters respectively. Fu Sheng, Chairman and CEO of Cheetah Mobile, stated that the small-parameter Llama 3 8B outperforms the previous generation's large-parameter Llama 2 70B, confirming that small models' capabilities will improve rapidly. Zhang Junlin, head of new technology R&D at Sina Weibo, likewise believes the most important change in Llama 3 is the significant expansion of its training data: holding a small model's size fixed while increasing the data volume keeps improving its performance. Zhang Junlin told reporters that small models have not yet reached the limit of their capacity.
The rapidly improving capabilities of small models have led some industry insiders to predict that they will accelerate deployment in smart terminals. "Small language models (SLMs) are very popular now. With good training, a shrunken model can still perform very well, even below 3B parameters, which is a very favorable trend for the device side. Apple may also be laying out such small models," Li Junnan said, adding that small models also relieve the memory bandwidth bottleneck that constrains on-device AI computing.
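The memory-bandwidth bottleneck mentioned here can be sketched with a standard rule of thumb: in autoregressive decoding, generating each token requires reading essentially all model weights once, so memory bandwidth roughly caps token throughput at bandwidth divided by model size. A hedged illustration, with assumed (not vendor-specified) numbers:

```python
# Rough rule-of-thumb sketch (illustrative numbers, not vendor specs): token
# generation speed is approximately bounded by memory bandwidth / model size,
# since each decoded token reads all weights once.
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

phone_bw = 50.0  # assumed mobile LPDDR bandwidth in GB/s

r7b = max_tokens_per_sec(phone_bw, 3.5)  # 7B model at 4-bit, ~3.5 GB
r3b = max_tokens_per_sec(phone_bw, 1.5)  # 3B model at 4-bit, ~1.5 GB
print(f"7B @ 4-bit: ~{r7b:.0f} tok/s; 3B @ 4-bit: ~{r3b:.0f} tok/s")
```

Under these assumptions, halving model size more than doubles the achievable token rate, which is one concrete reason sub-3B models are attractive on the device side.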
Apple's published results show that it, too, is focusing on small models and working to break through on-device parameter limits. According to a research report by Huafu Securities, beyond the 3-billion-parameter MM1 model, Apple is concentrating on on-device models: its scenario-focused on-device small model ReALM has as few as 80 million parameters. Apple has also proposed using flash storage to ease the memory bottleneck of running large models; its Flash-LLM approach doubles the number of model parameters a device can run. With WWDC in June and subsequent launch events, related Apple products are expected to arrive.
As for what form hit on-device apps will take, Zhang Li said MediaTek views apps along two dimensions: existing top apps that are innovating with generative AI, and newly emerging apps. It is not yet clear which category the hits will come from. Large-model applications may break out on either the cloud or the device; by comparison, the cloud may be better suited to zero-to-one innovation, while the device side may be better suited to perception-related and incremental improvements.
Li Yanji believes that multimodal large-model inputs and outputs, such as images and video, should soon arrive on phones. In addition, expert systems will train many small models that can be switched according to user needs. The demands on mobile devices are becoming increasingly clear, such as the trend toward personalized capabilities and local computing.