
Alibaba Unveils Lightweight AI Model for Image and Video Processing on Mobile Devices

tech360.tv

Alibaba Group Holding has launched a new multimodal artificial intelligence model, Qwen2.5-Omni-7B, capable of processing text, images, audio and video directly on smartphones, tablets and laptops.


Purple geometric logo and text "Qwen2.5-Omni" on black background. Blue streaks flow from the logo, creating a dynamic, futuristic feel.
Credit: ALIBABA

The model, introduced on Thursday, is the latest addition to Alibaba’s Qwen family and is designed to run locally on devices with limited computing power. With only 7 billion parameters, it enables real-time responses in text or audio without requiring an internet connection.
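To give a rough sense of why a 7-billion-parameter model can fit on consumer hardware, the short sketch below estimates the memory needed just to hold the weights at a few common numeric precisions. The precisions and bytes-per-parameter values are illustrative assumptions, not figures from Alibaba. The "7B" count is treated as a nominal round number, and memory for activations, caches and the audio and vision encoders is ignored.

```python
# Back-of-the-envelope weight-memory estimate for a nominal 7B-parameter model.
# All precisions below are illustrative assumptions, not vendor figures.
PARAMS = 7_000_000_000

# Assumed storage cost per parameter, in bytes, for common formats.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return the weight size in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


def estimate_all(params: int = PARAMS) -> dict[str, float]:
    """Map each assumed precision to its estimated weight size in GB."""
    return {name: weight_memory_gb(params, b) for name, b in BYTES_PER_PARAM.items()}


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name:>8}: {gb:.1f} GB")
```

Under these assumptions, lower-precision formats are what bring the weights into the range of laptop and high-end phone memory. Actual on-device requirements depend on the runtime and quantisation scheme used.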


Cartoon bear named Ethan talking, saying "I can explain PPTs, web materials, and more." Background shows blurred document and waveforms.
Credit: ALIBABA

Qwen2.5-Omni-7B is open source and available on Hugging Face, Microsoft’s GitHub and Alibaba’s ModelScope. It is also integrated into Alibaba’s Qwen Chat.


Alibaba highlighted potential applications such as providing real-time audio descriptions for visually impaired users and offering cooking guidance by analysing ingredients. The model’s ability to handle multiple input types reflects growing demand for AI systems that extend beyond text generation.





In benchmark tests, Qwen2.5-Omni-7B scored 56.1 on OmniBench, outperforming Google’s Gemini-1.5-Pro, which scored 42.9. It also achieved 92.4 on the CV15 audio benchmark, surpassing Alibaba’s earlier Qwen2-Audio model by one point.
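The gaps behind those figures can be worked out directly. The sketch below uses only the scores quoted above. The helper name `margin` is hypothetical, and the Qwen2-Audio score is inferred from the "one point" difference, so it may be a rounded value rather than the published one.

```python
# Margins computed from the scores quoted in the article.
# `margin` is a hypothetical helper written for this illustration.


def margin(score: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gap in points, relative gap in percent), rounded to 0.1."""
    absolute = round(score - baseline, 1)
    relative_pct = round(100 * (score - baseline) / baseline, 1)
    return absolute, relative_pct


# OmniBench: Qwen2.5-Omni-7B (56.1) vs Gemini-1.5-Pro (42.9).
omnibench_gap = margin(56.1, 42.9)

# CV15: the article gives 92.4 and a one-point lead, so the earlier
# Qwen2-Audio score is inferred here rather than quoted.
implied_qwen2_audio = round(92.4 - 1.0, 1)

if __name__ == "__main__":
    print("OmniBench gap (points, %):", omnibench_gap)
    print("Implied Qwen2-Audio CV15 score:", implied_qwen2_audio)
```

On these numbers, the OmniBench lead is 13.2 points, or roughly 31 per cent over the Gemini-1.5-Pro score.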


For image-related tasks, the model scored 59.2 on the Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark (MMMU), beating the Qwen2.5-VL vision-language model.


The release aligns with a broader industry trend toward efficient, multimodal AI models that prioritise portability and data privacy. These models can operate without cloud-based processing, reducing reliance on external servers.


Other tech firms are also advancing in this space. OpenAI recently added image generation to its GPT-4o model, while ByteDance introduced InfiniteYou, a tool that re-crafts images while preserving subjects’ identities. In January, DeepSeek released Janus-Pro, an updated version of its multimodal model.


Alibaba’s Qwen models have become popular among AI developers in mainland China, positioning the company as a key competitor to DeepSeek, the maker of the V3 and R1 models.

 
  • Alibaba launched Qwen2.5-Omni-7B, a multimodal AI model for mobile devices

  • The model processes text, images, audio and video locally without internet

  • It outperformed Google’s Gemini-1.5-Pro in benchmark tests


Source: SCMP
