
Alibaba upgrades Qwen with multimodal agent features and two-hour video analysis
Alibaba has released a substantial update to its Qwen family that shifts the model toward agent-style orchestration and expanded multimodal inputs. The refreshed Qwen can accept and reason over combined text, still images and extended video files, with support for clips approaching two hours in length.
Engineers have added temporal visual parsing so the model can follow events across frames and fuse that signal with text prompts to produce actionable outputs, reducing dependence on separate pipelines for long-form media analysis. That makes Qwen better suited for chained task workflows where perception, memory and planning are executed in sequence inside a single model stack rather than via glue code and external tools.
Operationally, handling video inputs of nearly two hours raises requirements for sustained memory, larger context windows and higher inference throughput on cloud GPUs and inference clusters. Vendors and enterprise integrators will need to weigh batching, windowing and cost trade-offs when deploying Qwen for media-heavy workloads such as surveillance triage, marketing asset indexing and long-form content summarization.
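To make the windowing trade-off concrete, here is a minimal sketch of how an integrator might split a long clip into overlapping segments before sending each to a model. It is illustrative only: the function name `plan_windows`, the window and overlap sizes, and the idea of processing windows one at a time are assumptions for this example, not details of Qwen or of any Alibaba API.

```python
from typing import List, Tuple


def plan_windows(duration_s: int, window_s: int, overlap_s: int) -> List[Tuple[int, int]]:
    """Split a clip of duration_s seconds into overlapping (start, end) windows.

    Hypothetical helper: the sizes used here are illustrative, not vendor guidance.
    """
    if window_s <= 0 or not (0 <= overlap_s < window_s):
        raise ValueError("window_s must be positive and overlap_s must be in [0, window_s)")

    step = window_s - overlap_s  # how far each new window advances
    windows: List[Tuple[int, int]] = []
    start = 0
    while start < duration_s:
        end = min(start + window_s, duration_s)  # clamp the final window to the clip end
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


if __name__ == "__main__":
    # A two-hour clip (7,200 s) cut into 10-minute windows with 1 minute of overlap.
    for start, end in plan_windows(7200, 600, 60):
        print(f"{start:>5}s to {end:>5}s")
```

Smaller windows lower the memory needed per request but increase the number of calls and the work of stitching results together. Larger overlaps reduce the chance of cutting an event in half, at the cost of reprocessing the same footage.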
The upgrade sits alongside other recent Alibaba releases and commercial efforts — including robotics foundation work and enhancements to cloud-hosted model tooling — that collectively point to a strategy of productizing multimodal and agent capabilities for enterprise customers. Those sibling projects emphasize on-demand tool interfaces and runtime scaling techniques, underscoring Alibaba’s push to move research advances closer to deployable services.
Competitive dynamics are sharpening: domestic rivals and startups focused on temporal video understanding and multimodal APIs will face pressure to match long-horizon video capability and integrated agent features. At the same time, customers must consider not just feature parity but deployment fit — geography, sovereignty, on-prem and in-region hosting options, and auditability — when selecting a supplier.
For enterprises, the practical path to adoption will involve red-team testing, dataset curation for temporal vision tasks, and investment in operational tooling to manage cost, latency and safety. In the short term, the model’s higher compute footprint increases deployment costs; longer term, tighter integration of perception and action could simplify application stacks and speed time-to-value for multimodal use cases.
Recommended for you

Alibaba, ByteDance and Kuaishou Unveil Next-Gen Robotics and Video AI
Chinese technology leaders released distinct AI models this week: Alibaba introduced a robotics-focused model for real-world object interaction, ByteDance launched an improved text-to-video generator, and Kuaishou rolled out a paywalled video model with longer outputs. These releases sharpen competition with Western labs on robotics, video synthesis, and agentic capabilities while raising consent and commercialisation questions.
Alibaba Qwen3.5: frontier-level reasoning with far lower inference cost
Alibaba’s open-weight Qwen3.5-397B-A17B blends a sparse-expert architecture and multi-token prediction to deliver large-context, multimodal reasoning at sharply lower runtime cost and latency. The release, licensed under the permissive Apache 2.0 terms and offered with hosted options of up to a 1M-token window, pushes enterprises to weigh on-prem self-hosting, in-region hosting, and new procurement trade-offs around cost, sovereignty and operational maturity.

Alibaba's Qwen3-Max-Thinking Positions Itself as a Viable Enterprise AI Alternative
Alibaba Cloud says its new Qwen3-Max-Thinking model matches top-tier reasoning models on established benchmarks and adds adaptive tool use and test-time scaling to boost performance. Enterprises should view this as a meaningful expansion of vendor choice, but must weigh domain fit, deployment constraints, and governance risks before adoption.
Alibaba launches Wukong enterprise agents and centralizes AI under Token Hub
Alibaba unveiled Wukong, an enterprise agent platform that will integrate with messaging and commerce systems and sit inside a new Token Hub group. The move accompanied a leadership reshuffle and produced a modest stock uptick, signaling Beijing-era competition among Chinese cloud and AI players.

DeepSeek Signals Ambition to Compete with Google with a Multimodal, Multilingual AI Search
Recent job listings indicate DeepSeek is building an AI search product that can handle text, images and audio while supporting multiple languages. The postings also emphasize engineering work on evaluation, training data and scalable infrastructure, signals that the company aims for a reliable, production-grade search and agent platform rather than a research demo.
Alibaba launches XuanTie C950 CPU tuned for agentic inference
Alibaba introduced the XuanTie C950, a RISC-V CPU aimed at running multi-step agent workloads and targeted inference tasks. The chip is pitched as an inference-focused, low-latency alternative that could shift some control-heavy inference off constrained GPU pools, though real-world gains depend on software stacks, memory provisioning and manufacturing scale.

Alibaba expands low-cost coding tools across local AI models
Alibaba Cloud launched low-price coding subscriptions that bundle multiple domestic models, including Qwen 3.5, with steep first-month discounts and two subscription tiers. The pricing is designed to drive rapid developer adoption while giving Alibaba usage telemetry and distribution leverage.
Alibaba International Unveils Accio Work, Enterprise Agent for SMEs
Alibaba International launched Accio Work, a no-code enterprise agent suite aimed at automating end-to-end SME operations. The move coincides with a broader internal consolidation of AI assets under a newly formed Token Hub and parallel enterprise work on an agent called Wukong, introducing both product synergy and near-term execution risk from recent personnel shifts.