
AT&T Rewrites Model Orchestration, Cuts Costs by 90%
Context and chronology
AT&T confronted a throughput problem when internal usage climbed to roughly 8 billion tokens per day, forcing a rethink of where heavy compute runs. The company’s chief data officer, Andy Markus, led a shift away from funneling all tasks into large reasoning models toward a layered orchestration approach. Mr. Markus’s team assembled a multi-agent stack that places compact, task-focused workers beneath a controlling super-agent tier, prioritizing latency and cost per transaction. This architecture was integrated with Microsoft Azure and includes a graphical workflow builder for internal teams.
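The layered pattern described above can be sketched as a "super-agent" router that sends each request to the cheapest compact worker able to handle it, escalating to a large reasoning model only when no specialist fits. This is a minimal illustration, not AT&T's actual system; all names (`TaskAgent`, `route`, the example workers) are invented for the sketch.

```python
# Hypothetical sketch of layered orchestration: compact, task-focused workers
# sit beneath a routing tier that optimizes for cost per transaction.
from dataclasses import dataclass

@dataclass
class TaskAgent:
    name: str
    handles: set          # task types this compact worker covers
    cost_per_call: float  # relative inference cost

def route(task_type: str, workers: list, fallback: TaskAgent) -> TaskAgent:
    """Pick the cheapest specialized worker; escalate to the big model otherwise."""
    candidates = [w for w in workers if task_type in w.handles]
    return min(candidates, key=lambda w: w.cost_per_call) if candidates else fallback

workers = [
    TaskAgent("classifier", {"triage"}, cost_per_call=0.01),
    TaskAgent("summarizer", {"summarize"}, cost_per_call=0.02),
]
reasoner = TaskAgent("large-reasoning-model", set(), cost_per_call=1.00)

print(route("summarize", workers, reasoner).name)           # summarizer
print(route("open-ended-analysis", workers, reasoner).name) # large-reasoning-model
```

Routing by declared capability and cost, rather than sending everything to the largest model, is the mechanism by which this kind of architecture reduces per-transaction spend at multi-billion-token daily volumes.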
Design principles and operational trade-offs
Engineers chose interchangeable model components rather than committing to one monolithic model, allowing rapid substitution as capabilities evolve. The orchestration uses retrieval-enhanced methods and a vector-backed search layer to keep decision logic anchored in AT&T’s own data, with human oversight retained as a governance control. That combination trimmed response time and reduced inference spend, with reported savings up to 90% on select workloads. The team emphasizes measuring three core properties—accuracy, cost, and responsiveness—before promoting agentic automation into production.
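The two design choices above — swappable model components and vector-backed retrieval — can be illustrated together in a few lines. The common interface, registry, and toy cosine-similarity lookup below are assumptions for the sketch; a production system would use a managed vector store and real embeddings.

```python
# Sketch: models behind a shared interface so one can be substituted without
# touching orchestration logic, plus a toy vector lookup that grounds answers
# in the company's own documents (the retrieval-enhanced pattern).
from typing import Protocol
import math

class ModelComponent(Protocol):
    def generate(self, prompt: str) -> str: ...

_registry: dict = {}

def register(name: str, model: ModelComponent) -> None:
    _registry[name] = model  # the substitution point: swap models as capabilities evolve

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_vec: list, store: list) -> str:
    """Return the document whose embedding is closest to the query vector."""
    return max(store, key=lambda doc: cosine(query_vec, doc["embedding"]))["text"]

store = [
    {"text": "billing policy", "embedding": [1.0, 0.0]},
    {"text": "network runbook", "embedding": [0.0, 1.0]},
]
print(retrieve([0.9, 0.1], store))  # billing policy
```

Keeping retrieval separate from generation means the grounding layer survives any model swap, which is what makes the interchangeable-component strategy workable in practice.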
Adoption, use cases, and measured outcomes
The workflow tool has reached more than 100,000 employees, and usage metrics show durable daily engagement for a majority of active users. Reported productivity uplifts on some tasks reached as high as 90%, while complex engineering flows are being decomposed into chains of smaller agents that correlate telemetry, file logs, and change histories. The company offers both a no-code visual path and a pro-code path driven by Python, with surprisingly high uptake of the low-code option even among technical participants. Operational design preserves audit trails, enforces role-based access, and keeps humans on the loop during multi-step handoffs.
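Decomposing an engineering flow into a chain of small agents with an audit trail at each handoff, as described above, might look like the following. This is an illustrative sketch, not AT&T's implementation; each "agent" is reduced to a named function, and the state keys are invented.

```python
# Sketch: a chain of small agents, each enriching shared state, with an
# audit trail recorded at every handoff (supporting the oversight and
# role-based-access controls the text describes).
from typing import Any, Callable

def run_chain(steps: list, state: dict) -> tuple:
    """Run each (name, agent) step in order, logging who touched what."""
    trail: list = []
    for name, step in steps:
        state = step(state)
        trail.append({"agent": name, "keys": sorted(state)})  # audit entry per handoff
    return state, trail

steps = [
    ("telemetry",  lambda s: {**s, "alerts": 2}),
    ("file-logs",  lambda s: {**s, "errors": ["disk full"]}),
    ("change-log", lambda s: {**s, "recent_deploy": True}),
]
state, trail = run_chain(steps, {"incident": "INC-1"})
print([entry["agent"] for entry in trail])  # ['telemetry', 'file-logs', 'change-log']
```

Because every handoff is logged, a human reviewer can inspect exactly which agent contributed which signal before the chain's conclusion is acted on.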
Developer productivity and downstream effects
By treating coding as a series of function-specific archetypes, teams produce near-production-quality artifacts in far fewer iterations; one internal example cut a six-week build to roughly twenty minutes. Mr. Markus frames this approach as "AI-fueled coding," where focused generation replaces iterative back-and-forth, compressing delivery timelines and increasing the velocity of production-grade outputs. The approach reduces costly context switching for engineers and enables nontechnical stakeholders to prototype solutions in plain language. Taken together, these elements create a repeatable pattern for large enterprises wrestling with scale, cost, and governance.
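One plausible reading of "function-specific archetypes" is a library of reusable prompt templates, one per common code shape, so each generation request is narrowly scoped rather than open-ended. The archetype names and templates below are invented for illustration; the source does not describe AT&T's actual templates.

```python
# Hypothetical archetype registry: each common code shape maps to a focused
# prompt template, replacing open-ended iterative prompting.
ARCHETYPES = {
    "etl": "Write a function that reads {source}, transforms it, and writes {sink}.",
    "api-client": "Write a typed client for the {service} REST API.",
}

def build_prompt(archetype: str, **params: str) -> str:
    """Fill the template for one archetype with task-specific parameters."""
    return ARCHETYPES[archetype].format(**params)

print(build_prompt("etl", source="a CSV file", sink="a database table"))
```

Constraining generation to a known shape is what makes the output "near-production quality" on the first few passes: the model fills in parameters rather than inventing structure.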
Source: VentureBeat.