Why Factory Scheduling Needs a Team of AI Agents, Not Just One

How multi-agent reinforcement learning is transforming real-time production scheduling in semiconductor manufacturing

Running a semiconductor factory is one of the most demanding coordination challenges in modern industry. The floor holds hundreds of machines and dozens of product types, breakdowns are constant, demand keeps shifting, and every scheduling decision ripples across the entire production floor. A choice made on one machine right now can delay a completely different product three operations downstream, hours later.

Traditional approaches rely on human-crafted dispatching rules or static optimization models that crack under this pressure. They work in controlled settings, but real factories are anything but controlled. What if, instead of one monolithic system trying to manage everything, a coordinated team of AI agents were deployed, each specializing in its own piece of the puzzle, guided by a shared strategy from above?

That is exactly the approach laid out in two recent research papers that present the science behind a new generation of factory scheduling.

The Problem: Factories Are Too Dynamic for Static Plans

In real-world factories, particularly in semiconductor packaging and testing, conditions change constantly. Machines break down unexpectedly. Rush orders arrive. Maintenance windows shift. A schedule that was optimal an hour ago may already be obsolete.

Most existing AI scheduling methods attempt to solve this by training a single agent to pick from a fixed menu of human-designed dispatching rules. That works in small, controlled settings. But in a factory with over a hundred machines, dozens of operations, and strict constraints on how and when products can be converted between lines, a single agent simply cannot keep up.
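
To make the "fixed menu" idea concrete, here is a minimal Python sketch of two classic dispatching rules, shortest processing time (SPT) and earliest due date (EDD), plus a dispatcher that applies whichever rule is selected. The Job fields, the rule names, and the sample queue are illustrative assumptions, not details from the papers. In the single-agent setups described above, a learned policy would be what supplies rule_name.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    processing_time: float  # hours needed on this machine (assumed unit)
    due_time: float         # hours from now until the job is due


def shortest_processing_time(queue):
    """SPT: run the job that finishes fastest."""
    return min(queue, key=lambda job: job.processing_time)


def earliest_due_date(queue):
    """EDD: run the job whose deadline comes first."""
    return min(queue, key=lambda job: job.due_time)


# The "fixed menu" a single agent would choose from.
RULES = {"SPT": shortest_processing_time, "EDD": earliest_due_date}


def dispatch(queue, rule_name):
    """Apply the selected rule to the jobs waiting at one machine."""
    return RULES[rule_name](queue)


queue = [Job("A", 4.0, 6.0), Job("B", 1.0, 20.0), Job("C", 2.5, 5.0)]
spt_pick = dispatch(queue, "SPT")
edd_pick = dispatch(queue, "EDD")
print(spt_pick.name, edd_pick.name)
```

On this toy queue, SPT selects job B while EDD selects job C, so the choice of rule alone changes what runs next. The agent's whole action space is that one choice, which is part of why the approach strains as the factory grows.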

The Solution: A Leader-Follower Team of AI Agents

The approach detailed in "Scalable Multi-agent Reinforcement Learning for Factory-wide Dynamic Scheduling" (Jang, Klabjan, Liu, Patel, Li, Ananthanarayanan, Dauod, and Juang, 2024) breaks the scheduling problem into manageable pieces. Each manufacturing operation is assigned its own AI agent, a "follower," responsible for the machines in its area. A separate "leader" agent sees the big picture and provides high-level goals to each follower at the start of every shift.

The followers do not receive explicit instructions. Instead, the leader communicates through abstract goal signals. These are numerical vectors that evolve during training until the leader and followers develop a shared language for coordination. This is far more scalable than trying to centrally control every machine.
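
The division of labor described above can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions: the operation names, GOAL_DIM, the lot feature vectors, and both placeholder policies (random goals, dot-product scoring) are hypothetical stand-ins for the trained networks in the paper.

```python
import random

GOAL_DIM = 3  # assumed size of each goal vector


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


class Leader:
    """Sees the whole factory and emits one goal vector per operation."""

    def __init__(self, operations, seed=0):
        self.operations = list(operations)
        self.rng = random.Random(seed)

    def issue_goals(self, factory_state):
        # Placeholder: a trained leader would map factory_state to goals.
        # Random vectors are used here only to keep the sketch runnable.
        return {
            op: [self.rng.uniform(-1.0, 1.0) for _ in range(GOAL_DIM)]
            for op in self.operations
        }


class Follower:
    """Schedules one operation, conditioned on the leader's current goal."""

    def __init__(self, operation):
        self.operation = operation
        self.goal = [0.0] * GOAL_DIM

    def receive_goal(self, goal):
        self.goal = list(goal)

    def choose_lot(self, lots):
        # lots maps a lot name to a feature vector of length GOAL_DIM.
        # Placeholder policy: pick the lot best aligned with the goal.
        return max(lots, key=lambda name: dot(self.goal, lots[name]))


operations = ["die_attach", "wire_bond", "final_test"]  # hypothetical names
leader = Leader(operations)
followers = {op: Follower(op) for op in operations}

# Start of shift: the leader broadcasts one goal per follower.
goals = leader.issue_goals(factory_state={})
for op, follower in followers.items():
    follower.receive_goal(goals[op])

# During the shift: each follower decides locally, reusing the same goal.
waiting = {"lot_1": [0.9, 0.1, 0.0], "lot_2": [0.1, 0.8, 0.3]}
decisions = {op: f.choose_lot(waiting) for op, f in followers.items()}
```

The point of the structure is that the leader never names a machine or a lot. It only shapes each follower's preferences through the goal vector, and the follower turns that preference into a concrete choice using its local view.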

To prevent any single bad decision from cascading into a major production loss, the researchers also developed a rule-based safety mechanism that can override an agent's choice when it risks idling a critical machine or violating conversion time limits.
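
A rule-based override of this kind might look like the following Python sketch. The two rules mirror the two risks named above, but the field names, action labels, and fallback action are assumptions made for illustration, not the paper's actual rule set.

```python
from dataclasses import dataclass

IDLE = "idle"
CONVERT = "convert"
PROCESS_NEXT = "process_next_lot"  # hypothetical safe fallback action


@dataclass
class MachineStatus:
    is_critical: bool               # e.g. a known bottleneck machine
    queue_length: int               # lots currently waiting
    conversion_hours: float         # time the proposed conversion would take
    conversion_limit_hours: float   # maximum allowed conversion time


def safe_action(proposed, status, fallback=PROCESS_NEXT):
    """Return the agent's proposed action unless a safety rule vetoes it."""
    # Rule 1: never idle a critical machine while work is waiting.
    if proposed == IDLE and status.is_critical and status.queue_length > 0:
        return fallback
    # Rule 2: never start a conversion that exceeds the time limit.
    if proposed == CONVERT and status.conversion_hours > status.conversion_limit_hours:
        return fallback
    # Otherwise trust the learned policy.
    return proposed


bottleneck = MachineStatus(True, 3, 0.0, 4.0)
print(safe_action(IDLE, bottleneck))  # overridden to the fallback
```

Because the check sits between the agent and the factory, it can be audited and adjusted by engineers without retraining anything, which is a common reason to pair learned policies with hand-written guards.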

The Theoretical Foundation

The companion paper, "Learning Multiple Coordinated Agents under Directed Acyclic Graph Constraints" (Jang, Klabjan, Liu, Patel, Li, Ananthanarayanan, Dauod, and Juang), provides the mathematical framework underpinning this approach. It introduces MARLM-SR, a multi-agent reinforcement learning model with synthetic rewards, and proves that training agents to maximize these synthetic rewards also pushes the entire system toward better overall performance. The paper also introduces a Reward Generator and Distributor (RGD) that learns how to fairly allocate credit to each agent based on its actual contribution to the team outcome.
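
The bookkeeping behind credit allocation can be illustrated with a small Python sketch. To be clear about what is assumed: in the paper the RGD is learned, whereas this example swaps in a fixed softmax over hand-picked contribution scores purely to show how one team-level reward can be split into per-agent synthetic rewards. The agent names and scores are hypothetical.

```python
import math


def distribute_reward(team_reward, contribution_scores):
    """Split a team reward across agents in proportion to softmax weights.

    Stand-in for a learned distributor: the weights here come from fixed
    scores, not from a trained model.
    """
    peak = max(contribution_scores.values())  # subtract for numerical stability
    exps = {agent: math.exp(score - peak)
            for agent, score in contribution_scores.items()}
    total = sum(exps.values())
    return {agent: team_reward * e / total for agent, e in exps.items()}


# Hypothetical contribution scores for three follower agents.
scores = {"die_attach": 0.2, "wire_bond": 1.5, "final_test": 0.7}
shares = distribute_reward(team_reward=10.0, contribution_scores=scores)

for agent, share in shares.items():
    print(f"{agent}: {share:.2f}")
```

The shares always sum to the team reward, so no credit is created or lost, and the agent with the highest score receives the largest share. A learned distributor would replace the fixed scores with values inferred from how each agent's actions affected the outcome.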

Real-World Results

The approach was validated on two real production scenarios built from Intel's high-volume packaging and test factory data. In the most challenging high-demand setting, the model reduced tardiness by 10.4% and improved the completion rate by 31.4% compared to existing state-of-the-art methods. Crucially, while competing approaches showed little improvement during training, suggesting they had hit a ceiling, the multi-agent model continued to learn and adapt.
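
For readers unfamiliar with the two metrics, the following Python sketch shows one common way to compute them. The definitions used here (tardiness as time finished past the due date, completion rate as the fraction of lots finished within the horizon) are standard textbook forms and may differ in detail from the papers' exact definitions. The lot data is a made-up toy example, not factory data.

```python
def total_tardiness(lots):
    """Sum of hours each completed lot finished past its due time."""
    return sum(max(0.0, done - due) for done, due in lots if done is not None)


def completion_rate(lots):
    """Fraction of lots that finished within the scheduling horizon."""
    finished = sum(1 for done, _ in lots if done is not None)
    return finished / len(lots)


# Each lot is (completion_time, due_time) in hours; None means not finished.
toy_lots = [(5.0, 6.0), (9.0, 7.0), (None, 8.0), (12.0, 10.0)]

tardiness = total_tardiness(toy_lots)
rate = completion_rate(toy_lots)
print(tardiness, rate)
```

In this toy case the second and fourth lots are each two hours late, giving a total tardiness of 4.0 hours, and three of four lots finish, giving a completion rate of 0.75. A scheduler improves by lowering the first number and raising the second.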

What This Means for Manufacturing

These results point to a future where factory scheduling is not a static plan handed down from above, but a living, adaptive process driven by teams of AI agents that coordinate in real time. Each agent focuses on what it knows best, the leader keeps everyone aligned, and built-in safeguards prevent costly mistakes.

The architecture also scales naturally. Adding a new manufacturing operation means adding a new follower agent and updating the leader's view, not retraining the entire system from scratch. As product complexity grows and lead times shrink, this kind of modularity becomes essential.

For semiconductor manufacturers facing ever-tighter deadlines, volatile demand, and increasingly complex product mixes, intelligent and scalable scheduling is not a luxury. It is rapidly becoming a necessity. And this research suggests the path forward is not building a single, ever-larger AI model, but organizing a well-coordinated team of agents that mirrors the structure of the factory itself.


Both papers came out of a collaboration among Northwestern University, The Catholic University of Korea, and Intel Corporation.