
Reinforcement Learning in Multi-Input Multi-Output Semiconductor Process Control
This work focuses on using deep reinforcement learning to control multi-input multi-output (MIMO) batch processes in semiconductor manufacturing, specifically the Chemical-Mechanical Polishing (CMP) process.
Research Background and Motivation
1. Semiconductor Manufacturing Challenges:
The semiconductor industry is continually innovating to develop larger wafers
and finer line widths. A key challenge in semiconductor manufacturing is
controlling the process parameters effectively to enhance product quality and
yield. This includes ensuring the flatness of wafers during manufacturing,
which is critical for subsequent steps like lithography.
2. CMP Process:
Chemical-Mechanical Polishing (CMP) is a crucial process in semiconductor
manufacturing that ensures the flatness of the wafer surface by polishing it
through a combination of mechanical grinding and chemical etching.
Application in CMP Process
Reinforcement Learning Model: The model adjusts the control variables (e.g., platen speed, back pressure) to drive the outputs toward their targets. This approach is advantageous over traditional methods because it handles the nonlinear and dynamic nature of the CMP process more effectively.

The model tracks two quality indicators: material removal rate (y1) and non-uniformity (y2).
It has three controllable variables, μ1, μ2, and μ3, representing the platen speed, back pressure, and polishing downforce, respectively.
Two scenarios are considered, constraining these controllable variables to the ranges [-1, 1] and [-3, 3]. The disturbances are assumed to be white noise: ε1,t ~ N(0, 60²) and ε2,t ~ N(0, 30²).
The target values for the material removal rate (y1) and non-uniformity (y2) are 2200 and 400, respectively, where the index t denotes the batch number passing through the CMP system.
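As a rough illustration of this setup, the batch process can be simulated as follows. The source gives only the targets (2200, 400), the noise distributions, and the control bounds; the linear plant form, the gain matrix B, and the bias vector below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear batch model (assumed; the source gives no process
# equations). Rows: y1 (removal rate), y2 (non-uniformity); columns: the
# three controls mu1 (platen speed), mu2 (back pressure), mu3 (downforce).
B = np.array([[300.0, 150.0, 200.0],
              [-80.0,  40.0, -60.0]])
BIAS = np.array([1800.0, 500.0])        # nominal outputs at zero control (assumed)
TARGETS = np.array([2200.0, 400.0])     # targets for y1 and y2 (from the text)

def step(u):
    """One CMP batch: outputs for controls u, clipped to the [-1, 1] scenario."""
    u = np.clip(u, -1.0, 1.0)
    noise = rng.normal(0.0, [60.0, 30.0])   # eps1 ~ N(0, 60^2), eps2 ~ N(0, 30^2)
    return BIAS + B @ u + noise

y = step([0.5, -0.2, 0.1])
err = y - TARGETS   # tracking error for batch t
```

A controller would observe `err` after each batch and choose the next control vector accordingly.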
Research Results
The study compares Q-Learning-based control with traditional self-tuning controllers. The results show that Q-Learning provides more stable and better-optimized control, tracking the target values for removal rate and non-uniformity more closely.
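The Q-Learning control described above can be sketched as a minimal tabular loop. The paper's exact state discretization, reward function, and plant model are not given, so the error bins, reward, gain matrix B, and bias below are illustrative assumptions (using the [-1, 1] control range):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Assumed linear plant standing in for the CMP process (not from the source).
B = np.array([[300.0, 150.0, 200.0],
              [-80.0,  40.0, -60.0]])
BIAS = np.array([1800.0, 500.0])
TARGETS = np.array([2200.0, 400.0])

def plant(u):
    return BIAS + B @ u + rng.normal(0.0, [60.0, 30.0])

# 27 actions: each control mu1..mu3 takes a value in {-1, 0, 1}.
ACTIONS = np.array(list(product([-1.0, 0.0, 1.0], repeat=3)))

def state_of(y):
    # Discretize each output's signed tracking error into 3 bins -> 9 states.
    bins = [int(np.sign(np.round(e / 100.0))) + 1 for e in y - TARGETS]
    return bins[0] * 3 + bins[1]

Q = np.zeros((9, len(ACTIONS)))
alpha, gamma, eps = 0.2, 0.9, 0.1       # learning rate, discount, exploration

s = state_of(plant(np.zeros(3)))
for t in range(5000):                   # batches through the CMP system
    a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
    y = plant(ACTIONS[a])
    # Reward: negative tracking error, scaled by the noise std of each output.
    r = -np.abs((y - TARGETS) / [60.0, 30.0]).sum()
    s2 = state_of(y)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
```

After training, the greedy action `ACTIONS[Q[s].argmax()]` for the current error state gives the control settings for the next batch.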
Applying Q-Learning Control (Range [-3, 3])
Applying Q-Learning Control (Range [-1, 1])