
Reinforcement Learning in Multi-Input Multi-Output Semiconductor Process Control
This work focuses on using deep reinforcement learning to control multi-input multi-output (MIMO) batch processes in semiconductor manufacturing, specifically the Chemical-Mechanical Polishing (CMP) process.
Research Background and Motivation
1. Semiconductor Manufacturing Challenges:
The semiconductor industry is continually innovating to develop larger wafers
and finer line widths. A key challenge in semiconductor manufacturing is
controlling the process parameters effectively to enhance product quality and
yield. This includes ensuring the flatness of wafers during manufacturing,
which is critical for subsequent steps like lithography.
2. CMP Process:
Chemical-Mechanical Polishing (CMP) is a crucial process in semiconductor
manufacturing that ensures the flatness of the wafer surface by polishing it
through a combination of mechanical grinding and chemical etching.
Application in CMP Process
Reinforcement Learning Model: The model adjusts the control variables (e.g., platen speed, back pressure) to drive the outputs toward their targets. This approach is advantageous over traditional methods because it handles the nonlinear and dynamic nature of the CMP process more effectively.

The model tracks two quality indicators: material removal rate (y1) and non-uniformity (y2).
It has three controllable variables, μ1, μ2, and μ3, representing the platen speed, back pressure, and polishing downforce, respectively.
Two scenarios are considered, constraining these controllable variables to the ranges [-1, 1] and [-3, 3]. The disturbances are assumed to be white noise: ε1,t ~ N(0, 60²) and ε2,t ~ N(0, 30²).
The target values for the material removal rate (y1) and non-uniformity (y2) are 2200 and 400, respectively, where the index t denotes the batch number passing through the CMP system.
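As a rough illustration of this setup, the batch process can be simulated as follows. The source gives only the targets (2200, 400), the noise distributions, and the control bounds; the linear plant form, the gain matrix B, and the bias vector below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear batch model (assumed; the source gives no process
# equations). Rows: y1 (removal rate), y2 (non-uniformity); columns: the
# three controls mu1 (platen speed), mu2 (back pressure), mu3 (downforce).
B = np.array([[300.0, 150.0, 200.0],
              [-80.0,  40.0, -60.0]])
BIAS = np.array([1800.0, 500.0])        # nominal outputs at zero control (assumed)
TARGETS = np.array([2200.0, 400.0])     # targets for y1 and y2 (from the text)

def step(u):
    """One CMP batch: outputs for controls u, clipped to the [-1, 1] scenario."""
    u = np.clip(u, -1.0, 1.0)
    noise = rng.normal(0.0, [60.0, 30.0])   # eps1 ~ N(0, 60^2), eps2 ~ N(0, 30^2)
    return BIAS + B @ u + noise

y = step([0.5, -0.2, 0.1])
err = y - TARGETS   # tracking error for batch t
```

A controller would observe `err` after each batch and choose the next control vector accordingly.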
Research Results
The study compares Q-Learning-based control with traditional self-tuning controllers. The results show that Q-Learning provides more stable and better-optimized control, tracking the target values for removal rate and non-uniformity more closely.
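The Q-Learning control described above can be sketched as a minimal tabular loop. The paper's exact state discretization, reward function, and plant model are not given, so the error bins, reward, gain matrix B, and bias below are illustrative assumptions (using the [-1, 1] control range):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Assumed linear plant standing in for the CMP process (not from the source).
B = np.array([[300.0, 150.0, 200.0],
              [-80.0,  40.0, -60.0]])
BIAS = np.array([1800.0, 500.0])
TARGETS = np.array([2200.0, 400.0])

def plant(u):
    return BIAS + B @ u + rng.normal(0.0, [60.0, 30.0])

# 27 actions: each control mu1..mu3 takes a value in {-1, 0, 1}.
ACTIONS = np.array(list(product([-1.0, 0.0, 1.0], repeat=3)))

def state_of(y):
    # Discretize each output's signed tracking error into 3 bins -> 9 states.
    bins = [int(np.sign(np.round(e / 100.0))) + 1 for e in y - TARGETS]
    return bins[0] * 3 + bins[1]

Q = np.zeros((9, len(ACTIONS)))
alpha, gamma, eps = 0.2, 0.9, 0.1       # learning rate, discount, exploration

s = state_of(plant(np.zeros(3)))
for t in range(5000):                   # batches through the CMP system
    a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
    y = plant(ACTIONS[a])
    # Reward: negative tracking error, scaled by the noise std of each output.
    r = -np.abs((y - TARGETS) / [60.0, 30.0]).sum()
    s2 = state_of(y)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
```

After training, the greedy action `ACTIONS[Q[s].argmax()]` for the current error state gives the control settings for the next batch.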
Applying Q-Learning Control (Range [-3, 3])
Applying Q-Learning Control (Range [-1, 1])