
Reinforcement Learning in Multi-Input Multi-Output Semiconductor Process Control

This project focuses on using deep reinforcement learning to control multi-input multi-output (MIMO) batch processes in semiconductor manufacturing, specifically within the Chemical-Mechanical Polishing (CMP) process.

Research Background and Motivation

1. Semiconductor Manufacturing Challenges:

   The semiconductor industry is continually innovating to develop larger wafers and finer line widths. A key challenge in semiconductor manufacturing is controlling the process parameters effectively to enhance product quality and yield. This includes ensuring the flatness of wafers during manufacturing, which is critical for subsequent steps like lithography.

2. CMP Process:

   Chemical-Mechanical Polishing (CMP) is a crucial process in semiconductor manufacturing that ensures the flatness of the wafer surface by polishing it through a combination of mechanical grinding and chemical etching.

Application in CMP Process

Reinforcement Learning Model: The model adjusts the control variables (e.g., platen speed, back pressure) to optimize the process outputs. This approach is advantageous over traditional methods because it handles the nonlinear and dynamic nature of the CMP process more effectively.

[Figure: MIMO CMP process model with inputs μ1, μ2, μ3 and outputs y1, y2]

The model includes two indicators: material removal rate (y1) and non-uniformity (y2).

It also involves three controllable variables: μ1, μ2, and μ3, which represent the platen speed, back pressure, and polishing downforce, respectively.

These controllable variables are constrained to one of two ranges, [-1, 1] or [-3, 3], with both settings evaluated in the results below. Additionally, it is assumed that ε1,t ~ N(0, 60²) and ε2,t ~ N(0, 30²), representing white noise.

The target values for the material removal rate (y1) and non-uniformity (y2) are set at 2200 and 400, respectively, where the index t denotes the batch number passing through the CMP system.
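To make this setup concrete, below is a minimal Python sketch of a simulated MIMO CMP batch process under these assumptions. The post does not reproduce the actual process equations, so the gain matrix G and the baseline b are hypothetical placeholders; only the noise levels, control variables, and target values come from the description above.

```python
import numpy as np

# Hypothetical linear gains from (mu1, mu2, mu3) = (platen speed, back
# pressure, polishing downforce) to (y1, y2); placeholder values only.
G = np.array([[300.0, 150.0, 100.0],
              [-50.0,  80.0, -40.0]])
b = np.array([2000.0, 450.0])          # assumed nominal operating point

TARGETS = np.array([2200.0, 400.0])    # targets for y1 (removal rate) and y2 (non-uniformity)
NOISE_SD = np.array([60.0, 30.0])      # eps1,t ~ N(0, 60^2), eps2,t ~ N(0, 30^2)

def cmp_batch(u, rng):
    """Run one batch with control input u = (mu1, mu2, mu3)."""
    eps = rng.normal(0.0, NOISE_SD)    # white-noise disturbance for this batch
    return b + G @ u + eps             # (y1, y2) for batch t

rng = np.random.default_rng(0)
print(cmp_batch(np.array([0.5, -0.2, 0.1]), rng))  # one simulated batch
```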

Steps

[Figure: steps of the control procedure]
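As a rough illustration of what such a control loop can look like in code, here is a minimal tabular Q-learning sketch for this setup. It reuses cmp_batch, TARGETS, and NOISE_SD from the simulator sketch above, and the state/action discretization, reward function, and hyperparameters (alpha, gamma, epsilon) are illustrative assumptions rather than the study's actual choices.

```python
import itertools
import numpy as np

# Coarse action grid over (mu1, mu2, mu3), shown for the [-1, 1] range case;
# a real controller would use a finer grid.
ACTIONS = list(itertools.product([-1.0, 0.0, 1.0], repeat=3))

def discretize(y):
    # Bucket each output's deviation from target (in noise-SD units) into {-1, 0, +1}.
    return tuple(int(np.clip(round(d), -1, 1)) for d in (y - TARGETS) / NOISE_SD)

def reward(y):
    # Negative squared deviation from the targets, scaled by noise SD (assumed reward).
    return -float(np.sum(((y - TARGETS) / NOISE_SD) ** 2))

rng = np.random.default_rng(0)
Q = {}                                   # Q[(state, action_index)] -> value
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # assumed hyperparameters
state = discretize(cmp_batch(np.zeros(3), rng))

for t in range(5000):                    # t indexes batches through the CMP system
    if rng.random() < epsilon:           # epsilon-greedy exploration
        a = int(rng.integers(len(ACTIONS)))
    else:
        a = int(np.argmax([Q.get((state, i), 0.0) for i in range(len(ACTIONS))]))
    y = cmp_batch(np.array(ACTIONS[a]), rng)
    next_state, r = discretize(y), reward(y)
    best_next = max(Q.get((next_state, i), 0.0) for i in range(len(ACTIONS)))
    q_sa = Q.get((state, a), 0.0)
    Q[(state, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)   # Q-learning update
    state = next_state
```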

Research Results

The study compares Q-learning-based control with traditional self-tuning controllers. The results show that Q-learning provides more stable and better-optimized control, tracking the target values for removal rate and non-uniformity more closely.
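The exact comparison criterion is not restated here, but one simple way to quantify adherence to target values is the root-mean-square deviation of each output from its target over the controlled batches, computed for both controllers (an assumed metric, not necessarily the study's):

```python
import numpy as np

def rms_deviation(ys, target):
    """RMS deviation of a batch-output history from its target value."""
    ys = np.asarray(ys, dtype=float)
    return float(np.sqrt(np.mean((ys - target) ** 2)))

# e.g., compare rms_deviation(y1_history, 2200.0) and rms_deviation(y2_history, 400.0)
# between the Q-learning controller and a self-tuning controller
# (y1_history and y2_history are hypothetical recorded outputs).
```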

Applying Q-Learning Control (Range [-3, 3])

[Figures: Q-learning control results for the control range [-3, 3]]

Applying Q-Learning Control (Range [-1, 1])

[Figures: Q-learning control results for the control range [-1, 1]]

© YuKai Huang, 2025
