Implementing AI in Automated Optical Defect Classification
Artificial intelligence is changing both what semiconductors can do and how they are manufactured. Implementing AI in automated optical defect classification sits at the intersection of these two trends, where AI algorithms, specialized hardware architectures, and advanced fabrication technologies converge.
At INDNIX Technology, our AI division operates at this intersection, both developing AI-specific silicon and deploying AI within our manufacturing operations.
The Computational Demands of Modern AI
The computational requirements of AI workloads differ fundamentally from traditional computing:
Parallelism: Neural network inference and training are inherently parallel operations. A single layer of a transformer model may require billions of independent multiply-accumulate (MAC) operations that can theoretically execute simultaneously. Traditional CPUs, optimized for sequential instruction execution, are poorly suited to this workload.
Memory Bandwidth: Large AI models are often memory-bandwidth limited rather than compute-limited. A GPT-class model with 175 billion parameters requires 350 GB of storage (at 16-bit precision) just for the weights. During unbatched inference, every parameter must be read from memory for each generated token, so at 1 TB/s a single stream is limited to roughly three tokens per second (1 TB/s divided by 350 GB per token). Real-time text generation therefore requires aggregate memory bandwidth well above 1 TB/s.
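The arithmetic behind the memory-bandwidth point can be checked with a few lines of Python. This is a rough sketch under simplifying assumptions: one unbatched stream, every weight read exactly once per token, and no cache reuse. The function names are ours and purely illustrative.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone."""
    return num_params * bits_per_param // 8


def max_tokens_per_second(num_params: int, bits_per_param: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode rate when every weight is read once per token."""
    return bandwidth_bytes_per_s / weight_bytes(num_params, bits_per_param)


PARAMS = 175_000_000_000   # 175 billion parameters, as in the text
GB = 10**9
TB = 10**12

print(weight_bytes(PARAMS, 16) / GB)                      # 350.0
print(round(max_tokens_per_second(PARAMS, 16, TB), 2))    # 2.86
print(round(max_tokens_per_second(PARAMS, 4, TB), 2))     # 11.43
```

The last line shows why reduced precision matters for bandwidth as well as compute: 4-bit weights cut the bytes read per token by a factor of four.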
Precision Flexibility: Unlike scientific computing that requires 64-bit floating-point precision, many AI workloads tolerate reduced precision. Training typically uses 16-bit or mixed 16/32-bit precision. Inference can often use 8-bit integers or even 4-bit quantized weights with minimal accuracy degradation. Hardware architectures that support multiple precision modes achieve significantly higher throughput per watt.
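To make reduced precision concrete, here is a minimal sketch of symmetric per-tensor quantization, one common scheme among several. It is not a description of our production quantizer; the weight values and function names are illustrative assumptions.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (bits - 1) - 1                     # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [x * scale for x in q]


weights = [0.82, -0.41, 0.05, -1.27, 0.33]         # illustrative values only
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"INT{bits}: codes={q} worst-case error={worst:.4f}")
```

Away from the clipping limits, the worst-case rounding error is about half the scale step, which is why INT4 is noticeably coarser than INT8 on the same tensor.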
AI Inspection Architecture and Design
Our approach to AI inspection hardware design focuses on maximizing throughput per watt, the key metric for both data center deployments (where power is the dominant operating cost) and edge deployments (where battery life is the constraint).
Compute Array Architecture
The core of our AI accelerator is a systolic array of processing elements (PEs) optimized for matrix multiplication — the dominant operation in neural networks. Each PE performs a multiply-accumulate operation per clock cycle, and the array is organized to maximize data reuse:
- Weight stationary dataflow: Weights are loaded once into the PE array and reused across multiple input activations, minimizing memory bandwidth for weight-dominated workloads (inference)
- Output stationary dataflow: Partial sums accumulate within each PE before being written to memory, minimizing memory traffic for output-dominated workloads (training)
- Flexible dataflow (including row stationary): The PE architecture can switch dynamically between dataflows based on layer dimensions, so each layer in a multi-layer network runs with the dataflow that suits it best
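The difference between weight-stationary and output-stationary dataflows comes down to loop order. The sketch below is a software model only, not cycle-accurate and not our hardware scheduler: it computes the same matrix product both ways and counts weight fetches and output writes. The counter names are illustrative.

```python
def matmul_weight_stationary(A, W):
    """Hold each weight in place and stream every activation row past it."""
    M, K, N = len(A), len(W), len(W[0])
    out = [[0] * N for _ in range(M)]
    stats = {"weight_loads": 0, "output_writes": 0}
    for k in range(K):
        for n in range(N):
            w = W[k][n]                    # fetched once, then reused M times
            stats["weight_loads"] += 1
            for m in range(M):
                out[m][n] += A[m][k] * w   # partial sum updated in memory
                stats["output_writes"] += 1
    return out, stats


def matmul_output_stationary(A, W):
    """Keep each partial sum local and write the finished output once."""
    M, K, N = len(A), len(W), len(W[0])
    out = [[0] * N for _ in range(M)]
    stats = {"weight_loads": 0, "output_writes": 0}
    for m in range(M):
        for n in range(N):
            acc = 0                        # accumulator stays inside the PE
            for k in range(K):
                acc += A[m][k] * W[k][n]
                stats["weight_loads"] += 1
            out[m][n] = acc
            stats["output_writes"] += 1
    return out, stats


A = [[1, 2, 3], [4, 5, 6]]                 # 2 x 3 activations
W = [[1, 0], [0, 1], [1, 1]]               # 3 x 2 weights
print(matmul_weight_stationary(A, W))
print(matmul_output_stationary(A, W))
```

Both functions return the same product, but the weight-stationary version fetches each weight once (6 loads, 12 output updates), while the output-stationary version writes each output once (12 loads, 4 writes).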
On-Chip Memory Hierarchy
Our AI accelerator includes a multi-level on-chip memory hierarchy designed to keep the compute array fed with data:
- Register files within each PE provide single-cycle access to actively used weights and activations
- Local SRAM buffers (256 KB per PE cluster) store weights and activations for the current layer
- Global SRAM buffer (16 to 64 MB) stores inter-layer activation data, eliminating round-trips to external DRAM
- Compression engine applies lossless compression to activation data, effectively doubling SRAM capacity for sparse models
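As an illustration of why lossless compression pays off on sparse activations, here is a minimal zero-run-length encoder. The scheme is an assumption chosen for clarity; the text above does not specify which algorithm the compression engine uses, and the achievable ratio depends on how sparse the data is.

```python
def compress_zero_runs(values):
    """Encode a sequence as (zeros_before, nonzero_value) pairs plus a trailing zero count."""
    encoded, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            encoded.append((run, v))
            run = 0
    return encoded, run                    # run now holds the trailing zeros


def decompress_zero_runs(encoded, trailing_zeros):
    """Exact inverse of compress_zero_runs."""
    out = []
    for run, v in encoded:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing_zeros)
    return out


activations = [0, 0, 3, 0, 0, 0, 7, 1, 0, 0]   # e.g. post-ReLU values, illustrative
encoded, tail = compress_zero_runs(activations)
print(encoded, tail)                            # [(2, 3), (3, 7), (0, 1)] 2
assert decompress_zero_runs(encoded, tail) == activations
```

Dense data gains nothing from this encoding, since every value still needs its own pair, which is why the benefit is tied to sparse models.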
Defect Classification Optimization
Our hardware architecture is specifically optimized for the operations that defect classification models rely on:
- Support for INT8, INT4, and binary precision modes with automatic precision selection based on model requirements
- Hardware support for common activation functions (ReLU, GELU, Sigmoid, Softmax) with single-cycle execution
- Dedicated tensor reshape and transpose units that handle data layout transformations without consuming compute cycles
- Hardware attention mechanism support for transformer architectures, computing scaled dot-product attention in dedicated functional units
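For reference, scaled dot-product attention is softmax(QK^T / sqrt(d_k)) V. The pure-Python sketch below shows the computation a dedicated functional unit would carry out; it is a readable reference model with illustrative inputs, not a description of the hardware datapath.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


Q = [[0.0, 0.0], [4.0, 0.0]]               # two queries, illustrative values
K = [[1.0, 0.0], [0.0, 1.0]]               # two keys
V = [[1.0, 2.0], [3.0, 4.0]]               # two values
print(scaled_dot_product_attention(Q, K, V))
```

The first query is all zeros, so it weights both keys equally and returns the mean of the value rows; the second query aligns with the first key and pulls the output toward the first value row.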
AI in Semiconductor Manufacturing
Beyond designing AI hardware, we deploy AI extensively within our own manufacturing operations:
Defect Detection and Classification
Our AI-powered inspection systems use convolutional neural networks trained on millions of defect images to classify inspection results with accuracy exceeding 99.5%. The system distinguishes between 47 defect categories — from killer defects that cause device failure to cosmetic defects that do not affect functionality — enabling automated disposition decisions that reduce human review workload by 90%.
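Automated disposition can be sketched as a small routing rule applied to the classifier output. Everything specific here is a hypothetical placeholder rather than our production configuration: the category names, the 0.95 confidence threshold, and the disposition labels.

```python
KILLER_CATEGORIES = {"metal_bridge", "open_line"}   # hypothetical category names
REVIEW_THRESHOLD = 0.95                             # assumed confidence cutoff


def disposition(category: str, confidence: float) -> str:
    """Route one classified defect to review, a killer flag, or automatic pass."""
    if confidence < REVIEW_THRESHOLD:
        return "human_review"      # model is unsure, so a person decides
    if category in KILLER_CATEGORIES:
        return "flag_killer"       # confident and yield-critical
    return "auto_pass"             # confident and cosmetic


def review_fraction(predictions) -> float:
    """Share of predictions that still need a human look."""
    routed = sum(1 for c, p in predictions if disposition(c, p) == "human_review")
    return routed / len(predictions)


batch = [
    ("metal_bridge", 0.99),
    ("surface_stain", 0.97),
    ("open_line", 0.62),
    ("surface_stain", 0.98),
]
print([disposition(c, p) for c, p in batch])
print(review_fraction(batch))      # 0.25
```

Raising the threshold sends more images to human review and lowers the risk of a confident misclassification; the right setting depends on how costly a missed killer defect is.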
Process Optimization
Reinforcement learning agents continuously optimize process parameters (etch times, deposition temperatures, implant doses) to maximize yield. These agents explore the parameter space through controlled experiments, learning optimal settings 10x faster than traditional design-of-experiments (DOE) approaches.
Predictive Yield Modeling
Gradient-boosted decision tree models predict wafer-level yield from inline metrology measurements with accuracy exceeding 90%. These predictions enable early wafer disposition — scrapping wafers predicted to have very low yield before they consume expensive downstream processing, saving 5 to 10% of processing costs.
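The early-disposition decision reduces to an expected-value comparison: scrap a wafer when the revenue expected from its good dies no longer covers the processing cost still to be spent on it. The sketch below assumes a simple cost model, and all figures are hypothetical.

```python
def should_scrap(predicted_yield: float, dies_per_wafer: int,
                 value_per_good_die: float, remaining_cost: float) -> bool:
    """True when expected revenue is below the cost of finishing the wafer."""
    expected_revenue = predicted_yield * dies_per_wafer * value_per_good_die
    return expected_revenue < remaining_cost


# Hypothetical wafer: 500 dies, 20 currency units per good die,
# 1500 units of downstream processing still to be spent.
print(should_scrap(0.05, 500, 20.0, 1500.0))   # True: 500 expected vs 1500 remaining
print(should_scrap(0.80, 500, 20.0, 1500.0))   # False: 8000 expected vs 1500 remaining
```

Because the yield prediction is itself uncertain, a practical rule would likely add a safety margin before scrapping, which this sketch leaves out.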
Equipment Predictive Maintenance
Long short-term memory (LSTM) networks analyze equipment sensor data time series to predict failures 3 to 14 days before they occur. This advance warning enables maintenance to be scheduled during planned downtime rather than causing unplanned production stoppages.
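Before a sequence model such as an LSTM can be trained, the sensor history has to be cut into fixed-length windows, each labeled by whether a failure follows within the warning horizon. The sketch below shows that preparation step only. The window length, horizon, and readings are illustrative assumptions, with one reading per time step.

```python
def make_windows(readings, failure_index, window, horizon):
    """Build (window, label) pairs from readings taken before a known failure.

    label is 1 when the failure occurs within `horizon` steps after the
    last reading in the window, otherwise 0.
    """
    samples = []
    for end in range(window, failure_index + 1):
        last = end - 1                          # index of the window's final reading
        steps_to_failure = failure_index - last
        label = 1 if steps_to_failure <= horizon else 0
        samples.append((readings[end - window:end], label))
    return samples


# Illustrative vibration readings; the failure happens at index 9.
readings = [0.9, 1.0, 1.1, 1.0, 1.2, 1.5, 1.9, 2.4, 3.0, 0.0]
samples = make_windows(readings, failure_index=9, window=3, horizon=3)
print([label for _, label in samples])          # [0, 0, 0, 0, 1, 1, 1]
print(samples[-1][0])                           # [1.9, 2.4, 3.0]
```

With daily aggregated readings, a horizon between 3 and 14 steps would correspond to the warning range quoted above.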
Fabrication Technology for AI Chips
AI accelerator chips place unique demands on fabrication technology:
- High metal layer count (10 to 15 layers) to support the dense interconnect fabric between thousands of processing elements
- High-bandwidth memory interfaces supporting HBM3 (6.4 Gbps per pin across a 1,024-bit stack interface) or GDDR6 (typically 14 Gbps per pin or higher)
- Advanced packaging using 2.5D interposer or fan-out technology to integrate the AI accelerator die with HBM memory stacks
- Thermal design supporting power dissipation of 200 to 400 watts in data center accelerators
Our fabrication and packaging capabilities support AI accelerator designs from edge devices consuming 500 milliwatts to data center accelerators dissipating 300 watts.
Conclusion
Implementing AI in automated optical defect classification spans both the design of AI-specific semiconductor hardware and the deployment of AI within semiconductor manufacturing operations. At INDNIX Technology, we pursue both tracks simultaneously: creating AI silicon that pushes the boundaries of computational efficiency while using AI to continuously improve the fabrication processes that produce that silicon. This virtuous cycle, in which chips built for AI are manufactured with the help of AI, represents the future of our industry.