Is it possible to smooth FC2 input? #108

Hi Developers,

Recently, when I tried to apply SmoothQuant to Qwen3-1.7B, I found that **FC2** (or **down_proj** in Qwen-style naming) is not included in the smoothed layers. However, I observed that the static per-tensor scaling factors for this layer's input can be **extremely large** if no smoothing is applied.

```
Layer 0: {'q_proj_input': 0.009227362204724409, 'o_proj_input': 0.021776574803149606, 'gate_input': 0.010765255905511811, 'down_input': 0.15748031496062992}

Layer 1: {'q_proj_input': 0.008427657480314961, 'o_proj_input': 0.011441929133858268, 'gate_input': 0.015071358267716535, 'down_input': 1.236220472440945}

Layer 2: {'q_proj_input': 0.009781003937007874, 'o_proj_input': 0.018331692913385825, 'gate_input': 0.023375984251968504, 'down_input': 133.03937007874015}

...

Layer 26: {'q_proj_input': 0.03297244094488189, 'o_proj_input': 2.031496062992126, 'gate_input': 0.022637795275590553, 'down_input': 11.21259842519685}

Layer 27: {'q_proj_input': 0.03641732283464567, 'o_proj_input': 3.0078740157480315, 'gate_input': 0.035679133858267716, 'down_input': 23.433070866141733}
```

As you can see, `down_input` refers to the per-tensor input scale of **down_proj** (which should correspond to fc2 in OPT). Across layers, the down_input scale grows far beyond the other scales (for example, about 133 at layer 2), which means the **outliers** here explode. As a result, the final perplexity is not acceptable (from ~16 for the original model to ~90 after quantization).
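To illustrate why such a scale hurts, here is a minimal NumPy sketch. It assumes symmetric int8 quantization with `scale = absmax / 127`, which is my reading of the numbers above (for example, 20 / 127 ≈ 0.1575 matches layer 0) and not something confirmed from the repository code. The helper names `per_tensor_scale` and `fake_quant` are hypothetical.

```python
import numpy as np

def per_tensor_scale(x, n_bits=8):
    # Assumed convention: symmetric quantization, scale = absmax / qmax.
    qmax = 2 ** (n_bits - 1) - 1  # 127 for int8
    return float(np.abs(x).max()) / qmax

def fake_quant(x, scale, n_bits=8):
    # Quantize to the integer grid and map back to floats.
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))   # "normal" activations, magnitude of a few units
x_out = x.copy()
x_out[0, 3] = 16896.0           # one outlier; 16896 / 127 matches the layer 2 scale

scale_out = per_tensor_scale(x_out)
xq = fake_quant(x_out, scale_out)

print("scale with outlier:", scale_out)
print("non-zero values after quantization:", np.count_nonzero(xq), "of", xq.size)
```

With a single per-tensor scale this large, every ordinary activation rounds to zero and only the outlier survives, which would be consistent with the perplexity degradation.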

If smoothing could be applied to this layer, I believe the result would improve a lot. Have you tried implementing that? Many thanks!
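To make the request concrete, here is a minimal sketch of what I have in mind, under my own assumptions and not taken from the repository. In a gated MLP the down_proj input is `act(gate_proj(x)) * up_proj(x)`; since up_proj is linear, a per-channel factor `1/s` can be folded into its output rows and compensated by multiplying the matching input columns of down_proj by `s`. The toy shapes, the injected outlier channel, and the function name `smooth_down_proj` are hypothetical, and biases are omitted.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def mlp(x, w_gate, w_up, w_down):
    # Gated MLP without biases: down_proj(silu(gate_proj(x)) * up_proj(x)).
    h = silu(x @ w_gate.T) * (x @ w_up.T)  # this is the down_proj input
    return h @ w_down.T, h

def smooth_down_proj(h_calib, w_up, w_down, alpha=0.5, eps=1e-5):
    # Per-channel smoothing factor in the SmoothQuant style:
    # s_j = max|h_j|^alpha / max|W_down[:, j]|^(1 - alpha)
    act_max = np.maximum(np.abs(h_calib).max(axis=0), eps)
    w_max = np.maximum(np.abs(w_down).max(axis=0), eps)
    s = act_max ** alpha / w_max ** (1.0 - alpha)
    # Fold 1/s into up_proj output rows, and s into down_proj input columns.
    return w_up / s[:, None], w_down * s[None, :], s

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
x = rng.normal(size=(32, d_model))
w_gate = rng.normal(size=(d_ff, d_model))
w_up = rng.normal(size=(d_ff, d_model))
w_down = rng.normal(size=(d_model, d_ff))
w_up[5] *= 200.0  # inject an outlier channel into the down_proj input

y_ref, h_ref = mlp(x, w_gate, w_up, w_down)
w_up_s, w_down_s, s = smooth_down_proj(h_ref, w_up, w_down)
y_smooth, h_smooth = mlp(x, w_gate, w_up_s, w_down_s)

print("outputs match:", np.allclose(y_ref, y_smooth))
print("down_proj input absmax before:", np.abs(h_ref).max())
print("down_proj input absmax after: ", np.abs(h_smooth).max())
```

The gate branch is left untouched because SiLU is nonlinear, so only the up_proj side can absorb the scaling exactly. In a real model the calibration statistics would come from calibration data rather than a single random batch.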


