WASAPI: AUTOCONVERTPCM (#1097) produces silent input streams on Windows 11 24H2 Communications-class endpoints

## Summary

Since cpal v0.17.2, `default_input_config()` for "Communications-class" USB microphones on Windows 11 24H2 returns `16 kHz mono F32` (the system Communications mix format), and the resulting WASAPI capture stream delivers genuine zero/near-zero samples — i.e. silence at the noise floor — while the **same physical microphone records normal speech levels via DirectShow on the same machine at the same moment**.

The regression bisects to PR #1097 — *"wasapi: Enable resampling and rate adjustment"* (merged 2026-01-29, released in v0.17.2 on 2026-02-08). My downstream users started reporting *"audio looks like it's capturing, but the files are basically silent with a bit of white noise"* exactly when their auto-updater pulled them through the cpal-bump release.

A precise measurement, same mic + same speaker + same minute:

| Path | Mean | Peak |
|---|---|---|
| `ffmpeg -f dshow -i audio="<mic>"` | **-28.6 dB** | **-3.3 dB** (normal speech) |
| cpal WASAPI (default_input_config + build_input_stream) | **-85.5 dB** | **-42.9 dB** (noise floor) |

That's a ~57 dB drop in mean level (~40 dB in peak), i.e. roughly 700× lower amplitude on average. It's not a gain offset; the samples are *genuinely zero or within ±1 LSB of it*, not misinterpreted bytes from a format mismatch (see hex dump below).
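
For anyone who wants to recheck the level gap, here is a minimal sketch that recomputes it from the table values and shows how mean/peak levels can be derived from `s16` samples. The function names are hypothetical, and it assumes the "Mean" column is an RMS level in dB relative to full scale, which may not match the tool used for the measurement.

```rust
/// Convert a level difference in dB into a linear amplitude ratio.
fn db_to_amplitude_ratio(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}

/// RMS ("mean") and peak level of s16 samples, in dB relative to full scale.
/// Assumption: "mean" is an RMS level. Empty input reads as -inf.
fn levels_dbfs(samples: &[i16]) -> (f64, f64) {
    if samples.is_empty() {
        return (f64::NEG_INFINITY, f64::NEG_INFINITY);
    }
    let full_scale = 32768.0_f64;
    let mut sum_sq = 0.0_f64;
    let mut peak = 0.0_f64;
    for &s in samples {
        let x = f64::from(s) / full_scale;
        sum_sq += x * x;
        peak = peak.max(x.abs());
    }
    let rms = (sum_sq / samples.len() as f64).sqrt();
    (20.0 * rms.log10(), 20.0 * peak.log10())
}

fn main() {
    // Values copied from the table above (DirectShow minus cpal WASAPI).
    let mean_delta_db = -28.6 - (-85.5);
    let peak_delta_db = -3.3 - (-42.9);
    println!(
        "mean: {:.1} dB lower, about {:.0}x in amplitude",
        mean_delta_db,
        db_to_amplitude_ratio(mean_delta_db)
    );
    println!(
        "peak: {:.1} dB lower, about {:.0}x in amplitude",
        peak_delta_db,
        db_to_amplitude_ratio(peak_delta_db)
    );

    // Levels of a short near-zero run like the one in the WASAPI capture.
    let (mean, peak) = levels_dbfs(&[0, 1, 0, -1]);
    println!("near-zero run: mean {:.1} dBFS, peak {:.1} dBFS", mean, peak);
}
```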

## Environment

- **OS:** Windows 11 Pro 24H2 (build 10.0.26200, Insider)
- **cpal:** v0.18.0 (downstream fork pinned to a commit based on upstream `main`; behavior is the same as v0.17.2+)
- **Hardware (reproduced on both):** USB headset (Jabra Evolve 75) and USB webcam (Logi C270 HD WebCam)
- **Working baseline:** Same machine, same mics, ffmpeg via DirectShow → normal speech levels
- **Not affected on the same machine:** Built-in `Microphone Array (Intel Smart Sound Technology)` — exposes 48 kHz stereo via WASAPI and records normally. Only the USB Communications-class endpoints are silent.

## Reproduction

1. Use a USB headset or USB webcam mic that Windows registers as a Communications-class endpoint on Win11 24H2 (verifiable: `mmsys.cpl` → Recording → properties → the device is set as both Default Device AND Default Communications Device).
2. Enumerate via cpal:

```rust
use cpal::traits::{DeviceTrait, HostTrait};
fn main() {
    let host = cpal::default_host();
    for d in host.input_devices().unwrap() {
        let name = d.name().unwrap_or("?".into());
        println!("=== {} ===", name);
        if let Ok(c) = d.default_input_config() {
            println!("  default: {:?} {} ch @ {} Hz",
                c.sample_format(), c.channels(), c.sample_rate().0);
        }
        if let Ok(configs) = d.supported_input_configs() {
            for c in configs {
                println!("  supported: {:?} {} ch @ {}-{} Hz",
                    c.sample_format(), c.channels(),
                    c.min_sample_rate().0, c.max_sample_rate().0);
            }
        }
    }
}
```

Output on the affected machine:

```
=== Microphone (Logi C270 HD WebCam) ===
  default: F32 1 ch @ 16000 Hz
  supported: F32 1 ch @ 16000-16000 Hz
  supported: I32 1 ch @ 16000-16000 Hz
  supported: I16 1 ch @ 16000-16000 Hz
  supported: U8  1 ch @ 16000-16000 Hz

=== Headset (Jabra Evolve 75) ===
  default: F32 1 ch @ 16000 Hz
  supported: F32 1 ch @ 16000-16000 Hz
  ... same as above
```

Note: cpal exposes **only 16 kHz** for these devices — *which is not a native hardware rate*. `ffmpeg -f dshow -list_options true` for the same devices lists 8000 / 11025 / 22050 / 32000 / 44100 / 48000 / 96000 Hz × 1/2 ch × 8/16-bit. **16 kHz is the Windows Communications-class mix format**, and AUTOCONVERTPCM is what makes WASAPI accept that rate via server-side resampling.

3. Build an input stream with `default_input_config()` and dump samples. Result: stream callbacks fire at the expected rate, but every sample value is 0, ±1, or extremely-near-zero noise. Decoded as `s16le`, the first 256 bytes of one capture look like:

```
00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 FF FF 00 00 00 00 01 00
01 00 00 00 00 00 FF FF 00 00 01 00 00 00 00 00 00 00 00 00 00 00 01 00
00 00 00 00 00 00 FF FF 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[…continues with the same near-zero pattern]
```

For comparison, the **same mic captured via DirectShow** in the same second (decoded to `s16le`):

```
C2 FE 53 FE 87 FE A0 FE EE FE 4F FF 7E FF D9 FF 20 00 23 00 83 00 1A 01
4E 01 5F 01 8D 01 BF 01 F3 01 1F 02 3C 02 5D 02 8E 02 CF 02 1C 03 73 03
[…normal speech signal continues]
```

The cpal samples are not misinterpreted bytes from a format mismatch — they are genuinely zero. The format negotiation succeeds; the stream just doesn't carry any signal.
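
To make the two dumps easier to compare, here is a small sketch that decodes the first row of each as `s16le` and prints the sample values. The helper names are hypothetical, and the input strings are just the first row of each dump above, not the full captures.

```rust
/// Parse whitespace-separated hex bytes (as in the dumps above) into raw bytes.
/// Tokens that are not valid hex bytes are skipped.
fn parse_hex_dump(dump: &str) -> Vec<u8> {
    dump.split_whitespace()
        .filter_map(|tok| u8::from_str_radix(tok, 16).ok())
        .collect()
}

/// Interpret bytes as little-endian signed 16-bit samples.
/// A trailing odd byte, if any, is dropped.
fn decode_s16le(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// Largest absolute sample value (0 for empty input).
fn peak_abs(samples: &[i16]) -> u16 {
    samples.iter().map(|s| s.unsigned_abs()).max().unwrap_or(0)
}

fn main() {
    // First row of each dump, copied verbatim.
    let wasapi_row = "00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 FF FF 00 00 00 00 01 00";
    let dshow_row = "C2 FE 53 FE 87 FE A0 FE EE FE 4F FF 7E FF D9 FF 20 00 23 00 83 00 1A 01";

    let wasapi = decode_s16le(&parse_hex_dump(wasapi_row));
    let dshow = decode_s16le(&parse_hex_dump(dshow_row));

    println!("WASAPI:     {:?} (peak |x| = {})", wasapi, peak_abs(&wasapi));
    println!("DirectShow: {:?} (peak |x| = {})", dshow, peak_abs(&dshow));
}
```

In the WASAPI row every sample is 0 or ±1, while the DirectShow row swings through a few hundred counts, which is what a real waveform looks like at this scale.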

## Why I believe PR #1097 is the cause

- The change in #1097 enables `AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM` in WASAPI `Initialize` so non-native rates can be requested through the server-side resampler.
- The PR thread (https://github.com/RustAudio/cpal/pull/1097) acknowledges this flag was non-standard prior to Windows 10, with no testing reported on Win11 24H2 Communications-class endpoints.
- On Win11 24H2 specifically, the WASAPI audio engine appears to apply a privacy/Communications policy when a non-Communications consumer opens a Communications-class endpoint at the Communications mix format (16 kHz F32 mono): `Initialize` succeeds, the stream "plays," callbacks fire — but the samples delivered are zero.
- Reverting to v0.15.3 (a release that predates AUTOCONVERTPCM) restores normal capture on the exact same hardware. (We confirmed the timeline: our downstream stopped working when users were rolled past the cpal v0.17.2+ release; no other audio-code changes correlate.)
- The Intel Smart Sound mic array on the same machine is NOT a Communications-class endpoint, exposes 48 kHz stereo via WASAPI (without AUTOCONVERTPCM in play), and records normally.

## Suggested fix directions

1. **Gate AUTOCONVERTPCM behind an opt-in flag** rather than always-on. The PR's stated goal (issue #593) was solving a failure at stream-build time when users request non-native rates; AUTOCONVERTPCM is one valid solution, but for callers who request the device's native rate (or who use `default_input_config()` expecting a usable stream), the flag introduces silent-failure risk on Win11 24H2.
2. **Or: probe for silent streams during stream setup.** A 100–500 ms post-`Start` check: if RMS stays at or below the noise floor over the first N buffers (the affected streams are not bit-exact zero, they carry ±1 LSB noise), retry with AUTOCONVERTPCM off and use the device's actual hardware mix format from `GetMixFormat` on the endpoint's `eMultimedia` role (instead of `eCommunications`).
3. **Or: pick the endpoint role explicitly.** `IMMDeviceEnumerator::GetDefaultAudioEndpoint(eCapture, eMultimedia)` returns a different audio session policy than `eCommunications`, even for the same physical device. cpal currently doesn't expose role selection; exposing it (or defaulting to `eMultimedia` for non-RT use cases) sidesteps the policy gate entirely.
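
A rough sketch of what the probe in option 2 could look like, independent of cpal internals. `SilenceProbe`, `ProbeVerdict`, and the -80 dBFS threshold are all hypothetical; the threshold is a guess sitting just above the -85.5 dB mean measured here, and a mic in a genuinely quiet room could trip it, so any retry triggered by it would need to be harmless.

```rust
/// RMS level of f32 samples (nominal range -1.0..=1.0) in dBFS.
/// Empty or all-zero input reads as -inf.
fn rms_dbfs(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return f32::NEG_INFINITY;
    }
    let mean_sq = samples.iter().map(|&s| s * s).sum::<f32>() / samples.len() as f32;
    10.0 * mean_sq.log10()
}

#[derive(Debug, PartialEq)]
enum ProbeVerdict {
    /// Not enough buffers seen yet.
    Pending,
    /// Every probed buffer stayed at or below the threshold.
    Silent,
    /// This buffer exceeded the threshold, so the stream carries signal.
    Signal,
}

/// Hypothetical post-`Start` probe fed from the input callback.
struct SilenceProbe {
    threshold_dbfs: f32,
    buffers_needed: usize,
    buffers_seen: usize,
}

impl SilenceProbe {
    fn new(threshold_dbfs: f32, buffers_needed: usize) -> Self {
        Self { threshold_dbfs, buffers_needed, buffers_seen: 0 }
    }

    /// Feed one callback buffer and get the verdict so far.
    fn feed(&mut self, buffer: &[f32]) -> ProbeVerdict {
        if rms_dbfs(buffer) > self.threshold_dbfs {
            return ProbeVerdict::Signal;
        }
        self.buffers_seen += 1;
        if self.buffers_seen >= self.buffers_needed {
            ProbeVerdict::Silent
        } else {
            ProbeVerdict::Pending
        }
    }
}

fn main() {
    // Assumed numbers: 3 buffers of 160 frames (10 ms each at 16 kHz mono).
    let mut probe = SilenceProbe::new(-80.0, 3);
    let near_zero = [0.00003_f32; 160]; // about 1 LSB of s16, expressed as f32
    for i in 0..3 {
        println!("buffer {}: {:?}", i, probe.feed(&near_zero));
    }
}
```

On a `Silent` verdict the backend would tear the stream down and retry without AUTOCONVERTPCM; that retry path is not sketched here because it depends on how cpal structures its WASAPI stream setup.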

Happy to test patches against the affected hardware here. cc @yeah-its-gloria @roderickvd.

## Downstream context

We're [screenpipe](https://github.com/screenpipe/screenpipe), a Rust + Tauri app that records audio + accessibility text continuously. We started seeing user reports immediately after our auto-updater rolled cpal v0.17.2+ to Windows users. Diagnosis credit goes to one of our users (William Lucas), who built the DirectShow baseline + WASAPI hex dump to isolate the regression to the cpal capture layer.
