[Feature request] Additional API parameters for VAD threshold/silence and prompt reset to improve long audio punctuation quality #332

Hi, thanks for maintaining this fantastic project. We're using `whisper-asr-webservice` with the `faster-whisper` backend and have been seeing great results overall.

I’ve encountered a known Whisper behavior in long audio files: over time, the transcription loses punctuation and capitalization, outputting lower-quality, all-lowercase text. This appears to happen when the model drifts without context reset or when audio is segmented too aggressively or not enough.

Results varied between the two GPUs tested (RTX 3060 and RTX 3090). A colleague was able to transcribe 10 test files successfully on the RTX 3060 with the `small.en` model using the following settings:

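```python
vad_threshold = 0.9  # speech probability threshold (the RTX 3090 worked great with 0.7)
vad_min_silence = 10000  # presumably milliseconds, as in faster-whisper's min_silence_duration_ms
asr_prompt_reset_on_temperature = 0.3  # faster-whisper's own default for prompt_reset_on_temperature is 0.5
```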

It would be great to support these parameters as part of the API request. This would allow for more flexible handling of different hardware environments and audio types.
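To make the request concrete, here is a minimal sketch of how such request parameters might be translated into keyword arguments for faster-whisper's `WhisperModel.transcribe()`. The helper name `build_transcribe_kwargs` and the request parameter names are hypothetical and not part of the current webservice. The target names (`vad_filter`, `vad_parameters` with `threshold` and `min_silence_duration_ms`, and `prompt_reset_on_temperature`) are existing faster-whisper options. Treating `vad_min_silence` as milliseconds and enabling `vad_filter` whenever a VAD value is supplied are assumptions.

```python
from typing import Any, Dict, Optional


def build_transcribe_kwargs(
    vad_threshold: Optional[float] = None,
    vad_min_silence: Optional[int] = None,
    asr_prompt_reset_on_temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Map optional (hypothetical) API parameters to faster-whisper transcribe() kwargs.

    Parameters left as None are omitted so the backend defaults still apply.
    """
    kwargs: Dict[str, Any] = {}
    vad_parameters: Dict[str, Any] = {}

    if vad_threshold is not None:
        # The VAD threshold is a speech probability, so it must lie in [0, 1].
        if not 0.0 <= vad_threshold <= 1.0:
            raise ValueError("vad_threshold must be between 0 and 1")
        vad_parameters["threshold"] = vad_threshold

    if vad_min_silence is not None:
        # Assumption: the value is given in milliseconds.
        if vad_min_silence < 0:
            raise ValueError("vad_min_silence must be non-negative")
        vad_parameters["min_silence_duration_ms"] = vad_min_silence

    if vad_parameters:
        # Assumption: supplying any VAD setting implies the VAD filter is wanted.
        kwargs["vad_filter"] = True
        kwargs["vad_parameters"] = vad_parameters

    if asr_prompt_reset_on_temperature is not None:
        kwargs["prompt_reset_on_temperature"] = asr_prompt_reset_on_temperature

    return kwargs


# Settings reported to work on the RTX 3060 with small.en.
rtx3060_kwargs = build_transcribe_kwargs(
    vad_threshold=0.9,
    vad_min_silence=10000,
    asr_prompt_reset_on_temperature=0.3,
)
```

The resulting dictionary could then be passed through as `model.transcribe(audio, **rtx3060_kwargs)`. Because unset parameters are left out, existing requests would keep the current behavior.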
