tutorial whisper #3760
base: master
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| --- | ||
| title: Building speech to text with Whisper | ||
| logo: openai | ||
| category: ai | ||
| permalink: /tutorials/whisper | ||
| modified_at: 2026-05-26 | ||
| --- | ||
|
|
||
| Whisper is an automatic speech recognition model that converts speech to text. It was trained on a large, multilingual audio corpus, which makes it robust to different accents, background noise, and real-world conditions. As an open-source model, it is well suited for developers who want to integrate speech-to-text without depending entirely on a proprietary API. | ||
|
|
||
|
|
||
| Rather than calling an external SaaS API, the application can run Whisper in-process using `faster-whisper`, a reimplementation of Whisper built on the CTranslate2 inference engine. It keeps the same model family while improving inference speed and reducing memory usage. | ||
|
|
||
|
|
||
| In this tutorial, a small speech-to-text demo is deployed on Scalingo using a FastAPI backend, a minimal HTML/JavaScript frontend that records audio in the browser, and `faster-whisper` running on CPU in a single web container. | ||
|
|
||
|
|
||
| ## Planning your deployment | ||
|
|
||
| For this kind of application, it is recommended to start with an M container and move to a larger size if startup time or inference latency becomes an issue. The application warms the model in the background at startup and stores downloaded model files under `/tmp/models`. | ||
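The background warm-up described above could look roughly like the following. This is a minimal sketch, not the demo's actual code: `ModelHolder` and `load_whisper` are hypothetical names, and the `device="cpu"` / `compute_type="int8"` choices are assumptions. The loader is injected so the pattern can be exercised without downloading a model.

```python
import threading


class ModelHolder:
    """Holds the model and a readiness flag shared with request handlers (hypothetical helper)."""

    def __init__(self, loader):
        self._loader = loader            # callable that returns a loaded model
        self._ready = threading.Event()  # set once loading has succeeded
        self.model = None
        self.error = None

    def warm_in_background(self):
        # A daemon thread lets the web process start listening while the model loads.
        thread = threading.Thread(target=self._load, daemon=True)
        thread.start()
        return thread

    def _load(self):
        try:
            self.model = self._loader()
        except Exception as exc:  # keep the failure so a health check can report it
            self.error = exc
        else:
            self._ready.set()

    def is_ready(self):
        return self._ready.is_set()


def load_whisper(model_name="small", cache_dir="/tmp/models"):
    # Imported lazily so the sketch can be read and tested without the package installed.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantization, files cached under cache_dir.
    return WhisperModel(model_name, device="cpu", compute_type="int8", download_root=cache_dir)
```

At startup, something like `ModelHolder(load_whisper).warm_in_background()` would start the download and load, and a health handler could report `holder.is_ready()`.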
|
|
||
|
|
||
| The application supports two environment variables: `MODEL_USE`, which defaults to `small`, and `MODEL_CACHE_DIR`, which defaults to `/tmp/models`. `MODEL_USE=small` is a good starting point; move to a larger model only if better accuracy is required. | ||
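One way the application might read these two variables is sketched below. Only the variable names and their defaults come from the tutorial; the `Settings` class and `load_settings` function are hypothetical names used for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented variables."""
    model_use: str
    model_cache_dir: str


def load_settings(env=None):
    """Read MODEL_USE and MODEL_CACHE_DIR, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        model_use=env.get("MODEL_USE", "small"),
        model_cache_dir=env.get("MODEL_CACHE_DIR", "/tmp/models"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test, for example `load_settings({"MODEL_USE": "medium"})`.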
|
|
||
|
|
||
| ## Deploying the application | ||
|
|
||
| ### Using the command line | ||
|
|
||
| 1. Clone the repository: | ||
|
|
||
| ```bash | ||
| git clone https://github.com/Scalingo/whisper-speech-to-text | ||
|
Member
issue: the repository does not exist.
Contributor
Author
I am waiting for the tutorial to be validated before creating the repository. In the meantime, the code is available here: https://github.com/SC-Samir/whisper-scalingo |
||
| cd whisper-speech-to-text | ||
| ``` | ||
|
|
||
| 2. Create the application on Scalingo: | ||
|
|
||
| ```bash | ||
| scalingo create whisper-speech-to-text | ||
| ``` | ||
|
|
||
| The Scalingo command line automatically detects the Git repository and | ||
| adds a Git remote pointing to Scalingo: | ||
|
|
||
| ```bash | ||
| git remote -v | ||
|
|
||
| origin https://github.com/Scalingo/whisper-speech-to-text (fetch) | ||
| origin https://github.com/Scalingo/whisper-speech-to-text (push) | ||
| scalingo git@ssh.osc-fr1.scalingo.com:whisper-speech-to-text.git (fetch) | ||
| scalingo git@ssh.osc-fr1.scalingo.com:whisper-speech-to-text.git (push) | ||
| ``` | ||
|
|
||
| 3. Configure the application: | ||
|
|
||
| ```bash | ||
| scalingo --app whisper-speech-to-text env-set MODEL_USE=small | ||
| scalingo --app whisper-speech-to-text env-set MODEL_CACHE_DIR=/tmp/models | ||
|
|
||
| ``` | ||
|
|
||
| 4. Deploy to Scalingo: | ||
|
|
||
| ```bash | ||
| git push scalingo main | ||
| ``` | ||
|
|
||
| Scalingo detects the Python environment, installs the dependencies declared by the project, and starts the application using the `Procfile`. The speech-to-text demo is now deployed. | ||
|
|
||
| ## Testing the deployment | ||
|
|
||
| Before using the application, check the health endpoint to verify that the model is loaded: | ||
|
|
||
|
|
||
| ```bash | ||
| curl https://whisper-speech-to-text.osc-fr1.scalingo.io/health | ||
| ``` | ||
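Because the model loads in the background, the health check may need to be repeated until the model is ready. The sketch below polls the endpoint from Python. The tutorial does not document the response format, so the JSON field `ready` is purely an assumption; adapt it to whatever the endpoint actually returns. The function names are hypothetical as well.

```python
import json
import time
import urllib.request


def fetch_health(url):
    """Fetch the health endpoint and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_ready(url, fetch=fetch_health, attempts=30, delay=5.0, sleep=time.sleep):
    """Poll until the payload reports readiness; return False if attempts run out."""
    for attempt in range(attempts):
        try:
            payload = fetch(url)
        except (OSError, ValueError):
            # Network errors (URLError is an OSError) or a non-JSON body: retry.
            payload = None
        # ASSUMPTION: the service exposes a boolean "ready" field.
        if isinstance(payload, dict) and payload.get("ready") is True:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready("https://whisper-speech-to-text.osc-fr1.scalingo.io/health")` would block for at most about two and a half minutes with the default settings.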
|
|
||
| Once the model is ready, open the application in a browser and test recording from the HTML interface. The transcription endpoint can also be tested directly with `curl`: | ||
|
|
||
|
|
||
| ```bash | ||
| curl -X POST https://whisper-speech-to-text.osc-fr1.scalingo.io/transcribe \ | ||
| -F "file=@sample.webm" | ||
|
|
||
| ``` | ||
|
|
||
| The backend writes the uploaded file to `/tmp`, transcribes it, then returns a JSON response containing the transcript and model metadata. | ||
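The write, transcribe, and respond flow described above can be sketched independently of the web framework. This is an illustration under assumptions, not the demo's actual handler: the function names and the JSON keys (`text`, `language`, `model`) are hypothetical, and the transcriber is injected so the flow can be tried without a model.

```python
import os
import tempfile


def transcribe_upload(data, transcribe, model_name, suffix=".webm", tmp_dir=None):
    """Write uploaded bytes to a temporary file, transcribe it, and build a JSON-ready dict."""
    # tmp_dir=None uses the system temporary directory (usually /tmp on Linux).
    fd, path = tempfile.mkstemp(suffix=suffix, dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        text, language = transcribe(path)
    finally:
        os.remove(path)  # never leave uploads behind, even on failure
    # ASSUMPTION: these key names are illustrative, not the demo's real schema.
    return {"text": text, "language": language, "model": model_name}


def whisper_transcribe(model):
    """Adapt a faster-whisper WhisperModel to the path -> (text, language) shape used above."""
    def run(path):
        segments, info = model.transcribe(path)
        # segments is a lazy generator; joining it is what actually runs the decoding.
        text = " ".join(segment.text.strip() for segment in segments)
        return text, info.language
    return run
```

In a FastAPI route, the uploaded file's bytes would be passed as `data` and `whisper_transcribe(model)` as `transcribe`.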
|
|
||
|
|
||
| ## Updating the model | ||
|
|
||
| The application reads the Whisper model name from the `MODEL_USE` environment variable, so changing the model size does not require any code change. | ||
|
|
||
| To switch the deployed application to another model, update the variable from the command line: | ||
|
|
||
| ```bash | ||
| scalingo --app whisper-speech-to-text env-set MODEL_USE=medium | ||
| ``` | ||
|
|
||
| Model names such as `tiny`, `base`, `small`, `medium`, `large-v3`, or `turbo` can be used, depending on the balance required between accuracy, startup time, and CPU usage. | ||
|
|
||
| After changing the variable, restart the application so the web process reloads the selected model: | ||
|
|
||
|
|
||
| ```bash | ||
| scalingo --app whisper-speech-to-text restart | ||
| ``` | ||
|
|
||
| At the next startup, the application downloads or reloads the selected model into the cache directory and warms it in the background before serving transcription requests. | ||
|
|
||
|
|
||
| ## Updating your application | ||
|
|
||
| To deploy a new version, commit the changes and push again to the Scalingo remote: | ||
|
|
||
| ```bash | ||
| git add . | ||
| git commit -m "Update Whisper demo" | ||
| git push scalingo main | ||
| ``` | ||
|
|
||
| If the frontend template, model settings, or Python dependencies change, redeploying is enough for Scalingo to rebuild and restart the application with the new version. | ||
|
|
||