feat(subtitles): add opensubtitles.com rest api provider#1365
Conversation
WalkthroughReplaces the legacy OpenSubtitles XML-RPC provider with a new REST API-based OpenSubtitles.com provider supporting JWT authentication, multi-strategy subtitle search (hash, IMDB, filename), result scoring, and SSRF-protected downloads. Updates configuration schema and removes old provider code. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Riven (Subtitle Service)
participant Auth as OpenSubtitles.com Login
participant Search as OpenSubtitles.com Search
participant Download as OpenSubtitles.com Download
participant Domain as SSRF Domain Check
Client->>Auth: POST /login (api_key, username, password)
Auth-->>Client: JWT Token (expires_in)
Client->>Client: Store token & expiry
Client->>Search: Multi-Strategy Search<br/>(Hash → IMDB → Filename)
Search-->>Client: Search Results (list)
Client->>Client: Score & Rank Results<br/>(hash_match > rating > popularity)
Client->>Download: GET /download/subtitle_id
Download-->>Client: Download URL
Client->>Domain: Validate Domain<br/>(Whitelist + HTTPS check)
alt Domain Valid
Client->>Download: Fetch Subtitle Content
Download-->>Client: Content (base64 or raw)
Client->>Client: Decode Content (UTF-8)
Client-->>Client: Return Subtitle
else Domain Invalid/HTTP
Client-->>Client: Reject (SSRF Protection)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/program/services/post_processing/subtitles/subtitle.py (1)
195-224:⚠️ Potential issue | 🟠 MajorNormalize embedded subtitle language codes to match configured ISO 639‑1 values.
_get_embedded_subtitle_languagesreturns ffprobe codes (often ISO 639‑2/3 likeeng), while settings now default to ISO 639‑1 (en). The direct comparison will miss embedded tracks and trigger redundant downloads. Consider normalizing both sides to a single standard before comparison (e.g., map 3‑letter → 2‑letter or accept both).Also applies to: 600-602
🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 340-343: Update the note about the API key in the README: change
the tense in the sentence "This requirement will be removed once we obtained a
public API key." to present/future-correct wording such as "This requirement
will be removed once we obtain a public API key" (or "once we have obtained a
public API key") so the user-facing text is grammatically correct; edit the line
near the `settings.json` bullet in README.md accordingly.
In
`@src/program/services/post_processing/subtitles/providers/opensubtitles_com.py`:
- Around line 141-177: The _ensure_authenticated()/_login() flow currently
always attempts JWT login even when credentials are not set; change
_ensure_authenticated() to detect when self.config.username and
self.config.password are empty/falsey and, in that case, skip calling _login()
and return True (since API-key-only requests are supported and the Api-Key
header is already sent); also adjust _login() to assume credentials are present
when called (or return False immediately if invoked without credentials) so we
don't attempt a login with empty strings and produce misleading warnings like
"Invalid credentials".
- Around line 238-291: In _build_search_strategies, the hash strategy currently
sends only "moviehash" but the OpenSubtitles API requires both "moviehash" and
"moviebytesize"; update the "hash" strategy (the dict with "name": "hash" and
"params": {"moviehash": video_hash, "languages": lang_code}) to also include
"moviebytesize": file_size (or its string/int equivalent) when file_size is
provided, and ensure this new param is merged with self.config.query_params like
the others so the API receives both paired parameters for hash-based matching.
In `@src/tests/test_opensubtitles_com.py`:
- Around line 26-31: Replace hardcoded-looking credentials by generating
reusable test values at runtime: create module-level constants (e.g.,
TEST_API_KEY = secrets.token_hex(16), TEST_USERNAME =
f"testuser_{uuid.uuid4().hex[:8]}", TEST_PASSWORD = secrets.token_urlsafe(12))
and use them wherever OpenSubtitlesComConfig is constructed (the function
returning OpenSubtitlesComConfig and other spots referenced around the file).
Update any test helpers or fixtures that currently return
OpenSubtitlesComConfig(...) to use these constants instead of literal strings so
all occurrences (including the other ranges called out) reuse the generated
values; import secrets and uuid at top of the test module.
| 1. Create a free account at [opensubtitles.com](https://www.opensubtitles.com) | ||
| 2. Get your API key from your account settings (API consumers section) | ||
| - This requirement will be removed once we obtained a public API key. | ||
| 3. Configure in `settings.json`: |
There was a problem hiding this comment.
Fix tense in the API key note.
Minor wording polish for user-facing docs.
✏️ Suggested edit
- - This requirement will be removed once we obtained a public API key.
+ - This requirement will be removed once we obtain a public API key.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| 1. Create a free account at [opensubtitles.com](https://www.opensubtitles.com) | |
| 2. Get your API key from your account settings (API consumers section) | |
| - This requirement will be removed once we obtained a public API key. | |
| 3. Configure in `settings.json`: | |
| 1. Create a free account at [opensubtitles.com](https://www.opensubtitles.com) | |
| 2. Get your API key from your account settings (API consumers section) | |
| - This requirement will be removed once we obtain a public API key. | |
| 3. Configure in `settings.json`: |
🤖 Prompt for AI Agents
In `@README.md` around lines 340 - 343, Update the note about the API key in the
README: change the tense in the sentence "This requirement will be removed once
we obtained a public API key." to present/future-correct wording such as "This
requirement will be removed once we obtain a public API key" (or "once we have
obtained a public API key") so the user-facing text is grammatically correct;
edit the line near the `settings.json` bullet in README.md accordingly.
| def _ensure_authenticated(self) -> bool: | ||
| """ | ||
| Ensure valid authentication token exists. | ||
|
|
||
| Note: No explicit Lock needed - Python's GIL provides sufficient | ||
| thread safety for this pattern. Worst case: two threads both | ||
| call _login() simultaneously, both succeed (harmless). | ||
| """ | ||
| if self.token and (time.time() - self.token_time) < self.TOKEN_EXPIRY_SECONDS: | ||
| logger.trace("OpenSubtitles.com token still valid") | ||
| return True | ||
|
|
||
| logger.debug("OpenSubtitles.com token expired or missing, authenticating...") | ||
| return self._login() | ||
|
|
||
| def _login(self) -> bool: | ||
| """Authenticate with OpenSubtitles API.""" | ||
| try: | ||
| logger.debug("Attempting OpenSubtitles.com login") | ||
|
|
||
| response = self.session.post( | ||
| f"{self.API_BASE}/login", | ||
| json={ | ||
| "username": self.config.username, | ||
| "password": self.config.password, | ||
| }, | ||
| headers=self._headers(authenticated=False), | ||
| ) | ||
|
|
||
| # Handle specific HTTP errors | ||
| if response.status_code == HTTPStatus.UNAUTHORIZED: | ||
| logger.warning("OpenSubtitles.com: Invalid credentials") | ||
| return False | ||
| elif response.status_code == HTTPStatus.TOO_MANY_REQUESTS: | ||
| logger.warning("OpenSubtitles.com: Rate limited on login") | ||
| return False | ||
| elif response.status_code >= HTTPStatus.INTERNAL_SERVER_ERROR: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Get more context about the file structure and config
wc -l src/program/services/post_processing/subtitles/providers/opensubtitles_com.pyRepository: rivenmedia/riven
Length of output: 141
🏁 Script executed:
# Search for API key usage in the file
rg -n "api_key|API_KEY" src/program/services/post_processing/subtitles/providers/opensubtitles_com.py -A 2 -B 2Repository: rivenmedia/riven
Length of output: 287
🏁 Script executed:
# Check the _headers method to see how authentication is done
ast-grep --pattern 'def _headers($$$) {
$$$
}'Repository: rivenmedia/riven
Length of output: 42
🏁 Script executed:
# Look for config definition and API key only mentions
rg -n "api_key|API_KEY|credentials|config" src/program/services/post_processing/subtitles/providers/opensubtitles_com.py | head -30Repository: rivenmedia/riven
Length of output: 740
🏁 Script executed:
# Check if there's a config file or documentation mentioning API key only mode
fd -t f -name "*config*" -o -name "*opensubtitles*" | grep -E "(config|doc)" | head -20Repository: rivenmedia/riven
Length of output: 290
🌐 Web query:
OpenSubtitles.com API authentication JWT token requirements without username password
💡 Result:
For the OpenSubtitles.com REST API, a JWT is only issued via the login flow—there isn’t an “anonymous”/no-credentials way to mint a JWT.
- How to get a JWT: call
POST /loginto “create a token to authenticate a user” (i.e., a user session/token) and then send it asAuthorization: Bearer <token>on authenticated requests. [1] - What you must send even without JWT: all API calls require an
Api-Keyand aUser-Agentheader. [2] - JWT lifetime: tokens are valid for 12 hours. [2]
- If you don’t have username/password: you generally can only use endpoints that work with
Api-Keyonly; for anything requiring user authentication/quotas, you must log in (or otherwise arrange access with OpenSubtitles). [1][2]
Sources:
[1] OpenSubtitles API docs (Login endpoint and token usage) (ai.opensubtitles.com)
[2] AI.OpenSubtitles.com FAQ (auth headers and token validity) (ai.opensubtitles.com)
Citations:
🏁 Script executed:
# Find and examine the OpenSubtitlesComConfig definition
rg -n "class OpenSubtitlesComConfig" src/program/services/post_processing/subtitles/providers/opensubtitles_com.py -A 30Repository: rivenmedia/riven
Length of output: 42
🏁 Script executed:
# Check where _ensure_authenticated is called
rg -n "_ensure_authenticated" src/program/services/post_processing/subtitles/providers/opensubtitles_com.pyRepository: rivenmedia/riven
Length of output: 259
🏁 Script executed:
# Look at the search/download methods that might call _ensure_authenticated
rg -n "def search|def download|def get_" src/program/services/post_processing/subtitles/providers/opensubtitles_com.py -A 5Repository: rivenmedia/riven
Length of output: 733
🏁 Script executed:
# Check the full constructor and initialization to understand the config
sed -n '102,130p' src/program/services/post_processing/subtitles/providers/opensubtitles_com.pyRepository: rivenmedia/riven
Length of output: 1134
🏁 Script executed:
# Check if there are comments about API-key-only mode in the file
rg -n "api.key|API.key|api-key|API-key|key.only|key-only" src/program/services/post_processing/subtitles/providers/opensubtitles_com.py -iRepository: rivenmedia/riven
Length of output: 107
🏁 Script executed:
# Look at the top of the file for imports and class definition
head -100 src/program/services/post_processing/subtitles/providers/opensubtitles_com.py | tail -60Repository: rivenmedia/riven
Length of output: 1908
🏁 Script executed:
# Search for OpenSubtitlesComConfig definition in the codebase
fd -t f "*.py" | xargs rg "class OpenSubtitlesComConfig" -A 25Repository: rivenmedia/riven
Length of output: 2201
🏁 Script executed:
# Check what's around line 222 (first _ensure_authenticated call)
sed -n '215,230p' src/program/services/post_processing/subtitles/providers/opensubtitles_com.pyRepository: rivenmedia/riven
Length of output: 737
🏁 Script executed:
# Check what's around line 306 (second _ensure_authenticated call)
sed -n '300,315p' src/program/services/post_processing/subtitles/providers/opensubtitles_com.pyRepository: rivenmedia/riven
Length of output: 733
🏁 Script executed:
# Check what's around line 395 (third _ensure_authenticated call)
sed -n '388,405p' src/program/services/post_processing/subtitles/providers/opensubtitles_com.pyRepository: rivenmedia/riven
Length of output: 700
🏁 Script executed:
# Search for documentation about credentials being optional
rg -n "username|password" src/program/services/post_processing/subtitles/providers/opensubtitles_com.py | head -20Repository: rivenmedia/riven
Length of output: 175
🏁 Script executed:
# Check what happens when login is called with empty username/password
sed -n '155,190p' src/program/services/post_processing/subtitles/providers/opensubtitles_com.pyRepository: rivenmedia/riven
Length of output: 1472
🏁 Script executed:
# Look for any documentation or comments about API-key-only support
rg -n "api.key|authenticated|optional" src/program/services/post_processing/subtitles/providers/opensubtitles_com.py -i -B 2 -A 2 | head -50Repository: rivenmedia/riven
Length of output: 1985
🏁 Script executed:
# Check if there are any tests or examples showing API-key-only usage
fd "test.*opensubtitles" -o "opensubtitles.*test" | xargs cat 2>/dev/null | head -50Repository: rivenmedia/riven
Length of output: 217
Skip JWT authentication when username/password aren't configured to support API-key-only mode.
The config design allows username and password to be empty strings (with descriptions indicating they're "for authenticated downloads" — implying optional), and OpenSubtitles.com API supports API-key-only requests. However, _ensure_authenticated() always forces login regardless of whether credentials exist, which blocks this use case. When credentials aren't provided, the login attempt sends empty strings and fails. Since the Api-Key header is sent with all requests, they can proceed without a JWT token.
Suggested fix
def _ensure_authenticated(self) -> bool:
"""
Ensure valid authentication token exists.
@@
"""
+ if not (self.config.username and self.config.password):
+ logger.debug("No OpenSubtitles.com credentials configured; using API‑key‑only mode")
+ return True
if self.token and (time.time() - self.token_time) < self.TOKEN_EXPIRY_SECONDS:
logger.trace("OpenSubtitles.com token still valid")
return True📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _ensure_authenticated(self) -> bool: | |
| """ | |
| Ensure valid authentication token exists. | |
| Note: No explicit Lock needed - Python's GIL provides sufficient | |
| thread safety for this pattern. Worst case: two threads both | |
| call _login() simultaneously, both succeed (harmless). | |
| """ | |
| if self.token and (time.time() - self.token_time) < self.TOKEN_EXPIRY_SECONDS: | |
| logger.trace("OpenSubtitles.com token still valid") | |
| return True | |
| logger.debug("OpenSubtitles.com token expired or missing, authenticating...") | |
| return self._login() | |
| def _login(self) -> bool: | |
| """Authenticate with OpenSubtitles API.""" | |
| try: | |
| logger.debug("Attempting OpenSubtitles.com login") | |
| response = self.session.post( | |
| f"{self.API_BASE}/login", | |
| json={ | |
| "username": self.config.username, | |
| "password": self.config.password, | |
| }, | |
| headers=self._headers(authenticated=False), | |
| ) | |
| # Handle specific HTTP errors | |
| if response.status_code == HTTPStatus.UNAUTHORIZED: | |
| logger.warning("OpenSubtitles.com: Invalid credentials") | |
| return False | |
| elif response.status_code == HTTPStatus.TOO_MANY_REQUESTS: | |
| logger.warning("OpenSubtitles.com: Rate limited on login") | |
| return False | |
| elif response.status_code >= HTTPStatus.INTERNAL_SERVER_ERROR: | |
| def _ensure_authenticated(self) -> bool: | |
| """ | |
| Ensure valid authentication token exists. | |
| Note: No explicit Lock needed - Python's GIL provides sufficient | |
| thread safety for this pattern. Worst case: two threads both | |
| call _login() simultaneously, both succeed (harmless). | |
| """ | |
| if not (self.config.username and self.config.password): | |
| logger.debug("No OpenSubtitles.com credentials configured; using API‑key‑only mode") | |
| return True | |
| if self.token and (time.time() - self.token_time) < self.TOKEN_EXPIRY_SECONDS: | |
| logger.trace("OpenSubtitles.com token still valid") | |
| return True | |
| logger.debug("OpenSubtitles.com token expired or missing, authenticating...") | |
| return self._login() |
🤖 Prompt for AI Agents
In
`@src/program/services/post_processing/subtitles/providers/opensubtitles_com.py`
around lines 141 - 177, The _ensure_authenticated()/_login() flow currently
always attempts JWT login even when credentials are not set; change
_ensure_authenticated() to detect when self.config.username and
self.config.password are empty/falsey and, in that case, skip calling _login()
and return True (since API-key-only requests are supported and the Api-Key
header is already sent); also adjust _login() to assume credentials are present
when called (or return False immediately if invoked without credentials) so we
don't attempt a login with empty strings and produce misleading warnings like
"Invalid credentials".
| return OpenSubtitlesComConfig( | ||
| enabled=True, | ||
| api_key="test_api_key_1234567890123456789012345", | ||
| username="testuser", | ||
| password="testpass", | ||
| ) |
There was a problem hiding this comment.
Avoid hardcoded-looking API keys/passwords/tokens to prevent secret scanners from failing.
Gitleaks/Ruff flagged several literals; this is likely to fail CI or require allowlisting. Consider generating test values at runtime and reusing constants.
✅ Proposed fix using runtime-generated test values
-import pytest
-import httpx
+import pytest
+import httpx
+import secrets
+
+TEST_API_KEY = secrets.token_urlsafe(16)
+TEST_PASSWORD = secrets.token_urlsafe(12)
+TEST_TOKEN = secrets.token_urlsafe(16)- return OpenSubtitlesComConfig(
- enabled=True,
- api_key="test_api_key_1234567890123456789012345",
- username="testuser",
- password="testpass",
- )
+ return OpenSubtitlesComConfig(
+ enabled=True,
+ api_key=TEST_API_KEY,
+ username="testuser",
+ password=TEST_PASSWORD,
+ )- response = OpenSubtitlesLoginResponse.model_validate({"token": "valid_jwt_token"})
- assert response.token == "valid_jwt_token"
+ response = OpenSubtitlesLoginResponse.model_validate({"token": TEST_TOKEN})
+ assert response.token == TEST_TOKEN
...
- response = OpenSubtitlesLoginResponse.model_validate({"token": " token_value "})
- assert response.token == "token_value"
+ response = OpenSubtitlesLoginResponse.model_validate({"token": f" {TEST_TOKEN} "})
+ assert response.token == TEST_TOKEN- requests_mock.post(
- "https://api.opensubtitles.com/api/v1/login",
- json={"token": "jwt_token_12345"},
- )
+ requests_mock.post(
+ "https://api.opensubtitles.com/api/v1/login",
+ json={"token": TEST_TOKEN},
+ )
...
- assert provider.token == "jwt_token_12345"
+ assert provider.token == TEST_TOKEN- config = OpenSubtitlesComConfig(
- enabled=True,
- api_key="test_api_key_1234567890123456789012345",
- username="testuser",
- password="testpass",
- query_params={"foreign_parts_only": "include", "hearing_impaired": "exclude"},
- )
+ config = OpenSubtitlesComConfig(
+ enabled=True,
+ api_key=TEST_API_KEY,
+ username="testuser",
+ password=TEST_PASSWORD,
+ query_params={"foreign_parts_only": "include", "hearing_impaired": "exclude"},
+ )- config = OpenSubtitlesComConfig(
- enabled=True,
- api_key="test_api_key_1234567890123456789012345",
- username="testuser",
- password="testpass",
- user_agent="MyApp/2.0",
- )
+ config = OpenSubtitlesComConfig(
+ enabled=True,
+ api_key=TEST_API_KEY,
+ username="testuser",
+ password=TEST_PASSWORD,
+ user_agent="MyApp/2.0",
+ )Also applies to: 124-142, 191-199, 362-399
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 28-28: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 Ruff (0.14.14)
[error] 30-30: Possible hardcoded password assigned to argument: "password"
(S106)
🤖 Prompt for AI Agents
In `@src/tests/test_opensubtitles_com.py` around lines 26 - 31, Replace
hardcoded-looking credentials by generating reusable test values at runtime:
create module-level constants (e.g., TEST_API_KEY = secrets.token_hex(16),
TEST_USERNAME = f"testuser_{uuid.uuid4().hex[:8]}", TEST_PASSWORD =
secrets.token_urlsafe(12)) and use them wherever OpenSubtitlesComConfig is
constructed (the function returning OpenSubtitlesComConfig and other spots
referenced around the file). Update any test helpers or fixtures that currently
return OpenSubtitlesComConfig(...) to use these constants instead of literal
strings so all occurrences (including the other ranges called out) reuse the
generated values; import secrets and uuid at top of the test module.
|
@olivertgwalton PTAL |
Resolves: #1301
Notice
This replaces the old opensubtitles because the format of language codes is not compatible. Language code transformation for 1-on-1 mapping is not possible for certain languages.
For instance, both
zh-cnandzh-tw(new format) can be considered as thezho(old format).Summary
query_paramsfor advanced search options (e.g.,foreign_parts_only,hearing_impaired)user_agentheaderOpenSubtitles.com API docs: https://opensubtitles.stoplight.io/docs/opensubtitles-api
Configuration
Next
Might need maintainers to obtain a public API key since using a single API key for a project is mandatory per OpenSubtitles's policy.
https://opensubtitles.stoplight.io/docs/opensubtitles-api/e3750fd63a100-getting-started
Summary by CodeRabbit
New Features
Documentation