Feature/resource management #3
… stability" This reverts commit 25fc5a3.
📝 Walkthrough
This PR implements Phase 3 Resource Management by adding authenticated file upload endpoints, DigitalOcean Spaces storage integration, and background vision-based text extraction for PDFs and text files, alongside a React frontend dashboard for resource management with polling support.
Changes
Resource Management with Vision OCR
Sequence Diagram(s)
sequenceDiagram
participant User
participant Frontend
participant Backend_API
participant Spaces
participant ARQ_Worker
participant OpenRouter
participant Database
User->>Frontend: Upload PDF/TXT
Frontend->>Backend_API: POST /api/resources (multipart)
Backend_API->>Spaces: upload_file(bytes)
Spaces-->>Backend_API: public_url
Backend_API->>Database: INSERT Resource (processing)
Database-->>Backend_API: resource_id
Backend_API->>ARQ_Worker: enqueue extraction_task(id)
Backend_API-->>Frontend: ResourceOut
Frontend->>Frontend: Poll /api/resources (3s)
ARQ_Worker->>Spaces: Download file
Spaces-->>ARQ_Worker: file bytes
ARQ_Worker->>ARQ_Worker: Render pages to image
ARQ_Worker->>OpenRouter: Vision API (per page)
OpenRouter-->>ARQ_Worker: extracted text
ARQ_Worker->>Database: UPDATE Resource (ready, text)
Database-->>ARQ_Worker: ok
Frontend->>Frontend: Display ready resource
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 13
🧹 Nitpick comments (2)
frontend/src/components/Layout.tsx (1)
26-27: ⚡ Quick win: Avoid sidebar links that immediately redirect away.
At Lines 26-27, `/solver` and `/generator` appear as active destinations, but current routing redirects both back to `/`. Consider hiding or disabling these nav items until those pages are actually available.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/components/Layout.tsx` around lines 26 - 27, The sidebar currently exposes navigation entries with label 'PYQ Solver' and 'Paper Gen' that point to paths '/solver' and '/generator' but those routes redirect to '/', so update the Layout.tsx navigation config to hide or disable these items until real routes exist: locate the nav array containing { label: 'PYQ Solver', icon: Zap, path: '/solver' } and { label: 'Paper Gen', icon: FileEdit, path: '/generator' } and either remove them, wrap them behind a feature flag/prop (e.g., isSolverEnabled/isGeneratorEnabled) or set a disabled flag and render them non-clickable with appropriate UI state so users are not redirected unexpectedly.
backend/app/workers/tasks.py (1)
71-97: ⚡ Quick win: Reuse one `httpx.AsyncClient` for all OCR page calls.
Opening a new client for each page loses connection pooling and adds repeated setup overhead on the hot path.
Proposed refactor
-    for i in range(pages_to_process):
+    async with httpx.AsyncClient(timeout=90.0) as client:
+        for i in range(pages_to_process):
         print(f"DEBUG: Rendering Page {i+1}/{pages_to_process}...")
         page = pdf[i]
         # Render page to image
         bitmap = page.render(scale=2)
         pil_image = bitmap.to_pil()
@@
-        async with httpx.AsyncClient(timeout=90.0) as client:
-            vision_response = await client.post(
+            vision_response = await client.post(
                 "https://openrouter.ai/api/v1/chat/completions",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/workers/tasks.py` around lines 71 - 97, The code opens a new httpx.AsyncClient for each page OCR call (the async with httpx.AsyncClient(...) block around the client.post that produces vision_response using img_base64), which loses connection pooling; refactor by creating a single AsyncClient instance once outside the per-page loop (or accept one as a parameter), reuse that client.post(...) inside the loop to call the same OpenRouter endpoint for each image, and close the client after the loop (or rely on context manager at the higher scope) so connection pooling and performance are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/routers/resources.py`:
- Around line 54-61: The code currently commits the new Resource with
status="processing" before enqueuing the Redis job; instead, in upload_resource
(and the retry endpoint handler) create the Redis pool and call await
redis.enqueue_job('extraction_task', str(new_resource.id)) before calling
db.commit()/db.refresh(); wrap the enqueue in a try/except so that if
enqueue_job fails you rollback the DB (await db.rollback()) and return/raise an
error (e.g., HTTPException) so the record does not remain stuck in "processing";
only after successful enqueue call await db.commit() and await
db.refresh(new_resource). Ensure the same change is applied to both the
upload_resource flow and the retry endpoint flow that currently call
db.add(new_resource) / await db.commit() / await redis.enqueue_job(...).
- Around line 88-94: The DB row is deleted even if
storage_service.delete_file(object_name) fails; update the delete flow in the
function handling resource deletion to check the boolean return of
storage_service.delete_file(object_name) (use the computed object_name from
resource.file_url), and if it returns False abort the operation instead of
calling await db.delete(resource) / await db.commit(); return or raise an
appropriate error (e.g., HTTP error or custom exception) conveying the storage
deletion failure so the DB row is not orphaned.
- Around line 59-60: Your POST handlers (upload_resource and retry_extraction)
create a new ARQ pool via create_pool/RedisSettings.from_dsn and call
enqueue_job on it per request, causing connection churn; instead add a FastAPI
lifespan that creates the pool once (redis_pool = await create_pool(...)),
stores it as app.state.redis_pool, yields, and closes it on shutdown, then
update the handlers to retrieve the pool from app.state.redis_pool (or a simple
dependency that returns app.state.redis_pool) and call enqueue_job on that
shared pool rather than creating a new one per request.
- Around line 31-33: The current unbounded await file.read() (the line "content
= await file.read()") can OOM; change to a chunked read with a hard cap: define
MAX_UPLOAD_BYTES (e.g. configurable), read in a loop using await
file.read(CHUNK_SIZE) and accumulate into a buffer while tracking total_size,
and if total_size > MAX_UPLOAD_BYTES raise an HTTPException(413) (or return an
appropriate error) and abort processing; update the handler (the upload/resource
endpoint that uses the variable file) to use this pattern and document
MAX_UPLOAD_BYTES.
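The capped, chunked read described in the last item above could look roughly like the following. This is only a sketch under stated assumptions: `read_capped`, `UploadTooLarge`, `FakeUpload`, and the default values of `MAX_UPLOAD_BYTES` and `CHUNK_SIZE` are hypothetical names and numbers, not code from this PR. In the real handler the exception would be translated into an HTTP 413 response.

```python
import asyncio

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical cap; should come from config
CHUNK_SIZE = 64 * 1024               # hypothetical chunk size


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the cap; a handler would map this to HTTP 413."""


async def read_capped(file, max_bytes=MAX_UPLOAD_BYTES, chunk_size=CHUNK_SIZE) -> bytes:
    """Read `file` in chunks and abort as soon as the total exceeds `max_bytes`.

    `file` is any object with an `async read(n)` method that returns b"" at EOF.
    """
    chunks = []
    total = 0
    while True:
        chunk = await file.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop before buffering more than the cap allows.
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


class FakeUpload:
    """In-memory stand-in for an uploaded file, used only to exercise the helper."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    async def read(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk
```

The check runs after every chunk, so at most one chunk beyond the cap is ever held in memory before the request is rejected.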
In `@backend/app/services/storage.py`:
- Around line 19-26: Change object uploads to private ACL instead of
public-read: in the code path that calls self.client.put_object (and uses
settings.SPACES_BUCKET, Key=object_name), set ACL='private' (or omit ACL) and
stop returning the direct public URL (settings.SPACES_PUBLIC_URL/...); instead
return the object key or a stable identifier and generate time-limited access
via self.client.generate_presigned_url('get_object', Params={'Bucket':
settings.SPACES_BUCKET, 'Key': object_name}, ExpiresIn=...) when the API needs
to serve a download. Also update the extraction task to read files with the
boto3 client (self.client.get_object / streaming body) using the same
SPACES_BUCKET and object_name rather than fetching the public HTTP URL.
In `@backend/app/workers/tasks.py`:
- Around line 114-117: The TXT extraction code using httpx.AsyncClient should
validate the HTTP response before storing content: after awaiting
client.get(resource.file_url) call, invoke response.raise_for_status() (or
otherwise check response.status_code == 200) and only then set
resource.extracted_text and resource.status = "ready"; ensure any exceptions
from response.raise_for_status() propagate or are handled similarly to the PDF
extraction path so error responses (4xx/5xx) are not stored as valid text.
- Around line 131-138: In the exception-recovery path update for Resource,
ensure the SQLAlchemy session is rolled back before performing any further
queries or commits: call await db.rollback() (or db.rollback() if sync) before
executing select(Resource).where(Resource.id == resource_id) and before
attempting to set res.status = "failed" and await db.commit(); reference the
existing symbols Resource, db, resource_id and the block that does result =
await db.execute(select(Resource)...) so the rollback happens first to clear the
failed transaction state.
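The rollback-first ordering in the last item above can be illustrated with a toy session. This is a sketch, not SQLAlchemy: `FakeSession`, `PendingRollback`, and `failing_db_step` are hypothetical stand-ins that only imitate a session refusing further work after a failed transaction.

```python
import asyncio


class PendingRollback(Exception):
    """Stand-in for the error a session raises when reused after a failed transaction."""


class FakeSession:
    """Toy async session; it only models the 'must roll back first' rule."""

    def __init__(self):
        self.needs_rollback = False
        self.status = "processing"            # pending (uncommitted) value
        self.committed_status = "processing"  # last committed value

    def _check(self):
        if self.needs_rollback:
            raise PendingRollback("call rollback() before using the session again")

    async def failing_db_step(self):
        # Simulates a database-level failure that poisons the transaction.
        self.needs_rollback = True
        raise RuntimeError("extraction failed mid-transaction")

    async def set_status(self, value):
        self._check()
        self.status = value

    async def commit(self):
        self._check()
        self.committed_status = self.status

    async def rollback(self):
        self.needs_rollback = False
        self.status = self.committed_status


async def extraction_task(db):
    try:
        await db.failing_db_step()
        await db.set_status("ready")
        await db.commit()
    except Exception:
        # Clear the failed transaction before touching the session again.
        await db.rollback()
        await db.set_status("failed")
        await db.commit()
```

Without the `rollback()` call, the `set_status("failed")` step in this toy would itself raise, which is the stuck-in-"processing" outcome the comment is trying to avoid.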
In `@context/architecture.md`:
- Line 31: Update architecture.md so storage references are consistent: replace
all occurrences of "Cloudflare R2" and any other R2-specific wording (e.g.,
initialization, bucket naming, endpoint examples) with "DigitalOcean Spaces" and
corresponding Spaces-specific terms, and ensure the line that currently reads
"**DigitalOcean Spaces**: Original uploaded PDFs and generated PDF exports." is
the single canonical description; verify and update surrounding references at
the earlier "Cloudflare R2" mention and the later storage mention so they use
the same provider name and matching configuration terminology.
In `@context/project-overview.md`:
- Line 48: The project overview currently contradicts itself by listing
"OCR/AI-based text extraction for scanned documents" as in-scope while also
marking image/vision support out of scope; update the scope language to be
explicit and unambiguous (for example replace the line "OCR/AI-based text
extraction for scanned documents." with "Printed scanned PDFs (machine-printed
text) supported via OCR; handwriting recognition and broader image/vision
features are out of scope for the MVP."). Ensure the change appears near the
existing OCR/vision statements so readers clearly see the boundary and consider
adding a short parenthetical like "(handwriting deferred)" to the sentence to
prevent future confusion.
In `@frontend/src/api/resources.ts`:
- Around line 13-16: The upload function's parameter is currently declared as
type: string which weakens type-safety; change the upload signature to use the
Resource union type (e.g., Resource['type']) so only valid resource types are
accepted, update the function declaration for upload and any local references
expecting string to the narrowed type, and adjust any caller sites that pass
literals or variables to ensure they conform to Resource['type'] (or cast
appropriately) so the API contract is enforced at compile time.
- Around line 17-21: The POST call using api.post<Resource>('/resources/',
formData, { headers: { 'Content-Type': 'multipart/form-data' } }) should drop
the manual headers so the browser/Axios can set the multipart boundary
automatically; update the call that sends the formData (the api.post invocation
that assigns to response) to pass formData without the headers object (and
ensure no other wrapper adds Content-Type), leaving Axios to set the correct
Content-Type with boundary.
In `@frontend/src/pages/Resources.tsx`:
- Line 1: The file input retains its previous value after resetting state which
prevents re-selection of the same file; add a ref for the native input (e.g.,
uploadInputRef) on the <input type="file"> used in Resources.tsx and, in the
success/error/reset branches of the upload handler functions (the code paths
using setSelectedFile, setUploading, setUploadProgress around the upload logic
and the handlers near lines 24-27 and 79-83), set uploadInputRef.current.value =
'' (guarded for null) to clear the native input so selecting the same file fires
onChange again.
- Around line 123-136: The table can go blank when resources is undefined —
modify the conditional render in Resources.tsx around the isLoading / resources
checks to explicitly handle resources === undefined (and any error state) before
falling back to resources.length === 0 or resources.map; specifically, add a
branch like resources === undefined || fetchError that returns a <tr><td
colSpan={5}> row with a clear message (e.g., "Failed to load resources" or "No
data available") and appropriate styling, ensuring you still keep the existing
isLoading, resources?.length === 0, and resources?.map branches and use the same
colSpan={5} so table layout stays consistent.
---
Nitpick comments:
In `@backend/app/workers/tasks.py`:
- Around line 71-97: The code opens a new httpx.AsyncClient for each page OCR
call (the async with httpx.AsyncClient(...) block around the client.post that
produces vision_response using img_base64), which loses connection pooling;
refactor by creating a single AsyncClient instance once outside the per-page
loop (or accept one as a parameter), reuse that client.post(...) inside the loop
to call the same OpenRouter endpoint for each image, and close the client after
the loop (or rely on context manager at the higher scope) so connection pooling
and performance are preserved.
In `@frontend/src/components/Layout.tsx`:
- Around line 26-27: The sidebar currently exposes navigation entries with label
'PYQ Solver' and 'Paper Gen' that point to paths '/solver' and '/generator' but
those routes redirect to '/', so update the Layout.tsx navigation config to hide
or disable these items until real routes exist: locate the nav array containing
{ label: 'PYQ Solver', icon: Zap, path: '/solver' } and { label: 'Paper Gen',
icon: FileEdit, path: '/generator' } and either remove them, wrap them behind a
feature flag/prop (e.g., isSolverEnabled/isGeneratorEnabled) or set a disabled
flag and render them non-clickable with appropriate UI state so users are not
redirected unexpectedly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0858e1a6-780e-4492-9a77-2ac99ccc55e8
⛔ Files ignored due to path filters (1)
frontend/bun.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
- .env.example
- backend/app/config.py
- backend/app/main.py
- backend/app/routers/resources.py
- backend/app/schemas/resource.py
- backend/app/services/storage.py
- backend/app/workers/arq_worker.py
- backend/app/workers/tasks.py
- backend/requirements.txt
- context/architecture.md
- context/progress-tracker.md
- context/project-overview.md
- frontend/src/App.tsx
- frontend/src/api/resources.ts
- frontend/src/components/Layout.tsx
- frontend/src/pages/Resources.tsx
redis = await create_pool(RedisSettings.from_dsn(settings.REDIS_URL))
await redis.enqueue_job('extraction_task', str(new_resource.id))
🧩 Analysis chain
🏁 Script executed:
find backend -type f -name "*.py" | head -20
Repository: shubhamxdd/pyq-appl
Length of output: 662
🏁 Script executed:
cd backend && find . -type f -name "resources.py" | head -5
Repository: shubhamxdd/pyq-appl
Length of output: 90
🏁 Script executed:
wc -l backend/app/routers/resources.py
Repository: shubhamxdd/pyq-appl
Length of output: 100
🏁 Script executed:
sed -n '50,70p' backend/app/routers/resources.py
Repository: shubhamxdd/pyq-appl
Length of output: 751
🏁 Script executed:
sed -n '110,130p' backend/app/routers/resources.py
Repository: shubhamxdd/pyq-appl
Length of output: 499
🏁 Script executed:
cat -n backend/app/routers/resources.py | head -150
Repository: shubhamxdd/pyq-appl
Length of output: 4948
🏁 Script executed:
rg -n "create_pool|redis" backend/app --type py -A 2 -B 2
Repository: shubhamxdd/pyq-appl
Length of output: 1539
🏁 Script executed:
find backend -type f -name "*.py" -exec rg -l "lifespan|startup|app.state" {} \;
Repository: shubhamxdd/pyq-appl
Length of output: 97
🏁 Script executed:
cat backend/app/main.py
Repository: shubhamxdd/pyq-appl
Length of output: 701
🏁 Script executed:
rg -n "close|cleanup|app.state" backend/app --type py
Repository: shubhamxdd/pyq-appl
Length of output: 45
🏁 Script executed:
rg -n "arq|redis" backend/app/config.py
Repository: shubhamxdd/pyq-appl
Length of output: 45
🏁 Script executed:
cat backend/app/workers/arq_worker.py
Repository: shubhamxdd/pyq-appl
Length of output: 475
🌐 Web query:
ARQ Python library pool creation best practices async FastAPI
💡 Result:
When integrating the ARQ library with FastAPI, managing Redis connection pools effectively is critical for performance and stability. Below are the best practices for pool creation and management.
Centralized Pool Management
Avoid creating a new Redis pool for every request or task. Instead, initialize a single, long-lived connection pool that is shared across your application [1]. In FastAPI, the recommended approach is to use the lifespan event handler to create the pool on startup and close it on shutdown [2].
Example of Lifespan Management:
    from contextlib import asynccontextmanager
    from fastapi import FastAPI
    from arq import create_pool
    from arq.connections import RedisSettings

    redis_pool = None

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Startup: Initialize the pool
        global redis_pool
        redis_pool = await create_pool(RedisSettings(host="localhost", port=6379))
        yield
        # Shutdown: Close the pool
        await redis_pool.close()

    app = FastAPI(lifespan=lifespan)
Handling Connection Limits
If you encounter errors related to connection limits (e.g., max_connections reached), you can explicitly configure the connection pool using redis-py's BlockingConnectionPool [3]. This is particularly useful for both the ARQ worker and the enqueuer to prevent crashes under high load [3].
    from redis.asyncio import BlockingConnectionPool
    from arq.connections import RedisSettings

    # Configure with a specific pool
    REDIS_SETTINGS = RedisSettings(connection_pool=BlockingConnectionPool(max_connections=10))
Worker Configuration
The ARQ worker should be configured to use the same Redis settings as your enqueuer to ensure consistency [4][5]. When running the worker, pass the settings class directly to the ARQ CLI or your worker entry point [5].
Key Considerations:
1. Avoid Blocking: Ensure all tasks defined for ARQ are async (async def) to maintain the non-blocking nature of the event loop [6][4].
2. Resource Cleanup: Always ensure that database sessions or other resources opened within an ARQ task are properly closed or returned to their respective pools [7][8]. Using context managers or ARQ's on_job_start/after_job_end hooks can help manage these lifecycles [7].
3. Resilience: Leverage ARQ's built-in retry mechanisms and exponential backoff for tasks that interact with external services or databases to handle transient failures gracefully [9].
Citations:
- 1: https://medium.com/@geetansh2k1/setting-up-and-using-an-async-redis-client-in-fastapi-the-right-way-0409ad3812e6
- 2: https://dev.to/meroline_lizlent/uvicorn-the-lightning-fast-asgi-server-powering-modern-python-apps-5a85
- 3: Allow `connection_pool` to be set in RedisSettings python-arq/arq#473
- 4: https://github.com/davidmuraya/fastapi-arq
- 5: https://oneuptime.com/blog/post/2026-03-31-redis-fastapi-background-tasks/view
- 6: https://arq-docs.helpmanual.io/index.html
- 7: https://wazaari.dev/blog/arq-sqlalchemy-done-right
- 8: https://stackoverflow.com/questions/70955332/how-to-correctly-use-sqlalchemy-within-fastapi-or-arq-for-mysql
- 9: https://davidmuraya.com/blog/fastapi-arq-retries/
Initialize Redis pool once at app startup, not per request.
Currently, each POST request to /upload_resource (lines 59-60) and /retry_extraction (lines 118-119) creates a fresh ARQ pool without closing it. Under load, this causes connection churn and risks exhausting Redis connections. Use FastAPI's lifespan context manager to initialize the pool once on startup, close it on shutdown, and inject it via a dependency.
    @asynccontextmanager
    async def lifespan(app: FastAPI):
        redis_pool = await create_pool(RedisSettings.from_dsn(settings.REDIS_URL))
        app.state.redis_pool = redis_pool
        yield
        await redis_pool.close()

    app = FastAPI(lifespan=lifespan)
Then in routes, retrieve from app.state.redis_pool instead of creating new pools.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/routers/resources.py` around lines 59 - 60, Your POST handlers
(upload_resource and retry_extraction) create a new ARQ pool via
create_pool/RedisSettings.from_dsn and call enqueue_job on it per request,
causing connection churn; instead add a FastAPI lifespan that creates the pool
once (redis_pool = await create_pool(...)), stores it as app.state.redis_pool,
yields, and closes it on shutdown, then update the handlers to retrieve the pool
from app.state.redis_pool (or a simple dependency that returns
app.state.redis_pool) and call enqueue_job on that shared pool rather than
creating a new one per request.
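To make the "create once, reuse per request" shape concrete, here is a framework-free sketch of the pattern. It deliberately avoids importing FastAPI or ARQ: `FakePool`, `get_redis_pool`, and `demo` are hypothetical stand-ins, and `SimpleNamespace` objects play the role of the app and request. In the real application the pool would come from `create_pool(...)` and `request` would be a FastAPI request object.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakePool:
    """Stand-in for an ARQ pool; it records jobs instead of talking to Redis."""

    created = 0  # counts how many pools were ever constructed

    def __init__(self):
        FakePool.created += 1
        self.jobs = []
        self.closed = False

    async def enqueue_job(self, name, *args):
        self.jobs.append((name, args))

    async def close(self):
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # Startup: create the pool once and park it on app.state.
    app.state.redis_pool = FakePool()
    try:
        yield
    finally:
        # Shutdown: close the pool even if the app exits with an error.
        await app.state.redis_pool.close()


def get_redis_pool(request):
    """Dependency-style accessor that hands out the shared pool."""
    return request.app.state.redis_pool


async def upload_resource(request, resource_id):
    # The handler borrows the shared pool instead of creating its own.
    await get_redis_pool(request).enqueue_job("extraction_task", str(resource_id))


async def demo():
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        for resource_id in (1, 2, 3):
            await upload_resource(SimpleNamespace(app=app), resource_id)
    return app.state.redis_pool
```

Three simulated requests share a single pool here, and the `finally` clause means the pool is closed on shutdown regardless of how the application exits.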
self.client.put_object(
    Bucket=settings.SPACES_BUCKET,
    Key=object_name,
    Body=file_content,
    ACL='public-read',
    ContentType=content_type
)
return f"{settings.SPACES_PUBLIC_URL}/{object_name}"
🧩 Analysis chain
🏁 Script executed:
cat backend/app/services/storage.py
Repository: shubhamxdd/pyq-appl
Length of output: 1487
🏁 Script executed:
rg "put_object|upload|storage" backend/app --type py -l | head -20
Repository: shubhamxdd/pyq-appl
Length of output: 157
🏁 Script executed:
rg "public-read\|ACL" backend --type py -A 3 -B 3
Repository: shubhamxdd/pyq-appl
Length of output: 45
🏁 Script executed:
cat backend/app/routers/resources.py
Repository: shubhamxdd/pyq-appl
Length of output: 4101
🏁 Script executed:
cat backend/app/workers/tasks.py
Repository: shubhamxdd/pyq-appl
Length of output: 6629
Use private ACL with pre-signed URLs instead of public-read for uploaded resources.
Files are stored with ACL='public-read' (line 23), making them directly accessible to anyone with the URL regardless of application authentication. Though the API restricts listing/deletion to authenticated users, the URLs themselves (stored in the database and returned to clients) are not secret—if shared or leaked, anyone can access the file without authentication. This is particularly risky for user-uploaded PDFs and documents that may contain sensitive data.
Refactor to use private ACL and generate pre-signed URLs for temporary, authenticated access. Update the extraction task to use the boto3 client directly for internal file access rather than fetching via HTTP.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/services/storage.py` around lines 19 - 26, Change object uploads
to private ACL instead of public-read: in the code path that calls
self.client.put_object (and uses settings.SPACES_BUCKET, Key=object_name), set
ACL='private' (or omit ACL) and stop returning the direct public URL
(settings.SPACES_PUBLIC_URL/...); instead return the object key or a stable
identifier and generate time-limited access via
self.client.generate_presigned_url('get_object', Params={'Bucket':
settings.SPACES_BUCKET, 'Key': object_name}, ExpiresIn=...) when the API needs
to serve a download. Also update the extraction task to read files with the
boto3 client (self.client.get_object / streaming body) using the same
SPACES_BUCKET and object_name rather than fetching the public HTTP URL.
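A small helper for the time-limited download link might look like the sketch below. `presigned_download_url`, the 300-second default, and `RecordingClient` are hypothetical; the only real API assumed is the boto3 S3 client's `generate_presigned_url` method, called with the same arguments the comment above names. The client is passed in so the helper can be exercised without credentials or network access.

```python
def presigned_download_url(client, bucket: str, object_name: str, expires_in: int = 300) -> str:
    """Return a time-limited GET URL for a private object.

    `client` is expected to be a boto3 S3 client pointed at the Spaces endpoint.
    """
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": object_name},
        ExpiresIn=expires_in,
    )


class RecordingClient:
    """Test double that records the call instead of signing anything."""

    def __init__(self):
        self.calls = []

    def generate_presigned_url(self, operation, Params, ExpiresIn):
        self.calls.append((operation, Params, ExpiresIn))
        # Placeholder URL; a real client returns a signed URL.
        return f"https://example.invalid/{Params['Bucket']}/{Params['Key']}?expires={ExpiresIn}"
```

With this shape the database would store only the object key, and the API would call the helper each time a client needs to download the file.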
- Long-context LLM via OpenRouter (No vector DB).
- Long-context LLM via OpenRouter.
- PDF and Plain Text file support.
- OCR/AI-based text extraction for scanned documents.
Clarify scope to remove vision/OCR contradiction.
Line 48 puts OCR/AI extraction for scanned docs in scope, but Line 55 marks image/vision support out of scope. Please tighten wording so the intended MVP boundary is unambiguous (e.g., “printed scanned PDFs supported, handwriting deferred”).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@context/project-overview.md` at line 48, The project overview currently
contradicts itself by listing "OCR/AI-based text extraction for scanned
documents" as in-scope while also marking image/vision support out of scope;
update the scope language to be explicit and unambiguous (for example replace
the line "OCR/AI-based text extraction for scanned documents." with "Printed
scanned PDFs (machine-printed text) supported via OCR; handwriting recognition
and broader image/vision features are out of scope for the MVP."). Ensure the
change appears near the existing OCR/vision statements so readers clearly see
the boundary and consider adding a short parenthetical like "(handwriting
deferred)" to the sentence to prevent future confusion.
upload: async (file: File, type: string) => {
  const formData = new FormData();
  formData.append('file', file);
  formData.append('type', type);
Narrow the type parameter to the resource union.
At Line 13, type: string weakens type-safety and allows invalid values to compile. Use Resource['type'] to enforce the API contract at compile time.
Proposed fix
- upload: async (file: File, type: string) => {
+ upload: async (file: File, type: Resource['type']) => {
📝 Committable suggestion
upload: async (file: File, type: Resource['type']) => {
  const formData = new FormData();
  formData.append('file', file);
  formData.append('type', type);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/src/api/resources.ts` around lines 13 - 16, The upload function's
parameter is currently declared as type: string which weakens type-safety;
change the upload signature to use the Resource union type (e.g.,
Resource['type']) so only valid resource types are accepted, update the function
declaration for upload and any local references expecting string to the narrowed
type, and adjust any caller sites that pass literals or variables to ensure they
conform to Resource['type'] (or cast appropriately) so the API contract is
enforced at compile time.
const response = await api.post<Resource>('/resources/', formData, {
  headers: {
    'Content-Type': 'multipart/form-data',
  },
});
🧩 Analysis chain
🌐 Web query:
For Axios v1.x in browsers, when sending a FormData instance, should Content-Type: multipart/form-data be omitted so the browser can add the boundary automatically?
💡 Result:
Yes, you should omit the Content-Type header when sending a FormData instance in the browser [1][2]. When you pass a FormData object as the data payload in Axios, the browser's underlying XHR or Fetch API automatically sets the Content-Type header to multipart/form-data and appends the necessary boundary string [1][2]. If you manually set the Content-Type header to multipart/form-data, you often omit the required boundary parameter, which causes the server to fail to parse the request [1][3][4].
Key points to remember:
- Automatic Handling: Axios detects the FormData instance and allows the browser to handle the header generation, including the unique boundary [2].
- Avoid Manual Overrides: Manually setting Content-Type: multipart/form-data can prevent the browser from injecting the correct boundary, leading to errors [1][4].
- Shortcut Methods: If you prefer to be explicit about the intent, Axios provides helper methods like postForm, putForm, and patchForm, which are designed to handle multipart/form-data requests automatically [5][6][7].
- Troubleshooting: If you find that the Content-Type is not being set correctly (e.g., it defaults to application/x-www-form-urlencoded), ensure you are not accidentally overriding it in your global Axios defaults or instance configuration [3][8][9]. If you must override a default header that is interfering, you can set the header to undefined or an empty string in your request config to allow Axios to re-evaluate the correct type [9].
Citations:
- 1: https://stackoverflow.com/questions/75742226/cant-set-content-type-header-for-axios-with-formdata
- 2: https://stackoverflow.com/questions/68643330/pass-data-to-service-in-axios/68643919
- 3: Form data - start boundary issue axios/axios#381
- 4: https://stackoverflow.com/questions/49579640/how-to-send-data-correct-axios-error-multipart-boundary-not-found
- 5: https://github.com/axios/axios/blob/v1.15.0/docs/pages/advanced/multipart-form-data-format.md
- 6: https://github.com/axios/axios/blob/v1.x/README.md
- 7: https://axios.rest/pages/advanced/multipart-form-data-format
- 8: Issue with difference in default behaviour between axios and axiosInstance when content-type is not set (specific case of multipart-/form-data) axios/axios#3857
- 9: Not automatically change the header content type to multipart form data axios/axios#5556
🏁 Script executed:
cat -n frontend/src/api/resources.ts | head -50
Repository: shubhamxdd/pyq-appl
Length of output: 1311
Remove manual Content-Type header when sending FormData.
When passing a FormData instance to Axios (line 17), manually setting 'Content-Type': 'multipart/form-data' omits the required boundary parameter, causing the server to fail parsing the multipart request. Axios and the browser automatically inject the correct header with the boundary when FormData is detected, so omit the headers object entirely.
Proposed fix
- const response = await api.post<Resource>('/resources/', formData, {
- headers: {
- 'Content-Type': 'multipart/form-data',
- },
- });
+ const response = await api.post<Resource>('/resources/', formData);
📝 Committable suggestion
const response = await api.post<Resource>('/resources/', formData);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/src/api/resources.ts` around lines 17 - 21, The POST call using
api.post<Resource>('/resources/', formData, { headers: { 'Content-Type':
'multipart/form-data' } }) should drop the manual headers so the browser/Axios
can set the multipart boundary automatically; update the call that sends the
formData (the api.post invocation that assigns to response) to pass formData
without the headers object (and ensure no other wrapper adds Content-Type),
leaving Axios to set the correct Content-Type with boundary.
@@ -0,0 +1,182 @@
import { useState } from 'react';
Clear the native file input after successful upload.
At Lines 26-27, state resets, but the <input type="file"> still retains its previous value. Re-selecting the same file may not fire onChange, blocking immediate re-upload of the same file.
Proposed fix
-import { useState } from 'react';
+import { useRef, useState } from 'react';
@@
const [file, setFile] = useState<File | null>(null);
+ const fileInputRef = useRef<HTMLInputElement>(null);
@@
onSuccess: () => {
queryClient.invalidateQueries({ queryKey: ['resources'] });
setFile(null);
+ if (fileInputRef.current) fileInputRef.current.value = '';
},
@@
<input
+ ref={fileInputRef}
type="file"
accept=".pdf,.txt"
Also applies to: 24-27, 79-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/src/pages/Resources.tsx` at line 1, The file input retains its
previous value after resetting state which prevents re-selection of the same
file; add a ref for the native input (e.g., uploadInputRef) on the <input
type="file"> used in Resources.tsx and, in the success/error/reset branches of
the upload handler functions (the code paths using setSelectedFile,
setUploading, setUploadProgress around the upload logic and the handlers near
lines 24-27 and 79-83), set uploadInputRef.current.value = '' (guarded for null)
to clear the native input so selecting the same file fires onChange again.
In this PR I've added resource/file upload to DigitalOcean Spaces using boto3,
and text extraction using an NVIDIA model from OpenRouter.
Summary by CodeRabbit
Release Notes
New Features
UI/UX Improvements