feat: Implement Sample Paper Generator with format detection and question generation #8

Merged
shubhamxdd merged 19 commits into main from feature/paper-gen
May 19, 2026

@shubhamxdd (Owner) commented May 18, 2026

  • Added new papers router for handling paper-related endpoints.
  • Introduced detect-format endpoint to analyze past papers and extract question formats.
  • Created generate_paper_task for background processing of paper generation using LLM.
  • Developed frontend components for paper generation, including resource selection and format detection.
  • Integrated new API calls in the frontend for managing papers and outputs.
  • Enhanced UI with loading states and user feedback for paper generation process.
  • Added Pydantic schemas for paper creation and output management.

Summary by CodeRabbit

  • New Features

    • Sample paper generator with background generation, job status, rename/delete, and recent generations history
    • AI-driven format detection for past papers; quick auto-populate when creating papers
    • PDF exports: full study guide and questions-only downloads
    • Generator page (protected) with paper creation, polling, and paper output viewing
  • UX

    • Toggle controls to show/hide answers and explanations; new checkbox component
    • Persistent resource selections and URL deep-linking for solver sessions
  • Limits

    • Free-plan monthly quotas enforced with clear UI behavior and messages

…tion generation

- Added new `papers` router for handling paper-related endpoints.
- Introduced `detect-format` endpoint to analyze past papers and extract question formats.
- Created `generate_paper_task` for background processing of paper generation using LLM.
- Developed frontend components for paper generation, including resource selection and format detection.
- Integrated new API calls in the frontend for managing papers and outputs.
- Enhanced UI with loading states and user feedback for paper generation process.
- Added Pydantic schemas for paper creation and output management.

@coderabbitai (Bot, Contributor) commented May 18, 2026

📝 Walkthrough

This PR delivers the Phase 5 Sample Paper Generator. It adds LLM prompts and schemas, a new /papers REST API with a format-detection endpoint, a background Arq worker that generates and persists paper outputs, a DB migration and model changes for selected resources, PDF generation and upload, resource quota and enqueue updates, and a new React Generator page with its frontend client and routing.

Changes

Paper Generation Feature

| Layer / File(s) | Summary |
| --- | --- |
| LLM prompts & paper schemas: `backend/app/llm/client.py`, `backend/app/llm/prompts.py`, `backend/app/schemas/paper.py` | Adds DETECT_FORMAT_PROMPT and GENERATE_PAPER_PROMPT and Pydantic models for paper creation, resource linking, format detection, outputs, and question structure. |
| Papers API endpoints: `backend/app/main.py`, `backend/app/routers/papers.py` | Registers papers router and implements endpoints: POST /papers/detect-format, POST /papers (create + enqueue + Job), GET /papers, GET /papers/{id}, PATCH /papers/{id}, DELETE /papers/{id}, GET /papers/{id}/output, PATCH /papers/{id}/output, and GET /papers/{id}/pdf with ownership checks and error handling. |
| Background worker & jobs: `backend/app/workers/arq_worker.py`, `backend/app/workers/tasks.py` | Registers generate_paper_task and extends extraction_task/generate_paper_task to accept job_id, update Job lifecycle (queued→running→done/failed), stream LLM output, parse JSON, persist PaperOutput, and handle cancellation/errors. |
| PDF service & template: `backend/app/services/pdf.py`, `backend/app/templates/paper.html` | Adds Jinja2 + WeasyPrint-based generate_paper_pdf and paper.html template; includes option-letter filter and returns PDF bytes for upload. |
| DB migration, model & solver sync: `backend/alembic/versions/983ad2983d5c_add_selected_resource_ids_to_chat_.py`, `backend/app/models/chat_session.py`, `backend/app/routers/solver.py`, `backend/app/schemas/chat_session.py` | Adds selected_resource_ids JSON column, model field, and persists/exposes selection in solver endpoints/schemas. |
| Resources enqueue & quota: `backend/app/routers/resources.py` | Enforces per-user resource quota for free plan on upload, creates ingestion Job records, enqueues extraction_task with job_id, and ensures Redis pool cleanup; retry path updated similarly. |
| Frontend Generator & client: `frontend/src/App.tsx`, `frontend/src/api/papers.ts`, `frontend/src/components/ui/checkbox.tsx`, `frontend/src/pages/Generator.tsx` | Adds Generator page and route, papersApi client, a Checkbox component, Generator UI with detect-format, create dialog, polling, recent generations sidebar, output toggles, and PDF export/regenerate flows. |
| Frontend Solver updates: `frontend/src/api/solver.ts`, `frontend/src/pages/Solver.tsx` | Extends ChatSession type and updateSession to include selected_resource_ids, derives active session from URL, navigates on create/delete, syncs/persists selected resources. |
| Config, docs & UI utility: `backend/app/config.py`, `context/*`, `frontend/src/index.css`, `todo.md` | Adds PAPERS_LIMIT/RESOURCES_LIMIT, updates architecture/progress/docs, adds .no-scrollbar CSS utility, and updates todo and progress tracker content. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • shubhamxdd/pyq-appl#3: Related resource extraction task and pipeline that this PR consumes for paper generation.
  • shubhamxdd/pyq-appl#4: Related OpenRouter streaming integration referenced by LLM client usage in this PR.

Poem

🐰 I nibble prompts and stitch the paper tight,
From context chunks I bloom questions in the night,
Background hops and queues spin out the draft,
Toggle answers on — the classroom laughs,
Phase Five hops forward; soon we ship with light.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.88% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main feature added: a Sample Paper Generator with format detection and question generation capabilities, which aligns with the substantial new papers router, LLM-based generation task, and related frontend components. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 7

🧹 Nitpick comments (1)
backend/app/schemas/paper.py (1)

8-8: ⚡ Quick win

Constrain contract strings to explicit enums/literals.

role, delivery_mode, status, and question type are open strings right now. Invalid values can pass validation and break worker/API assumptions later. Use Literal[...]/Enums at schema level to fail fast on bad input.

Proposed contract tightening
+from typing import Literal
+
 class PaperResourceLink(BaseModel):
     resource_id: UUID
-    role: str # syllabus | past_paper | notes
+    role: Literal["syllabus", "past_paper", "notes"]

 class PaperCreate(BaseModel):
@@
-    delivery_mode: str = "background"
+    delivery_mode: Literal["background"] = "background"

 class PaperUpdate(BaseModel):
     title: Optional[str] = None
-    status: Optional[str] = None
+    status: Optional[Literal["pending", "generating", "done", "failed"]] = None
@@
 class PaperQuestion(BaseModel):
-    type: str # mcq | short | long
+    type: Literal["mcq", "short", "long"]

Also applies to: 14-14, 18-18, 37-37

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/schemas/paper.py` at line 8, The schema currently uses free-form
strings for role, delivery_mode, status and question type (e.g., the field
"role: str" in the Paper model) which lets invalid values pass; change those
fields to constrained types by replacing str with either typing.Literal[...]
(e.g., Literal["syllabus","past_paper","notes"]) or a small Enum class and
update the corresponding fields in the Paper and Question schemas (look for
symbols like Paper, Question, delivery_mode, status, type) to use the new
Literal/Enum types; also add the necessary imports (from typing import Literal
or from enum import Enum) and run quick validation tests to ensure invalid
values now raise schema errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/routers/papers.py`:
- Around line 67-68: Replace the current except block that does raise
HTTPException(status_code=500, detail=f"Failed to detect format: {str(e)}") with
logic that logs the full exception server-side (e.g., logger.exception(...) or
use the app logger) and raises an HTTPException with a generic client-facing
message like "Failed to detect file format" or "Internal server error" so
internal exception text is not returned; update the except handling in the
papers.py handler where the exception variable e is caught to call
logger.exception(e) and then raise HTTPException(status_code=500, detail="Failed
to detect file format").
- Around line 88-95: The loop currently silently skips invalid resources
(res_result.scalar_one_or_none()) which lets papers be created without required
context; change this to fail fast by raising an HTTPException(status_code=400)
with a clear message when a requested Resource (Resource.id ==
res_link.resource_id, Resource.user_id == current_user.id) is not found/owned,
and after processing data.resources ensure at least one valid resource
remains—if none, raise HTTPException(400) as well; make these checks inside the
paper creation handler where db.execute/select and res_link, res_result are used
so invalid input is rejected immediately.
- Around line 109-111: Wrap the ephemeral Redis pool usage (create_pool and
RedisSettings.from_dsn/ settings.REDIS_URL) around a try/except/finally: call
redis.enqueue_job("generate_paper_task", str(new_paper.id)) inside try, catch
enqueue exceptions to update the paper's status from "pending" to a failure
state (e.g., "failed" or "error") and persist that change, then re-raise or log
as appropriate; in finally ensure the Redis pool is properly closed/shut down
(close + await wait_closed or the equivalent on the pool object) so no
per-request pool leak occurs.

In `@backend/app/workers/tasks.py`:
- Around line 251-266: The parsed questions JSON (questions) must be
structurally validated before creating PaperOutput to avoid persisting malformed
data; before the PaperOutput(...) call, add an explicit validation step that
ensures questions is a list and that each question object contains the required
keys and types (e.g., common fields like id/stem/type, and type-specific fields
such as choices for "multiple_choice" or answer/explanation formats), and fail
fast (raise/log and do not persist) if validation fails; implement this using a
JSON Schema check with jsonschema.validate or a small Pydantic model for
Question and QuestionList and run it in the same scope as tasks.py parsing logic
so only validated data is passed into PaperOutput.
- Around line 288-289: Replace the bare "except: pass" used when marking papers
as failed in backend/app/workers/tasks.py with a proper exception handler: catch
"Exception as err" instead of a bare except and log the error (using the
module's logger or logger.exception/logger.error) including the exception
message and stack trace so failures aren't silently swallowed; do not suppress
the exception silently and only swallow it if there's a clear, logged reason.

In `@frontend/src/pages/Generator.tsx`:
- Around line 103-107: The invalidation in toggleSettingsMutation's onSuccess
uses the closed-over activePaperId which can be stale; change onSuccess to use
the mutation's variables (the second onSuccess arg) instead and call
queryClient.invalidateQueries with ['paper-output', variables.id] (or
variables?.id) so the cache for the actually-updated paper is invalidated
reliably; locate the useMutation call (toggleSettingsMutation, mutationFn) and
update its onSuccess signature to accept (_, variables) and use variables.id for
the queryKey.
- Around line 485-493: The generating placeholder is shown whenever activeOutput
is falsy, which wrongly includes failed papers; update the render condition in
Generator.tsx (the block using outputLoading, activeOutput) to exclude failed
outputs by checking the failure flag/status (e.g., add && !failed or check
activeOutput?.status !== 'failed') so the generating message only appears when
there is no output and the job hasn't failed, and ensure failed cases render the
existing failure UI instead.
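
The Redis enqueue finding for `backend/app/routers/papers.py` (around lines 109-111) could look roughly like the sketch below. It is only an illustration of the try/except/finally shape: `FakePool` and the `paper` dict are hypothetical stand-ins for the Arq pool and the ORM model, and the real pool's close method may be named differently depending on the installed Arq version.

```python
import asyncio


class FakePool:
    """Hypothetical stand-in for an Arq Redis pool; only enqueue/close are modelled."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.closed = False

    async def enqueue_job(self, name: str, *args: str) -> None:
        if self.fail:
            raise ConnectionError("redis unavailable")

    async def close(self) -> None:
        self.closed = True


async def enqueue_paper(pool: FakePool, paper: dict) -> None:
    """Enqueue generation; mark the paper failed on error and always close the pool."""
    try:
        await pool.enqueue_job("generate_paper_task", paper["id"])
    except Exception:
        # In the real handler this status change would also be committed to the DB.
        paper["status"] = "failed"
        raise
    finally:
        # Runs on success and on failure, so the per-request pool is never leaked.
        await pool.close()


if __name__ == "__main__":
    demo = {"id": "paper-1", "status": "pending"}
    asyncio.run(enqueue_paper(FakePool(), demo))
```

Re-raising after updating the status keeps the failure visible to the caller while still leaving the paper in a state the UI can display.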

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5795654d-275d-4aca-a761-2eb5de6e1bdf

📥 Commits

Reviewing files that changed from the base of the PR and between 342801b and 3c34532.

📒 Files selected for processing (12)
  • backend/app/llm/client.py
  • backend/app/llm/prompts.py
  • backend/app/main.py
  • backend/app/routers/papers.py
  • backend/app/schemas/paper.py
  • backend/app/workers/arq_worker.py
  • backend/app/workers/tasks.py
  • context/progress-tracker.md
  • frontend/src/App.tsx
  • frontend/src/api/papers.ts
  • frontend/src/components/ui/checkbox.tsx
  • frontend/src/pages/Generator.tsx

Comment on lines +67 to +68
except Exception as e:
    raise HTTPException(status_code=500, detail=f"Failed to detect format: {str(e)}")

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid leaking internal exception text in API error responses.

At Line 68, returning str(e) can expose provider/internal details. Return a generic message to clients and log the detailed error server-side.
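
A minimal sketch of that pattern, using only the standard library so it runs on its own. `ApiError` is a hypothetical stand-in for FastAPI's `HTTPException`, and `detect_format` / `run_detection` are illustrative names, not the PR's actual handler.

```python
import logging

logger = logging.getLogger("papers")


class ApiError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException (status code + client detail)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def detect_format(run_detection):
    """Run detection; log failures server-side and expose only a generic message."""
    try:
        return run_detection()
    except Exception as err:
        # The full traceback goes to the server log, never to the client.
        logger.exception("Format detection failed")
        # "from err" keeps the original cause attached for debugging (Ruff B904).
        raise ApiError(500, "Failed to detect format") from err
```

The client-facing detail is a fixed string, so provider error text cannot leak through it; narrowing the `except` to specific exception types would additionally address the BLE001 warning.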

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 67-67: Do not catch blind exception: Exception

(BLE001)


[warning] 68-68: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 68-68: Use explicit conversion flag

Replace with conversion flag

(RUF010)

Comment thread backend/app/routers/papers.py
Comment thread backend/app/routers/papers.py
Comment on lines +251 to +266
try:
    questions = json.loads(clean_json)
    print(f"✨ [DEBUG] Successfully parsed {len(questions)} questions.")
except json.JSONDecodeError as je:
    print(f"❌ [PARSE ERROR] Failed to parse LLM response as JSON.")
    print(f"🔍 [DEBUG] Raw response snippet: {clean_json[:500]}...")
    raise je

# 5. Save Output
print(f"💾 [DEBUG] Saving paper output to database...")
new_output = PaperOutput(
    paper_id=paper.id,
    questions=questions,
    include_answers=True,
    include_explanations=True
)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate generated JSON schema before persisting.

At Line 252 onward, parsed JSON is saved directly. If the model returns malformed shape, bad data is persisted and UI contracts break downstream. Add explicit structural validation (array, required fields per question type) before PaperOutput insert.
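
One way to do this with the standard library alone is sketched below. The question types come from the schema comment (`mcq | short | long`); the field names `text` and `options` are assumptions for illustration and may not match the PR's actual `PaperQuestion` model, where a Pydantic model would likely be the more natural fit.

```python
import json

# Types taken from the schema comment: mcq | short | long
ALLOWED_TYPES = ("mcq", "short", "long")


def validate_questions(raw_json: str) -> list[dict]:
    """Parse LLM output and check its shape; raise ValueError instead of persisting bad data."""
    questions = json.loads(raw_json)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(questions, list) or not questions:
        raise ValueError("expected a non-empty JSON array of questions")
    for i, q in enumerate(questions):
        if not isinstance(q, dict):
            raise ValueError(f"question {i} is not an object")
        if q.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"question {i} has unknown type {q.get('type')!r}")
        # "text" and "options" are assumed field names, not confirmed by the PR.
        if not isinstance(q.get("text"), str) or not q["text"].strip():
            raise ValueError(f"question {i} is missing text")
        if q["type"] == "mcq":
            options = q.get("options")
            if not isinstance(options, list) or len(options) < 2:
                raise ValueError(f"question {i} (mcq) needs at least two options")
    return questions
```

Calling this between `json.loads` and the `PaperOutput(...)` insert would let the existing failure path mark the paper as failed rather than storing a shape the UI cannot render.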

🧰 Tools
🪛 Ruff (0.15.12)

[error] 255-255: f-string without any placeholders

Remove extraneous f prefix

(F541)


[error] 260-260: f-string without any placeholders

Remove extraneous f prefix

(F541)

Comment on lines +288 to +289
except:
    pass

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove bare except: pass in failure handling.

At Line 288–289, this suppresses errors when marking papers as failed, making stuck states hard to diagnose. Catch Exception as err and log it.

Minimal fix
-            except:
-                pass
+            except Exception as err:
+                logger.exception("Failed to mark paper %s as failed: %s", paper_id, err)
🧰 Tools
🪛 Ruff (0.15.12)

[error] 288-288: Do not use bare except

(E722)


[error] 288-289: try-except-pass detected, consider logging the exception

(S110)

Comment thread frontend/src/pages/Generator.tsx
Comment thread frontend/src/pages/Generator.tsx Outdated

@coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/routers/solver.py`:
- Around line 176-185: Persist the validated resource IDs instead of the raw
request list: replace uses of data.resource_ids when assigning
session.selected_resource_ids and when constructing new ChatSession (new_sess)
so they store the filtered/validated IDs that were actually used in the query
(e.g., the variable holding the authorized/exists IDs from the earlier filter),
converted to strings; update the assignment in the existing-session branch
(session.selected_resource_ids = ...) and the ChatSession constructor
(selected_resource_ids=...) to use that validated_id_list rather than
data.resource_ids to avoid saving deleted or unauthorized IDs.

In `@frontend/src/pages/Generator.tsx`:
- Around line 377-405: The output toggle buttons (the Button components that
reference activeOutput and call toggleSettingsMutation.mutate) should be gated
so they cannot fire when no output record exists; check the paper status and
output presence (use condition activePaper?.status === 'done' && !!activeOutput)
and apply it to both the onClick behavior and the disabled prop (or
early-return) so the PATCH is not called for pending/failed papers; update the
Button instances that reference activeOutput, activePaperId, and
toggleSettingsMutation to respect this guard and reflect disabled state
visually.
- Around line 113-123: When the user toggles a resource or changes its role, the
component must clear any previously detected document format so Step 2 won't
remain READY for a stale format; update handleResourceToggle and
handleRoleChange to call the state reset for the detected format (e.g.,
setDetectedFormat(null) or the equivalent state updater you use for the learned
format) after updating selectedResources, ensuring the component also clears any
related "Step 2 ready" status if you have a separate flag (e.g., setStep2Status
or similar). This ensures toggling or role changes invalidate prior detection
before createPaper can submit.
- Around line 217-239: The Detect Pattern button is currently gated by the
original resource type (res.type) so it doesn't appear when the user changes the
role to "past_paper" via the select; update the conditional that renders the
button to check the selected role in selectedResources instead (e.g.,
selectedResources.find(r => r.id === res.id)?.role === 'past_paper') so the
button appears whenever the selected role is past_paper, and ensure
handleDetectFormat(res.id) and isDetecting behavior remain the same.

In `@frontend/src/pages/Solver.tsx`:
- Around line 138-148: The effect that syncs selected resources leaves stale
selections when activeSessionId is set but sessions doesn't yet contain that
session; update the useEffect body (the effect that references activeSessionId,
sessions and calls setSelectedResources) so that when activeSessionId and
sessions are present but sessions.find(...) (activeSession) returns undefined
you explicitly call setSelectedResources([]) — i.e., change the branch for
activeSession being falsy to clear selections instead of doing nothing.
- Around line 68-71: The session switch helper handleSelectSession should be
used as the single source of truth for session navigation instead of directly
calling setActiveSessionId; update both places that render session list buttons
(where they currently call setActiveSessionId(sess.id)) to call
handleSelectSession(sess.id) (which performs navigate(`/solver/${id}`)) so the
URL is updated on click and becomes the canonical session state used by the
component.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1d1444b3-40d5-49b4-88b0-63e607158631

📥 Commits

Reviewing files that changed from the base of the PR and between 3c34532 and 334514d.

📒 Files selected for processing (9)
  • backend/alembic/versions/983ad2983d5c_add_selected_resource_ids_to_chat_.py
  • backend/app/models/chat_session.py
  • backend/app/routers/papers.py
  • backend/app/routers/solver.py
  • backend/app/schemas/chat_session.py
  • frontend/src/App.tsx
  • frontend/src/api/solver.ts
  • frontend/src/pages/Generator.tsx
  • frontend/src/pages/Solver.tsx
✅ Files skipped from review due to trivial changes (1)
  • backend/alembic/versions/983ad2983d5c_add_selected_resource_ids_to_chat_.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/App.tsx

Comment on lines +176 to +185
# Update persistent resource selection if changed
new_resource_ids = [str(rid) for rid in data.resource_ids]
if session.selected_resource_ids != new_resource_ids:
session.selected_resource_ids = new_resource_ids
else:
# Auto-create session if none provided
new_sess = ChatSession(user_id=current_user.id, title=data.content[:30] + "...")
new_sess = ChatSession(
user_id=current_user.id,
title=data.content[:30] + "...",
selected_resource_ids=[str(rid) for rid in data.resource_ids]

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Persist only the validated resource IDs.

data.resource_ids can still contain deleted or unauthorized IDs that were filtered out by the query above. Saving the raw request back into session.selected_resource_ids keeps stale IDs in the session and desyncs the solver UI from the actual context used for the answer.

Proposed fix
-        new_resource_ids = [str(rid) for rid in data.resource_ids]
+        new_resource_ids = [str(res.id) for res in resources]
         if session.selected_resource_ids != new_resource_ids:
             session.selected_resource_ids = new_resource_ids
@@
         new_sess = ChatSession(
             user_id=current_user.id, 
             title=data.content[:30] + "...",
-            selected_resource_ids=[str(rid) for rid in data.resource_ids]
+            selected_resource_ids=[str(res.id) for res in resources]
         )

Comment on lines +113 to +123
const handleResourceToggle = (id: string) => {
  setSelectedResources(prev => {
    const exists = prev.find(r => r.id === id);
    if (exists) return prev.filter(r => r.id !== id);
    return [...prev, { id, role: 'notes' }];
  });
};

const handleRoleChange = (id: string, role: string) => {
  setSelectedResources(prev => prev.map(r => r.id === id ? { ...r, role } : r));
};

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset the detected format when the selected source changes.

Once pattern detection succeeds, deselecting that resource or changing its role leaves Step 2 in a READY state and createPaper can submit a format learned from a different document.

Proposed fix
   const handleResourceToggle = (id: string) => {
+    setFormatConfig(null);
     setSelectedResources(prev => {
       const exists = prev.find(r => r.id === id);
       if (exists) return prev.filter(r => r.id !== id);
       return [...prev, { id, role: 'notes' }];
     });
   };

   const handleRoleChange = (id: string, role: string) => {
+    setFormatConfig(null);
     setSelectedResources(prev => prev.map(r => r.id === id ? { ...r, role } : r));
   };

Comment thread frontend/src/pages/Generator.tsx
Comment thread frontend/src/pages/Generator.tsx
Comment thread frontend/src/pages/Solver.tsx
Comment on lines +138 to +148
// Sync selected resources from active session
useEffect(() => {
if (activeSessionId && sessions) {
const activeSession = sessions.find((s: any) => s.id === activeSessionId);
if (activeSession) {
setSelectedResources(activeSession.selected_resource_ids || []);
}
} else {
setSelectedResources([]);
}
}, [activeSessionId, sessions]);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear stale selections when the active session is not in the cached list yet.

Right after navigating to a newly-created session, activeSessionId is set before sessions contains that record. In that window this effect preserves the previous session's selectedResources, so the next question can be sent with the wrong documents.

Proposed fix
   useEffect(() => {
     if (activeSessionId && sessions) {
       const activeSession = sessions.find((s: any) => s.id === activeSessionId);
       if (activeSession) {
         setSelectedResources(activeSession.selected_resource_ids || []);
+      } else {
+        setSelectedResources([]);
       }
     } else {
       setSelectedResources([]);
     }
   }, [activeSessionId, sessions]);
📝 Committable suggestion


Suggested change
-  // Sync selected resources from active session
-  useEffect(() => {
-    if (activeSessionId && sessions) {
-      const activeSession = sessions.find((s: any) => s.id === activeSessionId);
-      if (activeSession) {
-        setSelectedResources(activeSession.selected_resource_ids || []);
-      }
-    } else {
-      setSelectedResources([]);
-    }
-  }, [activeSessionId, sessions]);
+  // Sync selected resources from active session
+  useEffect(() => {
+    if (activeSessionId && sessions) {
+      const activeSession = sessions.find((s: any) => s.id === activeSessionId);
+      if (activeSession) {
+        setSelectedResources(activeSession.selected_resource_ids || []);
+      } else {
+        setSelectedResources([]);
+      }
+    } else {
+      setSelectedResources([]);
+    }
+  }, [activeSessionId, sessions]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/Solver.tsx` around lines 138 - 148, The effect that syncs
selected resources leaves stale selections when activeSessionId is set but
sessions doesn't yet contain that session; update the useEffect body (the effect
that references activeSessionId, sessions and calls setSelectedResources) so
that when activeSessionId and sessions are present but sessions.find(...)
(activeSession) returns undefined you explicitly call setSelectedResources([]) —
i.e., change the branch for activeSession being falsy to clear selections
instead of doing nothing.

shubhamxdd added 15 commits May 19, 2026 02:51
… logic for different modes

This commit adds a mode that writes only the questions to the PDF, without the answers.
…e with URL

Each sample paper now has its own URL (ID), the same way each chat does.
…ting, dual PDF export (Full/Questions), abort/delete functionality, paper renaming, and dashboard metrics.
@shubhamxdd
Owner Author

@CodeRabbit review

@coderabbitai
Contributor

coderabbitai Bot commented May 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/routers/resources.py (1)

102-130: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Commit Job records before handing their IDs to ARQ.

Line 116 and Line 226 enqueue new_job.id before that transaction commits. A fast worker can start, fail to load the row, and then blow up when extraction_task later touches job.status; the enqueue-error path can also commit a flushed queued job that never actually ran. Persist the job first, then enqueue, and move the resource/job into a terminal failure state in a follow-up transaction if the queue write fails.

Also applies to: 213-239

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/resources.py` around lines 102 - 130, The code enqueues
new_job.id into ARQ before the DB transaction is committed; commit and persist
the Job first, then enqueue. Change the sequence so that after creating new_job
and awaiting db.flush() you await db.commit() and
db.refresh(new_job)/db.refresh(new_resource) to ensure new_job.id is durable,
then create the Redis pool and call redis.enqueue_job('extraction_task',
str(new_resource.id), str(new_job.id)). If enqueue fails, start a new DB
transaction to set new_job.status = 'failed' (and new_resource.status or cleanup
as appropriate) and commit that follow-up transaction, then raise the
HTTPException; always close the redis pool in finally.
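
The ordering described above can be sketched with in-memory stand-ins. This is a minimal illustration, not the project's code: `FakeDB`, `FakeQueue`, and `create_and_enqueue` are hypothetical names, and the real handler would use the SQLAlchemy session and the ARQ pool and raise the HTTPException after recording the failure.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeDB:
    """Hypothetical session: rows are visible to workers only after commit()."""
    pending: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def add(self, job_id: str, status: str) -> None:
        self.pending[job_id] = status

    async def commit(self) -> None:
        self.committed.update(self.pending)
        self.pending.clear()


class FakeQueue:
    """Hypothetical queue; fail=True simulates an enqueue error."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.jobs: list = []

    async def enqueue(self, job_id: str) -> None:
        if self.fail:
            raise RuntimeError("queue unavailable")
        self.jobs.append(job_id)


async def create_and_enqueue(db: FakeDB, queue: FakeQueue, job_id: str) -> str:
    # 1. Persist the job first so a fast worker can always load the row.
    db.add(job_id, "queued")
    await db.commit()
    try:
        # 2. Only now hand the ID to the queue.
        await queue.enqueue(job_id)
    except RuntimeError:
        # 3. Follow-up transaction: move the job to a terminal state.
        db.add(job_id, "failed")
        await db.commit()
    return db.committed[job_id]
```

With this ordering a job is either committed as queued and present in the queue, or committed as failed and absent from it; there is no window in which the queue holds an ID the database does not yet know.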
🧹 Nitpick comments (2)
frontend/src/api/papers.ts (1)

19-21: ⚡ Quick win

Replace any types with proper interfaces for type safety.

The create and toggleOutput methods use any for their data parameters, which bypasses TypeScript's type checking. This makes the API less self-documenting and more error-prone.

♻️ Proposed type-safe refactor
// Add interface definitions at the top of the file
interface CreatePaperRequest {
  title: string;
  resource_ids: string[];
  format_config: {
    duration_minutes: number;
    total_marks: number;
    sections: Array<{
      name: string;
      question_count: number;
      marks_per_question: number;
      type: 'mcq' | 'short' | 'long';
    }>;
  };
}

interface ToggleOutputRequest {
  include_answers?: boolean;
  include_explanations?: boolean;
}

// Update the method signatures
export const papersApi = {
  // ... other methods ...
  
  create: async (data: CreatePaperRequest) => {
    const response = await api.post('/papers', data);
    return response.data;
  },
  
  // ... other methods ...
  
  toggleOutput: async (id: string, data: ToggleOutputRequest) => {
    const response = await api.patch(`/papers/${id}/output`, data);
    return response.data;
  },
  
  // ... other methods ...
};

Also applies to: 39-41

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/api/papers.ts` around lines 19 - 21, Replace the loose any types
used in the create and toggleOutput methods by adding explicit request
interfaces (e.g., CreatePaperRequest and ToggleOutputRequest) at the top of the
file and updating signatures: change create: async (data: any) => ... to create:
async (data: CreatePaperRequest) => ... and toggleOutput: async (id: string,
data: any) => ... to toggleOutput: async (id: string, data: ToggleOutputRequest)
=> ...; ensure CreatePaperRequest models title, resource_ids, format_config
(with duration_minutes, total_marks, sections array and section fields name,
question_count, marks_per_question, type) and ToggleOutputRequest includes
optional include_answers and include_explanations so callers get proper type
checking and auto-complete.
backend/app/services/pdf.py (1)

28-30: ⚡ Quick win

Consider defensive validation for question structure.

The filters read the type with q.get('type'), so a question with a missing or unrecognized type raises no error and is simply left out of every section. Reporting how many questions were left out would make malformed input visible.

♻️ Proposed defensive refactor
-    mcqs = [q for q in questions if q.get('type') == 'mcq']
-    shorts = [q for q in questions if q.get('type') == 'short']
-    longs = [q for q in questions if q.get('type') == 'long']
+    # Use .get() for safety, but questions without 'type' will be silently excluded
+    mcqs = [q for q in questions if q.get('type') == 'mcq']
+    shorts = [q for q in questions if q.get('type') == 'short']
+    longs = [q for q in questions if q.get('type') == 'long']
+    
+    # Optionally, log a warning if any questions were excluded
+    total_categorized = len(mcqs) + len(shorts) + len(longs)
+    if total_categorized < len(questions):
+        print(f"Warning: {len(questions) - total_categorized} questions had invalid or missing 'type' field")

Note: The code already uses .get('type') so it won't crash, but questions with missing or invalid types will be silently excluded from all sections. Consider adding logging to catch data quality issues.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/pdf.py` around lines 28 - 30, The three list
comprehensions (mcqs, shorts, longs) filter the questions list by q.get('type')
but silently drop entries missing or with invalid types—add defensive validation
and logging: iterate over questions, read q_type = q.get('type', None), if
q_type not in {'mcq','short','long'} log a warning (using the module's logger or
create one) including the offending question identifier/summary, and append
valid items to mcqs, shorts, or longs accordingly; this preserves current
behavior for valid types while surfacing malformed question data for
investigation.
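
One way to do the single-pass grouping with a warning is sketched below. It is only an illustration: `split_by_type` and `KNOWN_TYPES` are hypothetical names, and the fourth return value (the skipped questions) is an addition made here for testability, not something the original function returns.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)

# Assumed set of valid question types, taken from the three filters in the review.
KNOWN_TYPES = ("mcq", "short", "long")


def split_by_type(questions):
    """Group questions by 'type' and collect those with a missing or unknown type."""
    groups = defaultdict(list)
    skipped = []
    for q in questions:
        q_type = q.get("type")
        if q_type in KNOWN_TYPES:
            groups[q_type].append(q)
        else:
            skipped.append(q)
    if skipped:
        logger.warning("%d question(s) had a missing or invalid 'type'", len(skipped))
    return groups["mcq"], groups["short"], groups["long"], skipped
```

Valid questions land in the same three lists as before, so the PDF output is unchanged; only the previously silent exclusions become visible in the log.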
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5013e668-2445-4be8-ae98-bf096bf74db4

📥 Commits

Reviewing files that changed from the base of the PR and between 334514d and cfc15d2.

⛔ Files ignored due to path filters (1)
  • frontend/i7.png is excluded by !**/*.png
📒 Files selected for processing (21)
  • backend/alembic/versions/ab47c0990b86_add_question_pdf_url_to_paper_outputs.py
  • backend/app/config.py
  • backend/app/models/paper_output.py
  • backend/app/routers/papers.py
  • backend/app/routers/resources.py
  • backend/app/schemas/paper.py
  • backend/app/services/pdf.py
  • backend/app/templates/paper.html
  • backend/app/workers/arq_worker.py
  • backend/app/workers/tasks.py
  • context/architecture.md
  • context/code-standards.md
  • context/progress-tracker.md
  • context/project-overview.md
  • context/ui-context.md
  • frontend/src/App.tsx
  • frontend/src/api/papers.ts
  • frontend/src/index.css
  • frontend/src/pages/Generator.tsx
  • frontend/src/pages/Solver.tsx
  • todo.md
✅ Files skipped from review due to trivial changes (5)
  • backend/alembic/versions/ab47c0990b86_add_question_pdf_url_to_paper_outputs.py
  • todo.md
  • context/code-standards.md
  • context/project-overview.md
  • context/progress-tracker.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/schemas/paper.py

Comment on lines +117 to +136
# 3. Link Resources
for res_link in data.resources:
    # Verify resource exists and belongs to user
    res_result = await db.execute(
        select(Resource).where(Resource.id == res_link.resource_id, Resource.user_id == current_user.id)
    )
    if not res_result.scalar_one_or_none():
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Resource {res_link.resource_id} not found or unauthorized."
        )

    # Insert into association table
    await db.execute(
        paper_resources.insert().values(
            paper_id=new_paper.id,
            resource_id=res_link.resource_id,
            resource_role=res_link.role
        )
    )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject resources that aren't ready yet.

This loop only checks ownership. A direct API caller can attach processing or failed resources, and the worker then burns a queue slot before dying with "No context found in selected resources." Require status == "ready" and non-empty extracted text here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/papers.py` around lines 117 - 136, The loop in
routers/papers.py currently only checks ownership of each res_link; update the
existence query to also require the resource is ready and has non-empty
extracted text before inserting into paper_resources. Specifically, change the
select(...) used for res_result to include Resource.status == "ready" and
Resource.extracted_text is not null/empty (or the equivalent check your Resource
model uses), then if scalar_one_or_none() returns None raise HTTPException with
a message like "Resource {id} not ready or has no extracted text." Use the same
symbols (data.resources, res_link.resource_id, current_user.id, Resource,
res_result, paper_resources.insert(), new_paper.id) to locate and modify the
code.
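
The extra conditions can be expressed as a small predicate, shown here as a sketch independent of the ORM. `ResourceRow` and `rejection_reason` are hypothetical names, and the field names `status` and `extracted_text` are taken from the comment above rather than from the actual model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRow:
    """Hypothetical stand-in for the Resource columns the check needs."""
    id: str
    user_id: str
    status: str
    extracted_text: Optional[str]


def rejection_reason(resource: Optional[ResourceRow], user_id: str) -> Optional[str]:
    """Return why a resource cannot be linked to a paper, or None if it is usable."""
    if resource is None or resource.user_id != user_id:
        return "not found or unauthorized"
    if resource.status != "ready":
        return "not ready"
    if not (resource.extracted_text or "").strip():
        return "has no extracted text"
    return None
```

Returning a reason rather than a boolean lets the endpoint put a specific message in the 400 response, so the caller can tell a resource that is still processing from one that belongs to someone else.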

Comment on lines +141 to +167
from ..models.job import Job
new_job = Job(
    user_id=current_user.id,
    job_type="generate_paper",
    status="queued",
    ref_id=new_paper.id
)
db.add(new_job)
await db.flush()

# 4. Enqueue background task
redis = await create_pool(RedisSettings.from_dsn(settings.REDIS_URL))
try:
    job = await redis.enqueue_job("generate_paper_task", str(new_paper.id), str(new_job.id))
    if job is None:
        raise RuntimeError("Failed to enqueue generate_paper_task")
    await db.commit()
except Exception as e:
    logger.error(f"Redis enqueue error: {e}")
    new_paper.status = "failed"
    await db.commit()
    raise HTTPException(
        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
        detail="Paper generation queued failed, please retry."
    )
finally:
    await redis.close()

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Commit the Job row before passing its ID to generate_paper_task.

Line 154 hands ARQ an uncommitted new_job.id. If the worker starts before the transaction commits, generate_paper_task can observe no row and crash when it later dereferences job. The exception path also commits the flushed queued job, leaving a phantom job for work that never ran.

🧰 Tools
🪛 Ruff (0.15.13)

[warning] 158-158: Do not catch blind exception: Exception

(BLE001)


[warning] 162-165: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/papers.py` around lines 141 - 167, The Job row is
committed only after enqueue, so the worker may see a missing Job id; move the
DB commit for the newly created Job (the new_job created and currently flushed
via await db.flush()) to occur before calling redis.enqueue_job for
"generate_paper_task" so the worker always observes the row; additionally,
update the exception path around redis.enqueue_job to remove or mark and commit
the new_job consistently (e.g., delete the phantom new_job or set its status to
"failed" and commit) to avoid leaving a queued Job that never ran; reference
new_job, await db.flush(), await db.commit(),
redis.enqueue_job("generate_paper_task") and generate_paper_task when making the
change.

Comment on lines +250 to +260
if output:
    if output.pdf_url:
        object_name = output.pdf_url.replace(f"{settings.SPACES_PUBLIC_URL}/", "")
        storage_service.delete_file(object_name)
    if output.question_pdf_url:
        object_name = output.question_pdf_url.replace(f"{settings.SPACES_PUBLIC_URL}/", "")
        storage_service.delete_file(object_name)

# Deleting the paper will cascade and delete PaperOutput and paper_resources automatically
await db.delete(paper)
await db.commit()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't drop the paper record if PDF cleanup fails.

Both storage deletions ignore their return value. If either object stays in Spaces, the paper disappears from the app but its generated PDFs remain accessible and billable. Mirror the resource delete flow: fail the request or at least surface/log cleanup failure before committing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/papers.py` around lines 250 - 260, The current deletion
path removes the Paper (await db.delete(paper); await db.commit()) even if
storage deletions via storage_service.delete_file for output.pdf_url or
output.question_pdf_url fail, risking orphaned billable files; modify the logic
in the papers deletion flow to check the result/raise on failure of
storage_service.delete_file (or catch and log+raise) for both output.pdf_url and
output.question_pdf_url (use the same pattern as the resource delete flow), and
only proceed to call db.delete(paper) and db.commit() when both deletions
succeed (or surface the cleanup failure to the caller), referencing
output.pdf_url, output.question_pdf_url, storage_service.delete_file,
db.delete(paper), and db.commit() to locate the code to change.
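
A possible shape for the cleanup step is sketched below. It assumes, as the comment implies but the snippet does not show, that the storage delete call returns a truthy value on success; `delete_files_or_report` is a hypothetical helper and the storage function is passed in as a plain callable.

```python
from typing import Callable, Iterable, List, Optional


def delete_files_or_report(
    urls: Iterable[Optional[str]],
    public_prefix: str,
    delete_file: Callable[[str], bool],
) -> List[str]:
    """Attempt every deletion and return the object names that could not be removed."""
    failed = []
    for url in urls:
        if not url:
            continue  # no PDF was generated for this slot
        # Strip the public URL prefix to recover the object name in the bucket.
        object_name = url[len(public_prefix):] if url.startswith(public_prefix) else url
        # Assumption: delete_file returns True on success and False on failure.
        if not delete_file(object_name):
            failed.append(object_name)
    return failed
```

The caller would delete and commit the paper row only when the returned list is empty, and otherwise log the leftover object names and surface an error, so a record never disappears while its files remain in storage.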


# Set up Jinja2 environment
template_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), "templates")
env = Environment(loader=FileSystemLoader(template_dir))

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Enable autoescape to prevent XSS in generated PDFs.

The Jinja2 Environment is created without autoescape=True, which means user-provided content in questions could contain malicious HTML or JavaScript that would be rendered directly into the PDF. If generated PDFs are viewed in browsers, this creates an XSS vulnerability.

🔒 Proposed fix to enable autoescape
-env = Environment(loader=FileSystemLoader(template_dir))
+env = Environment(loader=FileSystemLoader(template_dir), autoescape=True)
📝 Committable suggestion


Suggested change
-env = Environment(loader=FileSystemLoader(template_dir))
+env = Environment(loader=FileSystemLoader(template_dir), autoescape=True)
🧰 Tools
🪛 Ruff (0.15.13)

[error] 9-9: By default, jinja2 sets autoescape to False. Consider using autoescape=True or the select_autoescape function to mitigate XSS vulnerabilities.

(S701)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/pdf.py` at line 9, The Jinja2 Environment is created
without autoescaping which can cause XSS when rendering user content; update the
Environment instantiation (the env variable created with
Environment(loader=FileSystemLoader(template_dir))) to enable autoescape for
HTML templates (e.g., pass autoescape=True or use select_autoescape for HTML/XML
templates) so user-provided question content is escaped by default when
rendering PDFs.
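
To make the effect of escaping concrete, here is a standard-library sketch. It uses `html.escape` rather than Jinja2 itself, so it shows the kind of transformation autoescaping applies to template variables, not the template engine's exact output; the question text is invented for illustration.

```python
from html import escape

# Hypothetical question text containing markup and a comparison operator.
question_text = '<script>alert("x")</script> What is 2 < 3?'

# Escaping turns markup characters into entities so they render as literal text.
escaped = escape(question_text)
print(escaped)
```

Legitimate content such as the `<` in "2 < 3" survives as visible text after escaping, which is why enabling escaping by default is usually safe for question content that is meant to be plain text.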

Comment on lines +11 to +13
def map_option_letter(index: str) -> str:
    """Helper to map 0, 1, 2, 3 to A, B, C, D"""
    return chr(65 + int(index))

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add bounds checking to prevent invalid option letters.

The map_option_letter function converts a string index to a letter without validating bounds. If the index is outside 0-3 range (for typical A-D options), it will produce unexpected characters or raise an error for non-numeric input.

🛡️ Proposed fix with bounds validation
 def map_option_letter(index: str) -> str:
-    """Helper to map 0, 1, 2, 3 to A, B, C, D"""
-    return chr(65 + int(index))
+    """Helper to map 0, 1, 2, 3 to A, B, C, D"""
+    try:
+        idx = int(index)
+        if idx < 0 or idx > 25:
+            raise ValueError(f"Index {idx} out of range for option letters")
+        return chr(65 + idx)
+    except (ValueError, TypeError) as e:
+        raise ValueError(f"Invalid option index: {index}") from e
📝 Committable suggestion


Suggested change
-def map_option_letter(index: str) -> str:
-    """Helper to map 0, 1, 2, 3 to A, B, C, D"""
-    return chr(65 + int(index))
+def map_option_letter(index: str) -> str:
+    """Helper to map 0, 1, 2, 3 to A, B, C, D"""
+    try:
+        idx = int(index)
+        if idx < 0 or idx > 25:
+            raise ValueError(f"Index {idx} out of range for option letters")
+        return chr(65 + idx)
+    except (ValueError, TypeError) as e:
+        raise ValueError(f"Invalid option index: {index}") from e
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/pdf.py` around lines 11 - 13, The map_option_letter
function needs input validation: in map_option_letter convert the incoming index
to an int inside a try/except and raise a clear ValueError for non-numeric
input, then check that the int is within the allowed range (0 through 3) and
raise a ValueError if out of bounds; only after those checks return chr(65 +
idx). Use the function name map_option_letter in your changes so the reviewer
can find the updated logic.
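
The proposed fix above accepts indices up to 25, while this prompt asks for 0 through 3. A variant that follows the prompt is sketched here; the `max_options` parameter is an assumption added for illustration, and the range check sits outside the `try` block so the out-of-range message is not replaced by the generic one.

```python
def map_option_letter(index, max_options: int = 4) -> str:
    """Map 0, 1, 2, 3 to A, B, C, D and reject anything else."""
    try:
        idx = int(index)
    except (TypeError, ValueError) as err:
        raise ValueError(f"Invalid option index: {index!r}") from err
    # max_options is a hypothetical knob; 4 matches the A-D options in the docstring.
    if not 0 <= idx < max_options:
        raise ValueError(f"Option index {idx} out of range 0-{max_options - 1}")
    return chr(ord("A") + idx)
```

Because the function is used while rendering the PDF, the template or its caller would need to decide whether a ValueError should abort rendering or be caught and replaced with a placeholder.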

Comment thread frontend/src/App.tsx
Comment on lines +229 to +275
<Card
onClick={() => navigate('/resources')}
className="border-dashed bg-muted/20 hover:bg-muted/30 transition-all cursor-pointer group hover:border-primary/50"
>
<CardHeader className="flex flex-row items-center gap-4 py-6">
<div className="size-10 bg-background rounded-lg flex items-center justify-center shadow-sm border border-border/50 group-hover:scale-110 transition-transform">
<FileText className="size-5 text-primary" />
</div>
<div className="space-y-0.5">
<CardTitle className="text-base">Upload Resources</CardTitle>
<CardDescription className="text-xs">Add new study materials</CardDescription>
</div>
<ArrowRight className="size-4 ml-auto text-muted-foreground group-hover:text-primary transition-colors" />
</CardHeader>
</Card>

<Card
onClick={() => navigate('/solver')}
className="border-dashed bg-muted/20 hover:bg-muted/30 transition-all cursor-pointer group hover:border-yellow-500/50"
>
<CardHeader className="flex flex-row items-center gap-4 py-6">
<div className="size-10 bg-background rounded-lg flex items-center justify-center shadow-sm border border-border/50 group-hover:scale-110 transition-transform">
<Zap className="size-5 text-yellow-500" />
</div>
<div className="space-y-0.5">
<CardTitle className="text-base">Ask AI Tutor</CardTitle>
<CardDescription className="text-xs">Get answers from your notes</CardDescription>
</div>
<ArrowRight className="size-4 ml-auto text-muted-foreground group-hover:text-yellow-500 transition-colors" />
</CardHeader>
</Card>

<Card
onClick={() => navigate('/generator')}
className="border-dashed bg-muted/20 hover:bg-muted/30 transition-all cursor-pointer group hover:border-purple-500/50"
>
<CardHeader className="flex flex-row items-center gap-4 py-6">
<div className="size-10 bg-background rounded-lg flex items-center justify-center shadow-sm border border-border/50 group-hover:scale-110 transition-transform">
<FileEdit className="size-5 text-purple-500" />
</div>
<div className="space-y-0.5">
<CardTitle className="text-base">Generate Paper</CardTitle>
<CardDescription className="text-xs">Create a new mock exam</CardDescription>
</div>
<ArrowRight className="size-4 ml-auto text-muted-foreground group-hover:text-purple-500 transition-colors" />
</CardHeader>
</Card>

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use real links/buttons for these clickable cards.

These Card wrappers only handle pointer clicks. Keyboard users can't tab to them or activate them, so the dashboard's main navigation is inaccessible. Render them as <Link>/<button> (or use asChild with one of those) instead of bare clickable containers.

Also applies to: 289-333

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/App.tsx` around lines 229 - 275, The Card components used for
navigation (the Card elements with onClick={() => navigate('/resources')},
onClick={()=>navigate('/solver')} and onClick={()=>navigate('/generator')}
inside App.tsx) are not keyboard-accessible; replace these clickable-only
containers with semantic interactive elements by rendering the Card as a real
Link or button (or use the Card's asChild prop to wrap a react-router <Link
to="..."> or a <button>) so they receive focus, support keyboard activation, and
preserve the existing styling and CardHeader/CardTitle/CardDescription layout;
repeat the same change for the other Card blocks noted around lines 289-333.

Comment on lines +238 to +246
// --- QUOTA CHECK ---
const now = new Date();
const startOfMonth = new Date(now.getFullYear(), now.getMonth(), 1);
const monthlyPapers = papers?.filter((p: any) => new Date(p.created_at) >= startOfMonth) || [];

if (user?.plan === 'free' && monthlyPapers.length >= 3) {
  toast.error('limit exceed, upgrade to continue');
  return;
}
Contributor

@coderabbitai coderabbitai Bot May 18, 2026

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't hardcode the monthly paper limit in the client.

The backend now enforces settings.PAPERS_LIMIT, but this preflight still blocks at 3. Any env override will make the UI reject valid generations or display a different quota than the API actually uses. Fetch the limit from the backend or rely on the server-side 403 instead.
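
A minimal sketch of both options, written as plain TypeScript helpers so they can be checked outside React. The `Paper` and `ApiError` shapes, the helper names, and the fallback messages are assumptions for illustration, not code from this PR; only the `created_at` field, the `403` status, and the `error.response?.data?.detail` path come from the discussion here.

```typescript
// Hypothetical shapes: the real paper and error types in the repo may differ.
interface Paper {
  created_at: string;
}

interface ApiError {
  response?: { status?: number; data?: { detail?: string } };
}

/** Count papers created on or after the first day of the current local month. */
function countPapersThisMonth(papers: Paper[], now: Date = new Date()): number {
  const startOfMonth = new Date(now.getFullYear(), now.getMonth(), 1);
  return papers.filter((p) => new Date(p.created_at) >= startOfMonth).length;
}

/**
 * Option 1: keep the preflight check, but take the limit as a parameter
 * (for example, a value fetched from the backend) instead of hardcoding 3.
 */
function isOverQuota(plan: string | undefined, used: number, limit: number): boolean {
  return plan === 'free' && used >= limit;
}

/**
 * Option 2: drop the preflight check and map a failed create-paper request
 * to a user-facing message, preferring the server's own detail text.
 */
function quotaErrorMessage(error: ApiError): string {
  if (error.response?.status === 403) {
    return error.response?.data?.detail ?? 'Monthly paper limit reached. Upgrade to continue.';
  }
  return 'Paper generation failed. Please try again.';
}
```

With option 2, a handler such as the `onError` callback of `createPaperMutation` could pass the caught error to `quotaErrorMessage` and hand the result to `toast.error`, so the message always reflects the limit the API actually enforces.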

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/Generator.tsx` around lines 238 - 246, The client-side
quota check in Generator.tsx currently hardcodes the monthly limit as 3 (see
monthlyPapers, papers, user) which can diverge from backend settings; replace
this by either fetching the canonical limit from the backend (e.g., call the
existing settings or config API to get PAPERS_LIMIT before comparing
monthlyPapers.length) and use that value instead of 3, or remove the preflight
check entirely and handle the server-side 403/error from the generation API
(showing toast.error based on the API response). Ensure you use the same
identifiers (monthlyPapers, papers, user, toast) so the change integrates with
the existing logic.

Owner Author

open an issue for it for later fixing

Owner Author

@CodeRabbit open an issue for it for later fixing

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo shubhamxdd/pyq-appl \
  --title "fix: Remove hardcoded monthly paper limit (3) from client-side quota check in Generator.tsx" \
  --body "## Problem

In \`frontend/src/pages/Generator.tsx\` (lines 238–246), the client-side quota preflight check hardcodes the monthly paper limit as \`3\`:

\`\`\`ts
if (user?.plan === 'free' && monthlyPapers.length >= 3) {
  toast.error('limit exceed, upgrade to continue');
  return;
}
\`\`\`

However, the backend enforces this limit via \`settings.PAPERS_LIMIT\` (currently defaulting to \`3\` in \`backend/app/config.py\`), which can be overridden through environment variables. If the backend value is changed, the UI will still block or allow generation incorrectly, leading to a mismatch between client and server behaviour.

## Expected Behaviour

The UI quota check should reflect the same limit as the backend. Either:
- Fetch the canonical limit from the backend (e.g., a config/settings endpoint) before comparing \`monthlyPapers.length\`, or
- Remove the preflight check entirely and handle the server-side \`403\`/error response from the paper creation API (surfacing the error via \`toast.error\` based on \`error.response?.data?.detail\`).

## References

- PR: https://github.com/shubhamxdd/pyq-appl/pull/8
- Review comment: https://github.com/shubhamxdd/pyq-appl/pull/8#discussion_r3262861973

/cc `@shubhamxdd`" \
  --assignee shubhamxdd

Length of output: 48


🏁 Script executed:

#!/bin/bash
gh issue list --repo shubhamxdd/pyq-appl --state open --limit 5

Length of output: 339


@shubhamxdd Done! I've opened issue #9, "fix: Remove hardcoded monthly paper limit (3) from client-side quota check in Generator.tsx", and assigned it to you for later fixing.

Contributor

🧩 Analysis chain

@shubhamxdd, I've opened a GitHub issue to track this for later. You can find it above once the script runs — it's assigned to you and links back to this PR and review comment.


🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo shubhamxdd/pyq-appl \
  --title "fix: Remove hardcoded monthly paper limit from client-side quota check in Generator.tsx" \
  --assignee shubhamxdd \
  --body "## Problem

In \`frontend/src/pages/Generator.tsx\` (lines 238–246), the client-side quota preflight check hardcodes the monthly paper limit as \`3\`:

\`\`\`ts
if (user?.plan === 'free' && monthlyPapers.length >= 3) {
  toast.error('limit exceed, upgrade to continue');
  return;
}
\`\`\`

The backend already enforces this limit via \`settings.PAPERS_LIMIT\` (defaulting to \`3\` in \`backend/app/config.py\`), but any environment override on the server side will cause the UI to silently reject valid generation requests or display a quota that differs from what the API actually enforces.

## Suggested Fix

Either:
1. Expose \`PAPERS_LIMIT\` from a backend config/settings endpoint and fetch it before the comparison, or
2. Remove the preflight check entirely and rely on the server-side \`403\` / error response, surfacing it via \`toast.error\` from the \`onError\` handler of \`createPaperMutation\`.

## References

- PR: https://github.com/shubhamxdd/pyq-appl/pull/8
- Review comment: https://github.com/shubhamxdd/pyq-appl/pull/8#discussion_r3262861973

Flagged by `@coderabbitai`, requested to track by `@shubhamxdd`."

Length of output: 43


🏁 Script executed:

#!/bin/bash
gh api \
  --method POST \
  -H "Accept: application/vnd.github+json" \
  /repos/shubhamxdd/pyq-appl/issues \
  -f title="fix: Remove hardcoded monthly paper limit from client-side quota check in Generator.tsx" \
  -f assignees[]="shubhamxdd" \
  -f body="## Problem

In \`frontend/src/pages/Generator.tsx\` (lines 238–246), the client-side quota preflight check hardcodes the monthly paper limit as \`3\`:

\`\`\`ts
if (user?.plan === 'free' && monthlyPapers.length >= 3) {
  toast.error('limit exceed, upgrade to continue');
  return;
}
\`\`\`

The backend already enforces this limit via \`settings.PAPERS_LIMIT\` (defaulting to \`3\` in \`backend/app/config.py\`), but any environment override on the server side will cause the UI to silently reject valid generation requests or display a quota that differs from what the API actually enforces.

## Suggested Fix

Either:
1. Expose \`PAPERS_LIMIT\` from a backend config/settings endpoint and fetch it before the comparison, or
2. Remove the preflight check entirely and rely on the server-side \`403\` / error response, surfacing it via \`toast.error\` from the \`onError\` handler of \`createPaperMutation\`.

## References

- PR: https://github.com/shubhamxdd/pyq-appl/pull/8
- Review comment: https://github.com/shubhamxdd/pyq-appl/pull/8#discussion_r3262861973

Flagged by `@coderabbitai`, requested to track by `@shubhamxdd`."

Length of output: 5515


@shubhamxdd, issue #10 has been created and assigned to you: fix: Remove hardcoded monthly paper limit from client-side quota check in Generator.tsx. You can address it in a future PR. 🐇
