224 changes: 224 additions & 0 deletions MOCK_DOCUMENTS_FOR_DEVELOPMENT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
# Mock Documents for Colore Development

Development feature that returns realistic mock documents when a document doesn't exist, eliminating the need to copy production data.

## What It Does

```
Request for non-existent document → Returns realistic MOCK document
Request for created document → Returns real document from storage
```

Perfect for development: work with external documents without production data.

---

## Quick Start

### Get mock document (non-existent)
```bash
curl http://localhost:9240/document/app-1/doc-xyz
# Returns: Mock Document - doc-xyz
```

### Create real document
```bash
curl -X PUT -F "file=@test.txt" \
http://localhost:9240/document/dev-app/my-doc/test.txt
```

### Get real document
```bash
curl http://localhost:9240/document/dev-app/my-doc
# Returns: actual document (not mock)
```

### Delete
```bash
curl -X DELETE http://localhost:9240/document/dev-app/my-doc
```

---

## Enable/Disable

### Development (Enable mocks)
```bash
# Edit: docker/colore/variables.env
MOCK_DOCUMENTS_ENABLED=true
RACK_ENV=development

# Optional: set random authors (comma-separated usernames from the people microservice)
MOCK_DOCUMENT_AUTHORS=f.rossi,a.rodriguez,p.doe,m.smith

# Rebuild
docker-compose up --build -d colore
```

If `MOCK_DOCUMENT_AUTHORS` is not set, the author defaults to `Mock System`.

### Production (Mocks auto-disabled for safety)
```bash
RACK_ENV=production
# Mocks automatically disabled - no action needed
```

---

## Key Features

### Random Authors (from people)
Each mock document includes a **randomly selected author** drawn from a configurable list of usernames from the people microservice. This avoids people-search errors when an application under development looks up the author of a mock document:

```bash
# Configuration (comma-separated usernames)
MOCK_DOCUMENT_AUTHORS=f.rossi,a.rodriguez,p.doe,m.smith
```

Each mock document will have a random author from this list in its metadata.
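
A minimal sketch of how the author list could be parsed and sampled. The helper names (`mock_authors`, `random_mock_author`) are hypothetical and not taken from `lib/config.rb`; only the variable name and the `Mock System` default come from this document.

```ruby
# Hypothetical helpers: only MOCK_DOCUMENT_AUTHORS and the 'Mock System'
# default are taken from this document.
DEFAULT_MOCK_AUTHOR = 'Mock System'

# Split the comma-separated variable into a clean list of usernames,
# falling back to the default author when nothing usable is configured.
def mock_authors(env = ENV)
  raw = env.fetch('MOCK_DOCUMENT_AUTHORS', '')
  authors = raw.split(',').map(&:strip).reject(&:empty?)
  authors.empty? ? [DEFAULT_MOCK_AUTHOR] : authors
end

# Pick one author at random for a new mock document.
def random_mock_author(env = ENV)
  mock_authors(env).sample
end

puts random_mock_author({ 'MOCK_DOCUMENT_AUTHORS' => 'f.rossi,a.rodriguez,p.doe,m.smith' })
puts random_mock_author({}) # no list configured, so the default author is used
```

Stripping whitespace and dropping empty entries is an assumption made here so that a value like `f.rossi, p.doe,` still yields valid usernames.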

### Supported File Types
- `.txt` - Text
- `.pdf` - PDF structure
- `.html` - HTML
- `.json` - JSON
- `.docx` - Word document
- Others - Text fallback
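
The extension list above amounts to a lookup with a text fallback. The sketch below illustrates only that dispatch; the constant and method names are hypothetical, and matching extensions case-insensitively is an assumption rather than documented behaviour.

```ruby
# Hypothetical mapping: the extensions come from the list above, the
# symbols are illustrative labels for the kind of mock content produced.
MOCK_CONTENT_TYPES = {
  '.txt'  => :text,
  '.pdf'  => :pdf,
  '.html' => :html,
  '.json' => :json,
  '.docx' => :docx
}.freeze

# Unknown or missing extensions fall back to plain text.
def mock_content_type_for(filename)
  MOCK_CONTENT_TYPES.fetch(File.extname(filename).downcase, :text)
end

p mock_content_type_for('report.pdf')
p mock_content_type_for('notes.md')
```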

### What Works

| Operation | Mock | Real |
|-----------|------|------|
| GET document | ✅ | ✅ |
| GET file | ✅ | ✅ |
| POST title | ❌ | ✅ |
| POST version | ❌ | ✅ |
| DELETE | ⏭️ Ignored | ✅ |

Mock documents are read-only by design.
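
The table can be read as a small policy: reads are allowed, writes are rejected, and deletes are silently ignored. A sketch of that policy as data, with hypothetical operation names that mirror the table rows (the actual protections in `lib/app.rb` are not shown here):

```ruby
# Hypothetical operation names mirroring the rows of the table above.
MOCK_OPERATION_POLICY = {
  get_document: :allow,
  get_file:     :allow,
  post_title:   :reject,
  post_version: :reject,
  delete:       :ignore
}.freeze

# Real documents allow every listed operation; mocks follow the policy.
def operation_outcome(operation, mock:)
  return :allow unless mock

  MOCK_OPERATION_POLICY.fetch(operation)
end

p operation_outcome(:delete, mock: true)
p operation_outcome(:post_title, mock: false)
```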

---

## How It's Implemented

### Files Created
- `lib/mock_document.rb` - Generates realistic mocks
- Tests - Unit and integration tests

### Files Modified
- `lib/config.rb` - Environment detection + production safety + author list
- `lib/document.rb` - Returns mock if enabled
- `lib/app.rb` - Endpoint protections
- `config/app.yml` - Configuration + author list
- `docker/colore/variables.env` - Set MOCK_DOCUMENTS_ENABLED + MOCK_DOCUMENT_AUTHORS

### Core Logic
```ruby
# When loading a document:
# 1. Document exists on disk → return the real document
# 2. Not found and MOCK_DOCUMENTS_ENABLED=true → return a mock
# 3. Not found and mocks disabled → return 404
```
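
The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `lib/document.rb`: `storage` is a plain Hash standing in for the disk, the mock is a Hash rather than a real mock document, and `DocumentNotFound` is a hypothetical error that the web layer would map to a 404.

```ruby
# Hypothetical error class; the caller is assumed to translate it to a 404.
class DocumentNotFound < StandardError; end

# storage: anything responding to #key? and #[] (a Hash in this sketch).
def load_document(storage, doc_id, mocks_enabled:)
  # 1. A real document always wins.
  return storage[doc_id] if storage.key?(doc_id)

  # 2. Not found, mocks enabled: return a stand-in mock.
  return { title: "Mock Document - #{doc_id}", mock: true } if mocks_enabled

  # 3. Not found, mocks disabled: signal "not found".
  raise DocumentNotFound, doc_id
end

store = { 'my-doc' => { title: 'test.txt', mock: false } }
p load_document(store, 'my-doc', mocks_enabled: true)
p load_document(store, 'doc-xyz', mocks_enabled: true)
```

Checking storage first is what lets mocks and real documents coexist: creating a real document with the same ID immediately takes precedence over the mock.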

---

## Production Safety

✅ **Automatic protection** - mocks cannot be used in production

```
Environment Detection (in config.rb)
If RACK_ENV='production' → Force MOCK_DOCUMENTS_ENABLED=false
If RACK_ENV='development' → Use configured value
```

**Result:**
- Even if someone sets `MOCK_DOCUMENTS_ENABLED=true` in production by mistake, mocks are automatically disabled
- The application works normally (returns 404 for non-existent documents)
- No failures, no errors
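
A sketch of the override described above, assuming the check is a simple comparison on `RACK_ENV`. The method name is hypothetical, and treating every non-production environment as "use the configured value" is an assumption; the string comparison against `'true'` mirrors the expression in `config/app.yml`.

```ruby
# Hypothetical helper: production always wins over the configured flag.
def mock_documents_enabled?(env = ENV)
  return false if env.fetch('RACK_ENV', 'development') == 'production'

  env.fetch('MOCK_DOCUMENTS_ENABLED', 'false') == 'true'
end

p mock_documents_enabled?({ 'RACK_ENV' => 'production',  'MOCK_DOCUMENTS_ENABLED' => 'true' })
p mock_documents_enabled?({ 'RACK_ENV' => 'development', 'MOCK_DOCUMENTS_ENABLED' => 'true' })
```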

---

## Ruby Integration Example

```ruby
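require 'http' # http.rb gem
require 'json'
require 'stringio'

class DocumentService
  # Works with both mocks and real documents automatically
  def fetch_document(app_id, doc_id)
    response = HTTP.get("http://colore:9240/document/#{app_id}/#{doc_id}")
    JSON.parse(response.body.to_s)
  end

  # Sends the file as multipart form data, like `curl -F "file=@..."`
  def create_document(app_id, doc_id, filename, file_content)
    file = HTTP::FormData::File.new(StringIO.new(file_content), filename: filename)
    HTTP.put(
      "http://colore:9240/document/#{app_id}/#{doc_id}/#{filename}",
      form: { file: file }
    )
  end
end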
```

---

## Current Status

✅ Running in development with mocks enabled
```bash
RACK_ENV=development
MOCK_DOCUMENTS_ENABLED=true
```

✅ Tested and verified
- Mock documents return realistic structures
- Real documents work normally
- Hybrid flow (mocks and real documents side by side) works seamlessly
- Production safety active

---

## Common Issues

**Getting 404 instead of mock?**
→ Check that `MOCK_DOCUMENTS_ENABLED=true` is set in `docker/colore/variables.env` and that `RACK_ENV` is not `production`, then rebuild the container.

**Can't update/delete mock?**
→ Intentional - mocks are read-only. Create real document instead.

**Want to verify?**
```bash
curl http://localhost:9240/document/test-app/any-id | jq .title
# If the title contains "Mock Document", the response is a mock
```
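
The same check can be done from Ruby client code. This sketch assumes, as the `jq .title` call above suggests, that the response body is JSON with a top-level `title` field; the method name is hypothetical.

```ruby
require 'json'

# Returns true when the response body looks like a mock document,
# judged only by the "Mock Document" marker in its title.
def mock_response?(body)
  JSON.parse(body)['title'].to_s.include?('Mock Document')
end

p mock_response?('{"title": "Mock Document - any-id"}')
p mock_response?('{"title": "test.txt"}')
```

A title-based check is only a development convenience: a real document whose title happens to contain "Mock Document" would be misclassified.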

---

## Testing

```bash
# All tests
rspec spec/lib/mock_document_spec.rb
rspec spec/integration/mock_document_spec.rb
```

---

## Summary

- ✅ Mock documents enabled in development
- ✅ **Random authors** from configurable list
- ✅ No need to copy production database
- ✅ Hybrid flow: mocks + real documents coexist
- ✅ Automatic production safety
- ✅ Works transparently with client code
- ✅ Fully tested and working

### Example Authors Configuration
```bash
# docker/colore/variables.env
# Usernames from people microservice (comma-separated)
MOCK_DOCUMENT_AUTHORS=f.rossi,a.rodriguez,p.doe,m.smith,j.williams
```

Each mock document is assigned one of these usernames at random as the author in its metadata.
3 changes: 3 additions & 0 deletions config/app.yml
@@ -33,3 +33,6 @@ wkhtmltopdf_path: <%= ENV['WKHTMLTOPDF_PATH'] %>
# Other settings
tika_config_directory: <%= ENV['TIKA_CONFIG_DIRECTORY'] %>
wkhtmltopdf_params: '-d 100 --encoding UTF-8'

# Development settings - Enable mock documents when document not found
mock_documents_enabled: <%= ENV.fetch('MOCK_DOCUMENTS_ENABLED', 'false') == 'true' %>
4 changes: 4 additions & 0 deletions docker-compose.yml
@@ -14,6 +14,8 @@ services:
- ./docker/colore/variables.env
environment:
RACK_ENV: development
extra_hosts:
- "host.docker.internal:host-gateway"
networks:
- colore
ports:
@@ -44,6 +46,8 @@ services:
- ./docker/colore/variables.env
environment:
RACK_ENV: development
extra_hosts:
- "host.docker.internal:host-gateway"
networks:
- colore
restart: on-failure
8 changes: 6 additions & 2 deletions docker/colore/Dockerfile
@@ -8,21 +8,25 @@ RUN apt-get update && apt-get -yq install --no-install-suggests --no-install-rec

# Needed to get the latest libreoffice
# Ref: https://wiki.debian.org/LibreOffice#Using_Debian_backports
RUN echo 'deb https://deb.debian.org/debian bullseye-backports main contrib non-free' >> /etc/apt/sources.list
# Note: Bullseye backports moved to archive.debian.org
RUN echo 'deb [trusted=yes] https://archive.debian.org/debian bullseye-backports main contrib non-free' >> /etc/apt/sources.list

# Needed for Tesseract 5
# Ref: https://notesalexp.org/tesseract-ocr/html/
RUN echo 'deb https://notesalexp.org/tesseract-ocr5/bullseye bullseye main' >> /etc/apt/sources.list
RUN wget -qO /etc/apt/trusted.gpg.d/alexp_key.asc https://notesalexp.org/debian/alexp_key.asc

# Allow unauthenticated packages from archived repository
RUN echo 'Acquire::Check-Valid-Until "false";' > /etc/apt/apt.conf.d/90ignore-release-date

RUN apt-get update && apt-get -yq -t bullseye-backports install \
libreoffice \
tesseract-ocr \
tesseract-ocr-ara \
tesseract-ocr-fra \
tesseract-ocr-spa

ARG TIKA_VERSION=3.2.2
ARG TIKA_VERSION=3.2.3

RUN wget --quiet https://dlcdn.apache.org/tika/KEYS -O tika-keys && \
wget --quiet https://dlcdn.apache.org/tika/${TIKA_VERSION}/tika-app-${TIKA_VERSION}.jar.asc -O tika-app.jar.asc && \
10 changes: 10 additions & 0 deletions docker/colore/variables.env.example
@@ -2,3 +2,13 @@ LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_ALL=C.UTF-8
REDIS_URL=redis://redis:6379/4

# Development settings - Enable mock documents when document not found
# Set to 'true' to enable mock documents for development without copying production data
MOCK_DOCUMENTS_ENABLED=false

# List of authors for mock documents (comma-separated usernames from people microservice)
# A random author from this list will be assigned to each mock document
# Example: 'f.rossi,a.rodriguez,p.doe'
# Default: 'Mock System'
MOCK_DOCUMENT_AUTHORS=f.rossi,a.rodriguez,p.doe