docs: update PROJECT_STATE.md with current status and next steps
+110
-70
@@ -8,56 +8,73 @@ Runs as a Docker Compose stack managed via Portainer, exposed externally via Clo

---

## Build Progress
## Current Status

### ✅ Steps 1-7 — Complete
All backend, frontend, and deployment steps are complete. App is live and accessible.
**App is live at:** http://commander.bussenet.ca (HTTP locally) / https://commander.bussenet.ca (via Cloudflare)

- ✅ Login works
- ✅ Admin panel works
- ✅ Collection import endpoints exist
- ❌ Deck generation failing (see Active Issues below)

---

## Live Deployment
## Active Issues

- **URL:** https://commander.bussenet.ca
- **Admin login:** busse.daniel@gmail.com / Admin1234
- **Portainer stack:** commander-forge (ID: 54)
- **Stack type:** Pre-built images only — no build directives
### 1. Deck Generation JSON Parse Failure
Claude returns a response structured as `commander` + `deck`/`decklist` instead of the required `deck_name` + `strategy_summary` + `cards`. In addition, the response was being truncated because `max_tokens` was too low.

**Fixes applied:**
- `max_tokens` increased to 16000 in `deck_service.py`
- `_build_payload` updated to accept `decklist`/`deck` as fallback keys
- `_parse_json` updated with multi-stage parsing and fallback extraction
- System prompt strengthened with explicit JSON structure requirements

**Current state:** The `max_tokens` fix is confirmed in the running image (`grep` shows `GENERATE_MAX_TOKENS=16000` at line 15), but it has not yet been confirmed working end-to-end because of the deploy pipeline issues.

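The multi-stage parsing and fallback-key handling listed above could look roughly like the sketch below. It is an illustration only, not the code in the repo: `parse_json_lenient`, `normalize_payload`, the three stages chosen, and the `"Untitled deck"` default are all assumptions made for the example.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # Markdown code-fence marker
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def _require_object(value: Any) -> dict[str, Any]:
    """Reject JSON that parses but is not an object (e.g. a bare list)."""
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value


def parse_json_lenient(text: str) -> dict[str, Any]:
    """Try progressively looser strategies to pull one JSON object out of text."""
    # Stage 1: the whole response is already valid JSON.
    try:
        return _require_object(json.loads(text))
    except ValueError:
        pass

    # Stage 2: the JSON is wrapped in a Markdown code fence.
    fenced = _FENCED.search(text)
    if fenced:
        try:
            return _require_object(json.loads(fenced.group(1)))
        except ValueError:
            pass

    # Stage 3: take the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return _require_object(json.loads(text[start : end + 1]))

    raise ValueError("no JSON object found in model response")


def normalize_payload(raw: dict[str, Any]) -> dict[str, Any]:
    """Map the alternate keys onto the deck_name/strategy_summary/cards shape."""
    cards = raw.get("cards") or raw.get("decklist") or raw.get("deck")
    if not isinstance(cards, list):
        raise ValueError("response has no usable card list")
    return {
        "deck_name": raw.get("deck_name") or "Untitled deck",  # hypothetical default
        "strategy_summary": raw.get("strategy_summary") or "",
        "cards": cards,
    }
```

A truncated response still fails in stage 3, so a parser like this complements the higher `max_tokens` limit but does not replace it.
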
### 2. Cloudflare 100s Timeout
Claude API calls take 30-60 seconds. Cloudflare's free tier imposes a 100s limit. With max_tokens=16000, responses may take longer and hit this limit.

**Planned fix:** Implement async deck generation — return job ID immediately, frontend polls for result.

### 3. Deployment Pipeline (Root Cause of Most Pain)
Building Docker images directly from git URLs (`docker build http://gitea/...`) uses Docker's internal git cache, which frequently serves stale code even with `--no-cache --pull`. This caused multiple "fix applied but not running" cycles.

**Planned fix:** Set up CI/CD webhook — Gitea push triggers server script that clones fresh, builds from local filesystem, restarts container. This is the **top priority for the next session**.

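The planned fix for issue 3 (clone fresh, build from the local filesystem, restart) could be sketched as follows. This is a rough outline under stated assumptions, not the deploy script itself: the function names are hypothetical, the subdirectory names are taken from the `#master:<dir>` build contexts, and the image and container names are taken from the manual build commands in this document. It assumes the process is allowed to run `docker` (the manual commands use `sudo`). One caveat: `docker restart` restarts the existing container, which keeps the image it was created from, so picking up a rebuilt `:latest` image still needs the container to be recreated, for example by redeploying the stack in Portainer.

```python
import subprocess
import tempfile
from pathlib import Path

# Gitea remote as listed under Git Workflow.
REPO_URL = "ssh://git@192.168.0.62:2222/Dan/Commander-Deck-App.git"
SERVICES = ("nginx", "frontend", "backend")


def build_commands(service: str, checkout: Path) -> list[list[str]]:
    """Commands to rebuild one image from a local checkout and restart its container."""
    return [
        # Build context is a directory on disk, not a git URL.
        ["docker", "build", "--no-cache", "--pull",
         "-t", f"commander-forge-{service}:latest", str(checkout / service)],
        # Mirrors the planned step; see the caveat about restart vs. recreate.
        ["docker", "restart", f"commander-forge-{service}-1"],
    ]


def deploy(services: tuple[str, ...] = SERVICES) -> None:
    """Clone master into a fresh temp directory, then rebuild and restart each service."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "src"
        subprocess.run(
            ["git", "clone", "--depth", "1", "--branch", "master",
             REPO_URL, str(checkout)],
            check=True,  # stop immediately if the clone fails
        )
        for service in services:
            for command in build_commands(service, checkout):
                subprocess.run(command, check=True)
```
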
---

## Deployment Architecture

### Stack
- Portainer stack ID: **54** (commander-forge)
- All services use pre-built images — no `build:` directives in compose file
- Images: `commander-forge-nginx:latest`, `commander-forge-frontend:latest`, `commander-forge-backend:latest`

### Networking
- Cloudflare Tunnel → `http://localhost:80` → Traefik → nginx (`traefik-public` network)
- Traefik routes by hostname label using `traefik.docker.network=traefik-public`
- All other services (backend, db, cache, frontend) on `commander-forge_internal` network
- Cloudflared runs on host network — must use `localhost` not container hostnames
- `traefik.docker.network=traefik-public` label required on nginx
- All other services on `commander-forge_internal` network
- Cloudflared on host network — all ingress uses `localhost` not container IPs
- Cloudflare region2 (`198.41.200.x`) unreachable — ISP routing issue, outside our control

### Image Management
All three custom images are built manually on the server and Portainer uses pre-built images:
### Manual Build Commands (current process — to be replaced by CI/CD)
```bash
sudo docker build -t commander-forge-nginx:latest --no-cache --pull http://192.168.0.62:3001/Dan/Commander-Deck-App.git#master:nginx
sudo docker build -t commander-forge-frontend:latest --no-cache http://192.168.0.62:3001/Dan/Commander-Deck-App.git#master:frontend
sudo docker build -t commander-forge-backend:latest --no-cache --pull http://192.168.0.62:3001/Dan/Commander-Deck-App.git#master:backend
# Always build from GitHub to avoid Gitea git cache issues
sudo docker rmi commander-forge-backend -f
sudo docker build -t commander-forge-backend:latest --no-cache --pull "https://github.com/danbusse/Commander-Deck-App.git#master:backend"
sudo docker restart commander-forge-backend-1

sudo docker rmi commander-forge-frontend -f
sudo docker build -t commander-forge-frontend:latest --no-cache "https://github.com/danbusse/Commander-Deck-App.git#master:frontend"
sudo docker restart commander-forge-frontend-1

sudo docker rmi commander-forge-nginx -f
sudo docker build -t commander-forge-nginx:latest --no-cache --pull "https://github.com/danbusse/Commander-Deck-App.git#master:nginx"
sudo docker restart commander-forge-nginx-1
```

After rebuilding, redeploy the stack in Portainer to pick up the new images.

### Why pre-built images?
Portainer's repository-based builds aggressively cache the git source. Even with `--no-cache --pull`, Portainer's internal git clone cache serves stale code. Pre-built images bypass this entirely.

---

## Known Fixes Applied

- `passlib` replaced with `bcrypt==4.1.3` in requirements.txt and security.py
- `npm ci` replaced with `npm install` in frontend Dockerfile (Windows-generated lockfile missing Linux binaries)
- `package-lock.json` added to repo
- nginx baked into its own image via `nginx/Dockerfile` (Portainer pre-creates volume mount paths as directories)
- `docker-compose.yml` uses Traefik labels + `traefik.docker.network=traefik-public`
- `DATABASE_URL` and `REDIS_URL` passed explicitly as env vars
- `UserRole` enum members renamed to lowercase (`pending/approved/admin`) to match database values
- `admin_bootstrap.py` and `deps.py` updated to use lowercase `UserRole.admin`, `UserRole.pending`, etc.
- Portainer git source caches aggressively — always use `--no-cache --pull` and pre-built images
**IMPORTANT:** Always build from GitHub URL, not Gitea. Gitea has persistent git cache issues.

---

@@ -73,16 +90,38 @@ Portainer's repository-based builds aggressively cache the git source. Even with
| Portainer MCP | https://mcp-portainer.bussenet.ca/sse | Custom image with entrypoint fix |

### Portainer MCP
Custom `mcp-portainer:latest` image built from `ghcr.io/serraniel/portainer-mcp-docker:http` with a fixed entrypoint that passes `--` before the portainer-mcp command. Tools written to `/tmp/tools.yaml`. PORTAINER_SERVER must be set without protocol prefix (e.g. `192.168.0.62:9443`) since the MCP binary always prepends `https://`.

### Cloudflare Tunnel
- Tunnel ID: `3a032a2b-aa42-46a2-b749-e3fa9166ac59`
- Only region1 (`198.41.192.x`) is reachable — region2 (`198.41.200.x`) times out (ISP routing issue, outside our control)
- Tunnel maintains 1 of 4 connections; services are accessible but logs show constant reconnect attempts
- Cloudflared runs on host network — all tunnel ingress entries use `localhost` not container IPs
Custom `mcp-portainer:latest` image built from `ghcr.io/serraniel/portainer-mcp-docker:http`.
- Fixed entrypoint passes `--` before portainer-mcp command
- Tools written to `/tmp/tools.yaml`
- PORTAINER_SERVER set to `192.168.0.62:9443` (no protocol prefix — binary prepends https://)
- Rebuild command: `sudo docker build -t mcp-portainer:latest ~/portainer-mcp-build/`

### GitHub API Access
Claude can read/write files to the GitHub mirror using token stored in Vault at `secret/github.claude-api-token`. This is the primary mechanism for Claude to update project files between sessions.
Claude reads/writes files via GitHub API using token in Vault at `secret/github.claude-api-token`.
This is Claude's primary mechanism for updating project files between sessions.

### Git Workflow
Two remotes configured:
- `origin` → Gitea (`ssh://git@192.168.0.62:2222/Dan/Commander-Deck-App.git`)
- `github` → GitHub (`https://github.com/danbusse/Commander-Deck-App.git`)

Always push to both: `git push origin master && git push github master`

---

## Known Fixes Applied

| Issue | Fix | File |
|-------|-----|------|
| passlib incompatible with bcrypt 4.x | Replaced with `bcrypt==4.1.3` | `requirements.txt`, `security.py` |
| npm ci fails on Linux | Changed to `npm install` | `frontend/Dockerfile` |
| Portainer volume mount creates directory | Baked nginx config into image | `nginx/Dockerfile` |
| Traefik routing wrong network | Added `traefik.docker.network=traefik-public` label | `docker-compose.yml` |
| UserRole enum uppercase/lowercase mismatch | Renamed members to lowercase (`pending/approved/admin`) | `user.py`, `admin_bootstrap.py`, `deps.py`, `admin.py` |
| Missing DATABASE_URL/REDIS_URL | Passed explicitly in stack env vars | Portainer stack |
| JSON truncation in deck generation | Increased max_tokens to 16000 | `deck_service.py` |
| Claude returns wrong JSON structure | Added fallback key handling + multi-stage parser | `claude_client.py` |
| Archidekt JSON crash on missing set code | Added `or ""` before `.lower()` | `archidekt.py` |

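
Two of the fixes in the table are small enough to illustrate directly. The sketch below is a guess at their shape, not a copy of `user.py` or `archidekt.py`: it assumes `UserRole` is a plain string-valued `Enum`, and `normalize_set_code` is a hypothetical helper name standing in for wherever the `or ""` guard was added.

```python
from enum import Enum
from typing import Optional


class UserRole(str, Enum):
    # Member names and values are both lowercase, so lookups by name
    # and by value agree with the lowercase strings in the database.
    pending = "pending"
    approved = "approved"
    admin = "admin"


def normalize_set_code(set_code: Optional[str]) -> str:
    """Lowercase a set code, treating a missing (None) value as empty.

    Without the `or ""` guard, None.lower() raises AttributeError.
    """
    return (set_code or "").lower()
```
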
---

@@ -102,40 +141,41 @@ Claude can read/write files to the GitHub mirror using token stored in Vault at

---

## Git Workflow

Two remotes are configured:
- `origin` → Gitea (`ssh://git@192.168.0.62:2222/Dan/Commander-Deck-App.git`)
- `github` → GitHub (`https://github.com/danbusse/Commander-Deck-App.git`)

Always push to both:
## Test Suite
Located at `backend/tests/`. Run with:
```bash
git push origin master
git push github master
cd /tmp/Commander-Deck-App/backend
pip install -r requirements.txt --break-system-packages
pytest tests/ -v
```
56 tests, all passing. Covers: claude_client parsing, constraints, archidekt/manabox importers, UserRole enum.

---

## Next Session — Start Here

**Priority 1 — Harden credentials:**
- Generate new SECRET_KEY: `openssl rand -hex 32`
- Set strong POSTGRES_PASSWORD
- Update DATABASE_URL to match
- Update these in Portainer stack env vars
### Priority 1 — Set up CI/CD webhook (DO THIS FIRST)
The manual build process is unreliable due to Docker git source caching. Set up a Gitea webhook that triggers a deploy script on the server on every push to master.

**Priority 2 — Test end to end:**
- Try building a deck via Generate mode
- Test collection import with an Archidekt export
- Verify admin approval flow for a new registered user
Basic approach:
1. Create a deploy script on the server (`/home/dan/deploy.sh`) that:
   - `git clone` or `git pull` from Gitea into a temp directory
   - `docker build` from local filesystem (not git URL)
   - `docker restart` the affected container
2. Set up a simple webhook receiver (e.g. a small Python/bash HTTP server or use Gitea's built-in webhook with a tool like `webhook`)
3. Configure Gitea to POST to the webhook on push to master

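Step 2 above mentions a small Python HTTP server as one option for the webhook receiver. A minimal sketch of that option, using only the standard library, is shown here. The secret, listen address, and port `9010` are placeholders chosen for the example; the check of the `X-Gitea-Signature` header (hex HMAC-SHA256 of the request body) and of the `ref` field follows Gitea's webhook documentation as I understand it and should be verified against the installed Gitea version. Nothing starts automatically: calling `serve()` runs the receiver.

```python
import hashlib
import hmac
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

WEBHOOK_SECRET = b"change-me"          # placeholder; must match the secret set in Gitea
DEPLOY_SCRIPT = "/home/dan/deploy.sh"  # path named in step 1
TARGET_REF = "refs/heads/master"


def signature_ok(body: bytes, signature: str, secret: bytes = WEBHOOK_SECRET) -> bool:
    """Compare the hex HMAC-SHA256 of the body with the header value."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes so unexpected non-ASCII header content cannot raise.
    return hmac.compare_digest(expected.encode(), signature.encode("utf-8", "replace"))


def should_deploy(body: bytes) -> bool:
    """True only for a push event whose ref is the target branch."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("ref") == TARGET_REF


class Handler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        signature = self.headers.get("X-Gitea-Signature", "")
        if not signature_ok(body, signature):
            self.send_response(403)  # unsigned or wrongly signed request
        elif should_deploy(body):
            # Fire and forget so Gitea is not kept waiting on the build.
            subprocess.Popen([DEPLOY_SCRIPT])
            self.send_response(202)
        else:
            self.send_response(204)  # valid request, but not a push to master
        self.end_headers()


def serve(host: str = "0.0.0.0", port: int = 9010) -> None:
    """Run the receiver until interrupted (host and port are arbitrary choices)."""
    HTTPServer((host, port), Handler).serve_forever()
```
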
**Priority 3 — Commit docker-compose.yml changes to repo:**
The docker-compose.yml in the repo still has `build:` directives. Update it to use `image:` directives to match the actual deployment approach:
```yaml
backend:
  image: commander-forge-backend:latest
frontend:
  image: commander-forge-frontend:latest
nginx:
  image: commander-forge-nginx:latest
```
### Priority 2 — Confirm deck generation works
Once CI/CD is in place and we can deploy reliably, test deck generation with the max_tokens=16000 fix.

### Priority 3 — Async deck generation
If deck generation still hits Cloudflare's 100s timeout, implement async pattern:
- POST /generate returns job ID immediately
- Background task runs Claude call
- Frontend polls GET /decks/{id}/status until complete

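The three steps above amount to a submit-then-poll job pattern. The sketch below shows its core without any web framework, so it is not tied to how the backend is actually written: `submit_job` and `get_status` are hypothetical names, and the in-memory dictionary is an assumption made to keep the example short.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

_executor = ThreadPoolExecutor(max_workers=2)
_jobs: dict[str, dict[str, Any]] = {}
_lock = threading.Lock()


def submit_job(work: Callable[[], Any]) -> str:
    """Start `work` in the background and return a job ID right away."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "pending", "result": None, "error": None}

    def run() -> None:
        try:
            update = {"status": "complete", "result": work(), "error": None}
        except Exception as exc:  # surface the failure to the poller
            update = {"status": "failed", "result": None, "error": str(exc)}
        with _lock:
            _jobs[job_id] = update

    _executor.submit(run)
    return job_id


def get_status(job_id: str) -> dict[str, Any]:
    """Return a copy of the job record, or a not_found marker."""
    with _lock:
        job = _jobs.get(job_id)
        if job is None:
            return {"status": "not_found", "result": None, "error": None}
        return dict(job)
```

In this shape, the POST handler would call `submit_job` with the Claude call wrapped in a function and return the ID, and the status endpoint would return `get_status(job_id)`. An in-memory store is lost on restart and is not shared between worker processes, so a real implementation would more likely keep job state in the database or the existing Redis cache.
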
### Priority 4 — Harden credentials
- `SECRET_KEY` → `openssl rand -hex 32`
- `POSTGRES_PASSWORD` → strong password
- Update `DATABASE_URL` to match
- Update in Portainer stack env vars
