A redacted sample from a real engagement: an AI-built startup-intelligence SaaS audited ahead of its public launch. Client identifiers are masked and sensitive figures rounded — the findings, file paths and fixes are what the client received.
The platform under review is a working prototype with a real foundation: multi-source scraping, an LLM analysis pipeline, scoring, public pages and an admin panel. It is not yet a production system. Two critical issues — live credentials committed to the repository and a fallback signing key that lets production boot with a predictable secret — were on track to ship to a public launch.
The deeper pattern is typical of AI-generated codebases: the happy path works, the failure path was never written. Failed LLM analyses are marked as processed and silently dropped, schema migrations run as raw DDL wrapped in a bare except: pass that swallows every error, and the analyzer blocks its own event loop while it waits on a third-party API.
None of this is fatal. The highest-risk items add up to 10–15 hours of focused work, and the report sequences them so security lands first, reliability second, throughput third — before any attempt to scale the dataset.
Every finding carries a severity, an exact location and a concrete fix. The thirteen from this engagement:
The format is identical for every finding in the report: what it is, where it lives, why it matters for the business, and exactly how to fix it.
Session tokens are signed with SECRET_KEY. Because the config falls back to a static default when the variable is missing, any environment where it was never set signs admin sessions with a value that lives in the repository. Anyone who has read the code can mint a valid admin token.
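A minimal sketch of the fail-fast alternative, assuming the key is read from the environment at startup. The function name, the exception type and the 32-character minimum are illustrative choices, not taken from the audited code.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent or too short (hypothetical type)."""


def load_secret_key(env=None, min_length=32):
    """Return SECRET_KEY or raise; there is deliberately no default value."""
    env = os.environ if env is None else env
    key = env.get("SECRET_KEY", "")
    if not key:
        # Refuse to boot rather than sign sessions with a predictable value.
        raise MissingSecretError("SECRET_KEY is not set; refusing to start")
    if len(key) < min_length:
        # min_length is an assumed policy, adjust to the signing scheme in use.
        raise MissingSecretError(
            f"SECRET_KEY is shorter than {min_length} characters"
        )
    return key
```

Called once during application startup, this turns a missing variable into a failed deploy instead of a forgeable admin session.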
Working credentials for the paid scraping proxy sit in a tracked JSON config. Everyone with repository access — contractors, CI, a leaked laptop — owns that account, and git history keeps the secret long after the file is “cleaned”.
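One way to get the credentials out of the tracked file, sketched under the assumption that the proxy is configured through environment variables. The variable names PROXY_HOST, PROXY_USER and PROXY_PASSWORD and the ProxyConfig class are hypothetical, and the URL scheme is assumed to be plain HTTP.

```python
import os
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass(frozen=True)
class ProxyConfig:
    """Proxy settings loaded at runtime (hypothetical container)."""

    host: str
    username: str
    # repr=False keeps the password out of logs and tracebacks.
    password: str = field(repr=False)

    def url(self) -> str:
        # Percent-encode credentials so characters like '@' stay unambiguous.
        user = quote(self.username, safe="")
        secret = quote(self.password, safe="")
        return f"http://{user}:{secret}@{self.host}"


def load_proxy_config(env=None) -> ProxyConfig:
    """Read proxy settings from the environment and fail if any is missing."""
    env = os.environ if env is None else env
    required = ("PROXY_HOST", "PROXY_USER", "PROXY_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing proxy settings: " + ", ".join(missing))
    return ProxyConfig(env["PROXY_HOST"], env["PROXY_USER"], env["PROXY_PASSWORD"])
```

Moving the values does not retire the old secret. Because git history still holds it, the committed credentials have to be rotated as well.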
The lead loop marks every URL as checked in a finally block, so the mark is written even when the LLM call failed on a rate limit or network error. A transient outage permanently throws away leads, and nothing in the admin panel shows it happened. It is the same pattern again: the happy path works, the failure path was never written.
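A minimal sketch of honest lead statuses, assuming a hypothetical analyze callable and a TransientError type that stands in for rate-limit and network failures. The status names and the three-attempt limit are illustrative, not taken from the audited codebase.

```python
from enum import Enum


class LeadStatus(str, Enum):
    PROCESSED = "processed"  # analysis completed
    RETRY = "retry"          # transient failure, still eligible for another run
    FAILED = "failed"        # gave up, visible to an operator


class TransientError(Exception):
    """Rate limit or network error (hypothetical marker type)."""


def process_lead(url, analyze, attempts, max_attempts=3):
    """Run one analysis attempt and return (status, attempts so far).

    The status is decided by the outcome, never by a finally block.
    """
    attempts += 1
    try:
        analyze(url)
    except TransientError:
        # Keep the lead in the queue until the attempt budget is spent.
        status = LeadStatus.RETRY if attempts < max_attempts else LeadStatus.FAILED
        return status, attempts
    except Exception:
        # Non-transient errors are recorded as failures, not dropped silently.
        return LeadStatus.FAILED, attempts
    return LeadStatus.PROCESSED, attempts
```

Storing the returned status and attempt count with each lead lets a later run pick up everything still marked retry and gives the admin panel something to display for failures.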
Findings are cut into a backlog a single developer can execute — priorities, business effect, hour estimates. P0 alone removes both criticals in under a working day.
| Priority | Task | Why it matters | Estimate |
|---|---|---|---|
| P0 | Remove committed credentials, rotate proxy + signing keys | Closes the account-takeover path | 1–2h |
| P0 | Enforce required production secrets, lock CORS to known origins | Admin surface is no longer one misconfiguration away from being exposed | 1–2h |
| P0 | Move the LLM call off the event loop | API stops freezing during every analysis | 1h |
| P0 | Honest lead statuses with retry handling | Transient errors stop deleting work | 2–3h |
| P1 | Parallel analyzer with bounded concurrency | Throughput scales without babysitting | 4–8h |
| P1 | Database-level dedup instead of in-memory URL sets | Stays fast past 100k leads | 2–3h |
| P1 | Cheap pre-filter before the expensive LLM call | Stops paying full price for junk leads | 6–10h |
| P2 | Move pipeline state from JSON into the database | Deploys stop erasing progress | 2–4h |
| P2 | Replace raw-DDL migrations with Alembic | Schema changes become predictable | 3–5h |
| P2 | Pipeline metrics dashboard | Bottlenecks visible before they bite | 4–6h |
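The two analyzer items in the backlog (moving the LLM call off the event loop, then running analyses in parallel with bounded concurrency) can be sketched together. This assumes the third-party client is synchronous. blocking_llm_call is a stand-in for it, and the concurrency limit of 4 is an arbitrary example value.

```python
import asyncio
import time


def blocking_llm_call(text: str) -> str:
    """Stand-in for a synchronous third-party client (hypothetical)."""
    time.sleep(0.05)  # simulates network latency
    return text.upper()


async def analyze_all(items, max_concurrency=4):
    """Analyze items in worker threads, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def analyze_one(item):
        async with semaphore:
            # to_thread (Python 3.9+) runs the blocking call in a worker
            # thread, so the event loop stays free to serve other requests.
            return await asyncio.to_thread(blocking_llm_call, item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(analyze_one(item) for item in items))


if __name__ == "__main__":
    print(asyncio.run(analyze_all(["alpha", "beta", "gamma"], max_concurrency=2)))
```

The semaphore is what keeps parallelism from turning into a rate-limit problem: raising max_concurrency raises throughput only until the provider starts rejecting calls.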
A report that only lists vulnerabilities misses why prototypes stall. The full deliverable also models the unit economics of the LLM pipeline and the order in which to build toward revenue:
Eighteen sections. The findings above are roughly a third of it — the rest sequences the path from prototype to sellable product.
Every finding with location, business impact and a concrete fix. No “consider improving” filler.
P0/P1/P2 with hour estimates, cut into sprints a single developer can execute and a reviewer can verify.
LLM spend per processed record, projections at 10k / 50k / 100k scale, and hard budget guardrails.
Stabilize the pipeline, ship self-serve, scale the dataset — then layer the AI product on top.
Source-by-source terms-of-service review with explicit go / API-only / avoid calls before launch.
A three-agent pipeline with queue, retries, caching and pre-compute — specified to the file level, with estimates.
Client name, URLs and commercially sensitive figures are redacted or rounded. Findings, file paths and fixes are unchanged from the delivered report. This engagement covered a Python/FastAPI codebase. AI-generated code tends to fail the same way across stacks, so Laravel reports follow the identical format and severity model.
Fixed price, five-day turnaround, every finding shipped with a fix.