v0.4.0 — Sentinel Evaluation Engine, New Models, Workflow Pipelines
Automated quality scoring is live. Plus: Flux.2 Klein support, multi-step workflow builder, and 3 new benchmark categories.
v0.4.0 is our biggest release yet. The headline feature is Sentinel, our automated model evaluation engine, but there is a lot more packed into this one. Here is the full breakdown.
Sentinel Evaluation Engine
Every output generated through Runflow is now automatically scored across three dimensions: FID (distributional similarity to a reference set), CLIP score (prompt alignment), and human eval calibration. Scores are weighted per niche, so a corporate headshot is evaluated differently from a creative portrait or a product photo.
This powers our benchmark tables and gives customers real-time quality metrics in the dashboard. Read the full technical deep-dive in Building Sentinel: Our Automated Model Evaluation System.
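To make the per-niche weighting concrete, here is a minimal sketch of how three metric scores might be folded into one composite number. It is not Sentinel's implementation. The weights in NICHE_WEIGHTS, the niche IDs other than corporate-headshot, and the fidToScore mapping are all hypothetical. The sketch assumes the CLIP and human eval scores already arrive on a 0 to 100 scale, and that FID (a lower-is-better distance computed over a set of images) has already been measured for the relevant batch.

```javascript
// Hypothetical per-niche weights for illustration only; NOT Sentinel's values.
const NICHE_WEIGHTS = {
  "corporate-headshot": { fid: 0.3, clip: 0.2, human: 0.5 },
  "creative-portrait": { fid: 0.2, clip: 0.4, human: 0.4 },
  "product-photo": { fid: 0.4, clip: 0.3, human: 0.3 },
};

// Illustrative mapping: FID is lower-is-better, so invert it onto 0..100.
// Clamping at 0 for FID >= 100 is an arbitrary assumption.
function fidToScore(fid) {
  return Math.max(0, 100 - fid);
}

// Combine normalized metrics into a single 0..100 score using the niche's weights.
function compositeScore(niche, { fid, clip, human }) {
  const w = NICHE_WEIGHTS[niche];
  if (!w) throw new Error(`Unknown niche: ${niche}`);
  const total = w.fid + w.clip + w.human;
  const weighted = w.fid * fidToScore(fid) + w.clip * clip + w.human * human;
  return weighted / total; // divide by the weight sum so weights need not add to 1
}

// Same raw metrics, different niches, different composite scores.
const metrics = { fid: 20, clip: 70, human: 90 };
console.log(compositeScore("corporate-headshot", metrics).toFixed(1));
console.log(compositeScore("creative-portrait", metrics).toFixed(1));
```

Dividing by the weight sum means a niche can be tuned with any positive weights without rescaling the others, and the same image can legitimately rank differently depending on which niche it is judged against.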
New Models
- Flux.2 Klein: A lightweight variant of Flux.2 optimized for speed. It runs 2x faster than Flux.2 [schnell] at the cost of only 3 points of quality score. Ideal for real-time preview generation and interactive applications.
- SDXL Lightning v2: An updated 4-step distilled model with improved face coherence. It scores 3 points higher than v1 on our portrait benchmark with the same latency profile.
Workflow Pipelines
You can now chain multiple operations into a single API call. Define a pipeline that generates an image, scores it, and enhances it, all in one request with a single webhook callback.
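```javascript
// Assumes `runflow` is an initialized @runflow/sdk client (setup not shown)
// and that this runs inside an async function or an ES module (top-level await).
const prompt = "Professional headshot, neutral gray background"; // example prompt

// One request: generate, then score, then enhance.
const result = await runflow.workflow({
  steps: [
    { action: "generate", model: "flux.2-dev", prompt },
    { action: "score", niche: "corporate-headshot" },
    { action: "enhance", model: "real-esrgan-x4" }
  ]
});
```
New Benchmark Categories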
Sentinel now tracks three additional niche categories:
- E-commerce product shots: Scoring tuned for white-background product photography, with emphasis on edge clarity and color accuracy
- Creative portraits: Artistic style generation, scored on aesthetic quality and adherence to creative prompts
- Virtual try-on: Clothing overlay accuracy, body proportion preservation, and garment texture realism
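Sentinel's exact product-shot criteria aren't spelled out here, but a common, simple proxy for edge clarity is the variance of the Laplacian: crisp edges produce large second-derivative responses, while blur flattens them. The sketch below computes it for a grayscale image given as an array of pixel rows. It illustrates the general technique under that assumption and is not Sentinel's scoring code; laplacianVariance is a hypothetical name.

```javascript
// Variance of the Laplacian: a widely used sharpness / edge-clarity proxy.
// Input: grayscale image as an array of rows of numbers (e.g. 0..255).
function laplacianVariance(gray) {
  const h = gray.length;
  if (h < 3 || gray[0].length < 3) throw new Error("Image must be at least 3x3");
  const w = gray[0].length;

  // Apply the 4-neighbour Laplacian kernel [0 1 0; 1 -4 1; 0 1 0]
  // to interior pixels only, collecting every response.
  const responses = [];
  for (let y = 1; y < h - 1; y++) {
    for (let x = 1; x < w - 1; x++) {
      const lap =
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] -
        4 * gray[y][x];
      responses.push(lap);
    }
  }

  // Population variance of the responses: higher means sharper edges.
  const mean = responses.reduce((s, v) => s + v, 0) / responses.length;
  return responses.reduce((s, v) => s + (v - mean) ** 2, 0) / responses.length;
}

// A hard black/white edge scores far higher than a gentle gradient.
const hardEdge = Array.from({ length: 5 }, () => [0, 0, 255, 255, 255]);
const softEdge = Array.from({ length: 5 }, () => [0, 64, 128, 192, 255]);
console.log(laplacianVariance(hardEdge) > laplacianVariance(softEdge)); // true
```

The raw value depends on resolution and content, so any pass/fail threshold would have to be calibrated per niche on real data rather than fixed globally.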
Bug Fixes
- Fixed an issue where webhook callbacks would occasionally fire before the image was fully uploaded to the CDN, resulting in 404s on the image URL
- Resolved a race condition in the async job queue that could cause duplicate processing of the same request under high concurrency
- Fixed EXIF orientation handling that caused reference images uploaded from iOS Safari to be rotated incorrectly
- Corrected timeout handling for long-running SDXL inpainting jobs that exceeded the default 30-second window
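For context on the orientation fix: the EXIF Orientation tag (values 1 through 8) records how the camera was held, and the pixels must be mirrored and/or rotated accordingly before display. If that step is skipped, a photo can appear sideways or flipped. The mapping below follows the standard tag definitions; the helper names (correctionFor, correctedSize, mapPixel) and their return shapes are hypothetical illustrations, not part of the Runflow SDK, and this is not necessarily how the fix was implemented.

```javascript
// Correction for each EXIF Orientation value (1..8), following the standard
// tag definitions: mirror horizontally first (if `mirror`), then rotate
// clockwise by `rotateCw` degrees.
const EXIF_ORIENTATION = {
  1: { mirror: false, rotateCw: 0 },   // normal
  2: { mirror: true, rotateCw: 0 },    // mirror horizontal
  3: { mirror: false, rotateCw: 180 }, // rotate 180
  4: { mirror: true, rotateCw: 180 },  // mirror vertical
  5: { mirror: true, rotateCw: 270 },  // mirror horizontal, rotate 270 CW
  6: { mirror: false, rotateCw: 90 },  // rotate 90 CW
  7: { mirror: true, rotateCw: 90 },   // mirror horizontal, rotate 90 CW
  8: { mirror: false, rotateCw: 270 }, // rotate 270 CW
};

// Hypothetical helper: treat missing or unknown values as 1 (no change).
function correctionFor(orientation) {
  return EXIF_ORIENTATION[orientation] ?? EXIF_ORIENTATION[1];
}

// Display size after correction: 90/270-degree turns swap width and height.
function correctedSize(width, height, orientation) {
  const { rotateCw } = correctionFor(orientation);
  const swap = rotateCw === 90 || rotateCw === 270;
  return swap ? { width: height, height: width } : { width, height };
}

// Where stored pixel (x, y) of a width x height image lands after correction.
function mapPixel(x, y, width, height, orientation) {
  const { mirror, rotateCw } = correctionFor(orientation);
  const mx = mirror ? width - 1 - x : x; // mirroring keeps the width x height frame
  switch (rotateCw) {
    case 90:  return { x: height - 1 - y, y: mx };
    case 180: return { x: width - 1 - mx, y: height - 1 - y };
    case 270: return { x: y, y: width - 1 - mx };
    default:  return { x: mx, y };
  }
}

// A 4000x3000 stored image tagged 6 should display as 3000x4000.
console.log(correctedSize(4000, 3000, 6)); // { width: 3000, height: 4000 }
```

Keeping the mapping as a small lookup table makes every one of the eight cases explicit and easy to test, which matters because the mirrored orientations (2, 4, 5, 7) are rare in practice and easy to get wrong.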
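Update to v0.4.0 by running npm install @runflow/sdk@0.4.0. Because the SDK is pre-1.0, npm update will not move a caret range such as ^0.3.0 up to 0.4.0. The Sentinel scoring API is available immediately on all plans. Workflow pipelines are in beta and available on Pro and Enterprise plans.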
Want custom benchmarks for your workload?
We'll run our evaluation pipeline against your production data, for free.