
Key Takeaways

  1. Images are tensors, and shape is half the job: (N, C, H, W) vs (N, H, W, C), CT volumes (D, H, W), MRI multi-sequence (C, D, H, W). Log shapes at every pipeline boundary (a minimal training-loop sketch with shape logging follows this list).
  2. Deep learning is one recipe: forward pass → task-specific loss → backprop → gradient step. CNNs, transformers, U-Nets, and LLMs all share this loop — only the input representation and core operator differ.
  3. Start with a linear baseline: on MNIST it plateaus around 92%, which is a useful smoke test. If your deep net can’t beat it, fix your data pipeline before your architecture.
  4. Convolutions win on images because of locality + weight sharing: a small CNN on MNIST moves from ~8% error to <1% with a tiny fraction of the parameters a fully-connected net would need.
  5. Detection has its own vocabulary: bounding boxes, IoU, NMS, anchors, and mAP. “mAP@0.5:0.95 on COCO” is the dominant metric — but per-class accuracy at your deploy threshold is what usually matters in production (a minimal IoU sketch also follows this list).
  6. Segmentation has been reshaped by SAM: prompt with points/boxes/masks, get high-quality zero-shot masks. For specialized medical domains, U-Net / nnU-Net are still state of the art.
  7. Generation = GANs + Diffusion: GANs are fast and sharp on narrow domains; diffusion dominates general text-to-image because of strong conditioning and a rich control ecosystem (ControlNet, LoRA, IP-Adapter).
  8. Video and 3D are 2D with an extra axis — and extra gotchas: temporal redundancy often makes frame-wise 2D models a strong baseline; medical volumes carry physical units and orientation metadata that silently break pipelines if ignored.
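
To make takeaways 1 and 2 concrete, here is a minimal sketch of the shared training loop with shape logging at the pipeline boundaries. It assumes PyTorch; the linear baseline, batch sizes, and logger name are illustrative stand-ins, not code from the original text.

```python
import logging

import torch
import torch.nn as nn

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def log_tensor(name: str, t: torch.Tensor) -> None:
    # Log shape and dtype at every pipeline boundary (takeaway 1).
    log.info("%s: shape=%s dtype=%s", name, tuple(t.shape), t.dtype)

# Illustrative stand-ins: any channels-first (N, C, H, W) batch and any model will do.
model = nn.Sequential(nn.Flatten(), nn.Linear(1 * 28 * 28, 10))  # linear baseline (takeaway 3)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(64, 1, 28, 28)           # stand-in for a real MNIST batch
labels = torch.randint(0, 10, (64,))

log_tensor("input batch", images)             # boundary: loader -> model
logits = model(images)                        # forward pass
log_tensor("logits", logits)                  # boundary: model -> loss
loss = loss_fn(logits, labels)                # task-specific loss
optimizer.zero_grad()
loss.backward()                               # backprop
optimizer.step()                              # gradient step
```
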
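For takeaway 5, a minimal IoU sketch with boxes given as (x1, y1, x2, y2) pixel coordinates; this is a generic formulation under those assumptions, not any particular library's implementation.

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```
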
Production Checklist

Before shipping a vision system, ensure:
  • Data pipeline logs tensor shape and dtype at every stage
  • Preprocessing matches training exactly at inference (resize, normalize, channel order; a preprocessing sketch follows this checklist)
  • A baseline (linear, or pretrained backbone with a linear head) is measured alongside the main model
  • Evaluation set is drawn from the same distribution as production (scanners, cameras, lighting, user types)
  • Per-class accuracy tracked — not just global mAP or top-1
  • Confidence/precision-recall operating point chosen deliberately, not left at the default 0.5
  • Pretrained weights used wherever possible; training from scratch justified with data scale
  • For medical/industrial data: fixed physical spacing, canonical orientation, domain-appropriate normalization
  • Cost, latency, and memory measured per stage of the pipeline
  • Failure cases collected for regression testing
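
A minimal sketch of the preprocessing-match item, assuming torchvision and the common ImageNet statistics; the resize/crop sizes, mean/std, and file name are placeholders and must be replaced with whatever the model was actually trained with.

```python
from PIL import Image
import torch
from torchvision import transforms

# These values are the common ImageNet defaults; replace them with the exact
# numbers used during training, or inference quality will silently degrade.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),                        # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # hypothetical input path
batch = preprocess(image).unsqueeze(0)            # add the N dimension: (1, 3, 224, 224)
assert batch.shape == (1, 3, 224, 224) and batch.dtype == torch.float32
```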

Common Pitfalls Recap

  • Wrong channel order: (H, W, C) fed into a channels-first model silently produces garbage
  • Skipping normalization: forgetting to match training preprocessing at inference
  • Training from scratch with tiny datasets: use pretrained weights instead
  • Chasing benchmark numbers over product metrics: FID/mAP/top-1 rarely match real user impact
  • Treating SAM as a classifier: it returns masks, not labels — chain with a detector or classifier
  • Loading medical volumes as RGB images: discards HU calibration, 3D context, orientation (a loading sketch follows this list)
  • No per-stage telemetry in a chained pipeline: you can’t debug end-to-end quality regressions
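
For the medical-volume pitfall above, a minimal sketch using nibabel; the file name and HU window are hypothetical, and the point is to keep voxel spacing, orientation, and HU calibration instead of flattening slices into RGB images.

```python
import nibabel as nib
import numpy as np

img = nib.load("ct_scan.nii.gz")                  # hypothetical path
img = nib.as_closest_canonical(img)               # reorient to a canonical (RAS) orientation
volume = img.get_fdata(dtype=np.float32)          # full 3D array; CT values stay in HU
spacing = img.header.get_zooms()[:3]              # physical voxel size in mm per axis

print(volume.shape, spacing)                      # e.g. (512, 512, 120) and (0.7, 0.7, 2.5)
# Window/normalize in HU (here an illustrative soft-tissue window), not with ImageNet stats.
volume = np.clip(volume, -150, 250)
```
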
What’s new:
  • Vision foundation models are the default: SAM 2, DINOv2, SigLIP, and CLIP variants are now the backbones most teams start from — full training from scratch is increasingly rare outside research.
  • Open-vocabulary everything: detection (Grounding DINO, OWL-ViT), segmentation (SEEM, Grounded-SAM), classification (CLIP zero-shot; a minimal sketch follows this list) — describing classes in text has become the common interface.
  • Transformers have overtaken CNNs on mainstream classification and detection benchmarks, while hybrid and ConvNeXt-style architectures keep pace in compute-constrained deployments.
  • Distilled diffusion models (SDXL Turbo, LCM, SD3 Turbo, FLUX schnell) bring text-to-image generation under 200 ms per image on a single GPU.
  • Video generation from text has moved from research curiosity to shipping products (Sora, Veo, Runway Gen-3, Kling) with transformer-based diffusion-in-latent-space architectures.
  • Medical foundation models (MedSAM, TotalSegmentator, RadImageNet-style pretraining) offer strong zero-shot baselines where expert-annotated data is scarce.
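
As a taste of that open-vocabulary interface, a minimal CLIP zero-shot classification sketch using Hugging Face transformers; the checkpoint name, image path, and class prompts are just examples.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"         # example checkpoint
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")  # hypothetical input
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # shape (1, num_labels)
print(dict(zip(labels, probs[0].tolist())))
```
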
Vendor-specific (awareness):
  • Hugging Face: the de facto hub for pretrained vision weights, datasets, and evaluation; transformers, diffusers, datasets.
  • Ultralytics: the YOLO line (YOLOv8/11/…) packaged with a lightweight training/export CLI. Ubiquitous in industry (a minimal usage sketch follows this list).
  • MONAI: PyTorch library for medical imaging — DICOM/NIfTI loaders, 3D-aware augmentations, medical-specific losses, nnU-Net integration.
  • NVIDIA: NGC pretrained checkpoints, TensorRT / Triton for inference, DeepStream for video pipelines.
  • Roboflow: dataset management and hosted training tuned for object detection pipelines.
  • Meta AI: SAM / SAM 2 / DINOv2 releases with permissive licenses.
  • Stability AI, Black Forest Labs (FLUX), Midjourney, Runway, OpenAI, Google: generation models via API or weights.
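
A minimal Ultralytics usage sketch, assuming the ultralytics package and a small pretrained checkpoint; the image path and confidence threshold here are hypothetical.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # small pretrained detector, downloaded on first use
results = model("example.jpg", conf=0.25)  # hypothetical image; conf is the score threshold

for box in results[0].boxes:
    cls_id = int(box.cls)
    print(results[0].names[cls_id], float(box.conf), box.xyxy[0].tolist())
```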
