TailorBench

An open benchmark for evaluating AI understanding of garment construction.

The Problem

Current fashion AI benchmarks test aesthetics—whether an image "looks like" a suit.

None test construction—whether AI understands how garments are actually built.

This matters because:

AI generates physically impossible garments (lapels that can't roll, button stances that can't close)
Fashion recommendations ignore how a garment actually fits
No standard exists for evaluating construction knowledge
Models trained on Pinterest images can't distinguish working surgeon cuffs from decorative sleeve buttons

What We Test

Category	Examples	Why It Matters
Identification	Lapel types, vent styles, shoulder construction	Basic recognition
Reasoning	Why a single vent suits seated clients	Understanding mechanics
Error Detection	Spotting impossible construction in AI images	Practical application

Difficulty Levels

Basic — Anyone who wears suits occasionally should know this
Intermediate — Enthusiasts, salespeople, MTM customers
Expert — Tailors, pattern makers, bespoke clients

A model that scores 90% on basic questions but 10% on expert questions doesn't understand construction. It is pattern-matching on surface features.
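
As a rough illustration of that diagnostic, the sketch below computes accuracy per difficulty level and the basic-to-expert gap. The function names and the `(level, is_correct)` record shape are assumptions made for this example, not part of any TailorBench tooling.

```python
from collections import defaultdict


def accuracy_by_level(results):
    """Return {level: accuracy} from an iterable of (level, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for level, is_correct in results:
        totals[level] += 1
        correct[level] += int(is_correct)
    return {level: correct[level] / totals[level] for level in totals}


def basic_expert_gap(per_level):
    """Basic accuracy minus expert accuracy; a large gap hints at surface pattern matching."""
    return per_level["basic"] - per_level["expert"]


# Hypothetical run: 9 of 10 basic questions right, 1 of 10 expert questions right.
results = [("basic", True)] * 9 + [("basic", False)]
results += [("expert", True)] + [("expert", False)] * 9

per_level = accuracy_by_level(results)
gap = basic_expert_gap(per_level)
print(per_level, round(gap, 2))
```

Reporting the gap alongside the headline score keeps a strong basic result from hiding a weak expert one.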

Taxonomy Overview

Jackets/Blazers

Lapels: Type, width, gorge height, roll
Shoulders: Structured, natural, roped, Neapolitan
Construction: Full canvas, half canvas, fused
Details: Vents, button stance, pockets, surgeon cuffs

Trousers

Waist: Waistband style, closures, side adjusters
Pleats: Flat front, single, double, direction
Leg: Break, cuffs, taper

Shirts

Collar: Point, spread, cutaway, button-down
Cuffs: Barrel, French, convertible
Body: Placket, yoke, back pleats

See benchmark/taxonomy/ for the complete element breakdown.

Benchmark Structure

tailorbench/
├── benchmark/
│   ├── taxonomy/           # What we test
│   │   ├── construction_elements.md
│   │   ├── difficulty_levels.md
│   │   └── question_formats.md
│   ├── dataset/            # Images and questions
│   │   ├── images/         # Jacket, trouser, shirt images
│   │   └── questions/      # JSON question files
│   ├── evaluation/         # How we score
│   │   └── metrics.md
│   └── baselines/          # Model results (coming)
└── research/
    ├── gap_analysis.md     # What's missing in fashion AI
    └── sources.md          # Reference materials
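
This README does not spell out the JSON layout of the question files, so the following is only a sketch of what one record and its validation might look like. Every field name (`id`, `image`, `category`, `level`, `format`, `prompt`, `choices`, `answer`) and the sample values are assumptions for illustration; `question_formats.md` is the authority on the real format.

```python
import json
from dataclasses import dataclass, field

# Category and level names taken from this README; the spellings are assumed.
CATEGORIES = {"identification", "reasoning", "error_detection"}
LEVELS = {"basic", "intermediate", "expert"}


@dataclass
class Question:
    """Hypothetical shape of one question record."""
    id: str
    image: str
    category: str
    level: str
    format: str
    prompt: str
    answer: str
    choices: list = field(default_factory=list)

    def __post_init__(self):
        # Reject records whose category or level is not in the known sets.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def parse_questions(text):
    """Parse a JSON array of question records into Question objects."""
    return [Question(**record) for record in json.loads(text)]


# Made-up sample record, not taken from the dataset.
SAMPLE = """
[
  {
    "id": "jacket-0001",
    "image": "images/jacket-0001.jpg",
    "category": "identification",
    "level": "basic",
    "format": "multiple_choice",
    "prompt": "Which lapel type is shown?",
    "choices": ["notch", "peak", "shawl"],
    "answer": "peak"
  }
]
"""

questions = parse_questions(SAMPLE)
print(questions[0].id, questions[0].level)
```

Validating on load means a mistyped level or category fails early instead of silently skewing the scores.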

Evaluation Approach

Scoring

Overall Accuracy: Simple correct/total
Weighted Accuracy: Expert questions count 3x, Intermediate 2x, Basic 1x
Category Breakdown: Separate scores for identification, reasoning, error detection
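
The three scores above can be sketched as follows. The 3x/2x/1x weights come from the description; dividing by the total possible weight and the `level`/`category`/`correct` record keys are assumptions for this example, so defer to `metrics.md` for the actual methodology.

```python
from collections import defaultdict

# Weights from the scoring description: Expert 3x, Intermediate 2x, Basic 1x.
LEVEL_WEIGHTS = {"basic": 1, "intermediate": 2, "expert": 3}


def overall_accuracy(results):
    """Simple correct / total."""
    return sum(r["correct"] for r in results) / len(results)


def weighted_accuracy(results):
    """Weight earned on correct answers divided by total weight available (assumed normalization)."""
    earned = sum(LEVEL_WEIGHTS[r["level"]] for r in results if r["correct"])
    possible = sum(LEVEL_WEIGHTS[r["level"]] for r in results)
    return earned / possible


def category_breakdown(results):
    """Unweighted accuracy computed separately for each category."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r["category"]].append(r)
    return {cat: overall_accuracy(rs) for cat, rs in buckets.items()}


# Hypothetical results: one question per level, the expert one answered wrong.
results = [
    {"level": "basic", "category": "identification", "correct": True},
    {"level": "intermediate", "category": "reasoning", "correct": True},
    {"level": "expert", "category": "error_detection", "correct": False},
]

print(overall_accuracy(results))   # 2 of 3 correct
print(weighted_accuracy(results))  # (1 + 2) / (1 + 2 + 3) = 0.5
print(category_breakdown(results))
```

In this toy run the model gets two of three questions right, yet the weighted score drops to 0.5 because the single miss is an expert question.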

Question Formats

Multiple choice (identification)
True/False
Count/Measure
Error detection
Open-ended reasoning
Comparison
Recommendation

See benchmark/evaluation/metrics.md for full scoring methodology.

Current Status

Taxonomy defined (50+ construction elements)
Evaluation framework (weighted scoring, 7 question formats)
Gap analysis documented
Dataset structure ready (images + JSON questions)
Dataset v0.1 population (in progress)
Baseline testing on GPT-4V, Gemini, Claude (coming)
Public leaderboard (planned)

Why This Exists

Built by practitioners, not just researchers.

We come from production tailoring—real garments, real customers, real fit data. Fashion AI needs domain expertise that image datasets can't provide.

The gap between "looks like a suit" and "understands suit construction" is where current AI fails. This benchmark measures that gap.

Contributing

See CONTRIBUTING.md for how to help.

Priority areas:

Construction element additions (especially regional variations)
Question contributions with clear correct answers
Baseline testing on additional models
Translations (terminology varies by region)

Citation

If you use TailorBench in research:

@misc{tailorbench2025,
  title={TailorBench: A Benchmark for Garment Construction Understanding in AI},
  author={FashionX},
  year={2025},
  url={https://github.com/fashionx-ai/tailorbench}
}

License

MIT License — see LICENSE

Contact

Twitter: @fashionx112
Builder: @sudoinX

Version 0.1 — December 2025
