Simplify Calibration Data for qmodel_prep

## Is your feature request related to a problem? Please describe.

Currently, qmodel_prep accepts several datatypes for the model input data used in calibration, and handles each of them under the hood. However, it is hard for users to figure out exactly which types of data qmodel_prep expects and how to pack them.

Furthermore, we currently have no tests that check this part of the code base for consistency or bugs. We also have no documentation or examples showing users how to prepare custom calibration data. We do have some examples for popular datasets, but they may not be enough.

## Describe the solution you'd like

Add a new entry to the quantized config, such as `qcfg["calibration_dtype"]`, that holds the type of calibration data being passed in. This could be a list, dictionary, Tokenizer, BatchEncoding, DataLoader, etc.

We should support a short, well-defined list of calibration data types and convert each of them internally into the form qmodel_prep() expects. If an unsupported datatype is passed, we should raise an error that includes instructions for preparing custom calibration data. This functionality should live in a single place, rather than being scattered across various functions as it is today.
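As a rough illustration of the "single place" idea, the sketch below normalizes a few input shapes into a list of dict batches and rejects everything else with an instructive error. The function name `normalize_calibration_data`, the choice of "list of dicts" as the internal form, and the error wording are all assumptions made for this example; they are not the current qmodel_prep API.

```python
from collections.abc import Iterable, Mapping

# Hypothetical error text; the real instructions would point at the README.
_HELP = (
    "Pass a dict-like batch (e.g. {'input_ids': ...}) or an iterable of such "
    "batches (e.g. a list or a DataLoader that yields dicts)."
)


def normalize_calibration_data(data):
    """Hypothetical helper: return calibration data as a list of dict batches."""
    # A single dict-like batch becomes a one-element list.
    if isinstance(data, Mapping):
        return [dict(data)]

    # Strings are iterable but are never valid calibration data.
    if isinstance(data, (str, bytes)) or not isinstance(data, Iterable):
        raise TypeError(
            f"Unsupported calibration data type: {type(data).__name__}. {_HELP}"
        )

    # Any other iterable must yield dict-like batches.
    batches = []
    for index, batch in enumerate(data):
        if not isinstance(batch, Mapping):
            raise TypeError(
                f"Batch {index} has unsupported type {type(batch).__name__}. {_HELP}"
            )
        batches.append(dict(batch))
    return batches
```

Dispatching on the runtime type is only one option. An explicit `qcfg["calibration_dtype"]` value could instead select the conversion branch directly, which would make the user's intent visible in the config.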

As for testing, we should have a test harness that checks that every supported type is consumed properly and that every unsupported type raises an error. We should also have tests that verify the data is arranged correctly for the model.
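One possible shape for such a harness, sketched with the standard-library `unittest` module. The `convert` function here is only a placeholder standing in for whatever conversion entry point is eventually chosen, and the sample inputs are made up for illustration.

```python
import unittest
from collections.abc import Mapping


def convert(data):
    """Placeholder for the real converter; accepts a dict or a list of dicts."""
    if isinstance(data, Mapping):
        return [dict(data)]
    if isinstance(data, list) and all(isinstance(b, Mapping) for b in data):
        return [dict(b) for b in data]
    raise TypeError(f"Unsupported calibration data type: {type(data).__name__}")


class CalibrationDataTests(unittest.TestCase):
    # Illustrative samples only; real tests would cover every supported type.
    SUPPORTED = [
        {"input_ids": [[1, 2, 3]]},
        [{"input_ids": [[1, 2, 3]]}, {"input_ids": [[4, 5, 6]]}],
    ]
    UNSUPPORTED = [42, "raw text", [1, 2, 3], None]

    def test_supported_types_are_consumed(self):
        for data in self.SUPPORTED:
            with self.subTest(kind=type(data).__name__):
                batches = convert(data)
                self.assertTrue(all(isinstance(b, dict) for b in batches))

    def test_unsupported_types_raise(self):
        for data in self.UNSUPPORTED:
            with self.subTest(kind=type(data).__name__):
                with self.assertRaises(TypeError):
                    convert(data)

    def test_batches_keep_order_and_keys(self):
        # Data arrangement check: batch order and field names are preserved.
        batches = convert(self.SUPPORTED[1])
        self.assertEqual(
            [b["input_ids"] for b in batches],
            [[[1, 2, 3]], [[4, 5, 6]]],
        )
```

Keeping the supported and unsupported samples in two lists means that adding a new supported type only requires adding one entry, and `subTest` reports each failing type separately.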

For documentation, we should have a README with instructions on how to get calibration data into a form that qmodel_prep will accept.
