Evaluation Benchmark for Mamba2 3B model #3

These are the evaluation benchmarks for the **Mamba2 3B** model at the `640k` checkpoint.

For the majority of the base benchmarks:

| Metric           | Value    |
|------------------|----------|
| winogrande       | 0.700868 |
| truthfulqa_mc2   | 0.362402 |
| social_iqa       | 0.326510 |
| sciq             | 0.926    |
| piqa             | 0.800326 |
| openbookqa       | 0.436    |
| lambada          | 4.382999 |
| lambada_openai   | 4.046894 |
| lambada_standard | 4.719104 |
| hellaswag        | 0.758415 |
| copa             | 0.85     |
| boolq            | 0.717125 |
| arc_easy         | 0.709175 |
| arc_challenge    | 0.420648 |
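
To compare this checkpoint against others at a glance, the rows above can be collapsed into a single unweighted mean. The sketch below is only an illustration. It assumes the three `lambada*` rows are perplexities (their values exceed 1, and the issue does not state the metric), so it leaves them out of the average. The variable names are hypothetical and not taken from any evaluation tooling.

```python
# Scores copied verbatim from the table above (640k checkpoint).
scores = {
    "winogrande": 0.700868,
    "truthfulqa_mc2": 0.362402,
    "social_iqa": 0.326510,
    "sciq": 0.926,
    "piqa": 0.800326,
    "openbookqa": 0.436,
    "lambada": 4.382999,
    "lambada_openai": 4.046894,
    "lambada_standard": 4.719104,
    "hellaswag": 0.758415,
    "copa": 0.85,
    "boolq": 0.717125,
    "arc_easy": 0.709175,
    "arc_challenge": 0.420648,
}

# ASSUMPTION: lambada* values are perplexities (lower is better), not
# accuracies, because they are greater than 1. They are kept separate.
perplexity_like = {k: v for k, v in scores.items() if k.startswith("lambada")}
accuracy_like = {k: v for k, v in scores.items() if k not in perplexity_like}

# Unweighted mean over the accuracy-style tasks only.
mean_accuracy = sum(accuracy_like.values()) / len(accuracy_like)

print(f"accuracy-style tasks: {len(accuracy_like)}")
print(f"unweighted mean:      {mean_accuracy:.4f}")

# List tasks from strongest to weakest for a quick visual scan.
for name, value in sorted(accuracy_like.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {value:.6f}")
```

An unweighted mean treats every task equally regardless of test-set size. It is a convenience for tracking checkpoints, not a replacement for the per-task numbers.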


For MMLU:

| Model          | MMLU     |
|----------------|----------|
| mamba2_3b_640k | 0.411409 |


Benchmarks for previous checkpoints can be found [here](https://github.com/foundation-model-stack/avengers/issues/4#issue-2430025785). 




