
Evaluation Benchmarks for Llama3 3B model #2

@divya-kumari32


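This table lists evaluation benchmark results for a Llama3 3B model trained on 1T tokens, measured at ten intermediate checkpoints (`llama3_3b_1t_100k` through `llama3_3b_1t_1000k`). The three LAMBADA rows (`lambada`, `lambada_openai`, `lambada_standard`) report perplexity, so lower is better; every other row is a score between 0 and 1 where higher is better.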

| Metric | llama3_3b_1t_100k | llama3_3b_1t_200k | llama3_3b_1t_300k | llama3_3b_1t_400k | llama3_3b_1t_500k | llama3_3b_1t_600k | llama3_3b_1t_700k | llama3_3b_1t_800k | llama3_3b_1t_900k | llama3_3b_1t_1000k |
|---|---|---|---|---|---|---|---|---|---|---|
| winogrande | 0.606156 | 0.599053 | 0.610103 | 0.617995 | 0.617995 | 0.636148 | 0.633781 | 0.639305 | 0.647987 | 0.657459 |
| truthfulqa_mc2 | 0.346594 | 0.366435 | 0.404596 | 0.369011 | 0.426658 | 0.364889 | 0.394947 | 0.388509 | 0.350430 | 0.393013 |
| social_iqa | 0.325998 | 0.327533 | 0.329069 | 0.329069 | 0.328045 | 0.325486 | 0.329580 | 0.333675 | 0.320880 | 0.318321 |
| sciq | 0.860 | 0.878 | 0.854 | 0.864 | 0.866 | 0.882 | 0.875 | 0.901 | 0.884 | 0.889 |
| piqa | 0.731774 | 0.738303 | 0.751360 | 0.749184 | 0.764418 | 0.756801 | 0.766594 | 0.771491 | 0.772579 | 0.762242 |
| openbookqa | 0.410 | 0.410 | 0.418 | 0.408 | 0.430 | 0.426 | 0.448 | 0.450 | 0.456 | 0.438 |
| lambada (perplexity) | 23.339747 | 18.346882 | 14.713008 | 14.887917 | 13.840738 | 12.946245 | 11.497174 | 11.167095 | 10.157396 | 11.664451 |
| lambada_openai (perplexity) | 15.943495 | 14.2658622 | 12.675483 | 12.241740 | 10.795581 | 10.252835 | 9.736051 | 8.705719 | 8.582805 | 9.018882 |
| lambada_standard (perplexity) | 30.735998 | 22.427902 | 16.750534 | 17.534093 | 16.885896 | 15.639655 | 13.258297 | 13.628472 | 11.731986 | 14.310020 |
| hellaswag | 0.588229 | 0.608146 | 0.621291 | 0.625075 | 0.638120 | 0.648277 | 0.659928 | 0.671878 | 0.677455 | 0.677156 |
| copa | 0.77 | 0.76 | 0.78 | 0.78 | 0.76 | 0.82 | 0.80 | 0.82 | 0.83 | 0.82 |
| boolq | 0.633639 | 0.582263 | 0.575535 | 0.652294 | 0.612538 | 0.682263 | 0.664526 | 0.665138 | 0.681651 | 0.694190 |
| arc_easy | 0.689815 | 0.697811 | 0.699074 | 0.688131 | 0.697811 | 0.726431 | 0.727273 | 0.737374 | 0.753367 | 0.730219 |
| arc_challenge | 0.401024 | 0.403584 | 0.421502 | 0.408703 | 0.430887 | 0.460751 | 0.441980 | 0.467577 | 0.488908 | 0.461604 |
| mmlu | 0.257656 | 0.247828 | 0.261003 | 0.258154 | 0.270047 | 0.243341 | 0.273821 | 0.261216 | 0.257157 | 0.293263 |
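The numbers suggest that the `lambada` row is the unweighted mean of `lambada_openai` and `lambada_standard` (at 100k, (15.943495 + 30.735998) / 2 ≈ 23.339747). That is an inference from the values, not something stated in the issue. A minimal stdlib Python sketch, with the three rows copied from the table, checks the relationship at every checkpoint and picks the lowest-perplexity checkpoint; the helper names `is_mean_of` and `best_checkpoint` are hypothetical.

```python
# LAMBADA perplexities copied from the table above (lower is better).
CHECKPOINTS = [f"{k}k" for k in range(100, 1001, 100)]  # "100k" ... "1000k"

LAMBADA_OPENAI = [15.943495, 14.2658622, 12.675483, 12.241740, 10.795581,
                  10.252835, 9.736051, 8.705719, 8.582805, 9.018882]
LAMBADA_STANDARD = [30.735998, 22.427902, 16.750534, 17.534093, 16.885896,
                    15.639655, 13.258297, 13.628472, 11.731986, 14.310020]
LAMBADA = [23.339747, 18.346882, 14.713008, 14.887917, 13.840738,
           12.946245, 11.497174, 11.167095, 10.157396, 11.664451]


def is_mean_of(combined, a, b, tol=1e-5):
    """Hypothetical check: each combined value equals the mean of a and b within tol."""
    if not (len(combined) == len(a) == len(b)):
        raise ValueError("rows must have the same number of checkpoints")
    return all(abs(c - (x + y) / 2) <= tol for c, x, y in zip(combined, a, b))


def best_checkpoint(perplexities):
    """Return the checkpoint label with the lowest perplexity."""
    return min(zip(perplexities, CHECKPOINTS))[1]


if __name__ == "__main__":
    print("lambada == mean(openai, standard):",
          is_mean_of(LAMBADA, LAMBADA_OPENAI, LAMBADA_STANDARD))
    for name, row in [("lambada_openai", LAMBADA_OPENAI),
                      ("lambada_standard", LAMBADA_STANDARD),
                      ("lambada", LAMBADA)]:
        print(name, "best at", best_checkpoint(row))
```

On these values all three LAMBADA variants reach their lowest perplexity at 900k and rise slightly at 1000k.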

Trend (plot format): one plot per metric, for arc_challenge, arc_easy, boolq, copa, hellaswag, lambada_openai, lambada_standard, lambada, mmlu, openbookqa, piqa, sciq, social_iqa, truthfulqa_mc2 and winogrande.
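As a rough numeric stand-in for the trend plots, a minimal stdlib Python sketch can fit an ordinary least-squares slope to each row against the checkpoint index (the checkpoints are evenly spaced, so index 0 to 9 serves as the x-axis). The slope-based summary and the names `ls_slope`, `direction` and `LOWER_IS_BETTER` are assumptions for illustration, not how the original plots were produced; only three rows are copied from the table to keep it short.

```python
# Three rows copied from the table above; x = checkpoint index 0..9 (100k..1000k).
SCORES = {
    "hellaswag": [0.588229, 0.608146, 0.621291, 0.625075, 0.638120,
                  0.648277, 0.659928, 0.671878, 0.677455, 0.677156],
    "social_iqa": [0.325998, 0.327533, 0.329069, 0.329069, 0.328045,
                   0.325486, 0.329580, 0.333675, 0.320880, 0.318321],
    "lambada_openai": [15.943495, 14.2658622, 12.675483, 12.241740, 10.795581,
                       10.252835, 9.736051, 8.705719, 8.582805, 9.018882],
}

# Perplexity rows improve as they go down; all other rows improve as they go up.
LOWER_IS_BETTER = {"lambada", "lambada_openai", "lambada_standard"}


def ls_slope(ys):
    """Ordinary least-squares slope of ys against x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def direction(name, ys):
    """Classify the fitted trend, taking the metric's better direction into account."""
    slope = ls_slope(ys)
    improving = slope < 0 if name in LOWER_IS_BETTER else slope > 0
    return "improving" if improving else "not improving"


if __name__ == "__main__":
    for name, ys in SCORES.items():
        print(f"{name:15s} slope/checkpoint = {ls_slope(ys):+.5f}  ({direction(name, ys)})")
```

A slope only summarizes the overall direction; rows such as social_iqa that hover near a constant value are better judged from the plotted points than from the sign of a small slope.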
