| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu | 1|none | |acc |↑ |0.2305|± |0.0035|
| - humanities | 1|none | |acc |↑ |0.2427|± |0.0062|
| - formal_logic | 0|none | 0|acc |↑ |0.2778|± |0.0401|
| - high_school_european_history | 0|none | 0|acc |↑ |0.2242|± |0.0326|
| - high_school_us_history | 0|none | 0|acc |↑ |0.2500|± |0.0304|
| - high_school_world_history | 0|none | 0|acc |↑ |0.2700|± |0.0289|
| - international_law | 0|none | 0|acc |↑ |0.2397|± |0.0390|
| - jurisprudence | 0|none | 0|acc |↑ |0.2685|± |0.0428|
| - logical_fallacies | 0|none | 0|acc |↑ |0.2209|± |0.0326|
| - moral_disputes | 0|none | 0|acc |↑ |0.2486|± |0.0233|
| - moral_scenarios | 0|none | 0|acc |↑ |0.2413|± |0.0143|
| - philosophy | 0|none | 0|acc |↑ |0.1865|± |0.0221|
| - prehistory | 0|none | 0|acc |↑ |0.2160|± |0.0229|
| - professional_law | 0|none | 0|acc |↑ |0.2458|± |0.0110|
| - world_religions | 0|none | 0|acc |↑ |0.3158|± |0.0357|
| - other | 1|none | |acc |↑ |0.2414|± |0.0077|
| - business_ethics | 0|none | 0|acc |↑ |0.3100|± |0.0465|
| - clinical_knowledge | 0|none | 0|acc |↑ |0.2151|± |0.0253|
| - college_medicine | 0|none | 0|acc |↑ |0.2139|± |0.0313|
| - global_facts | 0|none | 0|acc |↑ |0.1800|± |0.0386|
| - human_aging | 0|none | 0|acc |↑ |0.3139|± |0.0311|
| - management | 0|none | 0|acc |↑ |0.1748|± |0.0376|
| - marketing | 0|none | 0|acc |↑ |0.2991|± |0.0300|
| - medical_genetics | 0|none | 0|acc |↑ |0.2900|± |0.0456|
| - miscellaneous | 0|none | 0|acc |↑ |0.2375|± |0.0152|
| - nutrition | 0|none | 0|acc |↑ |0.2157|± |0.0236|
| - professional_accounting | 0|none | 0|acc |↑ |0.2340|± |0.0253|
| - professional_medicine | 0|none | 0|acc |↑ |0.2022|± |0.0244|
| - virology | 0|none | 0|acc |↑ |0.2831|± |0.0351|
| - social sciences | 1|none | |acc |↑ |0.2174|± |0.0074|
| - econometrics | 0|none | 0|acc |↑ |0.2368|± |0.0400|
| - high_school_geography | 0|none | 0|acc |↑ |0.1768|± |0.0272|
| - high_school_government_and_politics| 0|none | 0|acc |↑ |0.1969|± |0.0287|
| - high_school_macroeconomics | 0|none | 0|acc |↑ |0.2026|± |0.0204|
| - high_school_microeconomics | 0|none | 0|acc |↑ |0.2101|± |0.0265|
| - high_school_psychology | 0|none | 0|acc |↑ |0.1927|± |0.0169|
| - human_sexuality | 0|none | 0|acc |↑ |0.2595|± |0.0384|
| - professional_psychology | 0|none | 0|acc |↑ |0.2516|± |0.0176|
| - public_relations | 0|none | 0|acc |↑ |0.2182|± |0.0396|
| - security_studies | 0|none | 0|acc |↑ |0.1878|± |0.0250|
| - sociology | 0|none | 0|acc |↑ |0.2438|± |0.0304|
| - us_foreign_policy | 0|none | 0|acc |↑ |0.2800|± |0.0451|
| - stem | 1|none | |acc |↑ |0.2141|± |0.0073|
| - abstract_algebra | 0|none | 0|acc |↑ |0.2200|± |0.0416|
| - anatomy | 0|none | 0|acc |↑ |0.1852|± |0.0336|
| - astronomy | 0|none | 0|acc |↑ |0.1711|± |0.0306|
| - college_biology | 0|none | 0|acc |↑ |0.2778|± |0.0375|
| - college_chemistry | 0|none | 0|acc |↑ |0.2000|± |0.0402|
| - college_computer_science | 0|none | 0|acc |↑ |0.2400|± |0.0429|
| - college_mathematics | 0|none | 0|acc |↑ |0.2100|± |0.0409|
| - college_physics | 0|none | 0|acc |↑ |0.2157|± |0.0409|
| - computer_security | 0|none | 0|acc |↑ |0.3100|± |0.0465|
| - conceptual_physics | 0|none | 0|acc |↑ |0.2638|± |0.0288|
| - electrical_engineering | 0|none | 0|acc |↑ |0.2414|± |0.0357|
| - elementary_mathematics | 0|none | 0|acc |↑ |0.2143|± |0.0211|
| - high_school_biology | 0|none | 0|acc |↑ |0.1806|± |0.0219|
| - high_school_chemistry | 0|none | 0|acc |↑ |0.1626|± |0.0260|
| - high_school_computer_science | 0|none | 0|acc |↑ |0.2500|± |0.0435|
| - high_school_mathematics | 0|none | 0|acc |↑ |0.2000|± |0.0244|
| - high_school_physics | 0|none | 0|acc |↑ |0.1921|± |0.0322|
| - high_school_statistics | 0|none | 0|acc |↑ |0.1528|± |0.0245|
| - machine_learning | 0|none | 0|acc |↑ |0.3214|± |0.0443|
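
One way to read the Value and Stderr columns is to compare each accuracy against random guessing. The sketch below is not part of the original paste: it assumes every MMLU question has four answer options (so chance is 0.25) and uses a hypothetical helper, `z_score`, to express the gap from chance in units of the reported standard error. The numbers are copied from the group rows of the table.

```python
# Accuracy and standard error for the group rows, copied from the table.
results = {
    "mmlu": (0.2305, 0.0035),
    "humanities": (0.2427, 0.0062),
    "other": (0.2414, 0.0077),
    "social sciences": (0.2174, 0.0074),
    "stem": (0.2141, 0.0073),
}

# Assumption: four answer options per question, so guessing scores 0.25.
CHANCE = 0.25


def z_score(acc, stderr, baseline=CHANCE):
    """Distance of an accuracy from the baseline, in standard errors."""
    return (acc - baseline) / stderr


for name, (acc, se) in results.items():
    z = z_score(acc, se)
    print(f"{name:16s} acc={acc:.4f} stderr={se:.4f} z={z:+.2f}")
```

A |z| below roughly 2 means the score is within about two standard errors of chance; a large negative z means the score sits below chance by more than sampling noise alone would usually explain.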