- Azure/azureml-assets (Public)
- Create Llama3.1 assets for 8B/70B/405B #3180 (Open)
- SamGos93 wants to merge 1 commit into main from sagoswami/llama3_1_assets
- +1,542 −0 · Conversation 0 · Commits 1 · Checks 15 · Files changed 90
- assets/evaluation_results/boolq_meta-llama3-1-405b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/boolq_meta-llama3-1-405b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: boolq_meta-llama3-1-405b_question_answering
version: 2.22.07
display_name: boolq_Meta-Llama3-1-405B_question_answering
description: Meta-Llama-3.1-405B run for boolq dataset
dataset_family: boolq
dataset_name: boolq
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.921406728
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "validation"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "train"
```
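Each spec repeats the same identifiers in several fields: `name` and `display_name`, `model_asset_id` and the `Source` relationship, and `model_name`/`model_version`. Below is a minimal sketch of a consistency check over one parsed spec. The rules are assumptions inferred from the files in this PR, not the repository's own validation logic, and `check_spec` is a hypothetical helper. The input is the dict that PyYAML's `yaml.safe_load` would return for a spec. In YAML an unquoted `None` loads as the string `"None"`, not as null, so the check accepts both forms.

```python
from typing import Any


def check_spec(spec: dict[str, Any]) -> list[str]:
    """Return the consistency problems found in one parsed spec (empty list if none).

    The rules below are assumptions inferred from the spec files in this PR.
    """
    problems: list[str] = []

    # The asset name appears to be the lower-cased display name.
    if spec["name"] != spec["display_name"].lower():
        problems.append("name does not match lower-cased display_name")

    # Exactly one Source relationship, pointing at the evaluated model asset.
    source_ids = [
        rel.get("assetId")
        for rel in spec.get("relationships") or []
        if rel.get("relationshipType") == "Source"
    ]
    if source_ids != [spec["model_asset_id"]]:
        problems.append("Source relationship does not match model_asset_id")

    # model_asset_id should end in /models/<model_name>/versions/<model_version>.
    suffix = f"/models/{spec['model_name']}/versions/{spec['model_version']}"
    if not spec["model_asset_id"].endswith(suffix):
        problems.append("model_asset_id does not match model_name/model_version")

    # Accuracy is reported as a fraction, not a percentage.
    accuracy = spec["metrics"]["accuracy"]
    if not 0.0 <= accuracy <= 1.0:
        problems.append("metrics.accuracy is outside [0, 1]")

    # Zero-shot runs should not name a few-shot split.
    # Unquoted `None` in YAML loads as the string "None", so accept both forms.
    props = spec["properties"]
    if props["n_shot"] == 0 and props.get("fewshot_split") not in (None, "None"):
        problems.append("n_shot is 0 but fewshot_split is set")

    return problems
```

For example, `check_spec(yaml.safe_load(open(path)))` would return an empty list for the boolq 405B spec as listed above.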
- assets/evaluation_results/boolq_meta-llama3-1-70b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/boolq_meta-llama3-1-70b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: boolq_meta-llama3-1-70b_question_answering
version: 2.22.07
display_name: boolq_Meta-Llama3-1-70B_question_answering
description: Meta-Llama-3.1-70B run for boolq dataset
dataset_family: boolq
dataset_name: boolq
model_name: Meta-Llama-3.1-70B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.908868502
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "validation"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "train"
```
- assets/evaluation_results/boolq_meta-llama3-1-8b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/boolq_meta-llama3-1-8b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: boolq_meta-llama3-1-8b_question_answering
version: 2.22.07
display_name: boolq_Meta-Llama3-1-8B_question_answering
description: Meta-Llama-3.1-8B run for boolq dataset
dataset_family: boolq
dataset_name: boolq
model_name: Meta-Llama-3.1-8B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.870642202
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "validation"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "train"
```
- assets/evaluation_results/gsm8k_meta-llama3-1-405b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/gsm8k_meta-llama3-1-405b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: gsm8k_meta-llama3-1-405b_question_answering
version: 2.22.07
display_name: gsm8k_Meta-Llama3-1-405B_question_answering
description: Meta-Llama-3.1-405B run for gsm8k dataset
dataset_family: gsm8k
dataset_name: gsm8k
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.968157695
properties:
  n_shot: 8
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/gsm8k_meta-llama3-1-70b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/gsm8k_meta-llama3-1-70b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: gsm8k_meta-llama3-1-70b_question_answering
version: 2.22.07
display_name: gsm8k_Meta-Llama3-1-70B_question_answering
description: Meta-Llama-3.1-70B run for gsm8k dataset
dataset_family: gsm8k
dataset_name: gsm8k
model_name: Meta-Llama-3.1-70B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.948445792
properties:
  n_shot: 8
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/gsm8k_meta-llama3-1-8b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/gsm8k_meta-llama3-1-8b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: gsm8k_meta-llama3-1-8b_question_answering
version: 2.22.07
display_name: gsm8k_Meta-Llama3-1-8B_question_answering
description: Meta-Llama-3.1-8B run for gsm8k dataset
dataset_family: gsm8k
dataset_name: gsm8k
model_name: Meta-Llama-3.1-8B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.843821077
properties:
  n_shot: 8
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
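The `exact_match` accuracy in these question-answering specs is the fraction of items answered correctly. When `evaluation_sampling_ratio` is 1.0, so the whole split is scored, each reported value should therefore equal k/N for some integer k. Below is a small sketch that recovers k as a sanity check on the numbers. It assumes the GSM8K `test` split has 1,319 problems. `implied_correct` is a hypothetical helper, and its tolerance reflects the nine decimal places used in the specs.

```python
def implied_correct(accuracy: float, total: int, tol: float = 1e-6) -> int:
    """Return the integer k such that k / total matches accuracy within tol."""
    if total <= 0:
        raise ValueError("total must be positive")
    correct = round(accuracy * total)
    if abs(correct / total - accuracy) > tol:
        raise ValueError(f"{accuracy} is not k/{total} for any integer k")
    return correct


# Assumption: size of the GSM8K test split scored by these runs.
GSM8K_TEST_SIZE = 1319

# Accuracies copied from the three gsm8k spec.yaml files above.
GSM8K_ACCURACY = {
    "Meta-Llama-3.1-405B": 0.968157695,
    "Meta-Llama-3.1-70B": 0.948445792,
    "Meta-Llama-3.1-8B": 0.843821077,
}

if __name__ == "__main__":
    for model, acc in GSM8K_ACCURACY.items():
        k = implied_correct(acc, GSM8K_TEST_SIZE)
        print(f"{model}: {k}/{GSM8K_TEST_SIZE} correct")
```

The same check applies to the boolq and hellaswag runs, given the size of their `validation` splits.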
- assets/evaluation_results/hellaswag_meta-llama3-1-405b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/hellaswag_meta-llama3-1-405b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: hellaswag_meta-llama3-1-405b_question_answering
version: 2.22.07
display_name: hellaswag_Meta-Llama3-1-405B_question_answering
description: Meta-Llama-3.1-405B run for hellaswag dataset
dataset_family: hellaswag
dataset_name: hellaswag
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.919637522
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "validation"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "train"
```
- assets/evaluation_results/hellaswag_meta-llama3-1-70b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/hellaswag_meta-llama3-1-70b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: hellaswag_meta-llama3-1-70b_question_answering
version: 2.22.07
display_name: hellaswag_Meta-Llama3-1-70B_question_answering
description: Meta-Llama-3.1-70B run for hellaswag dataset
dataset_family: hellaswag
dataset_name: hellaswag
model_name: Meta-Llama-3.1-70B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.907986457
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "validation"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "train"
```
- assets/evaluation_results/hellaswag_meta-llama3-1-8b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/hellaswag_meta-llama3-1-8b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: hellaswag_meta-llama3-1-8b_question_answering
version: 2.22.07
display_name: hellaswag_Meta-Llama3-1-8B_question_answering
description: Meta-Llama-3.1-8B run for hellaswag dataset
dataset_family: hellaswag
dataset_name: hellaswag
model_name: Meta-Llama-3.1-8B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.768472416
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "validation"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "train"
```
- assets/evaluation_results/human_eval_meta-llama3-1-405b_text_generation/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/human_eval_meta-llama3-1-405b_text_generation/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: human_eval_meta-llama3-1-405b_text_generation
version: 2.22.07
display_name: human_eval_Meta-Llama3-1-405B_text_generation
description: Meta-Llama-3.1-405B run for human_eval dataset
dataset_family: human_eval
dataset_name: human_eval
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: text-generation
  accuracy_metric_name: pass@1
metrics:
  accuracy: 0.853658537
properties:
  n_shot: 0
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: None
  fewshot_split: "None"
```
- assets/evaluation_results/human_eval_meta-llama3-1-70b_text_generation/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/human_eval_meta-llama3-1-70b_text_generation/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: human_eval_meta-llama3-1-70b_text_generation
version: 2.22.07
display_name: human_eval_Meta-Llama3-1-70B_text_generation
description: Meta-Llama-3.1-70B run for human_eval dataset
dataset_family: human_eval
dataset_name: human_eval
model_name: Meta-Llama-3.1-70B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
tags:
  evaluation_type: text_generation
  task: text-generation
  accuracy_metric_name: pass@1
metrics:
  accuracy: 0.792682927
properties:
  n_shot: 0
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: None
  fewshot_split: "None"
```
- assets/evaluation_results/human_eval_meta-llama3-1-8b_text_generation/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/human_eval_meta-llama3-1-8b_text_generation/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: human_eval_meta-llama3-1-8b_text_generation
version: 2.22.07
display_name: human_eval_Meta-Llama3-1-8B_text_generation
description: Meta-Llama-3.1-8B run for human_eval dataset
dataset_family: human_eval
dataset_name: human_eval
model_name: Meta-Llama-3.1-8B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
tags:
  evaluation_type: text_generation
  task: text-generation
  accuracy_metric_name: pass@1
metrics:
  accuracy: 0.682926829
properties:
  n_shot: 0
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: None
  fewshot_split: "None"
```
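The human_eval results are reported as `pass@1` rather than `exact_match`. Below is a minimal sketch of the unbiased pass@k estimator from the Codex paper (Chen et al., 2021). For a problem where n samples were generated and c of them pass the unit tests, pass@k = 1 − C(n−c, k) / C(n, k), averaged over all problems. The specs do not say whether these runs used this estimator or how many samples they drew. With a single sample per problem (n = 1), the estimator reduces to the fraction of problems solved. `pass_at_k` and `mean_pass_at_k` are illustrative names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n samples, c of which pass the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcome: one sample per problem, 140 of 164 problems passing.
    outcomes = [(1, 1)] * 140 + [(1, 0)] * 24
    print(f"pass@1 = {mean_pass_at_k(outcomes, 1):.9f}")
```

This prints `pass@1 = 0.853658537`, the value in the 405B spec, under the assumption of the full 164-problem HumanEval set and one sample per problem.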
- assets/evaluation_results/mmlu_humanities_meta-llama3-1-405b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_humanities_meta-llama3-1-405b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_humanities_meta-llama3-1-405b_question_answering
version: 2.22.07
display_name: mmlu_humanities_Meta-Llama3-1-405B_question_answering
description: Meta-Llama-3.1-405B run for mmlu_humanities dataset
dataset_family: mmlu
dataset_name: mmlu_humanities
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.817853348
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/mmlu_humanities_meta-llama3-1-70b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_humanities_meta-llama3-1-70b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_humanities_meta-llama3-1-70b_question_answering
version: 2.22.07
display_name: mmlu_humanities_Meta-Llama3-1-70B_question_answering
description: Meta-Llama-3.1-70B run for mmlu_humanities dataset
dataset_family: mmlu
dataset_name: mmlu_humanities
model_name: Meta-Llama-3.1-70B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.794686504
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/mmlu_humanities_meta-llama3-1-8b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_humanities_meta-llama3-1-8b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_humanities_meta-llama3-1-8b_question_answering
version: 2.22.07
display_name: mmlu_humanities_Meta-Llama3-1-8B_question_answering
description: Meta-Llama-3.1-8B run for mmlu_humanities dataset
dataset_family: mmlu
dataset_name: mmlu_humanities
model_name: Meta-Llama-3.1-8B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.618916047
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/mmlu_other_meta-llama3-1-405b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_other_meta-llama3-1-405b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_other_meta-llama3-1-405b_question_answering
version: 2.22.07
display_name: mmlu_other_Meta-Llama3-1-405B_question_answering
description: Meta-Llama-3.1-405B run for mmlu_other dataset
dataset_family: mmlu
dataset_name: mmlu_other
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.874798841
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/mmlu_other_meta-llama3-1-70b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_other_meta-llama3-1-70b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_other_meta-llama3-1-70b_question_answering
version: 2.22.07
display_name: mmlu_other_Meta-Llama3-1-70B_question_answering
description: Meta-Llama-3.1-70B run for mmlu_other dataset
dataset_family: mmlu
dataset_name: mmlu_other
model_name: Meta-Llama-3.1-70B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.85226907
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/mmlu_other_meta-llama3-1-8b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_other_meta-llama3-1-8b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_other_meta-llama3-1-8b_question_answering
version: 2.22.07
display_name: mmlu_other_Meta-Llama3-1-8B_question_answering
description: Meta-Llama-3.1-8B run for mmlu_other dataset
dataset_family: mmlu
dataset_name: mmlu_other
model_name: Meta-Llama-3.1-8B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.74026392
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- .../evaluation_results/mmlu_social_sciences_meta-llama3-1-405b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- ...s/evaluation_results/mmlu_social_sciences_meta-llama3-1-405b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_social_sciences_meta-llama3-1-405b_question_answering
version: 2.22.07
display_name: mmlu_social_sciences_Meta-Llama3-1-405B_question_answering
description: Meta-Llama-3.1-405B run for mmlu_social_sciences dataset
dataset_family: mmlu
dataset_name: mmlu_social_sciences
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.897627559
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- ...s/evaluation_results/mmlu_social_sciences_meta-llama3-1-70b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- ...ts/evaluation_results/mmlu_social_sciences_meta-llama3-1-70b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_social_sciences_meta-llama3-1-70b_question_answering
version: 2.22.07
display_name: mmlu_social_sciences_Meta-Llama3-1-70B_question_answering
description: Meta-Llama-3.1-70B run for mmlu_social_sciences dataset
dataset_family: mmlu
dataset_name: mmlu_social_sciences
model_name: Meta-Llama-3.1-70B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.877803055
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- ...ts/evaluation_results/mmlu_social_sciences_meta-llama3-1-8b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_social_sciences_meta-llama3-1-8b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_social_sciences_meta-llama3-1-8b_question_answering
version: 2.22.07
display_name: mmlu_social_sciences_Meta-Llama3-1-8B_question_answering
description: Meta-Llama-3.1-8B run for mmlu_social_sciences dataset
dataset_family: mmlu
dataset_name: mmlu_social_sciences
model_name: Meta-Llama-3.1-8B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.76080598
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
- assets/evaluation_results/mmlu_stem_meta-llama3-1-405b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- assets/evaluation_results/mmlu_stem_meta-llama3-1-405b_question_answering/spec.yaml (new file, +31 −0)
```yaml
type: evaluationresult
name: mmlu_stem_meta-llama3-1-405b_question_answering
version: 2.22.07
display_name: mmlu_stem_Meta-Llama3-1-405B_question_answering
description: Meta-Llama-3.1-405B run for mmlu_stem dataset
dataset_family: mmlu
dataset_name: mmlu_stem
model_name: Meta-Llama-3.1-405B
model_version: "1"
model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
relationships:
  - relationshipType: Source
    assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
tags:
  evaluation_type: text_generation
  task: question-answering
  accuracy_metric_name: exact_match
metrics:
  accuracy: 0.830954646
properties:
  n_shot: 5
  evaluation_sampling_ratio: 1.0
  evaluation_split: "test"
  fewshot_sampling_ratio: 1.0
  fewshot_split: "dev"
```
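MMLU is reported here as four separate category groups (`mmlu_humanities`, `mmlu_other`, `mmlu_social_sciences`, `mmlu_stem`), so any single MMLU figure has to be derived from them. Below is a sketch of two common ways to combine the groups: a question-weighted (micro) average and an unweighted (macro) mean of the four accuracies. The per-group question counts in the code are assumptions taken from the public MMLU test split (14,042 questions in total). The helpers are illustrative and not part of the asset specs.

```python
# Assumption: question counts of the four MMLU category groups in the test split
# (14,042 questions in total).
MMLU_TEST_SIZES = {
    "mmlu_humanities": 4705,
    "mmlu_other": 3107,
    "mmlu_social_sciences": 3077,
    "mmlu_stem": 3153,
}

# Per-group accuracies copied from the Meta-Llama-3.1-405B spec.yaml files above.
LLAMA_3_1_405B_MMLU = {
    "mmlu_humanities": 0.817853348,
    "mmlu_other": 0.874798841,
    "mmlu_social_sciences": 0.897627559,
    "mmlu_stem": 0.830954646,
}


def micro_average(accuracies: dict[str, float], sizes: dict[str, int]) -> float:
    """Question-weighted average: total correct answers / total questions."""
    correct = sum(acc * sizes[group] for group, acc in accuracies.items())
    total = sum(sizes[group] for group in accuracies)
    return correct / total


def macro_average(accuracies: dict[str, float]) -> float:
    """Unweighted mean of the per-group accuracies."""
    return sum(accuracies.values()) / len(accuracies)


if __name__ == "__main__":
    print(f"micro: {micro_average(LLAMA_3_1_405B_MMLU, MMLU_TEST_SIZES):.4f}")
    print(f"macro: {macro_average(LLAMA_3_1_405B_MMLU):.4f}")
```

For this model the micro average comes out slightly below the macro mean. The reason is that the humanities group has both the most questions and the lowest accuracy of the four. Neither combined number is reported in the PR itself.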
- assets/evaluation_results/mmlu_stem_meta-llama3-1-70b_question_answering/asset.yaml (new file, +3 −0)
```yaml
type: evaluationresult
spec: spec.yaml
categories: ["EvaluationResult"]
```
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/mmlu_stem_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: mmlu_stem_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: mmlu_stem_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for mmlu_stem dataset
- dataset_family: mmlu
- dataset_name: mmlu_stem
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.771328893
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "test"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/mmlu_stem_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/mmlu_stem_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: mmlu_stem_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: mmlu_stem_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for mmlu_stem dataset
- dataset_family: mmlu
- dataset_name: mmlu_stem
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.594988899
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "test"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/openbookqa_meta-llama3-1-405b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/openbookqa_meta-llama3-1-405b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: openbookqa_meta-llama3-1-405b_question_answering
- version: 2.22.07
- display_name: openbookqa_Meta-Llama3-1-405B_question_answering
- description: Meta-Llama3-1-405B run for openbookqa dataset
- dataset_family: openbookqa
- dataset_name: openbookqa
- model_name: Meta-Llama-3.1-405B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.908
- properties:
- n_shot: 10
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/openbookqa_meta-llama3-1-70b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/openbookqa_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: openbookqa_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: openbookqa_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for openbookqa dataset
- dataset_family: openbookqa
- dataset_name: openbookqa
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.936
- properties:
- n_shot: 10
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/openbookqa_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/openbookqa_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: openbookqa_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: openbookqa_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for openbookqa dataset
- dataset_family: openbookqa
- dataset_name: openbookqa
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.852
- properties:
- n_shot: 10
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/piqa_meta-llama3-1-405b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/piqa_meta-llama3-1-405b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: piqa_meta-llama3-1-405b_question_answering
- version: 2.22.07
- display_name: piqa_Meta-Llama3-1-405B_question_answering
- description: Meta-Llama3-1-405B run for piqa dataset
- dataset_family: piqa
- dataset_name: piqa
- model_name: Meta-Llama-3.1-405B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.874319913
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 0.3
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/piqa_meta-llama3-1-70b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/piqa_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: piqa_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: piqa_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for piqa dataset
- dataset_family: piqa
- dataset_name: piqa
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.861806311
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 0.3
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/piqa_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/piqa_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: piqa_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: piqa_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for piqa dataset
- dataset_family: piqa
- dataset_name: piqa
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.800870511
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 0.3
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/social_iqa_meta-llama3-1-405b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/social_iqa_meta-llama3-1-405b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: social_iqa_meta-llama3-1-405b_question_answering
- version: 2.22.07
- display_name: social_iqa_Meta-Llama3-1-405B_question_answering
- description: Meta-Llama3-1-405B run for social_iqa dataset
- dataset_family: social_iqa
- dataset_name: social_iqa
- model_name: Meta-Llama-3.1-405B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.796827021
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 0.3
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/social_iqa_meta-llama3-1-70b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/social_iqa_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: social_iqa_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: social_iqa_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for social_iqa dataset
- dataset_family: social_iqa
- dataset_name: social_iqa
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.812691914
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 0.3
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/social_iqa_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/social_iqa_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: social_iqa_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: social_iqa_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for social_iqa dataset
- dataset_family: social_iqa
- dataset_name: social_iqa
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.734390993
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 0.3
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/squad_v2_meta-llama3-1-405b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 33 changes: 33 additions & 0 deletions
- assets/evaluation_results/squad_v2_meta-llama3-1-405b_question_answering/spec.yaml
- @@ -0,0 +1,33 @@
- type: evaluationresult
- name: squad_v2_meta-llama3-1-405b_question_answering
- version: 2.22.07
- display_name: squad_v2_Meta-Llama3-1-405B_question_answering
- description: Meta-Llama3-1-405B run for squad_v2 dataset
- dataset_family: squad_v2
- dataset_name: squad_v2
- model_name: Meta-Llama-3.1-405B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: nan
- metrics:
- groundedness: 3.762426285
- relevance: 4.085930918
- GPTSimilarity: 3.082561078
- properties:
- n_shot: 2
- evaluation_sampling_ratio: 0.2
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/squad_v2_meta-llama3-1-70b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 33 changes: 33 additions & 0 deletions
- assets/evaluation_results/squad_v2_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,33 @@
- type: evaluationresult
- name: squad_v2_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: squad_v2_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for squad_v2 dataset
- dataset_family: squad_v2
- dataset_name: squad_v2
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: nan
- metrics:
- groundedness: 3.65206402695871
- relevance: 3.91280539174389
- GPTSimilarity: 3.02864363942712
- properties:
- n_shot: 2
- evaluation_sampling_ratio: 0.2
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/squad_v2_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 33 changes: 33 additions & 0 deletions
- assets/evaluation_results/squad_v2_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,33 @@
- type: evaluationresult
- name: squad_v2_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: squad_v2_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for squad_v2 dataset
- dataset_family: squad_v2
- dataset_name: squad_v2
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: nan
- metrics:
- groundedness: 3.96545914069081
- relevance: 4.09317032040472
- GPTSimilarity: 3.01727042965459
- properties:
- n_shot: 2
- evaluation_sampling_ratio: 0.2
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- ...evaluation_results/truthfulqa_generation_meta-llama3-1-405b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 33 changes: 33 additions & 0 deletions
- .../evaluation_results/truthfulqa_generation_meta-llama3-1-405b_question_answering/spec.yaml
- @@ -0,0 +1,33 @@
- type: evaluationresult
- name: truthfulqa_generation_meta-llama3-1-405b_question_answering
- version: 2.22.07
- display_name: truthfulqa_generation_Meta-Llama3-1-405B_question_answering
- description: Meta-Llama3-1-405B run for truthfulqa_generation dataset
- dataset_family: truthfulqa_generation
- dataset_name: truthfulqa_generation
- model_name: Meta-Llama-3.1-405B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: nan
- metrics:
- coherence: 4.88372093
- fluency: 4.729498164
- GPTSimilarity: 3.088127295
- properties:
- n_shot: 6
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- .../evaluation_results/truthfulqa_generation_meta-llama3-1-70b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 33 changes: 33 additions & 0 deletions
- ...s/evaluation_results/truthfulqa_generation_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,33 @@
- type: evaluationresult
- name: truthfulqa_generation_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: truthfulqa_generation_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for truthfulqa_generation dataset
- dataset_family: truthfulqa
- dataset_name: truthfulqa_generation
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: nan
- metrics:
- coherence: 4.86658506731946
- fluency: 4.7172582619339
- GPTSimilarity: 2.96328029375765
- properties:
- n_shot: 6
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- ...s/evaluation_results/truthfulqa_generation_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 33 changes: 33 additions & 0 deletions
- ...ts/evaluation_results/truthfulqa_generation_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,33 @@
- type: evaluationresult
- name: truthfulqa_generation_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: truthfulqa_generation_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for truthfulqa_generation dataset
- dataset_family: truthfulqa
- dataset_name: truthfulqa_generation
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: nan
- metrics:
- coherence: 4.80048959608323
- fluency: 4.59730722154222
- GPTSimilarity: 2.59975520195838
- properties:
- n_shot: 6
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-405b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-405b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: truthfulqa_mc1_meta-llama3-1-405b_question_answering
- version: 2.22.07
- display_name: truthfulqa_mc1_Meta-Llama3-1-405B_question_answering
- description: Meta-Llama3-1-405B run for truthfulqa_mc1 dataset
- dataset_family: truthfulqa_mc1
- dataset_name: truthfulqa_mc1
- model_name: Meta-Llama-3.1-405B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.800489596
- properties:
- n_shot: 6
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-70b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: truthfulqa_mc1_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: truthfulqa_mc1_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for truthfulqa_mc1 dataset
- dataset_family: truthfulqa
- dataset_name: truthfulqa_mc1
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.768665851
- properties:
- n_shot: 6
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: truthfulqa_mc1_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: truthfulqa_mc1_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for truthfulqa_mc1 dataset
- dataset_family: truthfulqa
- dataset_name: truthfulqa_mc1
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.605875153
- properties:
- n_shot: 6
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "dev"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/winogrande_meta-llama3-1-405b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/winogrande_meta-llama3-1-405b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: winogrande_meta-llama3-1-405b_question_answering
- version: 2.22.07
- display_name: winogrande_Meta-Llama3-1-405B_question_answering
- description: Meta-Llama3-1-405B run for winogrande dataset
- dataset_family: winogrande
- dataset_name: winogrande
- model_name: Meta-Llama-3.1-405B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.867403315
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/winogrande_meta-llama3-1-70b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/winogrande_meta-llama3-1-70b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: winogrande_meta-llama3-1-70b_question_answering
- version: 2.22.07
- display_name: winogrande_Meta-Llama3-1-70B_question_answering
- description: Meta-Llama-3.1-70B run for winogrande dataset
- dataset_family: winogrande
- dataset_name: winogrande
- model_name: Meta-Llama-3.1-70B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.844514601
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "train"
- 3 changes: 3 additions & 0 deletions
- assets/evaluation_results/winogrande_meta-llama3-1-8b_question_answering/asset.yaml
- @@ -0,0 +1,3 @@
- type: evaluationresult
- spec: spec.yaml
- categories: ["EvaluationResult"]
- 31 changes: 31 additions & 0 deletions
- assets/evaluation_results/winogrande_meta-llama3-1-8b_question_answering/spec.yaml
- @@ -0,0 +1,31 @@
- type: evaluationresult
- name: winogrande_meta-llama3-1-8b_question_answering
- version: 2.22.07
- display_name: winogrande_Meta-Llama3-1-8B_question_answering
- description: Meta-Llama-3.1-8B run for winogrande dataset
- dataset_family: winogrande
- dataset_name: winogrande
- model_name: Meta-Llama-3.1-8B
- model_version: "1"
- model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- relationships:
- - relationshipType: Source
- assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
- tags:
- evaluation_type: text_generation
- task: question-answering
- accuracy_metric_name: exact_match
- metrics:
- accuracy: 0.649565904
- properties:
- n_shot: 5
- evaluation_sampling_ratio: 1.0
- evaluation_split: "validation"
- fewshot_sampling_ratio: 1.0
- fewshot_split: "train"
- © 2024 GitHub, Inc.
- Create Llama3.1 assets for 8B/70B/405B by SamGos93 · Pull Request #3180 · Azure/azureml-assets · GitHub