Untitled

a guest
Jul 22nd, 2024
  1. assets/evaluation_results/human_eval_meta-llama3-1-405b_text_generation/spec.yaml
  2. Original file line number Diff line number Diff line change
  3. @@ -0,0 +1,31 @@
  4. type: evaluationresult
  5. name: human_eval_meta-llama3-1-405b_text_generation
  6. version: 2.22.07
  7. display_name: human_eval_Meta-Llama3-1-405B_text_generation
  8. description: Meta-Llama3-1-405B run for human_eval dataset
  9. dataset_family: human_eval
  10. dataset_name: human_eval
  11.  
  12. model_name: Meta-Llama-3.1-405B
  13. model_version: "1"
  14. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  15.  
  16. relationships:
  17.   - relationshipType: Source
  18.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  19. 
  20. tags:
  21.   evaluation_type: text_generation
  22.   task: text-generation
  23.   accuracy_metric_name: pass@1
  24. 
  25. metrics:
  26.   accuracy: 0.853658537
  27. 
  28. 
  29. properties:
  30.   n_shot: 0
  31.   evaluation_sampling_ratio: 1.0
  32.   evaluation_split: "test"
  33.   fewshot_sampling_ratio: None
  34.   fewshot_split: "None"
  35.  
  36.  
  37. -----
  38.  
  39. assets/evaluation_results/gsm8k_meta-llama3-1-405b_question_answering/spec.yaml
  41. @@ -0,0 +1,31 @@
  42. type: evaluationresult
  43. name: gsm8k_meta-llama3-1-405b_question_answering
  44. version: 2.22.07
  45. display_name: gsm8k_Meta-Llama3-1-405B_question_answering
  46. description: Meta-Llama3-1-405B run for gsm8k dataset
  47. dataset_family: gsm8k
  48. dataset_name: gsm8k
  49.  
  50. model_name: Meta-Llama-3.1-405B
  51. model_version: "1"
  52. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  53.  
  54. relationships:
  55.   - relationshipType: Source
  56.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  57. 
  58. tags:
  59.   evaluation_type: text_generation
  60.   task: question-answering
  61.   accuracy_metric_name: exact_match
  62. 
  63. metrics:
  64.   accuracy: 0.968157695
  65. 
  66. 
  67. properties:
  68.   n_shot: 8
  69.   evaluation_sampling_ratio: 1.0
  70.   evaluation_split: "test"
  71.   fewshot_sampling_ratio: 1.0
  72.   fewshot_split: "dev"
  73.  
  74.  
  165. Full GitHub pull request:
  166.  
  167.  
  172. Azure/azureml-assets (Public)
  175.  
  185. Create Llama3.1 assets for 8B/70B/405B #3180
  186. Open
  187. SamGos93 wants to merge 1 commit into main from sagoswami/llama3_1_assets
  188. +1,542 −0
  189. Conversation 0
  190. Commits 1
  191. Checks 15
  192. Files changed 90
  197.  
  198. 3 changes: 3 additions & 0 deletions 3
  199. assets/evaluation_results/boolq_meta-llama3-1-405b_question_answering/asset.yaml
  201. @@ -0,0 +1,3 @@
  202. type: evaluationresult
  203. spec: spec.yaml
  204. categories: ["EvaluationResult"]
  205. 31 changes: 31 additions & 0 deletions 31
  206. assets/evaluation_results/boolq_meta-llama3-1-405b_question_answering/spec.yaml
  208. @@ -0,0 +1,31 @@
  209. type: evaluationresult
  210. name: boolq_meta-llama3-1-405b_question_answering
  211. version: 2.22.07
  212. display_name: boolq_Meta-Llama3-1-405B_question_answering
  213. description: Meta-Llama3-1-405B run for boolq dataset
  214. dataset_family: boolq
  215. dataset_name: boolq
  216.  
  217. model_name: Meta-Llama-3.1-405B
  218. model_version: "1"
  219. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  220.  
  221. relationships:
  222.   - relationshipType: Source
  223.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  224. 
  225. tags:
  226.   evaluation_type: text_generation
  227.   task: question-answering
  228.   accuracy_metric_name: exact_match
  229. 
  230. metrics:
  231.   accuracy: 0.921406728
  232. 
  233. 
  234. properties:
  235.   n_shot: 5
  236.   evaluation_sampling_ratio: 1.0
  237.   evaluation_split: "validation"
  238.   fewshot_sampling_ratio: 1.0
  239.   fewshot_split: "train"
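
The asset names and file paths in this diff appear to follow a `<dataset>_<model slug>_<task>` pattern. The following is a minimal Python sketch of that pattern, inferred from the names in the diff rather than from any documented rule. `asset_name` and `spec_path` are hypothetical helpers, and the model slug is passed in unchanged because the diff does not say how it is derived from `model_name`.

```python
def asset_name(dataset_name: str, model_slug: str, task: str) -> str:
    """Join the parts with underscores; the task tag uses hyphens, the name underscores."""
    return f"{dataset_name}_{model_slug}_{task.replace('-', '_')}"


def spec_path(name: str) -> str:
    """Location of the spec file for an evaluation-result asset, as seen in this diff."""
    return f"assets/evaluation_results/{name}/spec.yaml"


# Values taken from the boolq 405B spec in this diff.
name = asset_name("boolq", "meta-llama3-1-405b", "question-answering")
print(name)
print(spec_path(name))
```

A helper like this could be used to check that each spec's `name` field agrees with the directory it lives in.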
  240. 3 changes: 3 additions & 0 deletions 3
  241. assets/evaluation_results/boolq_meta-llama3-1-70b_question_answering/asset.yaml
  243. @@ -0,0 +1,3 @@
  244. type: evaluationresult
  245. spec: spec.yaml
  246. categories: ["EvaluationResult"]
  247. 31 changes: 31 additions & 0 deletions 31
  248. assets/evaluation_results/boolq_meta-llama3-1-70b_question_answering/spec.yaml
  250. @@ -0,0 +1,31 @@
  251. type: evaluationresult
  252. name: boolq_meta-llama3-1-70b_question_answering
  253. version: 2.22.07
  254. display_name: boolq_Meta-Llama3-1-70B_question_answering
  255. description: Meta-Llama-3.1-70B run for boolq dataset
  256. dataset_family: boolq
  257. dataset_name: boolq
  258.  
  259. model_name: Meta-Llama-3.1-70B
  260. model_version: "1"
  261. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  262.  
  263. relationships:
  264.   - relationshipType: Source
  265.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  266. 
  267. tags:
  268.   evaluation_type: text_generation
  269.   task: question-answering
  270.   accuracy_metric_name: exact_match
  271. 
  272. metrics:
  273.   accuracy: 0.908868502
  274. 
  275. 
  276. properties:
  277.   n_shot: 5
  278.   evaluation_sampling_ratio: 1.0
  279.   evaluation_split: "validation"
  280.   fewshot_sampling_ratio: 1.0
  281.   fewshot_split: "train"
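
Each spec repeats the model identity in three places: `model_name`, `model_version`, and the `azureml://` asset ID (once in `model_asset_id` and once in the `Source` relationship). The sketch below shows one way to cross-check them. It assumes the ID layout seen in this diff (`registries/<registry>/models/<model>/versions/<version>`); `parse_model_asset_id` and `is_consistent` are hypothetical names, not part of any Azure ML SDK.

```python
import re
from typing import NamedTuple


class ModelAssetId(NamedTuple):
    registry: str
    model: str
    version: str


# Pattern inferred from the asset IDs in this diff; other ID shapes are not handled.
_ASSET_ID = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)"
    r"/models/(?P<model>[^/]+)/versions/(?P<version>[^/]+)$"
)


def parse_model_asset_id(asset_id: str) -> ModelAssetId:
    """Split an azureml:// model asset ID into registry, model and version."""
    match = _ASSET_ID.match(asset_id)
    if match is None:
        raise ValueError(f"unrecognised asset id: {asset_id!r}")
    return ModelAssetId(match["registry"], match["model"], match["version"])


def is_consistent(spec: dict) -> bool:
    """Check that the name, version and Source relationship all agree with the asset ID."""
    parsed = parse_model_asset_id(spec["model_asset_id"])
    sources = [
        rel["assetId"]
        for rel in spec.get("relationships", [])
        if rel.get("relationshipType") == "Source"
    ]
    return (
        parsed.model == spec["model_name"]
        and parsed.version == str(spec["model_version"])
        and sources == [spec["model_asset_id"]]
    )


# Fields copied from the boolq 70B spec in this diff.
spec = {
    "model_name": "Meta-Llama-3.1-70B",
    "model_version": "1",
    "model_asset_id": "azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1",
    "relationships": [
        {
            "relationshipType": "Source",
            "assetId": "azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1",
        }
    ],
}
print(parse_model_asset_id(spec["model_asset_id"]), is_consistent(spec))
```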
  282. 3 changes: 3 additions & 0 deletions 3
  283. assets/evaluation_results/boolq_meta-llama3-1-8b_question_answering/asset.yaml
  285. @@ -0,0 +1,3 @@
  286. type: evaluationresult
  287. spec: spec.yaml
  288. categories: ["EvaluationResult"]
  289. 31 changes: 31 additions & 0 deletions 31
  290. assets/evaluation_results/boolq_meta-llama3-1-8b_question_answering/spec.yaml
  292. @@ -0,0 +1,31 @@
  293. type: evaluationresult
  294. name: boolq_meta-llama3-1-8b_question_answering
  295. version: 2.22.07
  296. display_name: boolq_Meta-Llama3-1-8B_question_answering
  297. description: Meta-Llama-3.1-8B run for boolq dataset
  298. dataset_family: boolq
  299. dataset_name: boolq
  300.  
  301. model_name: Meta-Llama-3.1-8B
  302. model_version: "1"
  303. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  304.  
  305. relationships:
  306.   - relationshipType: Source
  307.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  308. 
  309. tags:
  310.   evaluation_type: text_generation
  311.   task: question-answering
  312.   accuracy_metric_name: exact_match
  313. 
  314. metrics:
  315.   accuracy: 0.870642202
  316. 
  317. 
  318. properties:
  319.   n_shot: 5
  320.   evaluation_sampling_ratio: 1.0
  321.   evaluation_split: "validation"
  322.   fewshot_sampling_ratio: 1.0
  323.   fewshot_split: "train"
  324. 3 changes: 3 additions & 0 deletions 3
  325. assets/evaluation_results/gsm8k_meta-llama3-1-405b_question_answering/asset.yaml
  327. @@ -0,0 +1,3 @@
  328. type: evaluationresult
  329. spec: spec.yaml
  330. categories: ["EvaluationResult"]
  331. 31 changes: 31 additions & 0 deletions 31
  332. assets/evaluation_results/gsm8k_meta-llama3-1-405b_question_answering/spec.yaml
  334. @@ -0,0 +1,31 @@
  335. type: evaluationresult
  336. name: gsm8k_meta-llama3-1-405b_question_answering
  337. version: 2.22.07
  338. display_name: gsm8k_Meta-Llama3-1-405B_question_answering
  339. description: Meta-Llama3-1-405B run for gsm8k dataset
  340. dataset_family: gsm8k
  341. dataset_name: gsm8k
  342.  
  343. model_name: Meta-Llama-3.1-405B
  344. model_version: "1"
  345. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  346.  
  347. relationships:
  348.   - relationshipType: Source
  349.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  350. 
  351. tags:
  352.   evaluation_type: text_generation
  353.   task: question-answering
  354.   accuracy_metric_name: exact_match
  355. 
  356. metrics:
  357.   accuracy: 0.968157695
  358. 
  359. 
  360. properties:
  361.   n_shot: 8
  362.   evaluation_sampling_ratio: 1.0
  363.   evaluation_split: "test"
  364.   fewshot_sampling_ratio: 1.0
  365.   fewshot_split: "dev"
  366. 3 changes: 3 additions & 0 deletions 3
  367. assets/evaluation_results/gsm8k_meta-llama3-1-70b_question_answering/asset.yaml
  369. @@ -0,0 +1,3 @@
  370. type: evaluationresult
  371. spec: spec.yaml
  372. categories: ["EvaluationResult"]
  373. 31 changes: 31 additions & 0 deletions 31
  374. assets/evaluation_results/gsm8k_meta-llama3-1-70b_question_answering/spec.yaml
  376. @@ -0,0 +1,31 @@
  377. type: evaluationresult
  378. name: gsm8k_meta-llama3-1-70b_question_answering
  379. version: 2.22.07
  380. display_name: gsm8k_Meta-Llama3-1-70B_question_answering
  381. description: Meta-Llama-3.1-70B run for gsm8k dataset
  382. dataset_family: gsm8k
  383. dataset_name: gsm8k
  384.  
  385. model_name: Meta-Llama-3.1-70B
  386. model_version: "1"
  387. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  388.  
  389. relationships:
  390.   - relationshipType: Source
  391.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  392. 
  393. tags:
  394.   evaluation_type: text_generation
  395.   task: question-answering
  396.   accuracy_metric_name: exact_match
  397. 
  398. metrics:
  399.   accuracy: 0.948445792
  400. 
  401. 
  402. properties:
  403.   n_shot: 8
  404.   evaluation_sampling_ratio: 1.0
  405.   evaluation_split: "test"
  406.   fewshot_sampling_ratio: 1.0
  407.   fewshot_split: "dev"
  408. 3 changes: 3 additions & 0 deletions 3
  409. assets/evaluation_results/gsm8k_meta-llama3-1-8b_question_answering/asset.yaml
  411. @@ -0,0 +1,3 @@
  412. type: evaluationresult
  413. spec: spec.yaml
  414. categories: ["EvaluationResult"]
  415. 31 changes: 31 additions & 0 deletions 31
  416. assets/evaluation_results/gsm8k_meta-llama3-1-8b_question_answering/spec.yaml
  418. @@ -0,0 +1,31 @@
  419. type: evaluationresult
  420. name: gsm8k_meta-llama3-1-8b_question_answering
  421. version: 2.22.07
  422. display_name: gsm8k_Meta-Llama3-1-8B_question_answering
  423. description: Meta-Llama-3.1-8B run for gsm8k dataset
  424. dataset_family: gsm8k
  425. dataset_name: gsm8k
  426.  
  427. model_name: Meta-Llama-3.1-8B
  428. model_version: "1"
  429. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  430.  
  431. relationships:
  432.   - relationshipType: Source
  433.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  434. 
  435. tags:
  436.   evaluation_type: text_generation
  437.   task: question-answering
  438.   accuracy_metric_name: exact_match
  439. 
  440. metrics:
  441.   accuracy: 0.843821077
  442. 
  443. 
  444. properties:
  445.   n_shot: 8
  446.   evaluation_sampling_ratio: 1.0
  447.   evaluation_split: "test"
  448.   fewshot_sampling_ratio: 1.0
  449.   fewshot_split: "dev"
  450. 3 changes: 3 additions & 0 deletions 3
  451. assets/evaluation_results/hellaswag_meta-llama3-1-405b_question_answering/asset.yaml
  453. @@ -0,0 +1,3 @@
  454. type: evaluationresult
  455. spec: spec.yaml
  456. categories: ["EvaluationResult"]
  457. 31 changes: 31 additions & 0 deletions 31
  458. assets/evaluation_results/hellaswag_meta-llama3-1-405b_question_answering/spec.yaml
  460. @@ -0,0 +1,31 @@
  461. type: evaluationresult
  462. name: hellaswag_meta-llama3-1-405b_question_answering
  463. version: 2.22.07
  464. display_name: hellaswag_Meta-Llama3-1-405B_question_answering
  465. description: Meta-Llama3-1-405B run for hellaswag dataset
  466. dataset_family: hellaswag
  467. dataset_name: hellaswag
  468.  
  469. model_name: Meta-Llama-3.1-405B
  470. model_version: "1"
  471. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  472.  
  473. relationships:
  474.   - relationshipType: Source
  475.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  476. 
  477. tags:
  478.   evaluation_type: text_generation
  479.   task: question-answering
  480.   accuracy_metric_name: exact_match
  481. 
  482. metrics:
  483.   accuracy: 0.919637522
  484. 
  485. 
  486. properties:
  487.   n_shot: 5
  488.   evaluation_sampling_ratio: 1.0
  489.   evaluation_split: "validation"
  490.   fewshot_sampling_ratio: 1.0
  491.   fewshot_split: "train"
  492. 3 changes: 3 additions & 0 deletions 3
  493. assets/evaluation_results/hellaswag_meta-llama3-1-70b_question_answering/asset.yaml
  495. @@ -0,0 +1,3 @@
  496. type: evaluationresult
  497. spec: spec.yaml
  498. categories: ["EvaluationResult"]
  499. 31 changes: 31 additions & 0 deletions 31
  500. assets/evaluation_results/hellaswag_meta-llama3-1-70b_question_answering/spec.yaml
  502. @@ -0,0 +1,31 @@
  503. type: evaluationresult
  504. name: hellaswag_meta-llama3-1-70b_question_answering
  505. version: 2.22.07
  506. display_name: hellaswag_Meta-Llama3-1-70B_question_answering
  507. description: Meta-Llama-3.1-70B run for hellaswag dataset
  508. dataset_family: hellaswag
  509. dataset_name: hellaswag
  510.  
  511. model_name: Meta-Llama-3.1-70B
  512. model_version: "1"
  513. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  514.  
  515. relationships:
  516.   - relationshipType: Source
  517.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  518. 
  519. tags:
  520.   evaluation_type: text_generation
  521.   task: question-answering
  522.   accuracy_metric_name: exact_match
  523. 
  524. metrics:
  525.   accuracy: 0.907986457
  526. 
  527. 
  528. properties:
  529.   n_shot: 5
  530.   evaluation_sampling_ratio: 1.0
  531.   evaluation_split: "validation"
  532.   fewshot_sampling_ratio: 1.0
  533.   fewshot_split: "train"
  534. 3 changes: 3 additions & 0 deletions 3
  535. assets/evaluation_results/hellaswag_meta-llama3-1-8b_question_answering/asset.yaml
  537. @@ -0,0 +1,3 @@
  538. type: evaluationresult
  539. spec: spec.yaml
  540. categories: ["EvaluationResult"]
  541. 31 changes: 31 additions & 0 deletions 31
  542. assets/evaluation_results/hellaswag_meta-llama3-1-8b_question_answering/spec.yaml
  544. @@ -0,0 +1,31 @@
  545. type: evaluationresult
  546. name: hellaswag_meta-llama3-1-8b_question_answering
  547. version: 2.22.07
  548. display_name: hellaswag_Meta-Llama3-1-8B_question_answering
  549. description: Meta-Llama-3.1-8B run for hellaswag dataset
  550. dataset_family: hellaswag
  551. dataset_name: hellaswag
  552.  
  553. model_name: Meta-Llama-3.1-8B
  554. model_version: "1"
  555. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  556.  
  557. relationships:
  558.   - relationshipType: Source
  559.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  560. 
  561. tags:
  562.   evaluation_type: text_generation
  563.   task: question-answering
  564.   accuracy_metric_name: exact_match
  565. 
  566. metrics:
  567.   accuracy: 0.768472416
  568. 
  569. 
  570. properties:
  571.   n_shot: 5
  572.   evaluation_sampling_ratio: 1.0
  573.   evaluation_split: "validation"
  574.   fewshot_sampling_ratio: 1.0
  575.   fewshot_split: "train"
  576. 3 changes: 3 additions & 0 deletions 3
  577. assets/evaluation_results/human_eval_meta-llama3-1-405b_text_generation/asset.yaml
  579. @@ -0,0 +1,3 @@
  580. type: evaluationresult
  581. spec: spec.yaml
  582. categories: ["EvaluationResult"]
  583. 31 changes: 31 additions & 0 deletions 31
  584. assets/evaluation_results/human_eval_meta-llama3-1-405b_text_generation/spec.yaml
  586. @@ -0,0 +1,31 @@
  587. type: evaluationresult
  588. name: human_eval_meta-llama3-1-405b_text_generation
  589. version: 2.22.07
  590. display_name: human_eval_Meta-Llama3-1-405B_text_generation
  591. description: Meta-Llama3-1-405B run for human_eval dataset
  592. dataset_family: human_eval
  593. dataset_name: human_eval
  594.  
  595. model_name: Meta-Llama-3.1-405B
  596. model_version: "1"
  597. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  598.  
  599. relationships:
  600.   - relationshipType: Source
  601.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  602. 
  603. tags:
  604.   evaluation_type: text_generation
  605.   task: text-generation
  606.   accuracy_metric_name: pass@1
  607. 
  608. metrics:
  609.   accuracy: 0.853658537
  610. 
  611. 
  612. properties:
  613.   n_shot: 0
  614.   evaluation_sampling_ratio: 1.0
  615.   evaluation_split: "test"
  616.   fewshot_sampling_ratio: None
  617.   fewshot_split: "None"
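
In the zero-shot human_eval specs, `fewshot_sampling_ratio: None` and `fewshot_split: "None"` are not YAML nulls (YAML spells null as `null` or `~`), so a standard YAML loader would be expected to return the string `"None"` for both. The sketch below shows one way a consumer might normalise those values after loading. `none_if_sentinel` and `fewshot_config` are hypothetical helpers, and treating `"None"` as "not set" is an assumption about the intent of these fields.

```python
from typing import Any, Optional

# Strings treated as "no value"; this set is an assumption, not a documented convention.
NONE_SENTINELS = {"None", "none", ""}


def none_if_sentinel(value: Any) -> Optional[Any]:
    """Map the string sentinels above (and a real None) to Python None."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip() in NONE_SENTINELS:
        return None
    return value


def fewshot_config(properties: dict) -> dict:
    """Return the few-shot settings of a spec's `properties` block with typed values."""
    ratio = none_if_sentinel(properties.get("fewshot_sampling_ratio"))
    return {
        "n_shot": int(properties["n_shot"]),
        "fewshot_sampling_ratio": None if ratio is None else float(ratio),
        "fewshot_split": none_if_sentinel(properties.get("fewshot_split")),
    }


# Properties as a YAML loader would likely deliver them for human_eval and gsm8k.
zero_shot = {"n_shot": 0, "fewshot_sampling_ratio": "None", "fewshot_split": "None"}
eight_shot = {"n_shot": 8, "fewshot_sampling_ratio": 1.0, "fewshot_split": "dev"}
print(fewshot_config(zero_shot))
print(fewshot_config(eight_shot))
```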
  618. 3 changes: 3 additions & 0 deletions 3
  619. assets/evaluation_results/human_eval_meta-llama3-1-70b_text_generation/asset.yaml
  621. @@ -0,0 +1,3 @@
  622. type: evaluationresult
  623. spec: spec.yaml
  624. categories: ["EvaluationResult"]
  625. 31 changes: 31 additions & 0 deletions 31
  626. assets/evaluation_results/human_eval_meta-llama3-1-70b_text_generation/spec.yaml
  628. @@ -0,0 +1,31 @@
  629. type: evaluationresult
  630. name: human_eval_meta-llama3-1-70b_text_generation
  631. version: 2.22.07
  632. display_name: human_eval_Meta-Llama3-1-70B_text_generation
  633. description: Meta-Llama-3.1-70B run for human_eval dataset
  634. dataset_family: human_eval
  635. dataset_name: human_eval
  636.  
  637. model_name: Meta-Llama-3.1-70B
  638. model_version: "1"
  639. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  640.  
  641. relationships:
  642.   - relationshipType: Source
  643.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  644. 
  645. tags:
  646.   evaluation_type: text_generation
  647.   task: text-generation
  648.   accuracy_metric_name: pass@1
  649. 
  650. metrics:
  651.   accuracy: 0.792682927
  652. 
  653. 
  654. properties:
  655.   n_shot: 0
  656.   evaluation_sampling_ratio: 1.0
  657.   evaluation_split: "test"
  658.   fewshot_sampling_ratio: None
  659.   fewshot_split: "None"
  660. 3 changes: 3 additions & 0 deletions 3
  661. assets/evaluation_results/human_eval_meta-llama3-1-8b_text_generation/asset.yaml
  663. @@ -0,0 +1,3 @@
  664. type: evaluationresult
  665. spec: spec.yaml
  666. categories: ["EvaluationResult"]
  667. 31 changes: 31 additions & 0 deletions 31
  668. assets/evaluation_results/human_eval_meta-llama3-1-8b_text_generation/spec.yaml
  670. @@ -0,0 +1,31 @@
  671. type: evaluationresult
  672. name: human_eval_meta-llama3-1-8b_text_generation
  673. version: 2.22.07
  674. display_name: human_eval_Meta-Llama3-1-8B_text_generation
  675. description: Meta-Llama-3.1-8B run for human_eval dataset
  676. dataset_family: human_eval
  677. dataset_name: human_eval
  678.  
  679. model_name: Meta-Llama-3.1-8B
  680. model_version: "1"
  681. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  682.  
  683. relationships:
  684.   - relationshipType: Source
  685.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  686. 
  687. tags:
  688.   evaluation_type: text_generation
  689.   task: text-generation
  690.   accuracy_metric_name: pass@1
  691. 
  692. metrics:
  693.   accuracy: 0.682926829
  694. 
  695. 
  696. properties:
  697.   n_shot: 0
  698.   evaluation_sampling_ratio: 1.0
  699.   evaluation_split: "test"
  700.   fewshot_sampling_ratio: None
  701.   fewshot_split: "None"
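
With `evaluation_sampling_ratio: 1.0`, a pass@1 accuracy is a fraction of the whole test split, so it can be sanity-checked against the number of problems. The sketch below does this for the three human_eval values in the diff. It assumes the standard HumanEval test set of 164 problems, which the diff does not state; `solved_count` and `matches_reported` are hypothetical helpers.

```python
# Assumption: the standard HumanEval test split has 164 problems; the diff does not say so.
HUMAN_EVAL_PROBLEMS = 164

# pass@1 values copied from the three human_eval spec.yaml files in this diff.
REPORTED_PASS_AT_1 = {
    "Meta-Llama-3.1-405B": 0.853658537,
    "Meta-Llama-3.1-70B": 0.792682927,
    "Meta-Llama-3.1-8B": 0.682926829,
}


def solved_count(accuracy: float, total: int = HUMAN_EVAL_PROBLEMS) -> int:
    """Nearest whole number of problems that would produce this accuracy."""
    return round(accuracy * total)


def matches_reported(accuracy: float, total: int = HUMAN_EVAL_PROBLEMS) -> bool:
    """True if solved/total reproduces the reported value to nine decimal places."""
    return abs(solved_count(accuracy, total) / total - accuracy) < 1e-9


for model, acc in REPORTED_PASS_AT_1.items():
    print(model, solved_count(acc), matches_reported(acc))
```

Under that assumption, the reported values correspond to 140, 130 and 112 solved problems out of 164 for the 405B, 70B and 8B models respectively.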
  702. 3 changes: 3 additions & 0 deletions 3
  703. assets/evaluation_results/mmlu_humanities_meta-llama3-1-405b_question_answering/asset.yaml
  705. @@ -0,0 +1,3 @@
  706. type: evaluationresult
  707. spec: spec.yaml
  708. categories: ["EvaluationResult"]
  709. 31 changes: 31 additions & 0 deletions 31
  710. assets/evaluation_results/mmlu_humanities_meta-llama3-1-405b_question_answering/spec.yaml
  712. @@ -0,0 +1,31 @@
  713. type: evaluationresult
  714. name: mmlu_humanities_meta-llama3-1-405b_question_answering
  715. version: 2.22.07
  716. display_name: mmlu_humanities_Meta-Llama3-1-405B_question_answering
  717. description: Meta-Llama3-1-405B run for mmlu_humanities dataset
  718. dataset_family: mmlu_humanities
  719. dataset_name: mmlu_humanities
  720.  
  721. model_name: Meta-Llama-3.1-405B
  722. model_version: "1"
  723. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  724.  
  725. relationships:
  726.   - relationshipType: Source
  727.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  728. 
  729. tags:
  730.   evaluation_type: text_generation
  731.   task: question-answering
  732.   accuracy_metric_name: exact_match
  733. 
  734. metrics:
  735.   accuracy: 0.817853348
  736. 
  737. 
  738. properties:
  739.   n_shot: 5
  740.   evaluation_sampling_ratio: 1.0
  741.   evaluation_split: "test"
  742.   fewshot_sampling_ratio: 1.0
  743.   fewshot_split: "dev"
  744. 3 changes: 3 additions & 0 deletions 3
  745. assets/evaluation_results/mmlu_humanities_meta-llama3-1-70b_question_answering/asset.yaml
  747. @@ -0,0 +1,3 @@
  748. type: evaluationresult
  749. spec: spec.yaml
  750. categories: ["EvaluationResult"]
  751. 31 changes: 31 additions & 0 deletions 31
  752. assets/evaluation_results/mmlu_humanities_meta-llama3-1-70b_question_answering/spec.yaml
  754. @@ -0,0 +1,31 @@
  755. type: evaluationresult
  756. name: mmlu_humanities_meta-llama3-1-70b_question_answering
  757. version: 2.22.07
  758. display_name: mmlu_humanities_Meta-Llama3-1-70B_question_answering
  759. description: Meta-Llama-3.1-70B run for mmlu_humanities dataset
  760. dataset_family: mmlu
  761. dataset_name: mmlu_humanities
  762.  
  763. model_name: Meta-Llama-3.1-70B
  764. model_version: "1"
  765. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  766.  
  767. relationships:
  768.   - relationshipType: Source
  769.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  770. 
  771. tags:
  772.   evaluation_type: text_generation
  773.   task: question-answering
  774.   accuracy_metric_name: exact_match
  775. 
  776. metrics:
  777.   accuracy: 0.794686504
  778. 
  779. 
  780. properties:
  781.   n_shot: 5
  782.   evaluation_sampling_ratio: 1.0
  783.   evaluation_split: "test"
  784.   fewshot_sampling_ratio: 1.0
  785.   fewshot_split: "dev"
  786. 3 changes: 3 additions & 0 deletions 3
  787. assets/evaluation_results/mmlu_humanities_meta-llama3-1-8b_question_answering/asset.yaml
  789. @@ -0,0 +1,3 @@
  790. type: evaluationresult
  791. spec: spec.yaml
  792. categories: ["EvaluationResult"]
  793. 31 changes: 31 additions & 0 deletions 31
  794. assets/evaluation_results/mmlu_humanities_meta-llama3-1-8b_question_answering/spec.yaml
  796. @@ -0,0 +1,31 @@
  797. type: evaluationresult
  798. name: mmlu_humanities_meta-llama3-1-8b_question_answering
  799. version: 2.22.07
  800. display_name: mmlu_humanities_Meta-Llama3-1-8B_question_answering
  801. description: Meta-Llama-3.1-8B run for mmlu_humanities dataset
  802. dataset_family: mmlu
  803. dataset_name: mmlu_humanities
  804.  
  805. model_name: Meta-Llama-3.1-8B
  806. model_version: "1"
  807. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  808.  
  809. relationships:
  810.   - relationshipType: Source
  811.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  812. 
  813. tags:
  814.   evaluation_type: text_generation
  815.   task: question-answering
  816.   accuracy_metric_name: exact_match
  817. 
  818. metrics:
  819.   accuracy: 0.618916047
  820. 
  821. 
  822. properties:
  823.   n_shot: 5
  824.   evaluation_sampling_ratio: 1.0
  825.   evaluation_split: "test"
  826.   fewshot_sampling_ratio: 1.0
  827.   fewshot_split: "dev"
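
The three mmlu_humanities specs do not agree on `dataset_family`: the 405B spec uses `mmlu_humanities`, while the 70B and 8B specs use `mmlu`. A reviewer could surface this kind of mismatch mechanically. The sketch below groups specs by `dataset_name` and reports any dataset that carries more than one family value. `inconsistent_families` is a hypothetical helper, and the diff does not indicate which value is intended.

```python
from collections import defaultdict

# dataset fields copied from the three mmlu_humanities spec.yaml files in this diff.
SPECS = [
    {
        "name": "mmlu_humanities_meta-llama3-1-405b_question_answering",
        "dataset_family": "mmlu_humanities",
        "dataset_name": "mmlu_humanities",
    },
    {
        "name": "mmlu_humanities_meta-llama3-1-70b_question_answering",
        "dataset_family": "mmlu",
        "dataset_name": "mmlu_humanities",
    },
    {
        "name": "mmlu_humanities_meta-llama3-1-8b_question_answering",
        "dataset_family": "mmlu",
        "dataset_name": "mmlu_humanities",
    },
]


def inconsistent_families(specs: list) -> dict:
    """Map each dataset_name to its family values when more than one value is in use."""
    families = defaultdict(set)
    for spec in specs:
        families[spec["dataset_name"]].add(spec["dataset_family"])
    return {name: sorted(vals) for name, vals in families.items() if len(vals) > 1}


print(inconsistent_families(SPECS))
```

The check only reports the mismatch; choosing the correct family value is left to the PR author.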
  828. 3 changes: 3 additions & 0 deletions 3
  829. assets/evaluation_results/mmlu_other_meta-llama3-1-405b_question_answering/asset.yaml
  831. @@ -0,0 +1,3 @@
  832. type: evaluationresult
  833. spec: spec.yaml
  834. categories: ["EvaluationResult"]
  835. 31 changes: 31 additions & 0 deletions 31
  836. assets/evaluation_results/mmlu_other_meta-llama3-1-405b_question_answering/spec.yaml
  838. @@ -0,0 +1,31 @@
  839. type: evaluationresult
  840. name: mmlu_other_meta-llama3-1-405b_question_answering
  841. version: 2.22.07
  842. display_name: mmlu_other_Meta-Llama3-1-405B_question_answering
  843. description: Meta-Llama3-1-405B run for mmlu_other dataset
  844. dataset_family: mmlu_other
  845. dataset_name: mmlu_other
  846.  
  847. model_name: Meta-Llama-3.1-405B
  848. model_version: "1"
  849. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  850.  
  851. relationships:
  852.   - relationshipType: Source
  853.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  854. 
  855. tags:
  856.   evaluation_type: text_generation
  857.   task: question-answering
  858.   accuracy_metric_name: exact_match
  859. 
  860. metrics:
  861.   accuracy: 0.874798841
  862. 
  863. 
  864. properties:
  865.   n_shot: 5
  866.   evaluation_sampling_ratio: 1.0
  867.   evaluation_split: "test"
  868.   fewshot_sampling_ratio: 1.0
  869.   fewshot_split: "dev"
  870. 3 changes: 3 additions & 0 deletions 3
  871. assets/evaluation_results/mmlu_other_meta-llama3-1-70b_question_answering/asset.yaml
  873. @@ -0,0 +1,3 @@
  874. type: evaluationresult
  875. spec: spec.yaml
  876. categories: ["EvaluationResult"]
  877. 31 changes: 31 additions & 0 deletions 31
  878. assets/evaluation_results/mmlu_other_meta-llama3-1-70b_question_answering/spec.yaml
  880. @@ -0,0 +1,31 @@
  881. type: evaluationresult
  882. name: mmlu_other_meta-llama3-1-70b_question_answering
  883. version: 2.22.07
  884. display_name: mmlu_other_Meta-Llama3-1-70B_question_answering
  885. description: Meta-Llama-3.1-70B run for mmlu_other dataset
  886. dataset_family: mmlu
  887. dataset_name: mmlu_other
  888.  
  889. model_name: Meta-Llama-3.1-70B
  890. model_version: "1"
  891. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  892.  
  893. relationships:
  894.   - relationshipType: Source
  895.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  896. 
  897. tags:
  898.   evaluation_type: text_generation
  899.   task: question-answering
  900.   accuracy_metric_name: exact_match
  901. 
  902. metrics:
  903.   accuracy: 0.85226907
  904. 
  905. 
  906. properties:
  907.   n_shot: 5
  908.   evaluation_sampling_ratio: 1.0
  909.   evaluation_split: "test"
  910.   fewshot_sampling_ratio: 1.0
  911.   fewshot_split: "dev"
  912. 3 changes: 3 additions & 0 deletions 3
  913. assets/evaluation_results/mmlu_other_meta-llama3-1-8b_question_answering/asset.yaml
  915. @@ -0,0 +1,3 @@
  916. type: evaluationresult
  917. spec: spec.yaml
  918. categories: ["EvaluationResult"]
  919. 31 changes: 31 additions & 0 deletions 31
  920. assets/evaluation_results/mmlu_other_meta-llama3-1-8b_question_answering/spec.yaml
  922. @@ -0,0 +1,31 @@
  923. type: evaluationresult
  924. name: mmlu_other_meta-llama3-1-8b_question_answering
  925. version: 2.22.07
  926. display_name: mmlu_other_Meta-Llama3-1-8B_question_answering
  927. description: Meta-Llama-3.1-8B run for mmlu_other dataset
  928. dataset_family: mmlu
  929. dataset_name: mmlu_other
  930.  
  931. model_name: Meta-Llama-3.1-8B
  932. model_version: "1"
  933. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  934.  
  935. relationships:
  936.   - relationshipType: Source
  937.     assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  938. 
  939. tags:
  940.   evaluation_type: text_generation
  941.   task: question-answering
  942.   accuracy_metric_name: exact_match
  943. 
  944. metrics:
  945.   accuracy: 0.74026392
  946. 
  947. 
  948. properties:
  949.   n_shot: 5
  950.   evaluation_sampling_ratio: 1.0
  951.   evaluation_split: "test"
  952.   fewshot_sampling_ratio: 1.0
  953.   fewshot_split: "dev"
  954. 3 changes: 3 additions & 0 deletions 3
  955. .../evaluation_results/mmlu_social_sciences_meta-llama3-1-405b_question_answering/asset.yaml
  957. @@ -0,0 +1,3 @@
  958. type: evaluationresult
  959. spec: spec.yaml
  960. categories: ["EvaluationResult"]
  961. 31 changes: 31 additions & 0 deletions 31
  962. ...s/evaluation_results/mmlu_social_sciences_meta-llama3-1-405b_question_answering/spec.yaml
  964. @@ -0,0 +1,31 @@
  965. type: evaluationresult
  966. name: mmlu_social_sciences_meta-llama3-1-405b_question_answering
  967. version: 2.22.07
  968. display_name: mmlu_social_sciences_Meta-Llama3-1-405B_question_answering
  969. description: Meta-Llama3-1-405B run for mmlu_social_sciences dataset
  970. dataset_family: mmlu_social_sciences
  971. dataset_name: mmlu_social_sciences
  972.  
  973. model_name: Meta-Llama-3.1-405B
  974. model_version: "1"
  975. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  976.  
  977. relationships:
  978. - relationshipType: Source
  979. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  980.  
  981. tags:
  982. evaluation_type: text_generation
  983. task: question-answering
  984. accuracy_metric_name: exact_match
  985.  
  986. metrics:
  987. accuracy: 0.897627559
  988.  
  989.  
  990. properties:
  991. n_shot: 5
  992. evaluation_sampling_ratio: 1.0
  993. evaluation_split: "test"
  994. fewshot_sampling_ratio: 1.0
  995. fewshot_split: "dev"
  996. 3 changes: 3 additions & 0 deletions 3
  997. ...s/evaluation_results/mmlu_social_sciences_meta-llama3-1-70b_question_answering/asset.yaml
  999. @@ -0,0 +1,3 @@
  1000. type: evaluationresult
  1001. spec: spec.yaml
  1002. categories: ["EvaluationResult"]
  1003. 31 changes: 31 additions & 0 deletions 31
  1004. ...ts/evaluation_results/mmlu_social_sciences_meta-llama3-1-70b_question_answering/spec.yaml
  1006. @@ -0,0 +1,31 @@
  1007. type: evaluationresult
  1008. name: mmlu_social_sciences_meta-llama3-1-70b_question_answering
  1009. version: 2.22.07
  1010. display_name: mmlu_social_sciences_Meta-Llama3-1-70B_question_answering
  1011. description: Meta-Llama-3.1-70B run for mmlu_social_sciences dataset
  1012. dataset_family: mmlu
  1013. dataset_name: mmlu_social_sciences
  1014.  
  1015. model_name: Meta-Llama-3.1-70B
  1016. model_version: "1"
  1017. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1018.  
  1019. relationships:
  1020. - relationshipType: Source
  1021. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1022.  
  1023. tags:
  1024. evaluation_type: text_generation
  1025. task: question-answering
  1026. accuracy_metric_name: exact_match
  1027.  
  1028. metrics:
  1029. accuracy: 0.877803055
  1030.  
  1031.  
  1032. properties:
  1033. n_shot: 5
  1034. evaluation_sampling_ratio: 1.0
  1035. evaluation_split: "test"
  1036. fewshot_sampling_ratio: 1.0
  1037. fewshot_split: "dev"
  1038. 3 changes: 3 additions & 0 deletions 3
  1039. ...ts/evaluation_results/mmlu_social_sciences_meta-llama3-1-8b_question_answering/asset.yaml
  1041. @@ -0,0 +1,3 @@
  1042. type: evaluationresult
  1043. spec: spec.yaml
  1044. categories: ["EvaluationResult"]
  1045. 31 changes: 31 additions & 0 deletions 31
  1046. assets/evaluation_results/mmlu_social_sciences_meta-llama3-1-8b_question_answering/spec.yaml
  1048. @@ -0,0 +1,31 @@
  1049. type: evaluationresult
  1050. name: mmlu_social_sciences_meta-llama3-1-8b_question_answering
  1051. version: 2.22.07
  1052. display_name: mmlu_social_sciences_Meta-Llama3-1-8B_question_answering
  1053. description: Meta-Llama-3.1-8B run for mmlu_social_sciences dataset
  1054. dataset_family: mmlu
  1055. dataset_name: mmlu_social_sciences
  1056.  
  1057. model_name: Meta-Llama-3.1-8B
  1058. model_version: "1"
  1059. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1060.  
  1061. relationships:
  1062. - relationshipType: Source
  1063. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1064.  
  1065. tags:
  1066. evaluation_type: text_generation
  1067. task: question-answering
  1068. accuracy_metric_name: exact_match
  1069.  
  1070. metrics:
  1071. accuracy: 0.76080598
  1072.  
  1073.  
  1074. properties:
  1075. n_shot: 5
  1076. evaluation_sampling_ratio: 1.0
  1077. evaluation_split: "test"
  1078. fewshot_sampling_ratio: 1.0
  1079. fewshot_split: "dev"
  1080. 3 changes: 3 additions & 0 deletions 3
  1081. assets/evaluation_results/mmlu_stem_meta-llama3-1-405b_question_answering/asset.yaml
  1083. @@ -0,0 +1,3 @@
  1084. type: evaluationresult
  1085. spec: spec.yaml
  1086. categories: ["EvaluationResult"]
  1087. 31 changes: 31 additions & 0 deletions 31
  1088. assets/evaluation_results/mmlu_stem_meta-llama3-1-405b_question_answering/spec.yaml
  1090. @@ -0,0 +1,31 @@
  1091. type: evaluationresult
  1092. name: mmlu_stem_meta-llama3-1-405b_question_answering
  1093. version: 2.22.07
  1094. display_name: mmlu_stem_Meta-Llama3-1-405B_question_answering
  1095. description: Meta-Llama3-1-405B run for mmlu_stem dataset
  1096. dataset_family: mmlu_stem
  1097. dataset_name: mmlu_stem
  1098.  
  1099. model_name: Meta-Llama-3.1-405B
  1100. model_version: "1"
  1101. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1102.  
  1103. relationships:
  1104. - relationshipType: Source
  1105. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1106.  
  1107. tags:
  1108. evaluation_type: text_generation
  1109. task: question-answering
  1110. accuracy_metric_name: exact_match
  1111.  
  1112. metrics:
  1113. accuracy: 0.830954646
  1114.  
  1115.  
  1116. properties:
  1117. n_shot: 5
  1118. evaluation_sampling_ratio: 1.0
  1119. evaluation_split: "test"
  1120. fewshot_sampling_ratio: 1.0
  1121. fewshot_split: "dev"
  1122. 3 changes: 3 additions & 0 deletions 3
  1123. assets/evaluation_results/mmlu_stem_meta-llama3-1-70b_question_answering/asset.yaml
  1125. @@ -0,0 +1,3 @@
  1126. type: evaluationresult
  1127. spec: spec.yaml
  1128. categories: ["EvaluationResult"]
  1129. 31 changes: 31 additions & 0 deletions 31
  1130. assets/evaluation_results/mmlu_stem_meta-llama3-1-70b_question_answering/spec.yaml
  1132. @@ -0,0 +1,31 @@
  1133. type: evaluationresult
  1134. name: mmlu_stem_meta-llama3-1-70b_question_answering
  1135. version: 2.22.07
  1136. display_name: mmlu_stem_Meta-Llama3-1-70B_question_answering
  1137. description: Meta-Llama-3.1-70B run for mmlu_stem dataset
  1138. dataset_family: mmlu
  1139. dataset_name: mmlu_stem
  1140.  
  1141. model_name: Meta-Llama-3.1-70B
  1142. model_version: "1"
  1143. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1144.  
  1145. relationships:
  1146. - relationshipType: Source
  1147. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1148.  
  1149. tags:
  1150. evaluation_type: text_generation
  1151. task: question-answering
  1152. accuracy_metric_name: exact_match
  1153.  
  1154. metrics:
  1155. accuracy: 0.771328893
  1156.  
  1157.  
  1158. properties:
  1159. n_shot: 5
  1160. evaluation_sampling_ratio: 1.0
  1161. evaluation_split: "test"
  1162. fewshot_sampling_ratio: 1.0
  1163. fewshot_split: "dev"
  1164. 3 changes: 3 additions & 0 deletions 3
  1165. assets/evaluation_results/mmlu_stem_meta-llama3-1-8b_question_answering/asset.yaml
  1167. @@ -0,0 +1,3 @@
  1168. type: evaluationresult
  1169. spec: spec.yaml
  1170. categories: ["EvaluationResult"]
  1171. 31 changes: 31 additions & 0 deletions 31
  1172. assets/evaluation_results/mmlu_stem_meta-llama3-1-8b_question_answering/spec.yaml
  1174. @@ -0,0 +1,31 @@
  1175. type: evaluationresult
  1176. name: mmlu_stem_meta-llama3-1-8b_question_answering
  1177. version: 2.22.07
  1178. display_name: mmlu_stem_Meta-Llama3-1-8B_question_answering
  1179. description: Meta-Llama-3.1-8B run for mmlu_stem dataset
  1180. dataset_family: mmlu
  1181. dataset_name: mmlu_stem
  1182.  
  1183. model_name: Meta-Llama-3.1-8B
  1184. model_version: "1"
  1185. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1186.  
  1187. relationships:
  1188. - relationshipType: Source
  1189. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1190.  
  1191. tags:
  1192. evaluation_type: text_generation
  1193. task: question-answering
  1194. accuracy_metric_name: exact_match
  1195.  
  1196. metrics:
  1197. accuracy: 0.594988899
  1198.  
  1199.  
  1200. properties:
  1201. n_shot: 5
  1202. evaluation_sampling_ratio: 1.0
  1203. evaluation_split: "test"
  1204. fewshot_sampling_ratio: 1.0
  1205. fewshot_split: "dev"
  1206. 3 changes: 3 additions & 0 deletions 3
  1207. assets/evaluation_results/openbookqa_meta-llama3-1-405b_question_answering/asset.yaml
  1209. @@ -0,0 +1,3 @@
  1210. type: evaluationresult
  1211. spec: spec.yaml
  1212. categories: ["EvaluationResult"]
  1213. 31 changes: 31 additions & 0 deletions 31
  1214. assets/evaluation_results/openbookqa_meta-llama3-1-405b_question_answering/spec.yaml
  1216. @@ -0,0 +1,31 @@
  1217. type: evaluationresult
  1218. name: openbookqa_meta-llama3-1-405b_question_answering
  1219. version: 2.22.07
  1220. display_name: openbookqa_Meta-Llama3-1-405B_question_answering
  1221. description: Meta-Llama3-1-405B run for openbookqa dataset
  1222. dataset_family: openbookqa
  1223. dataset_name: openbookqa
  1224.  
  1225. model_name: Meta-Llama-3.1-405B
  1226. model_version: "1"
  1227. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1228.  
  1229. relationships:
  1230. - relationshipType: Source
  1231. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1232.  
  1233. tags:
  1234. evaluation_type: text_generation
  1235. task: question-answering
  1236. accuracy_metric_name: exact_match
  1237.  
  1238. metrics:
  1239. accuracy: 0.908
  1240.  
  1241.  
  1242. properties:
  1243. n_shot: 10
  1244. evaluation_sampling_ratio: 1.0
  1245. evaluation_split: "validation"
  1246. fewshot_sampling_ratio: 1.0
  1247. fewshot_split: "train"
  1248. 3 changes: 3 additions & 0 deletions 3
  1249. assets/evaluation_results/openbookqa_meta-llama3-1-70b_question_answering/asset.yaml
  1251. @@ -0,0 +1,3 @@
  1252. type: evaluationresult
  1253. spec: spec.yaml
  1254. categories: ["EvaluationResult"]
  1255. 31 changes: 31 additions & 0 deletions 31
  1256. assets/evaluation_results/openbookqa_meta-llama3-1-70b_question_answering/spec.yaml
  1258. @@ -0,0 +1,31 @@
  1259. type: evaluationresult
  1260. name: openbookqa_meta-llama3-1-70b_question_answering
  1261. version: 2.22.07
  1262. display_name: openbookqa_Meta-Llama3-1-70B_question_answering
  1263. description: Meta-Llama-3.1-70B run for openbookqa dataset
  1264. dataset_family: openbookqa
  1265. dataset_name: openbookqa
  1266.  
  1267. model_name: Meta-Llama-3.1-70B
  1268. model_version: "1"
  1269. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1270.  
  1271. relationships:
  1272. - relationshipType: Source
  1273. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1274.  
  1275. tags:
  1276. evaluation_type: text_generation
  1277. task: question-answering
  1278. accuracy_metric_name: exact_match
  1279.  
  1280. metrics:
  1281. accuracy: 0.936
  1282.  
  1283.  
  1284. properties:
  1285. n_shot: 10
  1286. evaluation_sampling_ratio: 1.0
  1287. evaluation_split: "validation"
  1288. fewshot_sampling_ratio: 1.0
  1289. fewshot_split: "train"
  1290. 3 changes: 3 additions & 0 deletions 3
  1291. assets/evaluation_results/openbookqa_meta-llama3-1-8b_question_answering/asset.yaml
  1293. @@ -0,0 +1,3 @@
  1294. type: evaluationresult
  1295. spec: spec.yaml
  1296. categories: ["EvaluationResult"]
  1297. 31 changes: 31 additions & 0 deletions 31
  1298. assets/evaluation_results/openbookqa_meta-llama3-1-8b_question_answering/spec.yaml
  1300. @@ -0,0 +1,31 @@
  1301. type: evaluationresult
  1302. name: openbookqa_meta-llama3-1-8b_question_answering
  1303. version: 2.22.07
  1304. display_name: openbookqa_Meta-Llama3-1-8B_question_answering
  1305. description: Meta-Llama-3.1-8B run for openbookqa dataset
  1306. dataset_family: openbookqa
  1307. dataset_name: openbookqa
  1308.  
  1309. model_name: Meta-Llama-3.1-8B
  1310. model_version: "1"
  1311. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1312.  
  1313. relationships:
  1314. - relationshipType: Source
  1315. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1316.  
  1317. tags:
  1318. evaluation_type: text_generation
  1319. task: question-answering
  1320. accuracy_metric_name: exact_match
  1321.  
  1322. metrics:
  1323. accuracy: 0.852
  1324.  
  1325.  
  1326. properties:
  1327. n_shot: 10
  1328. evaluation_sampling_ratio: 1.0
  1329. evaluation_split: "validation"
  1330. fewshot_sampling_ratio: 1.0
  1331. fewshot_split: "train"
  1332. 3 changes: 3 additions & 0 deletions 3
  1333. assets/evaluation_results/piqa_meta-llama3-1-405b_question_answering/asset.yaml
  1335. @@ -0,0 +1,3 @@
  1336. type: evaluationresult
  1337. spec: spec.yaml
  1338. categories: ["EvaluationResult"]
  1339. 31 changes: 31 additions & 0 deletions 31
  1340. assets/evaluation_results/piqa_meta-llama3-1-405b_question_answering/spec.yaml
  1342. @@ -0,0 +1,31 @@
  1343. type: evaluationresult
  1344. name: piqa_meta-llama3-1-405b_question_answering
  1345. version: 2.22.07
  1346. display_name: piqa_Meta-Llama3-1-405B_question_answering
  1347. description: Meta-Llama3-1-405B run for piqa dataset
  1348. dataset_family: piqa
  1349. dataset_name: piqa
  1350.  
  1351. model_name: Meta-Llama-3.1-405B
  1352. model_version: "1"
  1353. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1354.  
  1355. relationships:
  1356. - relationshipType: Source
  1357. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1358.  
  1359. tags:
  1360. evaluation_type: text_generation
  1361. task: question-answering
  1362. accuracy_metric_name: exact_match
  1363.  
  1364. metrics:
  1365. accuracy: 0.874319913
  1366.  
  1367.  
  1368. properties:
  1369. n_shot: 5
  1370. evaluation_sampling_ratio: 1.0
  1371. evaluation_split: "validation"
  1372. fewshot_sampling_ratio: 0.3
  1373. fewshot_split: "train"
  1374. 3 changes: 3 additions & 0 deletions 3
  1375. assets/evaluation_results/piqa_meta-llama3-1-70b_question_answering/asset.yaml
  1377. @@ -0,0 +1,3 @@
  1378. type: evaluationresult
  1379. spec: spec.yaml
  1380. categories: ["EvaluationResult"]
  1381. 31 changes: 31 additions & 0 deletions 31
  1382. assets/evaluation_results/piqa_meta-llama3-1-70b_question_answering/spec.yaml
  1384. @@ -0,0 +1,31 @@
  1385. type: evaluationresult
  1386. name: piqa_meta-llama3-1-70b_question_answering
  1387. version: 2.22.07
  1388. display_name: piqa_Meta-Llama3-1-70B_question_answering
  1389. description: Meta-Llama-3.1-70B run for piqa dataset
  1390. dataset_family: piqa
  1391. dataset_name: piqa
  1392.  
  1393. model_name: Meta-Llama-3.1-70B
  1394. model_version: "1"
  1395. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1396.  
  1397. relationships:
  1398. - relationshipType: Source
  1399. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1400.  
  1401. tags:
  1402. evaluation_type: text_generation
  1403. task: question-answering
  1404. accuracy_metric_name: exact_match
  1405.  
  1406. metrics:
  1407. accuracy: 0.861806311
  1408.  
  1409.  
  1410. properties:
  1411. n_shot: 5
  1412. evaluation_sampling_ratio: 1.0
  1413. evaluation_split: "validation"
  1414. fewshot_sampling_ratio: 0.3
  1415. fewshot_split: "train"
  1416. 3 changes: 3 additions & 0 deletions 3
  1417. assets/evaluation_results/piqa_meta-llama3-1-8b_question_answering/asset.yaml
  1419. @@ -0,0 +1,3 @@
  1420. type: evaluationresult
  1421. spec: spec.yaml
  1422. categories: ["EvaluationResult"]
  1423. 31 changes: 31 additions & 0 deletions 31
  1424. assets/evaluation_results/piqa_meta-llama3-1-8b_question_answering/spec.yaml
  1426. @@ -0,0 +1,31 @@
  1427. type: evaluationresult
  1428. name: piqa_meta-llama3-1-8b_question_answering
  1429. version: 2.22.07
  1430. display_name: piqa_Meta-Llama3-1-8B_question_answering
  1431. description: Meta-Llama-3.1-8B run for piqa dataset
  1432. dataset_family: piqa
  1433. dataset_name: piqa
  1434.  
  1435. model_name: Meta-Llama-3.1-8B
  1436. model_version: "1"
  1437. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1438.  
  1439. relationships:
  1440. - relationshipType: Source
  1441. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1442.  
  1443. tags:
  1444. evaluation_type: text_generation
  1445. task: question-answering
  1446. accuracy_metric_name: exact_match
  1447.  
  1448. metrics:
  1449. accuracy: 0.800870511
  1450.  
  1451.  
  1452. properties:
  1453. n_shot: 5
  1454. evaluation_sampling_ratio: 1.0
  1455. evaluation_split: "validation"
  1456. fewshot_sampling_ratio: 0.3
  1457. fewshot_split: "train"
  1458. 3 changes: 3 additions & 0 deletions 3
  1459. assets/evaluation_results/social_iqa_meta-llama3-1-405b_question_answering/asset.yaml
  1461. @@ -0,0 +1,3 @@
  1462. type: evaluationresult
  1463. spec: spec.yaml
  1464. categories: ["EvaluationResult"]
  1465. 31 changes: 31 additions & 0 deletions 31
  1466. assets/evaluation_results/social_iqa_meta-llama3-1-405b_question_answering/spec.yaml
  1468. @@ -0,0 +1,31 @@
  1469. type: evaluationresult
  1470. name: social_iqa_meta-llama3-1-405b_question_answering
  1471. version: 2.22.07
  1472. display_name: social_iqa_Meta-Llama3-1-405B_question_answering
  1473. description: Meta-Llama3-1-405B run for social_iqa dataset
  1474. dataset_family: social_iqa
  1475. dataset_name: social_iqa
  1476.  
  1477. model_name: Meta-Llama-3.1-405B
  1478. model_version: "1"
  1479. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1480.  
  1481. relationships:
  1482. - relationshipType: Source
  1483. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1484.  
  1485. tags:
  1486. evaluation_type: text_generation
  1487. task: question-answering
  1488. accuracy_metric_name: exact_match
  1489.  
  1490. metrics:
  1491. accuracy: 0.796827021
  1492.  
  1493.  
  1494. properties:
  1495. n_shot: 5
  1496. evaluation_sampling_ratio: 1.0
  1497. evaluation_split: "validation"
  1498. fewshot_sampling_ratio: 0.3
  1499. fewshot_split: "train"
  1500. 3 changes: 3 additions & 0 deletions 3
  1501. assets/evaluation_results/social_iqa_meta-llama3-1-70b_question_answering/asset.yaml
  1503. @@ -0,0 +1,3 @@
  1504. type: evaluationresult
  1505. spec: spec.yaml
  1506. categories: ["EvaluationResult"]
  1507. 31 changes: 31 additions & 0 deletions 31
  1508. assets/evaluation_results/social_iqa_meta-llama3-1-70b_question_answering/spec.yaml
  1510. @@ -0,0 +1,31 @@
  1511. type: evaluationresult
  1512. name: social_iqa_meta-llama3-1-70b_question_answering
  1513. version: 2.22.07
  1514. display_name: social_iqa_Meta-Llama3-1-70B_question_answering
  1515. description: Meta-Llama-3.1-70B run for social_iqa dataset
  1516. dataset_family: social_iqa
  1517. dataset_name: social_iqa
  1518.  
  1519. model_name: Meta-Llama-3.1-70B
  1520. model_version: "1"
  1521. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1522.  
  1523. relationships:
  1524. - relationshipType: Source
  1525. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1526.  
  1527. tags:
  1528. evaluation_type: text_generation
  1529. task: question-answering
  1530. accuracy_metric_name: exact_match
  1531.  
  1532. metrics:
  1533. accuracy: 0.812691914
  1534.  
  1535.  
  1536. properties:
  1537. n_shot: 5
  1538. evaluation_sampling_ratio: 1.0
  1539. evaluation_split: "validation"
  1540. fewshot_sampling_ratio: 0.3
  1541. fewshot_split: "train"
  1542. 3 changes: 3 additions & 0 deletions 3
  1543. assets/evaluation_results/social_iqa_meta-llama3-1-8b_question_answering/asset.yaml
  1545. @@ -0,0 +1,3 @@
  1546. type: evaluationresult
  1547. spec: spec.yaml
  1548. categories: ["EvaluationResult"]
  1549. 31 changes: 31 additions & 0 deletions 31
  1550. assets/evaluation_results/social_iqa_meta-llama3-1-8b_question_answering/spec.yaml
  1552. @@ -0,0 +1,31 @@
  1553. type: evaluationresult
  1554. name: social_iqa_meta-llama3-1-8b_question_answering
  1555. version: 2.22.07
  1556. display_name: social_iqa_Meta-Llama3-1-8B_question_answering
  1557. description: Meta-Llama-3.1-8B run for social_iqa dataset
  1558. dataset_family: social_iqa
  1559. dataset_name: social_iqa
  1560.  
  1561. model_name: Meta-Llama-3.1-8B
  1562. model_version: "1"
  1563. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1564.  
  1565. relationships:
  1566. - relationshipType: Source
  1567. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1568.  
  1569. tags:
  1570. evaluation_type: text_generation
  1571. task: question-answering
  1572. accuracy_metric_name: exact_match
  1573.  
  1574. metrics:
  1575. accuracy: 0.734390993
  1576.  
  1577.  
  1578. properties:
  1579. n_shot: 5
  1580. evaluation_sampling_ratio: 1.0
  1581. evaluation_split: "validation"
  1582. fewshot_sampling_ratio: 0.3
  1583. fewshot_split: "train"
  1584. 3 changes: 3 additions & 0 deletions 3
  1585. assets/evaluation_results/squad_v2_meta-llama3-1-405b_question_answering/asset.yaml
  1587. @@ -0,0 +1,3 @@
  1588. type: evaluationresult
  1589. spec: spec.yaml
  1590. categories: ["EvaluationResult"]
  1591. 33 changes: 33 additions & 0 deletions 33
  1592. assets/evaluation_results/squad_v2_meta-llama3-1-405b_question_answering/spec.yaml
  1594. @@ -0,0 +1,33 @@
  1595. type: evaluationresult
  1596. name: squad_v2_meta-llama3-1-405b_question_answering
  1597. version: 2.22.07
  1598. display_name: squad_v2_Meta-Llama3-1-405B_question_answering
  1599. description: Meta-Llama3-1-405B run for squad_v2 dataset
  1600. dataset_family: squad_v2
  1601. dataset_name: squad_v2
  1602.  
  1603. model_name: Meta-Llama-3.1-405B
  1604. model_version: "1"
  1605. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1606.  
  1607. relationships:
  1608. - relationshipType: Source
  1609. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1610.  
  1611. tags:
  1612. evaluation_type: text_generation
  1613. task: question-answering
  1614. accuracy_metric_name: nan
  1615.  
  1616. metrics:
  1617. groundedness: 3.762426285
  1618. relevance: 4.085930918
  1619. GPTSimilarity: 3.082561078
  1620.  
  1621.  
  1622. properties:
  1623. n_shot: 2
  1624. evaluation_sampling_ratio: 0.2
  1625. evaluation_split: "validation"
  1626. fewshot_sampling_ratio: 1.0
  1627. fewshot_split: "dev"
  1628. 3 changes: 3 additions & 0 deletions 3
  1629. assets/evaluation_results/squad_v2_meta-llama3-1-70b_question_answering/asset.yaml
  1631. @@ -0,0 +1,3 @@
  1632. type: evaluationresult
  1633. spec: spec.yaml
  1634. categories: ["EvaluationResult"]
  1635. 33 changes: 33 additions & 0 deletions 33
  1636. assets/evaluation_results/squad_v2_meta-llama3-1-70b_question_answering/spec.yaml
  1638. @@ -0,0 +1,33 @@
  1639. type: evaluationresult
  1640. name: squad_v2_meta-llama3-1-70b_question_answering
  1641. version: 2.22.07
  1642. display_name: squad_v2_Meta-Llama3-1-70B_question_answering
  1643. description: Meta-Llama-3.1-70B run for squad_v2 dataset
  1644. dataset_family: squad_v2
  1645. dataset_name: squad_v2
  1646.  
  1647. model_name: Meta-Llama-3.1-70B
  1648. model_version: "1"
  1649. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1650.  
  1651. relationships:
  1652. - relationshipType: Source
  1653. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1654.  
  1655. tags:
  1656. evaluation_type: text_generation
  1657. task: question-answering
  1658. accuracy_metric_name: nan
  1659.  
  1660. metrics:
  1661. groundedness: 3.65206402695871
  1662. relevance: 3.91280539174389
  1663. GPTSimilarity: 3.02864363942712
  1664.  
  1665.  
  1666. properties:
  1667. n_shot: 2
  1668. evaluation_sampling_ratio: 0.2
  1669. evaluation_split: "validation"
  1670. fewshot_sampling_ratio: 1.0
  1671. fewshot_split: "dev"
  1672. 3 changes: 3 additions & 0 deletions 3
  1673. assets/evaluation_results/squad_v2_meta-llama3-1-8b_question_answering/asset.yaml
  1675. @@ -0,0 +1,3 @@
  1676. type: evaluationresult
  1677. spec: spec.yaml
  1678. categories: ["EvaluationResult"]
  1679. 33 changes: 33 additions & 0 deletions 33
  1680. assets/evaluation_results/squad_v2_meta-llama3-1-8b_question_answering/spec.yaml
  1682. @@ -0,0 +1,33 @@
  1683. type: evaluationresult
  1684. name: squad_v2_meta-llama3-1-8b_question_answering
  1685. version: 2.22.07
  1686. display_name: squad_v2_Meta-Llama3-1-8B_question_answering
  1687. description: Meta-Llama-3.1-8B run for squad_v2 dataset
  1688. dataset_family: squad_v2
  1689. dataset_name: squad_v2
  1690.  
  1691. model_name: Meta-Llama-3.1-8B
  1692. model_version: "1"
  1693. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1694.  
  1695. relationships:
  1696. - relationshipType: Source
  1697. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1698.  
  1699. tags:
  1700. evaluation_type: text_generation
  1701. task: question-answering
  1702. accuracy_metric_name: nan
  1703.  
  1704. metrics:
  1705. groundedness: 3.96545914069081
  1706. relevance: 4.09317032040472
  1707. GPTSimilarity: 3.01727042965459
  1708.  
  1709.  
  1710. properties:
  1711. n_shot: 2
  1712. evaluation_sampling_ratio: 0.2
  1713. evaluation_split: "validation"
  1714. fewshot_sampling_ratio: 1.0
  1715. fewshot_split: "dev"
  1716. 3 changes: 3 additions & 0 deletions 3
  1717. ...evaluation_results/truthfulqa_generation_meta-llama3-1-405b_question_answering/asset.yaml
  1719. @@ -0,0 +1,3 @@
  1720. type: evaluationresult
  1721. spec: spec.yaml
  1722. categories: ["EvaluationResult"]
  1723. 33 changes: 33 additions & 0 deletions 33
  1724. .../evaluation_results/truthfulqa_generation_meta-llama3-1-405b_question_answering/spec.yaml
  1726. @@ -0,0 +1,33 @@
  1727. type: evaluationresult
  1728. name: truthfulqa_generation_meta-llama3-1-405b_question_answering
  1729. version: 2.22.07
  1730. display_name: truthfulqa_generation_Meta-Llama3-1-405B_question_answering
  1731. description: Meta-Llama3-1-405B run for truthfulqa_generation dataset
  1732. dataset_family: truthfulqa_generation
  1733. dataset_name: truthfulqa_generation
  1734.  
  1735. model_name: Meta-Llama-3.1-405B
  1736. model_version: "1"
  1737. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1738.  
  1739. relationships:
  1740. - relationshipType: Source
  1741. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1742.  
  1743. tags:
  1744. evaluation_type: text_generation
  1745. task: question-answering
  1746. accuracy_metric_name: nan
  1747.  
  1748. metrics:
  1749. coherence: 4.88372093
  1750. fluency: 4.729498164
  1751. GPTSimilarity: 3.088127295
  1752.  
  1753.  
  1754. properties:
  1755. n_shot: 6
  1756. evaluation_sampling_ratio: 1.0
  1757. evaluation_split: "validation"
  1758. fewshot_sampling_ratio: 1.0
  1759. fewshot_split: "dev"
  1760. 3 changes: 3 additions & 0 deletions 3
  1761. .../evaluation_results/truthfulqa_generation_meta-llama3-1-70b_question_answering/asset.yaml
  1763. @@ -0,0 +1,3 @@
  1764. type: evaluationresult
  1765. spec: spec.yaml
  1766. categories: ["EvaluationResult"]
  1767. 33 changes: 33 additions & 0 deletions 33
  1768. ...s/evaluation_results/truthfulqa_generation_meta-llama3-1-70b_question_answering/spec.yaml
  1770. @@ -0,0 +1,33 @@
  1771. type: evaluationresult
  1772. name: truthfulqa_generation_meta-llama3-1-70b_question_answering
  1773. version: 2.22.07
  1774. display_name: truthfulqa_generation_Meta-Llama3-1-70B_question_answering
  1775. description: Meta-Llama-3.1-70B run for truthfulqa_generation dataset
  1776. dataset_family: truthfulqa
  1777. dataset_name: truthfulqa_generation
  1778.  
  1779. model_name: Meta-Llama-3.1-70B
  1780. model_version: "1"
  1781. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1782.  
  1783. relationships:
  1784. - relationshipType: Source
  1785. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1786.  
  1787. tags:
  1788. evaluation_type: text_generation
  1789. task: question-answering
  1790. accuracy_metric_name: nan
  1791.  
  1792. metrics:
  1793. coherence: 4.86658506731946
  1794. fluency: 4.7172582619339
  1795. GPTSimilarity: 2.96328029375765
  1796.  
  1797.  
  1798. properties:
  1799. n_shot: 6
  1800. evaluation_sampling_ratio: 1.0
  1801. evaluation_split: "validation"
  1802. fewshot_sampling_ratio: 1.0
  1803. fewshot_split: "dev"
  1804. 3 changes: 3 additions & 0 deletions 3
  1805. ...s/evaluation_results/truthfulqa_generation_meta-llama3-1-8b_question_answering/asset.yaml
  1807. @@ -0,0 +1,3 @@
  1808. type: evaluationresult
  1809. spec: spec.yaml
  1810. categories: ["EvaluationResult"]
  1811. 33 changes: 33 additions & 0 deletions 33
  1812. ...ts/evaluation_results/truthfulqa_generation_meta-llama3-1-8b_question_answering/spec.yaml
  1814. @@ -0,0 +1,33 @@
  1815. type: evaluationresult
  1816. name: truthfulqa_generation_meta-llama3-1-8b_question_answering
  1817. version: 2.22.07
  1818. display_name: truthfulqa_generation_Meta-Llama3-1-8B_question_answering
  1819. description: Meta-Llama-3.1-8B run for truthfulqa_generation dataset
  1820. dataset_family: truthfulqa
  1821. dataset_name: truthfulqa_generation
  1822.  
  1823. model_name: Meta-Llama-3.1-8B
  1824. model_version: "1"
  1825. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1826.  
  1827. relationships:
  1828. - relationshipType: Source
  1829. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1830.  
  1831. tags:
  1832. evaluation_type: text_generation
  1833. task: question-answering
  1834. accuracy_metric_name: nan
  1835.  
  1836. metrics:
  1837. coherence: 4.80048959608323
  1838. fluency: 4.59730722154222
  1839. GPTSimilarity: 2.59975520195838
  1840.  
  1841.  
  1842. properties:
  1843. n_shot: 6
  1844. evaluation_sampling_ratio: 1.0
  1845. evaluation_split: "validation"
  1846. fewshot_sampling_ratio: 1.0
  1847. fewshot_split: "dev"
  1848. 3 changes: 3 additions & 0 deletions 3
  1849. assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-405b_question_answering/asset.yaml
  1851. @@ -0,0 +1,3 @@
  1852. type: evaluationresult
  1853. spec: spec.yaml
  1854. categories: ["EvaluationResult"]
  1855. 31 changes: 31 additions & 0 deletions 31
  1856. assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-405b_question_answering/spec.yaml
  1858. @@ -0,0 +1,31 @@
  1859. type: evaluationresult
  1860. name: truthfulqa_mc1_meta-llama3-1-405b_question_answering
  1861. version: 2.22.07
  1862. display_name: truthfulqa_mc1_Meta-Llama3-1-405B_question_answering
  1863. description: Meta-Llama-3.1-405B run for truthfulqa_mc1 dataset
  1864. dataset_family: truthfulqa
  1865. dataset_name: truthfulqa_mc1
  1866.  
  1867. model_name: Meta-Llama-3.1-405B
  1868. model_version: "1"
  1869. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1870.  
  1871. relationships:
  1872. - relationshipType: Source
  1873. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1874.  
  1875. tags:
  1876. evaluation_type: text_generation
  1877. task: question-answering
  1878. accuracy_metric_name: exact_match
  1879.  
  1880. metrics:
  1881. accuracy: 0.800489596
  1882.  
  1883.  
  1884. properties:
  1885. n_shot: 6
  1886. evaluation_sampling_ratio: 1.0
  1887. evaluation_split: "validation"
  1888. fewshot_sampling_ratio: 1.0
  1889. fewshot_split: "dev"
  1890. 3 changes: 3 additions & 0 deletions 3
  1891. assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-70b_question_answering/asset.yaml
  1893. @@ -0,0 +1,3 @@
  1894. type: evaluationresult
  1895. spec: spec.yaml
  1896. categories: ["EvaluationResult"]
  1897. 31 changes: 31 additions & 0 deletions 31
  1898. assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-70b_question_answering/spec.yaml
  1900. @@ -0,0 +1,31 @@
  1901. type: evaluationresult
  1902. name: truthfulqa_mc1_meta-llama3-1-70b_question_answering
  1903. version: 2.22.07
  1904. display_name: truthfulqa_mc1_Meta-Llama3-1-70B_question_answering
  1905. description: Meta-Llama-3.1-70B run for truthfulqa_mc1 dataset
  1906. dataset_family: truthfulqa
  1907. dataset_name: truthfulqa_mc1
  1908.  
  1909. model_name: Meta-Llama-3.1-70B
  1910. model_version: "1"
  1911. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1912.  
  1913. relationships:
  1914. - relationshipType: Source
  1915. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  1916.  
  1917. tags:
  1918. evaluation_type: text_generation
  1919. task: question-answering
  1920. accuracy_metric_name: exact_match
  1921.  
  1922. metrics:
  1923. accuracy: 0.768665851
  1924.  
  1925.  
  1926. properties:
  1927. n_shot: 6
  1928. evaluation_sampling_ratio: 1.0
  1929. evaluation_split: "validation"
  1930. fewshot_sampling_ratio: 1.0
  1931. fewshot_split: "dev"
  1932. 3 changes: 3 additions & 0 deletions 3
  1933. assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-8b_question_answering/asset.yaml
  1935. @@ -0,0 +1,3 @@
  1936. type: evaluationresult
  1937. spec: spec.yaml
  1938. categories: ["EvaluationResult"]
  1939. 31 changes: 31 additions & 0 deletions 31
  1940. assets/evaluation_results/truthfulqa_mc1_meta-llama3-1-8b_question_answering/spec.yaml
  1942. @@ -0,0 +1,31 @@
  1943. type: evaluationresult
  1944. name: truthfulqa_mc1_meta-llama3-1-8b_question_answering
  1945. version: 2.22.07
  1946. display_name: truthfulqa_mc1_Meta-Llama3-1-8B_question_answering
  1947. description: Meta-Llama-3.1-8B run for truthfulqa_mc1 dataset
  1948. dataset_family: truthfulqa
  1949. dataset_name: truthfulqa_mc1
  1950.  
  1951. model_name: Meta-Llama-3.1-8B
  1952. model_version: "1"
  1953. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1954.  
  1955. relationships:
  1956. - relationshipType: Source
  1957. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  1958.  
  1959. tags:
  1960. evaluation_type: text_generation
  1961. task: question-answering
  1962. accuracy_metric_name: exact_match
  1963.  
  1964. metrics:
  1965. accuracy: 0.605875153
  1966.  
  1967.  
  1968. properties:
  1969. n_shot: 6
  1970. evaluation_sampling_ratio: 1.0
  1971. evaluation_split: "validation"
  1972. fewshot_sampling_ratio: 1.0
  1973. fewshot_split: "dev"
  1974. 3 changes: 3 additions & 0 deletions 3
  1975. assets/evaluation_results/winogrande_meta-llama3-1-405b_question_answering/asset.yaml
  1977. @@ -0,0 +1,3 @@
  1978. type: evaluationresult
  1979. spec: spec.yaml
  1980. categories: ["EvaluationResult"]
  1981. 31 changes: 31 additions & 0 deletions 31
  1982. assets/evaluation_results/winogrande_meta-llama3-1-405b_question_answering/spec.yaml
  1984. @@ -0,0 +1,31 @@
  1985. type: evaluationresult
  1986. name: winogrande_meta-llama3-1-405b_question_answering
  1987. version: 2.22.07
  1988. display_name: winogrande_Meta-Llama3-1-405B_question_answering
  1989. description: Meta-Llama-3.1-405B run for winogrande dataset
  1990. dataset_family: winogrande
  1991. dataset_name: winogrande
  1992.  
  1993. model_name: Meta-Llama-3.1-405B
  1994. model_version: "1"
  1995. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  1996.  
  1997. relationships:
  1998. - relationshipType: Source
  1999. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
  2000.  
  2001. tags:
  2002. evaluation_type: text_generation
  2003. task: question-answering
  2004. accuracy_metric_name: exact_match
  2005.  
  2006. metrics:
  2007. accuracy: 0.867403315
  2008.  
  2009.  
  2010. properties:
  2011. n_shot: 5
  2012. evaluation_sampling_ratio: 1.0
  2013. evaluation_split: "validation"
  2014. fewshot_sampling_ratio: 1.0
  2015. fewshot_split: "train"
  2016. 3 changes: 3 additions & 0 deletions 3
  2017. assets/evaluation_results/winogrande_meta-llama3-1-70b_question_answering/asset.yaml
  2019. @@ -0,0 +1,3 @@
  2020. type: evaluationresult
  2021. spec: spec.yaml
  2022. categories: ["EvaluationResult"]
  2023. 31 changes: 31 additions & 0 deletions 31
  2024. assets/evaluation_results/winogrande_meta-llama3-1-70b_question_answering/spec.yaml
  2026. @@ -0,0 +1,31 @@
  2027. type: evaluationresult
  2028. name: winogrande_meta-llama3-1-70b_question_answering
  2029. version: 2.22.07
  2030. display_name: winogrande_Meta-Llama3-1-70B_question_answering
  2031. description: Meta-Llama-3.1-70B run for winogrande dataset
  2032. dataset_family: winogrande
  2033. dataset_name: winogrande
  2034.  
  2035. model_name: Meta-Llama-3.1-70B
  2036. model_version: "1"
  2037. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  2038.  
  2039. relationships:
  2040. - relationshipType: Source
  2041. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-70B/versions/1
  2042.  
  2043. tags:
  2044. evaluation_type: text_generation
  2045. task: question-answering
  2046. accuracy_metric_name: exact_match
  2047.  
  2048. metrics:
  2049. accuracy: 0.844514601
  2050.  
  2051.  
  2052. properties:
  2053. n_shot: 5
  2054. evaluation_sampling_ratio: 1.0
  2055. evaluation_split: "validation"
  2056. fewshot_sampling_ratio: 1.0
  2057. fewshot_split: "train"
  2058. 3 changes: 3 additions & 0 deletions 3
  2059. assets/evaluation_results/winogrande_meta-llama3-1-8b_question_answering/asset.yaml
  2061. @@ -0,0 +1,3 @@
  2062. type: evaluationresult
  2063. spec: spec.yaml
  2064. categories: ["EvaluationResult"]
  2065. 31 changes: 31 additions & 0 deletions 31
  2066. assets/evaluation_results/winogrande_meta-llama3-1-8b_question_answering/spec.yaml
  2068. @@ -0,0 +1,31 @@
  2069. type: evaluationresult
  2070. name: winogrande_meta-llama3-1-8b_question_answering
  2071. version: 2.22.07
  2072. display_name: winogrande_Meta-Llama3-1-8B_question_answering
  2073. description: Meta-Llama-3.1-8B run for winogrande dataset
  2074. dataset_family: winogrande
  2075. dataset_name: winogrande
  2076.  
  2077. model_name: Meta-Llama-3.1-8B
  2078. model_version: "1"
  2079. model_asset_id: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  2080.  
  2081. relationships:
  2082. - relationshipType: Source
  2083. assetId: azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
  2084.  
  2085. tags:
  2086. evaluation_type: text_generation
  2087. task: question-answering
  2088. accuracy_metric_name: exact_match
  2089.  
  2090. metrics:
  2091. accuracy: 0.649565904
  2092.  
  2093.  
  2094. properties:
  2095. n_shot: 5
  2096. evaluation_sampling_ratio: 1.0
  2097. evaluation_split: "validation"
  2098. fewshot_sampling_ratio: 1.0
  2099. fewshot_split: "train"
  2111. Create Llama3.1 assets for 8B/70B/405B by SamGos93 · Pull Request #3180 · Azure/azureml-assets · GitHub