```python
%matplotlib notebook

import matplotlib as mpl
from matplotlib import pyplot as plt
from matplotlib import rcParams

import seaborn as sns

sns.set_style("white")
sns.set_context("paper", rc={"lines.linewidth": 1})

rcParams['axes.titlepad'] = 20
rcParams['axes.titlesize'] = "medium"
rcParams['axes.edgecolor'] = "red"
rcParams['axes.spines.right'] = rcParams['axes.spines.top'] = False
rcParams['xtick.labelsize'] = "small"
rcParams['xtick.major.pad'] = 10

rcParams['ytick.labelsize'] = "small"
rcParams['ytick.major.pad'] = 10
rcParams['axes.formatter.use_mathtext'] = True
rcParams['axes.labelpad'] = 10
```

```python
root_path = "/home/fat-fighter/Documents/cs771-project/hybrid-method/"
```
## Description of Files

### Folder: features

- **tracks-mfcc.csv** - Contains MFCC features already extracted from all tracks, using the 30-60 second segment of each track
- **tracks-cluster-probabilities.csv** - Contains the cluster probabilities and assignments for all tracks (based on their MFCC features)
- **timbres-cluster-probabilities.csv** - Contains the cluster probabilities and assignments for all segment timbres of all tracks
- **tracks-collective-timbres-clusters-features.csv** - Contains the extracted features of a track using its timbres' collective cluster probabilities

### Folder: million-song-subset

- **tracks-features.csv** - Contains MFCC features extracted from tracks in the MSS
- **tracks-timbres.csv** - Contains segment timbres for all tracks

### Folder: taste-profile-subset

- **songs.txt** - A list of song IDs
- **users.txt** - A list of user IDs
- **train-triplets.txt** - A list of user-song-count triplets (a loading sketch follows this list)
- **song-to-tracks.txt** - A song-to-track ID mapping
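
As a minimal loading sketch (assuming tab-separated files; the file and column names follow the list above, so adjust them to the actual files on disk):

```python
import pandas as pd

# Hypothetical loader for the taste-profile-subset files; the paths and the
# column names below are assumptions for illustration only.
tps_path = root_path + "data/taste-profile-subset/"

# Each line of train-triplets.txt: user_id <TAB> song_id <TAB> play_count
triplets = pd.read_csv(tps_path + "train-triplets.txt", sep="\t",
                       names=["user_id", "song_id", "play_count"])

# Each line of song-to-tracks.txt maps a song ID to one or more track IDs
song_to_tracks = {}
with open(tps_path + "song-to-tracks.txt") as f:
    for line in f:
        fields = line.strip().split("\t")
        if len(fields) > 1:
            song_to_tracks[fields[0]] = fields[1:]
```
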
# Collaborative Filtering

## Finding Optimal Number of Track Clusters (Based on Tracks' MFCC Features)
```python
import numpy as np
import pandas as pd

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
```

```python
local_path = root_path + "data/"

n_jobs = -1
max_iter = 500
algorithm = "full"
n_init = 5
```

```python
tracks_data = pd.read_csv(local_path + "features/tracks-mfcc.csv", sep="\t")

cols = tracks_data.columns.tolist()[1:]
tracks_features = tracks_data[cols]
```

```python
estimators = [
    (n_clusters, KMeans(n_clusters=n_clusters, random_state=0, n_jobs=n_jobs, max_iter=max_iter, algorithm=algorithm, n_init=n_init))
    for n_clusters in range(5, 16, 1)
]
```

```python
for n_clusters, estimator in estimators:
    estimator.fit(tracks_features)
```

```python
with open(local_path + "features/tracks-clustering-kmeans-inertias.csv", "w") as f:
    cluster_inertias = []

    for n_clusters, estimator in estimators:
        cluster_inertias.append([n_clusters, estimator.inertia_])

    f.write("\n".join([str(n_clusters) + "\t" + str(inertia) for n_clusters, inertia in cluster_inertias]))
```
### Inertia Plot

```python
with open(local_path + "features/tracks-clustering-kmeans-inertias.csv") as f:
    cluster_inertias = [line.strip(" \t\n\r").split("\t") for line in f.readlines()]

cluster_inertias = [[int(cluster), float(inertia)] for cluster, inertia in cluster_inertias]
cluster_inertias = np.array(cluster_inertias)
```

```python
sns.pointplot(cluster_inertias[:, 0], cluster_inertias[:, 1])

plt.title("Tracks Clustering: Inertia for K-Means")
plt.xlabel("Number of Clusters")
plt.ylabel("Variance")

plt.savefig(local_path + "plots/tracks-clustering-kmeans-inertia.png", dpi=250)
plt.show()
```

### PCA Plot of Tracks MFCC (for 10 Clusters)

```python
decomposed_tracks_features = PCA(n_components=2).fit(tracks_features).transform(tracks_features)
```

```python
# estimators[5] corresponds to n_clusters = 10 (the range starts at 5)
n_clusters, estimator = estimators[5]
cluster_assignments = estimator.labels_
```

```python
plt.scatter(decomposed_tracks_features[:, 0], decomposed_tracks_features[:, 1], alpha=.8, s=0.7)

plt.title("Tracks MFCC: PCA Plot")

plt.savefig(local_path + "plots/tracks-mfcc-pca.png", dpi=250)
plt.show()
```
## Clustering Tracks using GMM

```python
import pandas as pd

from sklearn.externals import joblib
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
```

```python
local_path = root_path + "data/"

n_clusters = 10
max_iter = 5000
covariance_type = "diag"
n_init = 3
```

```python
tracks_data = pd.read_csv(local_path + "features/tracks-mfcc.csv", sep="\t")

cols = tracks_data.columns[1:]
tracks_mfcc = tracks_data[cols]
```

```python
estimator = GaussianMixture(n_components=n_clusters, covariance_type=covariance_type, max_iter=max_iter, random_state=0, n_init=n_init)
```

```python
estimator.fit(tracks_mfcc)
```

```python
joblib.dump(estimator, local_path + "models/tracks-clustering-gmm-model.pkl")
```

```python
estimator = joblib.load(local_path + "models/tracks-clustering-gmm-model.pkl")
```

```python
probs = estimator.predict_proba(tracks_mfcc)
cluster_assignments = estimator.predict(tracks_mfcc)
```

```python
with open(local_path + "features/tracks-cluster-probabilities.csv", "w") as f:
    for i, song_id in enumerate(tracks_data["id"]):
        params = [song_id] + list(probs[i]) + [cluster_assignments[i]]

        params = [str(param) for param in params]

        f.write("\t".join(params) + "\n")
```
### LDA Plot of Tracks MFCC

```python
decomposed_tracks_mfcc = LinearDiscriminantAnalysis(n_components=2).fit(tracks_mfcc, cluster_assignments).transform(tracks_mfcc)
```

```python
for i in range(n_clusters):
    plt.scatter(decomposed_tracks_mfcc[cluster_assignments == i, 0], decomposed_tracks_mfcc[cluster_assignments == i, 1], alpha=.8, s=0.7)

plt.gca().set_xlim([-16, 6])
plt.gca().set_ylim([-5, 5])
plt.title("Tracks MFCC: LDA Plot (After GMM)")

plt.savefig(local_path + "plots/tracks-mfcc-gmm-clustering-pca.png", dpi=250)
plt.show()
```
## Mapping Users to Tracks

```python
local_path = root_path + "data/taste-profile-subset/"
```

```python
songs_to_tracks = dict()
with open(local_path + "songs-to-tracks.txt", "r") as f:
    for line in f.readlines():
        line = line.strip(" \t\n\r").split()
        if len(line) > 1:
            songs_to_tracks[line[0]] = line[1:]
```

```python
outfile = open(local_path + "user-track-counts-raw.txt", "w")
```

```python
with open(local_path + "user-song-counts.txt", "r") as f:
    line = f.readline()
    while line:
        line = line.strip(" \t\n\r").split()
        if len(line) == 3 and line[1] in songs_to_tracks:
            for track in songs_to_tracks[line[1]]:
                outfile.write("\t".join([line[0], track, line[2]]) + "\n")
        line = f.readline()
```

```python
outfile.close()
```
## Splitting Users into Training and Evaluation Sets

```python
import random
```

```python
local_path = root_path + "data/"
```

```bash
%%bash -s "$local_path"

cd $1/taste-profile-subset

# keep only users with at least 50 track-count entries
cut -f1 user-track-counts-raw.txt | sort | uniq -c > user-counts.txt
cat user-counts.txt | sed 's/^ *\([0-9]*\) /\1\t/g' | awk '($1 > 49)' > t; mv t user-counts.txt
```
```bash
%%bash -s "$local_path"

cd $1/taste-profile-subset/

# keep only the track counts of users that survived the threshold above
awk 'BEGIN {
    FS = OFS = "\t"
}
NR == FNR {
    f[$2] = $0
    next
}
$1 in f {
    print $0
}' user-counts.txt user-track-counts-raw.txt > t
```

```bash
%%bash -s "$local_path"

cd $1

# keep only the tracks that have cluster probabilities
awk 'BEGIN {
    FS = OFS = "\t"
}
NR == FNR {
    f[$1] = 1
    next
}
$2 in f {
    print $0
}' features/tracks-cluster-probabilities.csv taste-profile-subset/t > taste-profile-subset/user-track-counts.txt
```
```bash
%%bash -s "$local_path"

cd $1/taste-profile-subset/

cut -f2 -d$'\t' user-counts.txt | sort --random-sort > t

size=`cat user-counts.txt | wc -l`
vsize=$(( $size / 10 ))

# first tenth for validation, the remainder (starting one line later) for training
head -$vsize t > users-validation.txt
tail -n+$(( $vsize + 1 )) t > users-train.txt

rm t
```
```bash
%%bash -s "$local_path"

cd $1/taste-profile-subset/

awk 'BEGIN {
    FS = OFS = "\t"
}
NR == FNR {
    f[$1] = 1
    next
}
$1 in f {
    print $0
}' users-train.txt user-track-counts.txt > user-track-counts-train.txt
```

```bash
%%bash -s "$local_path"

cd $1/taste-profile-subset/

awk 'BEGIN {
    FS = OFS = "\t"
}
NR == FNR {
    f[$1] = 1
    next
}
$1 in f {
    print $0
}' users-validation.txt user-track-counts.txt > user-track-counts-validation.txt
```
## Computing User Features (Based on Tracks' Cluster Probabilities)
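
Each user's feature vector is the average of the GMM cluster-probability vectors of the tracks they listened to, as implemented below. With $T_u$ the user's track set and $\mathbf{p}_t$ the probability vector of track $t$:

```latex
\mathbf{f}_u = \frac{1}{|T_u|} \sum_{t \in T_u} \mathbf{p}_t
```
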
```python
import numpy as np
```

```python
local_path = root_path + "data/"

n_clusters = 10
```

```python
tracks_mfcc = dict()
with open(local_path + "features/tracks-cluster-probabilities.csv", "r") as f:
    for line in f:
        line = line.strip(" \t\n\r").split()
        # track id + 10 cluster probabilities + cluster assignment = 12 fields
        if len(line) == 12:
            tracks_mfcc[line[0]] = np.array([float(field) for field in line[1:-1]])
```
```python
with open(local_path + "taste-profile-subset/users-train.txt") as f:
    users_train = [user.strip(" \n\r") for user in f.readlines()]

with open(local_path + "taste-profile-subset/users-validation.txt") as f:
    users_validation = [user.strip(" \n\r") for user in f.readlines()]
```

```python
user_features = dict()
user_track_counts = dict()
```

```python
with open(local_path + "taste-profile-subset/user-track-counts.txt", "r") as f:
    for line in f:
        line = line.strip(" \t\n\r").split()
        if len(line) == 3:
            if line[0] not in user_track_counts:
                user_features[line[0]] = np.zeros(n_clusters)
                user_track_counts[line[0]] = 0

            user_features[line[0]] += tracks_mfcc[line[1]]
            user_track_counts[line[0]] += 1
```

```python
outfile_train = local_path + "features/user-features-train.csv"
outfile_validation = local_path + "features/user-features-validation.csv"
```

```python
with open(outfile_train, "w") as f:
    for user in users_train:
        f.write("\t".join([user] + [str(field) for field in (user_features[user] / float(user_track_counts[user]))]) + "\n")

with open(outfile_validation, "w") as f:
    for user in users_validation:
        f.write("\t".join([user] + [str(field) for field in (user_features[user] / float(user_track_counts[user]))]) + "\n")
```
## Finding Optimal Number of User Clusters (Based on Users' Computed Features)

```python
import numpy as np
import pandas as pd

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
```

```python
local_path = root_path + "data/"
```

```python
n_jobs = -1
max_iter = 500
algorithm = "full"
n_init = 5
```

```python
user_data = pd.read_csv(local_path + "features/user-features-train.csv", sep="\t", header=None)

cols = user_data.columns.tolist()[1:]
user_features = user_data[cols]
```

```python
estimators = [
    (n_clusters, KMeans(n_clusters=n_clusters, random_state=0, n_jobs=n_jobs, max_iter=max_iter, algorithm=algorithm, n_init=n_init))
    for n_clusters in range(10, 30, 1)
]
```

```python
for n_clusters, estimator in estimators:
    estimator.fit(user_features)
```
```python
with open(local_path + "features/users-clustering-kmeans-inertias.csv", "w") as f:
    cluster_inertias = []

    for n_clusters, estimator in estimators:
        cluster_inertias.append([n_clusters, estimator.inertia_])

    f.write("\n".join([str(n_clusters) + "\t" + str(inertia) for n_clusters, inertia in cluster_inertias]))

cluster_inertias = np.array(cluster_inertias)
```
### Inertia Plot

```python
with open(local_path + "features/users-clustering-kmeans-inertias.csv") as f:
    cluster_inertias = [line.strip(" \t\n\r").split("\t") for line in f.readlines()]

cluster_inertias = [[int(cluster), float(inertia)] for cluster, inertia in cluster_inertias]
cluster_inertias = np.array(cluster_inertias)
```

```python
sns.pointplot(cluster_inertias[:, 0].astype(int), cluster_inertias[:, 1])

plt.title("Users Clustering: Inertia for K-Means")
plt.xlabel("Number of Clusters")
plt.ylabel("Variance")

plt.savefig(local_path + "plots/users-clustering-kmeans-inertia.png", dpi=250)
plt.show()
```

### PCA Plot of User Features (for 20 Clusters)

```python
decomposed_user_features = PCA(n_components=2).fit(user_features).transform(user_features)
```

```python
plt.scatter(decomposed_user_features[:, 0], decomposed_user_features[:, 1], alpha=.8, s=0.7)

plt.title("User Features: PCA Plot")

plt.savefig(local_path + "plots/user-features-pca.png", dpi=250)
plt.show()
```
## Clustering Users using GMM

```python
import pandas as pd

from sklearn.externals import joblib
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
```

```python
local_path = root_path + "data/"

n_clusters = 20
max_iter = 5000
covariance_type = "diag"
n_init = 3
```

```python
user_data = pd.read_csv(local_path + "features/user-features-train.csv", sep="\t", header=None)

cols = user_data.columns[1:]
user_features = user_data[cols]
```

```python
estimator = GaussianMixture(n_components=n_clusters, covariance_type=covariance_type, max_iter=max_iter, random_state=0, n_init=n_init)
```

```python
estimator.fit(user_features)
```

```python
joblib.dump(estimator, local_path + "models/users-clustering-gmm-model.pkl")
```

```python
estimator = joblib.load(local_path + "models/users-clustering-gmm-model.pkl")
```

```python
probs = estimator.predict_proba(user_features)
cluster_assignments = estimator.predict(user_features)
```

```python
for cluster in range(n_clusters):
    with open(local_path + "taste-profile-subset/clusters/user-ids-" + str(cluster + 1) + ".txt", "w") as f:
        f.write("\n".join(user_data[cluster_assignments == cluster][0]))
```

```python
with open(local_path + "features/user-cluster-probabilities.csv", "w") as f:
    for i, user_id in enumerate(user_data[user_data.columns[0]]):
        params = [user_id] + list(probs[i]) + [cluster_assignments[i]]

        params = [str(param) for param in params]

        f.write("\t".join(params) + "\n")
```
### LDA Plot of User Features

```python
decomposed_user_features = LinearDiscriminantAnalysis(n_components=2).fit(user_features, cluster_assignments).transform(user_features)
```

```python
for i in range(n_clusters):
    plt.scatter(decomposed_user_features[cluster_assignments == i, 0], decomposed_user_features[cluster_assignments == i, 1], alpha=.8, rasterized=True, s=0.7)

plt.gca().set_ylim([-15, 5])
plt.title("User Features: LDA Plot (After GMM)")

plt.savefig(local_path + "plots/user-features-gmm-clustering-pca.png", dpi=250)
plt.show()
```
## Distributing Users by Their Clusters

```python
local_path = root_path + "data/taste-profile-subset/"
```

```bash
%%bash -s "$local_path"

cd $1/clusters/
for cluster in {1..20}; do
    cat user-ids-$cluster.txt | sed "s/$/\t$cluster/g"
    echo ""
done > user-clusters.txt
```

```python
n_clusters = 20
```

```python
cluster_files = [open(local_path + "clusters/user-track-counts-" + str(cluster + 1) + ".txt", "w") for cluster in range(n_clusters)]
```

```python
user_clusters = dict()
with open(local_path + "clusters/user-clusters.txt") as f:
    for line in f:
        line = line.strip("\t\n\r").split("\t")
        if len(line) == 2:  # skip the blank separator lines
            user_clusters[line[0]] = int(line[1])
```

```python
with open(local_path + "user-track-counts.txt") as f:
    for line in f:
        line = line.strip("\t\n\r").split("\t")
        if line[0] in user_clusters:
            cluster_files[user_clusters[line[0]] - 1].write("\t".join(line) + "\n")
```

```python
for f in cluster_files:
    f.close()
```
## Collaborative Filtering On User Clusters
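
The scoring below is a set-based cosine similarity between users: writing $T_i$ for the track set of user $i$, each candidate track $t \notin T_i$ is weighted by the summed similarity of the other users who listened to it:

```latex
\mathrm{sim}(i, j) = \frac{|T_i \cap T_j|}{\sqrt{|T_i|}\,\sqrt{|T_j|}}
\qquad
w_i(t) = \sum_{j \neq i,\; t \in T_j} \mathrm{sim}(i, j)
```
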
```python
from math import sqrt

import numpy as np
from scipy.sparse import csr_matrix
```

```python
local_path = root_path + "data/taste-profile-subset/"

n_clusters = 20
```

```python
user_suggestions_file = open(local_path + "suggestions.csv", "w")
```

```python
user_track_counts = dict()

with open(local_path + "clustered-user-track-counts/cluster-k0.txt") as f:
    for line in f:
        line = line.strip(" \t\n\r").split("\t")
        if len(line) > 1:
            user_track_counts[line[0]] = set(line[1:])
```
```python
tracks = set([])
for user in user_track_counts:
    for track in user_track_counts[user]:
        tracks.add(track)

tracks = list(tracks)
users = list(user_track_counts)
```

```python
N, M = (len(users), len(tracks))
```

```python
for i, user_i in enumerate(users):
    weights = dict()
    for track in tracks:
        weights[track] = 0

    for user_j in user_track_counts:
        if user_i != user_j:
            similarity = len(user_track_counts[user_i].intersection(user_track_counts[user_j]))
            similarity = similarity / (sqrt(len(user_track_counts[user_i])) * sqrt(len(user_track_counts[user_j])))

            for track in user_track_counts[user_j]:
                if track not in user_track_counts[user_i]:
                    weights[track] += similarity

    # keep the 50 highest-weighted unheard tracks as suggestions
    keys = sorted(list(weights), key=lambda x: -weights[x])[:50]
    user_suggestions_file.write(user_i + "\t" + "\t".join(keys) + "\n")
```

```python
user_suggestions_file.close()
```
## Generating Recommendations for Validation Users (User-User Localized Similarity)

```python
import random
from math import sqrt

import numpy as np
from sklearn.externals import joblib

from multiprocessing import Pool
```

```python
local_path = root_path + "data/"

n_clusters = 20
```

```python
user_features = dict()
with open(local_path + "features/user-features-validation.csv") as f:
    for line in f:
        line = line.strip(" \t\n\r").split()
        user_features[line[0]] = [float(field) for field in line[1:]]
```

```python
users = list(user_features)
```

```python
gmm_clustering_model = joblib.load(local_path + "models/users-clustering-gmm-model.pkl")
```

```python
clustered_users = dict()
for cluster in range(n_clusters):
    clustered_users[cluster] = []

for user in users:
    cluster = gmm_clustering_model.predict([user_features[user]])[0]
    clustered_users[cluster].append(user)
```

```python
user_tracks = dict()
user_validation_tracks = dict()
for user in users:
    user_tracks[user] = [set([]), 0]
    user_validation_tracks[user] = set([])

# hold out roughly 35% of each validation user's tracks for evaluation
with open(local_path + "taste-profile-subset/user-track-counts-validation.txt") as f:
    for line in f:
        line = line.strip(" \n\r").split("\t")
        if random.random() > 0.35:
            user_tracks[line[0]][0].add(line[1])
        else:
            user_validation_tracks[line[0]].add(line[1])

for user in users:
    user_tracks[user][1] = sqrt(len(user_tracks[user][0]))
```
```python
def get_suggestions_for_cluster(cluster):
    global user_tracks, clustered_users, local_path

    outfile = open(local_path + "taste-profile-subset/suggestions-validation-" + str(cluster) + ".txt", "w")

    print "Starting for cluster", cluster
    tracks = set([])

    cluster_user_tracks = dict()
    with open(local_path + "taste-profile-subset/clusters/user-ids-" + str(cluster + 1) + ".txt") as f:
        for line in f:
            cluster_user_tracks[line.strip(" \n\r")] = [set([]), 0]

    with open(local_path + "taste-profile-subset/clusters/user-track-counts-" + str(cluster + 1) + ".txt") as f:
        for line in f:
            line = line.strip(" \n\r").split("\t")
            cluster_user_tracks[line[0]][0].add(line[1])
            tracks.add(line[1])

    for user in cluster_user_tracks:
        cluster_user_tracks[user][1] = sqrt(len(cluster_user_tracks[user][0]))

    for i, user_v in enumerate(clustered_users[cluster]):
        if i % 10 == 0:
            print "\tStarting for user", i

        track_weights = dict()
        for track in tracks:
            track_weights[track] = 0

        for user_t in cluster_user_tracks:
            similarity = len(user_tracks[user_v][0].intersection(cluster_user_tracks[user_t][0]))
            similarity = similarity / (user_tracks[user_v][1] * cluster_user_tracks[user_t][1])
            # sharpen the similarity so only close neighbours contribute
            similarity = pow(similarity, 6)

            for track in cluster_user_tracks[user_t][0].difference(user_tracks[user_v][0]):
                track_weights[track] += similarity

        # keep the 500 highest-weighted tracks, dropping those with zero weight
        suggestions = np.array(sorted(tracks, key=lambda x: track_weights[x]))[-500:]
        suggestions = set(suggestions[np.searchsorted([track_weights[track] for track in suggestions], 0, side="right"):])

        outfile.write(user_v + "\t" + "\t".join(suggestions) + "\n")

    outfile.close()
```
```python
process_pool = Pool(4)
process_pool.map(get_suggestions_for_cluster, range(n_clusters))
```
```bash
%%bash -s "$local_path"

cd $1/taste-profile-subset

# the per-cluster files are numbered 0..19 by the python code above
for cluster in {0..19}; do
    cat suggestions-validation-$cluster.txt
done > suggestions-validation.txt

for cluster in {0..19}; do
    rm suggestions-validation-$cluster.txt
done
```

```python
with open(local_path + "taste-profile-subset/user-tracks-used-validation.txt", "w") as f:
    for user in user_tracks:
        f.write(user + "\t" + "\t".join(user_tracks[user][0]) + "\n")
```
## Generating Recommendations for Validation Users (Item-Item Localized Similarity; Not Used)
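
Here the roles are flipped: writing $U_t$ for the set of users who listened to track $t$, a candidate track $t$ is scored against the tracks $T_u$ the validation user already has (this matches the loop below):

```latex
\mathrm{score}(t) = \sum_{t' \in T_u} \left( \frac{|U_{t'} \cap U_t|}{\sqrt{|U_{t'}|}\,\sqrt{|U_t|}} \right)^{3}
```
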
```python
import random
from math import sqrt

import numpy as np
from sklearn.externals import joblib

from multiprocessing import Pool
```

```python
local_path = root_path + "data/"

n_clusters = 20
```

```python
user_features = dict()
with open(local_path + "features/user-features-validation.csv") as f:
    for line in f:
        line = line.strip(" \t\n\r").split()
        user_features[line[0]] = [float(field) for field in line[1:]]
```

```python
users = list(user_features)
```

```python
gmm_clustering_model = joblib.load(local_path + "models/users-clustering-gmm-model.pkl")
```

```python
user_tracks = dict()
user_validation_tracks = dict()
for user in users:
    user_tracks[user] = [set([]), 0]
    user_validation_tracks[user] = set([])

track_users = dict()

with open(local_path + "taste-profile-subset/user-track-counts-validation.txt") as f:
    for line in f:
        line = line.strip(" \n\r").split("\t")
        if random.random() > 0.35:
            if line[1] not in track_users:
                track_users[line[1]] = [set([]), 0]

            track_users[line[1]][0].add(line[0])
            user_tracks[line[0]][0].add(line[1])
        else:
            user_validation_tracks[line[0]].add(line[1])

for user in users:
    user_tracks[user][1] = sqrt(len(user_tracks[user][0]))

for track in track_users:
    track_users[track][1] = sqrt(len(track_users[track][0]))
```

```python
clustered_users = dict()
clustered_tracks = dict()
for cluster in range(n_clusters):
    clustered_users[cluster] = []
    clustered_tracks[cluster] = set([])

for user in users:
    cluster = gmm_clustering_model.predict([user_features[user]])[0]

    clustered_users[cluster].append(user)
    clustered_tracks[cluster] = clustered_tracks[cluster].union(user_tracks[user][0])
```
```python
def get_suggestions_for_cluster(cluster):
    global track_users, clustered_users, clustered_tracks, local_path

    outfile = open(local_path + "taste-profile-subset/suggestions-validation-" + str(cluster) + ".txt", "w")

    print "Starting for cluster", cluster

    cluster_track_users = dict()
    with open(local_path + "taste-profile-subset/clusters/user-track-counts-" + str(cluster + 1) + ".txt") as f:
        for line in f:
            line = line.strip(" \n\r").split("\t")
            if line[1] not in cluster_track_users:
                cluster_track_users[line[1]] = [set([]), 0]

            cluster_track_users[line[1]][0].add(line[0])

    for track in cluster_track_users:
        cluster_track_users[track][1] = sqrt(len(cluster_track_users[track][0]))

    for i, user_v in enumerate(list(clustered_users[cluster])):
        if i % 10 == 9:
            print "\tStarting for user", i

        suggestions = []

        for j, track_t in enumerate(list(cluster_track_users)):
            similarity = 0

            for track_v in list(user_tracks[user_v][0]):
                similarity_t = len(track_users[track_v][0].intersection(cluster_track_users[track_t][0]))
                similarity_t = similarity_t / (track_users[track_v][1] * cluster_track_users[track_t][1])
                similarity_t = pow(similarity_t, 3)

                similarity += similarity_t

            suggestions.append((track_t, similarity))

        suggestions.sort(key=lambda x: -x[1])
        # keep the top 500 tracks, dropping any with zero similarity
        suggestions = [track for track, similarity in suggestions[:500] if similarity > 0]

        outfile.write(user_v + "\t" + "\t".join(suggestions) + "\n")

    outfile.close()
```
```python
# process_pool = Pool(1)
# process_pool.map(get_suggestions_for_cluster, range(n_clusters))
get_suggestions_for_cluster(0)
```

```bash
%%bash -s "$local_path"

cd $1/taste-profile-subset

for cluster in {0..19}; do
    cat suggestions-validation-$cluster.txt
done > suggestions-validation.txt

for cluster in {0..19}; do
    rm suggestions-validation-$cluster.txt
done
```
```python
with open(local_path + "taste-profile-subset/user-tracks-used-validation.txt", "w") as f:
    for user in user_tracks:
        f.write(user + "\t" + "\t".join(user_tracks[user][0]) + "\n")
```
## Computing Truncated mAP on the Predicted Recommendations
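
For each user, the average precision of the ranked suggestion list, truncated at 500 entries, is computed and then averaged over all users. With $\mathrm{rel}(k) = 1$ when the $k$-th suggestion was actually listened to, the loop below computes:

```latex
\mathrm{AP}(u) = \frac{1}{\sum_{k} \mathrm{rel}(k)}
    \sum_{k=1}^{\min(K, 500)} \mathrm{rel}(k)\,\frac{\#\{j \le k : \mathrm{rel}(j) = 1\}}{k},
\qquad
\mathrm{mAP} = \frac{1}{|U|} \sum_{u \in U} \mathrm{AP}(u)
```
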
```python
import numpy as np
```

```python
local_path = root_path + "data/taste-profile-subset/"
```

```python
listened_user_tracks = dict()
with open(local_path + "users-validation.txt") as f:
    for line in f:
        line = line.strip(" \n\r")
        listened_user_tracks[line] = set([])

with open(local_path + "user-track-counts-validation.txt") as f:
    for line in f:
        line = line.strip(" \n\r").split("\t")
        listened_user_tracks[line[0]].add(line[1])

for user in listened_user_tracks:
    listened_user_tracks[user] = set(listened_user_tracks[user])
```

```python
# remove the tracks that were fed to the recommender from the ground truth
with open(local_path + "user-tracks-used-validation-uu2.txt") as f:
    for line in f:
        line = line.strip(" \n\r").split("\t")
        listened_user_tracks[line[0]] = listened_user_tracks[line[0]].difference(line[1:])
```

```python
with open(local_path + "suggestions-uu2.txt") as f:
    aps = list()

    for line in f:
        if line.strip() == "":
            continue

        line = line.strip(" \t\n\r").split("\t")

        user = line[0]
        tracks = line[1:]
        tracks = np.array(tracks[:500])

        k = 0
        l = 0
        p = 0.0
        for i, track in enumerate(tracks):
            k += 1
            if track in listened_user_tracks[user]:
                l += 1
                p += float(l) / float(k)

        if l != 0:
            aps.append(p / l)
        else:
            aps.append(0)

print np.mean(aps)
```
  1164. 0.275001780514
  1165.  
  1166.  
## LDA Plot of the User Suggestions

```python
import numpy as np

from sklearn.externals import joblib
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
```

```python
local_path = root_path + "data/"

n_clusters = 10
```

```python
listened_user_tracks = dict()
with open(local_path + "taste-profile-subset/users-validation.txt") as f:
    for line in f:
        line = line.strip(" \n\r")
        listened_user_tracks[line] = set([])

with open(local_path + "taste-profile-subset/user-track-counts-validation.txt") as f:
    for line in f:
        line = line.strip(" \n\r").split("\t")
        listened_user_tracks[line[0]].add(line[1])

for user in listened_user_tracks:
    listened_user_tracks[user] = set(listened_user_tracks[user])
```

```python
with open(local_path + "taste-profile-subset/user-tracks-used-validation-uu2.txt") as f:
    for line in f:
        line = line.strip(" \n\r").split("\t")
        listened_user_tracks[line[0]] = listened_user_tracks[line[0]].difference(line[1:])
```

```python
user_suggestions = dict()
with open(local_path + "taste-profile-subset/suggestions-uu2.txt") as f:
    for line in f:
        line = line.strip(" \t\n\r").split("\t")
        user_suggestions[line[0]] = set(line[1:]).difference(listened_user_tracks[line[0]])
```
### Clustering Tracks

```python
tracks_clustering_model = joblib.load(local_path + "models/tracks-clustering-gmm-model.pkl")
```

```python
tracks_mfcc = []
with open(local_path + "features/tracks-mfcc.csv") as f:
    f.readline()  # skip the header line
    for line in f:
        line = line.strip(" \t\n\r").split()
        tracks_mfcc.append([float(field) for field in line[1:]])
```

```python
cluster_assignments = tracks_clustering_model.predict(tracks_mfcc)
```

### Loading User Tracks

```python
user = list(user_suggestions)[0]
```

```python
user_tracks = listened_user_tracks[user]
user_suggestions = user_suggestions[user]
```

```python
user_tracks_mfcc = []
user_suggestions_mfcc = []
with open(local_path + "features/tracks-mfcc.csv") as f:
    f.readline()
    for line in f:
        line = line.strip(" \t\n\r").split()
        if line[0] in user_tracks:
            user_tracks_mfcc.append([float(field) for field in line[1:]])

        if line[0] in user_suggestions:
            user_suggestions_mfcc.append([float(field) for field in line[1:]])
```

### LDA Plot of User Tracks and Suggestions

```python
lda_model = LinearDiscriminantAnalysis(n_components=2).fit(tracks_mfcc, cluster_assignments)
```

```python
decomposed_tracks_mfcc = lda_model.transform(tracks_mfcc)
decomposed_user_tracks_mfcc = lda_model.transform(user_tracks_mfcc)
decomposed_user_suggestions_mfcc = lda_model.transform(user_suggestions_mfcc)
```

```python
for i in range(n_clusters):
    plt.scatter(decomposed_tracks_mfcc[cluster_assignments == i, 0], decomposed_tracks_mfcc[cluster_assignments == i, 1], alpha=.8, rasterized=True, s=0.7)

plt.scatter(decomposed_user_tracks_mfcc[:, 0], decomposed_user_tracks_mfcc[:, 1], alpha=1, s=8, c="blue")
plt.scatter(decomposed_user_suggestions_mfcc[:, 0], decomposed_user_suggestions_mfcc[:, 1], alpha=1, s=8, c="black")

plt.gca().set_xlim([-15, 5])
plt.gca().set_ylim([-4, 4.5])
plt.title("Tracks MFCC: LDA Plot (After GMM)")

plt.savefig(local_path + "plots/tracks-mfcc-lda-exploited-suggestions.png", dpi=250)
plt.show()
```
# Exploration

## Generating Track Recommendations through Exploration
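
Each user gets a sampling distribution over the track clusters that favours clusters they have explored little, scaled by cluster size; clusters are then drawn from it via `np.random.multinomial`. With $C_c$ the tracks in cluster $c$ and $n_{u,c}$ the number of the user's tracks in that cluster, the normalization loop below computes:

```latex
w_{u,c} = \frac{\sqrt{|C_c|}\,/\,(1 + n_{u,c})}{\sum_{c'} \sqrt{|C_{c'}|}\,/\,(1 + n_{u,c'})}
```
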
```python
from math import sqrt

import numpy as np

from sklearn.externals import joblib
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
```

```python
local_path = root_path + "data/"

n_clusters = 10
n_suggestions = 25
```

```python
users = set()
tracks = set()
with open(local_path + "taste-profile-subset/user-track-counts.txt") as f:
    for line in f:
        line = line.strip(" \t\n\r").split("\t")

        users.add(line[0])
        tracks.add(line[1])
```

```python
users = list(users)
tracks = list(tracks)
```

```python
user_indices = dict()
track_indices = dict()

for i, user in enumerate(users):
    user_indices[user] = i

for i, track in enumerate(tracks):
    track_indices[track] = i
```

```python
user_tracks = dict()
track_features = dict()

for i in range(len(users)):
    user_tracks[i] = set()

# per track: [play count, cluster assignment, cluster probability]
for i in range(len(tracks)):
    track_features[i] = [0, -1, -1]
```

```python
with open(local_path + "taste-profile-subset/user-track-counts.txt") as f:
    for line in f:
        line = line.strip(" \t\n\r").split("\t")

        user, track = user_indices[line[0]], track_indices[line[1]]
        user_tracks[user].add(track)
        track_features[track][0] += 1
```

```python
clustered_tracks = dict()
for cluster in range(n_clusters):
    clustered_tracks[cluster] = []

with open(local_path + "features/tracks-cluster-probabilities.csv") as f:
    for track in f:
        track = track.strip(" \t\n\r").split("\t")
        if track[0] in track_indices:
            track[0] = track_indices[track[0]]

            clustered_tracks[int(track[-1])].append(track[0])
            track_features[track[0]][1] = int(track[-1])
            track_features[track[0]][2] = float(track[int(track[-1]) + 1])
```

```python
track_features[list(track_features)[0]]
```

```python
# order each cluster's tracks by popularity weighted by cluster probability
for cluster in clustered_tracks:
    clustered_tracks[cluster].sort(key=lambda track: -track_features[track][0] * track_features[track][2])
```

```python
user_tracks_clusters = dict()
for user in user_tracks:
    user_tracks_clusters[user] = []
    for cluster in range(n_clusters):
        user_tracks_clusters[user].append(1)

    for track in user_tracks[user]:
        user_tracks_clusters[user][track_features[track][1]] += 1
```

```python
for user in user_tracks:
    normalization_const = 0
    for cluster in range(n_clusters):
        user_tracks_clusters[user][cluster] = sqrt(len(clustered_tracks[cluster])) / user_tracks_clusters[user][cluster]
        normalization_const += user_tracks_clusters[user][cluster]

    for cluster in range(n_clusters):
        user_tracks_clusters[user][cluster] = user_tracks_clusters[user][cluster] / normalization_const
```
```python
outfile = open(local_path + "taste-profile-subset/suggestions-exploration.txt", "w")

user_suggestions = dict()
for user in user_tracks:
    suggestions = set([])
    cluster_indices = [0] * n_clusters

    while len(suggestions) < n_suggestions:
        # sample a cluster from the user's exploration distribution
        cluster = np.argmax(np.random.multinomial(20, user_tracks_clusters[user], size=1))

        # take the best-ranked track of that cluster the user has not heard
        while clustered_tracks[cluster][cluster_indices[cluster]] in user_tracks[user]:
            cluster_indices[cluster] += 1

        suggestions.add(clustered_tracks[cluster][cluster_indices[cluster]])
        cluster_indices[cluster] += 1

    user_suggestions[user] = suggestions
    outfile.write(users[user] + "\t" + "\t".join([tracks[track] for track in suggestions]) + "\n")

outfile.close()
```
## Plotting User Suggestions

```python
from sklearn.externals import joblib
```

```python
local_path = root_path + "data/"
```

```python
tracks_clustering_model = joblib.load(local_path + "models/tracks-clustering-gmm-model.pkl")
```

```python
tracks_mfcc = []
with open(local_path + "features/tracks-mfcc.csv") as f:
    f.readline()  # skip the header line
    for line in f:
        line = line.strip(" \t\n\r").split()
        tracks_mfcc.append([float(field) for field in line[1:]])
```

```python
cluster_assignments = tracks_clustering_model.predict(tracks_mfcc)
```

### Loading Tracks for First User

```python
user_suggestions = []
with open(local_path + "taste-profile-subset/suggestions-exploration.txt") as f:
    user = f.readline().strip(" \n\r").split("\t")
    user_suggestions = user[1:]
    user = user[0]
```
```python
user_tracks = []
with open(local_path + "taste-profile-subset/user-track-counts.txt") as f:
    for line in f:
        line = line.strip(" \t\n\r").split("\t")

        if line[0] == user:
            user_tracks.append(line[1])
```

```python
user_tracks_mfcc = []
user_suggestions_mfcc = []
with open(local_path + "features/tracks-mfcc.csv") as f:
    f.readline()
    for line in f:
        line = line.strip(" \t\n\r").split()
        if line[0] in user_tracks:
            user_tracks_mfcc.append([float(field) for field in line[1:]])

        if line[0] in user_suggestions:
            user_suggestions_mfcc.append([float(field) for field in line[1:]])
```

### LDA Plot of Tracks

```python
lda_model = LinearDiscriminantAnalysis(n_components=2).fit(tracks_mfcc, cluster_assignments)
```

```python
decomposed_tracks_mfcc = lda_model.transform(tracks_mfcc)
decomposed_user_tracks_mfcc = lda_model.transform(user_tracks_mfcc)
decomposed_user_suggestions_mfcc = lda_model.transform(user_suggestions_mfcc)
```

```python
for i in range(n_clusters):
    plt.scatter(decomposed_tracks_mfcc[cluster_assignments == i, 0], decomposed_tracks_mfcc[cluster_assignments == i, 1], alpha=.8, rasterized=True, s=0.7)

plt.scatter(decomposed_user_tracks_mfcc[:, 0], decomposed_user_tracks_mfcc[:, 1], alpha=1, s=8, c="blue")
plt.scatter(decomposed_user_suggestions_mfcc[:, 0], decomposed_user_suggestions_mfcc[:, 1], alpha=1, s=15, c="black")

plt.gca().set_xlim([-15, 5])
plt.gca().set_ylim([-4, 4.5])
plt.title("Tracks MFCC: LDA Plot (After GMM)")

plt.savefig(local_path + "plots/tracks-mfcc-lda-explored-suggestions.png", dpi=250)
plt.show()
```