k-means

a guest
Jun 29th, 2017
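from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# X (the document-term matrix) and vectorizer (the fitted vectorizer that
# produced X) are assumed to be defined earlier; they are not part of this paste.
k = 2  # Number of clusters into which we want to partition the data

# Cosine distance is a suitable notion of distance for documents.
# Note: dist is computed here but never passed to KMeans, which clusters
# X by Euclidean distance.
dist = 1 - cosine_similarity(X)

# Run the KMeans algorithm
model = KMeans(n_clusters=k)
model.fit(X)

print("Top terms per cluster:\n")
# Sort each centroid's term weights in descending order
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
# On scikit-learn older than 1.0, use vectorizer.get_feature_names() instead
terms = vectorizer.get_feature_names_out()
for i in range(k):
    print("Cluster %i:" % i, end='')
    for ind in order_centroids[i, :3]:
        print(' %s,' % terms[ind], end='')
    print("")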

Output:

Top terms per cluster:

Cluster 0: awesome, staff, cs50,
Cluster 1: dog, cat, keeps,
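
The listing computes a cosine distance matrix in `dist`, but the KMeans fit runs on `X` directly, so the clustering itself is Euclidean. One common workaround, offered here only as a sketch and not as the paste author's method, is to L2-normalize each document vector before clustering: for unit-length vectors, squared Euclidean distance equals twice the cosine distance, so the two notions rank neighbours identically. The small pure-Python check below illustrates that identity. The vectors `doc_a` and `doc_b` and all helper names are hypothetical and do not come from the paste.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_distance(a, b):
    """1 - cosine similarity, the same quantity the listing stores in `dist`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def squared_euclidean(a, b):
    """Squared Euclidean distance, the quantity k-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical term-count vectors for two documents (illustration only).
doc_a = [3.0, 0.0, 1.0, 2.0]
doc_b = [1.0, 4.0, 0.0, 2.0]

unit_a = l2_normalize(doc_a)
unit_b = l2_normalize(doc_b)

# For unit vectors: ||a - b||^2 = 2 * (1 - cos(a, b))
lhs = squared_euclidean(unit_a, unit_b)
rhs = 2.0 * cosine_distance(doc_a, doc_b)
print(lhs, rhs)
```

With scikit-learn, the analogous step would be to pass `X` through `sklearn.preprocessing.normalize` before calling `model.fit`. Whether that is needed depends on how `X` was built, which the paste does not show.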