daily pastebin goal
25%
SHARE
TWEET

Untitled

a guest Feb 22nd, 2019 84 Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. def pagerank2(E, return_vector = False, max_iters = 1000, tolerance = 1e-6):
  2.     """compute page rank from dense adjacency matrix
  3.  
  4.     Inputs:
  5.       E: adjacency matrix with links going from cols to rows.
  6.          E is a matrix of 0s and 1s, where E[i,j] = 1 means
  7.          that web page (vertex) j has a link to web page i.
  8.       return_vector = False: If True, return the eigenvector as well as the ranking.
  9.       max_iters = 1000: Maximum number of power iterations to do.
  10.       tolerance = 1e-6: Stop when the eigenvector norm changes by less than this.
  11.      
  12.     Outputs:
  13.       ranking: Permutation giving the ranking, most important first
  14.       vector (only if return_vector is True): Dominant eigenvector of PageRank matrix
  15.  
  16.     This computes page rank by the following steps:
  17.     1. Add links from any dangling vertices to all vertices.
  18.     2. Scale the columns to sum to 1.
  19.     3. Add a constant matrix to represent jumping at random 15% of the time.
  20.     4. Find the dominant eigenvector with the power method.
  21.     5. Sort the eigenvector to get the rankings.
  22.  
  23.     The homework problem due February 22 asks you to rewrite this code so
  24.     it takes input E as a scipy csr_sparse matrix, and then never creates
  25.     a full matrix or any large matrix other than E.
  26.     """
  27.    
  28.     #E = sparse.csr_matrix(E)
  29.        
  30.                
  31.     #nnz = np.count_nonzero(E) # This call for sparse E may be different
  32.    # outdegree = sparse.csr_matrix.sum(E, axis=0).T  # This call for sparse E may be different
  33.     nrows, n = E.shape
  34.  
  35.     assert nrows == n, 'E must be square'
  36.     #assert np.max(E) == 1 and np.sum(E) == nnz, 'E must contain only zeros and ones'
  37.    
  38.     #  1. Add links from any dangling vertices to all other vertices.
  39.     #  3. Add a constant matrix to represent jumping at random 15% of the time.
  40.  
  41.     s = 1.0/n
  42.     m = 0.15
  43.     E = (1 - m) * E/sparse.csr_matrix.sum(E, 0) + m*s
  44.    
  45.     #  4. Find the dominant eigenvector.
  46.     #  Start with a vector all of whose entries are equal.
  47.    
  48.     e = np.ones(n)
  49.     v = e / npla.norm(e)
  50.    
  51.     for iteration in range(max_iters):
  52.         oldv = v
  53.         v = E.dot(v)
  54.         v = np.squeeze(np.asarray(v))
  55.         eigval = npla.norm(v)
  56.         v = v / eigval
  57.        
  58.         if npla.norm(v - oldv) < tolerance:
  59.             break
  60.    
  61.     if npla.norm(v - oldv) < tolerance:
  62.         print('Dominant eigenvalue is %f after %d iterations.\n' % (eigval, iteration+1))
  63.     else:
  64.         print('Did not converge to tolerance %e after %d iterations.\n' % (tolerance, max_iters))
  65.  
  66.     # Check that the eigenvector elements are all the same sign, and make them positive
  67.     #assert np.all(v > 0) or np.all(v < 0), 'Error: eigenvector is not all > 0 or < 0'
  68.     vector = np.abs(v)
  69.        
  70.     #  5. Sort the eigenvector and reverse the permutation to get the rankings.
  71.     ranking = np.argsort(vector)[::-1]
  72.  
  73.     if return_vector:
  74.         return ranking, vector
  75.     else:
  76.         return ranking
RAW Paste Data
We use cookies for various purposes including analytics. By continuing to use Pastebin, you agree to our use of cookies as described in the Cookies Policy. OK, I Understand
 
Top