Untitled

Posted by a guest on Sep 21st, 2018
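"""
I think I will need to:
- delete this line from the original method:
      sorted_df = sorted_df.reset_index(drop=True)
  It causes extra processing with no real benefit in this case. It builds a new
  DataFrame only to renumber the rows, and the index is discarded anyway because
  to_json(orient="records") does not write it. On a large dataset that extra pass
  over the data is not free.

- or, better still, replace these three lines:
      sorted_df = df.sort_values(by=['distance'])
      sorted_df = sorted_df.reset_index(drop=True)
      trimmed_df = sorted_df.drop('distance', axis=1).head(n)
  with:
      df = df.nsmallest(n, 'distance').drop('distance', axis=1)
  nsmallest only has to find the n smallest distances instead of sorting the whole
  frame (the pandas docs describe it as equivalent to sort_values(...).head(n), but
  more performant), and the distance column is then dropped from n rows instead of
  from the full dataset. Most of the memory saving comes from skipping those
  full-size intermediate copies; reassigning the result to df rather than to a new
  trimmed_df variable just lets the original frame be released a little earlier.

- Also, I could pass kind='mergesort' to sort_values instead of the default
  'quicksort'. The worst case is not the reason, though: NumPy's 'quicksort' is
  actually an introsort, which switches to heapsort and is already O(n log n) in
  the worst case rather than O(n^2). What mergesort adds is stability: rows with
  equal distances keep their order from the file.

- another option is to use numpy (argsort) on the underlying array to sort the
  distance column rather than the sort_values method. That skips some pandas
  overhead and may be faster, but it should be benchmarked on the real dataset
  first.

- another option is to compute the distances with scipy.spatial.distance instead
  of shapely. Note that pdist and squareform compute every pairwise distance within
  a single set of points (O(N^2) for N points), which is far more than needed here;
  for one query point, cdist gives just the N distances from the query to each
  point.
"""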
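The note about reset_index can be checked directly: to_json(orient="records") emits one object per row and never writes the index, so renumbering the rows first changes nothing in the output. A minimal sketch, assuming a tiny made-up frame in place of the real dataset:

```python
import json

import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({"x": [3.0, 1.0, 2.0], "y": [0.0, 0.0, 0.0], "distance": [3.0, 1.0, 2.0]})

sorted_df = df.sort_values(by=["distance"])   # index is now [1, 2, 0]
with_reset = sorted_df.reset_index(drop=True)  # index renumbered to [0, 1, 2]

# orient="records" ignores the index, so both produce the same list of dicts.
records_without_reset = json.loads(sorted_df.to_json(orient="records"))
records_with_reset = json.loads(with_reset.to_json(orient="records"))

print(records_without_reset == records_with_reset)  # prints True
```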
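The nsmallest replacement can be compared against the sort-then-head version on a small frame. This sketch uses made-up distinct distances (with ties, the two approaches are not guaranteed to break them the same way, because the default sort is not stable):

```python
import pandas as pd

# Made-up distances; all values are distinct so tie-breaking cannot differ.
df = pd.DataFrame({"name": ["a", "b", "c", "d", "e"],
                   "distance": [4.2, 0.5, 3.1, 1.7, 2.8]})
n = 3

# Original approach: full sort, drop the column from every row, then take n.
via_sort = df.sort_values(by=["distance"]).drop("distance", axis=1).head(n)

# Proposed approach: select the n smallest first, then drop the column from n rows.
via_nsmallest = df.nsmallest(n, "distance").drop("distance", axis=1)

print(via_nsmallest["name"].tolist())  # ['b', 'd', 'e']
print(via_sort.equals(via_nsmallest))  # True
```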
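For the mergesort option, the visible effect is on ties: kind="mergesort" is a stable sort, so rows with equal distances stay in their original order. A small sketch with made-up rows where "b" and "c" are tied:

```python
import pandas as pd

# Made-up rows: "b" and "c" are tied at distance 1.0, with "b" earlier in the file.
df = pd.DataFrame({"name": ["a", "b", "c", "d"],
                   "distance": [2.0, 1.0, 1.0, 0.5]})

# kind="mergesort" is stable: tied rows keep their original relative order.
stable = df.sort_values(by=["distance"], kind="mergesort")
print(stable["name"].tolist())  # ['d', 'b', 'c', 'a']
```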
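The numpy option could look like the sketch below: argsort runs on the raw array from .to_numpy(), and iloc then picks the rows by position. The data is made up, and any speed gain over sort_values is an assumption to benchmark, not a given:

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the dataset after the distance column is added.
df = pd.DataFrame({"name": ["a", "b", "c", "d"],
                   "distance": [4.0, 0.5, 2.5, 1.0]})
n = 2

# Positions of the rows in ascending distance order, computed on the NumPy array.
order = np.argsort(df["distance"].to_numpy(), kind="stable")

# Take the first n positions, then drop the helper column.
nearest = df.iloc[order[:n]].drop(columns="distance")
print(nearest["name"].tolist())  # ['b', 'd']
```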
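For the scipy option, scipy.spatial.distance.cdist fits a single query point better than pdist/squareform, since it only computes distances between two sets: here a 1-point set and the N dataset points. A sketch assuming the dataset's 'x' and 'y' columns, with made-up rows; the default metric is Euclidean, the same distance shapely's Point.distance returns for two points:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Made-up rows in the same shape as the dataset: 'x' and 'y' columns.
df = pd.DataFrame({"x": [3.0, 1.0, 0.0], "y": [4.0, 0.0, 2.0]})
x, y = 0.0, 0.0  # the request point

# cdist wants 2-D inputs: a (1, 2) query against the (N, 2) dataset coordinates.
# The result has shape (1, N); row 0 holds the distance from the query to each point.
df["distance"] = cdist([[x, y]], df[["x", "y"]].to_numpy(dtype=float))[0]
print(df["distance"].tolist())  # [5.0, 1.0, 2.0]
```

This one vectorized call would take the place of the row-wise df.apply(distance_calc, axis=1).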

# Module-level imports the methods below rely on.
import json

import pandas as pd
from shapely.geometry import Point


def nearest_n_with_package(self, params):
    """
    Return the n rows of the dataset closest to the point (params['x'], params['y']),
    in ascending order of distance, using the pandas and shapely packages.

    Args:
        params (dict): Dictionary containing the keys 'x', 'y' and 'n'.

    Returns:
        list: one dict per row, keyed by the dataset's column names
        (the temporary 'distance' column is not included).
    """
    x = float(params['x'])
    y = float(params['y'])
    n = int(params['n'])

    request_point = Point(x, y)
    # The dataset is a ';'-separated CSV with 'x' and 'y' columns.
    df = pd.read_csv(self.dataset_path, delimiter=';')

    def distance_calc(row):
        # Euclidean distance from the request point to this row's (x, y).
        data_point = Point(float(row['x']), float(row['y']))
        return request_point.distance(data_point)

    # Row-wise apply: distance_calc is called once per row.
    df['distance'] = df.apply(distance_calc, axis=1)

    sorted_df = df.sort_values(by=['distance'])
    sorted_df = sorted_df.reset_index(drop=True)
    trimmed_df = sorted_df.drop('distance', axis=1).head(n)

    # Serialize and parse back to get plain JSON-compatible values (NaN becomes None).
    json_string = trimmed_df.to_json(orient="records")

    return json.loads(json_string)


"""Refactored Method"""
def nearest_n_with_package(self, params):
    """
    Return the n rows of the dataset closest to the point (params['x'], params['y']),
    in ascending order of distance, using the pandas and shapely packages.

    Args:
        params (dict): Dictionary containing the keys 'x', 'y' and 'n'.

    Returns:
        list: one dict per row, keyed by the dataset's column names
        (the temporary 'distance' column is not included).
    """
    x = float(params['x'])
    y = float(params['y'])
    n = int(params['n'])

    request_point = Point(x, y)
    df = pd.read_csv(self.dataset_path, delimiter=';')

    def distance_calc(row):
        # Euclidean distance from the request point to this row's (x, y).
        data_point = Point(float(row['x']), float(row['y']))
        return request_point.distance(data_point)

    df['distance'] = df.apply(distance_calc, axis=1)
    # Keep the n rows with the smallest distance (ascending), then drop the helper column.
    df = df.nsmallest(n, 'distance').drop('distance', axis=1)

    json_string = df.to_json(orient="records")

    return json.loads(json_string)
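To try the refactored logic outside its class, here is a standalone sketch. The function name nearest_n and its csv_source parameter are hypothetical: csv_source stands in for self.dataset_path, and math.hypot stands in for shapely's Point.distance (for two points both give the Euclidean distance), so the sketch runs without shapely. The sample CSV is made up.

```python
import io
import json
import math

import pandas as pd


def nearest_n(csv_source, x, y, n):
    """Hypothetical standalone version of the refactored method.

    csv_source is a path or file-like object with ';'-separated 'x' and 'y' columns.
    """
    df = pd.read_csv(csv_source, delimiter=';')

    # Euclidean distance to (x, y), standing in for shapely's Point.distance.
    df['distance'] = df.apply(
        lambda row: math.hypot(float(row['x']) - x, float(row['y']) - y), axis=1
    )

    # n smallest distances in ascending order, then drop the helper column.
    df = df.nsmallest(n, 'distance').drop('distance', axis=1)
    return json.loads(df.to_json(orient="records"))


# Made-up sample data in the dataset's ';'-separated format.
sample = io.StringIO("name;x;y\nfar;10;10\nnear;1;1\nmid;3;4\n")
print(nearest_n(sample, 0.0, 0.0, 2))
```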