a guest
Oct 22nd, 2017
"""
An example of adapting a non-predicting estimator (i.e., one
that doesn't expose a public ``predict`` method, but only a
``fit_predict`` one), such as ``LocalOutlierFactor``, to make
predictions on "unseen" data.

One could argue whether this particular approach is entirely
legitimate, since the absence of a ``predict`` method will, in
most sensible cases, have been a deliberate choice driven by
design and semantic constraints.

Nevertheless, for the adventurous crowd out there, I've
provided a rudimentary means of making predictions on
"unseen" data by subclassing the estimator in question.
"""

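import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import LocalOutlierFactor


SEED = 42


class LOFPredictor(LocalOutlierFactor):
    """LocalOutlierFactor with a public ``predict``, so GridSearchCV can score it."""

    def predict(self, X=None):
        # Delegates to the private ``_predict`` helper. As private API, it may
        # change or behave differently between scikit-learn versions.
        return self._predict(X)
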
rng = np.random.RandomState(SEED)

# Example settings
n_samples = 200
true_outliers_fraction = 0.25
offset = 2

# Integer division: np.linspace needs an integer number of points.
xx, yy = np.meshgrid(np.linspace(-7, 7, n_samples // 2),
                     np.linspace(-7, 7, n_samples // 2))
n_outliers = int(true_outliers_fraction * n_samples)
n_inliers = n_samples - n_outliers
# Ground-truth labels: 1 for inliers, -1 for outliers (the last n_outliers rows)
y_true = np.ones(n_samples, dtype=int)
y_true[-n_outliers:] = -1

np.random.seed(SEED)
# Data generation: two Gaussian inlier clusters centred at -offset and +offset
X1 = 0.3 * np.random.randn(n_inliers // 2, 2) - offset
X2 = 0.3 * np.random.randn(n_inliers // 2, 2) + offset
X = np.concatenate([X1, X2], axis=0)
# Add outliers, drawn uniformly, after the inliers so they match y_true
X = np.concatenate([X, np.random.uniform(low=-6, high=6,
                                         size=(n_outliers, 2))], axis=0)

outliers_fraction = .25
kfold = KFold(n_splits=3, shuffle=True, random_state=SEED)

param_grid = [
    {
        'n_neighbors': (25, 29, 35),
        'contamination': (.25, .27, .3),
    },
]

# Accuracy is computed against y_true, which uses the same 1 / -1 labels
# that the estimator's predict returns.
clf = GridSearchCV(LOFPredictor(), param_grid=param_grid, scoring="accuracy",
                   cv=kfold, n_jobs=-1)
clf.fit(X, y_true)

print("Best params: {}".format(clf.best_params_))
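

# ---------------------------------------------------------------------------
# Sketch: predicting on "unseen" data without subclassing.
#
# A minimal, hedged alternative to the subclass approach described above,
# assuming scikit-learn >= 0.20, where LocalOutlierFactor accepts
# ``novelty=True`` and then exposes a public ``predict`` for new samples.
# The names ``demo_rng``, ``X_train``, ``X_new``, ``lof`` and ``y_new`` are
# hypothetical and chosen purely for illustration. This part is written to
# run on its own, hence the repeated imports.
# ---------------------------------------------------------------------------
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

demo_rng = np.random.RandomState(42)

# Two tight inlier clusters, in the spirit of the data generated above.
X_train = np.concatenate([0.3 * demo_rng.randn(75, 2) - 2,
                          0.3 * demo_rng.randn(75, 2) + 2], axis=0)

# Unseen points: one at a cluster centre, one far away from both clusters.
X_new = np.array([[2.0, 2.0], [-6.0, 6.0]])

# With novelty=True the model is fitted on training data only and can then
# label samples it has never seen (1 marks an inlier, -1 an outlier).
lof = LocalOutlierFactor(n_neighbors=25, novelty=True)
lof.fit(X_train)
y_new = lof.predict(X_new)

print("Predictions on unseen data: {}".format(y_new))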