Untitled

a guest, Jul 22nd, 2019
import datetime

import numpy as np
from sklearn.ensemble import IsolationForest

# Assumes `df` (a pandas DataFrame with 'query', 'device', 'date' and metric
# columns) and `top_queries_by_clicks` are already defined elsewhere.

def print_anomalies(query, column):
    # Restrict the data to desktop rows for this query.
    df_anom = df[(df['query'] == query) & (df['device'] == 'desktop')]
    x = df_anom[column].values.reshape(-1, 1)
    # Evenly spaced grid over the observed range of the column.
    xx = np.linspace(df_anom[column].min(), df_anom[column].max(), len(df)).reshape(-1, 1)

    isolation_forest = IsolationForest(n_estimators=100)
    isolation_forest.fit(x)

    # Lower scores are more anomalous; negative scores correspond to outliers.
    anomaly_score = isolation_forest.decision_function(xx)
    # predict() returns 1 for inliers and -1 for outliers.
    outlier = isolation_forest.predict(xx)
    df_outliers = df_anom[isolation_forest.predict(x) == -1]
    # Keep only the outliers from the last 14 days of data.
    df_outliers = df_outliers[df_outliers.date >= df.date.max() - datetime.timedelta(days=14)]
    print(df_outliers)

for q in top_queries_by_clicks:
    print_anomalies(q, 'impressions')
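
The two filters applied at the end of print_anomalies (keep rows labelled as outliers, then keep only those from the last 14 days) can be sketched without pandas or scikit-learn. The rows, their labels, and the helper name recent_outliers below are hypothetical, made up for illustration; the labels follow the predict() convention of 1 for inliers and -1 for outliers, and the cutoff is taken relative to the latest date among the rows passed in.

```python
import datetime

# Hypothetical rows: (date, impressions, label). Labels are hand-assigned here,
# not produced by a model: 1 = inlier, -1 = outlier.
rows = [
    (datetime.date(2019, 6, 20), 120, -1),
    (datetime.date(2019, 7, 10), 130, 1),
    (datetime.date(2019, 7, 15), 900, -1),
    (datetime.date(2019, 7, 22), 125, 1),
]

def recent_outliers(rows, days=14):
    """Return rows labelled -1 whose date is within `days` of the latest date."""
    cutoff = max(r[0] for r in rows) - datetime.timedelta(days=days)
    return [r for r in rows if r[2] == -1 and r[0] >= cutoff]

print(recent_outliers(rows))
```

With the default window, the June 20 outlier falls before the cutoff and is dropped, while the July 15 outlier is kept.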