Untitled

a guest
Feb 21st, 2018
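import pandas as pd

# Load the articles and keep only rows that have an author.
articles = pd.read_csv('articles.csv')
articles = articles[articles['author'].notna()].copy()
# Year with the most distinct article ids.
year = articles.groupby('year')['id'].nunique().idxmax()
# Strip leftover HTML tags, treat ' and ' and '&' as separators, then split into a list of names.
authors = articles['author'].astype(str).str.replace(r'</?(?:strong|sub)>', '', regex=True)
articles['author'] = authors.str.replace(' and ', ',', regex=False).str.replace('&', ',', regex=False).str.split(',')
# One row per (article, author); DataFrame.explode needs pandas 0.25 or newer.
articles = articles.explode('author').reset_index(drop=True)
# Trim whitespace so 'A, B' does not yield ' B', and drop empty names.
articles['author'] = articles['author'].str.strip()
articles = articles[articles['author'] != '']
# Ten authors with the most distinct articles in that year.
top_authors = articles.loc[articles['year'] == year].groupby('author')['id'].nunique().nlargest(10)
print(top_authors)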
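
The author-splitting step above can be checked without articles.csv. What follows is a minimal sketch, not the paste's own code: the helper name split_authors and the sample rows are hypothetical, and it assumes pandas 0.25 or newer for DataFrame.explode.

```python
import pandas as pd


def split_authors(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (article, author) from a free-text 'author' column."""
    names = (
        df['author'].astype(str)
        .str.replace(r'</?(?:strong|sub)>', '', regex=True)  # drop leftover HTML tags
        .str.replace(' and ', ',', regex=False)              # ' and ' separates names
        .str.replace('&', ',', regex=False)                  # so does '&'
        .str.split(',')
    )
    out = df.assign(author=names).explode('author').reset_index(drop=True)
    out['author'] = out['author'].str.strip()                # 'A, B' must not give ' B'
    return out[out['author'] != ''].reset_index(drop=True)


# Hypothetical sample rows, not taken from articles.csv.
sample = pd.DataFrame({
    'id': [1, 2, 3],
    'year': [2017, 2017, 2016],
    'author': [
        '<strong>Ann Lee</strong> and Bob Ray',
        'Ann Lee & Cy Day, Bob Ray',
        'Cy Day',
    ],
})

rows = split_authors(sample)
# Distinct articles per author in 2017 only.
counts = rows[rows['year'] == 2017].groupby('author')['id'].nunique()
print(counts.sort_values(ascending=False))
```

Trimming whitespace after the split matters here: without it, 'Bob Ray' and ' Bob Ray' would be counted as two different authors.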