a guest
Sep 20th, 2019
# Note: the pandas DataFrame built from the dataset is named article_df, and the
# article text is stored in its 'content' column.

import string

# Replace each non-ASCII character with a space:
for index, row in article_df.iterrows():
    article_df.loc[index, 'content'] = ''.join(
        [i if ord(i) < 128 else ' ' for i in article_df.loc[index, 'content']])

# Remove punctuation:
for index, row in article_df.iterrows():
    article_df.loc[index, 'content'] = (
        article_df.loc[index, 'content']
        .translate(str.maketrans('', '', string.punctuation)))
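
# The two loops above can probably be collapsed into a single pass over the
# 'content' column. What follows is only a sketch under stated assumptions:
# the helper clean_text, the table punct_table, and the sample rows are
# hypothetical and not part of the original paste, and every value in
# 'content' is assumed to be a string (no missing values).

```python
import string

import pandas as pd

# Hypothetical sample data standing in for the real article_df.
article_df = pd.DataFrame({'content': ['Café, au lait!', 'Plain text.']})

# Build the translation table once; it deletes every ASCII punctuation character.
punct_table = str.maketrans('', '', string.punctuation)


def clean_text(text):
    """Replace each non-ASCII character with a space, then drop punctuation."""
    ascii_only = ''.join(ch if ord(ch) < 128 else ' ' for ch in text)
    return ascii_only.translate(punct_table)


# Apply the helper to every row of the column in one call.
article_df['content'] = article_df['content'].map(clean_text)
```

# Mapping a function over the column avoids the per-row .loc assignment and
# builds the punctuation table once rather than once per row, which may matter
# on a large dataset.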