Untitled

Posted by a guest on Sep 20th, 2019.

# The pandas DataFrame created from the dataset is named article_df, and the
# article text is stored in its 'content' column.
import string

# Remove non-ASCII characters (code points >= 128), replacing each with a space:
article_df['content'] = article_df['content'].apply(
    lambda text: ''.join(ch if ord(ch) < 128 else ' ' for ch in text))

# Remove punctuation (string.punctuation holds ASCII punctuation only):
punct_table = str.maketrans('', '', string.punctuation)
article_df['content'] = article_df['content'].apply(
    lambda text: text.translate(punct_table))
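The same two cleaning steps can also be expressed with pandas' vectorized string methods instead of the row-by-row versions above. This is a minimal sketch of a regex-based alternative, not the original author's method; the small `article_df` built here is hypothetical sample data standing in for the real dataset. The punctuation pattern is built from `string.punctuation` with `re.escape`, so it removes exactly the same characters as the `str.maketrans` table.

```python
import re
import string

import pandas as pd

# Hypothetical sample data standing in for the real article_df.
article_df = pd.DataFrame({'content': ['Café au lait, please!',
                                       'Hello, naïve world: "quotes"']})

# Replace every character outside the ASCII range (0-127) with a space.
article_df['content'] = article_df['content'].str.replace(
    r'[^\x00-\x7f]', ' ', regex=True)

# Delete ASCII punctuation; re.escape keeps characters like ] - \ ^ literal
# inside the character class.
punct_pattern = '[' + re.escape(string.punctuation) + ']'
article_df['content'] = article_df['content'].str.replace(
    punct_pattern, '', regex=True)

print(article_df['content'].tolist())
# ['Caf  au lait please', 'Hello na ve world quotes']
```

As the output shows, blanking non-ASCII characters splits accented words ("Café" becomes "Caf "), so this step is best suited to text that is expected to be mostly ASCII.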