# Note: the pandas DataFrame created from the dataset is named article_df, and the
# article text is stored in its 'content' column.
import string

# Remove non-ASCII characters by replacing each one with a space:
for index, row in article_df.iterrows():
    article_df.loc[index, 'content'] = ''.join([i if ord(i) < 128 else ' '
                                                for i in article_df.loc[index, 'content']])

# Remove punctuation (every character listed in string.punctuation):
for index, row in article_df.iterrows():
    article_df.loc[index, 'content'] = (article_df.loc[index, 'content']
                                        .translate(str.maketrans('', '', string.punctuation)))
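
The two row-by-row loops above can also be folded into a single pass over the column. This is a minimal sketch, assuming every entry in 'content' is a string (no missing values). The sample DataFrame and the helper name clean_text are illustrative and not part of the original paste.

```python
import string

import pandas as pd

# Hypothetical sample data standing in for the real article_df.
article_df = pd.DataFrame({'content': ['Café, déjà vu!', 'Plain text.']})

# Build the translation table once; it deletes every ASCII punctuation character.
punct_table = str.maketrans('', '', string.punctuation)


def clean_text(text):
    """Replace each non-ASCII character with a space, then drop punctuation."""
    ascii_only = ''.join(ch if ord(ch) < 128 else ' ' for ch in text)
    return ascii_only.translate(punct_table)


# Apply the cleaner to the whole column instead of looping with iterrows().
article_df['content'] = article_df['content'].map(clean_text)
```

Building the translation table once and mapping a single function over the column avoids rebuilding the table and doing a .loc lookup for every row, which is likely to matter on a large dataset. The output should match the loops above, since the same two steps run in the same order.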