Guest User

Untitled

a guest
Jan 19th, 2018
import pandas as pd

# Create a DataFrame with 10 rows and 2 columns, 'code' and 'URL'
df = pd.DataFrame({
    'code': [1, 1, 2, 2, 3, 4, 1, 2, 2, 5],
    'URL': ['www.abc.de', 'https://www.abc.fr/-de', 'www.abc.fr', 'www.abc.fr',
            'www.abc.co.uk', 'www.abc.es', 'www.abc.de', 'www.abc.fr',
            'www.abc.fr', 'www.abc.it'],
})

# Create a new DataFrame that keeps only the rows where 'code' is equal to 1
new_df = df[df['code'] == 1]

# This is what the new DataFrame looks like
print(new_df)
#                       URL  code
# 0              www.abc.de     1
# 1  https://www.abc.fr/-de     1
# 6              www.abc.de     1

# The dtypes, for reference
print(new_df.dtypes)
# URL     object
# code     int64
# dtype: object

# Now I am trying to drop every row whose 'URL' contains the literal text '.de'.
# Only the 2nd row of new_df above ('https://www.abc.fr/-de') should remain.
new_df = new_df[~new_df['URL'].str.contains(r".de", case=True)]

# This is what the output looks like: every row has been dropped
print(new_df)
# Empty DataFrame
# Columns: [URL, code]
# Index: []

# For comparison, the R version
# (fixed = TRUE matches '.de' literally, invert = TRUE keeps the non-matching rows):
# new_df <- new_df[grep(".de", new_df$URL, fixed = TRUE, invert = TRUE), ]

# Desired output for new_df
#                    URL  code
# https://www.abc.fr/-de     1
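
A likely cause of the empty result, sketched here on the assumption that the goal is the same literal match as the R call with fixed = TRUE: `str.contains` treats its pattern as a regular expression by default, so the `.` in `".de"` matches any character, including the `-` in `/-de`. All three URLs therefore match, and negating the mask drops them all. Two possible ways to match the dot literally are shown below; the variable names `literal` and `escaped` are only illustrative.

```python
import pandas as pd

# Same data as in the paste above
df = pd.DataFrame({
    'code': [1, 1, 2, 2, 3, 4, 1, 2, 2, 5],
    'URL': ['www.abc.de', 'https://www.abc.fr/-de', 'www.abc.fr', 'www.abc.fr',
            'www.abc.co.uk', 'www.abc.es', 'www.abc.de', 'www.abc.fr',
            'www.abc.fr', 'www.abc.it'],
})
new_df = df[df['code'] == 1]

# Option 1: turn regex matching off so '.de' is compared as plain text
# (the closest counterpart to fixed = TRUE in R's grep)
literal = new_df[~new_df['URL'].str.contains(".de", regex=False)]

# Option 2: keep regex matching but escape the dot
escaped = new_df[~new_df['URL'].str.contains(r"\.de")]

# Both keep only the row with index 1, 'https://www.abc.fr/-de'
print(literal)
```

Option 1 is probably the simpler choice when the pattern is a fixed string; option 2 is useful if the pattern later needs real regex features such as anchoring with `$`.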