import pandas as pd
# Create a DataFrame with 10 rows and 2 columns, 'code' and 'URL'
df = pd.DataFrame({'code': [1, 1, 2, 2, 3, 4, 1, 2, 2, 5],
                   'URL': ['www.abc.de', 'https://www.abc.fr/-de', 'www.abc.fr', 'www.abc.fr', 'www.abc.co.uk',
                           'www.abc.es', 'www.abc.de', 'www.abc.fr', 'www.abc.fr', 'www.abc.it']})
# Create a new DataFrame by keeping only the rows where the column 'code' is equal to 1
new_df = df[df['code'] == 1]
# This is what the new DataFrame looks like:
print(new_df)
                      URL  code
0              www.abc.de     1
1  https://www.abc.fr/-de     1
6              www.abc.de     1
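The filter `new_df = df[df['code'] == 1]` works through a boolean mask: comparing the column to a scalar yields a True/False Series aligned on the index, and indexing with it keeps only the True rows while preserving the original index labels (0, 1, 6). A minimal sketch on the same data, using `.loc` as an equivalent spelling of the plain `[]` indexing:

```python
import pandas as pd

df = pd.DataFrame({'code': [1, 1, 2, 2, 3, 4, 1, 2, 2, 5],
                   'URL': ['www.abc.de', 'https://www.abc.fr/-de', 'www.abc.fr', 'www.abc.fr',
                           'www.abc.co.uk', 'www.abc.es', 'www.abc.de', 'www.abc.fr',
                           'www.abc.fr', 'www.abc.it']})

# Comparing a column to a scalar yields a boolean Series aligned on df's index
mask = df['code'] == 1
print(mask.tolist())
# [True, True, False, False, False, False, True, False, False, False]

# .loc with a boolean mask keeps only the rows where the mask is True;
# the original index labels are preserved, not renumbered
new_df = df.loc[mask]
print(new_df.index.tolist())  # [0, 1, 6]
```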
# Below are the dtypes for reference
print(new_df.dtypes)
URL     object
code     int64
dtype: object
# Now I am trying to exclude all rows where the 'URL' column contains the literal substring '.de'. This should retain only the 2nd row (index 1) of new_df from the output above
new_df = new_df[~new_df['URL'].str.contains(r".de", case=True)]
# This is what the output looks like:
print(new_df)
Empty DataFrame
Columns: [URL, code]
Index: []
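The result is empty because `str.contains` treats its pattern as a regular expression by default (`regex=True`), and in a regex `.` matches any single character. So `.de` matches the literal `.de` in `www.abc.de` and also the `-de` in `https://www.abc.fr/-de`; every row matches, and `~` then drops them all. A minimal sketch reproducing this on the three filtered rows:

```python
import re

import pandas as pd

new_df = pd.DataFrame({'code': [1, 1, 1],
                       'URL': ['www.abc.de', 'https://www.abc.fr/-de', 'www.abc.de']},
                      index=[0, 1, 6])

# Default regex=True: '.' is a wildcard, so '-de' also matches '.de'
mask = new_df['URL'].str.contains(r".de", case=True)
print(mask.tolist())  # [True, True, True]

# Same behaviour with the standard-library re module
print(bool(re.search(r".de", "https://www.abc.fr/-de")))  # True

# Negating an all-True mask selects no rows at all
print(new_df[~mask].empty)  # True
```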
# In R, this line gives the desired result:
new_df <- new_df[grep(".de", new_df$URL, fixed = TRUE, invert = TRUE), ]
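In R, `fixed = TRUE` makes `grep` match the pattern literally. In pandas, the corresponding option is `regex=False` on `str.contains`; alternatively, escaping the dot (`r"\.de"`) keeps regex mode but matches only a literal `.`. A minimal sketch of both, again on the three filtered rows:

```python
import pandas as pd

new_df = pd.DataFrame({'code': [1, 1, 1],
                       'URL': ['www.abc.de', 'https://www.abc.fr/-de', 'www.abc.de']},
                      index=[0, 1, 6])

# Option 1: literal substring match, analogous to grep(fixed = TRUE) in R
literal = new_df[~new_df['URL'].str.contains(".de", regex=False)]

# Option 2: stay in regex mode but escape the dot so it only matches '.'
escaped = new_df[~new_df['URL'].str.contains(r"\.de", regex=True)]

# Both approaches keep only the row with index label 1
print(literal.index.tolist(), escaped.index.tolist())  # [1] [1]
```

Here `~` plays the role of R's `invert = TRUE`. The kept row still carries index label 1; `reset_index(drop=True)` gives a fresh 0-based index if that is wanted.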
# Desired output for new_df
URL                     code
https://www.abc.fr/-de     1