Untitled

a guest
May 29th, 2017
# The most common form is to de-duplicate the entire data.frame.

dat <- readRDS("some_data_frame.rds")
# Logical indexing keeps every row when nothing is duplicated;
# dat[-which(...), ] would return zero rows in that case.
dat_dd <- dat[!duplicated(dat), ]

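# Illustrative sketch, not part of the original paste: the data frame `toy`
# is hypothetical. It shows why logical indexing with !duplicated() is a safer
# habit than negative indexing with which(). When no row is duplicated,
# which() returns integer(0), and toy[-integer(0), ] selects zero rows
# rather than all of them.
toy <- data.frame(id = c(1, 2, 3), grp = c("a", "b", "c"))
w_toy <- which(duplicated(toy))          # integer(0): nothing is duplicated
n_negative <- nrow(toy[-w_toy, ])        # 0 rows survive
n_logical <- nrow(toy[!duplicated(toy), ])  # all 3 rows survive
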
# Another form of de-duplication is to eliminate redundancy based on a single column of the data.frame.

dat_dd <- dat[!duplicated(dat$some_column), ]

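# Illustrative sketch with a hypothetical data frame `toy_col`: duplicated()
# flags every occurrence after the first, so the first row per value is the
# one that is kept. Passing fromLast = TRUE keeps the last occurrence instead.
toy_col <- data.frame(key = c("a", "b", "a", "c", "b"), val = 1:5)
first_kept <- toy_col[!duplicated(toy_col$key), ]                   # val 1, 2, 4
last_kept <- toy_col[!duplicated(toy_col$key, fromLast = TRUE), ]   # val 3, 4, 5
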
# However, the most flexible form is de-duplicating based on several columns:
# a row is dropped only if all of the listed columns match an earlier row.

dat_dd <- dat[!duplicated(dat[, c('some_column', 'another_column')]), ]

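# Illustrative sketch with a hypothetical data frame `toy_multi`: only the
# combination of x and y is compared, so z plays no part in the decision.
toy_multi <- data.frame(x = c(1, 1, 1, 2), y = c("a", "a", "b", "a"), z = 1:4)
multi_dd <- toy_multi[!duplicated(toy_multi[, c("x", "y")]), ]
# rows with z = 1, 3, 4 remain; the row with z = 2 repeats the pair (1, "a")
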
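# This becomes very useful when you want one row per group holding the maximum or minimum of a given column.

# find the maximum: sort so the largest some_column comes first
dat <- dat[order(-dat$some_column), ]

# duplicated() keeps the first row it sees per key, so de-duplicate on the
# grouping column only; including some_column in the key would keep every
# distinct (some_column, another_column) pair instead of one row per group
# dataset with the maximum some_column for each unique another_column
dat_dd <- dat[!duplicated(dat$another_column), ]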
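
# Illustrative sketch with a hypothetical data frame `toy_max` (columns `grp`
# and `score` are made up): sort, then keep the first row per group. Sorting
# in descending order gives the per-group maximum; sorting in ascending order
# gives the per-group minimum.
toy_max <- data.frame(grp = c("a", "b", "a", "b"), score = c(3, 7, 9, 2))
by_max <- toy_max[order(-toy_max$score), ]
max_rows <- by_max[!duplicated(by_max$grp), ]   # a: 9, b: 7
by_min <- toy_max[order(toy_max$score), ]
min_rows <- by_min[!duplicated(by_min$grp), ]   # b: 2, a: 3
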