Untitled

a guest
Oct 18th, 2018
library(data.table)  # fread, rbindlist
library(stringr)     # str_detect, str_replace

# Get the full path of every file in a particular directory.
# (dirs and j are defined elsewhere and are not shown in this paste.)
files <- list.files(path = dirs[j], full.names = TRUE)

# The files are CSVs that ideally have exactly 5 columns. However, the 5th column can contain an arbitrary number of commas. If I fread with sep = ",", some rows end up arbitrarily long. If I use select = 1:5 to subset each row, I lose data.

# My solution is to read each line into a single column, then separate it into columns within the script based on the location of the first 4 commas.
data <- rbindlist(lapply(files, fread, sep = "\n", fill = TRUE, header = FALSE))

# Remove empty rows.
retain <- unlist(lapply(data, function(x) {
  str_detect(x, ".")
}))
data <- data[retain, ]

# Remove rows where there is no data in the 5th column.
retain <- unlist(lapply(data, function(x) {
  str_detect(trimws(x, which = "both"), ".+,.+,.*,.*,.+")
}))
data <- data[retain, ]

# Replace the first 4 commas in each line with a tab delimiter.
for (i in 1:4) {
  data <- data.frame(lapply(data, function(x) {
    str_replace(x, ",", "\t")
  }), stringsAsFactors = FALSE)
}

# Split each row on the tabs. This yields exactly 5 fields per row, provided the data contains no tabs of its own.
data <- unlist(lapply(data, function(x) {
  unlist(strsplit(x, "\t", fixed = TRUE))
}))

# Convert the character vector to a data frame with 5 columns.
data <- data.frame(matrix(data, ncol = 5, byrow = TRUE), stringsAsFactors = FALSE)
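
The tab-substitution steps above could possibly be collapsed into a single regular-expression match that captures the first four comma-free fields and leaves everything after the fourth comma in the fifth field. The following is only a minimal sketch in base R, not the method used in this paste: `split_first_four` and the sample lines are hypothetical names and data introduced for illustration. It drops lines with fewer than four commas, but unlike the filter above it keeps lines whose 5th field is empty.

```r
# Hypothetical helper (not part of the original script): split each line
# at its first four commas, keeping any later commas inside field 5.
split_first_four <- function(lines) {
  # Four comma-free fields, then everything that remains as the fifth field.
  pattern <- "^([^,]*),([^,]*),([^,]*),([^,]*),(.*)$"
  matches <- regmatches(lines, regexec(pattern, lines))
  # A successful match has 6 elements: the full match plus 5 capture groups.
  matches <- matches[lengths(matches) == 6]
  # Drop the full match and stack the capture groups row by row.
  fields <- do.call(rbind, lapply(matches, function(m) m[-1]))
  # If nothing matched, return an empty 5-column frame instead of failing.
  if (is.null(fields)) fields <- matrix(character(0), ncol = 5)
  as.data.frame(fields, stringsAsFactors = FALSE)
}

# Made-up sample lines, standing in for the single-column fread output.
example <- c("a,b,c,d,free text, with, commas",
             "1,2,3,4,plain",
             "too,few,commas")
result <- split_first_four(example)
print(result)
```

Because the split is done by pattern rather than by inserting a delimiter, this variant does not depend on the data being free of tabs.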