Untitled

By: a guest on Jun 21st, 2012  |  syntax: None  |  size: 1.13 KB  |  hits: 13  |  expires: Never
Record linking data frames with function RLBigDataLinkage in package RecordLinkage in R

> head(temp.compustat[order(temp.compustat$CONML, decreasing = T), ])
        GVKEY             CONML
225994  13023  ZZZZ Best Co Inc
211017  11696       Zytrex Corp
213816  11951 Zytec Systems Inc
309886  29163        Zytec Corp
373950 129441         Zynex Inc
383184 145228  ZymoGenetics Inc
> dim(temp.compustat)
[1] 31354     2

> head(temp.dealscan[ order(temp.dealscan$company, decreasing = T), ])
      companyid            company
70473     18192         Zytec Corp
32025     16969        Zynaxis Inc
19714     92271 ZYGO Teraoptix Inc
80473     13185          Zygo Corp
1901      24303 Zycon Corp SDN Bhd
33993     21219         Zycon Corp

> dim(temp.dealscan)
[1] 85818     2

> library(RecordLinkage)
> rpairs <- RLBigDataLinkage(dataset1 = temp.compustat, dataset2 = temp.dealscan, exclude = 1, strcmp = 2, strcmpfun = "levenshtein")
> result <- epiClassify(rpairs, threshold.upper = 0.5)
Error in if (max <= min) stop("must have max > min") :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In nData1 * nData2 : NAs produced by integer overflow
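
The warning points at a likely cause. The two data frames have 31354 and 85818 rows, and the product of those counts is larger than the biggest value an R integer can hold (2147483647). The sketch below reproduces only that arithmetic. It assumes the package multiplies the two row counts as integers, which the warning text `nData1 * nData2` suggests but which is not shown in the session. The names `n_compustat`, `n_dealscan`, `n_pairs_int` and `n_pairs_dbl` are hypothetical and not part of RecordLinkage.

```r
# Row counts taken from the dim() output in the session above.
n_compustat <- 31354L
n_dealscan  <- 85818L

# Integer multiplication past .Machine$integer.max returns NA with an
# "NAs produced by integer overflow" warning (suppressed here).
n_pairs_int <- suppressWarnings(n_compustat * n_dealscan)
print(is.na(n_pairs_int))                   # TRUE

# Converting to double before multiplying keeps the product exact.
n_pairs_dbl <- as.numeric(n_compustat) * as.numeric(n_dealscan)
print(n_pairs_dbl > .Machine$integer.max)   # TRUE
```

If the pair count does become NA this way, a later comparison on it such as `max <= min` would also be NA. An NA condition inside `if` produces the "missing value where TRUE/FALSE needed" error seen above.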