
Untitled

a guest
Aug 19th, 2023
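`msa()` seems unable to process input sequences from a FASTA file. I keep getting the error below (`t` is an input FASTA file with two short amino-acid sequences):

```r
library(Biostrings)  # AAStringSet(), writeXStringSet()
library(msa)

# Two short protein sequences that differ at a single position (D vs H)
v <- AAStringSet(c(mt="FIPLLVVILFAVDTGLFISTQQQVT",wt="FIPLLVVILFAVHTGLFISTQQQVT"))
t <- tempfile(fileext=".fasta")
on.exit(unlink(t),add=TRUE)
writeXStringSet(v,t)
# Pass the path to the FASTA file rather than the AAStringSet itself
m <- msa(t, type="protein")
```
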
Output:

```
use default substitution matrix

ERROR: Cannot open input file. No alignment!

Error in msaClustalW(t, type = "protein") : ClustalW finished with errors
```

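If I track down what happens to the `inputSeq` argument in `msa()`, it seems to get upper-cased regardless of whether it is a file path or an input `AAStringSet`:

https://code.bioconductor.org/browse/msa/blob/RELEASE_3_17/R/helperFunctions.R#L56 (see line 56)

For a file path, this upper-cases the path string itself, so on a case-sensitive file system such as this Ubuntu machine's the resulting path no longer exists, which would explain ClustalW's "Cannot open input file". If I change that line to just `return(inputSeq)`, it works:
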
```
> devtools::load_all("~/git/bioconductor/msa/")
ℹ Loading msa
> v <- AAStringSet(c(mt="FIPLLVVILFAVDTGLFISTQQQVT",wt="FIPLLVVILFAVHTGLFISTQQQVT"))
t <- tempfile(fileext=".fasta")
on.exit(unlink(t),add=TRUE)
writeXStringSet(v,t)
m <- msa(t, type="protein")
use default substitution matrix
> m
CLUSTAL 2.1

Call:
msa(t, type = "protein")

MsaAAMultipleAlignment with 2 rows and 25 columns
aln
[1] FIPLLVVILFAVDTGLFISTQQQVT
[2] FIPLLVVILFAVHTGLFISTQQQVT
Con FIPLLVVILFAV?TGLFISTQQQVT
>
```
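
A minimal base-R sketch of the suspected failure mode, using the same two sequences: upper-casing a temp-file path yields a path that does not exist on a case-sensitive file system. It also shows one possible guard. `normalizeInput()` is a hypothetical name for illustration only, not msa's actual helper, and the real code at line 56 may be structured differently.

```r
# Base R only: why upper-casing inputSeq can break FASTA-file input.
p <- tempfile(fileext = ".fasta")
writeLines(c(">mt", "FIPLLVVILFAVDTGLFISTQQQVT",
             ">wt", "FIPLLVVILFAVHTGLFISTQQQVT"), p)

file.exists(p)           # TRUE: the file we just wrote
file.exists(toupper(p))  # FALSE on a case-sensitive file system (e.g. Linux)

# Hypothetical guard (NOT msa's actual code): leave an existing file path
# untouched and upper-case only literal sequence strings.
normalizeInput <- function(inputSeq) {
    # Only character input is handled in this sketch; other objects pass through.
    if (!is.character(inputSeq)) return(inputSeq)
    # A single string naming an existing file is treated as a path.
    if (length(inputSeq) == 1L && file.exists(inputSeq)) return(inputSeq)
    toupper(inputSeq)
}

normalizeInput(p)        # returns p unchanged
normalizeInput("fipll")  # returns "FIPLL"
```

The `file.exists()` check is a heuristic: a sequence string that happens to match an existing file name would be treated as a path, so an explicit "is this a file?" argument may be the safer design.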

```
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Oslo
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices datasets utils methods
[8] base

other attached packages:
[1] msa_1.33.1 Biostrings_2.68.1 GenomeInfoDb_1.36.1
[4] XVector_0.40.0 IRanges_2.34.1 S4Vectors_0.38.1
[7] BiocGenerics_0.46.0

loaded via a namespace (and not attached):
[1] renv_1.0.2 bitops_1.0-7 stringi_1.7.12
[4] digest_0.6.33 magrittr_2.0.3 pkgload_1.3.2.1
[7] fastmap_1.1.1 rprojroot_2.0.3 processx_3.8.2
[10] pkgbuild_1.4.2 sessioninfo_1.2.2 brio_1.1.3
[13] urlchecker_1.0.1 ps_1.7.5 promises_1.2.1
[16] BiocManager_1.30.22 purrr_1.0.2 codetools_0.2-19
[19] cli_3.6.1 shiny_1.7.5 rlang_1.1.1
[22] crayon_1.5.2 ellipsis_0.3.2 remotes_2.4.2.1
[25] withr_2.5.0 cachem_1.0.8 devtools_2.4.5
[28] tools_4.3.1 memoise_2.0.1 GenomeInfoDbData_1.2.10
[31] httpuv_1.6.11 vctrs_0.6.3 R6_2.5.1
[34] mime_0.12 lifecycle_1.0.3 zlibbioc_1.46.0
[37] stringr_1.5.0 fs_1.6.3 htmlwidgets_1.6.2
[40] usethis_2.2.2 miniUI_0.1.1.1 desc_1.4.2
[43] callr_3.7.3 later_1.3.1 glue_1.6.2
[46] profvis_0.3.8 Rcpp_1.0.11 rstudioapi_0.15.0
[49] xtable_1.8-4 htmltools_0.5.6 testthat_3.1.10
[52] compiler_4.3.1 prettyunits_1.1.1 RCurl_1.98-1.12
```
