- msa() seems unable to read input sequences from a FASTA file. With t being an input FASTA file that contains two short AA sequences, I keep getting the error shown below:
- ```r
- library(msa)  # also attaches Biostrings, which provides AAStringSet() and writeXStringSet()
- v <- AAStringSet(c(mt="FIPLLVVILFAVDTGLFISTQQQVT",wt="FIPLLVVILFAVHTGLFISTQQQVT"))
- t <- tempfile(fileext=".fasta")
- on.exit(unlink(t),add=TRUE)
- writeXStringSet(v,t)
- m <- msa(t, type="protein")
- ```
- Output:
- ```
- use default substitution matrix
- ERROR: Cannot open input file. No alignment!
- Error in msaClustalW(t, type = "protein") : ClustalW finished with errors
- ```
- Tracing what happens to the inputSeq argument inside msa(), it appears to be converted to upper case regardless of whether it is a file path or an AAStringSet. On a case-sensitive file system the upper-cased path presumably no longer points to the temporary file, which would explain the "Cannot open input file" error:
- https://code.bioconductor.org/browse/msa/blob/RELEASE_3_17/R/helperFunctions.R#L56 (see line 56)
- If I change that line to just return(inputSeq), it works:
- ```
- > devtools::load_all("~/git/bioconductor/msa/")
- ℹ Loading msa
- > v <- AAStringSet(c(mt="FIPLLVVILFAVDTGLFISTQQQVT",wt="FIPLLVVILFAVHTGLFISTQQQVT"))
- t <- tempfile(fileext=".fasta")
- on.exit(unlink(t),add=TRUE)
- writeXStringSet(v,t)
- m <- msa(t, type="protein")
- use default substitution matrix
- > m
- CLUSTAL 2.1
- Call:
- msa(t, type = "protein")
- MsaAAMultipleAlignment with 2 rows and 25 columns
- aln
- [1] FIPLLVVILFAVDTGLFISTQQQVT
- [2] FIPLLVVILFAVHTGLFISTQQQVT
- Con FIPLLVVILFAV?TGLFISTQQQVT
- >
- ```
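The suspected mechanism can be illustrated with base R alone. This is only a sketch and not code from the msa package: the variable names `p` and `p_upper` are made up for illustration, and the result of the second `file.exists()` call depends on whether the file system is case-sensitive (as it typically is on Linux).

```r
# Hypothetical illustration only; this is not code from the msa package.
# Create a small FASTA file at a temporary path.
p <- tempfile(fileext = ".fasta")
writeLines(c(">mt", "FIPLLVVILFAVDTGLFISTQQQVT"), p)

# Upper-case the path, as seems to happen to inputSeq inside msa().
p_upper <- toupper(p)

# Compare the two paths and check whether each one resolves to a file.
print(identical(p, p_upper))
print(file.exists(p))        # the path as originally created
print(file.exists(p_upper))  # the same path after upper-casing

# Clean up the temporary file.
unlink(p)
```

If the upper-cased path does not resolve to an existing file, any external tool that is handed that path would be unable to open it, which would be consistent with the ClustalW error above.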
- ```
- > sessionInfo()
- R version 4.3.1 (2023-06-16)
- Platform: x86_64-pc-linux-gnu (64-bit)
- Running under: Ubuntu 22.04.3 LTS
- Matrix products: default
- BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
- LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3; LAPACK version 3.10.0
- locale:
- [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
- [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
- [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
- [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
- [9] LC_ADDRESS=C LC_TELEPHONE=C
- [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
- time zone: Europe/Oslo
- tzcode source: system (glibc)
- attached base packages:
- [1] stats4 stats graphics grDevices datasets utils methods
- [8] base
- other attached packages:
- [1] msa_1.33.1 Biostrings_2.68.1 GenomeInfoDb_1.36.1
- [4] XVector_0.40.0 IRanges_2.34.1 S4Vectors_0.38.1
- [7] BiocGenerics_0.46.0
- loaded via a namespace (and not attached):
- [1] renv_1.0.2 bitops_1.0-7 stringi_1.7.12
- [4] digest_0.6.33 magrittr_2.0.3 pkgload_1.3.2.1
- [7] fastmap_1.1.1 rprojroot_2.0.3 processx_3.8.2
- [10] pkgbuild_1.4.2 sessioninfo_1.2.2 brio_1.1.3
- [13] urlchecker_1.0.1 ps_1.7.5 promises_1.2.1
- [16] BiocManager_1.30.22 purrr_1.0.2 codetools_0.2-19
- [19] cli_3.6.1 shiny_1.7.5 rlang_1.1.1
- [22] crayon_1.5.2 ellipsis_0.3.2 remotes_2.4.2.1
- [25] withr_2.5.0 cachem_1.0.8 devtools_2.4.5
- [28] tools_4.3.1 memoise_2.0.1 GenomeInfoDbData_1.2.10
- [31] httpuv_1.6.11 vctrs_0.6.3 R6_2.5.1
- [34] mime_0.12 lifecycle_1.0.3 zlibbioc_1.46.0
- [37] stringr_1.5.0 fs_1.6.3 htmlwidgets_1.6.2
- [40] usethis_2.2.2 miniUI_0.1.1.1 desc_1.4.2
- [43] callr_3.7.3 later_1.3.1 glue_1.6.2
- [46] profvis_0.3.8 Rcpp_1.0.11 rstudioapi_0.15.0
- [49] xtable_1.8-4 htmltools_0.5.6 testthat_3.1.10
- [52] compiler_4.3.1 prettyunits_1.1.1 RCurl_1.98-1.12
- ```
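Until the file-path handling is resolved, one possible workaround is to read the FASTA file into an AAStringSet first and pass that object to msa() instead of the path. This is a sketch under the assumption that upper-casing is harmless for the sequence data itself; it was not tested as part of the report above, and the variable names `fasta` and `seqs` are chosen here for illustration.

```r
library(msa)  # also attaches Biostrings

# Write the two example sequences to a temporary FASTA file.
v <- AAStringSet(c(mt = "FIPLLVVILFAVDTGLFISTQQQVT",
                   wt = "FIPLLVVILFAVHTGLFISTQQQVT"))
fasta <- tempfile(fileext = ".fasta")
writeXStringSet(v, fasta)

# Read the file back with Biostrings, so msa() receives an
# AAStringSet rather than a file path.
seqs <- readAAStringSet(fasta)

# Align the sequences (ClustalW is the default method).
m <- msa(seqs, type = "protein")
print(m)

# Clean up the temporary file.
unlink(fasta)
```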