- (base) [genomica@localhost RSEM]$ sudo ./rsem-generate-ngvector
- Invalid number of arguments!
- NAME
- rsem-generate-ngvector - Create Ng vector for EBSeq based only on
- transcript sequences.
- SYNOPSIS
- rsem-generate-ngvector [options] input_fasta_file output_name
- ARGUMENTS
- input_fasta_file
- The fasta file containing all reference transcripts. The transcripts
- must be in the same order as those in expression value files. Thus,
- 'reference_name.transcripts.fa' generated by
- 'rsem-prepare-reference' should be used.
- output_name
- The name of all output files. The Ng vector will be stored as
- 'output_name.ngvec'.
- OPTIONS
- -k <int>
- k mer length. See description section. (Default: 25)
- -h/--help
- Show help information.
- DESCRIPTION
- This program generates the Ng vector required by EBSeq for isoform level
- differential expression analysis based on reference sequences only.
- EBSeq can take variance due to read mapping ambiguity into consideration
- by grouping isoforms with parent gene's number of isoforms. However, for
- de novo assembled transcriptome, it is hard to obtain an accurate
- gene-isoform relationship. Instead, this program groups isoforms by
- using measures on read mapping ambiguity directly. First, it calculates
- the 'unmappability' of each transcript. The 'unmappability' of a
- transcript is the ratio between the number of k mers with at least one
- perfect match to other transcripts and the total number of k mers of
- this transcript, where k is a parameter. Then, Ng vector is generated by
- applying Kmeans algorithm to the 'unmappability' values with number of
- clusters set as 3. 'rsem-generate-ngvector' will make sure the mean
- 'unmappability' scores for clusters are in ascending order. All
- transcripts whose lengths are less than k are assigned to cluster 3.
- If your reference is a de novo assembled transcript set, you should run
- 'rsem-generate-ngvector' first. Then load the resulting
- 'output_name.ngvec' into R. For example, you can use
- NgVec <- scan(file="output_name.ngvec", what=0, sep="\n")
- . After that, replace 'IsoNgTrun' with 'NgVec' in the second line of
- section 3.2.5 (Page 10) of EBSeq's vignette:
- IsoEBres=EBTest(Data=IsoMat, NgVector=NgVec, ...)
- This program only needs to run once per RSEM reference.
- OUTPUT
- output_name.ump
- 'unmappability' scores for each transcript. This file contains two
- columns. The first column is transcript name and the second column
- is 'unmappability' score.
- output_name.ngvec
- Ng vector generated by this program.
- EXAMPLES
- Suppose the reference sequences file is
- '/ref/mouse_125/mouse_125.transcripts.fa' and we set the output_name as
- 'mouse_125':
- rsem-generate-ngvector /ref/mouse_125/mouse_125.transcripts.fa mouse_125
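
The DESCRIPTION section above defines 'unmappability' and the 3-cluster grouping in prose only. The following is a rough Python sketch of that idea, not RSEM's actual implementation. It assumes that k-mers are compared on the forward strand only, and that a plain Lloyd-style k-means with evenly spaced starting centers is an acceptable stand-in for whatever Kmeans routine the program really uses. The function names (unmappability, kmeans_1d, ng_vector) are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def unmappability(transcripts, k=25):
    """Return {name: score}; score is None for transcripts shorter than k.

    Assumption: only forward-strand k-mers are compared.
    """
    # Record which transcripts contain each k-mer.
    owners = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    scores = {}
    for name, seq in transcripts.items():
        n_kmers = len(seq) - k + 1
        if n_kmers <= 0:
            scores[name] = None  # shorter than k: no k-mers to score
            continue
        # Count k-mers that also occur in at least one other transcript.
        shared = sum(
            1 for i in range(n_kmers) if len(owners[seq[i:i + k]]) > 1
        )
        scores[name] = shared / n_kmers
    return scores


def kmeans_1d(values, n_clusters=3, n_iter=100):
    """Very small 1-D k-means (Lloyd iterations, evenly spaced start)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / n_clusters
               for j in range(n_clusters)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = [min(range(n_clusters), key=lambda j: abs(v - centers[j]))
                  for v in values]
        new_centers = []
        for j in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members
                               else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def ng_vector(transcripts, k=25):
    """Cluster labels 1..3 in input order; short transcripts get 3."""
    scores = unmappability(transcripts, k)
    names = [n for n in transcripts if scores[n] is not None]
    if not names:
        return [3] * len(transcripts)
    labels, centers = kmeans_1d([scores[n] for n in names])
    # Renumber clusters so that cluster means are in ascending order.
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    rank = {j: r + 1 for r, j in enumerate(order)}
    by_name = {n: rank[lab] for n, lab in zip(names, labels)}
    return [by_name.get(n, 3) for n in transcripts]
```

On a real transcriptome the k-mer table in this sketch would be very large, so it is only meant to make the definition concrete on toy inputs. For actual analyses, use the 'output_name.ngvec' file written by 'rsem-generate-ngvector' itself.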