Advertisement
Guest User

Untitled

a guest
Mar 22nd, 2017
69
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 2.13 KB | None | 0 0
  1. WITH
  2. --
  3. -- we start by translating the correlations that we got to ranks,
  4. -- based on sorting the genes on corrByGene "DESC"
  5. -- this will result in the highest positive correlation getting
  6. -- rank #1, etc
  7. -- we also lightly filter the genes by excluding any with near-zero
  8. -- or negative correlation coefficients, and the result is a list
  9. -- of approx 9000 genes with symbol, correlation, and rank
  10. geneScoresT AS (
  11. SELECT
  12. gene,
  13. corrByGene,
  14. -- here we do get a DENSE ranking based on ORDERing the
  15. -- correlations from most positive downward
  16. ## DENSE_RANK() OVER(ORDER BY corrByGene ASC) AS gene_score
  17. -- rather than doing a ranking, we could also just try using the
  18. -- correlations directly, after inverting them so that a
  19. -- a smaller value is "better"
  20. (1-corrByGene) AS gene_score
  21. FROM
  22. `isb-cgc.smr_scratch.CPTAC_mRNAseq_corrs`
  23. WHERE
  24. corrByGene>0.10 ),
  25. --
  26. -- next we JOIN the ranks to GO terms based on gene symbol;
  27. -- basically what we are doing is annotating the ranking with the
  28. -- GO_ID, GO_Name and Evidence category:
  29. --
  30. annotScoresT AS (
  31. SELECT
  32. gene,
  33. corrByGene,
  34. gene_score,
  35. GO_ID,
  36. GO_Name,
  37. Evidence
  38. FROM
  39. geneScoresT AS a
  40. JOIN
  41. `isb-cgc-02-0001.Daves_working_area.go_slim_hgnc` AS b
  42. ON
  43. a.gene = b.Symbol ),
  44. --
  45. -- Now we're going to "score" the GO gene-sets by grouping on
  46. -- GO_ID and GO_Name. For each ID/Name, we sum the ranks, count
  47. -- the number of genes, and then define the score as the sum
  48. -- of the ranks divided by the number of genes
  49. --
  50. goScoresT AS (
  51. SELECT
  52. GO_ID,
  53. GO_Name,
  54. SUM(gene_score) AS sumGenesScore,
  55. COUNT(gene) AS numGenes,
  56. SUM(gene_score) / COUNT(gene) AS normScore
  57. FROM
  58. annotScoresT
  59. WHERE
  60. ( Evidence="EXP"
  61. OR Evidence="IDA"
  62. OR Evidence="IPI"
  63. OR Evidence="IMP"
  64. OR Evidence="IGI"
  65. OR Evidence="IEP" )
  66. GROUP BY
  67. GO_ID,
  68. GO_Name )
  69. --
  70. -- Finally, we should have our scored GO categories!
  71. -- remember that the lower the score, the better,
  72. -- so we sort ASCending
  73. --
  74. SELECT
  75. *
  76. FROM
  77. goScoresT
  78. WHERE
  79. numGenes >= 20
  80. ORDER BY
  81. normScore ASC
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement