  1. I have this paper:
  2. Single-cell multi-omics defines the cell-type specific impact of splicing
  3. aberrations in human hematopoietic clonal outgrowths
  4. Federico Gaiti1,2†, Paulina Chamely1,2†, Allegra G. Hawkins1,2†, Mariela Cortés-López1,2†, Ariel D. Swett1,2, Saravanan
  5. Ganesan1,2, Tarek H. Mouhieddine3, Xiaoguang Dai4, Lloyd Kluegel1,2, Celine Chen1,2,5, Kiran Batta6, John Beaulaurier7,
  6. Alexander W. Drong4, Scott Hickey7, Neville Dusaj1,2,5, Gavriel Mullokandov1,2, Jiayu Su1,8, Ronan Chaligné1,2, Sissel Juul4,
  7. Eoghan Harrington4, David A. Knowles1,8,9, Daniel H. Wiseman6, Irene M. Ghobrial3, Justin Taylor10, Omar Abdel-Wahab11*, Dan A. Landau1,2,12*
  8. 1New York Genome Center, New York, NY, USA. 2Division of Hematology and Medical Oncology, Department of Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA. 3Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA. 4Oxford Nanopore Technologies Inc, New York, NY, USA. 5Tri-Institutional MD-PhD Program, Weill Cornell Medicine, Rockefeller University, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  10. 6Division of Cancer Sciences, The University of Manchester, Manchester, United Kingdom.
  11. 7Oxford Nanopore Technologies Inc, San Francisco, CA, USA. 8Department of Systems Biology, Columbia University, New York, NY, USA. 9Department of Computer Science, Columbia University, New York, NY, USA. 10Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA. 11Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 12Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA.
  12. †Contributed equally to this work; *Co-corresponding authors.
  13. Corresponding authors contact details: Dan A. Landau, MD, PhD - Weill Cornell Medicine Belfer Research Building, 413 East 69th Street, New
  14. York, NY 10021. Email: [email protected]; Omar Abdel-Wahab, MD - Memorial Sloan Kettering Cancer Center, 1275 York Ave, New
  15. York, NY 10065. Email: [email protected].
  16. Keywords: Single-cell, RNA-seq, multi-omics, splicing, long-read sequencing, genotyping, clonal hematopoiesis, myelodysplastic syndrome
  17. ABSTRACT
  18. RNA splicing factors are recurrently affected by alteration-of-function mutations in clonal blood disorders, highlighting
  19. the importance of splicing regulation in hematopoiesis. However, our understanding of the impact of dysregulated RNA
  20. splicing has been hampered by the inability to distinguish mutant and wildtype cells in primary patient samples, the cell-type complexity of the hematopoietic system, and the sparse and biased coverage of splice junctions by short-read
  21. sequencing typically used in single-cell RNA sequencing. To overcome these limitations, we developed GoT-Splice by
  22. integrating Genotyping of Transcriptomes (GoT) with enhanced efficiency long-read single-cell transcriptome profiling,
  23. as well as proteogenomics (with CITE-seq). This allowed for the simultaneous single-cell profiling of gene expression, cell
  24. surface protein markers, somatic mutation status, and RNA splicing. We applied GoT-Splice to bone marrow progenitors
  25. from patients with myelodysplastic syndrome (MDS) affected by mutations in the most frequently mutated RNA splicing
  26. factor, the core RNA splicing factor SF3B1. High-resolution mapping of SF3B1mut vs. SF3B1wt hematopoietic progenitors
  27. revealed a fitness advantage of SF3B1mut cells in the megakaryocytic-erythroid lineage, resulting in an expansion of
  28. SF3B1mut erythroid progenitor (EP) cells. SF3B1mut EP cells exhibited upregulation of genes involved in regulation of cell
  29. cycle and mRNA translation. Long-read single-cell transcriptomes revealed the previously reported increase of aberrant
  30. 3’ splice site usage in SF3B1mut cells. However, the ability to profile splicing within individual cell populations uncovered
  31. distinct cryptic 3’ splice site usage across different progenitor populations, as well as stage-specific aberrant splicing
  32. during erythroid maturation. Lastly, as splice factor mutations occur in clonal hematopoiesis (CH) with increased risk of
  33. neoplastic transformation, we applied GoT-Splice to CH samples. These data revealed that the erythroid lineage bias, as
  34. well as cell-type specific cryptic 3’ splice site usage in SF3B1mut cells, precede overt MDS. Collectively, we present an
  35. expanded multi-omics single-cell toolkit to define the cell-type specific impact of somatic mutations on RNA splicing, from
  36. the earliest phases of clonal outgrowths to overt neoplasia, directly in human samples.
  37. bioRxiv preprint doi: https://doi.org/10.1101/2022.06.08.495292; this version posted June 9, 2022. The copyright holder for this preprint (which
  38. was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
  39. available under a CC-BY-NC-ND 4.0 International license.
  40. F. Gaiti, P. Chamely, A. G. Hawkins, M. Cortés-López et al. (2022). BioRxiv
  41. INTRODUCTION
  42. Genetic diversity in the form of clonal outgrowths has
  43. been ubiquitously observed across normal and
  44. malignant human tissues1–13. Likewise, phenotypic
  45. diversity is a hallmark of both normal and malignant
  46. tissues in human samples, as has been observed with
  47. the widespread application of single-cell RNA
  48. sequencing (scRNA-seq)14–20. These two axes of
  49. cellular diversity likely exhibit complex interplay, as
  50. cell state may affect the phenotypic impact of somatic
  51. mutations21. Recent advances in single-cell multi-omics sequencing have allowed us to reconcile these
  52. two aspects of cellular variability in human
  53. tissues15,22,23, and link genetic variation and
  54. transcriptional cell state diversity in somatic
  55. evolution. For example, through the application of
  56. Genotyping of Transcriptomes (GoT)15 technology,
  57. which enables genotyping of somatic mutations
  58. together with high-throughput droplet-based scRNA-seq, we have previously demonstrated that the effects
  59. of somatic mutations on cellular fitness in blood
  60. myeloproliferative disorders vary as a function of
  61. progenitor cell identity15.
  62. Mutations in genes encoding RNA splicing
  63. factors serve as an informative example of the
  64. challenge of linking genotype to phenotype in complex
  65. human tissues. Somatic change-of-function mutations
  66. in RNA splicing factors are recurrent in hematologic
  67. malignancies24–26, highlighting the importance of
  68. dysregulated RNA splicing in human hematopoietic
  69. disorders. SF3B1 (splicing factor 3b subunit 1), a core
  70. component of the spliceosome complex, is a
  71. commonly mutated splicing factor across hematologic
  72. malignancies and solid tumors, and is heavily
  73. implicated in the pathogenesis of myelodysplastic
  74. syndromes (MDS)27,28. SF3B1 mutations also occur in
  75. subjects with clonal hematopoiesis (CH), where they
  76. confer increased risk of conversion to overt myeloid
  77. neoplasms compared to other CH driver mutations1,2.
  78. SF3B1 mutations result in incorrect branch point
  79. recognition during RNA splicing, often leading to an
  80. increased usage of aberrant (or cryptic) intron-proximal 3’ splice sites in hundreds of genes29. Such
  81. aberrant 3’ splice site recognition typically results in
  82. the inclusion of short intronic fragments in spliced
  83. mRNA, which most commonly alters the frame of the
  84. transcript and renders it a substrate for nonsense-mediated
  85. mRNA decay (NMD)30. Prior work has
  86. demonstrated that through mis-splicing, SF3B1
  87. mutations lead to altered cell metabolism31 and
  88. disruption of ribosomal biogenesis32, leading to the
  89. aberrant hematopoietic differentiation typical of MDS.
  90. While these are key advances in our understanding of
  91. the role of SF3B1 mutations in MDS development, the
  92. mechanisms through which mis-splicing leads to
  93. disrupted hematopoietic differentiation in humans
  94. remain elusive.
  95. To date, cell culture systems and murine
  96. models have been critical for elucidating the role of
  97. splicing factor mutations in disordered
  98. hematopoiesis. Nonetheless, these methods may not
  99. fully recapitulate MDS development in the human
  100. context. For example, alternatively spliced genes from
  101. murine models of SF3B1mut MDS, which share some
  102. phenotypic similarities with human MDS, show
  103. limited overlap with those identified in human
  104. samples33. The study of splice-altering mutations in
  105. humans has been further hampered by three
  106. important limitations. First, normal wildtype (WT)
  107. and aberrant mutated (MUT) cells are often admixed
  108. without discriminating cell surface markers that are
  109. required to uniquely isolate MUT cells, limiting the
  110. ability to identify signals that can be specifically linked
  111. to the SF3B1mut genotype. This obstacle is magnified in
  112. the context of CH where MUT cells commonly
  113. constitute a minority of the hematopoietic progenitor
  114. population. Second, the hematopoietic differentiation
  115. process yields significant complexity of cell progenitor
  116. types that further hinders the ability to link mutated
  117. genotypes with distinct cellular phenotypes. SF3B1mut
  118. MDS is indeed associated with a specific clinico-morphological phenotype of refractory anemia and
  119. accumulation of ringed sideroblasts28,34, strongly
  120. suggesting that the interplay between cell identity and
  121. SF3B1 mutations is fundamental in driving disrupted
  122. hematopoietic differentiation. Third, scRNA-seq by 3’
  123. or 5’ biased short-read sequencing is limited in its
  124. ability to map full-length RNA isoforms and splicing
  125. aberrations.
  126. To overcome these limitations and identify
  127. cell-identity-dependent mis-splicing mediated by
  128. SF3B1 mutations, we developed GoT-Splice by
  129. integrating GoT15 with long-read single-cell
  130. transcriptome profiling (with Oxford Nanopore
  131. Technologies [ONT]) as well as proteogenomics (with
  132. CITE-seq)35. This allowed for the simultaneous
  133. profiling of gene expression, cell surface protein
  134. markers, somatic mutation genotyping, and RNA
  135. splicing within the same single cell. The application of
  136. GoT-Splice to bone marrow progenitor samples from
  137. individuals with SF3B1-mutated MDS and CH revealed
  138. that, while SF3B1 mutations arise in uncommitted
  139. hematopoietic stem progenitor cells (HSPCs), their
  140. effect on fitness increases with differentiation into
  141. committed erythroid progenitors (EPs), in line with
  142. the SF3B1mut-driven dyserythropoiesis phenotype.
  143. Importantly, the integration of GoT with full-length
  144. isoform mapping via long-read sequencing showed
  145. that SF3B1 mutations exert cell-type specific mis-splicing, already apparent in CH long before disease
  146. onset.
  147. RESULTS
  152. GoT integrated with proteogenomics reveals
  153. increased fitness of SF3B1mut cells in the erythroid
  154. lineage linked to overexpression of cell-cycle and
  155. mRNA translation genes
  156. As we have recently demonstrated that the impact of
  157. somatic mutations on the transcriptome varies as a
  158. function of underlying cell identity in
  159. myeloproliferative neoplasms15, we hypothesized that
  160. an interplay between cell identity and SF3B1
  161. mutations may drive disrupted hematopoietic
  162. differentiation in MDS. To test this, we applied GoT15
  163. (Fig. 1a) to CD34+ bone marrow progenitor cells from
  164. three untreated MDS patients with SF3B1 K700E
  165. mutations (discovery cohort, MDS01-03), as well as a
  166. distinct cohort consisting of three MDS patients
  167. undergoing treatment (validation cohort, MDS04-06)
  168. with erythropoietin (EPO) and/or granulocyte colony-stimulating factor (G-CSF; Fig. 1b; Supplementary
  169. Table 1). As normal hematopoietic development has
  170. been extensively studied using flow cytometry cell
  171. surface markers, we further integrated GoT with
  172. single-cell proteogenomics (CITE-seq35,36; Fig. 1a). A
  173. total of 24,315 cells across the six MDS samples were
  174. obtained after sequencing and quality control filtering
  175. (Extended Data Fig. 1a, b; MDS02 was sequenced in
  176. two technical replicates). To chart the differentiation
  177. map of the CD34+ progenitor cells, we integrated the
  178. data across the primary MDS samples (MDS01-03), as
  179. well as the MDS validation samples (MDS04-06), and
  180. clustered based on transcriptomic data alone, agnostic
  181. to the genotyping and protein information (Fig. 1c;
  182. Extended Data Fig. 1c, d). Using previously
  183. annotated RNA identity markers for human CD34+
  184. progenitor cells37, validated via Antibody-Derived Tag
  185. (ADT) markers in the CITE-seq panel
  186. (Supplementary Table 2, 3), we identified the
  187. expected progenitor subtypes in the primary MDS
  188. cohort, along with a population of mature monocytic
  189. cells characterized by CD14 expression and lack of
  190. CD34 expression often observed in CD34+ sorting of
  191. human bone marrow38 (Fig. 1c; Extended Data Fig.
  192. 2a-c). Cell clustering was further validated using RNA
  193. and ADT multimodal integration (Extended Data Fig.
  194. 2d). The expected progenitor subtypes were similarly
  195. identified in the MDS validation cohort (MDS04-06;
  196. Extended Data Fig. 3a-c).
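The cell counts quoted in this section can be cross-checked against the per-sample values listed in the Fig. 1b summary table. The following is a minimal Python sketch of that arithmetic; the tuple layout and function names are illustrative assumptions, and the genotyped-cell estimate is only approximate because the table reports percentages rounded to one decimal.

```python
# Per-sample values transcribed from the Fig. 1b summary table:
# (sample, cohort, cells after QC, percent of cells genotyped by GoT).
# The "discovery"/"validation" labels follow the cohort names used in the text.
SAMPLES = [
    ("MDS01", "discovery", 757, 47.2),
    ("MDS02", "discovery", 10676, 80.9),
    ("MDS03", "discovery", 4003, 87.3),
    ("MDS04", "validation", 1130, 22.0),
    ("MDS05", "validation", 5235, 25.7),
    ("MDS06", "validation", 2514, 62.0),
]


def total_cells(cohort=None):
    """Sum cells over all samples, or over one cohort if given."""
    return sum(n for _, c, n, _ in SAMPLES if cohort in (None, c))


def approx_genotyped(cohort=None):
    """Estimate genotyped cells from the rounded percentages in the table."""
    return sum(n * pct / 100 for _, c, n, pct in SAMPLES if cohort in (None, c))


all_cells = total_cells()
discovery_cells = total_cells("discovery")
genotyped_fraction = approx_genotyped() / all_cells

print("all MDS cells:", all_cells)
print("discovery cohort cells:", discovery_cells)
print("approx. genotyped fraction:", round(genotyped_fraction, 3))
```

The totals reproduce the 24,315 cells across MDS01-06 and the 15,436 cells in MDS01-03 quoted in the text and in Fig. 1c, and the estimated genotyped counts agree with the reported 15,650 and 12,494 cells to within rounding.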
  197. Genotyping data were available for 15,650
  198. MDS cells (64.4% across MDS01-06) through GoT
  199. (Fig. 1b; Extended Data Fig. 4a-d). The per-patient
  200. mutant cell fractions obtained through GoT were
  201. highly correlated with the variant allele frequencies
  202. (VAFs) obtained through bulk sequencing of matched
  203. unsorted peripheral blood mononuclear cells
  204. (Pearson’s r = 0.81, P-value = 0.008; Extended Data
  205. Fig. 4a). Projection of the genotyping information
  206. onto the differentiation map demonstrated MUT and
  207. WT cells co-mingled throughout the differentiation
  208. topology (Extended Data Fig. 4c, d), highlighting the
  209. need for single-cell multi-omics to link genotypes with
  210. cellular phenotypes in SF3B1mut MDS. Although MUT
  211. cells were found across CD34+ progenitor cells, we
  212. observed an accumulation of MUT cells along the
  213. erythroid trajectory (Fig. 1d), suggesting that SF3B1
  214. mutant cell frequency (MCF) varies as a function of the
  215. progenitor subtype. To confirm this, we evaluated the
  216. MCF across the different prevalent progenitor cell
  217. types (limited to progenitor subsets with > 300 cells).
  218. Of note, as cells may display variable expression of
  219. SF3B1 itself, we performed amplicon UMI-downsampling to exclude sampling biases given the
  220. heterozygosity of the mutated allele as a potential
  221. confounder for observed differences in MCF (see
  222. Methods). Across samples, we observed a significant
  223. increase in MCF in the megakaryocyte-erythroid
  224. lineage with the highest MCF observed in EPs
  225. compared to HSPCs (P-value < 10^-16; Fig. 1e;
  226. Extended Data Fig. 4e), consistent with the erythroid
  227. lineage-specific impact of mutated SF3B139,40.
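The reasoning behind the UMI downsampling step can be made concrete. For a heterozygous mutation, a mutant cell expresses both alleles, so a cell with more SF3B1 amplicon UMIs is more likely to show at least one mutant UMI, and cell types with higher SF3B1 expression would appear to have a higher MCF. Keeping one randomly drawn genotyping UMI per cell removes that dependence. The sketch below is an illustration under stated assumptions, not the authors' implementation (which is described in their Methods): the input layout, the function name and the rule "call the cell MUT if the drawn UMI is mutant" are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def downsampled_mcf(cells, n_iter=100, seed=0):
    """Estimate mutant cell fraction (MCF) per cell type after downsampling.

    cells: list of (cell_type, n_mut_umis, n_wt_umis), one entry per cell.
    In each iteration one genotyping UMI is drawn per cell, and the cell is
    called MUT if that UMI carries the mutation.
    Returns {cell_type: (mean MCF, s.e.m. across iterations)}.
    """
    rng = random.Random(seed)
    per_iter = defaultdict(list)
    for _ in range(n_iter):
        n_mut_cells = defaultdict(int)
        n_cells = defaultdict(int)
        for cell_type, n_mut, n_wt in cells:
            n_umis = n_mut + n_wt
            if n_umis == 0:
                continue  # no genotyping UMI: cell is not genotyped
            # Drawing one UMI uniformly picks a mutant UMI with
            # probability n_mut / n_umis.
            is_mut = rng.randrange(n_umis) < n_mut
            n_mut_cells[cell_type] += int(is_mut)
            n_cells[cell_type] += 1
        for cell_type, n in n_cells.items():
            per_iter[cell_type].append(n_mut_cells[cell_type] / n)
    summary = {}
    for cell_type, values in per_iter.items():
        sem = stdev(values) / len(values) ** 0.5 if len(values) > 1 else 0.0
        summary[cell_type] = (mean(values), sem)
    return summary


# Toy input with made-up UMI counts, for illustration only.
toy_cells = [("EP", 3, 0), ("EP", 2, 0), ("HSPC", 0, 4), ("MEP", 2, 2)]
toy_result = downsampled_mcf(toy_cells)
```

Because every cell contributes exactly one UMI, the per-cell-type estimate no longer depends on how many amplicon UMIs each cell yielded; the cost is that heterozygous mutant cells are called MUT only about half the time, which is why the figure reports a normalized ratio rather than an absolute fraction.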
  228. The ability to layer protein measurements on
  229. top of GoT data further allowed us to identify
  230. differentially expressed proteins between MUT and
  231. WT cells within each progenitor subset. After quality
  232. control filtering for ADT markers with adequate
  233. expression in at least two major progenitor subtypes
  234. (see Methods), protein expression was highest in the
  235. expected cell types, and correlated with mRNA
  236. expression, both at the individual cell as well as cell-type level, comparable to previous data35 (Extended
  237. Data Fig. 5a, b). We directly compared protein
  238. expression between MUT and WT cells, accounting for
  239. sample-to-sample variability in mutated cells through
  240. downsampling (see Methods), and observed
  241. differential expression of CD38, CD99, CD36 and CD71
  242. in at least one progenitor cell-type (Fig. 1f;
  243. Supplementary Table 4). CD38 is a known marker
  244. for the transition of primitive CD34+ stem and
  245. progenitor cells into more committed precursor
  246. cells37,41,42. Its overexpression in SF3B1mut is consistent
  247. with the observed higher MCF in committed
  248. progenitor subsets. CD99, over-expressed in MUT
  249. immature myeloid progenitor (IMP) cells, was
  250. previously noted to be overexpressed in both AML and
  251. MDS stem cells, serving as a potential therapeutic
  252. target of malignant stem cells43,44. Finally, CD36 and
  253. CD71, erythroid lineage markers, were found to be
  254. over-expressed in MUT EPs when compared to WT
  259. EPs, consistent with the SF3B1mut-driven
  260. dyserythropoiesis phenotype. We further leveraged
  261. these erythroid maturation cell surface protein
  262. markers to validate pseudo-temporal (pseudotime)
  263. ordering of the continuous process of erythroid
  264. maturation45 (Extended Data Fig. 5c). This analysis
  265. revealed an increase in MCF along erythroid lineage
  266. maturation (Fig. 1g), confirming that SF3B1
  267. mutational fitness increases with differentiation into
  268. committed EPs.
  269. [Figure 1, panels A-J: image not reproduced; the full caption appears under "Figure 1." further down. Text recovered from the panels follows.]
  270. (A) Workflow steps: 1, generate droplets; 2, amplify full-length cDNA library; 3, fragment and prepare library (short read + CITE-seq); 4, assess alternative splicing in individual cells (long-read using ONT); 5, amplify locus of interest (GoT); 6, integrate whole transcriptome with genotype, protein markers and alternative splicing.
  271. (B) SF3B1 Mutant Dataset (CD34+ sorted hematopoietic progenitors):
  272. Disease Type | Patient ID | Mutation | Chromium 10X Cell # | GoT Cells Genotyped (%) | Bulk VAF | Additional Mutations (VAF)
  273. Myelodysplastic Syndrome | MDS01 | SF3B1-K700E | 757 | 47.2 | 0.395 | TET2-M5331 (0.42); DNMT3A-X556 (0.37)
  274. Myelodysplastic Syndrome | MDS02 | SF3B1-K700E | 10676 | 80.9 | 0.376 | None
  275. Myelodysplastic Syndrome | MDS03 | SF3B1-K700E | 4003 | 87.3 | 0.386 | None
  276. Myelodysplastic Syndrome (Validation Cohort) | MDS04 | SF3B1-K700E | 1130 | 22.0 | 0.110 | ASXL1-H633Qfs*2 (0.03); IDH-R132G (0.50)
  277. Myelodysplastic Syndrome (Validation Cohort) | MDS05 | SF3B1-K700E | 5235 | 25.7 | 0.320 | TP53-Y234C (0.39); TYK2-S887W (0.53)
  278. Myelodysplastic Syndrome (Validation Cohort) | MDS06 | SF3B1-K700E | 2514 | 62.0 | 0.379 | TET2-G1172Vfs*3 (0.39); TET2-R1262P (0.49); DNMT3A-G707Afs*72 (0.46)
  279. Clonal Hematopoiesis | CH01 | SF3B1-K666N | 2998 | 31.0 | 0.221 | None
  280. Clonal Hematopoiesis | CH02 | SF3B1-K700E | 6009 | 45.1 | 0.154 | None
  281. (C) UMAP1 vs. UMAP2; human CD34+ bone marrow, MDS01-03; No. of cells = 15,436; clusters: HSPC, IMP, MkP, NP, MEP, EP, E/B/M, T/B cells, MonoDC, DC, Mono, PreB.
  282. (D) UMAP1 vs. UMAP2, MDS01-03; WT (N = 2,498), MUT (N = 9,996).
  283. (E) Normalized mutant cell ratio (only cell types with >300 cells in MDS01-03): EP, MEP, MkP, IMP, HSPC, NP, Mono; P < 2.2x10^-16.
  284. (F) Antibody-Derived Tag (ADT) markers CD71, CD36, CD99, CD38 across NP, IMP, HSPC, MkP, MEP, EP; ADT log10FC (MUT vs. WT) and ADT expression; * P < 0.05, ** P < 0.01.
  285. (G) Mutant cell fraction, ADT-CD36 and ADT-CD71 expression, and cell density (HSPC, IMP, MEP, EP) along megakaryocyte-erythroid differentiation pseudotime (early to late); P = 0.021, P = 0.013, P = 6.6x10^-11.
  286. (H) x-axis: log2(EP SF3B1mut / EP SF3B1wt gene expression); y-axis: −log10(P-value); # of genes = 2,000; translation and cell cycle genes highlighted. Labeled genes: HSP90B1, CALR, SMC1A, LMNA, EIF3A, DYNLL1, SPTA1, CCNE1, DDX17, PRKDC, EIF5A, INCENP, DDX5, RPS29, CTNNB1, PSMD12, SPTB, EIF2S3, INO80D, RAD21, EIF3E, EIF4B, EIF3B, PSMD2, WSB1, DNMT1, TP53, MDM4, GALNT6, P4HB, CCDC42, DLK1, TIMP1.
  287. (I) TP53 pathway module score, MUT vs. WT, in HSPC, IMP, MEP, EP, NP (MDS01-03); * P < 0.05.
  288. (J) Cell cycle module score, MUT vs. WT, in HSPC, IMP, MEP, EP, NP (MDS01-03).
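The MUT versus WT protein comparison is reported with P-values "determined through permutation testing" (Fig. 1f legend). As a hedged illustration of what such a test computes, the following Python sketch runs a two-sided permutation test on the difference in mean ADT expression between two groups of cells. The function name, the choice of test statistic and the toy values are assumptions; the authors' procedure additionally downsamples cells to account for sample-to-sample variability, which is not modeled here.

```python
import random


def permutation_pvalue(mut_values, wt_values, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed difference, P-value). Genotype labels are shuffled
    n_perm times; the P-value is the share of shuffles whose absolute
    difference is at least as large as the observed one, with an add-one
    correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    n_mut = len(mut_values)
    n_wt = len(wt_values)
    observed = sum(mut_values) / n_mut - sum(wt_values) / n_wt
    pooled = list(mut_values) + list(wt_values)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_mut]) / n_mut - sum(pooled[n_mut:]) / n_wt
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Made-up ADT values for one marker in one cell type (illustration only).
toy_mut = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8]
toy_wt = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
toy_diff, toy_p = permutation_pvalue(toy_mut, toy_wt)
```

A permutation test makes no distributional assumption about ADT counts, which is one plausible reason to prefer it for sparse antibody-tag data.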
  490. To further explore SF3B1-driven
  491. transcriptional dysregulation in committed EPs, we
  492. performed differential gene expression analysis
  493. between SF3B1mut and SF3B1wt cells. Mutated EPs
  494. upregulated genes encoding important translation
  495. and ribosome biogenesis factors (FDR < 0.2; Fig. 1h;
  496. Supplementary Table 5, 6), including a number of
  497. eukaryotic initiation factors (e.g., EIF3, EIF5), DEAD-box helicases (e.g., DDX5, DDX17), and ribosome
  498. subunits (e.g., RPS29). This dysregulation of
  499. translational activity, or ribosomal stress signal, is
  500. evocative of studies showing that translational
  501. regulation is critical during hematopoiesis46–49, and
  502. may lead to cell- and tissue type–restricted activation
  503. of TP53 signaling pathway in myeloid disease50–55.
  504. Specifically, cells that require high levels of protein
  505. synthesis, such as erythroid progenitors, may be more
  506. sensitive to changes caused by translational
  507. dysfunction56. In line with this notion, TP53 gene
  508. target upregulation in SF3B1mut cells was more
  509. prominent in the megakaryocyte-erythroid lineage,
  510. with no increased expression of TP53-related genes in
  511. earlier progenitors (HSPCs) or in neutrophil
  512. progenitors (NPs) compared to WT cells (Fig. 1i). Our
  513. results therefore establish a molecular phenotype for
  514. the SF3B1 mutation in human bone marrow
  515. progenitors, potentially phenocopying translational
  516. dysregulation typically observed in ribosomopathies
  517. driven by germline deleterious mutations of
  518. ribosomal subunits56.
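The thresholds quoted for this analysis (absolute log2 fold change > 0.1 and P-value < 0.05 for differentially expressed genes, and BH-FDR < 0.2 for the highlighted pathways, per the Fig. 1h legend) can be illustrated with a short sketch. The code below implements the standard Benjamini-Hochberg adjustment and the gene-level filter in plain Python. The function names and toy inputs are hypothetical, and this section does not spell out which set of P-values the FDR is computed over, so the adjustment is shown on a generic list.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over all ranks at or above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_de(genes, min_abs_log2fc=0.1, max_p=0.05):
    """Filter (name, log2FC, P-value) tuples with the Fig. 1h cutoffs."""
    return [g for g in genes if abs(g[1]) > min_abs_log2fc and g[2] < max_p]


# Made-up values for illustration only.
toy_genes = [("GENE_A", 0.5, 0.001), ("GENE_B", 0.05, 0.001), ("GENE_C", -0.3, 0.2)]
toy_de = call_de(toy_genes)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Note how permissive the fold-change cutoff is: a log2 fold change of 0.1 corresponds to roughly a 7% change in expression, so the pathway-level FDR filter carries much of the statistical weight.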
  519. Mutated EPs also upregulated genes involved
  520. in cell-cycle and checkpoint control (FDR < 0.2; Fig.
  521. 1h, j; Supplementary Table 5, 6). In particular, we
  522. observed an increase in expression of CCNE1, a
  523. positive regulator of the G1/S transition of the cell
  524. cycle57, and MDM4. The latter gene works together
  525. with TP53 during the G1/S checkpoint of the cell cycle
  526. to determine cell fate by regulating pathways such as
  527. DNA repair, apoptosis, and senescence58. Increased
  528. expression of MDM4 during ribosomal stress59
  529. prevents TP53 degradation and blocks subsequent
  530. inactivation of p21, resulting in sustained cell
  531. proliferation60. Together, the combined upregulation
  532. of TP53, CCNE1, and MDM4 in mutated EPs may
  533. therefore lead to cell survival and accumulation rather
  534. than cell death, supporting the finding that SF3B1
  535. mutations impart a greater fitness advantage
  536. specifically in the erythroid lineage.
  537. GoT-Splice links somatic mutations, alternative
  538. splicing, and cellular phenotype at single-cell
  539. resolution
  540. Figure 1. Increased fitness advantage of SF3B1mut cells in the megakaryocytic-erythroid lineage.
  541. (A) Schematic of GoT-Splice workflow. The combination of GoT with CITE-seq and long-read full-length cDNA using Oxford
  542. Nanopore Technologies (ONT) enables the simultaneous profiling of protein and gene expression, somatic mutation status, and
  543. alternative splicing at single-cell resolution. (B) Summary of patient metadata and GoT data (after quality control) for MDS and
  544. CH samples with SF3B1 mutations. (C) Uniform manifold approximation and projection (UMAP) of CD34+ cells (n = 15,436 cells)
  545. from myelodysplastic syndrome patient samples with SF3B1 K700E mutations (n = 3 individuals), overlaid with cluster cell-type
  546. assignments. HSPC, hematopoietic stem progenitor cells; IMP, immature myeloid progenitors; MkP, megakaryocytic
  547. progenitors; MEP, megakaryocytic-erythroid progenitors; EP, erythroid progenitors; NP, neutrophil progenitors; E/B/M,
  548. eosinophil/basophil/mast progenitor cells; T/B cells; Mono, monocyte; DC, dendritic cells; Pre-B, precursor B cells; Mono DC,
  549. monocyte/dendritic cell progenitors. (D) Density plot of SF3B1mut vs. SF3B1wt cells. Genotyping information (MDS01-03) was
  550. obtained for 12,494 cells (80.9 % of all cells). (E) Normalized frequency of SF3B1 K700E MUT cells in progenitor subsets with
  551. at least 300 genotyped cells. Bars show aggregate analysis of samples MDS01-03 with mean +/- s.e.m. of 100 downsampling
  552. iterations to 1 genotyping UMI per cell. Only cell types with >300 cells were used in the analysis. P-value from likelihood ratio
  553. test of linear mixed model with or without mutation status. (F) Differential ADT marker expression between SF3B1mut and
  554. SF3B1wt cells. Red: higher expression in SF3B1mut cells; blue: higher expression in SF3B1wt cells. Size of the dot corresponds to
  555. the average expression of ADT marker across cells in a given cell-type. P-values determined through permutation testing. (G)
  556. Mutant cell fraction and ADT expression levels of CD36 and CD71 as a function of pseudotime along the megakaryocyte-erythroid differentiation trajectory for SF3B1mut and SF3B1wt cells in MDS01-03. Shading denotes 95% confidence interval.
  557. Histogram shows cell density of clusters included in the analysis, ordered by pseudotime. P-values were calculated by Wilcoxon
  558. rank sum test by comparing mutant cell fraction between pseudotime trajectory quartiles. (H) Differential gene expression
  559. between SF3B1mut and SF3B1wt EP cells in MDS samples. Genes with an absolute log2(fold change) > 0.1 and P-value < 0.05 were
  560. defined as differentially expressed (DE). DE genes belonging to cell cycle (red) and translation (blue) pathways (Reactome) are
  561. highlighted (BH-FDR < 0.2). (I) Expression (mean +/- s.e.m.) of TP53 pathway related genes (Reactome) between SF3B1mut and
  562. SF3B1wt cells in progenitor cells from MDS01-03 samples. Red: module score in SF3B1mut cells; blue: module score in SF3B1wt
  563. cells. P-values from likelihood ratio test of linear mixed model with or without mutation status. (J) Same as (I) for expression of
  564. cell cycle related genes (Reactome) between SF3B1mut and SF3B1wt cells in progenitor cells from MDS01-03 samples.
  569. The integration of GoT with single-cell
  570. proteogenomics data revealed that SF3B1 mutations
  571. reshape hematopoietic differentiation and mediate
  572. cell-identity-dependent transcriptional changes (Fig.
  573. 1). Given the pivotal role of SF3B1 in mRNA splicing,
  574. we next explored how mis-splicing may serve as a link
  575. between genotypes and cellular phenotypes. Indeed,
  576. SF3B1 mutations promote recognition of alternative
  577. branch points, most commonly leading to increased
  578. usage of aberrant 3’ splice sites29. However, previous
  579. studies in primary human samples have been
  580. performed on bulk samples admixing MUT and WT
  581. cells as well as progenitor subtypes30,32,61,62.
  582. Conversely, short-read sequencing typically employed
  583. in scRNA-seq does not adequately cover splice
  584. junctions. Recent advances suggest that long-read
  585. integration into scRNA-seq may overcome these
  586. limitations63–67. We therefore integrated GoT with full-length ONT long-read sequencing, allowing for high-throughput, single-cell integration of genotype, cell
  587. surface proteome, gene expression, and mRNA
  588. splicing information (GoT-Splice; Fig. 1a). We note
  589. that single-cell cDNA sequencing with ONT presents
  590. unique challenges, as cDNA amplification artifacts are
  591. still productively sequenced when using standard
  592. ONT ligation chemistry. This leads to a high fraction of
  593. uninformative reads in the highly amplified single-cell
  594. libraries. To enhance ONT efficiency, we incorporated
  595. a biotin enrichment step using on-bead PCR to
  596. selectively amplify full-length reads containing intact
  597. cell barcodes and unique molecular identifiers64
  598. (UMIs; Fig. 2a). This approach increased the yield of
  599. full-length reads from 50.4 +/- 2.7 to 77.6 +/- 2.0
  600. (mean +/- s.e.m.) percent of all sequenced reads. Thus,
  601. GoT-Splice delivers high-resolution single-cell full-length transcriptional profiles that are comparable
  602. with short-read sequencing (Fig. 2b, c). To accurately
  603. identify splice junctions using single-cell long-read
  604. sequencing, we developed an analytical pipeline that
  605. leverages the recently published SiCeLoRe pipeline64
  606. (Extended Data Fig. 6a). To reduce alignment noise,
  607. we generated a splice junction reference identified in
  608. single-cell SMART-seq2 data from human CD34+ cells
  609. with no SF3B1 mutation (see Methods). Next, we
  610. carried out intron-centric junction calling which
  611. allows for the independent measurement of splicing at
  612. both the 5’ and 3’ ends of each intron. This allows for
  613. an unbiased assessment of junctions and a greater
  614. accuracy in measuring the degree of mis-splicing of a
  615. particular transcript when compared to exon-centric
  616. quantification approaches68, which are typically used
  617. for cassette exon usage profiling and rely on
  618. predefined transcript models or splicing events, both
  619. of which may be inaccurate or incomplete69,70. As
  620. anticipated, we observed a 4-fold increase in the
  621. number of junctions per cell detected using full-length
  622. long-read sequencing over short-read, despite lower
  623. absolute number of UMI/cell (Fig. 2d). Additionally,
  624. GoT-Splice afforded greater coverage uniformity
  625. across the entire transcript, compared to 3’-biased
  626. coverage in short-read sequencing (Fig. 2e).
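The intron-centric quantification described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `intron_centric_psi` and the input layout (a dictionary of junction UMI counts) are assumptions made for illustration. The 3’ PSI of a junction is taken here as its count divided by the total count of all junctions sharing its 5’ splice site, and the 5’ PSI is defined symmetrically, so each end of the intron is measured independently.

```python
from collections import defaultdict

def intron_centric_psi(junction_counts):
    """Compute 3' and 5' PSI (in percent) for each junction.

    junction_counts maps (chrom, strand, five_prime_site, three_prime_site)
    to a UMI count. This input layout is a hypothetical one for illustration.
    """
    by_donor = defaultdict(int)     # total counts per shared 5' splice site
    by_acceptor = defaultdict(int)  # total counts per shared 3' splice site
    for (chrom, strand, five, three), n in junction_counts.items():
        by_donor[(chrom, strand, five)] += n
        by_acceptor[(chrom, strand, three)] += n

    psi = {}
    for key, n in junction_counts.items():
        chrom, strand, five, three = key
        donor_total = by_donor[(chrom, strand, five)]
        acceptor_total = by_acceptor[(chrom, strand, three)]
        psi[key] = {
            # usage of this 3' site among junctions sharing the same 5' site
            "psi_3p": 100.0 * n / donor_total if donor_total else float("nan"),
            # usage of this 5' site among junctions sharing the same 3' site
            "psi_5p": 100.0 * n / acceptor_total if acceptor_total else float("nan"),
        }
    return psi

# Toy counts (not from the paper): one 5' site paired with a canonical
# 3' site and a cryptic 3' site 15 bp upstream.
counts = {
    ("chr1", "+", 1000, 2000): 90,  # canonical 3' splice site
    ("chr1", "+", 1000, 1985): 10,  # cryptic 3' splice site
}
result = intron_centric_psi(counts)
```

Because no transcript model is consulted, a junction absent from the annotation is quantified in the same way as an annotated one, which is the property the text contrasts with exon-centric approaches.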
  627. The most common mis-splicing events
  628. observed in MDS SF3B1mut cells involved alternative 3’
  629. splice sites, accounting for 57% of alternative splicing
  630. events (Fig. 2f), consistent with prior reports29,71.
  631. Notably, the usage of such alternative 3’ splice sites
  632. was not observed in a CD34+ sample with no SF3B1
  633. mutation (Extended Data Fig. 6b). ONT long-read
  634. sequencing also allowed us to quantify the presence of
  635. different splicing events across the same mRNA
  636. transcript. While only one aberrant 3’ splice site event
  637. was observed for the majority of mRNA transcripts,
  638. we identified a total of 428 genes (21.4% of the total
  639. number of genes with at least one cryptic 3’ splice site)
  640. with more than one aberrant 3’ splice site event.
  641. Interestingly, these cryptic 3’ splicing events tend to
  642. appear in different copies of the transcript (Extended
  643. Data Fig. 6c), highlighting the unique advantages of
  644. long-read sequencing in this context. Consistent with
  645. previous MDS bulk sequencing data72,73, we observed
  646. a relative enrichment of purines upstream of the
  647. aberrant 3’ splice site when compared to the canonical
  648. 3’ splice site (Extended Data Fig. 6d).
  649. We next leveraged the unique ability of GoT-Splice to resolve differential splice junction usage
  650. between SF3B1mut and SF3B1wt cells within the same
  651. primary human sample (see Methods). Of the
  652. differentially mis-spliced cryptic 3’ splice sites (those
  653. 0-100bp from the canonical splice site) between MUT
  654. and WT cells, 87% were used more highly in SF3B1mut
  655. cells (Fig. 2f, inset), aligning with known
  656. characteristics of SF3B1 mutations. Furthermore, we
  657. observed a high correlation between GoT-Splice delta
  658. PSI (dPSI; percent spliced in) measurements obtained
  659. by comparing SF3B1mut and SF3B1wt cells, and dPSI
  660. derived from bulk RNA-sequencing of CD34+ cells
  661. from SF3B1mut vs. SF3B1wt MDS samples32 for shared
  662. cryptic 3’ splice sites (Fig. 2g). In line with previous
  663. work, the majority of these cryptic 3’ splice sites were
  664. found to be ~15-20 bps upstream of the canonical 3’
  665. site29 (Fig. 3a; Extended Data Fig. 7a-d). GoT-Splice
  666. enabled the visualization of cryptic 3’ splice sites in
  667. SF3B1mut vs. SF3B1wt cells, highlighting the striking
  668. increased usage of cryptic 3’ splice sites specific to
  669. SF3B1mut (Fig. 3b). Altogether, GoT-Splice extends the
  670. ability to connect somatic mutations not only to
  671. transcriptional and cell surface protein marker
  672. bioRxiv preprint doi: https://doi.org/10.1101/2022.06.08.495292; this version posted June 9, 2022. The copyright holder for this preprint (which
  673. was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
  674. available under a CC-BY-NC-ND 4.0 International license.
  676. phenotypes, but also to single-cell mapping of splicing
  677. changes.
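The genotype-resolved comparison above rests on dPSI, defined in the Figure 2 legend as SF3B1mut PSI minus SF3B1wt PSI. The following is a minimal sketch under stated assumptions: counts are pooled across cells of each genotype, the helper names are hypothetical, and excluding an offset of exactly 0 (the canonical site itself) from the 0-100 bp cryptic window is our assumption. The toy counts are not taken from the paper.

```python
def psi(alt_count, total_count):
    """Percent spliced-in of one junction among all junctions sharing its 5' site."""
    if total_count == 0:
        return float("nan")
    return 100.0 * alt_count / total_count

def delta_psi(mut_alt, mut_total, wt_alt, wt_total):
    """dPSI = PSI in MUT cells minus PSI in WT cells (counts pooled per genotype)."""
    return psi(mut_alt, mut_total) - psi(wt_alt, wt_total)

def is_cryptic_3p(offset_bp, max_distance_bp=100):
    """True if an alternative 3' site lies within 100 bp of the canonical site.

    Offset 0 is excluded here because it would be the canonical site itself
    (an assumption of this sketch).
    """
    return 0 < abs(offset_bp) <= max_distance_bp

# Toy example: a cryptic site 17 bp from the canonical 3' splice site.
d = delta_psi(mut_alt=30, mut_total=200, wt_alt=2, wt_total=200)
print(d)                  # 14.0
print(is_cryptic_3p(17))  # True
```

A positive dPSI marks a site used more in SF3B1mut cells, a negative dPSI one used more in SF3B1wt cells; the significance testing described in the Methods is not reproduced here.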
  678. GoT-Splice shows progenitor-specific patterns in
  679. SF3B1mut mis-splicing
  680. An important advantage of GoT-Splice is the ability to
  681. detect splicing changes at single-cell resolution, which
  682. Figure 2. Simultaneous profiling of gene expression, cell surface protein markers, somatic mutation status and
  683. alternative splicing at single-cell resolution.
  684. (A) A comparison of the percentage of ONT reads with either incorrect structure (double TSO, no adaptors, single R1 or single
  685. TSO) or correct structure (full-length reads) both before and after the inclusion of a biotin enrichment protocol step during
  686. preparation for sequencing. Bars show the aggregate analysis of n = 5 samples with mean +/- s.d. of the percentage for each
  687. category. (B) Scatter plot of the correlation between the number of UMIs/cell detected in long-read ONT vs. short-read Illumina
  688. data for cells that were sequenced across both platforms for sample MDS05. (C) Density plot of the correlation between the
  689. number of UMIs/gene detected in long-read ONT vs. short-read Illumina data for sample MDS05. (D) Number of splice junctions
  690. captured in the full-length long-read ONT data compared to short-read sequencing data, showing that GoT-Splice allows for a
  691. significant increase in the number of junctions captured per cell. (E) GoT-Splice provides greater sequencing coverage
  692. uniformity compared to inadequate coverage of short-read sequencing over splice junctions, as exemplified here for the ERGIC3
  693. gene. (F) Pie chart summarizing the distribution of different alternative splicing events detected after junction annotation. Inset:
  694. Pie chart summarizing the differences in the usage of cryptic 3’ and 5’ splice site events between SF3B1mut and SF3B1wt cells
  695. measured with a dPSI (SF3B1mut PSI - SF3B1wt PSI). Associated with SF3B1mut: +ve dPSI; associated with SF3B1wt: -ve dPSI. (G)
  696. Comparison of delta percent spliced-in (dPSI) values of shared cryptic 3’ splicing events identified in the MUT vs. WT cell
  697. comparison from GoT-Splice of SF3B1mut MDS01-03 samples and in the SF3B1mut vs. SF3B1wt bulk comparison from bulk RNA-sequencing of CD34+ cells of MDS samples in Pellagatti et al.32.
  698. Correlation coefficient ρ calculated using Spearman’s correlation
  699. and P-value derived from Student's t-distribution.
  704. enables the comparison of alternative splicing
  705. aberrations between MUT and WT cells within specific
  706. cell subsets (Fig. 3c, Supplementary Table 7). We
  707. identified both shared and unique SF3B1mut cryptic 3’
  708. splice site events across progenitor subtypes. The
  709. usage of cryptic 3’ splice sites was highest along the
  710. megakaryocyte-erythroid lineage, with SF3B1mut
  711. MEPs and EPs accounting for the majority of cell-type
  712. specific cryptic 3’ splice site events, highlighting the
  713. specific impact of SF3B1 mutations on the erythroid
  714. lineage. These progenitor specific patterns in SF3B1mut
  715. mis-splicing were further detected in the validation
  716. cohort of MDS patient samples (MDS04-06; Extended
  717. Data Fig. 7e, f). In both MDS cohorts, progenitor
  718. specific cryptic 3’ splice sites involved genes related to
  719. cell cycle (e.g., CENPT)74, RNA processing (e.g., CHTOP,
  720. SF3B175, SRSF11, PRPF38A), erythroid differentiation
  721. (e.g., CD36, FOXRED1, GATA134,76,77), and heme
  722. metabolism (e.g., UROD, PPOX, CIAO1) (Fig. 3c;
  723. Extended Data Fig. 7e, f; Supplementary Table 7,
  724. 8). Many of these genes and pathways have previously
  725. been reported to be disrupted by alternative splicing
  726. in bulk studies of SF3B1mut MDS samples32, but their
  727. cell-type specificity was unknown. For instance, while
  728. the alternative splicing event in SF3B1 itself has been
  729. suggested before as being neoplasm-specific, here we
  730. narrowed down its erythroid-specific pattern. This
  731. isoform – SF3B1ins – is predicted to affect splicing by
  732. impairing U2 snRNP assembly75, likely contributing to
  733. the enhanced mis-splicing dysregulation in the
  734. megakaryocyte-erythroid lineage. In addition, cell
  735. cycle plays a critical role in the terminal
  736. differentiation of hematopoietic stem cells78 and RNA
  737. processing, erythroid differentiation, and heme
  738. metabolism pathways are directly linked to the
  739. regulation of erythropoiesis79–81. To further validate
  740. cell-type specificity of mis-splicing events, we
  741. compared the genes with cryptic 3’ splice site events
  742. unique to MEPs and EPs in the two distinct MDS
  743. cohorts and observed significant overlap of
  744. megakaryocyte-erythroid lineage-specific aberrantly
  745. spliced genes between the discovery and the
  746. validation MDS cohorts (P-value = 0.00029, Fisher’s
  747. exact test, with 46.8% of the cryptically spliced genes
  748. in MDS also aberrantly spliced in the MDS validation
  749. cohort). In contrast, no significant overlap was
  750. observed when comparing the genes with cryptic 3’
  751. splice site events unique to MEPs and EPs in the MDS
  752. discovery cohort to genes with cryptic 3’ splice sites
  753. unique to earlier progenitor cells in the MDS
  754. validation cohort (1.6% overlap; P-value = 0.46,
  755. Fisher’s exact test; Extended Data Fig. 7f). These
  756. findings reveal that alternative splicing is cell-type
  757. and differentiation-stage dependent27,82–84.
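The overlap tests above can be sketched as a one-sided Fisher's exact test, which for a gene-set overlap is the upper tail of a hypergeometric distribution. The text does not state the sidedness of the test or the gene universe used, so both are assumptions here, and `overlap_pvalue` is a hypothetical helper. The numbers in the example are toy values, not the paper's counts.

```python
from math import comb

def overlap_pvalue(n_universe, n_set_a, n_set_b, n_overlap):
    """One-sided Fisher's exact P-value for observing >= n_overlap shared genes.

    n_universe: number of genes that could have been called in either cohort
    n_set_a, n_set_b: sizes of the two gene sets being compared
    """
    max_overlap = min(n_set_a, n_set_b)
    total = comb(n_universe, n_set_b)
    # Sum hypergeometric probabilities for every overlap at least as large
    # as the one observed.
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Toy example: 20 testable genes, two sets of 5 genes sharing 3.
p = overlap_pvalue(n_universe=20, n_set_a=5, n_set_b=5, n_overlap=3)
```

The choice of universe strongly affects the P-value: restricting it to genes with enough junction coverage in both cohorts is more conservative than using all annotated genes.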
  758. Of note, erythropoiesis occupies a continuum
  759. of cell states and is dependent on a series of
  760. transcriptional changes that occur along a continuous
  761. trajectory45. Analyzing the SF3B1mut mis-splicing along
  762. this continuum (Fig. 4a) revealed that some erythroid
  763. differentiation and heme metabolism genes can be
  764. mis-spliced more frequently at the earliest stages of
  765. EP maturation (e.g., UROD and FOXRED185), while
  766. others display increased mis-splicing in the more
  767. differentiated EPs (e.g., GYPA and PPOX). UROD is part
  768. of the heme biosynthesis pathway and not only is
  769. heme an important structural component of erythroid
  770. cells but it also plays a regulatory role in the
  771. differentiation of erythroid precursors86. PPOX
  772. encodes an enzyme involved in mitochondrial
  773. heme biosynthesis and, as such, its degradation leads
  774. to ineffective erythropoiesis and accumulation of iron
  775. in the mitochondria typical of MDS with ring
  776. sideroblast clinical phenotype87. These results provide
  777. evidence that disruptive and pathogenic SF3B1mut-driven mis-splicing impacts key mediators of
  778. hemoglobin synthesis and erythroid differentiation at
  779. all stages of erythroid maturation88,89.
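Tracking mis-splicing along a continuum, as in the analysis above, can be sketched with sliding windows over cells ordered along the trajectory (the Figure 4 legend reports windows of 3000 SF3B1mut cells sliding by 300). In this minimal sketch the function name and the choice to pool UMI counts within each window, rather than average per-cell PSI, are assumptions for illustration, and the toy data are not from the paper.

```python
def sliding_window_psi(cells, window, step):
    """Pooled PSI (%) of one cryptic junction in sliding windows of cells.

    cells: list of (cryptic_count, total_count) tuples, one per cell,
           already sorted along the differentiation trajectory.
    window: number of cells per window; step: cells to slide each time.
    """
    values = []
    for start in range(0, len(cells) - window + 1, step):
        chunk = cells[start:start + window]
        alt = sum(c for c, _ in chunk)  # cryptic-junction UMIs in the window
        tot = sum(t for _, t in chunk)  # all UMIs at the shared 5' site
        values.append(100.0 * alt / tot if tot else float("nan"))
    return values

# Toy example: cryptic usage rising along the ordering.
toy_cells = [(0, 2), (0, 2), (1, 2), (1, 2), (2, 2), (2, 2)]
profile = sliding_window_psi(toy_cells, window=4, step=2)
print(profile)  # [25.0, 75.0]
```

Pooling within windows trades single-cell resolution for enough junction-spanning reads to estimate PSI, which matters because any one cell contributes few UMIs to a given junction.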
  780. We further noted that the degree of mis-splicing of a particular transcript (measured via PSI)
  781. positively correlated with its expression across the
  782. erythroid differentiation trajectory in some cases. In
  783. others, mis-splicing was anti-correlated with gene
  784. expression, often in cryptic 3’ splice site events that
  785. are predicted to lead to transcript degradation by the
  786. NMD pathway (Fig. 4b for representative examples).
  787. Cryptic 3’ splice sites result in the inclusion of short
  788. intronic fragments in mRNA and often introduce a
  789. premature termination codon (PTC)90–92. mRNAs
  790. harboring an NMD-inducing PTC located ≥50 bps
  791. upstream of the last exon–exon junction are predicted
  792. to undergo NMD, which in turn prevents the
  793. production of potentially aberrant proteins. In
  794. contrast, mRNAs harboring an NMD-neutral PTC,
  795. which is generally located ≤50 bps upstream of the
  796. last exon–exon junction or in the last exon, fail to
  797. trigger NMD and produce dysfunctional proteins93,94.
  798. We classified cryptic 3’ splice sites detected in the MDS
  799. samples into three major groups: (i) NMD-inducing
  800. event (due to the introduction of a PTC); (ii) NMD-neutral with a frameshift event; and (iii) NMD-neutral
  801. with no frameshift event (Supplementary Table 7).
  802. In accordance with previous reports71, of the 421
  803. cryptic 3’ splice sites significantly associated with the
  804. SF3B1mut cells, 228 (54%) were classified as
  805. NMD-inducing events, while the remaining 193 (46%)
  806. were NMD-neutral (60 events involving a frameshift
  807. and 133 in-frame events). As expected, we
  812. observed a significant decrease in the expression of
  813. genes harboring NMD-inducing events compared with
  814. those harboring NMD-neutral events (P-value = 0.017,
  815. Mann Whitney U test; Fig. 4c).
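The three-way classification described above can be sketched as a simple decision rule. This is an illustration under stated assumptions rather than the authors' implementation: positions are hypothetical 0-based mRNA coordinates, the function name is invented, and a PTC exactly 50 bp upstream of the last exon-exon junction is treated as NMD-inducing, since the text lists that boundary under both categories.

```python
def classify_cryptic_event(inserted_nt, ptc_pos, last_junction_pos, min_distance=50):
    """Classify a cryptic 3' splice site event by its predicted consequence.

    inserted_nt: length of the intronic fragment included in the mRNA
    ptc_pos: mRNA position of the premature termination codon, or None
    last_junction_pos: mRNA position of the last exon-exon junction
    """
    # PTC far enough upstream of the last exon-exon junction: predicted NMD.
    if ptc_pos is not None and last_junction_pos - ptc_pos >= min_distance:
        return "NMD-inducing"
    # Otherwise the transcript is predicted to escape NMD; split by reading frame.
    if inserted_nt % 3 != 0:
        return "NMD-neutral, frameshift"
    return "NMD-neutral, in-frame"

# Toy examples (not from the paper).
print(classify_cryptic_event(15, ptc_pos=120, last_junction_pos=400))   # NMD-inducing
print(classify_cryptic_event(16, ptc_pos=380, last_junction_pos=400))   # NMD-neutral, frameshift
print(classify_cryptic_event(18, ptc_pos=None, last_junction_pos=400))  # NMD-neutral, in-frame
```

A PTC in the last exon gives a negative distance and is therefore classified as NMD-neutral, matching the rule stated in the text.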
  816. Figure 3. Progenitor cell-type specific mis-splicing in SF3B1mut MDS.
  817. (A) Differential splicing analysis between SF3B1mut and SF3B1wt cells across MDS samples. Junctions with an absolute dPSI > 2
  818. and BH-FDR adjusted P-value < 0.2 were defined as differentially spliced. Top: Bars showing the percentage of genes
  819. differentially spliced in SF3B1mut and SF3B1wt cells in the MDS and MDS validation cohorts. Inset: Expected peak in the number
  820. of identified cryptic 3’ splice sites at the anticipated distance (15-20 base pairs) upstream of the canonical 3’ splice site in
  821. SF3B1mut cells. (B) Sashimi Plot of METTL17 intron junction with an SF3B1mut associated cryptic 3’ splice site showing RNA-seq
  822. coverage in SF3B1mut vs. SF3B1wt cells within MDS samples. Inset: Expected marked increase in the PSI value for the usage of this
  823. cryptic 3’ splice site in SF3B1mut cells. (C) Representation of dPSI values between SF3B1mut and SF3B1wt cells for cryptic 3’ splicing
  824. events identified in the main progenitor subsets across MDS samples. Rows correspond to cryptic 3’ junctions found to be
  825. differentially spliced in at least one cell-type, with P-value <= 0.05 and dPSI >= 2. Columns correspond to cell-type. Genes that
  826. belong to pathways cell cycle (purple), heme metabolism (green), oxygen homeostasis (black), RNA processing (red) and
  827. erythroid differentiation (yellow) are highlighted. The left bar plots show the fraction of differentially spliced cryptic 3’ splice
  828. sites per cell. Top bar plots quantify the total number of cell types where an event is differentially spliced, with the cell-type
  829. specific events located to the right side of the plot.
  834. NMD-inducing events affected genes including
  835. UROD, GYPA, FOXRED1 and PPOX – key genes in
  836. erythroid development. The loss of these transcripts
  837. via NMD95,96 may thus contribute to disrupted
  838. terminal differentiation of EPs. Notable among NMD-neutral affected genes, we identified BAX, a member of
  839. the Bcl-2 gene family and transcriptional target of
  840. TP53. BAX is a vital component of the apoptotic
  841. Figure 4. SF3B1mut-associated mis-splicing changes along the continuum of erythropoiesis.
  842. (A) Percent spliced-in (PSI) of junctions in SF3B1mut cells along the hematopoietic differentiation trajectory (HSPCs, IMPs, MEPs,
  843. EPs). Rows (z-score normalized) correspond to cryptic 3’ splice sites; columns represent the PSI for the usage of a given cryptic
  844. 3’ splice site in each window (size of 3000 SF3B1mut cells, sliding by 300 SF3B1mut cells). Only junctions found to be differentially
  845. spliced in at least one cell-type with a dPSI > 2 were used in the analysis. The ADT expression of erythroid lineage marker CD71,
  846. along with the fraction of cell types in each window, is shown. Rows are ordered according to the peak in PSI. Genes that belong
  847. to pathways cell cycle (purple), heme metabolism (green), oxygen homeostasis (black), RNA processing (red), erythroid
  848. differentiation (yellow) and apoptosis (blue) are highlighted. (B) Examples of mis-spliced genes at different stages of erythroid
  849. maturation. Bars represent PSI in SF3B1mut cells. Red lines represent ONT expression of the given junction in SF3B1mut cells. (C)
  850. Fold change (log2) of gene expression between SF3B1mut and SF3B1wt EP cells in NMD-inducing vs. NMD-neutral genes. (D) Gene
  851. model of BAX and relevant isoforms. Characteristic domains and their location are highlighted in BAX-ɑ, the main isoform. The
  852. cryptic 3’ splicing event on the terminal exon defines the BAX-ω isoform, characterized by the disruption of the transmembrane
  853. domain (TM) as a result of a frameshift.
  858. cascade and in turn plays an important role in
  859. balancing the control of survival, differentiation and
  860. proliferation of EPs at later stages of erythropoiesis97
  861. (Fig. 4a). The identified BAX cryptic 3’ splice site,
  862. though NMD-neutral, causes a frameshift in the last
  863. exon, disrupting the C-terminus of the protein. This
  864. BAX isoform, previously denoted as BAX-ω (Fig. 4d),
  865. has been shown to protect cells from apoptotic cell
  866. Figure 5. SF3B1 mutation promotes accumulation of mutant cells along the erythroid lineage in clonal hematopoiesis.
  867. (A) UMAP of CD34+ cells (n = 9,007 cells) from clonal hematopoiesis (CH) samples, one with SF3B1 K700E mutation and one
  868. with SF3B1 K666N mutation (n = 2 individuals), overlaid with cluster cell-type assignments. HSPC, hematopoietic stem
  869. progenitor cells; IMP, immature myeloid progenitors; MEP, megakaryocytic-erythroid progenitors; EP, erythroid progenitors;
  870. MkP, megakaryocytic progenitors; NP, neutrophil progenitors; E/B/M, eosinophil/basophil/mast progenitor cells; Pre-B,
  871. precursors B cells. (B) UMAP of CD34+ cells from CH samples overlaid with genotyping data. WT, cells with genotype data
  872. without SF3B1 mutation; MUT, cells with genotype data with SF3B1 mutation; NA, unassignable cells with no genotype data. (C)
  873. UMAP of CD34+ cells from CH samples overlaid with pseudotemporal ordering. Inset: Pseudotime in SF3B1mut vs. SF3B1wt cells
  874. in the aggregate of CH01-02. P-value for comparison of means from Wilcoxon rank sum test. (D) Normalized ratio of mutated
  875. cells along pseudotime quartiles. Bars show aggregate analysis of samples CH01-CH02 with mean +/- s.e.m. of 100
  876. downsampling iterations to 1 genotyping UMI per cell. Only cell types with >300 cells were used in the analysis. P-value from
  877. likelihood ratio test of linear mixed model with or without mutation status. Bottom: Fraction of cell types within each
  878. pseudotime quartile. (E) Differential gene expression between SF3B1mut and SF3B1wt HSPC cells in CH samples. Genes with an
  879. absolute log2(fold change) > 0.1 and P-value < 0.05 were defined as differentially expressed (DE). DE genes belonging to the
  880. translation pathway (red, Reactome) are highlighted (BH-FDR < 0.2). (F) Gene Set Enrichment Analysis of DE genes in SF3B1mut
  881. HSPC cells across CH samples. Gene sets that overlap with SF3B1mut EP cells in MDS highlighted (red). (G) Expression (mean +/-
  882. s.e.m.) of mRNA translation-related genes (Reactome) between SF3B1mut and SF3B1wt cells in progenitor cells from CH01-02
  883. samples. P-values from likelihood ratio test of linear mixed model with or without mutation status.
  888. death98,99. Interestingly, a recent study revealed C-terminal BAX mutations in myeloid clones that arise in
  889. chronic lymphocytic leukemia patients upon
  890. prolonged exposure to venetoclax, demonstrating a
  891. role for BAX C-terminal alterations in conferring a
  892. survival advantage to myeloid cells with this pro-apoptotic treatment. Of note, early clinical
  893. observations reported lower response to venetoclax
  894. in SF3B1mut AML100,101, consistent with a potential
  895. anti-apoptotic effect of BAX-ω. Together, these
  896. findings suggest a potential mechanism underlying
  897. the erythroid-dysplasia phenotype of SF3B1mut MDS.
  898. Despite the injury to translational machinery (Fig. 1h-i), SF3B1mut EPs may gain some degree of protection
  899. against cell death due to the presence of isoform BAX-ω, arising from aberrant splicing.
  900. Accumulation of SF3B1mut cells in the erythroid
  901. progenitor population and extensive mis-splicing
  902. in clonal hematopoiesis
  903. While SF3B1 mutations are the most common genetic
  904. alterations in MDS patients, they are also associated
  905. with a high risk of malignant transformation in clonal
  906. hematopoiesis (CH)4–8,102,103. However, the study of
  907. SF3B1 mutations directly in primary human samples
  908. has been largely limited to MDS, where confounding
  909. co-occurrence of other genetic alterations is common.
  910. Thus, CH presents a unique setting to interrogate the
  911. molecular consequences of SF3B1 mutations in non-malignant human hematopoiesis.
  912. We therefore isolated viable CD34+ cells from
  913. two CH samples with SF3B1 mutations (VAFs: 0.15
  914. and 0.22, from CD34+ autologous grafts collected from
  915. patients with multiple myeloma in remission) and
  916. performed GoT-Splice. A total of 9,007 cells across
  917. both samples passed quality filters (Extended Data
  918. Fig. 8a) and were integrated and clustered based on
  919. transcriptome data alone, agnostic to genotyping
  920. information (Fig. 5a; Extended Data Fig. 8b).
  921. Consistent with clinical data indicating normal
  922. hematopoietic production, we identified the expected
  923. progenitor subtypes using previously annotated
  924. progenitor identity markers (Fig. 5a; Extended Data
  925. Fig. 8c, d). Genotyping data were available for 3,642
  926. cells of these 9,007 cells (40.4%) through GoT
  927. (Extended Data Fig. 9a). Finally, to exclude
  928. additional genetic lesions in these CH samples, we
  929. performed copy number analysis with scRNA-seq data
  930. and identified no significant chromosomal gains or
  931. losses (Extended Data Fig. 9b).
  932. Projection of the genotyping information onto
  933. the differentiation map (Fig. 5b) showed no novel cell
  934. identities formed by the SF3B1 mutations, consistent
  935. with the fact that patients with CH exhibit no overt
  936. peripheral blood count or morphological
  937. abnormalities. However, a differentiation pseudotime
  938. ordering analysis showed that SF3B1mut cells are
  939. enriched at later pseudotime points when compared
  940. to SF3B1wt cells (Fig. 5c; Extended Data Fig. 9c). To
  941. further identify differentiation biases in SF3B1mut CH,
  942. we evaluated the mutated cell frequencies across the
  943. different prevalent progenitor cell types, as
  944. performed in MDS (Fig. 1e). Mutated cells were
  945. enriched in more differentiated EPs compared to the
  946. earlier HSPCs (P-value < 0.001, linear mixed model,
  947. Fig. 5d; Extended Data Fig. 9d), showing that
  948. SF3B1mut CH cells already demonstrate an erythroid
  949. lineage bias.
  950. To further identify transcriptional
  951. dysregulation in SF3B1mut HSPCs, we performed
  952. differential gene expression analysis between
  953. mutated and wildtype cells. We observed a similar upregulation of genes involved in mRNA translation in
  954. the SF3B1mut HSPC in CH (Fig. 5e, f; Supplementary
  955. Table 9, 10), a pathway also observed to be
  956. upregulated in our MDS analysis (Fig. 1h). In CH,
  957. upregulation of mRNA translation pathway genes was
  958. observed across multiple cell subtypes along
  959. erythroid differentiation, while absent in NPs (Fig.
  960. 5g). Thus, although no overt blood count
  961. abnormalities are observed with SF3B1 mutation in
  962. CH individuals, both the erythroid differentiation bias
  963. and aberrant transcriptional profiles are already
  964. apparent at this early pre-disease stage.
  965. The analysis of differentially used alternative
  966. 3’ splice sites between SF3B1mut and SF3B1wt CH cells
  967. revealed a marked increase in cryptic 3’ splice site
  968. usage in SF3B1mut cells, as observed in MDS (Fig. 6a).
  969. These mutant-specific cryptic 3’ splice sites affected
  970. genes including UROD, OXA1L, SERBP1, MED6 and
  971. ERGIC3, which were also detected to be cryptically
  972. spliced in the SF3B1mut MDS cells. Importantly, the
  973. lower VAF associated with pre-malignant CH samples
  974. highlights the need for GoT-Splice to detect
  975. mis-splicing events that occur at low
  976. frequencies and may otherwise be missed in bulk
  977. sequencing studies (Fig. 6b; Extended Data Fig.
  978. 10a).
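The dilution effect behind this point can be shown with a small worked example. It is a sketch under simplifying assumptions: reads are pooled by genotype, the function name is hypothetical, and the counts are toy values chosen so that mutant cells contribute 30% of reads (roughly what a heterozygous mutation at a VAF of 0.15 would give if all cells had equal coverage).

```python
def pooled_psi(alt_counts, total_counts):
    """PSI (%) after pooling junction counts from several groups of cells."""
    total = sum(total_counts)
    return 100.0 * sum(alt_counts) / total if total else float("nan")

# Toy counts at one cryptic 3' splice site (not from the paper).
mut_alt, mut_total = 60, 300  # SF3B1mut cells: PSI = 20%
wt_alt, wt_total = 7, 700     # SF3B1wt cells:  PSI = 1%

psi_mut = pooled_psi([mut_alt], [mut_total])
psi_wt = pooled_psi([wt_alt], [wt_total])
psi_bulk = pooled_psi([mut_alt, wt_alt], [mut_total, wt_total])

# Genotype-resolved dPSI versus the excess visible in a pseudobulk.
dpsi_resolved = psi_mut - psi_wt
bulk_excess = psi_bulk - psi_wt
```

In this toy setting the genotype-resolved dPSI is 19 percentage points, while the pseudobulk PSI sits at about 6.7%, only a few points above wildtype, so the same event is much harder to distinguish from background without genotype information.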
  979. To compare mis-spliced transcripts between
  980. CH and MDS, we compared cryptic 3’ splice sites with
  981. a P-value < 0.05 and dPSI of >= 2 in at least one cell-type along the erythroid differentiation trajectory
  982. (HSPC, IMP, MEP or EP) in both CH and in MDS cohorts
  983. (Supplementary Table 11). While the overall
  984. number of significant cryptic 3’ splice sites in CH was
  985. lower than in MDS, we observed a significant overlap
  990. in shared cryptic events (P-value < 10^-16, Fisher’s exact
  991. test; Fig. 6c). Similarly to MDS, we identified mis-spliced events specific to different stages of erythroid
  992. maturation, the majority of which overlapped with
  993. MDS cryptic 3’ splice sites (Fig. 6d). Notably, CH and
  994. MDS showed similar mis-splicing dynamics in the BAX
  995. transcript along the erythroid differentiation
  996. trajectory (Fig. 6e).
  997. Figure 6. SF3B1mut clonal hematopoiesis progenitor cells display cell-type specific cryptic 3’ splice site usage.
  998. (A) Differential splicing analysis between SF3B1mut and SF3B1wt cells across CH samples. Junctions with an absolute delta percent
  999. spliced-in (dPSI) > 2 and BH-FDR adjusted P-value < 0.2 were defined as differentially spliced. (B) Sashimi Plot of ERGIC3 intron
  1000. junction with an SF3B1mut associated cryptic 3’ splice site showing RNA-seq coverage in SF3B1mut vs. SF3B1wt cells within CH
  1001. samples, as well as compared to the CH samples when treated as bulk (pseudobulk of all cells regardless of genotype). PSI values
  1002. showing the expected marked increase in the usage of this cryptic 3’ splice site in SF3B1mut cells alone when compared to both
  1003. SF3B1wt cells as well as all cells (pseudobulk of sample). (C) Venn Diagram of the overlap of genes with cryptic junctions
  1004. significantly differentially spliced in at least one erythroid lineage cell type (HSPCs, IMPs, MEPs, EPs) with a dPSI > 2 between
  1005. MDS01-03 and CH samples. P-value for the overlap from Fisher’s Exact test. (D) Percent spliced-in (PSI) of junctions in SF3B1mut
  1006. cells along the hematopoietic differentiation trajectory of erythroid lineage cells. Rows (z-score normalized) correspond to
  1007. cryptic 3’ splice sites; columns represent the PSI for the usage of a given cryptic 3’ splice site in each window (size of 600 SF3B1mut
  1008. cells, sliding by 60 SF3B1mut cells). Only junctions found to be differentially spliced in at least one cell type with a dPSI > 2 were
  1009. used in the analysis. Pseudotime across each window shown. Rows are ordered according to the peak in PSI. Cryptic events also
  1010. found to be differentially spliced in MDS highlighted (red). (E) Bar plots of the PSI values for the usage of the BAX-ω isoform across
  1011. each window of SF3B1mut cells in the MDS, MDS validation and CH cohorts along the hematopoietic differentiation trajectory of
  1012. erythroid lineage cells. Fraction of cell types in each window shown per cohort (MDS: SF3B1mut cells (n = 6376) ordered by CD71
  1013. expression, MDS validation: SF3B1mut cells (n = 987) ordered by pseudotime, CH: MUT cells (n = 1021) ordered by pseudotime).
  1018. DISCUSSION
  1019. Here, we present GoT-Splice, a single-cell multi-omics
  1020. integration that enables joint profiling of genotype,
  1021. gene expression, protein, and alternative splicing all
  1022. within the same cell. GoT, as previously described15,
  1023. allows for the comparison between somatically
  1024. mutated and wildtype cells within the same sample,
  1025. for genotype to phenotype inferences. Next, by further
  1026. optimization of long-read sequencing of scRNA-seq
  1027. libraries64, we were able to simultaneously capture
  1028. both short and long-read data within the same cell,
  1029. making it possible to analyze the impact of somatic
  1030. mutations on transcriptional and splicing phenotypes.
  1031. To date, few tools are available to process and
  1032. analyze single-cell long-read data, especially for the
  1033. purpose of alternative splicing. To address existing
  1034. analytic gaps, we developed a long-read splicing
  1035. analysis pipeline that detects and quantifies
  1036. alternative splicing events within single cells and
  1037. highlights differential junction usage across cell
  1038. subpopulations. For processing the long-read data, the
  1039. pipeline integrates SiCeLoRe64 to error-correct cell
  1040. barcodes and UMIs, followed by the generation of
  1041. consensus reads. Next, unlike other isoform detection
  1042. methods that perform exon-centric junction calling
  1043. (such as SiCeLoRe, TALON63, FLAME104), we opted for
  1044. an intron-centric approach followed by separate 5’
  1045. and 3’ PSI measurements. Calculating
  1046. the rate of splicing at the 5’ and 3’ ends of the intron
  1047. improves the detection of the true splicing rate of each
  1048. individual intron, compared to exon-centric
  1049. approaches68. In addition, our pipeline detected
  1050. differential splicing patterns between MUT and WT
  1051. cells, both across entire samples and within individual
  1052. cell types, with sample-aware permutation testing to
  1053. integrate across samples. Finally, the pipeline includes
  1054. a functional annotation step that provides information
  1055. regarding the translational consequences of the
  1056. alternative spliced isoforms. Altogether, our pipeline
  1057. provides a comprehensive toolkit to process and
  1058. analyze differential splicing events in scRNA-seq long-read data.
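The intron-centric measurement described above can be sketched as follows. The sketch assumes the common intron-centric definition in which a junction's 5’ PSI is its share of all reads leaving the same donor site, and its 3’ PSI is its share of all reads arriving at the same acceptor site. The junction tuple layout and the function name are hypothetical, and this is not the authors' pipeline code.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical junction key: (chromosome, donor position, acceptor position, strand).
Junction = Tuple[str, int, int, str]


def intron_centric_psi(counts: Dict[Junction, int]) -> Dict[Junction, Tuple[float, float]]:
    """Return (5' PSI, 3' PSI) in percent for every junction with read counts."""
    donor_totals: Dict[Tuple[str, int, str], int] = defaultdict(int)
    acceptor_totals: Dict[Tuple[str, int, str], int] = defaultdict(int)
    # First pass: total reads per donor site and per acceptor site.
    for (chrom, donor, acceptor, strand), n in counts.items():
        donor_totals[(chrom, donor, strand)] += n
        acceptor_totals[(chrom, acceptor, strand)] += n

    psi: Dict[Junction, Tuple[float, float]] = {}
    # Second pass: each junction's share at its 5' end and at its 3' end.
    for junction, n in counts.items():
        chrom, donor, acceptor, strand = junction
        d_total = donor_totals[(chrom, donor, strand)]
        a_total = acceptor_totals[(chrom, acceptor, strand)]
        psi5 = 100.0 * n / d_total if d_total else float("nan")
        psi3 = 100.0 * n / a_total if a_total else float("nan")
        psi[junction] = (psi5, psi3)
    return psi


# Toy example: a canonical junction, a cryptic 3' splice site sharing its donor,
# and a second junction sharing the canonical acceptor.
toy_counts = {
    ("chr1", 100, 200, "+"): 30,
    ("chr1", 100, 180, "+"): 10,
    ("chr1", 150, 200, "+"): 30,
}
toy_result = intron_centric_psi(toy_counts)
print(toy_result)
```

Under this assumed definition, a cryptic 3’ splice site competes with the canonical acceptor for the same donor, so its usage shows up in the donor-grouped value, independent of what happens at other exons of the gene.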
  1059. By applying GoT-Splice to the most common
  1060. splice-altering mutation (SF3B1), we interrogated
  1061. differentiation biases, differential gene expression,
  1062. protein expression and splicing patterns, comparing
  1063. SF3B1mut vs. SF3B1wt cells co-existing within the same
  1064. bone marrow. Importantly, while GoT revealed that
  1065. SF3B1mut cells arise early on in uncommitted HSPCs,
  1066. we observed a differentiation bias of SF3B1mut cells
  1067. toward the erythroid progenitor fate. This finding is of
  1068. particular interest given the clinical association
  1069. between SF3B1 mutations and dysplastic
  1070. erythropoiesis. Differential gene and protein
  1071. expression in erythroid progenitors revealed
  1072. signatures that may contribute to this observed
  1073. differentiation bias of SF3B1mut cells toward the
  1074. erythroid fate. Notably, an increase in cell cycle and
  1075. checkpoint gene expression (TP53, MDM4 and CCNE1)
  1076. as well as the over-expression of erythroid lineage
  1077. markers, CD36 and CD71, specifically in SF3B1mut EPs,
  1078. suggests a fitness advantage for SF3B1mut cells along the
  1079. erythroid lineage.
  1080. CH samples likewise showed erythroid biased
  1081. differentiation with higher mutated cell frequency in
  1082. committed erythroid progenitors compared with
  1083. HSPCs. This is one of the first phenotypic studies of
  1084. clonal mosaicism in human samples, and thus the
  1085. observation of a somatic mutation-related phenotype,
  1086. which aligns with the more advanced MDS phenotype,
  1087. is of particular interest. In our results, SF3B1mut CH
  1088. cells showed upregulation of genes in pathways
  1089. involved in translation and mRNA processing, similar
  1090. to SF3B1mut cells in MDS. This finding suggests that the
  1091. pervasive mis-splicing observed with SF3B1 mutations may disrupt translation, reminiscent of
  1092. ribosomopathies, which often also result in
  1093. dyserythropoiesis105,106. Interestingly, it has been
  1094. shown that overexpression of MDM4 prevents TP53
  1095. degradation and leads to TP53 complex sequestration,
  1096. which interferes with p21 activation and results in
  1097. sustained cell proliferation. This finding aligns with
  1098. the observed upregulation of TP53 and other TP53-
  1099. related pathway genes in SF3B1mut EPs in MDS. Thus,
  1100. in addition to the shared erythroid differentiation bias
  1101. in MDS and CH, aberrant transcriptional profiles
  1102. linked to a dyserythropoiesis phenotype are also
  1103. already apparent at the pre-disease CH stage.
  1104. Leveraging the single-cell resolution of GoT-Splice and differential splicing analysis between
  1105. SF3B1mut and SF3B1wt cells revealed cell-type specific
  1106. effects of SF3B1 mutations on patterns of mis-splicing.
  1107. First, key genes involved in pathways important for
  1108. terminal differentiation of hematopoietic stem cells as
  1109. well as the regulation of erythropoiesis (namely RNA
  1110. processing, erythroid differentiation, cell cycle and
  1111. heme metabolism) were found to be cryptically
  1112. spliced across distinct SF3B1mut progenitor cell types,
  1113. many of which were previously reported to be
  1114. affected in bulk studies of SF3B1mut MDS54,72,73. While
  1115. some cryptic events were neutral in their effect, many
  1116. events in key genes important for erythroid differentiation
  1117. were found to be NMD-inducing (e.g., UROD, GYPA,
  1118. PPOX) or to cause a frameshift that may affect
  1119. protein structure and function (e.g., BAX) in both the
  1120. primary and validation MDS cohorts. Thus, our data
  1121. suggest that mis-splicing of erythroid specific genes
  1126. and pathways, together with the dysregulation of
  1127. apoptotic programs, may ultimately lead to the
  1128. accumulation of SF3B1mut EPs that fail to reach
  1129. terminal differentiation97, leading to the
  1130. dyserythropoiesis clinical phenotype. Importantly,
  1131. this SF3B1mut mis-splicing phenotype was already
  1132. evident in the CH samples, suggesting that the impact
  1133. of somatic CH driver mutations may be conserved
  1134. from CH to overt myeloid neoplasia.
  1135. What is it about? What issue does it try to address, why is it important, and what innovation does it offer? Lastly, what can we learn from it?
  1136. Please present your result in markdown.