- I have this paper:
- Single-cell multi-omics defines the cell-type specific impact of splicing
- aberrations in human hematopoietic clonal outgrowths
- Federico Gaiti1,2†, Paulina Chamely1,2†, Allegra G. Hawkins1,2†, Mariela Cortés-López1,2†, Ariel D. Swett1,2, Saravanan
- Ganesan1,2, Tarek H. Mouhieddine3, Xiaoguang Dai4, Lloyd Kluegel1,2, Celine Chen1,2,5, Kiran Batta6, John Beaulaurier7,
- Alexander W. Drong4, Scott Hickey7, Neville Dusaj1,2,5, Gavriel Mullokandov1,2, Jiayu Su1,8, Ronan Chaligné1,2, Sissel Juul4,
- Eoghan Harrington4, David A. Knowles1,8,9, Daniel H. Wiseman6, Irene M. Ghobrial3, Justin Taylor10, Omar Abdel-Wahab11*, Dan A. Landau1,2,12*
- 1New York Genome Center, New York, NY, USA. 2Division of Hematology and Medical Oncology, Department of Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA. 3Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA. 4Oxford Nanopore Technologies Inc, New York, NY, USA. 5Tri-Institutional MD-PhD Program, Weill Cornell Medicine, Rockefeller University, Memorial Sloan Kettering Cancer Center, New York, NY,
- USA.
- 6Division of Cancer Sciences, The University of Manchester, Manchester, United Kingdom.
- 7Oxford Nanopore Technologies Inc, San Francisco, CA, USA. 8Department of Systems Biology, Columbia University, New York, NY, USA. 9Department of Computer Science, Columbia University, New York, NY, USA. 10Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA. 11Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 12Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA.
- †Contributed equally to this work; *Co-corresponding authors.
- Corresponding authors contact details: Dan A. Landau, MD, PhD - Weill Cornell Medicine Belfer Research Building, 413 East 69th Street, New
- York, NY 10021. Email: [email protected]; Omar Abdel-Wahab, MD - Memorial Sloan Kettering Cancer Center, 1275 York Ave, New
- York, NY 10065. Email: [email protected].
- Keywords: Single-cell, RNA-seq, multi-omics, splicing, long-read sequencing, genotyping, clonal hematopoiesis, myelodysplastic syndrome
- ABSTRACT
- RNA splicing factors are recurrently affected by alteration-of-function mutations in clonal blood disorders, highlighting
- the importance of splicing regulation in hematopoiesis. However, our understanding of the impact of dysregulated RNA
- splicing has been hampered by the inability to distinguish mutant and wildtype cells in primary patient samples, the cell-type complexity of the hematopoietic system, and the sparse and biased coverage of splice junctions by short-read
- sequencing typically used in single-cell RNA sequencing. To overcome these limitations, we developed GoT-Splice by
- integrating Genotyping of Transcriptomes (GoT) with enhanced efficiency long-read single-cell transcriptome profiling,
- as well as proteogenomics (with CITE-seq). This allowed for the simultaneous single-cell profiling of gene expression, cell
- surface protein markers, somatic mutation status, and RNA splicing. We applied GoT-Splice to bone marrow progenitors
- from patients with myelodysplastic syndrome (MDS) affected by mutations in the most frequently mutated RNA splicing
- factor – the core RNA splicing factor SF3B1. High-resolution mapping of SF3B1mut vs. SF3B1wt hematopoietic progenitors
- revealed a fitness advantage of SF3B1mut cells in the megakaryocytic-erythroid lineage, resulting in an expansion of
- SF3B1mut erythroid progenitor (EP) cells. SF3B1mut EP cells exhibited upregulation of genes involved in regulation of cell
- cycle and mRNA translation. Long-read single-cell transcriptomes revealed the previously reported increase of aberrant
- 3’ splice site usage in SF3B1mut cells. However, the ability to profile splicing within individual cell populations uncovered
- distinct cryptic 3’ splice site usage across different progenitor populations, as well as stage-specific aberrant splicing
- during erythroid maturation. Lastly, as splice factor mutations occur in clonal hematopoiesis (CH) with increased risk of
- neoplastic transformation, we applied GoT-Splice to CH samples. These data revealed that the erythroid lineage bias, as
- well as cell-type specific cryptic 3’ splice site usage in SF3B1mut cells, precede overt MDS. Collectively, we present an
- expanded multi-omics single-cell toolkit to define the cell-type specific impact of somatic mutations on RNA splicing, from
- the earliest phases of clonal outgrowths to overt neoplasia, directly in human samples.
- INTRODUCTION
- Genetic diversity in the form of clonal outgrowths has
- been ubiquitously observed across normal and
- malignant human tissues1–13. Likewise, phenotypic
- diversity is a hallmark of both normal and malignant
- tissues in human samples, as has been observed with
- the widespread application of single-cell RNA
- sequencing (scRNA-seq)14–20. These two axes of
- [bioRxiv preprint doi: https://doi.org/10.1101/2022.06.08.495292; this version posted June 9, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. F. Gaiti, P. Chamely, A. G. Hawkins, M. Cortés-López et al. (2022). BioRxiv]
- cellular diversity likely exhibit complex interplay, as
- cell state may affect the phenotypic impact of somatic
- mutations21. Recent advances in single-cell multi-omics sequencing have allowed us to reconcile these
- two aspects of cellular variability in human
- tissues15,22,23, and link genetic variation and
- transcriptional cell state diversity in somatic
- evolution. For example, through the application of
- Genotyping of Transcriptomes (GoT)15 technology,
- which enables genotyping of somatic mutations
- together with high-throughput droplet-based scRNA-seq, we have previously demonstrated that the effects
- of somatic mutations on cellular fitness in blood
- myeloproliferative disorders vary as a function of
- progenitor cell identity15.
- Mutations in genes encoding RNA splicing
- factors serve as an informative example of the
- challenge of linking genotype to phenotype in complex
- human tissues. Somatic change-of-function mutations
- in RNA splicing factors are recurrent in hematologic
- malignancies24–26, highlighting the importance of
- dysregulated RNA splicing in human hematopoietic
- disorders. SF3B1 (splicing factor 3b subunit 1), a core
- component of the spliceosome complex, is a
- commonly mutated splicing factor across hematologic
- malignancies and solid tumors, and is heavily
- implicated in the pathogenesis of myelodysplastic
- syndromes (MDS)27,28. SF3B1 mutations also occur in
- subjects with clonal hematopoiesis (CH), where they
- confer increased risk of conversion to overt myeloid
- neoplasms compared to other CH driver mutations1,2.
- SF3B1 mutations result in incorrect branch point
- recognition during RNA splicing, often leading to an
- increased usage of aberrant (or cryptic) intron-proximal 3’ splice sites in hundreds of genes29. Such
- aberrant 3’ splice site recognition typically results in
- the inclusion of short intronic fragments in spliced
- mRNA, which most commonly alters the frame of the
- transcript and renders it a substrate for
- nonsense-mediated mRNA decay (NMD)30. Prior work has
- demonstrated that through mis-splicing, SF3B1
- mutations lead to altered cell metabolism31 and
- disruption of ribosomal biogenesis32, leading to the
- aberrant hematopoietic differentiation typical of MDS.
- While these are key advances in our understanding of
- the role of SF3B1 mutations in MDS development, the
- mechanisms through which mis-splicing leads to
- disrupted hematopoietic differentiation in humans
- remain elusive.
- To date, cell culture systems and murine
- models have been critical for elucidating the role of
- splicing factor mutations in disordered
- hematopoiesis. Nonetheless, these methods may not
- fully recapitulate MDS development in the human
- context. For example, alternatively spliced genes from
- murine models of SF3B1mut MDS, which share some
- phenotypic similarities with human MDS, show
- limited overlap with those identified in human
- samples33. The study of splice-altering mutations in
- humans has been further hampered by three
- important limitations. First, normal wildtype (WT)
- and aberrant mutated (MUT) cells are often admixed
- without discriminating cell surface markers that are
- required to uniquely isolate MUT cells, limiting the
- ability to identify signals that can be specifically linked
- to the SF3B1mut genotype. This obstacle is magnified in
- the context of CH where MUT cells commonly
- constitute a minority of the hematopoietic progenitor
- population. Second, the hematopoietic differentiation
- process yields significant complexity of cell progenitor
- types that further hinders the ability to link mutated
- genotypes with distinct cellular phenotypes. SF3B1mut
- MDS is indeed associated with a specific clinicomorphological phenotype of refractory anemia and
- accumulation of ringed sideroblasts28,34, strongly
- suggesting that the interplay between cell identity and
- SF3B1 mutations is fundamental in driving disrupted
- hematopoietic differentiation. Third, scRNA-seq by 3’
- or 5’ biased short-read sequencing is limited in its
- ability to map full-length RNA isoforms and splicing
- aberrations.
- To overcome these limitations and identify
- cell-identity-dependent mis-splicing mediated by
- SF3B1 mutations, we developed GoT-Splice by
- integrating GoT15 with long-read single-cell
- transcriptome profiling (with Oxford Nanopore
- Technologies [ONT]) as well as proteogenomics (with
- CITE-seq)35. This allowed for the simultaneous
- profiling of gene expression, cell surface protein
- markers, somatic mutation genotyping, and RNA
- splicing within the same single cell. The application of
- GoT-Splice to bone marrow progenitor samples from
- individuals with SF3B1-mutated MDS and CH revealed
- that, while SF3B1 mutations arise in uncommitted
- hematopoietic stem progenitor cells (HSPCs), their
- effect on fitness increases with differentiation into
- committed erythroid progenitors (EPs), in line with
- the SF3B1mut-driven dyserythropoiesis phenotype.
- Importantly, the integration of GoT with full-length
- isoform mapping via long-read sequencing showed
- that SF3B1 mutations exert cell-type specific mis-splicing, already apparent in CH long before disease
- onset.
- RESULTS
- GoT integrated with proteogenomics reveals
- increased fitness of SF3B1mut cells in the erythroid
- lineage linked to overexpression of cell-cycle and
- mRNA translation genes
- As we have recently demonstrated that the impact of
- somatic mutations on the transcriptome varies as a
- function of underlying cell identity in
- myeloproliferative neoplasms15, we hypothesized that
- an interplay between cell identity and SF3B1
- mutations may drive disrupted hematopoietic
- differentiation in MDS. To test this, we applied GoT15
- (Fig. 1a) to CD34+ bone marrow progenitor cells from
- three untreated MDS patients with SF3B1 K700E
- mutations (discovery cohort, MDS01-03), as well as a
- distinct cohort consisting of three MDS patients
- undergoing treatment (validation cohort, MDS04-06)
- with erythropoietin (EPO) and/or granulocyte colony-stimulating factor (G-CSF; Fig. 1b; Supplementary
- Table 1). As normal hematopoietic development has
- been extensively studied using flow cytometry cell
- surface markers, we further integrated GoT with
- single-cell proteogenomics (CITE-seq35,36; Fig. 1a). A
- total of 24,315 cells across the six MDS samples were
- obtained after sequencing and quality control filtering
- (Extended Data Fig. 1a, b; MDS02 was sequenced in
- two technical replicates). To chart the differentiation
- map of the CD34+ progenitor cells, we integrated the
- data across the primary MDS samples (MDS01-03), as
- well as the MDS validation samples (MDS04-06), and
- clustered based on transcriptomic data alone, agnostic
- to the genotyping and protein information (Fig. 1c;
- Extended Data Fig. 1c, d). Using previously
- annotated RNA identity markers for human CD34+
- progenitor cells37, validated via Antibody-Derived Tag
- (ADT) markers in the CITE-seq panel
- (Supplementary Table 2, 3), we identified the
- expected progenitor subtypes in the primary MDS
- cohort, along with a population of mature monocytic
- cells characterized by CD14 expression and lack of
- CD34 expression often observed in CD34+ sorting of
- human bone marrow38 (Fig. 1c; Extended Data Fig.
- 2a-c). Cell clustering was further validated using RNA
- and ADT multimodal integration (Extended Data Fig.
- 2d). The expected progenitor subtypes were similarly
- identified in the MDS validation cohort (MDS04-06;
- Extended Data Fig. 3a-c).
- Genotyping data were available for 15,650
- MDS cells (64.4% across MDS01-06) through GoT
- (Fig. 1b; Extended Data Fig. 4a-d). The per-patient
- mutant cell fractions obtained through GoT were
- highly correlated with the variant allele frequencies
- (VAFs) obtained through bulk sequencing of matched
- unsorted peripheral blood mononuclear cells
- (Pearson’s r = 0.81, P-value = 0.008; Extended Data
- Fig. 4a). Projection of the genotyping information
- onto the differentiation map demonstrated MUT and
- WT cells co-mingled throughout the differentiation
- topology (Extended Data Fig. 4c, d), highlighting the
- need for single-cell multi-omics to link genotypes with
- cellular phenotypes in SF3B1mut MDS. Although MUT
- cells were found across CD34+ progenitor cells, we
- observed an accumulation of MUT cells along the
- erythroid trajectory (Fig. 1d), suggesting that SF3B1
- mutant cell frequency (MCF) varies as a function of the
- progenitor subtype. To confirm this, we evaluated the
- MCF across the different prevalent progenitor cell
- types (limited to progenitor subsets with > 300 cells).
- Of note, as cells may display variable expression of
- SF3B1 itself, we performed amplicon UMI downsampling to exclude sampling biases given the
- heterozygosity of the mutated allele as a potential
- confounder for observed differences in MCF (see
- Methods). Across samples, we observed a significant
- increase in MCF in the megakaryocyte-erythroid
- lineage with the highest MCF observed in EPs
- compared to HSPCs (P-value < 10^-16; Fig. 1e;
- Extended Data Fig. 4e), consistent with the erythroid
- lineage-specific impact of mutated SF3B139,40.
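- The downsampling idea described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (see Methods for that): the data layout (a cell-type label plus a list of per-UMI genotype calls for each cell), the function name, and the rule that a single randomly sampled UMI decides each cell's call are all assumptions made for the example.

```python
import random
from collections import defaultdict


def downsampled_mcf(cells, n_iter=100, seed=0):
    """Mean mutant cell fraction (MCF) per cell type after sampling 1 UMI per cell.

    cells: list of (cell_type, [per-UMI genotype calls, "MUT" or "WT"]) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(n_iter):
        mut = defaultdict(int)
        n = defaultdict(int)
        for cell_type, umis in cells:
            if not umis:
                continue  # cell has no genotyping UMI, so it is not genotyped
            call = rng.choice(umis)  # keep exactly one genotyping UMI per cell
            n[cell_type] += 1
            mut[cell_type] += call == "MUT"
        for cell_type in n:
            totals[cell_type] += mut[cell_type] / n[cell_type]
    # Average the per-iteration fractions
    return {cell_type: totals[cell_type] / n_iter for cell_type in totals}


# Toy input (made-up cells, not data from the paper)
cells = [
    ("HSPC", ["WT"]),
    ("HSPC", ["MUT"]),
    ("MEP", ["MUT", "WT"]),  # both alleles captured: call depends on the draw
    ("EP", ["MUT"]),
    ("EP", ["MUT", "MUT"]),
    ("EP", []),
]
result = downsampled_mcf(cells)
```

- Sampling one UMI per cell puts highly and lowly expressing cells on the same footing, so a cell type with deeper SF3B1 amplicon coverage cannot appear more mutated simply because its mutant allele was more likely to be captured.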
- The ability to layer protein measurements on
- top of GoT data further allowed us to identify
- differentially expressed proteins between MUT and
- WT cells within each progenitor subset. After quality
- control filtering for ADT markers with adequate
- expression in at least two major progenitor subtypes
- (see Methods), protein expression was highest in the
- expected cell types, and correlated with mRNA
- expression, both at the individual cell as well as cell-type level, comparable to previous data35 (Extended
- Data Fig. 5a, b). We directly compared protein
- expression between MUT and WT cells, accounting for
- sample-to-sample variability in mutated cells through
- downsampling (see Methods), and observed
- differential expression of CD38, CD99, CD36 and CD71
- in at least one progenitor cell-type (Fig. 1f;
- Supplementary Table 4). CD38 is a known marker
- for the transition of primitive CD34+ stem and
- progenitor cells into more committed precursor
- cells37,41,42. Its overexpression in SF3B1mut is consistent
- with the observed higher MCF in committed
- progenitor subsets. CD99, over-expressed in MUT
- immature myeloid progenitor (IMP) cells, was
- previously noted to be overexpressed in both AML and
- MDS stem cells, serving as a potential therapeutic
- target of malignant stem cells43,44. Finally, CD36 and
- CD71, erythroid lineage markers, were found to be
- over-expressed in MUT EPs when compared to WT
- EPs, consistent with the SF3B1mut-driven
- dyserythropoiesis phenotype. We further leveraged
- these erythroid maturation cell surface protein
- markers to validate pseudo-temporal (pseudotime)
- ordering of the continuous process of erythroid
- maturation45 (Extended Data Fig. 5c). This analysis
- revealed an increase in MCF along erythroid lineage
- maturation (Fig. 1g), confirming that SF3B1
- [Figure 1, panels A-J (graphic not reproduced; the full caption appears in the text under "Figure 1."). Recoverable panel content:
- (A) GoT-Splice workflow. 1: generate droplets (cells incubated with antibodies, barcoded beads, enzymes). 2: amplify full-length cDNA library (cell barcode, UMI, full-length cDNA). 3: fragment and prepare library (short read + CITE-seq). 4: assess alternative splicing in individual cells (long-read using ONT). 5: amplify locus of interest (GoT). 6: integrate whole transcriptome with genotype, protein markers and alternative splicing.
- (B) SF3B1 mutant dataset (CD34+ sorted hematopoietic progenitors):
- Disease type             Patient ID  Mutation     Chromium 10X cell #  GoT cells genotyped (%)  Bulk VAF  Additional mutations (VAF)
- MDS                      MDS01       SF3B1-K700E  757                  47.2                     0.395     TET2-M5331 (0.42); DNMT3A-X556 (0.37)
- MDS                      MDS02       SF3B1-K700E  10676                80.9                     0.376     None
- MDS                      MDS03       SF3B1-K700E  4003                 87.3                     0.386     None
- MDS (validation cohort)  MDS04       SF3B1-K700E  1130                 22.0                     0.110     ASXL1-H633Qfs*2 (0.03); IDH-R132G (0.50)
- MDS (validation cohort)  MDS05       SF3B1-K700E  5235                 25.7                     0.320     TP53-Y234C (0.39); TYK2-S887W (0.53)
- MDS (validation cohort)  MDS06       SF3B1-K700E  2514                 62.0                     0.379     TET2-G1172Vfs*3 (0.39); TET2-R1262P (0.49); DNMT3A-G707Afs*72 (0.46)
- Clonal hematopoiesis     CH01        SF3B1-K666N  2998                 31.0                     0.221     None
- Clonal hematopoiesis     CH02        SF3B1-K700E  6009                 45.1                     0.154     None
- (C) UMAP (UMAP1 vs. UMAP2) of human CD34+ bone marrow cells, MDS01-03; no. of cells = 15,436; clusters: HSPC, IMP, MkP, MEP, EP, NP, E/B/M, T/B cells, MonoDC, DC, Mono, PreB.
- (D) UMAP density of WT (N = 2,498) and MUT (N = 9,996) cells, MDS01-03.
- (E) Normalized mutant cell ratio for EP, MEP, MkP, IMP, HSPC, NP and Mono (only cell types with >300 cells in MDS01-03); P < 2.2x10^-16.
- (F) Antibody-Derived Tag (ADT) markers CD71, CD36, CD99 and CD38 across NP, IMP, HSPC, MkP, MEP and EP; ADT log10FC (MUT vs. WT) and ADT expression; * P < 0.05, ** P < 0.01.
- (G) Mutant cell fraction and ADT expression (ADT-CD36, ADT-CD71) along megakaryocyte-erythroid differentiation pseudotime (early to late), with cell density for HSPC, IMP, MEP and EP; P = 0.021, P = 0.013, P = 6.6x10^-11.
- (H) Volcano plot of log2(EP SF3B1mut / EP SF3B1wt gene expression) vs. −log10(P-value); no. of genes = 2,000; translation and cell cycle genes highlighted. Labeled genes: HSP90B1, CALR, SMC1A, LMNA, EIF3A, DYNLL1, SPTA1, CCNE1, DDX17, PRKDC, EIF5A, INCENP, DDX5, RPS29, CTNNB1, PSMD12, SPTB, EIF2S3, INO80D, RAD21, EIF3E, EIF4B, EIF3B, PSMD2, WSB1, DNMT1, TP53, MDM4, GALNT6, P4HB, CCDC42, DLK1, TIMP1.
- (I) TP53 pathway module score, MUT vs. WT, across HSPC, IMP, MEP, EP and NP (MDS01-03); * P < 0.05.
- (J) Cell cycle module score, MUT vs. WT, across HSPC, IMP, MEP, EP and NP (MDS01-03); * P < 0.05.]
- mutational fitness increases with differentiation into
- committed EPs.
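- The MUT vs. WT protein comparison described above relies on permutation testing (Fig. 1f). A minimal Python sketch of a two-sided permutation test on the difference in mean ADT expression is given here. It assumes the test statistic is a difference of means and omits the per-sample downsampling step; the paper's exact statistic and procedure are in Methods, and the function name and toy values are hypothetical.

```python
import random


def permutation_pvalue(mut_values, wt_values, n_perm=2000, seed=0):
    """Two-sided permutation P-value for a difference in mean expression."""
    rng = random.Random(seed)
    n_mut = len(mut_values)
    n_wt = len(wt_values)
    observed = abs(sum(mut_values) / n_mut - sum(wt_values) / n_wt)
    pooled = list(mut_values) + list(wt_values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign genotype labels at random
        diff = abs(sum(pooled[:n_mut]) / n_mut - sum(pooled[n_mut:]) / n_wt)
        hits += diff >= observed
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Toy ADT values (made up): clearly separated groups vs. identical groups
p_separated = permutation_pvalue([5.0, 5.1, 5.2, 5.3, 5.4, 5.5],
                                 [1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
p_identical = permutation_pvalue([1, 1, 1], [1, 1, 1])
```

- Shuffling the genotype labels builds the null distribution directly from the data, which avoids distributional assumptions about sparse, skewed ADT counts.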
- To further explore SF3B1 driven
- transcriptional dysregulation in committed EPs, we
- performed differential gene expression analysis
- between SF3B1mut and SF3B1wt cells. Mutated EPs
- upregulated genes encoding important translation
- and ribosome biogenesis factors (FDR < 0.2; Fig. 1h;
- Supplementary Table 5, 6), including a number of
- eukaryotic initiation factors (e.g., EIF3, EIF5), DEAD-box helicases (e.g., DDX5, DDX17), and ribosome
- subunits (e.g., RPS29). This dysregulation of
- translational activity, or ribosomal stress signal, is
- evocative of studies showing that translational
- regulation is critical during hematopoiesis46–49, and
- may lead to cell- and tissue type–restricted activation
- of the TP53 signaling pathway in myeloid disease50–55.
- Specifically, cells that require high levels of protein
- synthesis, such as erythroid progenitors, may be more
- sensitive to changes caused by translational
- dysfunction56. In line with this notion, TP53 gene
- target upregulation in SF3B1mut cells was more
- prominent in the megakaryocyte-erythroid lineage,
- with no increased expression of TP53-related genes in
- earlier progenitors (HSPCs) or in neutrophil
- progenitors (NPs) compared to WT cells (Fig. 1i). Our
- results therefore establish a molecular phenotype for
- the SF3B1 mutation in human bone marrow
- progenitors, potentially phenocopying translational
- dysregulation typically observed in ribosomopathies
- driven by germline deleterious mutations of
- ribosomal subunits56.
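- The FDR < 0.2 threshold used above refers to Benjamini-Hochberg (BH) adjusted P-values (Fig. 1h caption). A minimal Python sketch of the BH adjustment follows; the function name and the toy P-values are illustrative assumptions, and in practice an established implementation from a statistics library would normally be used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P-values for four hypothetical genes (not values from the paper)
raw = [0.01, 0.04, 0.03, 0.30]
adj = benjamini_hochberg(raw)
passing = [i for i, q in enumerate(adj) if q < 0.2]
```

- With these toy inputs the first three genes fall below the 0.2 threshold and the fourth does not.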
- Mutated EPs also upregulated genes involved
- in cell-cycle and checkpoint control (FDR < 0.2; Fig.
- 1h, j; Supplementary Table 5, 6). In particular, we
- observed an increase in expression of CCNE1, a
- positive regulator of the G1/S transition of the cell
- cycle57, and MDM4. The latter gene works together
- with TP53 during the G1/S checkpoint of the cell cycle
- to determine cell fate by regulating pathways such as
- DNA repair, apoptosis, and senescence58. Increased
- expression of MDM4 during ribosomal stress59
- prevents TP53 degradation and blocks subsequent
- inactivation of p21, resulting in sustained cell
- proliferation60. Together, the combined upregulation
- of TP53, CCNE1, and MDM4 in mutated EPs may
- therefore lead to cell survival and accumulation rather
- than cell death, supporting the finding that SF3B1
- mutations impart a greater fitness advantage
- specifically in the erythroid lineage.
- GoT-Splice links somatic mutations, alternative
- splicing, and cellular phenotype at single-cell
- resolution
- Figure 1. Increased fitness advantage of SF3B1mut cells in the megakaryocytic-erythroid lineage.
- (A) Schematic of GoT-Splice workflow. The combination of GoT with CITE-seq and long-read full-length cDNA using Oxford
- Nanopore Technologies (ONT) enables the simultaneous profiling of protein and gene expression, somatic mutation status, and
- alternative splicing at single-cell resolution. (B) Summary of patient metadata and GoT data (after quality control) for MDS and
- CH samples with SF3B1 mutations. (C) Uniform manifold approximation and projection (UMAP) of CD34+ cells (n = 15,436 cells)
- from myelodysplastic syndrome patient samples with SF3B1 K700E mutations (n = 3 individuals), overlaid with cluster cell-type
- assignments. HSPC, hematopoietic stem progenitor cells; IMP, immature myeloid progenitors; MkP, megakaryocytic
- progenitors; MEP, megakaryocytic-erythroid progenitors; EP, erythroid progenitors; NP, neutrophil progenitors; E/B/M,
- eosinophil/basophil/mast progenitor cells; T/B cells; Mono, monocyte; DC, dendritic cells; Pre-B, precursors B cells; Mono DC,
- monocyte/dendritic cell progenitors. (D) Density plot of SF3B1mut vs. SF3B1wt cells. Genotyping information (MDS01-03) was
- obtained for 12,494 cells (80.9 % of all cells). (E) Normalized frequency of SF3B1 K700E MUT cells in progenitor subsets with
- at least 300 genotyped cells. Bars show aggregate analysis of samples MDS01-03 with mean +/- s.e.m. of 100 downsampling
- iterations to 1 genotyping UMI per cell. Only cell types with >300 cells were used in the analysis. P-value from likelihood ratio
- test of linear mixed model with or without mutation status. (F) Differential ADT marker expression between SF3B1mut and
- SF3B1wt cells. Red: higher expression in SF3B1mut cells; blue: higher expression in SF3B1wt cells. Size of the dot corresponds to
- the average expression of ADT marker across cells in a given cell-type. P-values determined through permutation testing. (G)
- Mutant cell fraction and ADT expression levels of CD36 and CD71 as a function of pseudotime along the megakaryocyte-erythroid differentiation trajectory for SF3B1mut and SF3B1wt cells in MDS01-03. Shading denotes 95% confidence interval.
- Histogram shows cell density of clusters included in the analysis, ordered by pseudotime. P-values were calculated by Wilcoxon
- rank sum test by comparing mutant cell fraction between pseudotime trajectory quartiles. (H) Differential gene expression
- between SF3B1mut and SF3B1wt EP cells in MDS samples. Genes with an absolute log2(fold change) > 0.1 and P-value < 0.05 were
- defined as differentially expressed (DE). DE genes belonging to cell cycle (red) and translation (blue) pathways (Reactome) are
- highlighted (BH-FDR < 0.2). (I) Expression (mean +/- s.e.m.) of TP53 pathway related genes (Reactome) between SF3B1mut and
- SF3B1wt cells in progenitor cells from MDS01-03 samples. Red: module score in SF3B1mut cells; blue: module score in SF3B1wt
- cells. P-values from likelihood ratio test of linear mixed model with or without mutation status. (J) Same as (I) for expression of
- cell cycle related genes (Reactome) between SF3B1mut and SF3B1wt cells in progenitor cells from MDS01-03 samples.
- The integration of GoT with single-cell
- proteogenomics data revealed that SF3B1 mutations
- reshape hematopoietic differentiation and mediate
- cell-identity-dependent transcriptional changes (Fig.
- 1). Given the pivotal role of SF3B1 in mRNA splicing,
- we next explored how mis-splicing may serve as a link
- between genotypes and cellular phenotypes. Indeed,
- SF3B1 mutations promote recognition of alternative
- branch points, most commonly leading to increased
- usage of aberrant 3’ splice sites29. However, previous
- studies in primary human samples have been
- performed on bulk samples admixing MUT and WT
- cells as well as progenitor subtypes30,32,61,62.
- Conversely, short-read sequencing typically employed
- in scRNA-seq does not adequately cover splice
- junctions. Recent advances suggest that long-read
- integration into scRNA-seq may overcome these
- limitations63–67. We therefore integrated GoT with full-length ONT long-read sequencing, allowing for high-throughput, single-cell integration of genotype, cell
- surface proteome, gene expression, and mRNA
- splicing information (GoT-Splice; Fig. 1a). We note
- that single-cell cDNA sequencing with ONT presents
- unique challenges, as cDNA amplification artifacts are
- still productively sequenced when using standard
- ONT ligation chemistry. This leads to a high fraction of
- uninformative reads in the highly amplified single-cell
- libraries. To enhance ONT efficiency, we incorporated
- a biotin enrichment step using on-bead PCR to
- selectively amplify full-length reads containing intact
- cell barcodes and unique molecular identifiers64
- (UMIs; Fig. 2a). This approach increased the yield of
- full-length reads from 50.4 +/- 2.7 to 77.6 +/- 2.0
- (mean +/- s.e.m.) percent of all sequenced reads. Thus,
- GoT-Splice delivers high-resolution single-cell full-length transcriptional profiles that are comparable
- with short-read sequencing (Fig. 2b, c). To accurately
- identify splice junctions using single-cell long-read
- sequencing, we developed an analytical pipeline that
- leverages the recently published SiCeLoRe pipeline64
- (Extended Data Fig. 6a). To reduce alignment noise,
- we generated a splice junction reference identified in
- single-cell SMART-seq2 data from human CD34+ cells
- with no SF3B1 mutation (see Methods). Next, we
- carried out intron-centric junction calling which
- allows for the independent measurement of splicing at
- both the 5’ and 3’ ends of each intron. This allows for
- an unbiased assessment of junctions and a greater
- accuracy in measuring the degree of mis-splicing of a
- particular transcript when compared to exon-centric
- quantification approaches68, which are typically used
- for cassette exon usage profiling and rely on
- predefined transcript models or splicing events, both
- of which may be inaccurate or incomplete69,70. As
- anticipated, we observed a 4-fold increase in the
- number of junctions per cell detected using full-length
- long-read sequencing over short-read, despite a lower
- absolute number of UMIs per cell (Fig. 2d). Additionally,
- GoT-Splice afforded greater coverage uniformity
- across the entire transcript, compared to 3’-biased
- coverage in short-read sequencing (Fig. 2e).
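- The intron-centric quantification described above can be illustrated with a short Python sketch. It assumes junctions have already been called and counted as UMIs per (donor, acceptor) pair within one gene and strand; the function name, the coordinates, and the counts are hypothetical, and the authors' actual pipeline is described in Methods.

```python
from collections import defaultdict


def intron_centric_psi(junction_counts):
    """Usage of each junction measured separately at its 3' and 5' end.

    junction_counts: {(donor, acceptor): UMI count} for one gene and strand.
    Returns {(donor, acceptor): (psi_3prime, psi_5prime)}, where psi_3prime is
    the share of reads leaving this donor that use this acceptor, and
    psi_5prime is the share of reads entering this acceptor from this donor.
    """
    by_donor = defaultdict(int)
    by_acceptor = defaultdict(int)
    for (donor, acceptor), n in junction_counts.items():
        by_donor[donor] += n
        by_acceptor[acceptor] += n
    return {
        (donor, acceptor): (n / by_donor[donor], n / by_acceptor[acceptor])
        for (donor, acceptor), n in junction_counts.items()
        if n > 0
    }


# Toy counts (made up): one donor at position 100 splices to a canonical
# acceptor at 500 and to a nearby cryptic acceptor at 485
counts = {(100, 500): 80, (100, 485): 20, (300, 500): 10}
psi = intron_centric_psi(counts)
```

- Because each junction is scored only against junctions that share one of its ends, no predefined transcript model or exon annotation is needed, which is the property the text contrasts with exon-centric approaches.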
- The most common mis-splicing events
- observed in MDS SF3B1mut cells involved alternative 3’
- splice sites, accounting for 57% of alternative splicing
- events (Fig. 2f), consistent with prior reports29,71.
- Notably, the usage of such alternative 3’ splice sites
- was not observed in a CD34+ sample with no SF3B1
- mutation (Extended Data Fig. 6b). ONT long-read
- sequencing also allowed us to quantify the presence of
- different splicing events across the same mRNA
- transcript. While only one aberrant 3’ splice site event
- was observed for the majority of mRNA transcripts,
- we identified a total of 428 genes (21.4% of the total
- number of genes with at least one cryptic 3’ splice site)
- with more than one aberrant 3’ splice site event.
- Interestingly, these cryptic 3’ splicing events tend to
- appear in different copies of the transcript (Extended
- Data Fig. 6c), highlighting the unique advantages of
- long-read sequencing in this context. Consistent with
- previous MDS bulk sequencing data72,73, we observed
- a relative enrichment of purines upstream of the
- aberrant 3’ splice site when compared to the canonical
- 3’ splice site (Extended Data Fig. 6d).
- We next leveraged the unique ability of GoT-Splice to resolve differential splice junction usage
- between SF3B1mut and SF3B1wt cells within the same
- primary human sample (see Methods). Of the
- differentially mis-spliced cryptic 3’ splice sites (those
- 0-100bp from the canonical splice site) between MUT
- and WT cells, 87% were used more highly in SF3B1mut
- cells (Fig. 2f, inset), aligning with known
- characteristics of SF3B1 mutations. Furthermore, we
- observed a high correlation between GoT-Splice delta
- PSI (dPSI; percent spliced in) measurements obtained
- by comparing SF3B1mut and SF3B1wt cells, and dPSI
- derived from bulk RNA-sequencing of CD34+ cells
- from SF3B1mut vs. SF3B1wt MDS samples32 for shared
- cryptic 3’ splice sites (Fig. 2g). In line with previous
- work, the majority of these cryptic 3’ splice sites were
- found to be ~15-20 bps upstream of the canonical 3’
- site29 (Fig. 3a; Extended Data Fig. 7a-d). GoT-Splice
- enabled the visualization of cryptic 3’ splice sites in
- SF3B1mut vs. SF3B1wt cells, highlighting the striking
- increased usage of cryptic 3’ splice sites specific to
- SF3B1mut (Fig. 3b). Altogether, GoT-Splice extends the
- ability to connect somatic mutations not only to
- transcriptional and cell surface protein marker
- available under aCC-BY-NC-ND 4.0 International license.
- was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
- bioRxiv preprint doi: https://doi.org/10.1101/2022.06.08.495292; this version posted June 9, 2022. The copyright holder for this preprint (which
- 7 F. Gaiti, P. Chamely, A. G. Hawkins, M. Cortés-López et al. (2022). BioRxiv
- phenotypes, but also to single-cell mapping of splicing
- changes.
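- The dPSI comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes PSI is expressed on a 0-100 scale, that reads are pooled across all cells of a genotype, and that the PSI of a 3’ splice site is the share of reads from the same 5’ donor that use that acceptor. The function names and the junction coordinates are hypothetical.

```python
def three_prime_psi(junction_counts, donor, acceptor):
    """PSI (0-100) of one 3' splice site.

    junction_counts maps (donor, acceptor) -> read count. The denominator is
    every read that leaves the same 5' donor (assumed intron-centric definition).
    Returns None when the donor has no reads.
    """
    total = sum(n for (d, _), n in junction_counts.items() if d == donor)
    if total == 0:
        return None
    return 100.0 * junction_counts.get((donor, acceptor), 0) / total


def delta_psi(mut_counts, wt_counts, donor, acceptor):
    """dPSI = PSI in SF3B1mut cells minus PSI in SF3B1wt cells."""
    psi_mut = three_prime_psi(mut_counts, donor, acceptor)
    psi_wt = three_prime_psi(wt_counts, donor, acceptor)
    if psi_mut is None or psi_wt is None:
        return None
    return psi_mut - psi_wt


# Made-up counts: canonical acceptor at 1500, cryptic acceptor 15 bp upstream.
mut = {(1000, 1485): 30, (1000, 1500): 70}
wt = {(1000, 1485): 2, (1000, 1500): 98}
print(delta_psi(mut, wt, 1000, 1485))  # 28.0
```

- A positive value marks a site used more in SF3B1mut cells, matching the sign convention given in the Figure 2 caption.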
- GoT-Splice shows progenitor-specific patterns in
- SF3B1mut mis-splicing
- An important advantage of GoT-Splice is the ability to
- detect splicing changes at single-cell resolution, which
- Figure 2. Simultaneous profiling of gene expression, cell surface protein markers, somatic mutation status and
- alternative splicing at single-cell resolution.
- (A) A comparison of the percentage of ONT reads with either incorrect structure (double TSO, no adaptors, single R1 or single
- TSO) or correct structure (full-length reads) both before and after the inclusion of a biotin enrichment protocol step during
- preparation for sequencing. Bars show the aggregate analysis of n = 5 samples with mean +/- s.d. of the percentage for each
- category. (B) Scatter plot of the correlation between the number of UMIs/cell detected in long-read ONT vs. short-read Illumina
- data for cells that were sequenced across both platforms for sample MDS05. (C) Density plot of the correlation between the
- number of UMIs/gene detected in long-read ONT vs. short-read Illumina data for sample MDS05. (D) Number of splice junctions
- captured in the full-length long-read ONT data compared to short-read sequencing data, showing that GoT-Splice allows for a
- significant increase in the number of junctions captured per cell. (E) GoT-Splice provides greater sequencing coverage
- uniformity compared to inadequate coverage of short-read sequencing over splice junctions, as exemplified here for the ERGIC3
- gene. (F) Pie chart summarizing the distribution of different alternative splicing events detected after junction annotation. Inset:
- Pie chart summarizing the differences in the usage of cryptic 3’ and 5’ splice site events between SF3B1mut and SF3B1wt cells
- measured with a dPSI (SF3B1mut PSI - SF3B1wt PSI). Associated with SF3B1mut: +ve dPSI; associated with SF3B1wt: -ve dPSI. (G)
- Comparison of delta percent spliced-in (dPSI) values of shared cryptic 3’ splicing events identified in the MUT vs. WT cell
- comparison from GoT-Splice of SF3B1mut MDS01-03 samples and in the SF3B1mut vs. SF3B1wt bulk comparison from bulk RNA-sequencing of CD34+ cells of MDS samples in Pellagatti et al.32. Correlation coefficient ρ calculated using Spearman’s correlation
- and P-value derived from Student's t-distribution.
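- For reference, the usual way to obtain a P-value for a Spearman coefficient from Student's t-distribution is shown below. That the caption refers to exactly this conversion is an assumption; n denotes the number of shared cryptic 3’ splice sites being compared.

```latex
t = \rho \sqrt{\frac{n - 2}{1 - \rho^{2}}},
\qquad t \sim t_{n-2} \ \text{under the null hypothesis } \rho = 0
```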
- enables the comparison of alternative splicing
- aberrations between MUT and WT cells within specific
- cell subsets (Fig. 3c, Supplementary Table 7). We
- identified both shared and unique SF3B1mut cryptic 3’
- splice site events across progenitor subtypes. The
- usage of cryptic 3’ splice sites was highest along the
- megakaryocyte-erythroid lineage, with SF3B1mut
- MEPs and EPs accounting for the majority of cell-type
- specific cryptic 3’ splice site events, highlighting the
- specific impact of SF3B1 mutations on the erythroid
- lineage. These progenitor specific patterns in SF3B1mut
- mis-splicing were further detected in the validation
- cohort of MDS patient samples (MDS04-06; Extended
- Data Fig. 7e, f). In both MDS cohorts, progenitor
- specific cryptic 3’ splice sites involved genes related to
- cell cycle (e.g., CENPT)74, RNA processing (e.g., CHTOP,
- SF3B175, SRSF11, PRPF38A), erythroid differentiation
- (e.g., CD36, FOXRED1, GATA134,76,77), and heme
- metabolism (e.g., UROD, PPOX, CIAO1) (Fig. 3c;
- Extended Data Fig. 7e, f; Supplementary Table 7,
- 8). Many of these genes and pathways have previously
- been reported to be disrupted by alternative splicing
- in bulk studies of SF3B1mut MDS samples32, but their
- cell-type specificity was unknown. For instance, while
- the alternative splicing event in SF3B1 itself has been
- suggested before as being neoplasm-specific, here we
- narrowed down its erythroid-specific pattern. This
- isoform – SF3B1ins – is predicted to affect splicing by
- impairing U2 snRNP assembly75, likely contributing to
- the enhanced mis-splicing dysregulation in the
- megakaryocyte-erythroid lineage. In addition, cell
- cycle plays a critical role in the terminal
- differentiation of hematopoietic stem cells78 and RNA
- processing, erythroid differentiation, and heme
- metabolism pathways are directly linked to the
- regulation of erythropoiesis79–81. To further validate
- cell-type specificity of mis-splicing events, we
- compared the genes with cryptic 3’ splice site events
- unique to MEPs and EPs in the two distinct MDS
- cohorts and observed significant overlap of
- megakaryocyte-erythroid lineage-specific aberrantly
- spliced genes between the discovery and the
- validation MDS cohorts (P-value = 0.00029, Fisher’s
- exact test, with 46.8% of the cryptically spliced genes
- in MDS also aberrantly spliced in the MDS validation
- cohort). In contrast, no significant overlap was
- observed when comparing the genes with cryptic 3’
- splice site events unique to MEPs and EPs in the MDS
- discovery cohort to genes with cryptic 3’ splice sites
- unique to earlier progenitor cells in the MDS
- validation cohort (1.6% overlap; P-value = 0.46,
- Fisher’s exact test; Extended Data Fig. 7f). These
- findings reveal that alternative splicing is cell-type
- and differentiation-stage dependent27,82–84.
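- The overlap test between cohorts can be illustrated with a small sketch. It assumes a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution; the paper does not state the sidedness or the gene universe used, and the numbers below are invented for illustration only.

```python
from math import comb


def overlap_pvalue(n_universe, n_a, n_b, n_both):
    """One-sided Fisher's exact P-value for the overlap of two gene sets.

    n_universe: genes that could have been called in either cohort (assumed)
    n_a, n_b:   genes called in cohort A and cohort B
    n_both:     genes called in both
    Returns P(overlap >= n_both) under random sampling without replacement.
    """
    denom = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


# Hypothetical example: 2,000 testable genes, 60 and 50 lineage-specific
# genes in the two cohorts, 12 shared.
print(overlap_pvalue(2000, 60, 50, 12))
```

- A small P-value indicates more shared genes than expected by chance, which is the pattern reported for MEP/EP-specific events but not for the comparison against earlier progenitors.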
- Of note, erythropoiesis occupies a continuum
- of cell states and is dependent on a series of
- transcriptional changes that occur along a continuous
- trajectory45. Analyzing the SF3B1mut mis-splicing along
- this continuum (Fig. 4a) revealed that some erythroid
- differentiation and heme metabolism genes can be
- mis-spliced more frequently at the earliest stages of
- EP maturation (e.g., UROD and FOXRED185), while
- others display increased mis-splicing in the more
- differentiated EPs (e.g., GYPA and PPOX). UROD is part
- of the heme biosynthesis pathway and not only is
- heme an important structural component of erythroid
- cells but it also plays a regulatory role in the
- differentiation of erythroid precursors86. PPOX
- encodes an enzyme involved in mitochondrial
- heme biosynthesis and, as such, its degradation leads
- to ineffective erythropoiesis and accumulation of iron
- in the mitochondria typical of MDS with ring
- sideroblast clinical phenotype87. These results provide
- evidence that disruptive and pathogenic SF3B1mut-driven mis-splicing impacts key mediators of
- hemoglobin synthesis and erythroid differentiation at
- all stages of erythroid maturation88,89.
- We further noted that the degree of mis-splicing of a particular transcript (measured via PSI)
- positively correlated with its expression across the
- erythroid differentiation trajectory in some cases. In
- others, mis-splicing was anti-correlated with gene
- expression, often in cryptic 3’ splice site events that
- are predicted to lead to transcript degradation by the
- NMD pathway (Fig. 4b for representative examples).
- Cryptic 3’ splice sites result in the inclusion of short
- intronic fragments in mRNA and often introduce a
- premature termination codon (PTC)90–92. mRNAs
- harboring an NMD-inducing PTC located ≥50 bps
- upstream of the last exon–exon junction are predicted
- to undergo NMD, which in turn prevents the
- production of potentially aberrant proteins. In
- contrast, mRNAs harboring an NMD-neutral PTC,
- which is generally located ≤50 bps upstream of the
- last exon–exon junction or in the last exon, fail to
- trigger NMD and produce dysfunctional proteins93,94.
- We classified cryptic 3’ splice sites detected in the MDS
- samples into three major groups: (i) NMD-inducing
- event (due to the introduction of a PTC); (ii) NMD-neutral with a frameshift event; and (iii) NMD-neutral
- with no frameshift event (Supplementary Table 7).
- In accordance with previous reports71, of the 421
- cryptic 3’ splice sites significantly associated with
- SF3B1mut cells, 228 (54%) were classified as
- NMD-inducing events, while the remaining 193 (46%)
- were NMD-neutral (60 events involving a frameshift
- and 133 in-frame events). As expected, we
- observed a significant decrease in the expression of
- genes harboring NMD-inducing events compared with
- those harboring NMD-neutral events (P-value = 0.017,
- Mann Whitney U test; Fig. 4c).
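- The three-way classification described above can be written as a short rule. This is a sketch under stated assumptions, not the authors' annotation code: positions are in spliced mRNA coordinates, a PTC exactly 50 nt upstream of the last exon-exon junction is treated as NMD-inducing (the text gives both ≥50 and ≤50 at the boundary), and the function and argument names are hypothetical.

```python
def classify_cryptic_event(inserted_nt, ptc_pos, last_junction_pos, min_dist=50):
    """Classify a cryptic 3' splice site event.

    inserted_nt:       intronic nucleotides added to the mRNA by the cryptic site
    ptc_pos:           mRNA position of the first premature stop codon, or None
    last_junction_pos: mRNA position of the last exon-exon junction
    min_dist:          minimum PTC-to-junction distance that triggers NMD
    """
    if ptc_pos is not None and last_junction_pos - ptc_pos >= min_dist:
        return "NMD-inducing"
    if inserted_nt % 3 != 0:
        return "NMD-neutral, frameshift"
    return "NMD-neutral, in-frame"


# PTC 200 nt upstream of the last junction: predicted NMD target.
print(classify_cryptic_event(17, 1000, 1200))   # NMD-inducing
# Frameshift whose stop codon lies in the last exon (the situation described
# for BAX in this paper): escapes NMD.
print(classify_cryptic_event(20, 1250, 1200))   # NMD-neutral, frameshift
# 18 nt inserted, no PTC: in-frame insertion of six codons.
print(classify_cryptic_event(18, None, 1200))   # NMD-neutral, in-frame
```

- The reported counts are internally consistent with this scheme: 228 + 60 + 133 = 421 events.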
- Figure 3. Progenitor cell-type specific mis-splicing in SF3B1mut MDS.
- (A) Differential splicing analysis between SF3B1mut and SF3B1wt cells across MDS samples. Junctions with an absolute dPSI > 2
- and BH-FDR adjusted P-value < 0.2 were defined as differentially spliced. Top: Bars showing the percentage of genes
- differentially spliced in SF3B1mut and SF3B1wt cells in the MDS and MDS validation cohorts. Inset: Expected peak in the number
- of identified cryptic 3’ splice sites at the anticipated distance (15-20 base pairs) upstream of the canonical 3’ splice site in
- SF3B1mut cells. (B) Sashimi Plot of METTL17 intron junction with an SF3B1mut associated cryptic 3’ splice site showing RNA-seq
- coverage in SF3B1mut vs. SF3B1wt cells within MDS samples. Inset: Expected marked increase in the PSI value for the usage of this
- cryptic 3’ splice site in SF3B1mut cells. (C) Representation of dPSI values between SF3B1mut and SF3B1wt cells for cryptic 3’ splicing
- events identified in the main progenitor subsets across MDS samples. Rows correspond to cryptic 3’ junctions found to be
- differentially spliced in at least one cell-type, with P-value <= 0.05 and dPSI >= 2. Columns correspond to cell-type. Genes that
- belong to pathways cell cycle (purple), heme metabolism (green), oxygen homeostasis (black), RNA processing (red) and
- erythroid differentiation (yellow) are highlighted. The left bar plots show the fraction of differentially spliced cryptic 3’ splice
- sites per cell. Top bar plots quantify the total number of cell types where an event is differentially spliced, with the cell-type
- specific events located to the right side of the plot.
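- The thresholds in panel (A) combine an effect-size cutoff with a Benjamini-Hochberg (BH) adjusted P-value. A minimal sketch of that filter follows; the BH procedure itself is standard, while the tuple layout, the function names, and the example values are assumptions made for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_spliced(junctions, min_abs_dpsi=2.0, max_fdr=0.2):
    """junctions: list of (name, dPSI, raw P-value). Returns the names passing
    |dPSI| > min_abs_dpsi and BH-adjusted P-value < max_fdr."""
    fdr = bh_adjust([p for _, _, p in junctions])
    return [
        name
        for (name, dpsi, _), q in zip(junctions, fdr)
        if abs(dpsi) > min_abs_dpsi and q < max_fdr
    ]


# Invented junctions for illustration.
example = [("j1", 25.0, 0.01), ("j2", 1.0, 0.04), ("j3", -6.0, 0.03), ("j4", 9.0, 0.5)]
print(differentially_spliced(example))  # ['j1', 'j3']
```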
- NMD-inducing events affected genes including
- UROD, GYPA, FOXRED1 and PPOX – key genes in
- erythroid development. The loss of these transcripts
- via NMD95,96 may thus contribute to disrupted
- terminal differentiation of EPs. Notable among NMD-neutral affected genes, we identified BAX, a member of
- the Bcl-2 gene family and transcriptional target of
- TP53. BAX is a vital component of the apoptotic
- Figure 4. SF3B1mut-associated mis-splicing changes along the continuum of erythropoiesis.
- (A) Percent spliced-in (PSI) of junctions in SF3B1mut cells along the hematopoietic differentiation trajectory (HSPCs, IMPs, MEPs,
- EPs). Rows (z-score normalized) correspond to cryptic 3’ splice sites; columns represent the PSI for the usage of a given cryptic
- 3’ splice site in each window (size of 3000 SF3B1mut cells, sliding by 300 SF3B1mut cells). Only junctions found to be differentially
- spliced in at least one cell-type with a dPSI > 2 were used in the analysis. The ADT expression of erythroid lineage marker CD71,
- along with the fraction of cell types in each window, is shown. Rows are ordered according to the peak in PSI. Genes that belong
- to pathways cell cycle (purple), heme metabolism (green), oxygen homeostasis (black), RNA processing (red), erythroid
- differentiation (yellow) and apoptosis (blue) are highlighted. (B) Examples of mis-spliced genes at different stages of erythroid
- maturation. Bars represent PSI in SF3B1mut cells. Red lines represent ONT expression of the given junction in SF3B1mut cells. (C)
- Fold change (log2) of gene expression between SF3B1mut and SF3B1wt EP cells in NMD-inducing vs. NMD-neutral genes. (D) Gene
- model of BAX and relevant isoforms. Characteristic domains and their location are highlighted in BAX-ɑ, the main isoform. The
- cryptic 3’ splicing event on the terminal exon defines the BAX-ω isoform, characterized by the disruption of the transmembrane
- domain (TM) as a result of a frameshift.
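- The sliding-window scheme in panel (A) (windows of 3000 SF3B1mut cells, sliding by 300) can be sketched as below. The sketch assumes cells are already sorted along the trajectory and that reads are pooled within each window before computing PSI; whether the authors pool reads or average per-cell values is not stated here. Names and toy numbers are hypothetical.

```python
def sliding_window_psi(cryptic, total, size, step):
    """PSI (0-100) of one cryptic junction in overlapping windows of cells.

    cryptic[i]: reads using the cryptic 3' splice site in cell i
    total[i]:   reads covering the junction in cell i
    Cells must be pre-sorted along the differentiation trajectory.
    Windows without coverage yield None.
    """
    psi = []
    for start in range(0, len(total) - size + 1, step):
        c = sum(cryptic[start:start + size])
        t = sum(total[start:start + size])
        psi.append(100.0 * c / t if t else None)
    return psi


# Toy example: six cells, window of four cells sliding by two.
print(sliding_window_psi([0, 0, 1, 1, 2, 2], [2] * 6, size=4, step=2))  # [25.0, 75.0]
```

- With the figure's parameters the call would use size=3000 and step=300; each row of the heatmap would then be z-score normalized across windows.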
- cascade and in turn plays an important role in
- balancing the control of survival, differentiation and
- proliferation of EPs at later stages of erythropoiesis97
- (Fig. 4a). The identified BAX cryptic 3’ splice site,
- though NMD-neutral, causes a frameshift in the last
- exon, disrupting the C-terminus of the protein. This
- BAX isoform, previously denoted as BAX-ω (Fig. 4d),
- has been shown to protect cells from apoptotic cell
- Figure 5. SF3B1 mutation promotes accumulation of mutant cells along the erythroid lineage in clonal hematopoiesis.
- (A) UMAP of CD34+ cells (n = 9,007 cells) from clonal hematopoiesis (CH) samples, one with SF3B1 K700E mutation and one
- with SF3B1 K666N mutation (n = 2 individuals), overlaid with cluster cell-type assignments. HSPC, hematopoietic stem
- progenitor cells; IMP, immature myeloid progenitors; MEP, megakaryocytic-erythroid progenitors; EP, erythroid progenitors;
- MkP, megakaryocytic progenitors; NP, neutrophil progenitors; E/B/M, eosinophil/basophil/mast progenitor cells; Pre-B,
- precursors B cells. (B) UMAP of CD34+ cells from CH samples overlaid with genotyping data. WT, cells with genotype data
- without SF3B1 mutation; MUT, cells with genotype data with SF3B1 mutation; NA, unassignable cells with no genotype data. (C)
- UMAP of CD34+ cells from CH samples overlaid with pseudotemporal ordering. Inset: Pseudotime in SF3B1mut vs. SF3B1wt cells
- in the aggregate of CH01-02. P-value for comparison of means from Wilcoxon rank sum test. (D) Normalized ratio of mutated
- cells along pseudotime quartiles. Bars show aggregate analysis of samples CH01-CH02 with mean +/- s.e.m. of 100
- downsampling iterations to 1 genotyping UMI per cell. Only cell types with >300 cells were used in the analysis. P-value from
- likelihood ratio test of linear mixed model with or without mutation status. Bottom: Fraction of cell types within each
- pseudotime quartile. (E) Differential gene expression between SF3B1mut and SF3B1wt HSPC cells in CH samples. Genes with an
- absolute log2(fold change) > 0.1 and P-value < 0.05 were defined as differentially expressed (DE). DE genes belonging to the
- translation pathway (red, Reactome) are highlighted (BH-FDR < 0.2). (F) Gene Set Enrichment Analysis of DE genes in SF3B1mut
- HSPC cells across CH samples. Gene sets that overlap with SF3B1mut EP cells in MDS highlighted (red). (G) Expression (mean +/-
- s.e.m.) of mRNA translation-related genes (Reactome) between SF3B1mut and SF3B1wt cells in progenitor cells from CH01-02
- samples. P-values from likelihood ratio test of linear mixed model with or without mutation status.
- death98,99. Interestingly, a recent study revealed C-terminal BAX mutations in myeloid clones that arise in
- chronic lymphocytic leukemia patients upon
- prolonged exposure to venetoclax, demonstrating a
- role for BAX C-terminal alterations in conferring a
- survival advantage to myeloid cells under this pro-apoptotic treatment. Of note, early clinical
- observations reported lower response to venetoclax
- in SF3B1mut AML100,101, consistent with a potential
- anti-apoptotic effect of BAX-ω. Together, these
- findings suggest a potential mechanism underlying
- the erythroid-dysplasia phenotype of SF3B1mut MDS.
- Despite the injury to translational machinery (Fig. 1h, i), SF3B1mut EPs may gain some degree of protection
- against cell death due to the presence of the BAX-ω isoform, which arises from aberrant splicing.
- Accumulation of SF3B1mut cells in the erythroid
- progenitor population and extensive mis-splicing
- in clonal hematopoiesis
- While SF3B1 mutations are the most common genetic
- alterations in MDS patients, they are also associated
- with a high risk of malignant transformation in clonal
- hematopoiesis (CH)4–8,102,103. However, the study of
- SF3B1 mutations directly in primary human samples
- has been largely limited to MDS, where confounding
- co-occurrence of other genetic alterations is common.
- Thus, CH presents a unique setting to interrogate the
- molecular consequences of SF3B1 mutations in non-malignant human hematopoiesis.
- We therefore isolated viable CD34+ cells from
- two CH samples with SF3B1 mutations (VAFs: 0.15
- and 0.22, from CD34+ autologous grafts collected from
- patients with multiple myeloma in remission) and
- performed GoT-Splice. A total of 9,007 cells across
- both samples passed quality filters (Extended Data
- Fig. 8a) and were integrated and clustered based on
- transcriptome data alone, agnostic to genotyping
- information (Fig. 5a; Extended Data Fig. 8b).
- Consistent with clinical data indicating normal
- hematopoietic production, we identified the expected
- progenitor subtypes using previously annotated
- progenitor identity markers (Fig. 5a; Extended Data
- Fig. 8c, d). Genotyping data were available for 3,642
- cells of these 9,007 cells (40.4%) through GoT
- (Extended Data Fig. 9a). Finally, to exclude
- additional genetic lesions in these CH samples, we
- performed copy number analysis with scRNA-seq data
- and identified no significant chromosomal gains or
- losses (Extended Data Fig. 9b).
- Projection of the genotyping information onto
- the differentiation map (Fig. 5b) showed no novel cell
- identities formed by the SF3B1 mutations, consistent
- with the fact that patients with CH exhibit no overt
- peripheral blood count or morphological
- abnormalities. However, a differentiation pseudotime
- ordering analysis showed that SF3B1mut cells are
- enriched at later pseudotime points when compared
- to SF3B1wt cells (Fig. 5c; Extended Data Fig. 9c). To
- further identify differentiation biases in SF3B1mut CH,
- we evaluated the mutated cell frequencies across the
- different prevalent progenitor cell types, as
- performed in MDS (Fig. 1e). Mutated cells were
- enriched in more differentiated EPs compared to the
- earlier HSPCs (P-value < 0.001, linear mixed model,
- Fig. 5d; Extended Data Fig. 9d), showing that
- SF3B1mut CH cells already demonstrate an erythroid
- lineage bias.
- To further identify transcriptional
- dysregulation in SF3B1mut HSPCs, we performed
- differential gene expression analysis between
- mutated and wildtype cells. We observed a similar upregulation of genes involved in mRNA translation in
- SF3B1mut HSPCs in CH (Fig. 5e, f; Supplementary
- Table 9, 10), a pathway also observed to be
- upregulated in our MDS analysis (Fig. 1h). In CH,
- upregulation of mRNA translation pathway genes was
- observed across multiple cell subtypes along
- erythroid differentiation, while absent in NPs (Fig.
- 5g). Thus, although no overt blood count
- abnormalities are observed with SF3B1 mutation in
- CH individuals, both the erythroid differentiation bias
- and aberrant transcriptional profiles are already
- apparent at this early pre-disease stage.
- The analysis of differentially used alternative
- 3’ splice sites between SF3B1mut and SF3B1wt CH cells
- revealed a marked increase in cryptic 3’ splice site
- usage in SF3B1mut cells, as observed in MDS (Fig. 6a).
- These mutant-specific cryptic 3’ splice sites affected
- genes including UROD, OXA1L, SERBP1, MED6 and
- ERGIC3, which were also detected to be cryptically
- spliced in the SF3B1mut MDS cells. Importantly, the
- lower VAF associated with pre-malignant CH samples
- highlights the need for genotype-resolved approaches such as GoT-Splice to
- detect mis-splicing events that occur at low
- frequencies and may otherwise be missed in bulk
- sequencing studies (Fig. 6b; Extended Data Fig.
- 10a).
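- A rough arithmetic sketch shows why a low mutant fraction dilutes the signal in bulk or pseudobulk data. It assumes that every cell contributes equal coverage at the junction, which real data will not satisfy exactly; the PSI values and the mutant cell fraction are invented.

```python
def pseudobulk_psi(psi_mut, psi_wt, mut_fraction):
    """Expected PSI when mutant and wild-type cells are pooled.

    Assumes equal junction coverage per cell, so the pooled PSI is a simple
    weighted average of the two genotype-specific PSI values.
    """
    return mut_fraction * psi_mut + (1.0 - mut_fraction) * psi_wt


psi_mut, psi_wt = 40.0, 2.0
for frac in (0.9, 0.3, 0.1):
    pooled = pseudobulk_psi(psi_mut, psi_wt, frac)
    print(f"mutant fraction {frac:.1f}: pooled PSI ~ {pooled:.1f}, "
          f"genotype-resolved dPSI = {psi_mut - psi_wt:.1f}")
```

- The genotype-resolved difference stays at 38 PSI points in every case, while the pooled value shrinks toward the wild-type level as the mutant fraction drops.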
- To compare mis-spliced transcripts between
- CH and MDS, we compared cryptic 3’ splice sites with
- a P-value < 0.05 and dPSI of >= 2 in at least one cell type along the erythroid differentiation trajectory
- (HSPC, IMP, MEP or EP) in both CH and in MDS cohorts
- (Supplementary Table 11). While the overall
- number of significant cryptic 3’ splice sites in CH was
- lower than in MDS, we observed a significant overlap
- in shared cryptic events (P-value < 10^-16, Fisher’s exact
- test; Fig. 6c). Similarly to MDS, we identified mis-spliced events specific to different stages of erythroid
- maturation, the majority of which overlapped with
- MDS cryptic 3’ splice sites (Fig. 6d). Notably, CH and
- MDS showed similar mis-splicing dynamics in the BAX
- transcript along the erythroid differentiation
- trajectory (Fig. 6e).
- Figure 6. SF3B1mut clonal hematopoiesis progenitor cells display cell-type specific cryptic 3’ splice site usage.
- (A) Differential splicing analysis between SF3B1mut and SF3B1wt cells across CH samples. Junctions with an absolute delta percent
- spliced-in (dPSI) > 2 and BH-FDR adjusted P-value < 0.2 were defined as differentially spliced. (B) Sashimi Plot of ERGIC3 intron
- junction with an SF3B1mut associated cryptic 3’ splice site showing RNA-seq coverage in SF3B1mut vs. SF3B1wt cells within CH
- samples, as well as compared to the CH samples when treated as bulk (pseudobulk of all cells regardless of genotype). PSI values
- showing the expected marked increase in the usage of this cryptic 3’ splice site in SF3B1mut cells alone when compared to both
- SF3B1wt cells as well as all cells (pseudobulk of sample). (C) Venn Diagram of the overlap of genes with cryptic junctions
- significantly differentially spliced in at least one erythroid lineage cell type (HSPCs, IMPs, MEPs, EPs) with a dPSI > 2 between
- MDS01-03 and CH samples. P-value for the overlap from Fisher’s Exact test. (D) Percent spliced-in (PSI) of junctions in SF3B1mut
- cells along the hematopoietic differentiation trajectory of erythroid lineage cells. Rows (z-score normalized) correspond to
- cryptic 3’ splice sites; columns represent the PSI for the usage of a given cryptic 3’ splice site in each window (size of 600 SF3B1mut
- cells, sliding by 60 SF3B1mut cells). Only junctions found to be differentially spliced in at least one cell type with a dPSI > 2 were
- used in the analysis. Pseudotime across each window shown. Rows are ordered according to the peak in PSI. Cryptic events also
- found to be differentially spliced in MDS highlighted (red). (E) Bar plots of the PSI values for the usage of the BAX-ω isoform across
- each window of SF3B1mut cells in the MDS, MDS validation and CH cohorts along the hematopoietic differentiation trajectory of
- erythroid lineage cells. Fraction of cell types in each window shown per cohort (MDS: SF3B1mut cells (n = 6376) ordered by CD71
- expression, MDS validation: SF3B1mut cells (n = 987) ordered by pseudotime, CH: MUT cells (n = 1021) ordered by pseudotime).
- DISCUSSION
- Here, we present GoT-Splice, a single-cell multi-omics
- integration that enables joint profiling of genotype,
- gene expression, protein, and alternative splicing all
- within the same cell. GoT, as previously described15,
- allows for the comparison between somatically
- mutated and wildtype cells within the same sample,
- for genotype to phenotype inferences. Next, by further
- optimization of long-read sequencing of scRNA-seq
- libraries64, we were able to simultaneously capture
- both short and long-read data within the same cell,
- making it possible to analyze the impact of somatic
- mutations on transcriptional and splicing phenotypes.
- To date, few tools are available to process and
- analyze single-cell long-read data, especially for the
- purpose of alternative splicing. To address existing
- analytic gaps, we developed a long-read splicing
- analysis pipeline that detects and quantifies
- alternative splicing events within single cells and
- highlights differential junction usage across cell
- subpopulations. For processing the long-read data, the
- pipeline integrates SiCeLoRe64 to error-correct cell
- barcodes and UMIs, followed by the generation of
- consensus reads. Next, unlike other isoform detection
- methods that perform exon-centric junction calling
- (such as SiCeLoRe, TALON63, FLAME104), we opted for
- an intron-centric approach followed by split five
- prime and three prime PSI measurements. Calculating
- the rate of splicing at the 5’ and 3’ ends of the intron
- improves the detection of the true splicing rate of each
- individual intron, compared to exon-centric
- approaches68. In addition, our pipeline detected
- differential splicing patterns between MUT and WT
- cells, both across entire samples and within individual
- cell types, with sample-aware permutation testing to
- integrate across samples. Finally, the pipeline includes
- a functional annotation step that provides information
- regarding the translational consequences of the
- alternative spliced isoforms. Altogether, our pipeline
- provides a comprehensive toolkit to process and
- analyze differential splicing events in scRNA-seq long-read data.
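- The idea of sample-aware permutation testing mentioned above can be sketched as follows. This is an illustrative assumption about how such a test might look, not the authors' implementation: genotype labels are shuffled only within each sample, so differences between samples cannot create a spurious MUT vs. WT effect, and the test statistic is the pooled dPSI. All names and the toy data are hypothetical.

```python
import random


def pooled_dpsi(cells):
    """cells: list of (sample, genotype, cryptic_reads, total_reads).
    Returns PSI(MUT) - PSI(WT) on a 0-100 scale, pooling reads per genotype."""
    counts = {"MUT": [0, 0], "WT": [0, 0]}
    for _, genotype, cryptic, total in cells:
        counts[genotype][0] += cryptic
        counts[genotype][1] += total
    psi = {g: (100.0 * c / t if t else 0.0) for g, (c, t) in counts.items()}
    return psi["MUT"] - psi["WT"]


def permutation_pvalue(cells, n_perm=1000, seed=0):
    """Two-sided permutation P-value with labels shuffled within each sample."""
    rng = random.Random(seed)
    observed = pooled_dpsi(cells)
    by_sample = {}
    for idx, cell in enumerate(cells):
        by_sample.setdefault(cell[0], []).append(idx)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(cells)
        for idxs in by_sample.values():
            labels = [cells[i][1] for i in idxs]
            rng.shuffle(labels)
            for i, label in zip(idxs, labels):
                sample, _, cryptic, total = cells[i]
                shuffled[i] = (sample, label, cryptic, total)
        if abs(pooled_dpsi(shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the P-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two samples, ten MUT and ten WT cells each.
toy = []
for sample in ("S1", "S2"):
    toy += [(sample, "MUT", 5, 10)] * 10 + [(sample, "WT", 0, 10)] * 10
print(permutation_pvalue(toy, n_perm=200))
```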
- By applying GoT-Splice to the most common
- splice-altering mutation (SF3B1), we interrogated
- differentiation biases, differential gene expression,
- protein expression and splicing patterns, comparing
- SF3B1mut vs. SF3B1wt cells co-existing within the same
- bone marrow. Importantly, while GoT revealed that
- SF3B1mut cells arise early on in uncommitted HSPCs,
- we observed a differentiation bias of SF3B1mut cells
- toward the erythroid progenitor fate. This finding is of
- particular interest given the clinical association
- between SF3B1 mutations and dysplastic
- erythropoiesis. Differential gene and protein
- expression in erythroid progenitors revealed
- signatures that may contribute to this observed
- differentiation bias of SF3B1mut cells toward the
- erythroid fate. Notably, an increase in cell cycle and
- checkpoint gene expression (TP53, MDM4 and CCNE1)
- as well as the over-expression of erythroid lineage
- markers, CD36 and CD71, specifically in SF3B1mut EPs,
- suggest a fitness advantage for SF3B1mut cells along the
- erythroid lineage.
- CH samples likewise showed erythroid biased
- differentiation with higher mutated cell frequency in
- committed erythroid progenitors compared with
- HSPCs. This is one of the first phenotypic studies of
- clonal mosaicism in human samples, and thus the
- observation of a somatic mutation-related phenotype,
- which aligns with the more advanced MDS phenotype,
- is of particular interest. In our results, SF3B1mut CH
- cells showed upregulation of genes in pathways
- involved in translation and mRNA processing, similar
- to SF3B1mut cells in MDS. This finding suggests that the
- pervasive mis-splicing observed with SF3B1 mutations may disrupt translation, reminiscent of
- ribosomopathies, which often also result in
- dyserythropoiesis105,106. Interestingly, it has been
- shown that overexpression of MDM4 prevents TP53
- degradation and leads to TP53 complex sequestration,
- which interferes with p21 activation and results in
- sustained cell proliferation. This finding aligns with
- the observed upregulation of TP53 and other TP53-
- related pathway genes in SF3B1mut EPs in MDS. Thus,
- in addition to the shared erythroid differentiation bias
- in MDS and CH, aberrant transcriptional profiles
- linked to a dyserythropoiesis phenotype are also
- already apparent at the pre-disease CH stage.
- Leveraging the single-cell resolution of GoT-Splice and differential splicing analysis between
- SF3B1mut and SF3B1wt cells revealed cell-type specific
- effects of SF3B1 mutations on patterns of mis-splicing.
- First, key genes involved in pathways important for
- terminal differentiation of hematopoietic stem cells as
- well as the regulation of erythropoiesis (namely RNA
- processing, erythroid differentiation, cell cycle and
- heme metabolism) were found to be cryptically
- spliced across distinct SF3B1mut progenitor cell types,
- many of which were previously reported to be
- affected in bulk studies of SF3B1mut MDS54,72,73. While
- some cryptic events were neutral in their effect, many
- key genes important for erythroid differentiation
- were found to be NMD-inducing (e.g., UROD, GYPA,
- PPOX) or cause a frameshift event that may affect
- protein structure and function (e.g., BAX) in both the
- primary and validation MDS cohorts. Thus, our data
- suggest that mis-splicing of erythroid specific genes
- and pathways, together with the dysregulation of
- apoptotic programs, may ultimately lead to the
- accumulation of SF3B1mut EPs that fail to reach
- terminal differentiation97, leading to the
- dyserythropoiesis clinical phenotype. Importantly,
- this SF3B1mut mis-splicing phenotype was already
- evident in the CH samples, suggesting that the impact
- of somatic CH driver mutations may be conserved
- from CH to overt myeloid neoplasia.
- what is it about? what issue it tries to address, why is it important and what innovation it has? lastly, what can we learn from it?
- please present your result in markdown.