The introduction of RNA-Seq technology based on SGS was a remarkable step forward, providing a fast and inexpensive way to determine the transcriptome of a given cell type, and several notable studies have used this approach [1, 2 and 3••]. Nonetheless, tasks such as de novo gene discovery, gene isoform assembly, and transcript and isoform abundance determination remain challenging and are far from solved. Recently, we developed a new tool (IDP) to integrate SGS and Third Generation Sequencing (TGS) data from human Embryonic Stem Cells (hESCs; H1 cell line) and identified 13,543 transcripts with a false positive rate lower than 5%, including 2,103 novel transcripts
and 216 novel genes, 146 of which were deemed hESC-specific [4••]. In this review we discuss the importance of, and the current challenges in, accurately identifying the transcriptome of hESCs and human Induced Pluripotent Stem Cells (hiPSCs), and we show evidence of the reliability of IDP in detecting and predicting annotated and novel genes and their isoforms. Many studies have revealed that human Pluripotent Stem Cells (hPSCs, a term that includes hESCs and hiPSCs) are characterized by transcriptionally permissive chromatin (i.e. accessible to a variety of transcription and remodeling factors), a state
compatible with increased global expression of genes and gene isoforms [5]. The transcriptionally permissive chromatin is characterized by distinct epigenetic marks (e.g. histone modifications) that define two classes of genes: genes that are active in the undifferentiated state, and genes that are inactive (or expressed at very low levels) but “poised” for expression and that characterize more differentiated cell types [6]. Given the complexity of the epigenetic status of most genes, it is essential to identify the transcripts and isoforms that are functionally relevant in PSCs (even if expressed at low levels) and to distinguish them from those that have a very low level
of activation because they are transcribed from loci that are only “poised” for transcription but are not actually relevant at this stage of development. A definitive answer to this problem would come from validating the expression of transcripts observed by RNA-Seq (e.g. with other assays such as RT-PCR) and, most importantly, from functional studies. Although RNA-Seq data have been produced from pluripotent cell samples, such as embryonic stem cells and preimplantation embryos at different developmental stages (from zygote to late blastocyst) [3••, 7• and 8•], experimental validation of novel transcript expression and functional analysis of many mRNAs are still lacking. Most recent research has focused on determining the regulatory network of the well-characterized pluripotency genes, such as OCT4, SOX2 and NANOG, or on seeking new markers among already annotated genes, such as ZFP296 [9].