Data processing

Data generation

Steps taken to process the data.

Creating the highest-confidence set of TSS and TES predictions. The TSSs in this set can be interpreted as the most commonly used:

Quantifying sense and anti-sense transcription:

Finding all neighboring genes in the genome and calculating the distances between them and their expression correlations:

Defining genome-wide TSSs by clustering CAGE transcription start sites (CTSSs) using CAGEr:
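To make the clustering step concrete, here is a minimal Python sketch of simple distance-based clustering of CTSSs. It is an illustration only and not CAGEr's implementation: the tuple layout, the function names, and the `max_dist` value of 20 bp are all assumptions made for this example.

```python
from collections import defaultdict


def _summarize(chrom, strand, sites):
    """Collapse a run of (pos, count) sites into one cluster record."""
    # The dominant position is the site with the highest tag count
    # (the lowest position wins if several sites tie).
    dominant_pos, _ = max(sites, key=lambda s: s[1])
    return {
        "chrom": chrom,
        "strand": strand,
        "start": sites[0][0],
        "end": sites[-1][0],
        "total": sum(count for _, count in sites),
        "dominant": dominant_pos,
    }


def cluster_ctss(ctss, max_dist=20):
    """Group CTSSs into clusters, separately per chromosome and strand.

    ctss: iterable of (chrom, pos, strand, count) tuples (assumed layout).
    Consecutive CTSSs at most max_dist bp apart end up in the same cluster.
    """
    by_strand = defaultdict(list)
    for chrom, pos, strand, count in ctss:
        by_strand[(chrom, strand)].append((pos, count))

    clusters = []
    for (chrom, strand), sites in sorted(by_strand.items()):
        sites.sort()
        current = [sites[0]]
        for site in sites[1:]:
            if site[0] - current[-1][0] <= max_dist:
                current.append(site)
            else:
                clusters.append(_summarize(chrom, strand, current))
                current = [site]
        clusters.append(_summarize(chrom, strand, current))
    return clusters
```

Each returned record carries the cluster boundaries, the summed tag count, and the dominant position, which is the kind of summary a downstream TSS definition would typically start from.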

Quality control

Does the data meet our assumptions before we begin our analysis?

How well do our abundance estimates match previously generated microarray data?

Can we comment on the technical aspects of the sequencing protocol with regard to potential GC bias?

How do TSS prediction methods compare?
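For the comparison of abundance estimates with microarray data, one plausible check is a rank correlation across genes measured on both platforms. The sketch below computes a Spearman correlation with the standard library only. Choosing Spearman is an assumption made here, not something the analysis is known to use, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the full run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))
```

A rank-based measure is convenient here because sequencing abundances and microarray intensities sit on different scales, so only the ordering of genes is compared.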

Data analysis

RNA-seq overview

Overview plots and statistics about the RNA-seq data.

Neighboring genes

What does the genome-wide view of neighboring genes look like before and after predicting full-length UTRs? How do the distances between genes correlate with their co-expression?
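As a rough illustration of how neighboring-gene distances and co-expression could be tabulated, the Python sketch below pairs adjacent genes on each chromosome, computes the intergenic distance, and attaches a Pearson correlation of their expression profiles. The input layout (1-based inclusive coordinates, a dict of per-sample abundances) and all function names are assumptions for this example.

```python
from itertools import groupby


def neighboring_pairs(genes):
    """Pair each gene with the next gene on the same chromosome.

    genes: list of (chrom, start, end, gene_id), 1-based inclusive (assumed).
    Returns (left_id, right_id, distance) tuples. A distance of 0 means the
    genes abut; a negative distance means they overlap.
    """
    pairs = []
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    for _, group in groupby(ordered, key=lambda g: g[0]):
        group = list(group)
        for left, right in zip(group, group[1:]):
            distance = right[1] - left[2] - 1
            pairs.append((left[3], right[3], distance))
    return pairs


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def pair_correlations(pairs, expression):
    """Attach an expression correlation to each neighboring pair.

    expression: dict mapping gene_id to a list of abundances, one per sample.
    """
    return [
        (a, b, dist, pearson(expression[a], expression[b]))
        for a, b, dist in pairs
    ]
```

Running the same functions on gene models with and without predicted UTRs would give two distance tables that can be compared directly.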

Promoter architecture

What can the CAGE data tell us about P. falciparum promoters genome-wide? Do we see both sharp and broad promoters? How many of each?

Are alternative transcription start sites used often?
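One common way to separate sharp from broad promoters is by the width of the region holding most of a cluster's CAGE signal. The Python sketch below computes an interquantile width between the 10th and 90th percentiles of cumulative signal and applies a cutoff. The quantiles, the 10 bp cutoff, and the function names are assumptions chosen for illustration, not values taken from this analysis.

```python
def interquantile_width(sites, q_low=0.1, q_up=0.9):
    """Width in bp of the region between two cumulative-signal quantiles.

    sites: list of (pos, count) pairs for a single cluster (assumed layout).
    """
    sites = sorted(sites)
    total = sum(count for _, count in sites)
    cumulative = 0
    low_pos = up_pos = None
    for pos, count in sites:
        cumulative += count
        fraction = cumulative / total
        # First position at which the cumulative signal reaches q_low.
        if low_pos is None and fraction >= q_low:
            low_pos = pos
        # First position at which the cumulative signal reaches q_up.
        if fraction >= q_up:
            up_pos = pos
            break
    return up_pos - low_pos + 1


def classify_promoter(sites, max_sharp_width=10):
    """Label a cluster 'sharp' or 'broad' using a hypothetical width cutoff."""
    if interquantile_width(sites) <= max_sharp_width:
        return "sharp"
    return "broad"
```

Counting the labels over all clusters then answers the "how many of each" question for whatever cutoff is chosen.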

Transcription factor binding sites

Based on our newly predicted TSSs, can we make refined genome-wide TFBS predictions? Do these predictions give us any additional insight?

Strain comparison

What genes are differentially expressed between the three strains? What genes are differentially detected between the three strains?

Comparing 3D7, HB3 and IT:

This R Markdown site was created with workflowr