Input file introduction

Guideline

FASTQ file

RRBSAnalyser accepts raw FASTQ reads, either single-end or paired-end.

Part one: FASTQ file name format

>> Single end: SampleName_1.fq, optionally compressed as *.tar.gz, *.gz or *.tar.bz2.

>> Paired end: SampleName_1.fq (or *.tar.gz, *.gz, *.tar.bz2) for forward reads;

     SampleName_2.fq (or *.tar.gz, *.gz, *.tar.bz2) for reverse reads.
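To check that a set of files follows this convention before a run, a small Python sketch can split each name into sample name, mate number and compression suffix, then group mates by sample. It assumes the compression suffix follows ".fq" directly (e.g. SampleName_1.fq.gz); parse_fastq_name and pair_files are hypothetical helper names, not part of RRBSAnalyser.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Assumed pattern: <SampleName>_<1|2>.fq, optionally followed by .tar.gz, .tar.bz2 or .gz.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<mate>[12])\.fq(?P<comp>\.tar\.gz|\.tar\.bz2|\.gz)?$"
)


class FastqName(NamedTuple):
    sample: str
    mate: int                   # 1 = forward reads, 2 = reverse reads
    compression: Optional[str]  # None for an uncompressed .fq file


def parse_fastq_name(filename: str) -> FastqName:
    """Split a FASTQ file name into its parts, or raise ValueError."""
    m = FASTQ_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected FASTQ file name: {filename}")
    return FastqName(m["sample"], int(m["mate"]), m["comp"])


def pair_files(filenames: List[str]) -> Dict[str, Dict[int, str]]:
    """Group file names by sample; paired-end samples end up with keys 1 and 2."""
    mates: Dict[str, Dict[int, str]] = {}
    for name in filenames:
        info = parse_fastq_name(name)
        mates.setdefault(info.sample, {})[info.mate] = name
    return mates
```

For example, pair_files(["A_1.fq.gz", "A_2.fq.gz", "B_1.fq"]) groups A as a paired-end sample and B as single-end. Because the sample part of the pattern is greedy, names such as Liver_rep1_2.fq.gz still yield the sample name Liver_rep1.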

Part two: FASTQ format

A FASTQ file normally uses four lines per sequence.
Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 is the raw sequence letters.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.

A minimal FASTQ file might look like this:
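The Python sketch below holds one hypothetical record (read name, bases and quality string are invented for illustration) and checks it against the four-line rules above. read_fastq is a hypothetical helper; it assumes unwrapped records of exactly four lines each.

```python
from typing import Iterator, Tuple

# Hypothetical single-record FASTQ file; all values are invented.
EXAMPLE_FASTQ = """@read_0001 optional description
TGGCGTTTAG
+
IIIIIHHHHH
"""


def read_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) per record, enforcing the four-line layout."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1}: header must start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"line {i + 3}: separator must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record at line {i + 1}: sequence and quality lengths differ")
        # The identifier is the text after '@' up to the first whitespace.
        identifier = (header[1:].split() or [""])[0]
        yield identifier, seq, qual
```

Here list(read_fastq(EXAMPLE_FASTQ)) yields a single tuple whose sequence and quality strings both have 10 characters.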

SAM file

Part one: SAM file name format

>> SampleName.sam, optionally compressed as *.tar.gz, *.gz or *.tar.bz2.

Part two: SAM format

SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of an optional header section followed by an alignment section; if present, the header must precede the alignments. Header lines start with '@', while alignment lines do not. The SAM specification defines 11 mandatory fields per alignment line, optionally followed by tag fields; the alignment lines expected here have 15 fields, as in BSMAP's SAM output.

A minimal SAM file might look like this:
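As an illustration, the Python sketch below builds one hypothetical SAM text (an @HD header line plus one alignment line with invented values and only an NM tag, so it is not real BSMAP output) and splits it into header and alignment sections following the rules above. parse_sam is a hypothetical helper; it keeps all values as strings.

```python
from typing import Dict, List, Tuple

# Hypothetical SAM text: one header line and one alignment line, all values invented.
EXAMPLE_SAM = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "read_0001\t0\tchr1\t10468\t255\t10M\t*\t0\t0\tTGGCGTTTAG\tIIIIIHHHHH\tNM:i:0\n"
)

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def parse_sam(text: str) -> Tuple[List[str], List[Dict[str, object]]]:
    """Return (header lines, alignment records); header lines must come first."""
    header: List[str] = []
    alignments: List[Dict[str, object]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # tolerate blank lines
        if line.startswith("@"):
            if alignments:
                raise ValueError(f"line {lineno}: header line after alignments")
            header.append(line)
            continue
        cols = line.split("\t")
        if len(cols) < len(SAM_FIELDS):
            raise ValueError(f"line {lineno}: expected at least 11 fields, got {len(cols)}")
        record: Dict[str, object] = dict(zip(SAM_FIELDS, cols))
        record["TAGS"] = cols[len(SAM_FIELDS):]  # optional TAG:TYPE:VALUE fields
        alignments.append(record)
    return header, alignments
```

Extra tag fields, such as those BSMAP appends after the 11 mandatory columns, end up in the TAGS list rather than being rejected.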

Mc file

Part one: mc file name format

>> SampleName.mc, optionally compressed as *.tar.gz, *.gz or *.tar.bz2.
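FASTQ, SAM and mc inputs accept the same compressed variants. The Python sketch below opens any of them as text using only the standard library. It assumes that a *.tar.gz or *.tar.bz2 archive wraps exactly one regular file, which is an assumption rather than a documented RRBSAnalyser rule; open_input is a hypothetical helper name.

```python
import gzip
import io
import tarfile
from typing import IO


def open_input(path: str) -> IO[str]:
    """Open a plain, *.gz, *.tar.gz or *.tar.bz2 input file for text reading."""
    if path.endswith((".tar.gz", ".tar.bz2")):
        # "r:*" lets tarfile detect gzip or bzip2 compression itself.
        with tarfile.open(path, "r:*") as archive:
            members = [m for m in archive.getmembers() if m.isfile()]
            # Assumption: the archive contains exactly one regular file.
            if len(members) != 1:
                raise ValueError(f"{path}: expected one file in archive, found {len(members)}")
            data = archive.extractfile(members[0]).read()
        # The member is read fully into memory: simple, but costly for very large inputs.
        return io.StringIO(data.decode("utf-8"))
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

The ".tar.gz" check runs before the plain ".gz" check so that tar archives are not mistaken for single gzip streams.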

Part two: mc format

An mc file is a TAB-delimited text format with ten columns:

1) chromosome
2) coordinate (1-based)
3) strand
4) C pattern
5) sequence context (2 nt upstream to 2 nt downstream, in Watson-strand direction)
6) methylation ratio
7) number of unconverted Cs covering this locus after bisulfite treatment
8) number of converted Cs covering this locus after bisulfite treatment
9) lower bound of the 95% confidence interval of the methylation ratio
10) upper bound of the 95% confidence interval of the methylation ratio

A minimal mc file might look like this:
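The Python sketch below pairs one hypothetical mc line (all values invented; the ratio 0.75 is simply 3 / (3 + 1)) with a parser for the ten columns listed above. McSite and parse_mc_line are hypothetical names, and reading columns 7 and 8 as integers and columns 6, 9 and 10 as floats is an assumption about the file contents.

```python
from dataclasses import dataclass

# Hypothetical mc line; every value is invented for illustration.
EXAMPLE_MC_LINE = "chr1\t10497\t+\tCG\tTTCGG\t0.75\t3\t1\t0.30\t0.95\n"


@dataclass
class McSite:
    chrom: str        # column 1
    pos: int          # column 2, 1-based coordinate
    strand: str       # column 3
    c_pattern: str    # column 4
    context: str      # column 5, 2 nt upstream to 2 nt downstream (Watson strand)
    ratio: float      # column 6, methylation ratio
    unconverted: int  # column 7, unconverted Cs (methylated evidence)
    converted: int    # column 8, converted Cs (unmethylated evidence)
    ci_low: float     # column 9, lower 95% confidence bound
    ci_high: float    # column 10, upper 95% confidence bound


def parse_mc_line(line: str) -> McSite:
    """Parse one TAB-delimited mc line into an McSite."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 TAB-separated columns, got {len(f)}")
    return McSite(f[0], int(f[1]), f[2], f[3], f[4], float(f[5]),
                  int(f[6]), int(f[7]), float(f[8]), float(f[9]))
```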

>> Methylation level describes the average methylation status of a genomic region.

For a region, methylation level = (total of column 7) / (total of column 7 + total of column 8), where both totals are summed over all C sites in the region.
Column 7 (unconverted Cs): number of reads supporting a methylated cytosine at this site.
Column 8 (converted Cs): number of reads supporting an unmethylated cytosine at this site.
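The region formula pools read counts rather than averaging per-site ratios, so well-covered sites weigh more. A minimal Python sketch, assuming each site is given as (chromosome, position, column 7 count, column 8 count) and that region bounds are 1-based and inclusive; region_methylation_level is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple

# (chromosome, 1-based position, unconverted Cs = column 7, converted Cs = column 8)
Site = Tuple[str, int, int, int]


def region_methylation_level(sites: Iterable[Site], chrom: str,
                             start: int, end: int) -> Optional[float]:
    """Total column 7 / (total column 7 + total column 8) over sites in [start, end]."""
    methylated = unmethylated = 0
    for c, pos, unconverted, converted in sites:
        if c == chrom and start <= pos <= end:
            methylated += unconverted
            unmethylated += converted
    total = methylated + unmethylated
    # None signals a region with no read coverage at all.
    return methylated / total if total else None
```

With sites [("chr1", 100, 3, 1), ("chr1", 150, 0, 8)], the region chr1:1-200 gives 3 / (3 + 9) = 0.25, whereas the mean of the two per-site ratios (0.75 and 0.0) would be 0.375.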