Quick-start manual

Index

Overview

What can RRBS-Analyser do for you?

The RRBS-Analyser web server was developed to comprehensively characterize the methylome of one or more samples whose methylcytosine-enriched fragments were collected by restriction enzyme digestion. It assesses sequencing quality, identifies and annotates methylation sites, and detects differentially methylated regions (DMRs). Users can upload SAM or MC files to annotate methylation sites and differentially methylated regions across multiple samples. Users can also submit a FASTQ file for quality assessment of the sequencing data and annotation of methylation sites.

How to use the RRBS-Analyser server

RRBS-Analyser provides an intuitive, easy-to-use interface. An analysis takes five steps:

  1. Upload FASTQ, SAM or MC files on the Upload or Analysis page.
  2. Set the options for filtering, alignment and annotation.
  3. Enter an email address for result notification.
  4. Click 'submit' to start the analysis.
  5. Retrieve the results of the FASTQ, SAM or MC file analysis with the Job ID on the Results page.

Workflow scheme

Results presentation

Explanation of inputs

Upload FASTQ file

Cloud storage refers to online space that you can use to store your data remotely. It offers greater accessibility and reliability, rapid deployment, and support for archiving and disaster recovery. Cloud storage solutions usually run on a large network of virtual servers, come with tools for managing files and organizing the virtual storage space, and are installed and managed by specialized providers. Box (https://www.box.com) is a cloud storage service that lets individuals upload and access files quickly and easily from any device with an Internet connection. Box can be used to share files and collaborate with anyone who has an email address, and it is a pioneer in robust content-management security. Box is one of the ways to upload data to RRBS-Analyser. Because Internet services change constantly, public interfaces better suited to cloud-based systems may emerge in the future, and the RRBS-Analyser server will update its upload interface accordingly.

On this page, only one sample can be submitted, for sequencing quality assessment and methylation site annotation. Both single-end and paired-end sequencing data are accepted. First upload the files to Box, then enter their names.

>> Step 1: Upload the FASTQ files to the FTP address or to Box. Input files must be in one of the following formats: *.fq.tar.bz2, *.fq.tar.gz, *.fq.tar or *.fq.gz. Each file may be at most 2 GB.

>> Step 2: Follow the prompts to enter the file names.

>> Step 3: Use the selection lists to choose the category and species of the reference genome used for read alignment.

>> When using the default test data, the three steps above can be skipped.

>> FASTQ format: detailed format information is available on the "Input file introduction" page.
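FASTQ files have to be packaged in one of the accepted archive formats before upload. As a minimal sketch, assuming Python's standard `tarfile` module is used locally, the helper below bundles one or more FASTQ files (for example both mates of a paired-end run) into a `*.fq.tar.gz` archive and checks it against the size limit. The name `package_fastq` is hypothetical, and reading the 2 GB (2G) limit as 2 GiB is an assumption; this is not part of RRBS-Analyser.

```python
import os
import tarfile

# Assumption: the "2G" upload limit is read as 2 GiB.
MAX_UPLOAD_BYTES = 2 * 1024 ** 3


def package_fastq(fastq_paths, archive_path):
    """Bundle FASTQ files into a gzip-compressed tar archive (*.fq.tar.gz).

    Hypothetical helper for illustration; returns the archive size in bytes.
    """
    if not archive_path.endswith(".fq.tar.gz"):
        raise ValueError("archive name should end with .fq.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in fastq_paths:
            # Store each file under its base name so the archive has no directory prefix.
            tar.add(path, arcname=os.path.basename(path))
    size = os.path.getsize(archive_path)
    if size > MAX_UPLOAD_BYTES:
        # The archive is kept on disk so it can be inspected or split.
        raise ValueError(f"archive is {size} bytes, above the assumed 2 GiB limit")
    return size
```

For a paired-end run this might be called as `package_fastq(["sample_R1.fq", "sample_R2.fq"], "sample.fq.tar.gz")`.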

Upload SAM files

On this page, one or more samples can be submitted. A single sample is used only for methylation site annotation; multiple samples are used for both methylation site annotation and identification of differentially methylated regions. For multiple samples, click the "Add more" button and upload each sample's files separately.

>> Step 1: Upload the SAM files to the FTP address or to Box. Input files must be in one of the following formats: *.sam.tar.bz2, *.sam.tar.gz, *.sam.tar or *.sam.gz. Each file may be at most 2 GB.

>> Step 2: Follow the prompts to enter the file names.

>> Step 3: Use the selection lists to choose the category and species of the reference genome; this selects the annotation files used for the analysis.

>> When using the default test data, the three steps above can be skipped.

>> SAM format: detailed format information is available on the "Input file introduction" page.
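The PE option in the parameter settings must match how the SAM data were sequenced. In the SAM specification, bit 0x1 of the FLAG field (second column) marks a read whose template has multiple segments, that is, paired-end data. The sketch below scans the alignment lines of an uncompressed SAM file and reports whether any record carries that bit. The function name `sam_is_paired` and the "any record" decision rule are illustrative assumptions, not part of RRBS-Analyser.

```python
PAIRED_FLAG = 0x1  # SAM FLAG bit: template has multiple segments (paired-end)


def sam_is_paired(lines, max_records=10000):
    """Return True if any of the first `max_records` alignment lines has FLAG bit 0x1 set.

    Header lines (starting with '@') and blank lines are skipped.
    Hypothetical helper for illustration.
    """
    seen = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if flag & PAIRED_FLAG:
            return True
        seen += 1
        if seen >= max_records:
            break
    return False
```

Usage: `with open("sample.sam") as fh: print(sam_is_paired(fh))`; a result of False suggests answering "no" for PE.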

Upload MC files

On this page, one or more samples can be submitted. A single sample is used only for methylation site annotation; multiple samples are used for both methylation site annotation and identification of differentially methylated regions. For multiple samples, click the "Add more" button and upload each sample's files separately.

>> Step 1: Upload the MC files to the FTP address or to Box. Input files must be in one of the following formats: *.mc.tar.bz2, *.mc.tar.gz, *.mc.tar or *.mc.gz. Each sample's file may be at most 20 MB.

>> Step 2: Follow the prompts to enter the file names.

>> Step 3: Use the selection lists to choose the category and species of the reference genome; this selects the annotation files used for the analysis.

>> When using the default test data, the three steps above can be skipped.

>> MC format: detailed format information is available on the "Input file introduction" page.
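The naming and size rules for the three upload pages can be checked locally before uploading. The sketch below mirrors the accepted suffixes and the limits stated in the Step 1 instructions (2G for FASTQ and SAM files, 20M per MC sample). Reading those limits as binary units (GiB and MiB) is an assumption, and `check_upload` is a hypothetical helper; it does not replace whatever checks the server itself applies.

```python
import os

# Accepted suffixes per input type, as listed in the upload steps.
ACCEPTED_SUFFIXES = {
    "fastq": (".fq.tar.bz2", ".fq.tar.gz", ".fq.tar", ".fq.gz"),
    "sam": (".sam.tar.bz2", ".sam.tar.gz", ".sam.tar", ".sam.gz"),
    "mc": (".mc.tar.bz2", ".mc.tar.gz", ".mc.tar", ".mc.gz"),
}

# Assumption: "2G" and "20M" are read as binary units (GiB / MiB).
SIZE_LIMITS = {
    "fastq": 2 * 1024 ** 3,
    "sam": 2 * 1024 ** 3,
    "mc": 20 * 1024 ** 2,  # per sample
}


def check_upload(path, kind):
    """Return a list of problems for `path`; an empty list means it looks acceptable."""
    problems = []
    name = os.path.basename(path)
    # str.endswith accepts a tuple of candidate suffixes.
    if not name.endswith(ACCEPTED_SUFFIXES[kind]):
        problems.append(f"{name}: expected one of {', '.join(ACCEPTED_SUFFIXES[kind])}")
    size = os.path.getsize(path)
    if size > SIZE_LIMITS[kind]:
        problems.append(f"{name}: {size} bytes exceeds the {kind} limit of {SIZE_LIMITS[kind]} bytes")
    return problems
```

Usage: `check_upload("sample1.mc.gz", "mc")` returns an empty list when both the suffix and the size are within the documented rules.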

Select parameters for FASTQ/SAM/MC files

>> Quality assessment for FASTQ file

For quality assessment of raw FASTQ data, users can set several parameters, including the sequencing quality value, the quality score used for filtering, the minimum length of reads kept after trimming, the error-rate threshold for filtering, the restriction enzyme type, and the adapters to remove.
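To illustrate how the quality score, adapter and minimum-length parameters interact, the sketch below processes one read: it removes an adapter (a full match anywhere, or a partial match of at least three bases at the 3' end), trims low-quality bases from the 3' end, and discards the read if it becomes too short. Phred+33 quality encoding, the default values, the function name `trim_read` and the Illumina adapter prefix AGATCGGAAGAGC are assumptions for the example; RRBS-Analyser's own trimming procedure and defaults may differ.

```python
from typing import Optional, Tuple

ILLUMINA_ADAPTER = "AGATCGGAAGAGC"  # common Illumina adapter prefix (assumed example)


def trim_read(
    seq: str,
    qual: str,
    min_quality: int = 20,
    min_length: int = 20,
    adapter: str = ILLUMINA_ADAPTER,
    min_overlap: int = 3,
    offset: int = 33,
) -> Optional[Tuple[str, str]]:
    """Adapter-clip, quality-trim and length-filter one read (simplified sketch)."""
    # 1. Adapter removal: cut at a full adapter match anywhere in the read ...
    idx = seq.find(adapter)
    if idx != -1:
        seq, qual = seq[:idx], qual[:idx]
    else:
        # ... or at the longest adapter prefix (>= min_overlap bases) that ends the read.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                seq, qual = seq[:-k], qual[:-k]
                break
    # 2. Quality trimming: drop 3' bases whose Phred score is below min_quality.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    seq, qual = seq[:end], qual[:end]
    # 3. Length filter: reads shorter than min_length are discarded.
    if len(seq) < min_length:
        return None
    return seq, qual
```

Production trimmers also tolerate mismatches inside the adapter match; exact matching keeps this sketch short.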

>> Alignment of bisulfite-treated reads for FASTQ files

For alignment of raw FASTQ data, users can set several parameters, including the maximum number of mismatches allowed per read, the maximum number of equally good best hits to count, the restriction enzyme digestion sites, the maximum number of N bases allowed in a read, and the mapping protocol.
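In bisulfite-treated reads, unmethylated cytosines are read as thymines, so a read T opposite a reference C should not count as a mismatch. The simplified sketch below applies several of the listed filters to a read whose candidate hits are already known: it counts mismatches in this bisulfite-aware way (forward, C-to-T strand only), rejects reads with too many N bases, and keeps a read only if its number of equally good best hits does not exceed a limit. It also checks the MspI signature: MspI cuts C^CGG, so directional RRBS reads normally begin with CGG (methylated) or TGG (unmethylated). The function names, defaults, the forward-strand-only simplification and the reading of "equal best hits" are assumptions; this is not RRBS-Analyser's aligner.

```python
from typing import Sequence


def bisulfite_mismatches(read: str, ref: str) -> int:
    """Count mismatches between a read and the reference segment it is aligned to.

    A read 'T' opposite a reference 'C' is treated as bisulfite conversion, not a
    mismatch (forward/C-to-T strand only). 'N' positions are not counted here.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have the same length")
    mismatches = 0
    for r, g in zip(read.upper(), ref.upper()):
        if r == "N" or r == g or (r == "T" and g == "C"):
            continue
        mismatches += 1
    return mismatches


def has_mspi_end(read: str) -> bool:
    """MspI cuts C^CGG, so directional RRBS reads normally start with CGG or TGG."""
    return read.upper().startswith(("CGG", "TGG"))


def keep_alignment(
    read: str,
    hit_mismatches: Sequence[int],
    max_mismatches: int = 2,
    max_n: int = 2,
    max_best_hits: int = 1,
) -> bool:
    """Decide whether a read passes the N, mismatch and multi-hit filters.

    `hit_mismatches` holds the bisulfite-aware mismatch count of every candidate hit.
    """
    if read.upper().count("N") > max_n or not hit_mismatches:
        return False
    best = min(hit_mismatches)
    if best > max_mismatches:
        return False
    # Reads with more equally good best hits than allowed are treated as ambiguous.
    n_best = sum(1 for m in hit_mismatches if m == best)
    return n_best <= max_best_hits
```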

>> DNA methylation analysis for FASTQ, SAM or MC files

For genome-wide methylation analysis of FASTQ, SAM or MC files, users can set several parameters, including the sequencing depth filter (n value); the element coverage threshold (p value), where an element is selected for methylation analysis in the map only if (bp of the element region overlapped by captured reads)/(bp of the element region) > p; and the methylation scales (l1, l2, l3, l4).
The PE value applies only to SAM files: select "no" if the uploaded SAM data were generated by single-end sequencing, otherwise the data are treated as paired-end.
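One way to read the depth filter n and the element coverage threshold p: a cytosine is used only if at least n reads cover it, its methylation level is the fraction of those reads reporting a methylated C, and an element is analysed only if the fraction of its base pairs overlapped by captured reads exceeds p. The sketch below implements that reading. Treating the methylation scales l1, l2, l3 and l4 as ascending cut-offs that split levels into five bins is an assumption, as are all function names; the server's exact definitions may differ.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def methylation_level(methylated: int, unmethylated: int, min_depth: int) -> Optional[float]:
    """Fraction of reads reporting a methylated C; None if depth is below min_depth (n)."""
    depth = methylated + unmethylated
    if depth < min_depth or depth == 0:
        return None
    return methylated / depth


def methylation_scale(level: float, cutoffs: Sequence[float]) -> int:
    """Bin a level with ascending cut-offs (assumed reading of l1..l4); 0 means below l1."""
    return bisect_right(list(cutoffs), level)


def element_coverage(element: Tuple[int, int], reads: List[Tuple[int, int]]) -> float:
    """Fraction of an element's bp overlapped by captured reads (half-open intervals)."""
    start, end = element
    # Clip reads to the element, then merge overlapping pieces so no bp is counted twice.
    pieces = sorted((max(s, start), min(e, end)) for s, e in reads if s < end and e > start)
    covered, cur_s, cur_e = 0, None, None
    for s, e in pieces:
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / (end - start)
```

Under this reading, an element is kept when `element_coverage(element, reads) > p`.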

>> DMR analysis for multiple SAM or MC files

For analysis of differentially methylated regions (DMRs) across multiple SAM or MC files, users can set several parameters: the minimum read coverage of a cytosine, the DMR detection method, the cytosine type to detect, the minimum number of cytosines of the selected type in a window, the step size of the sliding window, the maximum/minimum methylation level difference, the max-min methylation level value, the minimum length for joining two fragments into one, the p-value threshold for calling a DMR, the FDR used to adjust DMR p-values, and the left and right flanking-region lengths of DMRs in the map.
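RRBS-Analyser lets users choose among DMR detection methods that this guide does not spell out. As one illustration of how the coverage, window, step, difference, p-value and FDR parameters can fit together, the sketch below slides a fixed-size window along one chromosome, pools methylated and unmethylated read counts of well-covered cytosines in each of two samples, tests the pooled 2x2 table with a two-sided Fisher's exact test, and adjusts the p-values with the Benjamini-Hochberg procedure. The choice of test, all names and all defaults are assumptions; joining nearby windows and adding flanking regions are left out.

```python
from math import comb
from typing import Dict, List, Tuple

# position -> (methylated reads, unmethylated reads) for one sample on one chromosome
Counts = Dict[int, Tuple[int, int]]


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of the table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(q for q in map(prob, range(lo, hi + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


def sliding_window_dmrs(s1: Counts, s2: Counts, window: int = 1000, step: int = 500,
                        min_cov: int = 5, min_sites: int = 3, min_diff: float = 0.25,
                        max_p: float = 0.05, max_fdr: float = 0.05) -> List[Tuple]:
    """Return (start, end, level difference, p, FDR) for windows passing all filters."""
    def covered(c: Tuple[int, int]) -> bool:
        return sum(c) >= max(min_cov, 1)

    # Cytosines covered well enough in both samples, in genomic order.
    shared = sorted(pos for pos in s1 if pos in s2 and covered(s1[pos]) and covered(s2[pos]))
    tested = []
    if shared:
        for start in range(shared[0], shared[-1] + 1, step):
            sites = [pos for pos in shared if start <= pos < start + window]
            if len(sites) < min_sites:
                continue
            m1, u1 = map(sum, zip(*(s1[pos] for pos in sites)))
            m2, u2 = map(sum, zip(*(s2[pos] for pos in sites)))
            diff = m1 / (m1 + u1) - m2 / (m2 + u2)
            tested.append((start, start + window, diff, fisher_two_sided(m1, u1, m2, u2)))
    fdrs = benjamini_hochberg([t[3] for t in tested])
    return [(s, e, diff, p, q) for (s, e, diff, p), q in zip(tested, fdrs)
            if abs(diff) >= min_diff and p <= max_p and q <= max_fdr]
```

Pooling counts per window keeps the test simple; methods that model each cytosine separately, or that account for biological replicates, behave differently.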

E-mail notification

  • Entering an email address is optional; if one is given, the job ID is sent to it automatically so that the results can be retrieved later.
  • An analysis may take up to several hours.
  • If an email address was entered, the server automatically sends a notification email with a link to the results.

Feedback

About us

Links

On this page, users can find useful links for DNA methylation analysis.

Analysis/Upload

On this page, users choose the MC, SAM or FASTQ format by clicking the corresponding button, then upload the file, choose the reference genome, select parameters, optionally provide an email address, and finally click the "submit" button. If the upload succeeds, a confirmation page is displayed.
Please be sure to record your job ID; it is required to retrieve your results.

Reanalysis

This page works like the "Analysis/Upload" page. If you have already uploaded files and completed an analysis but want to adjust the parameters and reanalyse the data, use this page.

Results

Test data for RRBS-Analyser:

HeLa-S3:  ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX%2FSRX066%2FSRX066421/SRR222554/SRR222554.sra

Caco-2:  ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR531/SRR531451/SRR531451.sra

MCF-7:  ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR531/SRR531432/SRR531432.sra

The corresponding SAM and MC format data were generated by our pipeline from the three FASTQ files above.

SAM/MC results

Detailed explanations of the "SAM/MC results" are given in "Example results illustration". There are three result sections, 'Basic statistics', 'Whole methylation' and 'DMR results'; click each title button to browse them, and the corresponding results are available below each figure.

Analysis of the provided MC-format test samples takes about four hours.

Analysis of the provided SAM-format test samples takes about six hours.

FASTQ results

Detailed explanations of the "FASTQ results" are given in "Example results illustration" (click "FASTQ results"). There are four result sections, 'Basic statistics', 'Quality assessment', 'Whole methylation' and 'mCG region associated gene'; click each title button to browse them, and the corresponding results are available below each figure.

Analysis of the provided FASTQ-format test samples takes about 2 hours.

Note: because the analysis datasets are large, please submit only one job at a time.

Annotation

All genomic functional element, gene and gene region annotation files can be downloaded from UCSC.

    >>  The Promoter region is defined as the region from 1200 bp upstream to 300 bp downstream of the transcription start site (TSS).

    >>  The Upstream2k region is defined as the 2000 bp upstream of the TSS.

    >>  The Downstream2k region is defined as the 2000 bp downstream of the transcription end site (TES).
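These three definitions depend on transcript strand: on the minus strand, the TSS is the right-hand end of the transcript and "upstream" points to higher coordinates. The sketch below derives the Promoter, Upstream2k and Downstream2k intervals from one line of UCSC refGene.txt, assuming the UCSC column order that begins bin, name, chrom, strand, txStart, txEnd with 0-based, half-open coordinates. The function name `gene_regions` and the clipping of negative starts to 0 are illustrative assumptions; chromosome-end clipping is omitted.

```python
from typing import Dict, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def gene_regions(refgene_line: str) -> Dict[str, Interval]:
    """Promoter, Upstream2k and Downstream2k intervals for one refGene.txt record."""
    fields = refgene_line.rstrip("\n").split("\t")
    # Assumed column order: bin, name, chrom, strand, txStart, txEnd, ...
    chrom, strand = fields[2], fields[3]
    tx_start, tx_end = int(fields[4]), int(fields[5])

    def clip(start: int, end: int) -> Interval:
        return chrom, max(0, start), end

    if strand == "+":
        tss, tes = tx_start, tx_end
        return {
            "Promoter": clip(tss - 1200, tss + 300),
            "Upstream2k": clip(tss - 2000, tss),
            "Downstream2k": clip(tes, tes + 2000),
        }
    # Minus strand: transcription runs right to left, so TSS = txEnd and TES = txStart.
    tss, tes = tx_end, tx_start
    return {
        "Promoter": clip(tss - 300, tss + 1200),
        "Upstream2k": clip(tss, tss + 2000),
        "Downstream2k": clip(tes - 2000, tes),
    }
```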

Human (hg19) annotation files:

Download from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of enhancer regions from file vistaEnhancers.txt.gz

  >>  The information of PseudoGene region from file vegaPseudoGene.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE, tRNA and so on regions from file nestedRepeats.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR regions from file refGene.txt.gz
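The annotation tables are gzip-compressed, tab-separated text. As an example of using one of them, the sketch below loads CpG islands from cpgIslandExt.txt.gz and tests whether a methylation site falls inside an island. It assumes the UCSC column order bin, chrom, chromStart, chromEnd, ... for this table, 0-based site positions, and non-overlapping islands within a chromosome; `load_cpg_islands` and `in_cpg_island` are hypothetical names.

```python
import gzip
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Islands = Dict[str, List[Tuple[int, int]]]


def load_cpg_islands(path: str) -> Islands:
    """Read cpgIslandExt.txt.gz into {chrom: sorted [(start, end), ...]} (0-based, half-open)."""
    islands = defaultdict(list)
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Assumed column order: bin, chrom, chromStart, chromEnd, name, ...
            islands[fields[1]].append((int(fields[2]), int(fields[3])))
    return {chrom: sorted(ivs) for chrom, ivs in islands.items()}


def in_cpg_island(islands: Islands, chrom: str, pos: int) -> bool:
    """True if 0-based position `pos` lies inside an island (islands assumed non-overlapping)."""
    ivs = islands.get(chrom, [])
    # Index of the last island whose start is <= pos.
    i = bisect_right(ivs, (pos, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos < ivs[i][1]
```

The same pattern (gzip text, chromosome-keyed sorted intervals, binary search) applies to the repeat and gene tables once their column layouts are checked.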

Human (hg18) annotation files:

Download from http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of enhancer regions from file vistaEnhancers.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE regions from file nestedRepeats.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR region from file refGene.txt.gz

Chicken (galGal4) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/galGal4/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE, tRNA and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR region from file refGene.txt.gz

Chicken (galGal3) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/galGal3/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, DNA, Low_complexity regions from file chr*_rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Chimpanzee (panTro3) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/panTro3/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Chimpanzee (panTro2) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file chr*_rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Cow (bosTau7) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/bosTau7/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Cow (bosTau6) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/bosTau6/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Dog (canFam3) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/canFam3/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Dog (canFam2) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Mouse (mm10) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Mouse (mm9) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file chr*_rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Rat (rn5) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/rn5/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Rat (rn4) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/rn4/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file chr*_rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Sheep (oviAri1) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/oviAri1/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file chr*_rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Zebrafish (danRer7) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/danRer7/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz

Zebrafish (danRer6) annotation files:

Download from http://hgdownload.cse.ucsc.edu/goldenPath/danRer6/database/ page:

  >>  The information of CGI region from file cpgIslandExt.txt.gz

  >>  The information of LINE, LTR, Satellite, SimpleRepeat, SINE and so on regions from file rmsk.txt.gz

  >>  The information of Intergenic, 5-UTR, Promoter, CDS, Intron, 3-UTR, Upstream2k, Downstream2k region from file refGene.txt.gz