mirTools 2.0 - for non-coding RNA discovery, profiling and functional annotation based on high-throughput sequencing

Help documentation for the mirTools 2.0 platform
 Overview
 Introduction
 What can mirTools 2.0 do for you?
mirTools 2.0 is the updated version of mirTools. It aims to identify microRNAs, detect microRNA targets, functionally annotate target genes, and profile ncRNAs. In addition, mirTools 2.0 supports more organisms and parameters.
 How to use the mirTools 2.0 server?
mirTools 2.0 provides an interface that is intuitive and easy to operate. An analysis takes five steps:
  1. Input the clean reads file in FASTA format or in mapped SAM/BAM format.
  2. Choose the options for analysis.
  3. Enter email address (optional).
  4. Click 'submit' to perform the analysis.
  5. Retrieve the result of analysis according to the Job ID.
 Workflow scheme
Explanation of inputs
Users can choose the reference genome for read alignment from a scroll list. The read length interval is selectable within the range 16-32. Uploaded samples must be in one of the specified input file formats (*.fa, *.zip, or *.gz). Users can also enter the sequence reads directly, with a maximum size of 20M.

An example of the sequence tag format:
>sample_260_x80 ("sample_260" is a unique ID, "80" is the read count)
CATTTATTATTTATCTTATTCCTTCTTCTTTTTTA
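
The tag format above can be checked before upload with a short script. The following is a minimal Python sketch, not part of mirTools 2.0; it assumes the header layout is exactly ">ID_xCOUNT" as in the example and that each sequence occupies a single line. The names `HEADER` and `parse_tags` are illustrative only.

```python
import re

# Assumed header layout, inferred from the example: >{unique ID}_x{read count}
HEADER = re.compile(r"^>(?P<tag_id>\S+)_x(?P<count>\d+)$")


def parse_tags(lines):
    """Yield (tag_id, read_count, sequence) from collapsed FASTA lines."""
    tag_id, count = None, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line!r}")
            tag_id, count = match["tag_id"], int(match["count"])
        else:
            if tag_id is None:
                raise ValueError("sequence line without a preceding header")
            yield tag_id, count, line
            tag_id = None  # assumption: one sequence line per record


example = [">sample_260_x80", "CATTTATTATTTATCTTATTCCTTCTTCTTTTTTA"]
records = list(parse_tags(example))
print(records)
```

A header that does not end in "_x" followed by digits raises an error, which makes malformed files easy to spot before submission.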


In addition, we provide a Perl script that filters out low-quality reads, adaptors, and polyA, and generates FASTA output files from FASTQ input files containing Genome Analyzer (Illumina Inc.) raw data.
Example input and output files are available through the relevant download links.
The script Adapter_trim.pl converts the input file to the format required by mirTools 2.0.

Description: Perl script that filters low-quality short reads, removes polyA, trims 3'/5' adapters, and generates the input format required by mirTools.
Usage: perl Adapter_trim.pl [options] >outputfile

Parameters:
-i <file> Short reads file in fastq format
-n <str> Sample name; default="sample"
-x <str> 5' adaptor sequence; default="GTTCAGAGTTCTACAGTCCGACGATC"
-y <str> 3' adaptor sequence; default="TCGTATGCCGTCTTCTGCTTG"
-f <int> Fastq file format: 1=Sanger format; 2=Solexa/Illumina 1.0 format; 3=Illumina 1.3+ format; default=2
-h This help message.

Example:
perl Adapter_trim.pl -i sample.fq -n "newid" -f 1 >outputfile
perl Adapter_trim.pl -i sample.fq -x "ATCGGGCT" -y "TCGTAT" -f 3 >outputfile
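
To illustrate what such a conversion involves, here is a minimal Python sketch of 3' adaptor trimming followed by collapsing identical reads into the tag format shown earlier. It is not the logic of Adapter_trim.pl, which is not documented here: the seed length `min_overlap`, the sequential numbering of tag IDs, and the function names are all assumptions, the read sequences are arbitrary placeholders, and quality filtering and polyA removal are left out.

```python
from collections import Counter


def trim_3prime(read, adapter, min_overlap=6):
    """Cut the read at the first exact match of the adaptor's leading bases.

    min_overlap is a hypothetical seed length, not a documented parameter.
    """
    seed = adapter[:min_overlap]
    pos = read.find(seed)
    return read if pos == -1 else read[:pos]


def collapse(reads, sample="sample"):
    """Collapse identical reads into '>{sample}_{n}_x{count}' FASTA records."""
    counts = Counter(reads)
    out = []
    # Assumed numbering: tags are numbered by decreasing read count.
    for n, (seq, count) in enumerate(counts.most_common(), start=1):
        out.append(f">{sample}_{n}_x{count}")
        out.append(seq)
    return out


ADAPTER_3P = "TCGTATGCCGTCTTCTGCTTG"  # default value of -y in the parameter list
raw_reads = [  # placeholder reads with partial adaptor at the 3' end
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER_3P[:10],
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER_3P[:8],
    "ACTGGACTTGGAGTCAGAAGGC" + ADAPTER_3P[:12],
]
trimmed = [trim_3prime(r, ADAPTER_3P) for r in raw_reads]
fasta_lines = collapse(trimmed)
print("\n".join(fasta_lines))
```

Exact seed matching is the simplest possible approach; a production trimmer would normally tolerate mismatches and partial adaptors shorter than the seed.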
 Upload files
Single-case input
Two-case input
Group-case input
Input the expression result file stat_table.result generated by a single-case or two-case study.
Set algorithm parameters for analyses
Set algorithm parameters for alignment.
mirTools 2.0 uses Soap2.20 as the mapping tool. Users can choose different parameters for mapping their clean reads to the reference genome. Alternatively, users can map the clean reads themselves with another mapping tool and supply the resulting SAM/BAM file to the mirTools 2.0 web service.

Set algorithm parameters for annotation.

microRNA annotation and profiling

miRNAs target prediction parameters
miRNAs target annotation parameters
For two-case analysis, users need to set the parameters used to determine differential expression.
For group-case analysis, users need to set the statistical parameters used to determine differential expression between the replicated groups.
 E-mail notification
  • Users can enter an email address, to which the job ID is sent automatically for later result retrieval.
  • An analysis can take up to several hours to complete.
  • The server automatically sends a notification email containing the result link, which remains available for one month.
Note on the submitted parameter list
  • The parameters users have chosen for the analysis
 Description of data output
Once a job is finished, users automatically receive an email with a URL link. The link leads to the annotation results, presented in a user-friendly interface containing the following parts.
 Basic statistics of sequence and mapping results
Basic statistics

1. Length distribution
An overview of the sequence read distribution and expression levels is shown in bar graphs.
Unique reads: the number of distinct sequence tags.
Expression levels: the number of reads for each tag, which reflects relative abundance.
2. Mapping statistics
The percentage of reads mapped to the reference genome.
3. Chromosome distribution
The distribution of mapped reads across the different chromosomes.
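
The difference between unique reads and expression levels in the length distribution can be made concrete with a small Python sketch. It is illustrative only and is not the server's code: the function name, the placeholder sequences, and the decision to apply the 16-32 length interval at this stage are assumptions.

```python
from collections import defaultdict


def length_distribution(tags, min_len=16, max_len=32):
    """Return {length: (unique_tags, total_reads)} for tags inside the interval.

    tags is an iterable of (sequence, read_count) pairs.
    """
    dist = defaultdict(lambda: [0, 0])
    for sequence, count in tags:
        n = len(sequence)
        if min_len <= n <= max_len:
            dist[n][0] += 1      # one more distinct tag of this length
            dist[n][1] += count  # reads contributed by that tag
    return {n: tuple(v) for n, v in sorted(dist.items())}


tags = [  # placeholder tags: (sequence, read count)
    ("TGAGGTAGTAGGTTGTATAGTT", 80),
    ("ACTGGACTTGGAGTCAGAAGGC", 5),
    ("TGAGGTAGTAGGTTGTATAGT", 12),
    ("ACGT", 3),  # shorter than the interval, so ignored
]
dist = length_distribution(tags)
print(dist)
```

Here two distinct 22-base tags account for 85 reads, so the unique-read bar and the expression-level bar for that length differ substantially.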

Annotation statistics
1. Distribution on different genome elements
An overview of the sequence read distribution and expression levels is shown in bar graphs.
Unique reads: the number of distinct sequence tags.
Expression levels: the number of reads for each tag, which reflects relative abundance.
2. Rfam ncRNA distribution
The percentage of reads mapped to the Rfam database.
3. Repeat distribution
The distribution of mapped reads across the different repeat elements.

Known miRNAs and novel miRNAs
1. Known miRNAs
Visual alignments of the sequences matched to a specific miRNA are listed in the tabulated text files on the left. The HTML tables on the right provide detailed information, mainly including:
Absolute count: the total number of reads mapped to a particular microRNA.
Relative count: the matched read count normalized to the total number of microRNA reads and then multiplied by 10^6.
Most abundant tag: these columns show the tag ID, the absolute/relative count, and the tag sequence of the most highly expressed tag of each microRNA.
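
The relative count amounts to a reads-per-million normalization: each absolute count is divided by the total number of microRNA reads and scaled by one million. A minimal Python sketch follows, assuming the total is the sum of the absolute counts over all microRNAs in the sample; the function name and the microRNA labels are placeholders.

```python
def relative_counts(absolute_counts, scale=1_000_000):
    """Normalize absolute counts to the total microRNA reads, scaled to 10^6."""
    total = sum(absolute_counts.values())
    if total == 0:
        raise ValueError("no microRNA reads to normalize against")
    return {name: count * scale / total for name, count in absolute_counts.items()}


counts = {"miR-A": 750, "miR-B": 200, "miR-C": 50}  # placeholder names and counts
rel = relative_counts(counts)
print(rel)
```

Because every sample is scaled to the same total, relative counts can be compared between samples of different sequencing depth.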
 
2. Novel miRNA
All unclassified reads are considered when detecting candidate novel miRNA genes. The sequences of the predicted miRNA and miRNA star are provided, along with the corresponding tag number, tag count, and hairpin structure.
Targets of known miRNAs and novel miRNAs
Functional annotation of target genes
Non-coding RNA profiling
 Differential expression analysis
Here we give only a short sketch of the two-sample analysis. As stated previously, users must provide an e-mail address and upload their data files in the appropriate format. Each analysis is submitted as a single job. On this page, users submit two samples as input, after which the system launches the job for the comprehensive analysis. The output pages are similar to those of the single-sample analysis.

1. Differential expression detection
A scatter plot displays the differentially expressed miRNAs between samples.
2. Based on total tag count: differentially expressed microRNAs between two samples according to the relative counts matched to a specific microRNA.
Based on the most abundant tag: differential expression of the most abundant tags between two samples according to the relative counts matched to a specific microRNA. Each point in the scatter plot corresponds to a miRNA ID. The statistical significance (P-value) is inferred using a Bayesian method.
 Group cases analysis results
1. Comparison within a group.
2. Comparison between groups.
 Re-analyze
Enter a job ID number into the text box to repeat the analysis on the data set you previously submitted.
 Job retrieval
Enter a job ID number (for example, 1291213644) on the "retrieve" page and click the "Submit" button.
You will be directed to the "Result" page.
 Download results of analyses
All results can be downloaded from the website as tab-delimited text files through the "Download" button at the top of the page.
For single file download, please click "Download" button.

Note:
  • Remember that all temporary files, including your results and all your data, are deleted one month after the analysis finishes.

Institute of Genomic Medicine, Wenzhou Medical College
Wenzhou 325035, Zhejiang, China