Let’s do some simple mapping to do abundance estimation in the group 5 assembly.
First, move to a new directory:
cd
mkdir mapping
cd mapping
cp /class/cbrown/data/group5-assembly.fa.gz .
gunzip group5-assembly.fa
bowtie is a commonly used mapping tool.
We need to build a bowtie reference:
bowtie-build group5-assembly.fa group5
and then let’s map a subset of the reads (make it the single-ended ones, just to keep things fast):
gunzip -c ~/data/SRR492065.se.qc.fq.gz | \
bowtie -q group5 - > SRR492065.x.group5.map
gunzip -c ~/data/SRR492066.se.qc.fq.gz | \
bowtie -q group5 - > SRR492066.x.group5.map
Here you should see the following output for the first:
# reads processed: 802158
# reads with at least one reported alignment: 63226 (7.88%)
# reads that failed to align: 738932 (92.12%)
Reported 63226 alignments to 1 output stream(s)
and the second:
# reads processed: 904157
# reads with at least one reported alignment: 39572 (4.38%)
# reads that failed to align: 864585 (95.62%)
Reported 39572 alignments to 1 output stream(s)
At the moment, there seems to be no good way to do automated differential analysis of two samples, so I’ll content myself with showing you how to annotate the assembled sequences with the mapping abundances.
To do this, we will need to make two copies of the annotated assembly – one annotated with the first (SRR492065) and the other with the second (SRR492066) abundances.
python /class/stamps-software/share/khmer/sandbox/make-coverage.py group5-assembly.fa SRR492065.x.group5.map
mv group5-assembly.fa.cov group5.SRR492065.fa
python /class/stamps-software/share/khmer/sandbox/make-coverage.py group5-assembly.fa SRR492066.x.group5.map
mv group5-assembly.fa.cov group5.SRR492066.fa
What you should see now is that there’s a [cov] annotation for each sequence in every file –
head -1 group5.SRR49206?.fa
and you should see something like this –
==> group5.SRR492065.fa <== >NODE_1_length_3642_cov_7.200439[cov=1]
==> group5.SRR492066.fa <== >NODE_1_length_3642_cov_7.200439[cov=2]
This format can be uploaded directly to MG-RAST as an abundance-annotated assembly, although there’s no good way to do comparative analysis yet, it seems.
More generally, I would suggest looking into things like RSEM, DEseq, or other differential expression packages for mRNAseq.
This file can be edited directly through the Web. Anyone can update and fix errors in this document with few clicks -- no downloads needed.
For an introduction to the documentation format please see the reST primer.