===================================== 5. Mapping and abundance quantitation ===================================== .. note:: You can start from this point by taking the following files from the data on snapshot snap-8f89b092:: SRR492065.pe.qc.fq.gz SRR492066.pe.qc.fq.gz final-assembly.fa Just do:: mkdir /mnt/assembly cd /mnt/assembly cp /data/SRR492065.pe.qc.fq.gz /data/SRR492066.pe.qc.fq.gz /data/final-assembly.fa . Let's do some simple mapping to do abundance estimation in final assembly. Setup ===== First, move to a new directory:: cd /mnt mkdir mapping cd mapping cp /mnt/assembly/final-assembly.fa metagenome.fa Bowtie mapping ============== Let's start by installing `bowtie `__:: cd /root curl -O -L http://sourceforge.net/projects/bowtie-bio/files/bowtie/0.12.7/bowtie-0.12.7-linux-x86_64.zip unzip bowtie-0.12.7-linux-x86_64.zip cd bowtie-0.12.7 cp bowtie bowtie-build bowtie-inspect /usr/local/bin Next, build a bowtie reference from the assembly:: cd /mnt/mapping bowtie-build metagenome.fa metagenome and then do the mapping:: gunzip -c ../assembly/SRR492065.pe.qc.fq.gz | bowtie -p 4 -q metagenome - > SRR492065.map Do the same for the second set of reads:: gunzip -c ../assembly/SRR492066.pe.qc.fq.gz | bowtie -p 4 -q metagenome - > SRR492066.map At the moment, there seems to be no good way to do automated differential analysis of two samples, so we'll just show you how to annotate the assembled sequences with the mapping abundance. This will allow MG-RAST to properly weight annotation calls. To do this, we will need to make two copies of the annotated assembly -- one annotated with the first (SRR492065) and the other with the second (SRR492066) abundances. :: python /usr/local/share/khmer/sandbox/make-coverage.py metagenome.fa SRR492065.map mv metagenome.fa.cov metagenome.SRR492065.fa python /usr/local/share/khmer/sandbox/make-coverage.py metagenome.fa SRR492066.map mv metagenome.fa.cov metagenome.SRR492066.fa What you will see now is that there's a [cov] annotation for each sequence in every file -- try:: head -4 metagenome.SRR492065.fa and you should see:: >testasm.1[cov=259] CAATTTATTTAAATTTTTCTACGATTCCAACA... >testasm.2[cov=610] ATTCTACTAATGTCATCTTTTTACCTTCTAGA... This format can be uploaded directly to MG-RAST as an abundance-annotated assembly, although there's no good way to do comparative analysis yet. Next: :doc:`6-annotating`