6. Mapping and abundance quantitation

Let’s do some simple mapping to do abundance estimation in the group 5 assembly.

Setup

First, move to a new directory:

cd
mkdir mapping
cd mapping

cp /class/cbrown/data/group5-assembly.fa.gz .

gunzip group5-assembly.fa

Bowtie mapping

bowtie is a commonly used mapping tool.

We need to build a bowtie reference:

bowtie-build group5-assembly.fa group5

and then let’s map a subset of the reads (make it the single-ended ones, just to keep things fast):

gunzip -c ~/data/SRR492065.se.qc.fq.gz | \
               bowtie -q group5 - > SRR492065.x.group5.map
gunzip -c ~/data/SRR492066.se.qc.fq.gz | \
               bowtie -q group5 - > SRR492066.x.group5.map

Here you should see the following output for the first:

# reads processed: 802158
# reads with at least one reported alignment: 63226 (7.88%)
# reads that failed to align: 738932 (92.12%)
Reported 63226 alignments to 1 output stream(s)

and the second:

# reads processed: 904157
# reads with at least one reported alignment: 39572 (4.38%)
# reads that failed to align: 864585 (95.62%)
Reported 39572 alignments to 1 output stream(s)

At the moment, there seems to be no good way to do automated differential analysis of two samples, so I’ll content myself with showing you how to annotate the assembled sequences with the mapping abundances.

To do this, we will need to make two copies of the annotated assembly – one annotated with the first (SRR492065) and the other with the second (SRR492066) abundances.

python /class/stamps-software/share/khmer/sandbox/make-coverage.py group5-assembly.fa SRR492065.x.group5.map
mv group5-assembly.fa.cov group5.SRR492065.fa

python /class/stamps-software/share/khmer/sandbox/make-coverage.py group5-assembly.fa SRR492066.x.group5.map
mv group5-assembly.fa.cov group5.SRR492066.fa

What you should see now is that there’s a [cov] annotation for each sequence in every file –

head -1 group5.SRR49206?.fa

and you should see something like this –

==> group5.SRR492065.fa <== >NODE_1_length_3642_cov_7.200439[cov=1]

==> group5.SRR492066.fa <== >NODE_1_length_3642_cov_7.200439[cov=2]

This format can be uploaded directly to MG-RAST as an abundance-annotated assembly, although there’s no good way to do comparative analysis yet, it seems.

More generally, I would suggest looking into things like RSEM, DEseq, or other differential expression packages for mRNAseq.


LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github.
comments powered by Disqus

Table Of Contents

This Page




Edit this document!

This file can be edited directly through the Web. Anyone can update and fix errors in this document with few clicks -- no downloads needed.

  1. Go to 6. Mapping and abundance quantitation on GitHub.
  2. Edit files using GitHub's text editor in your web browser (see the 'Edit' tab on the top right of the file)
  3. Fill in the Commit message text box at the bottom of the page describing why you made the changes. Press the Propose file change button next to it when done.
  4. Then click Send a pull request.
  5. Your changes are now queued for review under the project's Pull requests tab on GitHub!

For an introduction to the documentation format please see the reST primer.