Next Generation Sequencing (NGS)/SOAPdenovo


To get some E. coli data from the run SRR001665, you could type:

 wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR001/SRR001665/SRR001665_1.fastq.gz
 wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR001/SRR001665/SRR001665_2.fastq.gz

Unpack the two files:

 gunzip SRR001665_1.fastq.gz
 gunzip SRR001665_2.fastq.gz
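Before assembling, it can help to confirm that both files hold the same number of reads and that the read length matches what goes into the config file. The following is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); the function names read_length_hist and count_reads are made up for illustration.

```shell
# Hypothetical helpers, not part of SOAPdenovo.
# Assumption: each FASTQ record is exactly 4 lines (header, sequence, '+', qualities).

# Print "length count" pairs for the sequence lines, sorted by length.
read_length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Print the number of reads (total lines divided by 4).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Run the checks only if the unpacked files are present.
if [ -f SRR001665_1.fastq ] && [ -f SRR001665_2.fastq ]; then
    echo "reads in file 1: $(count_reads SRR001665_1.fastq)"
    echo "reads in file 2: $(count_reads SRR001665_2.fastq)"
    read_length_hist SRR001665_1.fastq
fi
```

For paired-end data the two read counts should be equal, and the largest length reported is the value to compare against max_rd_len.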

You will need to get SOAPdenovo and the data prepare module:

 wget http://soap.genomics.org.cn/down/x86_64.linux/SOAPdenovo31mer.tgz
 tar xvzf SOAPdenovo31mer.tgz
 

We also have to make a config file, which we name cont.config:

 #maximal read length
 max_rd_len=36
 [LIB]
 #average insert size
 avg_ins=200
 #if sequence needs to be reversed 
 reverse_seq=0
 #use for contig building only
 asm_flags=1
 #in which order the reads are used while scaffolding
 rank=1
 #fastq files
 q1=./SRR001665_1.fastq
 q2=./SRR001665_2.fastq
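If you want to try several settings (for example a different asm_flags value), the config file can also be generated from the shell. This is only a sketch: write_soap_config is a hypothetical helper, and it writes exactly the keys shown above with nothing added.

```shell
# Hypothetical helper, not part of SOAPdenovo.
# Usage: write_soap_config OUTFILE MAX_READ_LEN AVG_INSERT ASM_FLAGS FASTQ1 FASTQ2
write_soap_config() {
    cat > "$1" <<EOF
#maximal read length
max_rd_len=$2
[LIB]
#average insert size
avg_ins=$3
#if sequence needs to be reversed
reverse_seq=0
#1 = contig building only, 3 = also used for scaffolding (as described on this page)
asm_flags=$4
#in which order the reads are used while scaffolding
rank=1
#fastq files
q1=$5
q2=$6
EOF
}

# Example call that reproduces the file above:
# write_soap_config cont.config 36 200 1 ./SRR001665_1.fastq ./SRR001665_2.fastq
```

Keeping the values as arguments makes it easy to write a second config that differs only in asm_flags and compare the two runs.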


Then we run the assembly using a k-mer size of 31 (the read length is 36). We run the whole SOAP pipeline by specifying the "all" parameter. Because asm_flags is set to 1, the library is used for contig building only; setting asm_flags to 3 would use the same library for scaffolding as well. With the config file above, SOAP will terminate in the scaffolding step with a floating point exception, as there is nothing to scaffold with. The contigs will nevertheless be found in EC.contig.

 ./SOAPdenovo31mer all -K 31 -s cont.config -o EC
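Since the run ends with an error in the scaffolding step, it is worth checking that the contig file was actually written and looks sensible. The sketch below summarizes any FASTA file with standard awk and sort; contig_stats is a hypothetical name, and N50 is computed here as the length of the contig at which the running total, going from longest to shortest, first reaches half of all bases.

```shell
# Hypothetical helper, not part of SOAPdenovo.
# Usage: contig_stats FASTA_FILE
contig_stats() {
    # First pass: print one sequence length per FASTA record.
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -rn |
    # Second pass: lengths arrive longest first; accumulate totals and find N50.
    awk '{ l[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print "contigs=0"; exit }
             half = total / 2; run = 0
             for (i = 1; i <= NR; i++) {
                 run += l[i]
                 if (run >= half) { n50 = l[i]; break }
             }
             printf "contigs=%d total_bp=%d longest=%d n50=%d\n", NR, total, l[1], n50
         }'
}
```

Called on the contig file produced by the run, this prints a single line with the number of contigs, the total assembled length, the longest contig and the N50.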