MEGA6: Constructing a Phylogenetic Tree on Multiple Sequences

使用BioEdit生成和绘制蛋白质进化树  虽然方法简单、上手快,但是由于 BioEdit 功能有限,使用起来很不方便。如果经常进行类似分析,那还是要使用 MEGA 6 这一个系统进化分析的“大杀器”。

第一步:下载和安装 MEGA 6。各平台均有发行版,在此不再赘述。

第二步:多序列比对,这是构建系统进化树的必要条件。

点击第一个按钮,选择 Edit/Build Alignment。
mega0

正如题目所说的,我们需要选择一个含有多个序列的 Fasta 文件。
mega1

从菜单选择 Alignment – Align By xxx 如果序列比较多,那比对需要花上一些时间。

mega2

第三步:进化分析。
在比对完成的窗口中直接选择 Phylogenetic Analysis。
mega3

第四步:构建进化树。
在主窗口页面上点击 Phylogeny 这个按钮,选择一种方法构建进化树,
mega6

点击 Compute 开始计算。
mega4

现在你将得到你所需要的进化树了。
mega7

Massive RT-PCR Primer Designer is online

The Massive RT-PCR Primer Designer is based on Primer3 and BioPerl toolbox, and was derived from the standalone codes which have been discussed by two previous blog posts (Post 1 and Post 2).

rt-primer-designer

You may access the service at The Massive RT-PCR Primer Designer.

What this tool needs is your fasta file, which contains one or more nucleotide sequences that will be use as PCR template. Upload this file and the tool will generate the most ideal primer pairs suitable for quantitive Real-Time PCR (qRT-PCR). The primers will be supplied as a downloadable Excel file.

Notably, the amplicon size of these RT-PCR primers are spanning from 150 bp to 250 bp.

Powered by AliYun, the service is available at 7×24h.

By the way, apply this promotion code u6fovy when you buy Aliyun service for a 10% OFF.

ContigExpress-Assemble pair-end sequencing fragments

送样测序的时候,测序公司会问你是否要从两端测通?如果你选择两端测通,可能会得到两个序列(也可能测序公司已经帮你拼接成一条序列了),这时候你需要将这两条序列拼接起来。

我们使用ContigExpress这个程序来完成这一任务。

ContigExpress是Vector NTI© Software中的一个组件,可以在这里下载:http://www.thermofisher.com/cn/zh/home/life-science/cloning/vector-nti-software.html

contigexpress1

contigexpress2

contigexpress3

contigexpress4

contigexpress5

contigexpress6

未尽事宜,请查阅 Vector NTI Advance®11.5_Users_Manual 下列内容 How Do I : Assemble Chromatogram Data Using ContigExpress:

contigexpress7

A Genbank to BED converter

Like the previous Genbank -> GTF converter, this script is also depended on BioPerl, so you need firstly get the module installed in your system (Install BioPerl). To check whether BioPerl is ready, just refer to this post: bio-spring.info/check-bioperl/ .

SYNOPSIS


gb2bed.pl [options] filename

Options:
--help -h display this message
--input -i NCBI GenBank file
--out -o filename of bed output. [STDOUT]
--filter -f GenBank primary tag that will be filtered, may be set to 'gene', 'CDS', 'rRNA', etc.
--keep -k GenBank primary tag will be keep. Default is CDS. [CDS]

Examples:

perl gb2bed.pl -i seq.gb -o seq.bed
perl gb2bed.pl -i seq.gb -k CDS -o seq.bed
perl gb2bed.pl seq.gb > seq.bed

Read Pod for further information.


#!/usr/bin/perl

=pod

=head1 NAME

gb2bed.pl -- Genbank -> BED

=head1 SYNOPSIS

gb2bed.pl [options] filename

Options:
--help -h display this message
--input -i NCBI GenBank file
--out -o filename of bed output. [STDOUT]
--filter -f GenBank primary tag that will be filtered, may be set to 'gene', 'CDS', 'rRNA', etc.
--keep -k GenBank primary tag will be keep. Default is CDS. [CDS]

Examples:

perl gb2bed.pl -i seq.gb -o seq.bed
perl gb2bed.pl -i seq.gb -k CDS -o seq.bed
perl gb2bed.pl seq.gb > seq.bed

=head1 DESCRIPTION

Use this program to generate bioconductor friendly BED files from NCBI GenBank.
Six columns will be extracted:
1. chrom - name of the chromosome or scaffold. Any valid seq_region_name can be used, and chromosome names can be given with or without the 'chr' prefix.
2. chromStart - Start position of the feature in standard chromosomal coordinates (i.e. first base is 0).
3. chromEnd - End position of the feature in standard chromosomal coordinates
4. name - Label to be displayed under the feature, if turned on in "Configure this page". [locus_tag]
5. score - A score between 0 and 1000. [0]
6. strand - defined as + (forward) or - (reverse).

=head1 AUTHOR

Chun-Hui, Gao (gaoch@thelifesciencecentury.com)
Copyright (c) 2015 www.thelifesciencecentury.com.

=cut

use strict;
use Data::Dumper;
use Pod::Usage;
use Bio::SeqIO;
use Getopt::Long;

my ($help, $genbank_input, $output, @filters, @keep);

my $ok= GetOptions( 'help|h' => \$help,
'input|i=s' => \$genbank_input,
'filter|f=s' => \@filters,
'keep|k=s' => \@keep,
'out|o=s' => \$output );
pod2usage(2) if $help || !$ok;

$genbank_input = shift @ARGV unless ($genbank_input );

@keep = ('gene') unless $#keep >= 0;

open *IN, "< $genbank_input " or die pod2usage(2); my $out; if ($output) { open $out, "> $output" or die "Can't open file $output:$@\n";
}
else {
$out = *STDOUT;
}

my %filter;
map {$filter{$_}++} @filters;
my %keep;
map {$keep{$_}++} @keep;
my $in = Bio::SeqIO->new(-fh => \*IN, -format => "genbank");
while ( my $seq = $in->next_seq() ) {
my $chr = $seq->accession_number;

my @all_SeqFeatures = $seq->get_all_SeqFeatures;

#~ warn "# working on $chr\n";
if ($#filters >= 0){
@all_SeqFeatures = grep { !defined $filter{$_->primary_tag} } @all_SeqFeatures;
}
if ($#keep >= 0) {
@all_SeqFeatures = grep { defined $keep{$_->primary_tag} } @all_SeqFeatures;
}

# abort if there are no features
warn "$chr has no features, skipping\n" and next
if $#all_SeqFeatures < 0; for my $feature ( @all_SeqFeatures ) { my ($start, $end, $name, $strand); $start = $feature->start - 1; # bed base is 0, while genbank is 1
$end = $feature->end;
($name) = $feature->get_tag_values('locus_tag');
$strand = $feature->strand;
$strand = $strand < 0 ? '-' : '+'; print $out join("\t", $chr, $start, $end, $name, 0, $strand), "\n"; } }

A Genbank –> GTF converter derived from BioPerl

BioPerl has a Genbank –> GFF converter (the script is bp_genbank2gff3.pl), however, this converter generated GFF/GTF file could not work in Bioconductor when using the function maketxdbfromGFF to construct a transcript database.

Currently, I have developed such a converter that may extract information from a NCBI Genbank file and write them to GFF version 2, or GTF format. This script is derived from BioPerl, so you need firstly get the module installed in your system. To check whether BioPerl is ready, just refer to this post: bio-spring.info/check-bioperl/ .

Here is the synopsis of this script.

SYNOPSIS


NCBI_genbank2gtf.pl [options] filename

Options:
--help -h display this message
--input -i NCBI GenBank file
--out -o filename of GTF output

Examples:

perl NCBI_genbank2gtf.pl -i seq.gb -o seq.gtf
perl NCBI_genbank2gtf.pl seq.gb > seq.gtf

The codes is as follows:

#!/usr/bin/perl

=pod

=head1 NAME

NCBI_genbank2gtf.pl -- Genbank -> Bioconductor friendly GTF

=head1 SYNOPSIS

NCBI_genbank2gtf.pl [options] filename

Options:
--help -h display this message
--input -i NCBI GenBank file
--out -o filename of GTF output

Examples:

perl NCBI_genbank2gtf.pl -i seq.gb -o seq.gtf
perl NCBI_genbank2gtf.pl seq.gb > seq.gtf

=head1 DESCRIPTION

Use this program to generate bioconductor friendly GTF files from NCBI GenBank.
The file may be used in *maketxdbfromGFF* function and should work well.

=head1 AUTHOR

Chun-Hui, Gao (gaoch@thelifesciencecentury.com)

Copyright (c) 2015 www.thelifesciencecentury.com.

=cut

use strict;
use Data::Dumper;
use Pod::Usage;
use Bio::SeqIO;
use Getopt::Long;

my ($help, $genbank_input, $gtf_output);

my $ok= GetOptions( 'help|h' => \$help,
'input|i=s' => \$genbank_input,
'out|o=s' => \$gtf_output );
pod2usage(2) if $help || !$ok;

$genbank_input = shift @ARGV unless ($genbank_input );

open *IN, "< $genbank_input " or die pod2usage(2); my $out; if ($gtf_output) { open $out, "> $gtf_output" or die "Can't open file $gtf_output:$@\n";
}
else {
$out = *STDOUT;
}

my %seen;
my $in = Bio::SeqIO->new(-fh => \*IN, -format => "genbank");
while ( my $seq = $in->next_seq() ) {
my $seq_acc = $seq->accession_number;

# abort if there are no features
warn "$seq_acc has no features, skipping\n" and next
if !$seq->all_SeqFeatures;

# construct a GFF header
# add: get source_type from attributes of source feature? chromosome=X tag
# also combine 1st ft line here with source ft from $seq ..
my $header = gff_header($seq);
print $out $header;
warn "# working on $seq_acc\n";

for my $feature ( $seq->get_all_SeqFeatures ) {

my $method = $feature->primary_tag;
$seen{$method} ++;
$feature = maptags2gff($feature);
print $out $feature->gff_string(), "\n";
if ($method eq 'CDS'){
$feature->primary_tag("exon");
$feature->add_tag_value("exon_number", 1);
$feature->add_tag_value("exon_id", $feature->get_tag_values("transcript_id"));
$feature = maptags2gff($feature);
print $out $feature->gff_string(), "\n";
}

}

}

warn "# Count of features in $genbank_input:\n";
map {warn sprintf("# %20s:%6d\n", $_, $seen{$_});} keys %seen;

sub gff_header {
my ($seq) = @_;
my $head = sprintf("##gff-version 2\n# sequence-region: %s (1..%d)\n", $seq->accession_number, $seq->length);
$head .= sprintf( "# %s\n", $seq->desc);
$head .= "# converted by NCBI_genbank2gtf.pl\n";

#~ print Dumper $acc, $desc, $end;
return $head;

}

sub maptags2gff {
my $f = shift;
my $method = $f->primary_tag;
my %TAG_MAP;
if ($method eq 'gene' || $method eq 'misc_RNA' || $method eq 'rRNA' || $method eq 'tRNA'){
$f->add_tag_value('transcript_id',$f->get_tag_values('locus_tag'));
%TAG_MAP = ( gene => 'gene_name',
transcript_id => 'transcript_id',
locus_tag => 'gene_id' );
}
elsif ($method eq 'CDS') {
$f->add_tag_value('transcript_id',$f->get_tag_values('locus_tag'));
%TAG_MAP = ( locus_tag => 'gene_id',
protein_id => 'protein_id',
transcript_id => 'transcript_id',
gene => 'gene_name',
product => 'product');

}
elsif ($method eq 'exon') {

%TAG_MAP = ( gene_id => 'gene_id',
protein_id => 'protein_id',
transcript_id => 'transcript_id',
exon_number => 'exon_number',
exon_id => 'exon_id',
gene_name => 'gene_name');

}
elsif ($method eq 'source') {
%TAG_MAP = ( organism => 'organism' );

}
else {

}

my @all_tags = $f->get_all_tags();
#~ print Dumper \@all_tags;
for my $tag (@all_tags){

if (exists $TAG_MAP{$tag}){
my $newtag= $TAG_MAP{$tag};
my @v= $f->get_tag_values($tag);
$f->remove_tag($tag);
$f->add_tag_value($newtag,@v);
}
else {
$f->remove_tag($tag);
}
}

return $f;
}

If you would like to access the full document, run:
perldoc NCBI_genbank2gtf.pl

Feedback is welcome.

Here is also a GenBank -> BED converter: