This exercise will show that profile-based methods are more sensitive
than similarity searches like those conducted with BLAST.
We are interested in characterizing the following protein sequence from
For FASTA format sequence retrieval click on "P64811 in FASTA format" at the end of the page.
Let's assume that we are trying to predict the function of Y1333_MYCTU.
Now search for homologous sequences using protein BLAST.
The initial parameters could be:
- Database = nr
- Algorithm parameters/Max Target Sequences = 100
Click here to view pre-computed results.
Since there is no obvious way to infer the
function of the protein, we can go on to use PROFILES:
The easiest approach is to use PSI-BLAST, because the same algorithm
automatically performs an initial BLAST search, retrieves similar sequences,
generates an alignment, constructs a profile, and performs a new search,
this time using the profile. The newly identified sequences are retrieved
and aligned, and the profile is updated with the new information. The process
is iterated as many times as the user decides, or until no additional new
sequences that match the profile are identified in the database.
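The loop described above can be illustrated with a toy sketch. It is not PSI-BLAST itself: it assumes ungapped sequences of equal length, a uniform background distribution, a fixed pseudocount, and a raw score threshold in place of E-values, and all function names are hypothetical. It only shows how a profile rebuilt from first-round hits can recover a more distant sequence in a later round.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # uniform background: a simplification


def build_profile(sequences, pseudocount=1.0):
    """Return one dict of log-odds scores per column of equal-length sequences."""
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(sequences[0])):
        counts = Counter(seq[col] for seq in sequences)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, sequence):
    """Sum the per-column log-odds of an ungapped, full-length match."""
    return sum(column[aa] for column, aa in zip(profile, sequence))


def iterative_search(query, database, threshold, max_rounds=5):
    """Search, rebuild the profile from the hits, and repeat until no new hit."""
    included = [query]
    for _ in range(max_rounds):
        profile = build_profile(included)
        hits = [s for s in database if score(profile, s) >= threshold]
        new = [s for s in hits if s not in included]
        if not new:
            break  # converged: no additional sequences match the profile
        included.extend(new)
    return included


# Made-up toy data: the second database entry is too divergent for round 1.
database = ["ACDEFA", "ACDEYA", "WWWWWW"]
found = iterative_search("ACDEFG", database, threshold=4.0)
print(found)
```

With this toy data, "ACDEYA" scores below the threshold against the query alone and is only included once "ACDEFA" has been added to the profile, while the unrelated "WWWWWW" is never included.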
Run PSI-BLAST at the NCBI:
BLAST is at http://www.ncbi.nlm.nih.gov/BLAST/
Choose as Algorithm
"PSI-BLAST (Position-Specific Iterated BLAST)".
Paste the sequence in "Search".
Algorithm parameters/Max Target Sequences = 250
Algorithm parameters/PSI BLAST threshold = 1e-05 (=0.00001)
After a while click on "format" to display the results of the first round search.
Choose the proteins you want to use to construct the profile and select "Run PSI-BLAST iteration 2".
After the second iteration select "Run PSI-BLAST iteration 3".
Here you have the pre-computed results of the FIRST ROUND, of the SECOND ROUND, and of the THIRD ROUND.
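For readers who prefer the command line, the same three rounds could in principle be run with the standalone `psiblast` program from NCBI BLAST+. The sketch below only assembles the command and does not execute it. It assumes that BLAST+ and a local copy of the database are installed, and the file names are hypothetical. Unlike the web form, the standalone program does not let you pick hits by hand between rounds: every hit below the inclusion threshold goes into the profile.

```python
import shlex


def psiblast_command(query_fasta, database="nr", iterations=3,
                     inclusion_ethresh=1e-5, max_target_seqs=250,
                     output="psiblast_results.tsv"):
    """Assemble (but do not run) a BLAST+ psiblast command line."""
    return [
        "psiblast",
        "-query", query_fasta,                         # FASTA file with the query
        "-db", database,                               # name of a local BLAST database
        "-num_iterations", str(iterations),            # number of PSI-BLAST rounds
        "-inclusion_ethresh", str(inclusion_ethresh),  # PSI-BLAST threshold
        "-max_target_seqs", str(max_target_seqs),      # Max Target Sequences
        "-outfmt", "6",                                # tabular output
        "-out", output,
    ]


# "P64811.fasta" is a hypothetical file name for the saved query sequence.
command = psiblast_command("P64811.fasta")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where `psiblast` is available.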
There are many possible strategies, but the most obvious one is to recover
the protein sequences identified as similar with BLAST and generate a multiple sequence alignment (MSA) with ClustalW.
Then retrieve the sequences of all proteins identified in the PSI-BLAST search. First click on the "select all" button and then on the "get selected sequences" button. You will be redirected to an Entrez NCBI page.
Select FASTA in the 'Display' pull-down menu, and then click the send button with "all to file" selected in the next pull-down menu to save your sequences on your PC.
Click here for FASTA formatted sequences retrieval.
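Before aligning, it can be useful to check that the saved file contains the sequences you expect. The following is a minimal sketch of a FASTA reader using only the Python standard library. The function name and the example records are made up for illustration and are not the actual PSI-BLAST hits.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up example records, standing in for the downloaded file.
example = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2 hypothetical protein
MKTAYLAK
"""
sequences = read_fasta(example.splitlines())
for name, seq in sequences.items():
    print(name, len(seq))
```

To read the file saved from Entrez instead, pass an open file handle to `read_fasta` in place of `example.splitlines()`.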
Then, align them with ClustalW. Please select the MSF output format.
Click here for pre-computed ClustalW output file retrieval (MSF format).
Then we would generate a Profile or an HMM profile, and we would use the profile to search again in protein sequence databases.
For this purpose, the HmmerBuild program from the HMMER package is used.
Select the File radio button, upload your ClustalW alignment (the file on your Desktop named clustalw-msf.aln), enter your e-mail address (mandatory), and then click the 'Run' button.
In the Job results window 'clustalw-msf.aln.hmm' you can view what the HMM profile looks like.
Click here for pre-computed HMM profile retrieval.
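To get an intuition for what the HMM profile encodes, the sketch below derives two of its ingredients from a tiny made-up alignment: which columns are treated as match states, and the residue frequencies observed in those columns. This is a simplified illustration, not what HmmerBuild actually computes. The 0.5 residue-fraction cutoff, the `-` gap character (MSF files typically use `.`), and the function names are assumptions, and sequence weighting, pseudocounts, and transition probabilities are ignored.

```python
from collections import Counter

GAP = "-"


def match_columns(alignment, min_residue_fraction=0.5):
    """Indices of columns where at least this fraction of rows is a residue."""
    n_rows = len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        residues = sum(1 for row in alignment if row[col] != GAP)
        if residues / n_rows >= min_residue_fraction:
            columns.append(col)
    return columns


def emission_frequencies(alignment, columns):
    """Observed residue frequencies for each selected column (gaps ignored)."""
    profile = []
    for col in columns:
        counts = Counter(row[col] for row in alignment if row[col] != GAP)
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


# Made-up toy alignment: the third column is mostly gaps.
alignment = [
    "MK-LV",
    "MR-LV",
    "MKALI",
    "MK-LV",
]
cols = match_columns(alignment)
profile = emission_frequencies(alignment, cols)
for col, freqs in zip(cols, profile):
    print(col, freqs)
```

In this toy case the gap-rich third column is skipped, the first column is fully conserved, and the second and last columns each carry two residues with different frequencies, which is the position-specific information a plain BLAST search does not use.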
Once the HMM profile has been built, select hmmsearch in the pull-down menu close to the bookmark button and then click the 'further analysis' button.
Ensure that the Result radio button is checked, choose the uniprot_sprot database, and then click the 'Run' button.
Click here to view pre-computed results.
- Now we will try to identify whether there is an HMM profile for this
family of proteins in Pfam.
- Upload the sequence file
- Set the E-value=10
- Hit "Search Pfam"
The pre-computed results of the Pfam search are here.