This exercise shows that profile-based methods are more sensitive than pairwise similarity searches such as those performed with BLAST.

We are interested in characterizing the following protein sequence from Mycobacterium tuberculosis: Y1333_MYCTU.

  • To retrieve the sequence in FASTA format, click on "P64811 in FASTA format" at the end of the page.

  • Let's assume that we are trying to predict the function of Y1333_MYCTU.

  • Now search for homologous sequences using protein BLAST.

  • The initial parameters could be:
    • Database = nr
    • Algorithm parameters/Max Target Sequences = 100
    Click here to view pre-computed results.

    Since the BLAST results offer no obvious way to infer the function of the protein, we can go on to use PROFILES:

  • The easiest approach is to use PSI-BLAST, because a single program automatically performs an initial BLAST search, retrieves similar sequences, generates an alignment, constructs a profile, and performs a new search, this time using the profile. The newly identified sequences are retrieved and aligned, and the profile is updated with the new information. The process is repeated as many times as the user decides, or until no new sequences matching the profile are found in the database.
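
To make the idea of a profile concrete, here is a minimal sketch of how a position-specific scoring profile could be derived from an ungapped alignment and used to score a candidate sequence. It is a toy illustration, not the PSI-BLAST algorithm itself: the uniform background frequencies, the flat pseudocount, and the function names `build_profile` and `score_sequence` are assumptions made for this example, and the three aligned sequences are invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score.

    Assumption: a uniform background (1/20 per residue) and a flat pseudocount.
    PSI-BLAST itself uses sequence weighting and substitution-matrix-based
    pseudocounts, which are not reproduced here.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def score_sequence(profile, sequence):
    """Sum the per-column scores of an ungapped sequence as long as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    # Symbols outside the 20 standard residues contribute a neutral score of 0.
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, sequence))


# Invented toy alignment, for illustration only.
toy_alignment = ["ACDE", "ACDF", "ACEE"]
toy_profile = build_profile(toy_alignment)
```

A sequence that shares the conserved columns scores higher than an unrelated one, because each column rewards the residues actually observed at that position. This position-specific weighting is what a single pairwise comparison lacks.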

  • Using PSI-BLAST at the NCBI:
    BLAST is at http://www.ncbi.nlm.nih.gov/BLAST/. Choose as Algorithm "PSI-BLAST (Position-Specific Iterated BLAST)".

  • Paste the sequence in "Search".
  • Algorithm parameters/Max Target Sequences = 250
  • Algorithm parameters/PSI BLAST threshold = 1e-05 (=0.00001)
  • Hit "BLAST!"
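
The same search could also be run locally instead of through the web form. The sketch below only assembles the command line for the `psiblast` program from the NCBI BLAST+ package; it assumes BLAST+ is installed and that a local protein database is available. The query file name `Y1333_MYCTU.fasta`, the database name, and the output file name are placeholders chosen for this example. Tabular output (`-outfmt 7`) is an extra choice made here, not a setting from the web form.

```python
def psiblast_command(query_fasta, database, iterations=3,
                     inclusion_evalue=1e-5, max_targets=250,
                     out_file="psiblast_results.txt"):
    """Build an argument list for a local PSI-BLAST run (BLAST+ assumed installed)."""
    return [
        "psiblast",
        "-query", query_fasta,                         # FASTA file with the query protein
        "-db", database,                               # local protein database (assumed to exist)
        "-num_iterations", str(iterations),            # rounds of profile refinement
        "-inclusion_ethresh", str(inclusion_evalue),   # E-value cutoff for adding hits to the profile
        "-max_target_seqs", str(max_targets),          # maximum number of hits to keep
        "-outfmt", "7",                                # tabular output with comment lines
        "-out", out_file,
    ]


# Placeholder file and database names, for illustration only.
command = psiblast_command("Y1333_MYCTU.fasta", "nr")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` once the placeholder names are replaced with real paths. The defaults mirror the web-form settings above (three iterations, threshold 1e-05, 250 target sequences).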


  • After a while, click on "format" to display the results of the first-round search.

  • Choose the proteins you want to use to construct the profile and select "Run PSI-BLAST iteration 2".

  • After the second iteration select "Run PSI-BLAST iteration 3".

  • Here you have the pre-computed results of the FIRST ROUND, the SECOND ROUND and the THIRD ROUND.
  • There are many possible strategies, but the most obvious one is to recover the protein sequences identified as similar with BLAST and generate a multiple sequence alignment (MSA) with ClustalW.

  • Then retrieve the sequences of all proteins identified in the PSI-BLAST search. First click the "select all" button and then the "get selected sequences" button. You will be redirected to an NCBI Entrez page.

  • Select FASTA in the 'Display' pull-down menu, then click the send button with "all to file" selected in the next pull-down menu to save the sequences on your PC.

    Click here for FASTA formatted sequences retrieval.
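
Before aligning, it can be useful to check what the downloaded file contains. The following is a minimal FASTA reader written in plain Python. The function name `read_fasta` and the two short records are invented for this example, and real files downloaded from Entrez will have longer headers and sequences.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping identifier -> sequence.

    The identifier is taken as the first whitespace-delimited word after '>'.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records


# Invented records, for illustration only.
example_text = """>seqA hypothetical protein
MKTAYIAK
QRQISFVK
>seqB another hypothetical protein
MKSAYLAK
"""
sequences = read_fasta(example_text)
print(len(sequences), "sequences read")
```

For a saved file, the text can be obtained with `open(path).read()`. Counting the records is a quick way to confirm that all selected PSI-BLAST hits were actually saved.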

  • Then align them with ClustalW. Please select the MSF output format.

    Click here to retrieve the pre-computed ClustalW output file (MSF format).

    Then we would generate a Profile or an HMM profile, and we would use the profile to search again in protein sequence databases.

  • For this purpose we use the HmmerBuild program from the HMMER package.
  • Select the File radio button, upload your ClustalW alignment (the file on your Desktop named clustalw-msf.aln), enter your e-mail address (mandatory) and then click the 'Run' button.
  • In the Job results window 'clustalw-msf.aln.hmm' you can see what the HMM profile looks like.

    Click here to retrieve the pre-computed HMM profile.

  • Once the HMM profile has been built, select hmmsearch in the pull-down menu next to the bookmark button and then click the "further analysis" button.
    Ensure that the Result radio button is checked, choose the uniprot_sprot database and then click the 'Run' button.
  • Click here to view pre-computed results.
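
For readers who prefer the command line to the web forms, the build and search steps could be sketched as follows. This assumes a local HMMER 3 installation, which may differ from the version behind the web interface used above. The alignment must also be in a format the installed `hmmbuild` accepts, which for HMMER 3 may mean saving it as Clustal or Stockholm rather than MSF. All file names below, including the sequence database file, are placeholders chosen for this example.

```python
def hmmbuild_command(alignment_file, hmm_file):
    """Argument list for building a profile HMM from a multiple alignment."""
    # hmmbuild takes the output HMM file first, then the alignment file.
    return ["hmmbuild", hmm_file, alignment_file]


def hmmsearch_command(hmm_file, sequence_db, evalue=10.0, table_file="hits.tbl"):
    """Argument list for searching a sequence database with a profile HMM."""
    return [
        "hmmsearch",
        "-E", str(evalue),        # report hits with an E-value at or below this cutoff
        "--tblout", table_file,   # also write a parseable per-sequence hit table
        hmm_file,
        sequence_db,
    ]


# Placeholder file names, for illustration only.
build = hmmbuild_command("alignment.sto", "family.hmm")
search = hmmsearch_command("family.hmm", "uniprot_sprot.fasta")
print(" ".join(build))
print(" ".join(search))
```

Each list can be passed to `subprocess.run(..., check=True)` once the placeholders point to real files. Running the build step before the search step mirrors the order of the web-based workflow.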
  • Now we will check whether Pfam already contains an HMM profile for this family of proteins.
  • Upload the sequence file.
  • Set the E-value cutoff to 10.
  • Hit "Search Pfam".

  • The pre-computed results of the Pfam search are here.




