pATsi: paralogs and singleton genes from Arabidopsis thaliana

pATsi (paralogs and singleton genes in Arabidopsis thaliana) is a database of paralogs and singleton genes identified in the Arabidopsis thaliana protein-coding gene set.
All genes from the official TAIR annotation can be queried. Genes that do not produce an mRNA were excluded from the paralog search and are labeled "non-protein coding genes". All mRNA-producing genes are classified as paralogs (i.e., duplicated genes), unassigned genes or singletons.

Paralogous genes were organized into "networks of paralogs" using BLASTp-based analyses at two E-value cut-offs (E ≤ 10⁻¹⁰ and E ≤ 10⁻⁵).
Gene and network information is available at both cut-offs through this browser.
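To illustrate how a single BLASTp run can yield different networks at the two cut-offs, here is a minimal sketch. It assumes that a network is a connected component of the graph whose edges are non-self BLASTp hits passing the E-value cut-off (this page does not state the exact clustering rule), and that hits are read from BLAST+ tabular output (-outfmt 6, where the E-value is the 11th column). The function names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_hits(lines: Iterable[str], max_evalue: float) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs from BLAST tabular output (-outfmt 6)
    whose E-value (11th column) is <= max_evalue. Self-hits are skipped."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if query != subject and evalue <= max_evalue:
            pairs.append((query, subject))
    return pairs


def paralog_networks(genes: Iterable[str],
                     pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group genes into connected components with union-find (an assumed
    clustering rule). Components of size > 1 play the role of networks;
    size-1 components are only candidate singletons."""
    parent: Dict[str, str] = {g: g for g in genes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, Set[str]] = {}
    for gene in parent:
        groups.setdefault(find(gene), set()).add(gene)
    return list(groups.values())


# Hypothetical toy input: 12 tab-separated -outfmt 6 columns per hit.
GENES = ["geneA", "geneB", "geneC", "geneD"]
TOY_HITS = [
    "geneA\tgeneB\t45.0\t300\t150\t5\t1\t300\t1\t300\t1e-40\t150.0",
    "geneB\tgeneC\t31.0\t120\t80\t4\t1\t120\t5\t124\t1e-7\t48.0",
    "geneC\tgeneC\t100.0\t200\t0\t0\t1\t200\t1\t200\t0.0\t400.0",
]

if __name__ == "__main__":
    for cutoff in (1e-10, 1e-5):
        nets = paralog_networks(GENES, parse_hits(TOY_HITS, cutoff))
        print(cutoff, sorted(sorted(n) for n in nets))
```

With the toy hits, the stricter cut-off keeps only geneA and geneB together, while the looser one also pulls geneC into the same network through geneB. geneD stays isolated at both cut-offs.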

Details on the gene collection and classification are summarized as follows:

Non-protein coding genes: 6,070 genes (miscRNAs, tRNAs, rRNAs, ncRNAs, pseudogenes, transposons and unknown genes)
All-against-all BLASTp, E ≤ 10⁻⁵: 22,522 paralogs classified into networks
Filtering with Rost's formula: 405 unassigned genes (excluded by Rost's formula)
All-against-all BLASTp, E ≤ 10⁻⁵, without the masking filter: 213 unassigned genes (due to the masking filter)
All-against-all BLASTp of protein-coding genes, E ≤ 10⁻³: 440 unassigned genes (loose protein similarity)
BLASTx of transcripts versus proteins for ORF validation, E ≤ 10⁻⁵: 2 unassigned genes (ORF annotation errors)
BLASTn of full genes versus non-protein coding genes, E ≤ 10⁻⁵: 178 unassigned genes (similarity to non-protein coding genes)
BLASTn of full genes versus intergenic regions, E ≤ 10⁻⁵: 0 unassigned genes (similarity to intergenic regions)
BLASTn of transcripts versus ESTs (no E-value cut-off): 24 singletons not confirmed by ESTs (no EST hit)
Filtering of the BLASTn results, E ≤ 10⁻⁵: 688 singletons not confirmed by ESTs (discarded by the E-value cut-off)
Filtering of the BLASTn-versus-EST results by coverage and identity: 201 singletons not confirmed by ESTs (coverage and identity requirements not met)
Delta ≥ 20 (EST at least 20 nt longer than the transcript): 100 singletons not confirmed by ESTs
0 < Delta < 20 (EST longer than the transcript by fewer than 20 nt): 9 singletons confirmed by ESTs
Delta ≤ 0 (transcript at least as long as the EST): 2,387 singletons confirmed by ESTs
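Two of the filters above are simple numeric rules, sketched here under stated assumptions. For Rost's formula, we assume the identity curve from Rost (1999, Protein Engineering), p(L) = n + 480 · L^(-0.32 · (1 + e^(-L/1000))), with offset n = 0 and with pairs falling below the curve treated as failing; the offset and exact use in pATsi are not given on this page. For the EST check, we assume Delta is EST length minus transcript length, which is how the parenthetical descriptions in the table read. All function names are hypothetical.

```python
import math


def rost_identity_threshold(aln_len: int, n: float = 0.0) -> float:
    """Percent identity required for an alignment of aln_len residues,
    assuming the Rost (1999) curve n + 480 * L**(-0.32 * (1 + exp(-L/1000))).
    Capped at 100 because the curve exceeds 100% for very short alignments."""
    if aln_len <= 0:
        raise ValueError("alignment length must be positive")
    exponent = -0.32 * (1.0 + math.exp(-aln_len / 1000.0))
    return min(n + 480.0 * aln_len ** exponent, 100.0)


def passes_rost(identity_pct: float, aln_len: int, n: float = 0.0) -> bool:
    """True if the pair lies on or above the (assumed) Rost curve."""
    return identity_pct >= rost_identity_threshold(aln_len, n)


def est_support(est_len: int, transcript_len: int) -> str:
    """Classify a singleton by Delta, assumed to be EST length minus
    transcript length, following the three Delta bins in the table."""
    delta = est_len - transcript_len
    if delta >= 20:
        return "not confirmed"  # EST at least 20 nt longer than the transcript
    return "confirmed"  # covers both 0 < delta < 20 and delta <= 0


if __name__ == "__main__":
    print(round(rost_identity_threshold(100), 2))
    print(passes_rost(35.0, 100), passes_rost(25.0, 100))
    print(est_support(530, 500), est_support(510, 500), est_support(480, 500))
```

For a 100-residue alignment the assumed curve requires roughly 29% identity, which is the familiar "twilight zone" level; longer alignments need less identity.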

Further details on the analyses performed can be found in:
Exploiting a reference genome in terms of duplications: network of paralogs and single copy genes in Arabidopsis thaliana.


Hamed Bostan organized the database and implemented the query interfaces and the web-based resource.
Luca Ambrosino implemented the network visualization in the web pages.
Pasquale Di Salle contributed to the web page implementation.
Mara Sangiovanni and Alessandra Vigilante developed the pipeline and extracted the results collected in the database.
Maria Luisa Chiusano conceived the analysis and the system and directed all the work.

Contact us:


This work was supported by the Agronanotech Project (MIPAF, Italy), the GenoPom (-PRO, -HORT) Projects and the EU-SOL project (European Union) [Contract no. PL 016214-2].


Disclaimer: whilst every effort has been made to ensure the accuracy of the information and the reliability of the analyses available from this site, neither the University 'Federico II' nor any employee of the CAB group makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, or represents that its use would not infringe privately owned rights.