Use of Tools and Software in Bioinformatics
Muhammad Akmal
Dr. Tahir Pervaiz
This article gives a brief introduction to several software tools used in
bioinformatics and explains how they are used. Many research projects, and the
identification of disease-causing organisms in animals, require the sequences
of DNA or other biological molecules. Knowing the exact sequence makes it much
easier to treat diseases. Here we discuss how bioinformatics tools and
software can be used effectively.
Bioinformatics is a branch of biology formed by the combination of many
sciences, such as biology, mathematics, statistics and computer science.
Hogeweg first coined the word bioinformatics in 1970. About 18 years ago some
online data storage servers were launched, after which the field grew
exponentially. Today a huge amount of data is available from online servers. In
bioinformatics we use computational tools to analyze the sequences of genes
or proteins and to characterize genes and the structure and function of
proteins. Although these tools do not generate information as reliable as
experiments, they are a much more convenient and easier way to obtain
information than experimental methods. The gap between proteins with known
sequences and proteins with experimentally characterized structure and
function keeps increasing. One way to narrow this gap is to develop advanced
computational tools for modeling structure and function from sequences, an
area where progress has recently been witnessed in community-wide blind
experiments 1,2.
The text in each section starts from a simple overview of each area, followed
by key reports from the literature and, where necessary, a tabulated summary
of related tools towards the end of each section.
I-TASSER stands for Iterative Threading ASSEmbly Refinement. I-TASSER is an
online bioinformatics tool used to predict the structure and function of
proteins. The tool first identifies structural templates from the PDB by a
multiple-threading approach. Functional insights are then derived by threading
the 3D models through a protein function database 3,4.
The I-TASSER tool consists of four steps:
threading template identification,
iterative structure assembly simulation,
model selection and refinement, and
structure-based function annotation.
In the first step, the query is threaded through a PDB structure library by
LOMETS, a meta-threading approach, to identify structural templates. The
threading programs it combines are mostly based on sequence profile-to-profile
alignments, but with various structural features combined. Such variations are
important for generating complementary alignments, which increase the coverage
of template detection.
After the query-template alignment, the query sequence is divided into
threading-aligned and threading-unaligned regions. The fragments that
correspond to the aligned regions are then excised from the templates. The
structure of the unaligned regions is made by
The structural folding and reassembly are then conducted by
replica-exchange Monte Carlo simulations under the guidance of an optimized
knowledge-based force field consisting of three major components:
generic statistical potentials,
hydrogen-bonding networks and
threading-based restraints from LOMETS.
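The replica-exchange idea mentioned above can be illustrated with a minimal
sketch. It is not I-TASSER's implementation: the one-dimensional double-well
energy stands in for the real force field, and the function names, step size
and temperatures are all assumptions chosen for illustration. Several copies
(replicas) of the system are simulated at different temperatures, and
neighbouring replicas occasionally swap states with the standard acceptance
probability min(1, exp[(1/T_i - 1/T_j)(E_i - E_j)]).

```python
import math
import random


def energy(x):
    # Toy double-well energy; a stand-in for a real knowledge-based force field.
    return (x * x - 1.0) ** 2


def metropolis_step(x, temperature, rng, step=0.5):
    # Propose a small random move and accept it with the Metropolis criterion.
    candidate = x + rng.uniform(-step, step)
    delta = energy(candidate) - energy(x)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return candidate
    return x


def swap_probability(e_i, e_j, t_i, t_j):
    # Acceptance probability for exchanging the states of two replicas
    # (Boltzmann constant taken as 1).
    exponent = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def replica_exchange(temperatures, n_sweeps=200, seed=0):
    rng = random.Random(seed)
    states = [rng.uniform(-2.0, 2.0) for _ in temperatures]
    for _ in range(n_sweeps):
        # One Monte Carlo move per replica at its own temperature.
        states = [metropolis_step(x, t, rng) for x, t in zip(states, temperatures)]
        # Attempt swaps between neighbouring temperatures.
        for i in range(len(temperatures) - 1):
            p = swap_probability(energy(states[i]), energy(states[i + 1]),
                                 temperatures[i], temperatures[i + 1])
            if rng.random() < p:
                states[i], states[i + 1] = states[i + 1], states[i]
    return states


if __name__ == "__main__":
    print(replica_exchange([0.1, 0.5, 2.0]))
```

The high-temperature replicas cross energy barriers easily, and the swaps let
the low-temperature replicas benefit from that wider exploration.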
For functional annotation of proteins, the
structure models with the highest confidence scores are matched against the
BioLiP database of ligand-protein interactions to detect homologous
function templates 4.
ExPASy is short for Expert Protein Analysis System. It is a proteomics
server for the analysis of protein sequences and structures and of 2D gel
electrophoresis 5. It has a worldwide reputation as one of the most popular
bioinformatics resources. ExPASy has since evolved into an integrative
bioinformatics resource portal that provides access to many scientific
resources, databases and software tools in different fields of the life
sciences. Scientists can seamlessly access a wide range of well-known
resources in many domains, such as genetics, genomics, phylogeny, systems
biology, evolution, population genetics and transcriptomics. ExPASy was the
first website of the life sciences.
The part of a reading frame that can be translated is called an open
reading frame, or ORF. An ORF is a continuous stretch of codons that begins
with a start codon and ends at a stop codon 6.
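The definition above can be turned into a short sketch that scans a DNA
sequence for ORFs. This is a simplified illustration, not the algorithm of any
particular ORF-finding tool: it assumes ATG as the only start codon, the three
standard stop codons, and a hypothetical `min_codons` threshold, and it scans
the three reading frames on each strand.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    # Complement each base, then reverse the string.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, orf_sequence) tuples.

    Coordinates are 0-based and refer to the strand being scanned, so
    minus-strand positions are relative to the reverse complement.
    The codon count includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATGGTAACC"):
        print(orf)
```

Raising `min_codons` filters out short ORFs, which are more likely to occur
by chance.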
ORF Finder is a graphical
analysis tool that finds all open reading frames of a selectable
minimum size in a user's sequence or in a sequence already in the
database. GeneQuiz is a prototype software system for large-scale
biological sequence analysis. The system was designed to meet the
needs that arise in the process of sequence analysis.
The prototype system consists of two parts:
the database update system and
the visualization and browsing system.
The principal design requirement for the
first part was the complete automation of all repetitive
actions: database updates, efficient sequence similarity
searches and sampling of results in a uniform fashion. The
user is then presented with “hit-lists” that summarize the
results from heterogeneous database searches.
Sometimes protein synthesis stops before a stop codon is reached; in such
cases an incomplete protein is the result. Long ORFs are often used to
identify candidate protein-coding or functional RNA-coding regions in a
particular DNA sequence 8.
Ensembl is a genome
database, a joint project of the EBI and the Wellcome Trust Sanger Institute.
It was launched in 1999 in response to the imminent completion of the Human
Genome Project 9. The aim of Ensembl is to provide a centralized resource for
geneticists, molecular biologists and other researchers studying the genomes
of our own species and other vertebrates and model organisms.
Ensembl is one of several well-known genome browsers
for the retrieval of genomic data. Ensembl Genomes has extended the scope
of Ensembl to invertebrate metazoans, plants, fungi, bacteria and protists.
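Besides the genome browser, Ensembl data can be retrieved programmatically
through the Ensembl REST service. The sketch below only builds the request for
a gene lookup by symbol and does not contact the server; the helper name
`build_symbol_lookup` and the example species and gene are illustrative
assumptions.

```python
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"


def build_symbol_lookup(species, symbol):
    # Build the URL and headers for a lookup of a gene by its symbol.
    # The header asks the service to return JSON.
    url = f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
    headers = {"Content-Type": "application/json"}
    return url, headers


if __name__ == "__main__":
    url, headers = build_symbol_lookup("homo_sapiens", "BRCA2")
    print(url)
    print(headers)
```

The returned URL and headers can be passed to any HTTP client to fetch the
record.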
MODELLER is a computational tool used for homology
modeling of protein three-dimensional structures 10, 11. The program also
incorporates limited functionality for ab initio structure prediction of loop
regions of proteins, which are often highly variable even among homologous
proteins and therefore difficult to predict by homology modeling.
It was written and is maintained by Andrej Sali at the
University of California, San Francisco 12. It runs on Unix, Linux and Windows
operating systems. The ModWeb comparative protein structure
modeling web server depends on MODELLER for
automatic protein structure modeling, with an
option to deposit the resulting models into ModBase.
Entrez is a web portal used to search the many health
sciences databases at the NCBI website 13. The name “Entrez”
(a greeting which means “Come in!” in French) was
chosen to reflect the spirit of welcoming the
public to search the content available from the NLM. Entrez is an integrated
search and retrieval system that provides access to all
databases at the same
time with a single query string and user interface. Entrez can efficiently
retrieve related sequences, structures and references. It offers views
of gene and protein sequences
and chromosome maps.
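Entrez can also be queried from a program through the NCBI E-utilities. The
following sketch builds an ESearch request URL without sending it; the helper
name `build_esearch_url`, the example query and the chosen parameter values
are assumptions made for illustration.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    # db: Entrez database name, term: query string,
    # retmax: maximum number of record IDs to return.
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("protein", "insulin AND human[orgn]", retmax=5))
```

Changing only the `db` value reuses the same query string against a different
Entrez database.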
eMOTIF is a program used to identify motifs in aligned sets of
protein sequences. It is a systematic
method for determining regular expression patterns, or discrete sequence
motifs, from aligned sets of protein sequences. Unlike other methods, eMOTIF
allows the methodical enumeration of multiple motifs for the same alignment.
The motifs cover different subsets of the aligned sequences, and these subsets
usually represent subfamilies within the superfamily
represented by the full alignment. Discrete sequence motifs constructed from
these subsets can, therefore, classify novel sequences with higher specificity
than the position-specific scoring matrices derived from entire multiple
sequence alignments 14. Most motif-generating algorithms aim to find one
‘best’ motif for a given alignment of homologous domain sequences
from various species. These methods usually achieve
maximum sensitivity at the expense of specificity. Sequence analysis tools
that use such compromised motifs are at risk of classifying newly discovered
proteins into the wrong families. For genome-scale automated sequence
classification to be effective, the expected number of misclassifications
generated by the discrete motifs must be minimized.
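The idea of a discrete, regular-expression motif can be shown with a small
sketch. It is a deliberately simplified stand-in and not the eMOTIF algorithm:
it derives a single pattern per alignment by listing the residues seen in each
column, treats any column containing a gap as a wildcard, and does not
enumerate multiple motifs or estimate specificity. The function name and the
toy sequences are assumptions.

```python
import re


def motif_from_alignment(aligned):
    # All rows of an alignment must have the same length.
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if "-" in residues:
            parts.append(".")            # gap in the column: allow any residue
        elif len(residues) == 1:
            parts.append(residues[0])    # fully conserved position
        else:
            parts.append("[" + "".join(residues) + "]")  # observed alternatives
    return "".join(parts)


if __name__ == "__main__":
    motif = motif_from_alignment(["ACDEF", "ACNEF", "GCDEW"])
    print(motif)
    print(bool(re.fullmatch(motif, "GCNEW")))
```

Building the pattern from a subset of the rows instead of the full alignment
gives a narrower pattern, which is the intuition behind subfamily-specific
motifs.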
1. Moult J, Fidelis K, Kryshtafovych A, Schwede T,
Tramontano A. Proteins. 2014;82(suppl. 2):1–6.
2. Haas J, et al. Database (Oxford) 2013;2013 bat031.
3. Yang J, Roy A, Zhang Y. Bioinformatics. 2013;29:2588–2595.
4. Yang J, Roy A, Zhang Y. Nucleic Acids Res. 2013;41:D1096–
5. Gasteiger, E.; Gattiker, A; Hoogland, C; Ivanyi, I; Appel,
RD; Bairoch, A (2003)
6. U.S. National Library of Medicine. 2015-10-19. Retrieved 2015-10-22.
7. Slonczewski, Joan; John Watkins Foster (2009). Microbiology: An Evolving
Science. New York: W.W. Norton & Co.
8. Michael Waterman (2005). Computational Genome Analysis: an introduction.
9. Flicek P, Amode MR, Barrell D, et al. (November 2010)
10. Fiser A, Sali A (2003). “Modeller: generation and
refinement of homology-based protein structure models”. Meth. Enzymol. 374: 461–91.
11. Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Sali A
(2000). “Comparative protein structure modeling of genes and
genomes”. Annu Rev Biophys Biomol Struct. 29: 291–325.
12. Sali A, Blundell TL
(December 1993). “Comparative protein modelling by satisfaction of spatial
restraints”. J. Mol. Biol. 234 (3): 779–815.
13. Nucleic Acids Research. 41 (Database issue): D8–D20.
14. Henikoff JG, Greene EA, Pietrokovski S, Henikoff S
(2000). “Increased coverage of protein families with the Blocks Database servers”. Nucleic Acids Res. 28: 228–230.