index.php from eXtensible Genome Data Broker at Krugle


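<?php
// Load site definitions and request parameters unless their guard variables are already set.
if (empty($SITEDEF_H)) { require('SITEDEF.php'); }
if (empty($PARAM_H)) { require('getPARAM.php'); }
require('SSI_GDBprep.php');
// Emit the shared two-column page header through an Apache sub-request.
virtual("{$CGIPATH}SSI_GDBgui.pl/TWO_COLUMN_HEADER/" . $SSI_QUERYSTRING);
?>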


<STYLE TYPE="text/css">

p{ font:normal 12pt Verdana,Arial,sans-serif;}
hr{ clear:both; width:95%; }

td { font:normal 10pt Verdana,Arial,sans-serif;}

h2 { font:bold 14pt Verdana,Arial,sans-serif;
 clear:both; padding-top:15px; 
}

h3{ font:bold 12pt Verdana,Arial,sans-serif; clear:both; padding-left:5px;}
h3#tim{ clear:right; }

h4 { font:bold 10pt Verdana,Arial,sans-serif; clear:both; padding-left:10px;}

a { color: blue; }
a.btt {float:right; clear:both;}

img.leader{ float:left; }
img.leader_pic{ float:left; margin-right:5px; border:2px solid blue;}
img#trans_img{ width:250px; }
p.student {clear:left; padding-top:5px;}

div#background_info{ float:left; clear:left; width:48%;}
div#definitions{ float:right; width:48%;}
div#dna_rep{ float:left; width:48%; }
div#mrna_trans{ float:right; width:48%; }

</STYLE>

<DIV ID="mainWLS">

<H1 style="text-align:center;">Spring 2004: Discovering Gene Structure</H1>

<!--
<table border="1" !cellpadding="0" !cellspacing="0"  bordercolor="#111111" !width="94%" id="AutoNumber1" !height="241" bordercolorlight="#FFFFFF" bordercolordark="#FFFFFF" bgcolor="#FFFFFF">
<tr>
<td width="20%" height="241"><p><a href="#def">Definitions</a><p>
<a href="#dna" >DNA Replication</a>
<p><a href="#pro" >Protein Synthesis</a>
<p><a href="#back" >Background of Bioinformatics</a>
<p><a href="#fin" >Finding the Problem</a>
<p><a href="#anno" >Annotation Process</a>
<p><a href="#ex" >Examples</a>
<p><a href="#kids" >Who Are We?</a><p>&nbsp;</td>
<td width="180%" height="241" bordercolor="#FFFFFF" bordercolorlight="#FFFFFF" bordercolordark="#FFFFFF" nowrap>
-->

<DIV id='abstract'>
<a name="a_abstract"></a>
<p id='abstract'><img class='leader' src='./watson.gif'>Hello and welcome to the world of bioinformatics! We are Xin Pan and Anna
Kurkalova, and we did an internship at ISU in Dr. Brendel's lab in the spring of 2004. Our 
internship included analyzing and annotating gene sequences as well as learning the biological 
processes of DNA transcription and working our way through research papers. Bioinformatics 
has a wide range of applications, but our internship was mainly focused on analyzing and 
correcting gene sequences. Together we analyzed about 300 annotations. This internship
provided us with a great opportunity to experience this exciting new field of genetics.
We would like to thank Dr. Volker Brendel, Adah Ackerman and ISU for making this internship 
possible. We would also like to thank Shannon Schlueter and Matthew Wilkerson for all of their help 
with the annotation processes.
</p>
</DIV>
<HR>

<DIV id='background_info'>
<a name="a_background"></a>
<H2>Background of Bioinformatics</H2>
<P id='background'>
<ul id='background_topics'>
  <li class='topic'>Bioinformatics defined
  <ul class='comment_list'>
    <li class='comment'>Modern bioinformatics broadly comprises three main disciplines:
    <ul>
      <li>biological science</li>
      <li>computer science</li>
      <li>applied statistics</li>
    </ul>
    <li class='comment'>Bioinformatics itself is defined as the use of computers to analyze
biological information. The most common form of bioinformatics is studying the vast amounts 
of DNA, RNA, and protein sequence that are now available. There are many other possible 
applications of computers in biology, such as simulating populations, analyzing experimental
gels and storing information about the phenotypes of mutant organisms.</li>
  </ul></li>

  <li class='topic'>General objectives
  <ul class='comment_list'>
    <li class='comment'>To be able to explain normal biological processes through 
an understanding of how gene sequences code for specific proteins</li>
    <li class='comment'>To further drug discoveries by analyzing the cause of malfunctions 
leading to a disease condition</li>
  </ul></li>

  <li class='topic'>General Principles
  <ul class='comment_list'>
    <li class='comment'>Molecular biology provides the information to be analyzed</li>
    <li class='comment'>Computer science supplies the tools and networks for managing,
analyzing, and storing this information</li>
    <li class='comment'>Applied statistics enables us to compare and evaluate the information
and the results of the analyses in which it is used.</li>
  </ul></li>

  <li class='topic'>History
  <ul class='hist_list'>
    <li class='chrono_event'>(1865) Gregor Mendel, "The Father of Genetics," presents the results of his study of 
genetic inheritance, which goes on to spur countless others and launches a new field of science.</li>
    <li class='chrono_event'>(1868) Friedrich Miescher discovers "nuclein" in the cell nucleus: an 
acidic substance that is rich in phosphate (PO4) and lacks sulfur (S), which is characteristic of protein. It is now known as nucleic acid.</li>
    <li class='chrono_event'>(1953) James Dewey Watson and Francis Harry Compton Crick propose the
double helix model for DNA based on x-ray diffraction data.</li>
    <li class='chrono_event'>(1953) Frederick Sanger, E. O. P. Thompson and Hans Tuppy complete the
determination of the amino acid sequence of the A and B chains of insulin.</li>
    <li class='chrono_event'>(1958) Francis Harry Compton Crick announces that information flows from
DNA to RNA to protein: "The Central Dogma of Molecular Biology".</li>
    <li class='chrono_event'>(1961) Sidney Brenner, François Jacob and Matthew Meselson identify
messenger RNA.</li>
    <li class='chrono_event'>(1990) The Human Genome Project gets underway.</li>
  </ul></li>

  <li class='topic'>Computer Languages
  <ul class='comment_list'>
    <li class='comment'>Computer languages supply the tools for organizing the vast amounts of data 
collected by researchers.
    <H3>Commonly used programming, markup, and scripting languages</H3>
    <ul class='comment_list'>
      <li class='comment'>HTML</li>
      <li class='comment'>XML</li>
      <li class='comment'>C/C++</li>
      <li class='comment'>Perl</li>
      <li class='comment'>Java</li>
      <li class='comment'>PHP</li>
    </ul></li>
  </ul></li>

  <li class='topic'>Databases
  <ul class='comment_list'>
    <li class='comment'>The first bioinformatics/biological databases were constructed a few years after 
the first protein sequences became available. A huge variety of data resources of different
types and sizes is now available, either in the public domain or, more recently, from commercial third parties. 
All of the original databases were organized in a very simple way, with data entries stored in flat files, 
either one file per entry or as a single large text file.</li>
  </ul></li>

  <li class='topic'>Tools
  <ul class='comment_list'>
    <li class='comment'>Concurrent with the development of databases, tools became available for searching sequence 
databases and for matching and aligning sequences.</li>
  </ul></li>
</ul>
</P>
</DIV>

<DIV id='definitions'>
<a name="a_definitions"></a>
<H2>Definitions</H2>
<P id='definitions'>
<ul id='def_list'>
<li class='def'>
  <span class='term'>Gene:</span>
  <span class='description'>Segment of DNA that controls the expression of a protein</span>
  <ul class='comment_list'>
    <li>We don't know how many genes there are</li>
    <li>Characteristics are usually created by many genes, not just one</li>
    <li>Genes interact with each other</li>
  </ul>
</li>
<li class='def'>
  <span class='term'>Genome:</span>
  <span class='description'>All the genetic material (DNA) of a particular species, including all of its genes</span>
</li>
<li class='def'>
  <span class='term'>Eugenics:</span>
  <span class='description'>A movement that has tried to control human evolution through selective breeding</span>
</li>
<li class='def'>
  <span class='term'>Exons:</span>
  <span class='description'>Coding segments of nucleic acid found in mature mRNA</span>
</li>
<li class='def'>
  <span class='term'>Introns:</span>
  <span class='description'>Segments of non-coding nucleic acid found in pre-mRNA and spliced out before the mRNA matures</span>
</li>
<li class='def'>
  <span class='term'>DNA:</span>
  <span class='description'>Deoxyribonucleic Acid is a nucleic acid that carries
the genetic information in the cell and is capable of self-replication and
synthesis of RNA. DNA consists of two long chains of nucleotides twisted into a
double helix and joined by hydrogen bonds between the complementary bases
adenine and thymine or cytosine and guanine. The sequence of nucleotides
determines individual hereditary characteristics.</span>
</li>
<li class='def'>
  <span class='term'>Codon:</span>
  <span class='description'>Three consecutive bases that code for an amino acid. There are 64 possible combinations but 
only 20 different amino acids, meaning that most amino acids are specified by 
more than one codon.</span>
</li>
<li class='def'>
  <span class='term'>Stop Codon:</span>
  <span class='description'>Three consecutive bases that signal the end of the chain of amino acids</span>
</li>
<li class='def'>
  <span class='term'>RNA:</span>
  <span class='description'>Ribonucleic Acid is a polymeric constituent of all 
living cells and many viruses, consisting of a long, usually single-stranded 
chain of alternating phosphate and ribose units with the bases adenine, guanine, 
cytosine, and uracil bonded to the ribose. The structure and base sequence of 
RNA are determinants of protein synthesis and the transmission of genetic 
information.</span>
  <H3>There are three main forms of RNA:</H3>
  <ul class='def_comment_list'>
  <li class='def'>
    <span class='term'>tRNA:</span>
    <span class='description'>Cloverleaf-shaped molecules, each of which brings one kind of amino acid to the codons. Each tRNA
carries an anticodon, which matches up with its complementary codon on the 
mRNA</span>
  </li>
  <li class='def'>
    <span class='term'>Messenger RNA (mRNA):</span>
    <span class='description'>RNA that is synthesized and processed in the nucleus and then 
exported to the cytoplasm for translation. mRNA is the single-stranded complement of the DNA template strand. The only
difference from DNA in its bases is that mRNA has uracil instead of thymine</span>
  </li>
  <li class='def'>
    <span class='term'>Ribosomal RNA (rRNA):</span>
    <span class='description'>RNA that is a permanent structural part of a ribosome.</span>
  </li>
  </ul>
</li>
<li class='def'>
  <span class='term'>Ribosome:</span>
  <span class='description'>An organelle which consists of RNA and proteins and is found either free in the cytoplasm or attached to the outside of 
the rough endoplasmic reticulum</span>
</li>
<li class='def'>
  <span class='term'>Polypeptide:</span>
  <span class='description'>A small protein containing many amino acids joined in a chain,
typically between 10 and 100.</span>
</li>
<li class='def'>
  <span class='term'>DNA polymerase:</span>
  <span class='description'>Any of various enzymes that function in the replication and repair
of DNA using single-stranded DNA as a template</span>
</li>
<li class='def'>
  <span class='term'>RNA polymerase:</span>
  <span class='description'>A polymerase that catalyzes the synthesis 
of a complementary strand of RNA from a DNA template, or, in some viruses, from
an RNA template.</span>
</li>
<li class='def'>
  <span class='term'>cDNA:</span>
  <span class='description'>Called complementary DNA, cDNAs are synthesized from an mRNA template
by the enzyme reverse transcriptase.</span>
</li>
<li class='def'>
  <span class='term'>ORF:</span>
  <span class='description'>Open Reading Frames. Reading frames where successive
nucleotide triplets can be read as codons specifying amino acids and where the
sequence of these triplets is not interrupted by stop codons.</span>
</li>
<li class='def'>
  <span class='term'>BLAST (Basic Local Alignment Search Tool):</span>
  <span class='description'>A set of similarity search programs that use a heuristic
algorithm to seek out local alignments and are designed to explore all of the
available sequence databases regardless of whether the query is protein or DNA.</span>
</li>
<li class='def'>
  <span class='term'>GenBank:</span>
  <span class='description'>A public database of nucleotide sequences,
each identified by an alphanumeric accession code.</span>
</li>
<li class='def'>
  <span class='term'>GeneSeqer:</span>
  <span class='description'>A method to identify potential exon/intron
structure in pre-mRNA by splice site prediction and spliced alignment.</span>
</li>
<li class='def'>
  <span class='term'>UCA:</span>
  <span class='description'>User Contributed Annotation</span>
</li>
<li class='def'>
  <span class='term'>Alternative splicing:</span>
  <span class='description'>The cutting and pasting of the primary mRNA transcript 
into various combinations of mature mRNA.</span>
</li>
<li class='def'>
  <span class='term'>GAEVAL:</span>
  <span class='description'>The Genome Annotation EVALuation project was created 
to assign quality scores to gene structure predictions and to note exceptional cases of
incongruence.</span>
</li>
</ul>
</P>
</DIV>
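<p>As a rough illustration of the codon arithmetic in the definitions above, the sketch below builds the
standard genetic code table and counts how many codons specify each amino acid. The variable names
(BASES, AMINO_ACIDS, CODON_TABLE) are our own choices for this example and are not part of any AtGDB tool.</p>

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
# "*" marks a stop codon. All names here are illustrative only.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 4 * 4 * 4 = 64 triplets with its one-letter amino acid code.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count codons per amino acid, then set the stop codons aside.
codon_counts = Counter(CODON_TABLE.values())
stop_codon_count = codon_counts.pop("*")
single_codon = sorted(aa for aa, n in codon_counts.items() if n == 1)

print(len(CODON_TABLE), "codons in total")
print(len(codon_counts), "amino acids")
print(stop_codon_count, "stop codons")
print("amino acids with only one codon:", single_codon)
```

<p>Most amino acids have several codons, but methionine (M) and tryptophan (W) have only one each.</p>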
<a class='btt' href="#top">Back to Top</a>

<DIV id='dna_rep'>
<a name="a_dna_rep"></a>
<H2>DNA Replication</H2>
<P id='dna_rep'>
<img class='leader' src='./replication.jpg'>
<ol id='dna_rep_steps'>
<li class='step'>DNA uncoils and "unzips"</li>
<li class='step'>DNA polymerase then reads the "unzipped" strands of DNA and produces
a reverse complement which is attached to the single strand of original DNA. The 
reverse complements are shown in green.</li>
</ol>
</P>
</DIV>
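<p>The reverse complement mentioned in step 2 can be sketched in a few lines. This is a minimal example
for plain A/C/G/T sequences; the function name and the test sequence are made up for illustration.</p>

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    # Read the strand backwards and swap each base for its partner.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


original = "ATGGCC"  # hypothetical sequence
print(original, "->", reverse_complement(original))
```

<p>Applying the function twice gives back the original strand, which is a quick way to check it.</p>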

<DIV id='mrna_trans'>
<a name="a_mrna_trans"></a>
<H2>mRNA Transcription</H2>
<P id='mrna_trans'>
<img id='trans_img' class='leader' src='./transcription.gif'>
<ol id='mrna_trans_steps'>
  <li class='step'>Transcription occurs in the nucleus
  <H3>There are 3 stages of transcription</H3>
  <ul id='trans_stages'>
    <li class='step'>Initiation</li>
    <li class='step'>Elongation</li>
    <li class='step'>Termination</li>
  </ul>
  </li>
  <li class='step'>RNA Processing (maturation) "edits" the pre-mRNA by splicing
out the intronic sequence from the pre-mRNA transcript. A methylguanosine (MG) 
cap is added to the 5' end, and a poly-A tail is added to the 3' terminus to prevent
premature degradation. Once the mRNA is fully matured it leaves the nucleus and becomes 
vulnerable to thousands of enzymes in the cytoplasm.</li>
</ol>
<H3 id='tim'>DEMO: Transcription in motion</H3>
<img id='trans' src='./transcription_mov.gif'>
</P>
</DIV>
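<p>Transcription and splicing, as described above, can be sketched on a toy gene. The sequences, the
function names, and the 0-based, end-exclusive exon coordinates are all assumptions made for this example;
capping and the poly-A tail are left out.</p>

```python
def transcribe(coding_dna):
    """Write the coding (sense) strand as RNA by replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def splice(pre_mrna, exons):
    """Join the exon segments, dropping the intronic sequence between them.

    Each exon is a (start, end) pair, 0-based and end-exclusive (an assumption).
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy gene: two exons separated by one intron (hypothetical sequences).
EXON_1 = "ATGGCC"
INTRON = "GTAAGTTTCAG"
EXON_2 = "GAATAA"
gene = EXON_1 + INTRON + EXON_2
exons = [(0, len(EXON_1)), (len(EXON_1) + len(INTRON), len(gene))]

pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, exons)
print("pre-mRNA:   ", pre_mrna)
print("mature mRNA:", mature_mrna)
```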

<DIV id='prot_syn'>
<a name="a_protein_syn"></a>
<H2>Protein Synthesis (a.k.a. mRNA Translation)</H2>
<P id='prot_syn'>
<img class='leader' src='./expression.gif'>
<ul id='protein_syn'>
  <li class='step'>Translation occurs via the ribosome where mRNA is "read" and 
polypeptides are formed</li>
  <li class='step'>The ribosome travels 5' to 3' on the single stranded mRNA 
helping to generate the protein polypeptide from amino(N)-terminus to 
carboxy(C)-terminus.</li>
  <li class='step'>Translation occurs in such a way that multiple ribosomes
can read the same strand of mRNA at once thus generating multiple copies of the
encoded polypeptide in a short time.</li>
  <li class='step'>Polypeptides produced at the ribosome are often routed to 
the Golgi Apparatus, which is the "post office" of the cell, seeing to their proper
delivery.</li>
</ul>
</P>
</DIV>
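<p>Reading an mRNA 5' to 3' one codon at a time can be sketched as follows. This is a simplified model
that starts at the first AUG and stops at the first stop codon; the function name and the example mRNA
are hypothetical.</p>

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first stop codon, reading 5' to 3'."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon ends the chain
        peptide.append(amino_acid)
    # The first residue is the amino (N) terminus, the last the carboxy (C) terminus.
    return "".join(peptide)


print(translate("GGAUGGCCGAAUAAGG"))
```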


<a class='btt' href="#top">Back to Top</a>


<a name='internship'></a>
<H2>The Project</H2>
<H3>Defining the problem</H3>
<p>We'll start with the easiest and most frequently observed case, which is
when there are one or more full-length cDNAs. Full-length cDNAs are those that
are experimentally derived such that they should capture the entire span of 
their mRNA precursor. Therefore these sequences should be as long as or longer 
than their predicted gene model annotation. We are interested in any differences
between the alignment of these sequences and the predicted gene model.

<ul id='example_problems'>
  <li class='problem_case'>If just a few exons on the original annotation do not agree 
with those on the cDNA, the cDNA is almost always correct.</li>
  <li class='problem_case'>If there is a cDNA that spans the length of two or more 
original annotations, then it most likely means that the two or more annotations 
need to be joined. This can be verified by prediction of an ORF.</li>
  <li class='problem_case'>As well, there may be one or more short cDNAs 
that do not span the length of the original annotation; this means that there is 
a chance that the original annotation needs to be split. However, be sure to
check for ORFs once more. If the ORF for the original annotation is longer than 
your corrected annotation, this may be an exceptional case which needs further attention.</li>
</ul>
</p>
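<p>The first two cases above amount to comparing coordinates. The sketch below shows one way this could
be written; it is not how AtGDB does it, and the function names and all coordinates are invented for
illustration. Exons and spans are (start, end) pairs on the genome.</p>

```python
def overlaps(a, b):
    """True when two (start, end) intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def differing_exons(model_exons, cdna_exons):
    """Exons supported by the cDNA alignment but absent from the gene model."""
    return sorted(set(cdna_exons) - set(model_exons))


def models_to_join(cdna_span, model_spans):
    """Gene models overlapped by one cDNA; two or more suggests a join."""
    return [span for span in model_spans if overlaps(cdna_span, span)]


# Case 1: one exon boundary disagrees (hypothetical coordinates).
model_exons = [(100, 200), (300, 400), (500, 650)]
cdna_exons = [(100, 200), (300, 420), (500, 650)]
print(differing_exons(model_exons, cdna_exons))

# Case 2: one cDNA covers two neighboring gene models.
print(models_to_join((100, 900), [(100, 400), (450, 900), (1000, 1200)]))
```

<p>Either result is only a hint; the ORF check described above is still needed before submitting a change.</p>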

<a name="a_annotation_process"></a>
<H3>Processing our Annotations</H3>
<H4>Once a problem is found:</H4>
<p id='annotation_process'>
<ol id='annotation_steps'>
  <li class='annot_step'>Make sure you are using the most recent genome assembly version. This
version information can be found in a pull-down menu at the top of every AtGDB page.</li>
  <li class='annot_step'>Now click on Provide Expert Annotation.</li>
  <li class='annot_step'>If you are not yet a registered user, click on Register HERE! Otherwise, 
log in and continue.</li>
  <li class='annot_step'>Type in a LOCUS ID (generally these begin with UCA- followed by the 
sister gene model ID, e.g. UCA-At2g23500).</li>
  <li class='annot_step'>Now click on the exons (the thick blue lines) that you believe to be
part of an accurate gene structure description to add them individually to your gene structure.
Click on the mRNA gi number in order to add the whole series of exons predicted by its alignment 
to your UCA structure.</li>
  <li class='annot_step'>You can now verify your UCA structure by checking for an
open reading frame. Do this by clicking on the ORF Finder button. If one open reading 
frame is noticeably longer, select it. If there is no obvious case, use BLAST 
(see tutorial on www.plantbdb.org/AtGDB) to determine the proper ORF.</li>
  <li class='annot_step'>Write a brief description commenting on your reasons for submitting this
altered annotation. Here is a template: <i>This annotation corrects the current annotation
for gene model ***. This modifies *** by doing ***. It appears that this error was caused
by ***. These changes are supported by ***.</i></li>
  <li class='annot_step'>Once this is done, click SUBMIT to finish.
Once an AtGDB curator has seen and accepted your UCA you will be notified of its public 
availability.</li>
</ol>
</p>
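<p>The idea behind checking for the longest open reading frame can be sketched as below. This is not the
code behind the ORF Finder button; it is a simplified version that scans only the three forward frames
of a spliced sequence, and the function name and test sequence are made up.</p>

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna):
    """Return the longest ATG-to-stop reading frame on the forward strand."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


print(longest_orf("GGATGGCCGAATAAGG"))
```

<p>An ORF that is still open at the end of the sequence is ignored here, which is one reason a real tool or
a BLAST comparison is needed for unclear cases.</p>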

<a name="a_examples"></a>
<H3>Example User Contributed Annotations</H3>
<p id='examples'>Shown below are examples of annotations we've corrected as part of our internship 
experience.
<H4>Incorrect / Unannotated Exon:</H4>
<img class='example_img' src="Exampl3.gif">
<ul>
  <li>If you look at exon 10 on the cDNA (light blue), you will see that the gene model annotation
(dark blue) shows this region to be intronic. When counting exons you start from the end with the 
green flag and count to the end with the red flag. In this case, just submit the coordinates for 
the cDNA.</li>
</ul>

<H4>Gene Model Needs to Be Split:</H4>
<img class='example_img' src="Exampl4.gif">
<ul>
  <li>By looking at the cDNAs (light blue) and ESTs (red), you can see that there is a definite 
gap. In cases like this, where the gap is very clear-cut, the gene model most likely should 
be split. However, still make sure to check the open reading frame for verification.</li>
</ul>

<H4>Gene Models Need to Be Combined:</H4>
<img class='example_img' src="Exampl5.gif">
<ul>
  <li>If there is a cDNA that spans the length of two or more original annotations, then it most 
likely means that the two or more annotations need to be joined. Verify this by predicting an ORF. The green strip 
shown in the example is a user contributed annotation.</li>
</ul>

<H4>Ambiguous Boundary:</H4>
<img class='example_img' src="Exampl6.gif">
<ul>
  <li>Annotations like this represent possible errors in gene structure
  determination caused by the automated gene structure annotation routines used 
  for <i>Arabidopsis</i> genome annotation. Specifically, this situation occurs
  when an EST or cDNA is aligned such that it may belong to either of two 
  overlapping annotations.</li>
</ul>

<H4>Alternative Splicing:</H4>
<img class='example_img' src="Exampl7.gif">
<ul>
  <li>As you can see from the cDNAs, there are two possible
  annotations for this mRNA. One possible annotation has 12 exons while
  the other has only 11 (exons 2 and 3 are combined). This is an example of
  alternative splicing, and both annotations should be submitted.</li>
</ul>
</p>
<a class='btt' href="#top">Back to Top</a>

<a name="a_interns"></a>
<H2>Interns: Spring 2004</H2>

<p class='student'><img class='leader_pic' src="./Anna_pic.jpg" alt='Annas photo'>
<b>Anna Kurkalova</b> is currently a junior at Ames High School. Anna
enjoys an academic challenge; she is currently taking AP Physics, Honors
American Literature, Pre-Calculus, French III, 2-D Art, and Western
Civilization. This summer, Anna will be participating in another
internship in which she will work 40 hours a week. Next Fall Anna will be
taking two ISU classes, Introduction to Design and Drawing I. Anna also
participates avidly in extra-curricular activities including Key Club, S.H.E.F.
(Students Helping to Eliminate Hunger), Dance, and Fashion Show. On
weekends, she enjoys ignoring homework, relaxing, sleeping and hanging out with
friends.</p>

<p class='student'><img class='leader_pic' src="./Xin_pic.jpg" alt='Xins photo'>
<b>Xin Pan</b> is currently a sophomore at Ames High School. Xin 
really enjoys an academic challenge; he is currently taking AP Physics, AP 
Calculus, AP U.S. History, Honors English 10, Spanish II, and Orchestra. 
This summer, Xin plans on taking Physics 221, Differential Equations, and 
attending HOBY. Next Fall Xin will be taking Calculus III and Physics
222. Outside of school, Xin participates in Science Olympiad, Math League,
GPML, and ARML. On weekends, Xin enjoys playing football, basketball,
tennis, and hanging out with friends. </p>

<a class='btt' href="#top">Back to Top</a>

</DIV>

<?php
require('SSI_GDBprep.php');
// Emit the shared page footer through an Apache sub-request.
virtual("{$CGIPATH}SSI_GDBgui.pl/STANDARD_FOOTER/" . $SSI_QUERYSTRING);
?>






eXtensible Genome Data Broker

The xGDB project provides scientists with an online portal for the integration of diverse sources of genomic data. Portals allow researchers to effectively target a specific scientific question by customizing their interactions with available data.

Project homepage: http://sourceforge.net/projects/xgdb
Programming language(s): JavaScript,Perl,PHP
License: other
