NYU Langone Medical Center

Molecular Biology Resources

Molecular Biology Resources on the Web
Courtesy of Stuart Brown, PhD, Dept of Cell Biology, NYU School of Medicine

Dr. Brown’s top 12 most valuable Bioinformatics websites:

NCBI (GenBank, BLAST, PubMed, etc.)
UCSC Genome Browser


MEGA phylogenetics

PFAM (Protein Families Database)

Pasteur Inst. (web interface to EMBOSS; other command line programs)

PDB (Protein 3D structure database)

Sequence Databases and Retrieval Tools

NCBI: National Center for Biotechnology Information 
The number one resource for molecular biologists. Provides GenBank and free BLAST and ENTREZ searches via e-mail, client software, or directly over the Web.

BLAST: NCBI Basic Local Alignment Search Tool 
This is the premier Web engine for DNA and protein homology searches.

ENTREZ: NCBI sequence database browser
Entrez is a molecular sequence and document retrieval system, which contains an integrated view of portions of MEDLINE, and all publicly available nucleotide and protein databases. The Protein and Nucleotide entries in Entrez have been compiled from a variety of sources, including GenBank, EMBL, DDBJ, PIR, SWISS-PROT, PRF, and PDB. Entrez is extremely useful for obtaining cross-referenced documentation for a particular sequence once you know its database accession number.
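
As an illustration of retrieval by accession number, the following minimal sketch builds a request URL for NCBI's E-utilities EFetch service, the programmatic interface to Entrez. The page above does not describe this interface, so treat the details as assumptions: the function name efetch_url is hypothetical, U00096 is only an example accession, and the database and format values should be checked against the current NCBI documentation. The sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the EFetch utility (part of NCBI E-utilities).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build an EFetch URL that requests one record by accession number.

    efetch_url is a hypothetical helper name; the query parameters
    (db, id, rettype, retmode) are the ones EFetch documents.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EFETCH_URL + "?" + urlencode(params)


# Example accession chosen for illustration only.
url = efetch_url("U00096", rettype="fasta")
print(url)
```

Opening the printed URL in a browser, or passing it to any HTTP client, should return the record as plain text, subject to NCBI's usage guidelines.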

EMBL: European Molecular Biology Laboratory (European equivalent of NCBI)

DDBJ: DNA Data Bank of Japan, Center for Information Biology, Japanese National Institute of Genetics 

Mirrors of GenBank/EMBL databases as well as local databases including Genome Information Broker for Microbial Genomes (GIB), Protein Data Bank (PDB), and a Unified taxonomy database (TXSearch). Provides online tools for FASTA, SSEARCH and BLAST searches, Multiple Alignment using "MALIGN" and "CLUSTAL W", protein secondary structure prediction, and protein 3D structure analysis by threading (LIBRA).

OWL: Non-redundant protein database 

OWL is a non-redundant superset of SWISS-PROT, PIR, GenPept, and NRL-3D. Entries are amalgamated from primary source databases by a process in which redundant and trivially different entries are eliminated.
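
The idea of amalgamating several sources while dropping redundant entries can be sketched as follows. This is not OWL's actual procedure, which is not described here. It is a simplified illustration that treats two entries as redundant only when their sequences are identical after removing whitespace and case differences, and that gives priority to whichever source is listed first. The function names, entry identifiers, and sequences are all made up for the example.

```python
def normalize(seq):
    """Upper-case a sequence and drop whitespace so formatting differences do not count."""
    return "".join(seq.split()).upper()


def merge_non_redundant(*sources):
    """Merge (entry_id, sequence) records from several sources.

    The first entry seen for each distinct normalized sequence is kept,
    so sources listed earlier take priority over later ones.
    """
    kept = {}
    for source in sources:
        for entry_id, seq in source:
            kept.setdefault(normalize(seq), entry_id)
    return [(entry_id, seq) for seq, entry_id in kept.items()]


# Hypothetical toy records standing in for two source databases.
db_a = [("A1", "MKTAYIAK"), ("A2", "MSLLTEVE")]
db_b = [("B1", "mktay iak"), ("B2", "MADEEKLP")]

merged = merge_non_redundant(db_a, db_b)
print(merged)
```

Here entry B1 is dropped because it duplicates A1 once case and spacing are ignored. A real non-redundant database would also need a rule for near-identical sequences, which this sketch does not attempt.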

PDB: Protein Data Bank hosted by the Research Collaboratory for Structural Bioinformatics 

The Protein Data Bank is an archive of experimentally determined three-dimensional structures of biological macromolecules, serving a global community of researchers, educators, and students.


GEO: Gene Expression Omnibus 
GEO is a gene expression and hybridization array data repository as well as an online resource for the retrieval of gene expression data from any organism or artificial source.

A public repository for microarray-based gene expression data at the European Bioinformatics Institute. Currently the EBI is establishing a pilot database containing the microarray gene expression data that are publicly available. An Expression Profiler set of tools is in development to facilitate the analysis and clustering of gene expression and sequence data, which may help in the discovery of sequence pattern profiles in the regulatory regions of co-expressed genes.

ExpressDB: a relational database of yeast RNA expression data
As of July 1999, ExpressDB contains 17.5 million pieces of information loaded from 11 yeast gene expression studies. To assist with interpretation, extracts of current Saccharomyces Genome Database (SGD) gene name and description data are linked with their corresponding ORFs. ExpressDB also contains 207 functional groupings of yeast ORFs derived from the MIPS database.

KEGG: Kyoto Encyclopedia of Genes and Genomes
The primary objective of KEGG is to computerize the current knowledge of molecular interactions; namely, metabolic pathways, regulatory pathways, and molecular assemblies. KEGG maintains gene catalogs for all the organisms that have been sequenced and links each known protein to a component on the pathway. KEGG also organizes a database of all chemical compounds in living cells and links each compound to one or more pathways.

Human Genome

Ensembl: Human Genome Browser
Ensembl provides automatic annotation of human genome data. Ensembl takes raw DNA sequence contigs from the public Human Genome Project and runs a number of computer programs to determine the annotation of genes, transcripts (ESTs), introns and exons, mapped STSs, etc. The results are stored in a relational database and are accessible via a Web-based Genome Browser.

Celera Genomics
Private Human Genome database with limited free access. 

GDB: The Human Genome Database
The Genome Database (GDB) stores and curates human genomic mapping data submitted by researchers worldwide and provides this information electronically to the scientific community.

Protein Pattern and Structural Analysis

PROSITE: Dictionary of protein sites and patterns
PROSITE is Dr. Amos Bairoch's meticulously annotated database of biologically significant protein sites, patterns and profiles that help to identify to which known family of protein (if any) a new sequence belongs. This server allows only text searches of the database.
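
To show how a PROSITE-style pattern can be used to test a new sequence, here is a minimal sketch that translates the pattern notation into a Python regular expression. It assumes the usual notation (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repeats, and "<" or ">" as terminal anchors) and ignores rarer cases such as anchors inside brackets. The function name prosite_to_regex and the test sequence are made up. The pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":               # any residue
            core = "."
        elif core.startswith("{"):    # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "MKNGSAQNPTL"  # made-up test sequence
hits = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]
print(regex, hits)
```

Positions are 0-based, and re.finditer reports only non-overlapping matches, which is a simplification compared with a dedicated pattern scanner.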

PRINTS: Protein Motif Fingerprint Database
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family. Usually the motifs do not overlap, but are separated along a sequence. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs. The database thus provides a useful adjunct to PROSITE. This server provides both sequence similarity and text-based database searches. An interactive multiple sequence alignment editor (known as CINEMA) is also available.

Pratt: a protein pattern discovery tool
Pratt is a tool that allows the user to search for patterns conserved in sets of unaligned protein sequences. The user can specify what kind of patterns should be searched for, and how many sequences should match a pattern to be reported.
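
The "how many sequences should match" setting can be illustrated with a much simpler stand-in for pattern discovery. The sketch below is not Pratt's algorithm. It only reports fixed-length substrings that occur in at least a chosen number of unaligned sequences, whereas Pratt searches for flexible patterns with wildcards and ambiguous positions. The function name conserved_kmers, its parameters, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def conserved_kmers(sequences, k, min_seqs):
    """Return length-k substrings that occur in at least min_seqs of the sequences."""
    seen_in = defaultdict(set)
    for index, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            # Record which sequence each substring came from, counting
            # a sequence only once even if the substring repeats in it.
            seen_in[seq[start:start + k]].add(index)
    return sorted(kmer for kmer, hits in seen_in.items() if len(hits) >= min_seqs)


# Made-up unaligned sequences for illustration.
seqs = ["MKCGHCA", "TTCGHCL", "PCGHQQ"]

in_all = conserved_kmers(seqs, k=3, min_seqs=3)
in_two = conserved_kmers(seqs, k=3, min_seqs=2)
print(in_all, in_two)
```

Lowering min_seqs widens the result: the substring CGH is found in all three toy sequences, while GHC appears in only two of them.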

SBASE: a collection of annotated protein domain sequences
Offers web-based BLAST searching of protein domains and cross-references to the other major protein databases.

DNA Pattern Analysis

Eukaryotic Promoter Database
The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally.

SIGNAL SCAN finds homologies of published signal sequences to your sequence; most of these are transcriptional elements (from Advanced Biosciences Computing Center, University of Minnesota).
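
Scanning a DNA sequence for a known signal can be sketched with a consensus written in IUPAC nucleotide codes. This is only an illustration of the general idea, not the method SIGNAL SCAN itself uses. The function name find_signal and the test sequence are made up, and TATAWAW (W = A or T) is used as a commonly cited TATA box consensus.

```python
import re

# IUPAC nucleotide codes mapped to regular expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_signal(consensus, dna):
    """Return (0-based position, matched text) for each occurrence of the consensus."""
    regex = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead lets overlapping occurrences be reported too.
    return [
        (m.start(), m.group(1))
        for m in re.finditer("(?=(" + regex + "))", dna.upper())
    ]


# Made-up test sequence containing one TATA-like element.
matches = find_signal("TATAWAW", "GGCTATAAAAGGC")
print(matches)
```

The sketch searches one strand only. A fuller scan would also search the reverse complement and would usually allow some mismatches against the consensus.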

Other Interesting Bioinformatics Resources

Bioinformatics: services offered by the EMBL at Heidelberg, Germany 

Bioinformatics: services offered by the EMBL at European Bioinformatics Institute, Cambridge, UK

The ExPASy Proteomics Server is dedicated to molecular biology with an emphasis on data relevant to proteins. It allows you to browse through a number of databases produced in Geneva, such as SWISS-PROT, PROSITE, SWISS-2DPAGE, SWISS-3DIMAGE and SeqAnalRef. It also allows access to various sequence analysis tools. (From University of Geneva, Switzerland).

GCG: The Genetics Computer Group 
GCG is the home of the Wisconsin Sequence Analysis Package, the most comprehensive suite of DNA and protein sequence analysis tools available, and the core software offered by the RCR. The GCG web site offers the company newsletter, advertisements for GCG products, and some links to other biocomputing sites that offer useful information such as online documentation and tutorials for the GCG software.

Sequence Analysis Tools at ExPASy: The Molecular Biology server at the University of Geneva, Switzerland.

Rockefeller University Computing Services: List of DNA and Protein analysis links.

Yeast Genome references and links

Biological Data Transport Inc.: A commercial site (funded by vendors whose products are featured) that contains many useful links and embedded mini-search engines for specific databases.

Harvard Mol and Cell Biology: This provides an excellent collection of links in the areas of Biochemistry and Molecular Biology, Biomolecular and Biochemical Databases (Sequences, Structure, etc), Educational Resources, Evolution, Immunology, Jobs (Biology-related), Online Biological Journals and Articles, and Zebrafish Links (T.-T. S)