
Chemistry 410: Chemical Structures and Properties


Spectral Databases

Spectral Database for Organic Compounds, SDBS
"SDBS is an integrated spectral database system for organic compounds, which includes 6 different types of spectra under a directory of the compounds. The six spectra are as follows, an electron impact Mass spectrum (EI-MS), a Fourier transform infrared spectrum (FT-IR), a 1H nuclear magnetic resonance (NMR) spectrum, a 13C NMR spectrum, a laser Raman spectrum, and an electron spin resonance (ESR) spectrum."
From the source

Advanced Mass Spectral Database
"mzCloud tries to address the identification bottleneck by considering all mass spectrometrically relevant aspects, by looking at a number of experimental and computational details and, in some cases, allowing the identification of unknowns even if they are not present in the library. Please see the full list of mzCloud features."
From the source

DNA

GenBank

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2013 Jan;41(D1):D36-42). GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Nucleotide Archive (ENA), and GenBank at NCBI. These three organizations exchange data on a daily basis.

Nucleic Acids Research Database List
"Nucleic Acids Research (NAR) publishes the results of leading edge research into physical, chemical, biochemical and biological aspects of nucleic acids and proteins involved in nucleic acid metabolism and/or interactions. It enables the rapid publication of papers under the following categories: Chemistry and synthetic biology; Computational biology; Gene regulation, chromatin and epigenetics; Genome integrity, repair and replication; Genomics; Molecular biology; Nucleic acid enzymes; RNA and Structural biology."
From the source

Enzymes

BRENDA: Comprehensive Enzyme Information System
"BRENDA is the main collection of enzyme functional data available to the scientific community. It is available free of charge via the internet (www.brenda-enzymes.org) and as an in-house database for commercial users (requests to our distributor geneXplain)."

From the source

ENZYME

"The ENZYME database is a repository of information relative to the nomenclature of enzymes. It is primarily based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB), and it contains the following data for each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided:

  • EC number
  • Recommended name
  • Alternative names (if any)
  • Catalytic activity
  • Pointers to the UniProtKB/Swiss-Prot protein sequence entrie(s) that correspond to the enzyme (if any)
  • Pointers to human disease(s) associated with a deficiency of the enzyme (if any)"

From the source

Drug-Oriented

DrugBank Online

DrugBank Online is a comprehensive, free-to-access, online database containing information on drugs and drug targets. As both a bioinformatics and a cheminformatics resource, we combine detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. DrugBank Online is widely used by the drug industry, medicinal chemists, pharmacists, physicians, students and the general public. Because of its broad scope, comprehensive referencing, and detailed data descriptions, DrugBank is enabling major advancements across the data-driven medicine industry.
From the site.

ChEMBL 

"ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs." 
From the site

Major Open Databases

PubChem 

"PubChem is the world's largest collection of freely accessible chemical information. Search chemicals by name, molecular formula, structure, and other identifiers. Find chemical and physical properties, biological activities, safety and toxicity information, patents, literature citations and more."
From the source.
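
As a rough illustration of searching by name programmatically, the sketch below builds a lookup URL for PubChem's PUG REST web service. The helper names (`pubchem_property_url`, `fetch_properties`) and the choice of properties are assumptions made for this example, not part of the PubChem description above; check the current PUG REST documentation before relying on the URL layout.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the PUG REST service (assumed stable; verify before use).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a URL that looks up a compound by name and requests some properties."""
    encoded_name = quote(name, safe="")  # escape spaces, slashes, etc.
    return (
        f"{PUG_REST_BASE}/compound/name/{encoded_name}"
        f"/property/{','.join(properties)}/JSON"
    )


def fetch_properties(name):
    """Download and decode the property table (requires network access)."""
    with urlopen(pubchem_property_url(name), timeout=30) as response:
        return json.load(response)["PropertyTable"]["Properties"]


if __name__ == "__main__":
    # Only prints the URL; call fetch_properties("aspirin") to query the service.
    print(pubchem_property_url("aspirin"))
```

Keeping URL construction separate from the network call makes it easy to check the request before sending it.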

ChemSpider

"ChemSpider is a free chemical structure database providing fast access to over 100 million structures, properties, and associated information. By integrating and linking compounds from hundreds of high-quality data sources, ChemSpider enables researchers to discover the most comprehensive view of freely available chemical data from a single online search. It is owned by the Royal Society of Chemistry."
From the source

NIST Chemistry WebBook

"This site provides thermochemical, thermophysical, and ion energetics data compiled by NIST under the Standard Reference Data Program.

The National Institute of Standards and Technology (NIST) uses its best efforts to deliver a high quality copy of the Database and to verify that the data contained therein have been selected on the basis of sound scientific judgment. However, NIST makes no warranties to that effect, and NIST shall not be liable for any damage that may result from errors or omissions in the Database.

NIST is an agency of the U.S. Department of Commerce."
From the source

iScienceSearch

iScienceSearch is a federated search service that retrieves chemical compound information from a wide variety of databases (sources) on the Internet. It is a two-step process. If you start with a structure or an identifier (synonym, CAS #, AKos number, SMILES, etc.), iScienceSearch first tries to find alternative synonyms and identifiers in two sources. When you click "SEARCH", the service starts searches in parallel in the background, sometimes several hundred of them. In spite of this, you get answers quickly, and you don't have to wait until all the results are presented.

As a result, you get correct answers from sources that can only be searched by CAS # even if you started with a name, and you get correct answers if you started with "Tamiflu" and the hit contains only the name "Oseltamivir". This is called the "extended search".

Search by name, CAS number, AKos number, InChI, or any identifier.

From the source
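
The two-step process described for iScienceSearch can be sketched as follows. This is a minimal illustration only, not the service's actual implementation: the synonym table, the source names, and the record labels are all hypothetical placeholders, and the real service queries remote databases rather than in-memory dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical synonym table standing in for the identifier-lookup step.
SYNONYMS = {
    "tamiflu": {"Tamiflu", "Oseltamivir"},
    "oseltamivir": {"Tamiflu", "Oseltamivir"},
}

# Hypothetical sources: each one only knows a compound under a single identifier.
SOURCES = {
    "source_a": {"Oseltamivir": "record A-17"},
    "source_b": {"Tamiflu": "record B-03"},
    "source_c": {"Aspirin": "record C-99"},
}


def expand_identifiers(query):
    """Step 1: collect alternative identifiers for the query."""
    return SYNONYMS.get(query.lower(), {query})


def search_source(source_name, identifiers):
    """Look up every known identifier in one source."""
    index = SOURCES[source_name]
    return source_name, [index[i] for i in sorted(identifiers) if i in index]


def extended_search(query):
    """Step 2: query all sources in parallel with the expanded identifiers."""
    identifiers = expand_identifiers(query)
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, identifiers) for name in SOURCES]
        # Results are collected as each source finishes, not in submission order.
        for future in as_completed(futures):
            name, hits = future.result()
            if hits:
                results[name] = hits
    return results


if __name__ == "__main__":
    print(extended_search("Tamiflu"))
```

Because the query is expanded before the sources are searched, a search for "Tamiflu" also finds the hypothetical source that lists the compound only as "Oseltamivir".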

 

Chemical Synthesis Database

"ChemSynthesis is a freely accessible database of chemicals. This website contains substances with their synthesis references and physical properties such as melting point, boiling point and density. There are currently more than 40,000 compounds and more than 45,000 synthesis references in the database."
From the database

Handbooks

MatWeb Material Property Data

MatWeb's searchable database of material properties includes data sheets of thermoplastic and thermoset polymers such as ABS, nylon, polycarbonate, polyester, polyethylene and polypropylene; metals such as aluminum, cobalt, copper, lead, magnesium, nickel, steel, superalloys, titanium and zinc alloys; ceramics; plus semiconductors, fibers, and other engineering materials.
Free basic membership required
From the source


Proteins

AlphaFold Protein Structure Database

AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment.

DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have partnered to create AlphaFold DB to make these predictions freely available to the scientific community. The latest database release contains over 200 million entries, providing broad coverage of UniProt (the standard repository of protein sequences and annotations).

From the source

UniProt 

"The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host institutions EMBL-EBI, SIB and PIR are committed to the long-term preservation of the UniProt databases."
From the source

Biological Macromolecule Crystallization Database (version 7.0)

"The BMCD stores information on protein and nucleic acid crystals that have been reported in the literature or deposited in the Protein Data Bank. Crystal growth conditions have been parsed into separate chemicals with numerical concentrations to facilitate data mining. The mission of the BMCD is to enable the discovery of relations among protein properties, crystal conditions, and crystal behavior, in order to facilitate the design of crystal screening strategies for the determination of new structures."
From the source

 

Protein Data Bank (PDB)
"RCSB PDB (RCSB.org) is the US data center for the global Protein Data Bank (PDB) archive of 3D structure data for large biological molecules (proteins, DNA, and RNA) essential for research and education in fundamental biology, health, energy, and biotechnology.

The Protein Data Bank (PDB) was established as the 1st open access digital data resource in all of biology and medicine (Historical Timeline). It is today a leading global resource for experimental data central to scientific discovery."

From the source

 

"Understanding how genetics affects the health of humans, plants and animals is essential to advances in disease prevention, food security and biodiversity.

We develop databases, tools and software that make it possible to align, verify and visualise the diverse data produced in publicly-funded research, and make that information freely available to all."
From the Source

Swiss-PdbViewer (DeepView)

"Swiss-PdbViewer (aka DeepView) is an application that provides a user-friendly interface allowing to analyze several proteins at the same time. The proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts. Amino acid mutations, H-bonds, angles and distances between atoms are easy to obtain thanks to the intuitive graphic and menu interface.

Swiss-PdbViewer (aka DeepView) has been developed since 1994 by Nicolas Guex. Swiss-PdbViewer was initially tightly linked to SWISS-MODEL, an automated homology modeling server developed within the Swiss Institute of Bioinformatics (SIB) at the Structural Bioinformatics Group at the Biozentrum in Basel. However, the SWISS-MODEL web interface evolved to a point where it is now possible to use it directly for advanced modeling. Maintaining a direct interface with Swiss-PdbViewer is too complex and no longer supported."

From the Source

National Center for Biotechnology Information (NCBI)
"As a national resource for molecular biology information, NCBI's mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. More specifically, the NCBI has been charged with creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics; facilitating the use of such databases and software by the research and medical community; coordinating efforts to gather biotechnology information both nationally and internationally; and performing research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules."
From the source

STITCH
STITCH is a database of known and predicted interactions between chemicals and proteins. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases.


Interactions in STITCH are derived from five main sources:

  • Genomic Context Predictions
  • High-throughput Lab Experiments
  • (Conserved) Co-Expression
  • Automated Textmining
  • Previous Knowledge in Databases

The STITCH database currently covers 9,643,763 proteins from 2,031 organisms.