Small molecules are annotated as known cofactors, porphyrin-ring-like compounds, metal-organic compounds or other organic compounds. In complex structures that contain both viral and nonviral components, the two are distinguished using the NCBI Taxonomy ID from the metadata of the mmCIF file header, as described in Section 2.1. A nonviral component in a complex structure can be a binding partner of the viral protein from its cellular host, an immunoglobulin or a nanobody. Nonviral PDB chains in a viral protein complex that have greater than 30% sequence similarity to a known B-cell antibody, nanobody or T-cell receptor sequence are indexed in a data set of antibodies. This set of antibodies is used to select the viral proteins or glycoproteins within the Neighborhood database and thereby to identify all antibody–antigen interfaces. Epitopes interacting with a B-cell antibody or nanobody are annotated as B-cell epitopes, while epitopes interacting with T-cell receptors are annotated as T-cell epitopes.
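The chain-level decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the virusMED implementation: the `Chain` record, its `best_similarity` and `best_hit_type` fields, and the label strings are hypothetical, and the similarity search against reference antibody, nanobody and T-cell receptor sequences is assumed to have been run elsewhere. Only the taxonomy-based split and the 30% cutoff come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Cutoff taken from the text: "greater than 30% sequence similarity".
SIMILARITY_CUTOFF = 30.0


@dataclass
class Chain:
    """Hypothetical per-chain record; field names are illustrative only."""
    chain_id: str
    taxonomy_id: int                      # NCBI Taxonomy ID from the mmCIF header
    best_similarity: float = 0.0          # percent similarity to the best reference hit
    best_hit_type: Optional[str] = None   # "antibody", "nanobody" or "tcr"


def classify_chain(chain: Chain, viral_tax_ids: Set[int]) -> str:
    """Split chains into viral, immune-receptor-like and other nonviral chains."""
    if chain.taxonomy_id in viral_tax_ids:
        return "viral"
    if chain.best_hit_type is not None and chain.best_similarity > SIMILARITY_CUTOFF:
        return "immune_receptor"
    return "other_nonviral"


def epitope_label(chain: Chain, viral_tax_ids: Set[int]) -> Optional[str]:
    """Label the epitope contacted by this chain, if it is an immune receptor."""
    if classify_chain(chain, viral_tax_ids) != "immune_receptor":
        return None
    if chain.best_hit_type in ("antibody", "nanobody"):
        return "B-cell epitope"
    if chain.best_hit_type == "tcr":
        return "T-cell epitope"
    return None


# Example: a SARS-CoV-2 chain (taxid 2697049) with two human chains (taxid 9606).
viral_ids = {2697049}
chains = [
    Chain("A", 2697049),
    Chain("H", 9606, best_similarity=85.0, best_hit_type="antibody"),
    Chain("B", 9606),  # e.g. a host receptor with no antibody-like hit
]
labels = [classify_chain(c, viral_ids) for c in chains]
```

Keeping the cutoff in a single named constant makes it easy to test how sensitive the antibody data set is to the chosen threshold.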

  • The hotspot-viewing window includes both metadata and detailed information about the hotspot.
  • The intermolecular interactions between small molecules and viral proteins are stored as hydrophobic interactions, hydrogen bonds and other polar interactions between the small-molecule compound and the corresponding binding pocket.
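One way to picture the three interaction classes named above is as a small typed record. The sketch below is an assumption for illustration only: the class names, fields and atom/residue labels are hypothetical and do not reflect the actual schema of the database.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class InteractionKind(Enum):
    # The three classes named in the text.
    HYDROPHOBIC = "hydrophobic"
    HYDROGEN_BOND = "hydrogen_bond"
    OTHER_POLAR = "other_polar"


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one ligand-pocket contact."""
    ligand_atom: str       # atom name on the small molecule
    pocket_residue: str    # residue label in the binding pocket
    kind: InteractionKind


def group_by_kind(interactions: Iterable[Interaction]) -> Dict[InteractionKind, List[Interaction]]:
    """Group contacts by interaction class, preserving input order within each class."""
    grouped: Dict[InteractionKind, List[Interaction]] = defaultdict(list)
    for interaction in interactions:
        grouped[interaction.kind].append(interaction)
    return dict(grouped)


# Made-up contacts, used only to exercise the grouping.
contacts = [
    Interaction("C7", "LEU 27", InteractionKind.HYDROPHOBIC),
    Interaction("O2", "HIS 41", InteractionKind.HYDROGEN_BOND),
    Interaction("C9", "MET 49", InteractionKind.HYDROPHOBIC),
]
by_kind = group_by_kind(contacts)
```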

Vertebrates possess an adaptive immune system that can recognize a wide range of exotic immunogens from various pathogens. The corresponding ‘antigenic cluster’ of immunogens on the viral proteins can trigger immune responses. Therefore, precise and in-depth knowledge of ‘antigenic clusters’ is critical for the development of diagnostics and therapeutics targeting infectious, allergic and autoimmune diseases and carcinoma (Dhanda et al., 2019). More specifically, information on epitopes allows the rational design of fine-tuned or truncated immunogens (Deng et al., 2018) and epitope-focused vaccines (Correia et al., 2014), as well as the optimization of antibody cocktail therapy (Starr et al., 2021). For example, identifying the epitopes of a 2019-nCoV vaccine helped to elucidate its working mechanism and led to its production for clinical use (Lucchese, 2020).

  • However, traditional vaccines are often futile against viruses that mutate frequently, such as HIV and influenza (Oscherwitz, 2016).
  • The classification hierarchies and the scientific name of each virus strain are obtained from the NCBI taxonomy database using the corresponding taxonomy IDs.

A database that is not kept current is not only obsolete but can actually hinder scientific progress. Similarly, maintaining internal consistency and consistency with external databases and resources is a critical step towards transformation to an AIS. For this reason, we plan to update a test server frequently and to release a major update of the virusMED database to the stable production server semi-annually.

virusMED: an atlas of hotspots of viral proteins

The strain name and classification of each virus in the database are obtained from the NCBI taxonomy database using the NCBI Taxonomy ID reported in the metadata of the mmCIF file header. These taxonomy IDs are not always reported at the strain level; some are reported at the species level, which prevents strain-level information from being retrieved directly. Viruses that are not annotated at the strain level will need manual curation and routine updates to label them with reference to the published literature on the viral protein structure.

Diversity of hotspot-containing viral protein structures in the virusMED database. Virus strains with more than ten distinct hotspot-containing protein structures are shown.

Many viral antigen chains possess covalently linked glycosylation features assigned as individual glycan chains.
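The strain-level curation step described in this section could be triaged with a simple filter. The sketch below is only an illustration: it assumes that the rank of each taxonomy ID has already been looked up and is available as a plain mapping, the function name is hypothetical, and the taxonomy IDs are placeholders rather than real NCBI entries.

```python
from typing import Dict, List


def needs_strain_curation(rank_by_tax_id: Dict[int, str]) -> List[int]:
    """Return the taxonomy IDs whose reported rank is not 'strain'.

    These are the entries that would be queued for manual curation
    against the published literature.
    """
    return sorted(
        tax_id
        for tax_id, rank in rank_by_tax_id.items()
        if rank.strip().lower() != "strain"
    )


# Placeholder IDs and ranks, used only to exercise the filter.
ranks = {
    900001: "strain",
    900002: "species",
    900003: "no rank",
}
to_curate = needs_strain_curation(ranks)
```

Rerunning such a filter after each database update would show which entries still await a strain-level label.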

Viral components are insufficient to support the complete virus-replication cycle, and therefore viruses can only grow and replicate in living host cells. Viral infections are initiated when virus surface proteins interact with receptors on the host cell, followed by invasion of the target cell through different mechanisms. In rare cases, there are also broad-spectrum antivirals that are known to target homologous viral proteins across species (Verbruggen et al., 2018).

The Neighborhood database provides an effective way to store, query and classify the intermolecular interactions for a diverse set of hotspots in viral protein structures, including metal binding sites, epitopes and drug binding sites. Metal binding sites are hotspots on viral proteins that serve as either architectural or catalytic sites (Chaturvedi & Shrivastava, 2005), yet their importance is often overlooked. Metal-containing compounds offer a viable alternative for targeting unusual structural or chemical features on viral proteins that are otherwise inaccessible to organic compounds (de Paiva et al., 2020). Recent studies have shown that ranitidine bismuth citrate (a metallodrug) can inhibit both the ATPase and helicase in SARS-CoV-2 (Yuan et al., 2020). The correct modeling of metal binding sites in macromolecular structures requires knowledge from multiple disciplines, including biology, coordination chemistry and crystallography (Zheng, Cooper et al., 2017; Yao & Moseley, 2019). Many databases archive the metal binding sites in macromolecular structures (Lin et al., 2016; Ireland & Martin, 2019), all of which rely on the structural information contained in the Protein Data Bank (PDB; Burley et al., 2017).

