
Assigning NMR spectra of RNA, peptides and small organic molecules using molecular network visualization software

Jan Marchant

1 Department of Chemistry and Biochemistry, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 USA

Michael F. Summers

2 Howard Hughes Medical Institute, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 USA

Bruce A. Johnson

3 Structural Biology Initiative, CUNY Advanced Science Research Center, 85 St. Nicholas Terrace, New York, NY 10031 USA


NMR assignment typically involves analysis of peaks across multiple NMR spectra. The chemical shifts of peaks are measured and then assigned to atoms using a variety of methods. These approaches quickly become complicated by overlap, ambiguity, and the difficulty of correlating assignments among multiple spectra. Here we propose an alternative approach in which a network of linked peak-boxes is generated at the predicted positions of peaks across all spectra. These peak-boxes encode known relationships between peaks and can be matched to the observed spectra. The method is illustrated with RNA, but a variety of molecular types should be readily tractable with this approach.

Electronic supplementary material

The online version of this article (10.1007/s10858-019-00271-3) contains supplementary material, which is available to authorized users.

The power of NMR spectroscopy relative to other molecular spectroscopies lies in the ability to detect spectral signals and interactions associated with specific atoms. Assigning NMR signals typically follows a paradigm in which the chemical shifts of local maxima (peaks) are first measured within each spectrum; the signals are then correlated within and among different types of NMR spectra, and the peak positions are associated with specific atoms, either by automated methods or by interactive analysis. Although automated assignment methods are desirable and work toward this goal is ongoing, interactive methods remain the standard for much NMR assignment. Interactive analysis is aided by the display of a peak-box for each measured peak, which can show known assignments or other annotations, but tracking thousands of scalar- and dipolar-coupled peaks in multiple datasets remains challenging. We describe here an inverted approach that focuses on networks of coupled peaks predicted from the molecular structure and the type of NMR experiment. Instead of picking peak positions and then attempting to assign them, we generate a linked network of assigned peak-boxes at the predicted positions, which can then be interactively aligned with the observed spectra. This approach allows the spectroscopist to make simultaneous use of multiple spectral features, which can reduce ambiguity compared with assigning individual peaks one at a time.

Our technique relies on a priori knowledge of the molecular topology and the ability to predict chemical shifts and coupling patterns. Both types of information are available for a range of molecule types (Steinbeck et al. 2003; Ulrich et al. 2008; Barton et al. 2013; Brown et al. 2015). We have used the technique to assign RNAs as large as 68 nucleotides (including fragments of much larger RNA projects) (Keane et al. 2015; Marchant et al. 2018; Zhang et al. 2018) and small cyclic peptides, but the principles apply to many molecular types, including DNA, modest-sized peptides, and arbitrary small organic molecules. The higher the quality of the chemical shift predictions and predicted NOE peaks, the better the starting point, but the approach allows bootstrapping from the more accurate portions of the starting set to regions of lower quality.

The approach described here could be implemented with a variety of tools for chemical shift prediction, peak network generation, and interactive assignment. Here we describe the protocol as implemented in NMRFx Analyst, a freely available, open-source software tool that extends the existing NMRFx Processor (Norris et al. 2016). NMRFx Analyst integrates NMR processing, chemical shift prediction, and the peak picking and assignment tools useful for this approach. An earlier implementation of the approach is also available in NMRViewJ (Johnson and Blevins 1994). The key requirements for implementing the approach in other tools are a source of predicted shifts and the ability to interactively move multiple peaks in response to the movement of a single peak. Shift prediction is done once at the start of a project, and so can be done with an external tool. Peaks based on the predicted shifts can also be generated outside the analysis software. Thus any NMR analysis tool that can read external peak files, such as CcpNmr Analysis (Skinner et al. 2016), Sparky (Lee et al. 2015), or CARA (Keller), already has the core functionality needed to get started without modification. Adjusting peak positions interactively in response to the movement of a single peak would likely require code modifications or a plugin module, but this should be relatively straightforward to implement.
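Because peak generation can happen outside the analysis program, a predicted peak list can be produced by a short script and imported into any tool that reads external peak files. The Python sketch below assumes a dictionary of predicted shifts keyed by hypothetical atom names of the form `residue.atom` (for example `A7.H1'`) and a list of atom pairs produced by one of the peak-generation rules. The tab-separated `assignment`/`w1`/`w2` layout is also a hypothetical example; NMRFx Analyst, CcpNmr Analysis, Sparky, and CARA each define their own import formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedPeak:
    atoms: tuple[str, str]       # atom assigned to each dimension
    shifts: tuple[float, float]  # predicted shift (ppm) for each dimension


def make_peaks(pairs, predicted_shifts):
    """Create one peak-box per atom pair, skipping pairs lacking a prediction."""
    peaks = []
    for a1, a2 in pairs:
        if a1 in predicted_shifts and a2 in predicted_shifts:
            shifts = (predicted_shifts[a1], predicted_shifts[a2])
            peaks.append(PredictedPeak((a1, a2), shifts))
    return peaks


def format_peak_list(peaks):
    """Render peaks as tab-separated text (hypothetical layout: assignment, w1, w2)."""
    lines = ["assignment\tw1\tw2"]
    for p in peaks:
        label = f"{p.atoms[0]}-{p.atoms[1]}"
        lines.append(f"{label}\t{p.shifts[0]:.3f}\t{p.shifts[1]:.3f}")
    return "\n".join(lines) + "\n"
```

Every peak starts with a computer-generated assignment label, so no assignment needs to be typed by hand; adapting `format_peak_list` to a specific program's format is the only tool-dependent step.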

We describe here the approach for a 50-nucleotide RNA hairpin. A 3-min video illustrating the major steps on a 22-nucleotide RNA hairpin is available as supplementary material. The molecular topology of the RNA follows directly from the primary sequence together with NMRFx Analyst's built-in library of nucleotides. The secondary structure (or, if available, the tertiary structure) is also needed for chemical shift prediction and NOE cross-peak prediction. Predicting the chemical shifts of the target molecule is an essential component of the protocol. For RNA molecules we use our previously described attribute-based shift prediction technique (Barton et al. 2013; Brown et al. 2015), but when 3D coordinates are available a structure-based method could be used (Frank et al. 2013, 2014; Brown et al. 2015). The attribute technique predicts hydrogen, carbon, and nitrogen chemical shifts from a set of attributes describing the central nucleotide in a five-nucleotide window. The only inputs necessary are the primary sequence and a dot-bracket representation of the secondary structure (Lorenz et al. 2011).
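The two inputs named above, the primary sequence and a dot-bracket string, are enough to derive the local context of every nucleotide. The Python sketch below pairs brackets with a stack and then reports, for each residue, a five-nucleotide window of (nucleotide, paired?) tuples. It is not the attribute set used by the published predictor (Barton et al. 2013; Brown et al. 2015): the paired/unpaired flag and the `None` padding at chain ends are simplified, hypothetical stand-ins, and pseudoknot brackets such as `[]` are not handled.

```python
def parse_dot_bracket(structure):
    """Map each paired position (0-based) to its partner using a stack."""
    stack = []
    partner = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i] = j
            partner[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def windows(sequence, structure, half_width=2):
    """Yield (center, window) where window holds (nucleotide, paired?) tuples.

    Positions beyond either chain end are reported as None (hypothetical padding).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = parse_dot_bracket(structure)
    for center in range(len(sequence)):
        window = []
        for k in range(center - half_width, center + half_width + 1):
            if 0 <= k < len(sequence):
                window.append((sequence[k], k in partner))
            else:
                window.append(None)
        yield center, window
```

Each window would then be translated into whatever attributes the chosen predictor expects; that translation is where the real method's details live.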

For RNA assignments we have used a set of three experiments: homonuclear 2D TOCSY, 2D 1H-13C HMQC, and 2D NOESY. The technique does not depend on having the TOCSY and HMQC experiments, but a greater number of complementary experiments reduces ambiguity in the assignment process. Each experiment type requires a different protocol for peak-box generation. The TOCSY protocol simply generates peak-boxes for pairs of protons separated by fewer than a specified number of homonuclear J-coupling steps; in particular, peak-boxes are generated for the H5–H6 coupling of uracil and cytosine and for couplings between ribose protons. The HMQC protocol generates a peak-box for every carbon with a directly bonded proton. While the expected peaks and correlations for the HMQC and TOCSY are relatively insensitive to tertiary structure, peak-box generation for the NOESY involves various assumptions.
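The TOCSY and HMQC rules can be expressed as simple graph operations on a residue template. In the Python sketch below, the ribose and base templates are simplified, hypothetical stand-ins for NMRFx Analyst's nucleotide library: protons are nodes, edges join directly J-coupled (geminal or vicinal) protons, and a breadth-first search counts coupling steps. Here `max_steps` is treated as an inclusive limit, so `max_steps=1` yields only directly coupled pairs. The HMQC rule simply lists every carbon with a directly bonded proton.

```python
from collections import deque
from itertools import combinations

# Hypothetical residue templates: directly J-coupled, nonexchangeable protons.
RIBOSE_COUPLINGS = [("H1'", "H2'"), ("H2'", "H3'"), ("H3'", "H4'"),
                    ("H4'", "H5'"), ("H4'", "H5''"), ("H5'", "H5''")]
BASE_COUPLINGS = {"U": [("H5", "H6")], "C": [("H5", "H6")], "A": [], "G": []}

# Directly bonded C-H pairs observable in a 1H-13C HMQC.
RIBOSE_CH = [("C1'", "H1'"), ("C2'", "H2'"), ("C3'", "H3'"), ("C4'", "H4'"),
             ("C5'", "H5'"), ("C5'", "H5''")]
BASE_CH = {"A": [("C2", "H2"), ("C8", "H8")], "G": [("C8", "H8")],
           "U": [("C5", "H5"), ("C6", "H6")], "C": [("C5", "H5"), ("C6", "H6")]}


def coupling_steps(edges):
    """Shortest path length (in coupling steps) between all connected protons."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {}
    for start in graph:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[start] = seen
    return dist


def tocsy_pairs(residue_type, max_steps):
    """Proton pairs in one residue separated by at most max_steps couplings."""
    dist = coupling_steps(RIBOSE_COUPLINGS + BASE_COUPLINGS[residue_type])
    pairs = []
    for a, b in combinations(sorted(dist), 2):
        steps = dist[a].get(b)  # None when a and b are not coupled at all
        if steps is not None and steps <= max_steps:
            pairs.append((a, b))
    return pairs


def hmqc_pairs(residue_type):
    """Carbon-proton pairs for every carbon with a directly bonded proton."""
    return RIBOSE_CH + BASE_CH[residue_type]
```

Exchangeable protons (2'-OH, amino, imino) and inter-residue couplings are omitted. Calling `tocsy_pairs` with `max_steps=1` corresponds to a single transfer step.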

For an RNA (or other molecule) whose 3D structure is known, peak-boxes are generated for all hydrogen pairs closer than a specified distance limit (often 5 or 6 Å). When the 3D structure is not available, NMRFx Analyst uses the secondary structure specified in dot-bracket notation and a built-in set of rules to generate peak-boxes for helical and tetraloop regions. Intra-residue peak-boxes are also generated; these are less dependent on structural information. Although this NOESY protocol cannot generate inter-residue peak-boxes in larger loops, the combination of peak-boxes in helical and tetraloop regions with intra-residue peak-boxes in all regions gives a substantial number of predicted peaks that can serve as a starting point for extending the assignments to other regions. The intra-residue assignments establish the correct shift assignments, which are then used to assign peaks that were not predicted (vide infra).
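When coordinates are available, the NOESY rule reduces to a pairwise distance filter. The Python sketch below assumes coordinates in Å keyed by hypothetical `residue.atom` names and treats any atom whose name starts with `H` as a hydrogen; it does not reproduce the dot-bracket rules that NMRFx Analyst applies to helices and tetraloops when no 3D structure exists.

```python
import math
from itertools import combinations


def noesy_pairs(coords, cutoff=5.0):
    """Return hydrogen pairs closer than cutoff (in Å).

    coords maps names such as "G3.H1'" to (x, y, z) tuples in Å.
    """
    hydrogens = sorted(name for name in coords if name.split(".")[-1].startswith("H"))
    pairs = []
    for a, b in combinations(hydrogens, 2):
        if math.dist(coords[a], coords[b]) < cutoff:
            pairs.append((a, b))
    return pairs
```

Raising `cutoff` from 5 to 6 Å adds weaker, longer-range peak-boxes; unmatched extras can be deleted during assignment.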

Overlapping peaks are a serious impediment to the assignment of larger RNAs, but overlap can be alleviated by using isotopically labeled RNA molecules to minimize the number of spectral peaks (Lu et al. 2010; Longhini et al. 2016). Nucleotide- and atom-specific 2H labeling, or 13C labeling combined with pulse sequences that filter and edit the spectra based on the presence of 13C-labeled nuclei, can be used to generate a complementary set of experiments in which the number of peaks in each individual experiment is reduced, but all expected peaks can be observed across the complete set of experiments (LeBlanc et al. 2017). NMRFx Analyst allows the labeling pattern to be specified both by nucleotide type and for specific residues. The peak-box generator combines this pattern with each experiment's edit/filter scheme to generate the expected cross-peaks for the labeled RNA.
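The combination of a labeling pattern with an experiment's edit/filter scheme can be sketched as a per-dimension test. In the Python sketch below, which illustrates the idea and is not NMRFx Analyst's implementation, a 13C-edited dimension keeps only protons bonded to a labeled carbon, a 13C-filtered dimension keeps only protons that are not, and deuterated positions are dropped entirely. The proton-to-carbon table, the `A7.H8`-style atom names, the mode strings `"edit"`, `"filter"` and `"none"`, and the rule that a residue-specific pattern overrides the nucleotide-type default are all assumptions made for the example.

```python
# Hypothetical table: the carbon each nonexchangeable proton is bonded to.
PROTON_CARBON = {
    "H1'": "C1'", "H2'": "C2'", "H3'": "C3'", "H4'": "C4'",
    "H5'": "C5'", "H5''": "C5'", "H2": "C2", "H5": "C5", "H6": "C6", "H8": "C8",
}


def labeled_carbons(residue, by_type, by_residue):
    """13C-labeled carbons for a residue such as 'A7'; a residue-specific
    entry overrides the default for its nucleotide type (first letter)."""
    if residue in by_residue:
        return by_residue[residue]
    return by_type.get(residue[0], set())


def dimension_ok(atom, mode, by_type, by_residue, deuterated):
    """Is this proton observable in a dimension with the given edit/filter mode?"""
    residue, proton = atom.split(".")
    if proton in deuterated.get(residue[0], set()):
        return False  # replaced by 2H, so no 1H signal
    carbon = PROTON_CARBON.get(proton)
    on_13c = carbon is not None and carbon in labeled_carbons(residue, by_type, by_residue)
    if mode == "edit":
        return on_13c        # 13C-edited: keep protons bonded to 13C
    if mode == "filter":
        return not on_13c    # 13C-filtered: keep protons not bonded to 13C
    return True              # "none": no isotope selection


def expected_peaks(pairs, modes, by_type, by_residue=None, deuterated=None):
    """Keep (dim1, dim2) atom pairs that survive both dimensions' selection."""
    by_residue = by_residue or {}
    deuterated = deuterated or {}
    return [
        (a, b) for a, b in pairs
        if dimension_ok(a, modes[0], by_type, by_residue, deuterated)
        and dimension_ok(b, modes[1], by_type, by_residue, deuterated)
    ]
```

Running the same candidate pairs through several mode combinations shows how a set of edited and filtered experiments divides the full peak list among themselves.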

Once the set of peak-boxes has been generated for each experiment, the user can begin to assign the spectra interactively. Each available spectrum is displayed with its corresponding peak-boxes superimposed. A given spectrum can be displayed in multiple windows so that expansions of the relevant regions are visible at the same time. The user can then drag a peak-box, using the mouse or trackpad, from its predicted position into alignment with an observed spectral peak (Fig. 1).

Fig. 1

Screenshot of the NMRFx Analyst GUI with a network assignment procedure in progress. The rectangular peak-boxes illustrate predicted peaks, label numbers indicate the residues involved, and arrows show whether peak-boxes can be moved in each dimension (no X) or are frozen in that dimension (with X). Peak-boxes in black (with residue numbers 6–46 and 6–7) are still in their predicted positions and can be freely adjusted, as shown by the black arrows for peak-box 6–46. Peak-box 7–6 (red) has been selected (yellow background) and then frozen, and can no longer be adjusted in either dimension. As a consequence of freezing this peak-box, peak-box 7 (orange) is now frozen in the horizontal dimension but adjustable in the vertical, so it could be slid down to align with the peak below. The opposite is true for peak-box 6 (magenta), which could be slid left to align with a peak. Other red peak-boxes have already been positioned and frozen. Controls at the bottom allow peaks to be frozen and thawed. The Tweak + Freeze button automatically centers a peak-box on an overlapped peak before freezing.

In the traditional approach, peak-boxes are initially unassigned, so there is no unambiguous relationship between different peak-boxes within a spectrum or between spectra. In this new approach, although peak-boxes are not necessarily correctly positioned, each has an assigned atom for each dimension. As a result, sets of peak-boxes share atoms in one or both dimensions. This is illustrated visually when one selects a peak, as shown in Fig. 1. Connecting lines are drawn between peak-boxes with common atom assignments. As the user drags a peak-box, the entire set of peak-boxes that share an atom with it moves synchronously with the directly shifted peak-box. The essence of the method is that whereas the placement of an individual peak-box relative to a spectral signal might be ambiguous, the placement of a whole set of coupled peak-boxes is not.
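One way to obtain this synchronized movement is to store positions per atom rather than per peak-box: a peak-box's position is then just the current shifts of its atoms, so dragging one peak-box updates those shifts and every peak-box sharing an atom follows automatically. The Python sketch below illustrates this idea under that assumption; the class names and `residue.atom` naming are hypothetical, it is not a description of NMRFx Analyst's internal data model, and it ignores per-spectrum referencing offsets.

```python
from dataclasses import dataclass


@dataclass
class PeakBox:
    experiment: str
    atoms: tuple  # one assigned atom per dimension


class PeakNetwork:
    def __init__(self, predicted_shifts):
        self.shifts = dict(predicted_shifts)  # current shift of every atom (ppm)
        self.peaks = []

    def add(self, experiment, *atoms):
        peak = PeakBox(experiment, tuple(atoms))
        self.peaks.append(peak)
        return peak

    def position(self, peak):
        """A peak-box's position is derived from its atoms' current shifts."""
        return tuple(self.shifts[a] for a in peak.atoms)

    def linked(self, peak):
        """Other peak-boxes sharing at least one atom with peak."""
        return [p for p in self.peaks
                if p is not peak and set(p.atoms) & set(peak.atoms)]

    def drag(self, peak, new_position):
        """Move peak; every peak-box sharing one of its atoms moves with it."""
        for atom, value in zip(peak.atoms, new_position):
            self.shifts[atom] = value
```

In this model the `linked` method corresponds to the peak-boxes joined by connecting lines, and a single `drag` call moves them in every experiment at once.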

Individual peak-boxes may initially be predicted to lie close to multiple spectral signals, precluding unambiguous placement in isolation. In this approach, however, the entire set of linked peak-boxes across multiple experiments informs the user's decision. An example is shown in Fig. 2, step 3, where two alignments of a group of peak-boxes within the NOESY spectrum are possible, but the ambiguity is resolved by analysis of the HMQC spectrum. Positioning peak-boxes in crowded regions is still difficult, but is often unnecessary because linked peak-boxes are present in uncrowded regions. An additional practical advantage of the approach is that typographical errors are minimized: rather than the user typing an atomic assignment, with possible errors, into a peak-box label field in the GUI, all peaks start with a computer-generated assignment.
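The cross-spectrum reasoning in Fig. 2, step 3 can be made explicit with a simple score. The Python sketch below is a hypothetical illustration; the paper describes a visual judgment by the user, not this calculation. For each candidate shift of one atom, every peak-box containing that atom is moved accordingly, and the distance to the nearest observed peak in that peak-box's own spectrum is summed, with each dimension scaled by an assumed tolerance. A candidate that fits the NOESY equally well in two places is then separated by the HMQC term.

```python
import math


def nearest_distance(position, observed, scale):
    """Tolerance-scaled distance from position to the closest observed peak."""
    return min(
        (math.sqrt(sum(((p - o) / s) ** 2 for p, o, s in zip(position, obs, scale)))
         for obs in observed),
        default=math.inf,
    )


def score_candidate(atom, value, peak_boxes, shifts, spectra):
    """Sum of nearest-peak distances for every peak-box containing atom,
    after setting atom's shift to value. Lower scores fit better.

    peak_boxes: list of (experiment, atoms) tuples.
    spectra: experiment -> (observed peak positions, per-dimension tolerances).
    """
    trial = dict(shifts)
    trial[atom] = value
    total = 0.0
    for experiment, atoms in peak_boxes:
        if atom not in atoms:
            continue
        observed, scale = spectra[experiment]
        position = tuple(trial[a] for a in atoms)
        total += nearest_distance(position, observed, scale)
    return total


def rank_candidates(atom, candidates, peak_boxes, shifts, spectra):
    """Order candidate shifts for atom from best to worst score."""
    return sorted(candidates,
                  key=lambda v: score_candidate(atom, v, peak_boxes, shifts, spectra))
```

The observed peak lists and tolerances would come from picked peaks in each spectrum; any values used to exercise the functions are invented for illustration.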

Fig. 2

Demonstration of the assignment procedure for a portion of a 50-nt RNA. In each panel the upper spectrum is a 1H-1H NOESY and the lower a 1H-13C HMQC. (1) Peak-boxes are initially positioned according to predicted chemical shifts. Upon selecting a peak-box for positioning, the linked peak-boxes are indicated by connecting lines. Visual inspection identifies a candidate peak to which the peak-box labeled 4–5 is manually repositioned, as indicated by the solid arrow. Linked peaks are repositioned automatically, as indicated by the dashed arrows. (2) The peak-box position is frozen, indicated in red. The remaining three peak-boxes in the spin system are automatically frozen and prevented from moving in their shared dimension, indicated in orange for the x-axis. Their associated peaks are readily identified due to this restriction. (3) Examination of the NOESY spectrum reveals two well-matched possibilities for assignment of the peak-box labeled 7. The correct assignment is found by reference to the HMQC spectrum, in which there is only one reasonable candidate. (4) Repositioning the remaining peak-boxes for the spin system associated with this atom automatically repositions associated peak-boxes from the remaining spin system under consideration. (5) The remaining spin system contains peak-boxes restricted from moving along the y-axis due to previously frozen peaks, indicated in magenta, such that their associated peaks are readily identified. (6) Final positions of the peak-boxes under consideration.

The protocol is greatly facilitated by a means of specifying whether a given peak-box has been moved to its final location. In NMRFx Analyst, this is done by clicking a “Freeze” button or using a corresponding keyboard shortcut. Once frozen, a peak-box is displayed in a different color, giving the user a visual indication of which peak-boxes have been confidently placed (Fig. 1). Freezing an individual peak-box locks both of its dimensions at their current positions so that it cannot subsequently be moved. The linked dimensions (those sharing the same atom) of other peak-boxes, in the same and in different experiments, are also frozen. Thus, linked peak-boxes may be frozen in only a single dimension. Such peak-boxes can only be slid along their free dimension, which facilitates their assignment by restricting the choice of locations to a single dimension. A color scheme indicates whether a peak is frozen on the x-axis, the y-axis, or both axes. Peak-boxes can also be unlocked with a “Thaw” button. Freezing peak-boxes also updates the atom assignment table with the chemical shifts of the peak-box dimensions, so the final assignment list is generated only from peak-boxes that have been frozen.
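The freezing rules can be captured by tracking which atoms are frozen rather than which peak-boxes are. In the Python sketch below, freezing a peak-box freezes its atoms and copies their shifts into the assignment table; any other peak-box then reports `"x"`, `"y"`, `"both"` or `"free"` depending on which of its atoms are frozen, and `slide` moves only its unfrozen dimensions. This is a hypothetical model, not NMRFx Analyst's code; in particular, `thaw` here simply unfreezes the peak-box's atoms and removes them from the table, even if another frozen peak-box shares one of them.

```python
class FreezeModel:
    """Hypothetical freeze/thaw bookkeeping keyed by atom."""

    def __init__(self, shifts):
        self.shifts = dict(shifts)   # current atom shifts (ppm)
        self.frozen = set()          # atoms whose dimensions are locked
        self.assignments = {}        # final assignment table: atom -> ppm

    def freeze(self, atoms):
        """Freeze a peak-box: lock its atoms and record their shifts."""
        for atom in atoms:
            self.frozen.add(atom)
            self.assignments[atom] = self.shifts[atom]

    def thaw(self, atoms):
        """Unlock a peak-box's atoms and drop them from the assignment table."""
        for atom in atoms:
            self.frozen.discard(atom)
            self.assignments.pop(atom, None)

    def status(self, atoms):
        """'free', 'x', 'y' or 'both' for a 2D peak-box (x = first dimension)."""
        x_frozen, y_frozen = (atom in self.frozen for atom in atoms)
        return {(False, False): "free", (True, False): "x",
                (False, True): "y", (True, True): "both"}[(x_frozen, y_frozen)]

    def slide(self, atoms, new_position):
        """Move a peak-box, changing only its unfrozen dimensions."""
        for atom, value in zip(atoms, new_position):
            if atom not in self.frozen:
                self.shifts[atom] = value
```

The `status` strings map naturally onto the display colors, and the assignment table contains only shifts recorded by `freeze`, mirroring the rule that the final list comes from frozen peak-boxes.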

As described above, the set of peak-boxes generated for NOESY spectra requires assumptions about the molecular structure and it is unlikely that they will perfectly match the spectra. Extraneous peak-boxes are easily deleted. Where peaks cannot be associated with a generated peak-box, the user can manually add a peak-box at the peak’s location. The software still provides significant value in this process as the observed signal might align with peak-boxes that have already been frozen. In this case assignment possibilities for the manually added peak-box are displayed and a link can be made to the already frozen peak-boxes.
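Listing assignment possibilities for a manually added peak-box amounts to comparing its position with the shifts of atoms that are already frozen. The Python sketch below assumes a homonuclear spectrum, so every frozen atom is a candidate in either dimension, and uses a hypothetical per-dimension tolerance in ppm; for a heteronuclear spectrum the candidates would first be restricted to the nucleus of each dimension.

```python
def assignment_options(position, frozen_shifts, tolerances):
    """For each dimension, list frozen atoms within tolerance, closest first.

    position: observed peak coordinates (ppm), one per dimension.
    frozen_shifts: atom -> shift (ppm) for atoms already frozen.
    tolerances: maximum allowed |shift - position| per dimension (ppm).
    """
    options = []
    for value, tol in zip(position, tolerances):
        hits = [(abs(shift - value), atom)
                for atom, shift in frozen_shifts.items()
                if abs(shift - value) <= tol]
        options.append([atom for _, atom in sorted(hits)])
    return options
```

Choosing one candidate per dimension then links the new peak-box to the existing network, exactly as if it had been generated from a prediction.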

The above description has focused on applications to RNA. The approach, however, was initially developed as a means to assign cyclic peptides. The basic protocol for peptides is essentially the same as described above; the differences involve the methods for chemical shift prediction and the rules for peak-box generation. Predicted chemical shifts can be obtained simply from the average chemical shifts for standard amino acids available from the BMRB (Ulrich et al. 2008). Alternatively, NMRFx Analyst includes a built-in (as yet unpublished) tool for generating predictions based on sequence and dihedral angles, and optionally ring-current shifts. Projects involving cyclic peptides often include non-canonical amino acids (Hosseinzadeh et al. 2017). Shift prediction for non-canonical amino acids is supported by a built-in predictor based on HOSE codes that can make predictions for any arbitrary organic molecule. Peptides, and all other supported molecules, can also use predictions generated in third-party software and imported from a text file. As for RNA, 2D TOCSY, 1H-13C HMQC, and 2D NOESY experiments have been implemented, but various combinations of experiments are possible. COSY experiments can be included, for example, by using the TOCSY peak-box generation protocol but limiting the number of transfer steps in the peak generator to one. The TOCSY and HMQC experiments are particularly robust because they do not depend on 3D structural information, though the constraints involved in cyclizing the peptide can be used to generate a reasonable family of structures for NOESY predictions.

The described protocol is also fully applicable to arbitrary small organic molecules and provides a means of assigning these molecules rapidly, and without typographical errors, using one or more 2D spectra. Predictions can be made with the internal HOSE-code-based predictor or with external tools (Schütz et al. 1997; Smurnyy et al. 2008). NOESY peaks that complement those from scalar-coupled experiments can be predicted from an approximate 3D structure. Missing and additional peaks can be dealt with as described above.

Although the chemical shift predictions always have some level of error, a key benefit of this approach is that individual errors of large magnitude are easily identified and tolerated because of redundancy in the network of moving peaks. More widespread errors in the predicted chemical shifts, particularly if accompanied by errors in the predicted network of NOEs, could prove more challenging; however, in our experience with close to 100 distinct RNA molecules this problem has not arisen. This tolerance to error should also allow the method to be used in situations such as RNA–protein complexes, where the RNA chemical shifts near the interface are perturbed from their expected values.

The above protocol, as implemented in NMRFx Analyst, provides a rapid way to assign a variety of RNAs, DNAs, peptides, and small molecules. It has been used for the assignment of several published RNA projects (Keane et al. 2015; Marchant et al. 2018; Zhang et al. 2018) and for the rapid assignment of a variety of cyclic peptides (unpublished studies). Its use requires access to chemical shift predictions, which are available within NMRFx Analyst or through a wide variety of external software packages. Predicting the peaks expected in scalar-coupled experiments (e.g. TOCSY, COSY, and HMQC) requires only knowledge of the covalent structure of the molecule, and a significant number of NOESY peaks can be predicted with reasonable assumptions about structure. In particular, intra-residue peaks can be predicted and used to aid in assigning inter-residue peaks. The protocol fits between traditional manual assignment methods, which rely on assigning picked peaks, and fully automated methods. We anticipate that it will form a basis for adding more automated capabilities in the future. For example, one can already drag a peak-box near a signal and have it automatically positioned on the nearby peak. By basing automated capabilities on this visual tool, the user will be able to observe the results of the automation and intervene manually. As chemical shift and structure prediction methods are developed across all molecule types, we expect the approaches for chemical shift assignment illustrated here to be adopted into widespread use.
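Automatically positioning a peak-box on a nearby signal can be approximated by a steepest-ascent walk on the spectrum's intensity grid, optionally refined by fitting a parabola through the apex and its two neighbors. The Python sketch below works in grid indices on a plain list-of-lists spectrum; it is an assumed, simplified technique rather than the algorithm behind NMRFx Analyst's automatic positioning or its Tweak + Freeze button, and converting indices to ppm would require the spectrum's referencing.

```python
def snap_to_maximum(data, row, col, max_steps=10):
    """Steepest-ascent walk on a 2D intensity grid from (row, col) to a local maximum."""
    nrows, ncols = len(data), len(data[0])
    for _ in range(max_steps):
        best = (row, col)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                if 0 <= r < nrows and 0 <= c < ncols and data[r][c] > data[best[0]][best[1]]:
                    best = (r, c)
        if best == (row, col):
            break  # no higher neighbor: local maximum reached
        row, col = best
    return row, col


def parabolic_offset(left, center, right):
    """Sub-point offset of the apex of a parabola through three equally spaced samples."""
    denom = left - 2 * center + right
    return 0.0 if denom == 0 else 0.5 * (left - right) / denom
```

Limiting `max_steps` keeps the walk from drifting to a distant, stronger peak, which matters in crowded regions where the intended signal is only a shoulder of its neighbor.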


Acknowledgements

This work was supported in part by grants from the National Institute of General Medical Sciences of the National Institutes of Health (U54 GM 103297 to BAJ and JM, R01 GM 123012 to BAJ, and GM 42561 to MFS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Barton S, Heng X, Johnson BA, Summers MF. Database proton NMR chemical shifts for RNA signal assignment and validation. J Biomol NMR. 2013;55:33–46. doi: 10.1007/s10858-012-9683-9.
  • Brown JD, Summers MF, Johnson BA. Prediction of hydrogen and carbon chemical shifts from RNA using database mining and support vector regression. J Biomol NMR. 2015;63:39–52. doi: 10.1007/s10858-015-9961-4.
  • Frank AT, Bae SH, Stelzer AC. Prediction of RNA 1H and 13C chemical shifts: a structure based approach. J Phys Chem B. 2013;117:13497–13506. doi: 10.1021/jp407254m.
  • Frank AT, Law SM, Brooks CL. A simple and fast approach for predicting 1H and 13C chemical shifts: toward chemical shift-guided simulations of RNA. J Phys Chem B. 2014;118(42):12168–12175. doi: 10.1021/jp508342x.
  • Hosseinzadeh P, Bhardwaj G, Mulligan VK, et al. Comprehensive computational design of ordered peptide macrocycles. Science. 2017;358:1461–1466. doi: 10.1126/science.aap7577.
  • Johnson BA, Blevins RA. NMRView: a computer program for the visualization and analysis of NMR data. J Biomol NMR. 1994;4:603–614. doi: 10.1007/BF00404272.
  • Keane SC, Heng X, Lu K, et al. RNA structure. Structure of the HIV-1 RNA packaging signal. Science. 2015;348:917–921. doi: 10.1126/science.aaa9266.
  • Keller R. CARA: http://cara.nmr.ch
  • LeBlanc RM, Longhini AP, Le Grice SFJ, et al. Combining asymmetric 13C-labeling and isotopic filter/edit NOESY: a novel strategy for rapid and logical RNA resonance assignment. Nucleic Acids Res. 2017;45:e146. doi: 10.1093/nar/gkx591.
  • Lee W, Tonelli M, Markley JL. NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics. 2015;31:1325–1327. doi: 10.1093/bioinformatics/btu830.
  • Longhini AP, LeBlanc RM, Becette O, et al. Chemo-enzymatic synthesis of site-specific isotopically labeled nucleotides for use in NMR resonance assignment, dynamics and structural characterizations. Nucleic Acids Res. 2016;44:e52. doi: 10.1093/nar/gkv1333.
  • Lorenz R, Bernhart SH, Zu Siederdissen CH, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011;6(1):26. doi: 10.1186/1748-7188-6-26.
  • Lu K, Miyazaki Y, Summers MF. Isotope labeling strategies for NMR studies of RNA. J Biomol NMR. 2010;46:113–125. doi: 10.1007/s10858-009-9375-2.
  • Marchant J, Bax A, Summers MF. Accurate measurement of residual dipolar couplings in large RNAs by variable flip angle NMR. J Am Chem Soc. 2018;140:6978–6983. doi: 10.1021/jacs.8b03298.
  • Norris M, Fetler B, Marchant J, Johnson BA. NMRFx Processor: a cross-platform NMR data processing program. J Biomol NMR. 2016;65:205–216. doi: 10.1007/s10858-016-0049-6.
  • Schütz V, Purtuc V, Felsinger S, Robien W. CSEARCH-STEREO: a new generation of NMR database systems allowing three-dimensional spectrum prediction. Fresenius J Anal Chem. 1997;359:33–41. doi: 10.1007/s002160050531.
  • Skinner SP, Fogh RH, Boucher W, et al. CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis. J Biomol NMR. 2016;66:111–124. doi: 10.1007/s10858-016-0060-y.
  • Smurnyy YD, Blinov KA, Churanova TS, et al. Toward more reliable 13C and 1H chemical shift prediction: a systematic comparison of neural-network and least-squares regression based approaches. J Chem Inf Model. 2008;48:128–134. doi: 10.1021/ci700256n.
  • Steinbeck C, Krause S, Kuhn S. NMRShiftDB-constructing a free chemical information system with open-source components. J Chem Inf Comput Sci. 2003;43:1733–1739. doi: 10.1021/ci0341363.
  • Ulrich EL, Akutsu H, Doreleijers JF, et al. BioMagResBank. Nucleic Acids Res. 2008;36:D402–D408. doi: 10.1093/nar/gkm957.
  • Zhang K, Keane SC, Su Z, et al. Structure of the 30 kDa HIV-1 RNA dimerization signal by a hybrid Cryo-EM, NMR, and molecular dynamics approach. Structure. 2018;26:490–498. doi: 10.1016/j.str.2018.01.001.

Protein NMR Resonance Assignment

  • Living reference work entry
  • First Online: 10 March 2021


Takahisa Ikegami and Fuyuhiko Inagaki


Biosynthetic labeling; Main chain assignment; Side chain assignment; Spectroscopic assignment

Overview of Protein Resonance Assignment

Until the introduction of the sequential assignment procedure developed by Kurt Wüthrich and his coworkers in the 1980s (Wüthrich 1986), most protein assignments were made with reference to the corresponding crystal structures. The establishment of a sequential assignment procedure that does not depend on existing three-dimensional (3D) structures was, therefore, a milestone for protein NMR. Backbone amide proton (1HN) and α proton (1Hα) signals were sequentially assigned based on the distance information between 1HN(i) and 1Hα(i−1) and between 1HN(i) and 1Hα(i), and fragments of connected assignments were aligned on the amino acid sequence of the particular protein. This made NMR independent of X-ray crystallography, and the solution structures of proteins were determined by NMR using the assignment of proton signals...
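Once sequential connectivities have been identified, the chaining and alignment steps described above can be sketched in a few lines. The Python example below assumes a hypothetical `successor` mapping in which each spin system points to the spin system judged to be residue i+1 (for example from an HN(i) to Hα(i−1) connectivity), and a set of candidate residue types for each spin system. It chains spin systems into fragments and lists every sequence position where a fragment fits. It illustrates the logic only and is not taken from the chapter.

```python
def build_fragments(successor):
    """Chain spin systems into fragments using successor[s] = spin system i+1."""
    has_predecessor = set(successor.values())
    starts = sorted(s for s in successor if s not in has_predecessor)
    fragments = []
    for start in starts:
        fragment = [start]
        while fragment[-1] in successor:
            nxt = successor[fragment[-1]]
            if nxt in fragment:
                raise ValueError(f"cyclic sequential links at {nxt}")
            fragment.append(nxt)
        fragments.append(fragment)
    return fragments


def place_fragment(fragment, residue_types, sequence):
    """0-based start positions where every spin system's candidate types match."""
    n = len(fragment)
    return [start for start in range(len(sequence) - n + 1)
            if all(sequence[start + k] in residue_types[s]
                   for k, s in enumerate(fragment))]
```

A fragment that fits at more than one position, as short fragments often do, needs further connectivities or residue-type information before it can be placed unambiguously.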


Bax A, Grzesiek S (1993) Methodological advances in protein NMR. Acc Chem Res 26:131–138. https://doi.org/10.1021/ar00028a001

Article   CAS   Google Scholar  

Cavanagh J, Fairbrother W, Palmer AG, Rance M, Skeleton NJ (2007) Protein NMR spectroscopy, 2nd edn. Elsevier, Amsterdam

Google Scholar  

Driscoll PC, Gronenborn AM, Wingfield PT, Clore GM (1990) Determination of the secondary structure and molecular topology of interleukin-1 beta by use of two- and three-dimensional heteronuclear 15 N- 1 H NMR spectroscopy. Biochemistry 29:4668–4682. https://doi.org/10.1021/bi00471a023

Article   CAS   PubMed   Google Scholar  

Fesik SW, Eaton HL, Olejniczak ET, Zuiderweg ERP, McIntosh LP, Dahlquist FW (1990) 2D and 3D NMR spectroscopy employing 13 C-, 13 C magnetization transfer by isotropic mixing. spin system identification in large proteins. J Am Chem Soc 112:886–888. https://doi.org/10.1021/ja00158a069

Gorman SD, Sahu D, O'Rourke KF, Boehr DD (2018) Assigning methyl resonances for protein solution-state NMR studies. Methods 148:88–99. https://doi.org/10.1016/j.ymeth.2018.06.010

Article   CAS   PubMed   PubMed Central   Google Scholar  

Ikura M, Kay LE, Bax A (1990) A novel approach for sequential assignment of proton, carbon-13, and nitrogen-15 spectra of larger proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry 29:4659–4667. https://doi.org/10.1021/bi00471a022

Jabar S, Adams LA, Wang Y, Aurelio L, Graham B, Otting G (2017) Chemical tagging with tert -butyl and trimethylsilyl groups for measuring intermolecular nuclear Overhauser effects in a large protein-ligand complex. Chemistry 23:13033–13036. https://doi.org/10.1002/chem.201703531

Kainosho M, Tsuji T (1982) Assignment of the three methionyl carbonyl carbon resonances in Streptomyces subtilisin inhibitor by a carbon-13 and nitrogen-15 double-labeling technique. A new strategy for structural studies of proteins in solution. Biochemistry 21:6273–6279. https://doi.org/10.1021/bi00267a036

Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Ono AM, Güntert P (2006) Optimal isotope labelling for NMR protein structure determinations. Nature 440:52–57. https://doi.org/10.1038/nature04525

Kasai T, Ono S, Koshiba S, Yamamoto M, Tanaka T, Ikeda S, Kigawa T (2020) Amino-acid selective isotope labeling enables simultaneous overlapping signal decomposition and information extraction from NMR spectra. J Biomol NMR 74:125–137. https://doi.org/10.1007/s10858-019-00295-9

Kay LE (2001) Nuclear magnetic resonance methods for high molecular weight proteins: a study involving a complex of maltose binding protein and β-cyclodextrin. In: James TL, Dotsch V, Schmitz U (eds) Methods enzymol, vol 339. Academic, New York, pp 174–203. https://doi.org/10.1016/s0076-6879(01)39314-x

Chapter   Google Scholar  

Kay LE, Ikura M, Tschudin R, Bax A (1990) Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J Magn Reson 89:496–514. (Reprint, 213:423–441). https://doi.org/10.1016/j.jmr.2011.09.004

Maciejewski MW, Schuyler AD, Gryk MR, Moraru II, Romero PR, Ulrich EL, Eghbalnia HR, Livny M, Delaglio F, Hoch JC (2017) NMRbox: a resource for biomolecular NMR computation. Biophys J 112:1529–1534. https://doi.org/10.1016/j.bpj.2017.03.011

McIntosh LP, Dahlquist FW (1990) Biosynthetic incorporation of 15 N and 13 C for assignment and interpretation of nuclear magnetic resonance spectra of proteins. Q Rev Biophys 23:1–38

Morita EH, Shimizu M, Ogasawara T, Endo Y, Tanaka R, Kohno T (2004) A novel way of amino acid-specific assignment in 1 H- 15 N HSQC spectra with a wheat germ cell-free protein synthesis system. J Biomol NMR 30:37–45

Oh BH, Westler WM, Darba P, Markley JL (1988) Protein carbon-13 spin systems by a single two-dimensional nuclear magnetic resonance experiment. Science 240:908–911. https://doi.org/10.1126/science.3129784

Pritišanac I, Alderson TR, Güntert P (2020) Automated assignment of methyl NMR spectra from large proteins. Prog Nucl Magn Reson Spectrosc 118–119:54–73. https://doi.org/10.1016/j.pnmrs.2020.04.001

Reif B (2017) Proton-detection in biological MAS solid-state NMR spectroscopy. In: Webb GA (ed) Modern magnetic resonance. Springer, Cham, pp 1–33. https://doi.org/10.1007/978-3-319-28275-6

Rennella E, Huang R, Yu Z, Kay LE (2020) Exploring long-range cooperativity in the 20S proteasome core particle from Thermoplasma acidophilum using methyl-TROSY-based NMR. Proc Natl Acad Sci U S A 117:5298–5309. https://doi.org/10.1073/pnas.1920770117

Shen Y, Bax A (2015) Protein structural information derived from NMR chemical shift with the neural network program TALOS-N. Methods Mol Biol 1260:17–32. https://doi.org/10.1007/978-1-4939-2239-0_2

Stoffregen MC, Schwer MM, Renschler FA, Wiesner S (2012) Methionine scanning as an NMR tool for detecting and analyzing biomolecular interaction surfaces. Structure 20:573–581. https://doi.org/10.1016/j.str.2012.02.012

Torchia DA, Sparks SW, Bax A (1989) Staphylococcal nuclease: sequential assignments and solution structure. Biochemistry 28:5509–5524. https://doi.org/10.1021/bi00439a028

Tugarinov V, Kay LE (2003) Ile, Leu, and Val methyl assignments of the 723-residue malate synthase G using a new labeling strategy and novel NMR methods. J Am Chem Soc 125:13868–13878. https://doi.org/10.1021/ja030345s

Tugarinov V, Muhandiram R, Ayed A, Kay LE (2002) Four-dimensional NMR spectroscopy of a 723-residue protein: chemical shift assignments and secondary structure of malate synthase G. J Am Chem Soc 124:10025–10035. https://doi.org/10.1021/ja0205636

Wüthrich K (1986) NMR of proteins and nucleic acids. Wiley, New York

Wüthrich K, Wider K (2003) Transverse relaxation-optimized NMR spectroscopy with biomacromolecular structure in solution. Magn Reson Chem 41:S80–S88. https://doi.org/10.1002/mrc.1280

Author information

Authors and affiliations

Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan

Takahisa Ikegami

Department of Structural Biology, Hokkaido University, Kita-ku, Sapporo, Japan

Fuyuhiko Inagaki

Corresponding author

Correspondence to Takahisa Ikegami.

Editor information

Editors and affiliations

University Leicester MRC Centre, Leicester, UK

Gordon Roberts

Dept Biochemistry, University of Oxford, Oxford, UK

Anthony Watts

Section Editor information

No affiliation provided

Mitsu Ikura

Copyright information

© 2021 European Biophysical Societies' Association (EBSA)

About this entry

Cite this entry

Ikegami, T., Inagaki, F. (2021). Protein NMR Resonance Assignment. In: Roberts, G., Watts, A. (eds) Encyclopedia of Biophysics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35943-9_312-1

DOI : https://doi.org/10.1007/978-3-642-35943-9_312-1

Received : 13 November 2020

Accepted : 16 November 2020

Published : 10 March 2021

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-642-35943-9

Online ISBN : 978-3-642-35943-9

eBook Packages : Springer Reference Biomedicine and Life Sciences Reference Module Biomedical and Life Sciences

Open access. Published: 21 March 2023

Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing

  • Anthony C. Bishop¹ (ORCID: orcid.org/0000-0003-4853-1073)
  • Glorisé Torres-Montalvo¹ (ORCID: orcid.org/0000-0001-5240-6000)
  • Sravya Kotaru² (ORCID: orcid.org/0000-0002-5632-8190)
  • Kyle Mimun¹ (ORCID: orcid.org/0000-0003-0060-4149)
  • A. Joshua Wand¹,²,³,⁴ (ORCID: orcid.org/0000-0001-8341-0782)

Nature Communications volume 14, Article number: 1556 (2023)


  • Molecular conformation
  • Solution-state NMR

Assignment of resonances in nuclear magnetic resonance (NMR) spectra to specific atoms within a protein remains a labor-intensive and challenging task. Automating the assignment process remains a bottleneck in exploiting solution NMR spectroscopy for the study of protein structure-dynamics-function relationships. We present an approach to the assignment of backbone triple resonance spectra of proteins. A Bayesian statistical analysis of predicted and observed chemical shifts is used in conjunction with inter-spin connectivities provided by triple resonance spectroscopy to calculate a pseudo-energy potential that drives a simulated annealing search for the optimal set of resonance assignments. A C++ implementation of this approach, termed Bayesian Assisted Assignments by Simulated Annealing (BARASA), is tested against systems ranging in size up to more than 450 amino acids, including examples of intrinsically disordered proteins. BARASA is fast and robust, accommodates incomplete and incorrect information, and outperforms current algorithms, especially when data are sparse. It is also fast enough to allow real-time evaluation during data acquisition.

Introduction

Nuclear magnetic resonance (NMR) spectroscopy is unique in its ability to provide simultaneous and comprehensive structural and dynamical atomic-scale information about macromolecules such as proteins in solution [1–4]. However, a resonance observed in an NMR spectrum cannot be directly assigned to the atom(s) within the protein from which it arises without the time-intensive collection and analysis of additional spectra. Comprehensive mapping of individual resonances to specific atoms within a protein molecule is a general prerequisite for the successful analysis of protein structure and dynamics by NMR spectroscopy. Early applications of multidimensional homonuclear ¹H NMR data to this so-called resonance assignment problem relied heavily on human intervention. The first comprehensive approach was the sequential assignment method, which centered on the identification of J-coupled spin systems [5] that were then assembled through short distances between sequential residues revealed by nuclear Overhauser effect (NOE) interactions, using the identity of side chains to error-check against the primary structure [6, 7]. The subsequent main chain directed (MCD) assignment strategy [8, 9] formalized self-correcting cyclic patterns of backbone ¹H-¹H NOE interactions and provided a more robust algorithmic framework that somewhat relieved the complexity of identifying side chain resonances [10, 11]. While the MCD approach did lead to the first fully automated assignment of ¹H resonances to backbone hydrogens [11], automation of ¹H-based resonance assignment was generally frustrated by the overwhelming spectral degeneracy of multidimensional ¹H spectra of proteins and by interfering technical features such as the prominent diagonal.

The introduction of heteronuclear triple resonance spectroscopy [12–17] completely changed the landscape of the resonance assignment task by providing much greater resolution, generally higher quality data, and, most importantly, definitive rules with precise meanings for establishing connectivities (correlations) between backbone resonances. Triple resonance assignments of the protein backbone permit access, either directly or by tethering to side chain resonance assignments, to a wide range of dynamic phenomena [17, 18] and structural information [19–21].

Automated triple resonance algorithms have led to effectively complete backbone resonance assignments of smaller proteins with little human intervention and have greatly aided the assignment of larger systems [22–24]. Yet, even with the advent of transverse relaxation optimized spectroscopy (TROSY) [25], the comprehensive assignment of systems larger than 30 kDa remains remarkably rare. The limitations are quite analogous to those summarized above for earlier assignment strategies based exclusively on ¹H-¹H scalar and NOE interactions: increasing ambiguity in connectivities due to degeneracy, loss of resonances due to relaxation or artifacts, and other confounding spectral attributes are simply not sufficiently accommodated by current automated assignment strategies.

Here, we strive to overcome the issues of data sparseness and ambiguity by appealing to Bayesian statistics, which let us use the available information more effectively through the calculation of explicit probabilities. Importantly, this formalism also allows flexible and adaptable incorporation of chemical shift prediction and structural knowledge into the assignment process. By implementing the Bayesian analysis within a simulated annealing engine, we develop a robust and efficient search for optimal solutions. Protein assignment algorithms utilizing simulated annealing have been developed previously [26]. However, the stochastic algorithm described here takes advantage of readily available pre-existing structural models, both experimentally determined and predicted, and thereby more effectively exploits the rich information contained in structure-based predicted chemical shifts. We demonstrate how these invaluable restraints greatly aid the resonance assignment process, especially in cases where data may otherwise be sparse or even incorrect. We also compare the overall performance of BARASA against three highly cited assignment algorithms on a variety of experimental datasets.

Results and discussion

Bayesian assisted resonance assignments by simulated annealing (BARASA)

We designed an algorithm, termed BARASA, that uses a simulated annealing approach [27] to efficiently search the immense solution space for the optimal set of resonance assignments, starting from a set of raw crosspeaks derived from triple resonance spectra. The objective is to find the correct mapping of individual resonances to specific atoms within the protein molecule. The algorithm first assembles an initial set of spin systems based on an analysis of the crosspeak lists and the connectivity rules of the particular triple resonance experiments employed. Because of inherent degeneracy and missing or artifactual peaks, this process may yield neither an unambiguous nor a complete set of spin systems (see Methods). As a result, a given crosspeak may be associated with multiple spectrally overlapping spin systems; in such cases, the crosspeak is randomly placed in one of them. The simulated annealing search engine then randomly distributes the starting set of spin systems to specific residue positions. If there are more spin systems than residue positions, the excess spin systems are placed in a cache for later use, as described below. The energy of this initial state is calculated as the sum of the energies of the individual spin systems currently placed in residue positions. Each spin system energy is composed of two terms: the adjacency energy and the chemical shift energy. The adjacency energy describes the interaction between two spin systems mapped to adjacent positions in the amino acid sequence. It is minimized when the Cα(i), Cβ(i), and C’(i) shifts of a spin system match the Cα(i-1), Cβ(i-1), and C’(i-1) shifts of the spin system at the following residue in the sequence. The chemical shift energy, in contrast, describes the interaction between a spin system and its current residue position, i.e., it is defined by the local sequence and structure. It is minimized when the resonances of the spin system closely match the predicted values at the current residue position while failing to match the predicted values at all other residue positions. Application of Bayes’ theorem to the predicted and experimental shifts then provides a posterior probability for assigning each spin system to each location in the sequence, and the chemical shift energy is calculated from this probability (see Methods for a more detailed description).

After the initial energy calculation, a spin system or an individual crosspeak is chosen at random. A spin system is either moved to an unoccupied residue position, swapped with another spin system, or added to the cache. Spin systems and crosspeaks deposited in their respective caches have no priority and are selected from the cache at random. Similarly, if a chosen crosspeak can be productively added to the crosspeak cache, swapped with another crosspeak in an overlapping spin system, or moved to an overlapping spin system, the corresponding move is proposed. For every proposed crosspeak or spin system swap, the decision to accept the move is based on the energy of the system before and after the swap. Using an effective temperature \(T\), the Metropolis criterion [28] is applied (Eq. 1):

\[ P_{\mathrm{accept}} = \min\left(1,\; e^{-\Delta E/T}\right) \qquad (1) \]

Here \(P_{\mathrm{accept}}\) is the probability of accepting the swap and \(\Delta E\) is the change in energy due to the proposed swap. If \(\Delta E \le 0\), \(P_{\mathrm{accept}}\) is set to 1. If \(\Delta E > 0\), then \(0 < P_{\mathrm{accept}} < 1\), and a uniformly distributed random number \(r\) with \(0 \le r \le 1\) is generated. If \(r \le P_{\mathrm{accept}}\), the swap is accepted; otherwise, the swap is rejected and the system state is left unchanged. Random swap attempts continue until the average energy of the system no longer varies significantly. \(T\) is then decreased following a highly optimized schedule based on a quantity analogous to the specific heat of the system (see Methods). The system is further cooled and equilibrated in this manner until a set of termination criteria is met, at which point the annealing protocol ends. Finally, to ensure that the system has reached an energy minimum, a swap of each spin system with every other spin system, and of every crosspeak with every other possible crosspeak, is attempted, accepting only moves that decrease the energy. This post-annealing minimization routine is repeated 100 times. The entire procedure, from initialization through minimization, is repeated 20 times. The algorithm then selects, for each residue location, the spin system (if any) that was assigned to it in a majority of the annealing runs and builds a consensus assignment set. The consensus assignment set is further curated using criteria defined below to produce the final assignment set. The overall BARASA algorithm is outlined in Figs. 1 and 2.
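The annealing loop and majority-vote consensus described above can be illustrated on a toy problem. The sketch below is not BARASA's implementation: it orders hypothetical spin systems using only a Cα-based adjacency energy, applies the Metropolis criterion of Eq. 1 to pairwise swaps, substitutes a simple geometric cooling schedule for the specific-heat-based schedule described in the text, and omits crosspeak moves, caches, and the chemical shift energy. All function names, shift values, and parameters (`sigma`, `t_start`, `cooling`, `sweeps`) are illustrative assumptions.

```python
import math
import random
from collections import Counter

# Toy model (hypothetical): each spin system is a tuple (ca_i, ca_prev) in ppm.
# In the correct order, ca_i of position k equals ca_prev of position k + 1.


def adjacency_energy(order, spins, sigma=1.0):
    """Sum of squared, sigma-scaled mismatches between Ca(i) and the next Ca(i-1)."""
    energy = 0.0
    for a, b in zip(order, order[1:]):
        mismatch = spins[a][0] - spins[b][1]
        energy += (mismatch / sigma) ** 2
    return energy


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: accept downhill moves; accept uphill ones with exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() <= math.exp(-delta_e / temperature)


def anneal(spins, rng, t_start=1000.0, t_end=0.01, cooling=0.9, sweeps=200):
    """One annealing run using pairwise swaps of spin systems between positions."""
    order = list(range(len(spins)))
    rng.shuffle(order)  # random initial placement
    energy = adjacency_energy(order, spins)
    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]  # propose a swap
            new_energy = adjacency_energy(order, spins)
            if metropolis_accept(new_energy - energy, t, rng):
                energy = new_energy
            else:
                order[i], order[j] = order[j], order[i]  # reject: undo the swap
        t *= cooling  # geometric cooling (stand-in for a specific-heat schedule)
    return order, energy


def consensus(runs, n_positions):
    """Per position, keep the spin system placed there in a strict majority of runs."""
    result = []
    for pos in range(n_positions):
        spin, count = Counter(run[pos] for run in runs).most_common(1)[0]
        result.append(spin if count > len(runs) / 2 else None)
    return result


if __name__ == "__main__":
    ca = [50.0, 55.0, 60.0, 45.0, 62.0, 58.0]
    # Spin system k reports its own Ca and the Ca of residue k-1 (40.0 for the first).
    spins = [(ca[k], ca[k - 1] if k > 0 else 40.0) for k in range(len(ca))]
    runs = [anneal(spins, random.Random(seed))[0] for seed in range(5)]
    print(consensus(runs, len(spins)))
```

Positions where no spin system wins a strict majority are reported as `None`, mirroring the "(if any)" qualifier in the consensus step.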

Figure 1

a The search engine rests on a Bayesian-based simulated annealing protocol that uses a specific-heat mechanism to guide cooling. Crosspeak lists drawn from triple resonance spectra are assembled into putative spin systems, which are then randomly assigned to positions within the primary sequence of the protein. Sequential adjacency in the primary sequence is provided by apparent connectivities derived from triple resonance NMR spectra. Predicted chemical shifts, based on a high-resolution structural model or gleaned from empirical amino acid-specific distributions, are incorporated into the system energy using Bayesian statistics. Throughout annealing, crosspeaks may move among spin systems with overlapping resonances, changing the energies of the affected spin systems. Annealing involves Monte Carlo swapping of both crosspeak assignments to spin systems and spin system assignments to locations in the sequence. The concept of dynamic swapping of individual crosspeaks or entire spin systems is outlined in Fig. 2. Annealing continues until energy equilibration is achieved; the temperature is then lowered and the system re-equilibrated. Annealing is stopped when the termination criteria are met, and a local minimization routine is then performed. b The final resonance assignments are developed from the results of multiple independent simulated annealing runs. c Ribbon representation of maltose binding protein (PDB code: 1DMB, https://doi.org/10.2210/pdb1DMB/pdb) color-coded according to assignment status following analysis by BARASA: correctly assigned residues (blue), unassigned residues (white), and prolines (red). See main text for further details.

Figure 2

a Spin systems (orange puzzle pieces) begin in the cache (black box) and are initialized by random assignment to the sequence (purple pieces). Spin systems can then be swapped with others or moved to different locations in the sequence or to the cache. Spin systems and crosspeaks in their respective caches have no priority and are selected at random. Swaps are accepted or rejected with a probability based on the energy change of the proposed swap. b The energy of each spin system depends on how well it fits the adjacent spin system (adjacency energy) and the predicted shifts for that residue location (chemical shift energy). Exchanging crosspeaks between spin systems can be thought of as changing the shape of a puzzle piece. See main text and supplementary material for details.
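The two energy terms in panel b can be sketched as follows. This is a minimal illustration under stated assumptions, not the energy function defined in the paper's Methods: shift errors are modeled as independent Gaussians with per-atom widths, the prior over residue positions is uniform, the chemical shift energy is taken as the negative log posterior of the current position, and the adjacency energy is a sum of squared, tolerance-scaled mismatches. The dictionary layout, the atom keys (`"CA"`, `"CB"`, `"CO"`), and the weight `w_adj` are hypothetical. Normalizing over all positions is what penalizes a spin system whose shifts also match the predictions at other positions.

```python
import math


def log_likelihood(observed, predicted, sigma):
    """Sum of Gaussian log-densities over atoms present in both observed and predicted."""
    ll = 0.0
    for atom, value in observed.items():
        if atom in predicted:  # shifts missing from either side contribute nothing
            z = (value - predicted[atom]) / sigma[atom]
            ll += -0.5 * z * z - math.log(sigma[atom] * math.sqrt(2.0 * math.pi))
    return ll


def log_posterior(observed, position, predictions, sigma):
    """Bayes' theorem with a uniform prior over positions, evaluated in log space."""
    lls = [log_likelihood(observed, pred, sigma) for pred in predictions]
    m = max(lls)
    log_evidence = m + math.log(sum(math.exp(ll - m) for ll in lls))  # log-sum-exp
    return lls[position] - log_evidence


def chemical_shift_energy(observed, position, predictions, sigma):
    """Hypothetical choice: energy = -ln P(position | observed shifts)."""
    return -log_posterior(observed, position, predictions, sigma)


def adjacency_energy(spin, next_spin, tol):
    """Mismatch between the (i) shifts of `spin` and the (i-1) shifts of its successor."""
    energy = 0.0
    for atom in ("CA", "CB", "CO"):
        if atom in spin["i"] and atom in next_spin["i-1"]:
            energy += ((spin["i"][atom] - next_spin["i-1"][atom]) / tol[atom]) ** 2
    return energy


def spin_system_energy(spin, next_spin, position, predictions, sigma, tol, w_adj=1.0):
    """Chemical shift energy plus (optionally weighted) adjacency energy."""
    energy = chemical_shift_energy(spin["i"], position, predictions, sigma)
    if next_spin is not None:  # the last residue has no successor
        energy += w_adj * adjacency_energy(spin, next_spin, tol)
    return energy


if __name__ == "__main__":
    predictions = [{"CA": 56.0, "CB": 30.0}, {"CA": 62.0, "CB": 40.0}]  # two positions
    sigma = {"CA": 1.0, "CB": 1.0}
    observed = {"CA": 56.2, "CB": 30.5}
    for pos in range(len(predictions)):
        print(pos, chemical_shift_energy(observed, pos, predictions, sigma))
```

Working in log space avoids underflow when a spin system lies many prediction errors away from most positions.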

BARASA is accurate, robust, and fast

We tested BARASA on a set of six folded protein systems of varying size and topology: human interleukin-1 receptor antagonist C66A, C122A (IL-1Ra, 152 residues, 17.1 kDa), human interleukin-1β (IL-1β, 154 residues, 17.5 kDa), S. solfataricus indole-3-glycerol phosphate synthase R43S (IGPS, 248 residues, 28.4 kDa), E. coli maltose binding protein (MBP, 371 residues, 40.8 kDa), the first cyclization domain from the Y. pestis yersiniabactin non-ribosomal peptide synthetase (Cy1, 453 residues, 51.9 kDa), and E. coli thymidylate synthase (ecTS, 264 residues, 61.0 kDa homodimer). In addition, we challenged the algorithm with two so-called intrinsically disordered proteins (IDPs): the V5 domain (residues 606–672) of human protein kinase C (V5dm, 68 residues, 7.7 kDa) and the intrinsically disordered region of human ANP32A (hIDD, 110 residues, 12.8 kDa). All crosspeak lists were derived from triple resonance data (Table 1). Crosspeak positions were taken from the canonical triple resonance spectra used for protein assignment (i.e., HSQC, HNCO [29], HN(CA)CO [30], HNCA [31], HN(CO)CA [31], HNCACB [32], and HN(CO)CACB/CBCA(CO)NH [33]; see Supplementary Table S1), with the exception of hIDD, for which crosspeaks were derived from provided spin systems. To generate these simulated crosspeaks, Gaussian error was added to the hIDD spin system resonance values (see Methods). Four of the data sets (IL-1Ra, IL-1β, IGPS, and MBP) were obtained in our laboratory. Crosspeak lists for Cy1, ecTS, and V5dm, and spin systems for hIDD, were kindly provided by Drs. Dominique Frueh (Johns Hopkins University), Andrew Lee (University of North Carolina at Chapel Hill), Tatyana Igumenova (Texas A&M University), and Martin Blackledge (Institut de Biologie Structurale), respectively.
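For hIDD, crosspeaks were simulated by adding Gaussian error to spin system resonances. A minimal sketch of that idea follows, assuming a spin system is stored as a dictionary of amide and carbon shifts and that each experiment reports the carbon shifts listed in the hypothetical `EXPERIMENT_CARBONS` table (a simplified view of the standard transfer pathways; for example, HNCA correlates the amide with both Cα(i) and Cα(i-1)). The noise widths in `NOISE_SD` are placeholders; the values used for hIDD are described in the Methods and are not reproduced here.

```python
import random

# Placeholder noise widths in ppm (assumptions, not the published values).
NOISE_SD = {"H": 0.01, "N": 0.1, "C": 0.1}

# Carbon shifts reported by each experiment, keyed by spin-system field names.
EXPERIMENT_CARBONS = {
    "HNCA": ("CA", "CA_prev"),
    "HN(CO)CA": ("CA_prev",),
    "HNCACB": ("CA", "CB", "CA_prev", "CB_prev"),
    "CBCA(CO)NH": ("CA_prev", "CB_prev"),
}


def simulate_crosspeaks(spin_systems, rng, noise=NOISE_SD):
    """Turn spin systems into (experiment, carbon_field, H, N, C) crosspeaks with
    independent Gaussian error added to every dimension of every peak."""
    peaks = []
    for spin in spin_systems:
        for experiment, fields in EXPERIMENT_CARBONS.items():
            for field in fields:
                if field not in spin:  # e.g. glycine has no CB
                    continue
                peaks.append((
                    experiment,
                    field,
                    rng.gauss(spin["H"], noise["H"]),
                    rng.gauss(spin["N"], noise["N"]),
                    rng.gauss(spin[field], noise["C"]),
                ))
    return peaks


if __name__ == "__main__":
    rng = random.Random(7)
    gly = {"H": 8.3, "N": 109.5, "CA": 45.2, "CA_prev": 56.1, "CB_prev": 30.4}
    print(len(simulate_crosspeaks([gly], rng)))  # 8 peaks: no Cb(i) for glycine
```

Drawing independent amide noise for every peak mimics the small shift differences seen when the same amide is observed in separately acquired spectra.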

The results from BARASA were compared to reference assignments to assess program performance. Reference assignments were obtained from the BMRB, directly from another laboratory, or by our own manual assignment (Table 1). Deposited assignments were manually mapped onto the acquired spectra for comparison, allowing a small shift in crosspeak positions between the deposited assignments and the acquired spectra to account for differences in experimental conditions. In addition, a small number of resonances assigned in the deposited data sets for IL-1β were not present in the acquired spectra. These were removed from the reference assignments and treated as unassigned when assessing algorithm performance (Supplementary Table 3). For the most part, the reference assignments were considered complete, although in a few cases BARASA identified a small number of additional assignments that were confirmed manually and included in the reference assignments (Supplementary Tables 6–9). For each residue position, BARASA either outputs the spin system and associated resonances assigned to that position or marks the position as unassigned. Each residue's BARASA assignment was classified as matching, missing, or mismatching relative to the reference assignments. A residue was considered matching if the amide group assigned to it by the algorithm was the same as in the reference, or if it was unassigned both by BARASA and in the reference. A residue was designated missing if an amide group was assigned to that position in the reference assignments but BARASA left the position unassigned. Finally, a residue was labeled mismatching if BARASA assigned it an amide group that differed from the reference assignment, or if BARASA assigned an amide group to a residue that was unassigned in the reference.
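The matching/missing/mismatching rules above amount to a small decision procedure. The sketch assumes each residue's assignment is represented by a hashable amide-group identifier (hypothetical labels such as `"A12"`) or `None` when unassigned; it illustrates the stated rules and is not code from the paper.

```python
from enum import Enum


class Status(Enum):
    MATCH = "matching"
    MISSING = "missing"
    MISMATCH = "mismatching"


def classify(assigned, reference):
    """Classify one residue; each argument is an amide-group ID or None (unassigned)."""
    if assigned == reference:  # same amide group, or unassigned in both
        return Status.MATCH
    if assigned is None:  # reference has an amide group, the algorithm does not
        return Status.MISSING
    # Different amide group, or an assignment where the reference has none.
    return Status.MISMATCH


def summarize(assigned_list, reference_list):
    """Fraction of residues in each category, in the style of Fig. 3."""
    if len(assigned_list) != len(reference_list):
        raise ValueError("assignment lists must cover the same residues")
    counts = {status: 0 for status in Status}
    for assigned, reference in zip(assigned_list, reference_list):
        counts[classify(assigned, reference)] += 1
    n = len(reference_list)
    return {status.value: counts[status] / n for status in Status}
```

Note that a residue unassigned in both lists counts as matching, so the matching fraction rewards correctly leaving unassignable positions (for example, prolines) empty.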

In general, when utilizing structure-based chemical shifts and crosspeak lists derived from a comprehensive set of triple resonance experiments, BARASA produces (nearly) complete assignments relative to the manually curated reference assignments and, most importantly, very few errors (Fig. 3 and Supplementary Table 2). Individual statistics for each assignment are listed in Supplementary Tables 3–10. BARASA had relatively more difficulty with the Cy1 and IGPS examples. This is likely due to greater variance in backbone resonance chemical shifts among the different spectra, relative to the other test cases, arising from the use of multiple independently prepared samples; nonetheless, overall performance remained very good (Fig. 3). In the case of hIDD, a relatively high apparent mismatch rate is observed. On closer examination, the mismatching assignments made by BARASA were all assignments not previously reported. Many of these previously unreported assignments fall within low-complexity regions of the sequence (Supplementary Table 10), which likely explains why they were difficult to assign manually. While there are no independent data supporting their veracity, the assignments proposed by BARASA and, as discussed further below, by the next-best-performing automated assignment algorithm FLYA [34] are highly similar and are likely to be largely correct.

Figure 3

Comparison of automated assignment algorithms. Results of automated resonance assignment by BARASA using raw crosspeak lists drawn from a relatively comprehensive set of triple resonance experiments, compared to manually curated resonance assignments obtained for eight test proteins: interleukin-1β (IL-1β), interleukin-1 receptor antagonist (C66A, C122A) (IL-1Ra), indole-3-glycerol phosphate synthase (R43S) (IGPS), maltose binding protein (MBP), non-ribosomal peptide synthetase (Cy1), thymidylate synthase (ecTS), V5 domain of protein kinase C (V5dm), and intrinsically disordered region of human ANP32A (hIDD). Shown are the fractions of residues that are accurately matched (green), mismatched (magenta), or missing (i.e., unassigned) (blue) relative to the reference assignments. *In the case of hIDD, a number of de novo assignments were made by BARASA; these are counted as mismatching with the reference assignments. See main text and Table 1. Source data are provided in the Source Data file.

As predicted shift restraints during annealing, BARASA used SHIFTX+ [35] predicted chemical shifts for the globular test proteins and random coil chemical shifts [36, 37] for the IDP examples (see Methods). SHIFTX+ was chosen because it appears to be among the best-performing reported chemical shift prediction algorithms that rely solely on three-dimensional structural information and other physical parameters (i.e., temperature, pH). The related algorithm SHIFTX2, although it gives more accurate predictions, relies on the analysis of shifts from homologous proteins in addition to the three-dimensional structural inputs specific to the protein being analyzed. We were concerned that the accuracy of SHIFTX2 would vary with the number of homologs available and, when homologs are sparse, would result in significantly larger errors than those reported for the average case. Because accurate estimation of prediction error is crucial to the Bayesian analysis (Methods), inaccurate or unaccounted-for variance in prediction errors could compromise performance. Furthermore, because SHIFTX2 searches for the known chemical shifts of homologous sequences as part of its prediction, it would use the previously assigned shifts of our test proteins present in the BMRB to generate the predicted shifts. Such shifts would not generally be available for the de novo assignment of a protein, so this would be an invalid test of BARASA. We also note that predicted chemical shifts generated by SPARTA+ [38] gave results similar to those obtained with SHIFTX+ (Supplementary Table 11).

In this regard, it is important to appreciate that, given the distributions of chemical shifts, whether predicted or documented in the BMRB, values outside the error range are statistically expected. For example, if the distribution is taken as Gaussian and the standard deviation is used as the prediction error (see Methods), approximately 32% of all predictions would be expected to fall outside the considered error range. This is what is observed. Supplementary Tables 3–10 list the likelihoods of the spin systems for the various test proteins. These likelihoods represent the probability of observing the experimental shifts given that the assignment is correct, and range from 0 to 1. Likelihoods lower than 0.32 correspond to spin systems whose predicted resonance chemical shifts are, on average, beyond the specified error range; such spin systems are nevertheless well accommodated by BARASA.
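The 32% figure follows from the Gaussian tail: the probability that a normally distributed error exceeds one standard deviation in either direction is erfc(1/√2) ≈ 0.317. The sketch below computes this value and checks it by sampling. The `shift_likelihood` function is a hypothetical two-tailed definition chosen only to show why a value near 0.32 marks a deviation of about one prediction error; the likelihood reported in the Supplementary Tables is defined in the paper's Methods and may differ.

```python
import math
import random


def fraction_outside(k_sigma):
    """Two-tailed probability that a Gaussian deviate lies beyond +/- k standard deviations."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def shift_likelihood(observed, predicted, sigma):
    """Hypothetical per-shift likelihood: chance of a deviation at least this large."""
    return fraction_outside(abs(observed - predicted) / sigma)


def empirical_fraction_outside(n, rng, k_sigma=1.0):
    """Monte Carlo check: fraction of standard-normal draws with |z| > k."""
    return sum(abs(rng.gauss(0.0, 1.0)) > k_sigma for _ in range(n)) / n


if __name__ == "__main__":
    print(round(fraction_outside(1.0), 4))  # 0.3173
    print(empirical_fraction_outside(100_000, random.Random(1)))
    # A shift exactly one prediction error away from its predicted value:
    print(round(shift_likelihood(57.0, 56.0, 1.0), 4))  # 0.3173
```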

Finally, BARASA produces a curated set of assignments from 20 annealing runs in under 1 hour for each system tested (see Supplementary Table 12). Given this combination of high accuracy and sub-hour runtimes, the advantages of BARASA become even more apparent for large proteins with suboptimal data sets.

The performance of BARASA with suboptimal data sets

The rather complete crosspeak lists from an extensive set of triple resonance experiments for each test protein provide valuable benchmarks for validating BARASA, but they are arguably not representative of the difficult protein systems that often challenge current applications of protein NMR spectroscopy. To examine the performance of BARASA with missing data and to identify the most impactful triple resonance information, individual crosspeaks, or all crosspeaks of entire spin systems, were randomly discarded from the MBP and ecTS data sets to generate compromised data sets emulating data collection on challenging protein systems. Individual crosspeaks were randomly retained with a probability that depended on the crosspeak type (i.e., Cα, Cβ, or CO resonance). This was done over a wide range of retention probabilities to produce many distinct data sets spanning a wide range of data completeness. These depleted peak lists were then used as input to BARASA; the results are provided in Supplementary Tables 13 and 14. In this way, the relative importance and completeness of different types of spectral data, as well as the effects of entirely missing spin systems, could be probed. In addition, a key question was the extent to which structure-based chemical shifts, as opposed to general BMRB residue-specific statistics, can rescue and aid the assignment process.
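Per-type random retention of crosspeaks, as described above, can be sketched as follows. The peak representation (a dict with a `"type"` key of `"CA"`, `"CB"`, or `"CO"`), the helper names, and the replicate count are hypothetical; the grid in the usage example reuses the retention probabilities quoted for the Fig. 4 example (88% Cα, 25% Cβ, and 0% or 75% CO).

```python
import random


def deplete_crosspeaks(peaks, retention, rng):
    """Keep each crosspeak independently with a probability set by its carbon type.

    `peaks` is a list of dicts with a "type" key ("CA", "CB" or "CO");
    `retention` maps each type to a probability in [0, 1].
    """
    return [peak for peak in peaks if rng.random() < retention[peak["type"]]]


def depletion_series(peaks, grid, rng, replicates=10):
    """Yield (retention, depleted_peaks) for every retention setting on a grid."""
    for p_ca, p_cb, p_co in grid:
        retention = {"CA": p_ca, "CB": p_cb, "CO": p_co}
        for _ in range(replicates):
            yield retention, deplete_crosspeaks(peaks, retention, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    peaks = [{"type": t} for t in ("CA", "CB", "CO") * 300]
    grid = [(0.88, 0.25, 0.0), (0.88, 0.25, 0.75)]  # retention for Ca, Cb, CO
    for retention, depleted in depletion_series(peaks, grid, rng, replicates=2):
        print(retention, len(depleted))
```

Because `rng.random()` returns values in [0, 1), a retention of 1.0 keeps every peak of that type and 0.0 removes all of them.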

Figure 4 illustrates the robustness of BARASA when spectral data are missing. This specific example was generated using retention probabilities of 88% and 25% for the Cα- and Cβ-based information, respectively, and retention probabilities of either 0% or 75% for the CO-based information. Relying on the BMRB database for predicted shifts, rather than on structure-based shifts, yielded poor performance. In contrast, the use of structure-based SHIFTX+ [35] predictions entirely rescues the resonance assignment. These data indicate that structure-based chemical shift predictions serve as a powerful restraint in protein assignment, one large enough to potentially surpass the information provided by the CO experimental pair under many circumstances. This is likely because spin system adjacency is established adequately by the Cα and Cβ spectral information, so the remaining assignment ambiguity is due to residue type matching; CO resonances carry little residue type information and offer little help in this respect. We do not believe this observation to be an artifact of the parameterization of the energy function, since carbonyl-derived connectivities are weighted roughly the same as the chemical shift probability (Methods). As such, the energy provided by CO connectivity information would be of a similar magnitude to the total chemical shift energy of the spin system.

Figure 4

Shown are the fractions of residues that are accurately matched (green), mismatched (magenta), or missing (i.e., unassigned) (blue) relative to the reference assignments. Panels a–d correspond to results from input data sets where entire spin systems were discarded from the crosspeak lists. The ordinate axis is the fraction of retained spin systems, and the dashed lines indicate the maximum fraction of possible matching assignments. Effects of spin system depletion on the analysis of MBP, with spin systems discarded either individually at random (a) or as stretches of five consecutive residues (b). A similar analysis of ecTS with either individual (c) or groups of five consecutive (d) spin systems discarded. For the 0.8 and 0.6 fraction-retained conditions, ten random data sets retaining the indicated fraction of spin systems were generated. The performance of BARASA on each data set is shown as a single solid orange circle, with the bar height representing the arithmetic mean. The full data set (“1.0” condition) results were taken from Fig. 3; only one result with the full data set was measured to avoid conflating run-to-run variation with variation due to differences in the input data set. Effects of restricting connectivity information to a single pair of triple resonance experiments, using either residue-type statistics (BMRB) (e) or structure-based (SHIFTX+) (f) chemical shift predictions for MBP. The same analysis for ecTS using residue-type statistics (BMRB) (g) or structure-based (SHIFTX+) (h) chemical shift predictions. Effect of random depletion of crosspeaks from the comprehensive set of triple resonance experiments, retaining the indicated percentage of each crosspeak type, for the MBP (i) and ecTS (j) data sets, used with residue-type statistics (BMRB) or structure-based (SHIFTX+) predicted chemical shifts. Results of ten individual runs (n = 10) are plotted as solid orange circles, and bar heights represent the arithmetic mean. Source data are provided as a Source Data file.

Randomly retained spin system data sets were generated in two ways: either all crosspeaks of randomly chosen spin systems assigned in the reference assignments were discarded from the input data set until only the indicated fraction of the assigned spin systems remained, or crosspeaks were discarded in the same manner with the added condition that spin systems were removed in sets of five randomly located, but sequence-contiguous, spin systems. The latter condition simulates the common situation in which exchange broadening arising from the physical motion of contiguous stretches of sequence (e.g., loops) results in loss of amide resonances. In both cases, BARASA still produces the overwhelming majority of the possible assignments without errors, even when up to 40% of the spin systems are missing (Fig.  4 ). Performance differs little whether the missing data are localized or distributed across the sequence. The performance of BARASA when challenged with artifact peaks, which often arise from low-concentration or unstable samples or from instrumentation, was also examined. In this case, a depleted data set from above was augmented with randomly generated artifact peaks. Only a modest decrease in performance is observed even when 20% of the crosspeak list entries are artifactual (Supplementary Fig.  1 ).
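The two depletion schemes can be illustrated with a short sketch. This is not BARASA's code: the function name `deplete_spin_systems`, the indexing of spin systems by sequence position, and the choice to truncate the last window so the retained count is exact are all assumptions made for illustration.

```python
import random


def deplete_spin_systems(n_systems, keep_fraction, block=1, seed=None):
    """Return sorted sequence indices of the spin systems that are retained.

    Hypothetical helper. block=1 discards spin systems one at a time at
    random positions; block=5 discards windows of five sequence-contiguous
    spin systems, mimicking exchange-broadened loops. The final window is
    truncated so that exactly round(keep_fraction * n_systems) remain.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    if not 1 <= block <= n_systems:
        raise ValueError("block must lie in [1, n_systems]")
    rng = random.Random(seed)
    n_keep = round(keep_fraction * n_systems)
    kept = [True] * n_systems
    n_kept = n_systems
    while n_kept > n_keep:
        # pick a window start so the whole window fits inside the sequence
        start = rng.randrange(n_systems - block + 1)
        for i in range(start, start + block):
            if kept[i] and n_kept > n_keep:
                kept[i] = False
                n_kept -= 1
    return [i for i, k in enumerate(kept) if k]


# ten replicate data sets at the 0.6 condition with contiguous-block depletion
replicates = [deplete_spin_systems(100, 0.6, block=5, seed=s) for s in range(10)]
```

Each replicate would then be turned into an input data set by dropping every crosspeak belonging to a discarded spin system.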

Even with the considerable time savings introduced by non-uniform sampling 39 , collection of NMR data on proteins is still time intensive. The strong performance of BARASA on missing data within a comprehensive set of triple resonance experiments raised the possibility that BARASA could tolerate a reduced set of triple resonance experiments. We tested this hypothesis with ecTS and MBP by analyzing information from a single triple resonance experimental pair (e.g., HNCA and HN(CO)CA) combined with BMRB or SHIFTX+ predicted shifts. The Cα- and Cβ-type triple resonance pairs are equally useful in the BARASA assignment process when SHIFTX+ shifts are provided, but the Cβ information becomes relatively more effective when relying on BMRB amino acid distributions (Fig.  4 ). This is most likely due to the greater residue-type information intrinsic to the Cβ resonance. Overall, BARASA performs extremely well with only the Cα- or the Cβ-type triple resonance experimental pair. In contrast, the CO-type triple resonance experimental pair used alone is much less effective, likely because carbonyl carbon shifts are less sensitive to amino acid type and local structure.

Comparison to alternate automated resonance assignment algorithms

Computer-assisted resonance assignment strategies for the analysis of triple resonance spectra have been employed for over two decades. For comparison, three highly cited algorithms were tested alongside BARASA: FLYA, AutoAssign 22 , and I-PINE 40 . The same crosspeak lists derived from the comprehensive set of triple resonance experiments were used for all four algorithms (Fig.  5 ). In all test cases, BARASA achieved the highest fraction of assignments matching the reference assignments among all the algorithms. BARASA outperformed AutoAssign and I-PINE by considerable margins, most notably with the two IDPs examined, while offering only a marginal improvement over FLYA (Supplementary Table  2 ). Importantly, BARASA made few mismatching assignments (<3%), whereas I-PINE had up to 20% mismatches, meaning that about 1 in 5 of its assignments were incorrect. For these reasons, AutoAssign and I-PINE were not examined further.

figure 5

Performance of BARASA, FLYA, AutoAssign (AA), and I-PINE against reference triple resonance assignments of eight protein systems: a IL-1β; b IL-1Ra; c IGPS; d MBP; e CY1; f ecTS; g V5dm; h hIDD. Shown are the fractions of residues that are accurately matched to the reference assignments (green), incorrectly matched (magenta) or missing (i.e., unassigned) (blue). *BARASA and, to a lesser extent, FLYA extended the reference assignments for hIDD considerably (Supplementary Table  10 ). These extended assignments are therefore counted here as mismatching. Source data are provided as a Source Data file.

The marginal advantage of BARASA over FLYA on a comprehensive triple resonance data set prompted us to examine their behavior in the more challenging situations commonly encountered in practice. BARASA's performance was compared against that of FLYA in settings with a significant amount of missing data. MBP and ecTS crosspeak lists with varying retention probabilities were generated and used as input for BARASA and FLYA (Fig.  6 ). BARASA generated a higher assignment match rate in all scenarios, and the difference in performance between the algorithms grew as the data became increasingly sparse. In addition, the mismatch rates of the two algorithms remained similar. These results demonstrate that BARASA performs well when a large quantity of data is missing, clearly outperforming FLYA under these conditions.

figure 6

The effects of random crosspeak depletion on the analysis of MBP ( a ) and ecTS ( b ) comprehensive triple resonance data sets with partial retention of the indicated crosspeak types (see text and Fig.  4 ). Shown are the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) to the reference assignments. Ten independent data sets ( n  = 10) were randomly generated for each depletion condition. The results of analysis by BARASA for each data set are shown as solid orange circles and the bar heights correspond to the mean. Source data are provided as a Source Data file.

Use of predicted versus experimentally determined structural models

It is clear from Fig.  4 that the use of structure-based chemical shift predictions provides significant advantages over simple residue-type predictions derived from empirical distributions. This is particularly true in the case of Cy1, which is perhaps an exemplar of the challenges facing modern protein NMR; its assignment required a battery of experimental spectra and labeling schemes 41 . The sheer number of samples and experiments required resulted in relatively high variation in resonance positions among the spectra. The resonance assignment was carried out without an experimentally determined structural model, and the closest homolog has only 38% sequence identity. Accordingly, the resonance assignment of Cy1 must be considered a significant achievement.

The absence of an experimentally determined atomic-resolution structure of the protein of interest is common and can severely limit the resonance assignment process. However, powerful structure-prediction algorithms have recently been introduced 42 , and we sought to learn how the availability of structures predicted by the AlphaFold2 algorithm influences the performance of BARASA. Chemical shifts predicted by SHIFTX+ from the AlphaFold2-predicted structure of Cy1 were used for analysis by BARASA. Using only residue-type information based on the BMRB resulted in poor performance. However, when using the chemical shifts predicted from the predicted structure of Cy1, BARASA recapitulated the performance it achieved with the NMR-derived structure and a comprehensive set of triple resonance experiments. In addition, BARASA performed very well using subsets of triple resonance experiment pairs and significantly outperformed FLYA (Fig.  7 ). A similar level of success using SHIFTX+ in concert with AlphaFold2-predicted structures was observed across the test data sets (Supplementary Table  15 ). Taken together, these data suggest that the lack of an experimental structure is unlikely to prevent BARASA from reaching its full capability.

figure 7

Resonance assignment by BARASA using the indicated crosspeak types from the triple resonance spectra and either residue-type (BMRB) chemical shift statistics (a) or chemical shifts predicted by SHIFTX+ from a structural model provided by AlphaFold2 (b). Triple resonance data sets include the peaks from the following spectra: HNCA/HN(CO)CA (Cα), HN(CA)CB/HN(COCA)CB (Cβ) and HNCO/HN(CA)CO (CO). Bar heights indicate the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) relative to the reference assignments. Equivalent runs with FLYA (c) using the data set of (b) reinforce the conclusion that BARASA is more robust to non-ideal data. Source data are provided as a Source Data file.

In summary, we have demonstrated that Bayesian-based simulated annealing that combines sequential relationships derived from triple resonance spectra with chemical shifts predicted from a high-resolution structural model can greatly facilitate the triple resonance backbone assignment of proteins. The implementation of this strategy in BARASA is robust to incomplete spin system definition (sparseness) and to the overall complexity of the resonance assignment challenge (protein size). Importantly, BARASA is relatively conservative and makes few errors. An optimized annealing strategy that uses the specific heat to guide the cooling schedule results in very rapid analysis. This speed, combined with the robustness described above, positions BARASA to inform data acquisition in real time, which becomes increasingly feasible with automated crosspeak picking. Iterative examination by BARASA of sequentially acquired triple resonance spectra could, in principle, allow the user to determine whether a satisfactory level of assignment can be achieved without further data acquisition, thereby saving valuable spectrometer time. Overall, the BARASA algorithm makes it possible to assign unusually difficult protein systems easily and robustly, simplifying an otherwise challenging task. The combination of fast and robust backbone resonance assignments with structure-based methyl resonance assignments 43 , 44 , 45 , 46 , 47 , 48 will considerably lower the resonance assignment barrier and allow the power of NMR spectroscopy to be applied more readily to otherwise challenging proteins.

NMR sample production

A vector encoding the gene for interleukin-1β (IL-1β) was transformed into E. coli BL21(DE3) cells, and the protein was expressed in 1 L of 95% D2O M9 medium containing 15NH4Cl and 2H,13C-glucose as the sole nitrogen and carbon sources, respectively. Cells were grown at 37 °C to an OD600 of 0.9 and induced with 1 mM IPTG. Induction continued for 4 h at 37 °C, after which cells were harvested by centrifugation at 3500 × g and frozen overnight. The cell pellet was then thawed and resuspended in 10 mM potassium phosphate pH 8.0, 0.2 mM EDTA, 5 mM DTT and 1 mM PMSF. The cells were lysed by sonication and centrifuged at 32,000 × g for 30 min at 4 °C. The lysate was then brought to 80% saturation with (NH4)2SO4 and stirred for 1 h at 4 °C. The suspension was centrifuged for 30 min at 32,000 × g and 4 °C, and the pellet was resuspended in 25 mM ammonium acetate pH 4.5, 1 mM BME and dialyzed overnight (8 kDa MWCO) against the same buffer at 4 °C. The dialyzed protein was loaded onto a HiTrap Capto S column (Cytiva Life Sciences) equilibrated in 25 mM ammonium acetate pH 4.5, 1 mM BME and eluted with a linear gradient up to 500 mM ammonium acetate pH 4.5, 1 mM BME. The protein was then frozen and lyophilized. The lyophilized protein was dissolved in 20 mM Tris pH 8.0, 7 M urea, 20 mM DTT and added dropwise to 20 volumes of 20 mM Tris, 100 mM NaCl, 5 mM DTT pH 8.0 under constant stirring. The refolded protein was dialyzed against 50 mM sodium acetate pH 5.0, 5 mM DTT and concentrated to 0.67 mM. To this sample, 0.02% NaN3, 100 μM DSS and 5% D2O were added. Triple resonance spectra were acquired at 23 °C on an 800 MHz (1H) Bruker NEO spectrometer running TopSpin and equipped with a CryoProbe.

A vector encoding the gene for human interleukin-1 receptor antagonist (IL-1Ra) containing C66A/C122A amino acid substitutions was expressed in E. coli BL21(DE3) cells in M9 minimal medium containing 15NH4Cl and 13C-glucose as the sole nitrogen and carbon sources, respectively. The culture was centrifuged at 5000 rpm, and the cell pellet was resuspended in 20 mM Tris, 500 mM NaCl, 20 mM imidazole, pH 7.9 for sonication. Sonicated cells were centrifuged at 15,000 rpm, and the supernatant was loaded onto a His60 column (Takara Bio USA). The column was washed with 20 mM Tris, 500 mM NaCl, 40 mM imidazole, pH 7.9, and the protein was eluted with 20 mM Tris, 500 mM NaCl, 500 mM imidazole, pH 7.9. The collected protein fraction was buffer exchanged into 12.5 mM HEPES, 50 mM NaCl, 5 mM CaCl2, pH 6.5 for His-tag removal by FXa protease (New England Biolabs) and further purified by affinity (His60 resin) and size exclusion chromatography (S-75 Sephadex, Cytiva Life Sciences). The NMR sample was prepared by buffer exchanging the protein into 100 mM NaCl, 25 mM MES, pH 6.0 and concentrating it to 1 mM, with the addition of 100 μM DSS, 5% D2O, and 0.02% NaN3 (Supplementary Table  1 ). Triple resonance assignment experiments were acquired at 35 °C on either a 500 MHz Bruker Avance spectrometer or an 800 MHz (1H) Bruker NEO spectrometer, both equipped with a CryoProbe.

An R43S variant of the gene for indole-3-glycerol phosphate synthase from S. solfataricus (IGPS) was cloned into a modified pGS-21a vector downstream of an N-terminal His-tag and TEV protease site. This expression plasmid was a gift from the lab of Professor Robert Matthews, University of Massachusetts Medical School, Worcester. IGPS R43S protein was expressed in BL21(DE3) competent cells with ampicillin selection. Cells were grown at 37 °C to an OD600 of 0.6, and expression was induced with 1 mM IPTG for 16–20 h at 25 °C. To isotopically label the protein for NMR spectroscopy, cells were grown in M9 minimal medium with 15NH4Cl and 13C-glucose as the nitrogen and carbon sources, respectively. Cells were lysed by sonication in 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 5 mM imidazole. The lysate was loaded onto a Ni2+-NTA column pre-equilibrated with the lysis buffer. Weakly bound impurities were washed away with 100 mM potassium phosphate, pH 7.5, 150 mM KCl, 75 mM imidazole, followed by equilibration into the low-salt buffer 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 75 mM imidazole. Protein was eluted with 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 500 mM imidazole and dialyzed into lysis buffer. Purified His-tagged protein was concentrated to 5 mL, and the tag was cleaved with TEV protease added at a 1:30 mass ratio with mixing at room temperature overnight. Untagged protein was separated from TEV protease and uncleaved protein by Ni2+-affinity chromatography. Protein aliquots were flash frozen and stored at −80 °C. NMR samples of 15N,13C-labeled IGPS were prepared at 250 µM in 60 mM potassium phosphate, pH 7.2, 50 mM KCl, 5% D2O, 100 μM DSS. All data were collected at 50 °C on a 750 MHz (1H) Bruker AVANCE III NMR spectrometer equipped with a CryoProbe.

A vector encoding maltose binding protein (MBP) was transformed into BL21(DE3) cells, and the protein was expressed in 1 L of 95% D2O M9 medium containing 15NH4Cl and 2H,13C-glucose as the sole nitrogen and carbon sources, respectively. Cells were grown at 37 °C to an OD600 of 0.9 and induced with 1 mM IPTG. Induction continued for 4 h at 37 °C, after which cells were harvested by centrifugation at 3500 × g. The cell pellet was frozen overnight and resuspended in 20 mM Tris-HCl, 20 mM NaCl pH 8.0, 1 mM DTT. Lysozyme (6 mg) was added and the suspension was stirred for 30 min at room temperature. Cells were further lysed by sonication and centrifuged at 32,000 × g for 30 min at 4 °C. The clarified lysate was filtered (0.45 µm pore size) and loaded onto a 25 mL DEAE column equilibrated in 20 mM Tris, 20 mM NaCl, pH 8.0, 1 mM DTT. The protein was eluted using a gradient to 20 mM Tris, 500 mM NaCl. The protein was concentrated to 1–2 mL and run on a 112 mL Superdex 75 column equilibrated in 20 mM Tris, 20 mM NaCl, 2 mM DTT pH 8.0. Fractions containing the protein were pooled, and the protein was unfolded by dialysis against 4 M GuHCl, 20 mM Tris-HCl, 1 mM DTT pH 7.5. The protein was refolded by four rounds of 10-fold dilution with 20 mM sodium phosphate pH 7.1, 1 mM EDTA, 2 mM β-cyclodextrin, 0.02% NaN3, 100 μM DSS, 5% D2O, each followed by concentration. From this, a 0.5 mM sample of MBP was prepared. Spectra were acquired at 37 °C on an 800 MHz (1H) Bruker NEO NMR spectrometer.

NMR data acquisition and processing parameters recorded by us for IL-1β, IL-1Ra, IGPS and MBP are summarized in Supplementary Table  1 . Poisson gap NUS spectra were reconstructed using hmsIST 39 and all spectra were processed with NMRPipe 49 on NMRbox 50 . Spin systems were built by manual peak picking using NMRFAM-SPARKY 51 and referenced using DSS.

Origin of protein test data sets

Triple resonance data acquired in our laboratory were processed using NMRPipe 49 installed on NMRbox 50 . The crosspeak lists for these data (see Table  1 ) were constructed by manual crosspeak picking using NMRFAM-SPARKY 51 (i.e., not reconstructed from deposited assignments). Crosspeak lists for ecTS, Cy1 and V5dm were provided by Professors Andrew Lee (University of North Carolina, Chapel Hill), Dominique Frueh (Johns Hopkins University), and Tatyana Igumenova (Texas A&M University), respectively, and were used without further adjustment. Crosspeaks for hIDD were generated from spin systems provided by Professor Martin Blackledge (Institut de Biologie Structurale) as follows. Each provided spin system consisted of amide proton (H) and amide nitrogen (N) chemical shifts as well as chemical shifts for the Cα, Cα(i-1), CO and CO(i-1) resonances (although a complete set of carbon resonances was not present for every spin system). HNCA, HN(CO)CA, HNCO, and HN(CA)CO crosspeak lists were generated from the spin system data by adding the following crosspeaks from each spin system to the indicated list: H-N-Cα(i-1) and H-N-Cα for the HNCA; H-N-Cα(i-1) for the HN(CO)CA; H-N-CO(i-1) for the HNCO; and H-N-CO and H-N-CO(i-1) for the HN(CA)CO. The resonance values for the crosspeak positions were drawn from a normal distribution with a mean given by the value of the resonance in the spin system and a standard deviation of 0.003, 0.04, and 0.04 ppm for hydrogen, nitrogen and carbon resonances, respectively.
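The crosspeak synthesis for hIDD can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the dictionary layout, the `_prev` suffix for (i-1) resonances and the name `synthesize_crosspeaks` are hypothetical, while the experiment-to-resonance mapping follows the standard transfer pathways (the HNCO correlates the amide with the preceding residue's carbonyl) and the noise levels are those quoted above.

```python
import random

# Carbon resonance(s) each spin system contributes to each experiment.
# "_prev" denotes the (i-1) resonance (hypothetical naming).
PEAKS_PER_EXPERIMENT = {
    "HNCA": ("CA_prev", "CA"),
    "HN(CO)CA": ("CA_prev",),
    "HNCO": ("CO_prev",),
    "HN(CA)CO": ("CO", "CO_prev"),
}
NOISE_SD_PPM = {"H": 0.003, "N": 0.04, "C": 0.04}


def synthesize_crosspeaks(spin_systems, seed=None):
    """Build noisy H-N-C crosspeak lists from a list of spin-system dicts.

    Each spin system maps resonance names ("H", "N", "CA", "CA_prev",
    "CO", "CO_prev") to shifts in ppm; carbon resonances may be absent.
    Every crosspeak coordinate is drawn independently from a normal
    distribution centred on the spin-system value.
    """
    rng = random.Random(seed)
    peaks = {exp: [] for exp in PEAKS_PER_EXPERIMENT}
    for ss in spin_systems:
        for exp, carbons in PEAKS_PER_EXPERIMENT.items():
            for c in carbons:
                if c not in ss:
                    continue  # incomplete spin system: no peak for this resonance
                peaks[exp].append((
                    rng.gauss(ss["H"], NOISE_SD_PPM["H"]),
                    rng.gauss(ss["N"], NOISE_SD_PPM["N"]),
                    rng.gauss(ss[c], NOISE_SD_PPM["C"]),
                ))
    return peaks
```

Because each crosspeak draws its own H and N values, peaks from the same spin system scatter slightly in the amide dimensions, which is what the spin system assembly tolerances must absorb.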

BARASA algorithm description

The algorithm begins by reading in the crosspeak lists to assemble spin systems. Within the crosspeak lists, the user provides the possible crosspeak types that are produced by the experiment. For example, the HNCA would produce possible crosspeak types of H-N-CA(i) and H-N-CA(i-1). The user also specifies cutoff values for each spectral dimension that dictate the range over which chemical shifts will be matched during spin system construction. The provided crosspeak types dictate which dimensions have resonances of ambiguous type. In the example of the HNCA, the first two dimensions are of unambiguous type (H and N resonances respectively). However, the third dimension is ambiguous (CA(i) or CA(i-1)).

BARASA builds spin systems by first arbitrarily choosing a crosspeak to seed the construction of a spin system. All other crosspeaks are searched to find those that have at least two resonances of unambiguous type that match the resonances of the seed crosspeak, both in chemical shift (i.e., within a tolerance cutoff specified by the user) and in resonance type. After each addition, BARASA attempts to resolve ambiguous resonance types based on the chemical shifts already present in the spin system. For example, suppose a spin system has a Cα(i-1) value of 56.0 ppm (with a tolerance of 0.3 ppm) and an HNCA crosspeak with a carbon shift of 58.0 ppm is added; this resonance could be of type Cα or Cα(i-1). Because 58.0 ppm is not within 0.3 ppm of the 56.0 ppm Cα(i-1), the algorithm resolves the new resonance as Cα. After adding the crosspeak and resolving its type, the algorithm iterates through the entire list of remaining crosspeaks and repeats the above addition procedure. Once no more peaks can be added to the spin system, a new crosspeak is arbitrarily chosen from the list of remaining peaks to seed the construction of an additional spin system. This continues until all peaks have been added to a spin system.

If BARASA finds a crosspeak whose two unambiguous resonances match those already present in a spin system but whose additional resonances have shifts that conflict with those already present, an additional spin system is created to hold the incongruent crosspeak. Such a situation arises from spectral degeneracy (e.g., two spin systems with the same or similar amide shifts). The algorithm then attempts to add the remaining peaks to both spin systems. Any further clashes are resolved by generating a new spin system. This continues until no more crosspeaks can be added to any spin system. The crosspeaks within this group of spin systems are then marked as allowed to exchange with any other spin system within the group during the annealing process. In addition, the user has the option to let the algorithm use a crosspeak cache to which low-intensity peaks (the lowest 5%) can be moved over the course of the annealing run, providing a mechanism to eliminate potential artifactual crosspeaks.
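The seeding, matching and type-resolution steps described above can be sketched in simplified form. This is a hypothetical illustration, not BARASA's implementation: only the carbon dimension is treated as ambiguous, the tolerances in `TOL` are assumed values, and degeneracy handling is reduced to leaving a clashing peak in the pool so that it later seeds its own spin system (no spin system groups, exchange marking or crosspeak cache).

```python
from dataclasses import dataclass, field
from statistics import fmean

# Hypothetical matching tolerances (ppm); in BARASA these are user supplied.
TOL = {"H": 0.03, "N": 0.3, "C": 0.3}


@dataclass
class Peak:
    h: float
    n: float
    c: float
    types: tuple  # candidate types for the carbon dimension, e.g. ("CA", "CA_prev")


@dataclass
class SpinSystem:
    h: float
    n: float
    carbons: dict = field(default_factory=dict)    # resolved type -> list of shifts
    ambiguous: list = field(default_factory=list)  # peaks whose type is still open


def consistent_types(ss, peak):
    """Candidate types that do not clash with resonances already resolved in ss."""
    return [t for t in peak.types
            if t not in ss.carbons
            or abs(fmean(ss.carbons[t]) - peak.c) <= TOL["C"]]


def try_add(ss, peak):
    """Add peak to ss; return False (leaving ss unchanged) on a shift clash."""
    cands = consistent_types(ss, peak)
    if not cands:
        return False
    if len(cands) == 1:
        ss.carbons.setdefault(cands[0], []).append(peak.c)
    else:
        ss.ambiguous.append(peak)
    # a newly resolved resonance may disambiguate earlier peaks
    progress = True
    while progress:
        progress = False
        for p in list(ss.ambiguous):
            cands = consistent_types(ss, p)
            if len(cands) == 1:
                ss.ambiguous.remove(p)
                ss.carbons.setdefault(cands[0], []).append(p.c)
                progress = True
    return True


def build_spin_systems(peaks):
    remaining = list(peaks)
    systems = []
    while remaining:
        seed = remaining.pop(0)  # arbitrary seed crosspeak
        ss = SpinSystem(seed.h, seed.n)
        try_add(ss, seed)
        grew = True
        while grew:  # sweep the remaining peaks until nothing more can be added
            grew = False
            for p in list(remaining):
                if (abs(p.h - ss.h) <= TOL["H"] and abs(p.n - ss.n) <= TOL["N"]
                        and try_add(ss, p)):
                    remaining.remove(p)
                    grew = True
        systems.append(ss)
    return systems
```

With an HN(CO)CA peak at 56.0 ppm already resolved as Cα(i-1), an HNCA peak at 58.0 ppm is resolved as Cα, reproducing the worked example in the text.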

Once all crosspeaks have been added to a spin system, all possible resonance type sets are generated for that spin system. A resonance type set is a complete designation of the atom type of every resonance of every crosspeak in a spin system. If a spin system contains only peaks with no ambiguous resonance types, then it has only one possible resonance type set. This is the case for the majority of data sets, as experiments with ambiguous resonance types are often paired with experiments that resolve this ambiguity (e.g., the HNCA, HN(CO)CA experimental pair). However, if ambiguous resonance types are present in a spin system, then the spin system will contain all possible resonance type sets. A distinct set of average resonance values is calculated for each resonance type set, and all of these sets are considered over the course of the annealing run.
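Enumerating resonance type sets amounts to a Cartesian product over the candidate types of each ambiguous resonance. The sketch below assumes a simple `(shift, candidate_types)` representation and hypothetical function names; it enumerates every combination without pruning, which may differ from how BARASA filters type sets in practice.

```python
from itertools import product
from statistics import fmean


def resonance_type_sets(peaks):
    """Every way of assigning one candidate type to each peak's carbon shift.

    peaks: list of (shift_ppm, candidate_types) pairs for one spin system.
    """
    return list(product(*(types for _, types in peaks)))


def average_shifts(peaks, type_set):
    """Average the shifts assigned to each resonance type under one type set."""
    by_type = {}
    for (shift, _), t in zip(peaks, type_set):
        by_type.setdefault(t, []).append(shift)
    return {t: fmean(v) for t, v in by_type.items()}


# one unambiguous HN(CO)CA peak plus two ambiguous HNCA peaks
peaks = [(56.0, ("CA_prev",)), (58.0, ("CA", "CA_prev")), (56.1, ("CA", "CA_prev"))]
type_sets = resonance_type_sets(peaks)
averages = [average_shifts(peaks, ts) for ts in type_sets]
```

Here the two ambiguous HNCA peaks give 2 × 2 = 4 type sets, each with its own averaged Cα and Cα(i-1) values.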

The resonance assignment analysis is then initialized by randomly assigning the spin systems to the protein sequence. Often there are more spin systems than residue positions (e.g., spin systems arising from side-chain amide groups rather than the backbone are also present in the data set). Any spin systems that were not randomly placed on the sequence are placed in a spin system cache and may be assigned to the sequence over the course of the run. The simulation temperature is initialized at 1000 arbitrary units, and a spin system or crosspeak is chosen at random to swap. The probability that a swap will be a crosspeak swap is set at 0.01 (found to be a good compromise between sampling and algorithm speed), with the remaining swaps being spin system swaps. A chosen spin system can be moved to the spin system cache, swap positions with another spin system, or move to an empty position in the sequence, making its former position available. Whenever a spin system is moved, a random resonance type set is chosen from among those possible. In addition, the algorithm may attempt to change the current resonance type set while keeping the current spin system in place. If a crosspeak is chosen to swap, it can be moved to the crosspeak cache (if it is of low intensity), added to another spin system within its spin system group, or swap places with any crosspeak within its spin system group. Upon moving or swapping crosspeaks, the affected spin systems are evaluated for clashes. If there are none, the crosspeak swap is allowed to continue; otherwise the swap is rejected. In addition, a crosspeak move or swap triggers the affected spin systems to generate all new resonance type sets and choose one at random from the possibilities. This forces a recalculation of the average chemical shifts of each resonance type set of each affected spin system, as well as of the Bayesian probabilities described below for sequence position determination.
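The choice between move types, together with the Metropolis acceptance test that the next paragraph applies (Eq. 1), can be sketched as below. The acceptance rule shown is the standard Metropolis form with Boltzmann's constant absorbed into the arbitrary temperature units; that form, and the helper names, are assumptions rather than a transcription of BARASA's code.

```python
import math
import random

P_CROSSPEAK_SWAP = 0.01  # probability quoted in the text


def choose_move_kind(rng):
    """Pick a crosspeak move with probability 0.01, otherwise a spin-system move."""
    return "crosspeak" if rng.random() < P_CROSSPEAK_SWAP else "spin_system"


def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis rule with Boltzmann's constant absorbed into T."""
    if delta_e <= 0.0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-delta_e / temperature)


rng = random.Random(0)
kinds = [choose_move_kind(rng) for _ in range(100_000)]
```

At the starting temperature of 1000 units, an uphill move costing 1000 energy units is accepted with probability exp(−1), about 37%.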

If the swap is not immediately rejected due to a crosspeak clash, the change in energy of the system due to the swap is calculated using the energy function described below. The swap is then accepted or rejected with a probability given by the Metropolis criterion (Eq.  1 ). After each successful swap, the energy of the state is recorded as part of a sample of energy values. Once the sample reaches a particular size, the sample mean and standard error are calculated, and an additional sample is generated by continued swapping. A two-tailed Student's t-test is performed to compare the means of the two samples. The system is considered to have equilibrated at the current temperature if the p-value of the t-test is greater than a user-supplied value (default p  > 0.5). If equilibration has not been reached, more swaps are performed to generate an additional sample, and the t-test is repeated with the two most recent samples. If equilibration has been reached, the energy values are used to estimate the specific heat at the current temperature:
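The typeset equation is missing from this version of the text. The standard fluctuation estimate consistent with the definitions that follow, writing \(C(T)\) for the specific heat and assuming Boltzmann's constant is absorbed into the arbitrary temperature units, is:

\[ C(T)=\frac{\left\langle {E}^{2}\right\rangle -{\left\langle E\right\rangle }^{2}}{{T}^{2}} \]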

Where \(T\) is the current temperature in arbitrary units, \(E\) is the energy of the system and the angled brackets indicate the sample mean. Large drops in average ensemble energy due to oversized temperature steps can lead to the system becoming trapped in a local minimum. By deciding on a target energy drop that is unlikely to lead to a frustrated state, we can utilize the specific heat calculated at each temperature to estimate the temperature drop needed to achieve the target change in energy. This is done in the following manner:
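The corresponding equation is also missing here. Using \(C(T)\approx d\left\langle E\right\rangle /dT\), with \(C(T)\) the specific heat at the current temperature, a form consistent with the surrounding description is the following (since the target drop is negative, \({T}_{{next}}<T\)):

\[ {T}_{{next}}=T+\frac{\triangle {\left\langle E\right\rangle }_{{target}}}{C(T)} \]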

Where \(\triangle {\left\langle E\right\rangle }_{{target}}\) is a user-controlled parameter, kept at −2000 for this study. In situations where the system becomes trapped in a frustrated state, decreasing the magnitude of the target energy drop can lead to better results at the expense of longer simulation time. If \({T-T}_{{next}}\) is greater than 10, the temperature decrease is limited to 10 units to prevent overcooling the system. Using the specific heat in this manner results in smaller temperature steps at temperatures where the system energy is decreasing rapidly, while allowing larger steps where drops in temperature have a modest effect on the ensemble. The resulting schedule avoids quenching the system while minimizing unproductive swaps at temperatures that are either too high or too low for effective annealing. After the temperature is decreased, the annealing run terminates if any of the following criteria are met: the temperature is less than 1; the product of the temperature and the last calculated specific heat is less than 200; or the ratio of unsuccessful to successful swaps while collecting the last sample is greater than 10,000. The rationale for these criteria is as follows. Given the standard energy parameterization, productive annealing is unlikely at temperatures below 1. The product of specific heat and current temperature (at low temperatures) provides a crude estimate of the energy separating the current ensemble from the global minimum (i.e., the thermodynamic ensemble at T = 0, which should correspond to a single state), and approximately 200 energy units is negligible. Finally, at this ratio of unsuccessful to successful swaps, the system is near a minimum and further sampling is inefficient. If termination is not reached, a new sample size is defined using the following equation:

Where N is the number of residues in the sequence. This equation permits increases in sample size when sampling at temperatures with high specific heats, which is where the most productive swaps occur. This approach also permits scaling of sampling for larger proteins. The parameters of this equation were found empirically to be a good compromise between sufficient sampling and speed. Samples are then drawn at the new temperature to determine equilibration and the cycle is continued. Upon termination of the annealing protocol, a steepest-descent type search is performed to locally minimize the system energy and refine the assignments, discarding potentially bad assignments that were left over from the run. This is done by attempting to place (or swap) every spin system/peak at every possible location in the sequence/spin system group (including the cache, if allowed). Only spin system/peak swaps/placements that decrease the system energy are accepted. This is repeated 100 times.
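The cooling step and stopping rules described above can be sketched as follows. This is a simplified illustration under stated assumptions: the specific heat uses the fluctuation estimate with Boltzmann's constant absorbed into T, the temperature update is the first-order step \(T+\triangle {\left\langle E\right\rangle }_{{target}}/C\) capped at 10 units, and the t-test equilibration check and the sample-size rule are omitted. Function names are hypothetical.

```python
from statistics import pvariance

TARGET_DROP = -2000.0  # Δ<E>_target used in the study
MAX_STEP = 10.0        # largest temperature decrease per step


def specific_heat(energies, temperature):
    """Fluctuation estimate (<E^2> - <E>^2) / T^2, with k_B absorbed into T."""
    return pvariance(energies) / temperature ** 2


def next_temperature(temperature, heat):
    """Lower T so the expected energy change is about TARGET_DROP, capped at MAX_STEP."""
    if heat <= 0.0:
        return temperature - MAX_STEP  # no fluctuations: take the maximum step
    step = min(MAX_STEP, -TARGET_DROP / heat)
    return temperature - step


def should_terminate(temperature, heat, n_failed, n_accepted):
    """Stopping rules quoted in the text."""
    return (temperature < 1.0
            or temperature * heat < 200.0
            or n_failed > 10_000 * n_accepted)
```

A large specific heat (rapid energy change) yields a small step, while a small specific heat lets the schedule take the full 10-unit step.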

This entire simulated annealing process is repeated independently from a number of different random starting conditions; 20 were used here. A consensus set of assignments is generated by calculating the frequency with which each spin system is placed at each amino acid position. The spin system assigned to a residue position in a majority of the runs (if any) is kept as the consensus spin system. A curated set of assignments is then generated from this consensus analysis as follows. The consensus spin system at each residue is chosen as the tentative assignment for that residue. Residues without a consensus spin system (i.e., that did not have the same spin system assigned in more than 50% of the runs) are marked as unassigned. Tentatively assigned spin systems are then evaluated by their posterior probabilities and by their number of connectivities, a connectivity being a matching resonance between adjacent spin systems. Assignments are accepted if they meet any of the following criteria: 1) the assigned spin system has at least two connectivities with adjacent spin systems, 2) the assigned spin system has at least one connectivity with adjacent spin systems and a posterior probability at least three times higher than 1/N, or 3) the assigned spin system has a posterior probability > 50%. Residues with tentative assignments that do not satisfy any of these criteria are then marked as unassigned.
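The consensus and curation rules can be expressed compactly. In this sketch each annealing run is represented as a dictionary from residue index to spin-system identifier, and the posterior probability and connectivity count are assumed to be computed elsewhere; these representations, the function names, and reading N as the number of residues (as in the sample-size definition) are assumptions.

```python
from collections import Counter


def consensus_assignments(runs, n_residues):
    """Spin system placed at each residue in more than half of the runs.

    runs: one dict per annealing run mapping residue index -> spin-system id.
    """
    consensus = {}
    for r in range(n_residues):
        counts = Counter(run[r] for run in runs if r in run)
        if counts:
            ss, k = counts.most_common(1)[0]
            if k > len(runs) / 2:
                consensus[r] = ss
    return consensus


def passes_curation(n_connectivities, posterior, n):
    """Acceptance criteria 1-3; n is the N of the 1/N threshold."""
    return (n_connectivities >= 2
            or (n_connectivities >= 1 and posterior >= 3.0 / n)
            or posterior > 0.5)
```

A residue that receives two different spin systems in a split vote, for example, has no consensus and is left unassigned before curation is even applied.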

The total energy used in the annealing routine, \({E}_{tot}\) (Eq.  5 ), is calculated as the sum of the energies of all constituent spin systems. At any given step during the annealing protocol, spin systems are either tentatively assigned to a position in the sequence or placed in the cache. Cached spin systems are defined as having zero energy (i.e., \({E}_{m}=0\) ).
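Eq. 5 is not reproduced in this version of the text; a form consistent with the description, summing over all spin systems \(m\) (cached spin systems contributing zero), is:

\[ {E}_{tot}=\sum _{m}{E}_{m} \]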

The energy of each spin system tentatively assigned to a specific place in the amino acid sequence is comprised of the adjacency energy ( \({E}_{m}^{{adj}}\) ) and the chemical shift energy ( \({E}_{m}^{{cs}}\) ):
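The equation itself is missing from this version of the text; from this definition it reads:

\[ {E}_{m}={E}_{m}^{{adj}}+{E}_{m}^{{cs}} \]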

The adjacency energy reflects the degree of correspondence between the averages of the Cα, Cβ and CO resonances of the current spin system and the averages of the Cα(i-1), Cβ(i-1) and CO(i-1) resonances of the spin system tentatively assigned to the subsequent position in the sequence. Each average resonance value in a spin system is calculated as the arithmetic mean of the chemical shifts of that resonance type over all crosspeaks currently in the spin system that contain it. \({E}_{m}^{{adj}}\) therefore captures the process of evaluating spin system adjacency and is based on the number of potential connectivities between adjacent spin systems tentatively assigned to the sequence. For example, if spin system m is assigned to the residue position immediately preceding that of spin system l , then the adjacency energy is given by:
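The equation is missing from this version of the text. A Gaussian form consistent with the limits, constants and 0.2 ppm intercept given in the next paragraph is shown below; the factor of 2 in the denominator is an assumption, and the sum runs over the connectivities \(k\) (Cα, Cβ, CO) for which both resonances are present:

\[ {E}_{m}^{{adj}}=\sum _{k}\left[{c}_{0}\exp \left(-\frac{{\left({\delta }_{k\left(i\right)}^{m}-{\delta }_{k\left(i-1\right)}^{l}\right)}^{2}}{2{\sigma }_{k}^{2}}\right)+{c}_{1}\right] \]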

Here, \({\delta }_{k(i)}^{m}\) is the chemical shift of resonance k(i) (Cα(i), Cβ(i) or CO(i)) of spin system m, and \({\delta }_{k(i-1)}^{l}\) is the chemical shift of resonance k(i−1) (Cα(i−1), Cβ(i−1) or CO(i−1)) of spin system l. \({\sigma }_{k}\) is related to the estimated precision of the measured chemical shifts. When \({c}_{0}<0\), \({E}^{adj}\) is a sum of inverted Gaussians; previous assignment algorithms have used functions of this form to good effect for estimating adjacency 26. In the limit of perfectly matched connectivities, the sum of inverted Gaussian functions reaches its minimum value of \(K({c}_{0}+{c}_{1})\), where K is the number of connectivities, whereas for poorly matched putative connectivities the adjacency energy tends to \(K{c}_{1}\). Importantly, when an expected element of spin system m or l is missing, its contribution to the adjacency energy is set to zero. Similarly, if the next position in the sequence is not currently assigned a spin system, then \({E}^{adj}=0\). Here, \({c}_{0}\) and \({c}_{1}\) were set to −100 and +50, respectively, so each connectivity contributes −50 when the chemical shift difference is 0 and approaches +50 as the magnitude of the difference grows without bound. The appropriate value of \({\sigma }_{k}\) depends on the properties of the NMR spectra from which the spin systems are built. For all runs described, \({\sigma }_{k}\) was chosen so that the function crosses zero at a chemical shift difference of 0.2 ppm for all nuclei k.
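The adjacency term can be sketched as below. The text fixes \({c}_{0}=-100\), \({c}_{1}=+50\) and a zero crossing at 0.2 ppm but does not reproduce the exact Gaussian, so the form \({c}_{0}\exp(-\Delta^{2}/2\sigma^{2})+{c}_{1}\) used here is an assumption; with it, σ = 0.2/√(2 ln 2) ≈ 0.17 ppm. The dictionary keys (`"CA"`, `"CA_im1"`, and so on) and function names are hypothetical.

```python
import math

C0, C1 = -100.0, 50.0      # values quoted in the text
ZERO_CROSSING_PPM = 0.2    # each term is 0 at a 0.2 ppm shift difference

# Assumed Gaussian form c0*exp(-d^2 / (2 sigma^2)) + c1. Setting it to zero at
# d = 0.2 ppm gives exp(-d^2 / (2 sigma^2)) = -c1/c0, so sigma = d / sqrt(2 ln(-c0/c1)).
SIGMA = ZERO_CROSSING_PPM / math.sqrt(2.0 * math.log(-C0 / C1))


def connectivity_energy(shift_i, shift_im1, sigma=SIGMA, c0=C0, c1=C1):
    """One inverted-Gaussian term: -50 for a perfect match, tending to +50 for a large mismatch."""
    d = shift_i - shift_im1
    return c0 * math.exp(-d * d / (2.0 * sigma * sigma)) + c1


def adjacency_energy(ss_m, ss_l):
    """E_adj for spin system m followed in sequence by spin system l.

    ss_m holds the averaged i shifts of m (keys "CA", "CB", "CO");
    ss_l holds the averaged i-1 shifts of l (keys "CA_im1", ...), or is None
    when the next position has no spin system assigned.
    """
    if ss_l is None:
        return 0.0  # empty subsequent position: E_adj = 0
    energy = 0.0
    for k in ("CA", "CB", "CO"):
        a, b = ss_m.get(k), ss_l.get(k + "_im1")
        if a is None or b is None:
            continue  # a missing expected element contributes zero
        energy += connectivity_energy(a, b)
    return energy
```

Summed over K connectivities, the terms range from \(K({c}_{0}+{c}_{1})=-50K\) for perfect matches toward \(K{c}_{1}=+50K\) for poor ones, which are the limits stated above.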

The second term of the spin system energy, \({E}_{m}^{cs}\), evaluates how well the observed chemical shifts agree with those predicted. This term exploits the ability of Bayesian statistics to incorporate varying degrees of knowledge about the local structure of the protein. Such knowledge ranges from relatively structureless information, encoded in the empirical distributions of amino acid chemical shifts observed in proteins, to specific chemical shift predictions based on a high-resolution structure of the protein being examined. For the former, we use the BMRB 52 database. For the latter, we use SHIFTX+ predictions derived from either crystallographic structures available in the PDB 53 or structures predicted by AlphaFold2 42. For the IDPs V5dm and hIDD, we instead use calculated, sequence-specific random coil chemical shifts 36, 37 as the predictions. \({E}_{m}^{{cs}}\) is ultimately calculated from the Bayesian posterior probability of a proposed assignment given the observed chemical shifts: \(P({A}_{n,m}\cap {Q}_{{m}_{i}}|{B}_{m})=\frac{P({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}})\,P({A}_{n,m}\cap {Q}_{{m}_{i}})}{P({B}_{m})}\).

The subscripts n and m index the residue positions and the provided spin systems, respectively. \({A}_{n,m}\) denotes the event that spin system m is correctly assigned to sequence position n, and \({B}_{m}\) denotes the observed chemical shifts of spin system m. \({Q}_{{m}_{i}}\) denotes the event that resonance type set i is the correct resonance type set of spin system m. Because a spin system can have ambiguous crosspeak resonance types, the probability calculation explicitly considers each resonance type set of a spin system at each residue location. An assignment therefore entails both the placement of a spin system at a residue location and the choice of a resonance type set.

The prior probability \(P({A}_{n,m}\cap {Q}_{{m}_{i}})\) is the initial probability that the assignment of spin system m to residue n is correct and that resonance type set i is correct for spin system m. If \({I}_{m}\) is the number of possible resonance type sets of spin system m, the total number of combinations of resonance type sets and residue locations for spin system m is the product \({I}_{m}N\), where N is the number of residue positions. However, given the constraints imposed by the amino acid sequence of the protein, not all combinations of sequence location and resonance type set are possible. For example, a resonance type set with a defined amide proton cannot be placed at a proline. Such impossible combinations are given a prior probability of 0, and the remaining prior probability is distributed evenly among the allowed combinations, \(P({A}_{n,m}\cap {Q}_{{m}_{i}})=1/C\).

Here, C is the number of allowed combinations of n and \({m}_{i}\) in the sequence.
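A minimal sketch of this uniform prior: every (residue position n, resonance type set i) combination receives probability 1/C unless the sequence rules it out. Only the proline rule stated in the text is encoded; the `has_amide_H` flag and the function names are hypothetical, and any further sequence constraints would be added to `is_allowed`.

```python
def is_allowed(residue, type_set):
    """Sequence constraint from the text: no amide-proton type set at a proline."""
    return not (residue == "P" and type_set["has_amide_H"])


def uniform_prior(sequence, type_sets):
    """Prior P(A_{n,m} and Q_{m_i}) for one spin system m.

    sequence: one-letter amino acid string (N positions).
    type_sets: the I_m candidate resonance type sets of spin system m.
    Returns {(n, i): prior}; disallowed combinations get 0, the rest 1/C.
    """
    combos = [(n, i) for n in range(len(sequence)) for i in range(len(type_sets))]
    allowed = {(n, i) for n, i in combos if is_allowed(sequence[n], type_sets[i])}
    c = len(allowed)
    # 1/c is only evaluated for allowed combinations, so c == 0 never divides.
    return {combo: (1.0 / c if combo in allowed else 0.0) for combo in combos}
```

For a three-residue sequence containing one proline and a spin system with two candidate type sets (one with an amide proton, one without), there are 6 combinations, one of which is forbidden, so C = 5.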

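The likelihood of the assignment, \(P({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}})\) (i.e., the probability of observing the chemical shifts of spin system m given the assignment \({A}_{n,m}\cap {Q}_{{m}_{i}}\)), is given by Eqs.  10 and 11, which evaluate the statistic \({X}_{n,{m}_{i}}^{2}=\sum_{r}{\left(\frac{{\delta }_{{pred},r}^{n}-{\delta }_{{obs},r}^{{m}_{i}}}{{\sigma }_{r}^{n}}\right)}^{2}\) and convert it into a probability.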

Here, \({\delta }_{{pred},r}^{n}\) is the predicted chemical shift of resonance r at sequence position n, \({\delta }_{{obs},r}^{{m}_{i}}\) is the observed chemical shift of resonance r in resonance type set i of spin system m, and \({\sigma }_{r}^{n}\) is the standard error of the chemical shift prediction for resonance r at sequence position n. The resonances indexed by r are H, N, Cα, Cβ, CO, Cα(i−1), Cβ(i−1) and CO(i−1). Eqs.  10 and 11 assume that the random variable \({\delta }_{{pred},r}^{n}\) is normally distributed about \({\delta }_{{obs},r}^{{m}_{i}}\) with standard deviation \({\sigma }_{r}^{n}\), and that the error in the chemical shift measurement is much smaller than the error in the prediction. Under these assumptions, \({X}_{n,{m}_{i}}^{2}\) follows a chi-square distribution with R degrees of freedom, where R is the number of spins for which data are provided. The likelihood is then the value of the complementary cumulative distribution function (CCDF) of a chi-square variable with R degrees of freedom, evaluated at \({X}_{n,{m}_{i}}^{2}\).
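The likelihood calculation can be sketched with only the Python standard library. Here `chi2_sf` evaluates the chi-square CCDF for integer degrees of freedom through the upper regularized incomplete gamma function and its recurrence; a library routine such as SciPy's `scipy.stats.chi2.sf` would serve equally well. The dictionary layout and function names are hypothetical.

```python
import math


def chi2_sf(x, dof):
    """CCDF (survival function) of a chi-square variable with integer dof >= 1.

    Computes Q(s, y), the upper regularized incomplete gamma function with
    s = dof/2 and y = x/2, starting from Q(1, y) = exp(-y) or
    Q(1/2, y) = erfc(sqrt(y)) and applying
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    if dof < 1:
        raise ValueError("need at least one degree of freedom")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < dof / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q


def likelihood(predicted, errors, observed):
    """P(B_m | A_{n,m} and Q_{m_i}) for one position n and one resonance type set i.

    predicted, errors: dicts resonance -> predicted shift / prediction error at n.
    observed: dict resonance -> observed shift for resonance type set i.
    Only resonances with data contribute, so R = number of shared keys.
    """
    shared = [r for r in observed if r in predicted]
    x2 = sum(((observed[r] - predicted[r]) / errors[r]) ** 2 for r in shared)
    return chi2_sf(x2, len(shared))
```

A perfect agreement gives \({X}^{2}=0\) and a likelihood of 1; each resonance with data adds one degree of freedom, so spin systems with more observed resonances are judged against a correspondingly wider chi-square distribution.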

The likelihoods of all other residue position/resonance type set combinations being valid assignments of spin system m are taken into account through the marginal probability \(P({B}_{m})=\sum_{n}\sum_{i}P({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}})\,P({A}_{n,m}\cap {Q}_{{m}_{i}})\).

Here, the sums run over all possible combinations of i and n. Using Bayes' theorem, the posterior probability (i.e., the probability that a particular assignment is correct given the observed data) can then be calculated, and \({E}_{m}^{cs}\) is determined from it.

To avoid numerical instability when evaluating logarithms of numbers near zero, and to prevent inaccurate chemical shift predictions from dominating the energy function, any \({E}_{m}^{{cs}}\) greater than \({E}_{\max }^{{cs}}\) is set to \({E}_{\max }^{{cs}}\). In this study, \({E}_{\max }^{{cs}}\) and \({E}_{\min }^{{cs}}\) are set to 100 and −50, respectively.

The values of the energy-function parameters were chosen to safeguard against inaccurate chemical shift predictions, based on the following reasoning. A perfectly matching connectivity between two resonances contributes −50 to the energy. Because a spin system with a posterior probability of 0 contributes 100, it would take two perfect connectivities, or three or more reasonable connectivities, for that spin system to be favorably assigned to that position rather than left in the cache. This permits the algorithm to assign a spin system to a location even when the chemical shift predictions for its resonances are highly inaccurate, provided there are enough resonance connectivities to justify the assignment. Likewise, \({E}_{\min }^{cs}\) was chosen so that a posterior probability of 1.0 contributes −50, equal to the contribution of a single perfect connectivity. Two bad connectivities would therefore be required to overrule a high posterior probability and disfavor the assignment. Users can adjust \({E}_{\max }^{cs}\) and \({E}_{\min }^{cs}\) to tune the relative influence of the chemical shift energy on the course of an annealing run.
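The marginalization, Bayes step and capped chemical shift energy can be sketched as follows. The text gives the cap \({E}_{\max }^{cs}=100\) and the value \({E}_{\min }^{cs}=-50\) at a posterior of 1.0, but not the exact mapping from posterior to energy; the linear-in-log form and its `scale` parameter below are hypothetical placeholders chosen only to reproduce those two anchor points, and the function names are likewise illustrative.

```python
import math

E_CS_MAX, E_CS_MIN = 100.0, -50.0   # values used in the study


def posteriors(likelihoods, priors):
    """Posterior P(A_{n,m} and Q_{m_i} | B_m) for every (n, i) of one spin system m.

    likelihoods, priors: dicts keyed by (n, i). The denominator is the
    marginal P(B_m), summed over all (n, i) combinations.
    """
    joint = {k: likelihoods.get(k, 0.0) * p for k, p in priors.items()}
    evidence = sum(joint.values())
    if evidence == 0.0:
        return {k: 0.0 for k in joint}
    return {k: v / evidence for k, v in joint.items()}


def chemical_shift_energy(posterior, scale=25.0):
    """HYPOTHETICAL mapping E = E_min - scale * ln(P), capped at E_max.

    Gives E_min (-50) at P = 1 and E_max (100) as P -> 0; `scale` is an
    illustrative assumption, not a published parameter.
    """
    if posterior <= 0.0:
        return E_CS_MAX  # ln(0) is undefined; use the cap directly
    return min(E_CS_MIN - scale * math.log(posterior), E_CS_MAX)
```

Whatever the exact mapping, the cap means a spin system with a vanishing posterior costs at most +100, which is what lets strong connectivities override a poor prediction.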

Predicted shifts for each resonance can come from any source, provided the precision of the prediction method is accurately estimated. For IDPs, sequence-specific random coil chemical shifts can be substituted (see below). In the absence of an acceptable structural model, the mean and standard deviation of the BMRB chemical shift distribution for a given atom of a given residue type are used as the predicted shift and prediction error, respectively. The same fallback is used for regions of the sequence of interest that are absent from the structural model used to predict chemical shifts (such as tags) or unresolved in it.

Generation of predicted chemical shifts

Predicted H, N, Cα, Cβ, and CO chemical shifts were generated with SHIFTX+ using PDB entries and/or AlphaFold2-predicted structures (Table  1 ). Chemical shift prediction errors for H, N, Cα, Cβ, and CO were taken from the reported root mean squared deviations (RMSD) of SHIFTX+ predictions: 0.45, 2.4, 0.8, 0.95, and 0.9 ppm, respectively. Sequence regions present in the NMR sample but absent from or unresolved in the provided structure (e.g., loops or expression tags) were given predicted values equal to the corresponding BMRB averages. For the runs performed with SPARTA+ 38 predicted shifts, the reported error of each individual prediction was used. For the IDPs V5dm and hIDD, the predicted shifts were sequence-specific random coil chemical shifts calculated according to the method in 36, 37. Prediction errors were taken from the reported RMSD of that method: 0.16, 1.0, 0.42, 0.37, and 0.43 ppm for H, N, Cα, Cβ, and CO, respectively. Prediction errors for BMRB-derived values were taken as the standard deviation of the corresponding resonance distribution for the particular amino acid type in the BMRB.
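The choice of predicted shift and prediction error for each atom can be sketched as a simple fallback: use the model-based value with its global RMSD when one exists, otherwise the BMRB mean and standard deviation for that residue type. The RMSD tables restate the values above; the function and argument names are hypothetical, and BMRB statistics must be supplied by the reader (none are hard-coded here). Per-prediction SPARTA+ errors would replace the table lookup.

```python
# Global prediction errors (ppm) quoted in the text.
SHIFTX_PLUS_RMSD = {"H": 0.45, "N": 2.4, "CA": 0.8, "CB": 0.95, "CO": 0.9}
RANDOM_COIL_RMSD = {"H": 0.16, "N": 1.0, "CA": 0.42, "CB": 0.37, "CO": 0.43}


def predicted_shift(index, residue_type, atom, model_shifts, bmrb_stats,
                    model_errors=SHIFTX_PLUS_RMSD):
    """Return (predicted shift, prediction error) for one atom.

    model_shifts: {(index, atom): shift} from the structure-based (or random
        coil) predictor; residues missing from the model, such as tags or
        unresolved loops, are simply absent.
    bmrb_stats: {(residue_type, atom): (mean, sd)} taken from the BMRB.
    model_errors: RMSD table matching the predictor that produced model_shifts.
    """
    key = (index, atom)
    if key in model_shifts:
        return model_shifts[key], model_errors[atom]
    mean, sd = bmrb_stats[(residue_type, atom)]  # fallback: BMRB distribution
    return mean, sd
```

For the IDPs, `RANDOM_COIL_RMSD` would be passed as `model_errors` together with the random coil shifts.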

Comparison of resonance assignment algorithms

BARASA was compared with three widely used triple resonance assignment algorithms. All algorithms were provided with the same crosspeak lists as BARASA, albeit in different file formats. Because FLYA can use predicted chemical shift data, it was provided with the same predicted shifts and associated errors as BARASA, and its assignment results were taken from the strong assignments generated from 20 runs. I-PINE was run using the I-PINE server, and AutoAssign was run on NMRbox with default parameters. For each algorithm, the proposed assignments were compared with reference assignments: at each residue position, the proposed assignment was classified as matched, mismatched, or missing relative to the reference (see Results and Discussion). The same reference assignments were used to evaluate all algorithms.
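The per-residue classification can be sketched as below. It is a simplification under stated assumptions: assignments are compared by equality of a hypothetical label (for example a spin-system identifier), and only residues present in the reference are scored; the exact matching criteria are those described in Results and Discussion.

```python
from collections import Counter


def classify(proposed, reference):
    """Label each reference residue as 'matched', 'mismatched' or 'missing'.

    proposed, reference: dicts residue -> assignment label.
    """
    outcome = {}
    for res, ref_label in reference.items():
        if res not in proposed:
            outcome[res] = "missing"
        elif proposed[res] == ref_label:
            outcome[res] = "matched"
        else:
            outcome[res] = "mismatched"
    return outcome


def summarize(proposed, reference):
    """Count the three outcomes so that different algorithms can be compared."""
    return Counter(classify(proposed, reference).values())
```

Using the same `reference` dictionary for every algorithm keeps the counts directly comparable, mirroring the use of a single reference set above.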

Generation of simulated data sets

To assess the performance of BARASA on lower-quality data, the MBP crosspeak lists were processed to randomly retain spin systems and/or individual crosspeaks with specified probabilities that depended on crosspeak type. For each data-quality condition, 10 independent data sets were randomly generated and BARASA was run on each. The result of each BARASA execution was obtained by curating 20 independent annealing runs. Performance on data containing artifactual peaks was evaluated by taking a depleted data set and adding randomly generated crosspeaks such that 20% of all Cα, Cβ, and CO crosspeaks were artifacts. Each artifact peak was generated as follows. A random residue in the protein sequence containing an amide group and the desired peak type (Cα, Cβ, CO, Cα(i−1), Cβ(i−1), or CO(i−1)) was chosen. The chemical shift in each dimension of the crosspeak was drawn from a Gaussian distribution with mean and standard deviation equal to those of that atom of that residue type in the BMRB. All artifact peaks were given the maximum peak intensity of their peak lists to ensure they would not be cached during the run.
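The degradation and artifact-injection procedure can be sketched as follows. Peak records, peak-type names and function names are hypothetical, and BMRB means and standard deviations are inputs rather than hard-coded values. Residues are treated as having an amide group unless they are proline, and glycine is excluded for Cβ-type peaks; both are assumptions about how "containing an amide group and the desired peak type" is applied.

```python
import random

# Hypothetical peak-type names -> (carbon atom, residue offset of that carbon).
PEAK_TYPES = {
    "CA": ("CA", 0), "CB": ("CB", 0), "CO": ("CO", 0),
    "CA_im1": ("CA", -1), "CB_im1": ("CB", -1), "CO_im1": ("CO", -1),
}


def deplete(items, keep_prob, rng):
    """Randomly retain peaks (or spin systems) with a per-type probability.

    items: list of (type, payload) tuples; types not in keep_prob are always kept.
    """
    return [it for it in items if rng.random() < keep_prob.get(it[0], 1.0)]


def n_artifacts(n_real, fraction=0.2):
    """Artifacts needed so they make up `fraction` of all peaks: a / (n + a) = f."""
    return round(fraction * n_real / (1.0 - fraction))


def make_artifact(sequence, peak_type, bmrb_stats, max_intensity, rng):
    """Draw one artifact peak from BMRB statistics {(residue_type, atom): (mean, sd)}."""
    atom, offset = PEAK_TYPES[peak_type]
    candidates = []
    for n, aa in enumerate(sequence):
        src = n + offset  # residue that contributes the carbon shift
        if aa == "P" or src < 0:
            continue  # proline has no amide proton; no i-1 residue before the first one
        if atom == "CB" and sequence[src] == "G":
            continue  # glycine has no C-beta
        candidates.append(n)
    n = rng.choice(candidates)
    src = n + offset

    def draw(res_type, a):
        mean, sd = bmrb_stats[(res_type, a)]
        return rng.gauss(mean, sd)

    return {
        "type": peak_type, "residue": n,
        "H": draw(sequence[n], "H"), "N": draw(sequence[n], "N"),
        "C": draw(sequence[src], atom),
        "intensity": max_intensity,  # maximum intensity so the peak is not cached
    }
```

For example, 80 real peaks call for `n_artifacts(80)` = 20 artifacts, which then make up 20 of the 100 peaks. Passing a seeded `random.Random` instance as `rng` makes each simulated data set reproducible.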

BARASA is implemented in C++ and can be built on all major computing platforms (macOS, Linux, and Windows). It has a command line interface as well as a GUI implemented with the wxWidgets library, and it uses the Boost libraries. For this study, simulations were run on a 2019 6-core MacBook Pro (Intel processor) with up to 12 annealing runs executing in parallel.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

Resonance assignments for IGPS and IL-1Ra have been deposited in the BMRB under accession numbers 51347 and 51352, respectively. Crosspeak lists and protein sequences for IL-1β, IL-1Ra, IGPS, MBP, Cy1, ecTS, v5domain, and huIDR, which form the foundation of the analysis here, are included in the Source Data file. BMRB statistics used to test BARASA are also included in the Source Data file. Assignments referenced in this study from the BMRB can be accessed via the following accession codes: 434, 4354, 19082, 18927, and 28135. The experimental structures referenced in this study from the PDB can be accessed via the following accession codes: 9ILB, 2IRT, 1IGS, 1DMB, 7RY6, and 1AOB. Supplementary Information is available and consists of fifteen tables and one figure listing resonance assignments made by BARASA and summary statistics of BARASA's performance using SPARTA+ predicted chemical shifts, AlphaFold2 structural models, or data containing artifact peaks. Source data are provided with this paper.

Code availability

BARASA will be made generally available for non-commercial use, preferably through NMRbox 50 [ https://nmrbox.nmrhub.org/ ], or alternatively by contacting [email protected] for Linux- or OSX-compatible executables.

Ikeya, T. et al. Solution NMR views of dynamical ordering of biomacromolecules. Biochim. Biophys. Acta 1862 , 287–306 (2018).

Shimada, I., Ueda, T., Kofuku, Y., Eddy, M. T. & Wüthrich, K. GPCR drug discovery: integrating solution NMR data with crystal and cryo-EM structures. Nat. Rev. Drug Discov. 18 , 59–82 (2019).

Alderson, T. R. & Kay, L. E. NMR spectroscopy captures the essential role of dynamics in regulating biomolecular function. Cell 184 , 577–595 (2021).

Camacho-Zarco, A. R. et al. NMR provides unique insight into the functional dynamics and interactions of intrinsically disordered proteins. Chem. Rev. 122 , 9331–9356 (2022).

Wüthrich, K. Sequential individual resonance assignments in the 1H-NMR spectra of polypeptides and proteins. Biopolymers 22 , 131–138 (1983).

Wüthrich, K., Wider, G., Wagner, G. & Braun, W. Sequential resonance assignments as a basis for determination of spatial protein structures by high resolution proton nuclear magnetic resonance. J. Mol. Biol. 155 , 311–319 (1982).

Billeter, M., Braun, W. & Wüthrich, K. Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra. Computation of sterically allowed proton-proton distances and statistical analysis of proton-proton distances in single crystal protein conformations. J. Mol. Biol. 155 , 321–346 (1982).

Englander, S. W. & Wand, A. J. Main chain directed strategy for the assignment of 1H NMR spectra of proteins. Biochemistry 26 , 5953–5958 (1987).

Di Stefano, D. L. & Wand, A. J. Two-dimensional 1H NMR study of human ubiquitin: a main chain directed assignment and structure analysis. Biochemistry 26 , 7272–7281 (1987).

Wand, A. J. & Nelson, S. J. Refinement of the main chain directed assignment strategy for the analysis of 1H NMR spectra of proteins. Biophys. J. 59 , 1101–1112 (1991).

Nelson, S. J., Schneider, D. M. & Wand, A. J. Implementation of the main chain directed assignment strategy. Computer assisted approach. Biophys. J. 59 , 1113–1122 (1991).

Ikura, M., Kay, L. E. & Bax, A. A novel approach for sequential assignment of 1H, 13C, and 15N spectra of larger proteins: Heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry 29 , 4659–4667 (1990).

Montelione, G. T. & Wagner, G. Conformation-independent sequential NMR connections in polypeptides by H1-C13-N15 triple-resonance experiments. J. Magn. Reson. 87 , 183–188 (1990).

Driscoll, P. C., Marius Clore, G., Marion, D., Wingfield, P. T. & Gronenborn, A. M. Complete resonance assignment for the polypeptide backbone of interleukin 1β using three-dimensional heteronuclear NMR spectroscopy. Biochemistry 29 , 3542–3556 (1990).

Sattler, M., Schleucher, J. & Griesinger, C. Heteronuclear multidimensional NMR experiments for the structure determination of proteins in solution employing pulsed field gradients. Prog. NMR Spectr. 34 , 93–158 (1999).

Frueh, D. P. Practical aspects of NMR signal assignment in larger and challenging proteins. Prog. NMR Spectr. 78 , 47–75 (2014).

Gardner, K. H. & Kay, L. E. The use of 2H, 13C, 15N multidimensional NMR to study the structure and dynamics of proteins. Annu. Rev. Biophys. Biomol. Struct. 27 , 357–406 (1998).

Palmer, A. G. Chemical exchange in biomacromolecules: past, present, and future. J. Magn. Reson. 241 , 3–17 (2014).

Tjandra, N. & Bax, A. Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278 , 1111–1114 (1997).

Salmon, L. & Blackledge, M. Investigating protein conformational energy landscapes and atomic resolution dynamics from NMR dipolar couplings: A review. Rep. Prog. Phys. 78 , 126601–126630 (2015).

Clore, G. M. & Gronenborn, A. M. Applications of three- and four-dimensional heteronuclear NMR spectroscopy to protein structure determination. Prog. NMR Spectr. 23 , 43–92 (1991).

Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269 , 592–610 (1997).

Moseley, H. N. B., Monleon, D. & Montelione, G. T. Automatic determination of protein backbone resonance assignments from triple resonance nuclear magnetic resonance data. Methods Enzymol. 339 , 91–108 (2001).

Baran, M. C., Huang, Y. J., Moseley, H. N. B. & Montelione, G. T. Automated analysis of protein NMR assignments and structures. Chem. Rev. 104 , 3541–3555 (2004).

Pervushin, K., Riek, R., Wider, G. & Wüthrich, K. Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Nat. Acad. Sci. USA 94 , 12366–12371 (1997).

Hitchens, T. K., Lukin, J. A., Zhan, Y., McCallum, S. A. & Rule, G. S. MONTE: An automated Monte Carlo based approach to nuclear magnetic resonance assignment of proteins. J. Biomol. NMR 25 , 1–9 (2003).

Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220 , 671–680 (1983).

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 , 1087–1092 (1953).

Clubb, R. T., Thanabal, V. & Wagner, G. A constant-time three-dimensional triple-resonance pulse scheme to correlate intraresidue 1HN, 15N, and 13C′ chemical shifts in 15N/13C-labelled proteins. J. Magn. Reson. 97 , 213–217 (1992).

Grzesiek, S. & Bax, A. Improved 3D triple-resonance NMR techniques applied to a 31 kDa protein. J. Magn. Reson. 96 , 432–440 (1992).

Bax, A. & Ikura, M. An efficient 3D NMR technique for correlating the proton and 15N backbone amide resonances with the α-carbon of the preceding residue in uniformly 15N/13C enriched proteins. J. Biomol. NMR 1 , 99–104 (1991).

Wittekind, M. & Mueller, L. HNCACB, a high-sensitivity 3D NMR experiment to correlate amide-proton and nitrogen resonances with the alpha- and beta-carbon resonances in proteins. J. Magn. Reson. Ser. B 101 , 201–205 (1993).

Grzesiek, S. & Bax, A. Correlating backbone amide and side chain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114 , 6291–6293 (1992).

Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134 , 12817–12829 (2012).

Han, B., Liu, Y., Ginzinger, S. W. & Wishart, D. S. SHIFTX2: significantly improved protein chemical shift prediction. J. Biomol. NMR 50 , 43–57 (2011).

Kjaergaard, M. & Poulsen, F. M. Sequence correction of random coil chemical shifts: Correlation between neighbor correction factors and changes in the Ramachandran distribution. J. Biomol. NMR 50 , 157–165 (2011).

Kjaergaard, M., Brander, S. & Poulsen, F. M. Random coil chemical shift for intrinsically disordered proteins: Effects of temperature and pH. J. Biomol. NMR 49 , 139–149 (2011).

Shen, Y. & Bax, A. SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J. Biomol. NMR 48 , 13–22 (2010).

Hyberts, S. G., Milbradt, A. G., Wagner, A. B., Arthanari, H. & Wagner, G. Application of iterative soft thresholding for fast reconstruction of NMR data non-uniformly sampled with multidimensional Poisson Gap scheduling. J. Biomol. NMR 52 , 315–327 (2012).

Lee, W. et al. I-PINE web server: an integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73 , 213–222 (2019).

Mishra, S. H. et al. Global protein dynamics as communication sensors in peptide synthetase domains. Sci. Adv. 8 , eabn6549 (2022).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Nerli, S., De Paula, V. S., McShan, A. C. & Sgourakis, N. G. Backbone-independent NMR resonance assignments of methyl probes in large proteins. Nat. Commun. 12 , 691 (2021).

Xu, Y. & Matthews, S. MAP-XSII: an improved program for the automatic assignment of methyl resonances in large proteins. J. Biomol. NMR 55 , 179–187 (2013).

Chao, F. A. et al. FLAMEnGO 2.0: an enhanced fuzzy logic algorithm for structure-based assignment of methyl group resonances. J. Magn. Reson. 245 , 17–23 (2014).

Monneau, Y. R. et al. Automatic methyl assignment in large proteins by the MAGIC algorithm. J. Biomol. NMR 69 , 215–227 (2017).

Pritisanac, I., Wurz, J. M., Alderson, T. R. & Güntert, P. Automatic structure-based NMR methyl resonance assignment in large proteins. Nat. Commun. 10 , 4922 (2019).

Pritisanac, I. et al. Automatic assignment of methyl-NMR spectra of supramolecular machines using graph theory. J. Am. Chem. Soc. 139 , 9523–9533 (2017).

Delaglio, F. et al. NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6 , 277–293 (1995).

Maciejewski, M. W. et al. NMRbox: A resource for biomolecular NMR computation. Biophys. J. 112 , 1529–1534 (2017).

Lee, W., Tonelli, M. & Markley, J. L. NMRFAM-SPARKY: Enhanced software for biomolecular NMR spectroscopy. Bioinformatics 31 , 1325–1327 (2015).

Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36 , D402–D408 (2008).

Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28 , 235–242 (2000).

Gardner, K. H. Solution NMR studies of a 42 kDa Escherichia coli maltose binding protein/β-cyclodextrin complex: Chemical shift assignments and analysis. J. Am. Chem. Soc. 120 , 11738–11748 (1998).

Sapienza, P. J. & Lee, A. L. Backbone and ILV methyl resonance assignments of E. coli thymidylate synthase bound to cofactor and a nucleotide analogue. Biomol. NMR Assign. 8 , 195–199 (2014).

Yang, Y. & Igumenova, T. I. The C-Terminal V5 domain of protein kinase Cα Is intrinsically disordered, with propensity to associate with a membrane mimetic. PLoS ONE 8 , e65699 (2013).

Camacho-Zarco, A. R. et al. Molecular basis of host-adaptation interactions between influenza virus polymerase PB2 subunit and ANP32A. Nat. Commun. 11 , 3656 (2020).

Acknowledgements

We are grateful to Dominique Frueh and colleagues for providing crosspeak lists for Cy1 and for fruitful discussions, and to Andrew Lee, Martin Blackledge and Tatyana Igumenova for providing crosspeak and/or spin system lists for ecTS, hIDD and V5dm, respectively. We also thank the Texas A&M High Performance Research Computing Center for access to computational resources for predicting the Cy1 structure, and NMRbox for access to NMRPipe and other data processing packages. This work was supported by grants from the Mathers Foundation (MF-1809-00155), the National Institutes of Health (GM129076) and Texas A&M University to A.J.W. and by a postdoctoral fellowship from the Gulf Coast Consortium provided by the Cancer Prevention and Research Institute of Texas (RP210043) to A.C.B.

Author information

Authors and affiliations.

Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX, 77843, USA

Anthony C. Bishop, Glorisé Torres-Montalvo, Kyle Mimun & A. Joshua Wand

Graduate Group in Biochemistry & Molecular Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA

Sravya Kotaru & A. Joshua Wand

Department of Chemistry, Texas A&M University, College Station, TX, 77843, USA

A. Joshua Wand

Department of Molecular & Cellular Medicine, Texas A&M University, College Station, TX, 77843, USA

Contributions

A.C.B. and A.J.W. conceived the algorithm. A.C.B. wrote the computer code to implement the algorithm. A.C.B., S.K., G.T.-M., and A.J.W. tested BARASA. A.C.B., S.K., and G.T.-M. prepared isotopically enriched protein and collected, processed and analyzed NMR data for IL-1β, IGPS, and IL-1Ra, respectively. S.K. and G.T.-M. manually assigned IGPS and IL-1Ra, respectively. K.M. prepared isotopically enriched MBP and analyzed MBP NMR data. A.C.B. collected and processed MBP NMR data. A.C.B. and G.T.-M. ran the test cases through FLYA, AutoAssign and I-PINE. A.C.B. and A.J.W. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to A. Joshua Wand .

Ethics declarations

Competing interests.

The authors declare the following competing interests. Texas A&M AgriLife has secured federal copyright of BARASA and will market the program. As inventors, A.C.B. and A.J.W. will receive a share of royalties generated by commercial use. There are no other competing interests.

Peer review

Peer review information.

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Bishop, A.C., Torres-Montalvo, G., Kotaru, S. et al. Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing. Nat Commun 14 , 1556 (2023). https://doi.org/10.1038/s41467-023-37219-z

Received : 13 April 2022

Accepted : 06 March 2023

Published : 21 March 2023

DOI : https://doi.org/10.1038/s41467-023-37219-z

About NMRtist

NMRtist is a cloud computing service for the fully automated analysis of protein NMR spectra (e.g., peak picking, chemical shift assignment, structure determination) using deep learning-based approaches. Each project created in NMRtist receives 30 GB of private storage, which can be filled with experimental data and analyzed using the available applications. No local hardware resources or complex software configuration are required: NMRtist applications can be run with just a few mouse clicks in a web browser. All calculations are executed on NMRtist computational nodes, and the results are available for download from the NMRtist website.

ARTINA is a deep learning-based application for end-to-end protein structure determination by NMR spectroscopy. Using NMR spectra and the protein sequence as input, the method automatically identifies (strictly without any human intervention) cross-peak positions, chemical shift assignments, upper limit distance restraints, and the protein structure. ARTINA deep learning models have been trained with over 600,000 cross-peak examples from more than 1300 2D/3D/4D spectra. The method solved structures with a median backbone RMSD of 1.44 Å to the PDB reference and correctly identified 91.36% of the chemical shift assignments. View our short video tutorial to learn how to get started with ARTINA.

New Update (27.03.2023): We've added support for additional file types: (a) manual peak lists (.list, .peaks), (b) chemical shift lists (.prot), (c) chemical shift statistics (.stats), (d) lower/upper distance restraints (.lol/.upl), (e) Talos angle restraints (.aco), and (f) protein structure (.pdb). With this release, all of these files can be uploaded to the project storage and used as inputs for applications. The new data files enable users to perform structure-based assignment and chemical shift transfer, and to use manually refined ARTINA output files in application runs. Learn more about the supported file formats in our blog article.

Academic users can use the NMRtist platform free of charge to perform automated peak picking, shift assignment, or full structure determination. Create a free account to use all functions of the service, or start an anonymous project.

Recommended articles

Getting started tutorial.

This tutorial presents the first steps in the NMRtist system. It guides you through account and project creation, data upload, and submission of an example structure calculation job. You can follow the tutorial with your own data or use one of our example datasets. We highly recommend completing this tutorial before your first application call.

Artificial Intelligence for NMR Applications (ARTINA) is a deep learning-based approach to fully automated NMR protein structure determination. The method takes as input only NMR spectra and the protein sequence, and delivers automatically: peak lists, shift assignments, distance restraints, and the structure.

Video tutorial

This video tutorial introduces beginners to the NMRtist system, guiding them through the process of submitting an automated protein structure determination job, and showcasing representative results from such a job.


Jan. 15, 2024, 8:14 p.m.

[Manuscript] The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis

Open dataset containing 1329 2D-4D NMR spectra that allow the reproduction of 100 protein structures from original measurements. This dataset was originally compiled for the development of the ARTINA deep learning-based spectra analysis method (see https://nmrdb.ethz.ch and the manuscript).

Nov. 30, 2023, 8:15 p.m.

[Manuscript] Time-optimized protein NMR assignment with an integrative deep learning approach using AlphaFold and chemical shift prediction

Our new study, recently accepted in Science Advances (https://www.science.org/doi/full/10.1126/sciadv.adi9323), explores the integration of in-silico predictions such as AlphaFold with ARTINA, improving the efficiency and accuracy of NMR data analysis. This research represents a significant step towards data-efficient use of our system for protein studies.

Feb. 2, 2023, 8:39 p.m.

[Manuscript] NMRtist: an online platform for automated biomolecular NMR spectra analysis

Our manuscript (application note) presenting the NMRtist platform has been accepted for publication in Bioinformatics (https://doi.org/10.1093/bioinformatics/btad066).

Dec. 21, 2022, midnight

NMRtist usage

Since the release of the platform in February 2022, NMRtist has analysed 4,368 2D/3D/4D NMR spectra and completed 1,100 automated chemical shift assignment jobs and 444 automated structure determination jobs.

Dec. 20, 2022, midnight

ARTINA and NMRtist presented to the broader audience

Between 06.2022 and 01.2023, we presented ARTINA and NMRtist at several NMR events, including: Chianti Workshop (Principina Terra, Italy), EUROMAR (Utrecht, The Netherlands), EMBO Practical Course (Basel, Switzerland), EMBO Lecture Course (Berhampur, India), Biomolecular NMR: Advanced Tools, Machine Learning (Gothenburg, Sweden), and ICMRBS (Boston, USA).

Oct. 19, 2022, midnight

[Manuscript] Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

Our manuscript presenting the ARTINA workflow for rapid assignment and structure determination has been published in Nature Communications (https://doi.org/10.1038/s41467-022-33879-5).

Oct. 2, 2021, midnight

Biomolecular NMR: Advanced Tools workshop

NMRtist was presented at the Biomolecular NMR: Advanced Tools workshop (29.09-01.10 2021). All participants of the training, supervised by Prof. Peter Güntert and Dr. Piotr Klukowski, submitted datasets to the platform, obtaining automatically determined structures and/or assignments.


Assignments

Note:  Do not forget to have a look at our 1H NMR Automatic Assignments Tutorial

Mnova provides a very simple interface to assign your molecule. Open your NMR spectrum and load a molecule structure. Then follow the menu 'Analysis/Assignment' (or use the shortcut 'A').

Click on an atom in the molecular structure (or a spectrum region), release the mouse button, and move the pointer to the desired peak. Once the peak is highlighted in the spectrum, click on it to assign it.

The peak is now assigned to the atom, which turns green. Once the assignment has been made, an atom number label appears at the chemical shift. Hovering the mouse over the atom highlights the corresponding peak in the spectrum, and hovering over the peak highlights the corresponding atom in the molecular structure.

You can also assign a region of the spectrum by clicking, dragging and releasing the mouse over the desired region. In the example below, we have assigned a -CH2 group, so a new window is displayed that lets us choose which atom to assign: 18, 18', both (leave blank), or any other annotation such as 18a, 18b, cis/trans, ax/eq, etc.:

Assign a multiplet by dragging the mouse to the 'multiplet box' (in this case, the name of the multiplet is replaced with the atom number) or to the 'integral curve', as you can see in the picture below:

We recommend assigning your atoms to the multiplet boxes so that assignments can be transferred between datasets. You can also assign a 13C NMR spectrum of the same molecule in the same document. To do so, select the atoms from the Molecules table via the menu 'View/Tables/Molecules'. To keep assignments propagating, use the menu 'View/Tables/Assignments' to select which datasets to take into account (in the example below, we have selected the 1H and 13C NMR datasets).

Assignments to 2D spectra:

Once you have assigned the 1H and 13C spectra, you can open a 2D NMR spectrum in the same document and link the spectra (from the Assignments table). The assignments are then shown graphically on the screen, and hovering the mouse over an atom highlights the corresponding chemical shift. In the example below, you can see the correlation between the two hydrogens and the carbon at position 12 in an HSQC:

You can also assign 2D spectra by typing the atom number in the 'Assignments Table', or graphically by selecting the atom in the molecular structure and the corresponding signal in the 2D spectrum:

Click on the 2D spectrum, and a new window is displayed for selecting the atoms involved (in this case, you only need to select atom 10 in f1):

Finally, click OK and hover the mouse over atom 23 to see the result. Notice that the Assignments table has been filled automatically with the number '10' in the HSQC column, showing the correlation between C-10 and H-23.

Several datasets can also be copied onto the same page. Here you can see a page containing 1H, 13C and HSQC datasets with an assignment in the molecular structure:



Chemistry LibreTexts

NMR - Interpretation


Nuclear Magnetic Resonance (NMR) interpretation plays a pivotal role in molecular identification. When interpreting NMR spectra, the structure of an unknown compound, as well as that of a known one, can be assigned from several factors: chemical shift, spin multiplicity, coupling constants, and integration. This Module focuses on the most important spectra, ¹H and ¹³C NMR, although there are various other kinds of NMR spectra, such as ¹⁴N, ¹⁹F, and ³¹P. In an NMR spectrum, the x-axis is the chemical shift in ppm. The spectrum also shows integral areas, splitting patterns, and coupling constants.

Strategy for Solving Structure

Here is the general strategy for solving structure with NMR:

  • The molecular formula is determined by chemical analysis, such as elemental analysis.
  • The double-bond equivalent (also known as the degree of unsaturation) is calculated with a simple equation to estimate the number of multiple bonds and rings. Oxygen (O) and sulfur (S) are ignored, halogens (Cl, Br) are replaced by H, and nitrogen is replaced by CH. The resulting empirical formula is \(\mathrm{C}_a\mathrm{H}_b\):

\[ \text{DBE} = \frac{2a + 2 - b}{2} \]

  • Structural fragments are identified from chemical shifts, spin multiplicities, integrals (peak areas), and coupling constants (\(^1J\), \(^2J\)).
  • The molecular skeleton is built up using two-dimensional NMR spectroscopy.
  • The relative configuration is predicted from coupling constants (\(^3J\)).
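The double-bond-equivalent step above is plain arithmetic, so it is easy to sketch. The sketch assumes a neutral formula \(\mathrm{C}_c\mathrm{H}_h\mathrm{N}_n\mathrm{X}_x\) and applies the replacement rule (X → H, N → CH) before evaluating (2a + 2 − b)/2. The function name `double_bond_equivalents` is illustrative only.

```python
from fractions import Fraction


def double_bond_equivalents(c: int, h: int, n: int = 0, halogens: int = 0) -> Fraction:
    """Double-bond equivalents for C_c H_h N_n X_x (O and S are ignored).

    Each halogen X is replaced by H and each N by CH, giving the empirical
    formula C_a H_b with a = c + n and b = h + halogens + n.
    """
    a = c + n
    b = h + halogens + n
    # DBE = (2a + 2 - b) / 2; for a neutral, even-electron molecule this
    # should come out as a whole number
    return Fraction(2 * a + 2 - b, 2)


print(double_bond_equivalents(6, 6))        # benzene C6H6 -> 4
print(double_bond_equivalents(5, 5, n=1))   # pyridine C5H5N -> 4
```

`Fraction` is used instead of float division so that a half-integer result, which usually points to a mistyped formula, is visible exactly rather than hidden by rounding.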

Chemical Shift

The chemical shift relates the Larmor frequency of a nuclear spin to its chemical environment. Tetramethylsilane (TMS, \(\ce{(CH3)4Si}\)) is generally used as an internal standard for chemical shifts: δ(TMS) = 0 ppm. In other words, the resonance frequency of a ¹H or ¹³C nucleus in a sample is measured relative to the ¹H or ¹³C resonance of TMS. Understanding chemical shift trends is important for interpreting NMR spectra. The ¹H chemical shift is affected by proximity to electronegative atoms (O, N, halogens) and unsaturated groups (C=C, C=O, aromatic rings). Electronegative groups shift signals downfield (to the left; higher ppm). Unsaturated groups shift signals downfield when the affected nucleus lies in the plane of the unsaturation, but the shift is reversed in the regions above and below this plane. ¹H chemical shifts help identify many functional groups, and Figure \(\PageIndex{1}\) shows important examples.
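Because the shift is measured relative to TMS and divided by the spectrometer frequency, the ppm value does not depend on the magnet. The sketch below shows this using δ (ppm) = (ν_sample − ν_TMS) / ν_ref × 10⁶. It approximates ν_ref by the nominal spectrometer frequency in MHz, which is an assumption, and the helper names are hypothetical.

```python
def shift_ppm(offset_hz: float, spectrometer_mhz: float) -> float:
    """Chemical shift (ppm) from the frequency offset (Hz) relative to TMS.

    delta = (nu_sample - nu_TMS) / nu_ref * 1e6; dividing Hz by MHz already
    yields parts per million. nu_ref is approximated by the nominal frequency.
    """
    return offset_hz / spectrometer_mhz


def offset_hz(shift: float, spectrometer_mhz: float) -> float:
    """Inverse conversion: ppm -> Hz away from the TMS signal."""
    return shift * spectrometer_mhz


# The same 2.10 ppm signal sits at different Hz offsets on different magnets,
# but converts back to the same ppm value on both.
for field in (400.0, 600.0):
    hz = offset_hz(2.10, field)
    print(field, hz, shift_ppm(hz, field))
```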


Chemical equivalence

Chemically equivalent protons have the same chemical shift, either because of symmetry within the molecule (e.g., the two methyl groups of \(CH_3COCH_3\)) or because of fast rotation around a single bond (e.g., the protons of a -CH₃ group).

Spin-Spin Splitting

Spin-spin splitting means that an absorption peak is split by one or more "neighbor" protons. The split lines are separated by J Hz, where J is called the coupling constant. Splitting is essential for obtaining exact information about the number of neighboring protons. Splitting is normally observed only between protons separated by at most three bonds. Chemically equivalent protons do not split each other's signals. For a split signal, the chemical shift is taken at the center of the splitting lines.

Spin Multiplicity (Splitting pattern)

Spin multiplicity is used to determine the number of neighboring protons. The multiplicity rule is as follows: in an \(A_mB_n\) system, the \(n\) equivalent nuclei \(B\) split the \(A\) signal into \(n+1\) lines. The general formula, which applies to all nuclei, is \(2nI+1\), where \(I\) is the spin quantum number of the coupled nucleus. For spin-1/2 neighbors, the relative intensities of the lines are given by the coefficients of Pascal's triangle (Figure \(\PageIndex{2}\)).
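The multiplicity rule and the Pascal's-triangle intensities can be written as two small functions. This is a sketch: the function names are hypothetical. The intensity function only covers spin-1/2 neighbors, which are the case Pascal's triangle describes.

```python
from math import comb


def multiplicity(n_neighbours: int, spin: float = 0.5) -> int:
    """Number of lines produced by n equivalent neighbours of spin I: 2nI + 1."""
    return round(2 * n_neighbours * spin + 1)


def pascal_intensities(n_neighbours: int) -> list[int]:
    """Relative line intensities for n equivalent spin-1/2 neighbours.

    These are row n of Pascal's triangle, i.e. the binomial coefficients C(n, k).
    """
    return [comb(n_neighbours, k) for k in range(n_neighbours + 1)]


print(multiplicity(3), pascal_intensities(3))   # 3 neighbours -> 4 [1, 3, 3, 1]
print(multiplicity(1, spin=1.0))                # one deuterium (I = 1) -> 3
```

For spin-1/2 nuclei, `multiplicity(n)` reduces to the familiar \(n+1\) rule. The `spin` argument is kept so that the general \(2nI+1\) form from the text can also be checked.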


First-order splitting pattern

The chemical shift difference (in hertz) between the coupled protons is much larger than the coupling constant \(J\):

\[ \dfrac{\Delta \nu }{J} \ge 8\]

where \(\Delta \nu\) is the chemical shift difference in hertz. In other words, each proton is coupled only to protons whose chemical shifts are far from its own. Such a spectrum is called a first-order spectrum. The splitting pattern depends on the magnetic field. Because \(\Delta \nu\) in hertz grows in proportion to the field strength while \(J\) does not, a second-order pattern at lower field can be resolved into a first-order pattern at higher field. First-order patterns follow the multiplicity rule (\(n+1\)), and Pascal's triangle gives their intensity distribution.
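The field dependence can be made concrete by converting the shift difference from ppm to hertz before comparing it with \(J\). The sketch below applies the \(\Delta\nu/J \ge 8\) rule of thumb quoted above. The 0.10 ppm / 7 Hz proton pair is hypothetical, and the threshold is a rule of thumb rather than a sharp boundary.

```python
def delta_nu_over_j(delta_ppm: float, j_hz: float, spectrometer_mhz: float) -> float:
    """Ratio Δν/J, with Δν converted from ppm to Hz at the given field."""
    return abs(delta_ppm) * spectrometer_mhz / abs(j_hz)


def looks_first_order(delta_ppm: float, j_hz: float, spectrometer_mhz: float,
                      threshold: float = 8.0) -> bool:
    """Rule-of-thumb first-order test based on the Δν/J >= 8 criterion."""
    return delta_nu_over_j(delta_ppm, j_hz, spectrometer_mhz) >= threshold


# Hypothetical coupled pair: 0.10 ppm apart, J = 7 Hz.
# J stays the same, but Δν in Hz doubles when the field doubles.
for field in (300.0, 600.0):
    ratio = delta_nu_over_j(0.10, 7.0, field)
    print(field, round(ratio, 2), looks_first_order(0.10, 7.0, field))
```

At 300 MHz the ratio is about 4.3, so the pattern is second order. At 600 MHz it is about 8.6, so the same pair gives a first-order pattern.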

Example \(\PageIndex{1}\)

Consider an \(A_3M_2X_2\) spin system. By the \(n+1\) rule, \(H_A\) and \(H_X\) each appear as a triplet from coupling to the two \(H_M\) protons. The \(H_M\) signal is split into six peaks by the five \(H_A\) and \(H_X\) protons (Figure \(\PageIndex{3}\)). Because the splittings are equal, this first-order pattern is easy to predict.
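The six-line \(H_M\) pattern can be reproduced by a simple first-order "splitting tree". Each spin-1/2 neighbor splits every existing line into two lines \(J\) Hz apart, and coincident lines add up. This is a sketch. The \(J\) values (7 Hz, and 5 Hz for the unequal case) are hypothetical, and line positions are rounded to 6 decimals so that coincident lines merge despite floating-point error.

```python
from collections import defaultdict


def first_order_multiplet(center_hz: float,
                          couplings: list[tuple[int, float]]) -> list[tuple[float, int]]:
    """Stick spectrum (position in Hz, relative intensity) of a first-order multiplet.

    couplings lists (number of equivalent spin-1/2 neighbours, J in Hz) per group.
    """
    lines = {center_hz: 1}
    for n, j in couplings:
        for _ in range(n):
            split = defaultdict(int)
            for pos, intensity in lines.items():
                # each neighbour halves every line into two lines J Hz apart
                split[round(pos - j / 2, 6)] += intensity
                split[round(pos + j / 2, 6)] += intensity
            lines = dict(split)
    return sorted(lines.items())


# H_M of A3M2X2 with J_AM = J_MX = 7 Hz: six lines, 1:5:10:10:5:1
print([i for _, i in first_order_multiplet(0.0, [(3, 7.0), (2, 7.0)])])
# With unequal couplings (7 Hz and 5 Hz) the lines no longer coincide
print(len(first_order_multiplet(0.0, [(3, 7.0), (2, 5.0)])))
```

With equal couplings the tree collapses to the \(n+1\) rule with \(n = 5\). With unequal couplings the same \(H_M\) signal becomes a quartet of triplets (12 lines). This is why the simple six-line prediction depends on the splittings being equal.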


High-order splitting pattern

High-order splitting patterns occur when the chemical shift difference in hertz is smaller than, or of the same order of magnitude as, the coupling constant \(J\):

\[\frac{\Delta \nu}{J} \leq 10\]

A second-order pattern appears as a "leaning" distortion of the classical pattern. In an AB system, the inner peaks are taller and the outer peaks are shorter (Figure \(\PageIndex{4}\)). This is called the roof effect.
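The roof effect in an AB system can be computed from the standard closed-form AB solution. Let \(C = \sqrt{\Delta\nu^2 + J^2}\). The four lines then lie at \(\nu_0 \pm (C \pm J)/2\), the inner pair has relative intensity \(1 + J/C\), and the outer pair has \(1 - J/C\). This is a minimal sketch. The input frequencies are hypothetical, and the function name is illustrative.

```python
import math


def ab_quartet(nu_a: float, nu_b: float, j: float) -> list[tuple[float, float]]:
    """Line positions (Hz) and relative intensities of an AB spin system."""
    j = abs(j)
    nu0 = (nu_a + nu_b) / 2                      # midpoint of the two shifts
    c = math.sqrt((nu_a - nu_b) ** 2 + j ** 2)
    if c == 0:
        raise ValueError("identical shifts and J = 0: no splitting to compute")
    inner, outer = 1 + j / c, 1 - j / c          # roof effect: inner lines gain
    return [
        (nu0 - (c + j) / 2, outer),
        (nu0 - (c - j) / 2, inner),
        (nu0 + (c - j) / 2, inner),
        (nu0 + (c + j) / 2, outer),
    ]


# Hypothetical AB pair: Δν = 20 Hz, J = 10 Hz -> inner lines clearly taller
for pos, inten in ab_quartet(1010.0, 990.0, 10.0):
    print(f"{pos:8.2f} Hz  {inten:.2f}")
```

When \(\Delta\nu \gg J\), \(C \approx \Delta\nu\) and all four intensities approach 1, which recovers the two first-order doublets. As \(\Delta\nu \to 0\), the outer lines vanish and the inner lines merge into a single peak.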


Another example is the \(A_2B_2\) system (Figure \(\PageIndex{5}\)), in which the two triplets lean toward each other. The outer lines of each triplet have a relative area of less than 1, the inner lines more than 1, and the center lines a relative area of 2.


Coupling constant (J Value)

The coupling constant \(J\) measures the strength of the spin-spin interaction and equals the separation (in Hz) between the split lines. Its value depends on the coupled nuclei and reflects their bonding environment. Coupling constants are classified by the number of bonds between the coupled nuclei:

Geminal proton-proton coupling (\(^2J_{HH}\))

Geminal coupling occurs through two bonds (Figure \(\PageIndex{6}\)) and is observed only when the two geminal protons are not chemically equivalent. It ranges from about -20 to +40 Hz. \(^2J_{HH}\) depends on the hybridization of the carbon atom, the H-C-H bond angle, and substituents such as electronegative atoms. The geminal coupling constant increases with increasing s-character: \(^2J_{sp} > {}^2J_{sp^2} > {}^2J_{sp^3}\). The H-C-H bond angle changes \(^2J_{HH}\), and in cyclic systems this angle depends on ring strain, so geminal coupling constants reflect ring size. As the ring bond angle decreases (smaller rings), the geminal coupling constant becomes more positive. Replacing an atom with an electronegative atom also shifts the geminal coupling constant toward positive values.


Vicinal proton-proton coupling (\(^3J_{HH}\))

Vicinal coupling occurs through three bonds (Figure \(\PageIndex{7}\)). It is the most useful source of information about dihedral angles and therefore about the stereochemistry and conformation of molecules. The vicinal coupling constant always has a positive value. It is affected by the dihedral angle (\(\phi\); H-C-C-H), the valence angle (\(\theta\); H-C-C), the carbon-carbon bond length, and electronegative atoms. Its dependence on the dihedral angle (Figure \(\PageIndex{8}\)) is given by the Karplus equation:

\[^3 J=4.22-0.5 \cos \phi+4.5 \cos 2\phi\]

When \(\phi = 90^\circ\), the vicinal coupling constant is close to zero. At \(\phi = 0^\circ\) and \(\phi = 180^\circ\), it is about 8 to 10 Hz; these angles correspond to the coupled protons being cis and trans, respectively.
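A Karplus-type curve of the form \(^3J = A + B\cos\phi + C\cos 2\phi\) is easy to tabulate. The defaults below (A = 4.22, B = −0.5, C = 4.5 Hz) are one commonly quoted parameter set and are an assumption here. They were chosen because they reproduce the behavior described in the text: about 0 Hz at 90° and about 8 to 10 Hz at 0° and 180°. Other parametrizations shift the curve by a few hertz.

```python
import math


def karplus_3j(phi_deg: float, a: float = 4.22, b: float = -0.5, c: float = 4.5) -> float:
    """Vicinal 3J_HH (Hz) from the H-C-C-H dihedral angle phi in degrees.

    Karplus-type form 3J = A + B cos(phi) + C cos(2 phi); the default
    coefficients are an assumed, commonly quoted parameter set.
    """
    phi = math.radians(phi_deg)
    return a + b * math.cos(phi) + c * math.cos(2 * phi)


for angle in (0, 60, 90, 120, 180):
    print(angle, round(karplus_3j(angle), 2))
```

The function can be evaluated in only one direction. A measured \(^3J\) usually matches more than one dihedral angle, so turning a coupling back into a conformation needs additional information.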


The valence angle (\(\theta\); Figure \(\PageIndex{8}\)) also changes \(^3J_{HH}\). Valence angles are related to ring size; typically, the coupling constant decreases as the valence angle decreases. The distance between the carbon atoms also influences the vicinal coupling constant.


The coupling constant increases as the bond length decreases. Electronegative atoms decrease vicinal coupling constants.

The integral is the integrated peak area of a ¹H signal. Its intensity is directly proportional to the number of hydrogens giving rise to that signal.
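Because integrals are only relative, they are usually scaled so that they sum to the hydrogen count from the molecular formula. The sketch below does exactly that and rounds the result. The integral values are hypothetical, and overlapping signals or poor baselines will make the rounding unreliable in practice.

```python
def proton_counts(integrals: list[float], total_h: int) -> list[int]:
    """Convert relative 1H integrals into whole numbers of protons.

    Integrals are proportional to proton counts, so scaling them to sum to
    the H count of the molecular formula and rounding gives H per signal.
    """
    scale = total_h / sum(integrals)
    return [round(x * scale) for x in integrals]


# Hypothetical integrals for 2-propanol (C3H8O, 8 H): CH3, CH, OH
print(proton_counts([61.2, 10.1, 10.3], total_h=8))   # [6, 1, 1]
```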


Spin-Spin splitting

Compared with ¹H NMR, ¹³C NMR has one big difference: ¹³C-¹³C spin-spin splitting between adjacent carbons is rarely observed, because ¹³C has a low natural abundance (1.1%).
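The arithmetic behind "rarely" is short. The sketch assumes that each carbon position is ¹³C independently with the 1.1% natural abundance. Under that assumption, the chance that two specific adjacent carbons are both ¹³C is the square of the abundance, roughly one molecule in 8,000.

```python
C13_ABUNDANCE = 0.011  # natural abundance of 13C (1.1 %)


def both_13c_probability(abundance: float = C13_ABUNDANCE) -> float:
    """Probability that two specific adjacent carbons are both 13C.

    Assumes each position is 13C independently with the given abundance.
    """
    return abundance ** 2


p = both_13c_probability()
print(f"{p:.2e}  (about 1 molecule in {1 / p:,.0f})")
```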

  • ¹³C-¹H spin coupling: ¹³C-¹H coupling provides useful information about the number of protons attached to a carbon atom. With one-bond coupling (\(^1J_{CH}\)), the ¹³C resonances of -CH, -CH₂, and -CH₃ groups appear as a doublet, a triplet, and a quartet, respectively. However, ¹³C-¹H coupling also makes ¹³C spectra harder to interpret. Because ¹H is nearly 100% abundant, coupled ¹³C spectra contain a forest of overlapping peaks that is hard to analyze.
  • Decoupling: Decoupling removes the ¹³C-¹H coupling interaction, either to simplify a spectrum or to identify which pair of nuclei is involved in a J coupling. A proton-decoupled ¹³C spectrum shows only one peak (a singlet) for each unique carbon in the molecule (Figure \(\PageIndex{10}\)). Decoupling is performed by irradiating at the proton frequencies with continuous RF. Irradiating only the frequency of one proton with low-power RF (selective decoupling) removes just the couplings to that proton.


  • Distortionless enhancement by polarization transfer (DEPT): DEPT is used to distinguish between CH₃, CH₂, and CH groups. The final proton pulse is set to 45°, 90°, or 135° in three separate experiments. The resulting signal intensities depend on the number of protons attached to each carbon atom. Figure \(\PageIndex{11}\) shows an example of a DEPT spectrum.
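Reading DEPT spectra follows a small decision table, which the sketch below encodes. It assumes the standard DEPT behavior: DEPT-90 shows only CH carbons; DEPT-135 shows CH and CH₃ signals positive and CH₂ signals negative; quaternary carbons appear in the broadband-decoupled spectrum but in no DEPT spectrum. The signal list is hypothetical. The coupled-spectrum multiplicities come from the \(^1J_{CH}\) bullet above.

```python
def classify_carbon(in_dept90: bool, dept135_sign: int) -> str:
    """Carbon type of a signal seen in the broadband-decoupled 13C spectrum.

    dept135_sign is +1 (positive), -1 (negative) or 0 (absent in DEPT-135).
    """
    if in_dept90:
        return "CH"       # only CH carbons survive DEPT-90
    if dept135_sign < 0:
        return "CH2"      # CH2 is the only negative signal in DEPT-135
    if dept135_sign > 0:
        return "CH3"      # positive in DEPT-135 but absent from DEPT-90
    return "C"            # no attached H: absent from every DEPT spectrum


# Multiplicity of each carbon type in a 1H-coupled 13C spectrum (1J_CH, n + 1 rule)
COUPLED_MULTIPLICITY = {"C": "singlet", "CH": "doublet", "CH2": "triplet", "CH3": "quartet"}

# Hypothetical signals: shift (ppm) -> (seen in DEPT-90?, DEPT-135 sign)
signals = {128.5: (True, 1), 29.7: (False, -1), 14.1: (False, 1), 138.0: (False, 0)}
for shift, (dept90, dept135) in signals.items():
    kind = classify_carbon(dept90, dept135)
    print(shift, kind, COUPLED_MULTIPLICITY[kind])
```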


2-dimensional NMR spectroscopy (COSY)

COSY stands for COrrelation SpectroscopY. A COSY spectrum shows which nuclei are correlated with each other.

¹H-¹H COSY (COrrelation SpectroscopY)

¹H-¹H COSY clearly shows correlations between coupled protons. Choosing a point of entry into a COSY spectrum (a signal whose assignment is already known) is one of the keys to extracting information from it successfully. Coupled protons are identified by the cross peaks (correlation peaks) in the COSY spectrum. In other words, two diagonal peaks connected by lines through a cross peak are coupled to each other. Figure \(\PageIndex{12}\) shows correlation peaks between protons H1 and H2 and between H2 and H4. This means that H2 is coupled to both H1 and H4.
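Reading a COSY spectrum amounts to building a coupling graph: each cross peak links the two protons whose diagonal positions it lines up with. The sketch assumes the diagonal shifts are already known. The shift values and the 0.02 ppm matching tolerance are hypothetical, chosen to mirror the H1-H2 and H2-H4 example in the text. Cross peaks that match no proton, or more than one, are skipped as ambiguous.

```python
from typing import Optional


def coupling_network(diagonal: dict[str, float],
                     cross_peaks: list[tuple[float, float]],
                     tol: float = 0.02) -> dict[str, set[str]]:
    """Build a proton coupling graph from COSY cross peaks (shifts in ppm)."""
    def label_at(shift: float) -> Optional[str]:
        hits = [lab for lab, s in diagonal.items() if abs(s - shift) <= tol]
        return hits[0] if len(hits) == 1 else None   # skip ambiguous matches

    network: dict[str, set[str]] = {lab: set() for lab in diagonal}
    for f1, f2 in cross_peaks:
        a, b = label_at(f1), label_at(f2)
        if a is not None and b is not None and a != b:
            network[a].add(b)
            network[b].add(a)
    return network


# Hypothetical shifts; cross peaks appear symmetrically about the diagonal
shifts = {"H1": 1.20, "H2": 3.95, "H3": 2.50, "H4": 5.80}
peaks = [(1.20, 3.95), (3.95, 1.20), (3.95, 5.80), (5.80, 3.95)]
net = coupling_network(shifts, peaks)
print(sorted(net["H2"]))   # ['H1', 'H4']
```

Starting from the "point of entry" proton and following the graph's edges is the programmatic equivalent of tracing lines from diagonal peak to cross peak to diagonal peak.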


¹H-¹³C COSY (HETCOR)

¹H-¹³C COSY is heteronuclear correlation spectroscopy (HETCOR). A HETCOR spectrum correlates ¹³C nuclei with their directly attached protons through the one-bond ¹H-¹³C coupling. Each cross peak indicates a correlation between a proton and a carbon (Figure \(\PageIndex{13}\)). If a carbon line has no cross peak, that carbon has no attached protons (e.g., a quaternary carbon atom).
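The last point can be turned into a simple set difference: carbons in the ¹³C spectrum with no HETCOR cross peak are candidates for carbons without attached protons. This is a sketch. The shift lists and the 0.5 ppm tolerance are hypothetical, and closely spaced carbon signals would need a smaller tolerance or manual checking.

```python
def carbons_without_protons(carbon_shifts: list[float],
                            hetcor_peaks: list[tuple[float, float]],
                            tol: float = 0.5) -> list[float]:
    """13C signals (ppm) that have no HETCOR cross peak.

    hetcor_peaks holds (1H shift, 13C shift) pairs in ppm; tol is the
    13C matching tolerance in ppm.
    """
    return [c for c in carbon_shifts
            if not any(abs(c - c_peak) <= tol for _, c_peak in hetcor_peaks)]


carbons = [14.1, 60.5, 128.3, 137.9, 171.0]
peaks = [(1.25, 14.1), (4.12, 60.5), (7.30, 128.3)]
print(carbons_without_protons(carbons, peaks))   # [137.9, 171.0]
```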



Outside Links

  • NMRShiftDB, a free web database for NMR data: nmrshiftdb.chemie.uni-mainz.de/nmrshiftdb
  • NMR database from ACD/Labs: www.acdlabs.com/products/spec_lab/exp_spectra/spec_libraries/aldrich.html
  • NMR database from John Crerar Library : http://crerar.typepad.com/crerar_lib...h_ir_nmr_.html

Draw the ¹H NMR spectrum of 2-hydroxypropane in CDCl₃. Assume sufficient resolution to give a first-order spectrum, and ignore the vicinal proton-proton coupling (\(^3J_{HH}\)) of the hydroxyl proton.

1) Draw the structure of 2-hydroxypropane.


2) Determine which protons are chemically equivalent: the two methyl (-CH₃) groups are chemically equivalent.


3) Determine the splitting pattern with the \(n+1\) rule. H\(_a\) is split into two peaks by H\(_b\) (one neighboring proton). H\(_b\) appears as a septet from the six H\(_a\) protons. H\(_c\) appears as a single peak. (If the vicinal coupling were not ignored, H\(_c\) would appear as a doublet from coupling to H\(_b\).)
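The reasoning in this worked problem, counting protons per equivalent group and then applying the \(n+1\) rule to each group, can be sketched as a small table-driven function. The group labels and the neighbor counts are taken from the solution above, with the OH coupling ignored as the problem specifies. The function and dictionary names are hypothetical.

```python
MULTIPLET_NAMES = {1: "singlet", 2: "doublet", 3: "triplet", 4: "quartet",
                   5: "quintet", 6: "sextet", 7: "septet"}


def first_order_pattern(groups: dict[str, tuple[int, int]]) -> dict[str, tuple[int, str]]:
    """Predict (integral, multiplicity) for each group of equivalent protons.

    groups maps a label to (number of equivalent H, number of coupled
    spin-1/2 neighbours); the n + 1 rule gives the number of lines.
    """
    pattern = {}
    for label, (n_h, n_neighbours) in groups.items():
        lines = n_neighbours + 1
        pattern[label] = (n_h, MULTIPLET_NAMES.get(lines, f"{lines} lines"))
    return pattern


# 2-hydroxypropane, (CH3)2CH-OH, with the OH coupling ignored
result = first_order_pattern({"Ha (CH3)": (6, 1), "Hb (CH)": (1, 6), "Hc (OH)": (1, 0)})
for label, (n_h, mult) in result.items():
    print(label, n_h, "H", mult)
```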


Contributors and Attributions

  • You Jin Seo
