
Assignment of 1 H-NMR spectra

This page deals with how to interpret an NMR spectrum. "Assignment" in the title means assigning each peak to a proton in the molecule under investigation. The examples here are 1D proton assignments. For more complex examples, see the 2D assignments of 12,14-di-t-butylbenzo[g]chrysene and cholesteryl acetate.

In the example in fig. 1 of isopropyl- β -D-thiogalactopyranoside (shown without the hydrogens for simplicity; each carbon has four bonds), all the hydroxyls have been exchanged with the deuterium oxide solvent to deuteroxyls. Therefore, the hydroxyl signals do not appear in the spectrum and do not couple with the other signals, making the spectrum simpler.

Fig. 1. 1 H-NMR spectrum of isopropyl- β -D-thiogalactopyranoside in D 2 O

From the integrals, we see that two of them are multiples of three, one of which has tall sharp signals and so very likely corresponds to the two methyl (CH 3 ) groups. The remaining signals are each expected to yield an integral of one, so the integrals of three and four must arise from overlapping signals. The two H6 protons are expected to yield separate signals because they are diastereotopic (if one of them were replaced by another group, the attached carbon would become a chiral center). This affects their chemical shifts, so they differ magnetically. If you don't understand this, don't worry; just take it for granted for now.

From the chemical shifts we see that what we suspect are methyls have the appropriate chemical shift and the remaining signals fall in the overlapping CH and CH 2 regions as expected. If you are an experienced sugar chemist you will know that the signal with the highest chemical shift is usually the anomeric signal (H1 – the hydrogen connected to the carbon next to the sugar ring-closing oxygen).

The coupling patterns can be used to continue the analysis. You could be forgiven for thinking that the methyl signals display an AXY coupling pattern. However, they only couple with the single i Pr proton, so each should yield an AX pattern. The apparent extra splitting arises because the methyls (labeled MeA and MeB) are diastereotopic and so have different chemical shifts (they are not magnetically identical, just like the H6 protons). The result is two overlapping AX patterns (fig. 2).

Fig. 2. The methyl doublets of isopropyl- β -D-thiogalactopyranoside in D 2 O

The i Pr proton is coupled to six methyl protons yielding a septet (fig. 3).

Fig. 3. The i Pr septet of isopropyl- β -D-thiogalactopyranoside in D 2 O
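The septet follows from the first-order n + 1 rule: coupling to n equivalent spin-1/2 nuclei gives n + 1 lines with binomial relative intensities. A minimal sketch, assuming ideal first-order behaviour (the function name is ours, for illustration only):

```python
from math import comb

def first_order_multiplet(n_equivalent):
    """Relative line intensities for coupling to n equivalent spin-1/2 nuclei."""
    # Row n of Pascal's triangle: n + 1 lines in total.
    return [comb(n_equivalent, k) for k in range(n_equivalent + 1)]

# The i-Pr proton sees six equivalent methyl protons.
septet = first_order_multiplet(6)
print(len(septet), septet)  # 7 [1, 6, 15, 20, 15, 6, 1]
```

The outer lines of a septet carry only 1/64 of the total intensity each, which is why they are easily lost in the noise of a weak spectrum.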

The anomeric H1 is coupled to H2 yielding an AX doublet (fig. 4).

Fig. 4. The anomeric H1 doublet of isopropyl- β -D-thiogalactopyranoside in D 2 O

H4 has an unusually small coupling to H5 (this occurs when the two CH bonds are approximately at right angles to each other), so small that it is not resolved in a normal spectrum. H4 therefore displays an AX pattern instead of the expected AXY pattern, although the peaks are slightly broadened by the unresolved coupling (fig. 5).

Fig. 5. The H4 multiplet of isopropyl- β -D-thiogalactopyranoside in D 2 O
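The dependence of a three-bond coupling on the angle between the two CH bonds is usually described by a Karplus-type relation, J = A cos²φ + B cos φ + C. The sketch below is illustrative only: the coefficients A, B and C are placeholder values that we assume for demonstration and are not fitted to this molecule.

```python
import math

def karplus_j(phi_deg, a=7.76, b=-1.10, c=1.40):
    """Vicinal H-C-C-H coupling (Hz) from the dihedral angle (degrees).

    a, b, c are illustrative placeholder coefficients (an assumption),
    not values fitted to isopropyl-beta-D-thiogalactopyranoside.
    """
    phi = math.radians(phi_deg)
    return a * math.cos(phi) ** 2 + b * math.cos(phi) + c

# The coupling is smallest near 90 degrees and largest near 0 and 180 degrees.
for angle in (0, 60, 90, 120, 180):
    print(f"{angle:3d} deg -> J of about {karplus_j(angle):.1f} Hz")
```

With any reasonable set of coefficients the curve has its minimum close to 90°, which is the situation described for H4 and H5 above.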

H3 couples with both H2 and H4 and yields the expected AXY pattern, whereas H5, H6A and H6B have very similar chemical shifts and strong coupling that combine to yield a very strongly second-order coupled ABC pattern that is difficult to analyze (fig. 6).

Fig. 6. The H3 and the overlapping H5, H6A and H6B multiplets of isopropyl- β -D-thiogalactopyranoside in D 2 O

In the example of trans -geraniol in fig. 7 (shown without the hydrogens for simplicity; each carbon has four bonds), proton-5 (H5) is coupled to, and therefore split by, proton-4 (H4); H8 and H9 represent two protons each that are split by each other into triplet AX 2 patterns; and H2 is split into four by the three protons at H1 and the resulting quartet is split again by H3. However, second-order coupling distorts the multiplets, making the assignment more difficult.

Fig. 7. Part of the 1 H-NMR spectrum of trans -geraniol in CDCl 3
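Successive splittings such as "a quartet that is split again" can be sketched by applying the first-order rule once per coupling partner. This is a minimal illustration that ignores the second-order distortions mentioned above; the coupling constants (7.0 and 1.0 Hz) and the assumption of a single coupling partner at H3 are hypothetical, chosen only to show the pattern.

```python
from collections import Counter

def split_lines(lines, j_hz, n_equivalent=1):
    """Split every line by n equivalent spin-1/2 neighbours with coupling j_hz."""
    for _ in range(n_equivalent):
        new_lines = Counter()
        for offset, weight in lines.items():
            # Each neighbour moves half the intensity up and half down by J/2.
            new_lines[round(offset + j_hz / 2, 6)] += weight
            new_lines[round(offset - j_hz / 2, 6)] += weight
        lines = new_lines
    return lines

def multiplet(couplings):
    """couplings: list of (J in Hz, number of equivalent neighbours)."""
    lines = Counter({0.0: 1})
    for j_hz, n_equivalent in couplings:
        lines = split_lines(lines, j_hz, n_equivalent)
    return sorted(lines.items())  # (offset in Hz, relative intensity)

# Hypothetical values: three neighbours at 7.0 Hz, one neighbour at 1.0 Hz.
for offset, weight in multiplet([(7.0, 3), (1.0, 1)]):
    print(f"{offset:+6.1f} Hz  intensity {weight}")
```

The result is a quartet of doublets: eight lines with relative intensities 1:1:3:3:3:3:1:1.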

Protein NMR Resonance Assignment

Reference work entry, pp 2033–2037

Fuyuhiko Inagaki

Biosynthetic labeling; Side chain assignment; Spectroscopic assignment

Overview of Protein Resonance Assignment

Until the introduction of the sequential assignment procedure developed by Kurt Wüthrich and his coworkers in the 1980s (Wüthrich 1986), most protein assignment work was accomplished with reference to the crystal structure. The establishment of the sequential assignment procedure was therefore a milestone for protein NMR. Backbone amide proton (H N ) and α proton (H α ) signals were sequentially assigned based on the distance information between \({\rm H}^{\rm N}_{\rm i}\) and \({\rm H}^{\alpha}_{{\rm i}-1}\), and were aligned on the amino acid sequence of the particular protein. This made NMR independent of X-ray crystallography, and the structure of proteins in solution could be determined by NMR using the assignment of proton signals and proton-proton distance information. However, due to limited resolution in 1 H 2D-NMR spectra, the molecular weight of the target protein...


Bax A, Grzesiek S. Methodological advances in protein NMR. Acc Chem Res. 1993;26:131–8.

Cavanagh J, Fairbrother W, Palmer AG, Rance M, Skelton NJ. Protein NMR spectroscopy. 2nd ed. Amsterdam: Elsevier; 2007.

Kainosho M, Tsuji T. Assignment of the three methionyl carbonyl carbon resonances in Streptomyces subtilisin inhibitor by a carbon-13 and nitrogen-15 double-labeling technique. A new strategy for structural studies of proteins in solution. Biochemistry. 1982;21:6273–9.

Kay LE. Nuclear magnetic resonance methods for high molecular weight proteins: a study involving a complex of maltose binding protein and β-cyclodextrin. In: James TL, Dotsch V, Schmitz U, editors. Methods in enzymology 339. New York: Academic; 2001. p. 174–203.

McIntosh LP, Dahlquist FW. Biosynthetic incorporation of 15 N and 13 C for assignment and interpretation of nuclear magnetic resonance spectra of proteins. Q Rev Biophys. 1990;23:1–38.

Morita EH, Shimizu M, Ogasawara T, Endo Y, Tanaka R, Kohno T. A novel way of amino acid-specific assignment in 1 H- 15 N HSQC spectra with a wheat germ cell-free protein synthesis system. J Biomol NMR. 2004;30:37–45.

Shen Y, Delaglio F, Cornilescu G, Bax A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR. 2009;44:213–23.

Wüthrich K. NMR of proteins and nucleic acids. New York: Wiley; 1986.

Wüthrich K, Wider G. Transverse relaxation-optimized NMR spectroscopy with biomacromolecular structure in solution. Magn Reson Chem. 2003;41:S80–8.


Author information

Authors and affiliations.

Department of Structural Biology, Hokkaido University, N21, W11, Kita-ku, Sapporo, 001-0021, Japan

Fuyuhiko Inagaki


Corresponding author

Correspondence to Fuyuhiko Inagaki .

Editor information

Editors and affiliations.

Department of Biochemistry, University of Leicester, Leicester, UK

Gordon C. K. Roberts


Copyright information

© 2013 European Biophysical Societies' Association (EBSA)

About this entry

Cite this entry.

Inagaki, F. (2013). Protein NMR Resonance Assignment. In: Roberts, G.C.K. (eds) Encyclopedia of Biophysics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16712-6_312


DOI : https://doi.org/10.1007/978-3-642-16712-6_312

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-642-16711-9

Online ISBN : 978-3-642-16712-6

eBook Packages : Biomedical and Life Sciences Reference Module Biomedical and Life Sciences


Original Research Article

Including Protons in Solid-State NMR Resonance Assignment and Secondary Structure Analysis: The Example of RNA Polymerase II Subunits Rpo4/7


  • 1 Physical Chemistry, Eidgenössische Technische Hochschule Zurich, Zurich, Switzerland
  • 2 Center for Biomolecular Magnetic Resonance, Institute of Biophysical Chemistry, Goethe University Frankfurt, Frankfurt, Germany
  • 3 Department of Chemistry, Tokyo Metropolitan University, Hachioji, Japan
  • 4 Institut de Biologie et Chimie des Protéines, MMSB, Labex Ecofect, UMR 5086 CNRS, Université de Lyon, Lyon, France

1 H-detected solid-state NMR experiments feasible at fast magic-angle spinning (MAS) frequencies allow accessing 1 H chemical shifts of proteins in solids, which enables their interpretation in terms of secondary structure. Here we present 1 H and 13 C-detected NMR spectra of the RNA polymerase subunit Rpo7 in complex with unlabeled Rpo4 and use the 13 C, 15 N, and 1 H chemical-shift values deduced from them to study the secondary structure of the protein in comparison to a known crystal structure. We applied the automated resonance assignment approach FLYA including 1 H-detected solid-state NMR spectra and show its success in comparison to manual spectral assignment. Our results show that reasonably reliable secondary-structure information can be obtained from 1 H secondary chemical shifts (SCS) alone by using the sum of 1 H α and 1 H N SCS rather than by TALOS. The confidence, especially at the boundaries of the observed secondary structure elements, is found to increase when evaluating 13 C chemical shifts, here either by using TALOS or in terms of 13 C SCS.

Introduction

Solid-state NMR and, in particular, proton-detected spectroscopy under fast MAS allow the characterization of increasingly large proteins and protein complexes ( Linser et al., 2011 ; Andreas et al., 2015 ; Struppe et al., 2017 ; Schubeis et al., 2018 ; Bougault et al., 2019 ). Here, we demonstrate the resonance assignment and secondary-structure determination of the subunit Rpo7 of the archaeal DNA-dependent RNA polymerase (RNAP) in the context of the protein complex Rpo4/Rpo7 (33.5 kDa). RNAPs from bacteria, archaea, and eukarya are well-characterized in terms of their subunit composition, as well as their structure, and much is known about the regulation mechanisms and complex interplay of transcription factors throughout the transcription cycle of initiation, elongation, and transcription termination ( Werner and Grohmann, 2011 ; Sainsbury et al., 2015 ; Hantsche and Cramer, 2016 ). The archaeal RNAP in particular has served as a model system for dissecting the functions of the individual subunits of the human RNAP II ( Werner, 2007 , 2008 ).

Two of these subunits, Rpb4/Rpb7, that form a stalk-like protrusion in RNAP II, or rather their archaeal homologs Rpo4/Rpo7 (or Rpo4/7), are known to bind the nascent single-stranded RNA, contribute to transcription initiation as well as termination efficiency and increase processivity during elongation ( Meka, 2005 ; Újvári and Luse, 2006 ; Grohmann and Werner, 2010 , 2011 ). Yet, how these functions are achieved in molecular detail remains elusive, and conformational changes of Rpo4/7 in response to RNA binding have not been detected when probed by labeling techniques, such as fluorescence and electron paramagnetic resonance spectroscopy ( Grohmann et al., 2010 ). NMR spectroscopy could provide further information at the atomic level.

As a first step, we present the 1 H, 13 C, and 15 N protein resonance assignment employing solid-state MAS experiments of a sedimented Rpo4/7 complex from the archaeon Methanocaldococcus jannaschii . For this, we labeled the Rpo7 subunit uniformly with 13 C/ 15 N, while Rpo4 was employed at natural isotopic abundance. This enabled us to selectively study the Rpo7 subunit within the complex. We assigned, on the basis of the acquired spectra and using different assignment strategies, ~80% of the C α , C β , and backbone nitrogen atoms. It has been demonstrated that NMR chemical-shift values encode the secondary structure ( Wishart et al., 1992 ; Wishart and Sykes, 1994 ; Wang, 2002 ; Shen et al., 2009 ). We compared the secondary structure predictions based on the different chemical shifts, and also compared them to the known crystal structure. We found that for proton resonances, the most reliable information can be derived from 1 H secondary chemical shifts (SCS) using the sum of 1 H α and 1 H N SCS. Nevertheless, 13 C chemical shifts are found to be more reliable in terms of secondary-structure information, both directly from SCS and from TALOS.

Materials and Methods

Protein Expression and Purification, Sample Preparation

Plasmids pET21_Rpo7 and gGEX_2k_Rpo4 were transformed into E. coli BL21 (DE3) cells separately for Rpo4 and Rpo7. Rpo4 was overexpressed with an N-terminal glutathione S-transferase (GST)-tag in rich medium ( Terrific Broth, 2006 ) and purified via affinity chromatography using glutathione agarose (GSTrap, GE Healthcare, Glattbrugg, Switzerland) using P100 buffer (20 mM tris/acetate pH 7.9, 100 mM K acetate, 10 mM Mg acetate, 0.1 mM ZnSO 4 , 5 mM DTT, 10% (w/v) glycerol) and 10 mM reduced glutathione for elution, similar to previous protocols ( Werner and Weinzierl, 2002 ; Klose et al., 2012 ). The GST-tag was cleaved by overnight incubation with thrombin at 37°C. To deactivate and remove the GST-tag, a 20-min heat shock of the cleaved elution fractions at 65°C was applied with subsequent centrifugation (13,000 rpm, 20 min, 4°C), leaving purified Rpo4 in the supernatant. For isotope labeling with 15 N and 13 C, Rpo7 mutant S65C was expressed in M9-minimal medium ( Studier, 2005 ) consisting of 6.8 g Na 2 HPO 4 , 3 g KH 2 PO 4 , 0.5 g NaCl, 1 ml of each 1 M MgSO 4 , 10 mM ZnCl 2 , 1 mM FeCl 3 , and 100 mM CaCl 2 per 1 L medium, supplemented with 10 ml MEM vitamin solution (100×). One gram 15 NH 4 Cl and 2.5 g 13 C-glucose (Cambridge Isotope Laboratories, Tewksbury, USA) were the only nitrogen and carbon sources. Rpo7 * (the asterisk denotes isotope labeling) purification from inclusion bodies was carried out as described previously ( Werner and Weinzierl, 2002 ; Klose et al., 2012 ).

The complex formation of Rpo4 and Rpo7 * (with 20% excess) was carried out by unfolding and stepwise refolding dialysis in P100 buffer using urea (6, 4, 3, 2, 1, 0.5, and 0 M urea concentrations, 1 h per step, room temperature). Subsequently, a 20 min heat shock at 65°C and a centrifugation step (8,000 × g, 20 min, 4°C) were applied to remove excess or misfolded Rpo7 * after the dialysis. Purity and stability of the complex were confirmed by SDS-PAGE and native PAGE ( Figure S1 ). All chemicals were of p.a. grade and purchased from Sigma Aldrich (Buchs, Switzerland), unless stated otherwise.

Solid-State NMR Spectroscopy

Rpo4/7 * supplemented with DSS and sodium azide was sedimented into NMR rotors (0.7 and 3.2 mm, Bruker Biospin, Rheinstetten, Germany) by ultracentrifugation (35,000 rpm, 4°C, 16 h) using home-made filling tools ( Böckmann et al., 2009 ) resulting in 0.6 and 24 mg protein in the rotors with 0.7 and 3.2 mm diameter, respectively. Solid-state NMR spectra were recorded on a Bruker AVANCE III 850 MHz NMR spectrometer using either a 3.2 mm Bruker “E-free” probe or a 0.7 mm Bruker triple-resonance probe. The MAS spinning frequencies were set to 17.0 kHz for the 3.2 mm rotor and 110 kHz for the 0.7 mm rotor, with sample temperatures of 16°C (lowest possible temperature in this set-up) and 5°C for the 0.7 and 3.2 mm rotors, respectively. The 2D and 3D spectra were processed with TopSpin (version 3.5, Bruker Biospin, Rheinstetten, Germany) and analyzed in CcpNmr Analysis 2.4.2 ( Stevens et al., 2011 ). More details of the conducted experiments are presented in Table S1 . Polarization transfers between H-C and H-N used adiabatic cross polarization ( Hediger et al., 1995 ), as did N-C polarization transfers ( Baldus et al., 1996 ), while C-C transfers used either DARR ( Takegoshi et al., 2003 ) or DREAM ( Verel et al., 2001 ).

The 13 C-detected spectra used for the assignment were all recorded on a single sample (3.2 mm rotor). Reproducibility was checked by 2D measurements on samples from two different preparations in 0.7 mm rotors, which yielded identical spectra in all cases.

The obtained assignment was deposited in the BioMagResBank under accession number 27959.

TALOS+ Predictions and FLYA Calculations

TALOS+ predictions were performed using version 3.8 ( Shen et al., 2009 ). The secondary structure assignments based on the DSSP algorithm ( Kabsch and Sander, 1983 ) were used as given in the corresponding PDB entry 1GO3 ( Todone et al., 2001 ) and the 3D atomic coordinates were extracted from the same PDB entry. Solid-state FLYA calculations ( Schmidt and Güntert, 2012 ; Schmidt et al., 2013 ) were performed with CYANA version 3.97 ( Güntert and Buchner, 2015 ). Peak lists of 13 C and 1 H-detected spectra were used, taken either from the resonance assignment (manual peak lists) or from automatic peak picking. Automated peak picking was performed in CcpNmr using the implemented picking routine. The lowest contour level was set to 2.0–3.0 times the noise RMSD for this process. The tolerance value for chemical-shift matching was set to 0.55 ppm for 13 C, 15 N, and 0.3 ppm for 1 H.
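The idea of picking peaks above a multiple of the noise RMSD can be illustrated with a simple one-dimensional sketch. This is not the CcpNmr picking routine; the function names, the choice of a signal-free region for the noise estimate, and the toy intensity values are all assumptions made for illustration.

```python
import math

def noise_rmsd(values):
    """Root-mean-square deviation of a signal-free region about its mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def pick_peaks(spectrum, noise_region, level=2.5):
    """Indices of local maxima exceeding level * noise RMSD.

    level corresponds to the 2.0-3.0 range quoted in the text.
    """
    threshold = level * noise_rmsd(noise_region)
    peaks = []
    for i in range(1, len(spectrum) - 1):
        is_maximum = spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]
        if is_maximum and spectrum[i] > threshold:
            peaks.append(i)
    return peaks

# Toy trace (hypothetical intensities): noise followed by one clear peak.
trace = [0.1, -0.2, 0.15, -0.1, 0.05, 0.3, 2.0, 5.0, 2.1, 0.2, -0.1, 0.1]
print(pick_peaks(trace, noise_region=trace[:5]))
```

Lowering the level picks up weaker genuine peaks at the cost of more noise artifacts, which is one reason automatically generated peak lists tend to be less clean than manual ones.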

Results and Discussion

Assignment of 13 C-detected Solid-State NMR Spectra

The 13 C and 15 N-MAS solid-state NMR spectra of Rpo4/7 * show well-dispersed signals and roughly the expected number of peaks ( Figure S2 ) in the region of serine (four out of six expected peaks), threonine (4/4), alanine (7/8), and glycine (12/16) as can be seen in the 2D dipolar correlation spectra in Figure 1 , suggesting that the sample contains Rpo4/7 * in a single, well-defined conformation. The 13 C-linewidths are on the order of 115 Hz, which points to a homogeneous sample.


Figure 1. (A) Example of a 13 C, 15 N sequential resonance walk. (B) 2D 13 C, 13 C DARR spectrum of Rpo4/7 * measured at 20.0 T with a MAS frequency of 17 kHz and a DARR mixing time of 20 ms. (C) 2D NCA spectrum of Rpo4/7 * measured at 20.0 T with a MAS frequency of 17 kHz. In (B,C) , C α , and C β peaks are labeled according to the manually created shift list using the CcpNmr software.

Seven 3D 13 C-detected spectra (NCACB, NCACX, CANCO, NCOCX, NcoCACB, CANcoCA, and CCC) were measured to obtain the 13 C and 15 N assignment. The 13 C and 15 N assignment was mainly achieved by a combination of two strategies described earlier ( Schuetz et al., 2010 ) and shown in Figure 1A . The first is based on a sequential walk using NCACB, CANCO, NCOCX, the second uses the relayed experiments NcoCACB and CANcoCA, in combination with NCACB. The side chains were mainly assigned by analyzing NCACX and CCC spectra [employing Dipolar Recoupling Enhanced by Amplitude Modulation (DREAM) ( Verel et al., 2001 ; Westfeld et al., 2012 ) and Dipolar Assisted Rotational Resonance (DARR) ( Takegoshi et al., 2003 ) transfer steps].

Manual analysis of all 3D spectra resulted in the assignment shown in the 2D 13 C, 13 C DARR ( Figure 1B ) and 2D 15 N, 13 C NCA ( Figure 1C ) spectra, where 99% of all visible peaks are assigned. The assignment graph is shown in Figure S3 . Statistics of the manually performed peak assignment is shown in Table S2 . The resonances of most of the unassigned residues could thus neither be detected in 3D nor in 2D spectra, most probably because they are located in flexible parts of the protein. Figure 2 illustrates the spatial correlation between unassigned residues and the crystallographic B -factor, which shows that the most flexible part, the RNA binding loop ( Meka, 2005 ), which is not resolved in the crystal structure ( Todone et al., 2001 ), is found to be close to the unassigned residues Ser151–Ser159. The invisible residues are, however, not flexible enough to be visible in an INEPT spectrum (data not shown).


Figure 2 . X-ray crystal structure of Rpo4/7 (PDB: 1GO3). Rpo4 is shown as white ribbons. (A) Rpo7 (ribbons), colored according to the crystallographic B factor (see scale bar, in Å 2 ). (B) Rpo7 (ribbons), colored blue and red for backbone-assigned and unassigned residues, respectively. The RNA-binding loop, the region with the highest flexibility, for which no coordinates are available, is indicated by the flanking residues S151 and S159.

Assignment of 1 H-detected Solid-State NMR Spectra

To assign the amide H N and aliphatic H α protons of fully protonated Rpo7 * in complex with Rpo4, we used proton-detected spectroscopy at 110 kHz MAS frequency. The assignment of the 2D hNH fingerprint spectrum is shown in Figure 3 . The assignment was done using three 3D spectra, namely hCANH, hNCAH, and hCONH ( Barbet-Massin et al., 2014 ; Penzel et al., 2015 ), and taking advantage of the 13 C and 15 N peak assignment described above. Details of the experiments are given in Table S1 . The assignment of the NCA spectrum was transferred peak by peak to hCANH ( Figures 3A,D ) and hNCAH ( Figures 3B,D ) spectra. To confirm the assignment of amide protons, an additional hCONH spectrum was used to verify the CO chemical shift of the previous residue ( Figures 3C,D ). In total, 97% of the amide protons and 93% of the H α protons for which C α and N assignments exist could be assigned. In the assignment graph of Figure S2 those atoms are highlighted in blue and red, respectively.


Figure 3. (A) 2D NCA spectrum (gray) and 2D plane of a 3D hCANH (cyan) spectrum at δ( 1 H) = 7.6 ppm showing an example of the assignment transfer for 133Gly; (B) 2D NCA (gray) spectrum and 2D plane of a 3D hNCAH (blue) spectrum at δ( 1 H) = 3.7 ppm showing the example of the assignment transfer for 133Gly; (C) 2D plane of hCONH spectrum at δ( 1 H) = 7.6 ppm; (D) schematic representation of the assignment transfer for H N and H α atoms; (E) 2D hNH correlation spectrum of fully protonated Rpo4/7 * at 110 kHz MAS. The spectrum includes labels for the 15 N- 1 H peaks as predicted from the manually created shift list. On the right side of the figure, 1D traces for 1 H are presented at the corresponding 15 N frequencies. The 1 H linewidth characteristics of the full population of marked cross-peaks are summarized in the boxplot in the bottom right, indicating the maximum, 3rd quartile, mean, 1st quartile and minimum value of proton FWHM linewidth in Hz with a mean value of 160 ± 40 Hz. (F) 2D hCH correlation spectrum of fully protonated Rpo4/7 * at 110 kHz MAS with peaks labeled as in (E) .

The mean value and standard deviation of the 1 H linewidths of the fully protonated hNH spectrum are 156 ± 40 Hz for all the peaks marked in Figure 3E . On the right side of the spectra 1D traces of 1 H are shown at the corresponding 15 N frequencies with linewidths of selected peaks.

The results of the manual assignment procedure were validated by automated resonance assignments as implemented in the solid-state FLYA algorithm ( Schmidt and Güntert, 2012 ; Schmidt et al., 2013 ). In addition to the 13 C and 15 N chemical shifts, 1 H solid-state chemical shifts were also assigned in an automated process. Figure S4A illustrates the good agreement between the manual assignments and the assignments obtained by FLYA. For residues shown in green, the FLYA assignment agreed with the manual assignment (within a tolerance of 0.55 ppm for 13 C, 15 N, and 0.3 ppm for 1 H). A few significant differences (red) were observed. In those cases, the manual assignment was carefully verified and found to be consistent. Agreement (including both dark and light green residues) between FLYA and the manually assigned backbone atoms was found for 95% of 15 N, 92% of 13 C', 95% of 13 C α , 87% of H N , and 89% of H α atoms. The FLYA algorithm was also applied using automatically picked peak lists as input, and we found agreement for 82% of 15 N, 84% of 13 C', 82% of 13 C α , 75% of H N , and 76% of H α atoms ( Figure S4B ). We conclude that the automatic assignment provides a good starting point for manual assignment or a good check of manual results.
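The agreement percentages quoted above amount to counting, per nucleus type, the manually assigned atoms whose automated shift lies within the tolerance. The following sketch shows that bookkeeping; it is not part of FLYA or CYANA, the data layout is our assumption, and the shift values are invented for illustration.

```python
from collections import Counter

# Tolerances from the text: 0.55 ppm for 13C and 15N, 0.3 ppm for 1H.
TOLERANCE_PPM = {"C": 0.55, "N": 0.55, "H": 0.3}

def agreement_percent(manual, automated, tolerances=TOLERANCE_PPM):
    """Percent of manually assigned atoms matched by the automated assignment.

    Both inputs map (residue number, atom name) -> chemical shift in ppm.
    Atoms missing from the automated list count as disagreements (our choice).
    """
    matched, total = Counter(), Counter()
    for (residue, atom), shift in manual.items():
        nucleus = atom[0]  # "CA" -> "C", "HN" -> "H", "N" -> "N"
        total[nucleus] += 1
        other = automated.get((residue, atom))
        if other is not None and abs(other - shift) <= tolerances[nucleus]:
            matched[nucleus] += 1
    return {nucleus: 100.0 * matched[nucleus] / total[nucleus] for nucleus in total}

# Invented shifts, for illustration only.
manual = {(133, "CA"): 45.2, (133, "N"): 108.4, (133, "HN"): 7.6, (134, "CA"): 55.0}
automated = {(133, "CA"): 45.5, (133, "N"): 108.0, (133, "HN"): 8.1, (134, "CA"): 56.0}
print(agreement_percent(manual, automated))
```

Treating missing automated assignments as disagreements gives a conservative figure; excluding them instead would report agreement only among atoms assigned by both approaches.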

Secondary Structure From 13 C- and 1 H-detected Spectra

In order to compare the secondary structure determined by different approaches from solid-state NMR chemical shifts, either using SCS or by backbone dihedral angle statistics [TALOS+ ( Shen et al., 2009 )], we used the X-ray crystal structure of Rpo4/7 determined at 1.75 Å [PDB: 1GO3 ( Todone et al., 2001 )] as a common reference. The positions of the secondary structure elements were determined from the X-ray coordinates via the algorithm DSSP ( Kabsch and Sander, 1983 ). The results are indicated at the top of Figure 4 , Figures S5, S6 as well as by the gray bars.


Figure 4. (A) Difference of Δδ( 13 Cα) and Δδ( 13 Cβ) secondary chemical shifts (SCS) (red). (B) Negative sum of 1 H α and 1 H N SCS (purple). SCS are obtained by subtracting the random-coil shifts from the observed chemical shifts. Positive SCS differences indicate α-helices, negative SCS difference β-sheets. (C) Secondary structure based on 13 C and 15 N (light red), 1 H and 15 N (light blue) and all (light green) chemical shifts using TALOS+ ( Shen et al., 2009 ). Secondary structure elements observed by crystallography are shown as dark (α-helix) and light (β-sheet) gray shaded areas, according to PDB 1GO3 ( Todone et al., 2001 ).

As an indicator for the secondary structure, the SCS of C α , C β , CO, as well as the SCS difference of C α and C β were calculated and are visualized in Figures S5, S6 . For solid-state NMR, the most commonly used indicator is ΔδC α -ΔδC β which has the advantage of being independent of referencing errors ( Spera and Bax, 1991 ). Three or more negative values in a row indicate a β-sheet, four or more positive values an α-helix. For reference, the positions of the secondary structure elements were determined from the X-ray coordinates. The results are indicated in Figure 4A , Figures S5A, S6A , and Table S3 . Overall, the correspondence is good, with some significant deviations in the β-strands, in particular β2. Upon visual inspection of β2 and β3 in the crystal structure ( Figure S7 ), it becomes clear that this is related to the fact that β2 is rather distorted and irregular, while β3 is more regular. The difference between these two β-strands is also clearly seen in the Ramachandran plots ( Figure S8 ). The differences in the NMR SCS are therefore based on actual structural properties.
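The run-length rule stated above (three or more consecutive negative values for a β-sheet, four or more consecutive positive values for an α-helix) can be sketched as follows. The function names and the input values are hypothetical; random-coil reference shifts must be supplied by the user and are deliberately not included here.

```python
def scs_difference(ca_obs, ca_rc, cb_obs, cb_rc):
    """Delta-delta(C-alpha) minus delta-delta(C-beta) for one residue, in ppm."""
    return (ca_obs - ca_rc) - (cb_obs - cb_rc)

def classify_runs(scs_diff, min_helix=4, min_sheet=3):
    """Label each residue 'H' (helix), 'E' (sheet) or '-' from same-sign runs.

    scs_diff: per-residue SCS differences; None marks an unassigned residue.
    """
    labels = ["-"] * len(scs_diff)
    start = 0
    while start < len(scs_diff):
        value = scs_diff[start]
        if value is None or value == 0:
            start += 1
            continue
        positive = value > 0
        end = start
        # Extend the run while the sign stays the same and data exist.
        while (end < len(scs_diff) and scs_diff[end] is not None
               and scs_diff[end] != 0 and (scs_diff[end] > 0) == positive):
            end += 1
        length = end - start
        if positive and length >= min_helix:
            labels[start:end] = ["H"] * length
        elif not positive and length >= min_sheet:
            labels[start:end] = ["E"] * length
        start = end
    return "".join(labels)

# Hypothetical SCS differences (ppm) for twelve residues.
values = [1.2, 2.0, 1.5, 0.8, -0.3, -1.1, -2.0, -1.4, 0.5, 0.4, None, -0.9]
print(classify_runs(values))  # HHHHEEEE----
```

Unassigned residues break a run in this sketch, so gaps in the assignment shorten or remove predicted elements rather than being bridged.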

To obtain secondary-structure information from proton-detected fingerprint spectra, SCS of both 1 H α and 1 H N were used ( Figure 4B , Figures S5B, S6A , Table S3 ). It is well known ( Wang, 2002 ) that 15 N SCS is a poor indicator for secondary structure ( Figures S5, S6 , orange). Instead, the sum of 1 H α and 1 H N SCS appears to be a suitable measure for secondary structure identification ( Figure 4 , Figures S5, S6 , purple), even though summing does not compensate for referencing errors. While not as precise as the 13 C chemical shifts, the sum of the two proton SCS still provides useful information about secondary structure.
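The different sensitivity of the two indicators to referencing errors can be made explicit. As a sketch, assume that all 13 C shifts carry a common referencing offset ε_C and all 1 H shifts a common offset ε_H (the symbols ε_C and ε_H are introduced here for illustration):

```latex
\begin{align*}
\bigl(\Delta\delta C^{\alpha} + \varepsilon_C\bigr) - \bigl(\Delta\delta C^{\beta} + \varepsilon_C\bigr)
  &= \Delta\delta C^{\alpha} - \Delta\delta C^{\beta} \\
\bigl(\Delta\delta H^{\alpha} + \varepsilon_H\bigr) + \bigl(\Delta\delta H^{N} + \varepsilon_H\bigr)
  &= \Delta\delta H^{\alpha} + \Delta\delta H^{N} + 2\,\varepsilon_H
\end{align*}
```

The offset cancels in the carbon difference but enters the proton sum twice, so careful 1 H referencing matters more when the proton sum is used.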

Our results are similar to solution NMR in that SCS data of 1 H α for α-helices were found to be more reliable than those of 1 H N ( Wang, 2002 ). We found the 13 C α - 13 C β SCS data to be a more suitable indicator than the SCS sum 1 H α + 1 H N . Similarly, 1 H α SCS were shown ( Wang, 2002 ) to be on average more sensitive in distinguishing β-sheets from random coil conformations than 13 C α and 13 C β chemical shifts. In our case 13 C α - 13 C β SCS data were the most reliable. However, for big proteins where transfer efficiencies are not always good, 13 C β data may be unavailable ( Penzel et al., 2015 ; Stöppler et al., 2018 ). We found that, besides 13 C α SCS, the sum of 1 H α and 1 H N SCS is a suitable alternative parameter to derive secondary structure.

Additionally, secondary-structure elements were predicted using the software TALOS+ ( Shen et al., 2009 ) and are shown in Figure 4C . Three different combinations of chemical shifts derived from manual assignment were used: 13 C and 15 N, 1 H and 15 N, and all three available shifts. The combination of 13 C and 15 N data extracted using TALOS+ (light red) yielded the most promising results, as the predicted secondary structure fits well with the crystal structure, including strand β2 and β10 that were only incompletely recognized by the SCS data. Surprisingly, TALOS+ results did not improve upon inclusion of 1 H chemical shifts (light green); instead a disruption for strand β4 appeared and strands β2, β5, and β10 became shorter (see also Figure S9 for a comparison in terms of backbone dihedrals). In order to check the reliability of TALOS+ secondary structure results for cases where 13 C data are absent, we evaluated the combination of 1 H and 15 N chemical-shift values (light blue). The calculation resulted in two additional misplaced α-helices, which was not the case for other chemical-shift combinations that included 13 C data. Therefore, while TALOS+ predictions that included 13 C chemical shifts were successful, calculations including only 1 H and 15 N chemical shifts were here found to be less reliable than SCS analysis when the sum of 1 H α and 1 H N SCS is used.

Conclusions

Using MAS solid-state NMR, we sequentially assigned 78% of the 13 C, 15 N resonances of the RNA polymerase subunit Rpo7 in complex with unlabeled Rpo4, and successfully transferred these to 1 H detected NMR spectra assigning ~70% of the 1 H N and 1 H α resonances. Further assessing the secondary structure in comparison to the known crystal structure, our results confirm that 13 C SCS are a bona fide predictor of secondary structure elements. While using only 1 H α or 1 H N SCS alone showed an increased uncertainty in the boundaries of observed secondary structure elements compared to the crystal structure, in cases where 13 C β chemical shifts are not available, secondary structure elements can be identified using either 13 C α or the sum of 1 H α and 1 H N SCS.

The proton assignment forms the basis for protein-nucleic acid interaction studies to identify the RNA-binding sites of Rpo4/7 through 1 H chemical-shift perturbations. Proton chemical-shift values are particularly sensitive to non-covalent interactions involved in molecular recognition and thus serve as sensitive reporters. The investigation of the molecular dynamics, in the presence and absence of nucleotides, also becomes accessible through 15 N R 1ρ and R 2 ' relaxation-rate constants that, once protons are assigned, are measured most efficiently in a series of hNH fingerprint spectra or, with higher resolution, in hCANH spectra.

Data Availability Statement

All datasets generated for this study are included in the manuscript/Supplementary Files.

Author Contributions

AT carried out protein syntheses and analyses, and generated NMR samples with support of DK. AT, with the help of TW and MS, conducted the NMR experiments and analyzed the data. PG extended FLYA capabilities and supported FLYA calculations carried out by TW. AT wrote the manuscript with input from all authors. TW, AB, and BM designed and supervised the study. All authors approved the submitted version.

Funding

This work was supported by the French ANR (ANR-14-CE09-0024B), the LABEX ECOFECT (ANR-11-LABX-0048) within the Université de Lyon program Investissements d'Avenir (ANR-11-IDEX-0007), by the Swiss National Science Foundation (Grant 200020_159707), and by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement n° 741863, FASTER). TW acknowledges support from the ETH Career SEED-69 16-1 and the ETH Research Grant ETH-43 17-2.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors thank Prof. Dina Grohmann (University of Regensburg, Germany) for providing plasmids and helpful discussion.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2019.00100/full#supplementary-material

References

Andreas, L. B., Le Marchand, T., Jaudzems, K., and Pintacuda, G. (2015). High-resolution proton-detected NMR of proteins at very fast MAS. J. Magn. Reson. 253, 36–49. doi: 10.1016/j.jmr.2015.01.003

Baldus, M., Geurts, D. G., Hediger, S., and Meier, B. H. (1996). Efficient 15N–13C polarization transfer by adiabatic-passage Hartmann–Hahn cross polarization. J. Magn. Reson. Series A 118, 140–144. doi: 10.1006/jmra.1996.0022

Barbet-Massin, E., Pell, A. J., Retel, J. S., Andreas, L. B., Jaudzems, K., Franks, W. T., et al. (2014). Rapid proton-detected NMR assignment for proteins with fast magic angle spinning. J. Am. Chem. Soc. 136, 12489–12497. doi: 10.1021/ja507382j

Böckmann, A., Gardiennet, C., Verel, R., Hunkeler, A., Loquet, A., Pintacuda, G., et al. (2009). Characterization of different water pools in solid-state NMR protein samples. J. Biomol. NMR 45, 319–327. doi: 10.1007/s10858-009-9374-3

Bougault, C., Ayala, I., Vollmer, W., Simorre, J.-P., and Schanda, P. (2019). Studying intact bacterial peptidoglycan by proton-detected NMR spectroscopy at 100 kHz MAS frequency. J. Struct. Biol. 206, 66–72. doi: 10.1016/j.jsb.2018.07.009

Grohmann, D., Klose, D., Klare, J. P., Kay, C. W. M., Steinhoff, H.-J., and Werner, F. (2010). RNA-binding to archaeal RNA polymerase subunits F/E: A DEER and FRET Study. J. Am. Chem. Soc. 132, 5954–5955. doi: 10.1021/ja101663d

Grohmann, D., and Werner, F. (2010). Hold On! RNA polymerase interactions with the nascent RNA modulate transcription elongation and termination. RNA Biol. 7, 310–315. doi: 10.4161/rna.7.3.11912

Grohmann, D., and Werner, F. (2011). Cycling through transcription with the RNA polymerase F/E (RPB4/7) complex: structure, function and evolution of archaeal RNA polymerase. Res. Microbiol. 162, 10–18. doi: 10.1016/j.resmic.2010.09.002

Güntert, P., and Buchner, L. (2015). Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62, 453–471. doi: 10.1007/s10858-015-9924-9

Hantsche, M., and Cramer, P. (2016). The structural basis of transcription: 10 years after the nobel prize in chemistry. Angew. Chem. Int. Ed. 55, 15972–15981. doi: 10.1002/anie.201608066

Hediger, S., Meier, B. H., and Ernst, R. R. (1995). Adiabatic passage Hartmann-Hahn cross polarization in NMR under magic angle sample spinning. Chem. Phys. Lett. 240, 449–456. doi: 10.1016/0009-2614(95)00505-X

Kabsch, W., and Sander, C. (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637. doi: 10.1002/bip.360221211

Klose, D., Klare, J. P., Grohmann, D., Kay, C. W. M., Werner, F., and Steinhoff, H.-J. (2012). Simulation vs. reality: a comparison of in silico distance predictions with DEER and FRET measurements. PLoS ONE 7:e39492. doi: 10.1371/journal.pone.0039492

Linser, R., Dasari, M., Hiller, M., Higman, V., Fink, U., Lopez del Amo, J.-M., et al. (2011). Proton-detected solid-state NMR spectroscopy of fibrillar and membrane proteins. Angew. Chem. Int. Ed. 50, 4508–4512. doi: 10.1002/anie.201008244

Meka, H. (2005). Crystal structure and RNA binding of the Rpb4/Rpb7 subunits of human RNA polymerase II. Nucleic Acids Res. 33, 6435–6444. doi: 10.1093/nar/gki945

Penzel, S., Smith, A. A., Agarwal, V., Hunkeler, A., Org, M.-L., Samoson, A., et al. (2015). Protein resonance assignment at MAS frequencies approaching 100 kHz: a quantitative comparison of J-coupling and dipolar-coupling-based transfer methods. J. Biomol. NMR 63, 165–186. doi: 10.1007/s10858-015-9975-y

Sainsbury, S., Bernecky, C., and Cramer, P. (2015). Structural basis of transcription initiation by RNA polymerase II. Nat. Rev. Mol. Cell Biol. 16, 129–143. doi: 10.1038/nrm3952

Schmidt, E., Gath, J., Habenstein, B., Ravotti, F., Székely, K., Huber, M., et al. (2013). Automated solid-state NMR resonance assignment of protein microcrystals and amyloids. J. Biomol. NMR 56, 243–254. doi: 10.1007/s10858-013-9742-x

Schmidt, E., and Güntert, P. (2012). A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134, 12817–12829. doi: 10.1021/ja305091n

Schubeis, T., Le Marchand, T., Andreas, L. B., and Pintacuda, G. (2018). 1H magic-angle spinning NMR evolves as a powerful new tool for membrane proteins. J. Magn. Reson. 287, 140–152. doi: 10.1016/j.jmr.2017.11.014

Schuetz, A., Wasmer, C., Habenstein, B., Verel, R., Greenwald, J., Riek, R., et al. (2010). Protocols for the sequential solid-state NMR spectroscopic assignment of a uniformly labeled 25 kDa protein: HET-s(1-227). ChemBioChem 11, 1543–1551. doi: 10.1002/cbic.201000124

Shen, Y., Delaglio, F., Cornilescu, G., and Bax, A. (2009). TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR 44, 213–223. doi: 10.1007/s10858-009-9333-z

Spera, S., and Bax, A. (1991). Empirical correlation between protein backbone conformation and C.alpha. and C.beta. 13C nuclear magnetic resonance chemical shifts. J. Am. Chem. Soc. 113, 5490–5492. doi: 10.1021/ja00014a071

Stevens, T. J., Fogh, R. H., Boucher, W., Higman, V. A., Eisenmenger, F., Bardiaux, B., et al. (2011). A software framework for analysing solid-state MAS NMR data. J Biomol NMR 51, 437–447. doi: 10.1007/s10858-011-9569-2

Stöppler, D., Macpherson, A., Smith-Penzel, S., Basse, N., Lecomte, F., Deboves, H., et al. (2018). Insight into small molecule binding to the neonatal Fc receptor by X-ray crystallography and 100 kHz magic-angle-spinning NMR. PLoS Biol. 16:e2006192. doi: 10.1371/journal.pbio.2006192

Struppe, J., Quinn, C. M., Lu, M., Wang, M., Hou, G., Lu, X., et al. (2017). Expanding the horizons for structural analysis of fully protonated protein assemblies by NMR spectroscopy at MAS frequencies above 100 kHz. Solid State Nucl. Magn. Reson. 87, 117–125. doi: 10.1016/j.ssnmr.2017.07.001

Studier, F. W. (2005). Protein production by auto-induction in high density shaking cultures. Protein Expr. Purif. 41, 207–234. doi: 10.1016/j.pep.2005.01.016

Takegoshi, K., Nakamura, S., and Terao, T. (2003). 13C−1H dipolar-driven 13C−13C recoupling without 13C rf irradiation in nuclear magnetic resonance of rotating solids. J. Chem. Phys. 118, 2325–2341. doi: 10.1063/1.1534105

Terrific Broth (2006). Cold Spring Harb. Protoc. 2006:pdb.rec8620. doi: 10.1101/pdb.rec8620

Todone, F., Brick, P., Werner, F., Weinzierl, R. O., and Onesti, S. (2001). Structure of an archaeal homolog of the eukaryotic RNA polymerase II RPB4/RPB7 complex. Mol. Cell 8, 1137–1143. doi: 10.1016/S1097-2765(01)00379-3

Újvári, A., and Luse, D. S. (2006). RNA emerging from the active site of RNA polymerase II interacts with the Rpb7 subunit. Nat. Struct. Mol. Biol. 13, 49–54. doi: 10.1038/nsmb1026

Verel, R., Ernst, M., and Meier, B. H. (2001). Adiabatic dipolar recoupling in solid-state NMR: the DREAM scheme. J. Magn. Reson. 150, 81–99. doi: 10.1006/jmre.2001.2310

Wang, Y. (2002). Probability-based protein secondary structure identification using combined NMR chemical-shift data. Protein Sci. 11, 852–861. doi: 10.1110/ps.3180102

Werner, F. (2007). Structure and function of archaeal RNA polymerases. Mol. Microbiol. 65, 1395–1404. doi: 10.1111/j.1365-2958.2007.05876.x

Werner, F. (2008). Structural evolution of multisubunit RNA polymerases. Trends Microbiol. 16, 247–250. doi: 10.1016/j.tim.2008.03.008

Werner, F., and Grohmann, D. (2011). Evolution of multisubunit RNA polymerases in the three domains of life. Nat. Rev. Microbiol. 9, 85–98. doi: 10.1038/nrmicro2507

Werner, F., and Weinzierl, R. O. J. (2002). A recombinant RNA polymerase II-like enzyme capable of promoter-specific transcription. Mol. Cell 10, 635–646. doi: 10.1016/S1097-2765(02)00629-9

Westfeld, T., Verel, R., Ernst, M., Böckmann, A., and Meier, B. H. (2012). Properties of the DREAM scheme and its optimization for application to proteins. J. Biomol. NMR 53, 103–112. doi: 10.1007/s10858-012-9627-4

Wishart, D. S., and Sykes, B. D. (1994). The 13C chemical-shift index: a simple method for the identification of protein secondary structure using 13C chemical-shift data. J Biomol NMR 4, 171–180.

Wishart, D. S., Sykes, B. D., and Richards, F. M. (1992). The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy. Biochemistry 31, 1647–1651. doi: 10.1021/bi00121a010

Keywords: Rpo4/7, solid-state NMR, carbon and proton assignments, secondary chemical shifts, ssFLYA

Citation: Torosyan A, Wiegand T, Schledorn M, Klose D, Güntert P, Böckmann A and Meier BH (2019) Including Protons in Solid-State NMR Resonance Assignment and Secondary Structure Analysis: The Example of RNA Polymerase II Subunits Rpo4/7. Front. Mol. Biosci. 6:100. doi: 10.3389/fmolb.2019.00100

Received: 02 July 2019; Accepted: 17 September 2019; Published: 04 October 2019.

Copyright © 2019 Torosyan, Wiegand, Schledorn, Klose, Güntert, Böckmann and Meier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Anja Böckmann, a.bockmann@ibcp.fr; Beat H. Meier, beme@ethz.ch

† These authors have contributed equally to this work

Assigning NMR spectra of RNA, peptides and small organic molecules using molecular network visualization software

Jan Marchant

1 Department of Chemistry and Biochemistry, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 USA

Michael F. Summers

2 Howard Hughes Medical Institute, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 USA

Bruce A. Johnson

3 Structural Biology Initiative, CUNY Advanced Science Research Center, 85 St. Nicholas Terrace, New York, NY 10031 USA

NMR assignment typically involves analysis of peaks across multiple NMR spectra. Chemical shifts of peaks are measured before being assigned to atoms using a variety of methods. These approaches quickly become complicated by overlap, ambiguity, and the complexity of correlating assignments among multiple spectra. Here we propose an alternative approach in which a network of linked peak-boxes is generated at the predicted positions of peaks across all spectra. These peak-boxes correlate known relationships and can be matched to the observed spectra. The method is illustrated with RNA, but a variety of molecular types should be readily tractable with this approach.

Electronic supplementary material

The online version of this article (10.1007/s10858-019-00271-3) contains supplementary material, which is available to authorized users.

The power of NMR spectroscopy relative to other molecular spectroscopies lies in the ability to detect spectral signals and interactions associated with specific atoms. Requisite assignment of NMR signals typically follows a paradigm of measuring the chemical shifts of local maxima (peaks) within each spectrum followed by correlating signals within and among different types of NMR spectra and associating those peak positions with specific atoms, either by automated methods or interactive analysis. Although automated assignment methods are desirable and work toward this goal is ongoing, interactive methods remain the standard for much NMR assignment. Although interactive analysis is aided by the display of peak-boxes associated with each measured peak, which can display known assignments or other annotations, the tracking of thousands of scalar- and dipolar-coupled peaks in multiple datasets can be challenging. We describe here an inverted approach that focuses on networks of coupled peaks that are predicted from the molecular structure and type of NMR experiment. Instead of picking peak positions and then attempting to assign them, we generate a linked network of assigned peak-boxes at these predicted positions that can then be interactively aligned with the observed spectra. This approach allows the spectroscopist to make simultaneous use of multiple spectral features that can minimize ambiguity in the assignments compared to the process of assigning individual peaks.

Our technique relies on a priori knowledge of the molecular topology and the ability to predict chemical shifts and coupling patterns. Both types of information are available for a range of molecule types (Steinbeck et al. 2003; Ulrich et al. 2008; Barton et al. 2013; Brown et al. 2015). We have used the technique to assign RNAs as large as 68 nucleotides (including fragments of much larger RNA projects) (Keane et al. 2015; Marchant et al. 2018; Zhang et al. 2018) and small cyclic peptides, but the principles apply to many molecular types including DNA, modest-sized peptides, and arbitrary small organic molecules. The higher the quality of chemical shift predictions and predicted NOE peaks the better the starting point, but the approach allows bootstrapping from better portions of the starting set to regions of lower quality.

The approach described here could be implemented with a variety of tools for chemical shift prediction, peak network generation and interactive assignment. Here we describe using the protocol with NMRFx Analyst, a freely available, open-source software tool that extends the existing NMRFx Processor (Norris et al. 2016). NMRFx Analyst integrates NMR processing, chemical shift prediction, and peak picking and assignment tools useful for this approach. An earlier implementation of the approach is also available in NMRViewJ (Johnson and Blevins 1994). Implementing the approach in other tools has two key requirements: a source of predicted shifts and the ability to interactively move multiple peaks in response to the movement of a single peak. The shift prediction is done once at the start of a project, and so can be done with an external tool. Generation of peaks based on the NMR shift prediction can also be done external to the software. Thus, any NMR analysis tool that can read external peak files, such as CCPN Analyst (Skinner et al. 2016), Sparky (Lee et al. 2015) or CARA (Keller), has the core technology to get started without any modifications. The interactive adjustment of peak positions in response to moving a single peak would likely require code modifications or a plugin module, but this should be relatively straightforward to implement.

We describe here the approach for a 50-nucleotide RNA hairpin. A 3 min video illustrating the major steps on a 22-nucleotide RNA hairpin is available as supplementary material. The molecular topology for the RNA is readily available from the primary sequence coupled with NMRFx Analyst’s built-in library of nucleotides. A secondary structure (or, if available, the tertiary structure) is additionally necessary for chemical shift prediction and NOE cross-peak prediction. Predicting chemical shifts of the target molecule is an essential component of the protocol. For RNA molecules we use our previously described attribute-based shift prediction technique (Barton et al. 2013; Brown et al. 2015), but with 3D coordinates a structure-based method could be used (Frank et al. 2013, 2014; Brown et al. 2015). The attribute technique predicts hydrogen, carbon, and nitrogen chemical shifts based on a set of attributes describing the central nucleotide in a five-nucleotide window. The only inputs necessary are the primary sequence and a dot-bracket-style representation of the secondary structure (Lorenz et al. 2011).
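The two inputs named above can be illustrated with a minimal Python sketch. It parses a dot-bracket string into base-pair partners and extracts the five-nucleotide window around a residue. The function names and the `-` padding character are illustrative assumptions and are not taken from NMRFx Analyst; the attributes actually used for shift prediction are not reproduced here.

```python
def parse_dot_bracket(structure):
    """Return a {position: partner} map (0-based) for a dot-bracket string."""
    stack = []
    pairs = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def five_nt_window(sequence, structure, index, pad="-"):
    """Sequence and structure context for residues index-2 .. index+2.

    Positions that fall off either end are filled with `pad`
    (a hypothetical convention chosen for this sketch).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq_window = []
    str_window = []
    for k in range(index - 2, index + 3):
        inside = 0 <= k < len(sequence)
        seq_window.append(sequence[k] if inside else pad)
        str_window.append(structure[k] if inside else pad)
    return "".join(seq_window), "".join(str_window)
```

Each central nucleotide, together with its window and pairing partner, would then be turned into attributes by whichever predictor is used.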

For RNA assignments we have used a set of three different experiments: homonuclear 2D TOCSY, 2D 1H-13C HMQC and 2D NOESY. The technique does not depend on having the TOCSY and HMQC experiments, but a greater number of complementary experiments will reduce ambiguities in the assignment process. Each experiment type necessitates a different protocol for peak-box generation. The TOCSY protocol simply generates peak-boxes for pairs of protons separated by fewer than a specified number of homonuclear J-coupling steps. In particular, peak-boxes are generated for the H5–H6 coupling of uracil and cytosine and for couplings between ribose protons. The HMQC protocol involves all carbons with directly bonded protons. While the expected peaks and correlations for the HMQC and TOCSY are relatively insensitive to tertiary structure, peak-box generation for the NOESY involves various assumptions.
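The TOCSY rule can be sketched as a breadth-first search over a graph of J-coupled protons. The code below is a hedged illustration, not the NMRFx Analyst implementation: the function name, the representation of couplings as pairs of atom names, and the choice of an inclusive `max_steps` limit are all assumptions made for this example.

```python
from collections import deque


def tocsy_peak_boxes(couplings, max_steps):
    """List (proton_a, proton_b) pairs connected by at most `max_steps` couplings.

    `couplings` is an iterable of (proton, proton) pairs, one per J-coupling.
    Both (a, b) and (b, a) are returned, mirroring the two sides of the diagonal.
    """
    adjacency = {}
    for a, b in couplings:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    boxes = set()
    for start in adjacency:
        # Breadth-first search that stops expanding at the step limit.
        steps = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if steps[current] == max_steps:
                continue
            for neighbor in adjacency[current]:
                if neighbor not in steps:
                    steps[neighbor] = steps[current] + 1
                    queue.append(neighbor)
        for other in steps:
            if other != start:
                boxes.add((start, other))
    return sorted(boxes)


# Hypothetical ribose fragment: H1'-H2'-H3'-H4' coupled in a chain.
ribose = [("H1'", "H2'"), ("H2'", "H3'"), ("H3'", "H4'")]
one_step = tocsy_peak_boxes(ribose, max_steps=1)
two_step = tocsy_peak_boxes(ribose, max_steps=2)
```

With a limit of one step only directly coupled protons are paired; raising the limit adds relayed correlations such as H1' to H3'.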

For an RNA (or other molecule) where the 3D structure is known, peak-boxes are generated for all hydrogen pairs whose distance is less than a specified limit (often 5 or 6 Å). Where the 3D structure is not available, NMRFx Analyst uses the secondary structure specified with dot-bracket notation and a built-in set of rules to generate peak-boxes for helical and tetraloop regions. Intra-residue peak-boxes are also generated and are less dependent on the structural information. While this NOESY protocol is unable to generate inter-residue peak-boxes in larger loops, the combination of peak-boxes in helical and tetraloop regions and intra-residue peak-boxes in all regions gives a substantial number of predicted peaks that can be used as a basis for extending the search to other regions. The intra-residue assignments can be used to obtain the correct chemical shifts, which are then used to assign peaks that have not been predicted (vide infra).
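For the case where 3D coordinates are known, the distance criterion reduces to a pairwise search. The following Python sketch assumes hydrogen coordinates are supplied as a dictionary of atom label to (x, y, z) in Å; the labels, the function name, and the default cutoff are illustrative choices, and the rule-based generation used when only the secondary structure is known is not shown.

```python
import math
from itertools import combinations


def noesy_peak_boxes(hydrogen_coords, cutoff=5.0):
    """Return (atom_a, atom_b, distance) for hydrogen pairs closer than `cutoff` Å."""
    boxes = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(
        sorted(hydrogen_coords.items()), 2
    ):
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            boxes.append((name_a, name_b, distance))
    return boxes


# Made-up coordinates, chosen only so the distances are easy to check by hand.
coords = {
    "1.H1'": (0.0, 0.0, 0.0),
    "1.H8": (3.0, 0.0, 0.0),
    "2.H8": (3.0, 4.0, 0.0),
}
close_pairs = noesy_peak_boxes(coords, cutoff=5.0)
```

The all-pairs loop scales quadratically with the number of hydrogens, which is usually acceptable for a one-off prediction step; a spatial grid would be a natural refinement for very large molecules.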

Overlapping peaks are a serious impediment to the assignment of larger RNAs, but this can be alleviated by the use of isotopically labeled RNA molecules to minimize the number of spectral peaks (Lu et al. 2010; Longhini et al. 2016). Nucleotide- and atom-specific 2H labeling, or 13C labeling combined with pulse sequences that filter and edit the spectra based on the presence of 13C-labeled nuclei, can be used to generate a complementary set of experiments in which the number of peaks in each individual experiment is reduced, but all expected peaks can be observed in the complete set of experiments (LeBlanc et al. 2017). NMRFx Analyst allows specifying the labeling pattern by both nucleotide type and specific residues. The peak-box generator uses this in combination with each experiment’s edit-filter scheme to generate the expected cross-peaks for the labeled RNA.
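A simplified sketch of how a labeling pattern could prune predicted peak-boxes is given below. It models only 2H labeling, where a deuterated position gives no proton signal, and it ignores 13C edit and filter schemes entirely. The data layout (a set of `(selector, atom)` entries, where the selector is either a nucleotide type or a residue number and `"*"` stands for every atom) is an assumption invented for this example, not the NMRFx Analyst format.

```python
def is_protonated(residue_number, residue_type, atom, deuterated):
    """True unless the atom is deuterated by residue type or by residue number."""
    for selector in (residue_number, residue_type):
        if (selector, atom) in deuterated or (selector, "*") in deuterated:
            return False
    return True


def filter_peak_boxes(peak_boxes, residue_types, deuterated):
    """Keep peak-boxes whose every dimension refers to a protonated position.

    `peak_boxes` is a list of tuples of (residue_number, atom), one per dimension.
    `residue_types` maps residue number to nucleotide type, e.g. {1: "G"}.
    """
    kept = []
    for box in peak_boxes:
        if all(
            is_protonated(number, residue_types[number], atom, deuterated)
            for number, atom in box
        ):
            kept.append(box)
    return kept


# Hypothetical sample: all G residues perdeuterated, plus H5 of residue 3.
residue_types = {1: "G", 2: "A", 3: "C"}
deuterated = {("G", "*"), (3, "H5")}
```

Selecting by both nucleotide type and residue number mirrors the two levels of labeling specification described above.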

Once the set of peak-boxes is generated for each experiment, the user can begin to interactively assign the spectra. Each available spectrum is displayed with its corresponding peak-boxes superimposed. Any given spectrum might be displayed in multiple windows so that expansions of relevant portions of the spectra can be displayed. The user can then interactively drag a peak-box, using a mouse or trackpad, from its predicted position into alignment with an observed spectral peak (Fig. 1).

Fig. 1. Screenshot of the NMRFx Analyst GUI with a network assignment procedure in progress. The rectangular peak-boxes illustrate predicted peaks, label numbers indicate the residues involved, and arrows are used to show whether peak-boxes can be moved in each dimension (no X) or are frozen in that dimension (with X). Peak-boxes in black (with residue numbers 6–46, and 6–7) are initially in the predicted positions and can be freely adjusted, as shown by black arrows for peak-box 6–46. Peak-box 7–6 (red) has been selected (yellow background) and then frozen and can no longer be adjusted in either dimension. As a consequence of freezing this peak-box, peak-box 7 (orange) is now frozen in the horizontal position yet adjustable in the vertical so it could be slid down to align with the peak below. The opposite is true for peak-box 6 (magenta) which could be slid left to align with a peak. Other red peak-boxes have already been positioned and frozen. Controls at bottom allow for freezing and thawing peaks. The Tweak + Freeze button will automatically center a peak-box on an overlapped peak before freezing.

In the traditional approach, peak-boxes are initially not assigned so there is no unambiguous relationship between different peak-boxes within the spectrum or between spectra. In this new approach, while peak-boxes are not necessarily correctly positioned, they each have an assigned atom for each dimension. The assignment means that sets of peaks will share atoms on one or both dimensions. This is illustrated visually when one selects a peak as shown in Fig. 1. Connecting lines are drawn between peak-boxes with common atom assignments. As a user drags a peak-box, the entire set of peak-boxes that share an atom with the moved peak will move synchronously with the directly shifted peak. The essence of the method is that whereas observing an individual peak in relation to a spectral signal might be ambiguous, a whole set of coupled peaks is not.
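One way to obtain this synchronous movement is to store chemical shifts per atom rather than per peak-box, so that every peak-box derives its position from the shared atom table. The Python sketch below illustrates that design under stated assumptions: the class name, method names, atom labels, and shift values are all hypothetical, and this is not a description of the NMRFx Analyst internals.

```python
class PeakNetwork:
    """Peak-boxes that read their positions from a shared table of atom shifts."""

    def __init__(self, predicted_shifts):
        self.shifts = dict(predicted_shifts)  # atom label -> chemical shift (ppm)
        self.boxes = []  # (spectrum name, atom on x, atom on y)

    def add_box(self, spectrum, atom_x, atom_y):
        self.boxes.append((spectrum, atom_x, atom_y))
        return len(self.boxes) - 1

    def position(self, index):
        _, atom_x, atom_y = self.boxes[index]
        return (self.shifts[atom_x], self.shifts[atom_y])

    def linked(self, index):
        """Indices of other peak-boxes sharing at least one atom with this one."""
        _, atom_x, atom_y = self.boxes[index]
        atoms = {atom_x, atom_y}
        return [
            i
            for i, (_, bx, by) in enumerate(self.boxes)
            if i != index and atoms & {bx, by}
        ]

    def drag(self, index, new_x, new_y):
        """Move one peak-box; linked boxes follow because they share atoms."""
        _, atom_x, atom_y = self.boxes[index]
        self.shifts[atom_x] = new_x
        self.shifts[atom_y] = new_y


# Illustrative values only.
net = PeakNetwork({"6.H1'": 5.70, "6.H8": 7.80, "7.H8": 7.60, "6.C8": 139.0})
a = net.add_box("NOESY", "6.H1'", "6.H8")
b = net.add_box("NOESY", "6.H1'", "7.H8")
c = net.add_box("HMQC", "6.H8", "6.C8")
net.drag(a, 5.82, 7.95)  # boxes b and c now report updated positions
```

Because no position is stored twice, the linked peak-boxes cannot drift out of agreement, within one spectrum or across spectra.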

Individual peak-boxes may initially be predicted to be close to multiple spectral signals, precluding unambiguous placement in isolation. In this new approach, however, the entire set of linked peak-boxes across multiple experiments informs the user’s decision. An example of this is shown in Fig. 2, step 3, where two alignments of a group of peak-boxes within the NOESY spectrum are possible, but the ambiguity can be resolved by analysis of the HMQC spectrum. Positioning peak-boxes in crowded regions is still difficult, but is often unnecessary due to the presence of linked peak-boxes that are in uncrowded regions. An additional practical advantage of the approach is that typographical errors are minimized. Rather than the user typing an atomic assignment, with possible errors, into a peak-box label field in the GUI, all peaks start with a computer-generated assignment.

Fig. 2. Demonstration of the assignment procedure for a portion of a 50 nt RNA. In each panel the upper spectrum is a 1H-1H NOESY and the lower a 1H-13C HMQC. (1) Peak-boxes are initially positioned according to predicted chemical shifts. Upon selecting a peak-box for positioning, the linked peak-boxes are indicated by connecting lines. Visual inspection identifies a candidate peak to which the peak-box labeled 4–5 is manually repositioned, as indicated by the solid arrow. Linked peaks are repositioned automatically, as indicated by the dashed arrows. (2) The peak-box position is frozen, indicated in red. The remaining three peak-boxes in the spin system are automatically frozen, and prevented from moving in their shared dimension, indicated in orange for the x-axis. Their associated peaks are readily identified due to this restriction. (3) Examination of the NOESY spectrum reveals two well-matched possibilities for assignment of the peak-box labeled 7. The correct assignment is found by reference to the HMQC spectrum, in which there is only one reasonable candidate. (4) Repositioning the remaining peak-boxes for the spin-system associated with this atom automatically repositions associated peak-boxes from the remaining spin-system under consideration. (5) The remaining spin-system contains peak-boxes restricted from moving along the y-axis due to previously frozen peaks, indicated in magenta, such that their associated peaks are readily identified. (6) Final positions of the peak-boxes under consideration.

The protocol is greatly facilitated by a means to specify whether any given peak-box has been positioned into a final location. In NMRFx Analyst, this is done by clicking a “Freeze” button or using a corresponding keyboard shortcut. Once frozen, a peak-box will be displayed with a different color so that the user has a visual indication of which peak-boxes have been confidently placed (Fig. 1). Freezing an individual peak-box will lock both of its dimensions to their current position so that it can’t subsequently be moved. The linked (sharing the same atom) dimensions of other peak-boxes, in the same and different experiments, will also be frozen. Thus, linked peak-boxes might only be frozen in a single dimension. Such peak-boxes may only be slid along the free dimension, which facilitates their assignment by restricting the choice of locations to a single dimension. A color scheme is used to indicate whether a peak is frozen on the x-axis, y-axis or both axes. Peak-boxes can also be unlocked via a “Thaw” button. Freezing peak-boxes also updates the atom assignment table with the chemical shift of the peak-box dimensions. Thus the final assignment list is generated from only peak-boxes that have been frozen.
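The freeze and thaw behavior can be sketched by recording which peak-boxes are frozen and deriving the set of locked atoms from them. The following Python example is an illustration built on assumptions: the class and method names are invented, shifts are stored per atom, and an atom counts as locked whenever any frozen peak-box refers to it, so thawing one box does not release an atom still held by another.

```python
class FreezableNetwork:
    """Peak-boxes over shared atom shifts, with per-box freeze and thaw."""

    def __init__(self, predicted_shifts, boxes):
        self.shifts = dict(predicted_shifts)  # atom label -> shift (ppm)
        self.boxes = list(boxes)  # (atom on x, atom on y)
        self.frozen_boxes = set()

    def frozen_atoms(self):
        return {atom for i in self.frozen_boxes for atom in self.boxes[i]}

    def frozen_dims(self, index):
        """(x_frozen, y_frozen) for a peak-box, e.g. to choose its display color."""
        atom_x, atom_y = self.boxes[index]
        locked = self.frozen_atoms()
        return (atom_x in locked, atom_y in locked)

    def drag(self, index, new_x, new_y):
        """Move a peak-box, ignoring motion along any frozen dimension."""
        atom_x, atom_y = self.boxes[index]
        x_frozen, y_frozen = self.frozen_dims(index)
        if not x_frozen:
            self.shifts[atom_x] = new_x
        if not y_frozen:
            self.shifts[atom_y] = new_y

    def freeze(self, index):
        self.frozen_boxes.add(index)

    def thaw(self, index):
        self.frozen_boxes.discard(index)

    def assignment_table(self):
        """Shifts of atoms that belong to at least one frozen peak-box."""
        return {atom: self.shifts[atom] for atom in sorted(self.frozen_atoms())}


# Illustrative values only.
net = FreezableNetwork(
    {"7.H8": 7.60, "6.H1'": 5.70, "7.H1'": 5.40, "6.H8": 7.80},
    [("7.H8", "6.H1'"), ("7.H8", "7.H1'"), ("6.H8", "6.H1'")],
)
net.freeze(0)  # box 1 is now locked on x only, box 2 on y only
```

Deriving the locked atoms from the frozen boxes, rather than storing a separate flag per atom, keeps thawing consistent when several frozen peak-boxes share an atom.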

As described above, the set of peak-boxes generated for NOESY spectra requires assumptions about the molecular structure and it is unlikely that they will perfectly match the spectra. Extraneous peak-boxes are easily deleted. Where peaks cannot be associated with a generated peak-box, the user can manually add a peak-box at the peak’s location. The software still provides significant value in this process as the observed signal might align with peak-boxes that have already been frozen. In this case assignment possibilities for the manually added peak-box are displayed and a link can be made to the already frozen peak-boxes.
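The lookup of assignment possibilities for a manually added peak-box can be sketched as a tolerance match against the shifts of atoms that are already frozen. In the Python example below the function name, the input layout, and the 0.02 ppm default tolerance are assumptions chosen for illustration; a real tool would likely use a separate tolerance per nucleus.

```python
def assignment_candidates(peak, frozen_shifts, tolerance=0.02):
    """Suggest frozen atoms whose shifts match a manually placed peak.

    `peak` is an (x, y) position in ppm and `frozen_shifts` maps atom label
    to the shift fixed by freezing. Returns (x_candidates, y_candidates).
    """
    x, y = peak
    x_hits = sorted(
        atom for atom, shift in frozen_shifts.items() if abs(shift - x) <= tolerance
    )
    y_hits = sorted(
        atom for atom, shift in frozen_shifts.items() if abs(shift - y) <= tolerance
    )
    return x_hits, y_hits


# Illustrative values only.
frozen = {"6.H1'": 5.70, "7.H8": 7.60, "8.H8": 7.61}
x_options, y_options = assignment_candidates((5.71, 7.605), frozen)
```

When a dimension returns more than one candidate, as for the two H8 atoms here, the choice still rests with the user, who can consult linked peak-boxes in other spectra.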

The above description has focused on applications to RNA. The approach, however, was initially developed as a means to assign cyclic peptides. The basic protocol for peptides is essentially the same as described above. The differences involve methods for chemical shift prediction and rules for peak-box generation. Predicted chemical shifts can be obtained simply from average chemical shifts for standard amino acids available from the BMRB (Ulrich et al. 2008). Alternatively, NMRFx Analyst includes a built-in (as yet unpublished) tool for generating predictions based on sequence and dihedral angles, and optionally ring-current shifts. Projects involving cyclic peptides often include non-canonical amino acids (Hosseinzadeh et al. 2017). Shift prediction for non-canonical amino acids is supported using a built-in predictor based on HOSE codes that can form predictions for any arbitrary organic molecule. Peptides, and all other molecules supported, can also use predictions generated in third-party software and imported from a text file. As for RNA, 2D TOCSY, 1H-13C HMQC and 2D NOESY experiments have been implemented, but various experiment combinations are possible. COSY experiments can be included, for example, by using the TOCSY peak-box generation protocol but limiting the number of transfer steps in the peak generator to one. The TOCSY and HMQC experiments are particularly robust because they don’t depend on having 3D structural information, though constraints involved in cyclizing the peptide can be used to generate a reasonable family of structures for NOESY predictions.

The described protocol is also completely applicable to arbitrary small organic molecules and provides a means to rapidly assign, without typographical errors, these molecules using one or more 2D spectra. Predictions can be made using the internal HOSE-code-based predictor or external tools (Schütz et al. 1997; Smurnyy et al. 2008). Prediction of NOESY peaks to complement those from scalar-coupled experiments can be made with an approximate 3D structure. Missing and additional peaks can be dealt with as described above.

While the chemical shift predictions that are used always have some level of error, a key benefit of this approach is that individual errors of large magnitude are easily identified and tolerated due to redundancy in the network of moving peaks. More widespread errors in the predicted chemical shifts, particularly if accompanied by errors in the predicted network of NOEs, would potentially prove more challenging; however, in our experience with close to 100 distinct RNA molecules this problem has not arisen. This tolerance to error should also allow the method to be used in situations such as RNA–protein complexes where the RNA chemical shifts near the interface are perturbed from their expected values.

The above protocol, as implemented in NMRFx Analyst, provides a rapid way to facilitate the assignment of a variety of RNA, DNA, peptides and small molecules. It has been used for the assignment of a variety of published RNA projects (Keane et al. 2015; Marchant et al. 2018; Zhang et al. 2018) and for rapid assignment of a variety of cyclic peptides (unpublished studies). Its use requires access to chemical shift predictions, which are available within NMRFx Analyst or through a wide variety of external software packages. Prediction of the peaks expected in scalar-coupled experiments (e.g. TOCSY, COSY, and HMQC) requires only an understanding of the covalent structure of the molecule, and a significant number of NOESY peaks can be predicted with reasonable assumptions about structure. In particular, intra-residue peaks can be predicted and used to aid in assigning inter-residue peaks. The protocol fits between the traditional manual assignment methods that rely on assigning picked peaks and fully automated methods. We anticipate that it will form a basis for adding more automated capabilities in the future. For example, one can already drag a peak-box near to a signal and have it automatically positioned on the nearby peak. By basing the automated capabilities on this visual tool, the user will be able to observe the results of the automation and manually intervene. As chemical shift and structural prediction methods are developed across all molecule types, we expect the approaches for chemical shift assignment illustrated here to be adopted into widespread use.
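The automatic positioning mentioned above can be sketched as a nearest-neighbor snap with a distance limit. The Python example below is a hedged illustration only: the function name, the plain Euclidean distance in ppm, and the `max_distance` parameter are assumptions, and in heteronuclear spectra the two axes would normally be scaled before comparing distances.

```python
import math


def snap_to_nearest_peak(box_position, picked_peaks, max_distance):
    """Return the closest picked peak within `max_distance`, else the input position."""
    best = min(
        picked_peaks,
        key=lambda peak: math.dist(peak, box_position),
        default=None,
    )
    if best is None or math.dist(best, box_position) > max_distance:
        return box_position
    return best


# Illustrative values only: a dropped peak-box and two picked peaks (ppm).
dropped_at = (5.70, 7.80)
peaks = [(5.82, 7.95), (5.10, 7.20)]
snapped = snap_to_nearest_peak(dropped_at, peaks, max_distance=0.3)
```

Leaving the peak-box where the user dropped it when nothing lies within range keeps the step predictable and easy to override by hand.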

Acknowledgements

This work was supported in part by grants from the National Institute of General Medical Sciences of the National Institutes of Health (U54 GM 103297 to BAJ and JM, R01 GM 123012 to BAJ, and GM 42561 to MFS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Barton S, Heng X, Johnson BA, Summers MF. Database proton NMR chemical shifts for RNA signal assignment and validation. J Biomol NMR. 2013;55:33–46. doi: 10.1007/s10858-012-9683-9.
  • Brown JD, Summers MF, Johnson BA. Prediction of hydrogen and carbon chemical shifts from RNA using database mining and support vector regression. J Biomol NMR. 2015;63:39–52. doi: 10.1007/s10858-015-9961-4.
  • Frank AT, Bae SH, Stelzer AC. Prediction of RNA 1H and 13C chemical shifts: a structure based approach. J Phys Chem B. 2013;117:13497–13506. doi: 10.1021/jp407254m.
  • Frank AT, Law SM, Brooks CL. A simple and fast approach for predicting 1H and 13C chemical shifts: toward chemical shift-guided simulations of RNA. J Phys Chem. 2014;118(42):12168–12175. doi: 10.1021/jp508342x.
  • Hosseinzadeh P, Bhardwaj G, Mulligan VK, et al. Comprehensive computational design of ordered peptide macrocycles. Science. 2017;358:1461–1466. doi: 10.1126/science.aap7577.
  • Johnson BA, Blevins RA. NMRView: a computer program for the visualization and analysis of NMR data. J Biomol NMR. 1994;4:603–614. doi: 10.1007/BF00404272.
  • Keane SC, Heng X, Lu K, et al. RNA structure. Structure of the HIV-1 RNA packaging signal. Science. 2015;348:917–921. doi: 10.1126/science.aaa9266.
  • Keller R. CARA: http://cara.nmr.ch
  • LeBlanc RM, Longhini AP, Le Grice SFJ, et al. Combining asymmetric 13C-labeling and isotopic filter/edit NOESY: a novel strategy for rapid and logical RNA resonance assignment. Nucleic Acids Res. 2017;45:e146. doi: 10.1093/nar/gkx591.
  • Lee W, Tonelli M, Markley JL. NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics. 2015;31:1325–1327. doi: 10.1093/bioinformatics/btu830.
  • Longhini AP, LeBlanc RM, Becette O, et al. Chemo-enzymatic synthesis of site-specific isotopically labeled nucleotides for use in NMR resonance assignment, dynamics and structural characterizations. Nucleic Acids Res. 2016;44:e52. doi: 10.1093/nar/gkv1333.
  • Lorenz R, Bernhart SH, Zu Siederdissen CH, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011;6(1):26. doi: 10.1186/1748-7188-6-26.
  • Lu K, Miyazaki Y, Summers MF. Isotope labeling strategies for NMR studies of RNA. J Biomol NMR. 2010;46:113–125. doi: 10.1007/s10858-009-9375-2.
  • Marchant J, Bax A, Summers MF. Accurate measurement of residual dipolar couplings in large RNAs by variable flip angle NMR. J Am Chem Soc. 2018;140:6978–6983. doi: 10.1021/jacs.8b03298.
  • Norris M, Fetler B, Marchant J, Johnson BA. NMRFx Processor: a cross-platform NMR data processing program. J Biomol NMR. 2016;65:205–216. doi: 10.1007/s10858-016-0049-6.
  • Schütz V, Purtuc V, Felsinger S, Robien W. CSEARCH-STEREO: a new generation of NMR database systems allowing three-dimensional spectrum prediction. Fresenius J Anal Chem. 1997;359:33–41. doi: 10.1007/s002160050531.
  • Skinner SP, Fogh RH, Boucher W, et al. CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis. J Biomol NMR. 2016;66:111–124. doi: 10.1007/s10858-016-0060-y.
  • Smurnyy YD, Blinov KA, Churanova TS, et al. Toward more reliable 13C and 1H chemical shift prediction: a systematic comparison of neural-network and least-squares regression based approaches. J Chem Inf Model. 2008;48:128–134. doi: 10.1021/ci700256n.
  • Steinbeck C, Krause S, Kuhn S. NMRShiftDB-constructing a free chemical information system with open-source components. J Chem Inf Comput Sci. 2003;43:1733–1739. doi: 10.1021/ci0341363.
  • Ulrich EL, Akutsu H, Doreleijers JF, et al. BioMagResBank. Nucleic Acids Res. 2008;36:D402–D408. doi: 10.1093/nar/gkm957.
  • Zhang K, Keane SC, Su Z, et al. Structure of the 30 kDa HIV-1 RNA dimerization signal by a hybrid Cryo-EM, NMR, and molecular dynamics approach. Structure. 2018;26:490–498. doi: 10.1016/j.str.2018.01.001.

Chemistry Steps

Organic Chemistry

Nuclear Magnetic Resonance (NMR) Spectroscopy

In the previous post , we talked about the principles behind the chemical shift addressing questions like how the ppm values are calculated, why they are independent of the magnetic field strength, and what is the benefit of using a more powerful instrument.

Today, the focus will be on specific regions of chemical shift characteristic for the most common functional groups in organic chemistry.

Below are the main regions in the 1 H NMR spectrum and the ppm values for protons in specific functional groups:

The energy axis is called the δ (delta) axis and the units are given in parts per million (ppm). Most often, the signals of organic compounds fall in the range of 0–12 ppm.

The right side of the spectrum is the low-energy region (upfield) and the left side is the high-energy region (downfield). This terminology can be confusing, and we talked about its origin earlier, so read that post if you need to know more, but you definitely need to remember that:

Downfield means higher energy – left side of the spectrum (higher ppm)

Upfield means lower energy – right side of the spectrum (lower ppm)
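To make the ppm scale concrete, here is a small sketch of the standard conversion: the chemical shift is the frequency offset from the reference (in Hz) divided by the spectrometer frequency (in MHz). The function name and the example frequencies are made up for illustration.

```python
def chemical_shift_ppm(signal_hz, reference_hz, spectrometer_mhz):
    """Chemical shift in ppm.

    delta = (signal - reference) / spectrometer frequency * 1e6
    With the offset in Hz and the spectrometer frequency in MHz,
    the factor of 1e6 cancels.
    """
    return (signal_hz - reference_hz) / spectrometer_mhz


# The same proton measured on two instruments (illustrative numbers):
# the offset in Hz grows with the field, the ppm value does not.
print(chemical_shift_ppm(2100.0, 0.0, 300.0))  # 300 MHz instrument
print(chemical_shift_ppm(3500.0, 0.0, 500.0))  # 500 MHz instrument
```

Both calls give 7.0 ppm, which is why ppm values are independent of the magnetic field strength. A larger ppm value means the signal is further downfield (left side of the spectrum).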

Let’s start with the chemical shift of protons of alkyl C-H groups.

The Chemical Shift of Protons Connected to sp3 Hybridized Carbons

We can see in the table that sp3 hybridized C–H bonds in alkanes and cycloalkanes give signals in the upfield region (shielded, low resonance frequency) in the range of 1–2 ppm.

The only peak that comes before saturated C–H protons is the signal of the protons of tetramethylsilane, (CH3)4Si, also called TMS. This is a standard reference point with the signal set exactly at 0 ppm, and you can ignore it when analyzing an NMR spectrum. There are a lot of compounds, especially organometallics, that give signals at negative ppm, but you will probably not need those in undergraduate courses.

One trend to remember here is that protons bonded to more substituted carbon atoms resonate at higher ppm.

The Chemical Shift of Protons Connected to Heteroatoms

The second group of protons giving signals in this region is the ones bonded to heteroatoms such as oxygen and nitrogen. Even though the signal can appear anywhere from 1 to 6 ppm, it is usually at the downfield end of this range.

This is due to the higher electronegativity of those atoms pulling the electron density and deshielding the protons. As a result, they are more exposed to the magnetic field and require higher energy radiation for resonance absorption.

The stronger the electron-withdrawing group, the more deshielded the adjacent protons and the higher their ppm value.

Now, 1–6 ppm for protons on heteroatoms is a broad range; to recognize these peaks more easily, keep in mind that they also appear broader as a result of hydrogen bonding.

The O–H and N–H protons are exchangeable, and this is a handy feature because, when in doubt, you can add a drop of deuterated water (D2O) and make the signal disappear, since deuterium does not resonate in the region where protons do.

Other groups that give broad, and sometimes deuterium-exchangeable, signals are the amines, amides, and thiols.

And one more thing, which we will discuss under signal splitting, is that the OH signal is not split by adjacent protons unless the sample is very well-dried.

The Chemical Shift of Protons on sp2 Hybridized Carbons

The protons of alkenes are deshielded and their signals appear downfield from the saturated C–H protons, in the 4–6 ppm range.

There are two reasons for this. First, sp2 hybridized carbons are more electronegative than sp3 carbons since they have more s character (33% vs 25%). So, sp2 orbitals hold electrons closer to the nucleus than sp3 orbitals do, which means less shielding, and therefore a stronger “feel” of the magnetic field and a higher resonance frequency.

The second reason is a phenomenon called magnetic anisotropy. When protons on a carbon-carbon double bond are placed in a magnetic field, the circulating π electrons create a local magnetic field that adds to the applied field, which causes them to experience a stronger net field and therefore resonate at a higher frequency.

This effect is more pronounced in aromatic compounds, which resonate in the range from 7 to 8 ppm. The circulation of the π electrons in benzene is called a ring current, and the protons experience an additional magnetic field that is induced by this ring current.

Interestingly, the hydrogens of aromatic compounds that point into the ring, as in porphyrins and [18]annulene, as well as hydrogens positioned over the ring, are shielded by the induced magnetic field and appear significantly upfield.

Antiaromatic compounds, on the other hand, generate a ring current in the opposite sense, which in turn induces a magnetic field opposite in direction to that in aromatic compounds. Thus, antiaromatic systems show the opposite trend: the inner protons appear at higher ppm than the outer protons. For example, the protons outside the ring of [12]annulene appear at 5.91 ppm, whereas the inner protons have a chemical shift of 7.86 ppm.

The Chemical Shift of Alkynes

The π electrons of a triple bond generate a local magnetic field just as we discussed for alkenes, and one would expect to see their signal further downfield since sp carbons are more electronegative than sp2 carbons.

However, the hydrogens of terminal alkynes resonate at a lower frequency than vinylic hydrogens and appear in the 2–3 ppm range.

The reason is that, unlike in alkenes, the magnetic field induced by the π electrons of the triple bond opposes the applied magnetic field at the proton. This puts the proton in a shielded environment, and thus it feels a weaker magnetic field.

The conflicting effects of magnetic anisotropy and the higher electronegativity of sp hybridized carbons put the signal of acetylenic hydrogens in between alkanes (1-1.8 ppm) and alkenes (4-6 ppm).
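As a quick recap, the regions discussed in this post can be put into a small lookup. This is only a sketch: the ranges are the approximate ones quoted above, the names `SHIFT_RANGES` and `candidate_groups` are made up for illustration, and real signals can fall outside these windows.

```python
# Approximate 1H chemical shift ranges (ppm) as quoted in this post.
SHIFT_RANGES = [
    ("alkane C-H (sp3)", 1.0, 2.0),
    ("terminal alkyne C-H (sp)", 2.0, 3.0),
    ("O-H / N-H (exchangeable, often broad)", 1.0, 6.0),
    ("alkene C-H (sp2)", 4.0, 6.0),
    ("aromatic C-H", 7.0, 8.0),
]


def candidate_groups(ppm):
    """Return every proton type whose quoted range contains this shift."""
    return [name for name, low, high in SHIFT_RANGES if low <= ppm <= high]


print(candidate_groups(7.3))  # only the aromatic range matches
print(candidate_groups(5.0))  # alkene C-H or an exchangeable O-H / N-H
```

When more than one group matches, the other clues discussed above (peak broadness, the D2O test, integration) help to decide between them.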


11 thoughts on “NMR Chemical Shift Values Table”

Is there any information on multiple splitting patterns, e.g. doublet of doublet or doublet of triplets, for example?

Although I have not written an article on complex splitting yet, they are summarized in the NMR study guides on page 3. You can download them on the CS Benefits page .

For the general splitting patterns, refer to these: Splitting and Multiplicity (N+1 rule) in NMR Spectroscopy NMR Signal Splitting N+1 Rule Multiplicity Practice Problems

Thank you! I was so confused about the energy of upfield and downfield…

Good to know it makes more sense now, Raul.

Very nice and easy to understand. Thank you very much

Thank you, Zia.

Hello, How would I sketch the H NMR spectrum of 4-chlorocumene?

ChemDraw is the gold standard for organic chemistry. It is a paid program; however, it can be used for free in the browser – https://chemdrawdirect.perkinelmer.cloud/js/sample/index.html . I don’t know if this option comes with sketching spectra though. You can try ChemSpider too.

My heartiest thanks.

This is a really nice resource! Thank you!

Nice thanks


Advanced Organic Chemistry: 1H NMR spectrum of phenol C6H5OH

Interpreting the H-1 (hydrogen-1, proton) NMR spectrum of phenol C6H5OH

Introductory note on the 1H NMR spectra of phenol

Students and teachers, please note that my explanation of the proton NMR spectrum of phenol is designed for advanced, but pre-university, chemistry courses. The splitting patterns for phenol are limited to proton spin-spin coupling effects analysed using the n+1 rule for adjacent non-equivalent protons (n is the number of neighbouring protons in a different, non-equivalent chemical environment in the phenol molecule). It is assumed that the integrated intensities of the signals give the ratio of the protons in the different non-equivalent chemical environments of the phenol molecule. The most common solvent used for recording the 1H NMR spectrum of compounds like phenol is CDCl3; deuterated solvents are used so that the solvent does not give a confusing 1H NMR signal, because deuterium (2H, D) resonates at a different frequency.

TMS is the acronym for tetramethylsilane, formula Si(CH3)4, whose protons are arbitrarily given a chemical shift of 0.0 ppm. This is the 'standard' in 1H NMR spectroscopy; all other proton resonances, called chemical shifts, are measured with respect to TMS and depend on the individual (electronic) chemical environment of the hydrogen atoms in an organic molecule, phenol here. The chemical shifts quoted in ppm on the diagram of the H-1 NMR spectrum of phenol mark the positions of the resonances (which are often groups of split lines at high resolution), and the relative integrated areas under the peaks give the ratio of protons in the different chemical environments of the phenol molecule.

Interpreting the H-1 NMR spectrum of phenol

In terms of spin-spin coupling from the possible proton magnetic orientations, for phenol I have only considered the interactions of non-equivalent protons on adjacent carbon atoms, e.g. R-CH-CH-X protons, with no splitting of or by the hydroxyl OH proton. You need a high resolution H-1 NMR spectrum of phenol to detect the different proton environments. The 6 hydrogen atoms (protons) of phenol occupy 4 different chemical environments, so the high resolution NMR spectrum should show 4 principal peaks with different H-1 NMR chemical shifts, labelled (a) to (d) on the H-1 NMR spectrum diagram for phenol.

Although there are 6 hydrogen atoms in the molecule, there are only 4 possible different chemical environments for the hydrogen atoms in phenol molecule.

The integrated signal proton ratio of 2:2:1:1 observed in the high resolution H-1 NMR spectrum, corresponds with the structural formula of phenol.
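The way integrated areas are turned into a proton ratio can be shown with a short sketch. The raw integral values below are invented for illustration (they are not measured data for phenol), and the function name `protons_per_signal` is hypothetical.

```python
def protons_per_signal(integrals, total_protons):
    """Scale raw integrated areas so that they add up to the number of
    hydrogen atoms in the molecular formula, then round to whole protons."""
    scale = total_protons / sum(integrals)
    return [round(area * scale) for area in integrals]


# Four invented raw integrals for a molecule with 6 hydrogen atoms,
# as in C6H5OH.
print(protons_per_signal([41.2, 40.1, 20.3, 19.9], 6))  # [2, 2, 1, 1]
```

The absolute size of the integrals has no meaning on its own; only the ratio between them matters.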

The high resolution 1H NMR spectrum of phenol

The high resolution spectrum of phenol shows 4 groups of proton resonances in the 2:2:1:1 ratio expected from the structural formula of phenol.

The ppm values quoted on the diagram represent the peak of resonance intensity for a particular proton group in the molecule of phenol. Each 'peak' is the apex of a band of H-1 NMR resonances produced by spin-spin coupling; see the high resolution notes on phenol below.

So, using the chemical shifts and applying the n+1 rule to phenol, we can make some predictions using some colour coding! (In problem solving you work the other way round!)

Resonance (a) 1H Chemical shift for the OH proton, 5.35 ppm.

This is observed as a singlet; there are no adjacent protons on the C1 carbon atom of the benzene ring.

One of the problems in interpreting NMR spectra is that the benzene ring CH proton 1H resonances (converted to chemical shifts) are often quite close together, as in the 1H NMR spectrum of phenol.

Resonance (b) 1H Chemical shift for the CH protons on C2/C6, 6.84 ppm.

This 1H NMR resonance applies to the protons on the equivalent carbon atoms C2 and C6. This resonance is split into a 1:1 doublet by the adjacent C3 or C5 CH proton (n = 1, so n+1 = 2). Note there is no proton on carbon atom C1 that might increase the splitting effect. The doublet is evidence for an adjacent CH group in the molecule of phenol.

Resonance (c) 1H Chemical shift for the CH protons on C3/C5, 7.24 ppm.

This 1H NMR resonance applies to the protons on the equivalent carbon atoms C3 and C5. This resonance is split into a 1:2:1 triplet by the adjacent CH protons on either side, on C2 and C4 for the C3 proton and on C4 and C6 for the C5 proton (n = 2, so n+1 = 3).

Resonance (d) 1H Chemical shift for the CH proton on C4, 6.93 ppm.

This 1H NMR resonance applies to the proton on carbon atom C4. This resonance is split into a 1:2:1 triplet by the adjacent CH protons on C3 and C5 on either side (n = 2, so n+1 = 3).

EXTRA NOTE on why the OH proton chemical shift is usually observed as a singlet in phenols like phenol and how deuterium oxide can be used to identify the peak caused by the hydroxyl proton

Although phenols and alcohols are extremely weak acids, there is constant exchange of protons between their molecules (R = an alkyl group, the phenyl group of phenol, or just the rest of the molecule).

R-O-H + H-O-R ⇌ R-O-H + H-O-R

The rate of proton transfer is increased by traces of water.

R-O-H + H-O-H ⇌ R-O-H + H-O-H

This cannot happen with the non-acidic C-H protons of phenol. This rapid proton transfer interferes with the spin-spin coupling between the hydroxyl O-H proton and the C-H protons, so the splitting is not observed; it also disappears if enough deuterium oxide is present.

This phenomenon can be used to distinguish the O-H proton resonance from the C-H proton resonances in hydroxy compounds like phenol. If deuterium oxide (D2O, where D = 2H) is added to the NMR sample, the hydroxyl 1H protons are rapidly replaced by 2H deuterons in the phenol molecule.

R-O-H + D-O-D ⇌ R-O-D + H-O-D

The 2H resonance frequency is different from the 1H resonance frequency, so the effect of D2O is to remove (or reduce the intensity of) the signal for the OH proton from the 1H NMR spectrum of phenol, thereby identifying the original 1H chemical shift as belonging to the hydroxyl group O-H proton and not a C-H proton of the phenol molecule.

The splitting pattern from proton spin-spin coupling effects is analysed using the n+1 rule for adjacent non-equivalent proton fields (n is the number of neighbouring protons in a non-equivalent different chemical environment) and applied to the 1 H NMR spectrum of phenol.
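The n+1 rule and the line intensity ratios it produces (1:1 for a doublet, 1:2:1 for a triplet) can be generated from Pascal's triangle. The sketch below assumes simple first-order coupling to n equivalent neighbouring protons; the function name `splitting_pattern` is made up for illustration.

```python
from math import comb


def splitting_pattern(n_neighbours):
    """Relative line intensities for a proton coupled to n equivalent
    neighbouring protons: n + 1 lines following Pascal's triangle."""
    return [comb(n_neighbours, k) for k in range(n_neighbours + 1)]


print(splitting_pattern(0))  # [1]        singlet, e.g. the OH proton
print(splitting_pattern(1))  # [1, 1]     doublet, e.g. the C2/C6 protons
print(splitting_pattern(2))  # [1, 2, 1]  triplet, e.g. the C4 proton
```

The number of lines is always one more than the number of neighbouring protons, which is where the name of the rule comes from.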

  • Open access
  • Published: 18 October 2022

Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

  • Piotr Klukowski   ORCID: orcid.org/0000-0003-1045-3487 1 ,
  • Roland Riek   ORCID: orcid.org/0000-0002-6333-066X 1 &
  • Peter Güntert   ORCID: orcid.org/0000-0002-2911-7574 1 , 2 , 3  

Nature Communications volume 13, Article number: 6151 (2022)

  • Machine learning
  • Solution-state NMR

Nuclear Magnetic Resonance (NMR) spectroscopy is a major technique in structural biology with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate structures and dynamics of small and medium size proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work of a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. We present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 Å median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.

Introduction

Studying structures of proteins and ligand-protein complexes is one of the most influential endeavors in molecular biology and rational drug design. All key structure determination techniques, X-ray crystallography, electron microscopy, and NMR spectroscopy, have led to remarkable discoveries, but suffer from their respective experimental limitations. NMR can elucidate structures and dynamics of small and medium size proteins in solution 1 and even in living cells 2 . However, the analysis of NMR spectra and the resonance assignment, which are indispensable for NMR studies, remain time-consuming even for a skilled and experienced spectroscopist. Attributed to this, the percentage of NMR protein structures in the Protein Data Bank (PDB) has decreased from a maximum of 14.6% in 2007 to 7.3% in 2021 ( https://www.rcsb.org/stats ). The problem has sparked research towards automating different tasks in NMR structure determination 3 , 4 , including peak picking 5 , 6 , 7 , 8 , 9 , resonance assignment 10 , 11 , 12 , and the identification of distance restraints 13 , 14 . Several of these methods are available as webservers 15 , 16 . This enabled semi-automatic 17 , 18 but not yet unsupervised automation of the entire NMR structure determination process, except for a very small number of favorable proteins 7 , 19 .

The advance of machine learning techniques 20 now offers unprecedented possibilities for reliably replacing decisions of human experts by efficient computational tools. Here, we present a method that achieves this goal for NMR assignment and structure determination. We show for a diverse set of 100 proteins that NMR resonance assignments and protein structures can be determined within hours after completing the NMR measurements. Our method, Artificial Intelligence for NMR Applications, ARTINA (Fig. 1), combines machine learning for tasks that are difficult to model otherwise with existing algorithms: evolutionary optimization for resonance assignment with FLYA 12 , chemical shift database searches for torsion angle restraint generation with TALOS-N 21 , ambiguous distance restraints, network-anchoring and constraint combination for NOESY assignment 14 , 22 and simulated annealing by torsion angle dynamics for structure calculation with CYANA 23 . Machine learning is used in multiple flavors: deep residual neural networks 24 for visual spectrum analysis to identify peak positions (pp-ResNet) and to deconvolve overlapping signals (deconv-ResNet) in 25 different types of spectra (Supplementary Table 1), kernel density estimation (KDE) to reconstruct original peak positions in folded spectra, a deep graph neural network 25 , 26 (GNN) for chemical shift estimation within the refinement of chemical shift assignments, and a gradient boosted trees 27 (GBT) model for the selection of structure proposals.

Fig. 1. The flowchart presents the interplay between the main components of the automated protein structure determination workflow: Residual Neural Network (ResNet), FLYA automated chemical shift assignment, Graph Neural Network (GNN), Gradient Boosted Trees (GBT), and CYANA structure calculation.

A major challenge in developing ARTINA was the collection and preparation of a large training data set that is required for machine learning, because, in contrast to assignments and structures, NMR spectra are generally not archived in public data repositories. Instead, we were obliged to collect from different sources and standardize complete sets of multidimensional NMR spectra for the assignment and structure determination of 100 proteins.

In the following work, we describe the algorithm, training and test data, and results of ARTINA automated structure determination, which are on par with those achieved in weeks or months of human experts’ labor.

Benchmark dataset

One of the major obstacles for developing deep learning solutions for protein NMR spectroscopy is the lack of a large-scale standardized benchmark dataset of protein NMR spectra. To date, published manuscripts presenting the most notable methods for computational NMR, typically refer to less than 50 2D/3D/4D NMR spectra in their experimental sections. Even the well-recognized CASD-NMR competition cannot serve as a major source of training data for deep learning, since only the NOESY spectra of 10 proteins were used in the last round of the event 28 .

To make our study possible, we established a standardized benchmark of 1329 2D/3D/4D NMR spectra, which allows 100 proteins to be recalculated using their original spectral data (Fig.  2 and Supplementary Table  2 ). Each protein record in our dataset contains 5–20 spectra together with manually identified chemical shifts (usually depositions at the Biological Magnetic Resonance Data Bank, BMRB) and the previously determined (“ground truth”) protein structure (PDB record; Supplementary Table  3 ). The benchmark covers protein sizes typically studied by NMR spectroscopy with sequence lengths between 35 and 175 residues (molecular mass 4–20 kDa).

Fig. 2. PDB codes (or names, MH04, MDM2, KRAS4B, if PDB code unavailable) of the 100 benchmark proteins are ordered by the number of residues. The histogram shows the number of spectra for backbone assignment, side-chain assignment, and NOE measurement. Spectrum types in each data set are shown by light to dark blue circles indicating the number of individual spectra of the given type. The percentages of benchmark records that contain a given spectrum type are given at the top. Spectrum types present in less than 5% of the data sets have been omitted.

Automated protein structure determination

The accuracy of protein structure determination with ARTINA was evaluated in a 5-fold cross-validation experiment with the aforementioned benchmark dataset. Five instances of pp-ResNet and GBT were trained, each one using data from about 80% of the proteins for training and the remaining ones for testing. Since each protein was present exactly once in the test set, reported quality metrics were obtained directly in the cross-validation experiment, and no averaging between data splits was required. To deploy pp-ResNet and GBT models in our online system, we constructed an ensemble by averaging predictions of all 5 cross-validation models. The other models were trained only once using either generated data (deconv-ResNet, Supplementary Fig.  1 ) or BMRB depositions excluding all benchmark proteins (GNN, KDE).
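The protein-level split and the ensemble used for deployment can be illustrated with a minimal Python sketch. It assumes a random assignment of proteins to folds and simple averaging of model scores; the function names and the placeholder protein IDs are hypothetical and are not taken from the ARTINA code.

```python
import random

def protein_folds(protein_ids, n_folds=5, seed=0):
    """Split protein IDs into n_folds disjoint test sets of near-equal size."""
    ids = sorted(protein_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_folds] for k in range(n_folds)]

def cross_validation_splits(protein_ids, n_folds=5, seed=0):
    """Yield (train_ids, test_ids) pairs; each protein is tested exactly once."""
    folds = protein_folds(protein_ids, n_folds, seed)
    for k, test_ids in enumerate(folds):
        train_ids = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train_ids, test_ids

def ensemble_predict(models, x):
    """Average the scores of several models (callables returning a float)."""
    return sum(model(x) for model in models) / len(models)

proteins = [f"P{i:03d}" for i in range(100)]  # placeholder IDs, not real PDB codes
splits = list(cross_validation_splits(proteins))
```

Because the split is made over proteins rather than over spectra or spectrum fragments, no data from a test protein can leak into the training set of the model that is evaluated on it.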

In this experiment, we reproduced 100 structures in a fully automated manner using only NMR spectra and the protein sequences as input. Since ARTINA has no tunable parameters and does not require any manual curation of data, each structure was calculated by a single execution of the ARTINA workflow. All benchmark datasets were analyzed by ARTINA in parallel with execution times of 4–20 h per protein.

All automatically determined structures, overlaid with the corresponding reference structures from the PDB, are visualized in Fig. 3, Supplementary Fig. 2, and Supplementary Movie 1. ARTINA was able to reproduce the reference structures with a median backbone root-mean-square deviation (RMSD) of 1.44 Å between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, Cα, C′ in the residue ranges determined by CYRANGE 29 (Fig. 4a and Supplementary Table 4). ARTINA automatically identified between 459 and 4678 distance restraints (2198 on average over 100 proteins), which corresponds to 4.25–33.20 restraints per residue (Fig. 4b). This number is mainly influenced by the extent of unstructured regions and the quality of the NOESY spectra. In agreement with earlier findings 30 , it correlates only weakly with the backbone RMSD to reference (linear correlation coefficient −0.38). As a more expressive validation measure for the structures from ARTINA, we computed a predicted RMSD to the PDB reference structure on the basis of the RMSDs between the 10 candidate structure bundles calculated in ARTINA (see “Methods”, Fig. 5, and Supplementary Table 5). The average deviation between actual and predicted RMSDs for the 100 proteins in this study is 0.35 Å, and their linear correlation coefficient is 0.77 (Fig. 5). In no case does the true RMSD exceed the predicted one by more than 1 Å.

Fig. 3. The structures are aligned with the RMSD to reference range as indicated on the left and hexagonal frames color-coded by their size as indicated above. Structures with no corresponding PDB depositions are marked by an asterisk.

Fig. 4. (a) Backbone RMSD to reference. (b) Number of distance restraints per residue. (c) Chemical shift assignment accuracy. Bars represent quantity values for benchmark proteins, identified by PDB codes (or protein names). Proteins are ordered by size, which is indicated by a color-coded circle. Values in the center of each panel are 10th, 50th, and 90th percentiles of values presented in the bar plot. Short/medium/long-range restraints are between residues i and j with |i – j| ≤ 1, 2 ≤ |i – j| ≤ 4, and |i – j| ≥ 5, respectively.

Fig. 5. The predicted RMSD to reference (pRMSD) is calculated from the ARTINA results without knowledge of the reference PDB structure (see “Methods”) and, by definition, always in the range of 0–4 Å. For comparability, actual RMSD values to reference are also truncated at 4 Å (protein 2M47 with RMSD 4.47 Å). The dotted lines represent deviations of ±1 Å between the two RMSD quantities.

Additional structure validation scores obtained from ANSURR 31 (Supplementary Table 6), RPF 32 (Supplementary Table 7), and consensus structure bundles 33 (Supplementary Table 8) confirm that overall the ARTINA structures and the corresponding reference PDB structures are of equivalent quality. Energy refinement of the ARTINA structures in explicit water using OPALp 34 (not part of the standard ARTINA workflow) does not significantly alter the agreement with the PDB reference structures (Supplementary Table 9). The benchmark data set comprises 78 protein structures determined by the Northeast Structural Genomics Consortium (NESG). On average, ARTINA yielded structures of the same accuracy for NESG targets (median RMSD to reference 1.44 Å) as for proteins from other sources (1.42 Å).

On average, ARTINA correctly assigned 90.39% of the chemical shifts (Fig.  4c ), as compared to the manually prepared assignments, including both “strong” (high-reliability) and “weak” (tentative) FLYA assignments 12 . Backbone chemical shifts were assigned more accurately (96.03%) than side-chain ones (86.50%), which is mainly due to difficulties in assigning lysine/arginine (79.97%) and aromatic (76.87%) side-chains. Further details on the assignment accuracy for individual amino acid types in the protein cores (residues with less than 20% solvent accessibility) are given in Supplementary Table  10 . Assignments for core residues, which are important for the protein structure, are generally more accurate than for the entire protein, in particular for core Ala, Cys, and Asp residues, which show a median assignment accuracy of 100% over the 100 proteins. The lowest accuracies are observed for core His (83.3%), Phe (83.3%), and Arg (87.5%) residues. The three proteins with highest RMSD to reference, 2KCD, 2L82, and 2M47 (see below), show 68.2, 83.8, and 75.7% correct aromatic assignments, respectively, well below the corresponding median of 85.5%. On the other hand, the assignment accuracies for the methyl-containing residues Ala, Ile, Val are above average and reach a median of 100, 97.6, and 98.6%, respectively.

The quality of automated structure determination and chemical shift assignment reflects the performance of deep learning-based visual spectrum analysis, presented qualitatively in Figs.  6 – 7 , Supplementary Fig.  3 , and Supplementary Movies  2 – 4 . In this experiment, our models (pp-ResNet, deconv-ResNet) automatically identified 1,168,739 cross-peaks with high confidence (≥0.50) in the benchmark spectra. All 1329 peak lists, together with automatically determined protein structures and chemical shift lists, are available for download.

Fig. 6. A fragment of a 15 N-HSQC spectrum of the protein 1T0Y is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses). (a1, a2) Initial peak picking marker position is refined by the deconvolution model. (b1, b2) pp-ResNet output is deconvolved into two components. (c) The deconvolution model supports at most 3 components per initial signal. (d) Two peak picking markers are merged by the deconvolution model. (e) Peak picking output deconvolved into three components.

Fig. 7. A fragment of the 13 C-HSQC spectrum of protein 2K0M is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses).

Error analysis

The largest deviations from the PDB reference structure were observed for the proteins 2KCD, 2L82, and 2M47, for which the pRMSD consistently indicated low accuracy (Fig.  5 ). Significant deviations are mainly due to displacements of terminal secondary structure elements (e.g., a tilted α-helix near a chain terminus), or inaccurate loop conformations (e.g., more flexible than in the PDB deposition). We investigated the origin of these discrepancies.

2KCD is a 120-residue (14.4 kDa) protein from Staphylococcus saprophyticus with an α-β roll architecture. Its dataset comprises 19 spectra (8 backbone, 6 side-chain, and 5 NOESY). The ARTINA structure has a backbone RMSD to PDB reference of 3.13 Å, which is caused by the displacement of the C-terminal α-helix (residues 105–109; Supplementary Fig.  4a ). Excluding this 5-residue fragment decreases the RMSD to 2.40 Å (Supplementary Table  11 ). The positioning of this helix appears to be uncertain, since an ARTINA calculation without the 4D CC-NOESY spectrum yields a significantly lower RMSD of 1.77 Å (Supplementary Table  12 ).

2L82 is a de novo designed protein of 162 residues (19.7 kDa) with an αβ 3-layer (αβα) sandwich architecture. Although only 9 spectra (4 backbone, 2 side-chain, and 3 NOESY) are available, ARTINA correctly assigned 97.87% of the backbone and 81.05% of the side-chain chemical shifts. The primary reason for the high RMSD value of 3.55 Å is again a displacement of the C-terminal α-helix (residues 138–153). The remainder of the protein closely matches the PDB deposition (1.04 Å RMSD, Supplementary Fig. 4b).

The protein with the highest RMSD to reference (4.72 Å) in our benchmark dataset is 2M47, a 163-residue (18.8 kDa) protein from Corynebacterium glutamicum with an α-β 2-layer sandwich architecture, for which 17 spectra (7 backbone, 7 side-chain, and 3 NOESY) are available. The main source of discrepancy is a pair of α-helices spanning residues 111–157 near the C-terminus. Nevertheless, the residues contributing to the high RMSD value are distributed more extensively than in 2L82 and 2KCD just discussed. Interestingly, 2 of the 10 structure proposals calculated by ARTINA have an RMSD to reference below 2 Å (1.66 Å and 1.97 Å). In the final structure selection step, our GBT model selected the 4.72 Å RMSD structure as the first choice and the 1.66 Å structure as the second one (Supplementary Fig. 4c). Such results imply that the automated structure determination of this protein is unstable. Since ARTINA returns the two structures selected by GBT with the highest confidence, the user can, in principle, choose the better structure based on contextual information.

In addition to these three case studies, we performed a quantitative analysis of all regular secondary structure elements and flexible loops present in our 100-protein benchmark in order to assess their impact on the backbone RMSD to reference (Supplementary Table  11 ). All residues in the structurally well-defined regions determined by CYRANGE 29 were assigned to 6 partially overlapping sets: (a) first secondary structure element, (b) last secondary structure element, (c) α-helices, (d) β-sheets, (e) α-helices and β-sheets, and (f) loops. Then, the RMSD to reference was calculated 6 times, each time with one set excluded. In total, for 66 of the 100 proteins the lowest RMSD was obtained if set (f) was excluded from RMSD calculation, and 13% benefited most from removal of the first or last secondary structure element (a or b). Moreover, for 18 out of the 19 proteins with more than 0.5 Å RMSD decrease compared to the RMSD for all well-defined residues, (a), (b), or (f) was the primary source of discrepancy. These results are consistent with our earlier statement that deviations in automatically determined protein structures are mainly caused by terminal secondary structure elements or inaccurate loop conformations.
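The leave-one-set-out analysis can be sketched in Python as follows. This is a simplified illustration that assumes per-residue squared backbone deviations from a single fixed superposition are already available; whether the original analysis re-superimposes the structures after each exclusion is not stated here. The set names and the toy numbers are hypothetical.

```python
import math

def rmsd(sq_dev, residues):
    """RMSD over the given residues from per-residue squared deviations (in Å^2)."""
    residues = list(residues)
    return math.sqrt(sum(sq_dev[r] for r in residues) / len(residues))

def exclusion_analysis(sq_dev, residue_sets):
    """Return the RMSD for all residues and the RMSD with each set excluded."""
    all_res = set(sq_dev)
    result = {"all": rmsd(sq_dev, all_res)}
    for name, members in residue_sets.items():
        kept = all_res - set(members)
        if kept:  # skip sets whose removal would leave nothing to compare
            result[name] = rmsd(sq_dev, kept)
    return result

# Toy data: residues 1-10 with a poorly placed C-terminal element (residues 9-10).
sq_dev = {r: 1.0 for r in range(1, 9)}
sq_dev.update({9: 16.0, 10: 16.0})
sets = {"first_sse": [1, 2], "last_sse": [9, 10], "loops": [4, 5, 6]}
scores = exclusion_analysis(sq_dev, sets)
best = min((k for k in scores if k != "all"), key=scores.get)
```

In this toy case the RMSD drops from 2.0 Å to 1.0 Å when the last secondary structure element is excluded, so `best` identifies that set as the primary source of discrepancy.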

Ablation studies

During the experiment, we captured the state of each structure determination at 9 time-points, 3 per structure determination cycle: (a) after the initial FLYA shift assignment, (b) after GNN shift refinement, and (c) after structure calculation (Fig.  1 ). Comparative analysis of these states allowed us to quantify the contribution of different ARTINA components to the structure determination process (Table  1 ).

The results show a strong benefit of the refinement cycles, as quantities reported in Table  1 consistently improve from cycle 1 to 3. The majority of benchmark proteins converge to the correct fold after the first cycle (1.56 Å median backbone RMSD to reference), which is further refined to 1.52 Å in cycle 2 and 1.44 Å in cycle 3. Additionally, within each chemical shift refinement cycle, improvements in assignment accuracy resulting from the GNN predictions are observed. This quantity also increases consistently across all refinement cycles, in particular for side-chains. Refinement cycles are particularly advantageous for large and challenging systems, such as 2LF2, 2M7U, or 2B3W, which benefit substantially in cycles 2 and 3 from the presence of the approximate protein fold in the chemical shift assignment step.

Impact of 4D NOESY experiments

As presented in Fig.  2 , 26 out of 100 benchmark datasets contain 4D CC-NOESY spectra, which require long measurement times and were used in the manual structure determination. To quantify their impact, we performed automated structure determinations of these 26 proteins with and without the 4D CC-NOESY spectra (Supplementary Table  12 ).

On average, the presence of 4D CC-NOESY improves the backbone RMSD to reference by 0.15 Å (decrease from 1.88 to 1.73 Å) and has less than 1% impact on chemical shift assignment accuracy. However, the impact is non-uniform. For three proteins, 2KIW, 2L8V, and 2LF2, use of the 4D CC-NOESY decreased the RMSD by more than 1 Å. On the other hand, there is also one protein, 2KCD, for which the RMSD decreased by more than 1 Å by excluding the 4D CC-NOESY.

These results suggest that overall the amount of information stored in 2D/3D experiments is sufficient for ARTINA to reach close to optimal performance, and only modest improvement can be achieved by introducing additional information redundancy from 4D CC-NOESY spectra.

Automated chemical shift assignment

Apart from structure determination, our data analysis pipeline for protein NMR spectroscopy can address an array of problems that are nowadays approached manually or semi-manually. For instance, ARTINA can be stopped after visual spectrum analysis, returning positions and intensities of cross-peaks that can be utilized for any downstream task, not necessarily related to protein structure determination.

Alternatively, a single chemical shift refinement cycle can be performed to get automatically assigned cross-peaks from spectra and sequence. We evaluated this approach with three sets of spectra: (i) Exclusively backbone assignment spectra were used to assign N, C α , C β , C’, and H N shifts. With this input, ARTINA assigned 92.40% (median value) of the backbone shifts correctly. (ii) All through-bond but no NOESY spectra were used to assign the backbone and side-chain shifts. This raised the percentage of correct backbone assignments to 94.20%. (iii) The full data set including NOESY yielded 96.60% correct assignments of the backbone shifts. These three experiments were performed for the 45 benchmark proteins, for which CBCANH and CBCAcoNH, as well as either HNCA and HNcoCA or HNCO and HNcaCO experiments were available. The availability of NOESY spectra had a large impact on the side-chain assignments: 86.00% were correct for the full spectra set iii, compared to 73.70% in the absence of NOESY spectra (spectra set ii). The presence of NOESY spectra consistently improved the chemical shift assignment accuracy of all amino acid types (Supplementary Tables  13 and 14 ). The improvement is particularly strong for aromatic residues (Phe, 61.6 to 76.5%, Trp 52.5 to 80%, and Tyr 71.4 to 89.7%), but not limited to this group.

The results obtained with ARTINA differ substantially in several aspects from previous approaches towards automating protein NMR analysis 3 , 4 , 7 , 12 , 17 , 18 , 19 , 35 . First, ARTINA encompasses the entire workflow from spectra to structures rather than individual steps in it, and there are strictly no manual interventions or protein-specific parameters to be adapted. Second, the quality of the results regarding peak identification, resonance assignments, and structures has been assessed on a large and diverse set of 100 proteins, for the vast majority of which the results are on par with what can be achieved by human experts. Third, the method provides a two-orders-of-magnitude leap in efficiency by providing assignments and a structure within hours of computation time rather than weeks or months of human work. This reduces the effort for a protein structure determination by NMR essentially to the preparation of the sample and the measurement of the spectra. Its implementation in the https://nmrtist.org webserver (Supplementary Movie 5) encapsulates its complexity, eliminates any intermediate data and format conversions by the user, and enables the use of different types of high-performance hardware as appropriate for each of the subtasks. ARTINA is not limited to structure determination but can be used equally well for peak picking and resonance assignment in NMR studies that do not aim at a structure, such as investigations of ligand binding or dynamics.

Although ARTINA has no parameters to be optimized by the user, care should be given to the preparation of the input data, i.e., the choice, measurement, processing, and specification of the spectra. Spectrum type, axes, and isotope labeling declarations must be correct, and chemical shift referencing consistent over the entire set of spectra. Slight variations of corresponding chemical shifts within the tolerances of 0.03 ppm for 1 H and 0.4 ppm for 13 C/ 15 N can be accommodated, but larger deviations, resulting, for instance, from the use of multiple samples, pH changes, protein degradation, or inaccurate referencing, can be detrimental. Where appropriate, ARTINA proposes corrections of chemical shift referencing 36 . Furthermore, based on the large training data set, which comprises a large variety of spectral artifacts, ARTINA largely avoids misinterpreting artifacts as signals. However, with decreasing spectral quality, ARTINA, like a human expert, will progressively miss real signals.
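A consistency check of this kind can be sketched in Python. The tolerances are the values quoted above; interpreting them as the maximum spread (largest minus smallest observed shift) of one atom across spectra is an assumption, and the atom labels and shift values are invented for illustration.

```python
TOLERANCE_PPM = {"1H": 0.03, "13C": 0.4, "15N": 0.4}  # values quoted in the text

def inconsistent_shifts(observations):
    """observations: {(atom_name, nucleus): [shift in ppm per spectrum, ...]}.
    Returns the atoms whose observed shifts spread beyond the tolerance."""
    flagged = {}
    for (atom, nucleus), shifts in observations.items():
        spread = max(shifts) - min(shifts)
        if spread > TOLERANCE_PPM[nucleus]:
            flagged[(atom, nucleus)] = round(spread, 3)
    return flagged

obs = {
    ("A12.HN", "1H"): [8.21, 8.22, 8.23],     # spread 0.02 ppm, acceptable
    ("A12.N", "15N"): [121.3, 121.5, 122.0],  # spread 0.7 ppm, flagged
    ("A12.CA", "13C"): [52.4, 52.6],          # spread 0.2 ppm, acceptable
}
flagged = inconsistent_shifts(obs)
```

A systematic offset that affects all atoms of one nucleus type in one spectrum would point to a referencing problem rather than to individual misassignments.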

Regarding protein size and spectrum quality, limitations of ARTINA are similar to those encountered by a trained spectroscopist. Machine-learning-based visual analysis of spectra requires signals to be present and distinguishable in the spectra. ARTINA does not suffer from accidental oversight that may affect human spectra analysis. On the other hand, human experts may exploit contextual information to which the automated system currently has no access because it identifies individual signals by looking at relatively small, local excerpts of spectra.

In this paper, we used all spectra that are available from the earlier manual structure determination. For most of the 100 proteins, the spectra data set has significant redundancy regarding information for the resonance assignment. Our results indicate that one can expect to obtain good assignments and structures also from smaller sets of spectra 37 , with concomitant savings of NMR measurement time. We plan to investigate this in a future study.

The present version of ARTINA can be enhanced in several directions. Besides improving individual models and algorithms, it is conceivable to integrate the so far independently trained collection of machine learning models, plus additional models that replace conventional algorithms, into a coherent system that is trained as a whole. Furthermore, the reliability of machine learning approaches depends strongly on the quantity and quality of training data available. While the collection of the present training data set for ARTINA was cumbersome, from now on it can be expected to expand continuously through the use of the https://nmrtist.org website, both quantitatively and qualitatively with regard to greater variability in terms of protein types, spectral quality, source laboratory, data processing (including non-linear sampling), etc., which can be exploited in retraining the models. ARTINA can also be extended to use additional experimental input data, e.g., known partial assignments, stereospecific assignments, 3 J couplings, residual dipolar couplings, paramagnetic data, and H-bonds. Structural information, e.g., from AlphaFold 38 , can be used in combination with reduced sets of NMR spectra for rapid structure-based assignment. Finally, the range of application of ARTINA can be generalized to small molecule-protein complexes relevant for structure-activity relationship studies in drug research, protein-protein complexes, RNA, solid state, and in-cell NMR.

Overall, ARTINA stands for a paradigm change in biomolecular NMR from a time-consuming technique for specialists to a fast method open to researchers in molecular biology and medicinal chemistry. At the same time, in a larger perspective, the appearance of generally highly accurate structure predictions by AlphaFold 38 is revolutionizing structural biology. Nevertheless, there remains space for the experimental methods, for instance, to elucidate various states of proteins under different conditions or in dynamic exchange, or for studying protein-ligand interaction. Regarding ARTINA, one should keep in mind that its applications extend far beyond structure determination. It will accelerate virtually any biological NMR studies that require the analysis of multidimensional NMR spectra and chemical shift assignments. Protein structure determination is just one possible ARTINA application, which is both demanding in terms of the amount and quality of required experimental data and amenable to quantitative evaluation.

Spectrum benchmark collection

To collect the benchmark of NMR spectra (Fig. 2 and Supplementary Table 2), we implemented crawler software that systematically scanned the FTP server of the BMRB data bank 39 , identifying data files relevant to our study. Additional datasets were obtained through a website set up for the deposition of published data ( https://nmrdb.ethz.ch ), from our collaboration network, or from measurements acquired internally in our laboratory. NMR data was collected from these channels either in the form of processed spectra (Sparky 40 , NMRPipe 41 , XEASY 42 , Bruker formats), or in the form of time-domain data accompanied by depositor-supplied NMRPipe processing scripts. No additional spectra processing (e.g., baseline correction) was performed as part of this study.

The most challenging aspects of the benchmark collection process were: (i) scarcity of data, since only a small fraction of all BMRB depositions are accompanied by uploaded spectra (or time-domain data); (ii) lack of standards for NMR data depositions, so that each protein data set had to be prepared manually because the original data was stored in different formats (spectra name conventions, axis label standards, spectra data format); and (iii) difficulties in correlating data files deposited in the BMRB FTP site with contextual information about the spectrum and the sample (e.g., sample characteristics, measurement conditions, instrument used). Manually prepared (mostly NOESY) peak lists, which are available from the BMRB for some of the proteins in the benchmark, were not used for this study.

Different approaches to 3D 13 C-NOESY spectra measurement had to be taken into account: (i) Two separate 13 C-NOESY spectra for aliphatic and aromatic signals. These were analyzed by ARTINA without any special treatment. We used ALI , ARO tags (Supplementary Movie 5) to provide the information that only either aliphatic or aromatic shifts are expected in a given spectrum. (ii) Simultaneous NC-NOESY. These spectra were processed twice to have proper scaling of the 13 C and 15 N axes in ppm units, and cropped to extract 15 N-NOESY and 13 C-NOESY spectra. If nitrogen and carbon cross-peak amplitudes have different signs, we used POS , NEG tags to provide the information that only either positive or negative signals should be analyzed. (iii) Aliphatic and aromatic signals in a single 13 C-NOESY spectrum. These measurements do not require any special treatment, but proper cross-peak unfolding plays a vital role in the analysis of aromatic signals.

Overview of the ARTINA algorithm

ARTINA uses as input only the protein sequence and a set of NMR spectra, which may contain any combination of 25 experiments currently supported by the method (Supplementary Table  1 ). Within 4–20 h of computation time (depending on protein size, number of spectra, and computing hardware load), ARTINA determines: (a) cross-peak positions for each spectrum, (b) chemical shift assignments, (c) distance restraints from NOESY spectra, and (d) the protein structure. The whole process does not require any human involvement, allowing rapid protein NMR assignment and structure determination by non-experts.

The ARTINA workflow starts with visual spectrum analysis (Fig. 1), wherein cross-peak positions are identified in frequency-domain NMR spectra using deep residual neural networks (ResNet) 24 . Coordinates of signals in the spectra are passed as input to the FLYA automated assignment algorithm 12 , yielding initial chemical shift assignments. In the subsequent chemical shift refinement step, we bring to the workflow contextual information about thousands of protein structures solved by NMR in the past using a deep GNN 25 that was trained on BMRB/PDB depositions. Its goal is to predict expected values of yet missing chemical shifts, given the shifts that have already been confidently and unambiguously assigned by FLYA. With these GNN predictions as additional input, the cross-peak positions are reassessed in a second FLYA call, which completes the chemical shift refinement cycle (Fig. 1).

In the structure refinement cycle, 10 variants of NOESY peak lists are generated, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis by varying the confidence threshold of a signal selected by ResNet between 0.05 and 0.5. Each set of NOESY peak lists is used in an independent CYANA structure calculation 22 , 23 , yielding 10 intermediate structure proposals (Fig. 1). The structure proposals are ranked in the intermediate structure selection step based on 96 features with a dedicated GBT model. The selected best structure proposal is used as contextual information in a consecutive FLYA run, which closes the structure refinement cycle.
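The generation of peak list variants can be illustrated with a short Python sketch. Only the threshold range (0.05 to 0.5) and the number of variants (10) come from the text; the even spacing of the thresholds is an assumption, and the peak coordinates and confidences are made up.

```python
def confidence_thresholds(low=0.05, high=0.5, n=10):
    """n evenly spaced thresholds from low to high inclusive (spacing is assumed)."""
    step = (high - low) / (n - 1)
    return [round(low + k * step, 4) for k in range(n)]

def peak_list_variants(peaks, thresholds):
    """peaks: list of (coordinates, confidence). One filtered list per threshold."""
    return [[p for p in peaks if p[1] >= t] for t in thresholds]

# Hypothetical NOESY cross-peaks: (coordinates in ppm, pp-ResNet confidence).
peaks = [
    ((8.1, 120.5, 4.3), 0.95),
    ((7.9, 118.2, 1.1), 0.30),
    ((8.4, 115.0, 0.9), 0.07),
]
thresholds = confidence_thresholds()
variants = peak_list_variants(peaks, thresholds)
```

Low thresholds keep more weak but possibly real signals at the cost of more artefacts, while high thresholds yield shorter and cleaner lists; each variant would then feed one independent structure calculation.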

After the two initial steps of visual spectrum analysis and initial chemical shift assignment, ARTINA alternately executes the two refinement cycles. The chemical shift refinement cycle provides FLYA with tighter restraints on expected chemical shifts, which helps to assign ambiguous cross-peaks. The structure refinement cycle provides information about possible through-space contacts, allowing identified cross-peaks (especially in NOESY) to be reassigned. The high-level concept behind the alternating execution of refinement cycles is to iteratively update the protein structure given fixed chemical shifts, and update chemical shifts given the fixed protein structure. Both refinement cycles are executed three times.
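The control flow described in this overview can be summarized in a Python skeleton. It is one plausible reading of the text, not the ARTINA implementation: the step names in `steps` are hypothetical, each step is a user-supplied callable, and the stubs below only record the order of calls.

```python
def run_workflow(spectra, sequence, steps, n_cycles=3):
    """Control-flow skeleton only; `steps` maps names to user-supplied callables."""
    peaks = steps["pick_peaks"](spectra)   # visual spectrum analysis
    structure = None                       # no structural context in the first pass
    shifts = None
    for _ in range(n_cycles):
        # Chemical shift refinement: assign, predict missing shifts, assign again.
        shifts = steps["assign"](peaks, sequence, None, structure)
        predicted = steps["predict_shifts"](shifts, sequence)
        shifts = steps["assign"](peaks, sequence, predicted, structure)
        # Structure refinement: several proposals, then select the best one.
        proposals = steps["calc_structures"](peaks, shifts, sequence)
        structure = steps["select"](proposals)
    return peaks, shifts, structure

log = []

def _stub(name, result):
    """Build a placeholder step that logs its name and returns a fixed result."""
    def fn(*args):
        log.append(name)
        return result
    return fn

steps = {
    "pick_peaks": _stub("pick_peaks", ["peak"]),
    "assign": _stub("assign", {"shift": 1.0}),
    "predict_shifts": _stub("predict_shifts", {"shift": 1.1}),
    "calc_structures": _stub("calc_structures", ["s1", "s2"]),
    "select": _stub("select", "s1"),
}
peaks, shifts, structure = run_workflow(["spectrum"], "MKV", steps)
```

With three cycles the assignment step runs six times and the structure calculation three times, matching the nine captured states (three per cycle) mentioned in the ablation study.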

Automated visual analysis of the spectrum

We established two machine learning models for the visual analysis of multidimensional NMR spectra (see downloads in the Code availability section). In their design, we made no assumptions about the downstream task and the 2D/3D/4D experiment type. Therefore, the proposed models can be used as the starting point of our automated structure determination procedure, as well as for any other task that requires cross-peak coordinates.

The automated visual analysis starts by selecting all extrema \(\boldsymbol{x}=\{\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{N}\}\), \(\boldsymbol{x}_{n}\in\mathbb{N}^{D}\), in the NMR spectrum, which is represented as a \(D\)-dimensional regular grid storing signal intensities at discrete frequencies. We formulated the peak picking task as an object detection problem, where possible object positions are confined to \(\boldsymbol{x}\). This task was addressed by training a deep residual neural network 24 , in the following denoted as peak picking ResNet (pp-ResNet), which learns a mapping \(\boldsymbol{x}_{n}\to[0,1]\) that assigns to each signal extremum a real-valued score, which resembles its probability of being a true signal rather than an artefact.
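The candidate selection step can be illustrated for the 2D case with a minimal Python sketch. It assumes that an extremum is a grid point that is strictly larger or strictly smaller than all of its adjacent grid points (up to 8 neighbors); the exact neighborhood definition used by ARTINA is not given here, and the toy spectrum is invented.

```python
from itertools import product

def local_extrema(grid):
    """Return indices of strict local maxima and minima in a 2D list of intensities.
    Neighbors are the up-to-8 adjacent grid points; border points are included."""
    rows, cols = len(grid), len(grid[0])
    extrema = []
    for i, j in product(range(rows), range(cols)):
        neighbors = [grid[a][b]
                     for a in range(max(0, i - 1), min(rows, i + 2))
                     for b in range(max(0, j - 1), min(cols, j + 2))
                     if (a, b) != (i, j)]
        v = grid[i][j]
        if all(v > n for n in neighbors) or all(v < n for n in neighbors):
            extrema.append((i, j))
    return extrema

spectrum = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 2, 9, 2, 0],
    [0, 1, 2, 1, -4],
    [0, 0, 0, 0, 0],
]
candidates = local_extrema(spectrum)  # one positive and one negative extremum
```

Each candidate position would then be scored by the classifier, so the network never has to propose positions itself; it only accepts or rejects extrema.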

Our network architecture is strongly linked to ResNet-18 24 . It contains 8 residual blocks, followed by a single fully connected layer with sigmoidal activation. After weight initialization with Glorot Uniform 43 , the architecture was trained by optimizing a binary cross-entropy loss using Adam 44 with learning rate \(10^{-4}\) and gradient clipping of 0.5.

To establish an experimental training dataset for pp-ResNet, we normalized the 1329 spectra in our benchmark with respect to resolution (adjusting the number of data grid points per unit chemical shift (ppm) using linear interpolation) and signal amplitude (scaling the spectrum by a constant). Subsequently, 675,423 diverse 2D fragments of size 256 × 32 × 1 were extracted from the normalized spectra and manually annotated, yielding 98,730 positive and 576,693 negative class training examples. During the training process, we additionally augmented this dataset by flipping spectrum fragments along the second dimension (32 pixels), stretching them by 0–30% in the first and second dimensions, and perturbing signal intensities with Gaussian noise addition.
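Two of the augmentations mentioned above, flipping along the second dimension and Gaussian intensity perturbation, can be sketched in plain Python. The flip probability and the noise level are assumptions chosen for illustration, and the stretching augmentation is omitted because it would require an interpolation scheme that is not specified here.

```python
import random

def flip_second_dim(fragment):
    """Mirror a 2D fragment (list of rows) along its second dimension."""
    return [row[::-1] for row in fragment]

def add_gaussian_noise(fragment, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every point."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in fragment]

def augment(fragment, rng, flip_prob=0.5, sigma=0.01):
    """Randomly flip, then perturb intensities; flip_prob and sigma are assumed."""
    out = flip_second_dim(fragment) if rng.random() < flip_prob else fragment
    return add_gaussian_noise(out, sigma, rng)

rng = random.Random(42)
# Toy 256 x 32 patch with distinct values so that a flip is detectable.
fragment = [[float(32 * i + j) for j in range(32)] for i in range(256)]
augmented = augment(fragment, rng)
```

The augmented fragment keeps the 256 × 32 shape, so it can be fed to the same network input as the original fragment.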

The role of the pp-ResNet is to quickly iterate over signal extrema in the spectrum, filtering out artefacts and selecting approximate cross-peak positions for the downstream task. The relatively small network architecture (8 residual blocks) and input size of 2D 256 × 32 image patches make it possible to analyze large 3D 13 C-resolved NOESY spectra in less than 5 min on a high-end desktop computer. Simultaneously, the first dimension of the image patch (256 pixels) provides long-range contextual information on the possible presence of signals aligned with the current extremum (e.g., C α , C β cross-peaks in an HNCACB spectrum).

Extrema classified with high confidence as true signals by pp-ResNet undergo subsequent analysis with a second deep residual neural network (deconv-ResNet). Its objective is to perform signal deconvolution, based on a 3D spectrum fragment (64 × 32 × 5 voxels) that is cropped around a signal extremum selected by pp-ResNet. This task is defined as a regression problem, where deconv-ResNet outputs a 3 × 3 matrix storing 3D coordinates of up to 3 deconvolved peak components, relative to the center of the input image. To ensure permutation invariance with respect to the ordering of components in the output coordinate matrix, and to allow for a variable number of 1–3 peak components, the architecture was trained with a Chamfer distance loss 45 .
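To show why a Chamfer-type loss is insensitive to the ordering of components, the following Python sketch implements one common variant: the mean nearest-neighbor Euclidean distance, summed over both directions. The exact variant used for deconv-ResNet, and how unused rows of the 3 × 3 output are handled, are not specified here, so this is an illustration rather than the training loss itself.

```python
import math

def chamfer_distance(predicted, target):
    """Symmetric Chamfer distance between two non-empty sets of 3D points."""
    def one_way(src, dst):
        # Mean distance from each source point to its nearest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(predicted, target) + one_way(target, predicted)

target = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
reordered = [(4.0, 0.0, 0.0), (0.0, 0.0, 0.0)]  # same components, other order
shifted = [(0.0, 1.0, 0.0), (4.0, 1.0, 0.0)]    # both components off by 1
merged = [(2.0, 0.0, 0.0)]                      # one prediction for two components
```

Here `chamfer_distance(reordered, target)` is zero, so the ordering of components is irrelevant, and the two point sets may differ in size, which accommodates a variable number of components.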

Since deconv-ResNet deals only with true signals and their local neighborhood, its training dataset can be conveniently generated. We established a spectrum fragment generator, based on rules reflecting the physics of NMR, which produced 110,000 synthetic training examples (Supplementary Fig.  1 ) having variable (a) numbers of components to deconvolve (1–3), (b) signal-to-noise ratio, (c) component shapes (Gaussian, Lorentzian, and mixed), (d) component amplitude ratios, (e) component separation, and (f) component neighborhood type (i.e., NOESY-like signal strips or HSQC-like 2D signal clusters). The deconv-ResNet model was thus trained on fully synthetic data.
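A rule-based generator of this kind can be illustrated in one dimension with a short Python sketch. The lineshape formulas are the standard Gaussian and Lorentzian profiles parameterized by full width at half maximum; the linear blend used for the mixed shape, the 1D simplification, and all parameter values are assumptions and do not reproduce the generator used for deconv-ResNet.

```python
import math
import random

def gaussian(x, center, fwhm):
    """Gaussian lineshape with unit height and the given full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def lorentzian(x, center, fwhm):
    """Lorentzian lineshape with unit height and the given full width at half maximum."""
    hwhm = fwhm / 2.0
    return hwhm ** 2 / ((x - center) ** 2 + hwhm ** 2)

def mixed(x, center, fwhm, eta=0.5):
    """Linear blend: eta * Lorentzian + (1 - eta) * Gaussian."""
    return eta * lorentzian(x, center, fwhm) + (1.0 - eta) * gaussian(x, center, fwhm)

def synthetic_trace(n_points, components, noise_sigma, rng):
    """components: list of (shape_fn, center, fwhm, amplitude) tuples."""
    return [sum(a * f(x, c, w) for f, c, w, a in components) + rng.gauss(0.0, noise_sigma)
            for x in range(n_points)]

rng = random.Random(0)
# Two overlapping components with different shapes and an amplitude ratio of 1 : 0.6.
components = [(gaussian, 28, 6, 1.0), (lorentzian, 36, 6, 0.6)]
trace = synthetic_trace(64, components, noise_sigma=0.02, rng=rng)
```

Because the component centers are known by construction, every generated example comes with exact regression targets, which is what makes fully synthetic training feasible for the deconvolution task.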

Signal unaliasing

To use ResNet predictions in automated chemical shift assignment and structure calculation, detected cross-peak coordinates must be transformed from the spectrum coordinate system to their true resonance frequencies. We addressed the problem of automated signal unfolding with the classical machine learning approach to density estimation.

At first, we generated \(10^{5}\) cross-peaks associated with each experiment type supported by ARTINA (Supplementary Table  1 ). In this process, we used randomly selected chemical shift lists deposited in the BMRB database, excluding depositions associated with our benchmark proteins. Subsequently, we trained a Kernel Density Estimator (KDE):

\[ p_{e}(\boldsymbol{x}) = \frac{1}{N_{e}} \sum_{i=1}^{N_{e}} \kappa\left(\boldsymbol{x} - \boldsymbol{x}_{i}^{(e)}\right), \]

which captures the distribution \(p_{e}(\boldsymbol{x})\) of true peaks being present at position \(\boldsymbol{x}\) in spectrum type \(e\), based on \(N_{e} = 10^{5}\) cross-peak coordinates \(\boldsymbol{x}_{i}^{(e)}\) generated with BMRB data, and \(\kappa\) being the Gaussian kernel.

Unfolding a \(k\)-dimensional spectrum is defined as a discrete optimization problem, solved independently for each cross-peak \(\boldsymbol{x}_{j}^{(e)}\) observed in a spectrum of type \(e\):

\[ \boldsymbol{s}^{*} = \underset{\boldsymbol{s} \in \mathbb{Z}^{k}}{\arg\max}\; p_{e}\left(\boldsymbol{x}_{j}^{(e)} + \boldsymbol{w} \circ \boldsymbol{s}\right), \]

where \(\boldsymbol{w} \in \mathbb{R}^{k}\) is a vector storing the spectral widths in each dimension (ppm units), \(\circ\) is element-wise multiplication, \(\boldsymbol{s} \in \mathbb{Z}^{k}\) is a vector indicating how many times the cross-peak is unfolded in each dimension, and \(\boldsymbol{s}^{*} \in \mathbb{Z}^{k}\) is the optimal cross-peak unfolding.

As long as regular and folded signals do not overlap or have different signs in the spectrum, KDE can unfold the peak list regardless of spectrum dimensionality. The spectrum must not be cropped in the folded dimension, i.e., the folding sweep width must equal the width of the spectrum in the corresponding dimension.
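The unfolding step described above can be sketched in a few lines: a Gaussian kernel density is built from reference peak positions, and each observed peak is shifted by integer multiples of the spectral width until the density is maximal. The isotropic kernel, the bandwidth value, and the `max_folds` search limit are assumptions for illustration only; in practice the 1 H and 13 C/ 15 N dimensions have very different scales and would need per-dimension bandwidths.

```python
import itertools
import math


def gaussian_kde(points, bandwidth):
    """Return a function p(x): mean of isotropic Gaussian kernels centred on `points`."""
    k = len(points[0])
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-k / 2.0)

    def density(x):
        total = 0.0
        for p in points:
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        return norm * total / len(points)

    return density


def unfold_peak(x, widths, density, max_folds=2):
    """Return the integer vector s that maximises density(x + s * w), element-wise."""
    best_s, best_p = None, -1.0
    for s in itertools.product(range(-max_folds, max_folds + 1), repeat=len(x)):
        shifted = [xi + si * wi for xi, si, wi in zip(x, s, widths)]
        p = density(shifted)
        if p > best_p:
            best_s, best_p = s, p
    return best_s


# Toy 2D example: reference (1H, 15N) positions and a peak folded once in 15N.
reference = [(8.0, 120.0), (8.2, 118.0), (7.9, 122.0)]
density = gaussian_kde(reference, bandwidth=1.0)
print(unfold_peak((8.1, 90.0), (12.0, 30.0), density))
```

The exhaustive search over a small range of `s` is feasible because each peak is unfolded independently and spectra are rarely folded more than a few times.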

All 2D/3D spectra in our benchmark were folded in at most one dimension and satisfy the aforementioned requirements. However, the 4D CC-NOESY spectra satisfy neither, as regular and folded peaks both overlap and have the same signal amplitude sign. This introduces ambiguity in the spectrum unfolding that prevents direct use of the KDE technique. To retrieve original signal positions, 4D CC-NOESY cross-peaks were unfolded to overlap with signals detected in 3D 13 C-NOESY. In consequence, 4D CC-NOESY unfolding depended on other experiments, and individual 4D cross-peaks were retained only if they were confirmed in a 3D experiment.

Chemical shift assignment

Chemical shift assignment is performed with the existing FLYA algorithm 12 that uses a genetic algorithm combined with local optimization to find an optimal matching between expected and observed peaks. FLYA uses as input the protein sequence, lists of peak positions from the available spectra, chemical shift statistics, either from the BMRB 39 or the GNN described in the next section, and, if available, the structure from the previous refinement cycle. The tolerance for the matching of peak positions and chemical shifts was set to 0.03 ppm for 1 H, and 0.4 ppm for 13 C/ 15 N shifts. Each FLYA execution comprises 20 independent runs with identical input data that differ in the random numbers used in the optimization algorithm. Nuclei for which at least 80% of the 20 runs yield, within tolerance, the same chemical shift value are classified as reliably assigned 12 and used as input for the following chemical shift refinement step.
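The reliability criterion can be illustrated with a simplified sketch that, for a single nucleus, checks whether at least 80% of the runs agree on a chemical shift within the tolerance. The function name and the way the consensus value is chosen are assumptions; the source does not specify how FLYA computes the consensus internally.

```python
def consensus_shift(run_values, tolerance, min_fraction=0.8):
    """Return (shift, reliable) for one nucleus from independent runs.

    Assumption: the consensus is the run value that agrees, within
    `tolerance`, with the largest number of runs.
    """
    best_value, best_count = None, 0
    for candidate in run_values:
        count = sum(1 for v in run_values if abs(v - candidate) <= tolerance)
        if count > best_count:
            best_value, best_count = candidate, count
    reliable = best_count / len(run_values) >= min_fraction
    return best_value, reliable


# 17 of 20 runs agree within 0.03 ppm, so this 1H shift counts as reliable.
print(consensus_shift([8.20] * 17 + [7.50] * 3, tolerance=0.03))
```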

Chemical shift refinement

We used a graph data structure to combine FLYA-assigned shifts with information from previously assigned proteins (BMRB records) and possible spatial interactions. Each node corresponds to an atom in the protein sequence, and is represented by a feature vector composed of (a) a one-hot encoded atom type code (e.g., C α , H β ), (b) a one-hot encoded amino acid type, (c) the value of the chemical shift assigned by FLYA (only if a confident assignment is available, zero otherwise), (d) atom-specific BMRB shift statistics (mean and standard deviation), and (e) 30 chemical shift values obtained from BMRB database fragments. The latter feature is obtained by searching BMRB records for assigned 2–3-residue fragments that match the local protein sequence and have minimal mean squared error (MSE) to shifts confidently assigned by FLYA (non-zero values of feature (c) in the local neighborhood of the atom). The edges of the graph correspond to chemical bonds or skip connections. The latter connect the C β atom of a given residue with C β atoms 2, 3, and 5 residues apart in the amino acid sequence, and are intended to capture possible through-space influences on the chemical shift that are typically observed in secondary structure elements.
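The edge construction can be sketched as follows, assuming atoms are given as (residue index, atom name) pairs and chemical bonds as pairs of node indices. The data layout and the function name are hypothetical; the source does not say how residues without a C β atom (glycine) are handled, and this sketch simply skips them.

```python
def build_edges(atoms, bonds, skips=(2, 3, 5)):
    """Combine chemical-bond edges with C-beta skip connections.

    atoms: list of (residue_index, atom_name); the list position is the node index.
    bonds: list of (i, j) node index pairs for chemical bonds.
    """
    edges = set()
    for i, j in bonds:
        edges.add((min(i, j), max(i, j)))
    # Node index of the CB atom for each residue that has one.
    cb = {res: idx for idx, (res, name) in enumerate(atoms) if name == "CB"}
    for res, idx in cb.items():
        for step in skips:
            other = cb.get(res + step)
            if other is not None:
                edges.add((min(idx, other), max(idx, other)))
    return sorted(edges)


# Six residues with CA and CB atoms, and a single CA-CB bond in residue 1.
atoms = [(r, n) for r in range(1, 7) for n in ("CA", "CB")]
edges = build_edges(atoms, bonds=[(0, 1)])
print(edges)
```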

The chemical shift refinement task is defined as a node regression problem, where an expected value of the chemical shift is predicted for each atom that lacks a confident FLYA assignment. This task is addressed with a DeepGCN model 25 , 26 that was trained on 28,400 graphs extracted from 2840 referenced BMRB records 39 . Each training example was created by building a fully assigned graph out of a single BMRB record, and dropping chemical shift values (feature (c) above) for randomly chosen atoms that FLYA typically assigns either with low confidence or inaccurately.

Our DeepGCN model is designed specifically for de novo structure determination, as it uses only the protein sequence and partial shift assignments to estimate values of missing chemical shifts. Its predictions are used to guide the FLYA genetic algorithm optimization 12 by reducing its search range for assignments. The precise final chemical shift value is always determined by the position of a signal in the spectrum, rather than the model prediction alone.

Torsion angle restraints

Before each structure calculation step, torsion angle restraints for the ϕ and ψ angles of the polypeptide backbone were obtained from the current backbone chemical shifts using the program TALOS-N 21 . Restraints were only generated if TALOS-N classified the prediction as ‘Good’, ‘Strong’, or ‘Generous’. Given a TALOS-N torsion angle prediction of ϕ ± Δ ϕ , the allowed range of the torsion angle was set to ϕ ± max(Δ ϕ , 10°) for ‘Good’ and ‘Strong’ predictions, and ϕ ± 1.5 max(Δ ϕ , 10°) for ‘Generous’ predictions, and likewise for ψ .
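The rule above translates directly into a small helper. This is a minimal sketch: the function name is an assumption, angles are in degrees, and wrapping of the range to the interval from −180° to 180° is not handled.

```python
def torsion_range(angle, delta, quality):
    """Return the allowed (lower, upper) range in degrees, or None if no restraint is generated."""
    half_width = max(delta, 10.0)
    if quality in ("Good", "Strong"):
        pass  # range is angle ± max(delta, 10°)
    elif quality == "Generous":
        half_width *= 1.5  # range is angle ± 1.5 max(delta, 10°)
    else:
        return None  # other TALOS-N classifications yield no restraint
    return angle - half_width, angle + half_width


print(torsion_range(-60.0, 5.0, "Strong"))
print(torsion_range(-60.0, 20.0, "Generous"))
```

The same function applies to both ϕ and ψ, since the source states that ψ is treated likewise.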

Structure calculation and selection

Given the chemical shift assignments and NOESY cross-peak positions and intensities, the structure is calculated with CYANA 23 using the established method 22 that comprises 7 cycles of NOESY cross-peak assignment and structure calculation, followed by a final structure calculation. In total, 8 × 100 conformers are calculated for a given input data set using 30,000 torsion angle dynamics steps per conformer. The 20 conformers with the lowest final target function value are chosen to represent the solution structure proposal. The entire combined NOESY assignment and structure calculation procedure is executed independently 10 times based on 10 variants of NOESY peak lists, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis. The first set generously includes all signals selected by ResNet with confidence ≥0.05. The other variants of NOESY peak lists follow the same principle with increasingly restrictive confidence thresholds of 0.1, 0.15, …, 0.5.
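The ten NOESY peak list variants can be sketched as a simple filter over the ResNet confidence values. The function name and the representation of peaks as (identifier, confidence) pairs are assumptions made for illustration.

```python
def noesy_peak_list_variants(peaks):
    """Return {threshold: [peak_id, ...]} for thresholds 0.05, 0.10, ..., 0.50.

    peaks: list of (peak_id, confidence) pairs.
    """
    thresholds = [round(0.05 * k, 2) for k in range(1, 11)]
    return {t: [pid for pid, conf in peaks if conf >= t] for t in thresholds}


peaks = [("a", 0.04), ("b", 0.07), ("c", 0.30), ("d", 0.90)]
variants = noesy_peak_list_variants(peaks)
print(variants[0.05], variants[0.5])
```

Each variant would then be passed to an independent combined NOESY assignment and structure calculation.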

The CYANA structure calculations are followed by a structure selection step, wherein the 10 intermediate structure proposals are compared pairwise by a Gradient Boosted Tree (GBT) model that uses 96 features from each structure proposal (including the CYANA target function value 23 , number of long-range distance restraints, etc.; for details, see downloads in the Code availability section) to rank the structures by their expected accuracy. The best structure from the ranking is subsequently used as contextual information for the chemical shift refinement cycle (Fig.  1 ), or returned as the final outcome of ARTINA. The second-best final structure is also returned for comparison.

To train GBT, we collected a set of successful and unsuccessful structure calculations with CYANA. Each training example was a tuple ( s i , r i ), where s i is the vector of features extracted from the CYANA structure calculation output, and r i is the RMSD of the output structure to the PDB reference. The GBT was trained to take the features s i and s j of two structure calculations with CYANA as input, and to predict a binary order variable o ij , such that o ij = 1 if r i  <  r j , and 0 otherwise. Importantly, the deposited PDB reference structures were not used directly in the GBT model training (they are used only to calculate the RMSDs). Consequently, the GBT model is unaffected by methodology and technicalities related to PDB deposition (e.g., the structure calculation software used to calculate the deposited reference structure).
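The construction of pairwise training examples can be sketched as below. Concatenating the two feature vectors is an assumption; the source only states that the model takes the features of two structure calculations as input and predicts the binary order variable.

```python
def pairwise_examples(calculations):
    """Build ordered pairs for training a pairwise ranking model.

    calculations: list of (features, rmsd) tuples, one per CYANA calculation.
    Returns a list of (s_i + s_j, o_ij) with o_ij = 1 if r_i < r_j, else 0.
    """
    examples = []
    for i, (s_i, r_i) in enumerate(calculations):
        for j, (s_j, r_j) in enumerate(calculations):
            if i != j:
                examples.append((list(s_i) + list(s_j), 1 if r_i < r_j else 0))
    return examples


# Two toy calculations with one feature each and RMSDs of 0.8 and 3.5 Å.
calcs = [([1.0], 0.8), ([2.0], 3.5)]
print(pairwise_examples(calcs))
```

Note that the RMSD values enter only through the label, which mirrors the statement that the reference structures are used solely to calculate the RMSDs.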

Structure accuracy estimate

As an accuracy estimate for the final ARTINA structure, a predicted RMSD to reference (pRMSD) is calculated from the ARTINA results (without knowledge of the reference PDB structure). It aims at reproducing the actual RMSD to reference, which is the RMSD between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, C α , C’ in the residue ranges as given in Supplementary Table  4 . The predicted RMSD is given by pRMSD = (1 – t ) × 4 Å, where, in analogy to the GDT_HA value 46 , t is the average fraction of the RMSDs ≤ 0.5, 1, 2, 4 Å between the mean coordinates of the best ARTINA candidate structure bundle and the mean coordinates of the structure bundles of the 9 other structure proposals. Since t ∈ [0, 1], the pRMSD is always in the range of 0–4 Å, grouping all “bad” structures with expected RMSD to reference ≥ 4 Å at pRMSD = 4 Å.
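The pRMSD formula can be written out as a short function, assuming the RMSDs between the best candidate and the other structure proposals have already been computed. The function and argument names are chosen for this sketch.

```python
def predicted_rmsd(rmsds_to_others, cutoffs=(0.5, 1.0, 2.0, 4.0)):
    """pRMSD = (1 - t) * 4 Å, where t averages the fractions of RMSDs within each cutoff.

    rmsds_to_others: RMSDs (in Å) from the best candidate to the other proposals.
    """
    fractions = [
        sum(1 for r in rmsds_to_others if r <= c) / len(rmsds_to_others)
        for c in cutoffs
    ]
    t = sum(fractions) / len(cutoffs)
    return (1.0 - t) * 4.0


# All nine other proposals lie 0.8 Å from the best one: t = 0.75.
print(predicted_rmsd([0.8] * 9))
```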

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Data availability

Reference structures: PDB Protein Data Bank ( https://www.rcsb.org/ ; accession codes in Fig.  2 and Supplementary Table  3 ).

Spectra and reference assignments: BMRB Biological Magnetic Resonance Data Bank ( https://bmrb.io/ ; entry IDs in Supplementary Table  3 ).

Peak lists, assignments, and structures: https://nmrtist.org/static/public/publications/artina/ARTINA_results.zip and in the ETH Research Collection under DOI 10.3929/ethz-b-000568621.

Source data for Figs.  2 , 4 , and 5 is available in Supplementary Tables  2 , 4 , and 5, respectively.

Code availability

The ARTINA algorithm is available as a webserver at https://nmrtist.org . pp-ResNet, deconv-ResNet, GNN, and GBT are available for download in binary form, together with architecture schemes, example input data, model input description, and source code for reading the model files and making predictions ( https://github.com/PiotrKlukowski/ARTINA , https://nmrtist.org/static/public/publications/artina/models/ {ARTINA_peak_picking.zip, ARTINA_peak_deconvolution.zip, ARTINA_shift_prediction.zip, ARTINA_structure_ranking.zip}). These files provide a full technical specification of the components developed within ARTINA, and allow for their independent use in Python.

Existing software used: Python ( https://www.python.org/ ), CYANA ( https://www.las.jp/ ), TALOS-N ( https://spin.niddk.nih.gov/bax/software/TALOS-N ).

References

Wüthrich, K. NMR studies of structure and function of biological macromolecules (Nobel Lecture). Angew. Chem. Int. Ed. 42 , 3340–3363 (2003).

Sakakibara, D. et al. Protein structure determination in living cells by in-cell NMR spectroscopy. Nature 458 , 102–105 (2009).

Guerry, P. & Herrmann, T. Advances in automated NMR protein structure determination. Q. Rev. Biophys. 44 , 257–309 (2011).

Güntert, P. Automated structure determination from NMR spectra. Eur. Biophys. J. 38 , 129–143 (2009).

Garrett, D. S., Powers, R., Gronenborn, A. M. & Clore, G. M. A common sense approach to peak picking two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J. Magn. Reson. 95 , 214–220 (1991).

Koradi, R., Billeter, M., Engeli, M., Güntert, P. & Wüthrich, K. Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson. 135 , 288–297 (1998).

Würz, J. M. & Güntert, P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J. Biomol. NMR 67 , 63–76 (2017).

Klukowski, P. et al. NMRNet: A deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34 , 2590–2597 (2018).

Li, D. W., Hansen, A. L., Yuan, C. H., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12 , 5229 (2021).

Bartels, C., Güntert, P., Billeter, M. & Wüthrich, K. GARANT—A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18 , 139–149 (1997).

Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269 , 592–610 (1997).

Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134 , 12817–12829 (2012).

Linge, J. P., O’Donoghue, S. I. & Nilges, M. Automated assignment of ambiguous nuclear overhauser effects with ARIA. Methods Enzymol. 339 , 71–90 (2001).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319 , 209–227 (2002).

Allain, F., Mareuil, F., Ménager, H., Nilges, M. & Bardiaux, B. ARIAweb: a server for automated NMR structure calculation. Nucleic Acids Res. 48 , W41–W47 (2020).

Lee, W. et al. I-PINE web server: An integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73 , 213–222 (2019).

Huang, Y. P. J. et al. An integrated platform for automated analysis of protein NMR structures. Methods Enzymol. 394 , 111–141 (2005).

Kobayashi, N. et al. KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J. Biomol. NMR 39 , 31–52 (2007).

López-Méndez, B. & Güntert, P. Automated protein structure determination from NMR spectra. J. Am. Chem. Soc. 128 , 13112–13122 (2006).

Murphy, K. P. Probabilistic Machine Learning: An Introduction (MIT Press, 2022).

Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56 , 227–241 (2013).

Güntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 453–471 (2015).

Güntert, P., Mumenthaler, C. & Wüthrich, K. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273 , 283–298 (1997).

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (2016).

Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://arxiv.org/abs/1609.02907 (2016).

Chiang, W. L. et al. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proc. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD) 257–266 (2019).

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proc. 32nd Conference on Neural Information Processing Systems (NIPS) (2018).

Rosato, A. et al. The second round of Critical Assessment of Automated Structure Determination of Proteins by NMR: CASD-NMR-2013. J. Biomol. NMR 62 , 413–424 (2015).

Kirchner, D. K. & Güntert, P. Objective identification of residue ranges for the superposition of protein structures. BMC Bioinform. 12 , 170 (2011).

Buchner, L. & Güntert, P. Systematic evaluation of combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 81–95 (2015).

Fowler, N. J., Sljoka, A. & Williamson, M. P. A method for validating the accuracy of NMR protein structures. Nat. Commun. 11 , 6321 (2020).

Huang, Y. J., Powers, R. & Montelione, G. T. Protein NMR recall, precision, and F-measure scores (RPF scores): Structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127 , 1665–1674 (2005).

Buchner, L. & Güntert, P. Increased reliability of nuclear magnetic resonance protein structures by consensus structure bundles. Structure 23 , 425–434 (2015).

Koradi, R., Billeter, M. & Güntert, P. Point-centered domain decomposition for parallel molecular dynamics simulation. Comput. Phys. Commun. 124 , 139–147 (2000).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR 24 , 171–189 (2002).

Buchner, L., Schmidt, E. & Güntert, P. Peakmatch: A simple and robust method for peak list matching. J. Biomol. NMR 55 , 267–277 (2013).

Scott, A., López-Méndez, B. & Güntert, P. Fully automated structure determinations of the Fes SH2 domain using different sets of NMR spectra. Magn. Reson. Chem. 44 , S83–S88 (2006).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36 , D402–D408 (2008).

Goddard, T. D. & Kneller, D. G. Sparky 3. (University of California, San Francisco, 2001).

Delaglio, F. et al. NMRPipe—A multidimensional spectral processing system based on Unix pipes. J. Biomol. NMR 6 , 277–293 (1995).

Bartels, C., Xia, T. H., Billeter, M., Güntert, P. & Wüthrich, K. The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J. Biomol. NMR 6 , 1–10 (1995).

Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. Proc. Mach. Learn. Res. 9 , 249–256 (2010).

Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).

Davies, E. R. Computer Vision (Academic Press, 2018).

Kryshtafovych, A. et al. New tools and expanded data analysis capabilities at the protein structure prediction center. Proteins 69 , 19–26 (2007).

Acknowledgements

We thank Drs. Frédéric Allain, Fred Damberger, Hideo Iwai, Harindranath Kadavath, Julien Orts, and Dean Strotz for providing unpublished spectra. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 891690 (P.K.), and a Grant-in-Aid for Scientific Research of the Japan Society for the Promotion of Science (P.G., 20 K06508).

Author information

Authors and affiliations.

Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland

Piotr Klukowski, Roland Riek & Peter Güntert

Institute of Biophysical Chemistry, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany

  • Peter Güntert

Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, 192-0397, Tokyo, Japan

Contributions

P.K. prepared training and test data sets, designed and trained machine learning models, performed experiments described in the manuscript, and implemented ARTINA within the nmrtist.org web platform. P.K. and P.G. wrote the software. P.K., R.R., and P.G. conceived the project, analyzed the results, and wrote the manuscript.

Corresponding authors

Correspondence to Piotr Klukowski , Roland Riek or Peter Güntert .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Benjamin Bardiaux, Gaetano Montelione, Theresa Ramelot, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.  Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Klukowski, P., Riek, R. & Güntert, P. Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. Nat Commun 13 , 6151 (2022). https://doi.org/10.1038/s41467-022-33879-5

Received : 28 March 2022

Accepted : 30 September 2022

Published : 18 October 2022

DOI : https://doi.org/10.1038/s41467-022-33879-5

Royal Society of Chemistry

Highly acidic N -triflylphosphoramides as chiral Brønsted acid catalysts: the effect of weak hydrogen bonds and multiple acceptors on complex structures and aggregation †

First published on 29th April 2024

N -Triflylphosphoramides (NTPAs) represent an important catalyst class in asymmetric catalysis due to their multiple hydrogen bond acceptor sites and their acidity, which is several orders of magnitude higher than that of conventional chiral phosphoric acids (CPAs). Thus, NTPAs allow for several challenging transformations that are not accessible with CPAs. However, detailed evidence on their hydrogen bonding situation, complex structures and aggregation is still lacking. Therefore, this study covers the hydrogen bonding behavior and structural features of binary NTPA/imine complexes compared to their CPA counterparts. Deviating from the single-well potential hydrogen bonds commonly observed in CPA/imine complexes, the NTPA/imine complexes exhibit a tautomeric equilibrium between two proton positions. Low-temperature NMR at 180 K supported by computer simulations indicates an OHN hydrogen bond between the phosphoramide oxygen and the imine, rather than the NHN hydrogen bond that is most often proposed. Furthermore, this study finds no evidence for the existence of dimeric NTPA/NTPA/imine complexes as previously suggested for CPA systems, both synthetically and through NMR studies.

Introduction

However, the investigated classical chiral phosphoric acid catalysts are still limited to reactive substrates. 2 To overcome this limitation, the groups of List and Yamamoto reported the synthesis of other highly acidic catalysts such as disulfonimides (DSIs), 20 imidodiphosphorimidates (IDPis) 21 or N -triflylphosphoramides (NTPAs). 22 These catalysts differ significantly from CPAs in their increased acidity, chiral microenvironment, and the presence of multiple hydrogen bond acceptor sites, potentially enabling both OHN and NHN hydrogen bonds. 23 For DSIs, detailed investigations by our group revealed the existence of binary DSI/imine complex structures with hydrogen bonds to both acceptors, the nitrogen and the oxygen of the DSI. 24,25 Furthermore, IDPi-catalyzed transformations were closely investigated by the group of List in an attempt to locate and characterize ionic or covalent species. 26,27 However, to the best of our knowledge, the structure and hydrogen bond situation of NTPAs have not yet been studied intensively. Several challenging transformations, such as cycloadditions or addition reactions to imines, which are not feasible with CPAs, could be realized with NTPAs. 23,28,29 For example, the group of Yamamoto reported an asymmetric Mukaiyama–Mannich reaction in which less reactive aldimines, without an N -(2-hydroxyphenyl) moiety, are activated by NTPA. 30 This raises the question of how the use of a stronger Brønsted acid such as NTPA influences the structure of the reaction intermediates and the hydrogen bond strength. 31 Furthermore, the incorporation of an N -trifluoromethanesulfonyl group into the CPA system breaks the C 2 -symmetry of the catalyst and introduces multiple hydrogen bond acceptors ( Fig. 1B ). Hence, the question arises which of the several hydrogen bond acceptors takes part in the resulting hydrogen bond. So far, Yamamoto et al. and other groups suggested that the negative charge is located at the nitrogen and consequently a hydrogen bond exists between the nitrogen of the NTPA catalyst and the substrate. 30,32–34 However, Yoon et al. isolated a substrate-acid complex whose X-ray crystal structure indicates a hydrogen bond between a protonated imidazolium and the phosphoramide oxygen of the NTPA catalyst. 35 Nevertheless, crystal structures do not always align with intermediates observed in solution. 36 Moreover, it is also possible that the catalyst forms two hydrogen bonds with one substrate, leading to a bifunctional activation mode. 23 Therefore, the question remains what kind of H-bonding is active in NTPAs in solution. Low-temperature NMR spectroscopy at 180 K has proven to be an excellent tool to investigate the occurring intermediates as well as potential hydrogen bonds, which are a sensitive experimental indicator for the binding situation within the binary complexes. 10,12

Therefore, binary complexes of an NTPA catalyst with 11 different N -arylimines were investigated by low-temperature NMR. First, an in-depth analysis of the hydrogen bond situation in these binary NTPA/imine complexes was performed. The hydrogen bond was characterized by analyzing 1 H and 15 N chemical shifts in a Steiner–Limbach correlation and compared with the corresponding CPA/imine complexes. 37–39 Second, various two-dimensional NMR techniques were applied at 180 K for a chemical shift assignment of the NTPA/imine systems. Based on the assignment, characteristic NOE cross signals between the catalyst and substrate were analyzed. Structures for the binary complexes based on NOE cross signals were supported by molecular dynamics (MD) simulations. In particular, the effect of the increased number of hydrogen bond acceptors of the NTPA compared to the CPA is addressed. Finally, the NTPA catalyst was examined for its potential involvement in a dimeric reaction pathway in the Mukaiyama–Mannich reaction.

Results and discussion

Model system, hydrogen bond analysis.

Whereas the CPA/imine systems show chemical shifts for the hydrogen bond proton above 16 ppm and follow a parabolic curve revealing very strong hydrogen bonds ( Fig. 3 ), the NTPA/imine E - and Z -complexes exhibit high-field shifts for both 1 H and 15 N (for 1 H and 15 N spectra see ESI Fig. S1–S11 † ). As a result, these complexes were positioned far down on the left side of the Steiner–Limbach curve, close to the almost pure ion pair with HBF 4 . The position of the NTPA complexes showed that the proton within the hydrogen bond is significantly shifted toward the imine nitrogen, suggesting weak or no hydrogen bonds. However, upon closer examination of the Steiner–Limbach curve, there is another significant difference between the CPA/imine complexes and the current NTPA/imine complexes. In contrast to the CPA systems, which aligned perfectly with the parabolic Steiner–Limbach correlation curve, the data points of the NTPA systems formed a straight line, deviating from the parabolic dependency.

This was the first time we observed such a trend within our catalyst systems, and it can be interpreted as explained above: the parabolic correlation curve is valid only for strong H-bonds and not for a proton-transfer equilibrium between two tautomeric forms. 37,40 Therefore, the deviation of NTPA 1 /imine complexes from the parabolic 1 H/ 15 N chemical shift correlation curve indicates a double-well potential in these complexes. 37,40 This means that the proton is located either at the nitrogen of the imine or at the nitrogen/oxygen of the NTPA catalyst, and that upon variation of the imine basicity only the relative populations of the two positions change, not the hydrogen bond itself, which explains the linear correlation across the different imines. Indeed, upon examination of the 1h J NH coupling constants of the binary NTPA 1 /imine complexes, coupling constants of 1h J NH ∼ 87–91 Hz are detected, which are larger than for all CPA/imine complexes ( 1h J NH between 82 and 86 Hz). 12 1h J NH coupling constants are a fundamental indicator of the bond strength between the proton and the nitrogen of the imine. The observed 1h J NH coupling constants for the NTPA 1 /imine complexes differ by only about 2–5 Hz from those of the completely protonated HBF 4 /imine complexes. Again, this suggests very weak hydrogen bonds and a tautomeric equilibrium between two proton positions.

Structure elucidation and hydrogen bond position

As the binary complex between NTPA 1 and imine 2 produced spectra with good signal dispersion in the hydrogen bond region, this system was chosen for the investigations ( Fig. 4a ).

The results from the modified 1 H, 31 P-HMBC experiments demonstrated that scalar coupling is the decisive factor for the magnetization transfer in the NTPA 1 / Z -imine complex, as opposed to chemical shift anisotropy (CSA) or dipole interactions ( Fig. 4b ). The doublet corresponding to the NTPA 1 / Z -imine hydrogen bond was readily apparent in the modified 1 H, 31 P-HMBC sequence that only allows magnetization transfer through scalar coupling. No signals were detected with the modified 1 H, 31 P-HMBC sequence that only allows magnetization transfer through 1 H chemical shift anisotropy ( 1 H-CSA) and 1 H, 31 P dipolar interaction (DD) cross relaxation. This means that scalar coupling, rather than 1 H-CSA and DD cross relaxation, is the main origin of the magnetization transfer. Detecting these cross peaks was an unexpected achievement. From the NMR spectra we cannot differentiate whether the origin is a real 2h J PH transfer with the proton attached to the imine nitrogen or whether it comes from the low populated PXH position of the double-well potential. The detection of cross peaks exclusively in the binary Z -complex can be attributed to two phenomena. First, the E -complex is involved in an exchange with the free imine (not active for Z ), resulting in partial decoupling and therefore missing cross peaks. This is evident from EXSY spectra (see ESI Fig. S15 and S16 † ) and results in an exchange-broadened line width of the hydrogen-bonded proton in the 1 H spectrum (half line widths E -complex ∼ 50.1 Hz, Z -complex ∼ 24.7 Hz). Second, the observed 1h J NH coupling constants for the NTPA 1 /imine Z -complexes are slightly smaller than for the binary E -complexes, indicating a stronger hydrogen bond and thus potentially supporting 2h J PH magnetization transfer.

To conclude, our findings thus far demonstrate the presence of a weak hydrogen bond, characterized by a tautomeric equilibrium between two proton positions, in NTPA/imine complexes with a strong preference for the protonated imine. This phenomenon differs from what was observed in previously studied CPA systems. 10,12 Nevertheless, despite demonstrating the presence of a weak hydrogen bond with the 1 H, 31 P HMBC experiments, there was still no evidence regarding which hydrogen bond acceptor is involved in the hydrogen bond. So far, the negative charge was mostly predicted to be at the nitrogen and consequently the hydrogen bond was proposed between the nitrogen of the NTPA catalyst and the substrate. 30,32–34 However, in principle, there are four potential hydrogen bond acceptors including the phosphoramide oxygen or nitrogen, as well as the triflyl oxygens. Based on the findings of the 1 H, 31 P HMBC experiments, we can ignore the triflyl oxygens and suggest either the phosphoramide nitrogen or oxygen as the most likely hydrogen bond acceptor. To gain further insights, 15 N labeling of NTPA 1 was necessary. This way, potential trans -hydrogen bond scalar couplings ( 1h J NH and 2h J NN ) could be detected and therefore prove a hydrogen bond between the imine and nitrogen of the NTPA, as was proposed previously. 30,32–34 Hence, the BINOL-based 15 N-labeled NTPA 1 was synthesized according to the procedure of Yamamoto et al. 22 Since the first trans -hydrogen bond scalar couplings were measured between nitrogen atoms of Watson–Crick base pairs in 15 N-labeled RNA, with large scalar couplings of 2h J NN ≈ 7 Hz, strong trans -hydrogen bond 2h J NN scalar couplings were also expected in the NTPA complexes. 43

Moreover, concerning the NTPA complexes, these scalar couplings through hydrogen bonds should serve as the most effective indicator for detecting the presence of weak hydrogen bonds to the NTPA nitrogen, considering the presence of multiple other hydrogen bond acceptors. Specifically, in case of a hydrogen bond to the NTPA nitrogen, these 1h J NH scalar couplings are expected to be larger than the 2h J PH scalar couplings already detected in the modified 1 H, 31 P-HMBC experiments. However, even with 15 N-labeled NTPA 1 neither the 2h J NN coupling nor the 1h J NH coupling between the acidic proton and NTPA nitrogen was detectable in a 1D 15 N or 2D 1 H, 15 N-HSQC spectrum (see ESI Fig. S12 † ). Two reasons account for this phenomenon. First, the 15 N signal of the free labeled NTPA is sharp (half line width ∼ 3.4 Hz), while the 15 N signal of the NTPA in the binary complex is broadened significantly (half line width ∼ 53.7 Hz, for spectra see ESI Fig. S13 † ). This line broadening again suggests exchange processes and therefore missing cross peaks via partial decoupling. These exchange processes may derive from either an exchange of the hydrogen bond proton acceptor (nitrogen and oxygen) or conformational exchange.

Secondly, nitrogen might not be the main contributor in hydrogen bonding. The expected scalar coupling for an NHN hydrogen bond is around one order of magnitude larger than for a PXH situation. Furthermore, the 15 N line widths of both E - and Z -complexes are only double that of 31 P in the binary Z -complex. Thus, the modified 1 H, 31 P-HMBC experiments indicate that, at least for the binary Z -complex, cross peaks should be observable in the 2D 1 H, 15 N-HSQC spectrum as well if a hydrogen bond exists between the imine and nitrogen of the NTPA. Combining the results of the 1 H, 31 P-HMBC and the 2D 1 H, 15 N-HSQC spectrum, the phosphoramide oxygen must be regarded as the main hydrogen bond acceptor. This contradicts previous suggestions, which proposed a hydrogen bond between the imine and nitrogen of the NTPA. 30,32–34

To corroborate the position of the hydrogen bond, further structural investigations of the NTPA/imine complexes via low-temperature NMR spectroscopy and molecular dynamics (MD)-simulations were applied. In previous studies of CPA/imine complexes, we identified the presence of two different orientations for each isomer within the binary complexes, independent of the CPA and the substitution of the N -arylimine. 11,44 Even at 180 K, the different orientations exhibit fast exchange on the NMR time scale. 11,44 However, due to significantly weaker hydrogen bonds in the NTPA/imine complexes and the multiple hydrogen bond acceptors, additional structures compared to CPAs were expected. Still, to achieve a full structure elucidation, 1 H, 1 H–COSY, 1 H, 1 H-TOCSY, 1 H, 1 H-NOESY, 1 H, 19 F-HOESY, 1 H, 13 C-HSQC, 1 H, 13 C-HMBC, 1 H, 15 N-HMBC and 1 H, 31 P-HMBC were performed with 15 N-labeled imine 2 and NTPA 1 at 180 K.

Based on intense NOEs between imine 2 and the BINOL backbone of the NTPA 1 (for spectra see ESI Fig. S17–S21 † ), computer simulations suggest a structure motif for the binary E -complex as shown in Fig. 5 top (for details see ESI † ). Various structures with a hydrogen bond to the phosphoramide oxygen were obtained from classical molecular dynamics simulations: one of the main structures resembles the type I E (here type I E 0 ), another one the type II E (here type II E 0 ) of the CPA complexes. The individual species were compared to the NOE data, which allowed for the identification of type I E 0 shown in Fig. 5 top as the most likely candidate for the experimental data. Again, the hydrogen bond was located between the phosphoramide oxygen and the nitrogen of the imine. The type I E 0 species was additionally confirmed to correspond to a density functional minimum structure (B3LYP def2-TZVP with implicit CPCM solvent; for details see ESI † ). Furthermore, displacing the H-bond towards N − again revealed the type I E 0 structure as the primary minimum (for details see ESI † ). This suggests an H-bond as shown in Fig. 5 . These findings for the binary NTPA 1 / E -imine complex complement the hydrogen bond situation identified in the NTPA 1 / Z -imine complex, as confirmed by scalar coupling in a 1 H, 31 P-HMBC experiment.

For the first time, there is now evidence that in NTPA catalyzed reactions, intermediates have a strong ion-pair character supported by an extremely weak hydrogen bond between the phosphoramide oxygen and the nitrogen of the imine. This finding opposes the previously proposed activation mode for NTPA-catalyzed reactions suggesting that the negative charge and consequently the hydrogen bond exists between the nitrogen of the NTPA catalyst and the substrate. 30,32–34 Our detailed hydrogen bond analysis, as well as structural investigations via low-temperature NMR at 180 K supported by computer simulations suggest an ion-pair character in these binary complexes supported by a weak hydrogen bond between the phosphoramide oxygen and the nitrogen of the imine. Whereas Yoon et al. also isolated a substrate-acid complex indicating a hydrogen bond between a protonated imidazolium and the phosphoramide oxygen of the NTPA catalyst via X-ray crystallography, 35 our findings confirm this binding situation in solution. We expect that these findings aid in understanding reaction intermediates, potentially shedding light on the factors contributing to the enantioselectivity of reactions, which is essential for further catalyst development.

Screening for dimers in NTPA catalyzed reactions

This raised the question of whether the observed phenomenon is also applicable to NTPAs, addressing the presence of a sterically demanding N -trifluoromethanesulfonyl group and the absence of a C 2 -symmetry. Therefore, we examined NTPA 1 in an enantioselective Mukaiyama–Mannich reaction developed by the group of Yamamoto to address possible reaction intermediates ( Table 1 ). 30 Significantly, the challenge of achieving enantioselective Mukaiyama–Mannich reactions catalyzed by chiral phosphoric acids, which previously needed a 2-hydroxyphenyl moiety on the aldimine, was solved by taking advantage of an NTPA catalyst. 30 Previously, an NTPA catalyst bearing 2,4,6-trimethyl-3,5-dinitrophenyl substituents at the 3,3′-positions enabled the reactions with excellent enantioselectivities up to 95% ee. 30

To check whether intermediates involving two catalyst molecules are also present in the NTPA/imine complexes, we conducted both a synthetic study and an NMR study. In this study, we once more employed NTPA 1 featuring 3,5-bis(trifluoromethyl)phenyl substituents at the 3,3′-positions, as these substituents showed a general preference for dimerization in previously investigated CPA systems. 47 In the synthetic part we aimed to achieve an inversion of enantioselectivity by varying temperature and catalyst loading in the Mukaiyama–Mannich reaction, as was previously seen in the transfer hydrogenation of imines under CPA catalysis. 47 We ran the reaction at temperatures ranging from −80 °C to +80 °C. Contrary to our expectation of observing an inversion, we merely observed a decrease in enantioselectivity from 86% ee to 56% ee ( Table 1 ). When the catalyst loading was varied from 0.1% to 25%, only minor differences in enantioselectivity were detected. Consequently, the synthetic approach did not reveal any indication of a dimeric pathway.
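To make the reported enantioselectivities easier to compare, the following minimal sketch converts an enantiomeric excess into the corresponding major:minor enantiomer ratio, using the standard definition ee = major − minor with major + minor = 1. The function name `ee_to_fractions` is an illustrative choice and not part of the original study.

```python
def ee_to_fractions(ee_percent):
    """Convert enantiomeric excess (% ee) to (major, minor) mole fractions."""
    if not 0.0 <= ee_percent <= 100.0:
        raise ValueError("ee must lie between 0 and 100 %")
    ee = ee_percent / 100.0
    major = (1.0 + ee) / 2.0   # fraction of the favoured enantiomer
    minor = (1.0 - ee) / 2.0   # fraction of the disfavoured enantiomer
    return major, minor


# The two ee values quoted in the text above
for ee in (86.0, 56.0):
    major, minor = ee_to_fractions(ee)
    print(f"{ee:.0f}% ee -> {major * 100:.0f}:{minor * 100:.0f}")
# prints "86% ee -> 93:7" and "56% ee -> 78:22"
```

In both cases the same enantiomer remains in excess, which is why the drop in ee is a loss of selectivity and not an inversion.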

  • D. Parmar, E. Sugiono, S. Raja and M. Rueping, Chem. Rev., 2014, 114, 9047.
  • T. Akiyama and K. Mori, Chem. Rev., 2015, 115, 9277.
  • M. Mahlau and B. List, Angew. Chem., Int. Ed., 2013, 52, 518.
  • J. M. M. Verkade, L. J. C. van Hemert, P. J. L. M. Quaedflieg and F. P. J. T. Rutjes, Chem. Soc. Rev., 2008, 37, 29.
  • T. Akiyama, J. Itoh, K. Yokota and K. Fuchibe, Angew. Chem., Int. Ed., 2004, 43, 1566.
  • D. Uraguchi and M. Terada, J. Am. Chem. Soc., 2004, 126, 5356.
  • M. Yamanaka, J. Itoh, K. Fuchibe and T. Akiyama, J. Am. Chem. Soc., 2007, 129, 6756.
  • S. J. Connon, Angew. Chem., Int. Ed., 2006, 45, 3909.
  • S. Hoffmann, A. M. Seayad and B. List, Angew. Chem., Int. Ed., 2005, 44, 7424.
  • N. Sorgenfrei, J. Hioe, J. Greindl, K. Rothermel, F. Morana, N. Lokesh and R. M. Gschwind, J. Am. Chem. Soc., 2016, 138, 16345.
  • J. Greindl, J. Hioe, N. Sorgenfrei, F. Morana and R. M. Gschwind, J. Am. Chem. Soc., 2016, 138, 15965.
  • K. Rothermel, M. Melikian, J. Hioe, J. Greindl, J. Gramüller, M. Žabka, N. Sorgenfrei, T. Hausler, F. Morana and R. M. Gschwind, Chem. Sci., 2019, 10, 10025.
  • L. Simón and J. M. Goodman, J. Am. Chem. Soc., 2008, 130, 8741.
  • A. F. Zahrt, J. J. Henle, B. T. Rose, Y. Wang, W. T. Darrow and S. E. Denmark, Science, 2019, 363, eaau5631.
  • J. P. Reid and J. M. Goodman, J. Am. Chem. Soc., 2016, 138, 7910.
  • J. P. Reid and J. M. Goodman, Chem.–Eur. J., 2017, 23, 14248.
  • J. P. Reid and M. S. Sigman, Nature, 2019, 571, 343.
  • M. Orlandi, J. A. S. Coelho, M. J. Hilton, F. D. Toste and M. S. Sigman, J. Am. Chem. Soc., 2017, 139, 6803.
  • A. J. Neel, A. Milo, M. S. Sigman and F. D. Toste, J. Am. Chem. Soc., 2016, 138, 3863.
  • P. García-García, F. Lay, P. García-García, C. Rabalakos and B. List, Angew. Chem., Int. Ed., 2009, 48, 4363.
  • P. S. J. Kaib, L. Schreyer, S. Lee, R. Properzi and B. List, Angew. Chem., Int. Ed., 2016, 55, 13200.
  • D. Nakashima and H. Yamamoto, J. Am. Chem. Soc., 2006, 128, 9626.
  • G. Caballero-García and J. M. Goodman, Org. Biomol. Chem., 2021, 19, 9565.
  • K. Rothermel, M. Žabka, J. Hioe and R. M. Gschwind, J. Org. Chem., 2019, 84, 13221.
  • M. Žabka and R. M. Gschwind, Chem. Sci., 2021, 12, 15263.
  • H. Zhou, Y. Zhou, H. Y. Bae, M. Leutzsch, Y. Li, C. K. De, G.-J. Cheng and B. List, Nature, 2022, 605, 84.
  • R. Properzi, P. S. J. Kaib, M. Leutzsch, G. Pupo, R. Mitra, C. K. De, L. Song, P. R. Schreiner and B. List, Nat. Chem., 2020, 12, 1174.
  • A. Borovika and P. Nagorny, Tetrahedron, 2013, 69, 5719.
  • T. Hashimoto, H. Nakatsu, K. Yamamoto and K. Maruoka, J. Am. Chem. Soc., 2011, 133, 9730.
  • F. Zhou and H. Yamamoto, Angew. Chem., Int. Ed., 2016, 55, 8970.
  • K. Kaupmees, N. Tolstoluzhsky, S. Raja, M. Rueping and I. Leito, Angew. Chem., Int. Ed., 2013, 52, 11569.
  • S. P. Bew, J. Liddle, D. L. Hughes, P. Pesce and S. M. Thurston, Angew. Chem., Int. Ed., 2017, 56, 5322.
  • P. C. Knipe and M. D. Smith, Org. Biomol. Chem., 2014, 12, 5094.
  • H.-H. Liao, A. Chatupheeraphat, C.-C. Hsiao, I. Atodiresei and M. Rueping, Angew. Chem., Int. Ed., 2015, 54, 15540.
  • E. M. Sherbrook, M. J. Genzink, B. Park, I. A. Guzei, M.-H. Baik and T. P. Yoon, Nat. Commun., 2021, 12, 5735.
  • S. O. Garbuzynskiy, B. S. Melnik, M. Y. Lobanov, A. V. Finkelstein and O. V. Galzitskaya, Proteins, 2005, 60, 139.
  • S. Sharif, G. S. Denisov, M. D. Toney and H.-H. Limbach, J. Am. Chem. Soc., 2007, 129, 6313.
  • H.-H. Limbach, M. Pietrzak, S. Sharif, P. M. Tolstoy, I. G. Shenderovich, S. N. Smirnov, N. S. Golubev and G. S. Denisov, Chem.–Eur. J., 2004, 10, 5195.
  • H. Benedict, I. G. Shenderovich, O. L. Malkina, V. G. Malkin, G. S. Denisov, N. S. Golubev and H.-H. Limbach, J. Am. Chem. Soc., 2000, 122, 1979.
  • S. Sharif, G. S. Denisov, M. D. Toney and H.-H. Limbach, J. Am. Chem. Soc., 2006, 128, 3375.
  • F. Löhr, S. G. Mayhew and H. Rüterjans, J. Am. Chem. Soc., 2000, 122, 9289.
  • G. Federwisch, R. Kleinmaier, D. Drettwan and R. M. Gschwind, J. Am. Chem. Soc., 2008, 130, 16846.
  • A. J. Dingley and S. Grzesiek, J. Am. Chem. Soc., 1998, 120, 8293.
  • M. Melikian, J. Gramüller, J. Hioe, J. Greindl and R. M. Gschwind, Chem. Sci., 2019, 10, 5226.
  • D. Jansen, J. Gramüller, F. Niemeyer, T. Schaller, M. C. Letzel, S. Grimme, H. Zhu, R. M. Gschwind and J. Niemeyer, Chem. Sci., 2020, 11, 4381.
  • J. Gramüller, M. Franta and R. M. Gschwind, J. Am. Chem. Soc., 2022, 144, 19861.
  • M. Franta, J. Gramüller, P. Dullinger, S. Kaltenberger, D. Horinek and R. M. Gschwind, Angew. Chem., Int. Ed., 2023, 62, e202301183.
  • R. Mitra, H. Zhu, S. Grimme and J. Niemeyer, Angew. Chem., Int. Ed., 2017, 56, 11456.

Chemistry LibreTexts

6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

  • Serge L. Smirnov and James McCarty
  • Western Washington University

In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra. In this Chapter we will show how the well-resolved 2D 15 N-HSQC resonances can be assigned to specific residues and chemical groups within protein samples. As an example, we will consider a couple of complementary types of 3D NMR data: HNCACB and CBCA(CO)NH and their joint application for making heteronuclear NMR resonance assignment in proteins. Such an assignment opens a number of ways to probe structure and function (e.g. ligand binding) for the target protein samples.

Learning Objectives

  • Grasp why the resonance assignment of 2D 15 N-HSQC can be beneficial : the case of ligand (drug) binding by a protein (therapeutic target)
  • Familiarize with 3D heteronuclear through-bond (J-coupling) NMR : introduction and case of HNCACB and CBCA(CO)NH pair of 3D experiments
  • Follow an example of assignment of heteronuclear NMR resonances ( 1 H N , 15 N H , 13 Cα, 13 Cβ) from a combination of 2D 15 N-HSQC and 3D HNCACB/CBCA(CO)NH

15 N-HSQC as an assay for probing protein – ligand interactions: the need for the NMR resonance assignment

During the process of rational drug design, it is often necessary to characterize the interactions between the therapeutic target (protein) and candidate drug (ligand) beyond determination of the binding affinity ( K d ). The heteronuclear solution NMR experiment 15 N-HSQC can provide significant insight into such interactions. Let’s recall that most of the signals in this 2D NMR spectrum originate from backbone H-N amide groups and a minority from the side chain NH and NH 2 groups. The positions of 15 N-HSQC resonances are defined by the 1 H N and 15 N H chemical shift values, which in turn depend on the local electronic environment. Ligand binding changes this environment for the residues forming the binding site even if the tertiary structure of the rest of the protein is not perturbed. In such a case, the 15 N-HSQC resonance pattern undergoes local changes: only the resonances representing NH groups involved in the binding site change their position significantly (>0.05 ppm in the 1 H and/or >0.2 ppm in the 15 N dimension) or their signal intensity (including peak disappearance). Figure VI.2.A illustrates such a change.
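The thresholds quoted above can be applied mechanically to a pair of peak lists. The sketch below is only an illustration: the peak labels and chemical shift values are invented placeholders (they are not read from Figure VI.2.A), and the function name `classify_peak` is a hypothetical helper.

```python
# Thresholds for a "major" change, as stated in the text
H_THRESHOLD_PPM = 0.05   # 1H dimension
N_THRESHOLD_PPM = 0.2    # 15N dimension


def classify_peak(free, bound):
    """Compare one HSQC peak without and with ligand.

    free, bound: (1H ppm, 15N ppm) tuples; bound is None if the peak vanished.
    """
    if bound is None:
        return "disappeared"
    d_h = abs(bound[0] - free[0])
    d_n = abs(bound[1] - free[1])
    if d_h > H_THRESHOLD_PPM or d_n > N_THRESHOLD_PPM:
        return "shifted"
    return "unchanged"


# Hypothetical peak lists (made-up values, for illustration only)
free_peaks = {"p1": (8.20, 120.0), "p2": (7.90, 115.5), "p3": (8.75, 124.3)}
bound_peaks = {"p1": (8.21, 120.1), "p2": None, "p3": (8.85, 124.3)}

for label, free in free_peaks.items():
    print(label, classify_peak(free, bound_peaks[label]))
# prints: p1 unchanged, p2 disappeared, p3 shifted
```

Only the peaks flagged as "shifted" or "disappeared" would be candidates for residues at or near the binding site.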

[Figure VI.2.A]

Importantly, every 15 N-HSQC resonance in Figure VI.2.A is labeled with a single letter to help identify specific peaks which undergo spectral changes upon ligand binding. This data could have much greater impact if the peaks which underwent the most pronounced changes in position and/or intensity were assigned to specific amino acid residues within the polypeptide and chemical groups within those residues (backbone vs. side chain). The rest of this Chapter demonstrates some of the fundamentals of the heteronuclear NMR resonance assignment methodology.

Heteronuclear 3D NMR introduction: CBCA(CO)NH spectrum as an example

Just as every 2D 15 N-HSQC resonance reports a J-coupling via a covalent bond between 15 N and 1 H spin-½ nuclei, there are 3D NMR experiments which report resonances originating from J-coupling (through-bond) of three types of spin-½ nuclei ( 1 H, 13 C, 15 N). In this section we will introduce two such types of 3D NMR data: HNCACB and CBCA(CO)NH. In order to produce a protein sample with nearly complete uniform labeling with 13 C and 15 N isotopes, bacterial recombinant protein expression can be performed in a minimal medium supplemented with 13 C-labeled glucose and 15 N-labeled ammonium chloride as the sole sources of carbon and nitrogen, respectively. Figure VI.2.B introduces the general concept of 3D NMR data and shows an element of a 3D CBCA(CO)NH spectrum.

[Figure VI.2.B]

Each resonance (“cross-peak”) of a 3D CBCA(CO)NH spectrum indicates a through-bond (scalar J-coupling) interaction between the two atoms of the backbone amide group ( 1 H N and 15 N H ) of residue j and the Cα and Cβ nuclei ( 13 C) of the preceding residue j -1. The name of the experiment, CBCA(CO)NH, refers to the specific spin-½ nuclei involved (and not involved) in the relevant J-coupling interactions: Cβ and Cα are J-coupled to NH while the connecting carbonyl carbon does not report any NMR signal (although its magnetization state is affected during the experiment). Two types of residues generate special CBCA(CO)NH peak patterns. Prolines have no amide proton, so they do not have CBCA(CO)NH peaks linked with their amide groups. Glycine residues have no Cβ, therefore for any residue following a glycine only a single CBCA(CO)NH resonance will be observed (from the NH of that residue to the Cα of the preceding glycine).

The NMR resonance assignment: combined use of two complementary datasets HNCACB and CBCA(CO)NH

By itself, CBCA(CO)NH does not convey much sequential information. Another heteronuclear 3D NMR dataset, HNCACB, affords a powerful complement here. Just like CBCA(CO)NH, HNCACB reports resonances originating from J-coupling between the backbone amide group and Cα / Cβ nuclei. The difference is that HNCACB reports two additional peaks, both intra-residue: between HN and the Cα and Cβ spins of the same residue ( Figure VI.2.C ).
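The peak-counting rules described for the two experiments can be summarized in a short sketch. It is idealized: it assumes every expected correlation is actually observed, takes three-letter residue codes, and treats a residue with no predecessor (passed as `None`) as having no inter-residue peaks. The function names are illustrative only.

```python
def has_amide_proton(residue):
    """Proline has no backbone amide proton, so it anchors no peaks."""
    return residue != "Pro"


def n_carbons(residue):
    """Number of Calpha/Cbeta nuclei that can report: Gly has only Calpha."""
    return 1 if residue == "Gly" else 2


def cbcaconh_peaks(residue, previous):
    """Peaks on the amide of `residue`: Calpha/Cbeta of the preceding residue only."""
    if previous is None or not has_amide_proton(residue):
        return 0
    return n_carbons(previous)


def hncacb_peaks(residue, previous):
    """Peaks on the amide of `residue`: its own Calpha/Cbeta plus those of j-1."""
    if not has_amide_proton(residue):
        return 0
    inter = n_carbons(previous) if previous is not None else 0
    return n_carbons(residue) + inter


# Ala preceded by Gly: Gly contributes only its Calpha
print(cbcaconh_peaks("Ala", "Gly"))  # 1
print(hncacb_peaks("Ala", "Gly"))    # 3
```

Comparing the two counts for the same amide group shows which HNCACB peaks are intra-residue: they are the ones with no counterpart in CBCA(CO)NH.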

[Figure VI.2.C]

Typically, HNCACB and CBCA(CO)NH are acquired with identical parameters, including the spectral width in all three dimensions and the same number of data points in the 15 N dimension (or 15 N planes, as on panel B of Figure VI.2.B ). Now, let’s imagine that we go through every 15 N plane and build the pairs of “residue j / residue j -1” HNCACB/CBCA(CO)NH peaks. This does not give us the sequence-specific NMR resonance assignments yet but already creates pairs of 3D cross-peaks linked to di-peptides within the sequence. Now, let’s take into account that for some types of residues the 13 Cα and 13 Cβ chemical shift values differ remarkably from those of other residue types. For details, take a look at the BMRB chemical shift statistics for amino acid residues, with emphasis on Gly, Ala, Ser, Thr. Knowing where such residues are positioned within the polypeptide sequence, we can start “connecting the dots” by mapping HNCACB/CBCA(CO)NH planes and di-peptides onto the actual amino acid sequence.

[Figure VI.2.D]

Figure VI.2.D provides a general idea of how the two 3D NMR experiments HNCACB and CBCA(CO)NH can be utilized together to map the signals onto the amino acid sequence of a protein sample. The Cβ of Ala residues typically has chemical shift values below 20.0 ppm, which is unique. This allows identification of Ala residues in the HNCACB/CBCA(CO)NH spectral patterns. Starting from these points (as well as from other distinct values, e.g. Cα for Gly and Cβ for Ser/Thr), one can continue the “connecting the dots” process outlined in Figure VI.2.D to cover the entire sequence. If these two 3D NMR datasets encounter resonance overlaps which are impossible to resolve, more 3D NMR dataset pairs are utilized in a similar way, e.g. HNCO/HN(CA)CO and others. This process allows assignment to specific residues and chemical groups of nearly all backbone and some side-chain resonances ( 1 H N , 15 N H , 13 Cα, 13 Cβ). Methods for assigning side-chain chemical shift values are not discussed in this chapter but conceptually they are similar to the ones described here.
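The “connecting the dots” idea can be sketched as a matching problem: the Cα/Cβ shifts that one amide group sees for its own residue should reappear as the "preceding residue" shifts of the next amide group in the sequence. The example below is a minimal illustration under stated assumptions: the spin-system labels and shift values are invented, the 0.3 ppm matching tolerance is an arbitrary choice, and the residue-type hints use only the two rules named in the text (no Cβ for Gly, Cβ below 20 ppm for Ala).

```python
TOL_PPM = 0.3  # assumed matching tolerance, not a value from the text

# Hypothetical spin systems, one per HSQC peak (made-up shifts in ppm).
# "own"  = (Calpha, Cbeta) of residue j    (HNCACB peaks absent from CBCA(CO)NH)
# "prev" = (Calpha, Cbeta) of residue j-1  (CBCA(CO)NH peaks)
# None marks a missing Cbeta (glycine).
spin_systems = {
    "p1": {"own": (52.5, 19.1), "prev": (45.2, None)},
    "p2": {"own": (45.2, None), "prev": (58.4, 63.9)},
    "p3": {"own": (56.1, 32.8), "prev": (52.4, 19.0)},
}


def shifts_match(a, b, tol=TOL_PPM):
    """True if two (Calpha, Cbeta) pairs agree within tol, None matching None."""
    for x, y in zip(a, b):
        if (x is None) != (y is None):
            return False
        if x is not None and abs(x - y) > tol:
            return False
    return True


def find_successors(label, systems):
    """Spin systems whose 'prev' shifts match the 'own' shifts of `label`."""
    own = systems[label]["own"]
    return [other for other, s in systems.items()
            if other != label and shifts_match(own, s["prev"])]


def residue_hint(own):
    """Residue-type hint from the two rules given in the text."""
    _, cb = own
    if cb is None:
        return "Gly"
    if cb < 20.0:
        return "Ala"
    return "?"


def build_chain(start, systems):
    """Follow unambiguous links from `start` until none (or several) remain."""
    chain = [start]
    while True:
        hits = find_successors(chain[-1], systems)
        if len(hits) != 1 or hits[0] in chain:
            break
        chain.append(hits[0])
    return chain


chain = build_chain("p2", spin_systems)
print(chain)                                               # ['p2', 'p1', 'p3']
print([residue_hint(spin_systems[p]["own"]) for p in chain])  # ['Gly', 'Ala', '?']
```

The resulting Gly-Ala-X fragment would then be placed wherever that pattern occurs in the known amino acid sequence; ambiguous links (several candidates within tolerance) are exactly the overlaps that additional dataset pairs such as HNCO/HN(CA)CO help resolve.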

With the general process of the protein NMR resonance assignment described, let’s assume that this method was successfully applied to the protein target (T) sample presented in Figure VI.2.A. The resonance assignment completion allows one to replace letter labels with residue-number labels (similar to the ones used in Figure VI.2.D). This in turn allows one to determine the specific residues affected directly or allosterically by binding of the ligand (L) to the target. In many cases, such information together with other data leads to the determination of the ligand binding residues within the target. If the ligand is a candidate therapeutic agent, identification of the ligand binding residues greatly advances ensuing efforts to optimize the drug.

Example 1

Analyze Figure VI.2.A and list at least two resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15 N-labeled target protein (T). Major spectral changes for this model spectrum include resonances moving by >0.05 ppm in 1 H or >0.2 ppm in 15 N dimensions as well as peak disappearance (peak intensity going down to zero).

Upon binding of the ligand (L) to the target protein (T), resonance f disappears and resonance s moves by >0.05 ppm in the 1 H dimension.

Example 2

Inspect BMRB entry 50205 and list all the heteronuclear NMR datasets utilized for the NMR resonance assignment.

BMRB entry 50205 contains the chemical shift assignment data for the target sample and offers several ways to look at its underlying NMR data including the list of experiments used to perform the NMR resonance assignment and the chemical shift values. E.g., the NMR-STAR v3 text file has a section titled _Experiment_list, which sums up the heteronuclear NMR data types used for making the assignments: 2D 1 H- 15 N HSQC and 3D HNCACB, CBCA(CO)NH, HNCO and HN(CA)CO.

Example 3

How many 3D HNCACB resonances would you expect to originate from a Lys residue which is preceded by a Met?

Four: the Lys backbone amide (HN) group reports two intra-residue peaks (Lys Cα and Cβ) and two inter-residue peaks (Met Cα and Cβ), as both Lys and Met have Cα and Cβ atoms.

Practice Problems

Problem 1 . Analyze Figure VI.2.A and list all the resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15 N-labeled target protein (T). Example 1 above will help you start the analysis.

Problem 2 . From BMRB entry linked to PDB 5VNT, list all the heteronuclear NMR datasets utilized for the NMR resonance assignment for the target sample.

Problem 3 . Let’s consider panel B of Figure VI.2.B . Imagine that the 13 C dimension is taken out of the spectrum (all 13 C planes are collapsed together). What type of 2D spectrum will remain after such a dimension reduction?

Problem 4 . How many 3D HNCACB resonances would you expect to originate from a Gly residue which is preceded by a Pro?

Problem 5 . How many 3D HNCACB resonances would you expect to originate from a Pro residue which is preceded by a Gly?

Problem 6* . Look up the amino acid NMR chemical shift values statistics table presented with BMRB repository and list the average values for the following resonances: 15 N, 13 Cα and 13 Cβ for Gly, Ala, Tyr, Glu, Arg, Ser, Thr, Pro. From this analysis, suggest what types of residues tend to report unusually low or high chemical shift values in comparison with the rest of the amino acids?

IMAGES

  1. A Step-By-Step Guide to 1D and 2D NMR Interpretation

    proton nmr assignment

  2. Proton nuclear magnetic resonance ( 1 H NMR) spectrum data (A) of the

    proton nmr assignment

  3. SOLVED: NMR analysis: Identify all chemical shifts and assign protons

    proton nmr assignment

  4. How To Draw The Proton NMR Spectrum of an Organic Molecule

    proton nmr assignment

  5. Compound Interest: Analytical Chemistry

    proton nmr assignment

  6. Solved Methyl butanoate, produces the proton NMR that is

    proton nmr assignment

VIDEO

  1. A2 NMR part 2 Proton NMR

  2. Proton NMR Explained 😲 #shorts #viral #shortvideo #study

  3. NMR Spectroscopy ll Part

  4. POKY: Manual Protein NMR Backbone Assignment by Mikayla Truong

  5. INTERPRETATION OF NMR SPECTRA| RULES OF INTERPRETATION FOR NMR SPECTRA| @jhwconcepts711

  6. NMR Lecture 2

COMMENTS

  1. Assignment of 1H-NMR spectra

    H-NMR spectra. On this page we will deal with how to interpret an NMR spectrum. The meaning of assignment in the title is to assign each peak to a proton in the molecule under investigation. The examples here are of 1D proton assignments. For more complex examples, see the 2D assignments of 12,14-di t butylbenzo [g]chrysene and cholesteryl acetate.

  2. 5.10: Interpreting Proton NMR Spectra

    Expected Product: 1 H NMR: The ratio of protons is 2:3:3. Solution. Yes, ethyl acetate was synthesized. First, the ratio of protons equals the number of protons in our expected product. The peak at 1.20 ppm is a triplet that integrates to 3 protons. The integration indicates that the peak corresponds to a methyl group.

  3. 5: Proton Nuclear Magnetic Resonance Spectroscopy (NMR)

    5.8: Structural Assignment Assignment of structures is a central problem which NMR is well suit to address. Explains how both 13C NMR spectra and low and high resolution proton NMR spectra can be used to help to work out the structures of organic compounds. (n 1) Rule; Background to C-13 NMR; Determine Structure with Combined Spectra

  4. Introduction to Proton NMR

    On this page we are focusing on the magnetic behaviour of hydrogen nuclei - hence the term proton NMR or 1 H-NMR. 1 H NMR spectroscopy is used more often than 13 C NMR, partly because proton spectra are much easier to obtain than carbon spectra. The 13 C isotope is only present in about 1% of carbon atoms, and that makes it difficult to detect.

  5. NMR Spectroscopy

    Proton NMR Spectroscopy ... Chemical shift assignments for these signals are shown in the shaded box above the spectrum. The chemical shift of the hydrogen-bonded hydroxyl proton is δ 14.5, exceptionally downfield. We conclude, therefore, that the rate at which these tautomers interconvert is slow compared with the inherent time scale of nmr ...

  6. Proton nuclear magnetic resonance

    Example 1 H NMR spectrum (1-dimensional) of a mixture of menthol enantiomers plotted as signal intensity (vertical axis) vs. chemical shift (in ppm on the horizontal axis). Signals from spectrum have been assigned hydrogen atom groups (a through j) from the structure shown at upper left.. Proton nuclear magnetic resonance (proton NMR, hydrogen-1 NMR, or 1 H NMR) is the application of nuclear ...

  7. Protein NMR Resonance Assignment

    The establishment of the sequential assignment procedure without depending on the existing three-dimensional (3D) structures was, therefore, a milestone for the protein NMR. Backbone amide proton (1 H N) and α proton (1 H α) signals were sequentially assigned based on the distance information between 1 H N i and 1 H α i − 1 and between 1 H ...

  8. Protein NMR Resonance Assignment

    This facilitates NMR to be independent from X-ray crystallography and the structure of proteins in solution could be determined by NMR using the assignment of proton signals and proton-proton distance information. However, due to limited resolution in 1 H 2D-NMR spectra, the molecular weight of the target protein is restricted to be less than 8 ...

  9. PDF Assigning the 1H-NMR Signals of Aromatic Ring 1H-atoms

    another. For an example, the aromatic region of the 1H-NMR of o-isopropylaniline will be analyzed. 1) Coupling Patterns The first analysis should always involve the observable coupling in each of the signals in the aromatic region. In a 300 MHz spectrum, the ortho and meta couplings may all be resolved and provide information about the assignments.

  10. Rapid Proton-Detected NMR Assignment for Proteins with Fast Magic Angle

    Using a set of six 1H-detected triple-resonance NMR experiments, we establish a method for sequence-specific backbone resonance assignment of magic angle spinning (MAS) nuclear magnetic resonance (NMR) spectra of 5-30 kDa proteins. The approach relies on perdeuteration, amide 2H/1H exchange, high magnetic fields, and high-spinning frequencies (ωr/2π ≥ 60 kHz) and yields high-quality NMR ...

  11. Frontiers

    Assignment of 1 H-detected Solid-State NMR Spectra. To assign the amide H N and aliphatic H α protons of fully protonated Rpo7 * in complex with Rpo4, we used proton-detected spectroscopy at 110 kHz MAS frequency. The assignment of the 2D hNH fingerprint spectrum is shown in Figure 3. The assignment was done using three 3D spectra, namely hCANH, hNCAH, and hCONH (Barbet-Massin et al., 2014 ...

  12. Complete 1H and 13C NMR spectral assignment of d-glucofuranose

    Here, complete 1H and 13C NMR spectral analysis of α- and β-d-glucofuranose was performed, including signal assignment, chemical shifts, and coupling constants. Selective and non-selective 1D and 2D NMR experiments were used for the analysis, complemented by spin simulations and iterative spectral analysis.

  13. Novel proton NMR assignment procedure for RNA duplexes

    Novel proton NMR assignment procedure for RNA duplexes. Hans A. Heus and Arthur Pardi. J. Am. Chem. Soc. 1991, 113, 11, 4360-4361. ... Facilitated Assignment of Adenine H2 Resonances in Oligonucleotides Using Homonuclear Long-Range Couplings. Journal of the American Chemical Society 2009, 131 ...

  14. Assigning NMR spectra of RNA, peptides and small organic molecules

    NMR assignment typically involves analysis of peaks across multiple NMR spectra. Chemical shifts of peaks are measured before being assigned to atoms using a variety of methods. ... Database proton NMR chemical shifts for RNA signal assignment and validation. J Biomol NMR. 2013; 55:33-46. doi: 10.1007/s10858-012-9683-9. ...

  15. NMR: Structural Assignment

    Assignment of structures is a central problem which NMR is well suited to address. Explains how both 13C NMR spectra and low and high resolution proton NMR spectra can be used to help to work out the …

  16. NMR-Challenge.com: Exploring the Most Common Mistakes in NMR Assignments

    NMR spectroscopy is the most powerful tool for determining the structures of chemicals. It is applied in a broad range of scientific disciplines, including physics, structural biology, material science, medicine, and chemistry. Interpreting NMR spectra is part of the core skill set mainly of organic chemists. In 2022 we introduced the educative website NMR-Challenge.com presenting 200 spectral ...

  17. How to assign the signals in the H-NMR of Aspirin?

    Here is a link to the actual assigned proton nmr spectrum for acetylsalicylic acid. Your peak assignments are correct. Would E and B be doublet of doublets because they couple with D and C so they would have ortho and meta coupling? Yes, that is correct. Here is another link with more detail on the coupling constants (see p. 13 for example)

  18. NMR Chemical Shift Values Table

    The Chemical Shift of Protons Connected to sp3-Hybridized Carbons. We can see in the table that sp3-hybridized C-H bonds in alkanes and cycloalkanes give signals in the upfield region (shielded, low resonance frequency) in the range of 1-2 ppm. The only peak that comes before saturated C-H protons is the signal of the protons of tetramethylsilane ...

  19. NMR

    The proton NMR chemical shift is affected by proximity to electronegative atoms (O, N, halogens) and unsaturated groups (C=C, C=O, aromatic). Electronegative groups move the signal downfield (left; increase in ppm). Unsaturated groups shift the signal downfield (left) when the affected nucleus lies in the plane of the unsaturation, but the reverse shift takes place in ...

  20. 1H proton nmr spectrum of phenol C6H6O C6H5OH low/high resolution

    Introductory note on the 1H NMR spectra of phenol. Students and teachers, please note that my explanation of the proton NMR spectrum of phenol is designed for advanced, but pre-university, chemistry courses. The chemical shift (δ) and splitting pattern effects for phenol are confined to proton spin-spin coupling effects analysed using the n+1 rule for adjacent non-equivalent proton fields (n is the ...

  21. Assignment of the proton NMR spectrum of reduced and oxidized

    Improved pulse sequences for sequence specific assignment of aromatic proton resonances in proteins. Journal of Biomolecular NMR 2007, 37 (3 ... 1H,13C,15N-NMR Resonance Assignments of Oxidized Thioredoxin h from the Eukaryotic Green Alga Chlamydomonas reinhardtii Using New Methods based on Two-Dimensional Triple-Resonance NMR Spectroscopy and ...

  22. Rapid protein assignments and structures from raw NMR spectra with the

    However, the analysis of NMR spectra and the resonance assignment, which are indispensable for NMR studies, remain time-consuming even for a skilled and experienced spectroscopist.

  23. Highly acidic N -triflylphosphoramides as chiral Brønsted acid

    The hydrogen bond was characterized by analyzing 1 H and 15 N chemical shifts in a Steiner-Limbach correlation and compared with the corresponding CPA/imine complexes [37-39]. Second, various two-dimensional NMR techniques were applied at 180 K for a chemical shift assignment of the NTPA/imine systems. Based on the assignment, characteristic ...

  24. Remote proton elimination: C-H activation enabled by distal ...

    Here, we present an approach to the formation of carbon-carbon σ bonds by remote proton elimination, a distinct mode of carbon-hydrogen activation enabled by distal acidification through five carbon-carbon bonds. ... H. Kählig for valuable contributions to the assignment of the decalin products by NMR; D. Kaiser for expert proofreading ...

  25. Proton NMR studies of human lysozyme: spectral assignment and

    1H-NMR assignments and local environments of aromatic residues in bovine, human and guinea pig variants of alpha-lactalbumin. European Journal of Biochemistry 1992, 210 (3), 699-709.

  26. 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

    This page titled 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Serge L. Smirnov and James McCarty. In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra.