A Step-By-Step Guide to 1D and 2D NMR Interpretation

Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful tool for characterizing molecular structures. When submitting to the FDA or other regulatory agencies, full structural characterization by NMR provides crucial evidence of compound identity. A combination of 1-dimensional and 2-dimensional NMR experiments is necessary for complete confidence in a chemical structure. This post walks you through the steps to fully characterize a molecule by 1- and 2-dimensional NMR, including how to interpret each spectrum.

Typical Outline Of NMR Experiments For Structure Elucidation

Step 1: ¹H-NMR

The first step in structural characterization is 1-dimensional ¹H-NMR. The chemical shift, multiplicity, coupling constants, and integration are all factors to consider when assigning protons. In this example, only three protons can be assigned from the proton spectrum alone: protons 3, 4, and 6.

[Figure: ¹H-NMR spectrum]

To begin, let’s start with proton 3. Proton 3 is the only methyl group in the structure, and therefore must integrate to 3 protons. The only peak with an integration of 3 is the doublet at 1.770 ppm. The upfield chemical shift supports this assignment. The peak is split into a doublet with a coupling constant of 1.2 Hz, reflecting the long-range coupling between protons 3 and 4, which also supports this assignment.

Protons that are coupled to each other should exhibit the same coupling constant. The long-range coupling constant observed for proton 3 (J = 1.2 Hz, split into a doublet by proton 4) is reflected in the coupling constant for proton 4 (J = 1.2 Hz, split into a quartet by proton 3). Therefore, the peak at 7.690 ppm must represent proton 4! The integration and chemical shift support the assignment, as proton 4 is the only aromatic proton in the structure.
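The matching-J argument above can be sketched numerically. This is only an illustration, not the workflow used for the post: the 400 MHz spectrometer frequency and the individual line positions are hypothetical values, chosen so that the line spacings reproduce the 1.2 Hz coupling quoted in the text.

```python
def adjacent_couplings_hz(line_positions_ppm, spectrometer_mhz):
    """Return the spacings (in Hz) between adjacent lines of a multiplet.

    1 ppm corresponds to `spectrometer_mhz` Hz on the proton axis.
    """
    ordered = sorted(line_positions_ppm)
    return [(hi - lo) * spectrometer_mhz for lo, hi in zip(ordered, ordered[1:])]


def mean_coupling_hz(line_positions_ppm, spectrometer_mhz):
    """Average the adjacent line spacings to estimate J for a first-order multiplet."""
    spacings = adjacent_couplings_hz(line_positions_ppm, spectrometer_mhz)
    return sum(spacings) / len(spacings)


def share_coupling(j_a_hz, j_b_hz, tolerance_hz=0.2):
    """Two multiplets are candidate coupling partners if their J values agree."""
    return abs(j_a_hz - j_b_hz) <= tolerance_hz


SPECTROMETER_MHZ = 400.0  # hypothetical; the post does not state the field strength

# Hypothetical line positions centred on the shifts given in the text.
methyl_doublet_ppm = [1.7685, 1.7715]                    # centre 1.770 ppm
aromatic_quartet_ppm = [7.6855, 7.6885, 7.6915, 7.6945]  # centre 7.690 ppm

j_methyl = mean_coupling_hz(methyl_doublet_ppm, SPECTROMETER_MHZ)
j_aromatic = mean_coupling_hz(aromatic_quartet_ppm, SPECTROMETER_MHZ)

print(f"J(methyl) = {j_methyl:.2f} Hz, J(aromatic) = {j_aromatic:.2f} Hz")
print("Coupling partners:", share_coupling(j_methyl, j_aromatic))
```

The tolerance of 0.2 Hz is likewise an assumption; in practice it should reflect the digital resolution of the spectrum.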

There is only one singlet in the ¹H-NMR spectrum. The only proton that should show up as a singlet is proton 6, as it has no neighboring protons that would split the peak (the nearest proton is 5 bonds away!). The chemical shift of 11.256 ppm supports this assignment, as imide protons often show up far downfield. The peak also integrates to 1 proton, supporting the assignment.

The remaining protons are doublets, triplets, and multiplets that can be assigned by 2-dimensional COSY.

Integration Flowchart

Step 2: ¹H-¹H COSY

¹H-¹H Correlation Spectroscopy (COSY) shows the correlation between hydrogens that are coupled to each other in the ¹H NMR spectrum. The ¹H spectrum is plotted on both 2D axes. While 2-bond and 3-bond ¹H-¹H coupling is easily visible by COSY, long-range coupling can also be observed with long acquisition times. The off-diagonal cross-peaks, which appear symmetrically about the diagonal, show the COSY correlations. For example, protons 3 and 4 are coupled to each other, since their cross-peaks form a box pattern symmetric about the diagonal. This confirms assignments 3 and 4 made from the proton spectrum alone.

Thymidine COSY

Two types of COSY coupling: 3-bond short range coupling between protons 7 and 8 (red) and 4-bond long range coupling between protons 3 and 4 (blue).

[Figure: zoomed COSY spectrum]

My favorite way to analyze a COSY spectrum with many unassigned protons is to make a table of correlations, like the one seen here. Look at the table for any clear differences in correlation and begin there! In this example, all unassigned protons show one or two COSY correlations, except the proton at 4.233 ppm, which correlates to three other protons by COSY. The only proton expected to correlate with three nonequivalent protons is proton 9!

Now that proton 9 has been assigned, the fun really begins. Thymidine’s structure suggests that proton 9 should couple to protons 8, 10, and 11. Based on the COSY, proton 9 couples to protons at 2.068 ppm (2H), 3.754 ppm (1H), and 5.209 ppm (1H). From this list, we can easily assign proton 8 as the peak at 2.068 ppm based on its integration of 2 protons. To differentiate protons 10 and 11, take a look at our COSY table: 3.754 ppm shows two COSY correlations, while 5.209 ppm shows only one. Therefore, we can assign proton 10 as 5.209 ppm and proton 11 as 3.754 ppm.

Once proton 8 has been assigned, we can easily assign proton 7 based on the remaining COSY correlation for proton 8. Proton 7’s peak at 6.163 ppm is split into a triplet by the two protons at position 8, confirming the assignment.

All that remains are protons 12 and 13. We can assign proton 12 (3.564 ppm) based on its integration of 2H and its COSY correlation to proton 11. The last remaining peak at 4.999 ppm must be proton 13; this is confirmed by its COSY correlation with proton 12, its triplet multiplicity from splitting by proton 12, and its integration of one proton.
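The correlation-table approach used above can be sketched as a small graph problem. The cross-peak list below is reconstructed from the shifts and correlations described in the prose (it is not an exported peak list), and the function name is purely illustrative.

```python
from collections import defaultdict

# COSY cross-peaks as pairs of ¹H shifts (ppm), reconstructed from the text above.
cosy_pairs = [
    (1.770, 7.690),  # long-range coupling, protons 3 and 4
    (6.163, 2.068),
    (2.068, 4.233),
    (4.233, 3.754),
    (4.233, 5.209),
    (3.754, 3.564),
    (3.564, 4.999),
]


def correlation_table(pairs):
    """Build a symmetric table: shift -> set of shifts it correlates with."""
    table = defaultdict(set)
    for a, b in pairs:
        table[a].add(b)
        table[b].add(a)
    return dict(table)


table = correlation_table(cosy_pairs)

# The entry point for the assignment is the one shift with three partners.
hubs = [shift for shift, partners in table.items() if len(partners) == 3]

for shift in sorted(table):
    partners = ", ".join(f"{p:.3f}" for p in sorted(table[shift]))
    print(f"{shift:.3f} ppm -> {partners}")
print("Three-partner proton(s):", hubs)
```

Once the hub (4.233 ppm, proton 9) is fixed, the remaining assignments follow by walking outward along the table while checking integrations, as described in the text.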

Thymidine COSY 1H Correlation Flowchart

Now we have a fully assigned ¹H-NMR spectrum! This spectrum will help us assign our carbons using HSQC and HMBC NMR spectroscopy.

[Figure: fully assigned ¹H-NMR spectrum]

Step 3: ¹³C-NMR

Carbon NMR is a necessary step in full structural characterization. However, ¹³C-NMR alone does not provide enough information to assign the carbons in the molecule. The NMR spectrum below does confirm the number of carbons in the molecule; however, HSQC and HMBC (we will get to these soon!) are necessary to assign the carbons with confidence. Note that one of the carbons is hidden beneath the solvent signal but is clearly visible after zooming into that region.

[Figure: ¹³C-NMR spectrum with zoomed solvent region]

Step 4: DEPT-45, 90, and 135

Distortionless Enhancement of Polarization Transfer (DEPT) experiments help assign carbon peaks by determining the number of protons attached to each carbon. For very simple molecules, DEPT may be enough to partially or fully assign all carbons. In complex molecules, DEPT and HSQC together are useful for confirming both carbon and proton assignments. For example, the DEPT experiments below can only identify carbon 3, as it is the only CH₃ peak. I always go back and use DEPT to confirm the carbons I assigned by HSQC.

  • DEPT-45 shows CH, CH₂, and CH₃ carbons as positive peaks. Carbons with no protons are not visible.
  • DEPT-90 shows only CH peaks as positive peaks. Carbons with no protons, CH₂, and CH₃ carbons are not visible.
  • DEPT-135 shows CH and CH₃ carbons as positive peaks and CH₂ carbons as negative peaks. Carbons with no protons are not visible.
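The three rules above can be combined into a simple decision procedure. The sketch below assumes each carbon peak has already been scored by hand as positive (+1), negative (-1), or absent (0) in DEPT-90 and DEPT-135; the function name and the sign encoding are illustrative choices, not part of any NMR software.

```python
def classify_carbon(dept90, dept135):
    """Classify a carbon from the sign of its DEPT-90 and DEPT-135 peaks.

    Signs: +1 = positive peak, -1 = negative peak, 0 = no peak.
    """
    if dept90 == 0 and dept135 == 0:
        return "C"    # no attached protons: invisible in all DEPT spectra
    if dept135 < 0:
        return "CH2"  # only CH2 is negative in DEPT-135
    if dept135 > 0 and dept90 > 0:
        return "CH"   # only CH survives in DEPT-90
    if dept135 > 0 and dept90 == 0:
        return "CH3"  # positive in DEPT-135 but absent from DEPT-90
    raise ValueError("Inconsistent DEPT signs; re-check the peak picking")


for signs in [(0, 0), (1, 1), (0, -1), (0, 1)]:
    print(signs, "->", classify_carbon(*signs))
```

DEPT-45 is not needed for the classification itself; it serves as a check that every protonated carbon was detected.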

[Figure: overlay of DEPT-45, DEPT-90, and DEPT-135 spectra]

Step 5: ¹H-¹³C HSQC

¹H-¹³C Heteronuclear Single Quantum Coherence Spectroscopy (HSQC) shows which hydrogens are directly attached to which carbon atoms. The ¹H spectrum is shown on the horizontal axis and the ¹³C spectrum is shown on the vertical axis. The HSQC spectrum is most valuable when protons have already been assigned.

For example, HSQC shows a correlation between proton 4 and the carbon at 136.113 ppm; this carbon is now assigned as carbon 4. Carbons 3, 4, 7, 8, 9, 11, and 12 are assigned by HSQC. Only 1-bond correlations are observed, so HSQC assignments are relatively straightforward. The DEPT experiments also confirm these assignments. HSQC is also useful for confirming the assignments of nitrogen- or oxygen-bound protons, since they show no signal by HSQC. This further supports the assignments of protons 6, 10, and 13.
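The bookkeeping in this step can be sketched with plain sets. The proton labels are those used in the text; the variable names are illustrative, and no chemical shifts are assumed beyond what the post states.

```python
# Proton positions assigned from the ¹H and COSY spectra.
all_protons = {3, 4, 6, 7, 8, 9, 10, 11, 12, 13}

# Proton positions that show an HSQC cross-peak, as listed in the text.
hsqc_protons = {3, 4, 7, 8, 9, 11, 12}

# A proton with an HSQC cross-peak sits on a carbon, which inherits its label.
carbon_assignments = {proton: f"C{proton}" for proton in sorted(hsqc_protons)}

# Protons without a cross-peak must be bound to a heteroatom (N or O).
heteroatom_bound = sorted(all_protons - hsqc_protons)

print("Carbons assigned by HSQC:", list(carbon_assignments.values()))
print("Protons with no HSQC signal:", heteroatom_bound)
```

The second result is the useful cross-check: the protons left over are exactly those expected on nitrogen or oxygen.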

[Figure: HSQC spectrum]

An example correlation between proton and carbon 4 is observed by HSQC.

Step 6: ¹H-¹³C HMBC

¹H-¹³C Heteronuclear Multiple Bond Correlation Spectroscopy (HMBC) shows the correlations between protons and carbons that are separated by multiple bonds. The ¹H spectrum is shown on the horizontal axis and the ¹³C spectrum is shown on the vertical axis. Correlated atoms are shown in blue and the connecting atoms are shown in red. Note that direct hydrogen-carbon bonds (1-bond correlations) are generally not seen. For example, hydrogen 4 shows correlations with carbons 1, 2, 3, 5, and 7, but not carbon 4.

HMBC Thymidine Structure

HMBC interactions between proton 4 and carbons 1, 2, 3, 5, and 7.

HMBC is incredibly useful for assigning carbons that have no protons attached. In this example, carbons 1, 2, and 5 have no protons attached. Carbon 1 is assigned by HMBC interactions with protons 3, 4, and 6; carbon 2 by interaction with protons 3, 4, 6, and 7; and carbon 5 by interactions with protons 4 and 7 only. The chemical environment of carbon 5 suggests it would appear more downfield than carbon 1, which confirms these assignments.
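The assignment logic for the proton-free carbons can be sketched as set matching. The expected proton sets are taken directly from the paragraph above; the peak labels (`peak_a` and so on) are hypothetical stand-ins for the unassigned ¹³C signals, since their shifts are not quoted in the text.

```python
# Expected HMBC partners (proton positions) for each proton-free carbon.
expected_partners = {
    1: {3, 4, 6},
    2: {3, 4, 6, 7},
    5: {4, 7},
}

# Hypothetical labels for the three unassigned carbon signals, each with
# the proton positions it correlates with in the HMBC spectrum.
observed_partners = {
    "peak_a": {3, 4, 6, 7},
    "peak_b": {3, 4, 6},
    "peak_c": {4, 7},
}


def assign_quaternary(observed, expected):
    """Assign each observed peak to the carbon whose partner set matches exactly."""
    assignments = {}
    for peak, protons in observed.items():
        matches = [carbon for carbon, exp in expected.items() if exp == protons]
        if len(matches) == 1:  # leave ambiguous or unmatched peaks unassigned
            assignments[peak] = matches[0]
    return assignments


quaternary = assign_quaternary(observed_partners, expected_partners)
print(quaternary)
```

Exact matching is a deliberately strict choice for this sketch; real HMBC spectra can miss weak correlations, so the chemical-shift argument in the text remains a necessary check.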

HMBC also confirms assignments that were based solely on the proton and COSY spectra. For example, protons 10 and 13 are differentiated by HMBC: proton 10 is confirmed by interactions with carbons 8, 9, and 11, while proton 13 is confirmed by interactions with carbons 11 and 12. HMBC supports all proton and all carbon assignments, unambiguously confirming both the structure and analysis of thymidine.

[Figure: HMBC spectrum]

At Emery Pharma, we are experts in 1D and 2D NMR characterization and structure elucidation; in fact, 2D NMR projects are some of our favorites! We have supported numerous pharmaceutical companies in full NMR characterization for API submissions to regulatory agencies, as well as complete structure elucidation of impurities. We provide a fully annotated report with images similar to those seen here and support our results with high resolution mass spectrometry and elemental analysis. For more information on our NMR services, including GLP/cGMP or R&D projects, please visit our NMR Services page, or contact us at [email protected].


2D Assignment and quantitative analysis of cellulose and oxidized celluloses using solution-state NMR spectroscopy

  • Original Research
  • Open access
  • Published: 27 July 2020
  • Volume 27, pages 7929–7953 (2020)


  • Tetyana Koso   ORCID: orcid.org/0000-0002-0429-3205 1 ,
  • Daniel Rico del Cerro   ORCID: orcid.org/0000-0002-6606-4344 1 ,
  • Sami Heikkinen 1 ,
  • Tiina Nypelö   ORCID: orcid.org/0000-0003-0158-467X 2 ,
  • Jean Buffiere 3 ,
  • Jesus E. Perea-Buceta   ORCID: orcid.org/0000-0003-0028-1271 1 ,
  • Antje Potthast   ORCID: orcid.org/0000-0003-1981-2271 4 ,
  • Thomas Rosenau   ORCID: orcid.org/0000-0002-6636-9260 4 ,
  • Harri Heikkinen   ORCID: orcid.org/0000-0002-0240-6528 5 ,
  • Hannu Maaheimo 6 ,
  • Akira Isogai   ORCID: orcid.org/0000-0001-8095-0441 7 ,
  • Ilkka Kilpeläinen   ORCID: orcid.org/0000-0001-8582-2174 1 &
  • Alistair W. T. King   ORCID: orcid.org/0000-0003-3142-9259 1  


The limited access to fast and facile general analytical methods for cellulosic and/or biocomposite materials currently stands as one of the main barriers to progress in these disciplines. Instead, a diverse set of narrow analytical techniques is typically employed; these are often time-consuming, costly, and/or not necessarily available on a daily basis for practitioners. Herein, we rigorously demonstrate a general quantitative NMR spectroscopic method for structural determination of crystalline cellulose samples. Our method relies on the use of a readily accessible ionic liquid electrolyte, tetrabutylphosphonium acetate ([P₄₄₄₄][OAc]):DMSO-d₆, for the direct dissolution of biopolymeric samples. We utilize a series of model compounds and apply now classical (nitroxyl-radical and periodate) oxidation reactions to cellulose samples, to allow for accurate resonance assignment using 2D NMR. Quantitative heteronuclear single quantum correlation (HSQC) was applied in the analysis of key samples to assess its applicability as a high-resolution technique for following cellulose surface modification. Quantitation using HSQC was possible, but only after applying T₂ correction to integral values. The comprehensive signal assignment of the diverse set of cellulosic species in this study constitutes a blueprint for the direct quantitative structural elucidation of crystalline lignocellulosics in general, using readily available solution-state NMR spectroscopy.


Introduction

Surface chemical modification of cellulosic materials is a logical approach to tune the properties and, thus, applicability of these bio-renewable polymers (Klemm et al. 2005). Unlike in small-molecule-based chemical disciplines, there is currently no established general quantitative analytical technique for cellulosics that accurately assesses chemical changes with sufficient resolution. This is in large part due to the poor solubility of cellulosic materials in common molecular solvents, preventing non-destructive solution-state analyses. This limitation has forced researchers to rely on the poorer chemical resolution of solid-state techniques or on indirect methods for characterization of samples that contain a significant crystalline cellulose phase. Typically, a succession of direct and indirect methods is applied for this task, affording partial insights. However, this process is often lengthy as a whole and its threads are difficult to bring together. Solid-state NMR, in particular, has found utility in the quantification of the different crystalline phases in celluloses (Newman 1999; Kono et al. 2002; Zuckerstätter et al. 2009). High resolution, using ultra-fast magic angle spinning (MAS), and multidimensional experiments are possible for solid-state NMR. However, spectral resolution is rather limited using typical MAS probes, preventing the accurate separation and quantitation of different chemical species. In addition, T₁ relaxation times in the solid state are typically very long, requiring labeling strategies to give sufficient signal-to-noise (S/N) for quantitative experiments. Chemical modification of nanocelluloses (Habibi et al. 2010), which by nature involves regioselective surface chemistry, represents a significant challenge due to the infancy of the field and the complexity of the materials. This has been compounded by a flood of conceptual articles that apply chemistries but lack the analytical rigor of traditional chemistry disciplines. For accurate definition of feedstocks and reaction products, a multitude of complementary methods are commonly used (Foster et al. 2018). However, until recently the one method that is irreplaceable in organic chemistry (solution-state NMR) has not been seriously considered.

The proposed solution-state NMR technique is practically applicable to all crystalline celluloses and even whole biomass samples, provided the molecular weights are not high enough to reduce spectral resolution and S/N (due to relaxation effects), as demonstrated by Holding et al. (2016). However, a specific requirement for a usable NMR method is to have accurate assignments for the structural features common to the most studied samples. Another specific requirement is to allow for quantitation of chemical species, which is somewhat limited when using NMR to analyze polymeric samples, even with basic 1D solution-state experiments. This is particularly difficult for the more complicated chemical modifications, or whole biomass samples, which would require resolution in several dimensions.

Previously, we published the use of a novel ionic liquid electrolyte, tetrabutylphosphonium acetate ([P₄₄₄₄][OAc]):DMSO-d₆, for the solution-state NMR analysis of nanocelluloses (King et al. 2018). The choice of the ionic liquid electrolyte was discussed in detail in previous articles (Deb et al. 2016; King et al. 2018). In short, the choice is closely related to the high stability of tetraalkylphosphonium cations, which prevents reaction with solutes and, thus, artifact formation. In addition, the ability to dissolve cellulose efficiently at such low ratios of [P₄₄₄₄][OAc] to DMSO allows for low-viscosity solutions and, thus, higher-resolution spectra. Furthermore, the [P₄₄₄₄][OAc] signals do not overlap with the cellulose resonances in the ¹H and ¹³C ppm domains, which makes [P₄₄₄₄][OAc] ideal for this purpose. Direct-dissolution NMR solvents based on the use of 1-ethyl-3-methylimidazolium acetate ([emim][OAc]) (Cheng et al. 2013) or tetrabutylammonium fluoride ([N₄₄₄₄]F) (Heinze et al. 2000; Östlund et al. 2009) are problematic for fine chemical analysis of cellulosics. [emim][OAc] is known to react with cellulose (Liebert and Heinze 2008; Ebner et al. 2008; Clough et al. 2015) and high-purity [N₄₄₄₄]F is very unstable in non-protic solvents (Sun and DiMagno 2005). Unfortunately, both also have signals that overlap with the cellulose backbone resonances. Alternative low-cost and unreactive perdeuterated cellulose solvents have also not yet appeared.

In this work, we provide thorough characterization for a few different cellulose substrates, using the [P₄₄₄₄][OAc]:DMSO-d₆ electrolyte, before and after applying common oxidation schemes. Three cellulose substrates were used. The first is low degree of polymerization-cellulose nanocrystals (LDP-CNC), isolated by super-critical water (sc-H₂O) extraction of microcrystalline cellulose (MCC) (Buffiere et al. 2016). This was used as it is of quite low molecular weight, offering good spectral resolution for signal assignment. The second was pristine cellulose nanocrystals (CNCs) derived from cotton. This is a representative CNC sample, also with relatively low molecular weight. The third substrate was MCC, a common cellulose model compound. The reaction products include cellulose that has been oxidized using either of two synthetically significant methods: 1) periodate oxidation (Kim et al. 2000; Nypelö et al. 2018) or 2) nitroxyl-radical (e.g., TEMPO) oxidation (Isogai et al. 2011, 2018). The spin-systems are assigned (polymeric and terminal units) using a range of common NMR methods and with the help of the monomeric (glucose, gluconic acid and glucuronic acid) and dimeric model compounds (cellobiose and cellobionic acid). Standard heteronuclear single-quantum correlation (HSQC) NMR experiments are not quantitative. Therefore, a suitable quantitative HSQC sequence was tested, with and without T₂ correction, to demonstrate the accuracy of separation and quantitation of key chemical species, before and after oxidation. The results aim to illustrate the potential of this method, not only for analysis of cellulose and cellulose derivatives, but also as a method to improve quantitation in analysis of lignocellulosics in general.

Materials and methods

Raw materials and preparation of oxidized celluloses

MCC (DP N-GPC 153) was purchased from Sigma-Aldrich. The LDP-CNCs (DP N-GPC 37, 15 wt% dispersed in water) were the precipitated ‘residue’ from the sc-H₂O extraction of microcrystalline cellulose (MCC), as described by Buffiere et al. (2016). They were freeze-dried before use to remove as much free water as reasonably possible. Nitroxyl-radical oxidation of the LDP-CNCs was carried out in the NaClO/NaClO₂ system in the presence of 4-acetamido-2,2,6,6-tetramethylpiperidine 1-oxyl (4-AcNH-TEMPO), under acidic conditions (pH 5.8), according to Hirota et al. (2009). This yielded 4-AcNH-TEMPO-oxidized LDP-CNCs (TOx-LDP-CNCs). The prepared sodium polyglucuronic acid salt form of the TOx-LDP-CNCs was acidified to pH 1.0 and separated by centrifugation with subsequent water washing and freeze-drying. Pinnick oxidation of the reducing ends of LDP-CNCs was carried out under acidic conditions (pH 5.0) in the presence of one weight equivalent of NaClO₂, to yield Pinnick-oxidized LDP-CNCs (POx-LDP-CNCs). The prepared salt form was acidified to pH 1.0 and further processed, as described above. Periodate-oxidized CNCs (NaIO₄-CNCs) were prepared from pristine CNCs (prepared from cotton by hydrolysis with H₂SO₄ followed by desulfation with HCl), as described in Nypelö et al. (2018). After oxidation with sodium periodate, a film of the oxidized CNCs was cast in a Petri dish by sonication of the cellulosic dispersion, casting and air-drying. A mixture of gluconic acid and the corresponding lactone was prepared by evaporation of an aqueous gluconic acid solution (49–53 wt%) in a rotary evaporator. Cellobionic acid was purchased from Aldox, Dept. of Food Science and Technology, BOKU, Vienna, Austria. All other solvents and chemicals were commercially available from Sigma-Aldrich and VWR, except DMSO-d₆ (Eurisotop) and 4-AcNH-TEMPO (TCI Europe), and were used as received, without further purification. Detailed procedures can be found in the Supporting Information.

Preparation of ionic liquid electrolyte and cellulosic samples

Tetrabutylphosphonium acetate ([P₄₄₄₄][OAc]) was synthesized according to an optimized method by King et al. (2018). Briefly, tetrabutylphosphonium chloride ([P₄₄₄₄]Cl) was prepared from tri-n-butylphosphine by reaction with n-butyl chloride in a Teflon-lined Parr acid-digestion reactor. [P₄₄₄₄][OAc] was obtained by metathesis reaction of [P₄₄₄₄]Cl with potassium acetate (KOAc) in isopropyl alcohol and purified by precipitation from chloroform to remove residual salts.

A stock solution of [P₄₄₄₄][OAc]:DMSO-d₆ (20:80 wt%) for solution-state NMR analysis of the cellulosic materials was prepared by dissolving crystalline [P₄₄₄₄][OAc] in DMSO-d₆ at a w/w ratio of 1:4. Direct contact of [P₄₄₄₄][OAc] with air was minimized to avoid moisture uptake. It may be preferable to use a glove-box or argon flush, especially in humid climates. In a typical sample dissolution procedure, 50 mg of dry cellulosic material was introduced into 950 mg of [P₄₄₄₄][OAc]:DMSO-d₆ stock electrolyte, in 4 mL sealed vials equipped with stirring bars. These were initially stirred at RT to see if the samples dissolved. If not, they were heated at 65 °C under inert atmosphere. Once the solutions were clear and visually isotropic, the samples were transferred into 5 mm NMR tubes (Wilmad-Labglass Co., USA) for analysis.
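The weighing arithmetic in this procedure can be checked with a short sketch. The function names are illustrative; the only inputs are the 20:80 wt% stock composition and the 50 mg / 950 mg loading given above.

```python
def stock_masses(total_mg, ionic_liquid_fraction=0.20):
    """Split a stock-electrolyte mass into ionic liquid and DMSO-d6 (20:80 wt%)."""
    ionic_liquid_mg = total_mg * ionic_liquid_fraction
    return ionic_liquid_mg, total_mg - ionic_liquid_mg


def sample_wt_percent(sample_mg, electrolyte_mg):
    """Weight percent of cellulosic material in the final solution."""
    return 100.0 * sample_mg / (sample_mg + electrolyte_mg)


il_mg, dmso_mg = stock_masses(950.0)
loading = sample_wt_percent(50.0, 950.0)

print(f"950 mg of stock holds {il_mg:.0f} mg [P4444][OAc] and {dmso_mg:.0f} mg DMSO-d6")
print(f"DMSO-d6 : ionic liquid = {dmso_mg / il_mg:.1f} : 1")
print(f"Cellulose loading = {loading:.1f} wt%")
```

The result agrees with the 1:4 w/w ratio stated for the stock and with the 5 wt% sample concentration quoted in the Results section.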

NMR experiments

Spectra were recorded using Bruker Avance III or Avance NEO 600 MHz spectrometers. The majority of the experiments were recorded using a SmartProbe™ optimized for X-nucleus detection. For some samples, an inverse triple resonance probe-head (¹H/¹⁹F, ¹³C, ³¹P) or a cryogenically-cooled quadruple resonance (¹H, ¹³C, ³¹P, ¹⁵N) probe-head was used.

The key NMR experiments are as follows:

Standard ¹H and ¹³C 1D experiments were recorded for all samples. In some cases, instead of simple 1D ¹³C experiments, ¹³C (refocused) insensitive nuclei enhanced by polarization transfer (INEPT) experiments were recorded. They provided a > 2× improvement in S/N, at the expense of the loss of quaternary signals.

Quantitative ¹³C (inverse-gated ¹H-decoupling) was run for the 4-AcNH-TEMPO oxidized LDP-CNC sample, with a repetition delay of 8 s and a 30° pulse (King et al. 2018).

Diffusion-edited 1D ¹H experiments were measured for all polymeric samples using a 1D bipolar-pulse pair stimulated echo (BPPSTE) pulse sequence (‘ledbpgp2s1d’ in the Bruker TopSpin 4.0 pulse program library).

Multiplicity-edited HSQC (Willker et al. 1993) experiments (‘hsqcedetgp’, or ‘hsqcedetgpsisp2.2’ for increased sensitivity, in the Bruker TopSpin 4.0 pulse program library) were recorded for all samples.

Quantitative Carr-Purcell-Meiboom-Gill (CPMG)-adjusted HSQC (Q-CAHSQC) experiments (Koskela et al. 2005) were recorded for the LDP-CNC, TOx-LDP-CNC and MCC samples. The sequence (‘qcahsqc’) was obtained directly from Bruker.

Diffusion- and multiplicity-edited HSQCs were measured for the LDP-CNC and TOx-LDP-CNC cellulose samples. 2D HSQC-total correlation spectroscopy (HSQC-TOCSY) (Schleucher et al. 1994) experiments (‘hsqcdietgpsisp.2’ in the Bruker TopSpin 4.0 pulse program library), with short (15 ms) and long (120 ms) TOCSY mixing times, were recorded for all required samples to aid in resonance assignment.

Heteronuclear multiple bond correlation (HMBC) (Bax and Summers 1986) experiments (‘hmbcgplpndqf’ in the Bruker TopSpin 4.0 pulse program library) were recorded for the LDP-CNC and TOx-LDP-CNC cellulose samples.

All NMR measurements were conducted at a sample temperature of 65 °C. Typically, the time-domain size in the indirect ¹³C-dimension (f1) was 1024 for HSQC and 512 for HMBC, corresponding to 512 (td1/2) and 512 (td1) actual t₁-increments in the real data, for the phase-sensitive HSQC sequences and the magnitude-mode HMBC sequence, respectively. High digital resolution was used as most samples were of quite low molecular weight. Chemical shifts in the ¹H and ¹³C ppm scales were calibrated against the DMSO-d₆ signals (2.50 ppm for residual ¹H and 39.52 ppm for ¹³C). All spectra were processed using Bruker TopSpin 4.0.6 (https://bruker.com/) and/or MestReNova 10.0.2 (https://mestrelab.com/) software. Further 1D data processing was completed using Fityk 1.3.1 (Wojdyr 2010) (https://fityk.nieto.pl/). Full NMR experimental details and conditions are given in the Supporting Information.
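The relation between the f1 time-domain size and the number of real t₁-increments stated above can be written as a one-line rule. The function name is illustrative; the rule simply restates the td1/2 (phase-sensitive) versus td1 (magnitude-mode) distinction made in the text.

```python
def real_t1_increments(td1, phase_sensitive):
    """Number of actual t1-increments for a given f1 time-domain size.

    Phase-sensitive experiments store two data sets per increment, so only
    td1/2 increments are real; magnitude-mode experiments use all td1 points.
    """
    return td1 // 2 if phase_sensitive else td1


hsqc_increments = real_t1_increments(1024, phase_sensitive=True)
hmbc_increments = real_t1_increments(512, phase_sensitive=False)

print("HSQC (td1 = 1024):", hsqc_increments, "increments")
print("HMBC (td1 = 512):", hmbc_increments, "increments")
```

The same rule gives 1024 increments for the cellobiose HSQC recorded with a time-domain size of 2048, as noted in the caption to Fig. 2.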

Results and discussion

Cellulose model and methodology choice

For NMR analysis, the samples were dissolved in the [P₄₄₄₄][OAc]:DMSO-d₆ (1:4 wt%) electrolyte at 5 wt%, at as low a temperature as possible (typically 25–80 °C). For the low DP samples this occurred rapidly at RT. This concentration of cellulosic materials allowed for detection of the low-intensity signals, such as the chain ends in the polymeric samples. All spectra were collected at the elevated temperature of 65 °C, as it offers further improvement in resolution and S/N, due to longer spin–lattice (T₁) and spin–spin (T₂) relaxation times. It is known that T₂ increases with elevated temperature, due to an approximate inverse-law relationship between T₂ and viscosity (Kim 2008). Significant improvements in resolution were also previously observed for the case of MCC dissolved in the homologous methyltrioctylphosphonium acetate ([P₈₈₈₁][OAc]):DMSO-d₆ system (Holding et al. 2016).

Literature resonance assignments of the most basic monomeric units of oxidized celluloses are incomplete. Thus, a range of monomers and dimers were studied, specifically in the [P₄₄₄₄][OAc]:DMSO-d₆ (1:4 wt%) electrolyte. Cellulose is also complicated by the fact that there are non-reducing end (NRE) and anomeric reducing-end (RE) units that differ in their chemical shifts from the corresponding species in the polymeric units. Separation of these species, using 2D correlation methods, is not guaranteed for high molecular weight samples. Hence, monomeric and dimeric models are described, in addition to the LDP-CNC sample (DP N-GPC 37). This sample is rather unique, as there are not many sources of low DP cellulose accessible in large enough quantities for synthesis and assignment of the products.

The NMR spectra of native celluloses can provide information on the average chain length of the polymer, as the signals of the reducing end and non-reducing end are relatively well separated (King et al. 2018; Heise et al. 2019; Holding et al. 2016). However, the characterization of modified cellulose samples can be complicated, as both the location and substitution pattern may vary along the polymer chain, and the high molecular weight can preclude using more sophisticated NMR techniques. In addition, as the literature data on relevant monomeric units of oxidized cellulose is incomplete, a range of monomeric, dimeric and oligomeric models were chosen or prepared to aid the spectral interpretation of the oxidized samples. These include: glucose, cellobiose, LDP-CNC, glucuronic acid, gluconic acid and cellobionic acid.

Chemical shift assignment of cellulose and modified units

The full assignments for the dimers, polymeric units and terminal units in this study are shown in Fig. 1. These are in the [P₄₄₄₄][OAc]:DMSO-d₆ (1:4 wt%) electrolyte at 65 °C, referenced against DMSO-d₆ (residual ¹H at 2.50 ppm and ¹³C at 39.52 ppm). The following sections describe how the assignments were made, along with further aspects of the study.

Fig. 1 Representative structures, atom numbering and resonance assignments for the ¹H and ¹³C NMR data sets of: (a) α-anomer of cellobiose; (b) β-anomer of cellobiose; (c) LDP-CNC β-anomer of cellulose (DP N 37); (d) LDP-CNC α-anomer of cellulose (DP N 37); (e) equilibrium between cellobionic acid (turquoise) and cellobionolactone (purple); (f) 4-AcNH-TEMPO oxidized LDP-CNCs

The HSQC, ¹³C and HSQC-TOCSY spectra for the monomers are shown in Fig. S5-S13: glucose (Fig. S5-S8), gluconic acid (Fig. S9-S11) and glucuronic acid (Fig. S12-S13). Tabulated chemical shift data, along with previous literature assignments, are shown in Tables S1 and S3: glucose (Table S1) (Roslund et al. 2008) and glucuronic acid (Table S3) (Agrawal 1992). Our assignments did not differ significantly from the literature assignments.

Cellobiose (Fig. 1a, b) consists of α and β anomers, giving 24 identifiable correlations in the HSQC and resonances in the ¹³C spectrum (Fig. 2a, b). The ratio of anomers is 34:66 (α:β) by integration of the ¹H NMR spectrum. HSQC-TOCSY spectra are shown in Fig. S14-S18, in the Supporting Information, allowing for complete assignment. The tabulated chemical shift data is also given in the Supporting Information (Table S4-S5), along with previous literature assignments in D₂O (Roslund et al. 2008), which do not show any major deviations from our data. With cellobiose, the spectra start to become considerably more complex than for the monomers, with many overlapping peaks in the C2–C5 region (65–85 ppm in the ¹³C domain). The HSQC-TOCSY spectrum with a short mixing time (15 ms, Fig. S14), which provides COSY-like correlations, was most useful for tracing the complete spin-systems for the anomers. In the case of cellobiose, which is not polymeric, we still term the glucopyranose with the hemiacetal anomeric carbon atom the RE and the one with the glycosidic C1 the NRE. HSQC-TOCSY with a long mixing time (120 ms, Fig. S18) was used to easily visualize the corresponding TOCSY correlations for the separate RE and NRE spin-systems. Full assignments are given in Fig. 1a, b and Table S4-S5. With a time-domain size of 2048 in the indirect (f1) dimension of the HSQC (Fig. 2a), the resolution starts to approach that of the ¹³C spectrum.

Fig. 2 Cellobiose spectra at 65 °C in [P₄₄₄₄][OAc]/DMSO-d₆ (5 wt%): (a) multiplicity-edited 2D HSQC (2048 time-domain data size in f1, corresponding to 1024 t₁-increments for the real spectrum); (b) ¹³C spectrum. Non-reducing end (NRE) resonances are shown in green; reducing end resonances α and β (RE-α and RE-β) are shown in red and blue, respectively

In the assignment of cross-peaks, we have tried to be consistent with the color labeling of the assigned correlations: (1) NREs are labeled in green; (2) nonmodified internal AGU correlations are labeled in black; (3) α and β REs are labeled in red and blue, respectively; (4) oxidized internal AGUs, anhydroglucopyranosiduronic acid (AGA) units, are labeled in brown; (5) where appropriate, the open (acid) form of the unit is labeled in turquoise and the closed (lactone) form in purple. This applies to all the figures, except Fig. 8. Cross-peak coloration of the HSQC spectra, unless grayscale, depends on the multiplicity: for methine (CH) and methyl (CH₃) carbons, cross-peak correlations are shown in green; for methylene (CH₂) carbons, cross-peak correlations are shown in blue.

LOW-DP cellulose

LDP-CNCs (DP N-GPC 37) consist of chains of β-(1,4)-linked glucopyranose units terminated by RE and NRE groups (Fig. 1 c, d). These are true nanocrystals (Fig. S35-S37), formed by partial depolymerization and recrystallization of MCC using sc-H 2 O. This ‘residue’ fraction (Buffiere et al. 2016) is comprised of cellulose crystallite fragments consisting of both cellulose I and cellulose II allomorphs. This is illustrated by microscopy combined with wide-angle X-ray scattering (WAXS) analysis of the material (Fig. S37), showing both the cellulose I and cellulose II phases. Some of the main distinguishable diffraction planes corresponding to cellulose I Miller indices (French 2014) are clearly visible at 15.6° for \((1{\overline 1} 0)\) and (110), and 22.3° for (200). For cellulose II there are also distinguishable peaks at 12.3° for the \((1{\overline 1} 0)\) and 20.1° for the (110) Miller indices (French 2014). Gel-permeation chromatography (GPC) showed a higher molecular weight residue, originating from MCC, but the majority of the material was present as a lower molecular weight fraction, with a peak-maximum at a DP of ~ 32 and an overall DP N-GPC of 37 (Fig. S38).

The CH-1 region in the HSQC spectrum (Fig. 3 a) was characterized by four clear signals. The signal with the highest intensity was assigned as anhydroglucose unit (AGU)-1, which belongs to the bulk polymeric CH-1 (δ H = 4.40 ppm (d); δ C = 102.38 ppm), while the remaining signals correspond to NRE-1, RE-α-1 and RE-β-1. This region is characteristic of (hemi)acetals and such close grouping is caused by the rigid conformation adopted by the sugar unit, with the α-anomer showing a characteristic down-field shift (to > 4.5 ppm) in the 1 H domain and up-field shift in the 13 C domain. Detailed assignment of the remaining HSQC correlations was then completed using HSQC-TOCSY (Fig. S20-S22) and HMBC to separate the CH-6 position correlations (Fig. S19). The ratio of α to β from the 1 H spectrum is 38:62, by deconvolution (Supporting Information Fig. S3) using ‘Fityk’ (Wojdyr 2010). The same method yielded a DP \(_{\text{N-}{^1}\text{H}}\) of 15. While there is clearly error in this calculation, we favor the lower DP value given by the NMR, as overestimation by GPC was also demonstrated in a previous publication (Heise et al. 2019), comparing ‘CCOA labelling’ (Röhrling et al. 2002) and two separate GPC systems: one calibrated using pullulan standards and one using multi-angle light scattering (MALLS) detection.
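The anomer ratio and an end-group estimate of DP N both follow from the deconvoluted H-1 peak areas. The following is only a minimal sketch: the text does not spell out the DP formula, so it assumes that DP N is the total H-1 area divided by the reducing-end H-1 area, and the function names and areas are made up for illustration rather than taken from the measured data.

```python
def anomer_ratio(area_alpha, area_beta):
    """Return the (alpha %, beta %) split of the reducing-end H-1 signal."""
    total = area_alpha + area_beta
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * area_alpha / total, 100.0 * area_beta / total


def dp_from_end_groups(area_re_alpha, area_re_beta, area_other_h1):
    """End-group estimate of DP_N (assumed formula, not quoted from the text).

    area_other_h1 is the summed H-1 area of all non-reducing-end units
    (bulk AGU-1 plus NRE-1). Each chain has exactly one reducing end, so
    the number of units per chain is total H-1 area / reducing-end H-1 area.
    """
    re_total = area_re_alpha + area_re_beta
    if re_total <= 0:
        raise ValueError("reducing-end area must be positive")
    return (re_total + area_other_h1) / re_total


# Hypothetical deconvoluted areas (arbitrary units), for illustration only.
alpha_pct, beta_pct = anomer_ratio(0.38, 0.62)
dp_n = dp_from_end_groups(0.38, 0.62, 14.0)
print(f"alpha:beta = {alpha_pct:.0f}:{beta_pct:.0f}, DP_N = {dp_n:.1f}")
```

Because the reducing-end signals sit in the denominator, any error in their small areas is amplified in the DP estimate, which is consistent with the caution expressed about this value.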

figure 3

LDP-CNCs spectra at 65 °C in [P 4444 ][OAc]:DMSO-d 6 (5 wt%): a Multiplicity-edited 2D HSQC (512 time-domain data size in f1, corresponding to 256 actual t 1 -increments); b refocused 13 C INEPT. Non-reducing end (NRE) resonances shown in green, reducing end resonances α and β (RE-α and RE-β) shown in red and blue, respectively, internal (middle chain) anhydrous glucose unit resonances (AGU) shown in black

Throughout each of the experiments, the number of t 1 -increments for the real spectrum (td1/2 for phase-sensitive HSQC) can be changed to improve resolution to the required level, to allow for separation of each signal. This is rather straightforward for low molecular weight compounds, such as glucose, cellobiose and to a lesser extent the LDP-CNCs. In this regard, the resolution in the indirectly detected 13 C-dimension in HSQC can start to approach that of the 13 C spectra, providing T 2 values are long enough to benefit from the further sampling. However, as molecular weights increase, the potential gain in resolution can often not be worth the additional collection times, with collection time proportional to the number of f1 increments. In addition, with shorter T 2 values the signals decay quickly and increased sampling will simply result in increased noise, with minimal increase in spectral resolution. Therefore, there is a trade-off between number of scans and number of increments, as molecular weight increases. To assess the resolution gain for a typical cellulose model, MCC (DP N-GPC 153), we measured the full-width at half maximum (FWHM) values (here in ppm units) in the 13 C dimension from HSQCs, gathered for different increment values (Fig. S2). The graph shows an inverse power function relating the resolution to the number of f1 increments. The optimum resolution, with little further trade-off in resolution vs collection time, can be achieved using 1024 increments, for the utilized spectral width of 24,883 Hz in the 13 C-dimension (corresponding to 24.3 Hz/pt digital resolution of the data in the f1-dimension). However, for most of the cases where good enough resolution is required for assignment of main peaks, 256 t 1 -increments is sufficient and 512 still gives a reasonable improvement. 
This can be reduced further by using a smaller spectral width, as 24,883 Hz (165 ppm) is already rather wide, encompassing much more of the 13 C ppm domain than is necessary for unmodified cellulose. As the molecular weight of the cellulose sample increases, the improvement in resolution with an increasing number of increments is less apparent due to restricted motion, resulting in faster relaxation. However, 256–512 increments (i.e. 512–1024 time-domain size in f1) are perfectly reasonable values to achieve good S/N in an overnight run for assignment of NRE and RE signals for DP N values of up to ~ 200. It should also be considered that as the molecular weight increases, the relative abundance of NRE and RE resonances also decreases.
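The figures quoted above can be checked with simple arithmetic. The sketch below reproduces the 24.3 Hz/pt digital resolution from the 24,883 Hz spectral width and 1024 points, and converts the width to ppm. The 13 C frequency of 150.9 MHz is an assumption (a nominal 600 MHz spectrometer), and the function names are illustrative.

```python
def digital_resolution_hz_per_pt(spectral_width_hz, n_points):
    """Digital resolution in the indirect dimension (Hz per point)."""
    return spectral_width_hz / n_points


def spectral_width_ppm(spectral_width_hz, observe_freq_mhz):
    """Convert a spectral width from Hz to ppm for a given nucleus frequency."""
    return spectral_width_hz / observe_freq_mhz


SW_HZ = 24883.0        # 13C spectral width quoted in the text
C13_FREQ_MHZ = 150.9   # assumption: 13C frequency on a nominal 600 MHz instrument

for n in (128, 256, 512, 1024, 2048):
    res = digital_resolution_hz_per_pt(SW_HZ, n)
    # Collection time scales linearly with the number of increments.
    rel_time = n / 256
    print(f"{n:5d} points: {res:7.1f} Hz/pt, relative time vs 256 = {rel_time:.1f}x")

print(f"width = {spectral_width_ppm(SW_HZ, C13_FREQ_MHZ):.0f} ppm")
```

Halving the spectral width at a fixed number of increments halves the Hz/pt value, which is why a narrower 13 C window is an inexpensive way to gain resolution for unmodified cellulose.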

Nitroxyl-radical oxidized cellulose

Oxidation of cellulose with nitroxyl-radicals, such as TEMPO or AcNH-TEMPO, supposedly yields selective oxidation at the surface primary 6-hydroxyls (vs secondary 2- or 3-hydroxyls) to carboxylates but should also be capable of oxidizing the RE (hemi)acetal/aldehyde. Thus, the monomer unit in oxidized cellulose should be glucuronic acid (assuming each monomer is oxidized). If the terminal RE unit is oxidized at the RE-1 position, gluconic acid should be the oxidized unit. Gluconic acid is available commercially as the sodium salt or as a solution in water, where it exists in equilibrium with the cyclic ester (lactone) form, dependent on water content. Indeed, drying mixtures of gluconic acid, even under ambient conditions, will induce lactonization (with loss of H 2 O) to the δ-gluconolactone (Fig. S9). In this study, we dried a 49–53 wt% solution of gluconic acid in water using a rotary evaporator at room temperature. The product was dissolved into the electrolyte and an HSQC spectrum was recorded (Fig. S9a). The HSQC spectrum clearly shows two separate spin-systems, i.e. the open-chain and lactone forms. Both were assigned using 2D HSQC-TOCSY and compared with the HSQC spectrum for pure δ-gluconolactone (Fig. S10). Spectra of glucuronic acid (α- and β-anomers) were also taken for reference and can be found in the Supporting Information (Fig. S12-S13).

A water slurry of LDP-CNCs (15 wt%) was oxidized under mildly acidic conditions (pH 5.8) with AcNH-TEMPO, in the presence of the NaClO/NaClO 2 oxidant system (see Supporting Information). Pinnick oxidation conditions were chosen to ensure complete conversion of aldehyde species to carboxylates. A water-soluble sample (high degree of oxidation (DO)) and a water-insoluble sample (low DO) were recovered and separated by centrifugation. However, as the highly oxidized sodium carboxylate samples do not dissolve in the electrolyte, we were forced to acidify the fractions (Fujisawa et al. 2010) for further NMR analysis. The principal structure of polyglucuronic acid is represented in Fig. 1 f, with the AGA unit as the oxidized polymeric unit.

While the insoluble fraction, as expected, consisted of minimally oxidized cellulose, the soluble TOx-LDP-CNC fraction had clearly identifiable correlations in the HSQC not corresponding to polymeric glucose resonances (Fig. 4). As with previous samples, the resonances for the AGA units were assigned using HSQC-TOCSY (Fig. S23-S26) but also using HMBC (Fig. S27), to further illustrate the linkage of the carboxylate C-6 to the H-5 position. The assignments for the AGA units were fully consistent with the 13 C assignments for polyglucuronic acid in D 2 O (Table S7 of the Supporting Information), from TEMPO oxidation of cellulose (Tahiri and Vignon 2000; Isogai et al. 2011). The RE and NRE peaks corresponding to glucose-terminated chains are also assignable. One might assume that the NRE C6-OHs should be more accessible to oxidation than any other C6-OH. However, they are clearly present, with the NRE signals more abundant than the RE signals, requiring scaling of the spectra close to the background to visualize the RE signals (Fig. 4, inset).

figure 4

Spectra of TOx-LDP-CNCs at 65 °C in [P 4444 ][OAc]:DMSO-d 6 (5 wt%): a Multiplicity-edited 2D HSQC (1024 time-domain data size in f1, corresponding to 512 actual t 1 -increments); b refocused 13 C INEPT. Non-reducing end (NRE) of cellooligomeric resonances shown in green, anhydroglucopyranosiduronic acid (AGA) unit resonances are shown in brown, internal (middle chain) non-oxidized anhydroglucose unit resonances (AGU) shown in black

Cellobionic acid

Both the common nitroxyl-radical and Pinnick oxidation conditions should lead to oxidation of the reducing ends to carboxylates (Fig.  5 ). The Pinnick (acidic chlorite) oxidation at the reducing ends of CNCs is commonly used as the first step in reducing end functionalization, typically via amide formation and leading to nano-structures with self-assembly potential (Villares et al. 2018 ; Lin et al. 2019 ). To aid in the assignment of the terminal units in the oxidized products, we obtained a commercial sample of cellobionic acid and analyzed it in the [P 4444 ][OAc]:DMSO-d 6 electrolyte (Fig.  5 ). A doubling of the peaks was observed, consistent with partial conversion to the lactone form. The sample was also observed to be somewhat unstable at 65 °C, presumably decomposing by oligomerization. Therefore, the four spin-systems were assigned at 27 °C using HSQC-TOCSY (Fig. S29) and the two spin-systems corresponding to the acid form were identified by adding a drop of water into the NMR tube, allowing for almost complete conversion of the lactone form to the acid form (Fig. S31). The final assignments for the mixture of compounds at 65 °C showed little deviation from the sample at 27 °C. Thus, suitable model assignments for the oxidized reducing ends were afforded by the open-chain acid and closed-chain lactone spin-systems (Fig.  6 ).

figure 5

Scheme and conditions for oxidation of cellulose under acidic nitroxyl-radical (e.g. TEMPO or AcNH-TEMPO) or Pinnick (NaClO 2 ) oxidation conditions

figure 6

Cellobionic acid spectra at 65 °C in [P 4444 ][OAc]:DMSO-d 6 (5 wt%): a Multiplicity-edited 2D HSQC (512 time-domain data size in f1, corresponding to 256 actual t 1 -increments); b refocused 13 C INEPT. Glucopyranose unit resonances (black), open-chain acid unit (turquoise) and lactone unit (purple). ‘A’ and ‘L’ subscripts refer to ‘acid’ and ‘lactone’ forms

Reducing end oxidation to carboxylate

Under nitroxyl-radical oxidation conditions (Hirota et al. 2009) we would have expected the RE-1 position to also be completely oxidized to carboxylate. However, expanding the acetal region in the HSQC of the TOx-LDP-CNCs and increasing the intensity shows the presence of residual anomeric CH-1 resonances (Fig. 4 a, inset). Clearly, complete oxidation of the reducing ends has not occurred. In addition to the cellobionic acid model compound, a further oxidation of the LDP-CNCs was performed under Pinnick oxidation conditions to allow for a more complete conversion of the reducing ends to gluconate moieties (or to the corresponding lactone). The HSQC spectrum for the oxidized POx-LDP-CNC product (Fig. 7 a), where the spectral scale was increased to emphasize the baseline signals, shows signals corresponding to the open-chain acid spin-system, almost identical to that of cellobionic acid, but signals corresponding to the closed-chain lactone form are absent. Reducing end signals are also present, again indicating incomplete oxidation. Similarly, if the nitroxyl-radical oxidized sample is scaled to a similar level, emphasizing baseline signals, the same peaks corresponding to the open-chain acid form are present (Fig. 7 b). HSQC-TOCSY spectra of both these samples (Fig. S29-S30) also allow for tracing of the spin-system, consistent with the cellobionic acid model (Fig. S28). Therefore, there are now unequivocal solution-state NMR assignments for this functional moiety, which can be used for further understanding/optimization of associated chemistry.

figure 7

Scaled (to emphasize baseline signals) multiplicity-edited HSQC spectra, at 65 °C in [P 4444 ][OAc]:DMSO-d 6 (5 wt%), showing presence of terminal gluconic acid moieties for: a) POx-LDP-CNCs (1024 time-domain data size in f1, corresponding to 512 actual t 1 -increments) and b) TOx-LDP-CNCs (1024 time-domain data size in f1, corresponding to 512 actual t 1 -increments)

figure 8

Multiplicity-edited HSQC spectrum at 65 °C in [P 4444 ][OAc]:DMSO-d 6 (5 wt%) (1024 time-domain data size in f1, corresponding to 512 actual t 1 -increments) of NaIO 4 -CNCs

Assignment and stability of periodate-oxidized CNCs

Finally, a typical procedure for periodate oxidation (NaIO 4 ) of cellulose nanocrystals (CNCs) was performed (Nypelö et al. 2018). The resulting NaIO 4 –CNCs were then dissolved in the electrolyte and analyzed. After analysis (overnight at 65 °C) the sample was brown, whereas the other cellulose samples (nitroxyl-radical-oxidized and unoxidized) did not discolor significantly. This may indicate some kind of degradation or possibly iodine formation, from (per)iodate residues. The HSQC spectrum showed a forest of peaks (Fig. 8), clustered mainly around the CH2-5 region, and from the high resolution of many of the signals in the 1 H spectrum (Fig. 8, top 1 H trace) it is quite clear that significant depolymerization had occurred. However, when the diffusion-edited 1 H spectrum was collected (Fig. 8, bottom 1 H trace; diffusion-editing filters out the fast-diffusing, low molecular weight species), broad signals corresponding to polymeric cellulose resonances were apparent. Based on our previous assignments for glucose (Fig. S5), cellobiose (Fig. 2) and cellulose (Fig. 3), almost all the correlations in the HSQC could be assigned, with only a few signals remaining unassigned. This indicated that a significant proportion of the cellulose was fragmented into glucose, cellobiose and likely other oligomers. Yet, polymeric cellulose also remained.
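The reason a diffusion filter retains the polymer while removing small fragments can be illustrated with the Stejskal–Tanner relation for a simple pulsed-field-gradient echo, I/I0 = exp(−D (γ g δ)² (Δ − δ/3)). The sketch below is purely illustrative: the gradient strength, timings and diffusion coefficients are hypothetical values, not the experimental parameters, and practical diffusion-edited sequences add further elements that are ignored here.

```python
import math

GAMMA_1H = 2.675e8  # 1H gyromagnetic ratio in rad s^-1 T^-1


def stejskal_tanner_attenuation(diff_coeff, grad, delta, big_delta, gamma=GAMMA_1H):
    """Fraction of signal surviving a simple PFG echo diffusion filter.

    diff_coeff : diffusion coefficient in m^2/s
    grad       : gradient strength in T/m
    delta      : gradient pulse length in s
    big_delta  : diffusion time in s
    """
    b_value = (gamma * grad * delta) ** 2 * (big_delta - delta / 3.0)
    return math.exp(-b_value * diff_coeff)


# Hypothetical filter settings and diffusion coefficients (illustration only).
GRAD, DELTA, BIG_DELTA = 0.5, 4e-3, 0.1
species = {"small sugar fragment": 2e-10, "polymeric cellulose": 2e-12}

for name, d in species.items():
    frac = stejskal_tanner_attenuation(d, GRAD, DELTA, BIG_DELTA)
    print(f"{name}: {100 * frac:.1f} % of signal retained")
```

With these assumed numbers the fast-diffusing species is attenuated to well under 1 % while the slow-diffusing polymer is almost unaffected, which is the behavior exploited in the bottom trace of Fig. 8.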

Hosoya et al. (2018) recently demonstrated that oxidation of cellulose at position 6 to carboxylate does not seem to introduce instability to cellulose, based on experimental kinetics and transition-state modelling. However, oxidation at positions 2 & 3 to ketones, and position 6 to aldehyde, does seem to introduce significant instability to cellulose. It is proposed that under alkaline conditions, β-elimination occurs, leading to fragmentation of the sugar units. As the [P 4444 ][OAc]:DMSO-d 6 NMR electrolyte is a rather basic medium, mainly attributable to the acetate anion and the absence of any protic solvating species, it is apparent that the position 2 & 3 aldehydes that are formed during periodate oxidation also introduce significant instability to the cellulose polymer. Therefore, a mechanism can be proposed (Fig. 9) which accounts for the current NMR observations: periodate oxidation proceeds by oxidizing different points along cellulose chains, on the surface of the CNCs. After dissolution into the basic electrolyte, fragmentation at these oxidation sites occurs, liberating the oligomeric, dimeric and monomeric sugars which linked the oxidation points on the surface chains. These are clearly resolvable using HSQC. Likewise, the untouched polymeric chains at the core of the CNCs are also resolvable and their presence is clearly illustrated through the diffusion-edited 1 H spectrum, which filters out all low molecular weight monomeric, dimeric and oligomeric species. This proposed mechanism is also consistent with previous mechanistic studies demonstrating that periodate oxidation of cellulose proceeds heterogeneously, by formation of oxidized domains on the crystallite surfaces (Kim et al. 2000). The ‘unknown’ low molecular weight residues that remain unassigned in the HSQC (Fig. 8) may correspond to fragments not attached to the polymeric units, resulting from C2–C3 bond cleavage. Closer examination of the diffusion-edited 1 H and HSQC spectra (Fig. S34) reveals some more complexity in the (hemi)acetal region, which may result from acetal formation with these fragments.

figure 9

Proposed mechanism for the observation of cellulose, cellooligomers, cellobiose and glucose in the HSQC spectrum of NaIO 4 -CNCs, with the [P 4444 ][OAc]:DMSO-d 6 inducing β-elimination in oxidized units (Hosoya et al. 2018 ). ‘[O]’ refers to ‘oxidation’ and the red dotted segments on the schematic surface refer to the oxidation sites, which are then cleaved upon dissolution into the basic [P 4444 ][OAc]:DMSO-d 6 electrolyte

While it seems that periodate oxidation introduces instability through β-elimination under basic conditions, this method seems to allow for assessment of that stability and may offer a further method for validating the reported increase in stability of further modification schemes, e.g., through borohydride reduction of periodate-oxidized cellulose (Potthast et al. 2009 ).

Quantitation using HSQC

One drawback of solution-state 13 C NMR analysis is the low abundance of 13 C-nucleus leading to low sensitivity. Thus, high numbers of repetitions are required in order to obtain decent S/N ratios, for adequate quantitation accuracy. This is exacerbated by the requirement for longer relaxation delays. However, there is an increasing trend of deconvolution of 1 H spectra of polymers, as S/N is much better than for 13 C. Of course, not all 1 H resonance signals are easily identified and separated by deconvolution, due to the lower resolution of 1 H. Baseline correction can also be problematic and if one wishes to quantify the RE and NRE signals using this method, the errors very rapidly become large at a DP N of > 100 (Holding et al. 2016 ). 2D HSQC provides vastly improved resolution of species over 1D experiments and significantly improved S/N over 1D 13 C-data, as it is a 1 H-detected experiment. In terms of analysis of celluloses and oxidized celluloses, separation of the main polymeric-1 cross-peak from the (nitroxyl-radical) oxidized polymeric-1 cross-peak and from the α-RE, and β-RE, cross-peaks is now very good using the [P 4444 ][OAc]:DMSO-d 6 electrolyte. This potentially allows for a rapid and reliable method for data extraction; requiring only phasing, baseline correction and 2D correlation peak integration of the raw data. NRE signals have volume overlapping with the main polymeric-1 correlation so this is not so clearly separable, except based on the assumption that there is the same amount of NRE species as there are RE species. The geminal -6 signals are also well separated from the rest of the cellulose backbone signals. Separation of other signals is possible but the error starts to increase the closer they are to each other, due to peak volume overlap. 
Samples with wider ranges of functionalities, such as lignocellulosic biomass samples or those that have resonances downfield from the cellulose polymeric-1 acetal correlations, are also easily separated.

The major drawbacks with 2D HSQC and polymeric samples are four-fold: 1) differences in 1  J( 13 C– 1 H) values for different 13 C– 1 H pairs cause variations in intensities of those correlations. Typical HSQC experiments assume an average 1  J( 13 C– 1 H) value (typically 145 Hz) for the experiments, represented in a specific INEPT polarization transfer delay. 2) Coherence transfer periods, where sufficient time is given for 1 H magnetization to evolve, cause intensity variation of the correlation peaks as each resonance has different T 2 values, i.e., mainly during the INEPT delay periods, more or less signal is lost for different resonances, prior to acquisition. 3) Correlation-peak distortions arising from evolution of homonuclear J( 1 H– 1 H) coupling during the INEPT steps can cause errors in integration. 4) Non-linear excitation bandwidth leads to variation of cross-peak signal intensity, especially across the 13 C frequency range at high field strengths. Several quantitative HSQC sequences or processing strategies have been developed that attempt to correct for these issues. Variations in 1  J( 13 C– 1 H) values have been corrected for by applying INEPT-delay modulation in the first ‘quantitative HSQC’ (Q-HSQC) experiment (Heikkinen et al. 2003 ). This corresponds to the application of a spread of INEPT delays covering the typical 1  J( 13 C– 1 H) coupling value range expected in organic materials. Signal losses due to variations in T 2 values as well as in 1  J( 13 C– 1 H) values have been accounted for in the ‘time-zero HSQC’ (HSQC 0 ) experiment (Hu et al. 2011 ), which records a loop of an increasing train of HSQC sequences (HSQC X , X = 1–3), prior to actual acquisition. This has the effect of increasing coupling and relaxation effects for each loop, which can be extrapolated back to ‘time-zero’, where potentially all effects are removed. 
Obviously for HSQC 0 , T 2 values have to allow recording of HSQC 2 and HSQC 3 data sets with intensity allowing reliable extrapolation. The ‘quick-quantitative HSQC’ (QQ-HSQC) experiment is a rather elegant method for potentially reducing the collection times by a factor of 4 (Peterson and Loening 2007 ). This encodes the INEPT-modulation into different vertical slices in the sample but only represents a possible doubling of S/N for the same collection times. However, this is only really applicable for small, slow-relaxing, molecules due to relaxation effects. The ‘quantitative CPMG-adjusted HSQC’ (Q-CAHSQC) experiment (Koskela et al. 2005 ) applies CPMG-INEPT steps to avoid cross-peak distortions, due to J( 1 H– 1 H) coupling evolution. The ‘quantitative, offset-compensated, CPMG-adjusted HSQC’ (Q-OCCAHSQC) experiment (Koskela et al. 2010 ) applies novel broad-band pulses to reduce 13 C offset errors that are most prevalent on ultrahigh-field instruments, e.g. 1000 MHz, over wider frequency ranges (> 150 ppm). This is not really a concern for cellulose samples on a 600 MHz spectrometer, where the 13 C domain is rather narrow (< 50 ppm) but may start to become an issue for lignocellulose samples at ultrahigh field. The final sequence of interest is the ‘quantitative, equal carbon HSQC’ (QEC-HSQC) experiment (Mäkelä et al. 2016 ) where a refocusing period, after the first INEPT step, is used to discard the excess CH 2 and CH 3 magnetization. This yields the same signal intensity for each protonated carbon in the sample. This could potentially be of value in the analysis of lignocellulose samples but is not really necessary at this point for systems where the well separated resonances (and their multiplicities) are already reasonably well assigned. An overview of the different HSQC experiments is given in Table 1 .
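The first drawback listed above, the dependence of cross-peak intensity on 1 J( 13 C– 1 H), can be made concrete with the standard product-operator result for a basic HSQC: each of the two INEPT steps transfers sin(πJ·2τ), with 2τ = 1/(2J set ), so the relative intensity scales as sin²(πJ/(2J set )). The sketch below assumes this idealized form, neglecting relaxation, homonuclear coupling and offset effects; the function name is illustrative.

```python
import math


def hsqc_relative_intensity(j_actual_hz, j_set_hz=145.0):
    """Relative HSQC cross-peak intensity for a coupling mismatch.

    Idealized model: INEPT and reverse-INEPT each contribute a factor
    sin(pi * J * 2tau), with the delay 2tau = 1 / (2 * J_set) tuned to an
    average coupling. Relaxation and other losses are ignored.
    """
    transfer = math.sin(math.pi * j_actual_hz / (2.0 * j_set_hz))
    return transfer ** 2


for j in (125.0, 135.0, 145.0, 160.0, 170.0):
    print(f"1J = {j:5.1f} Hz -> relative intensity {hsqc_relative_intensity(j):.3f}")
```

Under this model the loss for couplings within roughly 15 Hz of the set value is only a few percent, but it grows quickly for larger deviations, which is the variation that the INEPT-delay modulation in Q-HSQC is designed to average out.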

For our purposes, the Q-CAHSQC sequence seems to be most suitable, if we can correct the processed integral data for T 2 losses. Issues with 13 C-offset, variation in 1  J( 13 C– 1 H) and relaxation can be accounted for mathematically, to some degree, in post-processing (Zhang and Gellerstedt 2007 ). The most concerning issue for cellulose is the very fast T 2 relaxation, that the HSQC 0 sequence accounts for, but the other approaches do not. If the T 2 values are known, it is possible to adjust for signal losses by application of Eq.  1 , after integration of the 2D HSQC spectra; where ∆ is the delay period in which T 2 losses occur, V is the measured correlation peak volume and V 0 is the theoretical correlation peak volume, with no losses due to relaxation:
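Assuming simple mono-exponential transverse relaxation during ∆, and using only the symbols defined above, the correction referred to as Eq. 1 can be written as follows (a reconstruction from those definitions, offered as a sketch):

```latex
V = V_{0}\,\exp\!\left(-\frac{\Delta}{T_{2}}\right)
\quad\Longleftrightarrow\quad
V_{0} = V\,\exp\!\left(\frac{\Delta}{T_{2}}\right)
\tag{1}
```

The second form is the one applied in practice: each measured volume V is scaled up by exp(∆/T 2 ), so resonances with T 2 values comparable to or shorter than ∆ receive the largest correction.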

In order to apply this correction, T 2 values for the resonances of interest must be measured. Zhang and Gellerstedt (2007) have shown that the 2D HSQC-CPMG sequence for determination of T 2 values of cellulose triacetate gives inaccurate T 2 values. However, this is a chicken-and-egg scenario; losses due to the HSQC portion of the sequence obviously contribute to the inaccuracies, which becomes more of an issue where there is very high molecular weight material in the sample, due to disproportionate loss of signal from those resonances. Therefore, the lower resolution 1D CPMG is the only real option for determining more representative ‘average’ T 2 values for such samples. Nevertheless, loss of higher molecular weight signal during the INEPT delays is always going to be a problem. In terms of quantitation of chemical species, if the system is not complex, as is typical for cellulose samples, T 2 values can be obtained relatively easily. As such, we determined T 1 and T 2 values for the LDP-CNC, MCC and TOx-LDP-CNC samples (Table 2).

After summing the appropriate delay times leading to T 2 losses, a ∆ value of 13.9 ms was calculated for the Q-CAHSQC sequence, which is in a very similar range to some of the MCC T 2 values. Therefore, considerable signal loss is expected and needs to be corrected for. Equation 1 was applied in the correction of integral values. The main cross-peaks of interest were the polymeric CH-1 (AGU-1), NRE-1, α-RE-1, β-RE-1, oxidized polymeric CH-1 (AGA-1), AGU- gem -6 and oxidized RE-6 position (Ox-RE- gem -6). The MCC sample also clearly contained a little xylan, so the xylan 1 (AXU-1) and geminal-5 (AXU- gem -5) positions were also integrated in the MCC sample. The corrected results are given in Table 3 and were processed further to yield a few additional parameters: (1) the ratio of AGU-1 to AGU- gem -6 (AGU 1/6 ); (2) the DP N from HSQC (DP N-HSQC ); (3) the % values of α-RE-1 by HSQC (% α-HSQC ) and β-RE-1 by HSQC (% β-HSQC ); (4) the % values for AXU based on AXU-1 (% AXU-1 ) and AXU- gem -5 (% AXU-5 ); (5) the % values for oxidation of AGU to AGA (% AGA ) and RE (% RE-Ox ). In addition to the processed HSQC data, we have data from deconvolution of the 1 H spectra, for comparison: (1) the DP N from 1 H (DP \(_{\text{N-}{^1}\text{H}}\) ); (2) the % values of α-RE-1 by 1 H (%α- 1 H) and β-RE-1 by 1 H (%β- 1 H).
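As an illustration of how such a correction might be applied to integrated peak volumes, the sketch below scales each volume by exp(∆/T 2 ) with ∆ = 13.9 ms and then forms the AGU 1/6 ratio. It assumes mono-exponential T 2 decay over ∆; the volumes and T 2 values are invented placeholders, not the entries of Tables 2 and 3, and the function names are illustrative.

```python
import math

DELTA_MS = 13.9  # summed delay with T2 losses for the Q-CAHSQC sequence (from the text)


def t2_corrected_volume(volume, t2_ms, delta_ms=DELTA_MS):
    """Scale a measured cross-peak volume back to its relaxation-free value."""
    if t2_ms <= 0:
        raise ValueError("T2 must be positive")
    return volume * math.exp(delta_ms / t2_ms)


def agu_1_to_6_ratio(vol_agu1, vol_agu_gem6):
    """Ratio of AGU-1 to AGU-gem-6 volumes; the ideal value is 1."""
    return vol_agu1 / vol_agu_gem6


# Hypothetical (volume, T2 in ms) pairs, for illustration only.
measured = {"AGU-1": (1.00, 15.0), "AGU-gem-6": (1.20, 20.0)}

corrected = {
    peak: t2_corrected_volume(volume, t2) for peak, (volume, t2) in measured.items()
}

raw_ratio = agu_1_to_6_ratio(measured["AGU-1"][0], measured["AGU-gem-6"][0])
cor_ratio = agu_1_to_6_ratio(corrected["AGU-1"], corrected["AGU-gem-6"])
print(f"AGU 1/6 uncorrected = {raw_ratio:.2f}, corrected = {cor_ratio:.2f}")
```

Because the correction factor is exponential in ∆/T 2 , small errors in a short T 2 translate into large errors in the corrected volume, so the quality of the T 2 measurement limits the quality of the quantitation.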

Clearly the T 2 correction has a big impact on the integral values, especially for those with short T 2 values, i.e. for the bulk polymeric AGU-1 and AGU- gem -6. As these positions are most likely to be used for quantitation, e.g. of DS values, it is clearly critical to do the T 2 correction. To compare how effective the quantitation is, the parameter AGU 1/6 shows how accurate integration of AGU-1 and AGU- gem -6 is, with the optimum value of 1. For the LDP-CNC sample, the value improves significantly after T 2 correction. For the MCC experiments, the lower-resolution but higher S/N experiments (ns = 40, td1 = 128) gave a value of 1.00 and 0.96, for the room temperature probe-head and He-cooled cryoprobe-head, respectively. This indicates that making all efforts to maximize S/N is critical for quantitation, even at the expense of resolution. The TOx-LDP-CNC sample also gave a significant improvement in AGU 1/6 , from 0.57 to 0.95. DP N-HSQC values also changed significantly and the corrected values were more or less consistent with the DP \(_{\text{N-}{^1}\text{H}}\) values. There is a noticeable difference between the DP N-GPC and those obtained from NMR. More accurate studies validating the use of NMR against both labelling and GPC studies are needed. Practically, the HSQC method is still limited in what samples can be studied for DP N-HSQC determination, as better S/N will be required with increases in molecular weight. For the same reason that HSQC-CPMG is not suitable for T 2 determination, for cellulose samples, HSQC on such samples is going to eliminate a significant proportion of the faster relaxing high molecular weight material, artificially decreasing the DP N-HSQC values somewhat. The ratios of α-RE-1 and β-RE-1 were relatively consistent between the corrected and uncorrected values, for HSQC and 1 H deconvolution. 
However, clearly the lower abundance of RE resonances for MCC causes significant error, although, this situation can be improved using a cryoprobe-head and possibly linear prediction. AXU contents for the two MCC experiments were relatively consistent, based on AXU- gem -5 integration. However, the higher resolution experiment gave more consistent values, based on integration of AXU-1 and AXU- gem -5, as the separation of these peaks from the cellulose resonances was much better in the higher resolution case. The degrees of oxidation for TOx-LDP-CNC, % AGA and % RE-Ox , were also relatively consistent.

Overall, the CH1 peaks for the low molecular weight LDP-CNC and TOx-LDP-CNC samples are easily separable with 512 f1 increments (td1), or perhaps even fewer (depending, of course, on field strength), due to their slower T 2 relaxation (Fig. 10 a, b). With the higher molecular weight MCC, separation of the CH1 resonances is definitely improved with the higher number of increments (Fig. 10 c, d). While there is sufficient separation of the RE-1 and AGU-1 signals, so that f1 resolution can be lowered further (to allow for increased collection times), poor S/N is still an issue for the RE-1 signal in both spectra (Fig. 10 c, d). This situation is improved somewhat with the use of the cryoprobe-head, where S/N is approximately doubled (Table 3). However, if quantitation of DS values is all that is required, lower resolutions are acceptable to reduce collection times to a few hours. If resolution eventually does become an issue in quantitation of the DS of some substituent, for higher molecular weight samples, then ball milling will likely have to be applied to reduce molecular weights (Ling et al. 2019), preventing disproportionate T 2 losses. However, this requires future work with well-defined samples over wide molecular weight ranges, both polydisperse and non-polydisperse.

figure 10

2D Q-CAHSQC (quantitative HSQC) 3D projections of the CH1 region for: a LDP-CNCs, b TOx-LDP-CNCs, c MCC (ns = 8, td1 = 640) with a room temperature probe-head, d MCC (ns = 8, td1 = 640) with a cryoprobe-head, e MCC (ns = 40, td1 = 128) with a room temperature probe-head, and f MCC (ns = 40, td1 = 128) with a cryoprobe-head. F1 is the 13 C dimension and F2 is the 1 H dimension. No forward linear prediction was used to improve resolution

Further applicability

This method is ideally suited to the analysis of nanocelluloses, due to the relatively low molecular weight that these samples show, in particular CNCs. However, higher molecular weight samples are also possible, which makes this method of significant wider value for following cellulose surface chemistry, where crystallinity is maintained. Indeed, it has been possible to dissolve and collect an HSQC spectrum for even bacterial nanocellulose, in a related solvent system (Holding et al. 2016 ). The main limitation here is the faster signal relaxation, which would have a significant effect on any HSQC quantitation, relative to the cellulose backbone signals. However, quantitation through 1D spectra, with the aid of signal deconvolution, would not be affected. Thus, a combination of 1D and 2D methods can be applied, optionally using the [P 4444 ] + signals themselves as internal standard.

Several solvent systems are known for analysis of whole biomass samples (Foston et al. 2016). Mansfield et al. (2012) have recommended the use of routine HSQC experiments for quantifying biopolymer species in whole biomass samples. Their protocol demonstrates the swelling of planetary-milled wood samples in DMSO-d 6 /pyridine-d 5 (4:1) to yield ‘gelled’ samples, yet with a solvent mixture that is unable to directly dissolve cellulose. This method was said to yield similar quantitation results to those for samples which are fully processed into the solution-state, by peracetylation. However, wood samples are a difficult case, not only due to the insolubility of cellulose in common molecular solvents but also due to their recalcitrant nature in general (Deb et al. 2016; Kyllönen et al. 2013; Kilpeläinen et al. 2007). Thus, there is still some way to go to establish quantitative conditions for whole biomass samples, even with non-derivatizing direct-dissolution cellulose solvents. The cellulose portion of these materials is always the most troublesome as it is such a rigid polymer, which suffers from fast T 2 relaxation. However, if extensive milling is applied to allow for full solubilization and low enough molecular weight (further increasing T 2 values), more accurate quantitation may be close. In this context, suitable stable direct-dissolution solvents for solution-state NMR analysis of whole biomass samples have also been lacking. Cheng et al. (2013) have already demonstrated this principle, by complete dissolution and analysis of ball-milled Miscanthus in a mixture of the cellulose-dissolving ionic liquid 1-ethyl-3-methylimidazolium acetate ([emim][OAc]) and DMSO-d 6 . Perdeuterated [emim][OAc] ([emim][OAc]-d 14 ):DMSO-d 6 was then prepared and used for the application of a quantitative HSQC experiment on fully dissolved solutions. 
However, as mentioned previously, [emim][OAc] is known to react with cellulose (Liebert and Heinze 2008 ; Ebner et al. 2008 , Clough et al. 2015 ). It also has significant signal overlap with the polysaccharide spectral region and [emim][OAc]-d 14 is much too expensive and laborious to prepare, for routine analyses. Nevertheless, the [emim][OAc]:DMSO-d 6 solutions were shown to be stable over a longer period (2 weeks), whereas the molecular solvent dispersions showed phase-separation. This is a good indication that the current solvent-system may open the window to a much wider range of samples.

Conclusions

The chemical shifts of polymeric units in cellulose, including NRE and RE units, can be unambiguously assigned using solution-state NMR in a novel ionic liquid electrolyte, [P4444][OAc]:DMSO-d6. The main monomeric units in 4-AcNH-TEMPO oxidized cellulose (polyglucuronic acid) are also assigned, as are the terminal units for the unoxidized and oxidized materials. The latter has led to identification of the terminal open-chain gluconate moiety after both the acidic 4-AcNH-TEMPO protocol used here and Pinnick oxidation conditions. However, in both instances RE groups remained in the oxidized products. This indicates a further need for optimization of this reaction for different substrates, or for more defined structural characterization of substrates with respect to surface oxidation sites that may undergo β-elimination, yielding new reducing ends. Periodate oxidation clearly introduces instability into cellulose, as seen when the dry oxidation product was introduced into the basic electrolyte medium. The degradation is thought to proceed similarly to the β-elimination mechanisms illustrated in previous publications for aqueous alkaline media, but further study is required to elucidate the mechanism. Further investigations into how to stabilize the periodate oxidation products against basic degradation, by further chemical modification, are needed, as periodate oxidation of cellulose is a widely used technique. However, NMR analysis in the electrolyte medium seems to be a useful probe of the stability of these compounds, in addition to providing chemical species resolution that other techniques cannot. Of course, this is also a direct method to follow the progress of oxidation reactions. Nitroxyl-radical-type oxidations (to 6-position carboxylates) under mildly acidic conditions seem to be quite robust in terms of product stability in the electrolyte and under aqueous alkaline conditions. Thus, avoiding aldehyde formation under alkaline oxidation conditions is clearly important for improving the quality of the oxidized products, by preventing losses and molecular weight reduction due to fragmentation of surface chains. Q-CAHSQC with T2 correction seems to be a suitable experiment and processing combination to yield quantitative data from HSQC without calibration against internal standards. While this is still not suitable for accurate determination of DPN for higher molecular weight and low polydispersity samples, accurate DS and regioselectivity determination will be possible for certain chemical modifications, even at reducing ends in lower molecular weight samples, such as model CNCs. However, it should be stressed that this solvent system and processing strategy are not only applicable to nanocelluloses but also offer the chance to significantly improve quantitative analysis of whole biomass samples that contain a significant crystalline cellulose phase.

Agrawal PK (1992) NMR spectroscopy in the structural elucidation of oligosaccharides and glycosides. Phytochemistry 31:3307–3330. https://doi.org/10.1016/0031-9422(92)83678-r

Bax A, Summers MF (1986) Proton and carbon-13 assignments from sensitivity-enhanced detection of heteronuclear multiple-bond connectivity by 2D multiple quantum NMR. J Am Chem Soc 108:2093–2094. https://doi.org/10.1021/ja00268a061

Buffiere J, Ahvenainen P, Borrega M et al (2016) Supercritical water hydrolysis: a green pathway for producing low-molecular-weight cellulose. Green Chem 18:6516–6525. https://doi.org/10.1039/C6GC02544G

Cheng K, Sorek H, Zimmermann H, Wemmer DE, Pauly M (2013) Solution-state 2D NMR spectroscopy of plant cell walls enabled by a dimethylsulfoxide-d6/1-ethyl-3-methylimidazolium acetate solvent. Anal Chem 85:3213–3221. https://doi.org/10.1021/ac303529v

Clough MT, Geyer K, Hunt PA, Son S, Vagt U, Welton T (2015) Ionic liquids: not always innocent solvents for cellulose. Green Chem 17:231–243. https://doi.org/10.1039/C4GC01955E

Deb S, Labafzadeh SR, Liimatainen U et al (2016) Application of mild autohydrolysis to facilitate the dissolution of wood chips in direct-dissolution solvents. Green Chem 18:3286–3294. https://doi.org/10.1039/C6GC00183A

Ebner G, Schiehser S, Potthast A, Rosenau T (2008) Side reaction of cellulose with common 1-alkyl-3-methylimidazolium-based ionic liquids. Tetrahedron Lett 49:7322–7324. https://doi.org/10.1016/j.tetlet.2008.10.052

Foston M, Samuel R, He J, Ragauskas AJ (2016) A review of whole cell wall NMR by the direct-dissolution of biomass. Green Chem 18:608–621. https://doi.org/10.1039/c5gc02828k

Foster EJ, Moon RJ, Agarwal UP, Bortner MJ, Bras J, Camarero-Espinosa S, Chan KJ, Clift MJD, Cranston ED, Eichhorn SJ, Fox DM, Hamad WY, Heux L, Jean B, Korey M, Nieh W, Ong KJ, Reid MS, Renneckar S, Roberts R, Shatkin JA, Simonsen J, Stinson-Bagby K, Wanasekara N, Youngblood J (2018) Current characterization methods for cellulose nanomaterials. Chem Soc Rev 47:2609–2679. https://doi.org/10.1039/C6CS00895J

French AD (2014) Idealized powder diffraction patterns for cellulose polymorphs. Cellulose 21:885–896. https://doi.org/10.1007/s10570-013-0030-4

Fujisawa S, Isogai T, Isogai A (2010) Temperature and pH stability of cellouronic acid. Cellulose 17:607–615. https://doi.org/10.1007/s10570-010-9407-9

Habibi Y, Lucia LA, Rojas OJ (2010) Cellulose nanocrystals: chemistry, self-assembly, and applications. Chem Rev 110:3479–3500. https://doi.org/10.1021/cr900339w

Heikkinen S, Toikka MM, Karhunen PT, Kilpeläinen I (2003) Quantitative 2D HSQC (Q-HSQC) via suppression of J-dependence of polarization transfer in NMR spectroscopy: application to wood lignin. J Am Chem Soc 125:4362–4367. https://doi.org/10.1021/ja029035k

Heinze T, Dicke R, Koschella A, Henning Kull A, Klohr E-A, Koch W (2000) Effective preparation of cellulose derivatives in a new simple cellulose solvent. Macromol Chem Phys 201:627–631. https://doi.org/10.1002/(SICI)1521-3935(20000301)201:6<627:AID-MACP627>3.0.CO;2-Y

Heise K, Koso T, Pitkänen L, Potthast A, King AWT, Kostiainen MA, Kontturi E (2019) Knoevenagel condensation for modifying the reducing end groups of cellulose nanocrystals. ACS Macro Lett 8:1642–1647. https://doi.org/10.1021/acsmacrolett.9b00838

Hirota M, Tamura N, Saito T, Isogai A (2009) Oxidation of regenerated cellulose with NaClO2 catalyzed by TEMPO and NaClO under acid-neutral conditions. Carbohydr Polym 78:330–335. https://doi.org/10.1016/j.carbpol.2009.04.012

Holding AJ, Mäkelä V, Tolonen L et al (2016) Solution-state one- and two-dimensional NMR spectroscopy of high-molecular-weight cellulose. ChemSusChem 9:880–892. https://doi.org/10.1002/cssc.201501511

Hosoya T, Bacher M, Potthast A et al (2018) Insights into degradation pathways of oxidized anhydroglucose units in cellulose by β-alkoxy-elimination: a combined theoretical and experimental approach. Cellulose 25:3797–3814. https://doi.org/10.1007/s10570-018-1835-y

https://bruker.com/ Bruker TopSpin 4.0

https://fityk.nieto.pl/ Fityk 1.3.1

https://mestrelab.com/ MestReNova 10.0

Hu K, Westler WM, Markley JL (2011) Simultaneous quantification and identification of individual chemicals in metabolite mixtures by two-dimensional extrapolated time-zero 1H–13C HSQC (HSQC0). J Am Chem Soc 133:1662–1665. https://doi.org/10.1021/ja1095304

Isogai A, Saito T, Fukuzumi H (2011) TEMPO-oxidized cellulose nanofibers. Nanoscale 3:71–85. https://doi.org/10.1039/C0NR00583E

Isogai A, Hänninen T, Fujisawa S, Saito T (2018) Review: Catalytic oxidation of cellulose with nitroxyl radicals under aqueous conditions. Prog Polym Sci 86:122–148. https://doi.org/10.1016/j.progpolymsci.2018.07.007

Kim TH (2008) Pulsed NMR: Relaxation times as function of viscocity and impurities. 1–5

Kim U-J, Kuga S, Wada M, Okano T, Kondo T (2000) Periodate oxidation of crystalline cellulose. Biomacromol 1:488–492. https://doi.org/10.1021/bm0000337

Kilpeläinen I, Xie H, King AWT, Granström M, Heikkinen S, Argyropoulos DS (2007) Dissolution of Wood in Ionic Liquids. J Agric Food Chem 55:9142–9148. https://doi.org/10.1021/jf071692e

King AWT, Mäkelä V, Kedzior SA et al (2018) Liquid-state NMR analysis of nanocelluloses. Biomacromol 19:2708–2720. https://doi.org/10.1021/acs.biomac.8b00295

Klemm D, Heublein B, Fink H-P, Bohn A (2005) Cellulose: fascinating biopolymer and sustainable raw material. Angewandte Chemie Int Ed 44:3358–3393. https://doi.org/10.1002/anie.200460587

Kono H, Yunoki S, Shikano T, Fujiwara M, Erata T, Takai M (2002) CP/MAS 13C NMR study of cellulose and cellulose derivatives. 1. Complete assignment of the CP/MAS 13C NMR spectrum of the native cellulose. J Am Chem Soc 124:7506–7511. https://doi.org/10.1021/ja010704o

Koskela H, Kilpeläinen I, Heikkinen S (2005) Some aspects of quantitative 2D NMR. J Magn Reson 174:237–244. https://doi.org/10.1016/j.jmr.2005.02.002

Koskela H, Heikkilä O, Kilpeläinen I, Heikkinen S (2010) Quantitative two-dimensional HSQC experiment for high magnetic field NMR spectrometers. J Magn Reson 202:24–33. https://doi.org/10.1016/j.jmr.2009.09.021

Kyllönen L, Parviainen A, Deb S, Lawoko M, Gorlov M, Kilpeläinen I, King AWT (2013) On the solubility of wood in non-derivatising ionic liquids. Green Chem 15:2374–2378. https://doi.org/10.1039/C3GC41273C

Liebert T, Heinze T (2008) Interaction of ionic liquids with polysaccharides 5. Solvents and reaction media for the modification of cellulose. BioResources 3:576–601

Lin F, Cousin F, Putaux J-L, Jean B (2019) Temperature-controlled star-shaped cellulose nanocrystal assemblies resulting from asymmetric polymer grafting. ACS Macro Lett 8:345–351. https://doi.org/10.1021/acsmacrolett.8b01005

Ling Z, Wang T, Makarem M, Santiago Cintrón M, Cheng HN, Kang X, Bacher M, Potthast A, Rosenau T, King H, Delhom CD, Nam S, Edwards JV, Kim SH, Xu F, French AD (2019) Effects of ball milling on the structure of cotton cellulose. Cellulose 26:305–328. https://doi.org/10.1007/s10570-018-02230-x

Mansfield SD, Kim H, Lu F, Ralph J (2012) Whole plant cell wall characterisation using solution-state 2D NMR. Nat Protoc 7:1579–1589. https://doi.org/10.1038/nprot.2012.064

Mäkelä V, Helminen JKJ, Kilpeläinen I, Heikkinen S (2016) Quantitative, equal carbon response HSQC experiment QEC HSQC. J Magn Reson 271:34–39. https://doi.org/10.1016/j.jmr.2016.08.003

Newman RH (1999) Estimation of the lateral dimensions of cellulose crystallites using 13C NMR signal strengths. Solid State Nucl Magn Reson 15:21–29. https://doi.org/10.1016/S0926-2040(99)00043-0

Nypelö T, Amer H, Konnerth J et al (2018) Self-standing nanocellulose janus-type films with aldehyde and carboxyl functionalities. Biomacromol 19:973–979. https://doi.org/10.1021/acs.biomac.7b01751

Östlund Å, Lundberg D, Nordstierna L, Holmberg K, Nydén M (2009) Dissolution and gelation of cellulose in TBAF/DMSO solutions: the roles of fluoride ions and water. Biomacromol 10:2401–2407. https://doi.org/10.1021/bm900667q

Peterson DJ, Loening NM (2007) QQ-HSQC: a quick, quantitative heteronuclear correlation experiment for NMR spectroscopy. Magn Reson Chem 45:937–941. https://doi.org/10.1002/mrc.2073

Potthast A, Schiehser S, Rosenau T, Kostic M (2009) Oxidative modifications of cellulose in the periodate system—reduction and beta-elimination reactions. Holzforschung 63:12–17. https://doi.org/10.1515/HF.2009.108

Roslund MS, Tähtinen P, Niemitz M, Sjöholm R (2008) Complete assignments of the 1H and 13C chemical shifts and J(H,H) coupling constants in NMR spectra of D-glucopyranose and all D-glucopyranosyl-D-glucopyranosides. Carbohydr Res 343:101–112. https://doi.org/10.1016/j.carres.2007.10.008

Röhrling J, Potthast A, Rosenau T, Lang T, Ebner G, Sixta H, Kosma P (2002) A novel method for the determination of carbonyl groups in cellulosics by fluorescence labeling. 1. Method development. Biomacromol 3:959–968. https://doi.org/10.1021/bm020029q

Schleucher J, Schwendinger M, Sattler M et al (1994) A general enhancement scheme in heteronuclear multidimensional NMR employing pulsed field gradients. J Biomol NMR 4:301–306. https://doi.org/10.1007/BF00175254

Sun H, DiMagno SG (2005) Anhydrous tetrabutylammonium fluoride. J Am Chem Soc 127:2050–2051. https://doi.org/10.1021/ja0440497

Villares A, Moreau C, Cathala B (2018) Star-like supramolecular complexes of reducing-end-functionalized cellulose nanocrystals. ACS Omega 3:16203–16211. https://doi.org/10.1021/acsomega.8b02559

Willker W, Leibfritz D, Kerssebaum R, Bermel W (1993) Gradient selection in inverse heteronuclear correlation spectroscopy. Magn Reson Chem 31:287–292. https://doi.org/10.1002/mrc.1260310315

Wojdyr M (2010) Fityk: a general-purpose peak fitting program. J Appl Cryst 43:1126–1128. https://doi.org/10.1107/S0021889810030499

Zhang L, Gellerstedt G (2007) Quantitative 2D HSQC NMR determination of polymer structures by selecting suitable internal standard references. Magn Reson Chem 45:37–45. https://doi.org/10.1002/mrc.1914

Zuckerstätter G, Schild G, Wollboldt RT, Weber HK, Sixta H (2009) The elucidation of cellulose supramolecular structure by 13C CP-MAS NMR. Lenzinger Berichte 87:38–46

Acknowledgments

Open access funding provided by University of Helsinki including Helsinki University Central Hospital. The authors would like to thank the Academy of Finland for funding under the project ‘WTF-Click-Nano’ (Project #: 311255). The authors would also like to thank Prof. Herbert Sixta for help in choosing the model cellulose materials.

Author information

Authors and affiliations

Materials Chemistry Division, Department of Chemistry, Faculty of Science, University of Helsinki, Kumpula Campus, Helsinki, Finland

Tetyana Koso, Daniel Rico del Cerro, Sami Heikkinen, Jesus E. Perea-Buceta, Ilkka Kilpeläinen & Alistair W. T. King

Division of Applied Chemistry, Department of Chemistry and Chemical Engineering, Chalmers University of Technology, Gothenburg, Sweden

Tiina Nypelö

Department of Bioproducts and Biosystems, School of Chemical Engineering, Aalto University, Espoo, Finland

Jean Buffiere

Institute for Chemistry of Renewables, Department of Chemistry, University of Natural Resources and Life Sciences Vienna (BOKU), Wien, Austria

Antje Potthast & Thomas Rosenau

The Finnish Institute for the Verification of the Chemical Weapons Convention (VERIFIN), Helsinki, Finland

Harri Heikkinen

VTT Technical Research Centre of Finland Ltd., Espoo, Finland

Hannu Maaheimo

Department of Biomaterial Sciences, The University of Tokyo, Tokyo, Japan

Akira Isogai

Corresponding authors

Correspondence to Tetyana Koso or Alistair W. T. King.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 9071 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Koso, T., Rico del Cerro, D., Heikkinen, S. et al. 2D Assignment and quantitative analysis of cellulose and oxidized celluloses using solution-state NMR spectroscopy. Cellulose 27, 7929–7953 (2020). https://doi.org/10.1007/s10570-020-03317-0

Received: 15 April 2020

Accepted: 25 June 2020

Published: 27 July 2020

Issue Date: September 2020

DOI: https://doi.org/10.1007/s10570-020-03317-0

  • Nitroxyl radical
  • Ionic liquid
  • Cellulose dissolution
  • Quantitative HSQC

Types of 2D NMR

Two-dimensional (2D) NMR spectroscopy includes:

Homonuclear

  • Through bond: COSY, TOCSY, 2D-INADEQUATE, 2D-ADEQUATE
  • Through space: NOESY, ROESY

Heteronuclear correlation

  • One-bond correlation: HSQC, HMQC
  • Long-range correlation: HMBC

Examples of 2D spectral assignment

Assignment of 12,14-di-t-butylbenzo[g]chrysene

Assignment of cholesteryl acetate

In a 1D-NMR experiment, data acquisition takes place right after the pulse sequence. This order is maintained in more complex experiments, although a preparation phase is added before the acquisition. In a 2D-NMR experiment, however, the acquisition stage is separated from the excitation stage by intermediate stages called evolution and mixing. Evolution continues for a period of time labeled t1. Data acquisition comprises a large number of spectra acquired as follows: the first time, t1 is set close to zero and the first spectrum is acquired. The second time, t1 is increased by Δt and another spectrum is acquired. This process of incrementing t1 and acquiring spectra is repeated until there is enough data for analysis using a 2D Fourier transform. The spectrum is usually represented as a topographic map in which one axis, f1, is the spectrum in the t1 dimension and the second axis, f2, is the one acquired after the evolution and mixing stages (similar to 1D acquisition). The more intense a signal is, the stronger the color in which it is shown.

In the resulting topographic map the signals are a function of two frequencies, f1 and f2. A signal may appear at one frequency (e.g., 20 Hz) in f1 and at another frequency (e.g., 80 Hz) in f2, which means that the signal's frequency changed during the evolution time. A 2D-NMR experiment measures magnetization transfer. Sometimes this occurs through bonds to the same type of nucleus, as in COSY, TOCSY and INADEQUATE, or to another type of nucleus, as in HSQC and HMBC, or through space, as in NOESY and ROESY.

The various 2D-NMR techniques are useful when 1D-NMR is insufficient, such as when signals overlap because their resonant frequencies are very similar. 2D-NMR techniques can also save time, especially when the connectivity between different types of nuclei (e.g., proton and carbon) is of interest.

The basic 2D NMR experiment (Fig. 1) consists of a pulse sequence that excites the nuclei with two pulses or groups of pulses and then records the free induction decay (FID). The groups of pulses may be purely radiofrequency (rf) or may include magnetic gradient pulses. The acquisition is carried out many times, incrementing the delay between the two pulse groups each time. This delay, the evolution time, is labeled t1, and the acquisition time is labeled t2.

Fig. 1. Basic pulse sequence for 2D acquisition

The FID is then Fourier transformed in both directions (Fig. 2) to yield the spectrum. The spectrum is conventionally displayed as a contour diagram. The evolution frequency is labeled f1, and the acquisition frequency is labeled f2 and plotted from right to left.

Fig. 2. 2D Fourier transform
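The acquisition and transform steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not spectrometer software: the FID is synthetic, the function names (`simulate_2d_fid`, `transform_2d`, `strongest_peak_hz`) are hypothetical, the 64 increments, 1/256 s dwell time and 0.1 s decay constant are arbitrary, and a plain discrete Fourier transform stands in for the optimized FFT a real package would use. The 20 Hz and 80 Hz frequencies are the example values from the text.

```python
import cmath
import math

def dft(samples):
    """Discrete Fourier transform of a 1D list of complex samples."""
    n = len(samples)
    # Precompute the n complex roots of unity shared by every output bin.
    roots = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [
        sum(samples[m] * roots[(k * m) % n] for m in range(n))
        for k in range(n)
    ]

def simulate_2d_fid(f1_hz, f2_hz, n_points, dwell_s, decay_s):
    """One row per t1 increment; each row is an FID sampled along t2."""
    fid = []
    for i in range(n_points):          # t1 is stepped by dwell_s each time
        t1 = i * dwell_s
        row = []
        for j in range(n_points):      # t2 is the directly detected time
            t2 = j * dwell_s
            phase = 2 * math.pi * (f1_hz * t1 + f2_hz * t2)
            row.append(math.exp(-(t1 + t2) / decay_s) * cmath.exp(1j * phase))
        fid.append(row)
    return fid

def transform_2d(fid):
    """Fourier transform along t2 (rows), then along t1 (columns)."""
    rows_done = [dft(row) for row in fid]
    n1, n2 = len(rows_done), len(rows_done[0])
    cols_done = [dft([rows_done[i][j] for i in range(n1)]) for j in range(n2)]
    # cols_done[j][k] holds (f2 bin j, f1 bin k); return it indexed [f1][f2].
    return [[cols_done[j][k] for j in range(n2)] for k in range(n1)]

def strongest_peak_hz(spectrum, dwell_s):
    """Return (f1, f2) in Hz of the most intense point.

    The bin-to-Hz conversion is only valid for bins in the lower half of
    each axis, i.e. for positive frequencies below half the spectral width.
    """
    n1, n2 = len(spectrum), len(spectrum[0])
    k1, k2 = max(
        ((a, b) for a in range(n1) for b in range(n2)),
        key=lambda ab: abs(spectrum[ab[0]][ab[1]]),
    )
    return k1 / (n1 * dwell_s), k2 / (n2 * dwell_s)

fid = simulate_2d_fid(f1_hz=20.0, f2_hz=80.0, n_points=64,
                      dwell_s=1 / 256, decay_s=0.1)
peak_f1, peak_f2 = strongest_peak_hz(transform_2d(fid), dwell_s=1 / 256)
print(peak_f1, peak_f2)
```

With these settings the digital resolution is 4 Hz in both dimensions, so both example frequencies fall exactly on a grid point and the single cross-peak is found at 20 Hz in f1 and 80 Hz in f2.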

The 2D spectrum is usually plotted with its 1D projections for clarity. These may be genuine projections or the equivalent 1D spectra. In a homonuclear spectrum there is usually a diagonal (with the exception of 2D-INADEQUATE) that represents the correlation of peaks to themselves and is not in itself very informative. The signals away from the diagonal represent correlations between two signals and are used for assignment. For example, in the homonuclear COSY spectrum in Fig. 3, the 1H signal at 1.4 ppm correlates with the 1H signal at 2.8 ppm because there are cross-peaks between them, but neither correlates with the signals at 7.3 ppm.

Fig. 3. 2D COSY spectrum of ethylbenzene

In a heteronuclear spectrum there are no diagonal signals and all the signals represent correlations. For example, in the heteronuclear HSQC short-range correlation spectrum in Fig. 4, the 1H signal at 1.4 ppm correlates with the 13C signal at 15.7 ppm, the 1H signal at 2.8 ppm correlates with the 13C signal at 29.0 ppm, etc.

Fig. 4. 2D HSQC spectrum of ethylbenzene

The signals in a 2D spectrum are not always pure phase. Sometimes the phase cannot be expressed simply, as in HMBC and 2D-INADEQUATE, in which case a magnitude spectrum is plotted. However, magnitude spectra sacrifice resolution compared with pure-phase spectra (and, unlike window functions that broaden lines, do not yield sensitivity gains). Therefore, wherever possible, the 2D spectrum should be phased. The resulting signals may be pure phase, anti-phase or negatively phased, as in Fig. 5. Negative signals are conventionally represented by dotted or red contours.

Fig. 5. Possible phases for a correlation between two doublets
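The resolution cost of a magnitude spectrum can be made concrete with a short numerical sketch. It assumes an ideal Lorentzian line, which is a simplification of real 2D lineshapes, and the function names are hypothetical. The magnitude lineshape is the square root of the sum of the squared absorption and dispersion components, and the broad dispersion tails are what widen it.

```python
import math

def absorption(offset_hz, half_width_hz):
    """Pure-phase (absorption) Lorentzian; half_width_hz is its half width."""
    return half_width_hz / (half_width_hz ** 2 + offset_hz ** 2)

def magnitude(offset_hz, half_width_hz):
    """Magnitude lineshape: sqrt(absorption^2 + dispersion^2)."""
    dispersion = offset_hz / (half_width_hz ** 2 + offset_hz ** 2)
    return math.hypot(absorption(offset_hz, half_width_hz), dispersion)

def full_width_at_half_max(lineshape, half_width_hz, step_hz=0.001):
    """Scan outward from the peak centre until the signal drops to half."""
    peak = lineshape(0.0, half_width_hz)
    offset = 0.0
    while lineshape(offset, half_width_hz) > peak / 2:
        offset += step_hz
    return 2 * offset

narrow = full_width_at_half_max(absorption, 1.0)
broad = full_width_at_half_max(magnitude, 1.0)
print(round(narrow, 2), round(broad, 2), round(broad / narrow, 2))
```

For this idealized line the magnitude peak comes out about 1.73 times (the square root of 3) wider at half height than the phased absorption peak, which is why phasing is preferred wherever the experiment allows it.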

2D NMR

2D NMR is the best method to determine both homo- and heteronuclear connectivity in molecules. These interactions can be either through bonds or through space, allowing for complete assignment of molecular structure in three dimensions.

A guide to running 2D experiments on our Varian/Agilent spectrometers is included. Those wishing to run 2D or 3D experiments on our Bruker instruments should contact the staff.

gHSQC and gHMBC

This page provides the user with the necessary tools and commands to run a gHSQC or gHMBC experiment on a Varian/Agilent spectrometer.

gCOSY

Learn to run the simplest of the COSY experiments on a Varian/Agilent System.

  • Open access
  • Published: 18 October 2022

Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

  • Piotr Klukowski (ORCID: orcid.org/0000-0003-1045-3487)
  • Roland Riek (ORCID: orcid.org/0000-0002-6333-066X)
  • Peter Güntert (ORCID: orcid.org/0000-0002-2911-7574)

Nature Communications volume 13, Article number: 6151 (2022)

  • Machine learning
  • Solution-state NMR

Nuclear Magnetic Resonance (NMR) spectroscopy is a major technique in structural biology with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate structures and dynamics of small and medium size proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work of a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. We present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 Å median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.

Introduction

Studying structures of proteins and ligand-protein complexes is one of the most influential endeavors in molecular biology and rational drug design. All key structure determination techniques, X-ray crystallography, electron microscopy, and NMR spectroscopy, have led to remarkable discoveries, but suffer from their respective experimental limitations. NMR can elucidate structures and dynamics of small and medium size proteins in solution [1] and even in living cells [2]. However, the analysis of NMR spectra and the resonance assignment, which are indispensable for NMR studies, remain time-consuming even for a skilled and experienced spectroscopist. Owing to this, the percentage of NMR protein structures in the Protein Data Bank (PDB) has decreased from a maximum of 14.6% in 2007 to 7.3% in 2021 (https://www.rcsb.org/stats). The problem has sparked research towards automating different tasks in NMR structure determination [3, 4], including peak picking [5–9], resonance assignment [10–12], and the identification of distance restraints [13, 14]. Several of these methods are available as webservers [15, 16]. This enabled semi-automatic [17, 18] but not yet unsupervised automation of the entire NMR structure determination process, except for a very small number of favorable proteins [7, 19].

The advance of machine learning techniques [20] now offers unprecedented possibilities for reliably replacing decisions of human experts by efficient computational tools. Here, we present a method that achieves this goal for NMR assignment and structure determination. We show for a diverse set of 100 proteins that NMR resonance assignments and protein structures can be determined within hours after completing the NMR measurements. Our method, Artificial Intelligence for NMR Applications, ARTINA (Fig. 1), combines machine learning for tasks that are difficult to model otherwise with existing algorithms: evolutionary optimization for resonance assignment with FLYA [12], chemical shift database searches for torsion angle restraint generation with TALOS-N [21], ambiguous distance restraints, network-anchoring and constraint combination for NOESY assignment [14, 22], and simulated annealing by torsion angle dynamics for structure calculation with CYANA [23]. Machine learning is used in multiple flavors: deep residual neural networks [24] for visual spectrum analysis to identify peak positions (pp-ResNet) and to deconvolve overlapping signals (deconv-ResNet) in 25 different types of spectra (Supplementary Table 1), kernel density estimation (KDE) to reconstruct original peak positions in folded spectra, a deep graph neural network [25, 26] (GNN) for chemical shift estimation within the refinement of chemical shift assignments, and a gradient boosted trees [27] (GBT) model for the selection of structure proposals.

Fig. 1. The flowchart presents the interplay between the main components of the automated protein structure determination workflow: Residual Neural Network (ResNet), FLYA automated chemical shift assignment, Graph Neural Network (GNN), Gradient Boosted Trees (GBT), and CYANA structure calculation.

A major challenge in developing ARTINA was the collection and preparation of a large training data set that is required for machine learning, because, in contrast to assignments and structures, NMR spectra are generally not archived in public data repositories. Instead, we were obliged to collect from different sources and standardize complete sets of multidimensional NMR spectra for the assignment and structure determination of 100 proteins.

In the following work, we describe the algorithm, training and test data, and results of ARTINA automated structure determination, which are on par with those achieved in weeks or months of human experts’ labor.

Benchmark dataset

One of the major obstacles for developing deep learning solutions for protein NMR spectroscopy is the lack of a large-scale standardized benchmark dataset of protein NMR spectra. To date, published manuscripts presenting the most notable methods for computational NMR typically refer to fewer than 50 2D/3D/4D NMR spectra in their experimental sections. Even the well-recognized CASD-NMR competition cannot serve as a major source of training data for deep learning, since only the NOESY spectra of 10 proteins were used in the last round of the event [28].

To make our study possible, we established a standardized benchmark of 1329 2D/3D/4D NMR spectra, which allows 100 proteins to be recalculated using their original spectral data (Fig. 2 and Supplementary Table 2). Each protein record in our dataset contains 5–20 spectra together with manually identified chemical shifts (usually depositions at the Biological Magnetic Resonance Data Bank, BMRB) and the previously determined (“ground truth”) protein structure (PDB record; Supplementary Table 3). The benchmark covers protein sizes typically studied by NMR spectroscopy with sequence lengths between 35 and 175 residues (molecular mass 4–20 kDa).

Fig. 2. PDB codes (or names, MH04, MDM2, KRAS4B, if PDB code unavailable) of the 100 benchmark proteins are ordered by the number of residues. The histogram shows the number of spectra for backbone assignment, side-chain assignment, and NOE measurement. Spectrum types in each data set are shown by light to dark blue circles indicating the number of individual spectra of the given type. The percentages of benchmark records that contain a given spectrum type are given at the top. Spectrum types present in less than 5% of the data sets have been omitted.

Automated protein structure determination

The accuracy of protein structure determination with ARTINA was evaluated in a 5-fold cross-validation experiment with the aforementioned benchmark dataset. Five instances of pp-ResNet and GBT were trained, each one using data from about 80% of the proteins for training and the remaining ones for testing. Since each protein was present exactly once in the test set, reported quality metrics were obtained directly in the cross-validation experiment, and no averaging between data splits was required. To deploy pp-ResNet and GBT models in our online system, we constructed an ensemble by averaging predictions of all 5 cross-validation models. The other models were trained only once using either generated data (deconv-ResNet, Supplementary Fig.  1 ) or BMRB depositions excluding all benchmark proteins (GNN, KDE).
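The protein-level data split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`protein_folds`, `train_test_splits`, `ensemble_score`), the random seed, and the round-robin fold assignment are hypothetical. The text only states that each protein appears in exactly one test set and that the deployed ensemble averages the predictions of the 5 cross-validation models.

```python
import random


def protein_folds(protein_ids, n_folds=5, seed=0):
    """Partition proteins into n_folds disjoint test sets of near-equal size."""
    ids = sorted(protein_ids)
    random.Random(seed).shuffle(ids)  # assumed: random assignment to folds
    return [ids[k::n_folds] for k in range(n_folds)]


def train_test_splits(protein_ids, n_folds=5, seed=0):
    """Yield (train, test) lists; every protein is in exactly one test list."""
    folds = protein_folds(protein_ids, n_folds, seed)
    for k, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, test


def ensemble_score(models, x):
    """Average the predictions of all cross-validation models for input x."""
    return sum(model(x) for model in models) / len(models)


proteins = [f"P{i:03d}" for i in range(100)]  # placeholder identifiers
splits = list(train_test_splits(proteins))
```

Because the split is made per protein rather than per spectrum fragment, no data from a test protein can leak into the training of the model that is evaluated on it.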

In this experiment, we reproduced 100 structures in a fully automated manner using only NMR spectra and the protein sequences as input. Since ARTINA has no tunable parameters and does not require any manual curation of data, each structure was calculated by a single execution of the ARTINA workflow. All benchmark datasets were analyzed by ARTINA in parallel with execution times of 4–20 h per protein.

All automatically determined structures, overlaid with the corresponding reference structures from the PDB, are visualized in Fig. 3, Supplementary Fig. 2, and Supplementary Movie 1. ARTINA was able to reproduce the reference structures with a median backbone root-mean-square deviation (RMSD) of 1.44 Å between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, Cα, C′ in the residue ranges determined by CYRANGE 29 (Fig. 4a and Supplementary Table 4). ARTINA automatically identified between 459 and 4678 distance restraints (2198 on average over 100 proteins), which corresponds to 4.25–33.20 restraints per residue (Fig. 4b). This number is mainly influenced by the extent of unstructured regions and the quality of the NOESY spectra. In agreement with earlier findings 30, it correlates only weakly with the backbone RMSD to reference (linear correlation coefficient −0.38). As a more expressive validation measure for the structures from ARTINA, we computed a predicted RMSD to the PDB reference structure on the basis of the RMSDs between the 10 candidate structure bundles calculated in ARTINA (see “Methods”, Fig. 5, and Supplementary Table 5). The average deviation between actual and predicted RMSDs for the 100 proteins in this study is 0.35 Å, and their linear correlation coefficient is 0.77 (Fig. 5). In no case does the true RMSD exceed the predicted one by more than 1 Å.

figure 3

The structures are aligned with the RMSD to reference range as indicated on the left and hexagonal frames color-coded by their size as indicated above. Structures with no corresponding PDB depositions are marked by an asterisk.

figure 4

a Backbone RMSD to reference. b Number of distance restraints per residue. c Chemical shift assignment accuracy. Bars represent quantity values for benchmark proteins, identified by PDB codes (or protein names). Proteins are ordered by size, which is indicated by a color-coded circle. Values in the center of each panel are 10th, 50th, and 90th percentiles of values presented in the bar plot. Short/medium/long-range restraints are between residues i and j with | i – j | ≤ 1, 2 ≤ | i – j | ≤  4, and | i – j | ≥ 5, respectively.

figure 5

The predicted RMSD to reference (pRMSD) is calculated from the ARTINA results without knowledge of the reference PDB structure (see “Methods”) and, by definition, always in the range of 0–4 Å. For comparability, actual RMSD values to reference are also truncated at 4 Å (protein 2M47 with RMSD 4.47 Å). The dotted lines represent deviations of ±1 Å between the two RMSD quantities.

Additional structure validation scores obtained from ANSURR 31 (Supplementary Table 6), RPF 32 (Supplementary Table 7), and consensus structure bundles 33 (Supplementary Table 8) confirm that overall the ARTINA structures and the corresponding reference PDB structures are of equivalent quality. Energy refinement of the ARTINA structures in explicit water using OPALp 34 (not part of the standard ARTINA workflow) does not significantly alter the agreement with the PDB reference structures (Supplementary Table 9). The benchmark data set comprises 78 protein structures determined by the Northeast Structural Genomics Consortium (NESG). On average, ARTINA yielded structures of the same accuracy for NESG targets (median RMSD to reference 1.44 Å) as for proteins from other sources (1.42 Å).

On average, ARTINA correctly assigned 90.39% of the chemical shifts (Fig.  4c ), as compared to the manually prepared assignments, including both “strong” (high-reliability) and “weak” (tentative) FLYA assignments 12 . Backbone chemical shifts were assigned more accurately (96.03%) than side-chain ones (86.50%), which is mainly due to difficulties in assigning lysine/arginine (79.97%) and aromatic (76.87%) side-chains. Further details on the assignment accuracy for individual amino acid types in the protein cores (residues with less than 20% solvent accessibility) are given in Supplementary Table  10 . Assignments for core residues, which are important for the protein structure, are generally more accurate than for the entire protein, in particular for core Ala, Cys, and Asp residues, which show a median assignment accuracy of 100% over the 100 proteins. The lowest accuracies are observed for core His (83.3%), Phe (83.3%), and Arg (87.5%) residues. The three proteins with highest RMSD to reference, 2KCD, 2L82, and 2M47 (see below), show 68.2, 83.8, and 75.7% correct aromatic assignments, respectively, well below the corresponding median of 85.5%. On the other hand, the assignment accuracies for the methyl-containing residues Ala, Ile, Val are above average and reach a median of 100, 97.6, and 98.6%, respectively.

The quality of automated structure determination and chemical shift assignment reflects the performance of deep learning-based visual spectrum analysis, presented qualitatively in Figs.  6 – 7 , Supplementary Fig.  3 , and Supplementary Movies  2 – 4 . In this experiment, our models (pp-ResNet, deconv-ResNet) automatically identified 1,168,739 cross-peaks with high confidence (≥0.50) in the benchmark spectra. All 1329 peak lists, together with automatically determined protein structures and chemical shift lists, are available for download.

figure 6

A fragment of a ¹⁵N-HSQC spectrum of the protein 1T0Y is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses). a1, a2 Initial peak picking marker position is refined by the deconvolution model. b1, b2 pp-ResNet output is deconvolved into two components. c The deconvolution model supports maximally 3 components per initial signal. d Two peak picking markers are merged by the deconvolution model. e Peak picking output deconvolved into three components.

figure 7

A fragment of the ¹³C-HSQC spectrum of protein 2K0M is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses).

Error analysis

The largest deviations from the PDB reference structure were observed for the proteins 2KCD, 2L82, and 2M47, for which the pRMSD consistently indicated low accuracy (Fig.  5 ). Significant deviations are mainly due to displacements of terminal secondary structure elements (e.g., a tilted α-helix near a chain terminus), or inaccurate loop conformations (e.g., more flexible than in the PDB deposition). We investigated the origin of these discrepancies.

2KCD is a 120-residue (14.4 kDa) protein from Staphylococcus saprophyticus with an α-β roll architecture. Its dataset comprises 19 spectra (8 backbone, 6 side-chain, and 5 NOESY). The ARTINA structure has a backbone RMSD to PDB reference of 3.13 Å, which is caused by the displacement of the C-terminal α-helix (residues 105–109; Supplementary Fig.  4a ). Excluding this 5-residue fragment decreases the RMSD to 2.40 Å (Supplementary Table  11 ). The positioning of this helix appears to be uncertain, since an ARTINA calculation without the 4D CC-NOESY spectrum yields a significantly lower RMSD of 1.77 Å (Supplementary Table  12 ).

2L82 is a de novo designed protein of 162 residues (19.7 kDa) with an αβ 3-layer (αβα) sandwich architecture. Although only 9 spectra (4 backbone, 2 side-chain and 3 NOESY) are available, ARTINA correctly assigned 97.87% backbone and 81.05% side-chain chemical shifts. The primary reason for the high RMSD value of 3.55 Å is again a displacement of the C-terminal α-helix (residues 138–153). The remainder of the protein matches closely the PDB deposition (1.04 Å RMSD, Supplementary Fig.  4b ).

The protein with the highest RMSD to reference (4.72 Å) in our benchmark dataset is 2M47, a 163-residue (18.8 kDa) protein from Corynebacterium glutamicum with an α-β 2-layer sandwich architecture, for which 17 spectra (7 backbone, 7 side-chain, and 3 NOESY) are available. The main sources of discrepancy are two α-helices spanning residues 111–157 near the C-terminus. Nevertheless, the residues contributing to the high RMSD value are distributed more extensively than in 2L82 and 2KCD just discussed. Interestingly, 2 of the 10 structure proposals calculated by ARTINA have an RMSD to reference below 2 Å (1.66 Å and 1.97 Å). In the final structure selection step, our GBT model selected the 4.72 Å RMSD structure as the first choice and the 1.66 Å structure as the second one (Supplementary Fig. 4c). Such results imply that the automated structure determination of this protein is unstable. Since ARTINA returns the two structures selected by GBT with the highest confidence, the user can, in principle, choose the better structure based on contextual information.

In addition to these three case studies, we performed a quantitative analysis of all regular secondary structure elements and flexible loops present in our 100-protein benchmark in order to assess their impact on the backbone RMSD to reference (Supplementary Table  11 ). All residues in the structurally well-defined regions determined by CYRANGE 29 were assigned to 6 partially overlapping sets: (a) first secondary structure element, (b) last secondary structure element, (c) α-helices, (d) β-sheets, (e) α-helices and β-sheets, and (f) loops. Then, the RMSD to reference was calculated 6 times, each time with one set excluded. In total, for 66 of the 100 proteins the lowest RMSD was obtained if set (f) was excluded from RMSD calculation, and 13% benefited most from removal of the first or last secondary structure element (a or b). Moreover, for 18 out of the 19 proteins with more than 0.5 Å RMSD decrease compared to the RMSD for all well-defined residues, (a), (b), or (f) was the primary source of discrepancy. These results are consistent with our earlier statement that deviations in automatically determined protein structures are mainly caused by terminal secondary structure elements or inaccurate loop conformations.

Ablation studies

During the experiment, we captured the state of each structure determination at 9 time-points, 3 per structure determination cycle: (a) after the initial FLYA shift assignment, (b) after GNN shift refinement, and (c) after structure calculation (Fig.  1 ). Comparative analysis of these states allowed us to quantify the contribution of different ARTINA components to the structure determination process (Table  1 ).

The results show a strong benefit of the refinement cycles, as quantities reported in Table  1 consistently improve from cycle 1 to 3. The majority of benchmark proteins converge to the correct fold after the first cycle (1.56 Å median backbone RMSD to reference), which is further refined to 1.52 Å in cycle 2 and 1.44 Å in cycle 3. Additionally, within each chemical shift refinement cycle, improvements in assignment accuracy resulting from the GNN predictions are observed. This quantity also increases consistently across all refinement cycles, in particular for side-chains. Refinement cycles are particularly advantageous for large and challenging systems, such as 2LF2, 2M7U, or 2B3W, which benefit substantially in cycles 2 and 3 from the presence of the approximate protein fold in the chemical shift assignment step.

Impact of 4D NOESY experiments

As presented in Fig.  2 , 26 out of 100 benchmark datasets contain 4D CC-NOESY spectra, which require long measurement times and were used in the manual structure determination. To quantify their impact, we performed automated structure determinations of these 26 proteins with and without the 4D CC-NOESY spectra (Supplementary Table  12 ).

On average, the presence of 4D CC-NOESY improves the backbone RMSD to reference by 0.15 Å (decrease from 1.88 to 1.73 Å) and has less than 1% impact on chemical shift assignment accuracy. However, the impact is non-uniform. For three proteins, 2KIW, 2L8V, and 2LF2, use of the 4D CC-NOESY decreased the RMSD by more than 1 Å. On the other hand, there is also one protein, 2KCD, for which the RMSD decreased by more than 1 Å by excluding the 4D CC-NOESY.

These results suggest that overall the amount of information stored in 2D/3D experiments is sufficient for ARTINA to reach close to optimal performance, and only modest improvement can be achieved by introducing additional information redundancy from 4D CC-NOESY spectra.

Automated chemical shift assignment

Apart from structure determination, our data analysis pipeline for protein NMR spectroscopy can address an array of problems that are nowadays approached manually or semi-manually. For instance, ARTINA can be stopped after visual spectrum analysis, returning positions and intensities of cross-peaks that can be utilized for any downstream task, not necessarily related to protein structure determination.

Alternatively, a single chemical shift refinement cycle can be performed to get automatically assigned cross-peaks from spectra and sequence. We evaluated this approach with three sets of spectra: (i) Exclusively backbone assignment spectra were used to assign N, Cα, Cβ, C′, and Hᴺ shifts. With this input, ARTINA assigned 92.40% (median value) of the backbone shifts correctly. (ii) All through-bond but no NOESY spectra were used to assign the backbone and side-chain shifts. This raised the percentage of correct backbone assignments to 94.20%. (iii) The full data set including NOESY yielded 96.60% correct assignments of the backbone shifts. These three experiments were performed for the 45 benchmark proteins for which CBCANH and CBCAcoNH, as well as either HNCA and HNcoCA or HNCO and HNcaCO experiments, were available. The availability of NOESY spectra had a large impact on the side-chain assignments: 86.00% were correct for the full spectra set (iii), compared to 73.70% in the absence of NOESY spectra (spectra set (ii)). The presence of NOESY spectra consistently improved the chemical shift assignment accuracy of all amino acid types (Supplementary Tables 13 and 14). The improvement is particularly strong for aromatic residues (Phe, 61.6 to 76.5%; Trp, 52.5 to 80%; and Tyr, 71.4 to 89.7%), but not limited to this group.

The results obtained with ARTINA differ substantially in several respects from previous approaches towards automating protein NMR analysis 3, 4, 7, 12, 17, 18, 19, 35. First, ARTINA covers the entire workflow from spectra to structures rather than individual steps in it, and there are strictly no manual interventions or protein-specific parameters to be adapted. Second, the quality of the results regarding peak identification, resonance assignments, and structures has been assessed on a large and diverse set of 100 proteins, for the vast majority of which the results are on par with what can be achieved by human experts. Third, the method provides a two-orders-of-magnitude leap in efficiency by providing assignments and a structure within hours of computation time rather than weeks or months of human work. This reduces the effort for a protein structure determination by NMR essentially to the preparation of the sample and the measurement of the spectra. Its implementation in the https://nmrtist.org webserver (Supplementary Movie 5) encapsulates its complexity, eliminates any intermediate data and format conversions by the user, and enables the use of different types of high-performance hardware as appropriate for each of the subtasks. ARTINA is not limited to structure determination but can be used equally well for peak picking and resonance assignment in NMR studies that do not aim at a structure, such as investigations of ligand binding or dynamics.

Although ARTINA has no parameters to be optimized by the user, care should be taken in the preparation of the input data, i.e., the choice, measurement, processing, and specification of the spectra. Spectrum type, axes, and isotope labeling declarations must be correct, and chemical shift referencing must be consistent over the entire set of spectra. Slight variations of corresponding chemical shifts within the tolerances of 0.03 ppm for ¹H and 0.4 ppm for ¹³C/¹⁵N can be accommodated, but larger deviations, resulting, for instance, from the use of multiple samples, pH changes, protein degradation, or inaccurate referencing, can be detrimental. Where appropriate, ARTINA proposes corrections of chemical shift referencing 36. Furthermore, based on the large training data set, which comprises a large variety of spectral artifacts, ARTINA largely avoids misinterpreting artifacts as signals. However, with decreasing spectral quality, ARTINA, like a human expert, will progressively miss real signals.
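As a rough pre-submission check, the stated tolerances can be applied to corresponding shifts measured in different spectra. The sketch below is only an illustration of that comparison: the dictionary `TOLERANCE_PPM` and the function `shifts_consistent` are hypothetical names, and how ARTINA itself applies the tolerances internally is not described here.

```python
# Tolerances quoted in the text, in ppm, keyed by nucleus (key names assumed).
TOLERANCE_PPM = {"1H": 0.03, "13C": 0.4, "15N": 0.4}


def shifts_consistent(nucleus, shift_a, shift_b):
    """Return True if two corresponding shifts agree within the tolerance."""
    return abs(shift_a - shift_b) <= TOLERANCE_PPM[nucleus]


# Example: the same amide proton observed in two spectra.
same_proton = shifts_consistent("1H", 8.25, 8.27)      # 0.02 ppm apart
drifted_proton = shifts_consistent("1H", 8.25, 8.30)   # 0.05 ppm apart
```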

Regarding protein size and spectrum quality, limitations of ARTINA are similar to those encountered by a trained spectroscopist. Machine-learning-based visual analysis of spectra requires signals to be present and distinguishable in the spectra. ARTINA does not suffer from accidental oversight that may affect human spectra analysis. On the other hand, human experts may exploit contextual information to which the automated system currently has no access because it identifies individual signals by looking at relatively small, local excerpts of spectra.

In this paper, we used all spectra that are available from the earlier manual structure determination. For most of the 100 proteins, the spectra data set has significant redundancy regarding information for the resonance assignment. Our results indicate that one can expect to obtain good assignments and structures also from smaller sets of spectra 37 , with concomitant savings of NMR measurement time. We plan to investigate this in a future study.

The present version of ARTINA can be enhanced in several directions. Besides improving individual models and algorithms, it is conceivable to integrate the so far independently trained collection of machine learning models, plus additional models that replace conventional algorithms, into a coherent system that is trained as a whole. Furthermore, the reliability of machine learning approaches depends strongly on the quantity and quality of training data available. While the collection of the present training data set for ARTINA was cumbersome, from now on it can be expected to expand continuously through the use of the https://nmrtist.org website, both quantitatively and qualitatively with regard to greater variability in terms of protein types, spectral quality, source laboratory, data processing (including non-linear sampling), etc., which can be exploited in retraining the models. ARTINA can also be extended to use additional experimental input data, e.g., known partial assignments, stereospecific assignments, ³J couplings, residual dipolar couplings, paramagnetic data, and H-bonds. Structural information, e.g., from AlphaFold 38, can be used in combination with reduced sets of NMR spectra for rapid structure-based assignment. Finally, the range of application of ARTINA can be generalized to small molecule-protein complexes relevant for structure-activity relationship studies in drug research, protein-protein complexes, RNA, solid state, and in-cell NMR.

Overall, ARTINA stands for a paradigm change in biomolecular NMR from a time-consuming technique for specialists to a fast method open to researchers in molecular biology and medicinal chemistry. At the same time, in a larger perspective, the appearance of generally highly accurate structure predictions by AlphaFold 38 is revolutionizing structural biology. Nevertheless, there remains space for the experimental methods, for instance, to elucidate various states of proteins under different conditions or in dynamic exchange, or for studying protein-ligand interaction. Regarding ARTINA, one should keep in mind that its applications extend far beyond structure determination. It will accelerate virtually any biological NMR studies that require the analysis of multidimensional NMR spectra and chemical shift assignments. Protein structure determination is just one possible ARTINA application, which is both demanding in terms of the amount and quality of required experimental data and amenable to quantitative evaluation.

Spectrum benchmark collection

To collect the benchmark of NMR spectra (Fig.  2 and Supplementary Table  2 ), we implemented a crawler software, which systematically scanned the FTP server of the BMRB data bank 39 , identifying data files relevant to our study. Additional datasets were obtained by setting up a website for the deposition of published data ( https://nmrdb.ethz.ch ), from our collaboration network, or had been acquired internally in our laboratory. NMR data was collected from these channels either in the form of processed spectra (Sparky 40 , NMRpipe 41 , XEASY 42 , Bruker formats), or in the form of time-domain data accompanied by depositor-supplied NMRpipe processing scripts. No additional spectra processing (e.g., baseline correction) was performed as part of this study.

The most challenging aspects of the benchmark collection process were: scarcity of data—only a small fraction of all BMRB depositions are accompanied by uploaded spectra (or time-domain data), lack of standards for NMR data depositions—each protein data set had to be prepared manually, as the original data was stored in different formats (spectra name conventions, axis label standards, spectra data format), and difficulties in correlating data files deposited in the BMRB FTP site with contextual information about the spectrum and the sample (e.g., sample characteristics, measurement conditions, instrument used). Manually prepared (mostly NOESY) peak lists, which are available from the BMRB for some of the proteins in the benchmark, were not used for this study.

Different approaches to 3D ¹³C-NOESY spectra measurement had to be taken into account: (i) Two separate ¹³C-NOESY spectra for aliphatic and aromatic signals. These were analyzed by ARTINA without any special treatment. We used ALI, ARO tags (Supplementary Movie S5) to provide the information that only either aliphatic or aromatic shifts are expected in a given spectrum. (ii) Simultaneous NC-NOESY. These spectra were processed twice to have proper scaling of the ¹³C and ¹⁵N axes in ppm units, and cropped to extract ¹⁵N-NOESY and ¹³C-NOESY spectra. If nitrogen and carbon cross-peak amplitudes have different signs, we used POS, NEG tags to provide the information that only either positive or negative signals should be analyzed. (iii) Aliphatic and aromatic signals in a single ¹³C-NOESY spectrum. These measurements do not require any special treatment, but proper cross-peak unfolding plays a vital role in the analysis of aromatic signals.

Overview of the ARTINA algorithm

ARTINA uses as input only the protein sequence and a set of NMR spectra, which may contain any combination of 25 experiments currently supported by the method (Supplementary Table  1 ). Within 4–20 h of computation time (depending on protein size, number of spectra, and computing hardware load), ARTINA determines: (a) cross-peak positions for each spectrum, (b) chemical shift assignments, (c) distance restraints from NOESY spectra, and (d) the protein structure. The whole process does not require any human involvement, allowing rapid protein NMR assignment and structure determination by non-experts.

The ARTINA workflow starts with visual spectrum analysis (Fig. 1), wherein cross-peak positions are identified in frequency-domain NMR spectra using deep residual neural networks (ResNet) 24. Coordinates of signals in the spectra are passed as input to the FLYA automated assignment algorithm 12, yielding initial chemical shift assignments. In the subsequent chemical shift refinement step, we bring to the workflow contextual information about thousands of protein structures solved by NMR in the past using a deep GNN 25 that was trained on BMRB/PDB depositions. Its goal is to predict expected values of yet missing chemical shifts, given the shifts that have already been confidently and unambiguously assigned by FLYA. With these GNN predictions as additional input, the cross-peak positions are reassessed in a second FLYA call, which completes the chemical shift refinement cycle (Fig. 1).

In the structure refinement cycle, 10 variants of NOESY peak lists are generated, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis by varying the confidence threshold of a signal selected by ResNet between 0.05 and 0.5. Each set of NOESY peak lists is used in an independent CYANA structure calculation 22, 23, yielding 10 intermediate structure proposals (Fig. 1). The structure proposals are ranked in the intermediate structure selection step based on 96 features with a dedicated GBT model. The selected best structure proposal is used as contextual information in a consecutive FLYA run, which closes the structure refinement cycle.
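The generation of peak list variants can be illustrated with a short sketch. It assumes that the 10 thresholds are evenly spaced between 0.05 and 0.5, which the text does not state, and it represents each peak as a hypothetical `(coordinates, confidence)` tuple; `confidence_thresholds` and `peak_list_variants` are illustrative names.

```python
def confidence_thresholds(low=0.05, high=0.5, n=10):
    """Return n thresholds from low to high (even spacing is an assumption)."""
    step = (high - low) / (n - 1)
    return [round(low + k * step, 4) for k in range(n)]


def peak_list_variants(peaks, thresholds):
    """Map each threshold to the peaks whose confidence reaches it.

    peaks: list of (coordinates, confidence) tuples.
    """
    return {t: [p for p in peaks if p[1] >= t] for t in thresholds}


# Toy NOESY-like peak list with made-up coordinates and confidences.
peaks = [
    ((8.10, 120.0, 4.35), 0.95),
    ((7.52, 118.2, 1.90), 0.30),
    ((9.01, 110.4, 3.10), 0.07),
    ((6.88, 125.3, 0.85), 0.02),
]
thresholds = confidence_thresholds()
variants = peak_list_variants(peaks, thresholds)
```

Lower thresholds keep more tentative cross-peaks, higher thresholds keep only the most reliable ones, so each variant trades completeness against the risk of including artifacts.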

After the two initial steps of visual spectrum analysis and initial chemical shift assignment, ARTINA alternates between the two refinement cycles. The chemical shift refinement cycle provides FLYA with tighter restraints on expected chemical shifts, which helps to assign ambiguous cross-peaks. The structure refinement cycle provides information about possible through-space contacts, allowing identified cross-peaks (especially in NOESY) to be reassigned. The high-level concept behind alternating the refinement cycles is to iteratively update the protein structure given fixed chemical shifts, and update chemical shifts given the fixed protein structure. Both refinement cycles are executed three times.

Automated visual analysis of the spectrum

We established two machine learning models for the visual analysis of multidimensional NMR spectra (see downloads in the Code availability section). In their design, we made no assumptions about the downstream task and the 2D/3D/4D experiment type. Therefore, the proposed models can be used as the starting point of our automated structure determination procedure, as well as for any other task that requires cross-peak coordinates.

The automated visual analysis starts by selecting all extrema \(\boldsymbol{x}=\{\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{N}\}\), \(\boldsymbol{x}_{n}\in\mathbb{N}^{D}\), in the NMR spectrum, which is represented as a \(D\)-dimensional regular grid storing signal intensities at discrete frequencies. We formulated the peak picking task as an object detection problem, where possible object positions are confined to \(\boldsymbol{x}\). This task was addressed by training a deep residual neural network 24, in the following denoted as peak picking ResNet (pp-ResNet), which learns a mapping \(\boldsymbol{x}_{n}\to[0,\,1]\) that assigns to each signal extremum a real-valued score, which resembles its probability of being a true signal rather than an artefact.
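The extremum selection step can be sketched for the 2D case as follows. This is a minimal illustration, not the published implementation: the function name `local_extrema`, the 8-neighbor definition of an extremum, the strict inequality, and the exclusion of border points are all assumptions made here.

```python
def local_extrema(grid):
    """Return (row, col) indices of strict local maxima and minima.

    grid: 2D list of intensities. A point counts as an extremum if it is
    strictly larger (or strictly smaller) than all 8 neighbors; border
    points are skipped for simplicity.
    """
    found = []
    for i in range(1, len(grid) - 1):
        for j in range(1, len(grid[0]) - 1):
            center = grid[i][j]
            neighbors = [
                grid[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            ]
            is_max = all(center > v for v in neighbors)
            is_min = all(center < v for v in neighbors)
            if is_max or is_min:
                found.append((i, j))
    return found


# Toy spectrum with one positive and one negative signal.
spectrum = [
    [0, 0, 0, 0, 0],
    [0, 5, 1, 0, 0],
    [0, 1, 0, -4, 0],
    [0, 0, 0, 0, 0],
]
candidates = local_extrema(spectrum)
```

Each returned grid position would then be scored by the classifier, so the network only has to decide between signal and artefact at a finite set of candidate points.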

Our network architecture is strongly linked to ResNet-18 24. It contains 8 residual blocks, followed by a single fully connected layer with sigmoidal activation. After weight initialization with Glorot Uniform 43, the architecture was trained by optimizing a binary cross-entropy loss using Adam 44 with learning rate \(10^{-4}\) and gradient clipping of 0.5.

To establish an experimental training dataset for pp-ResNet, we normalized the 1329 spectra in our benchmark with respect to resolution (adjusting the number of data grid points per unit chemical shift (ppm) using linear interpolation) and signal amplitude (scaling the spectrum by a constant). Subsequently, 675,423 diverse 2D fragments of size 256 × 32 × 1 were extracted from the normalized spectra and manually annotated, yielding 98,730 positive and 576,693 negative class training examples. During the training process, we additionally augmented this dataset by flipping spectrum fragments along the second dimension (32 pixels), stretching them by 0–30% in the first and second dimensions, and perturbing signal intensities with Gaussian noise addition.
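Two of the augmentations mentioned above, flipping along the second dimension and adding Gaussian noise, can be sketched in plain Python. The stretching step is omitted because it needs interpolation. The function names, the flip probability of 0.5, and the noise level are assumptions for illustration only and are not taken from the paper.

```python
import random


def flip_second_dim(fragment):
    """Reverse a 2D fragment along its second (32-pixel) dimension."""
    return [row[::-1] for row in fragment]


def add_gaussian_noise(fragment, sigma, rng):
    """Perturb every intensity with zero-mean Gaussian noise."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in fragment]


def augment(fragment, rng, flip_probability=0.5, noise_sigma=0.01):
    """Apply a random flip followed by noise (parameter values assumed)."""
    if rng.random() < flip_probability:
        fragment = flip_second_dim(fragment)
    return add_gaussian_noise(fragment, noise_sigma, rng)


rng = random.Random(0)
# Placeholder 256 x 32 fragment; real fragments come from normalized spectra.
fragment = [[float(i * 32 + j) for j in range(32)] for i in range(256)]
augmented = augment(fragment, rng)
```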

The role of the pp-ResNet is to quickly iterate over signal extrema in the spectrum, filtering out artefacts and selecting approximate cross-peak positions for the downstream task. The relatively small network architecture (8 residual blocks) and input size of 2D 256 × 32 image patches make it possible to analyze large 3D ¹³C-resolved NOESY spectra in less than 5 min on a high-end desktop computer. Simultaneously, the first dimension of the image patch (256 pixels) provides long-range contextual information on the possible presence of signals aligned with the current extremum (e.g., Cα, Cβ cross-peaks in an HNCACB spectrum).

Extrema classified with high confidence as true signals by pp-ResNet undergo subsequent analysis with a second deep residual neural network (deconv-ResNet). Its objective is to perform signal deconvolution, based on a 3D spectrum fragment (64 × 32 × 5 voxels) that is cropped around a signal extremum selected by pp-ResNet. This task is defined as a regression problem, where deconv-ResNet outputs a 3 × 3 matrix storing 3D coordinates of up to 3 deconvolved peak components, relative to the center of the input image. To ensure permutation invariance with respect to the ordering of components in the output coordinate matrix, and to allow for a variable number of 1–3 peak components, the architecture was trained with a Chamfer distance loss 45 .
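To show why a Chamfer-type loss is insensitive to the ordering of the output rows, the following sketch implements one common symmetric form of the Chamfer distance (mean squared nearest-neighbor distance in both directions). The exact variant used for deconv-ResNet is not specified here, so the formula, the function names, and the idea of representing fewer than 3 components by repeated rows are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(predicted, target):
    """Symmetric Chamfer distance between two non-empty point sets."""
    forward = sum(
        min(squared_distance(p, t) for t in target) for p in predicted
    ) / len(predicted)
    backward = sum(
        min(squared_distance(t, p) for p in predicted) for t in target
    ) / len(target)
    return forward + backward


# Three predicted rows (two coincide) versus two true components.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
loss = chamfer_distance(predicted, target)
```

Since every point is matched to its nearest counterpart rather than to the row with the same index, permuting the rows of either set leaves the value unchanged.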

Since deconv-ResNet deals only with true signals and their local neighborhood, its training dataset can be conveniently generated. We established a spectrum fragment generator, based on rules reflecting the physics of NMR, which produced 110,000 synthetic training examples (Supplementary Fig.  1 ) having variable (a) numbers of components to deconvolve (1–3), (b) signal-to-noise ratio, (c) component shapes (Gaussian, Lorentzian, and mixed), (d) component amplitude ratios, (e) component separation, and (f) component neighborhood type (i.e., NOESY-like signal strips or HSQC-like 2D signal clusters). The deconv-ResNet model was thus trained on fully synthetic data.

Signal unaliasing

To use ResNet predictions in automated chemical shift assignment and structure calculation, detected cross-peak coordinates must be transformed from the spectrum coordinate system to their true resonance frequencies. We addressed the problem of automated signal unfolding with the classical machine learning approach to density estimation.

At first, we generated 10⁵ cross-peaks associated with each experiment type supported by ARTINA (Supplementary Table 1). In this process, we used randomly selected chemical shift lists deposited in the BMRB database, excluding depositions associated with our benchmark proteins. Subsequently, we trained a Kernel Density Estimator (KDE):

\[{p}_{e}\left(\boldsymbol{x}\right)=\frac{1}{{N}_{e}}\sum_{i=1}^{{N}_{e}}\kappa \left(\boldsymbol{x}-{\boldsymbol{x}}_{i}^{(e)}\right)\]

which captures the distribution \({p}_{e}\left(\boldsymbol{x}\right)\) of true peaks being present at position \(\boldsymbol{x}\) in spectrum type \(e\), based on \({N}_{e}={10}^{5}\) cross-peak coordinates \({\boldsymbol{x}}_{i}^{(e)}\) generated with BMRB data, and \(\kappa\) being the Gaussian kernel.

Unfolding a \(k\)-dimensional spectrum is defined as a discrete optimization problem, solved independently for each cross-peak \({\boldsymbol{x}}_{j}^{(e)}\) observed in a spectrum of type \(e\):

\[{\boldsymbol{s}}^{*}=\underset{\boldsymbol{s}}{\arg \max }\;{p}_{e}\left({\boldsymbol{x}}_{j}^{(e)}+\boldsymbol{s}\circ \boldsymbol{w}\right)\]

where \(\boldsymbol{w}\in {\mathbb{R}}^{k}\) is a vector storing the spectral widths in each dimension (ppm units), \(\circ\) is element-wise multiplication, \(\boldsymbol{s}\in {\mathbb{Z}}^{k}\) is a vector indicating how many times the cross-peak is unfolded in each dimension, and \({\boldsymbol{s}}^{*}\in {\mathbb{Z}}^{k}\) is the optimal cross-peak unfolding.

As long as regular and folded signals do not overlap or have different signs in the spectrum, KDE can unfold the peak list regardless of spectrum dimensionality. The spectrum must not be cropped in the folded dimension, i.e., the folding sweep width must equal the width of the spectrum in the corresponding dimension.
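The unfolding procedure can be sketched in a few lines of NumPy: a Gaussian kernel density is evaluated at every candidate position obtained by shifting the observed peak by integer multiples of the spectral width, and the most probable candidate is kept. The kernel bandwidth, the search range `max_folds`, and the synthetic reference peaks are assumptions made for this example; none of them is specified in the text.

```python
import itertools
import numpy as np

def kde_density(x, samples, bandwidth):
    """Gaussian kernel density at point x, estimated from reference samples (N, k)."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    k = samples.shape[1]
    z = (samples - x) / bandwidth  # bandwidth: scalar or per-dimension vector
    norm = np.prod(np.broadcast_to(bandwidth, (k,))) * (2 * np.pi) ** (k / 2)
    return np.exp(-0.5 * (z ** 2).sum(axis=1)).mean() / norm

def unfold_peak(x, samples, widths, bandwidth, max_folds=1):
    """Return the integer vector s that maximizes the density at x + s * widths."""
    x = np.asarray(x, dtype=float)
    widths = np.asarray(widths, dtype=float)
    shifts = range(-max_folds, max_folds + 1)
    candidates = [np.array(s) for s in itertools.product(shifts, repeat=len(x))]
    return max(candidates, key=lambda s: kde_density(x + s * widths, samples, bandwidth))

# Hypothetical 2D reference distribution centred at (8.0 ppm, 120.0 ppm)
rng = np.random.default_rng(0)
reference = rng.normal(loc=[8.0, 120.0], scale=[0.5, 5.0], size=(1000, 2))

# A peak observed one spectral width (30 ppm) away in the second dimension
observed = [8.1, 149.0]
s_opt = unfold_peak(observed, reference, widths=[4.0, 30.0],
                    bandwidth=np.array([0.2, 2.0]))
```

In this toy case the search should return `s_opt = [0, -1]`, i.e. the peak is moved back by one spectral width in the second dimension only.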

All 2D/3D spectra in our benchmark were folded in at most one dimension and satisfy the aforementioned requirements. However, the 4D CC-NOESY spectra satisfy neither, as regular and folded peaks both overlap and have the same signal amplitude sign. This introduces ambiguity in the spectrum unfolding that prevents direct use of the KDE technique. To retrieve original signal positions, 4D CC-NOESY cross-peaks were unfolded to overlap with signals detected in 3D ¹³C-NOESY. In consequence, 4D CC-NOESY unfolding depended on other experiments, and individual 4D cross-peaks were retained only if they were confirmed in a 3D experiment.

Chemical shift assignment

Chemical shift assignment is performed with the existing FLYA algorithm 12 that uses a genetic algorithm combined with local optimization to find an optimal matching between expected and observed peaks. FLYA uses as input the protein sequence, lists of peak positions from the available spectra, chemical shift statistics, either from the BMRB 39 or the GNN described in the next section, and, if available, the structure from the previous refinement cycle. The tolerance for the matching of peak positions and chemical shifts was set to 0.03 ppm for ¹H, and 0.4 ppm for ¹³C/¹⁵N shifts. Each FLYA execution comprises 20 independent runs with identical input data that differ in the random numbers used in the optimization algorithm. Nuclei for which at least 80% of the 20 runs yield, within tolerance, the same chemical shift value are classified as reliably assigned 12 and used as input for the following chemical shift refinement step.
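The 80% consensus criterion can be illustrated with a short sketch. How FLYA determines the consensus value is not described here, so the median of the assigned values is used as a stand-in; that choice, the function name, and the example shifts are assumptions.

```python
import numpy as np

# Matching tolerances quoted in the text (ppm)
TOLERANCE_PPM = {"H": 0.03, "C": 0.4, "N": 0.4}

def is_reliably_assigned(shifts, nucleus, min_fraction=0.8):
    """Check whether enough independent runs agree on one chemical shift.

    shifts: one value per run, np.nan where the run gave no assignment.
    The consensus value is taken as the median (an assumption of this sketch).
    """
    shifts = np.asarray(shifts, dtype=float)
    assigned = shifts[~np.isnan(shifts)]
    if assigned.size == 0:
        return False
    consensus = np.median(assigned)
    agree = np.abs(assigned - consensus) <= TOLERANCE_PPM[nucleus]
    return bool(agree.sum() >= min_fraction * shifts.size)

# Hypothetical amide proton: 17 of 20 runs agree, 3 runs disagree
runs_ok = [8.25] * 17 + [7.90] * 3
# Only 15 of 20 runs agree, which is below the 80% threshold
runs_bad = [8.25] * 15 + [7.90] * 5
reliable = is_reliably_assigned(runs_ok, "H")
unreliable = is_reliably_assigned(runs_bad, "H")
```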

Chemical shift refinement

We used a graph data structure to combine FLYA-assigned shifts with information from previously assigned proteins (BMRB records) and possible spatial interactions. Each node corresponds to an atom in the protein sequence, and is represented by a feature vector composed of (a) a one-hot encoded atom type code (e.g., Cα, Hβ), (b) a one-hot encoded amino acid type, (c) the value of the chemical shift assigned by FLYA (only if a confident assignment is available, zero otherwise), (d) atom-specific BMRB shift statistics (mean and standard deviation), and (e) 30 chemical shift values obtained from BMRB database fragments. The latter feature is obtained by searching BMRB records for assigned 2–3-residue fragments that match the local protein sequence and have minimal mean-squared-error (MSE) to shifts confidently assigned by FLYA (non-zero values of feature (c) in the local neighborhood of the atom). The edges of the graph correspond to chemical bonds or skip connections. The latter connect the Cβ atom of a given residue with Cβ atoms 2, 3, and 5 residues apart in the amino acid sequence, and are intended to capture possible through-space influences on the chemical shift that are typically observed in secondary structure elements.
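As a rough illustration of the skip connections, the sketch below enumerates Cβ–Cβ edges for residues 2, 3, and 5 positions apart. Residues are represented only by a flag saying whether they have a Cβ atom; skipping residues without one (glycine) is an assumption of this example, not a detail taken from the text.

```python
def cb_skip_edges(has_cb, offsets=(2, 3, 5)):
    """List (i, j) residue index pairs whose C-beta atoms get a skip connection.

    has_cb: sequence of booleans, one per residue (False e.g. for glycine).
    """
    edges = []
    n = len(has_cb)
    for i in range(n):
        if not has_cb[i]:
            continue
        for k in offsets:
            j = i + k
            if j < n and has_cb[j]:
                edges.append((i, j))
    return edges

# Hypothetical six-residue fragment "AAAGAA": the glycine at index 3 has no C-beta
sequence = "AAAGAA"
edges = cb_skip_edges([aa != "G" for aa in sequence])
```

In a full graph these edges would be added on top of the chemical-bond edges, with each undirected edge stored in whatever form the graph library expects.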

The chemical shift refinement task is defined as a node regression problem, where an expected value of the chemical shift is predicted for each atom that lacks a confident FLYA assignment. This task is addressed with a DeepGCN model 25 , 26 that was trained on 28,400 graphs extracted from 2840 referenced BMRB records 39 . Each training example was created by building a fully assigned graph out of a single BMRB record, and dropping chemical shift values (feature (c) above) for randomly chosen atoms that FLYA typically assigns either with low confidence or inaccurately.

Our DeepGCN model is designed specifically for de novo structure determination, as it uses only the protein sequence and partial shift assignments to estimate values of missing chemical shifts. Its predictions are used to guide the FLYA genetic algorithm optimization 12 by reducing its search range for assignments. The precise final chemical shift value is always determined by the position of a signal in the spectrum, rather than the model prediction alone.

Torsion angle restraints

Before each structure calculation step, torsion angle restraints for the ϕ and ψ angles of the polypeptide backbone were obtained from the current backbone chemical shifts using the program TALOS-N 21 . Restraints were only generated if TALOS-N classified the prediction as ‘Good’, ‘Strong’, or ‘Generous’. Given a TALOS-N torsion angle prediction of ϕ ± Δ ϕ , the allowed range of the torsion angle was set to ϕ ± max(Δ ϕ , 10°) for ‘Good’ and ‘Strong’ predictions, and ϕ ± 1.5 max(Δ ϕ , 10°) for ‘Generous’ predictions, and likewise for ψ .
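The rule for converting a TALOS-N prediction into an allowed torsion angle range can be written as a small function. The function name and the return convention (`None` when no restraint is generated) are choices made for this sketch.

```python
def restraint_range(angle, delta, classification):
    """Allowed torsion angle range in degrees, or None if no restraint is generated.

    angle, delta: predicted angle and its uncertainty (degrees).
    classification: prediction class label, e.g. 'Good', 'Strong', or 'Generous'.
    """
    factors = {"Good": 1.0, "Strong": 1.0, "Generous": 1.5}
    if classification not in factors:
        return None
    half_width = factors[classification] * max(delta, 10.0)
    return angle - half_width, angle + half_width

# A 'Good' prediction with a small uncertainty is widened to the 10 degree minimum
phi_range = restraint_range(-60.0, 5.0, "Good")
# A 'Generous' prediction gets 1.5 times the half-width
psi_range = restraint_range(-45.0, 20.0, "Generous")
```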

Structure calculation and selection

Given the chemical shift assignments and NOESY cross-peak positions and intensities, the structure is calculated with CYANA 23 using the established method 22 that comprises 7 cycles of NOESY cross-peak assignment and structure calculation, followed by a final structure calculation. In total, 8 × 100 conformers are calculated for a given input data set using 30,000 torsion angle dynamics steps per conformer. The 20 conformers with the lowest final target function value are chosen to represent the solution structure proposal. The entire combined NOESY assignment and structure calculation procedure is executed independently 10 times based on 10 variants of NOESY peak lists, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis. The first set generously includes all signals selected by ResNet with confidence ≥0.05. The other variants of NOESY peak lists follow the same principle with increasingly restrictive confidence thresholds of 0.1, 0.15, …, 0.5.
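The ten NOESY peak list variants follow directly from the confidence thresholds. The sketch below assumes that each picked peak is a dictionary with a `confidence` entry, which is a hypothetical data layout chosen for illustration.

```python
# Thresholds 0.05, 0.10, ..., 0.50 (ten values)
thresholds = [round(0.05 * i, 2) for i in range(1, 11)]

def peak_list_variants(peaks, thresholds):
    """Return one filtered peak list per confidence threshold."""
    return [[p for p in peaks if p["confidence"] >= t] for t in thresholds]

# Hypothetical peaks with network confidences
peaks = [
    {"id": 1, "confidence": 0.04},
    {"id": 2, "confidence": 0.07},
    {"id": 3, "confidence": 0.32},
    {"id": 4, "confidence": 0.90},
]
variants = peak_list_variants(peaks, thresholds)
sizes = [len(v) for v in variants]
```

Each variant would then be passed to its own NOESY assignment and structure calculation run.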

The CYANA structure calculations are followed by a structure selection step, wherein the 10 intermediate structure proposals are compared pairwise by a Gradient Boosted Tree (GBT) model that uses 96 features from each structure proposal (including the CYANA target function value 23, number of long-range distance restraints, etc.; for details, see downloads in the Code availability section) to rank the structures by their expected accuracy. The best structure from the ranking is subsequently used as contextual information for the chemical shift refinement cycle (Fig. 1), or returned as the final outcome of ARTINA. The second-best final structure is also returned for comparison.

To train GBT, we collected a set of successful and unsuccessful structure calculations with CYANA. Each training example was a tuple \(({s}_{i},{r}_{i})\), where \({s}_{i}\) is the vector of features extracted from the CYANA structure calculation output, and \({r}_{i}\) is the RMSD of the output structure to the PDB reference. The GBT was trained to take the features \({s}_{i}\) and \({s}_{j}\) of two structure calculations with CYANA as input, and to predict a binary order variable \({o}_{ij}\), such that \({o}_{ij}=1\) if \({r}_{i} < {r}_{j}\), and 0 otherwise. Importantly, the deposited PDB reference structures were not used directly in the GBT model training (they are used only to calculate the RMSDs). Consequently, the GBT model is unaffected by methodology and technicalities related to PDB deposition (e.g., the structure calculation software used to calculate the deposited reference structure).
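The construction of pairwise training examples can be sketched as follows. How the two feature vectors are combined before being passed to the GBT is not stated in the text; simple concatenation is assumed here, and the feature values are made up.

```python
import itertools
import numpy as np

def pairwise_training_set(features, rmsds):
    """Build (X, y) for pairwise ranking: y = 1 if structure i is more accurate than j."""
    X, y = [], []
    for i, j in itertools.permutations(range(len(rmsds)), 2):
        # Assumed input encoding: features of both structures side by side
        X.append(np.concatenate([features[i], features[j]]))
        y.append(1 if rmsds[i] < rmsds[j] else 0)
    return np.array(X), np.array(y)

# Three hypothetical structure calculations with two features each
features = np.array([[0.5, 120.0], [3.1, 40.0], [1.2, 85.0]])
rmsds = [1.0, 3.0, 2.0]  # RMSD to reference in angstroms
X, y = pairwise_training_set(features, rmsds)
```

Any binary classifier could be trained on `(X, y)`; at prediction time the pairwise outcomes would be aggregated into a ranking of the structure proposals.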

Structure accuracy estimate

As an accuracy estimate for the final ARTINA structure, a predicted RMSD to reference (pRMSD) is calculated from the ARTINA results (without knowledge of the reference PDB structure). It aims at reproducing the actual RMSD to reference, which is the RMSD between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, Cα, C′ in the residue ranges as given in Supplementary Table 4. The predicted RMSD is given by pRMSD = (1 − \(t\)) × 4 Å, where, in analogy to the GDT_HA value 46, \(t\) is the average fraction of the RMSDs ≤ 0.5, 1, 2, 4 Å between the mean coordinates of the best ARTINA candidate structure bundle and the mean coordinates of the structure bundles of the 9 other structure proposals. Since \(t\) ∈ [0, 1], the pRMSD is always in the range of 0–4 Å, grouping all “bad” structures with expected RMSD to reference ≥ 4 Å at pRMSD = 4 Å.
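The pRMSD formula is easy to check numerically. The sketch below takes the RMSDs between the best candidate and the other structure proposals as input; the example values are hypothetical.

```python
import numpy as np

def predicted_rmsd(rmsds_to_best, cutoffs=(0.5, 1.0, 2.0, 4.0)):
    """pRMSD = (1 - t) * 4, with t the average fraction of RMSDs within each cutoff."""
    rmsds = np.asarray(rmsds_to_best, dtype=float)
    fractions = [(rmsds <= c).mean() for c in cutoffs]
    t = float(np.mean(fractions))
    return (1.0 - t) * 4.0

# Nine other proposals that all agree closely with the best candidate
tight = predicted_rmsd([0.3] * 9)
# Proposals within 1 angstrom but not within 0.5 angstrom
medium = predicted_rmsd([0.75] * 9)
# Proposals that all deviate by more than 4 angstroms
poor = predicted_rmsd([5.0] * 9)
```

With these inputs, `tight` is 0 Å, `medium` is 1 Å, and `poor` saturates at 4 Å, matching the stated 0–4 Å range.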

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Data availability

Reference structures: PDB Protein Data Bank (https://www.rcsb.org/; accession codes in Fig. 2 and Supplementary Table 3).

Spectra and reference assignments: BMRB Biological Magnetic Resonance Data Bank ( https://bmrb.io/ ; entry IDs in Supplementary Table  3 ).

Peak lists, assignments, and structures: https://nmrtist.org/static/public/publications/artina/ARTINA_results.zip and in the ETH Research Collection under DOI 10.3929/ethz-b-000568621.

Source data for Figs.  2 , 4 , and 5 is available in Supplementary Tables  2 , 4 , and 5, respectively.

Code availability

The ARTINA algorithm is available as a webserver at https://nmrtist.org . pp-ResNet, deconv-ResNet, GNN, and GBT are available for download in binary form, together with architecture schemes, example input data, model input description, and source code for reading the model files and making predictions ( https://github.com/PiotrKlukowski/ARTINA , https://nmrtist.org/static/public/publications/artina/models/ {ARTINA_peak_picking.zip, ARTINA_peak_deconvolution.zip, ARTINA_shift_prediction.zip, ARTINA_structure_ranking.zip}). These files provide a full technical specification of the components developed within ARTINA, and allow for their independent use in Python.

Existing software used: Python ( https://www.python.org/ ), CYANA ( https://www.las.jp/ ), TALOS-N ( https://spin.niddk.nih.gov/bax/software/TALOS-N ).

Wüthrich, K. NMR studies of structure and function of biological macromolecules (Nobel Lecture). Angew. Chem. Int. Ed. 42 , 3340–3363 (2003).


Sakakibara, D. et al. Protein structure determination in living cells by in-cell NMR spectroscopy. Nature 458 , 102–105 (2009).


Guerry, P. & Herrmann, T. Advances in automated NMR protein structure determination. Q. Rev. Biophys. 44 , 257–309 (2011).

Güntert, P. Automated structure determination from NMR spectra. Eur. Biophys. J. 38 , 129–143 (2009).

Garrett, D. S., Powers, R., Gronenborn, A. M. & Clore, G. M. A common sense approach to peak picking two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J. Magn. Reson. 95 , 214–220 (1991).


Koradi, R., Billeter, M., Engeli, M., Güntert, P. & Wüthrich, K. Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson. 135 , 288–297 (1998).

Würz, J. M. & Güntert, P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J. Biomol. NMR 67 , 63–76 (2017).

Klukowski, P. et al. NMRNet: A deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34 , 2590–2597 (2018).

Li, D. W., Hansen, A. L., Yuan, C. H., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12 , 5229 (2021).

Bartels, C., Güntert, P., Billeter, M. & Wüthrich, K. GARANT—A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18 , 139–149 (1997).

Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269 , 592–610 (1997).

Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134 , 12817–12829 (2012).

Linge, J. P., O’Donoghue, S. I. & Nilges, M. Automated assignment of ambiguous nuclear overhauser effects with ARIA. Methods Enzymol. 339 , 71–90 (2001).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319 , 209–227 (2002).

Allain, F., Mareuil, F., Ménager, H., Nilges, M. & Bardiaux, B. ARIAweb: a server for automated NMR structure calculation. Nucleic Acids Res. 48 , W41–W47 (2020).

Lee, W. et al. I-PINE web server: An integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73 , 213–222 (2019).

Huang, Y. P. J. et al. An integrated platform for automated analysis of protein NMR structures. Methods Enzymol. 394 , 111–141 (2005).

Kobayashi, N. et al. KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J. Biomol. NMR 39 , 31–52 (2007).

López-Méndez, B. & Güntert, P. Automated protein structure determination from NMR spectra. J. Am. Chem. Soc. 128 , 13112–13122 (2006).

Murphy, K. P. Probabilistic Machine Learning: An Introduction (MIT Press, 2022).

Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56 , 227–241 (2013).

Güntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 453–471 (2015).

Güntert, P., Mumenthaler, C. & Wüthrich, K. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273 , 283–298 (1997).


He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (2016).

Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://arxiv.org/abs/1609.02907 (2016).

Chiang, W. L. et al. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proc. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD) 257–266 (2019).

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proc. 32nd Conference on Neural Information Processing Systems (NIPS) (2018).

Rosato, A. et al. The second round of Critical Assessment of Automated Structure Determination of Proteins by NMR: CASD-NMR-2013. J. Biomol. NMR 62 , 413–424 (2015).

Kirchner, D. K. & Güntert, P. Objective identification of residue ranges for the superposition of protein structures. BMC Bioinform. 12 , 170 (2011).

Buchner, L. & Güntert, P. Systematic evaluation of combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 81–95 (2015).

Fowler, N. J., Sljoka, A. & Williamson, M. P. A method for validating the accuracy of NMR protein structures. Nat. Commun. 11 , 6321 (2020).

Huang, Y. J., Powers, R. & Montelione, G. T. Protein NMR recall, precision, and F-measure scores (RPF scores): Structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127 , 1665–1674 (2005).

Buchner, L. & Güntert, P. Increased reliability of nuclear magnetic resonance protein structures by consensus structure bundles. Structure 23 , 425–434 (2015).

Koradi, R., Billeter, M. & Güntert, P. Point-centered domain decomposition for parallel molecular dynamics simulation. Comput. Phys. Commun. 124 , 139–147 (2000).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR 24 , 171–189 (2002).

Buchner, L., Schmidt, E. & Güntert, P. Peakmatch: A simple and robust method for peak list matching. J. Biomol. NMR 55 , 267–277 (2013).

Scott, A., López-Méndez, B. & Güntert, P. Fully automated structure determinations of the Fes SH2 domain using different sets of NMR spectra. Magn. Reson. Chem. 44 , S83–S88 (2006).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36 , D402–D408 (2008).

Goddard, T. D. & Kneller, D. G. Sparky 3. (University of California, San Francisco, 2001).

Delaglio, F. et al. NMRPipe—A multidimensional spectral processing system based on Unix pipes. J. Biomol. NMR 6 , 277–293 (1995).

Bartels, C., Xia, T. H., Billeter, M., Güntert, P. & Wüthrich, K. The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J. Biomol. NMR 6 , 1–10 (1995).

Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. Proc. Mach. Learn. Res. 9 , 249–256 (2010).


Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).

Davies, E. R. Computer Vision (Academic Press, 2018).

Kryshtafovych, A. et al. New tools and expanded data analysis capabilities at the protein structure prediction center. Proteins 69 , 19–26 (2007).


Acknowledgements

We thank Drs. Frédéric Allain, Fred Damberger, Hideo Iwai, Harindranath Kadavath, Julien Orts, and Dean Strotz for providing unpublished spectra. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 891690 (P.K.), and a Grant-in-Aid for Scientific Research of the Japan Society for the Promotion of Science (P.G., 20 K06508).

Author information

Authors and affiliations.

Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland

Piotr Klukowski, Roland Riek & Peter Güntert

Institute of Biophysical Chemistry, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany

Peter Güntert

Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, 192-0397, Tokyo, Japan


Contributions

P.K. prepared training and test data sets, designed and trained machine learning models, performed experiments described in the manuscript, and implemented ARTINA within the nmrtist.org web platform. P.K. and P.G. wrote the software. P.K., R.R., and P.G. conceived the project, analyzed the results, and wrote the manuscript.

Corresponding authors

Correspondence to Piotr Klukowski, Roland Riek or Peter Güntert.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Benjamin Bardiaux, Gaetano Montelione, Theresa Ramelot, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.  Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Klukowski, P., Riek, R. & Güntert, P. Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. Nat Commun 13 , 6151 (2022). https://doi.org/10.1038/s41467-022-33879-5


Received: 28 March 2022

Accepted: 30 September 2022

Published: 18 October 2022

DOI: https://doi.org/10.1038/s41467-022-33879-5


2d nmr assignment

NMR spectra processing for everybody

Unrestrained access to first-class online software for NMR spectra processing. It is free and you can get started right away from your browser.

Process directly online

You don't have to go through the hassle of installing any software or applications. Click here to start.

1D and 2D spectra

NMRium accepts 1D and 2D spectra. A 1D spectrum can be either an FID or a Fourier-transformed spectrum. Currently, only FT 2D spectra are allowed.

Smart peak picking

NMRium includes advanced peak picking for 1D and 2D spectra and is able to generate the NMR string required for a publication or patent.

All the processing and assignments can be stored as a “.nmrium” file. This file contains the original data as well as all the processing that was applied to the spectrum. Assignments of the molecule are also saved in the file.

Not just signal processing

NMRium also handles chemical structures. They can be imported from an MDL Molfile, copy-pasted directly into the molecule panel, or drawn.

Perfect for teaching

Try out our structure elucidation exercises or create your own exercises! They are great for students.

Great user experience

To provide an optimal user experience, the spectra processing is efficiently performed within the web browser.

Opens multiple file formats

Just drag and drop a JCAMP-DX file, a Bruker folder or a JEOL file.


Chemistry LibreTexts

2D NMR Introduction


Some general principles and techniques used in two-dimensional NMR are discussed. Applications covered are mostly concerned with protein NMR, but additional 2D techniques and applications can be found in the references section.

Introduction

A two-dimensional variation of NMR was first proposed by Jean Jeener in 1971; since then, scientists such as Richard Ernst have applied the concept to develop the many techniques of 2D NMR. Although traditional, one-dimensional NMR is sufficient to observe distinct peaks for the various functional groups of small molecules, for larger, more complex molecules, many overlapping resonances can make interpretation of an NMR spectrum difficult. Two-dimensional NMR, however, allows one to circumvent this challenge by adding additional experimental variables and thus introducing a second dimension to the resulting spectrum, providing data that is easier to interpret and often more informative.

Basics of 2D NMR

Experimental set-up.

In traditional 1D Fourier transform NMR, a sample under a magnetic field is hit with a series of RF pulses, as seen in the pulse sequence below, and the Fourier transform of the outgoing signal results in a 1D spectrum as a function of chemical shift.

Pulsefrequency1tau.png

A 2D NMR experiment, however, adds an additional dimension to the spectrum by varying the length of time (\(\tau\)) the system is allowed to evolve following the first pulse. The result is an outgoing signal \(f(\tau, t_2)\), which, when Fourier transformed, gives a 2D spectrum \(F(\omega_1, \omega_2)\).

The use of two-dimensional NMR allows the researcher to better resolve signals which would normally overlap in 1D NMR. Depending on the size of your molecule, different variations or combinations of 2D and multidimensional NMR experiments are utilized.

The Spin Hamiltonian

The spin of a given nucleus during any NMR experiment is governed by the spin Hamiltonian. If long-range spin interactions are ignored, the spin Hamiltonian for a one-spin system is given by the equation

\[\hat{H}=\hat{H}^0+\hat{H}_{RF}\]

The magnetic field along the z-axis, shielding, and J-coupling with nearby nuclei are all constant and are accounted for in \(\hat{H}^0\). \(\hat{H}_{RF}\) describes the magnetic field induced by an RF pulse. For a system where two spins are coupled, \(\hat{H}^0\) is

\[\hat{H}^0=\omega_{1}^0\hat{I}_{1z}+\omega_{2}^0\hat{I}_{2z}+{2}\pi{J}_{12}\hat{I}_{1}\cdot\hat{I}_{2}\]

where \(\omega_{j}^0\) is the Larmor frequency of nucleus \(j\), \(\hat{I}_{j}\) is its spin angular momentum operator, and \(J_{12}\) is the observed J coupling between the nuclei. \(\omega_{j}^0\) is directly related to the chemical shift (\(\delta\)) by the equation

\[\omega_{j}^0=-\gamma_{j}{B}^0(1+\delta_{j}^{iso})\]

where \(\gamma_{j}\) is the gyromagnetic ratio of the given isotope. If nuclei 1 and 2 are of the same element and isotope, the system is referred to as homonuclear. If they are different, it is a heteronuclear spin system.
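As a quick numerical check of the relation between Larmor frequency and chemical shift, the sketch below evaluates it for ¹H. The field strength of 14.1 T is an example value, and the chemical shift is entered in ppm and converted to a dimensionless number inside the function.

```python
import math

GAMMA_1H = 267.522e6  # gyromagnetic ratio of 1H in rad s^-1 T^-1

def larmor_frequency_hz(gamma, b0, delta_ppm=0.0):
    """Larmor frequency in Hz from omega = -gamma * B0 * (1 + delta)."""
    omega = -gamma * b0 * (1.0 + delta_ppm * 1e-6)  # rad/s
    return omega / (2.0 * math.pi)

b0 = 14.1  # tesla (example magnet, roughly a "600 MHz" spectrometer)
f_ref = larmor_frequency_hz(GAMMA_1H, b0, 0.0)
f_shifted = larmor_frequency_hz(GAMMA_1H, b0, 1.0)
offset_hz = abs(f_shifted - f_ref)  # frequency difference caused by 1 ppm
```

The negative sign only encodes the sense of precession; the magnitude is about 600 MHz, and a shift of 1 ppm moves the resonance by about 600 Hz at this field.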

Correlation Spectroscopy (COSY)

The most basic form of 2D NMR is the 2D COSY experiment (pulse sequence shown below), a homonuclear experiment with a pulse sequence similar to the procedure discussed above. It consists of a 90° RF pulse followed by an evolution time and an additional 90° pulse. The resulting oscillating magnetization (symbolized by the decaying sinusoidal curve) is then acquired during \(t_2\).

COSY seqtau.png

The analysis of the acquired spectrum is discussed below. COSY is useful for determining the coupling between nuclei that are connected through one to three bonds. However, in macromolecules such as proteins, coupling through bonds alone is not sufficient to obtain substantial structural information. For this reason, the Nuclear Overhauser Effect (NOE) is often used in protein NMR to obtain information on the distance between nuclei through space rather than through bonds.

Nuclear Overhauser Effect (NOE)

Thus far, only the coupling of nuclei through bonds has been considered. In bond coupling, the magnetization of a nucleus affects the nuclei closely bound to it through the electrons that make up those bonds; however, coupling directly between nuclei that are in close spatial proximity to each other also occurs. This is called the Nuclear Overhauser Effect, and it arises when the spin relaxation of nucleus A is felt by a nearby nucleus B, stimulating a corresponding change in magnetization in B. In a typical NMR spectrum, the interference of electrons makes this coupling undetectable. However, a sample can be decoupled to “neutralize” the bond coupling through electrons, allowing the space coupling of the NOE to be detected. This is called NOESY (Nuclear Overhauser Effect Spectroscopy) and is another type of homonuclear NMR.

The pulse sequence for a NOESY NMR experiment is depicted below.

NOESY seqtau.png

Like COSY, the first step is a 90° pulse followed by a variable evolution time. Unlike COSY, however, pulse two actually consists of two 90° pulses separated by a short delay, the mixing time \(\tau_m\). The first of these pulses rotates the bulk magnetization from the transverse plane onto the z-axis, eliminating the effect of electron-aided bond coupling. Then, during \(\tau_m\), there is cross relaxation between spatially adjacent nuclei. Finally, the last 90° pulse converts the space coupling of nuclei into an observable transverse magnetization, which can be detected during \(t_2\).

2D NMR Spectra

As discussed earlier, by performing multiple one-dimensional experiments at varying lengths (\(\tau\)) of the evolution period and performing a Fourier transformation on the signal, which converts \(f(\tau, t_2)\) to \(F(\omega_1, \omega_2)\), a two-dimensional spectrum can be formed and displayed as a 3D contour map.
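The conversion from a time-domain data set to a 2D spectrum can be tried out on synthetic data. The sketch below builds an idealized complex signal with one frequency in each dimension and applies a 2D fast Fourier transform. Real experiments need a quadrature detection scheme in the indirect dimension and further processing (apodization, phasing), all of which is left out; the frequencies, dwell time, and decay constant are arbitrary example values.

```python
import numpy as np

n1, n2 = 64, 128          # number of tau increments and of points acquired in t2
dt = 1e-3                 # dwell time in both dimensions (s), assumed equal
tau = np.arange(n1) * dt
t2 = np.arange(n2) * dt
nu1, nu2 = 125.0, 250.0   # hypothetical frequencies during tau and t2 (Hz)

# f(tau, t2): product of two decaying complex oscillations
signal = (np.exp(2j * np.pi * nu1 * tau - tau / 0.05)[:, None]
          * np.exp(2j * np.pi * nu2 * t2 - t2 / 0.05)[None, :])

# F(omega1, omega2): 2D Fourier transform of the time-domain data
spectrum = np.fft.fft2(signal)
f1 = np.fft.fftfreq(n1, d=dt)
f2 = np.fft.fftfreq(n2, d=dt)

# Locate the strongest peak in the magnitude spectrum
i, j = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
peak = (f1[i], f2[j])
```

The single peak appears at the coordinates (125 Hz, 250 Hz), i.e. at the frequency that was active during the evolution period in one dimension and the frequency detected during acquisition in the other.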

A more useful representation of 2D data is called a correlation map. The correlation map of the steroid progesterone is shown below.

ProgesteroneCOSY.png

In this representation, the x- and y-axes correspond to the frequencies resulting from the Fourier transforms, and the intensity of shade at each frequency coordinate indicates the peak intensity. Two types of peaks are observed in a homonuclear correlation map—diagonal peaks and cross peaks. Diagonal peaks are found along the diagonal of the map where the x- and y-axes have equal frequency values and simply correspond to the absorptions from a one-dimensional NMR experiment. Because heteronuclear NMR does not involve the same isotope, diagonal peaks are not observed. Cross peaks, on the other hand, give information on the coupling of two nuclei and are seen in both homo- and heteronuclear spectra.

Applications in Protein NMR

As previously mentioned, the major advantage of 2D NMR over 1D NMR is the ability to distinguish between the overlapping signals that exist in larger molecules. Heteronuclear two-dimensional NMR is especially important in biological chemistry in the elucidation of the three-dimensional structure of proteins.

Heteronuclear Single Quantum Coherence (HSQC)

A protein is made up of a series of amino acid monomers. Although there are 20 different standard amino acids, each with a distinct side chain, the protein backbone is an invariable pattern of NH-C-CO as shown in Figure 5.

polar. trans protright.png

When synthesized under the right conditions, a heavy atom protein can be produced which contains NMR-active nuclei; however, a ¹⁵N nucleus has a very low gyromagnetic ratio. According to the Hamiltonian operators discussed above, it will give a very weak signal in traditional 2D NMR. Fortunately, the nucleus can be detected indirectly by transferring polarization through a ¹H nucleus. This method is used in HSQC NMR.

In protein NMR, each HSQC experiment has three steps:

  • An INEPT (insensitive nuclei enhanced by polarization transfer) step transfers the polarization of a ¹H nucleus to the neighboring ¹⁵N nucleus (see figure below)
  • The polarization is transferred back to the ¹H nucleus
  • The signal from the ¹H nucleus is recorded

The pulse sequence for a typical HSQC experiment is detailed below.

HSQC.png

A 90° ¹H RF pulse creates transverse polarization on the ¹H nuclei. Following the pulse, the nuclei evolve for a delay of 1/(4J), where J is the ¹H-¹⁵N scalar coupling constant. Next, 180° pulses are applied to ¹H and ¹⁵N simultaneously. During the second 1/(4J) delay, the ¹H nuclei develop a polarization that is antiphase with respect to ¹⁵N. Finally, simultaneous 90° pulses on ¹H and ¹⁵N enact the INEPT transfer of antiphase magnetization from the ¹H nucleus to the ¹⁵N nucleus. Following the INEPT transfer, the ¹⁵N nuclei evolve during \(\tau\) before a reverse INEPT transfer moves the ¹⁵N polarization back to ¹H, and a ¹⁵N-decoupled signal is recorded.
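The choice of the 1/(4J) delay can be checked with a short calculation. Under scalar coupling, in-phase proton magnetization converts to antiphase magnetization in proportion to sin(πJt), where t is the total evolution time; two delays of 1/(4J) give t = 1/(2J), at which the conversion is complete. The coupling constant used below (92 Hz) is an assumed typical value for a one-bond amide ¹H-¹⁵N coupling, not a value taken from this module.

```python
import math

def inept_delay(j_hz):
    """Return the INEPT delay 1/(4J) in seconds for a coupling constant J in Hz."""
    if j_hz <= 0:
        raise ValueError("coupling constant must be positive")
    return 1.0 / (4.0 * j_hz)

def antiphase_fraction(j_hz, total_time_s):
    """Fraction of in-phase magnetization converted to antiphase: sin(pi * J * t)."""
    return math.sin(math.pi * j_hz * total_time_s)

J_NH = 92.0  # Hz, assumed typical one-bond amide 1H-15N coupling

delay = inept_delay(J_NH)
delay_ms = delay * 1e3
# The two delays flanking the 180-degree pulses add up to 1/(2J).
fraction = antiphase_fraction(J_NH, 2 * delay)

print(f"Each delay: {delay_ms:.3f} ms")
print(f"Antiphase fraction after both delays: {fraction:.3f}")
```

With the assumed coupling, each delay is a little under 3 ms, which is short compared with typical relaxation times, so most of the magnetization survives the transfer.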

An example HSQC spectrum of ubiquitin is shown below.

ubiquitin HSQC.png

Notice how much better resolved the HSQC spectrum is than the COSY spectrum. This is a strong advantage of heteronuclear NMR. In this spectrum, every peak is a cross peak, showing coupling between a ¹H nucleus and a ¹⁵N nucleus. Each peak represents the ¹⁵N-¹H pair of a unique amino acid residue along the backbone of the protein.

The 2D HSQC experiment, which relates ¹H and ¹⁵N, is just the start of the long, complicated road of protein structural characterization using 2D and multidimensional NMR techniques. The next experiment is typically a 3D NMR technique in which coupling with ¹³C is also included in the spectra, giving information on which NH peaks are associated with which type of amino acid residue. This is followed by an NOE experiment, which gives spatial distances between nuclei. These multidimensional techniques are outside the scope of this module; however, the principles applied in the construction and analysis of the 2D spectra carry over to 3D and 4D NMR. The only difference, naturally, is the additional dimensions. Once multiple experiments have been carried out, information on the proximity of nuclei, the type of amino acid associated with each nucleus, and the secondary structure has been amassed, and by carefully piecing all the data together, a 3D structure of a given protein can be constructed.

Other Applications of 2D NMR

2D NMR has many more applications beyond protein NMR, including characterization of pharmaceuticals, temperature dependence of carbohydrate conformations, and metabolomics, to name just a few. For more information on these applications and the 2D NMR techniques used in them, please see the “Further Reading” section.


Outside Links

  • http://en.Wikipedia.org/wiki/Two-dim...e_spectroscopy
  • triton.iqfr.csic.es/guide/eNM...nv/hsqc2d.html
  • www.bioc.aecom.yu.edu/labs/gi...rator_2012.pdf
  • www.chem.queensu.ca/facilitie.../hmqc.htm#hsqc
Problems

  1. What is the effect of the 90° pulse on the bulk magnetization of a sample?
  2. If a 90° pulse for a given sample is 4 fs long, how long is a 180° pulse on the same sample?
  3. The magnetic effect of which type of particle must be removed from an NMR experiment in order to observe an NOE?

Solutions: 1) A 90° pulse moves the bulk magnetization into the transverse plane. 2) 8 fs. 3) Electrons.
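
The answer to the second problem follows from the proportionality between flip angle and pulse duration. As a brief sketch, assuming a rectangular, on-resonance pulse of constant RF amplitude (the symbols \(\theta\) for the flip angle, \(\gamma\) for the gyromagnetic ratio, \(B_1\) for the RF field strength, and \(t_p\) for the pulse length are introduced here for illustration):

\[
\theta = \gamma B_1 t_p
\quad\Longrightarrow\quad
\frac{t_{180}}{t_{90}} = \frac{180^\circ}{90^\circ} = 2,
\qquad
t_{180} = 2 \times 4\ \text{fs} = 8\ \text{fs}.
\]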

Further Reading

  • Ludwig, C., Viant, M.R., Two-dimensional J-resolved NMR spectroscopy: review of a key methodology in the metabolomics toolbox. Phytochemical Analysis, 2010. 21 (1): p. 22-32. DOI: 10.1002/pca.1186
  • Brown, S.P., Applications of high-resolution 1H solid-state NMR. Solid State Nuclear Magnetic Resonance, 2012. 41 : p. 1-27. DOI: 10.1016/j.ssnmr.2011.11.006
  • Shrot, Y., Frydman, L., Ghost-peak suppression in ultrafast two-dimensional NMR. Journal of Magnetic Resonance, 2003. 164 (2): p. 351-357. DOI: 10.1016/S1090-7807(03)00177-0
