The three-dimensional view of molecules at the atomic level provided by X-ray crystallography is not only extremely informative but is also easily and intuitively understood by humans, who rely heavily on their vision. However, unlike microscopy, this technique does not directly yield an image. The structural model cannot be calculated directly from the diffraction data, as only the intensities of the scattered beams, and not their phases, are experimentally accessible. To obtain the three-dimensional structure, the phases have to be derived by additional experimental or computational methods. This is known as the phase problem in crystallography. In this manuscript we provide an overview of the major milestones in the quest for the lost phases.

In 1914 Max von Laue received the Nobel Prize in Physics “for his discovery of the diffraction of X-rays by crystals” (Friedrich, Knipping and Laue,

Given a crystal of known structure, it was soon realized how to calculate the effect of diffraction of X-rays by the electrons present (Bragg,

The resolution limit is related to the experimental setup but, more essentially, to the crystal properties, as imperfect ordering brings about a loss of resolution. The amount of unique experimental data that can be recorded from a crystal is limited by the maximal resolution to which it diffracts: the higher the resolution (the smaller d), the more independent data are measurable and, thus, the more parameters can be afforded to characterize the structural model.

The fundamental relationship between the experimental data and the electron density function in the crystal is a Fourier transformation whose coefficients are the individual structure factors F. Each structure factor F is a complex number with an amplitude and a phase. The recorded intensities of the scattered beams are proportional to the squared moduli of the structure factors (subject to predictable corrections that account for experimental conditions), so the phase information is lost in the measurement. The objective of a crystallographic determination, however, is to solve the inverse problem: determining the molecular structure within the crystal from the intensities of the scattered beams. To compute the corresponding inverse Fourier transform, with the structure factors as coefficients, their phases must be known as well as their moduli.
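The loss of information can be illustrated with a minimal numerical sketch. It assumes a one-dimensional toy "crystal" made of three Gaussian atoms sampled on a grid; the positions, weights and widths are hypothetical, and NumPy's FFT stands in for the crystallographic Fourier summation (its sign convention differs from the usual crystallographic one, which does not affect the argument).

```python
import numpy as np

# Toy one-dimensional unit cell sampled on a regular grid (all values hypothetical).
n_grid = 64
x = np.arange(n_grid) / n_grid            # fractional coordinates in [0, 1)
centres = [0.15, 0.40, 0.72]              # assumed atom positions
weights = [6.0, 8.0, 16.0]                # assumed electron counts


def gaussian_atom(centre, weight, width=0.02):
    # Shortest periodic distance to the atom centre, so atoms wrap around the cell edge.
    d = np.abs(x - centre)
    d = np.minimum(d, 1.0 - d)
    return weight * np.exp(-0.5 * (d / width) ** 2)


density = sum(gaussian_atom(c, w) for c, w in zip(centres, weights))

# Structure factors: complex Fourier coefficients of the density.
structure_factors = np.fft.fft(density)
amplitudes = np.abs(structure_factors)
phases = np.angle(structure_factors)

# What the experiment records: intensities, proportional to squared amplitudes.
intensities = amplitudes ** 2

# Inverse transform with amplitudes and phases recovers the density.
with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real

# Inverse transform with the amplitudes alone (all phases set to zero) does not.
without_phases = np.fft.ifft(np.sqrt(intensities)).real

print("recovered with phases:   ", np.allclose(with_phases, density))
print("recovered without phases:", np.allclose(without_phases, density))
```

The zero-phase map is symmetric about the origin whatever the true structure is, which is one way of seeing that the amplitudes alone do not encode the atomic positions directly.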

To understand the importance of the phase information content, it is very instructive to calculate a Fourier transform with amplitudes corresponding to one structure and phases derived from a different one.
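A hedged numerical sketch of this experiment follows. It assumes two unrelated one-dimensional structures made of equal point atoms at random grid positions (the grid size, atom count and random seed are arbitrary choices), combines the amplitudes of one with the phases of the other, and compares the resulting map with both parents through a correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, arbitrary choice
n_grid, n_atoms = 256, 20                 # hypothetical toy dimensions


def random_structure():
    # Equal point atoms placed at random, distinct grid positions.
    density = np.zeros(n_grid)
    density[rng.choice(n_grid, size=n_atoms, replace=False)] = 1.0
    return density


structure_a = random_structure()
structure_b = random_structure()

f_a = np.fft.fft(structure_a)
f_b = np.fft.fft(structure_b)

# Hybrid map: amplitudes taken from structure A, phases taken from structure B.
hybrid = np.fft.ifft(np.abs(f_a) * np.exp(1j * np.angle(f_b))).real


def correlation(u, v):
    # Pearson correlation coefficient between two maps.
    return np.corrcoef(u, v)[0, 1]


cc_with_a = correlation(hybrid, structure_a)
cc_with_b = correlation(hybrid, structure_b)
print(f"correlation with amplitude donor A: {cc_with_a:.2f}")
print(f"correlation with phase donor B:     {cc_with_b:.2f}")
```

In this toy setting the hybrid map is expected to resemble the structure that donated the phases far more than the one that donated the amplitudes.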

For a detailed pictorial explanation of Fourier transforms, phases and amplitudes in diffraction, see Kevin Cowtan’s excellent Picture Book of Fourier Transforms (

It is important to realize that, as a consequence of the Fourier transformation, each atom contributes to all structure factors and each structure factor carries information about all atoms. This gives structure solution an “all or nothing” quality: although phasing does not usually provide the final model (interpretation of the electron density map, refinement and validation are still required), it affords an overall, fairly complete view characterizing a substantial part of the structure. In other words, all strong data need to be accounted for simultaneously and all prominent features in the electron density need to be determined simultaneously, as one portion of the structure is not exclusively related to a subset of the data or vice versa.
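This global coupling can be checked on a small example. The sketch below assumes a one-dimensional cell with four point atoms (positions and scattering factors are hypothetical), displaces a single atom slightly and counts how many structure factors change.

```python
import numpy as np


def structure_factors(positions, scattering, h):
    # F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a one-dimensional cell.
    positions = np.asarray(positions)[:, None]
    scattering = np.asarray(scattering)[:, None]
    return np.sum(scattering * np.exp(2j * np.pi * h[None, :] * positions), axis=0)


h = np.arange(1, 21)                       # reflections h = 1..20
positions = [0.10, 0.37, 0.62, 0.85]       # hypothetical fractional coordinates
scattering = [6.0, 7.0, 8.0, 6.0]          # hypothetical scattering factors

f_before = structure_factors(positions, scattering, h)

# Displace only the third atom by a small amount.
moved = list(positions)
moved[2] += 0.013
f_after = structure_factors(moved, scattering, h)

changed = ~np.isclose(f_before, f_after)
n_changed = int(changed.sum())
print(f"{n_changed} of {len(h)} structure factors changed")
```

Moving one atom perturbs every reflection in the list, which is why no subset of the data can be assigned to a subset of the atoms.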

In addition to the phase problem inherent to diffraction, experimental data may be of limited quality (finite resolution, incomplete data, crystal pathologies such as twinning, disorder, pseudo-symmetry or anisotropic diffraction, errors in measured intensities), which in turn may complicate phasing.

If phases cannot be measured but are needed to calculate the electron density map from which the structural model is interpreted, how have we come to determine and archive in repositories over 173,000 structures of minerals (

Given a set of experimentally derived amplitudes (=Structure factors), we can postulate that there is only one chemically plausible structure in the crystal that is consistent with these experimental data (Giacovazzo, _{2}, were adjusted by trial and error (Bragg,

Many years later, in the early 50s the same principle of combined atomic understanding, learned from crystal structures of peptide building blocks, with constraints from fiber diffraction (Astbury and Street,

Beyond such cases, where symmetry and boundary conditions led to a model that could be validated against the data, most phasing problems had to start from the Fourier analysis of the experimental intensity data.

The experimental data allow the direct computation of a function that has immediate physical meaning. A Fourier transform calculated using as coefficients the squared moduli of the structure factors, that is, quantities proportional to the recorded intensities, is phase independent (Patterson,

Tagging the molecule with a heavy atom having so many electrons that it would dominate the scattering provided a way of phasing chemical structures. Still, relatively small equal-atom structures remained difficult until the advent of direct methods.
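The phase-independent function and the effect of a heavy atom can be sketched numerically. The example assumes a one-dimensional cell of 32 grid points with three point atoms, one of them heavy; positions and electron counts are hypothetical. The transform of the intensities is the autocorrelation of the density, so its peaks sit at interatomic vectors with heights given by the products of the electron counts.

```python
import numpy as np

n_grid = 32
positions = np.array([3, 10, 25])          # hypothetical grid positions
electrons = np.array([6.0, 8.0, 35.0])     # two light atoms and one heavy atom (assumed)

density = np.zeros(n_grid)
density[positions] = electrons

# Only the intensities are used: no phase enters the calculation.
intensities = np.abs(np.fft.fft(density)) ** 2
patterson = np.fft.ifft(intensities).real

# Collect the non-zero peaks as {interatomic vector: height}.
peaks = {int(u): round(float(patterson[u]), 6)
         for u in np.flatnonzero(patterson > 1e-6)}

for u, height in sorted(peaks.items()):
    print(f"u = {u:2d}   height = {height:7.1f}")
```

The origin peak collects the sum of squared electron counts, and the vectors involving the heavy atom (heights 280 and 210 here) stand out clearly above the light-light vector (height 48), which is what makes the heavy atom easy to locate.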

Dorothy Hodgkin’s determination of the structure of penicillin through the sodium and rubidium salts of benzylpenicillin (Crowfoot

In the case of well-diffracting crystals of small molecules, the number of independent diffraction data that can be measured is much higher than the number of parameters required to describe the positions of all atoms in the molecule to be determined. The fact that the system is overdetermined implies that a solution may be possible if the experimentally accessible structure factors are related by a set of known relationships.
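A rough count makes the overdetermination concrete. The sketch below uses the standard estimate of the number of reciprocal lattice points inside the resolution sphere, (4π/3)·V/d³, divided by the order of the Laue group to obtain unique reflections. The cell volume, atom count and parameters per atom are assumptions chosen for illustration only.

```python
import math


def unique_reflections(cell_volume, d_min, laue_group_order=2):
    # Reciprocal lattice points inside a sphere of radius 1/d_min,
    # divided by the order of the Laue group (2 for a triclinic cell).
    n_total = (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3
    return n_total / laue_group_order


cell_volume = 1000.0     # cubic angstroms, hypothetical triclinic cell
n_atoms = 60             # hypothetical number of non-hydrogen atoms

# Atomic resolution: three coordinates plus six anisotropic displacement terms per atom.
data_high = unique_reflections(cell_volume, d_min=0.8)
ratio_high = data_high / (9 * n_atoms)

# Lower resolution: three coordinates plus one isotropic displacement term per atom.
data_low = unique_reflections(cell_volume, d_min=2.5)
ratio_low = data_low / (4 * n_atoms)

print(f"0.8 A: {data_high:7.0f} reflections, data/parameter ratio {ratio_high:.1f}")
print(f"2.5 A: {data_low:7.0f} reflections, data/parameter ratio {ratio_low:.1f}")
```

Under these assumptions the same cell is comfortably overdetermined at 0.8 Å but underdetermined at 2.5 Å, which is the regime where additional restraints become indispensable.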

Overdetermination can be exploited because the possible sets of phases are not independent. Conditions required for a solution to have physical meaning can be applied as constraints on the sets of phases. Applying conditions such as positivity (at every point, the electron density must have a non-negative value: either there are some electrons or there are none) and atomicity (structures are composed of atoms) leads to certain statistical relationships among phases.

The first mathematical conditions to be recognized were the inequalities by Harker and Kasper (^{2} or Z^{2}, which is the same as the constant Z times 0 or Z. From this expression, the triplet formula relating the phases of three strong reflections whose indices h, -k and -(h-k) (coordinates in reciprocal space) add up to zero, ϕ(h) ≈ ϕ(k) + ϕ(h-k), can be derived. Phase relationships and their probabilities are combined in the tangent formula, the most efficient phase search engine in direct methods (Hauptman and Karle,
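The triplet relationship can be explored with a small numerical sketch. It assumes six equal point atoms at random positions in a one-dimensional cell (the seed, atom count and index range are arbitrary) and forms the product E(h)·E*(k)·E*(h-k), whose phase is ϕ(h) - ϕ(k) - ϕ(h-k). Two properties are examined: the product does not depend on the choice of origin, and its phase tends to lie near zero, more so for strong reflections. This is an illustration of the statistical tendency, not an implementation of the tangent formula.

```python
import numpy as np

rng = np.random.default_rng(1)             # fixed seed, arbitrary choice
n_atoms = 6
x = rng.random(n_atoms)                    # hypothetical fractional coordinates
h_max = 30


def normalized_e(coords):
    # E(h) for equal point atoms, h = 0..h_max, scaled so that <|E|^2> is about 1.
    h = np.arange(0, h_max + 1)
    return np.exp(2j * np.pi * np.outer(h, coords)).sum(axis=1) / np.sqrt(len(coords))


def triplet_products(e):
    # E(h) * conj(E(k)) * conj(E(h-k)) for h_max >= h > k >= 1.
    return np.array([e[h] * np.conj(e[k]) * np.conj(e[h - k])
                     for h in range(2, h_max + 1) for k in range(1, h)])


products = triplet_products(normalized_e(x))

# Structure invariant: shifting the origin changes individual phases but not the triplet.
shifted_products = triplet_products(normalized_e(x + 0.137))
origin_invariant = np.allclose(products, shifted_products)

# Statistical tendency: the triplet phase clusters around zero.
mean_real_part = products.real.mean()
strong = np.abs(products) > 1.0
mean_cos_strong = np.cos(np.angle(products[strong])).mean()

print("origin invariant:", origin_invariant)
print(f"mean real part of triplet products: {mean_real_part:.2f}")
print(f"mean cos(triplet phase), strong triplets only: {mean_cos_strong:.2f}")
```

For N equal atoms the expected real part of the product is N^(-1/2), so the relationship weakens as the structure grows, which is consistent with direct methods being most effective for small molecules.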

In practice, a good approximation to the function describing the electron density can be calculated from roughly the 3% most intense reflections in every resolution shell (

Computer implementations of direct methods, such as SHELXS (Sheldrick,

In the field of biological crystallography, where the molecules crystallized are proteins, nucleic acids and their complexes, the situation is very different. Although the first diffraction images from a hydrated protein crystal were recorded as early as 1934 (Bernal and Crowfoot,

In summary, macromolecules pose a difficult problem as a consequence of the much larger size of the structure, lower data quality and resolution and more fragile crystals. The first approach opening a way into protein structures was a

The aqueous solution composing half of the protein crystal volume, whose negative effects have just been described, can also be exploited to mediate the solution to the phase problem. Crystals are maintained in the solution where they were grown to prevent dehydration. Incorporating into this solution a chemical species containing heavy atoms, such as soluble platinum, gold, mercury or uranium salts and complexes, may lead to the selective incorporation of such species into particular positions of the macromolecule, so that they become part of the periodic structure and contribute to Bragg diffraction. Diffraction can be recorded on crystals after such treatment and, if local rather than large-scale changes are brought about, the differences in the structure factors between the native and derivatized crystals can be used as an approximation to the diffraction of the heavy atom substructure. This substructure, if composed of a few ordered heavy atoms, can be solved from the difference data by small-molecule methods.

Once the substructure is solved, phase information can be derived for the native macromolecule through trigonometric relations among the recorded data and the determined heavy atom structure factors. Two independent derivatives (with different substructures) are required in theory to determine the sought phases. David Harker provided the methodology for this kind of phase analysis in 1956 while studying the structure of the protein ribonuclease (Harker,
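The trigonometric relation and the need for two derivatives can be sketched as follows. The example assumes error-free synthetic data for a single reflection: a native structure factor FP and two heavy-atom contributions FH1 and FH2 with hypothetical amplitudes and phases. Each derivative gives two candidate native phases, symmetric about the heavy-atom phase, and the candidate common to both derivatives is taken as the answer. Function names are illustrative, and real data would require a probabilistic treatment of errors.

```python
import numpy as np


def candidate_phases(fp_amplitude, fph_amplitude, fh):
    # From |FPH|^2 = |FP|^2 + |FH|^2 + 2 |FP| |FH| cos(phi_P - phi_H):
    # phi_P = phi_H +/- delta, a twofold ambiguity for a single derivative.
    fh_amplitude, fh_phase = np.abs(fh), np.angle(fh)
    cos_delta = (fph_amplitude ** 2 - fp_amplitude ** 2 - fh_amplitude ** 2) / (
        2.0 * fp_amplitude * fh_amplitude)
    delta = np.arccos(np.clip(cos_delta, -1.0, 1.0))
    return fh_phase + delta, fh_phase - delta


def circular_difference(a, b):
    # Absolute angular difference wrapped into [0, pi].
    return abs(np.angle(np.exp(1j * (a - b))))


def resolve_phase(candidates_1, candidates_2):
    # Pick the pair of candidates (one per derivative) that agree best
    # and return their circular mean.
    pairs = [(a, b) for a in candidates_1 for b in candidates_2]
    a, b = min(pairs, key=lambda pair: circular_difference(*pair))
    return np.angle(np.exp(1j * a) + np.exp(1j * b))


# Synthetic, error-free example (all values hypothetical).
fp = 10.0 * np.exp(1j * np.radians(50.0))      # native structure factor, phase "unknown"
fh1 = 3.0 * np.exp(1j * np.radians(120.0))     # heavy-atom contribution, derivative 1
fh2 = 4.0 * np.exp(1j * np.radians(-30.0))     # heavy-atom contribution, derivative 2

# Only amplitudes are "measured"; the heavy-atom terms come from the solved substructure.
fp_amp = abs(fp)
fph1_amp = abs(fp + fh1)
fph2_amp = abs(fp + fh2)

recovered = np.degrees(resolve_phase(candidate_phases(fp_amp, fph1_amp, fh1),
                                     candidate_phases(fp_amp, fph2_amp, fh2)))
print(f"recovered native phase: {recovered:.1f} degrees")
```

With a single derivative both candidates fit the data equally well; the second derivative, having a different heavy-atom phase, breaks the ambiguity.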

This method, named Multiple Isomorphous Replacement (MIR), was used to determine the first protein structures at the MRC in Cambridge, those of myoglobin and hemoglobin by Kendrew and Perutz (Green, Ingram and Perutz,

To overcome the difficulties in extracting a weak signal from the necessarily noisy difference intensities recorded from native and soaked crystals, the alternative is to record diffraction data on crystals of the same kind, or even on the same specimen, but under conditions that modify the diffraction properties of particular elements present in the sample. By choosing appropriate wavelengths from a tunable X-ray source, the dispersive and anomalous scattering contributions of particular elements may be amplified enough to modify the recorded intensities by a small percentage. In this case, rather than inducing a different substructure in various crystals, the diffraction properties of the same substructure are modified in each diffraction experiment and, again, the differences brought about can be exploited to determine the substructure by small-molecule methods and to establish relationships to the phases of the macromolecular structure. This method, named Multi-wavelength Anomalous Diffraction (MAD) (Hendrickson,

The most appropriate elements are again rather electron-rich, from the fourth row onwards in the periodic table. A number of proteins contain such elements in their native form, iron, zinc and molybdenum being the most frequent, but in general the need to incorporate a heavy atom substructure leads back to soaking or co-crystallization techniques. Fortunately, substituting methionine by seleno-methionine in recombinant proteins offers a practical way to avoid the hurdles brought about by the rather harsh derivatization treatment of fragile crystals. For chemically synthesized nucleic acids, selective incorporation of bromine in uracil provides an analogous solution.
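The effect of anomalous scattering on the recorded data can be sketched for one reflection. The example assumes a one-dimensional cell with two light atoms and one anomalous scatterer whose scattering factor is written as f0 + f' + i·f''. The numerical values are illustrative only, not tabulated values for any element or wavelength. With a real scattering factor the reflections h and -h have equal amplitudes; once the imaginary term is present they differ.

```python
import numpy as np


def structure_factor(h, positions, factors):
    # F(h) = sum_j f_j * exp(2*pi*i*h*x_j); f_j may be complex.
    return sum(f * np.exp(2j * np.pi * h * x) for x, f in zip(positions, factors))


positions = [0.10, 0.37, 0.62]             # hypothetical fractional coordinates
f_light = [6.0, 7.0]                       # light atoms, purely real scattering

# Illustrative (not tabulated) values for the anomalous scatterer.
f0, f_prime, f_double_prime = 34.0, -8.0, 4.0


def bijvoet_difference(h, f_heavy):
    # |F(h)| - |F(-h)| for the given scattering factor of the third atom.
    factors = f_light + [f_heavy]
    return (abs(structure_factor(h, positions, factors))
            - abs(structure_factor(-h, positions, factors)))


h = 3
without_anomalous = bijvoet_difference(h, f0)
with_anomalous = bijvoet_difference(h, f0 + f_prime + 1j * f_double_prime)

print(f"|F(h)| - |F(-h)| without anomalous term: {without_anomalous:.3f}")
print(f"|F(h)| - |F(-h)| with anomalous term:    {with_anomalous:.3f}")
```

Because f' and f'' vary with wavelength, repeating the measurement at several wavelengths changes these differences for the same substructure, which is the signal that the multi-wavelength approach exploits.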

The anomalous scattering of the light carbon, nitrogen and oxygen atoms, which are the main components of proteins, is negligible for phasing purposes. In contrast, that of the sulfur present in the amino acids cysteine and methionine and that of the phosphorus present in nucleic acids is weak but has been shown to be usable since the early 1980s, with the determination of the sulfur-rich, small protein crambin (Hendrickson and Teeter,

Experimental advances at synchrotron beamlines allowing the collection of more precise data through highly redundant datasets (Dauter and Adamiak,

When illustrating the phase problem (

The phasing problem to be solved is how to place the search model in the unit cell in order to best account for the experimental data recorded (Huber,
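The idea of scoring trial placements against the observed data can be sketched with a deliberately simplified search. The example assumes a one-dimensional centrosymmetric cell, in which each atom at x has a symmetry mate at -x so that the intensities depend on where the model is placed. The search model, the true shift and the grid are hypothetical. A real search is six-dimensional (rotation and translation) and modern programs use likelihood targets rather than the plain correlation coefficient used here.

```python
import numpy as np

h = np.arange(1, 31)                            # reflections h = 1..30
model = np.array([0.00, 0.07, 0.11, 0.19])      # hypothetical search model coordinates
true_shift = 0.215                              # placement used to simulate the data


def intensities(shift):
    # Centrosymmetric 1D cell: atoms at x and -x give F(h) = 2 * sum_j cos(2*pi*h*x_j).
    coords = model + shift
    f = 2.0 * np.cos(2.0 * np.pi * np.outer(h, coords)).sum(axis=1)
    return f ** 2


observed = intensities(true_shift)              # simulated, error-free "observed" data

# Scan half the cell; shifts differing by 1/2 are equivalent origin choices.
trial_shifts = np.arange(0.0, 0.5, 0.005)
scores = np.array([np.corrcoef(observed, intensities(t))[0, 1] for t in trial_shifts])

best_shift = trial_shifts[np.argmax(scores)]
print(f"best shift: {best_shift:.3f}  (correlation {scores.max():.3f})")
```

Once the model is placed, phases calculated from it can be combined with the observed amplitudes to start map interpretation and refinement.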

The molecular replacement method is intimately related to that of non-crystallographic symmetry. Ultimately, it reduces to exploiting the presence of the same or a similar structure in different crystals or in several crystallographically independent copies present in the same crystal. The structure may be unknown in part or in all of these copies. In the first case, the known stereochemistry is used to construct a model of the unknown crystal. From this, phases can be calculated and map interpretation, model building and refinement will be applied to develop the starting, approximate model into a more accurate representation of the content of the crystal. In the second case, even if the structure is completely unknown, if the relative location of the different copies in one or several polymorphs can be established, it may provide a very powerful constraint on the phases. This is particularly useful to constrain a highly incorrect set of starting phases or in favorable cases, even to drive a random phase set into the correct one. This particular case proved crucial for the structure solution of virus particles (Harrison

Molecular replacement has become the main phasing method for macromolecules, as the large number of already determined structures increases its applicability. Beyond this obvious advantage, the development of sophisticated and efficient approximations to maximum likelihood functions (Storoni, McCoy and Read,

Molecular replacement may use any source of coordinates or electron density model. Although templates derived from other crystallographic studies have traditionally been the most successful, structures determined by NMR in solution, low-resolution electron density maps and even small-angle X-ray scattering (SAXS) envelopes have been successfully exploited in molecular replacement. Recent improvements in the field of cryo-electron microscopy (cryo-EM) have yielded structural information at resolutions around 3.5 Å, approaching low-resolution X-ray crystallography (Amunts

As previously mentioned, two fundamental barriers hinder direct,

The exceptionally high quality of the data required for dual-space recycling methods to succeed would have limited their use, as in such cases all alternative phasing approaches are also favored. At extremely high resolution, even sophisticated molecular replacement search using single atoms as models may succeed in solving macromolecular structures (McCoy

To make up for the lack of atomic resolution data,

Density modification algorithms appropriate for the high, yet not atomic, resolution cases were more efficient than interpreting the map in terms of atoms (Jia-Xing

Tighter stereochemical constraints than atomicity were introduced to extend

This work was supported by grants BFU2012-35367 and BIO2013-49604-EXP (Spanish MINECO) and by the Generalitat de Catalunya (2014SGR-997). We thank Ehmke Pohl for helpful discussion and corrections.
