Main

The size of the P. infestans genome is estimated by optical map and other methods at 240 Mb (Supplementary Information). It is several-fold larger than those of the related Phytophthora species P. sojae (95 Mb) and P. ramorum (65 Mb), which cause soybean root rot and sudden oak death, respectively5,6. We sequenced the genome of P. infestans strain T30-4 using a whole-genome shotgun approach, and generated a ninefold coverage assembly spanning 229 Mb (Table 1 and Supplementary Information). The unassembled fraction of the genome consists of high copy repeat sequences (Supplementary Information). The assembled genome sequence provides near complete coverage of genes, with 98.2% of P. infestans T30-4 complementary DNAs aligning (Supplementary Information). We identified 17,797 protein-coding genes by ab initio gene prediction, protein and expressed sequence tag (EST) homology, and direct genome-to-genome comparative gene modelling with P. sojae and P. ramorum (Supplementary Information). Changes in gene content, number or length do not explain the marked difference in genome size (Table 1 and Supplementary Table 1). No evidence of whole-genome duplication or large-scale dispersed segmental duplication was detected. However, specific disease effector gene families are expanded in P. infestans (see later).

Table 1 Genome assembly and annotation statistics

P. infestans, P. sojae and P. ramorum represent three major phylogenetic clades of Phytophthora6. Among the three genomes, we identified a core set of 8,492 orthologue clusters (including 9,583 P. infestans orthologues and close paralogues), of which 7,113 genes show 1:1:1 orthology relationships (Table 1, Supplementary Fig. 1 and Supplementary Table 2). The core proteome is enriched in genes involved in cellular processes including DNA replication, transcription and protein translation, whereas genes with functions involved in cellular defence mechanisms are underrepresented (Supplementary Fig. 2). Differences in gene family expansion, in particular dynamic repertoires of effector genes (see later), are probably responsible for different traits among Phytophthora species, such as altered host specificity.
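The distinction between core orthologue clusters and strict 1:1:1 orthologues can be illustrated with a minimal sketch. It assumes that clustering has already been done and that each cluster is a list of (species, gene) pairs. The species labels, cluster names and gene identifiers below are hypothetical placeholders, not identifiers from the annotation, and the clustering procedure itself is not reproduced here.

```python
from collections import Counter

# Hypothetical species labels used only for this illustration.
SPECIES = ("P_infestans", "P_sojae", "P_ramorum")


def is_core(cluster):
    """A cluster is 'core' if every species contributes at least one gene."""
    present = {species for species, _gene in cluster}
    return all(s in present for s in SPECIES)


def is_one_to_one_to_one(cluster):
    """A cluster is 1:1:1 if every species contributes exactly one gene."""
    counts = Counter(species for species, _gene in cluster)
    return all(counts.get(s, 0) == 1 for s in SPECIES)


# Toy clusters (made-up gene names).
clusters = {
    "toy_cluster_A": [("P_infestans", "gA1"), ("P_sojae", "gA2"), ("P_ramorum", "gA3")],
    "toy_cluster_B": [("P_infestans", "gB1"), ("P_infestans", "gB2"),
                      ("P_sojae", "gB3"), ("P_ramorum", "gB4")],
    "toy_cluster_C": [("P_infestans", "gC1"), ("P_sojae", "gC2")],
}

core = [name for name, members in clusters.items() if is_core(members)]
strict = [name for name, members in clusters.items() if is_one_to_one_to_one(members)]
```

In this toy example, cluster B is core but not 1:1:1 because it contains a close paralogue in one species, which is the kind of case that separates the orthologue cluster count from the 1:1:1 gene count.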

Comparison of the three Phytophthora genomes reveals an unusual genome organization, comprising blocks of conserved gene order in which gene density is relatively high and repeat content is relatively low, separated by regions in which gene order is not conserved, gene density is low and repeat content is high (Table 1 and Fig. 1). The conserved blocks represent 90% of core orthologous groups in all three genomes, including 70% (12,440) of all P. infestans protein-coding genes and 78% of genes in both P. sojae (13,225) and P. ramorum (11,246). Within conserved blocks, genes are typically tightly spaced in all three genomes (Table 1 and Fig. 1), with median intergenic distances of 633 base pairs (bp) for P. ramorum, 804 bp for P. sojae, and 603 bp for P. infestans. In regions between conserved blocks, intergenic distances are greater and increase with increasing genome size (median 1.5 kb for P. ramorum, 2.2 kb for P. sojae, and 3.7 kb for P. infestans). The differences in spacing between genes among the three genomes, within and outside regions of conserved gene order, are evident in Fig. 2a–f. The expansion of regions between conserved blocks results from increased density of repetitive elements (Supplementary Fig. 3), and overall differences in genome size among the three species are largely explained by proliferation of repeats in regions in which gene order is not conserved. This difference between conserved blocks and non-conserved regions is particularly apparent in the greatly expanded P. infestans genome (Fig. 2d, f). Furthermore, rapidly evolving secreted effector genes (see later) lie predominantly in the gene-sparse regions (Fig. 2g, h). This dual pattern of intergenic spacing and repeat content has been suggested for large, unsequenced genomes in the Poaceae such as maize7,8,9, but it is not seen in the genomes of other sequenced eukaryotes (Supplementary Fig. 4).

Figure 1: Repeat-driven genome expansion in Phytophthora infestans.

Conserved gene order across three homologous Phytophthora scaffolds. Genome expansion is evident in regions of conserved gene order, a consequence of repeat expansion in intergenic regions. Genes are shown as turquoise boxes, repeats as black boxes. Collinear orthologous gene pairs are connected by pink (direct) or blue (inverted) bands.

Figure 2: The P. infestans genome shows an unusual distribution of intergenic region lengths.

The flanking distance between neighbouring genes provides a measurement of local gene density. P. infestans genes were sorted into two-dimensional bins on the basis of the lengths of flanking intergenic distances to neighbouring genes at their 5′ and 3′ ends. a–h, The number of genes in each bin is shown as a colour-coded heat map on orthogonal projection. P. infestans whole-genome analysis (a) shows most genes with intergenic regions between 20 bp and 3 kb long, as well as sets of genes flanked by one or two intergenic region(s) between 5 kb and 36 kb. Comparison with other Phytophthora genomes (b, c) indicates that this separation is observed in P. infestans but not the other two sequenced genomes. Genes in collinear blocks (d) and the core orthologue clusters (e) have primarily shorter intergenic distances, whereas genes outside of collinear blocks (f) reside mostly in gene-sparse regions. Genes belonging to the RXLR (g) and Crinkler (CRN) (genes and pseudogenes) (h) effector families have flanking intergenic distances among the longest. Genes found at the ends of scaffolds and hence lacking neighbouring genes were necessarily excluded.
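The binning described in this legend can be sketched as follows. This is an illustrative reconstruction rather than the analysis code used for the figure: the tuple layout for genes, the logarithmic bin width (`bins_per_decade`) and the toy coordinates are all assumptions made for the example.

```python
import math
from collections import Counter


def flanking_distances(genes):
    """Compute 5' and 3' flanking intergenic distances on one scaffold.

    genes: list of (gene_id, start, end, strand), 1-based inclusive coordinates.
    Genes at either end of the scaffold lack a neighbour and are skipped.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    result = {}
    for i in range(1, len(ordered) - 1):
        gene_id, start, end, strand = ordered[i]
        left = max(0, start - ordered[i - 1][2] - 1)   # gap to upstream-coordinate neighbour
        right = max(0, ordered[i + 1][1] - end - 1)    # gap to downstream-coordinate neighbour
        # On the minus strand the 5' end faces the higher coordinate.
        five, three = (left, right) if strand == "+" else (right, left)
        result[gene_id] = (five, three)
    return result


def log_bin(distance, bins_per_decade=4):
    """Map a distance in bp to a logarithmic bin index (assumed bin width)."""
    return math.floor(math.log10(max(distance, 1)) * bins_per_decade)


def bin_counts(distances, bins_per_decade=4):
    """Count genes per (5' bin, 3' bin) cell, i.e. the heat-map matrix."""
    return Counter(
        (log_bin(five, bins_per_decade), log_bin(three, bins_per_decade))
        for five, three in distances.values()
    )


# Toy scaffold with made-up coordinates.
genes = [
    ("g1", 1, 1000, "+"),
    ("g2", 1501, 2500, "+"),
    ("g3", 11501, 12500, "-"),
    ("g4", 13001, 14000, "+"),
]
distances = flanking_distances(genes)
counts = bin_counts(distances)
```

Here g2 and g3 each have one short (500 bp) and one long (9,000 bp) flank and fall into the same cell once strand orientation is taken into account; plotting `counts` as a matrix would give a heat map of the kind shown in the figure.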

Recent proliferation of Gypsy elements in P. infestans underlies the genome expansion. Approximately one-third of the genome assembly corresponds to families of Gypsy elements (Supplementary Fig. 5). The two families with the highest relative expansion in P. infestans are Gypsy Pi-1 and a new Gypsy long terminal repeat (LTR) element we named ‘Albatross’, which together account for at least 29% of the genome (Supplementary Table 3). Albatross elements cover 32 Mb and are enriched (>2-fold) in the regions in which gene order is not conserved (Supplementary Table 4 and Supplementary Fig. 6), contributing appreciably to relative expansion of gene-sparse regions (Supplementary Fig. 3). Gypsy Pi-1 elements cover 22 Mb and, in contrast to Albatross elements, are relatively evenly distributed across the genome.

Overall, the P. infestans genome contains a strikingly rich and diverse population of transposons (Supplementary Table 3). We identified 273 full-length elements belonging to two large classes of autonomous rolling-circle type helitron DNA transposons (7.3-kb and 6.4-kb elements), in much larger numbers than described in any other genome (Supplementary Tables 3 and 5). Most helitron open reading frames (ORFs) are degenerate pseudogenes, but 13 are intact and presumed functional. Some apparently non-autonomous helitrons have intact termini so their transposition may be driven by gene products from the functional classes. In contrast, the P. sojae and P. ramorum genomes contain no intact helitron elements. The P. infestans genome carries increased numbers of mobile elements across diverse families as compared to P. sojae and P. ramorum, with 5 times as many LTR retrotransposons and 10 times as many helitrons (Supplementary Fig. 7).

Consistent with a model of repeat-driven expansion of the P. infestans genome, the vast majority of repeat elements in the genome are highly similar to their consensus sequences, indicating a high rate of recent transposon activity (Supplementary Fig. 8). In addition, we have observed and experimentally confirmed examples of recently active elements (Supplementary Figs 9–11).

Phytophthora species, like many pathogens, secrete effector proteins that alter host physiology and facilitate colonization. The genome of P. infestans revealed large complex families of effector genes encoding secreted proteins that are implicated in pathogenesis10. These fall into two broad categories: apoplastic effectors that accumulate in the plant intercellular space (apoplast) and cytoplasmic effectors that are translocated directly into the plant cell by a specialized infection structure called the haustorium11. Apoplastic effectors include secreted hydrolytic enzymes such as proteases, lipases and glycosylases that probably degrade plant tissue; enzyme inhibitors to protect against host defence enzymes; and necrotizing toxins such as the Nep1-like proteins (NLPs) and PcF-like small cysteine-rich proteins (SCRs) (Supplementary Table 6).

As in the other Phytophthora species5, candidate effector genes are numerous and typically expanded compared to non-pathogenic relatives (Supplementary Table 6). Most notable among these are the RXLR and Crinkler (CRN) cytoplasmic effectors, described later.

The archetypal oomycete cytoplasmic effectors are the secreted and host-translocated RXLR proteins12. All oomycete avirulence genes (encoding products recognized by plant hosts and resulting in host immunity) discovered so far encode RXLR effectors, modular secreted proteins containing the amino-terminal motif Arg-X-Leu-Arg (in which X represents any amino acid) that defines a domain required for delivery inside plant cells11, followed by diverse, rapidly evolving carboxy-terminal effector domains13,14. Several of these C termini have been shown to exhibit virulence activities as host cell death suppressors15,16. We exploited the known motifs and other conserved sequence features to predict 563 RXLR genes in the P. infestans genome (Supplementary Tables 6, 7 and Supplementary Information). RXLR genes are notably expanded in P. infestans, with 60% more predicted than in P. sojae and P. ramorum (Supplementary Tables 6 and 7). We observed that 70 of these are rapidly diversifying (Supplementary Table 8). Approximately half of P. infestans RXLRs are lineage-specific, largely accounting for the expanded repertoire (Supplementary Figs 12 and 13). In contrast to the core proteome, RXLR genes show evidence of high rates of turnover with only 16 of the 563 genes with 1:1:1 orthology relationships (Supplementary Table 2) and many (88) putative RXLR pseudogenes (Supplementary Table 9). This high turnover in Phytophthora is probably driven by arms-race co-evolution with host plants5,13,14,17.
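As a simplified illustration of motif-based candidate screening, the sketch below looks for an Arg-X-Leu-Arg motif within an N-terminal window of a protein sequence. It is not the prediction pipeline used here, which is described in the Supplementary Information and draws on additional conserved features. The window bounds and the toy sequences are assumptions chosen for the example, and signal peptide prediction is omitted.

```python
import re

# Arg, any residue, Leu, Arg.
RXLR = re.compile(r"R.LR")


def find_rxlr(protein, window_start=20, window_end=80):
    """Return the 0-based start of the first RXLR motif lying entirely
    within protein[window_start:window_end], or None if there is none.

    The default window is an illustrative assumption, not a published cutoff.
    """
    match = RXLR.search(protein, window_start, window_end)
    return match.start() if match else None


# Toy sequences (not real proteins).
seq_with_motif = "M" + "A" * 29 + "RSLR" + "G" * 40
seq_without_motif = "M" + "A" * 69
seq_motif_outside_window = "MRSLR" + "A" * 70

hits = {
    "with": find_rxlr(seq_with_motif),
    "without": find_rxlr(seq_without_motif),
    "outside": find_rxlr(seq_motif_outside_window),
}
```

A motif match alone is a weak criterion, because the four-residue pattern occurs by chance in many proteins; in practice it would be combined with evidence of secretion and other sequence features.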

RXLR effectors show extensive sequence diversity. Markov clustering (TribeMCL18) yields one large family (P. infestans: 85, P. ramorum: 75, P. sojae: 53) and 150 smaller families (Supplementary Fig. 14). The largest family shares a repetitive C-terminal domain structure (Supplementary Figs 15 and 16). Most families have distinct sequence homologies (Supplementary Fig. 14) and patterns of shared domains (Supplementary Fig. 17) with greater diversity than expected if all RXLR effectors were monophyletic.

In contrast to the core proteome, RXLR effector genes typically occupy a genomic environment that is gene sparse and repeat-rich (Fig. 2g and Supplementary Figs 18 and 19). The mobile elements contributing to the dynamic nature of these repetitive regions may enable recombination events resulting in the higher rates of gene gain and gene loss observed for these effectors.

CRN cytoplasmic effectors were originally identified from P. infestans transcripts encoding putative secreted peptides that elicit necrosis in planta, a characteristic of plant innate immunity19. Since their discovery, little had been learned about the CRN effector family. Analysis of the P. infestans genome sequence revealed an enormous family of 196 CRN genes of unexpected complexity and diversity (Supplementary Table 10) that is heavily expanded in P. infestans relative to P. sojae (100 CRNs) and P. ramorum (19 CRNs) (Supplementary Table 6). Like RXLRs, CRNs are modular proteins. CRNs are defined by a highly conserved N-terminal 50-amino-acid LFLAK domain (Supplementary Fig. 20) and an adjacent diversified DWL domain (Fig. 3a, b). Most (60%) possess a predicted signal peptide. Those lacking predicted signal peptides are typically found in CRN families containing members with secretion signals (Supplementary Table 10). CRN C-terminal regions exhibit a wide variety of domain structures, with 36 conserved domains and a further eight unique C termini identified among the 315 Phytophthora CRN proteins (Supplementary Table 11). We observed evidence of recombination between different clades as a mechanism driving CRN diversity (Supplementary Figs 21–23).

Figure 3: Diverse Crinkler (CRN) families exhibit necrosis phenotypes in planta.

a, CRN family phylogeny on the basis of the conserved N-terminal sequence, computed using PhyML with default parameters and 100 bootstrap replicates. CRN C-terminal domain structures are shown along the circumference. Branches are coloured according to organism: P. infestans in blue, P. sojae in yellow, and P. ramorum in red. Internal nodes with ≥80% bootstrap support are marked with a black dot. b, Graphical representation of the CRN family domain architecture, exhibiting a conserved N-terminal region followed by diverse C-terminal domains. c, Phenotypes observed on Nicotiana benthamiana leaves upon in planta overexpression of CRN effectors. C-terminal effector domains of CRNs were tested for cell death phenotypes on N. benthamiana leaves by Agrobacterium tumefaciens-mediated transient expression of CRNs, inf1 (positive control), crn2 (positive control), and green fluorescent protein (GFP) (negative control). The domains DC, DBF, D2 and DXW-DXX-DXS, like the DXZ domain of crn2, were found to induce necrosis. Cell death phenotypes were visible at 4 days post infiltration. Photos were taken 7 days after infiltration. d, CRNs with necrosis domains D2 and DXZ along with pseudogene copies are found co-clustered across P. infestans scaffold 1.48 (1.2 Mb). Genes and domain structures are illustrated according to the top and bottom strands of the genomic scaffold. Pseudogenes are indicated by Ψ; non-CRN genes are shown as unfilled boxes.

We explored the ability of diverse CRNs to perturb host cellular processes. In assays for necrosis in planta (Supplementary Information), deletion mutants of the previously described CRN2 secreted protein19 defined a C-terminal 234 amino-acid region (positions 173–407, domain DXZ) that is sufficient to induce cell death when expressed inside plant cells (Supplementary Fig. 24). Assays with representative P. infestans CRN genes identified four other distinct C termini that also trigger cell death inside plant cells (Fig. 3c). These include the newly defined DC domain (P. infestans: 18 genes and 49 pseudogenes (ψ)) and the D2 (14 and 43ψ) and DBF (2 and 1ψ) domains, which have similarity to protein kinases (Supplementary Table 11). These results indicate that the CRN protein domains expressed in planta are retained (lacking signal peptides and hence not secreted) by the plant cell and stimulate cell death by an intracellular mechanism, supporting the view that CRNs, like RXLRs, are cytoplasmic effectors. We propose that the conserved CRN N-terminal LFLAK domain may function similarly to the RXLR motif for delivery of CRN effectors into plant cells, and experiments to test this hypothesis are under way.

A further 255 CRN genes are fragmented or otherwise disrupted and presumably non-functional (Supplementary Table 10). CRN genes and pseudogenes are aggregated in large clusters at several genomic loci, typically clustered by domain type (Supplementary Fig. 25). One extraordinary example is scaffold 1.48 (1.2 Mb), containing 21 CRN genes and 31 CRN pseudogenes of the DXZ and D2 necrosis inducing domain-types (Fig. 3d). Many of the pseudogenes show only a few base changes, indicating recent conversion to pseudogenes. This high degree of expansion and pseudogene formation suggests that, like RXLR effector genes, CRN genes have undergone relatively rapid birth and death evolution.

Both CRN and RXLR genes typically occur in repeat-rich, gene-sparse regions of the genome, where conserved gene order with P. sojae and P. ramorum is either absent or disrupted (Fig. 2g, h and Supplementary Fig. 19). Expansion of large RXLR and CRN effector gene families seems to have been driven by non-allelic homologous recombination and tandem gene duplication. Although the genome is heavily populated by mobile elements, no direct evidence of transposition of effector genes was observed. Instead, the repeat-rich regions of effector clusters probably facilitate non-allelic-homologous-recombination-based expansion. In one intriguing case, nearly identical tandem arrays of CRNs are present on scaffold 1.6 in a perfect head-to-tail arrangement that is similar to that observed for some helitrons (Supplementary Fig. 26). This region of the genome is heavily enriched for helitron elements, implicating helitron-based rolling circle replication as a possible mechanism for establishing this CRN cluster.

To explore transcriptional responses to plant infection, we constructed a NimbleGen microarray based on the genome annotation. P. infestans gene expression during potato infection was monitored using samples from infected potato at 2–5 days post-inoculation (d.p.i.). In all, 494 genes were induced at least twofold during infection relative to mycelial growth. Days 2–4 of infection correlate with formation of infectious structures called haustoria. Mycelial necrotrophic growth on dead plant material occurs later at 5 d.p.i., and shows a similar expression profile to mycelial growth in plant extract media (Supplementary Fig. 27a and Supplementary Table 12). Seventy-nine RXLR genes exhibited this pattern of expression, including previously studied avirulence genes Avr3a (ref. 20), Avr4 (ref. 21), and Avr-blb1 (also known as ipiO) (ref. 22) (Supplementary Fig. 27b). Apoplastic effector genes, including protease inhibitors, cysteine-rich secreted proteins, and NPP1-family members, were among the most highly upregulated genes during infection of potato. Few CRNs were induced during infection; however, most CRNs were very highly expressed, with 50% of CRNs within the top 10% of gene expression intensities (Supplementary Fig. 28). Several genes encoding metabolic enzymes were upregulated in planta (Supplementary Table 12), suggesting considerable metabolic adaptation of the pathogen to the host environment23. A related pattern of downregulation mirrors the induction of effectors, involving 115 genes (Supplementary Table 12). Among those repressed were elicitin-like genes and pseudogenes, suggesting that reduced expression during infection or mutation to pseudogene could contribute to evading activation of host innate immunity24.
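The twofold induction criterion can be illustrated with a minimal sketch. It assumes one normalized intensity per gene per time point and a single mycelial reference value, and it treats a gene as induced if any time point reaches the threshold; that rule, the data layout and the toy values are assumptions for illustration. Normalization and statistical testing, which a real microarray analysis requires, are omitted.

```python
def fold_changes(infection_by_dpi, mycelium):
    """Ratio of in planta intensity to mycelial intensity at each time point."""
    return {dpi: value / mycelium for dpi, value in infection_by_dpi.items()}


def is_induced(infection_by_dpi, mycelium, min_fold=2.0):
    """True if any time point is at least min_fold above the mycelial reference.

    Using 'any time point' is an assumed rule for this example.
    """
    return any(fc >= min_fold for fc in fold_changes(infection_by_dpi, mycelium).values())


# Toy intensities keyed by days post-inoculation (made-up gene names and values).
expression = {
    "toy_gene_1": ({2: 900.0, 3: 700.0, 4: 400.0, 5: 250.0}, 200.0),
    "toy_gene_2": ({2: 210.0, 3: 190.0, 4: 220.0, 5: 200.0}, 200.0),
}

induced = [
    gene for gene, (by_dpi, mycelium) in expression.items()
    if is_induced(by_dpi, mycelium)
]
```

In the toy data, the first gene peaks early and declines towards 5 d.p.i., mimicking the profile described for genes induced during the haustorial phase, whereas the second stays close to the mycelial level throughout.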

P. infestans remains a critical threat to world food security, and the genome sequence is a key tool to understanding its pathogenic success. The sequence of the P. infestans genome showed an extremely high repeat content (74%) and unusual discontinuous distribution of gene density that correlate intriguingly with its biology. Gene-dense regions with conserved gene order across Phytophthora species are interrupted by repeat-rich expanded regions that are sparsely populated with genes, many of which are fast-evolving pathogenicity effectors such as the RXLR and CRN families. The localization of the effectors to dynamic regions of the genome probably both enables the rapid evolutionary changes and accounts for the considerable expansion in CRN and RXLR effector genes observed in P. infestans. This expansion provides a species-specific repertoire of effector genes, the dynamic nature of which probably provides an advantage in the arms race with host species. We postulate that these dynamic regions promote the evolutionary plasticity of effector genes, generating the enhanced genetic variation required to drive the rapid evasion of plant resistance that is a hallmark of the potato late blight pathogen.

Methods Summary

Genomic sequence and gene annotations

The updated P. infestans genome sequence and annotation can be accessed through GenBank accession number AATU01000000, and are available through the Broad Institute website at http://www.broad.mit.edu/annotation/genome/phytophthora_infestans. All genome sequence reads have been deposited in the NCBI trace repository (http://www.ncbi.nlm.nih.gov/Traces/home/). Paired reads of P. infestans cDNAs are available in dbEST with accessions in the range GR284383–GR301386. The NimbleGen microarray data are available in GEO under accession number GSE14480. Full methods description and associated references are provided as Supplementary Information.