Abstract:
Cupin superfamily of proteins, including germin and germin-like proteins (GLPs) from higher plants, is known to play crucial roles in plant development and defense. To date, no systematic analysis has been conducted in soybean (Glycine max) incorporating genome organization, gene structure, expression compendium. In this study, 69 putative Cupin genes were identified from the whole-genome of soybean, which were non-randomly distributed on 17 of the 20 chromosomes. These Gmcupin proteins were phylogenetically clustered into ten distinct subgroups among which the gene structures were highly conserved. Eighteen pairs (52.2%) of duplicate paralogous genes were preferentially retained in duplicated regions of the soybean genome. The distributions of GmCupin genes implied that long segmental duplications contributed significantly to the expansion of the GmCupin gene family. According to the RNA-seq data analysis, most of the Gmcupins were differentially expressed in tissue-specific expression pattern and the expression of some duplicate genes were partially redundant while others showed functional diversity, suggesting the Gmcupins have been retained by substantial subfunctionalization during soybean evolutionary processes. Selective analysis based on single nucleotide polymorphisms (SNPs) in cultivated and wild soybeans revealed sixteen Gmcupins had selected site(s), with all SNPs in Gmcupin10.3 and Gmcupin07.2 genes were selected sites, which implied these genes may have undergone strong selection effects during soybean domestication. Taken together, our results contribute to the functional characterization of Gmcupin genes in soybean.