Enhancing the genomic prediction accuracy of swine agricultural economic traits using an expanded one-hot encoding in CNN models

Highlights

  • The CNN model achieved the highest genomic prediction accuracy for swine traits when using SNP sets comprising 1,000 markers.
  • A novel one-hot encoding strategy representing 16 genotypes with eight binary variables significantly outperformed traditional encoding methods in CNN-based prediction.
  • The improved CNN framework offers a powerful tool for enhancing genomic prediction accuracy, providing valuable support for data-driven swine breeding programs.

Abstract

Deep learning (DL) methods such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) have been applied to predict complex traits in animal and plant breeding. However, improving genomic prediction accuracy still presents significant challenges. In this study, we applied CNNs to predict swine traits using previously published data. Specifically, we evaluated the CNN model’s performance with single nucleotide polymorphism (SNP) sets of various sizes and found that the CNN model performed best with sets comprising 1,000 SNPs. Furthermore, we adopted a novel one-hot encoding method that transforms the 16 different genotypes into sets of eight binary variables. This encoding method significantly enhanced the CNN’s prediction accuracy for swine traits, outperforming the traditional one-hot encoding technique. Our findings suggest that the expanded one-hot encoding method can improve the accuracy of DL methods in the genomic prediction of swine agricultural economic traits. This has significant implications for swine breeding programs, where genomic prediction is pivotal in improving breeding strategies. Future research can explore additional enhancements to DL methods by incorporating advanced data pre-processing techniques.

Key words

swine
agricultural economic traits
genomic prediction
deep learning
one-hot encoding
convolutional neural networks (CNNs)

1. Introduction

Animal breeding plays a vital role in ensuring the availability of sufficient and high-quality food for the growing population worldwide (Flachowsky et al. 2013). To keep up with the increasing demand, advanced methodologies have been used to accelerate the breeding process and achieve maximum genetic gain, thereby enhancing agricultural productivity. Agriculturally important quantitative traits of animals and plants are traditionally estimated using various statistical methods, such as quantitative trait locus (QTL) mapping (Bovenhuis et al. 1997; Asins 2002), heritability analysis (Nyquist and Baker 1991; Kempthorne 1997), and breeding value estimation (Calus 2010). Subsequently, the use of genetic markers such as microsatellites and single nucleotide polymorphisms (SNPs) led to the widespread adoption of marker-assisted selection (MAS) in animal and plant breeding (Collard and Mackill 2008; Wakchaure et al. 2015).
Genomic selection (GS), which uses genomic information and observed phenotypic data to predict the phenotypes or breeding values of unobserved individuals, has recently emerged as a powerful tool in animal and plant breeding programs (Goddard and Hayes 2007; Jannink et al. 2010; Xi et al. 2023). GS offers numerous advantages, such as increased selection accuracy (Schierenbeck et al. 2011; Pryce et al. 2012; Meuwissen et al. 2013), reduced breeding program time and cost (Heffner et al. 2010; Resende et al. 2012), and the uncovering of the genetic bases of complex traits (Zeng et al. 2013). Moreover, GS has also been proposed as a valuable tool for analyzing complex human traits, such as height and weight (Lello et al. 2018), and predicting genetic risk scores for inherited diseases (Altshuler et al. 2008).
The breeding values in genomic prediction are usually calculated using various statistical methods, such as genomic best linear unbiased prediction (GBLUP) (VanRaden 2008), Bayesian regression models, e.g., BayesB and BayesC (Meuwissen et al. 2001), and non-linear mixture models, e.g., Markov chain Monte Carlo (MCMC) procedures (Guo et al. 2018). These methods generally assume that complex traits are affected by many minor genes and that the relationship between genotype and phenotype is linear. Although effective, the linearity assumption may limit their performance in genome-wide prediction (Nelson et al. 2013).
Recent developments in machine learning models, such as support vector machines (SVM) and random forests (RF), allow for high-dimensional non-linear regressions that effectively capture the intricate relationship between genotype and phenotype (Montesinos-López et al. 2021). Additionally, deep learning (DL) methods, such as convolutional neural networks (CNNs), have been applied to predict complex traits in plants (Wei et al. 2020; Xi et al. 2020; Pan et al. 2022; Yang et al. 2022) or inherited diseases in humans (Cao et al. 2018; Kermany et al. 2018). Despite the proven efficiency of DL methods, their application to the genomic prediction of complex traits in animals remains rare.
Therefore, this study employs a CNN model to predict economic traits using previously published data in pigs and assesses its performance using various SNP sets. Our findings suggest that the highest level of accuracy is achieved using 1K SNPs. Furthermore, we also develop a one-hot encoding method to recode the 16 distinct genotypes, representing each genotype as a set of eight binary variables. Adding this one-hot encoding method significantly enhances CNN’s ability to predict swine complex traits more accurately. While our study underscores the efficacy of swine trait prediction, caution is warranted when extrapolating these outcomes to other species without rigorous validation and considering species-specific factors.

2. Materials and methods

2.1. Dataset

This work utilizes a publicly available dataset with genotype and six traits of 2,797 individuals from the Duroc population, as summarized in Table 1.

Table 1. Summary of the experimental dataset 1)

Trait                                      Sample (N)   Mean±SD
Total teat number (TTN)                    2,797        10.73±1.07
Left teat number (LTN)                     2,797        5.35±0.66
Right teat number (RTN)                    2,797        5.38±0.64
Back fat thickness at 100 kg (BF, mm)      2,796        10.99±2.66
Loin muscle depth at 100 kg (LMD, mm)      2,796        46.15±3.93
Lean meat percentage at 100 kg (LMP, %)    2,795        54.02±1.58

1) All data in this table were cited from Yang et al. (2021).

2.2. Quality control and genome-wide association study (GWAS) analysis

The previous study used a low-coverage whole-genome sequencing strategy (0.73×) and provided only 258,662 SNPs that tagged all other SNPs with minor allele frequency (MAF)>1% at LD (r²)<0.98 and a call rate of >95% (Yang et al. 2021). In order to obtain enough SNP sites for further study, we performed genotype imputation analysis using Beagle 5.0 software with default parameters (Browning et al. 2021). To conduct association analysis, SNPs that met the quality criteria (MAF>5% and Hardy-Weinberg Equilibrium (HWE) P-value>0.0001) (Turner et al. 2011) were retained to reduce the potential bias due to genotyping technology and population size. This process was executed using PLINK 1.9. GEMMA v0.98.5 (Zhou and Stephens 2012) was used to perform GWAS. In the context of GWAS, the mixed linear model applied can be expressed as:
y = Xβ + Zu + e
where y represents the vector of phenotype values; X denotes the matrix of fixed effects (e.g., covariates); β signifies the vector of fixed effect coefficients; Z is the matrix of random effects (genetic relatedness); u signifies the vector of random effect coefficients; and e represents the vector of residuals.
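The quality criteria described above (MAF>5% and HWE P-value>0.0001) can be illustrated with a minimal sketch working on per-SNP genotype counts. This is only an illustration under stated assumptions: the function names are hypothetical, and the HWE P-value is computed with a 1-df chi-square goodness-of-fit test, which is a simplification and not necessarily the test applied by PLINK in the study.

```python
import math

def maf(n_AA, n_Aa, n_aa):
    """Minor allele frequency computed from the three genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)

def hwe_chi2_pvalue(n_AA, n_Aa, n_aa):
    """P-value of a 1-df chi-square test against Hardy-Weinberg proportions.

    Assumption: a chi-square approximation is used for illustration only.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(counts, min_maf=0.05, min_hwe_p=1e-4):
    """Keep a SNP only if it satisfies both thresholds quoted in the text."""
    return maf(*counts) > min_maf and hwe_chi2_pvalue(*counts) > min_hwe_p

# Toy genotype counts (AA, Aa, aa); these are made-up values, not study data.
print(passes_qc((250, 500, 250)))  # balanced SNP in HWE proportions
print(passes_qc((990, 10, 0)))     # rare minor allele
print(passes_qc((500, 0, 500)))    # no heterozygotes at all
```
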

2.3. CNN model design and training

Our CNN model was designed to predict the phenotype value for a given sample with genetic variants as input features. The CNN model contained three convolutional layers followed by a max-pooling layer. The model also included two fully connected layers and a final output layer with a linear activation function. The model’s input was a matrix of genomic variants, where each row represented a variant, and each column represented a sample. Fig. 1 illustrates the CNN’s architecture. The mathematical formulation used for training the CNN model can be represented by:
CNN output = f(Wx + b)


Fig. 1. Convolutional neural network (CNN) architecture of our deep learning model. The model consists of multiple convolutional layers followed by max pooling layers for down-sampling, and fully connected layers for phenotype value prediction. The input to the network is the SNP matrix. The convolutional layers use 1×3 filters with ReLU activation, and the number of filters increases from 16 to 256 in deeper layers. The output of the last fully connected layer is fed to a linear activation function. SNPs, single nucleotide polymorphisms.

where W denotes the weights, x represents the input data, b signifies the bias term, and f denotes the activation function.
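To make the layer formula output = f(Wx + b) concrete, the following minimal sketch applies one convolutional filter of width 3 with a ReLU activation to a short numeric sequence, followed by a single linear output unit. It is a toy forward pass only, assuming hand-picked weights and hypothetical function names; it is not the trained network from this study, which was built with Keras.

```python
def relu(values):
    """ReLU activation: negative entries are replaced by zero."""
    return [max(0.0, v) for v in values]

def conv1d(x, w, b, stride=1):
    """Slide kernel w over x (no padding) and add bias b at each position."""
    k = len(w)
    return [
        sum(w[j] * x[i + j] for j in range(k)) + b
        for i in range(0, len(x) - k + 1, stride)
    ]

def dense(x, weights, biases):
    """Fully connected layer with linear activation: one output per weight row."""
    return [
        sum(wi * xi for wi, xi in zip(row, x)) + bias
        for row, bias in zip(weights, biases)
    ]

# Toy input and weights (illustrative values only).
x = [0, 1, 2, 1, 0, 2]
feature_map = relu(conv1d(x, w=[1, 0, -1], b=0.5))
prediction = dense(feature_map, weights=[[1, 1, 1, 1]], biases=[0.0])
print(feature_map)
print(prediction)
```

Here f is ReLU in the convolutional step and the identity in the final linear unit, matching the activation choices named in the architecture description.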
During the training phase, we randomly divided the data into a training dataset (80%), a validation dataset (10%), and a test dataset (10%). The CNN model was constructed using the Keras package (Jin et al. 2019) of TensorFlow (Abadi et al. 2016). The model was trained for 100 epochs with a fixed batch size of 50, kernels set at 8, and a stride set to 8; these values were chosen according to the performance metrics obtained on the validation dataset. Early stopping was implemented to prevent overfitting and to improve training efficiency.
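The early stopping rule mentioned above can be sketched as follows. The patience value and the function name are assumptions made for illustration, since the text does not state the stopping criterion that was used; the sketch simply stops once the validation loss has failed to improve for a given number of consecutive epochs.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has not improved by more
    than min_delta for `patience` consecutive epochs. If that never happens,
    the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses for illustration.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.9]
print(early_stopping_epoch(losses, patience=3))
```

In practice the weights from the best epoch, not the stop epoch, would normally be kept for evaluation on the test dataset.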

2.4. Genotype encoding

SNPs are commonly used in genetic association studies, and their genotypes must be encoded to be used in deep learning models. The first method is the standard variable coding, which represents each SNP genotype using 3 values (AA, Aa, and aa), with “A” representing the major allele and “a” representing the minor allele. Usually, one-hot encoding converts each genotype into a set of 3 binary variables (Fig. 2-A). The second method is a novel approach that encodes each genotype using the original alleles and a 1×4 vector. Specifically, A is represented as 1000, G as 0100, C as 0010, and T as 0001. This results in a 1×8 vector representation for each SNP genotype, which was used as input to the deep learning model (Fig. 2-B).
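The two encodings can be sketched in a few lines. The allele-to-vector mapping below follows the one given in this section (A as 1000, G as 0100, C as 0010, T as 0001); the function names and the two-letter genotype strings are assumptions made for illustration.

```python
# Per-allele 1x4 vectors, as listed in the text of this section.
ALLELE_BITS = {
    "A": (1, 0, 0, 0),
    "G": (0, 1, 0, 0),
    "C": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def encode_expanded(genotype):
    """Expanded encoding: a two-allele genotype such as 'AG' becomes 8 bits."""
    first, second = genotype
    return ALLELE_BITS[first] + ALLELE_BITS[second]

def encode_standard(genotype, major):
    """Standard encoding: one-hot over (AA, Aa, aa) relative to the major allele."""
    n_major = sum(1 for allele in genotype if allele == major)
    position = {2: 0, 1: 1, 0: 2}[n_major]
    return tuple(1 if i == position else 0 for i in range(3))

print(encode_expanded("AG"))
print(encode_standard("AG", major="A"))

# All 16 ordered allele pairs receive distinct 8-bit codes.
codes = {encode_expanded(a + b) for a in "AGCT" for b in "AGCT"}
print(len(codes))
```

Unlike the three-class coding, the expanded form keeps the identity of both alleles, so genotypes that collapse to the same class under the standard coding at different SNPs (for example AG and CT as heterozygotes) stay distinguishable.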


Fig. 2. Comparison of two methods for encoding single nucleotide polymorphism (SNP) genotypes for input to a deep learning model. A, standard variable coding representation of each SNP genotype as three values (AA, Aa, aa) using the major (A) and minor (a) alleles, with optional one-hot encoding to convert each genotype into a set of 3 binary variables. B, novel approach that encodes each genotype using the original alleles and a 1×4 vector representation, with A as 0001, T as 0010, G as 0100, and C as 1000, resulting in a 1×8 vector for each SNP genotype that is used as input to the deep learning model.

2.5. Assessment of prediction accuracy

The prediction performance for each trait was evaluated using a 10-fold cross-validation (CV) procedure. The samples were randomly divided into 10 equal folds, of which eight were used for training, one for validation, and the remaining one for testing the model’s performance. The CV process was repeated 10 times with different random seeds. The prediction accuracy of each method was measured as the mean square error (MSE) between the predicted and the true values.
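The fold assignment and the MSE metric described above can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, and using the fold after the test fold as the validation fold is an arbitrary choice, since the text does not specify how the validation fold was picked.

```python
import random

def mse(y_true, y_pred):
    """Mean square error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cv_splits(n_samples, k=10, seed=0):
    """Yield (train, validation, test) index lists for each of the k folds.

    Each round uses one fold for testing, one for validation (assumed here to
    be the next fold), and the remaining k - 2 folds for training.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for test_fold in range(k):
        val_fold = (test_fold + 1) % k
        train = [
            i
            for f, fold in enumerate(folds)
            if f not in (test_fold, val_fold)
            for i in fold
        ]
        yield train, folds[val_fold], folds[test_fold]

# Repeating with a different seed gives a different random partition.
for train, val, test in cv_splits(100, k=10, seed=1):
    assert len(train) == 80 and len(val) == 10 and len(test) == 10

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```
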

3. Results

3.1. GWAS analysis and identification of SNPs associated with economic traits

Following stringent quality control measures, we filtered the imputation results based on an imputation info score >0.5 and a Hardy-Weinberg equilibrium P-value>0.0001. Subsequently, we obtained 11,276,160 SNPs with an average minor allele frequency (MAF) of 0.225, as highlighted in Fig. 3-A. These filtering criteria were instrumental in ensuring the accuracy and reliability of the imputed SNPs. To assess the accuracy further, we conducted a principal component analysis (PCA) on a comprehensive set of pigs. Notably, the PCA results depicted in Fig. 3-B revealed no discernible population stratification, reaffirming the integrity of our dataset post-imputation and quality control measures. Subsequent GWAS analysis revealed genetic variants associated with a broad spectrum of traits, as illustrated in Fig. 3-C. The strength of these associations was supported by the calculated P-values, providing statistical evidence for the identified relationships between genetic variants and traits.


Fig. 3. Overview of the genetic analysis results for a cohort of 2,797 Duroc pigs. A, bar chart showing the distribution of minor allele frequency (MAF) across single nucleotide polymorphisms (SNPs). B, scatter plot showing the principal component analysis (PCA) results for the 2,797 samples. C, Manhattan plot showing the genome-wide association study (GWAS) analysis results, with the x-axis representing each chromosome and the y-axis representing the –log10(P-value) for each SNP. BF, back fat thickness; LMD, loin muscle depth; LMP, lean meat percentage; LTN, left teat number; RTN, right teat number; TTN, total teat number.

To identify potential candidate genes associated with economic traits in pigs, we determined the closest genes within a 10-kb region centered on the identified SNPs, given that the linkage disequilibrium extent region of a significant SNP for Duroc ranges from approximately 10 to 100 kb (Badke et al. 2012). Our results corroborated previous findings, revealing a significant association between pigs’ neuronal growth regulator 1 (NEGR1) gene and back fat thickness (Lee et al. 2011). In addition, the vertebrae development-associated gene (VRTN) was identified as a candidate gene associated with teat number, further supporting previous reports suggesting its involvement in this trait (Tang et al. 2017).
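The window-based gene lookup described above can be sketched as follows. The gene names and coordinates are made up for illustration, the function name is hypothetical, and all positions are assumed to lie on the same chromosome; a real analysis would read gene coordinates from a genome annotation file.

```python
def genes_near_snp(snp_pos, genes, window=10_000):
    """Return genes overlapping a window centered on the SNP, nearest first.

    genes is a list of (name, start, end) tuples on the SNP's chromosome.
    The window spans window/2 bases on each side of the SNP position.
    """
    half = window // 2
    low, high = snp_pos - half, snp_pos + half
    hits = []
    for name, start, end in genes:
        if end >= low and start <= high:  # gene interval overlaps the window
            if start <= snp_pos <= end:
                distance = 0  # SNP lies inside the gene
            else:
                distance = min(abs(snp_pos - start), abs(snp_pos - end))
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]

# Hypothetical annotation entries (name, start, end).
toy_genes = [
    ("GENE_A", 1_000, 3_000),
    ("GENE_B", 9_000, 12_000),
    ("GENE_C", 30_000, 40_000),
]
print(genes_near_snp(7_000, toy_genes))
```
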
Note that we identified additional candidate genes associated with multiple traits in pigs. Specifically, the von Willebrand factor C domain containing 2 like (VWC2L), previously reported to be related to the feed conversion ratio between 30 and 100 kg (Wang et al. 2015), was also associated with back fat thickness. Similarly, potassium voltage-gated channel interacting protein 4 (KCNIP4), previously associated with growth traits in pigs (Xie et al. 2023), was identified as a candidate gene for lean meat percentage. These observations suggested the possibility of gene pleiotropy, where a single gene influences multiple traits, as a potential explanation for the observed associations. Our results indicated that the samples and significant SNPs identified by GWAS analysis were suitable for further analysis.

3.2. Model training for genomic prediction

The CNN model underwent training and validation using a combined dataset, followed by evaluation using a distinct testing dataset that was excluded during the training process (refer to the methods section). Importantly, the predictive model’s performance is influenced by the number of input features derived from SNPs (Liu et al. 2021). We assessed our model’s efficacy using SNP sets of various sizes to identify the optimal SNP subset for each trait. These included the top 0.5K, 1K, 5K, 10K, 20K, and 30K SNPs, ranked from the lowest P-value upward in the GWAS analysis. Our model’s performance was evaluated by determining the MSE between the predicted and the observed phenotype values in the testing dataset. Fig. 4 depicts the MSE values for six traits as predicted by our model while using different SNP sets as input features. The optimal SNP set across all six traits was 1K, where the MSE was the lowest. Additionally, we noted a marked increase in the MSE for BF and teat number (TTN, LTN, and RTN) when the number of SNPs exceeded 1K, particularly from 5K onwards. In contrast, for LMD and LMP, the optimal number of SNPs was 20K, beyond which the MSE increased sharply. By carefully examining the MSE values for different SNP subsets, we identified the optimal SNP sets for each trait, providing insights into the number of informative SNPs required for accurate predictions. These findings contribute to refining our predictive model and have implications for selecting relevant SNP subsets when studying specific traits.
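The selection of top-ranked SNP subsets can be sketched as follows. The SNP identifiers, P-values, and MSE values are toy values invented for illustration, and the function names are hypothetical; the sketch only shows ranking by GWAS P-value, building nested subsets, and picking the subset size with the lowest MSE.

```python
def nested_top_snps(pvalues, sizes):
    """Map each requested size k to the k SNPs with the smallest P-values."""
    ranked = sorted(pvalues, key=pvalues.get)  # ascending P-value
    return {k: ranked[:k] for k in sizes}

def best_subset_size(mse_by_size):
    """Return the subset size whose MSE is lowest."""
    return min(mse_by_size, key=mse_by_size.get)

# Toy GWAS output: SNP identifier -> P-value (made-up values).
toy_pvalues = {"snp1": 1e-8, "snp2": 0.3, "snp3": 1e-5, "snp4": 0.04}
subsets = nested_top_snps(toy_pvalues, sizes=[1, 3])
print(subsets)

# Toy MSE per subset size (made-up values, not results from this study).
print(best_subset_size({500: 0.9, 1000: 0.7, 5000: 1.1}))
```

Because the subsets are nested, each larger set contains every SNP of the smaller ones, so differences in MSE between sizes reflect only the added lower-ranked SNPs.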


Fig. 4. Mean square error (MSE) between the genomic predicted phenotypes and the observed phenotype values in the testing dataset. The x-axis represents the number of SNP sites, and the y-axis represents the MSE from cross-validation. BF, back fat thickness; LMD, loin muscle depth; LMP, lean meat percentage; LTN, left teat number; RTN, right teat number; TTN, total teat number.

3.3. Model performance comparison between standard and expanded one-hot encoding method

One-hot encoding is a widely used method for converting categorical variables into numerical format, and it has proven to be an essential tool in machine learning and data analysis. In neural networks, one-hot encoding is often utilized to represent categorical features as input to models. The traditional approach to one-hot encoding involves representing each category as a vector of 0s and 1s, with a 1 in the position corresponding to the category and 0s elsewhere.
To overcome some of the limitations of the traditional method, we propose an expanded one-hot encoding method. The proposed technique represents each genotype as an eight-dimensional vector, which provides a more detailed representation of the genotype. To evaluate its performance, we trained the CNN with both the traditional and the proposed one-hot encoding methods.
The results demonstrated that our expanded one-hot encoding method outperformed the traditional method significantly. Fig. 5 shows the MSE values between the predicted and phenotypic values on the validation dataset, which compare the accuracy of the trained CNN models. The results reveal that the novel encoding method consistently outperforms the standard variable coding approach across all six traits, presenting significantly lower MSE values for each trait (P<0.01).


Fig. 5. Comparison of mean square error (MSE) values for a convolutional neural network (CNN) model using 2 different one-hot encoding methods for 6 different traits. The bar chart shows the average MSE values for each trait using the standard variable coding approach with optional one-hot encoding (orange bars) and the novel method that directly encodes each SNP allele using a 1×4 vector (blue bars). Error bars indicate the standard error of the mean across replicates. **, P-value<0.01. BF, back fat thickness; LMD, loin muscle depth; LMP, lean meat percentage; LTN, left teat number; RTN, right teat number; TTN, total teat number.

These findings suggest that our expanded one-hot encoding method improves the performance of neural networks in categorical feature modeling. Indeed, the ability to represent categorical variables in a more detailed and refined format can provide valuable insights and improve the accuracy of machine learning models in a wide range of applications. Future studies can explore our method’s potential on other types of neural networks and datasets. Furthermore, these results demonstrate the utility of the novel encoding method for improving the performance of deep learning models in genetic analyses.

4. Discussion

This study applied a CNN to previously published data to predict swine agricultural economic traits. A major outcome was the development and implementation of a novel one-hot encoding method, which significantly improved the CNN’s performance. By transforming the 16 different genotypes into sets of eight binary variables, this encoding technique enhanced the CNN’s ability to capture and utilize the underlying genetic information effectively. Our investigation provided insights into the potential of CNNs, coupled with the improved one-hot encoding method, to accurately predict complex animal traits, offering promising prospects for their practical application in breeding programs and genetic improvement strategies.
Additionally, evaluating the CNN performance using different SNP sets provided valuable insights into the optimal SNP selection for complex trait prediction. Our results indicated that utilizing SNP sets comprising 1,000 SNPs led to the highest predictive capabilities. This finding emphasized the importance of a focused selection of the SNPs most relevant to the specific traits under investigation. This strategy allowed the CNN model to provide more accurate predictions and contribute to effective breeding strategies by capturing the genetic variations that most strongly affect complex traits.
We also noted that the chosen encoding method was linked with the specific dataset or task. For instance, if the categories have a hierarchical relationship, traditional one-hot encoding may be more appropriate as it preserves the categories’ order. However, binary values may be more effective if the categories are unordered. The success of our approach lies in incorporating more comprehensive genomic information into the CNN models, enhancing the CNN’s ability to detect subtle variations and patterns within genetic data. This likely contributed to the observed improvement in prediction accuracy. These results suggest that appropriate encoding techniques are crucial in maximizing the predictive power of CNNs for complex animal trait prediction.
This study’s findings have significant implications for swine breeding programs and genetic improvement strategies. The optimized SNP set and novel encoding method can be applied practically to enhance breeding decisions and accelerate genetic progress. By leveraging the predictive capabilities of CNNs, breeders and geneticists can make more informed decisions regarding selection, mating, and breeding strategies, ultimately leading to the desired trait improvements in swine populations. Although our study focused on genomic prediction for swine traits, the approach has broad applications beyond this field. For example, the expanded one-hot encoding approach could be applied to human genetics or to the breeding of other animals to enhance the accuracy of genomic prediction.
While our study showcases the potential of CNNs in predicting swine traits, it is important to acknowledge certain limitations. First, the generalizability of the results may be influenced by the dataset used, including the implications of the expanded one-hot encoding approach. Further validation using diverse datasets is necessary to establish the robustness of our findings across different populations and genetic backgrounds, considering the potential trade-offs and computational demands introduced by this encoding method. Additionally, while our approach demonstrated improved prediction accuracy, it is crucial to consider the computational resources required for implementing CNN models and the associated time and cost implications, especially concerning the expanded one-hot encoding technique. Future research should incorporate additional genomic features, such as epigenetic markers or gene expression data, to enhance the CNN model’s predictive capabilities. Exploring different CNN architectures or hybrid models that combine CNNs with other machine-learning approaches may further improve swine trait prediction accuracy.

5. Conclusion

This study demonstrates the effectiveness of expanded one-hot encoding approaches in improving the accuracy of genomic prediction models for swine agricultural economic traits. Our experiments revealed that our scheme achieved higher predictive performance than traditional encoding methods. These findings have significant implications for genomic prediction and suggest its application to other fields to enhance predictive accuracy. Overall, our study contributes to continuously improving the effectiveness of genomic prediction and provides a promising avenue for future research.

Ethical statements

This article does not raise any ethical issues, and as such, formal ethical approval was not required.

Declaration of competing interest

The authors declare that they have no conflict of interest.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (32102513), the National Key Scientific Research Project (2023YFF1001100), the Shenzhen Innovation and Entrepreneurship Plan-Major Special Project of Science and Technology, China (KJZD20230923115003006) and the Innovation Project of Chinese Academy of Agricultural Sciences (CAAS-ZDRW202006).

References

Altshuler et al., 2008

D Altshuler, M J Daly, E Lander
Genetic mapping in human disease
Science, 322 (2008), pp. 881-888

Asins, 2002

M Asins
Present and future of quantitative trait locus analysis in plant breeding
Plant Breeding, 121 (2002), pp. 281-291

Badke et al., 2012

Y M Badke, R O Bates, C W Ernst, C Schwab, J P Steibel
Estimation of linkage disequilibrium in four US pig breeds
BMC Genomics, 13 (2012), pp. 1-10

Bovenhuis et al., 1997

H Bovenhuis, J Van Arendonk, G Davis, J M Elsen, C Haley, W Hill, P Baret, D Hetzel, F Nicholas
Detection and mapping of quantitative trait loci in farm animals
Livestock Production Science, 52 (1997), pp. 135-144

Browning et al., 2021

B L Browning, X Tian, Y Zhou, S R Browning
Fast two-stage phasing of large-scale sequence data
The American Journal of Human Genetics, 108 (2021), pp. 1880-1890

Calus, 2010

M P Calus
Genomic breeding value prediction: Methods and procedures
Animal, 4 (2010), pp. 157-164

Cao et al., 2018

C Cao, F Liu, H Tan, D Song, W Shu, W Li, Y Zhou, X Bo, Z Xie
Deep learning and its applications in biomedicine
Genomics, Proteomics and Bioinformatics, 16 (2018), pp. 17-32

Collard and Mackill, 2008

B C Collard, D Mackill
Marker-assisted selection: An approach for precision plant breeding in the twenty-first century
Philosophical Transactions of the Royal Society, 363 (2008), pp. 557-572

Flachowsky et al., 2013

G Flachowsky, U Meyer, M Gruen
Plant and animal breeding as starting points for sustainable agriculture
Sustainable Agriculture Reviews, 12 (2013), pp. 201-224

Goddard and Hayes, 2007

M Goddard, B Hayes
Genomic selection
Journal of Animal Breeding Genetics, 124 (2007), pp. 323-330

Guo et al., 2018

P Guo, B Zhu, H Niu, Z Wang, Y Liang, Y Chen, L Zhang, H Ni, Y Guo, E H A Hay
Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis
BMC Bioinformatics, 19 (2018), pp. 1-11

Heffner et al., 2010

E L Heffner, A J Lorenz, J L Jannink, M E Sorrells
Plant breeding with genomic selection: Gain per unit time and cost
Crop Science, 50 (2010), pp. 1681-1690

Jannink et al., 2010

J L Jannink, A J Lorenz, H Iwata
Genomic selection in plant breeding: From theory to practice
Briefings in Functional Genomics, 9 (2010), pp. 166-177

Jin et al., 2019

H Jin, Q Song, X Hu
Auto-keras: An efficient neural architecture search system
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019), pp. 1946-1956

Kempthorne, 1997

O Kempthorne
Heritability: Uses and abuses
Genetica, 99 (1997), pp. 109-112

Kermany et al., 2018

D S Kermany, M Goldbaum, W Cai, C C Valentim, H Liang, S L Baxter, A McKeown, G Yang, X Wu, F Yan
Identifying medical diagnoses and treatable diseases by image-based deep learning
Cell, 172 (2018), pp. 1122-1131, e1129

Lee et al., 2011

K T Lee, M J Byun, K S Kang, E W Park, S H Lee, S Cho, H Kim, K W Kim, T Lee, J E Park
Neuronal genes for subcutaneous fat thickness in human and pig are identified by local genomic sequencing and combined SNP association study
PLoS ONE, 6 (2011), Article e16356

Lello et al., 2018

L Lello, S G Avery, L Tellier, A I Vazquez, G de Los Campos, S D Hsu
Accurate genomic prediction of human height
Genetics, 210 (2018), pp. 477-497

Liu et al., 2021

L Liu, X Feng, H Li, S Cheng Li, Q Qian, Y Wang
Deep learning model reveals potential risk genes for ADHD, especially Ephrin receptor gene EPHA5
Briefings in Bioinformatics, 22 (2021), Article bbab207

Meuwissen et al., 2013

T Meuwissen, B Hayes, M Goddard
Accelerating improvement of livestock with genomic selection
Annual Review of Animal Biosciences, 1 (2013), pp. 221-237

Meuwissen et al., 2001

T H Meuwissen, B J Hayes, M Goddard
Prediction of total genetic value using genome-wide dense marker maps
Genetics, 157 (2001), pp. 1819-1829

Montesinos-López et al., 2021

O A Montesinos-López, A Montesinos-López, P Pérez-Rodríguez, J A Barrón-López, J W Martini, S B Fajardo Flores, L S Gaytan-Lugo, P C Santana Mancilla
A review of deep learning applications for genomic selection
BMC Genomics, 22 (2021), pp. 1-23

Nelson et al., 2013

R M Nelson, M E Pettersson, Ö Carlborg
A century after Fisher: Time for a new paradigm in quantitative genetics
Trends in Genetics, 29 (2013), pp. 669-676

Nyquist and Baker, 1991

W E Nyquist, R J Baker
Estimation of heritability and prediction of selection response in plant populations
Critical Reviews in Plant Sciences, 10 (1991), pp. 235-322

Pan et al., 2022

S Pan, J Qiao, W Rui, H Yu, W Cheng, K Taylor, H Pan
Intelligent diagnosis of northern corn leaf blight with deep learning model
Journal of Integrative Agriculture, 21 (2022), pp. 1094-1105

Pryce et al., 2012

J Pryce, B Hayes, M J Goddard
Novel strategies to minimize progeny inbreeding while maximizing genetic gain using genomic information
Journal of Dairy Science, 95 (2012), pp. 377-388

Resende et al., 2012

M Resende Jr, P Muñoz, J Acosta, G Peter, J Davis, D Grattapaglia, M Resende, M Kirst
Accelerating the domestication of trees using genomic selection: Accuracy of prediction models across ages and environments
New Phytologist, 193 (2012), pp. 617-624

Schierenbeck et al., 2011

S Schierenbeck, E Pimentel, M Tietze, J Körte, R Reents, F Reinhardt, H Simianer, S König
Controlling inbreeding and maximizing genetic gain using semi-definite programming with pedigree-based and genomic relationships
Journal of Dairy Science, 94 (2011), pp. 6143-6152

Tang et al., 2017

J Tang, Z Zhang, B Yang, Y Guo, H Ai, Y Long, Y Su, L Cui, L Zhou, X Wang
Identification of loci affecting teat number by genome-wide association studies on three pig populations
Asian–Australasian Journal of Animal Sciences, 30 (2017), p. 1

Turner et al., 2011

S Turner, L L Armstrong, Y Bradford, C S Carlson, D C Crawford, A T Crenshaw, M De Andrade, K F Doheny, J L Haines, G Hayes
Quality control procedures for genomewide association studies
Current Protocols in Human Genetics, 68 (2011), pp. 1-19

VanRaden, 2008

P M VanRaden
Efficient methods to compute genomic predictions
Journal of Dairy Science, 91 (2008), pp. 4414-4423

Wakchaure et al., 2015

R Wakchaure, S Ganguly, P Praveen, A Kumar, S Sharma, T Mahajan
Marker assisted selection (MAS) in animal breeding: A review
Journal of Drug Metabolism & Toxicology, 6 (2015), p. e127

Wang et al., 2015

K Wang, D Liu, J Hernandez Sanchez, J Chen, C Liu, Z Wu, M Fang, N Li
Genome wide association analysis reveals new production trait genes in a male Duroc population
PLoS ONE, 10 (2015), Article e0139207

Wei et al., 2020

W Wei, T Yang, L Rui, C Chen, L Tao, Z Kai, C Sun, C Li, X Zhu, W Guo
Detection and enumeration of wheat grains based on a deep learning method under various scenarios and scales
Journal of Integrative Agriculture, 19 (2020), pp. 1998-2008

Xi et al., 2020

Q Xi, Y Li, G Y Su, H K Tian, S Zhang, Z Y Sun, Y Long, F Wan, W Qian
MmNet: Identifying Mikania micrantha Kunth in the wild via a deep convolutional neural network
Journal of Integrative Agriculture, 19 (2020), pp. 1292-1300

Xi et al., 2025

T Xi, X Lei, Y Min, L Li, T Yao, S Liu, W Xu, S Xiao, N Ding, Z Zhang
Genomic selection for meat quality traits based on VIS/NIR spectral information
Journal of Integrative Agriculture, 24 (2025), pp. 235-245

Xie et al., 2023

Q Xie, Z Zhang, Z Chen, J Sun, M Li, Q Wang, Y Pan
Integration of selection signatures and protein interactions reveals NR6A1, PAPPA2, and PIK3C2B as the promising candidate genes underlying the characteristics of Licha black pig
Biology, 12 (2023), p. 500

Yang et al., 2022

G F Yang, Y Yang, Z K He, X Y Zhang, Y He
A rapid, low-cost deep learning system to classify strawberry disease based on cloud service
Journal of Integrative Agriculture, 21 (2022), pp. 460-473

Yang et al., 2021

R Yang, X Guo, D Zhu, C Tan, C Bian, J Ren, Z Huang, Y Zhao, G Cai, D Liu
Accelerated deciphering of the genetic architecture of agricultural economic traits in pigs using a low-coverage whole-genome sequencing strategy
Giga Science, 10 (2021), Article giab048

Zeng et al., 2013

J Zeng, A Toosi, R L Fernando, J Dekkers, D Garrick
Genomic selection of purebred animals for crossbred performance in the presence of dominant gene action
Genetics Selection Evolution, 45 (2013), pp. 1-17

Zhou and Stephens, 2012

X Zhou, M Stephens
Genome-wide efficient mixed-model analysis for association studies
Nature Genetics, 44 (2012), pp. 821-824