Bayesian Probabilistic Selection Index

José Tiago Barroso Chagas
Luiz de Queiroz College of Agriculture, University of São Paulo

14 November 2025

Introduction

This workflow shows how to use the Bayesian probabilistic selection index (BPSI). To apply the tool to your own crop, read each section and run the code carefully. Questions? Contact us through ProbBreed or by e-mail.

Pan-African Soybean Trials dataset

Cultivar recommendation is a critical stage in plant breeding programs, and selecting genotypes that are superior for multiple traits remains a challenge because of the genotype × environment (G × E) interaction.

To address this, we analyzed multi-environment trial data from the Pan-African Soybean Trials, in which 65 soybean genotypes were evaluated across 19 environments (Araújo et al. 2025).

Our approach uses a Bayesian probabilistic framework to estimate the risk of recommending each genotype for each trait and to select genotypes according to a multitrait ideotype (Chagas et al. 2025). The Bayesian probabilistic selection index (BPSI) selects the 10% of genotypes with the highest probability of superior performance across environments for Grain Yield (GY), Plant Height (PH), and Number of Days to Maturity (NDM).

Bayesian probabilistic selection index (BPSI)

The Bayesian probabilistic selection index lets you design an ideotype by choosing, for each trait, whether it should increase or decrease and how much weight it receives. Along with the probabilities of superior performance across environments, it accounts for the distance between the worst-performing genotype and each candidate genotype for each trait.

\[ BPSI_i = \sum_{m=1}^{t} \frac{\gamma_{pm} - \gamma_{im}}{(1/\lambda_m)} \]

where \(\gamma_{pm}\) is the probability of superior performance of the worst genotype for trait \(m\), \(\gamma_{im}\) is the probability of superior performance of genotype \(i\) for trait \(m\), \(t\) is the total number of traits evaluated \(\left(m = 1, 2, ..., t \right)\), and \(\lambda_m\) is the weight for trait \(m\).
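
As a minimal sketch of the formula, the R snippet below computes the index by hand for three hypothetical genotypes and three traits. The probabilities, the helper name `bpsi_by_hand`, and the reading of the "worst genotype" as the one with the lowest probability for each trait are assumptions made for illustration; this is not the `bpsi()` implementation.

```r
# Hypothetical probabilities of superior performance
# (rows: genotypes, columns: traits). Values are made up for illustration.
gamma <- matrix(
  c(0.10, 0.40, 0.05,
    0.50, 0.20, 0.30,
    0.90, 0.10, 0.60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("PH", "GY", "NDM"))
)

# Trait weights (lambda); GY receives twice the weight of the other traits
lambda <- c(PH = 1, GY = 2, NDM = 1)

# Literal reading of the formula: sum over traits of (gamma_p - gamma_i) / (1 / lambda).
# Assumption: the "worst" genotype p has the lowest probability for each trait.
bpsi_by_hand <- function(gamma, lambda) {
  lambda  <- lambda[colnames(gamma)]            # align weights with trait columns
  gamma_p <- apply(gamma, 2, min)               # assumed "worst" probability per trait
  diffs   <- sweep(-gamma, 2, gamma_p, "+")     # gamma_p - gamma_i for every trait
  rowSums(sweep(diffs, 2, 1 / lambda, "/"))     # divide by 1/lambda, then sum over traits
}

idx <- bpsi_by_hand(gamma, lambda)
idx
```

Dividing by \(1/\lambda\) is the same as multiplying by \(\lambda\), so a larger weight stretches the distance for that trait. Under this reading, the further a genotype is from the worst genotype, the larger the absolute value of its index.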

BPSI example with the soybean dataset

To provide a step-by-step guide, we applied the BPSI to the Pan-African Soybean Trials dataset described above.

Bayesian Model

The Bayesian models are available in the ProbBreed package (Chaves et al. 2024). We used the entry-mean model.

\[ y_{jk} = \mu + l_k + g_j + \varepsilon_{jk} \]

where \(y_{jk}\) is the phenotypic record of the \(j^{th}\) genotype in the \(k^{th}\) location, \(\mu\) is the intercept, \(l_k\) is the main effect of the location, \(g_j\) is the main effect of the genotype, and \(\varepsilon_{jk}\) is the residual effect.
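
To make the structure of the entry-mean model concrete, the sketch below simulates toy data from it in base R. All numbers (intercept, standard deviations, number of genotypes and locations) and the seed are made up for illustration. The sketch uses a single residual standard deviation and no priors, so it is only a picture of the linear predictor, not of the model fitted by ProbBreed.

```r
set.seed(1)  # hypothetical seed, only to make the toy data reproducible

gen_ids <- sprintf("G%02d", 1:5)   # toy set of 5 genotypes
env_ids <- sprintf("E%02d", 1:4)   # toy set of 4 locations

mu  <- 3000                                                          # intercept (made-up value)
l_k <- setNames(rnorm(length(env_ids), mean = 0, sd = 400), env_ids) # location main effects
g_j <- setNames(rnorm(length(gen_ids), mean = 0, sd = 200), gen_ids) # genotype main effects

# One entry mean per genotype-location combination
sim <- expand.grid(gen = gen_ids, env = env_ids, stringsAsFactors = FALSE)
eps <- rnorm(nrow(sim), mean = 0, sd = 150)                          # residual effects

# y_jk = mu + l_k + g_j + eps_jk
sim$y <- mu + l_k[sim$env] + g_j[sim$gen] + eps

head(sim)
```

The resulting data frame has the same long layout as the dataset used below: one row per genotype and environment, with the phenotype in its own column.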

library(ProbBreed)

met_df <- read.csv("https://raw.githubusercontent.com/tiagobchagas/BPSI/refs/heads/main/Data/blues_long.csv", header = TRUE)

head(met_df)
##   env gen       PH       GY NDM
## 1 E01 G02 77.56172 3690.918  NA
## 2 E01 G05 81.16229 1004.449  NA
## 3 E01 G09 62.17745 3062.170  NA
## 4 E01 G11 33.37286 1804.674  NA
## 5 E01 G14 52.03038 2719.217  NA
## 6 E01 G21 97.85586 1747.515  NA
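
NDM is NA in the rows printed above, so it can help to count the non-missing records per environment and trait before fitting the models. The sketch below does this on a toy data frame that copies those six rows; the object names `toy` and `n_obs` are ours, and the same call could be applied to `met_df`.

```r
# First six rows of met_df, copied from the head() output
toy <- data.frame(
  env = rep("E01", 6),
  gen = c("G02", "G05", "G09", "G11", "G14", "G21"),
  PH  = c(77.56172, 81.16229, 62.17745, 33.37286, 52.03038, 97.85586),
  GY  = c(3690.918, 1004.449, 3062.170, 1804.674, 2719.217, 1747.515),
  NDM = NA_real_
)

# Number of non-missing records per environment and trait
traits <- c("PH", "GY", "NDM")
n_obs <- aggregate(toy[traits], by = list(env = toy$env),
                   FUN = function(x) sum(!is.na(x)))
n_obs
```
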
# Fit one entry-mean model per trait (no replicate, region, or year effects).
# iter = 400 keeps the example fast; we recommend at least 4000 iterations.

# Plant Height (PH)
mod <- bayes_met(data = met_df,
                 gen = "gen",
                 loc = "env",
                 repl = NULL,
                 trait = "PH",
                 reg = NULL,
                 year = NULL,
                 res.het = TRUE,
                 iter = 400, cores = 4, chain = 4)

# Grain Yield (GY)
mod2 <- bayes_met(data = met_df,
                  gen = "gen",
                  loc = "env",
                  repl = NULL,
                  trait = "GY",
                  reg = NULL,
                  year = NULL,
                  res.het = TRUE,
                  iter = 400, cores = 4, chain = 4)

# Number of Days to Maturity (NDM)
mod3 <- bayes_met(data = met_df,
                  gen = "gen",
                  loc = "env",
                  repl = NULL,
                  trait = "NDM",
                  reg = NULL,
                  year = NULL,
                  res.het = TRUE,
                  iter = 400, cores = 4, chain = 4)

Extracting the probability of superior performance

# Collect the fitted models and state, per trait, whether higher values are desired
models <- list(PH = mod, GY = mod2, NDM = mod3)
inc <- c(PH = FALSE, GY = TRUE, NDM = FALSE)

results <- lapply(names(models), function(model_name) {
  # Extract the posterior outputs of the model fitted for this trait
  outs <- extr_outs(model = models[[model_name]],
                    probs = c(0.05, 0.95),
                    verbose = TRUE)

  # Probabilities of superior performance with a selection intensity of 20%
  prob_sup(extr = outs,
           int = 0.2,
           increase = inc[[model_name]],
           save.df = FALSE,
           verbose = TRUE)
})
names(results) <- names(models)
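
To give an intuition for what these probabilities represent, the sketch below computes them by hand from toy posterior samples: for each sample, it checks whether a genotype falls in the selected fraction, then averages over samples. The helper `prob_top`, the simulated "posterior", the seed, and the `ceiling()` rounding of the subset size are assumptions for illustration and may differ from what `prob_sup()` does internally.

```r
set.seed(42)  # hypothetical seed for the toy example

n_samples  <- 1000
true_means <- c(A = 2, B = 1, C = 0, D = -1, E = -2)   # made-up genotypic values

# Toy "posterior": one row per sample, one column per genotype
post <- sapply(true_means, function(m) rnorm(n_samples, mean = m, sd = 1))

prob_top <- function(post, int, increase = TRUE) {
  n_sel  <- ceiling(int * ncol(post))        # size of the selected subset (assumed rounding)
  signed <- if (increase) post else -post    # flip the sign so that larger is always better
  # Per sample, rank genotypes (rank 1 = best) and flag those inside the subset
  in_top <- t(apply(signed, 1, function(s) rank(-s) <= n_sel))
  colMeans(in_top)                           # share of samples in which each genotype is selected
}

p_inc <- prob_top(post, int = 0.2, increase = TRUE)    # probability of being among the highest 20%
p_dec <- prob_top(post, int = 0.2, increase = FALSE)   # probability of being among the lowest 20%
round(rbind(increase = p_inc, decrease = p_dec), 3)
```

With five genotypes and `int = 0.2`, exactly one genotype is selected in each sample, so each row of the output sums to 1.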

Running the BPSI

source("Data/bpsi.R")



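index <- bpsi(problist = results,
              increase = c(FALSE, TRUE, FALSE),  # same trait order as results: PH, GY, NDM
              int = 0.1,                         # select the top 10% of genotypes
              lambda = c(1, 2, 1),               # GY receives twice the weight of PH and NDM
              save.df = FALSE)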

BPSI Ranks

The selected genotypes (G52, G26, G41, G43, G35, and G54) show superior performance for the three traits evaluated: Grain Yield (GY), Plant Height (PH), and Number of Days to Maturity (NDM).

plot(index,category = "Ranks")
Rank of probability of superior performance of soybean Pan-African Trials across locations in Kenya


BPSI plot

The selected genotypes (G52, G26, G41, G43, G35, and G54) have the lowest risk of cultivar recommendation for the multitrait ideotype across the Kenyan environments.

plot(index,category = "BPSI")
Bayesian probability selection index of soybean genotypes from Pan-African Trials across locations in Kenya


Probabilities of superior performance

The probabilities of superior performance across environments express the recommendation risk of each soybean genotype: the probability of being in the top 20% for Grain Yield (GY) and in the bottom 20% for Plant Height (PH) and Number of Days to Maturity (NDM).

df=index$df
df |> kableExtra::kable()
Genotype PH GY NDM
G01 0.93250 0.24125 0.26000
G02 0.03625 0.16625 0.11250
G03 0.00625 0.16125 0.26625
G04 0.44875 0.19500 0.01375
G05 0.00000 0.12375 0.00000
G06 0.00000 0.23250 0.00000
G07 0.98375 0.09625 0.73375
G08 0.11500 0.13125 0.26500
G09 0.01750 0.25250 0.92875
G11 0.99875 0.35500 0.66250
G12 0.00625 0.25875 0.51250
G13 0.11250 0.18375 0.01375
G14 0.80250 0.13375 0.56500
G15 0.41500 0.17750 0.21000
G16 0.11875 0.13125 0.19375
G17 0.74750 0.16875 0.05500
G18 0.00250 0.19125 0.01250
G19 0.17750 0.16500 0.00000
G20 0.00500 0.15000 0.02500
G21 0.00000 0.10750 0.00750
G22 0.00000 0.16250 0.01125
G23 0.00000 0.19375 0.02250
G24 0.00000 0.17375 0.02625
G25 0.01375 0.23000 0.37500
G26 0.00000 0.37125 0.00250
G27 0.49625 0.28625 0.39125
G28 0.00000 0.19250 0.08125
G29 0.99875 0.19125 0.52750
G30 0.02125 0.26125 0.00750
G31 0.91625 0.14250 0.11125
G32 0.95125 0.12125 0.78750
G33 0.99375 0.13750 0.71750
G34 0.99375 0.18750 0.74750
G35 0.00000 0.40125 0.09250
G38 0.13500 0.09625 0.00500
G39 0.01500 0.22500 0.28625
G40 0.00000 0.14500 0.01250
G41 0.00000 0.29750 0.00125
G42 0.00000 0.27625 0.07875
G43 0.00000 0.36500 0.00750
G44 0.00000 0.12250 0.00000
G45 0.00000 0.19375 0.00000
G46 0.00000 0.25375 0.09500
G47 0.95375 0.14125 0.05375
G48 0.02500 0.19750 0.29375
G49 0.00625 0.10125 0.00750
G50 0.02125 0.11750 0.11750
G51 0.00000 0.21750 0.07500
G52 0.00000 0.39750 0.00000
G53 0.00000 0.32500 0.20250
G54 0.00000 0.25750 0.00750
G55 0.00000 0.17625 0.14500
G56 0.00000 0.19375 0.30375
G57 0.00000 0.21750 0.08250
G58 0.09625 0.21625 0.01250
G59 0.00000 0.24750 0.78625
G60 0.00000 0.11875 0.03875
G61 0.00500 0.15500 0.38625
G62 0.00000 0.23750 0.10000
G63 0.00000 0.07875 0.03750
G64 0.43125 0.20250 0.02625
G65 0.00000 0.19000 0.01375

Genotypes selected

The genotypes selected by the BPSI were G52, G26, G41, G43, G35, and G54. They have the greatest distance from the worst genotype for Grain Yield (GY), Plant Height (PH), and Number of Days to Maturity (NDM). The selected genotypes are the top 10% for multitrait performance across environments.
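
To look at the selected genotypes side by side, the sketch below gathers their per-trait probabilities, copied from the table of probabilities of superior performance, and orders them by GY. The data frame name `sel` is ours; with the fitted objects at hand, the same rows could instead be subset from the full table.

```r
# Probabilities copied from the table for the six selected genotypes
sel <- data.frame(
  gen = c("G52", "G26", "G41", "G43", "G35", "G54"),
  PH  = c(0, 0, 0, 0, 0, 0),
  GY  = c(0.39750, 0.37125, 0.29750, 0.36500, 0.40125, 0.25750),
  NDM = c(0.00000, 0.00250, 0.00125, 0.00750, 0.09250, 0.00750)
)

# Order the selected genotypes by their probability for GY
sel_by_gy <- sel[order(sel$GY, decreasing = TRUE), ]
sel_by_gy
```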

df=index$bpsi

df |> kableExtra::kable()

Bibliography

Araújo, Mauricio S., Saulo Chaves, Gérson N. C. Ferreira, Godfree Chigeza, Erica P. Leles, Michelle F. Santos, Brian W. Diers, Peter Goldsmith, and José B. Pinheiro. 2025. “High-Resolution Soybean Trial Data Supporting the Expansion of Agriculture in Africa.” Scientific Data. https://doi.org/10.1038/s41597-025-06190-3.
Chagas, José T. B., Kaio O. G. Dias, Vinicius Q. Carneiro, Lawrência M. C. De Oliveira, Núbia X. Nunes, José D. P. Júnior, Pedro C. S. Carneiro, and José E. de S. Carneiro. 2025. “Bayesian Probabilistic Selection Index in the Selection of Common Bean Families.” Crop Science 65 (May): e70072. https://doi.org/10.1002/CSC2.70072.
Chaves, Saulo F. S., Matheus D. Krause, Luiz A. S. Dias, Antonio A. F. Garcia, and Kaio O. G. Dias. 2024. “ProbBreed: A Novel Tool for Calculating the Risk of Cultivar Recommendation in Multienvironment Trials.” G3 Genes|Genomes|Genetics 14 (March). https://doi.org/10.1093/G3JOURNAL/JKAE013.