1 Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo
2 Department of General Biology, Federal University of Viçosa

Introduction

This workflow shows how to use the Bayesian probabilistic selection index (BPSI). To apply the tool to your own crop, we recommend reading each section carefully and running the code step by step. Questions? Contact us through ProbBreed or by e-mail.

Pan-African Soybean Trials dataset

Cultivar recommendation is a critical stage in plant breeding programs, and selecting genotypes that are superior for multiple traits remains a challenge because of genotype × environment (G × E) interaction.

To address this, we analyzed multi-environment trial data from the Pan-African Soybean Trials, in which 65 soybean genotypes were evaluated across 19 environments (Araújo et al. 2025).

Our approach uses a Bayesian probabilistic framework to estimate the risk of recommending each genotype for each trait and to select genotypes according to a multitrait ideotype (Chagas et al. 2025). The Bayesian probabilistic selection index (BPSI) selects the 10% of genotypes with the highest probability of superior performance across environments for Grain Yield (GY), Plant Height (PH), and Number of Days to Maturity (NDM).

Bayesian probabilistic selection index (BPSI)

The Bayesian probabilistic selection index lets the breeder define an ideotype by choosing, for each trait, whether it should increase or decrease and how much weight it carries. It builds on the probabilities of superior performance across environments (Dias et al. 2022) and accounts, for each trait, for the distance between the worst-performing genotype and each candidate genotype.

\[ BPSI_i = \sum_{m=1}^{t} \frac{\gamma_{pm} - \gamma_{im}}{(1/\lambda_m)} \]

where \(\gamma_{pm}\) is the probability of superior performance of the worst genotype for trait \(m\), \(\gamma_{im}\) is the probability of superior performance of genotype \(i\) for trait \(m\), \(t\) is the total number of traits evaluated \(\left(m = 1, 2, ..., t \right)\), and \(\lambda_m\) is the weight for trait \(m\).
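
As a rough illustration of the formula above, the sketch below computes the index by hand in base R for three hypothetical genotypes. The probabilities, the weights, and the choice of the lowest probability per trait as the "worst genotype" are assumptions made for this example; they are not ProbBreed output, and the package's own `bpsi()` function may differ in its details.

```r
# Hypothetical probabilities of superior performance
# (rows: genotypes, columns: traits); illustrative values only
gamma <- matrix(c(0.40, 0.10, 0.25,
                  0.30, 0.20, 0.05,
                  0.05, 0.02, 0.01),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("G1", "G2", "G3"), c("GY", "PH", "NDM")))

# Hypothetical trait weights: GY counts twice as much as PH and NDM
lambda <- c(GY = 2, PH = 1, NDM = 1)

# Assumption: the worst genotype for a trait is the one with the lowest probability
gamma_worst <- apply(gamma, 2, min)

# Per-trait difference (worst genotype minus genotype i), as written in the formula
dist_to_worst <- -sweep(gamma, 2, gamma_worst, FUN = "-")

# Divide each trait column by 1 / lambda, then sum over traits
weighted <- sweep(dist_to_worst, 2, 1 / lambda, FUN = "/")
bpsi_manual <- rowSums(weighted)

print(round(bpsi_manual, 2))
```

Dividing by 1/λ is the same as multiplying by λ, so doubling the GY weight doubles that trait's contribution. In this toy example G3 is the worst genotype for every trait and scores zero, while G1 lies farthest from it.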

BPSI example with the soybean dataset

To provide a step-by-step guide, we applied the BPSI to the Pan-African Soybean Trials dataset described above.

Bayesian Model

The Bayesian models are available in the ProbBreed package (Chaves et al. 2024). We used the entry-mean model.

\[ y_{jk} = \mu + l_k + g_j + \varepsilon_{jk} \]

where \(y_{jk}\) is the phenotypic record of the \(j^{th}\) genotype in the \(k^{th}\) location, \(\mu\) is the intercept, \(l_k\) is the main effect of the location, \(g_j\) is the main effect of the genotype, and \(\varepsilon_{jk}\) is the residual effect.
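
To make the structure of the entry-mean model concrete, the following base R sketch simulates one record per genotype and location from the equation above. All numbers (intercept, variances, number of genotypes and locations) are hypothetical, and the column names `gen`, `env`, and `PH` simply mirror the long format used in this workflow.

```r
set.seed(42)

n_gen <- 5   # hypothetical number of genotypes
n_loc <- 3   # hypothetical number of locations

mu  <- 60                                  # intercept
l_k <- rnorm(n_loc, mean = 0, sd = 8)      # location main effects
g_j <- rnorm(n_gen, mean = 0, sd = 5)      # genotype main effects
names(l_k) <- sprintf("E%02d", seq_len(n_loc))
names(g_j) <- sprintf("G%02d", seq_len(n_gen))

# One row per genotype-by-location combination (entry means, no replicates)
sim_df <- expand.grid(gen = names(g_j), env = names(l_k),
                      stringsAsFactors = FALSE)

# y_jk = mu + l_k + g_j + residual
sim_df$PH <- mu +
  unname(l_k[sim_df$env]) +
  unname(g_j[sim_df$gen]) +
  rnorm(nrow(sim_df), mean = 0, sd = 3)

head(sim_df)
```

Because each genotype appears once per location, there is no replicate term in the model, which is consistent with `repl = NULL` in the calls that follow.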

library(ProbBreed)


met_df = read.csv("https://raw.githubusercontent.com/tiagobchagas/BPSI/refs/heads/main/Data/blues_long.csv", header = TRUE)

head(met_df)
##   env gen       PH       GY NDM
## 1 E01 G02 77.56172 3690.918  NA
## 2 E01 G05 81.16229 1004.449  NA
## 3 E01 G09 62.17745 3062.170  NA
## 4 E01 G11 33.37286 1804.674  NA
## 5 E01 G14 52.03038 2719.217  NA
## 6 E01 G21 97.85586 1747.515  NA
mod = bayes_met(data = met_df,
                gen = "gen",
                loc = "env",
                repl = NULL,
                trait = "PH",
                reg = NULL,
                year = NULL,
                res.het = TRUE,
                iter = 400, cores = 4, chain = 4) # we recommend running at least 4,000 iterations


mod2 = bayes_met(data = met_df,
                 gen = "gen",
                 loc = "env",
                 repl = NULL,
                 trait = "GY",
                 reg = NULL,
                 year = NULL,
                 res.het = TRUE,
                 iter = 400, cores = 4, chain = 4) # we recommend running at least 4,000 iterations

mod3 = bayes_met(data = met_df,
                 gen = "gen",
                 loc = "env",
                 repl =  NULL,
                 trait = "NDM",
                 reg = NULL,
                 year = NULL,
                 res.het = TRUE,
                 iter = 400, cores = 4, chain = 4) # we recommend running at least 4,000 iterations
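
The `head(met_df)` output above shows NDM missing for the E01 rows. A quick count of non-missing records per environment and trait can reveal such gaps before the fitted models are interpreted. The sketch below uses only the six rows printed above, re-entered by hand under the hypothetical name `head_df`; with the full dataset, `head_df` would be replaced by `met_df`.

```r
# The six rows shown by head(met_df), re-entered by hand for illustration
head_df <- data.frame(
  env = rep("E01", 6),
  gen = c("G02", "G05", "G09", "G11", "G14", "G21"),
  PH  = c(77.56172, 81.16229, 62.17745, 33.37286, 52.03038, 97.85586),
  GY  = c(3690.918, 1004.449, 3062.170, 1804.674, 2719.217, 1747.515),
  NDM = NA_real_
)

traits <- c("PH", "GY", "NDM")

# TRUE where a record is present, FALSE where it is missing
observed <- as.data.frame(!is.na(head_df[traits]))

# Number of non-missing records per environment and trait
n_obs <- aggregate(observed, by = list(env = head_df$env), FUN = sum)
print(n_obs)
```

An environment with a count of zero for a trait contributes no information to that trait's model.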

Extracting the probability of superior performance

models=list(mod,mod2,mod3)
names(models) <- c("PH","GY","NDM")
inc=c(FALSE,TRUE,FALSE)  # selection direction per trait: decrease PH, increase GY, decrease NDM
names(inc) <- names(models)

results <- lapply(names(models), function(model_name) {
  x <- models[[model_name]]  # actual model object

  outs <- extr_outs(model = x,
                    probs = c(0.05, 0.95),
                    verbose = TRUE)

  a <- prob_sup(extr = outs,
                int = .2,
                increase = inc[[model_name]],  # look up the selection direction by trait name
                save.df = FALSE,
                verbose = TRUE)

  return(a)
})
names(results) <- names(models)
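
Conceptually, the probability of superior performance is the share of posterior draws in which a genotype falls within the selected fraction of the candidates (Dias et al. 2022). The base R sketch below illustrates that idea with simulated draws; the draws, the genotype means, and the object names are hypothetical, and this is not how `extr_outs()` or `prob_sup()` are implemented internally.

```r
set.seed(1)

n_draws <- 1000   # hypothetical number of posterior draws
n_gen   <- 10     # hypothetical number of genotypes
int     <- 0.2    # selection intensity (top 20%)

# Simulated posterior draws of genotype effects (rows: draws, columns: genotypes)
true_means <- seq(-1, 1, length.out = n_gen)
draws <- sapply(true_means, function(m) rnorm(n_draws, mean = m, sd = 0.5))
colnames(draws) <- sprintf("G%02d", seq_len(n_gen))

# Number of genotypes inside the selected fraction
n_top <- round(int * n_gen)

# For each draw, flag the genotypes ranked among the n_top highest values.
# For a trait to be decreased, rank(g) would be used instead of rank(-g).
in_top <- apply(draws, 1, function(g) rank(-g) <= n_top)

# Share of draws in which each genotype was selected
p_top <- rowMeans(in_top)
print(round(p_top, 3))
```

Since exactly `n_top` genotypes are selected in every draw, the probabilities sum to `n_top` across genotypes rather than to one.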

Running the BPSI

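index = bpsi(problist = results,
             increase = c(FALSE, TRUE, FALSE),  # same order as names(results): PH, GY, NDM
             int = 0.1,                         # select the top 10% of genotypes
             lambda = c(1, 2, 1),               # trait weights: GY weighted twice as much as PH and NDM
             save.df = FALSE)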

BPSI Ranks

The selected genotypes (G52, G43, G26, G41, G54, G58) show superior performance for the three traits evaluated: Grain Yield (GY), Plant Height (PH), and Number of Days to Maturity (NDM).

plot(index,category = "Ranks")
Rank of probability of superior performance of soybean Pan-African Trials across locations in Kenya


BPSI plot

The selected genotypes (G52, G43, G26, G41, G54, G58) carry the lowest risk of cultivar recommendation for the multitrait ideotype across the Kenyan environments.

plot(index,category = "BPSI")
Bayesian probability selection index of soybean genotypes from Pan-African Trials across locations in Kenya


Probabilities of superior performance

The probabilities of superior performance across environments express the recommendation risk for each soybean genotype: the probability of being in the top 20% for Grain Yield (GY) and in the bottom 20% for Plant Height (PH) and Number of Days to Maturity (NDM).

df=index$df
df |> kableExtra::kable()
Genotype PH GY NDM
G01 0.95500 0.20250 0.19750
G02 0.05875 0.15625 0.09625
G03 0.01000 0.14750 0.23250
G04 0.49875 0.16375 0.03375
G05 0.00000 0.13375 0.00000
G06 0.00000 0.19000 0.00125
G07 0.98375 0.11375 0.73625
G08 0.10875 0.18000 0.29250
G09 0.01625 0.19375 0.96250
G11 1.00000 0.34625 0.63375
G12 0.00500 0.29750 0.43500
G13 0.12875 0.18875 0.01750
G14 0.80125 0.15875 0.52375
G15 0.39250 0.18250 0.23500
G16 0.10250 0.13625 0.20125
G17 0.76500 0.15875 0.06000
G18 0.00125 0.18750 0.00375
G19 0.17250 0.21500 0.00000
G20 0.00750 0.13500 0.02375
G21 0.00000 0.15250 0.01375
G22 0.00000 0.13375 0.02250
G23 0.00000 0.16750 0.03250
G24 0.00000 0.14000 0.02500
G25 0.01250 0.18750 0.35500
G26 0.00000 0.28000 0.00000
G27 0.54125 0.29000 0.39625
G28 0.00000 0.17750 0.06875
G29 1.00000 0.19000 0.59125
G30 0.01250 0.22875 0.00375
G31 0.93250 0.15500 0.11125
G32 0.91875 0.11375 0.81125
G33 0.99250 0.12125 0.68875
G34 0.99375 0.18875 0.74500
G35 0.00000 0.29375 0.06750
G38 0.08875 0.14000 0.00375
G39 0.00625 0.22625 0.30000
G40 0.00125 0.14750 0.01625
G41 0.00000 0.26000 0.00125
G42 0.00000 0.32000 0.11500
G43 0.00000 0.31750 0.01125
G44 0.00000 0.15750 0.00000
G45 0.00000 0.17750 0.00000
G46 0.00000 0.30125 0.10750
G47 0.94875 0.16375 0.04875
G48 0.01000 0.20000 0.32250
G49 0.01000 0.14250 0.01000
G50 0.01750 0.13125 0.07375
G51 0.00000 0.18000 0.05500
G52 0.00000 0.43125 0.00125
G53 0.00000 0.35375 0.18875
G54 0.00000 0.24625 0.00250
G55 0.00000 0.20125 0.23000
G56 0.00000 0.24250 0.30625
G57 0.00000 0.18500 0.07125
G58 0.09375 0.29875 0.00750
G59 0.00000 0.23875 0.78375
G60 0.00000 0.14750 0.03875
G61 0.00625 0.16125 0.39750
G62 0.00000 0.21125 0.07125
G63 0.00000 0.11500 0.03250
G64 0.40625 0.19125 0.02625
G65 0.00000 0.21625 0.02125
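
To read the table for a subset of genotypes, the rows of interest can be pulled out and sorted. The sketch below re-enters, by hand, the table rows for the six genotypes this workflow reports as selected and orders them by the GY probability; the object name `sel` is hypothetical, and with the fitted objects at hand the same subsetting would be applied to `index$df`.

```r
# Rows copied from the table above for the six selected genotypes
sel <- data.frame(
  PH  = c(0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.09375),
  GY  = c(0.43125, 0.31750, 0.28000, 0.26000, 0.24625, 0.29875),
  NDM = c(0.00125, 0.01125, 0.00000, 0.00125, 0.00250, 0.00750),
  row.names = c("G52", "G43", "G26", "G41", "G54", "G58")
)

# Sort by the probability for Grain Yield, highest first
sel_by_gy <- sel[order(sel$GY, decreasing = TRUE), ]
print(sel_by_gy)
```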

Genotypes selected

The genotypes selected by the BPSI were G52, G43, G26, G41, G54, and G58. They have the greatest distance from the worst genotype for Grain Yield (GY), Plant Height (PH), and Number of Days to Maturity (NDM). The selected genotypes are the top 10% in multitrait performance across environments.

df=index$bpsi

df |> kableExtra::kable()

Bibliography

Araújo, Mauricio S., Saulo Chaves, Gérson N. C. Ferreira, et al. 2025. “High-Resolution Soybean Trial Data Supporting the Expansion of Agriculture in Africa.” Scientific Data, ahead of print. https://doi.org/10.1038/s41597-025-06190-3.
Chagas, José T. B., Kaio O. G. Dias, Vinicius Q. Carneiro, et al. 2025. “Bayesian Probabilistic Selection Index in the Selection of Common Bean Families.” Crop Science 65 (May): e70072. https://doi.org/10.1002/CSC2.70072.
Chaves, Saulo F. S., Matheus D. Krause, Luiz A. S. Dias, Antonio A. F. Garcia, and Kaio O. G. Dias. 2024. “ProbBreed: A Novel Tool for Calculating the Risk of Cultivar Recommendation in Multienvironment Trials.” G3 Genes|Genomes|Genetics 14 (March). https://doi.org/10.1093/G3JOURNAL/JKAE013.
Dias, Kaio O. G., Jhonathan P. R. dos Santos, Matheus D. Krause, et al. 2022. “Leveraging Probability Concepts for Cultivar Recommendation in Multi-Environment Trials.” Theoretical and Applied Genetics 135 (April): 1385–99. https://doi.org/10.1007/S00122-022-04041-Y.