Publication date: 11th March 2026
The efficiency of organic photovoltaics was estimated using a machine learning (ML) approach.
We used the organic photovoltaics database built in-house by the Korea Research Institute of Chemical Technology. The dataset comprises reliable and representative experimental results for 1,010 ternary organic solar cells (D1:D2:A), obtained through repeated measurements. The data included 67 donors and 24 non-fullerene acceptors, device structures, donor/acceptor structures, donor-to-acceptor ratios, active-layer thicknesses, experimental conditions, and local symmetry.
We fragmented the donors and acceptors using a self-developed method. A dataset was created by generating descriptors of the fragmented molecules and used to train various ML algorithms, including random forest, XGBoost, LightGBM, support vector regression, and multilayer perceptron. Model performance was evaluated using the coefficient of determination (R²). XGBoost showed the highest R² of 0.849. The contributions of key features were interpreted using SHAP analysis. This paper presents an ML framework that combines molecular fragmentation and data-driven modeling.
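To make the evaluation metric concrete, the following is a minimal sketch of how R² can be computed and used to rank candidate models. It is not the authors' code: the function name `r_squared`, the model labels `model_a` and `model_b`, and all numeric values are hypothetical placeholders chosen for illustration, and the sketch uses only the Python standard library rather than any of the ML packages named above.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured efficiencies (%) for held-out devices (illustration only).
measured = [10.0, 12.0, 14.0, 16.0]

# Hypothetical predictions from two placeholder models (not results from the study).
predictions = {
    "model_a": [10.5, 11.5, 14.5, 15.5],
    "model_b": [12.0, 12.0, 14.0, 14.0],
}

# Score each model on the same held-out data and pick the highest R^2.
scores = {name: r_squared(measured, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
```

An R² of 1 means the predictions match the measurements exactly, while a value near 0 means the model does no better than predicting the mean efficiency. For the comparison to be meaningful, every model must be scored on the same held-out data.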
