nanoGe - MATSUS24 - #AI - Automation and Nanomaterials (machine learning, artificial intelligence, robotics, accelerated discovery)

Wed Mar 06 2024

15:20 - 15:30

#AI Opening. Room Port Vell 1

#AI 1.1

Chair: Federica Bertolotti

15:30 - 16:00
1.1-I1

Arciniegas, Milena

Istituto Italiano di Tecnologia (IIT)

Towards Data-Driven Next Generation of Broadband Emitting Layered Perovskites: Incorporating Digital Data Analysis into Routine Research Tasks

Milena Arciniegas

Istituto Italiano di Tecnologia (IIT), IT

I am an energetic, creative, female scientist with solid expertise in Materials Science and Technology. I have successfully implemented an engineering approach to guide the development of functional nanohybrids through general and simple routes. Throughout my work, I have introduced important mechanisms on the cooperative coupling of dissimilar materials in single structures, which represents fundamental knowledge for the creation of a new generation of nano and macro hybrid materials.

Authors

Milena Arciniegas^a, Elana Borvick^b, Seda Kutkan^c, Roman Krahne^c, Assaf Anderson^b, Liberato Manna^a

Affiliations

a, Nanochemistry, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy

b, Materials Zone

c, Optoelectronics Research Line, Istituto Italiano di Tecnologia, Via Morego 20, 16163 Genoa, Italy

Abstract

The self-intercalation of organic and inorganic components in two-dimensional (2D) layered perovskites opens new avenues for the structural engineering of efficient white light emitters from a single material. Recent studies have focused on the impact of organic cations on the material’s dimensionality, bandgap, and crystallographic structure.^[1,2] Yet, their effects on the emission properties have only been investigated using random selection.^[3] Here, we show an experimental approach based on molecular descriptors of organic cations to study their role in the emission characteristics of organic-inorganic layered perovskites. To this end, we collected experimental data in real time and uploaded it to a digital platform for data interaction, analysis, and preservation. We first carefully selected primary and secondary amines and established a robust synthetic protocol that is performed at relatively low temperatures and through simple steps.^[4] To date, we have prepared 47 different layered structures and established seven digital protocols that allow straightforward correlation between structural and optical data. Initial correlations between amine molecular descriptors and the collected synthesis and optical data indicate that short organic cations with heteroatoms and a low number of valence electrons yield broadband emitting layered structures, and that the heteroatom position might lead to tunable emission from blue to white. Such real-time data integration and analysis, including sorting, interactive exploration, and graphical data representation, opens alternative avenues to fabricate efficient white-emitting structures from a single material, with a broad perspective for other functionalities.

16:00 - 16:30
1.1-I2

Abolhasani, Milad

North Carolina State University

Accelerated Materials Discovery and Optimization with a Self-Driving Fluidic Lab

Milad Abolhasani

North Carolina State University, US

Authors

Milad Abolhasani^a

Affiliations

a, North Carolina State University, Department of Chemical and Biomolecular Engineering, Partners Way, 911, Raleigh, US

Abstract

Accelerating materials discovery, as well as green and sustainable ways to manufacture new materials, will have a profound impact on the global challenges in energy, sustainability, and healthcare. The current human-dependent paradigm of experimental research in chemical and materials sciences fails to identify technological solutions for worldwide challenges in a short timeframe. This limitation necessitates the development and implementation of new strategies to accelerate the pace of materials discovery. Recent advances in reaction miniaturization, automated experimentation, and data science provide an exciting opportunity to reshape the discovery, development, and manufacturing of new advanced functional materials related to energy transition and sustainability. In this talk, I will present a Self-Driving Fluidic Lab (SDFL) for accelerated discovery, optimization, and manufacturing of emerging advanced functional materials with multi-step chemistries, through the integration of flow chemistry, online characterization, and artificial intelligence (AI).^1-3 I will discuss how modularization of different synthesis and processing stages, in tandem with constantly evolving AI-assisted modeling and decision-making under uncertainty, can enable resource-efficient navigation through high-dimensional experimental design spaces. Example applications of the SDFL for the autonomous synthesis of clean energy nanomaterials will be presented to illustrate the potential of autonomous labs in reducing the materials discovery timeframe from more than 10 years to a few months. Finally, I will present the unique reconfigurability aspect of self-driving fluidic labs to close the scale gap in clean energy materials research through on-demand switching from reaction exploration/exploitation to smart manufacturing mode.

16:30 - 16:45
1.1-O1

Laufer, Felix

Karlsruhe Institute of Technology

Analyzing Perovskite Thin-Film Formation with Machine Learning and Explainable AI

Felix Laufer

Karlsruhe Institute of Technology

Authors

Felix Laufer^a, Lukas Klein^b,c,d, Sebastian Ziegler^e,d, Charlotte Debus^f,g, Markus Götz^f,g, Klaus Maier-Hein^e,d, Ulrich W. Paetzold^a,h, Fabian Isensee^e,d, Paul F. Jäger^b,d

Affiliations

a, Light Technology Institute, Karlsruhe Institute of Technology, Engesserstr. 13, Karlsruhe, 76131, DE

b, Interactive Machine Learning Group, German Cancer Research Center

c, Institute for Machine Learning, ETH Zürich

d, Helmholtz Imaging, German Cancer Research Center

e, Division of Medical Image Computing, German Cancer Research Center

f, Steinbuch Centre for Computing, Karlsruhe Institute of Technology

g, Helmholtz AI, Karlsruhe Institute of Technology

h, Institute of Microstructure Technology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, DE

Abstract

Perovskite solar cells (PSCs) hold great potential as a technology for the next generation of thin-film photovoltaics. For PSCs to be commercially viable, scalable fabrication processes must be improved based on state-of-the-art techniques used for small-area devices. Hence, there is a need to investigate and optimize the formation of perovskite thin films fabricated through scalable deposition methods to facilitate large-scale production of solution-based, large-area PSCs. However, the formation of perovskite thin films from precursor solution is complex and involves the interrelated phases of drying, nucleation, and crystal growth. Therefore, in-depth understanding and precise control of these stages are vital to consistently fabricate high-quality optoelectronic films.

The application of in situ multi-channel photoluminescence (PL) imaging allows recording the temporal evolution of the perovskite thin-film formation, while also providing spatial resolution and spectral information. We have generated and made publicly available unique experimental data that comprises in situ PL data of blade-coated PSCs and corresponding quality measures, such as power conversion efficiency and perovskite layer thickness. Exploring this in situ PL data has the potential to reveal important variations in the thin-film formation process; however, its high dimensionality exceeds the limits of human analysis. Here, machine learning (ML) methods are a promising route to investigate such complex, multi-dimensional PL data to elucidate the large-area formation of blade-coated perovskite thin-films during vacuum quenching.

To accelerate traditional, empirically driven, incremental scientific progress, we employ deep learning and explainable artificial intelligence (XAI) techniques to identify correlations between the in situ PL data collected during the perovskite thin-film fabrication and the resulting solar cell performance metrics, while making these correlations humanly understandable. Our analysis shows that variations in the quality of PSCs can be understood by examining the thin-film formation process with ML and XAI. By applying various XAI methods, we explain not only which data features are important but also provide a data-driven explanation of why they are important. Additionally, our research illustrates how these insights can be translated into practical guidelines for optimizing perovskite thin-film fabrication by giving actionable recommendations to experimental scientists. Remarkably, we are able to generate these insights just by analyzing the given dataset, without having to perform additional laborious and expensive trial-and-error experiments.

Our research improves the understanding of the complex large-area formation of perovskite thin-films and highlights the pivotal role of ML, and XAI methodologies in particular, in accelerating energy materials research. As XAI methods are not limited to the investigated dataset, this study is a prime example of how similar analyses can be performed to interpret and improve processes in numerous other areas of sustainable materials research.

For an in-depth analysis and further discussion of these findings, see our full research paper “Discovering Process Dynamics for Scalable Perovskite Solar Cell Manufacturing with Explainable AI”, L. Klein*, S. Ziegler*, F. Laufer et al., Adv. Mater., 202307160 (2023), doi: 10.1002/adma.202307160.

20:00 - 22:00

Social Dinner

Thu Mar 07 2024

#AI 2.1

Chair: Milena Arciniegas

09:00 - 09:30
2.1-I1

Bertolotti, Federica

University of Insubria, Department of Science and High Technology

Deep learning for the efficient size classification of quantum dots from total scattering data

Federica Bertolotti

University of Insubria, Department of Science and High Technology, IT

Authors

Federica Bertolotti^a, Lucia Allara^a, Antonietta Guagliardi^b

Affiliations

a, Dipartimento di Scienza e Alta Tecnologia & To.Sca.Lab, Università dell’Insubria, via Valleggio 11, 22100 Como, Italy

b, Istituto di Cristallografia & To.Sca.Lab, Consiglio Nazionale delle Ricerche, via Valleggio 11, 22100 Como, Italy

Abstract

Optoelectronic properties of ultrasmall semiconductor nanocrystals are strongly related to their structural and microstructural features. However, due to the complexity of these materials, this intermingled relationship remains mostly elusive.

Over the past decades, total scattering methods, in particular the ones based on the Debye Scattering Equation (DSE) and operating in reciprocal space, have been established as essential tools for characterizing the structure, microstructure, and morphology of nanocrystals, including ultrasmall Quantum Dots (QDs).[1–5]

Although wide-angle scattering-based techniques are primarily sensitive to the atomic-scale structures of materials, reciprocal space total scattering methods provide robust information on multiple length scales, in particular if nanocrystalline materials are considered.

Nevertheless, constructing reliable, material-oriented atomistic models that can be optimized against the experimental data to extract structural and microstructural parameters remains a highly challenging task and often poses a bottleneck for scattering-based methods.[6–9]

To overcome this limitation, we tackle the challenge of developing reliable, efficient, and user-friendly methods for determining the average size of colloidal QDs, with a combination of reciprocal space total scattering methods based on DSE and an all-convolutional neural network (all-CNN) that provides physically interpretable results.

In this talk, I will present the development and first application of this novel tool to a selected class of lead-chalcogenide binary QDs that serves as a benchmark system. Indeed, they have been extensively characterized within the DSE approach, which has provided well-established knowledge about their structural and morphological features.[1]

The presented automated tool can be readily employed for real-time size classification of PbS QDs, even from diluted colloidal suspensions, within the limitations of the Q-range and signal-to-noise ratio typically encountered in in-situ and in-operando diffraction experiments.[10]

Additionally, it may serve as a rapid screening tool for the optimization of synthetic protocols.

Furthermore, the proposed method can be easily extended to other classes of nanocrystals, allowing non-experts in crystallography and X-ray diffraction to utilize the automated workflow for creating DSE pattern libraries used for training the all-CNN.
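
As an illustration of the Debye Scattering Equation (DSE) that underlies the pattern libraries mentioned in the abstract above, the following minimal sketch computes the orientation-averaged intensity I(Q) = Σ_i Σ_j f_i f_j sin(Q r_ij)/(Q r_ij) for a small cluster. It is not the authors' code: the function name `debye_intensity`, the identical atoms, and the constant, Q-independent form factor are simplifying assumptions made only for this example.

```python
import math


def debye_intensity(positions, q_values, form_factor=1.0):
    """Debye Scattering Equation for a cluster of identical atoms.

    Assumption for this sketch: every atom has the same constant
    (Q-independent) form factor; real calculations use Q-dependent,
    element-specific form factors and thermal/occupancy corrections.
    """
    n_atoms = len(positions)
    # Interatomic distances r_ij for all unique pairs i < j.
    distances = [
        math.dist(positions[i], positions[j])
        for i in range(n_atoms)
        for j in range(i + 1, n_atoms)
    ]
    intensities = []
    for q in q_values:
        # Self terms (i == j) each contribute f^2 because sin(x)/x -> 1 as x -> 0.
        total = n_atoms * form_factor ** 2
        for r in distances:
            x = q * r
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            # Each unique pair appears twice in the double sum.
            total += 2.0 * form_factor ** 2 * sinc
        intensities.append(total)
    return intensities


# Toy example: a two-atom "cluster" with unit separation (arbitrary units).
two_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pattern = debye_intensity(two_atoms, [0.5 * k for k in range(1, 5)])
```

The double loop scales with the square of the number of atoms, which is one reason why pre-computed pattern libraries are attractive when many candidate sizes must be compared.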

09:30 - 09:45
2.1-O1

Fernández-Pendás, Mario

BCMaterials

Machine Learning Force Fields for Colloidal Quantum Dots

Mario Fernández-Pendás

BCMaterials, ES

Authors

Mario Fernández-Pendás^a, Ivan Infante^a,b

Affiliations

a, BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, 3rd. Floor, UPV/EHU Science Park, 48940, Leioa, Spain

b, Ikerbasque, Basque Foundation for Science, María Díaz de Haro 3, 48013 Bilbao, Spain

Abstract

Colloidal quantum dots (QDs), and in particular perovskite QDs, have attracted broad interest for their highly efficient, spectrally tunable and narrowband photoluminescence (PL). Their potential implementation in a plethora of nanomaterials technologies, spanning efficient lighting and light harvesting to bioimaging, demands a thorough understanding of the effects of QD structure and surface structure dynamics on PL characteristics. In principle, low-cost simulations with classical force fields (FFs) reveal no electronic structure information and thus provide restricted insight into PL features. Enabling molecular dynamics (MD) simulations in both the ground and excited states with electronic structure information up to the nanosecond timescale is a desirable advance.

For this purpose, we propose a customized platform that helps construct machine learning FFs for colloidal QDs, expanding the already existing platform dedicated to the parameterization of classical FFs for QDs. By considering quantum mechanical properties (such as potential energies and bandgaps) previously computed at a high level of theory such as density functional theory (DFT), along with positions and forces, this platform automatically trains a neural network to generate the so-called machine learning FFs (MLFFs). These MLFFs not only extend the timescale of the MD simulations from picoseconds to nanoseconds, but will also enhance sampling efficiency and simulation accuracy by including information on the electronic density of states at each point of the trajectory. The machine learning platform has been tested on CsPbBr3 QDs with promising results.
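
For orientation, a training objective commonly used for machine learning force fields of this kind combines energy, force, and property errors. The form below is a generic sketch, not the loss stated by the authors: the weights $w_E$, $w_F$, $w_g$, the network parameters $\theta$, and the optional bandgap term are assumptions introduced here for illustration.

```latex
\mathcal{L}(\theta) =
  w_E \sum_{n} \left( E_\theta(\mathbf{R}_n) - E_n^{\mathrm{DFT}} \right)^2
  + w_F \sum_{n} \sum_{i} \left\| -\nabla_{\mathbf{r}_i} E_\theta(\mathbf{R}_n) - \mathbf{F}_{n,i}^{\mathrm{DFT}} \right\|^2
  + w_g \sum_{n} \left( g_\theta(\mathbf{R}_n) - g_n^{\mathrm{DFT}} \right)^2
```

Here $\mathbf{R}_n$ is the $n$-th reference configuration, $E_\theta$ and $g_\theta$ are the predicted potential energy and bandgap, and the predicted force on atom $i$ is obtained as the negative gradient of the predicted energy, which keeps the force field energy-conserving.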

09:45 - 10:00
2.1-O2

Zito, Juliette

Department of Nanochemistry, Istituto Italiano di Tecnologia, Italy

A Universal Database of Surface Ligands for Quantum Dots

Juliette Zito

Department of Nanochemistry, Istituto Italiano di Tecnologia, Italy, IT

Authors

Juliette Zito^a, Bas F. van Beek^b, Luca de Trizio^a, Lucas Visscher^b, Liberato Manna^a, Ivan Infante^c

Affiliations

a, Istituto Italiano di Tecnologia

b, Vrije Universiteit Amsterdam, De Boelelaan, 1108, Amsterdam, NL

c, BCMaterials, Basque Center for Materials, Applications and Nanostructures, UPV/EHU Science Park, 48940 Leioa, Spain

Abstract

Quantum Dots (QDs) combine inorganic semiconductor scaffolds with organic surface ligands. A key role played by the ligands in these systems is the stabilization of the inorganic region in organic solvents to prevent its dissolution. The suitability of a certain ligand depends on several factors, such as the magnitude of the QD-ligand binding strength, ligand-ligand packing, and ligand-solvent interactions. In practice, each QD type has its best ligand; however, finding that ligand is tedious work if one considers the labor and material costs involved in the experiments. In this context, computational screening provides an attractive alternative to browse the largely unexplored space of potentially interesting ligands. In this work, we introduce a public database of organic molecules, derived by an advanced filtering of the PubChem database,[1] with the aim of building a subset of surface ligand candidates that are potentially suitable to passivate the surface of any inorganic semiconductor substrate. For each ligand in the database, we additionally provide relevant chemical and physical properties, from the boiling and melting points to more specific properties that account for the interactions with the solvent. These ligand properties are material-independent and can therefore be used as an orientation tool for ligand-capped semiconductor materials research in general.

10:00 - 10:30
2.1-I2

Visscher, Lucas

Vrije Universiteit Amsterdam

Quantum chemistry methods and workflows for nano-sized materials

Lucas Visscher

Vrije Universiteit Amsterdam, NL

Born June 23, 1966 in Meppel, The Netherlands

Professor in Theoretical Chemistry, Vrije Universiteit Amsterdam, The Netherlands

Ph. D. (cum laude) University of Groningen (1993), postdoctoral stays at NASA Ames (1994-1995) and at the University of Odense (1996-1997). Professor at Vrije Universiteit Amsterdam (1998-present). Visiting professor stays at University of Strasbourg and at Pacific Northwest Laboratories. Awards: KNCV Clemens Roothaan Prize (1996), NWO vici (2005), WATOC Dirac Medal (2006).

Main research interests

1. Subsystem electronic structure methods

2. Reducing the time-to-solution of computational models

3. Development and application of relativistic computational chemistry techniques

Authors

Lucas Visscher^a

Affiliations

a, Vrije Universiteit Amsterdam, Department of Theoretical Chemistry, Faculty of Science, 1081 HV Amsterdam, The Netherlands

Abstract

In this presentation I will report on our progress on developing quantum chemistry methods and workflows for nano-sized materials.

For the quantum chemical methods I will mostly focus on (approximate) density functional theory [1,2], but will also briefly mention our recent work on higher-level methods[3,4] that can be used to benchmark density functional approximations. I will thereby in particular focus on the efficient description of electronically excited states.

Two different approaches to deal with electronically excited states in large systems will be discussed. In the subsystem approach, we first solve for local excitations and a limited number of charge-transfer states before proceeding to the treatment of the full system. For systems with inherently delocalized (plasmonic) excitations and a high density of states in the absorption spectrum, however, we need to take a different approach. For this purpose, we have developed techniques in which the most time-consuming steps of the calculations are identified and approximated. I will discuss the effect of these approximations in terms of computational efficiency and how well they retain the quality of the original method.

Concerning workflows, I will discuss examples of our work related to (solar) energy conversion by dye-sensitized solar cells[5]. Here I would also like to discuss, in particular, the design of the workflow software[6], for which deployment on a parallel compute architecture is a key requirement.

10:30 - 11:15

Coffee Break

#AI 2.2

Chair: Lucas Visscher

11:15 - 11:30
2.2-O1

Saleh, Gabriele

CompuNet, Istituto Italiano di Tecnologia (IIT), Genova

Discovery of Novel Chalcohalide and Other Semiconductor Materials through the Combination of Big Data and Artificial Intelligence

Gabriele Saleh

CompuNet, Istituto Italiano di Tecnologia (IIT), Genova, IT

Authors

Gabriele Saleh^a, Ivan Infante^b, Liberato Manna^a

Affiliations

a, Nanochemistry, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy

b, BCMaterials, Basque Center for Materials, Applications and Nanostructures, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, ES

Abstract

Data science and artificial intelligence are bringing about a revolution in our society, and materials science is no exception. In the future, it will be possible to design materials with tailored properties by relying almost completely on existing data. Indeed, combining data mining with cleverly designed machine learning (ML) algorithms can overcome the need to rely on large experimental and/or computational facilities for discovering new materials[i]. One promising field of application for these types of approaches is that of semiconductors. On the one hand, these materials are central to many technological applications. On the other hand, there is a constant need for improvement, for example for replacing toxic or rare elements while fulfilling optoelectronic and stability criteria.

In this contribution, we present our platform for the discovery of new chalcohalides, an emerging class of semiconducting materials which display the well-known excellent optoelectronic properties of metal halide perovskites while also being considerably more stable[ii]. Our approach consists of two steps. First, the platform connects to databases (AFLOW[iii], NOMAD[iv], and Materials Project[v]) to automatically find all the materials with user-defined constituting elements and properties. For chalcohalides, a key property for optical applications is the direct character of the band gap. Then, ML algorithms are trained to predict the stability, and the width and character of the band gap. The ML model is then applied to predict new compounds, and the prediction is then verified by quantum chemical simulations. We present representative examples of the application of our approach. We note that this protocol can be easily extended to the data mining of other semiconductors (an example on perovskites will be presented).

References:

[i] Keith J.A. et al. "Combining machine learning and computational chemistry for predictive insights into chemical systems." Chemical Reviews 121.16 (2021): 9816-9872.

[ii] Ghorpade U.V. et al. "Emerging chalcohalide materials for energy applications." Chemical Reviews 123.1 (2022): 327-378.

[iii] Curtarolo S. et al. "AFLOW: An automatic framework for high-throughput materials discovery." Computational Materials Science 58 (2012): 218-226.

[iv] Draxl C. et al. "NOMAD: The FAIR concept for big data-driven materials science." MRS Bulletin 43.9 (2018): 676-682.

[v] Jain A. et al. "Commentary: The Materials Project: A materials genome approach to accelerating materials innovation." APL Materials 1.1 (2013).
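
The first step described in the abstract above, selecting all materials with user-defined constituent elements and properties such as a direct band gap, can be sketched as a simple filter over already-downloaded records. This is an illustrative sketch only: it does not call the AFLOW, NOMAD, or Materials Project APIs, and the `MaterialRecord` type, the `select_candidates` function, and all record values are hypothetical placeholders rather than database entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterialRecord:
    """Hypothetical, simplified record; real database schemas differ."""
    label: str
    elements: frozenset
    band_gap_ev: float
    is_direct_gap: bool


def select_candidates(records, required_elements, gap_range_ev, direct_only=True):
    """Keep records that contain all required elements and meet the gap criteria."""
    low, high = gap_range_ev
    required = frozenset(required_elements)
    return [
        record
        for record in records
        if required <= record.elements
        and low <= record.band_gap_ev <= high
        and (record.is_direct_gap or not direct_only)
    ]


# Made-up placeholder values for illustration, NOT taken from any database.
toy_records = [
    MaterialRecord("toy-1", frozenset({"Bi", "S", "I"}), 1.6, True),
    MaterialRecord("toy-2", frozenset({"Bi", "S", "I"}), 1.9, False),
    MaterialRecord("toy-3", frozenset({"Cs", "Pb", "Br"}), 2.3, True),
    MaterialRecord("toy-4", frozenset({"Sb", "S", "I"}), 3.5, True),
]

direct_hits = select_candidates(toy_records, {"S", "I"}, (1.0, 3.0))
all_hits = select_candidates(toy_records, {"S", "I"}, (1.0, 3.0), direct_only=False)
```

In a workflow like the one described, the selected records would then serve as training data for the ML models that predict stability and the width and character of the band gap.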

11:30 - 11:45
2.2-O2

Bozal-Ginesta, Carlota

Empa-Swiss Federal Laboratories for Materials Science and Technology

Prediction of oxygen reduction performance of quaternary perovskites La0.8Sr0.2(Co,Fe,Mn)O3 with machine learning based on spectroscopic characterization data

Carlota Bozal-Ginesta

Empa-Swiss Federal Laboratories for Materials Science and Technology, CH

Authors

Carlota Bozal-Ginesta^a,b, Juande Sirvent^a, Sergio Pablo-García^b, Francesco Chiabrera^a, Changhyeok Choi^b, Lisa Laa^a, Federico Baiutti^a, Alex Morata^a, Alán Aspuru-Guzik^b, Albert Tarancón^a

Affiliations

a, Nanoionics and Fuel Cells group, Catalonia Institute for Energy Research, Jardins de Les Dones de Negre 1, 08930 Sant Adrià de Besòs, Barcelona, Spain

b, Departments of Chemistry and Computer Science, University of Toronto, Lash Miller Chemical Laboratories, 80 St George Street, Toronto, ON M5S 3H6, Canada

Abstract

Lanthanum strontium-based perovskites (ABO₃) are among the state-of-the-art cathode materials for solid oxide fuel cells operating at intermediate and low temperatures (<800 °C).(1,2) However, the effects of the composition on the nanostructure and the intrinsic properties of the materials and on the electrochemical performance are typically non-linear and hard to generalize.(3,4) Machine learning techniques have emerged as an unprecedented tool to identify complex patterns in large datasets, including in heterogeneous electrocatalysis.(5,6) Herein, we have applied these techniques to delve deeper into the composition-property-performance relationships of La0.8Sr0.2(Mn,Co,Fe)O3±δ and predict performance maps that can help optimize these materials. High-throughput characterization of a compositional map of La0.8Sr0.2(Mn,Co,Fe)O3±δ has been carried out: information on the metal stoichiometry, the crystallinity, the electrochemical performance, the structural symmetry, and the electronic configuration was obtained from X-ray fluorescence (XRF), X-ray diffraction (XRD), electrochemical impedance spectroscopy (EIS), Raman spectroscopy and ellipsometry, respectively. We processed the raw data to derive characteristic features and match the samples from different measurements. Then, a variety of supervised and unsupervised modern machine learning methods were utilized to build highly generalizable models correlating experimental features relative to the composition, the optical properties and the electrochemical performance of the materials, and to identify the most relevant ones. Experimental data from Raman, ellipsometry and XRD measurements were shown to model the material composition and the electrochemical performance with R² of 0.913 ± 0.002 and 0.900 ± 0.003, and mean absolute errors of 0.053 ± 0.001 and 0.189 ± 0.005, respectively, with 5-fold cross-validation.
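
To make the reported metrics concrete, the sketch below shows how R² and mean absolute error can be evaluated with 5-fold cross-validation. It is not the authors' pipeline: the one-feature least-squares line, the synthetic data, and all function names are assumptions chosen to keep the example self-contained.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_mae(y_true, y_pred):
    """Coefficient of determination and mean absolute error."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])
    return 1.0 - ss_res / ss_tot, mae


def k_fold_scores(xs, ys, k=5, seed=0):
    """Return one (R2, MAE) pair per held-out fold."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint test folds
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [slope * xs[i] + intercept for i in fold]
        scores.append(r2_and_mae([ys[i] for i in fold], predictions))
    return scores


# Synthetic data (not from the study): a noisy linear relation.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

scores = k_fold_scores(xs, ys)
r2_values = [r2 for r2, _ in scores]
mae_values = [mae for _, mae in scores]
print(f"R2  = {statistics.fmean(r2_values):.3f} +/- {statistics.stdev(r2_values):.3f}")
print(f"MAE = {statistics.fmean(mae_values):.3f} +/- {statistics.stdev(mae_values):.3f}")
```

Reporting the mean and spread across folds is one common way to obtain values of the form "R² ± uncertainty"; whether the abstract's uncertainties were computed this way is not stated in the source.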

11:45 - 12:15
2.2-I1

Siahrostami, Samira

Simon Fraser University

Computational Screening Of Non-copper-based Catalysts For Electrochemical CO2 Reduction Reaction

Samira Siahrostami

Simon Fraser University, CA

Dr. Samira Siahrostami is an Associate Professor and Canada Research Chair in the Department of Chemistry at Simon Fraser University in Canada. Previously, she was an associate professor (2022-2023) and assistant professor (2018-2022) in the Department of Chemistry at the University of Calgary. Before that, she was a research engineer (2016–2018) and postdoctoral researcher (2014–2016) at Stanford University's Department of Chemical Engineering. She also worked as a postdoctoral researcher at the Technical University of Denmark from 2011 to 2013. Her work uses computational techniques such as density functional theory to model reactions at (electro)catalyst surfaces. Her goal is to develop more efficient catalysts for fuel cells, electrolyzers, and batteries by understanding the kinetics and thermodynamics of reactions occurring at the surface of (electro)catalysts. Dr. Siahrostami has written more than 100 peer-reviewed articles, with an h-index of 47 and over 13,000 citations. She has received numerous invitations to give talks at universities, conferences, and workshops around the world on various topics related to catalysis science and technology. Dr. Siahrostami is the recipient of the Environmental, Sustainability, and Energy Division Horizon Prize: John Jeyes Award from the Royal Society of Chemistry (RSC) in 2021. She received the Tom Ziegler Award and the Waterloo Institute for Nanotechnology Rising Star award in 2023. She was named an emerging investigator by the RSC in 2020, 2021 and 2022. Dr. Siahrostami's contribution to energy research was recognized in the most recent Virtual Issue of ACS Energy Letters as one of the Women at the forefront of energy research in 2023. She is currently a board member of the Canadian Catalysis Foundation and an editor of Chemical Engineering Journal (CEJ) and APL Energy (AIP Publishing).

Authors

Samira Siahrostami^a

Affiliations

a, Associate Professor, Department of Chemistry, Simon Fraser University

Abstract

Currently, researchers are exploring copper-based catalysts as the most effective agents for electrochemical CO2 reduction reactions (CO2RR) that convert carbon dioxide into valuable hydrocarbons like methane, ethylene, and ethanol. However, copper catalysts are susceptible to restructuring and degradation, prompting the quest for alternative catalyst families not based on copper that facilitate CO2RR. Our approach involves employing high-throughput Density Functional Theory (DFT) calculations to systematically explore a diverse materials space, expediting the identification of catalyst materials that outperform copper in CO2RR. In this presentation, I will elaborate on the application of robust high-throughput DFT calculations to screen an extensive library of unconventional materials, including perovskites, transition metal nitrides, and metal-organic frameworks. For perovskites, we place particular emphasis on screening a massive library of ABO3 perovskites based on their electrochemical and thermochemical stability, to minimize restructuring and degradation under CO2RR operating pH and electrode potentials. I will discuss the valuable insights gained from these investigations.

12:15 - 12:45
2.2-I2

Unger, Eva

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Germany

The FAIRification of PV research data: a prerequisite for the efficient use of AI tools

Eva Unger

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Germany, DE

Authors

Eva Unger^a,b

Affiliations

a, Helmholtz-Zentrum Berlin für Materialien und Energie, Department Solution-Processing of Hybrid Materials and Devices, Albert-Einstein-Straße, 15, 12489 Berlin, Germany

b, Humboldt Universität zu Berlin, Department of Chemistry, Center for the Science of Materials Berlin, Zum Großen Windkanal 2, 12489 Berlin, Germany

Abstract

Sustainable global human existence requires a shift in the management and utilization of research data across all domains. In the realm of energy conversion materials, the intense exploration of halide perovskite materials since 2012 for potential use in optoelectronics has led to numerous startups actively pursuing commercialization efforts. However, the escalating volume of globally generated research data exposes a critical weakness of traditional dissemination strategies, which rely on the publication of peer-reviewed articles: much of the information and many of the research results generated are underutilized. On the one hand, the spreading of information across numerous scientific journals results in an overwhelming, disparate dataset that is impossible for any individual researcher to navigate. On the other hand, only a fraction of the experimental data gathered is being disseminated, with a bias toward positive outcomes, creating an incomplete and skewed dataset.

In response to this growing issue and “out of despair”, we initiated a collaborative data collection campaign in 2019 to consolidate research data from the sub-domain of perovskite solar cell test-cell development into a unified database [1]. While this dataset is currently one of the largest and most comprehensive in its field, it requires modernization to adhere to FAIR data principles and facilitate integration with machine learning tools, thereby making research more sustainable.

This presentation outlines our recent strides in expanding the Perovskite "literature" database on www.perovskitedatabase.com into an experimental database utilizing the NOMAD data infrastructure and management tools. Alongside the technical implementation, we are standardizing the data ontology for experimental samples and datasets, promoting easy replication and linkage to collect and utilize data generated by the global research community.

As a collaborative group of idealists, we dedicate our time and effort to this initiative, seeking to connect with like-minded researchers—especially those with advanced abilities and experience in research data management and AI. At this stage, input from AI experts is crucial to ensure the research data infrastructure we are building for experimental PV scientists fulfills the needs of the research community while enabling the efficient use of AI tools to make information exploitation faster and more sustainable.

12:45 - 15:30

Lunch Break

#AI 2.3

Chair: Milad Abolhasani

15:30 - 16:00
2.3-I1

Tamblyn, Isaac

University of Ottawa

Machine learning is failing us

Isaac Tamblyn

University of Ottawa, CA

Authors

Isaac Tamblyn^a,b

Affiliations

a, University of Ottawa, Laurier Avenue East, 75, Ottawa, CA

b, Vector Institute for Artificial Intelligence

Abstract

Supervised machine learning (ML) has proven to be an incredibly powerful enabling technology for electronic structure calculations. We now routinely see highly accurate predictions at a fraction of the computational cost of traditional methods. Machine learning has enabled electronic structure simulation at length-scales which previously seemed out-of-reach. The number of papers which use machine learning is increasing exponentially, with no signs of slowing down.

Unfortunately, these advantages have come at a high price - ML models such as deep neural networks provide no intuitive explanation of how they arrived at a particular prediction. They also offer limited generalization capabilities: each problem is treated as a new one. Unlike simple models that appear in textbooks (and inform our intuition), machine learning models are often treated as black boxes by practitioners. Even popular explainability tools fall short - they highlight correlations rather than causation.

I will discuss some of the failures and limitations of machine learning and provide examples which attempt to provide generalized insight and intuition.

16:00 - 16:30
2.3-I2

Li, Kangming

University of Toronto

Prediction Robustness and Data Redundancy in Machine Learning for Materials Science

Kangming Li

University of Toronto, CA

Kangming Li is a post-doctoral fellow in the Department of Materials Science and Engineering at the University of Toronto. He received his PhD in Physics from Université Paris-Saclay, where he was a CEA-NUMERICS Fellow funded under the Marie Curie Actions. He was awarded the Dalla Torre Medal by the French Society for Metallurgy and Materials for his PhD work on finite-temperature magnetic effects in concentrated alloys. Currently he is using machine learning and high-throughput first principles calculations to accelerate the discovery of novel inorganic materials.

Authors

Kangming Li^a, Daniel Persaud^a, Kamal Choudhary^b, Brian DeCost^b, Michael Greenwood^c, Jason Hattrick-Simpers^a

Affiliations

a, Department of Materials Science and Engineering, University of Toronto, Canada

b, Material Measurement Laboratory, National Institute of Standards and Technology, USA

c, Canmet MATERIALS, Natural Resources Canada, Canada

Abstract

The rapid growth of big data in materials science has led to significant advancements in materials property prediction by machine learning (ML) models. However, big data does not necessarily lead to robust prediction performance of ML models. In addition, the issue of information redundancy in materials data has been largely overlooked. This talk examines these two correlated challenges related to materials data: prediction robustness and data redundancy.

First, we will discuss the challenges in ensuring the prediction robustness of ML models, by showcasing the severe performance degradation when the models are trained on the Materials Project 2018 dataset and tested on the Materials Project 2021 dataset. We will demonstrate the impact of distribution shifts and use tools such as UMAP and query-by-committee to foresee performance degradation and to improve prediction accuracy. Next, we will delve into the issue of data redundancy across large materials datasets, revealing that up to 95% of materials data can be safely removed with little impact on the model performance. We will highlight the application of uncertainty-based active learning algorithms to create smaller but informative datasets, leading to more efficient data acquisition and ML training. By examining these challenges, this talk aims to provide insights into building more efficient and robust materials databases and ML models for accurate and reliable predictions in materials science.
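The query-by-committee idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the committee of bootstrap-resampled linear fits, the toy one-dimensional data, and all function names (`fit_line`, `train_committee`, `disagreement`, `most_uncertain`) are assumptions chosen for illustration. The sketch shows that committee disagreement grows for inputs far from the training distribution, which is the signal that could be used both to anticipate performance degradation and to pick informative samples.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept for 1-D inputs."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # guard against a degenerate resample
    return slope, mean_y - slope * mean_x


def train_committee(xs, ys, n_members=20, seed=0):
    """Fit each committee member on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    committee = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        committee.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return committee


def disagreement(committee, x):
    """Standard deviation of the members' predictions at input x."""
    preds = [slope * x + intercept for slope, intercept in committee]
    return statistics.pstdev(preds)


def most_uncertain(committee, candidates, k=1):
    """Return the k candidates on which the committee disagrees most."""
    ranked = sorted(candidates, key=lambda x: disagreement(committee, x), reverse=True)
    return ranked[:k]


# Toy data (hypothetical): training inputs cover the interval [0, 1] only.
rng = random.Random(42)
train_x = [i / 29 for i in range(30)]
train_y = [x ** 2 + rng.gauss(0.0, 0.05) for x in train_x]
committee = train_committee(train_x, train_y)

in_distribution = disagreement(committee, 0.5)  # inside the training range
shifted = disagreement(committee, 5.0)          # far outside the training range
```

In an active-learning loop under these assumptions, `most_uncertain` would select the next samples to label, while candidates with low disagreement would be treated as redundant.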

16:30 - 16:45
2.3-O1

Charkin-Gorbulin, Anton

Laboratory for Chemistry of Novel Materials, University of Mons, Belgium

Advancing Machine Learning Force Field Designs through Atomic Graph-Based Symmetry Search

Anton Charkin-Gorbulin

Laboratory for Chemistry of Novel Materials, University of Mons, Belgium, BE

Authors

Anton Charkin-Gorbulin^a,b, Igor Poltavsky^b, Alexandre Tkatchenko^b, Claudio Quarti^a, David Beljonne^a

Affiliations

a, Laboratory for Chemistry of Novel Materials, University of Mons, Place du Parc 20, 7000 Mons, Belgium.

b, Department of Physics and Materials Science, University of Luxembourg, 1511 Luxembourg City, Luxembourg

Abstract

Machine-learning force fields (MLFFs) show high accuracy and efficiency for modeling the potential energy surfaces of molecules, materials, and interfaces. However, the performance of MLFFs greatly depends upon incorporating the physical symmetries. Finding all relevant symmetries becomes a challenging task for large system sizes. Here, we develop a data-driven method based on molecular graphs to reveal relevant permutational symmetries and distinguish atoms with different chemical environments in molecules and materials. The kernel-based model architecture of BIGDML and the message-passing neural network architecture of MACE were enhanced to demonstrate the applicability of the developed method to the most widely used MLFF architectures. The BIGDML model, enhanced with extracted symmetries, demonstrates superior accuracy and performance, enabling comprehensive investigations of complex systems like the 1,8-naphthyridine/graphene interface and its behavior at finite temperatures. MACE was enhanced by expanding atomic species using the extracted distinctive chemical environments, resulting in improved accuracy for CsPbI₃ slab systems, particularly notable with larger training sets. Overall, this research underscores the critical role of symmetries in advancing MLFFs for complex systems with broad implications in various research fields.
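As a rough illustration of how a molecular graph can separate atoms by chemical environment, the sketch below uses iterative color refinement (the 1-dimensional Weisfeiler-Lehman scheme). This is a swapped-in textbook technique, not the method developed in this work, and the function name `refine_environments` and the ethanol example are assumptions. Atoms that end up in the same class are only candidates for permutational equivalence, since color refinement gives a necessary rather than a sufficient condition.

```python
def refine_environments(elements, bonds, max_rounds=10):
    """Group atoms into environment classes by iterative color refinement.

    elements: list of element symbols, one per atom.
    bonds: list of (i, j) index pairs describing the molecular graph.
    Returns a sorted list of atom-index groups sharing the same environment.
    """
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    colors = list(elements)  # start from the chemical element of each atom
    for _ in range(max_rounds):
        # An atom's signature is its own color plus the sorted neighbor colors.
        signatures = [
            (colors[i], tuple(sorted(colors[j] for j in neighbors[i])))
            for i in range(n)
        ]
        palette = {sig: k for k, sig in enumerate(sorted(set(signatures)))}
        new_colors = [palette[sig] for sig in signatures]
        # Refinement can only split classes, so an equal count means it is stable.
        if len(set(new_colors)) == len(set(colors)):
            break
        colors = new_colors

    groups = {}
    for atom, color in enumerate(colors):
        groups.setdefault(color, []).append(atom)
    return sorted(groups.values())


# Ethanol, CH3-CH2-OH (illustrative indexing): C0, C1, O2, H3-H5 on C0,
# H6-H7 on C1, H8 on O2.
ethanol_elements = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ethanol_bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]
ethanol_classes = refine_environments(ethanol_elements, ethanol_bonds)
```

For ethanol, the methyl hydrogens fall into one class, the methylene hydrogens into another, and the hydroxyl hydrogen stands alone. Such classes could, for instance, serve as the expanded atomic species mentioned above.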

16:45 - 16:50

#AI Closing

17:45 - 19:30

Poster session