AI Integration in Polymer Science
Chemistry
This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence.
Khalid Ferji
In recent years, artificial intelligence (AI) has emerged as a transformative force across scientific disciplines, offering new ways to analyze data, predict material properties, and optimize processes. Yet, its integration into polymer science remains a challenge, as the field has traditionally relied on empirical methods and intuition-driven discovery. The complexity of polymer systems, combined with technical barriers and a lack of interdisciplinary training, has slowed AI adoption, leaving many researchers uncertain about where to begin. This perspective serves as an entry point for polymer scientists, introducing AI’s real-world applications, accessible tools, and key challenges. Rather than an exhaustive review for specialists, it aims to lower entry barriers and spark interdisciplinary dialogue, bridging the gap between conventional polymer research and data-driven innovation. As AI reshapes material discovery, those who embrace this transformation today will define the future of polymer science.

Received 15th February 2025, Accepted 25th April 2025
DOI: 10.1039/d5py00148j
[Link]/polymers
Lorraine university-CNRS Laboratoire de Chimie Physique Macromoleculaire (LCPM), France. E-mail: [Link]@[Link]

Khalid Ferji is an associate professor in polymer chemistry at the Laboratoire de Chimie Physique Macromoléculaire (LCPM) – CNRS, Université de Lorraine, Nancy. His research focuses on the design and self-assembly of functional polymers, with an emerging specialization in integrating machine learning approaches into polymer science. With a multidisciplinary background, he promotes the development of accessible AI tools and fosters collaboration between experimentalists and data scientists to accelerate the digital transformation of materials discovery.

This journal is © The Royal Society of Chemistry 2025 Polym. Chem., 2025, 16, 2457–2470 | 2457

1. Introduction

Imagine a scientific assistant that never sleeps: an intelligent system operating 24/7, continuously analyzing vast amounts of data, identifying research gaps, and pinpointing pressing industrial and societal needs. This assistant could suggest innovative polymers tailored for specific applications, recommend optimal synthesis pathways, predict degradation behavior, and propose strategies for enhanced recyclability and sustainability. But its role would go far beyond theoretical predictions. It could directly interact with an automated laboratory, executing real-time experiments and dynamically adjusting reaction parameters to optimize synthesis conditions. Such a system could self-adapt by learning from experimental feedback, iteratively refining reaction conditions to minimize material waste, enhance polymer properties, and accelerate discovery cycles. Practical tasks such as sourcing reagents, ensuring quality control, storing and organizing data, and maintaining experiment logs would be seamlessly managed.

This vision, once considered science fiction, is now on the verge of becoming reality, made possible by Artificial Intelligence (AI).1,2 The concept of AI-driven “self-driving laboratories” is no longer speculative.3,4 The technologies required for seamless integration of AI, automation, and laboratory workflows are already emerging or actively under development.

AI refers to a broad set of computational techniques that enable machines to analyze data, recognize patterns, and make predictions beyond human capabilities. At the core of this revolution is Machine Learning (ML), a subset of AI that empowers computers to learn from data and refine predictions without explicit programming.5,6 ML has already revolutionized materials science and biology,7–11 as evidenced by DeepMind’s AlphaFold, which achieved a breakthrough on the long-standing protein structure prediction problem.12

The adoption of AI in polymer science has surged in recent years, as reflected in the increasing number of publications on the topic (Fig. 1). In this research field, where traditional trial-and-error methods struggle to navigate the immense combinatorial complexity, ML is unlocking new possibilities by predicting material properties, designing novel polymers, and optimizing synthesis conditions with unprecedented efficiency.13–18 Despite the rapid progress of AI in polymer science,13–19 significant challenges remain. Many researchers, while intrigued by AI’s potential, find themselves overwhelmed by its complexity and the lack of clear entry points. How does AI truly work? What ML techniques are most relevant to polymer research? And how can these tools be effectively implemented?

Fig. 1 Number of publications related to AI in polymer science, extracted from Web of Science using the keywords (‘machine learning’ AND ‘polymer’) OR (‘artificial intelligence’ AND ‘polymer’) within the research areas: Materials Science, Polymer Science, and Chemistry. The geographical distribution highlights the leading contributors to this emerging field. EU: European Union.

This perspective seeks to bridge the gap between polymer science and AI by offering researchers a practical starting point. By focusing on key applications, foundational ML methodologies, and accessible tools, we aim to demystify AI and lower the barriers to its adoption. Rather than claiming complete coverage of the subject, this work serves as a stepping stone, a first step in a learning journey that will require further exploration. For readers seeking deeper technical detail, several recent reviews provide complementary insights. For instance, the work by Aspuru-Guzik and co-workers4 explores the integration of machine learning in self-driving laboratories, with a particular focus on Bayesian optimization, autonomous experiment loops, and decision-making algorithms for molecular and materials discovery. Meanwhile, Stenzel and co-workers20 offer a polymer-focused perspective, addressing challenges in data curation, the translation of chemical structure into machine-readable descriptors, and the practical use of ML for property prediction, synthesis planning, and emerging biomedical applications. These resources are valuable for polymer chemists aiming to move beyond introductory concepts and explore more advanced or specialized AI-driven strategies.

2. AI as a new scientific paradigm in polymer science

As AI gains traction, its role may be misunderstood, especially by new users in fields like polymer science, where it could be easily confused with conventional modeling techniques or industrial automation. Researchers accustomed to traditional computational methods may struggle to differentiate AI from explicitly programmed simulations or automated control systems. This could lead to misconceptions about what truly defines AI and how it differs from other digital tools. As a result, many may assume that any computational system, from molecular simulations to factory sensors, qualifies as AI, a misunderstanding that could blur the distinction between data-driven intelligence and traditional computing.21,22

For example, a scientific calculator could be mistaken for AI because it performs complex calculations. However, it simply follows predefined mathematical rules and provides deterministic outputs, meaning the same input will always yield the same result. It does not learn from data, adapt to user behavior, or refine its responses over time. In contrast, an AI-powered system, such as an adaptive math assistant, could recognize handwritten equations, suggest alternative solutions, and improve its predictions based on past interactions. This distinction highlights a fundamental aspect of AI: it is not just about performing computations but about learning, adapting, and making independent decisions based on data.

Similarly, in industrial settings, the presence of sensors on a machine does not necessarily mean AI is at play. While automation and control systems follow pre-programmed instructions, AI must learn from data, identify patterns, and adapt to new conditions dynamically. True AI in polymer research extends beyond basic automation: it involves self-optimizing synthesis, predictive modeling, and data-driven material discovery.

Unlike molecular dynamics (MD)23 simulations or density functional theory (DFT),24 which rely on explicit physical equations to predict behaviors like phase transitions, chain conformations, or mechanical properties, AI offers an entirely new paradigm by extracting patterns directly from data. This
2458 | Polym. Chem., 2025, 16, 2457–2470 This journal is © The Royal Society of Chemistry 2025
View Article Online
enables accurate predictions even when the underlying physics is not fully understood.25 However, unlike traditional models, many AI techniques function as “black boxes”, making predictions based on complex statistical correlations rather than explicit physical laws.26 This lack of interpretability can lead to skepticism within the scientific community, as AI-generated results may be difficult to explain using physical principles.

For example, Bhattacharya and Patra27 demonstrated that AI could accurately predict polymer phase transitions, such as the coil-to-globule transition, while significantly reducing the [...]

[...] connected neurons that process and learn from data (Fig. 2). The input layer receives raw data, which is then processed through hidden layers where patterns are identified, before reaching the output layer, which generates predictions or classifications. Each artificial neuron refines its parameters over time through training, improving the model’s accuracy. One way to visualize this process is to think of a team of specialists solving a complex puzzle: the first layer gathers basic clues, the middle layers analyze deeper relationships between the clues, and the final layer makes an informed [...]
Open Access Article. Published on 28 April 2025. Downloaded on 8/14/2025 [Link] AM.
Fig. 2 Overview of the main machine learning methods and their applications in polymer science. Deep learning (DL) can be applied across all three categories (supervised, unsupervised, and reinforcement learning) to analyze complex polymer data, predict properties, and optimize synthesis. Examples of experimental data sources used in ML-driven polymer research include Atomic Force Microscopy (AFM), Transmission Electron Microscopy (TEM), and Nuclear Magnetic Resonance (NMR) spectroscopy.
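The layer-by-layer picture of a neural network (input layer, hidden layer, output layer) can be made concrete with a minimal sketch. The example below pushes a single sample through a small feed-forward network using NumPy. The layer sizes, the random weights, and the three descriptor values are illustrative assumptions, not values from any cited study; in practice the weights would be obtained by training on labeled data rather than drawn at random.

```python
import numpy as np

def relu(x):
    # Hidden-layer activation: keep positive signals, zero out the rest.
    return np.maximum(0.0, x)

def softmax(z):
    # Convert raw output scores into probabilities that sum to 1.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(x, params):
    """Propagate one input vector through input -> hidden -> output."""
    W1, b1, W2, b2 = params
    hidden = relu(W1 @ x + b1)        # hidden layer: weighted sums + nonlinearity
    return softmax(W2 @ hidden + b2)  # output layer: class probabilities

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 3, 5, 2   # assumed sizes, for illustration only
params = (
    rng.normal(size=(n_hidden, n_inputs)), np.zeros(n_hidden),
    rng.normal(size=(n_classes, n_hidden)), np.zeros(n_classes),
)

x = np.array([0.2, 1.5, -0.7])  # hypothetical, already-scaled descriptors
probs = forward(x, params)
print(probs)
```

Training would repeatedly adjust `W1`, `b1`, `W2`, and `b2` to reduce the mismatch between `probs` and the known labels; libraries such as PyTorch or TensorFlow automate that step.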
[...] first converts the chemical structures of polymers and solvents into numerical descriptors that encode key molecular properties such as size, polarity, and functional groups. These descriptors are then compressed into a simplified mathematical representation (known as a latent space), where the neural network detects patterns that govern polymer–solvent interactions. Finally, the trained model predicts whether a new polymer–solvent pair will be compatible. This approach achieved 93% accuracy, significantly outperforming traditional heuristic methods such as the Hildebrand and Hansen solubility parameters. Such advancements are particularly valuable in plastics recycling, membrane science, and drug delivery, where selecting the appropriate solvent is essential for material processing and performance.

Fig. 3 Machine learning workflow for predicting polymer–solvent compatibility. The trained neural network model processes polymer and solvent descriptors separately, transforming them into latent space representations before merging them for final classification. The model evaluates a given polymer structure against 24 solvents and predicts whether they act as good solvents or non-solvents based on learned compatibility patterns. Reproduced with permission from ref. 41 Copyright 2020, American Chemical Society.

In another application, Lu et al.36 employed SL to predict phase behavior in polymerization-induced self-assembly (PISA) using random forest models, a widely used decision tree-based algorithm for classification tasks. Their model was trained on a dataset of 592 experimental data points, where each entry was labeled with the experimentally observed morphology (e.g., spheres, worms, or vesicles). By analyzing key features such as monomer composition, polymerization conditions, and block ratio, the algorithm learned to classify new PISA systems with high accuracy. A key advantage of this approach is its interpretability, allowing researchers to identify which molecular parameters most influence phase transitions.

Building on this foundation, Fonseca Parra et al.37 employed a DL framework to construct 3D pseudo-phase diagrams for block copolymers (Fig. 4). Their approach utilized a deep neural network trained on literature data to capture complex morphology transitions. Unlike traditional 2D phase diagrams that only consider a few experimental variables, their model incorporates multiple processing parameters simultaneously, offering a predictive understanding of phase behavior. The neural network learns nonlinear relationships between polymer composition, concentration, and self-assembly behavior, making it a more powerful tool for predicting morphologies that may not follow simple heuristic rules.

SL has been used to automate complex data analysis tasks, particularly in microscopy image processing. A significant challenge in polymer nanocomposite research is the precise localization and characterization of nanoparticles within polymer matrices, which is traditionally done manually or with labor-intensive image analysis techniques. To address this, Qu et al.82 developed a deep learning-based method to detect and quantify nanoparticles in transmission electron microscopy (TEM) images. Their approach, summarized in Fig. 5, involves an SL pipeline in which a convolutional neural network (CNN), a specific type of neural network, is trained on labeled datasets of nanoparticle positions and sizes. The dataset consists of 72 TEM images, from which 279 057 labeled sub-images were extracted using an automated cropping and labeling method (DOPAD). Once trained, the model accurately predicts the positions and sizes of nanoparticles in new TEM images, significantly improving the speed and precision of nanoparticle characterization compared to manual methods. This technique enhances polymer nanocomposite analysis, facilitating research in advanced materials, coatings, and functional polymer-based nanotechnologies.

It is important to differentiate between types of input data when designing supervised learning pipelines. While property prediction tasks (e.g., Tg, solubility) typically rely on structured chemical descriptors derived from SMILES or molecular fingerprints, image-based analyses (e.g., TEM, AFM) require entirely different approaches. These involve models such as CNNs or object detection architectures like YOLO, which
Fig. 4 Deep learning workflow for predicting 3D pseudo-phase diagrams of copolymer self-assembly. Experimental data were collected from the
literature and processed to ensure consistency before being used to train a deep neural network. The model classifies polymer compositions into
different self-assembled morphologies—spheres (S), worms (W), or vesicles (V)—and generates high-resolution 3D pseudo-phase diagrams.
Reproduced with permission from ref. 37 Copyright 2025, American Chemical Society.
Fig. 5 Supervised learning workflow for nanoparticle detection in polymer nanocomposites using a Convolutional Neural Network (CNN). A
dataset of 72 TEM images was processed into 279 057 labeled sub-images using automated labeling and cropping (DOPAD). The trained CNN model
detects and localizes nanoparticles in new images, predicting their positions and sizes with high accuracy, thereby streamlining the characterization
process. Reproduced with permission from ref. 82 Copyright 2021, American Chemical Society.
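The pixel-level operation at the heart of a CNN can be illustrated with a minimal sketch. The code below slides a small filter over a synthetic image and locates the position of strongest response. It is not the DOPAD procedure or the network of ref. 82: the 12 × 12 image, the single bright square standing in for a nanoparticle, and the hand-made averaging filter are all assumptions for illustration, whereas a real CNN learns many filters from labeled images.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic "micrograph": dark background with one bright 3x3 particle
# occupying rows 4-6 and columns 7-9 (centre pixel at row 5, column 8).
image = np.zeros((12, 12))
image[4:7, 7:10] = 1.0

# Hand-made 3x3 averaging filter that responds most strongly to a full blob.
kernel = np.ones((3, 3)) / 9.0

response = correlate2d_valid(image, kernel)
top, left = np.unravel_index(np.argmax(response), response.shape)
center = (int(top) + 1, int(left) + 1)  # shift by the kernel half-width
print(center)  # (5, 8)
```

Stacking many such filters, followed by nonlinearities and pooling, is what allows a trained network to report particle positions and sizes rather than a single peak.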
operate directly on pixel-level information. Each data modality presents unique challenges: image data often demands extensive annotation and data augmentation strategies, while descriptor-based models are sensitive to the choice and quality of input features. Recognizing and adapting to these differences is crucial for model performance and interpretability.

3.2. Unsupervised learning

Unsupervised learning (UL) is a powerful approach that identifies patterns in unlabeled data, meaning that no predefined outputs are available.59 Unlike supervised learning, which relies on explicit input–output pairs, UL models explore data autonomously to detect hidden structures, clusters, or relationships. In other words, it is like a student analyzing books independently to identify common themes without a teacher guiding them.

This makes UL particularly valuable for understanding complex polymer datasets where experimental labels may be scarce or difficult to define. UL is especially useful for clustering, where polymers with similar chemical properties or structural characteristics are grouped together, and for dimensionality reduction, which simplifies high-dimensional polymer datasets while preserving essential information.84,85

UL techniques have been successfully applied in polymer research to extract meaningful insights from complex datasets. Ziolek et al.55 used UL methods to investigate the nanoscale structure of micelles formed by four-arm and linear block copolymers. By clustering molecular conformations, they identified groups of micelle structures with similar corona arrangements, while dimensionality reduction helped simplify the complex structural variations. Their approach provided deeper insights into self-assembly mechanisms, which are crucial for drug delivery and biomaterials development.

Another interesting example is the work of Sutliff et al.,33 who applied UL to analyze near-infrared (NIR) spectra of polyolefins. NIR spectroscopy generates rich spectral data containing valuable chemical information, but interpreting this data manually is challenging due to its complexity. To simplify the analysis, the researchers used functional principal component analysis (fPCA), a mathematical technique that transforms the original complex data into a smaller number of new variables called principal components. These components are calculated in such a way that they retain most of the variability present in the original data. In simpler terms, fPCA acts like a “compression” method that keeps the most important chemical signals while filtering out noise and redundancy. In this case, each spectrum was treated as a function across wavelengths, and fPCA identified common patterns (or “shapes”) across the spectra. This allowed the researchers to cluster the polyolefins based on similarities in their spectral fingerprints, without requiring prior labeling of the samples (Fig. 6). This dimensionality reduction not only made the dataset easier to visualize and interpret, but also highlighted meaningful groupings linked to polymer composition and structure. As a result, UL revealed chemical trends that would have been difficult to extract using traditional analysis methods.

Fig. 6 Workflow of the unsupervised learning (UL) approach applied to polyolefins using near-infrared (NIR) spectroscopy. (1) Raw NIR spectra of different polymer types: polypropylene (PP), low-density polyethylene (LDPE), linear low-density polyethylene (LLDPE), medium-density polyethylene (MDPE), high-density polyethylene (HDPE), and polypropylene-co-polyethylene (PP-co-PE). (2) Functional principal component analysis (fPCA) reduces the spectral data into a low-dimensional space, clustering samples based on spectral similarities. (3) The extracted principal components correlate with crystallinity, demonstrating how UL can reveal hidden relationships in polymer data without predefined labels. Reproduced with permission from ref. 33 Copyright 2024, American Chemical Society.

3.3. Reinforcement learning and closed-loop optimization

Reinforcement Learning (RL) is a distinct category of machine learning in which models learn by interacting with an environment and receiving rewards for taking optimal actions.60 Unlike SL, where models are trained on labeled datasets, RL algorithms discover optimal strategies through trial and error, making them particularly suited for tasks requiring sequential decision-making. A useful analogy is that of a child learning that fire is dangerous only after touching it: the knowledge is gained through direct experience rather than prior instruction.

Compared to supervised and unsupervised learning, RL is significantly more complex as it involves sequential decision-making, long-term reward optimization, and an exploration–exploitation trade-off. Unlike models that learn from static datasets, RL dynamically adjusts strategies based on continuous feedback, requiring extensive computational resources and advanced algorithms. These properties make RL a powerful tool for optimizing polymerization processes and autonomous experimental control, but they also contribute to its greater mathematical and implementation complexity.3,48,86,87

Li et al.87 developed a strategy to regulate the molecular weight distribution (MWD) in atom transfer radical polymerization (ATRP). Instead of relying on predefined reaction protocols, their model learns dynamically by interacting with the polymerization process. As illustrated in Fig. 7, the system
Fig. 7 Reinforcement learning framework for optimizing molecular weight distribution (MWD) in atom transfer radical polymerization (ATRP). The
AI agent observes the reaction state, selects actions (adjusting reagent addition), and updates its strategy based on real-time feedback and reward
evaluation, iteratively improving polymerization outcomes. Reproduced with permission from ref. 87 Copyright 2018, Royal Society of Chemistry.
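The agent–environment loop summarized in Fig. 7 can be sketched with tabular Q-learning on a toy problem. This is a deliberately simplified stand-in, not the method of ref. 87, which relies on policy and value networks acting on an ATRP process. Here the "reactor", its two actions (wait or add one dose of reagent), the target of three doses, and all hyperparameters are invented purely for illustration.

```python
import random
from collections import defaultdict

N_STEPS = 6        # decision points per simulated batch (assumption)
TARGET_DOSES = 3   # hypothetical optimum number of reagent additions
ACTIONS = (0, 1)   # 0 = wait, 1 = add one dose of reagent

def step(state, action):
    """Toy reactor: the state is (time step, doses added so far)."""
    t, doses = state
    next_state = (t + 1, doses + action)
    done = next_state[0] == N_STEPS
    # Reward arrives only at the end: closer to the target is better.
    reward = -abs(next_state[1] - TARGET_DOSES) if done else 0.0
    return next_state, reward, done

def greedy_action(q, state):
    return max(ACTIONS, key=lambda a: q[(state, a)])

def train(episodes=3000, alpha=0.2, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table: estimated return of (state, action)
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # Exploration-exploitation: mostly greedy, occasionally random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update toward reward plus best estimated future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def rollout(q):
    """Follow the learned greedy policy for one batch."""
    state, done, total = (0, 0), False, 0.0
    while not done:
        state, reward, done = step(state, greedy_action(q, state))
        total += reward
    return state[1], total

q_table = train()
final_doses, total_reward = rollout(q_table)
print(final_doses, total_reward)
```

The same observe, act, reward, update cycle carries over to real systems, where the table is replaced by neural networks and the toy reward by a measure of how closely the measured MWD matches the target.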
follows a classic RL framework, where the reactor acts as the environment, and the AI agent (policy network and value network) selects reagent addition strategies based on observed reaction states (e.g., monomer and initiator concentrations). The model continuously compares the current MWD to the target distribution (e.g., Gaussian or bimodal profiles) and updates its decision-making policy based on rewards received for achieving optimal polymer properties. By iteratively refining reagent addition, the RL-based system optimizes ATRP conditions in real time, improving precision in molecular weight control and enabling the design of custom polymer architectures with minimal experimental trials.

While RL holds great promise, its application in polymer science remains limited by several factors. RL typically requires either extensive real-time experimentation or high-fidelity simulation environments, both of which are resource-intensive. Moreover, defining suitable reward functions and action spaces for polymer systems can be non-trivial. As such, RL may be best suited for narrowly defined problems (e.g., optimizing a specific polymerization protocol) rather than broad exploratory tasks. Hybrid strategies that combine RL with Bayesian optimization (BO)88 or SL may offer more practical solutions in the near term. A recent example by Pittaway et al.89 illustrates how such hybrid strategies can be implemented in practice, combining multi-objective BO with real-time analytical feedback from dynamic light scattering (DLS) to enable closed-loop self-optimization of emulsion polymerization in a continuous-flow reactor platform.

Warren et al.48 developed an AI-driven closed-loop polymerization system to optimize reversible addition–fragmentation chain transfer (RAFT) polymerization conditions, achieving targeted molecular weight and dispersity with minimal experimental trials. Their approach (Fig. 8) integrates real-time experimental feedback with BO, specifically the Thompson Sampling Efficient Multi-Objective Optimization (TSEMO) algorithm. The system iteratively tests reaction conditions, evaluates the results, and refines its strategy based on real-time feedback from nuclear magnetic resonance (NMR) and gel permeation chromatography (GPC). Instead of relying on predefined datasets, the platform learns from its own experiments, systematically adjusting temperature and reaction time
Fig. 8 AI-guided closed-loop optimization of reversible addition–fragmentation chain transfer (RAFT) polymerization using Bayesian optimization.
The system integrates real-time feedback from nuclear magnetic resonance (NMR) and gel permeation chromatography (GPC) to dynamically adjust
reaction parameters such as temperature and time, optimizing monomer conversion and controlling molar mass dispersity (Đ). The panels show (a)
a generalized scheme for the RAFT synthesis platform, (b) representative GPC chromatograms, (c) 1H NMR spectra, (d) a schematic of the automated
platform, and (e) an overview of the structure of the Thompson-sampling efficient multi-objective optimisation (TSEMO) algorithm-based experi-
ments. Reproduced with permission from ref. 48 Copyright 2022, Royal Society of Chemistry.
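The closed-loop idea in Fig. 8 (propose a condition, run the experiment, update the model, repeat) can be sketched with a much simpler relative of TSEMO. The example below uses Thompson sampling with an independent Gaussian belief for each candidate temperature and a single objective. It is not the TSEMO algorithm, which uses Gaussian-process models and handles several objectives at once; the `run_experiment` function, its optimum near 75 °C, the noise level, and the candidate grid are all hypothetical.

```python
import math
import random

def run_experiment(temperature_c, rng):
    """Hypothetical noisy 'experiment': one score that peaks near 75 C."""
    ideal = math.exp(-((temperature_c - 75.0) / 15.0) ** 2)
    return ideal + rng.gauss(0.0, 0.05)

def thompson_loop(candidates, budget=40, noise_var=0.05 ** 2, seed=1):
    rng = random.Random(seed)
    # Independent Gaussian belief per candidate (prior mean 0.5, variance 1.0).
    mean = {c: 0.5 for c in candidates}
    var = {c: 1.0 for c in candidates}
    history = []
    for _ in range(budget):
        # Draw one plausible score per candidate from the current belief.
        # Uncertain candidates produce widely spread draws (exploration);
        # well-measured good candidates produce reliably high draws (exploitation).
        draws = {c: rng.gauss(mean[c], math.sqrt(var[c])) for c in candidates}
        choice = max(draws, key=draws.get)
        result = run_experiment(choice, rng)
        history.append((choice, result))
        # Conjugate Gaussian update of the belief for the tested condition.
        precision = 1.0 / var[choice] + 1.0 / noise_var
        mean[choice] = (mean[choice] / var[choice] + result / noise_var) / precision
        var[choice] = 1.0 / precision
    return history, mean

candidates = list(range(50, 101, 5))  # candidate temperatures in C (assumed grid)
history, posterior_mean = thompson_loop(candidates)
tried = {c for c, _ in history}
best_temperature = max(tried, key=posterior_mean.get)
print(best_temperature, len(history))
```

In a real platform, `run_experiment` would be replaced by the automated reactor and its NMR and GPC readouts, and the per-candidate beliefs by a surrogate model that shares information between neighbouring conditions.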
to maximize monomer conversion while minimizing dispersity. To make informed decisions, it builds a predictive model that estimates the outcome of untested reaction conditions, and uses this model to select the most informative next experiments. The algorithm balances exploration (testing uncertain regions of the parameter space) and exploitation (focusing on promising conditions), enabling efficient optimization across multiple objectives.

Despite not being a pure RL system, the work by Warren et al. compellingly demonstrates how autonomous experimentation and adaptive optimization can be applied to complex polymer synthesis challenges. This approach lays the groundwork for semi-autonomous, self-learning platforms that reduce human workload and enable more precise control over polymerization processes. It represents a significant step toward fully integrated AI-driven material discovery.

Through these simplified examples, we have demonstrated the diverse potential of ML in polymer science, from predicting polymer properties to autonomously optimizing synthesis conditions. Each ML technique (supervised, unsupervised, and reinforcement learning) offers distinct capabilities, whether for making accurate property predictions, uncovering hidden patterns, or enabling self-learning experimental workflows. These methods differ in learning process, computational complexity, and scope of application. To provide a structured comparison, Table 1 summarizes the key characteristics of each ML approach, highlighting their data requirements, optimization strategies, and relevance to polymer research.

While machine learning offers powerful tools to accelerate discovery and optimize polymer systems, it is important to emphasize that it is not always the most effective or appropriate solution. In certain contexts, especially when the system is well-characterized or the design space is limited, simpler programmatic screening approaches may outperform more sophisticated ML-based optimization methods. As such, comparative benchmarking and critical method selection should remain integral to any data-driven strategy in polymer science.

4. Real-world ML tools

To facilitate the integration of ML in polymer science, numerous AI tools and platforms are available to support data management, analysis, and modeling. Table 2 organizes these resources by functionality, from no-code ML platforms and execution environments to data handling libraries, visualization tools, cheminformatics toolkits, and polymer-specific repositories. Open-source tools play a central role in this ecosystem, fostering transparency, reproducibility, and accessibility, and empowering a broader scientific community to engage in AI-driven materials discovery.
Feature | Supervised learning (SL) | Unsupervised learning (UL) | Reinforcement learning (RL)
Data type | Labeled data (input–output pairs) | Unlabeled data (finding patterns) | No predefined labels, learns from interaction
Goal | Predict outputs (classification/regression) | Cluster/group similar data or reduce dimensions | Learn a sequence of actions to maximize rewards
Learning process | Learns from explicit examples | Identifies hidden structures autonomously | Learns by trial & error via environment feedback
Optimization focus | Minimize loss (error) | Find clusters, patterns, representations | Maximize long-term rewards
Computational complexity | Moderate | Moderate to high | Very high (complex decision-making)
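The unsupervised goal listed in the comparison above (reduce dimensions, then group similar samples) can be sketched with ordinary principal component analysis on synthetic spectra. This is plain PCA computed by singular value decomposition, a simpler relative of the functional PCA used for the NIR data, and the two families of Gaussian-band "spectra" are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 200)  # arbitrary wavelength axis

def band(center, width=0.05):
    """Gaussian absorption band on the shared wavelength axis."""
    return np.exp(-((wavelengths - center) / width) ** 2)

# Two hypothetical sample families: B carries an extra band that A lacks.
family_a = [band(0.3) + 0.02 * rng.normal(size=200) for _ in range(20)]
family_b = [band(0.3) + 0.8 * band(0.7) + 0.02 * rng.normal(size=200)
            for _ in range(20)]
spectra = np.array(family_a + family_b)   # shape (40 samples, 200 points)

# PCA via SVD of the mean-centered data; no labels are used anywhere.
centered = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :2] * S[:2]                 # each spectrum reduced to 2 numbers
explained = S ** 2 / np.sum(S ** 2)       # variance fraction per component

# The sign of the first score already separates the two families.
groups = scores[:, 0] > 0
print(f"variance explained by PC1: {explained[0]:.2f}")
print(groups.astype(int))
```

Each 200-point spectrum is compressed to two scores, and the grouping emerges from the data alone; a clustering algorithm such as k-means could then be applied to `scores` when the separation is less obvious.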
Table 2 Overview of essential tools and platforms for machine learning in polymer science. OS: open-source, FT: free-tier, and P: proprietary

Category | Tools and platforms | Function
No-code/low-code ML platforms | Teachable Machine (FT), Weka (OS), KNIME (OS), Google AutoML (P), Azure ML (P) | ML without coding via graphical interfaces; ideal for classification, clustering, basic workflows
Programming and execution environments | Google Colab (FT), Jupyter Notebooks (OS), Anaconda (FT), Python (OS) | Interactive coding, script execution, environment management for data science workflows
Data manipulation & preprocessing | NumPy (OS), Pandas (OS) | Efficient handling of arrays, tables, and structured experimental data
Data visualization | Matplotlib (OS), Seaborn (OS) | Graphical representation of data and model outputs for analysis and communication
Machine learning libraries | Scikit-learn (OS), TensorFlow (OS), PyTorch (OS) | Libraries for classical machine learning and deep learning: regression, classification, neural networks
Chemical representation & descriptors | SMILES, BigSMILES (OS), RDKit (OS) | Encoding of molecular/polymeric structures and generation of chemical descriptors
Polymer data repositories | Polymer Genome (FT), PoLyInfo, PI1M, CROW, NIST DB (R/FT), Materials Project (FT) | Databases of experimental and computational polymer properties
Collaboration & sharing platforms | GitHub (OS), Zenodo (OS), Hugging Face (OS), Figshare (OS) | Hosting of code, datasets, and trained models; support for version control and DOI-based citation
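As a rough illustration of how the libraries in Table 2 fit together, the sketch below trains a random forest classifier with scikit-learn on a synthetic morphology dataset and inspects which inputs the model relied on. It mirrors the kind of supervised morphology classification discussed earlier but does not reproduce any published model: the feature names, their ranges, and the toy labeling rule are assumptions made purely for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_samples = 600

# Hypothetical features (names and ranges are assumptions for illustration).
core_dp = rng.uniform(50, 500, n_samples)   # core-block degree of polymerization
solids_wt = rng.uniform(5, 30, n_samples)   # solids content, wt%
irrelevant = rng.normal(size=n_samples)     # deliberately uninformative column

# Toy labeling rule standing in for experimentally observed morphologies.
score = core_dp / 500 + solids_wt / 60
morphology = np.where(score < 0.5, "sphere",
                      np.where(score < 0.9, "worm", "vesicle"))

# Hold out a quarter of the data to estimate performance on unseen systems.
X = np.column_stack([core_dp, solids_wt, irrelevant])
X_train, X_test, y_train, y_test = train_test_split(
    X, morphology, test_size=0.25, random_state=0, stratify=morphology)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
importances = dict(zip(["core_dp", "solids_wt", "irrelevant"],
                       model.feature_importances_))
print(f"held-out accuracy: {accuracy:.2f}")
print(importances)
```

The `feature_importances_` attribute is what gives tree ensembles their practical interpretability: an uninformative column should receive a low score, while the parameters that drive the labels should dominate.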
Open Access Article. Published on 28 April 2025. Downloaded on 8/14/2025 [Link] AM.
For researchers new to ML, Python has become the primary programming language due to its simplicity, flexibility, and extensive ecosystem of scientific libraries. User-friendly platforms like Google Colab and Jupyter Notebooks provide interactive coding environments, allowing researchers to write and execute Python code without requiring advanced computational resources or complex installations. These tools facilitate key tasks such as loading datasets, cleaning and preprocessing data, as well as applying ML models. Open-source libraries such as Pandas and NumPy streamline data handling and numerical processing, while visualization libraries such as Matplotlib and Seaborn enable researchers to generate high-quality scientific graphs and complex data visualizations.
For researchers who prefer minimal coding, no-code or low-code platforms provide an alternative entry point. KNIME, for instance, offers a drag-and-drop interface for building ML workflows, making it possible to preprocess data, train models, and evaluate predictions without writing code. Similarly, Teachable Machine by Google simplifies classification tasks, while platforms like Google AutoML and Azure ML enable researchers to train custom models through intuitive web interfaces.
A significant application of ML in polymer science is the processing of molecular representations using cheminformatics tools. RDKit converts chemical structures into machine-readable formats, such as SMILES strings or molecular fingerprints, which serve as inputs for ML models. BigSMILES (ref. 90) extends this functionality to stochastic polymers, allowing for the representation of structural variations in polymer chains. Meanwhile, Polymer Genome offers pre-trained models for polymer property prediction, facilitating rapid screening of polymer candidates based on molecular descriptors.
Navigating and analyzing large polymer datasets is another common challenge that ML tools effectively address. For example, using Python’s Pandas library, a researcher can filter polymers based on molecular weight, calculate property correlations, or generate statistical insights within seconds, tasks that would be time-consuming with traditional tools like Excel. These workflows accelerate analysis, improve reproducibility, and enhance data-driven decision-making.
With the growing accessibility of open-source libraries, user-friendly platforms, and pre-trained ML models, integrating ML into polymer research has never been more feasible. Researchers can start with beginner-friendly tools such as Scikit-learn for predictive modeling or KNIME for workflow automation, progressively expanding their expertise into deep learning frameworks like TensorFlow and PyTorch as needed.
5. Challenges and considerations
While AI holds great promise for transforming polymer science, its integration into the field requires overcoming several key challenges. The growing number of AI-driven studies (Fig. 1) reflects increasing interest, particularly in machine learning techniques. However, despite this surge in research, significant barriers still hinder widespread ML adoption in experimental and industrial settings.
These challenges stem from data availability and quality issues, the learning curve for polymer scientists, computational constraints, and the lack of standardized frameworks for integrating ML into polymer research. Addressing these obstacles is essential to ensuring that ML evolves from a promising concept into an accessible, widely used tool. The following sections outline key hurdles and potential solutions to bridge the gap between AI potential and real-world implementation in polymer science.
5.1. Data resources: availability, accessibility, and challenges
The integration of ML into polymer science relies heavily on the availability of structured, high-quality datasets and collaborative coding platforms. Several initiatives (Table 2) have been developed to support researchers by providing curated databases, machine-readable polymer representations, and repositories for sharing ML models. These resources enable scientists to train and fine-tune ML models effectively, accelerating both material discovery and AI-driven innovation. Among these, Polymer Genome offers ML-driven polymer property predictions, Materials Project includes computationally derived polymer-related data, and the NIST Polymer Database compiles experimentally validated polymer properties, serving as a benchmark for AI applications. Other domain-specific resources such as PoLyInfo and PI1M also offer structured datasets of polymer structures and properties, though often with limited interoperability.
Although these platforms are growing in number, most available datasets in polymer science still fall into the “small data” category, typically comprising dozens to hundreds of entries, often collected manually or extracted from literature. This contrasts sharply with big data contexts and limits the scope and robustness of ML models, particularly for deep learning applications. Addressing this issue requires both community-driven data generation and improved access to standardized, high-volume datasets.
Beyond polymer-specific databases, various general platforms facilitate collaborative coding, AI model sharing, and data accessibility, which can be leveraged by the polymer science community (Table 2). These platforms not only facilitate interdisciplinary collaboration but also serve as prototypes for developing specialized equivalents tailored to polymer research. Hugging Face is widely recognized for its repository of pre-trained ML models, including polymer-specific tools, while Zenodo serves as an open-access repository for structured datasets and ML models, ensuring proper attribution through Digital Object Identifiers (DOIs). Meanwhile, GitHub remains an essential platform for collaborative coding, dataset hosting, and version-controlled AI workflows, enhancing transparency and reproducibility.
Despite the increasing availability of these resources, significant challenges persist in data standardization and accessibility. Many studies still suffer from fragmented, inconsistent,
or inaccessible datasets, often lacking sufficient metadata or omitting critical details about synthesis conditions, characterization techniques, and experimental outcomes. Without standardized data-sharing protocols, polymer science risks falling behind disciplines such as biology and materials science, where open data practices have already enabled rapid AI and ML adoption. Scientific journals and funding agencies should take an active role in addressing this issue by mandating structured dataset publication alongside research articles to enhance reproducibility and accessibility. Establishing community-wide norms for data collection, annotation, and dissemination is essential for creating interoperable datasets that serve as a foundation for ML-driven polymer research.
To move from raw data to ML-ready datasets, researchers are encouraged to consider the following workflow: (i) standardize chemical representation (e.g., using SMILES or BigSMILES), (ii) enrich datasets with metadata (synthesis conditions, characterization techniques), (iii) perform basic data cleaning (handling missing values, duplicates), and (iv) publish structured datasets via open platforms such as Zenodo, GitHub, or the Polymer Genome repository. Ensuring datasets are machine-readable (CSV, JSON, HDF5) and version-controlled is essential for reproducibility. Additionally, researchers are encouraged not only to share datasets but also to publish their ML workflows and, if possible, pre-trained models, to foster transparency and collaboration. Open-source initiatives and collaborative coding environments have the potential to reduce redundancy, improve model accuracy, and create a shared knowledge base that benefits the entire field. Whenever applicable, both data and code should comply with the FAIR principles (Findable, Accessible, Interoperable, and Reusable). By moving toward a more open and collaborative research culture, the polymer community can fully harness ML’s potential, ensuring that data is widely available, standardized, and effectively utilized for accelerating material discovery and polymer informatics.
5.2. Educational gaps in polymer science: the need for interdisciplinarity
The adoption of AI in polymer science represents a fundamental shift for many researchers accustomed to empirical methods or traditional computational approaches. While short-term collaborations between polymer scientists and AI experts help bridge this gap, the long-term solution lies in integrating AI and ML education into polymer science curricula. Given the specialized nature of polymer science and its experimental nuances, teaching AI and ML to polymer researchers is often more practical than training computer scientists in polymer chemistry and engineering.
Despite the growing impact of AI on materials research, structured AI education within polymer science curricula remains scarce. Few master’s programs offer specialized training that integrates polymer science and data-driven approaches, limiting the number of researchers capable of advancing ML-driven polymer research. This educational gap not only slows academic progress but also affects the polymer industry, where demand for interdisciplinary expertise is increasing.
Several industrial leaders have already integrated AI-driven strategies into their research and development efforts. BASF has invested in AI for materials discovery, Dow Chemical is exploring ML for process optimization, Covestro is leveraging AI for sustainable polymer design, and Arkema has initiated AI-based material innovation programs. However, the full potential of AI in the polymer industry remains underutilized, largely due to the limited availability of professionals who can bridge the gap between data science and polymer engineering.
To close this gap, universities should introduce ML, data science, and AI courses specifically tailored to polymer science applications. Early exposure to AI tools and computational methods will enable future polymer researchers to integrate these techniques into their workflows with confidence. Additionally, workshops, summer schools, and online training programs should be expanded to provide current researchers and industry professionals with foundational ML and AI skills. These initiatives will ensure that AI adoption in polymer science is not limited to a small group of interdisciplinary experts but becomes a standard component of both academic and industrial education.
5.3. Computational costs
Integrating AI into polymer research requires substantial computational power, particularly for deep learning and other data-intensive techniques. Training large neural networks or analyzing high-dimensional datasets from molecular simulations or spectroscopy can be highly resource-intensive, making access to high-performance computing (HPC) infrastructure a limiting factor for many academic and industrial laboratories.
To address these challenges, government-led initiatives worldwide provide researchers with access to advanced computing facilities:
France and Europe. In France, GENCI (Grand Équipement National de Calcul Intensif) provides state-of-the-art supercomputing resources, such as the Jean Zay supercomputer, which is optimized for AI applications. At the European level, the EuroHPC (European High-Performance Computing) program offers access to world-class infrastructures like LUMI (Finland) and MeluXina (Luxembourg), designed to support ambitious scientific projects, including AI-driven research in materials science.
USA. The Department of Energy (DOE) provides access to supercomputers such as Summit and Frontier, which are among the most powerful in the world. These facilities are made available to researchers through collaborative programs with universities and national labs, supporting innovative interdisciplinary research.
Asia. In Japan, the RIKEN Center for Computational Science operates the Fugaku supercomputer, one of the most powerful systems globally, which is accessible to researchers across multiple disciplines. Similarly, China has invested heavily in AI-focused supercomputing facilities in cities like Tianjin and Shenzhen, fostering rapid advancements in computational science.
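Not every step requires HPC resources. Returning to the dataset preparation workflow of Section 5.1 (standardize, enrich with metadata, clean, publish in a machine-readable format), the sketch below shows that the cleaning and export steps can run on any laptop using only the Python standard library. It is a minimal sketch under stated assumptions: the records, the metadata fields, and the helpers `clean` and `to_csv` are hypothetical, the SMILES strings describe monomers rather than polymers, and the "standardization" here only trims whitespace. A real workflow would canonicalize structures with a cheminformatics toolkit such as RDKit.

```python
import csv
import io
import json

# Hypothetical raw records; SMILES strings and values are illustrative only.
raw_records = [
    {"smiles": " C=Cc1ccccc1 ", "tg_c": "100.0"},
    {"smiles": "C=Cc1ccccc1", "tg_c": "100.0"},   # duplicate once whitespace is trimmed
    {"smiles": "C=C(C)C(=O)OC", "tg_c": ""},      # missing property value
    {"smiles": "C=CC#N", "tg_c": "95.0"},
]

# Step (ii): hypothetical metadata describing how the values were obtained.
metadata = {"technique": "DSC", "heating_rate_c_min": 10, "source": "example"}


def clean(records):
    """Steps (i) and (iii): trim SMILES, drop incomplete rows, remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        smiles = rec["smiles"].strip()
        tg_text = rec["tg_c"].strip()
        if not smiles or not tg_text:
            continue  # skip rows with missing values
        key = (smiles, tg_text)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"smiles": smiles, "tg_c": float(tg_text)})
    return cleaned


def to_csv(records):
    """Step (iv): serialize the cleaned records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["smiles", "tg_c"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


cleaned = clean(raw_records)
csv_text = to_csv(cleaned)
json_text = json.dumps({"metadata": metadata, "records": cleaned}, indent=2)

print(csv_text)
print(json_text)
```

Keeping the metadata in the same JSON document as the records is one simple way to keep provenance attached to the data; the resulting text files can then be placed under version control or deposited on a platform such as Zenodo.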
16 W. Sha, Y. Li, S. Tang, J. Tian, Y. Zhao, Y. Guo, W. Zhang, X. Zhang, S. Lu, Y.-C. Cao and S. Cheng, InfoMat, 2021, 3, 353–361.
17 T. B. Martin and D. J. Audus, ACS Polym. Au, 2023, 3, 239–258.
18 J. Wang, K. Tian, D. Li, M. Chen, X. Feng, Y. Zhang, Y. Wang and B. Van der Bruggen, Sep. Purif. Technol., 2023, 313, 123493.
19 Y. K. Zhao, R. J. Mulder, S. Houshyar and T. C. Le, Polym. Chem., 2023, 14, 3325–3346.
20 W. Ge, R. De Silva, Y. Fan, S. A. Sisson and M. H. Stenzel,
42 A. Bera, T. S. Akash, R. Ishraaq, T. H. Pial and S. Das, Macromolecules, 2024, 57, 1581–1592.
43 L. A. Miccio and G. A. Schwartz, Macromolecules, 2021, 54, 1811–1817.
44 L. Schneider and J. J. de Pablo, Macromolecules, 2021, 54, 10074–10085.
45 Y. Zhang and X. J. Xu, Polym. Chem., 2021, 12, 843–851.
46 N. H. Park, D. Y. Zubarev, J. L. Hedrick, V. Kiyek, C. Corbet and S. Lottier, Macromolecules, 2020, 53, 10847–10854.
47 B. K. Wheatle, E. F. Fuentes, N. A. Lynd and V. Ganesan,
66 E. S. Thrall, F. Martinez Lopez, T. J. Egg, S. E. Lee, J. Schrier and Y. Zhao, J. Chem. Educ., 2023, 100, 4933–4940.
67 A. M. Hupp, J. Chem. Educ., 2023, 100, 1377–1381.
68 L. Breiman, Mach. Learn., 2001, 45, 5–32.
69 R. G. Brereton and G. R. Lloyd, Analyst, 2010, 135, 230–267.
70 H. Abdi and L. J. Williams, Wiley Interdiscip. Rev.: Comput. Stat., 2010, 2, 433–459.
71 A. Likas, N. Vlassis and J. J. Verbeek, Pattern Recognit., 2003, 36, 451–461.
72 A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow, O’Reilly Media, Inc., 2022.
73 B. Mahesh, Int. J. Sci. Res., 2020, 9, 381–386.
74 F. Chollet, Deep learning with Python, Simon and Schuster, 2021.
75 F. Rosenblatt, Psychol. Rev., 1958, 65, 386.
76 Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, Proc. IEEE, 1998, 86, 2278–2324.
77 J. L. Elman, Cognit. Sci., 1990, 14, 179–211.
78 S. Hochreiter, Neural Computation, MIT-Press, 1997.
79 G. E. Hinton, S. Osindero and Y.-W. Teh, Neural Comput., 2006, 18, 1527–1554.
80 M. E. Deagen, B. Dalle-Cort, N. J. Rebello, T.-S. Lin, D. J. Walsh and B. D. Olsen, Macromolecules, 2024, 57, 42–53.
81 T. Jin, C. W. Coley and A. Alexander-Katz, Macromolecules, 2023, 56, 1798–1809.
82 E. Z. Qu, A. M. Jimenez, S. K. Kumar and K. Zhang, Macromolecules, 2021, 54, 3034–3040.
83 E. van de Reydt, N. Marom, J. Saunderson, M. Boley and T. Junkers, Polym. Chem., 2023, 14, 1622–1629.
84 S. Solorio-Fernández, J. A. Carrasco-Ochoa and J. F. Martínez-Trinidad, Artif. Intell. Rev., 2020, 53, 907–948.
85 A. Saxena, M. Prasad, A. Gupta, N. Bharill, O. P. Patel, A. Tiwari, M. J. Er, W. P. Ding and C. T. Lin, Neurocomputing, 2017, 267, 664–681.
86 R. Ma, H. Zhang and T. Luo, ACS Appl. Mater. Interfaces, 2022, 14, 15587–15598.
87 H. Li, C. R. Collins, T. G. Ribelli, K. Matyjaszewski, G. J. Gordon, T. Kowalewski and D. J. Yaron, Mol. Syst. Des. Eng., 2018, 3, 496–508.
88 E. Brochu, V. M. Cora and N. De Freitas, 2010, arXiv preprint arXiv:1012.2599, DOI: 10.48550/arXiv.1012.2599.
89 P. M. Pittaway, S. T. Knox, O. J. Cayre, N. Kapur, L. Golden, S. Drillieres and N. J. Warren, Chem. Eng. J., 2025, 507, 160700.
90 C. Yan, X. M. Feng, C. Wick, A. Peters and G. Q. Li, Polymer, 2021, 214, 12.
Effective implementation of AI in polymer research requires high-performance computing resources, which can be a barrier due to their resource-intensive nature. Researchers face challenges like accessing supercomputing facilities and overcoming skepticism due to AI's black-box nature. Governments and organizations are responding by providing access to state-of-the-art computing facilities, which support AI-driven projects and foster collaborative research efforts.
AI is crucial for sustainable polymer development as it facilitates the rational design of biodegradable polymers, assists recycling strategies, and optimizes energy-efficient synthesis processes. It will drive the transition toward a circular polymer economy by leveraging AI's ability to derive efficient, sustainable practices from complex data sets, ensuring that innovation aligns with environmental and economic goals.
AI's 'black-box' nature can lead to skepticism due to the difficulty in explaining AI-generated results with physical principles. This impacts its acceptance, as researchers may prefer models they can directly interpret. Mitigating these concerns includes developing more interpretable AI models, improving AI literacy among scientists, and integrating AI with traditional methods for validation purposes.
AI differs from traditional molecular dynamics (MD) simulations by predicting polymer behavior through pattern extraction from data, rather than relying on explicit physical equations. AI allows for accurate predictions even when the full physical understanding is lacking, and it significantly reduces computational costs compared to MD simulations. However, AI functions as a black box, which can challenge the understanding of underlying principles, whereas MD simulations are more transparent and based on known physical laws.
Machine learning benefits polymer research by efficiently navigating the immense combinatorial complexity of material properties, enabling the design of novel polymers, and optimizing synthesis conditions. Unlike traditional trial-and-error methods, ML can predict outcomes with unprecedented efficiency, making it a powerful tool for both discovery and optimization in complex systems.
AI in polymer science is expected to transition research from empirical iteration to data-driven hypothesis generation. AI methodologies like reinforcement and unsupervised learning will increasingly play roles in optimizing polymerization processes and enhancing structure-property relationship understanding. The evolution involves moving beyond mere automation to developing sophisticated decision-making capabilities that enable dynamic adjustment and optimization in real-time, paving the way for precision-driven material design.
Current tools and methodologies for assisting polymer scientists include dedicated AI software, user-friendly platforms for data analysis, and collaborative frameworks that integrate ML with experimental research. These tools contribute to reducing adoption barriers by simplifying complex AI processes, offering practical applications, and facilitating learning among researchers with limited AI background, thus bridging the knowledge gap.
Ethical considerations include the potential reduction in human oversight as AI systems gain decision-making autonomy, which could lead to biases in material design and synthesis. Ensuring transparency and accountability in AI processes is crucial, as well as addressing the balance between AI efficiency and human control to prevent unintended consequences in material properties and environmental impact.
AI can be misunderstood in polymer science when it is confused with conventional modeling techniques or industrial automation. Unlike traditional methods that rely on predefined calculations and deterministic outputs, AI systems learn from data, adapt, and make independent decisions. While traditional models follow explicit physical equations, AI extracts patterns from data, offering predictive capabilities in situations where underlying physics is not fully understood.
Efforts to bridge the gap between polymer science and AI include offering practical starting points, focusing on key applications, foundational methodologies, and using accessible tools to demystify AI. These measures are effective as stepping stones for researchers unfamiliar with AI, easing their entry into advanced AI strategies by reducing complexity and integrating AI understanding in polymer science.