Automated Recyclable Waste Detection Using Deep Neural
Networks
Nagaleela Gangula1, J S. Sukanya2, Nagesh Chilamakuru3, Kummara Lokeshnath4, Shabana G5, Ganesh G6.
1,2,3,4,5,6. Assistant Professor, Dept. of CSE, Srinivasa Ramanujan Institute of Technology, Anantapur, India.
Email: [Link]@[Link], sukanyaprasanna2009@[Link], [Link]@[Link], lokeshnathever@[Link],
[Link]@[Link], [Link]@[Link].
Abstract
Automating the classification of trash into reusable and non-reusable types plays a key role in improving waste management
and minimizing environmental impact. This study proposes a deep learning-based system for recognizing household waste using
a publicly available dataset with diverse waste categories. We evaluate the performance of MobileNetV2, EfficientNet-B3, and
Vision Transformer (ViT), selected for their accuracy and efficiency in image classification tasks. To enhance model reliability
and prevent overfitting, standard preprocessing and data augmentation techniques were applied. Evaluation focused on how
reliably and consistently the system handled various categories of waste materials. Results show that these advanced neural
networks perform well in identifying recyclable materials, offering a scalable solution for real-time waste sorting. The approach
holds potential for integration into smart bins, recycling facilities, and mobile platforms, supporting more sustainable waste
management practices.
Keywords: Deep learning, recyclable waste detection, MobileNetV2, EfficientNet-B3, Vision Transformer, image
classification, CNN, waste sorting.
1. Introduction
Effective waste management has become increasingly important with the rise in global waste production. Manual sorting, though common, is often inefficient, prone to errors, and labor-intensive. Automating this process using deep learning and computer vision can significantly improve accuracy and efficiency. In particular, image-based classification systems have shown great promise in identifying various types of waste using neural networks [1]. This project proposes an intelligent system that classifies recyclable and non-recyclable household waste using advanced deep learning architectures. By training models on a labeled image dataset, the system aims to support faster and more consistent waste segregation with minimal human involvement.
Lightweight convolutional neural networks (CNNs) such as MobileNetV2 and EfficientNet-B3 have emerged as strong candidates for tasks that require both accuracy and computational efficiency [2]. These models are particularly suitable for deployment in edge devices like smart bins or mobile applications due to their compact design. Their architecture enables them to capture relevant features while maintaining low processing costs. In this project, both models are trained using a well-labeled dataset after applying preprocessing techniques like resizing, grayscale conversion, and normalization. These steps reduce data complexity and improve model generalization, ensuring that the system performs reliably in real-world conditions.
While CNNs offer excellent local feature extraction, recent advancements in vision transformers have introduced new possibilities in image classification [3]. While earlier models analyze limited sections of an image at a time, newer transformer-based designs are capable of interpreting overall visual structure using attention layers. This makes them highly effective in distinguishing between visually similar waste categories. Although they are typically more resource-intensive, when combined with appropriate preprocessing and augmentation strategies, ViTs can generalize well even on moderately sized datasets. By integrating both CNN and ViT models, this project provides a comparative analysis to determine which architecture offers the best trade-off between performance and computational demand for waste classification.
The dataset used in this study is publicly available and includes a wide range of labeled waste images spanning multiple categories. It provides better diversity and complexity than earlier datasets limited to binary classification [4]. Initial adjustments such as simplifying colors, converting to shades of gray, and applying controlled changes to image appearance help reduce distortion and improve learning quality. Each recognition method is judged based on how well it identifies different waste types. Results show that MobileNetV2 and EfficientNet-B3 work effectively and are well-suited for quick, practical applications, while ViT demonstrates superior performance on complex or ambiguous images. This work plays a role in advancing intelligent, deployable waste classification systems that can be embedded in mobile tools and automated recycling infrastructures.
2. Literature Review
Mao et al. (2021) developed a recyclable waste separation system using DenseNet121, enhanced with a Genetic Algorithm (GA) to optimize the fully connected layers. The model, trained on the TrashNet dataset with augmentation techniques like flipping and rotation, achieved a notable accuracy of 99.6%. This work demonstrated that fine-tuning hyperparameters using GA could significantly improve CNN performance. Moreover, they visualized model focus using Grad-CAM for better interpretability. Despite its high accuracy, the study was limited to a single CNN architecture and did not explore comparisons with lightweight or transformer-based models. It also lacked evaluation in real-time environments, making it less adaptable for mobile or embedded applications [1].
Zhang et al. (2021) studied a classification model based on ResNet18, integrating a Self-Monitoring Module (SMM) to recalibrate channel-wise feature attention. This approach aimed to enhance the representation power of residual networks by capturing global context and improving spatial feature understanding. Tested on TrashNet, the model achieved an accuracy of 95.87%, proving effective in learning discriminative waste features. However, the model depth was relatively shallow, and no comparisons were made. Furthermore, the study focused primarily on structural improvements rather than optimizing performance for low-resource environments or real-time classification scenarios [2].
Yang and Thung (2016) introduced the TrashNet dataset, which includes 2527 labeled images across six different waste materials. They applied traditional models like SVM and deeper networks like ResNet50, achieving 63% and 87% accuracy respectively. This dataset became a benchmark for evaluating waste classification models. While it helped standardize research in the domain, the dataset is limited in size and diversity, lacking environmental variation such as lighting conditions and object orientation. These limitations restrict model generalization and real-world applicability. Additionally, the baseline models used did not include recent lightweight or transformer-based approaches, highlighting a gap for future exploration [3].
Sunardi et al. (2023) investigated an efficient waste separation system using a simplified CNN trained on a Kaggle dataset of waste images. The model used grayscale conversion and resized images to 50×50 pixels, significantly reducing computational complexity. Despite its simplicity, the model achieved an impressive accuracy of 98.92% with minimal overfitting, outperforming more complex models like Bagchi’s CNN, which only reached 87.10% accuracy. Key contributions included optimized preprocessing and the use of early stopping to prevent overfitting. However, the study was limited to binary classification and did not consider multi-class waste types. While effective, the approach lacks scalability for real-world applications requiring more nuanced classification and mobile deployment. This highlights the need for benchmarking modern lightweight and transformer-based models for improved generalization and practical use in automated waste detection systems [4].
3. Proposed Methodology
The system introduced in this work focuses on intelligently sorting household waste images with the help of modern image recognition techniques. As illustrated in the workflow diagram (Figure 1), the process begins by arranging the image collection into labeled groups representing different material types such as plastic containers, glass fragments, metallic items, paper-based packaging, and biodegradable matter. Each image is resized to a standard size, and visual transformations such as flipping, rotating, and zooming are applied to increase diversity and help the system recognize items from different angles and scenarios. The entire image pool is then divided into two segments: one used for building the recognition ability and the other to verify its reliability.
To extract meaningful information from the visuals, lightweight and high-capacity recognition models like MobileNetV2, EfficientNet-B3, and Vision Transformer are used. These models, originally trained on broader image collections, are adjusted to handle specific categories of discarded materials. This fine-tuning approach reduces the need for large amounts of new data and improves the learning speed. Finally, the system is tested for its sorting ability, making it suitable for real-time applications in automated recycling systems, including smart disposal units and eco-friendly collection points.
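The division of the image pool into a learning part and a verification part can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the dataset is available as a list of (path, label) pairs, uses the 80/20 ratio reported in the Data Splitting subsection, and splits each class separately so that class proportions stay the same in both parts. The function name `stratified_split`, the fixed seed, and the example file names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.8, seed=42):
    """Split (path, label) pairs into train/test lists, class by class.

    Illustrative helper (not from the paper): each class is shuffled and
    cut at `train_fraction`, so both parts keep the same class balance.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    train, test = [], []
    for label in sorted(by_label):
        paths = by_label[label][:]
        rng.shuffle(paths)
        cut = int(round(len(paths) * train_fraction))
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Hypothetical toy dataset: 10 images for each of three classes.
samples = [(f"{label}_{i}.jpg", label)
           for label in ("glass", "metal", "plastic")
           for i in range(10)]
train_set, test_set = stratified_split(samples)
print(len(train_set), len(test_set))  # 24 6
```

Splitting per class rather than over the whole pool is one simple way to keep every waste type evenly represented in both groups; a library routine with a stratification option would serve the same purpose.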
Figure 1: Methodology of the proposed work
Dataset Specification
This study utilizes a substantial collection of labeled images depicting various types of domestic waste. The image set includes thousands of high-quality photos, each classified into categories such as plastic-based items, metallic components, glass materials, paper products, packaging board, and biodegradable matter. These images, originally in varying dimensions and file formats, were standardized through resizing and formatting to maintain consistency. To ensure balanced learning, the data was carefully organized so that each waste type is equally represented, preventing any one category from skewing the training process.
Figure 2: Random images from the Dataset.
The visuals were taken under a range of lighting conditions and from different angles, which improves the system’s adaptability to everyday waste detection scenarios. For this work, the image collection was arranged into two parts, one for learning patterns and the other for checking how well the model performs. Before training, all visuals were adjusted to a size of 224 by 224 pixels. Their color values were scaled down to a consistent range between zero and one. To add more variety and reduce overfitting, methods like image flipping, rotation, and slight zooming were applied. This diverse and clearly labeled set forms the groundwork for building reliable recognition systems for waste categorization.
Data Preprocessing
Prior to building the recognition system, several adjustments were made to prepare the image collection for effective analysis. Each visual was resized to a fixed resolution of 224 by 224 pixels, aligning with the input needs of the selected models such as MobileNetV2, EfficientNet-B3, and Vision Transformer. Color values were scaled down to a consistent range between zero and one, which improves stability and speeds up learning. To help the system recognize a variety of waste appearances, the visuals were slightly modified using techniques like flipping, rotation, slight zooms, and position shifts. These variations mimic different viewpoints and lighting conditions. In certain model versions, a grayscale format was tested to simplify the visuals while preserving essential shapes and contrasts, helping the system focus on important features with reduced visual complexity. These steps ensure that the training data is both consistent and diverse, contributing to more accurate and resilient models.
Data Splitting
Once the image preparation was completed, the entire collection was carefully arranged into two groups: one for teaching the system and the other for checking how well it performs on unfamiliar examples. Around 80 percent of the visuals were assigned to the learning phase, while the remaining 20 percent were used to assess how accurately the system responds to new information. Special attention was given to keeping all waste types evenly represented in both groups. This balanced approach helps the model learn from a wide range of examples and ensures dependable results across different types of discarded materials.
Model Architecture
This project utilizes three state-of-the-art deep learning architectures: MobileNetV2, EfficientNet-B3, and Vision Transformer (ViT), each selected for its proven effectiveness in image classification tasks.
MobileNetV2 Architecture
The design implemented here is compact and tailored for efficient visual recognition tasks. It starts by accepting input visuals of size 128×128 with three color channels. These are passed through a series of filters sized 3×3 to extract critical visual patterns, followed by a 2×2 downsampling operation that reduces the spatial dimensions.
Figure 3: MobileNetV2 Architecture
The architecture employs a specialized filtering method (depthwise separable convolution) that processes each feature map separately, allowing the system to operate with fewer resources while still capturing important information. Feature maps evolve through various stages, shrinking from 64×64 to as small as 4×4. Eventually, all extracted features are condensed into a single vector, which feeds into a final decision mechanism that assigns a category to the image.
EfficientNet-B3 Architecture
The EfficientNet-B3 architecture, as depicted in Figure 4, is used in conjunction with U-Net for enhanced feature learning and classification. Input visuals sized at 256×256 move through multiple processing stages, each stabilized with normalization techniques that help streamline learning. Dimensionality is reduced while key patterns are retained. The network increases its depth and complexity progressively, enhancing its ability to learn fine-grained patterns in waste images. A dropout layer helps prevent overfitting. After essential patterns are captured, the output is transformed into a one-dimensional format and directed through a sequence of dense layers. The final stage in the process
assigns a suitable label based on the image’s characteristics. This framework is well-suited for scenarios where limited computing power is available, offering a balanced solution that doesn't compromise recognition quality.
Figure 4: EfficientNet-B3 architecture
Vision Transformer (ViT) Architecture
The architecture employed in this approach begins by slicing the input image into smaller sections of equal size. Each of these segments is converted into a numerical format suitable for processing, while additional spatial markers are included to help the system recognize where each part belongs in the overall structure. This sequence of segments flows through a layered system that can assess long-range connections across the entire image. Within these layers, the mechanism gives attention to specific areas based on their importance, refining the way visual details are interpreted. Once processed, the final output moves through a decision layer that identifies the most appropriate category. This strategy is particularly strong in interpreting images where elements may appear complex or closely resemble one another.
Figure 5: Vision Transformer (ViT) architecture
Performance Metrics
In this study, performance metrics are used to evaluate how accurately the models classify different types of waste. Performance is assessed mainly through evaluation metrics and the confusion matrix. These metrics help assess whether the models (MobileNetV2, EfficientNet-B3, and ViT) are effectively recognizing recyclable and non-recyclable waste across all categories.
Evaluation Metrics
Accuracy: Accuracy is the proportion of correct predictions out of all predictions made, giving a general idea of how often the system produces the expected result.
Precision: Precision focuses specifically on the reliability of positive classifications, measuring how often predicted positives are correct compared to the total number of positive predictions made by the model.
Recall (Sensitivity): Recall measures the system's ability to detect the actual instances of a class, minimizing false negatives.
F1-Score: F1-Score balances precision and recall, ensuring a well-rounded detection model.
Confusion Matrix: A visual chart is used to compare the system’s outcomes with actual categories, helping to spot where errors occur. By reviewing this layout, one can see how well different types are recognized and where confusion happens, offering insights for improving future results.
Figure 6: Confusion Matrix
4. Results and Discussion
Evaluation Metrics
The evaluation of the proposed system, Automated Recyclable Waste Detection Using Deep Neural Networks, demonstrates notable advancements in identifying and classifying various types of household waste with high precision. Among the implemented models, EfficientNet-B3 delivered the most accurate results, achieving 91% accuracy, along with a precision of 0.91, recall of 0.90, and F1-score of 0.90.
These values indicate its strong ability to consistently and correctly categorize recyclable and non-recyclable materials. The Vision Transformer (ViT) closely followed, showing 90% across all metrics, highlighting its strength in handling visually complex or overlapping categories due to its attention-based architecture. Meanwhile, MobileNetV2 performed reliably with 87% accuracy, 0.88 precision, 0.87 recall, and an F1-score of 0.87, making it a suitable choice for lightweight or real-time deployment scenarios. The comparison reveals that while all three models are effective for waste detection, EfficientNet-B3 offers the best overall balance between accuracy and efficiency, and ViT provides added strength in handling challenging visual patterns. This supports the potential of integrating these models into practical waste management systems like smart bins or mobile applications.
Table 1: Evaluation Metrics of algorithms
Confusion Matrix
The confusion matrices for the proposed models illustrate each model’s ability to classify waste categories accurately.
Figure 7: Confusion Matrix of Vision Transformer (ViT)
The confusion matrix of the ViT shows efficient classification in categories such as “Cardboard,” “Glass,” and “Organic,” with minimal misclassifications. However, the model shows slight confusion when identifying “Trash” and “Paper,” likely due to visual similarities among waste items. While the majority of classes are predicted accurately, some overlap between “Metal” and “Plastic” is also observed. The overall performance reflects ViT’s capability in handling complex visual data, yet improvement is needed in differentiating between classes with overlapping features. Despite this, it proves to be a promising model for recyclable waste classification.
The confusion matrix of MobileNetV2 reveals that it performs moderately well in recognizing most waste categories, especially “Glass,” “Organic,” and “Cardboard.” However, it struggles to clearly distinguish between “Plastic” and “Metal,” resulting in some incorrect predictions. Misclassifications are also seen with “Trash,” suggesting the model faces challenges with visually similar materials. Despite this, MobileNetV2 shows balanced results across several classes, making it a lightweight yet effective model for real-time applications. Its overall performance indicates that while it is accurate in many areas, more refinement is needed for closely related categories to improve waste detection accuracy.
Figure 8: Confusion Matrix of MobileNetV2
The confusion matrix for EfficientNet-B3 highlights its strong classification capabilities, particularly in accurately identifying “Metal,” “Organic,” and “Trash.” It exhibits minimal misclassifications and shows the best balance among all tested models. Although a few predictions for “Plastic” and “Cardboard” are incorrect, the errors are less frequent compared to other models. EfficientNet-B3 demonstrates superior generalization and feature extraction, allowing it to handle complex and visually similar waste types effectively. This reliability makes it highly suitable for automated waste detection systems where precision and robustness are critical. Its consistent results affirm its strength in practical classification tasks.
Figure 9: Confusion Matrix of EfficientNet-B3
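The accuracy, precision, recall, and F1 figures discussed above can all be derived directly from a confusion matrix. The sketch below is illustrative only: the 3×3 matrix is made-up toy data, not the paper's results, and the helper name `metrics_from_confusion` is hypothetical. It assumes rows are true classes and columns are predicted classes, and uses the standard definitions precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2PR / (P + R), and accuracy = correct / total.

```python
def metrics_from_confusion(matrix):
    """Return overall accuracy and per-class (precision, recall, f1).

    Assumes matrix[i][j] counts samples of true class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    per_class = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(matrix[k]) - tp                       # truly k, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return correct / total, per_class


# Toy example with three classes (made-up counts, not the paper's data).
toy = [
    [8, 1, 1],  # true class 0
    [2, 7, 1],  # true class 1
    [0, 2, 8],  # true class 2
]
accuracy, per_class = metrics_from_confusion(toy)
print(f"accuracy = {accuracy:.2f}")
for k, (p, r, f1) in enumerate(per_class):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Averaging the per-class values (macro-averaging) is one common way to obtain a single precision, recall, and F1 figure per model; the text does not state which averaging scheme was used for the reported numbers.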
5. Conclusion
The study demonstrated the potential of deep learning in automating the classification of recyclable and non-recyclable household waste. By leveraging advanced models like MobileNetV2, EfficientNet-B3, and Vision Transformer (ViT), the system achieved high accuracy and reliability in identifying various waste categories. Among the three models, EfficientNet-B3 delivered the best overall performance in terms of precision, recall, F1-score, and accuracy, making it a suitable choice for deployment in real-world scenarios. ViT also performed strongly, especially in handling complex visual patterns, while MobileNetV2 offered a lightweight and efficient solution for resource-constrained environments. The evaluation metrics and confusion matrices confirmed the robustness of the models, with minimal misclassifications and high consistency across classes. This work highlights how deep learning can contribute to smarter waste management by reducing manual effort, improving sorting efficiency, and supporting environmental sustainability. Future enhancements may include real-time implementation, integration with smart bins, and expanding the model to handle additional waste categories or materials. Overall, this study lays a strong foundation for developing intelligent and scalable waste detection systems using modern neural network architectures.
6. References
[1]. W. Mao, J. Zhang, J. Li, and Y. Gao, “Recycling waste classification using optimized convolutional neural network,” Sustainability, vol. 13, no. 13, p. 7163, 2021, presented at the Environmental AI Applications Symposium, China, July 2021.
[2]. Q. Zhang, Y. He, and X. Zhuang, “Recyclable waste image recognition based on deep learning,” IOP Conference Series: Earth and Environmental Science, vol. 769, no. 4, p. 042031, 2021, presented at the International Conference on Energy, Environment and Sustainable Development (EESD), Shanghai, China, April 2021.
[3]. Sunardi, E. Yusuf, R. Fahmi, and A. Muflikh, “Improving waste classification using convolutional neural networks: An application of machine learning for effective environmental management,” Review of International Geographical Education (RIGEO), vol. 12, no. 4, pp. 398–407, 2022, presented at the International Environmental Data Science Forum, Indonesia, December 2022.
[4]. R. Dwivedi and L. Elluri, “Exploring generative artificial intelligence research: A bibliometric analysis approach,” IEEE Access, vol. 12, 2024, presented at the IEEE AI Research Forum, Austin, TX, September 2024.
[5]. Bobulski, J., Kubanek, M. (2019). Waste classification system using image processing and convolutional neural networks. In Advances in Computational Intelligence: 15th International Work-Conference on Artificial Neural Networks, IWANN 2019, Gran Canaria, Spain, June 12-14, Springer International Publishing. Proceedings, Part II 15: 350-361. [Link] 20518-8_30.
[6]. Khan, S., Rahmani, H., Shah, S.A.A., Bennamoun, M. (2018). A guide to convolutional neural networks for computer vision. Synthesis Lectures on Computer Vision, 8(1): 1-207.
[7]. Lindsay, G.W. (2021). Convolutional neural networks as a model of the visual system: Past, present, and future. Journal of Cognitive Neuroscience, 33(10): 2017-2031. [Link]
[8]. Adedeji, O., Wang, Z., 2019. Intelligent waste classification system using deep learning convolutional neural network. Procedia Manuf. 35, 607–612. [Link] 10.1016/[Link].2019.05.086.
[9]. Gundupalli, S.P., Hait, S., Thakur, A., 2018. Classification of metallic and non-metallic fractions of e-waste using thermal imaging-based technique. Process Saf. Environ. Protect. 118, 32–39. [Link]
[10]. Liu, X., Wang, Z., Li, W., Li, G., Zhang, Y., 2019. Mechanisms of public education influencing waste classification willingness of urban residents. Resour. Conserv. Recycl. 149, 381–390. [Link]
[11]. Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H., 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, 321–331.
[12]. Kim, J., Nocentini, O., Scafuro, M., Limosani, R., Manzi, A., Dario, P., Cavallo, F., 2019. An innovative automated robotic system based on deep learning approach for recycling objects. ICINCO (2), 613–622.
[13]. Abdallah, M., Abu Talib, M., Feroz, S., Nasir, Q., Abdalla, H., Mahfood, B., 2020. Artificial intelligence applications in solid waste management: a systematic research review. Waste Manage. (Oxford) 109, 231–246.
[14]. Gundupalli, S.P., Hait, S., Thakur, A., 2017. Multi-material classification of dry recyclables from municipal solid waste based on thermal imaging. Waste Manage. 70, 13–21. [Link] dec.
[15]. Toğaçar, M., Ergen, B., Cömert, Z., 2020. Waste classification using AutoEncoder network with integrated feature selection method in convolutional neural network models. Meas. 153, 107459.