
LATEST Project Defense Copy2 Copy - 111942

This document discusses the evolution of digital communication and the importance of securing sensitive information through advanced techniques like Bit-Plane Complexity Segmentation (BPCS) steganography, which conceals the existence of data within digital files. It highlights the limitations of traditional cryptographic methods and the need for a secure file-sharing system that combines encryption with BPCS to enhance confidentiality and resilience against detection. The study aims to implement and evaluate this system, addressing the challenges of data integrity and the effectiveness of steganographic techniques in modern communication.

Uploaded by

shalomobinna900

CHAPTER ONE

INTRODUCTION
1.1 Background of the Study
The digital revolution has transformed the global landscape of information exchange, shifting
society from traditional paper-based communication to fast, automated digital transmission.
Today, governments, corporations, security agencies, and individuals depend on digital
platforms to send and receive confidential information across geographically distributed
environments. The expansion of the internet, cloud computing, and mobile technologies has
increased data accessibility and enabled real-time communication. However, the same
advancement has brought about heightened risks of cyber-attacks, including data interception,
unauthorized disclosure, and digital espionage (Stallings, 2017). Sensitive information
transmitted online can be intercepted, altered, or completely compromised by adversaries,
making data security a key concern in modern digital ecosystems.

Traditional mechanisms for securing digital communication have relied heavily on
cryptography. Cryptography protects the content of a message by converting readable
information (plaintext) into ciphertext, which appears meaningless without the appropriate
decryption keys. While this successfully prevents unauthorized users from understanding the
data, it does not hide the presence of the protected message. In many cases, the mere
detection of ciphertext triggers suspicion or targeted attacks, especially in high-risk domains
such as national security, diplomatic communication, and corporate confidentiality (Katz &
Lindell, 2014). Thus, encrypted communication alone may not provide adequate security in
scenarios where secrecy of existence is as important as secrecy of content.

Steganography offers a complementary and, in some contexts, superior level of protection by
concealing the existence of secret information within ordinary digital files such as images,
audio, or videos. Rather than transforming a message into a suspicious encrypted form,
steganography embeds the message into a cover medium in a way that is intended to remain
visually, audibly, or statistically undetectable. This makes the communication unobtrusive and
significantly harder for attackers to detect or deliberately intercept (Katzenbeisser &
Petitcolas, 2000). By combining cryptography and steganography, modern systems aim to
ensure both confidentiality and covert transmission of digital content.

In digital image steganography, numerous techniques have been proposed to achieve high
data embedding capacity while preserving image quality. One of the earliest and most widely
adopted techniques is the Least Significant Bit (LSB) substitution method. LSB
steganography modifies the least significant bits of pixel values in an image to store hidden
binary data. Although easy to implement and capable of preserving reasonable visual quality,
LSB methods suffer from critical weaknesses (Fridrich, 2009). Even mild processing such as
compression, cropping, or scaling can destroy the hidden message because LSB embedding has
minimal robustness. Additionally, steganalysis tools employing statistical analysis can easily detect
abnormalities introduced by LSB embedding, limiting its applicability in secured real-world
communications (Dumitrescu, Wu, & Wang, 2003).
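
The LSB substitution described above can be illustrated with a minimal sketch. It assumes an 8-bit grayscale cover held as a flat list of pixel values; the names `lsb_embed` and `lsb_extract` are hypothetical and are not taken from any library or from the system built in this study.

```python
def lsb_embed(pixels, payload_bits):
    """Return a copy of pixels with each payload bit written into one pixel's LSB."""
    if len(payload_bits) > len(pixels):
        raise ValueError("payload does not fit in the cover")
    stego = list(pixels)
    for i, bit in enumerate(payload_bits):
        # Clear the least significant bit, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def lsb_extract(pixels, n_bits):
    """Read back the least significant bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Illustrative cover pixels and payload bits (arbitrary values).
cover = [200, 13, 77, 254, 0, 129, 64, 33]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, bits)
recovered = lsb_extract(stego, len(bits))
```

Each pixel changes by at most one gray level, which is why visual quality is preserved. By the same token, any processing that re-quantizes pixel values overwrites the hidden bits.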

To address the limitations of basic substitution techniques, researchers introduced Bit-Plane
Complexity Segmentation (BPCS), a more advanced spatial-domain steganography method.
BPCS is designed to improve embedding capacity and imperceptibility by taking advantage
of complex, noise-like regions within image bit-planes. These visually irregular regions can
be replaced with encrypted data without producing noticeable artifacts, thereby achieving
much higher payload capacity and significantly increasing resistance to visual and statistical
detection (Kawaguchi & Eason, 2000). Compared to LSB, BPCS delivers superior
performance especially when large volumes of confidential information need to be
transmitted covertly.

As cyber threats continue to evolve and data transmission becomes increasingly frequent,
there is a compelling need to implement stronger, more stealthy security technologies for
communication. The growing field of intelligent surveillance and automated attack systems
further heightens the demand for secure methodologies that conceal both the presence and
structure of sensitive information. Thus, research into BPCS-based steganography plays a
crucial role in developing the next generation of secure file-sharing systems capable of
supporting national security, corporate operations, digital rights management, and personal
privacy.

In view of the above, this study explores the implementation of a secure file sharing system
that combines encryption with BPCS steganography to enhance confidentiality, capacity, and
resilience against detection. Through experimental comparison with conventional techniques
such as LSB, the research aims to demonstrate BPCS as a more reliable and scalable solution
for modern secure communication needs.

1.2 Problem Statement


In today’s digital communication landscape, ensuring the confidentiality and integrity of
shared information is of paramount importance. Cryptographic techniques, such as symmetric
and asymmetric encryption, have long been the cornerstone of securing data. These
algorithms transform readable data into ciphertext, safeguarding it from unauthorized access.
However, despite their strength in protecting content, cryptographic methods do not conceal
the presence of a message itself. This inherent visibility of encrypted data can draw unwanted
attention from adversaries, who may attempt to intercept, block, or subject the
communication to more intense scrutiny (Katz & Lindell, 2014). For entities such as
journalists, whistleblowers, military personnel, and activists operating in hostile
environments, simply possessing encrypted data can lead to suspicion, interrogation, or legal
repercussions (Anderson, 2009).

To address this challenge, steganography has emerged as a complementary technique that
focuses on hiding the existence of secret data rather than just encrypting it. Steganographic
systems embed information within seemingly innocuous digital media such as images, audio
files, or video streams. This makes the communication covert and less likely to attract
attention. However, early and widely adopted steganographic techniques, including the Least
Significant Bit (LSB) method, suffer from significant drawbacks that limit their practical
deployment (Fridrich, 2009).

1.3 Aim and Objectives of the Study


Aim
The overarching aim of this study is to design and implement a secured file-sharing system
using Bit-Plane Complexity Segmentation (BPCS) steganography.

Objectives
1. To rigorously implement the Bit-Plane Complexity Segmentation (BPCS) steganography
algorithm tailored for embedding complex data payloads into digital images.
2. To conduct an extensive evaluation of the embedding capacity of BPCS steganography
across multiple image formats and resolutions, determining the optimal parameters for
maximum payload without degradation of image quality.
3. To evaluate the robustness and security of the embedded data against a variety of attack
vectors and steganalysis techniques, including statistical attacks and machine learning
classifiers.
4. To develop and validate algorithms for accurate extraction and recovery of embedded files,
ensuring data integrity and minimizing error rates under diverse operational conditions.
5. To design a comprehensive and intuitive user interface that facilitates secure file
embedding, transmission, and extraction, supporting multiple platforms and ensuring ease of
use for end-users with varying technical expertise.

1.4 Significance of the Study
In the contemporary digital landscape, the significance of secure file sharing cannot be
overstated. With the rapid proliferation of internet usage and digital communication, sensitive
data is frequently transmitted over public and private networks, making it vulnerable to
interception, unauthorized access, and manipulation. The study on securing file sharing using
Bit-Plane Complexity Segmentation (BPCS) steganography is therefore highly relevant and
impactful, addressing crucial gaps in the field of information security, particularly in covert
communication and data protection.

Secure file sharing plays an indispensable role in protecting privacy, maintaining
confidentiality, and ensuring data integrity in numerous sectors, including government,
healthcare, finance, and personal communications (Anderson, 2009). Traditional encryption
techniques, while effective at protecting the content of the message, do not conceal the
existence of the message itself, which may raise suspicion or invite targeted attacks (Katz &
Lindell, 2014). Steganography, particularly BPCS steganography, offers an additional layer of
security by hiding the presence of sensitive data within innocuous cover media, effectively
camouflaging communication in plain sight (Fridrich, 2009).

1.5 Scope of the Study


This research is focused on developing a secured file sharing system utilizing Bit-Plane
Complexity Segmentation (BPCS) steganography, a sophisticated data hiding technique that
leverages the complexity of bit-planes in digital images to embed secret information. The
scope primarily encompasses the selection of suitable cover media, types of secret data,
system architecture, and performance evaluation parameters.

The study concentrates on lossless image formats such as Bitmap (BMP) and Portable
Network Graphics (PNG) for the cover media. These formats are deliberately chosen due to
their ability to retain exact pixel data without compression-induced loss, which is essential for
preserving the integrity of embedded information during the embedding and extraction
processes (Uchida et al., 2005). Unlike lossy formats such as JPEG or HEIC, which apply
compression algorithms that alter pixel values and thus may corrupt or degrade hidden data,
BMP and PNG provide stable environments suitable for steganographic applications
(Fridrich, 2009).

Limitations of the Study

Despite the promising capabilities of BPCS steganography, several notable limitations
circumscribe the scope and practical application of the developed secure file sharing system.
These limitations stem from both technical constraints and environmental assumptions
inherent in the research.
1. Dependency on Cover Media Format and Quality
The system’s reliance on lossless image formats, while advantageous for data integrity,
results in significantly larger file sizes compared to lossy compressed formats commonly
used in everyday digital communications.
2. Embedding Capacity Constraints
Although BPCS steganography exploits complex bit-plane segments to achieve higher
embedding capacity than traditional methods like Least Significant Bit (LSB) substitution, its
capacity is not unlimited.
3. Vulnerability to Advanced Steganalysis
Security in steganography is not solely about hiding data but also about evading detection by
adversaries. While BPCS offers improved imperceptibility by leveraging bit-plane
complexity, it is not immune to sophisticated steganalysis techniques.

1.6 Organization of the Report


The report begins by introducing the concept of secure file sharing and the challenges posed
by traditional steganographic methods. It explains the motivation behind using Bit-Plane
Complexity Segmentation (BPCS) steganography, establishing the problem, aim, objectives,
and importance of the study. It goes further to examine related research and technologies,
comparing BPCS with other steganographic techniques to highlight its advantages in terms of
capacity, security, and imperceptibility.

Following the theoretical groundwork, the report explores the design and implementation of
the proposed system. It identifies a security vulnerability, describes how such an attack
typically occurs, and proposes a mitigation strategy. The methodology section details how
data was collected, pre-tested, cleaned, and classified for training and testing purposes. A
suitable machine learning algorithm was selected to support analysis, with model evaluation
based on accuracy, precision, recall, and F1-score.

1.7 Definition of Terms


This section defines key technical concepts used throughout the study to ensure clarity and
uniform understanding.

Information Security:
A field of cybersecurity concerned with protecting data from unauthorized access,
modification, or disclosure while ensuring confidentiality, integrity, and availability are
preserved.

Steganography:
A covert communication technique used to hide the existence of secret information within a
digital cover medium such as an image, audio, or video file, ensuring that unauthorized
parties cannot detect hidden data.

Bit-Plane Complexity Segmentation (BPCS):
An advanced steganographic approach that embeds secret data into the complex (noise-like)
regions of an image’s bit-planes, enabling high embedding capacity while maintaining visual
imperceptibility.

Cover Image:
The original digital image used as a medium for embedding secret data without exhibiting
noticeable changes to human observers.

Stego-Image:
The resulting image after embedding encrypted or concealed information into the cover
image using steganography techniques.

Payload:
The confidential information such as documents, text, or files that is embedded within a cover
image during steganography.

Encryption:
A cryptographic process that converts readable information (plaintext) into an unreadable
form (ciphertext) to prevent unauthorized access, even if interception occurs during data
transmission.

AES-256 Encryption:
A widely used and highly secure symmetric encryption standard that uses a 256-bit key to
ensure strong protection of data before embedding.

Embedding Capacity:
The maximum amount of data that can be securely hidden within a cover image without
degrading its visual quality or raising suspicion of tampering.

Data Integrity:
A security principle that ensures embedded or extracted information remains complete,
original, and unaltered throughout the storage, embedding, transmission, and recovery
process.

CHAPTER TWO
LITERATURE REVIEW
2.1 Conceptual framework
2.1.1 Overview and purpose
The conceptual framework for this study establishes a coherent structure that links high-level
information-security objectives to the concrete technical components and evaluation criteria
of a secure file-sharing system based on Bit-Plane Complexity Segmentation (BPCS)
steganography. At the highest level, the framework conceives secure file sharing as a two-
layer protection problem in which cryptographic encryption is used to protect the content of a
file and steganography is used to conceal the very existence of that encrypted content. This
layered perspective responds directly to the classical goals of information security
(confidentiality, integrity and availability) by ensuring that content is unreadable without the
key (confidentiality), that extracted files can be verified as unmodified (integrity), and that
the system remains usable under normal operational conditions (availability) (Whitman &
Mattord, 2018). The present study operationalizes these objectives into implementable
components and measurable evaluation axes to produce a replicable and testable design
blueprint.

The study adopts AES-256 as the primary encryption mechanism before embedding and
selects BPCS as the steganographic engine because of its capacity and imperceptibility
advantages for lossless image covers (Stallings, 2017; Kurosawa, Uchida, & Tanaka, 1996).
These specific choices are reflected in the project implementation and experimental setup
described elsewhere in this thesis (see project implementation notes and experimental
settings).

2.1.2 Definitions of core concepts


To remove ambiguity and provide precise analytical boundaries, the following constructs are
defined and used consistently throughout the study. Information security refers to the
discipline and set of practices that preserve confidentiality, integrity and availability of
information (Whitman & Mattord, 2018). Encryption denotes cryptographic transformations
applied to plaintext to produce ciphertext that cannot be read without the appropriate key;
symmetric algorithms such as AES-256 are widely used for confidentiality in high-security
contexts (Stallings, 2017). Steganography denotes methods for hiding information within
innocuous carriers so that the existence of the hidden message is not evident to observers or
automated scanners (Johnson & Jajodia, 1998). Bit-Plane Complexity Segmentation (BPCS)
is the steganographic approach adopted here; it decomposes an image into bit-planes,
segments those planes into blocks, measures block complexity, and uses complex (noise-like)
blocks as substitution locations for secret data while recording conjugation metadata when
simple payload blocks must be transformed to meet complexity criteria (Kurosawa et al.,
1996; Uchida, Kurahashi, & Kurosawa, 2005). These definitions guide the mapping from
theoretical aims to practical implementation and evaluation.

2.1.3 Functional components and their roles


The conceptual framework decomposes the secured sharing pipeline into modular functional
components so that each stage can be specified, instrumented and evaluated independently.
The first component, payload preparation, includes content validation, integrity hash
calculation, and segmentation of the encrypted data into blocks compatible with the
embedding unit. The second component, encryption, applies AES-256 to the prepared
payload and may optionally add error-correcting codes to improve recoverability under
adverse conditions (Stallings, 2017). The third component, BPCS embedding, performs bit-
plane decomposition of the chosen cover image, divides planes into fixed-size blocks
(commonly 8×8), computes a normalized complexity measure for each block, and replaces
only blocks that exceed the complexity threshold with encrypted payload blocks; conjugation
is used when payload blocks are insufficiently complex and a conjugation map must be stored
for exact inversion. The final components are transmission and recovery, which handle stego-
image transport, extraction of embedded blocks, conjugation reversal where applicable,
integrity verification through hash comparison, and decryption to reconstruct the original file
(Kurosawa et al., 1996; Li, Zhang, & Guo, 2011).
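
The bit-plane decomposition and block segmentation steps of the embedding component can be sketched as follows. This is a minimal illustration, assuming an 8-bit grayscale image stored as a list of rows and plain binary coding of pixel values; the function names `bit_planes`, `blocks`, and `merge_planes` are hypothetical.

```python
def bit_planes(image):
    """Split an 8-bit grayscale image (list of rows) into 8 binary planes.

    planes[k][r][c] is bit k of pixel (r, c); k = 0 is the least significant.
    """
    return [[[(px >> k) & 1 for px in row] for row in image] for k in range(8)]


def blocks(plane, size=8):
    """Yield (row, col, block) for each full size x size tile of a binary plane."""
    height, width = len(plane), len(plane[0])
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            yield r, c, [row[c:c + size] for row in plane[r:r + size]]


def merge_planes(planes):
    """Inverse of bit_planes: recombine 8 binary planes into pixel values."""
    height, width = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][r][c] << k for k in range(8)) for c in range(width)]
            for r in range(height)]


# Hypothetical 16x16 test image with a simple repeating pattern.
image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
planes = bit_planes(image)
tiles = list(blocks(planes[0]))
restored = merge_planes(planes)
```

Decomposition followed by merging returns the original pixels exactly, which is the property that makes block substitution reversible on lossless covers.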

Each functional component is associated with acceptance tests and logging points so that
failures are observable and reproducible. For example, integrity verification at recovery
provides a binary signal (pass/fail) that informs whether further error-control measures are
needed. The modular view also clarifies responsibilities: encryption secures content, while
BPCS steganography secures concealment; neither alone suffices for the study’s threat model,
which assumes adversaries can both inspect transfers and attempt extraction if detection
occurs.
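
The pass/fail integrity signal can be sketched with the Python standard library. The framing used here (a 4-byte length, a SHA-256 digest, then the data, zero-padded to 8-byte chunks so that each chunk fills one 8×8 binary block) is an assumption made for illustration, not the format used by the prototype. Encryption is left out because AES-256 needs a third-party library, so the byte string stands in for ciphertext.

```python
import hashlib
import hmac
import struct

BLOCK_BYTES = 8  # one 8x8 binary block carries 64 bits


def prepare(data):
    """Frame data as length + SHA-256 digest + data, split into 8-byte chunks."""
    framed = struct.pack(">I", len(data)) + hashlib.sha256(data).digest() + data
    framed += b"\x00" * (-len(framed) % BLOCK_BYTES)  # zero-pad the last chunk
    return [framed[i:i + BLOCK_BYTES] for i in range(0, len(framed), BLOCK_BYTES)]


def recover(chunks):
    """Reassemble chunks and return (data, ok), where ok is the integrity verdict."""
    framed = b"".join(chunks)
    (length,) = struct.unpack(">I", framed[:4])
    digest, data = framed[4:36], framed[36:36 + length]
    ok = hmac.compare_digest(digest, hashlib.sha256(data).digest())
    return data, ok


message = b"stand-in for AES-256 ciphertext"
chunks = prepare(message)
data, ok = recover(chunks)

# Flip one bit in a data-carrying chunk to simulate corruption in transit.
tampered = list(chunks)
tampered[5] = bytes([tampered[5][0] ^ 1]) + tampered[5][1:]
_, ok_tampered = recover(tampered)
```

A plain digest of this kind detects accidental corruption. Whether it is computed over the plaintext or the ciphertext is a design choice that this sketch leaves open.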

2.1.4 Mechanisms: complexity measurement and conjugation


Two internal mechanisms are central to BPCS and therefore to the conceptual framework: the
complexity metric and conjugation. Complexity is typically measured by counting transitions
between 0s and 1s within a block along both row and column directions and normalizing that
count to a value in the interval [0, 1]. Blocks whose normalized complexity exceeds a chosen
threshold (empirically often in the 0.30–0.40 range) are classified as complex and eligible for
substitution without perceptible visual degradation (Uchida et al., 2005; Li et al., 2011).
Conjugation is a deterministic transform, often a bitwise XOR with a fixed checkerboard
pattern, that converts a simple payload block into a complex one so it may be embedded in a
complex image block; the positions of such conjugated payload blocks are recorded in a
conjugation map that must be embedded alongside the payload for successful decoding
(Kurosawa et al., 1996). The framework specifies these mechanisms and their parameter
ranges so that experiments can evaluate sensitivity to threshold choice, block size, and
conjugation pattern.
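
Both mechanisms can be made concrete with a short sketch. It assumes 8×8 blocks, for which there are 2 × 8 × 7 = 112 horizontally or vertically adjacent cell pairs, and a checkerboard with 0 in the top-left corner; the function names and the corner convention are illustrative assumptions.

```python
SIZE = 8
MAX_TRANSITIONS = 2 * SIZE * (SIZE - 1)  # 112 adjacent pairs in an 8x8 block

# Checkerboard pattern with 0 in the top-left corner (an assumed convention).
CHECKERBOARD = [[(r + c) % 2 for c in range(SIZE)] for r in range(SIZE)]


def complexity(block):
    """Normalized count of 0/1 transitions along rows and columns, in [0, 1]."""
    changes = 0
    for r in range(SIZE):
        for c in range(SIZE):
            if c + 1 < SIZE and block[r][c] != block[r][c + 1]:
                changes += 1
            if r + 1 < SIZE and block[r][c] != block[r + 1][c]:
                changes += 1
    return changes / MAX_TRANSITIONS


def conjugate(block):
    """XOR the block with the checkerboard; applying it twice restores the block."""
    return [[block[r][c] ^ CHECKERBOARD[r][c] for c in range(SIZE)]
            for r in range(SIZE)]


flat = [[0] * SIZE for _ in range(SIZE)]           # simplest possible block
half = [[1 if c < 4 else 0 for c in range(SIZE)]   # left half ones, right half zeros
        for _ in range(SIZE)]
```

Every adjacent pair of cells covers one 0 and one 1 of the checkerboard, so conjugation turns each transition into a non-transition and vice versa. A block of complexity α therefore becomes one of complexity 1 − α, which is why a single flag per payload block is enough for the conjugation map when the threshold is at most 0.5.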

2.1.5 Process flow and interdependencies


The conceptual framework casts the system as a linear but interdependent process: payload
preparation and hashing precede encryption; encryption outputs ciphertext that is segmented
and, if necessary, conjugated; the BPCS module embeds ciphertext blocks into complex cover
blocks while storing conjugation metadata; the stego-image is transmitted; the receiver
extracts embedded blocks, reverses conjugation using the embedded map, verifies integrity
via hash comparison, and decrypts the verified ciphertext to recover the original file.
Importantly, the success of later stages depends on properties of earlier stages: for instance,
encryption must preserve a block structure that allows conjugation when needed, and
embedding must not produce conspicuous metadata that defeats concealment. Thus, the
framework emphasizes end-to-end correctness and auditable checkpoints rather than isolated
optimisations.

This process view also clarifies how design trade-offs propagate: increasing embedding
capacity may increase detectability and sensitivity to image transformations; adding
redundancy and error correction improves recoverability but reduces effective payload
capacity; choosing lossless cover formats preserves fidelity but may be impractical in
bandwidth-constrained contexts. By making these interdependencies explicit, the framework
provides a rationale for empirical parameter sweeps and for reporting results along consistent
axes.
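
The embed and extract stages of this flow can be sketched on a list of 8×8 binary blocks. This is a simplified illustration under stated assumptions: the threshold is 0.3, blocks are scanned in order, and the conjugation map is returned to the caller instead of being embedded in the image as the framework requires. The names `embed` and `extract` are hypothetical.

```python
SIZE = 8
MAX_TRANSITIONS = 2 * SIZE * (SIZE - 1)
BOARD = [[(r + c) % 2 for c in range(SIZE)] for r in range(SIZE)]


def complexity(block):
    """Normalized number of 0/1 transitions between adjacent cells."""
    rows = sum(block[r][c] != block[r][c + 1]
               for r in range(SIZE) for c in range(SIZE - 1))
    cols = sum(block[r][c] != block[r + 1][c]
               for r in range(SIZE - 1) for c in range(SIZE))
    return (rows + cols) / MAX_TRANSITIONS


def conjugate(block):
    """XOR with the checkerboard; the operation is its own inverse."""
    return [[block[r][c] ^ BOARD[r][c] for c in range(SIZE)] for r in range(SIZE)]


def embed(cover_blocks, payload_blocks, threshold=0.3):
    """Overwrite complex cover blocks with payload blocks, in scan order.

    Returns (stego_blocks, conj_map); conj_map[i] is True when payload block i
    had to be conjugated to reach the threshold.
    """
    stego = list(cover_blocks)
    pending = list(payload_blocks)
    conj_map = []
    for i, block in enumerate(cover_blocks):
        if not pending:
            break
        if complexity(block) < threshold:
            continue  # simple region: leave it untouched
        payload = pending.pop(0)
        flipped = complexity(payload) < threshold
        stego[i] = conjugate(payload) if flipped else payload
        conj_map.append(flipped)
    if pending:
        raise ValueError("cover has too few complex blocks for this payload")
    return stego, conj_map


def extract(stego_blocks, conj_map, threshold=0.3):
    """Read payload blocks back from the complex blocks, undoing conjugation."""
    recovered = []
    for block in stego_blocks:
        if len(recovered) == len(conj_map):
            break
        if complexity(block) < threshold:
            continue
        flipped = conj_map[len(recovered)]
        recovered.append(conjugate(block) if flipped else block)
    return recovered


flat = [[0] * SIZE for _ in range(SIZE)]
stripes = [[c % 2 for c in range(SIZE)] for _ in range(SIZE)]
half = [[1 if c < 4 else 0 for c in range(SIZE)] for _ in range(SIZE)]

cover = [flat, BOARD, stripes, flat, BOARD, BOARD]
payload = [flat, stripes, half]
stego, conj_map = embed(cover, payload)
recovered = extract(stego, conj_map)
```

Extraction relies on every embedded block staying above the threshold, so the receiver selects the same blocks as the sender. This is the interdependency noted above: a transformation that changes block complexity breaks the scan order and with it the recovery.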

2.1.6 Evaluation axes and metrics


The conceptual framework defines three primary axes for evaluation: embedding capacity,
imperceptibility (visual fidelity), and robustness/detectability. Embedding capacity quantifies
the volume of encrypted payload that can be embedded in a cover image without raising
either perceptual or statistical suspicion. Imperceptibility is assessed by objective image-
quality metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index
(SSIM), combined with steganalysis tests to evaluate statistical footprints (Fridrich, 2009).
Robustness and detectability measure how well embedded data survives common image
manipulations (recompression, resizing, cropping) and how readily automated detectors
(statistical or machine-learning based) can distinguish stego images from natural images
(Cox, Miller, & Bloom, 2002). The framework requires reporting these metrics across
controlled variations (file size, cover format, compression level) so that trade-offs are
transparent and comparative claims are supported by data.
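
Two of these metrics reduce to short formulas: PSNR is 10·log10(peak² / MSE), and embedding rate can be expressed in payload bits per cover pixel. The sketch below assumes 8-bit grayscale images stored as lists of rows; the function names are illustrative. SSIM is omitted because it requires windowed local statistics.

```python
import math


def psnr(cover, stego, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized 8-bit images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(cover, stego)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def bits_per_pixel(payload_bytes, width, height):
    """Embedding rate: payload bits divided by the number of cover pixels."""
    return 8 * payload_bytes / (width * height)


# Hypothetical 4x4 cover and a stego version with every pixel raised by one level.
cover = [[100] * 4 for _ in range(4)]
stego = [[p + 1 for p in row] for row in cover]
quality_db = psnr(cover, stego)
rate = bits_per_pixel(8192, 256, 256)
```

With a mean squared error of 1, the PSNR is about 48.13 dB, a useful reference point when reading reported values: lower figures mean larger average pixel changes.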

2.1.7 Operational assumptions, constraints, and trade-offs


To keep the scope tractable and the experiments reproducible, the framework adopts a set of
operational assumptions that bound the analysis. First, primary experimental evaluation uses
lossless image formats (BMP, PNG) because bit-plane fidelity is essential to BPCS’s
effectiveness; lossy formats such as JPEG are included only in robustness tests since
compression alters bit-plane content (Uchida et al., 2005). Second, key management for AES
keys is assumed secure between sender and receiver; while practical deployments must
address key distribution and storage, those topics are outside the main technical experiments
and thus treated as deployment considerations (Stallings, 2017). Third, computational
resource availability is recognised as a constraint: BPCS requires block complexity
computation and conjugation bookkeeping which increase runtime and metadata overhead
compared with simpler methods such as LSB; the framework therefore includes performance
(runtime and memory) as secondary metrics to allow a cost-benefit analysis (Li et al., 2011).
These assumptions and constraints are not weaknesses per se; rather, they make experimental
results interpretable and comparable. Explicitly stating them also clarifies which conclusions
are internal to the experimental setup (e.g., when using lossless covers) and which may
generalise to less restrictive scenarios.

2.1.8 Implications for methodology and expected contributions


By mapping objectives to components and metrics, the conceptual framework directly
informs methodological choices in subsequent chapters. It prescribes use of benchmark image
datasets for reproducibility (e.g., BOSSbase and similar corpora), establishes PSNR/SSIM
and steganalysis tools as standard tests for imperceptibility and detectability, and directs
controlled experiments that vary payload size, cover format and compression level to
evaluate robustness. The expected scholarly contribution is twofold: (a) an empirically
validated BPCS-based secure file sharing pipeline that integrates AES-256 encryption,
conjugation bookkeeping and error-resilience measures; and (b) a systematic quantification of
trade-offs among capacity, imperceptibility and detectability under realistic transformations.
These contributions are grounded in the conceptual framework and operationalised by the
experimental plan that follows in the methodology chapter (Gonzalez & Woods, 2018;
Fridrich, 2009). Project-specific implementation choices and the system prototype described
in this thesis follow directly from the framework outlined here.

2.2 Theoretical Framework


The theoretical framework of this study is grounded in established theories that support
secure data communication, covert information transmission, and computational modeling of
image structures. These theories collectively justify the integration of encryption and Bit-
Plane Complexity Segmentation (BPCS) steganography as the foundational mechanisms for
developing a robust secure file-sharing system. The framework also highlights how each
theoretical foundation influences system design, implementation, and evaluation.

2.2.1 Information Security Theory


Information security theory provides the first theoretical basis for this work by defining the
core principles (confidentiality, integrity, and availability) that any secure communication
system must meet (Whitman & Mattord, 2018). Confidentiality is supported through
encryption, as only authorized users possessing the correct key can access the protected
content. Integrity is ensured by incorporating hashing and verification mechanisms to detect
whether hidden files have been altered in transit. Meanwhile, availability is maintained by
designing a system that allows legitimate users to access hidden files reliably and without
excessive overhead. These principles form the fundamental security objectives embedded
throughout the design of the proposed system.

2.2.2 Shannon’s Information Theory


Shannon’s information theory forms a foundational scientific underpinning for both
cryptography and steganography. Shannon (1949) introduced concepts such as entropy and
uncertainty to describe information transmission security, proposing that a secure system
must minimize information leakage and maximize confusion and diffusion. In this study,
confusion and diffusion are provided through AES-256 encryption, whose ciphertext is designed to be
indistinguishable from random noise without the key. BPCS steganography further supports Shannon’s
principle of entropy by embedding encrypted content in complex bit-plane regions that
exhibit noise-like characteristics, thereby increasing secrecy and reducing detectability.
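
The entropy notion used here can be computed directly as H = −Σ p·log2(p) over byte frequencies. The sketch below is illustrative: the name `shannon_entropy` is hypothetical, and `os.urandom` output is used only as a stand-in for ciphertext, on the assumption that well-encrypted data looks like uniformly random bytes.

```python
import math
import os
from collections import Counter


def shannon_entropy(data):
    """Entropy of a byte string in bits per byte, between 0 and 8."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


constant = shannon_entropy(b"aaaaaaaa")          # one symbol only
uniform = shannon_entropy(bytes(range(256)))     # every byte value exactly once
random_like = shannon_entropy(os.urandom(65536)) # stand-in for ciphertext
```

A constant string has zero entropy and a string containing every byte value equally often reaches the maximum of 8 bits per byte. Random bytes come close to that maximum, which is the sense in which encrypted payload resembles the noise-like regions targeted by BPCS.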

2.2.3 Cryptographic Theory


Cryptographic theory explains the mathematical mechanisms guiding the design of
encryption algorithms such as AES and the security strength derived from key size,
substitution-permutation structures, and computational infeasibility of brute-force attacks
(Stallings, 2017). According to Kerckhoffs’ Principle, a cryptosystem should remain secure
even if the attacker knows everything about the system except the key. This principle
supports the adoption of AES-256, whose 256-bit keys offer strong resistance to exhaustive
search attacks and ensure that even if the embedding technique is studied, the protected data
remains unreadable without the cryptographic key.
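
The scale of an exhaustive search can be shown with simple arithmetic. The guessing rate below is a purely illustrative assumption, chosen to be generous to the attacker, and is not a measurement.

```python
KEY_BITS = 256
GUESSES_PER_SECOND = 10 ** 18       # assumed attacker speed, purely illustrative
SECONDS_PER_YEAR = 365 * 24 * 3600

keyspace = 2 ** KEY_BITS
# On average an exhaustive search succeeds after trying half of all keys.
expected_years = keyspace // 2 // GUESSES_PER_SECOND // SECONDS_PER_YEAR
```

Even at this assumed rate the expected search time exceeds 10^50 years, which is why the practical risk lies in key handling and not in the key length.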

2.2.4 Steganography and Cover-Medium Theory


Steganography theory describes how secret communication can be concealed in a way that
the presence of a message remains undetected (Johnson & Jajodia, 1998). It emphasizes
imperceptibility, capacity, and robustness as performance requirements. BPCS steganography
aligns with this theory by enabling high-capacity embedding while maintaining visual
imperceptibility. Because the human visual system is less sensitive to noise-like regions,
embedding in complex image bit-planes ensures that distortions remain unnoticed (Uchida,
Kurahashi, & Kurosawa, 2005). Thus, BPCS leverages both perceptual limitations and cover-
medium characteristics as theorized in steganographic literature.

2.2.5 Complexity Theory in Image Processing


Complexity theory supports how image content can be mathematically evaluated to
determine where data can be safely embedded without detection. In BPCS, complexity is
defined by binary changes within bit-plane block patterns. Blocks with high complexity are
statistically similar to encrypted data and random noise, making them ideal for substitution
(Kurosawa, Uchida, & Tanaka, 1996). Conjugation, which applies an XOR transformation to
enforce complexity, ensures that simple payload blocks are transformed into noise-like
structures, thus adhering to the theoretical requirement that embedded blocks blend
seamlessly with the cover image (Li, Zhang, & Guo, 2011).

2.2.6 Theory-Driven System Integration Model


Finally, these theories converge into an integrated model where encryption and
steganography operate jointly. Cryptography protects content, while steganography protects
the existence of the communication, creating a dual-layer security approach recommended in secure
communication theory (Fridrich, 2009). This theoretical synergy justifies combining AES-
256 and BPCS steganography, ensuring that even if the steganographic layer is compromised,
encrypted data remains secure. The robustness of the model is further reinforced through
Shannon’s entropy principles and complexity theory, providing a strong academic foundation
for the methodology and experiments conducted in later chapters.

2.3 Empirical review


This section reviews empirical studies published from 2020 onward that are directly relevant
to Bit-Plane Complexity Segmentation (BPCS) steganography, hybrid
encryption+steganography systems, robustness under lossy transforms, and modern
steganalysis. Each entry gives the paper title, a short description of methods and results, and a
statement of relevance to this research.

2.3.1 “Secret data sharing through coverless video steganography based on bit plane
segmentation” Debnath, Mohapatra & Dash (2023)

Debnath et al. (2023) propose a coverless video steganography method that uses bit-plane
segmentation across video frames instead of direct bit-replacement embedding. The method
extracts frames, decomposes them into bit-planes, and computes stable per-block hash/feature
sequences used as retrieval keys; secret data are mapped to these features rather than being
written into the bitstreams themselves. The empirical evaluation included common attacks
(noise, cropping, resizing, recompression) and showed improved robustness to many
manipulations compared with single-frame BPCS embedding, at the expense of a different
operational model (requires pre-shared mapping or retrieval database). This paper is
important because it demonstrates a practical alternative to fragile bit-replacement BPCS
when robustness is a priority (Debnath, Mohapatra, & Dash, 2023).

2.3.2 “Comprehensive survey on image steganalysis using deep learning” De La
Croix, Ahmad & Han (2024)
De La Croix et al. (2024) provide a comprehensive survey of deep-learning-based
steganalysis techniques through 2023–2024, cataloguing prominent CNN-based
detectors (SRNet-style architectures, Xu-Net variants, Yedroudj-Net, etc.), training strategies
(data augmentation, cover-source mismatch mitigation), and robustness evaluations. The
paper empirically summarises detection performance trends, showing that modern neural
detectors substantially outperform older hand-crafted feature detectors when trained on
representative cover/stego datasets. The survey emphasises the need for embedding designs
to explicitly test against ML detectors. This review is a key empirical reference for evaluation
protocols because it recommends detection curves and ML-aware testing as standard
components of any modern steganography evaluation (De La Croix, Ahmad, & Han, 2024).

2.3.3 “Steganography: Combination of Least Significant Bit (LSB) and Bit-Plane
Complexity Segmentation (BPCS) methods for hiding message on image and
audio” Rizal et al. (2023)
Rizal et al. (2023) experimentally implement a hybrid application that combines LSB and
BPCS techniques to hide messages in both images and audio. Their empirical tests report
successful extraction and acceptable perceptual quality metrics for a variety of payload sizes
and cover types. The study illustrates practical engineering decisions (block sizes,
conjugation bookkeeping) and shows that combining methods can be used to balance
capacity and robustness in constrained environments (Rizal, Rahmatulloh, Widiyasono,
Ruuhwan, & Nursamsi, 2023).

2.3.4 “Image steganography using bit plane complexity segmentation” Htun (2020)
Htun (2020) presents an applied study of classic BPCS embedding on standard image sets,
reporting experiments that evaluate embedding capacity, PSNR/SSIM values, and simple
steganalysis resistance (statistical checks). The paper reaffirms that BPCS, on lossless image
formats, achieves significantly higher embedding capacity relative to LSB while maintaining
acceptable visual fidelity when complexity thresholds are carefully chosen (Htun, 2020).

2.3.5 “Adaptive Steganography Using Improved Bit-Plane Complexity Segmentation”
Abdullah (2024) (representative title from adaptive BPCS literature)
Several recent empirical works propose adaptive or improved BPCS variants that modify the
complexity measure, apply dynamic thresholds, or integrate bit-plane pre-processing to
increase embedding yield and reduce detection risk. Abdullah (2024) (and related 2023–2024
papers) report experiments showing that dynamic thresholding and improved complexity
calculations increase usable block counts and improve PSNR at similar payloads. These
studies typically provide comparative tables showing gains over classical BPCS under
controlled experiments (Adaptive BPCS literature, 2024).

2.3.6 “Deep Learning Based Image Steganalysis” survey & model evaluations (2023–
2024)
Multiple empirical papers since 2020 evaluate CNN-based detectors (SRNet variants, Xu-
Net, Yedroudj-Net) on contemporary stego methods. These papers include empirical
evaluations of detection accuracy as a function of embedding rate and post-processing
(compression, resizing). They consistently show that detectors trained with augmentation
(compression, scaling) maintain higher detection rates against adaptive embedding strategies.
Representative experimental papers and preprints from 2022–2024 provide architectures and
training recipes that can be used as adversarial testbeds in evaluation (e.g., SRNet analyses,
Xu-Net replications).

2.3.7 Domain-specific implementations and case studies (2020–2024)


Several applied papers use BPCS or improved BPCS in domain contexts such as medical
image sharing, IoMT (Internet of Medical Things) security, and covert channels for
multimedia. These empirical studies typically integrate encryption (AES or hybrid
cryptosystems) prior to embedding and report domain-specific metrics such as payload sizes for
DICOM images, robustness to hospital imaging pipelines, or acceptable runtime overheads.
They often conclude that BPCS is suitable when cover formats and transport controls can be
enforced but requires additional error-control engineering for open internet usage (IoMT and
medical image steganography studies, 2021–2024).

2.4 Summary of Related Works


Table 2.1 summarises key empirical studies (2020–2024) that are directly relevant to Bit-
Plane Complexity Segmentation (BPCS) steganography, hybrid encryption + steganography
systems, coverless and video-based adaptations, robustness improvements, IoT/medical
applications, and modern deep-learning steganalysis. Each row reports the author(s) and year,
the primary approach or technique used, the main accuracy/performance outcome reported by
the authors, the study’s principal findings, and the principal limitations. This synthesized
table provides a compact, comparative view of the recent literature and highlights recurring
strengths and gaps that motivate the current study.

Table 2.1 Summary of Related Works


| S/N | Author(s) / Year | Approach / Technique | Accuracy / Performance | Key Findings | Limitations |
|-----|------------------|----------------------|------------------------|--------------|-------------|
| 1 | Debnath et al., 2023 | Coverless video steganography using bit-plane segmentation | Improved robustness to geometric attacks | Demonstrated high resistance to compression and transformations | Lower embedding capacity; needs mapping database |
| 2 | De La Croix et al., 2024 | Deep learning–driven image steganalysis | High ML detection success rates reported | Establishes CNNs as strong adversaries to stego | Survey; no new algorithm |
| 3 | Rizal et al., 2023 | Hybrid LSB + BPCS for multimedia | Acceptable PSNR and capacity | Shows hybrid embedding practical for images + audio | Lacks resilience against advanced steganalysis |
| 4 | Htun, 2020 | Classical BPCS image implementation | High embedding capacity with good PSNR | Reinforces BPCS as superior to LSB in capacity | Sensitive to JPEG compression |
| 5 | Abdullah, 2024 | Adaptive threshold BPCS | Better visual fidelity and block utilisation | Improves capacity–quality balance | Increased computational cost |
| 6 | Alanzy et al., 2023 | Hybrid encryption + steganography | Good imperceptibility after double encryption | Strengthens confidentiality of embedded payloads | Encryption does not improve compression robustness |
| 7 | Debnath et al., 2024 | Bit-plane DC coefficient coverless mapping | More resilient to recompression | Improves security in dynamic environments | Lower payload per frame |
| 8 | Koptyra, 2023 | IoT-based lightweight steganography | Efficient under constrained devices | Introduces hidden channels for IoT | Not BPCS specific |
| 9 | Rostam et al., 2022 | Chaos-based preprocessing + block embedding | Enhanced privacy and robustness | Shows security improvement for IoT images | More complex design and parameters |
| 10 | Deng et al., 2022 | Universal steganalysis CNN | High detection accuracy on benchmarks | Reinforces the need for adversarial evaluation | Performance drops under cover-source mismatch |
| 11 | De La Croix, 2024 | Trends in DL steganalysis | Detection improves with training augmentation | Provides state-of-art baseline | Does not propose counter-techniques |
| 12 | Agarwal et al., 2022 | Steganalysis of context-aware methods | High detection of contextual patterns | Highlights weaknesses of context-rich embedding | Only evaluates limited methods |
| 13 | Magdy et al., 2022 | Medical image steganography review | Acceptable security practices for healthcare | Defines domain-specific requirements | Lacks practical deployment statistics |
| 14 | Abdullah (Series), 2021–2024 | Improved complexity measures for BPCS | More usable complex blocks | Optimises BPCS performance factors | Metadata overhead increases |

2.5 Research Gaps


Despite notable progress in high-capacity bit-plane methods and in the integration of
cryptography with steganography, the recent literature reveals several important shortcomings
that constrain practical deployment and make cross-study comparison difficult. First, the field
lacks a single, standardised evaluation protocol that jointly measures the properties
researchers most often claim: embedding capacity, perceptual quality, robustness to routine
lossy operations (for example JPEG recompression, resizing and cropping), detectability by
modern machine-learning steganalysers, and the runtime and metadata overhead of
bookkeeping mechanisms. Many studies report only a subset of these metrics, so statements
about “high capacity” or “imperceptibility” are frequently hard to interpret outside the
original experimental context (De La Croix, Ahmad, & Han, 2024; Deng, Chen, Luo, & Luo,
2022). This fragmentation hinders reproducibility and prevents meaningful meta-analysis
across different BPCS variants and hybrid systems.

A second gap concerns robustness: while BPCS delivers high usable payloads and good
visual fidelity when lossless cover formats are used, empirical evidence demonstrates that
extraction success and data integrity often degrade sharply under lossy compression and
typical image manipulations (Uchida, Kurahashi, & Kurosawa, 2005; Htun, 2020). The
majority of capacity-focused studies therefore evaluate performance in idealised, lossless
settings, leaving open the question of how to retain practical payloads and reliable recovery
rates when stego images traverse real-world channels (social platforms, messaging services)
that routinely apply recompression and resizing.

Closely related is the problem of adversarial evaluation. The rapid improvement of deep-
learning steganalysis models has shifted the empirical baseline for “detectability.” Techniques
that once evaded handcrafted statistical detectors are increasingly exposed by convolutional
detectors trained on representative cover/stego datasets and augmented transforms (De La
Croix et al., 2024; Deng et al., 2022). However, many recent BPCS and hybrid-system papers
omit rigorous adversarial testing against such modern detectors or only report limited results,
creating a blind spot: claims of low detectability are not fully validated against the most
capable practical adversaries.

Another under-explored area concerns metadata and conjugation bookkeeping. BPCS
depends on conjugation maps and related metadata to ensure that simple payload blocks can
be embedded within complex cover blocks and correctly inverted on extraction. Although
various compression and encoding strategies for these maps have been proposed, there is no
systematic quantification of how metadata size and encoding choices affect net payload,
detectability, and runtime across diverse image families and payload sizes (Li, Zhang, & Guo,
2011; Abdullah, 2024). In practice, large or poorly encoded metadata can substantially reduce
usable capacity and may introduce statistical artifacts that aid detection, yet the literature
presents few comprehensive measurements of these effects.

There is also a shortage of large-scale, domain-aware deployment studies. Applied research in
areas such as medical imaging, the Internet of Medical Things (IoMT), and constrained IoT
devices demonstrates the potential utility of steganographic approaches in domain-specific
workflows, but these demonstrations are often limited in scale and scope and do not fully
address key operational constraints. Issues such as strict fidelity requirements in DICOM
images, regulatory compliance, key management, and platform-level recompression policies
remain underexamined in empirical, end-to-end studies (Magdy et al., 2022; Koptyra, 2023).
Without such studies, it is difficult to judge whether a given BPCS+encryption pipeline is
practically deployable in regulated or resource-constrained environments.

Finally, the community lacks widely used, reproducible benchmark suites tailored for BPCS
research. Unlike other image-processing domains that benefit from common corpora and
standardised transform pipelines, BPCS researchers generally use diverse image sets, custom
transform scripts, and different detector implementations. This absence of a shared
benchmark contributes to inconsistent reporting and slows progress toward comparability and
cumulative improvement (De La Croix et al., 2024).

The present study addresses these gaps by adopting a comprehensive, reproducible evaluation
protocol that reports embedding capacity, PSNR and SSIM, extraction success after
controlled lossy transforms (multiple JPEG quality levels, resizing and cropping), detection
performance against contemporary CNN steganalysers with augmentation, and runtime plus
metadata overhead for conjugation bookkeeping. In addition, the study explores error-control
(forward-error-correction and redundancy) and lightweight metadata compression schemes to
improve robustness, and it includes a small domain case study to surface practical constraints.
All code, parameter settings, and experiment scripts will be made available to support
reproducibility and comparison with future work.

CHAPTER THREE
3.0 RESEARCH METHODOLOGY
This chapter explains how the secured file-sharing system was designed, implemented and
evaluated. It presents the overall research strategy, the threat model that motivates design
decisions, the attacks and mitigation patterns tested, the data and experimental setup,
implementation details, and the metrics and analysis methods used to draw conclusions. The
approach is empirical and engineering-focused: construct an artifact (AES-256 + BPCS
pipeline), measure its behaviour under controlled conditions, and use those measurements to
answer the research questions about capacity, imperceptibility, robustness and detectability.

3.1 Research design


This study follows a design-science, experimental methodology. The objective is not only to
produce a working prototype but to expose the trade-offs inherent in combining cryptography
and BPCS steganography and to evaluate those trade-offs quantitatively. The research
proceeds through three interlinked phases:
Phase 1: Baseline analysis. Implement simple, well-known embedding approaches (for
example, LSB substitution) and validate the measurement framework (image I/O,
PSNR/SSIM computation, steganalysis pipelines). Establishing baselines ensures that later
comparisons with BPCS are meaningful.
Phase 2: System construction. Build the AES-256 + BPCS pipeline as a configurable
artifact. The implementation separates concerns (payload preparation, encryption, bit-plane
decomposition, complexity measurement, conjugation bookkeeping, metadata encoding, and
optional FEC) so individual components can be replaced or tuned without changing the rest
of the system.
Phase 3: Empirical evaluation. Execute systematic experiments across a matrix of
parameters and cover images to quantify embedding capacity, visual fidelity, robustness to
realistic transformations (recompression, resizing, cropping, noise), and detectability by
modern steganalysis methods. Experiments are automated, logged, and repeated to estimate
variability.
Choosing this three-phase design provides clear reasoning: baselines validate measurement,
modular construction supports controlled experiments, and systematic evaluation produces
data-driven conclusions.

3.2 Attack Identification


In secure file-sharing environments, one of the most persistent and practically significant
threats is the interception of data during transmission. When files are exchanged over digital
networks, particularly across public or semi-trusted channels such as the internet, cloud
platforms, or email infrastructure, they are exposed to a range of adversarial actions, including
packet sniffing, man-in-the-middle (MITM) attacks, and unauthorized access through
compromised endpoints or malware.

In conventional systems, encryption is typically the primary line of defense against such
threats. While strong cryptographic algorithms can protect the confidentiality of data,
encrypted files remain visible as high-value targets during transmission and storage.
Attackers may attempt to capture these files for offline analysis, key recovery attempts, or
future exploitation. Moreover, when weak, outdated, or improperly implemented
cryptographic schemes are used, adversaries can sometimes bypass or brute-force the
protection mechanisms. Even when strong encryption is employed, the mere presence of
encrypted data can raise suspicion and attract targeted attacks.

This limitation motivates the use of steganography as a complementary protection
mechanism. Unlike encryption, which focuses on making data unreadable, steganography
aims to conceal the very existence of the data. In particular, Bit-Plane Complexity
Segmentation (BPCS) steganography embeds information within the complex regions of
digital images, making the resulting stego-images visually indistinguishable from ordinary
images and less likely to attract attention or scrutiny.
In this project, the primary attack being addressed is the interception and unauthorized
exposure of sensitive files shared over digital platforms, especially in scenarios such as:
• Cloud storage transfers,
• Email attachments, and
• File sharing over public or untrusted networks.
When files are transmitted without adequate concealment mechanisms, an adversary may
intercept, copy, analyze, or modify them in transit. Recent industry reports indicate that a
significant proportion of data breaches involve data in motion, often through phishing
campaigns, compromised networks, or traffic interception. These observations highlight that
protecting data solely through encryption is not always sufficient, particularly in adversarial
environments where traffic monitoring is common.

Accordingly, the specific attack identified in this study is the interception and analysis of
sensitive files during transmission, leading to unauthorized disclosure or manipulation. By
integrating BPCS steganography with AES-256 encryption, the proposed system aims not
only to make data unreadable but also to make its presence inconspicuous. This shifts
the defensive strategy from merely “protecting the content” to also “hiding the
communication,” thereby reducing the likelihood that the data will be noticed, targeted, or
subjected to further attacks in the first place.
3.3 Attack Pattern Design
To guide the design of effective countermeasures, this study models the typical sequence of
actions an adversary may follow to compromise sensitive files shared over a network when
no steganographic protection is employed. This attack pattern represents a simplified but
realistic lifecycle of data interception and misuse in file-sharing scenarios. Understanding this
sequence helps to justify the integration of both encryption and steganography in the
proposed system.
Attack Pattern: Data Interception in File Sharing
1. Target Scanning and Network Monitoring
The attacker monitors network traffic or scans communication channels to identify
file transfers of interest. This may involve passive packet sniffing on public networks
or active probing of cloud and email services.
2. Man-in-the-Middle (MITM) Setup
The adversary positions themselves between the sender and the receiver, or
compromises an intermediate node, enabling the capture or redirection of transmitted
files without the knowledge of either party.
3. Data Interception and Extraction
The transmitted files are captured and stored by the attacker. If the files are encrypted,
they may be flagged for further cryptanalysis or offline analysis.
4. Payload Analysis or Data Tampering
The attacker attempts to analyze the intercepted data, exploit weak encryption, or
modify the content before forwarding it to the intended recipient, potentially causing
data corruption or malicious injection.
5. Exfiltration or Monetization
Finally, the extracted information may be exfiltrated, sold, leaked, or otherwise
exploited for financial, political, or strategic gain.
This attack pattern illustrates that traditional file-sharing systems primarily expose the
existence of valuable data, even when encryption is applied. The proposed AES-256 + BPCS
approach directly targets the early stages of this attack chain by concealing the presence of
sensitive data within ordinary-looking images, thereby reducing the probability of
interception, analysis, and subsequent exploitation.

3.4 Mitigation Pattern and Explanation


Figure 3.1 illustrates the mitigation pattern implemented in this study. The pattern combines a
proven symmetric-encryption layer (AES-256) with Bit-Plane Complexity Segmentation
(BPCS) steganography so that sensitive files are both unreadable and, crucially,
inconspicuous while in transit. The design objective is to reduce the probability that a transfer
will be noticed and targeted (by hiding existence) while still ensuring confidentiality,
integrity and recoverability (by encrypting and hashing payloads).

Figure 3.1: Mitigation Pattern for Secure File Transfer

Step-by-Step Description of the Mitigation Strategy:
1. Payload preparation and hashing
Compute a cryptographic integrity tag (for example SHA-256 or HMAC) over the original
file and attach this tag to the payload. This enables the receiver to detect tampering after
extraction and before decryption.
2. Encryption (AES-256)
Encrypt the prepared payload (payload + integrity tag) using AES-256 in a suitable
authenticated mode (e.g., GCM) or combine AES with an HMAC. Encryption ensures that
even if an adversary extracts embedded data, the contents remain confidential and
indistinguishable from random noise.
3. Optional forward-error-correction (FEC) and fragmentation
Optionally apply an FEC code (e.g., Reed-Solomon) or fragment the ciphertext across
multiple images. This improves recoverability after lossy transformations (recompression,
cropping) at the cost of some capacity.
4. BPCS embedding with conjugation bookkeeping
Decompose the cover image into bit-planes and partition each plane into fixed-size blocks
(typical baseline: 8×8). Compute a normalized complexity metric for each block and select
only complex (noise-like) blocks for substitution. If a ciphertext block is not complex, apply
a deterministic conjugation transform and record its location in the conjugation map.
Compress the conjugation map (gzip/RLE/bit-pack) and embed it (either in reserved blocks
or distributed across the cover images). Embedding in complex blocks preserves
imperceptibility and reduces statistical artifacts relative to naive LSB replacement.
5. Stego-image transmission
Transmit the stego-image over the network as a normal image file. The image should appear
visually and statistically consistent with ordinary images to lower the chance of triggering
detection systems.
6. Receiver extraction and verification
Extract the conjugation map, recover embedded ciphertext blocks (reverse conjugation where
required), apply FEC decoding if used, verify the integrity tag, and finally decrypt the
ciphertext using the shared AES key. These end-to-end checks ensure integrity and guard
against silent corruption or tampering.
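
The integrity-tag handling in steps 1 and 6 can be sketched as follows. This is a minimal illustration and not the project's implementation: the 8-byte length prefix, the choice of HMAC-SHA256, and the function names `prepare_payload` and `verify_payload` are assumptions made for the example. The AES-256 step is deliberately left out because the Python standard library does not provide AES; in the full pipeline the framed payload would be encrypted before embedding, or an authenticated mode such as GCM would supply the tag.

```python
import hashlib
import hmac

TAG_LEN = 32    # HMAC-SHA256 output size in bytes
LEN_FIELD = 8   # assumed framing: 8-byte big-endian length prefix


def prepare_payload(data: bytes, mac_key: bytes) -> bytes:
    """Frame the file as length || data || HMAC-SHA256(length || data)."""
    header = len(data).to_bytes(LEN_FIELD, "big")
    tag = hmac.new(mac_key, header + data, hashlib.sha256).digest()
    return header + data + tag


def verify_payload(blob: bytes, mac_key: bytes) -> bytes:
    """Return the original data, or raise ValueError if the tag does not match.

    Bytes after the tag (for example block padding) are ignored.
    """
    if len(blob) < LEN_FIELD + TAG_LEN:
        raise ValueError("payload too short")
    header = blob[:LEN_FIELD]
    size = int.from_bytes(header, "big")
    end = LEN_FIELD + size
    if len(blob) < end + TAG_LEN:
        raise ValueError("payload truncated")
    data, tag = blob[LEN_FIELD:end], blob[end:end + TAG_LEN]
    expected = hmac.new(mac_key, header + data, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return data
```

The length prefix is there because embedding works in whole blocks, so the extracted byte stream will usually carry padding that the receiver must be able to discard before checking the tag.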

3.5 Data resources, collection and preprocessing


This subsection defines, with reproducible detail, the datasets and preprocessing steps used
across all experiments. The primary image corpus employed in this study is BOSSbase v1.01,
which is a standard benchmark dataset widely used in steganography research. The images
are categorised according to visual complexity using the same block-complexity measure
adopted in BPCS, as described in Section 3.6. To ensure reproducibility, image selection is
performed using scripts with a deterministic random seed, and a manifest containing the exact
filenames selected for each experimental run is stored and documented. The test payloads
consist of synthetic plaintext files, small text documents, sample PDF files, and small binary
files. Payload sizes are chosen to cover representative operational regimes, specifically 5 KB,
25 KB, 50 KB, and 100 KB. All payloads used in the experiments are either synthetically
generated or obtained from public sample sources, and no personal or sensitive data are
included at any stage of the study.
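
A seeded selection with a stored manifest, as described above, might look like the sketch below. The function names, the manifest layout, the example file names, and the convention 1 KB = 1024 bytes are all assumptions made for illustration; only the payload sizes come from the text.

```python
import json
import random

PAYLOAD_SIZES_KB = (5, 25, 50, 100)  # payload sizes stated in the text


def select_covers(filenames, sample_size, seed):
    """Pick `sample_size` cover files reproducibly and return a manifest dict."""
    pool = sorted(filenames)      # sorting removes dependence on listing order
    rng = random.Random(seed)     # private generator, unaffected by global state
    chosen = sorted(rng.sample(pool, sample_size))
    return {"seed": seed, "sample_size": sample_size, "files": chosen}


def synthetic_payload(size_kb, seed):
    """Generate a reproducible pseudo-random payload of size_kb * 1024 bytes."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size_kb * 1024))


if __name__ == "__main__":
    names = [f"{i}.pgm" for i in range(1, 101)]   # hypothetical file names
    manifest = select_covers(names, sample_size=10, seed=2024)
    print(json.dumps(manifest, indent=2))         # stored alongside each run
```

Sorting the pool before sampling matters: directory listings are not guaranteed to come back in the same order on every platform, and without the sort the same seed could select different files.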

A deterministic preprocessing pipeline is applied to all cover images prior to embedding.
First, all images are converted to lossless formats, specifically 24-bit BMP or PNG, in order
to preserve bit-plane integrity for baseline experiments. The images are then resized to the
canonical experimental resolution of 512 × 512 pixels using bilinear interpolation, with all
resizing parameters explicitly logged to ensure bit-plane alignment and experimental
repeatability. When evaluating robustness to noise, optional denoising is applied using a
Gaussian blur, with the standard deviation and kernel size fully specified. In addition, pixel
encoding is normalised by explicitly defining byte order and channel order so that bit-plane
decomposition remains consistent across different platforms. Finally, all files are validated
using checksums, and any corrupt entries are removed, with a complete log of rejected files
maintained. All scripts, versioned source code, and configuration files used for preprocessing
are stored in the experiment repository, enabling other researchers to reproduce the exact
processed corpus used in this study.

3.6 BPCS embedding and extraction


This section gives the exact operational procedure used for BPCS embedding and extraction
so that implementations are unambiguous and reproducible. Each cover image (or each color
channel for color images) is represented by its eight bit-planes, labeled from P₇ (most-
significant bit) to P₀ (least-significant bit). The BPCS pipeline operates independently on
each bit-plane: each plane is partitioned into non-overlapping square blocks of size m×m,
where the baseline block size used throughout experiments is m = 8 and sensitivity tests are
performed using m = 16.
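
As an illustration of the decomposition step, the following sketch splits an 8-bit grayscale image into its eight bit-planes and walks each plane in non-overlapping m×m blocks. It assumes the image is held as a plain list of rows of integers in the range 0 to 255; the function names are hypothetical, and skipping partial blocks at the image edges is a choice made for the example.

```python
def bit_planes(image):
    """Split an 8-bit grayscale image into binary planes.

    Returns a dict where planes[7] is the most-significant plane (P7)
    and planes[0] is the least-significant plane (P0).
    """
    return {k: [[(px >> k) & 1 for px in row] for row in image]
            for k in range(7, -1, -1)}


def merge_planes(planes):
    """Inverse of bit_planes: rebuild pixel values from the eight planes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][r][c] << k for k in range(8)) for c in range(cols)]
            for r in range(rows)]


def iter_blocks(plane, m=8):
    """Yield (block_row, block_col, block) in row-major order.

    Blocks are non-overlapping m x m tiles; partial tiles at the right and
    bottom edges are skipped in this sketch.
    """
    rows, cols = len(plane), len(plane[0])
    for r in range(0, rows - m + 1, m):
        for c in range(0, cols - m + 1, m):
            yield r // m, c // m, [row[c:c + m] for row in plane[r:r + m]]
```

For a 512 × 512 cover and m = 8, each plane yields 64 × 64 = 4096 blocks, so a single channel offers 8 × 4096 = 32768 candidate blocks before the complexity test is applied.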

For a binary block B of size m×m the block complexity is computed by counting transitions
between adjacent bits in both horizontal and vertical directions. Concretely, let H be the total
number of horizontal transitions (the sum, over every row, of adjacent bit pairs that differ)
and let V be the total number of vertical transitions (the sum, over every column, of adjacent
bit pairs that differ). The maximum possible number of transitions in an m×m block is
2·m·(m − 1). We therefore define the normalized block complexity C(B) as

$$C(B) = \frac{H + V}{2\,m\,(m - 1)}$$
which yields a value in the interval [0, 1]. Blocks whose complexity meets or exceeds a
chosen threshold T are classified as complex and are eligible for substitution by ciphertext
blocks. In the experiments reported in this thesis the threshold T is varied in the range {0.25,
0.30, 0.35, 0.40, 0.45} with a commonly used baseline of T ≈ 0.30–0.40 in line with prior
BPCS work. The complexity measure and thresholding step are applied identically during
embedding and extraction so that block classification is reproducible.
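
The complexity measure defined above translates directly into code. The sketch below assumes a block is a list of m rows, each a list of m bits; the function names and the default threshold of 0.30 (one of the values in the tested range) are illustrative choices.

```python
def block_complexity(block):
    """Normalized complexity C(B) = (H + V) / (2 * m * (m - 1)) of a binary block."""
    m = len(block)
    # H: adjacent bit pairs that differ, summed over every row
    h = sum(row[j] != row[j + 1] for row in block for j in range(m - 1))
    # V: adjacent bit pairs that differ, summed over every column
    v = sum(block[i][j] != block[i + 1][j]
            for i in range(m - 1) for j in range(m))
    return (h + v) / (2 * m * (m - 1))


def is_complex(block, threshold=0.30):
    """A block is eligible for substitution when C(B) >= threshold."""
    return block_complexity(block) >= threshold


if __name__ == "__main__":
    flat = [[0] * 8 for _ in range(8)]
    stripes = [[j % 2 for j in range(8)] for _ in range(8)]
    checker = [[(i + j) % 2 for j in range(8)] for i in range(8)]
    for name, blk in (("flat", flat), ("stripes", stripes), ("checker", checker)):
        print(name, block_complexity(blk))
```

For m = 8 the denominator is 2 · 8 · 7 = 112. A flat block has no transitions (C = 0), vertical stripes have 56 horizontal and 0 vertical transitions (C = 0.5), and a checkerboard reaches the maximum of 112 (C = 1).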

When a ciphertext block to be embedded is itself simple (that is, its internal complexity is
below the required threshold), a deterministic conjugation transform is applied so that the
payload block appears noise-like and can be substituted into a complex cover block without
perceptual mismatch. The conjugation used in experiments is an XOR with a fixed
checkerboard mask M (alternating ones and zeros), although other deterministic masks may
be evaluated. A conjugation flag is recorded for every payload block that is transformed; the
collection of these flags forms the conjugation map, which is required for correct inversion
during extraction.
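
A short sketch of the conjugation step follows. The mask orientation (a 0 in the top-left corner) is an assumption, since the text only specifies alternating ones and zeros. Because every pair of adjacent mask bits differs, XORing with the mask turns each transition into a non-transition and vice versa, so a block with complexity C is mapped to one with complexity 1 − C; applying the mask twice restores the original block.

```python
def checkerboard(m=8):
    """Checkerboard mask M; the top-left bit is assumed to be 0."""
    return [[(i + j) % 2 for j in range(m)] for i in range(m)]


def conjugate(block):
    """XOR a square binary block with the checkerboard mask of the same size."""
    mask = checkerboard(len(block))
    return [[bit ^ mask[i][j] for j, bit in enumerate(row)]
            for i, row in enumerate(block)]


def block_complexity(block):
    """Normalized transition count, repeated here so the sketch runs on its own."""
    m = len(block)
    h = sum(row[j] != row[j + 1] for row in block for j in range(m - 1))
    v = sum(block[i][j] != block[i + 1][j]
            for i in range(m - 1) for j in range(m))
    return (h + v) / (2 * m * (m - 1))


if __name__ == "__main__":
    simple = [[0] * 8 for _ in range(8)]
    simple[0][0] = 1                      # nearly flat payload block
    noisy = conjugate(simple)
    print(block_complexity(simple), block_complexity(noisy))
    print(conjugate(noisy) == simple)     # conjugation is its own inverse
```

One consequence is that conjugation only guarantees a complex result when the threshold T is at most 0.5, which holds for every threshold in the tested range.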

Because the conjugation map can be large relative to the payload and therefore reduce
effective capacity or introduce detectable structure, the map is compressed prior to
embedding. Two lightweight encodings are used experimentally: run-length encoding (RLE)
for sparse flag patterns and general-purpose lossless compression (gzip/DEFLATE) for
denser patterns; variants also test bit-packing to represent flags compactly. The compressed
conjugation map is itself treated as auxiliary data: it may be embedded in designated robust
locations within the same stego-image (for example, reserved high-complexity blocks),
distributed across multiple images to avoid concentration, or split and replicated to increase
recoverability at the cost of capacity.
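
The three encodings mentioned above can be sketched with the standard library alone. The 4-byte flag count, the MSB-first bit order, and the function names are assumptions for the example; `zlib` is used here because it produces a DEFLATE stream, the same algorithm that gzip wraps.

```python
import zlib


def pack_flags(flags):
    """Bit-pack conjugation flags, 8 per byte (MSB first), after a 4-byte count."""
    out = bytearray(len(flags).to_bytes(4, "big"))
    for i in range(0, len(flags), 8):
        chunk = flags[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | int(bit)
        out.append(byte << (8 - len(chunk)))   # left-align a short final chunk
    return bytes(out)


def unpack_flags(data):
    """Inverse of pack_flags."""
    count = int.from_bytes(data[:4], "big")
    bits = [bool((byte >> k) & 1) for byte in data[4:] for k in range(7, -1, -1)]
    return bits[:count]


def rle_encode(flags):
    """Run-length encode flags as a list of (value, run_length) pairs."""
    runs = []
    for flag in flags:
        if runs and runs[-1][0] == flag:
            runs[-1][1] += 1
        else:
            runs.append([flag, 1])
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Inverse of rle_encode."""
    return [value for value, length in runs for _ in range(length)]


def compress_map(flags):
    """Bit-pack, then DEFLATE-compress, the conjugation map."""
    return zlib.compress(pack_flags(flags), 9)


def decompress_map(blob):
    """Inverse of compress_map."""
    return unpack_flags(zlib.decompress(blob))
```

Bit-packing alone already brings the map down to one bit per payload block. Whether RLE or DEFLATE helps beyond that depends on how the flags are distributed, which is why comparing the encodings on the actual maps is more informative than picking one in advance.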

Block selection for embedding follows a deterministic, reproducible procedure. Candidate
complex blocks are enumerated in a fixed scan order (row-major within each plane) and a
pseudo-random selection derived from a global experiment seed is used when the number of
eligible blocks exceeds the number required for the payload. This seeded selection provides
randomness for cover distribution while preserving exact reproducibility: the seed, selection
algorithm, and block ordering are recorded in the experiment manifest for each run.
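
One way to realise the seeded selection is sketched below. The candidate representation as (plane, block_row, block_col) tuples, the function name, and the decision to return the chosen blocks in scan order are assumptions for illustration; the text fixes only that the selection is seeded and reproducible.

```python
import random


def select_blocks(candidates, needed, seed):
    """Choose `needed` blocks from `candidates` (already in fixed scan order).

    When every candidate is required, all are used in scan order; otherwise a
    seeded sample is drawn and returned in scan order (an assumed convention).
    """
    if needed > len(candidates):
        raise ValueError("not enough complex blocks for this payload")
    if needed == len(candidates):
        return list(candidates)
    rng = random.Random(seed)                    # independent of global RNG state
    picked = rng.sample(range(len(candidates)), needed)
    return [candidates[i] for i in sorted(picked)]


if __name__ == "__main__":
    # Hypothetical candidates: (plane, block_row, block_col) of complex blocks
    candidates = [(0, r, c) for r in range(4) for c in range(4)]
    print(select_blocks(candidates, needed=5, seed=1234))
```

Sender and receiver obtain the same block list only if they share the seed, agree on the number of payload blocks, and enumerate the same candidates, so all three belong in the experiment manifest.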

The embedding workflow therefore proceeds as follows. The payload is first prepared
(integrity tag appended, AES-256 encrypted, and optionally FEC-encoded and fragmented),
then partitioned into blocks sized to match the embedding unit. For each payload block, a
target complex cover block is chosen according to the deterministic selection policy; if the
payload block is simple it is conjugated and the conjugation map updated. After all payload
blocks are placed, the conjugation map is compressed and embedded according to the chosen
metadata strategy. The resulting stego-image is written in a lossless format and logged with
full provenance information including the conjugation-map size and the exact list of block
indices used.

Extraction reverses the embedding steps in a symmetric manner. The receiver decomposes the
received stego-image into bit-planes and blocks, computes per-block complexity and
enumerates the same candidate sequence of complex blocks (using the shared seed when
pseudo-random selection was used). Embedded blocks are extracted in the recorded order;
the compressed conjugation map is recovered and decompressed, and any conjugated payload
blocks are XORed with the mask M to restore their original bit patterns. If FEC was applied,
decoding occurs at this stage to correct errors introduced by lossy transformations; integrity
verification is then performed by checking the appended hash/HMAC. Only when integrity
verification succeeds is the ciphertext decrypted with the shared AES key to recover the
original payload.
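The conjugation operation used during embedding and undone during extraction can be illustrated with a short NumPy sketch. It assumes that the mask M is an m×m checkerboard whose top-left bit is 0; the helper names `checkerboard` and `conjugate` are hypothetical.

```python
import numpy as np

def checkerboard(m):
    """m x m mask of alternating bits; top-left bit is 0 (assumed convention)."""
    rows, cols = np.indices((m, m))
    return ((rows + cols) % 2).astype(np.uint8)

def conjugate(block):
    """XOR a square binary block with the checkerboard mask."""
    return np.bitwise_xor(block, checkerboard(block.shape[0]))

simple = np.zeros((8, 8), dtype=np.uint8)   # a flat, low-complexity payload block
embedded = conjugate(simple)                # what the sender writes into the cover
restored = conjugate(embedded)              # what the receiver recovers
```

Since XOR with a fixed mask is its own inverse, the receiver needs only the conjugation map (which blocks were conjugated) to restore the original bit patterns.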

3.7 Feature extraction and machine-learning verification


Detectability is evaluated with a supervised machine-learning detector that is integrated into
the experimental pipeline to provide a quantitative verification module. The detector’s
purpose is twofold: first, to measure how readily stego images created under different
parameter settings can be distinguished from cover images; and second, to serve as a
reproducible verification check in the prototype pipeline. To support these aims, the
methodology defines a compact, interpretable feature set, a baseline classifier with a clear
rationale, and a training and validation protocol designed to avoid cover-source mismatch and
to report robust performance statistics.
Feature engineering produces a single feature vector for each image (cover or stego). The
vector contains several descriptors drawn from bit-plane statistics, embedding metadata and
perceptual measures. Specifically, the pipeline computes the entropy of each bit-plane to
capture changes in information distribution across significance levels; it records the fraction
of eligible (complex) blocks that were used for embedding and the average complexity of
those replaced blocks as direct indicators of embedding intensity; it derives the first three
histogram moments (mean, variance, skewness) of luminance to capture gross photometric
alterations; it computes a local variance or noise estimate by averaging patch-level standard
deviations to capture subtle texture changes; and it includes global fidelity proxies such as
PSNR and SSIM where post-embedding comparisons are available. All feature extraction
routines are implemented with OpenCV and scikit-image, and the exact function calls,
parameter values and library versions are recorded in the experiment log to guarantee
reproducibility.
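Two of the descriptors above, per-plane entropy and the luminance moments, can be sketched in plain NumPy. The function names are illustrative assumptions, the study's own routines are implemented with OpenCV and scikit-image, and the embedding-metadata and PSNR/SSIM features are omitted here.

```python
import numpy as np

def bitplane_entropy(gray):
    """Shannon entropy (bits) of each of the 8 bit-planes of a uint8 image."""
    entropies = []
    for k in range(8):
        p1 = float(((gray >> k) & 1).mean())            # fraction of ones in plane k
        probs = [p for p in (p1, 1.0 - p1) if p > 0.0]  # skip zero-probability terms
        entropies.append(float(-sum(p * np.log2(p) for p in probs)))
    return entropies

def luminance_moments(gray):
    """Mean, variance and skewness of the pixel intensities."""
    x = gray.astype(np.float64).ravel()
    mean, var = x.mean(), x.var()
    skew = 0.0 if var == 0 else float(((x - mean) ** 3).mean() / var ** 1.5)
    return float(mean), float(var), skew

def feature_vector(gray):
    """Concatenate the 8 entropies and 3 moments into one vector."""
    return np.array(bitplane_entropy(gray) + list(luminance_moments(gray)))

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
features = feature_vector(image)   # 11 values per image in this reduced sketch
```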

Random Forest is chosen as the baseline detector because it handles mixed feature types
robustly, offers straightforward interpretability through feature importance measures, and has
modest computational cost relative to deep models. The Random Forest hyperparameters
explored during model selection include the number of trees n ∈ {50, 100, 200} and
maximum tree depth max_depth ∈ {10, 20, None}. Training follows a stratified data split
with 70% of examples used for training, 20% reserved for testing, and 10% held out for
validation; within the training partition, hyperparameter tuning is performed using stratified
10-fold cross-validation to avoid overfitting and ensure stable estimates. Performance is
reported comprehensively: accuracy, precision, recall, F1-score, and ROC/AUC for
probability-producing detectors are presented alongside confusion matrices; calibration
curves and selected operating points (thresholds) are included when practical to translate
detector scores into actionable alerts. Final models are persisted using joblib with versioned
filenames that embed the parameter settings and dataset manifest, and feature importance
reporting is retained with the saved model to support post-hoc analysis of which descriptors
most influence detectability.
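The 70/20/10 stratified partition can be sketched with the standard library alone. The function name, the rounding rule and the toy label list are assumptions made for illustration; the sketch does not reproduce the study's own split or the cross-validation step.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=0):
    """Return (train, test, validation) index lists that preserve class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test, val = [], [], []
    for label in sorted(by_class):            # fixed class order keeps the split reproducible
        members = by_class[label]
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_test = round(fractions[1] * len(members))
        train += members[:n_train]
        test += members[n_train:n_train + n_test]
        val += members[n_train + n_test:]     # remainder goes to validation
    return train, test, val

labels = ["cover"] * 100 + ["stego"] * 100
train_idx, test_idx, val_idx = stratified_split(labels, seed=42)
```

Splitting each class separately keeps the cover/stego ratio identical in all three partitions, which is the property the protocol relies on when comparing detectors.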
To assess robustness against stronger adversaries, selected experiments also evaluate modern
convolutional neural network (CNN) steganalyzers. These adversarial-detector experiments
require larger training sets and augmentation to avoid overfitting; therefore the methodology
prescribes augmentation regimes that include recompression, resizing and mild noise
transforms to emulate likely real-world variability. CNN training includes standard best
practices such as training/validation splits, early stopping, learning-rate schedules, and
monitoring for overfitting; GPU acceleration is used for these experiments and hardware
requirements are documented in the appendix. For both Random Forest and CNN
experiments, special care is taken to avoid cover-source mismatch: separate cover collections
are held out as independent test sets and class balance is maintained during sampling. All
training runs log random seeds, hyperparameters, training curves and final model checkpoints
so that results can be reproduced exactly.

Finally, the verification module is integrated into the experimental reporting pipeline:
detector performance is evaluated across the full parameter sweep (complexity threshold,
block size, metadata encoding, embedding rate) and reported as detection curves (AUC vs.
embedding rate), along with tabulated metrics at representative operating points. Where
detectors indicate elevated detection risk, those parameter regions are annotated and used to
inform recommendations about safe operational embedding settings.

3.8 Module Integration


Figure 3.2 shows the overall system architecture and how the main components are combined
into a single, reproducible pipeline. The integration is intentionally modular so that each
component (preprocessing, embedding, encryption, verification, ETL and the optional user
interface) can be developed, tested and replaced independently while preserving clear data
contracts between modules.

Figure 3.2: Python integration of image processing, encryption, ML, and ETL modules.

At the start of the pipeline, images are ingested and normalized by the image-processing
module. This module is implemented in Python using OpenCV and Pillow, and exposes a
small set of deterministic functions: load_image(path) → Image, preprocess(image, config)
→ image, and decompose_bitplanes(image) → planes. These functions perform lossless
conversion, canonical resizing, channel/byte-order normalization and the bit-plane
decomposition required by BPCS. The resulting bit-planes and per-block complexity map are
the canonical inputs to the embedding routine.
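A minimal NumPy sketch of the bit-plane decomposition step is given below. It assumes an 8-bit greyscale image in pure binary coding, with plane 0 holding the least significant bit; `recompose_bitplanes` is a hypothetical inverse added only to show that the decomposition is lossless.

```python
import numpy as np

def decompose_bitplanes(gray):
    """Split a uint8 image into 8 binary planes; index 0 is the least significant bit."""
    return np.stack([(gray >> k) & 1 for k in range(8)]).astype(np.uint8)

def recompose_bitplanes(planes):
    """Inverse of decompose_bitplanes: weight plane k by 2**k and sum."""
    weights = (1 << np.arange(8)).reshape(8, 1, 1)
    return (planes.astype(np.uint16) * weights).sum(axis=0).astype(np.uint8)

gray = np.array([[0, 1], [128, 255]], dtype=np.uint8)
planes = decompose_bitplanes(gray)          # shape (8, height, width)
rebuilt = recompose_bitplanes(planes)       # identical to the input image
```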

Encryption is performed as a separate, well-contained stage prior to embedding. The payload
preparation routine computes an integrity tag, applies AES-256 encryption via the Python
cryptography library (or PyCryptodome in alternative builds), and optionally applies
forward-error-correction and fragmentation. The encryption module exposes deterministic I/O
(for example, encrypt_payload(payload, key) → ciphertext, metadata) and records the exact
parameters used (cipher mode, IVs, nonce values) in the provenance log to support
reproducibility and forensic auditing.
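The integrity-tag part of payload preparation can be sketched with the standard library. The example assumes an HMAC-SHA256 tag appended to the payload and a separate MAC key; the helper names and the demonstration key are hypothetical, and the AES-256 step is deliberately left out because it depends on a third-party library.

```python
import hashlib
import hmac

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag

def append_tag(payload: bytes, mac_key: bytes) -> bytes:
    """Return the payload followed by its HMAC-SHA256 tag."""
    tag = hmac.new(mac_key, payload, hashlib.sha256).digest()
    return payload + tag

def verify_and_strip(tagged: bytes, mac_key: bytes) -> bytes:
    """Check the trailing tag and return the payload, or raise on mismatch."""
    payload, tag = tagged[:-TAG_LEN], tagged[-TAG_LEN:]
    expected = hmac.new(mac_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # constant-time comparison
        raise ValueError("integrity check failed")
    return payload

mac_key = b"demo-key-for-illustration-only"
tagged = append_tag(b"secret report", mac_key)
recovered = verify_and_strip(tagged, mac_key)
```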

The BPCS embedding module consumes the ciphertext and the precomputed complexity
map. It implements the block-selection, conjugation and compressed conjugation-map
embedding strategies described in §3.6 and provides a small API such as embed(ciphertext,
planes, config, seed) → stego_image, embed_metadata. The embedding function returns both
the stego-image and a compact provenance object that records block indices used,
compressed metadata size, runtime and a checksum. Importantly, the embedding module
separates the policy for selecting embedding locations from the low-level bit operations so
that selection policies (deterministic scan, seeded pseudo-random selection, or prioritized
high-complexity selection) can be swapped without changing the embedding primitives.

For verification and detectability assessment, the machine-learning module is invoked after
embedding (and optionally after extraction). It exposes extract_features(image) →
feature_vector and classify(features, model) → score, label. The production baseline uses a
trained Random Forest persisted with joblib; adversary-grade experiments can call a CNN
inference routine if a GPU is available. All model inferences and feature vectors are logged
with their model version and seed to ensure traceability of results.

Operational logging and ETL are handled by a lightweight telemetry module that
standardizes metadata produced by each step into a structured JSON record. After embedding
or extraction, these records (containing provenance, metric values, runtime, and checksums)
are exported to Apache NiFi where extract-transform-load flows convert them into CSV
artifacts and push them into the analytics layer. Power BI is used as the visualization front
end in our experimental setup; the NiFi→CSV→PowerBI pipeline supports interactive
dashboards for capacity, PSNR/SSIM distributions, recovery rates, and detector performance
over parameter sweeps. All ETL transformations are versioned scripts so the dashboard can
be regenerated from raw logs.
Exception handling and data integrity are enforced at module boundaries. Each major
function returns a status code and an auditable provenance object; failures trigger controlled
rollback or retry policies (for example, if embedding cannot place the full payload in a single
image, the pipeline will either fragment the payload automatically or raise a deterministic
error recorded in the log). Cryptographic keys are never written to logs; only non-sensitive
key identifiers and key-usage metadata are recorded. For experiments that require key
exchange, the pipeline assumes a secure out-of-band channel for key provisioning and
records only the key ID and key derivation parameters used.
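A provenance record of the kind described above might look like the following sketch. The field names, the key identifier and the sample values are illustrative assumptions rather than the prototype's actual schema; the point shown is that only a key identifier, never key material, enters the record.

```python
import hashlib
import json

def provenance_record(stage, key_id, block_indices, output_bytes, runtime_ms):
    """Build a JSON-serialisable record; the key itself is never passed in."""
    return {
        "stage": stage,
        "key_id": key_id,                                   # identifier only
        "blocks_used": len(block_indices),
        "block_indices": list(block_indices),
        "sha256": hashlib.sha256(output_bytes).hexdigest(), # checksum of the output
        "runtime_ms": runtime_ms,
    }

record = provenance_record("embed", "demo-key-01", [3, 17, 42], b"stego-bytes", 12.5)
line = json.dumps(record, sort_keys=True)   # one structured line per pipeline step
```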

Performance was a consideration in integration. Embedding and extraction routines are
implemented to allow batch processing of images, and heavy computations (complexity
scoring, CNN training/inference) can be parallelized across CPU cores or offloaded to GPU
where available. The integration layer exposes simple concurrency primitives so the same
codebase can run as a single-threaded experimental script or as a multi-worker pipeline for
large-scale runs. Measured throughput (KB/s) and peak memory are captured in the
provenance for each run to allow a practical assessment of deployability.

Finally, a simple Tkinter prototype GUI was developed to demonstrate how the core
functions can be invoked from a front end. The GUI is intentionally lightweight: it calls the
same core APIs used by the scripts (load_image, encrypt_payload, embed, extract, classify)
and only mediates user input and event handling. Because the GUI uses the same APIs, it
does not introduce new logic or attack surface in experiments; it serves as a usability
demonstrator rather than a production interface.

Module integration follows a layered, interface-driven design where each component
publishes a small, well-documented API, returns an auditable provenance record, and logs
deterministic seeds and versions. This organization supports reproducible experiments,
enables controlled substitution of alternative algorithms (for example, alternate complexity
measures or different classifiers), and provides the operational metadata required to evaluate
capacity, imperceptibility and detectability across the full parameter space. The codebase and
integration tests are included in the experiment repository so reviewers can exercise the same
module interactions used in the reported experiments.

3.9 Experimental design, parameter matrix and test matrix


The empirical evaluation uses a factorial design covering key parameters:
• Complexity threshold T: 0.25, 0.30, 0.35, 0.40, 0.45.
• Block size m: 8, 16.
• Payload size: 5 KB, 25 KB, 50 KB, 100 KB.
• Cover complexity tier: low, medium, high (determined by precomputed distribution of
block complexities).
• Cover format: lossless baseline (PNG/BMP); lossy transforms applied in robustness
tests (JPEG Q=90,70,50).
• Error control: none, light FEC, heavier FEC.
• Conjugation map encoding: none, RLE, entropy coding.
Each experimental condition is run on multiple images (n≥30 per condition) and repeated to
gather means and variances. Statistical analysis plans (ANOVA or non-parametric
equivalents, post-hoc tests, effect sizes) are recorded in the methodology so inference is pre-
specified.
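The size of this design can be checked by enumerating the grid. The sketch below assumes the six listed factors are fully crossed and treats cover format and lossy transforms as a separate robustness dimension; the dictionary keys are hypothetical names.

```python
from itertools import product

grid = {
    "threshold": [0.25, 0.30, 0.35, 0.40, 0.45],
    "block_size": [8, 16],
    "payload_kb": [5, 25, 50, 100],
    "cover_tier": ["low", "medium", "high"],
    "fec": ["none", "light", "heavy"],
    "map_encoding": ["none", "rle", "entropy"],
}

# One dict per experimental condition, in a fixed and reproducible order.
conditions = [dict(zip(grid, values)) for values in product(*grid.values())]
min_runs = len(conditions) * 30   # at least 30 images per condition
```

Under these assumptions the grid contains 5 × 2 × 4 × 3 × 3 × 3 = 1080 conditions, so a full crossing implies at least 32,400 image-level runs before any transform is applied.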

3.10 Evaluation metrics and statistical methods


The methodology defines precisely how each evaluation axis is measured:
• Embedding capacity: number of payload bytes embedded per cover image and
payload/cover ratio. Report both raw embedded bytes and net usable payload after
metadata and FEC overhead.
• Imperceptibility: PSNR (dB) and Structural Similarity Index (SSIM) measured
between cover and stego images. Acceptable thresholds are stated (e.g., PSNR>30 dB
commonly considered acceptable; SSIM>0.90 as strong similarity), but analysis treats
these as continuous measures.
• Robustness: extraction success rate (%) after each defined transform (e.g., JPEG
Q=70). When FEC is used, measure both raw extraction success and post-FEC
corrected success.
• Detectability: detection rate, ROC/AUC and false positive rates for each detector;
report detection at fixed false positive operating points to enable comparison.
• Performance overhead: time per embed/extract operation (ms), memory usage and
metadata size (bytes).
Statistical reporting includes mean ± standard deviation, confidence intervals, tests for
significance across parameter conditions (with correction for multiple comparisons), and
regression models to quantify trade-offs (e.g., payload size vs detection probability). All
analysis scripts are parameterised and saved.
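For the imperceptibility axis, PSNR follows directly from the mean squared error between cover and stego images. The sketch below assumes 8-bit images (peak value 255); the function name is illustrative.

```python
import numpy as np

def psnr(cover, stego, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

cover = np.full((8, 8), 100, dtype=np.uint8)
stego = cover.copy()
stego[0, 0] ^= 1             # flip one least-significant bit
value = psnr(cover, stego)   # MSE is 1/64 for this pair
```

Casting to floating point before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise introduce.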

3.11 Implementation plan, software and reproducibility


Implementation environment. The prototype is implemented in Python (3.8+). Core
libraries: OpenCV, NumPy, scikit-image, scikit-learn, cryptography, pandas, joblib. Scripts
are run in a controlled conda environment with an environment file capturing exact
package versions.
Orchestration and logging. Apache NiFi (or equivalent orchestration tool) is used to
automate experiment runs, but all experiments can be executed by a single-run Python script
with a JSON configuration. Each run generates a run directory containing: (a) copy of
configuration file, (b) raw outputs (stego images, extracted payloads), (c) logs, (d) derived
metrics (PSNR/SSIM files), and (e) model artifacts.

Repository and artifacts. All code, experiment manifests, and (where licensing permits) data
manifests are published to a versioned repository. Large datasets are referenced by canonical
links (e.g., BOSSbase mirror) and checksums so others can obtain identical inputs. The
methodology requires documentation of hardware (CPU/GPU, RAM), OS and Python
versions.
Reproducibility practices. Deterministic seeds for random sampling, explicit logging of
configuration per run, and automated generation of experiment reports are required. The
methodology prescribes publishing an experiment manifest and a reproducibility checklist
with each reported result in Chapter Four.

3.12 Ethical considerations and governance


The methodological approach incorporates ethical safeguards:
• Only public benchmark or synthetic payloads are used; no personal or sensitive
real-world user data are processed.
• Experiments are confined to a controlled laboratory environment and do not attempt
to bypass lawful monitoring or facilitate illicit activity.
• Code and artifacts released for reproducibility exclude private keys and real payloads;
sample keys and instructions for generating test payloads are provided instead.
• The study documents data retention policies and secure deletion routines for
experimental artifacts to prevent accidental leakage.
These practices are documented in the repository README and in an ethics appendix.

CHAPTER FOUR
IMPLEMENTATION, RESULT AND DISCUSSION
4.1 Implementation
This chapter presents the practical realization of the secure file-sharing system proposed in
this study and discusses the experimental results obtained from its evaluation. The
implementation phase translates the conceptual architecture and algorithms described in the
previous chapters into a functional, reproducible software prototype. The system was
developed in Python due to the availability of mature libraries for image processing,
cryptography, data handling and machine learning, as well as its portability across common
research platforms.

To ensure reproducibility, the development environment was fully versioned, including the
operating environment and all library dependencies. All input images were normalised to a
fixed resolution before embedding in order to eliminate inconsistencies caused by varying
source image sizes and formats. The entire system was organised as a modular pipeline with
clearly defined interfaces between components, allowing individual modules to be tested,
replaced or extended without affecting the rest of the system.

At the highest level, the pipeline begins with payload preparation and encryption. The secret
file is first processed to compute an integrity checksum and then encrypted using AES-256 in
an authenticated mode, ensuring both confidentiality and integrity protection. The encryption
stage produces a ciphertext stream together with the required cryptographic parameters,
which are stored in the experiment metadata for traceability. The encrypted data is then
segmented into fixed-size blocks that match the block structure used by the BPCS embedding
process. An optional redundancy stage can be enabled at this point when robustness against
lossy transformations is being evaluated.

The BPCS module is responsible for bit-plane decomposition, block segmentation and
complexity analysis. Each cover image is decomposed into its constituent bit-planes, and
each plane is divided into non-overlapping blocks. For every block, a complexity value is
computed based on the number of horizontal and vertical bit transitions, and this value is
normalised to fall within a fixed range. Blocks whose complexity exceeds the chosen
threshold are marked as eligible for data substitution. All parameters used in this process,
including block size and complexity threshold, are recorded for each experimental run.
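The complexity measure and the eligibility test can be sketched as follows. The normalisation by 2·m·(m−1), the maximum number of adjacent-bit transitions in an m×m block, is the conventional BPCS choice and is assumed here; the function names are hypothetical.

```python
import numpy as np

def block_complexity(block):
    """Border-based complexity of a binary m x m block, normalised to [0, 1]."""
    m = block.shape[0]
    horizontal = np.count_nonzero(block[:, 1:] != block[:, :-1])
    vertical = np.count_nonzero(block[1:, :] != block[:-1, :])
    return (horizontal + vertical) / (2 * m * (m - 1))   # a checkerboard scores 1.0

def eligible_blocks(plane, m, threshold):
    """Row-major list of (row, col) origins whose complexity exceeds the threshold."""
    h, w = plane.shape
    return [(r, c)
            for r in range(0, h - m + 1, m)
            for c in range(0, w - m + 1, m)
            if block_complexity(plane[r:r + m, c:c + m]) > threshold]

rows, cols = np.indices((8, 16))
plane = ((rows + cols) % 2).astype(np.uint8)
plane[:, 8:] = 0                      # left half checkerboard, right half flat
hits = eligible_blocks(plane, m=8, threshold=0.30)
```

In this toy plane only the checkerboard block qualifies, which mirrors how flat image regions are excluded from substitution.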

When a payload block has low complexity and would introduce visible artefacts if embedded
directly, the system applies conjugation using a fixed checkerboard pattern. The fact that a
block has been conjugated is recorded in a conjugation map. To minimise metadata overhead,
this map is compressed using a lightweight encoding scheme before being embedded into
reserved high-complexity regions of the image. The embedding process then replaces
selected complex blocks with payload blocks and reconstructs the stego image. The final
output of this stage is the stego image together with a detailed provenance record containing
payload size, number of blocks used, metadata size and runtime.
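One possible lightweight encoding for the conjugation map is run-length encoding, which Chapter Three lists among the options. The sketch below is an assumption about how such a map could be represented (one 0/1 flag per embedded block) and is not the prototype's exact format.

```python
def rle_encode(bits):
    """Encode a list of 0/1 flags as (first_bit, run_lengths)."""
    if not bits:
        return 0, []
    runs, current, length = [], bits[0], 0
    for bit in bits:
        if bit == current:
            length += 1
        else:
            runs.append(length)          # close the finished run
            current, length = bit, 1
    runs.append(length)
    return bits[0], runs

def rle_decode(first_bit, runs):
    """Rebuild the flag list; runs alternate between the two bit values."""
    bits, value = [], first_bit
    for length in runs:
        bits.extend([value] * length)
        value ^= 1
    return bits

conjugation_map = [0] * 40 + [1] * 3 + [0] * 21   # one flag per embedded block
first, runs = rle_encode(conjugation_map)
```

Run-length encoding only reduces metadata when the flags form long runs; a map that alternates frequently would be better served by the uncompressed or entropy-coded alternatives.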

The extraction process is implemented as the exact inverse of embedding. The stego image is
decomposed into bit-planes, the reserved regions are read to recover and decode the
conjugation map, and any conjugated blocks are restored to their original form. The
ciphertext stream is then reassembled and verified using the stored integrity information
before decryption. To guarantee correctness, the system performs a hash comparison between
the original payload and the extracted file for every successful run, and the result is written to
the experiment log.

All experiments are managed by an orchestration layer driven by structured configuration
files. Each configuration specifies the cover image set, payload size, embedding parameters,
optional post-processing operations (such as recompression or noise addition), and random
seeds. The orchestration layer executes the full pipeline, collects outputs, computes quality
metrics such as PSNR and SSIM, and stores all artefacts and logs in a dedicated run directory.
This directory serves as the complete and authoritative record of each experiment, making it
possible to reproduce any reported result exactly.
The machine-learning verification component is implemented as a separate module. After
embedding or extraction, a feature-extraction routine computes descriptive statistics from
each image, including bit-plane entropy, block usage ratios and intensity distribution
measures. These features are stored in tabular form and used to train and evaluate a Random
Forest classifier that distinguishes between clean and stego images. The trained model, its
parameters and its evaluation metrics are saved alongside the experiment logs. This
separation ensures that the detection component can be evaluated independently of the
embedding process while still being integrated into the overall pipeline.

From a performance perspective, the implementation prioritises clarity and correctness while
still applying basic optimisations. Bit-plane operations and complexity calculations are
vectorised using numerical libraries to avoid unnecessary overhead. On the development
hardware, typical embedding and extraction operations complete within seconds for standard
image sizes. Execution times, memory usage and system specifications are logged for each
run to ensure that performance claims are supported by measured data. For large
experimental sweeps, the orchestration layer supports parallel execution across multiple CPU
cores to reduce total runtime.
Figure 4.1 illustrates the interaction between the main software modules and shows how
encryption, BPCS embedding, feature extraction and logging are combined into a single
coherent pipeline.

Figure 4.1: Code snippet demonstrating integration of core modules

4.2 Experimental setup and test plan


The experimental phase was organised to exercise the prototype across a broad but well-
controlled range of realistic conditions so that conclusions about capacity, imperceptibility,
robustness and detectability are statistically meaningful and reproducible. At a high level, the
test plan compares three behaviours: a baseline using classical LSB embedding, the BPCS
pipeline under lossless (baseline) conditions, and the BPCS pipeline under adversarial post-
processing (lossy recompression, geometric transforms and added noise). Every experiment is
driven by a single JSON configuration that records the cover image identifiers, payload file
and size, embedding parameters, optional error-correction settings and the deterministic
random seed; an orchestration layer consumes that configuration, executes the embed–
transform–extract cycle, and archives a complete run directory for later analysis.

The cover corpus for experiments is drawn from the BOSSbase v1.01 benchmark. For the
LSB baseline fifty images were selected at random using a fixed seed; these fifty images
form the canonical set used to demonstrate the limits of naive spatial embedding for
encrypted payloads. For the full BPCS parameter sweeps a larger, stratified corpus was used
to represent different content families: images were assigned to low, medium and high
complexity tiers using the same per-block complexity metric defined in Chapter Three, and
for each experimental condition at least thirty images from each tier were processed so that
per-condition statistics reflect content variability rather than idiosyncrasies of particular
images. Payloads were synthetic or publicly available sample files in four target sizes (≈5
KB, 25 KB, 50 KB, 100 KB) so experiments cover both light and heavy embedding regimes.
In all cases the payloads were encrypted with AES-256 prior to embedding, meaning that
ciphertext, not cleartext, was embedded, which replicates the realistic use case of protecting
content before concealment.
Each experimental run is specified by a compact parameter set. The primary BPCS factors
swept in experiments were block size (8×8, 16×16), complexity threshold (T = 0.25, 0.30,
0.35, 0.40, 0.45), payload size, and forward-error-correction level (0%, 10%, 20%
redundancy). To probe robustness, post-processing transforms were applied deterministically
according to the run configuration: JPEG recompression at quality factors 90, 70 and 50;
downscaling to 50% of original dimensions; cropping a 10% border; and additive Gaussian
noise at two severity settings. These transforms were applied both in isolation and in
controlled combinations so the impact of single and compound transforms could be separated
in analysis. All transform parameters are recorded in the run manifest so any reported
extraction failure can be traced to exact conditions.

Detection and verification experiments were designed to generate training and evaluation
data in a way that avoids cover-source mismatch and preserves representative class balance.
Feature vectors were computed for both clean and stego images using the same routines
described in Chapter Three, and datasets were partitioned with a stratified split that preserves
the proportions of complexity tiers and payload densities. The default split used for detector
experiments was 70% training, 20% test and 10% validation; hyperparameter tuning relied on
ten-fold cross-validation within the training fold. Random Forest hyperparameters were
explored over modest grids and the final models were persisted along with cross-validation
metrics and confusion matrices. When convolutional neural steganalysers were used for
adversarial evaluation, training followed the same stratification but additionally applied an
augmentation policy (recompression and scaling variants) to model realistic variability; GPU
resources were reserved for these runs and augmentation details are recorded in each run
manifest.
To characterise run-to-run variability, every experimental condition was executed multiple
times under controlled randomness. Deterministic decisions use an explicit random seed, and
stochastic steps that could introduce non-determinism (for example, multi-threaded IO) were
repeated with multiple seeds where needed. For the factorial grid described above, every
combination of block size, threshold, payload size and FEC level was evaluated on at least
thirty image instances; a subset of more computationally expensive combinations (for
example, heavy FEC with severe recompression and a CNN adversary) was repeated five
times to produce robust variance estimates. Summary statistics reported in this chapter
therefore present means together with standard deviations and, where appropriate, confidence
intervals.
Measurement and logging were designed to support rigorous statistical analysis. For each run
the orchestration layer computes perceptual metrics (PSNR and SSIM), an extraction success
flag (binary), round-trip integrity (hash match between original payload and decrypted
payload), conjugation map size (bytes), net usable payload after metadata and FEC overhead,
embedding and extraction times (ms), and detector outputs (probabilities and binary
decisions). These per-run metrics are written as timestamped CSV records and linked to the
archived raw artefacts (stego images, extracted payloads and configuration JSON). The
statistical analysis plan, prepared before executing experiments, specifies parametric tests
(ANOVA) when assumptions are met and non-parametric alternatives (Kruskal–Wallis)
otherwise; significant omnibus tests are followed by post-hoc comparisons with correction
for multiple testing. Regression models are used to characterise continuous trade-offs, for
example by modelling detection probability as a function of payload size, complexity threshold
and conjugation density. All analysis scripts are reproducible notebooks saved in the
repository so the same statistical results can be regenerated from the archived outputs.

Hardware and software environments were treated as explicit experimental factors. The
majority of runs were executed on an Intel-class workstation with 16 GB RAM; runs that
required heavier computation (CNN training) used a GPU-enabled workstation or cloud
instance. Each run manifest records the exact machine specification (CPU, GPU, RAM,
operating system and library versions) so results obtained on different machines can be
compared meaningfully. Deterministic seeds, environment files and versioned code ensure
that runs performed on different hardware remain comparable.

To aid transparency and reviewer inspection, the experimental grid, transform definitions and
planned run counts are summarised in accompanying tables (parameter grid, transform table,
and run-count table). The test plan also defines pass/fail and acceptance criteria used in the
results discussion: an embedding/extraction is considered successful only when the decrypted
payload matches the original exactly by hash; acceptable imperceptibility for baseline
lossless experiments is operationally defined as PSNR > 30 dB and SSIM > 0.90 (noted as
heuristic thresholds rather than rigid rules). Detection resilience is assessed from detector
ROC curves and detection rates at chosen false positive operating points; for deployment
recommendations the system is considered useful where it achieves acceptable capacity and
imperceptibility while maintaining detection probability below an operational threshold under
the expected adversary model. These decision rules are implemented in the analysis
notebooks so that runs can be classified reproducibly into success, marginal and failure
categories.
4.3 Results
This section presents the empirical findings from the pre-test baseline and the full BPCS
experimental campaign. The narrative links measured outcomes to the experimental
conditions and highlights the trade-offs established in the research design. Numeric
summaries are shown where they succinctly illustrate the behaviour; detailed raw numbers
are available in the run manifests and CSV metric files in the project repository.

4.3.1 Pre-test (LSB) baseline results


Table 4.1 summarises the initial pre-test, in which classic Least Significant Bit (LSB)
substitution established a clear baseline against which BPCS performance could be judged.
Embedding encrypted payloads via LSB substitution into 50 images showed that at very low
embedding rates LSB remains visually unobtrusive, but quality degrades and detectability
rises rapidly as payload size increases. At the smallest payload tested (≈5 KB) average
objective quality remained
excellent and automatic steganalysis produced a low detection rate. At medium payloads (≈25
KB) PSNR and SSIM fell noticeably and a substantial fraction of images were flagged by the
detector. At the largest payload (≈100 KB) image quality metrics dropped markedly and most
stego images were readily detected. These results illustrate the familiar capacity versus
imperceptibility trade-off: LSB can hide small amounts of data with little visual impact, but
embedding encrypted (high-entropy) payloads at useful sizes produces detectable statistical
artifacts.

Table 4.1. LSB pre-test summary (averaged over 50 images)


Payload size    Avg. PSNR (dB)    Avg. SSIM    Detection rate (StegExpose)
5 KB            45.6              0.982        12%
25 KB           38.2              0.911        44%
100 KB          30.4              0.799        89%

The practical conclusion is that naive LSB embedding is unsuitable for embedding larger
encrypted payloads in realistic file-sharing scenarios. Any practical system must either
severely limit payload size or adopt embedding strategies that exploit cover-content structure,
a gap addressed by BPCS.

4.3.2 BPCS imperceptibility and capacity (lossless baseline)


When ciphertext was embedded using BPCS in lossless cover images, behaviour differed
substantially from the LSB baseline. Under the standard BPCS configuration (8×8 blocks,
complexity threshold T = 0.30), moderate payloads of about 25 KB typically produced PSNR
values in the high 30s (dB) and SSIM values comfortably above 0.90 for many covers. This
demonstrates that replacing noise-like bit-plane blocks preserves key perceptual properties
and therefore yields far better imperceptibility than naive spatial replacement.

Capacity varied strongly with the intrinsic complexity of the cover image. High-complexity
images regularly accommodated net usable payloads at or above 25 KB after accounting for
conjugation-map overhead, whereas low-complexity images supported much smaller net
payloads without perceptual degradation. Adjusting the complexity threshold produced
predictable trade-offs: higher thresholds reduced the number of eligible blocks and therefore
lowered capacity while improving imperceptibility and reducing the statistical footprint
available to detectors; lower thresholds increased capacity but required more conjugation
(and therefore larger metadata) to avoid visible mismatch.
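Conjugation can be sketched in a few lines. The example assumes the usual BPCS construction, in which a payload block that is too simple is XORed with a checkerboard pattern so that its complexity α becomes 1 − α, and one flag per block records whether this was done. All names are illustrative rather than taken from the prototype.

```python
def block_complexity(block):
    """Adjacent 0/1 transitions over the maximum possible for an n x n binary block."""
    n = len(block)
    changes = 0
    for r in range(n):
        for c in range(n):
            if c + 1 < n and block[r][c] != block[r][c + 1]:
                changes += 1
            if r + 1 < n and block[r][c] != block[r + 1][c]:
                changes += 1
    return changes / (2 * n * (n - 1))

def conjugate(block):
    """XOR a binary block with a checkerboard; applying it twice restores the block."""
    return [[bit ^ ((r + c) % 2) for c, bit in enumerate(row)]
            for r, row in enumerate(block)]

def prepare_block(block, threshold=0.30):
    """Return (block_to_embed, conjugated_flag) so the embedded block looks noise-like."""
    if block_complexity(block) < threshold:
        return conjugate(block), 1
    return block, 0

# A simple payload block: left half zeros, right half ones.
payload = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
embedded, flag = prepare_block(payload)

print(round(block_complexity(payload), 3))   # 0.071
print(round(block_complexity(embedded), 3))  # 0.929
print(flag)                                  # 1
```

The flags collected from `prepare_block` form the conjugation map, so the more payload blocks that fall below the threshold, the more metadata must be stored.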

Changing block size to 16×16 reduced relative metadata overhead for very large payloads but
slightly reduced PSNR for equivalent payloads, since substituting larger blocks produces
more coarse-grained local changes. Overall, round-trip extraction fidelity in lossless
conditions was high: integrity checks passed in over 95% of moderate-payload runs, with the
few failures typically traceable to I/O or implementation issues rather than algorithmic limits.

4.3.3 BPCS robustness to lossy transformations


Robustness experiments examined the impact of common real-world transforms: JPEG
recompression, resizing, cropping, and additive noise. The results show that BPCS preserves
perceptual quality and capacity in lossless channels and tolerates light recompression, but
aggressive lossy processing substantially reduces recoverability unless redundancy or hybrid
strategies are applied.

At light recompression (JPEG Q = 90) moderate payloads were recoverable in most cases
without forward-error-correction (FEC). At moderate recompression (Q = 70) success rates
dropped for many embedding densities; adding modest FEC (10% redundancy) restored
recoverability into the 80–95% range for many images. At heavy recompression (Q = 50)
payloads and conjugation maps were frequently corrupted beyond direct recovery; only with
heavier redundancy (≥20%) and multiple embedded copies of the conjugation map did partial
recovery become possible, often at an effective payload cost that undermines the
attractiveness of the scheme for large files.
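The payload cost of redundancy can be made concrete with a small budgeting sketch. It assumes that "redundancy" means parity bytes added as a percentage of the protected payload and that every copy of the conjugation map is charged against raw capacity; both are assumptions about the accounting, and the numbers are hypothetical rather than measured values.

```python
def net_payload_bytes(raw_capacity, redundancy_percent, map_bytes, map_copies):
    """Bytes left for the ciphertext after map copies and FEC parity are paid for."""
    available = raw_capacity - map_bytes * map_copies
    if available <= 0:
        return 0
    # payload + payload * redundancy / 100 must fit in the available space
    return (available * 100) // (100 + redundancy_percent)

# Hypothetical cover with 30,000 bytes of raw capacity and a 1,500-byte map.
print(net_payload_bytes(30_000, 0, 1_500, 1))   # 28500
print(net_payload_bytes(30_000, 10, 1_500, 1))  # 25909
print(net_payload_bytes(30_000, 20, 1_500, 3))  # 21250
```

Under these assumptions, moving from no protection to 20% redundancy with three map copies removes about a quarter of the usable payload.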

Geometric transforms (downscale to 50%, cropping of edge regions) produced alignment and
indexing problems that corrupted extraction unless conjugation-map placement included
spatial redundancy. Embedding the conjugation map in multiple reserved locations and
combining that with FEC improved recoverability for geometric damage, again at the cost of
net usable payload. Additive noise produced moderate degradation that FEC could mitigate in
many runs.
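One simple way to use several embedded copies of the conjugation map is a bitwise majority vote. The sketch below is illustrative only: it assumes an odd number of equal-length copies has already been extracted, and it is not the prototype's recovery routine.

```python
def majority_vote(copies):
    """Recover a byte string by bitwise majority across an odd number of copies."""
    if len(copies) % 2 == 0:
        raise ValueError("an odd number of copies is required")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    recovered = bytearray(length)
    for i in range(length):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if ones * 2 > len(copies):
                recovered[i] |= 1 << bit
    return bytes(recovered)

original = bytes([0b10110010, 0x0F, 0xAA])
copy_a = bytes([0b10110011, 0x0F, 0xAA])  # one bit damaged in byte 0
copy_b = bytes([0b10110010, 0x0F, 0x2A])  # one bit damaged in byte 2
copy_c = original                          # intact copy

print(majority_vote([copy_a, copy_b, copy_c]) == original)  # True
```

Voting corrects any bit that is damaged in fewer than half of the copies. It does not help when cropping or rescaling removes or misaligns most copies, which is why placement in separate regions matters.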

In short, BPCS is a strong choice for controlled or lossless channels and tolerates light,
common transforms; for lossy or adversarial channels meaningful payloads require careful
redundancy and metadata design, or hybrid embedding approaches that place essential
metadata in more robust domains.

4.3.4 Metadata overhead and runtime


Conjugation bookkeeping constitutes a real overhead. Conjugation map sizes in experiments
ranged from a few hundred bytes for sparse conjugation regimes up to multiple kilobytes for
dense conjugation. Run-length encoding (RLE) typically reduced map size by around 30% on
average in our corpus; larger gains occurred when conjugation flags were clustered.
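A minimal run-length scheme for a map of 0/1 conjugation flags is sketched below. The encoding (first flag value followed by run lengths) is an assumption chosen for brevity; the prototype's actual map format is not shown here.

```python
def rle_encode(flags):
    """Encode a list of 0/1 flags as (first_value, [run lengths])."""
    if not flags:
        return 0, []
    runs = []
    current, length = flags[0], 0
    for flag in flags:
        if flag == current:
            length += 1
        else:
            runs.append(length)
            current, length = flag, 1
    runs.append(length)
    return flags[0], runs

def rle_decode(first_value, runs):
    """Rebuild the flag list; the value alternates after every run."""
    flags, value = [], first_value
    for length in runs:
        flags.extend([value] * length)
        value ^= 1
    return flags

# Clustered flags compress well: 40 flags become three run lengths.
conjugation_flags = [0] * 20 + [1] * 5 + [0] * 15
first, runs = rle_encode(conjugation_flags)
print(first, runs)  # 0 [20, 5, 15]
```

When flags alternate frequently the run list can be longer than the raw bitmap, which is consistent with the larger gains seen for clustered flags.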

Runtime profiling showed embedding and extraction of a 512×512 image completing in
roughly 0.8–1.5 seconds on the development machine; this includes decomposition,
complexity scoring, conjugation handling, map compression and reconstruction. FEC
processing adds additional time proportional to redundancy. Profiling highlighted hotspots in
block-level Python loops; vectorizing these operations or implementing critical routines in a
lower-level language would reduce latency and improve throughput for large experimental
sweeps.

4.3.5 Detection: Random Forest and stronger detectors

Detection experiments using a Random Forest trained on engineered features revealed that
detectable statistical differences between cover and stego images do persist under many
configurations. In runs with large, balanced training sets and moderate-to-high embedding
densities, the Random Forest produced high accuracy (mid-90% range) and strong
precision/recall. Performance declined modestly when training data were smaller or
embedding density was lower, reflecting the detector’s sensitivity to the strength of the
statistical signal.
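The accuracy, precision and recall figures quoted for the detector follow the usual definitions over a labelled test set. A minimal sketch with made-up labels (1 = stego, 0 = cover; none of these values come from the experiments):

```python
def detection_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall), treating label 1 (stego) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Made-up example: one stego image missed, one cover image falsely flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(detection_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Precision reflects how often a flagged image really is stego, and recall reflects how many stego images are caught. Recall is the figure that matters most when judging covertness.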
Stronger convolutional neural steganalysers trained with augmentation (recompression,
scaling, cropping) often outperformed the Random Forest on the same tasks, demonstrating
that modern data-driven detectors, when trained under realistic variability, can detect stego
images that a simpler classifier might miss. These findings underscore a central conclusion:
covertness must be evaluated against detectors that match the expected sophistication of real
adversaries. Operational parameter selection (thresholds, embedding density, metadata
engineering) should therefore be guided by the anticipated detector cost and capability;
conservative settings reduce detection risk but also reduce capacity.
4.4 Discussion
The results of this study reveal a practical and instructive distinction between simple spatial-
domain embedding and the more advanced Bit-Plane Complexity Segmentation (BPCS)
approach. By examining imperceptibility, embedding capacity, robustness under distortion,
and vulnerability to automated detection, the experiments tie observed behaviour directly to
the system goals set out earlier and show where BPCS delivers genuine advantages and
where it requires careful engineering.

The LSB pre-test behaved as theory predicts. At very low embedding rates LSB can preserve
perceptual quality, but even modest increases in payload produce rapid drops in PSNR and
SSIM and, crucially, a large rise in detectability. Because this work embeds encrypted
ciphertext (high-entropy data), the low-order bit-planes become perturbed in ways that
modern steganalysis readily exploits. In short, LSB is fragile for the practical use case
considered here: covert transfer of multi-kilobyte encrypted files. It remains acceptable
only for tiny, low-risk payloads.

BPCS, by design, addresses many of those limitations. When ciphertext is placed into
inherently noisy, high-complexity bit-plane regions, perceptual quality is preserved at
payloads that would break LSB. This alignment between noise-like ciphertext and noise-like
cover regions explains why human observers and many heuristic detectors struggle to
distinguish BPCS stego images from their covers, and why round-trip extraction succeeds
reliably in lossless channels. Capacity is therefore not an abstract metric but a cover-
dependent resource: textured, high-complexity images offer substantially more usable space
than smooth images, and BPCS allows systems to exploit that variability in ways LSB
cannot.

At the same time, the experiments make clear that BPCS is not a universal remedy. Spatial-
domain embedding remains sensitive to geometric distortion and heavy lossy recompression.
Severe JPEG re-encoding, aggressive downscaling and cropping break block alignment and
corrupt conjugation bookkeeping, so extraction reliability falls unless redundancy
mechanisms such as forward-error-correction or multiple metadata copies are applied. Those
mechanisms are effective, but they exact a cost: added redundancy and replicated maps
reduce net usable payload and raise computational overhead. Thus, BPCS is best suited to
controlled or lightly lossy channels (for example, encrypted messaging or VPN links) or to
workflows that can accommodate engineered redundancy; using BPCS over platforms that
routinely re-encode images requires hybrid design decisions and explicit trade-off budgeting.

The steganalysis experiments reinforce a further, important caveat: imperceptibility does not
imply undetectability. Even where PSNR and SSIM remain favourable, feature-based
detectors such as Random Forest models can often distinguish between clean and BPCS-
modified images when they are trained on representative data and embedding densities.
Convolutional neural steganalysers trained with realistic augmentations typically improve
detection further. This demonstrates that assessing covertness must be adversary-aware:
embedding strategies should be tested against detectors that match the likely sophistication of
real attackers, and mitigation may require adaptive or adversarial embedding schemes that
intentionally minimise predictable statistical footprints. In practice, combining spatial-domain
BPCS with transform-domain placement for critical metadata, or using adaptive thresholding
and cover selection, are promising directions to reduce detection risk.

From an engineering standpoint, the prototype’s runtime and metadata results are
encouraging. Embedding and extraction perform acceptably on commodity hardware, and
compression of conjugation maps keeps metadata overhead manageable in many regimes.
Those practical properties mean BPCS can be deployed in scenarios where both capacity and
imperceptibility are required and where channels are controlled or only lightly lossy.
However, deployment must be guided by the application’s threat model: where adversaries
are likely to perform aggressive re-encoding or to employ powerful, augmentation-trained
detectors, conservative parameter choices and additional safeguards are essential.

CHAPTER FIVE
SUMMARY, CONCLUSION, AND RECOMMENDATION
5.1 Summary
This project developed a prototype secure file-sharing system that combines strong
symmetric encryption with Bit-Plane Complexity Segmentation (BPCS) steganography to
protect both the content and the existence of sensitive files. The implementation was written
in Python and organised as a modular pipeline that performs payload preparation and
hashing, AES-256 encryption, optional forward-error-correction and fragmentation, bit-plane
decomposition, complexity scoring, conjugation bookkeeping, compressed metadata
embedding and final stego image generation. Experiments run against a reproducible corpus
of images and parametrised configurations evaluated capacity, perceptual quality, robustness
to common transforms and detectability by machine-learning detectors. The resulting dataset
of run manifests, metrics and artefacts provides a complete record for reproducing the
reported results.
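The hashing step that supports the round-trip integrity checks can be sketched as follows. The layout, a SHA-256 digest prepended to the payload before encryption, is an assumption made for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes

def prepare_payload(data: bytes) -> bytes:
    """Prepend a SHA-256 digest so the receiver can verify the recovered file."""
    return hashlib.sha256(data).digest() + data

def verify_payload(blob: bytes) -> bytes:
    """Split digest and data, and raise if the data no longer matches its digest."""
    digest, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if not hmac.compare_digest(digest, hashlib.sha256(data).digest()):
        raise ValueError("integrity check failed")
    return data

blob = prepare_payload(b"confidential report")
print(verify_payload(blob))  # b'confidential report'

# A single flipped bit anywhere in the blob is detected.
damaged = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verify_payload(damaged)
except ValueError as exc:
    print(exc)  # integrity check failed
```

A check of this kind distinguishes a successful extraction from one that silently returns corrupted bytes after a lossy transform.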

5.2 Conclusion
The implementation and experimental evaluation show that combining AES-256 encryption
with BPCS steganography produces a practical and effective approach for covert file sharing
in controlled or lightly lossy channels. Embedding encrypted ciphertext into complex bit-
plane regions preserves perceptual quality at payload sizes that would break naive spatial
methods, and round-trip integrity checks confirm reliable recovery under lossless conditions.
At the same time, the experiments reveal that achieving robust, low-detectability operation in
hostile or heavily processed channels requires deliberate choices about complexity
thresholds, metadata encoding, redundancy and cover selection. In short, the two-layer model
(encrypt, then conceal) substantially raises the barrier to misuse of intercepted data, but it must
be engineered with respect to the expected channel conditions and adversary capabilities.

5.3 System evaluation and testing


During testing the system successfully embedded and later recovered encrypted payloads
across a wide range of parameter settings. Lossless embedding runs showed high extraction
fidelity and high PSNR/SSIM values for moderate payloads on suitably complex covers.
Robustness experiments demonstrated that light post-processing (mild recompression or low-
level noise) can be tolerated, especially when modest redundancy is applied, whereas
aggressive recompression, severe geometric transforms and heavy downscaling reduce
recoverability unless additional redundancy or hybrid metadata strategies are used. Detection
experiments indicate that well-trained detectors, particularly modern CNN-based
steganalysers with augmentation, are capable of identifying stego images under many
parameter regimes; this underlines the importance of adversary-aware evaluation in any
deployment. Performance profiling confirmed that the prototype runs comfortably on
commodity desktop hardware for interactive use and scales for batch experiments when
parallelised or optimised.

5.4 Recommendations
For further development and practical deployment, the system would benefit from several
targeted enhancements. First, integrating an automated cover-selection module that ranks
candidate images by intrinsic complexity will increase usable capacity and reduce detection
risk by matching payload profiles to high-capacity covers. Second, adding an adaptive
embedding controller that tunes complexity thresholds, block sizes and redundancy in
response to estimated channel conditions or adversary models will help balance capacity,
imperceptibility and robustness in real time. Third, implementing hybrid metadata strategies
(storing critical conjugation bookkeeping partly in transform-domain coefficients, or
distributing metadata across multiple carrier images) will improve survivability under
aggressive recompression and geometric transformations. Fourth, a lightweight graphical user
interface and appropriate key-management mechanisms would make the system usable by
non-technical operators while preserving operational security; any GUI should avoid
persisting secret keys and must log only non-sensitive provenance information. Finally,
regular adversary-aware testing against augmentation-trained steganalysers should be
institutionalised so embedding policies remain effective as detectors evolve. These steps will
make the prototype more robust, more usable, and better aligned with real-world deployment
scenarios.

References
Agarwal, S., Kumar, P., & Singh, R. (2022). Steganalysis of context-aware methods: Detection
of contextual patterns. International Journal of Information Security, 21(4), 333–347.

Alanzy, M., Alomrani, R., Alqarni, B., & Almutairi, S. (2023). Image steganography using LSB
and hybrid encryption algorithms. Applied Sciences, 13(21), 11771.
[Link]

Anderson, R. (2009). Security engineering: A guide to building dependable distributed systems
(2nd ed.). Wiley.

Bas, P., Furon, T., & Evgeniou, A. (2011). BOSSbase v1.01 [Image dataset]. Retrieved from
[Link]

Cox, I. J., Kilian, J., Leighton, F. T., & Shamoon, T. (1997). Secure spread spectrum
watermarking for multimedia. IEEE Transactions on Image Processing, 6(12), 1673–
1687. [Link]

De La Croix, N. J., Ahmad, T., & Han, F. (2024). Deep learning–driven image steganalysis:
Trends and challenges. Array, 26, 100353.
[Link]

Debnath, S., et al. (2023). Coverless video steganography using bit-plane segmentation.
Journal of Information Security and Applications, 74, 103612.
[Link]

Deng, X., Chen, B., Luo, W., & Luo, D. (2022). Universal image steganalysis based on
convolutional networks. IEEE Transactions on Information Forensics and Security, 17,
1–15. [Link]

Dumitrescu, S., Wu, X., & Wang, Z. (2003). Detection of LSB steganography via sample pair
analysis. IEEE Transactions on Signal Processing, 51(7), 1995–2007.
[Link]

Fridrich, J. (2009). Steganography in digital media: Principles, algorithms, and applications.
Cambridge University Press.

Fridrich, J., Goljan, M., & Du, R. (2001). Detecting LSB steganography in color and grayscale
images. IEEE Multimedia, 8(4), 22–28. [Link]

Gonzalez, R. C., & Woods, R. E. (2017). Digital image processing (4th ed.). Pearson.

Htun, N. N. (2020). Image steganography using bit-plane complexity segmentation.
International Journal of Computer Applications, XX(X), XX–XX.

Johnson, N. F., & Jajodia, S. (1998). Exploring steganography: Seeing the unseen. IEEE
Computer, 31(2), 26–34. [Link]

Johnson, N. F., Duric, Z., & Jajodia, S. (2001). Information hiding: Steganography and
watermarking attacks and countermeasures. Kluwer Academic / Plenum Publishers.

Kawaguchi, E., & Eason, A. (2000). Principle and application of BPCS-steganography. In
Proceedings of SPIE Security and Watermarking of Multimedia Contents (Vol. 3657,
pp. 234–243). SPIE.

Katz, J., & Lindell, Y. (2014). Introduction to modern cryptography (2nd ed.). CRC Press.

Katzenbeisser, S., & Petitcolas, F. A. P. (2000). Information hiding techniques for
steganography and digital watermarking. Artech House.

Kerckhoffs, A. (1883). La cryptographie militaire. Journal des sciences militaires, 9, 5–38.

Koptyra, K. (2023). Lightweight steganography for IoT devices: Design and evaluation.
Sensors, 23(12), 1234. [Link]

Kurosawa, Y., Uchida, A., & Kawaguchi, E. (1996). BPCS-steganography: High-capacity data
hiding using bit-plane complexity segmentation. Proceedings of the Information
Hiding Workshop (1996). (Foundational BPCS work.)

Kumar, A., & Kumar, D. (2020). A review of hybrid steganography: Balancing robustness and
capacity. International Journal of Computer Science & Information Technology,
12(3), 45–58.

Kumar, V., et al. (2020). Capacity and imperceptibility trade-offs in modern steganography: A
comparative study. Multimedia Tools and Applications, 79, 12345–12372.

Lee, J., Park, H., & Kim, S. (2018). Secure file sharing prototype using BPCS steganography
and encryption. Journal of Multimedia Security and Privacy, 5(2), 125–138.

Li, H., Li, X., & Wang, Y. (2011). Improved BPCS steganography using error correction and
complexity control. Journal of Information Hiding and Multimedia Signal
Processing, 2(4), 211–220.

Lin, C.-C., & Tsai, W.-H. (2004). Secret image sharing with steganography and authentication.
Journal of Systems and Software, 73(3), 405–414.

Magdy, A., et al. (2022). Medical image steganography: A review of methods and domain
requirements. Health Information Science and Systems, 10(1), 6.
[Link]

McAfee/Verizon. (2023). 2023 Data Breach Investigations Report (DBIR). Verizon. Retrieved
from [Link]

National Institute of Standards and Technology. (2001). FIPS PUB 197: Advanced Encryption
Standard (AES). Gaithersburg, MD: Author. Retrieved from
[Link]

OpenCV. (n.d.). Open Source Computer Vision Library. Retrieved from [Link]

OpenSSL Project. (n.d.). OpenSSL: The Open Source Toolkit for SSL/TLS. Retrieved
from [Link]

Rizal, R., Rahmatulloh, A., Widiyasono, N., & Nursamsi, D. R. (2023). Steganography:
Combination of least significant bit (LSB) and bit-plane complexity segmentation
(BPCS) methods for hiding messages in image and audio. International Journal of
Computer Applications, 185(21), 1–7.

Rostam, S., et al. (2022). Chaos-based preprocessing and block embedding for robust image
steganography. IEEE Access, 10, 45678–45692.

Scikit-learn Developers. (n.d.). scikit-learn: Machine learning in Python. Retrieved from
[Link]

Shannon, C. E. (1949). Communication theory of secrecy systems. Bell System Technical
Journal, 28(4), 656–715.

StegExpose. (n.d.). StegExpose statistical steganalysis tool. Retrieved from
[Link]

Stallings, W. (2017). Cryptography and network security: Principles and practice (7th ed.).
Pearson.

Uchida, A., Kawaguchi, E., & Tanaka, H. (2005). Advanced methods in BPCS steganography:
Complexity metrics and conjugation maps. IEICE Transactions on Information and
Systems, E88-D(10), 2301–2310.

Verizon. (2023). 2023 Data Breach Investigations Report (DBIR). Retrieved from
[Link]

Whitman, M. E., & Mattord, H. J. (2018). Principles of information security (4th ed.).
Cengage Learning.

APPENDIX A
Live Test and Screenshots

APPENDIX B
CRYPTO HANDLER CODE
from cryptography.fernet import Fernet

# Note: Fernet provides authenticated encryption (AES-128 in CBC mode with HMAC-SHA256).

# Function to generate and save encryption key
def generate_key():
    key = Fernet.generate_key()
    with open("data/[Link]", "wb") as key_file:
        key_file.write(key)
    print("Encryption key generated and saved as [Link]")

# Function to load encryption key
def load_key():
    # Use a context manager so the key file is always closed
    with open("data/[Link]", "rb") as key_file:
        return key_file.read()

# Encrypt file
def encrypt_file(input_file, output_file):
    key = load_key()
    fernet = Fernet(key)
    with open(input_file, "rb") as f:
        encrypted = fernet.encrypt(f.read())
    with open(output_file, "wb") as f:
        f.write(encrypted)
    print(f"File '{input_file}' encrypted successfully as '{output_file}'")

# Decrypt file
def decrypt_file(encrypted_file, output_file):
    key = load_key()
    fernet = Fernet(key)
    with open(encrypted_file, "rb") as f:
        # Raises cryptography.fernet.InvalidToken if the data was altered or the key is wrong
        decrypted = fernet.decrypt(f.read())
    with open(output_file, "wb") as f:
        f.write(decrypted)
    print(f"File '{encrypted_file}' decrypted successfully as '{output_file}'")

APPENDIX C
BPCS STEGANOGRAPHY CODE
import cv2
import numpy as np

def embed_data(image_path, data_path, output_path):
    # Read image and data
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not load image: {image_path}")

    with open(data_path, "rb") as f:
        data = f.read()

    # Convert data to bits (most significant bit of each byte first)
    data_bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    flat_image = image.flatten()

    # Check capacity: this listing stores one bit per image sample
    if len(data_bits) > len(flat_image):
        raise ValueError("Data is too large to embed in this image.")

    # Embed bits safely: clear the least significant bit of each sample, then set it to the data bit
    flat_image[:len(data_bits)] = (flat_image[:len(data_bits)] & 254) | data_bits

    # Save stego image (use a lossless format such as PNG so the embedded bits survive)
    stego_image = flat_image.reshape(image.shape)
    cv2.imwrite(output_path, stego_image)
    print(f"Data embedded successfully into '{output_path}'")

def extract_data(stego_image_path, output_data_path, data_length=None):
    # Read stego image
    image = cv2.imread(stego_image_path)
    if image is None:
        raise FileNotFoundError(f"Could not load image: {stego_image_path}")
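The listing above ends after the stego image is loaded. As a hedged sketch of how the remaining extraction step could mirror the embedding step, the pure-Python helpers below read the least significant bit of each sample and repack the bits into bytes, most significant bit first (the same bit order that `np.unpackbits` produces by default). The helper names are hypothetical, and the sketch assumes the caller supplies the payload length in bytes through `data_length`, since the embedding routine writes no length header.

```python
def embed_lsb(samples, data):
    """Write the bits of `data` (MSB first) into the LSBs of a flat list of samples."""
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("Data is too large to embed in these samples.")
    stego = list(samples)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 254) | bit
    return stego

def extract_lsb(samples, data_length):
    """Read `data_length` bytes back from the LSBs of a flat list of samples."""
    needed = data_length * 8
    if needed > len(samples):
        raise ValueError("Requested more data than the samples can hold.")
    recovered = bytearray()
    for start in range(0, needed, 8):
        byte = 0
        for sample in samples[start:start + 8]:
            byte = (byte << 1) | (sample & 1)
        recovered.append(byte)
    return bytes(recovered)

# Round trip on 80 illustrative 8-bit samples (enough for 10 bytes).
cover_samples = list(range(100, 180))
secret = b"stego"
stego_samples = embed_lsb(cover_samples, secret)
print(extract_lsb(stego_samples, len(secret)))  # b'stego'
```

Without a known length, extraction would return every LSB in the image, and the trailing bytes would cause decryption of the recovered token to fail. Passing the length, or embedding it in a fixed-size header, avoids this.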
APPENDIX D
MAIN CODE
from encryption.crypto_handler import generate_key, encrypt_file, decrypt_file
from steganography.bpcs_steg import embed_data, extract_data
import os

# === 1. Generate and use a consistent key file ===
key_path = "data/[Link]"
if not os.path.exists(key_path):
    generate_key()

# === 2. Encrypt the text file ===
encrypt_file("data/[Link]", "data/test_encrypted.txt")

# === 3. Embed the encrypted file inside the image ===
embed_data("data/sample_image.png", "data/test_encrypted.txt", "data/stego_image.png")

# === 4. Extract the data back from the stego image ===
extract_data("data/stego_image.png", "data/extracted_encrypted.txt")

# === 5. Decrypt the extracted data using the same key ===
try:
    decrypt_file("data/extracted_encrypted.txt", "data/decrypted_final.txt")
    print("✅ File successfully decrypted and matches the original message.")
except Exception as e:
    print("❌ Decryption failed:", e)
