Robotics
• The word “robot” was introduced by the Czech writer Karel Čapek in his 1920
play R.U.R. It derives from the Czech word “robota” (forced labor) and is often
glossed as worker or servant.
• A robot is a reprogrammable, multifunctional manipulator
designed to move materials, parts, tools, or specialized devices
through variable programmed motions for the performance of a variety
of tasks.
• Variable programmed motions: A control strategy that allows a machine's
motion to be adjusted dynamically during operation.
• Definition: Robotics is a field of engineering and computer science that
deals with the design, construction, operation, and application of robots.
Types
Manipulator: A manipulator robot is a robotic system that performs physical
tasks without direct human contact. They are made up of mechanical, electrical,
and electronic components that are programmed to carry out repetitive tasks.
Legged Robots: Legged robots are a type of mobile robot that uses articulated
limbs, such as leg mechanisms, to provide locomotion. They are more versatile
than wheeled robots and can traverse many different terrains, though these
advantages require increased complexity and power consumption.
Wheeled Robot: Robots that move over the ground on wheels. This design is
frequently preferred because it is much simpler than legged designs, and the
design, production, and programming processes for moving on flat terrain are
easier.
Autonomous Robot: A robot capable of operating and performing tasks
independently, without direct human intervention. Autonomous robots can move
around large environments, including busy work areas and congested spaces.
Industrial Robot: An industrial robot is one that has been developed to
automate intensive production tasks such as those required by a constantly
moving assembly line. As large, heavy robots, they are placed in fixed positions
within an industrial plant, and all other worker tasks and processes revolve
around them.
Remote Controlled Robot: A robot that performs complicated tasks with a human
operator as its guide. For example, a remote-controlled inspection robot uses
cameras to inspect line conditions and discover irregularities, while also
employing a smart navigation system to pinpoint locations in need of attention.
Another example is the NASA robot designed to explore volcanoes.
Aquatic Robot: An underwater robot is a technologically advanced machine
designed for submerged operations. These robots range from remotely operated
vehicles (ROVs) to autonomous underwater vehicles (AUVs), each equipped with
specialized functionalities to perform tasks beneath the water surface.
Consumer Robot: Robots that are designed for personal or domestic use. They are
usually small, portable, and relatively simple to operate. Common examples of
consumer robots include robotic vacuum cleaners, floor cleaners, and window
cleaners.
Delivery Robot: A delivery robot is an autonomous robot that provides "last
mile" delivery services. An operator may monitor the robot and take control of
it remotely in situations that it cannot resolve by itself, such as when it is
stuck on an obstacle.
Drones: A drone is a flying robot that can be remotely controlled or can fly
autonomously using software-controlled flight plans in its embedded systems,
which work in conjunction with onboard sensors and a Global Positioning System
(GPS). Drones are most often associated with the military.
Educational Robot: Through play, educational robots help children develop one
of the basic cognitive skills of mathematical thinking at an early age:
computational thinking. That is, they help develop the mental process we use to
solve problems of various kinds through an orderly sequence of actions.
Exoskeleton Robot: A robotic exoskeleton is a mechanical device worn by a
person for a specific purpose or application. Many are designed to help people
who have suffered a condition such as a stroke, or who have an injury, and who
need help to walk again or to strengthen their muscles.
Humanoid Robot: A robot built to resemble the human body. Humanoid robots can
handle heavy loads, toxic substances, and repetitive tasks. This has helped
companies prevent many accidents while also saving time and money. In the
medical field, robots are used for intricate surgeries such as prostate cancer
surgery.
• Actuators: Actuators are the components that
convert energy into mechanical motion. They
are the "muscles" of the robot, responsible for
producing movement and force.
• Power Supply: The power supply provides the
electrical energy needed for the robot to
operate. It can be a battery, a power adapter,
or a direct connection to the electrical grid. The
type of power supply used depends on the
robot's application and the need for portability
or continuous power.
• Electric Motors: They convert electrical
energy into mechanical energy, causing
rotation or linear motion. They are used
in a wide range of applications, from
simple toy cars to complex industrial
robots.
• Pneumatic Air Muscles: These use
compressed air to produce linear motion.
They are often used in applications
where a soft, compliant force is needed,
such as in robotic prosthetics or soft
robotics.
• Muscle Wire: Muscle wire is a thin wire made of a shape-memory alloy,
typically nickel-titanium (nitinol). It contracts when heated, usually
by passing an electric current through it, and relaxes as it cools. It
serves as a compact, lightweight actuator where only small forces and
displacements are needed.
• Piezo Motors and Ultrasonic Motors:
These tiny motors use the piezoelectric effect
to generate vibrations, which can be
converted into linear or rotational motion.
They are often used in precision applications,
such as microscopes, laser cutting machines,
and industrial robots.
• Sensors: These are like the
robot's eyes and ears. They help
the robot see, hear, and feel its
surroundings. For example, a
camera can help the robot see
objects, while a touch sensor can
help it feel if something is in its
way.
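Asimov's Three Laws of Robotics, from the science fiction of Isaac Asimov,
are rules for robot behavior, listed in order of priority:
• Don't hurt humans, or let a human come to harm through inaction.
• Do what humans tell you, unless it conflicts with the first law.
• Protect yourself, unless it conflicts with the first or second law.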
Applications of Robotics
Robotics in Defense Sector:
• Unmanned Aerial Vehicles (UAVs or Drones): Used for surveillance,
reconnaissance, target acquisition, and even strikes.
• Autonomous Ground Vehicles (AGVs): Employed for logistics,
transportation, and potentially combat operations.
• Explosive Ordnance Disposal (EOD) Robots: Handle hazardous tasks
like defusing bombs and explosive devices.
• Robotic Exoskeletons: Enhance human strength and endurance for
soldiers, aiding in tasks like carrying heavy equipment.
Robotics in Medical Sector:
• Surgical Robots: Assist surgeons in performing minimally invasive
procedures with greater precision and accuracy.
• Rehabilitation Robots: Help patients with physical therapy by providing
assistance and feedback.
• Medical Imaging Robots: Improve the accuracy and efficiency of
diagnostic procedures.
• Pharmaceutical Manufacturing Robots: Automate tasks like filling
prescriptions and preparing medications.
Robotics in Industrial Sector:
• Assembly Line Robots: Perform repetitive tasks like welding, painting, and
assembly.
• Material Handling Robots: Transport materials from one place to another,
load and unload machines.
• Quality Control Robots: Inspect products for defects and ensure quality
standards are met.
• Hazardous Materials Handling Robots: Handle dangerous substances
without putting humans at risk.
Robotics in Entertainment Sector:
• Theme Park Robots: Provide interactive experiences and entertainment
for visitors.
• Film and Television: Used to create special effects and perform stunts.
• Companion Robots: Designed to provide companionship and
entertainment for individuals.
Robotics in Mining Industry:
• Mining Robots: Used for tasks like drilling, blasting, and transporting
materials.
• Autonomous Haulage Vehicles (AHVs): Transport materials underground
without human intervention.
• Inspection Robots: Inspect mines for hazards and ensure safety.
Advantages
• Increased efficiency and productivity: Robots can perform tasks faster,
more accurately, and with less downtime than humans.
• Improved safety: Robots can handle dangerous or hazardous tasks,
reducing the risk of injury to human workers.
• Reduced costs: Robots can reduce labor costs and improve product
quality, leading to lower overall costs.
• Increased flexibility: Robots can be easily reprogrammed to perform
different tasks, making them adaptable to changing needs.
• Improved precision and accuracy: Robots can perform tasks with greater
precision and accuracy than humans.
• Consistent quality: Robots can produce consistent results, reducing
variability and improving product quality.
• 24/7 operation: Robots can work continuously without breaks, increasing
production capacity.
Disadvantages
• High initial cost: The development, purchase, and maintenance of robots can
be expensive.
• Job displacement: Automation through robotics can lead to job losses in
certain industries.
• Dependency: Overreliance on robots can make systems vulnerable to failures
or disruptions.
• Ethical concerns: The use of robots raises ethical questions regarding their
decision-making and potential for harm.
• Technical limitations: Current robotics technology is still limited in
intelligence and adaptability.
• Environmental impact: The production and operation of robots can have
environmental impacts, such as energy consumption and waste generation.
• Security risks: Robots can be vulnerable to hacking and other security
threats.
Computer Vision
• Computer vision is one of the most important fields of artificial intelligence and
computer science. It deals with teaching computers to interpret and understand
the world through images and videos. It's like giving computers eyes
and the ability to make sense of what they see.
• It also helps systems take appropriate actions and make recommendations based
on the extracted information.
• In other words,
⚬ Computer vision is a field of artificial intelligence (AI) that enables computers
to understand, analyze, and interpret visual information from the world, such
as images and videos. It seeks to automate tasks that the human visual
system can do, like recognizing objects, understanding scenes, tracking
movements, and making decisions based on visual inputs. Essentially,
computer vision allows machines to see and understand the visual world in a
way that is similar to how humans do.
Applications of Computer Vision
• Facial Recognition
⚬ Identifies individuals: Compares facial features to a database to recognize
people.
⚬ Used in: Security systems, access control, social media platforms.
• Healthcare and Medicine
⚬ Medical image analysis: Detects abnormalities in X-rays, MRIs, and CT
scans.
⚬ Surgical assistance: Guides robotic surgical systems for precise procedures.
• Self-Driving Vehicles
⚬ Perceives environment: Recognizes objects like pedestrians, traffic signs,
and other vehicles.
⚬ Makes decisions: Enables cars to navigate roads, avoid obstacles, and
ensure safety.
• Optical Character Recognition (OCR)
⚬ Extracts text: Reads text from images or documents.
⚬ Used in: Document scanning, digitization, and data entry.
• Machine Inspection
⚬ Quality control: Inspects products for defects and ensures quality
standards.
⚬ Manufacturing: Monitors production processes and identifies issues.
• Retail
⚬ Product recognition: Identifies products in images or videos for visual
search and inventory management.
⚬ Customer behavior analysis: Analyzes customer behavior to improve store
layout and product placement.
• 3D Model Building
⚬ Creates 3D models: Constructs 3D representations of objects from images
or videos.
⚬ Used in: Architecture, gaming, and virtual reality.
• Medical Imaging
⚬ Analyzes medical images: Detects abnormalities in X-rays, MRIs, and CT
scans.
⚬ Assists in diagnosis: Provides valuable information for medical
• Automotive Safety
⚬ Advanced driver assistance systems (ADAS): Enables features like lane
departure warning, adaptive cruise control, and pedestrian detection.
• Surveillance
⚬ Monitors environments: Analyzes video footage to detect suspicious
activity or identify individuals.
• Fingerprint Recognition and Biometrics
⚬ Verifies identity: Compares fingerprints or other biometric data to
authenticate individuals.
⚬ Used in: Security systems, access control, and law enforcement.
Computer Vision Techniques
1. Image Classification
• Definition: Image classification assigns a label or category to an entire image
based on its content. It answers the question: What is in this image?
• How It Works:
⚬ Uses Convolutional Neural Networks (CNNs) to automatically learn features
from images.
⚬ During training, the model learns to recognize patterns and features, such
as edges, textures, and more complex structures.
⚬ After training, the model can classify new images based on learned
features.
• Common Models: AlexNet, ResNet, EfficientNet.
• Use Cases: Medical image diagnosis (e.g., classifying X-rays as normal or
abnormal), categorizing photos (e.g., nature, urban, people), and product
categorization in e-commerce.
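As a rough illustration of the edge-like features mentioned above, the sketch below slides a hand-written 3x3 kernel over a tiny grayscale image. The Sobel kernel, the helper name convolve2d, and the sample image are assumptions made for illustration; a trained CNN learns its own kernel values from data rather than using fixed ones.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation (no padding, stride 1), as in a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 image: dark left half (0), bright right half (9).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Sobel kernel that responds to left-to-right brightness changes.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

edges = convolve2d(image, sobel_x)
flat = convolve2d([[5, 5, 5], [5, 5, 5], [5, 5, 5]], sobel_x)
print(edges)  # strong response everywhere the window covers the edge
print(flat)   # no response on a uniform patch
```

A classifier stacks many such filters and feeds their responses into later layers that combine them into textures and more complex structures.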
Image Classification Process
a. Pre-processing
• What It Is: Preparing the images so that they are ready for the classification
model.
• Why It’s Important: Raw images can have various issues like different sizes,
colors, or noise (unnecessary information). Pre-processing ensures that images
are consistent in quality and format.
b. Data Cleaning
• What It Is: Removing or fixing images that are incorrect, irrelevant, or poor
quality.
• Why It’s Important: The model learns from the data you provide, so cleaner
data leads to better learning.
c. Object Detection
• What It Is: Identifying where objects are in an image by drawing boxes
around them.
• Why It’s Important: Before you classify an object, you may need to know its
location in the image. This is especially useful when there are multiple objects.
d. Object Recognition
• What It Is: Recognizing the type of object present inside the detected areas
of the image.
• Why It’s Important: Helps differentiate between objects that might look
similar. For example, recognizing that an object is specifically a "dog" and not
a "cat."
e. Object Classification
• What It Is: Assigning a category or label to the detected object, like "cat,"
"dog," or "car."
• Why It’s Important: This is the core step where the system decides what
category the object belongs to, making sense of the image.
f. Connecting to an AI Workflow
• What It Is: Integrating the image classification model into a larger AI-based
system to make it work automatically in real-life scenarios.
• Why It’s Important: It allows the model to function in real-world
applications, like an app or a website, and makes the results available for
practical use.
Summarized process:
• Pre-processing prepares and enhances the images.
• Data Cleaning ensures that the images are of good quality and
correctly labeled.
• Object Detection finds the objects within an image.
• Object Recognition identifies what kind of object is present.
• Object Classification assigns the specific category or label to the
object.
• Connecting to an AI Workflow integrates the entire process into a
real-world application, enabling it to operate automatically and provide
insights to users.
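The first two steps of this process can be sketched in a few lines. This is a minimal illustration, assuming 8-bit grayscale images stored as nested lists; the helper names resize_nearest, normalize, and is_usable, the 2x2 target size, and the contrast threshold are hypothetical choices, not part of any particular library.

```python
def resize_nearest(image, out_h, out_w):
    """Pre-processing: nearest-neighbor resize so every image has the same shape."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def normalize(image):
    """Pre-processing: scale 8-bit pixel values (0-255) into the range 0.0-1.0."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def is_usable(image, min_contrast=0.05):
    """Data cleaning (assumed rule): reject images that are almost uniform."""
    pixels = [pixel for row in image for pixel in row]
    return max(pixels) - min(pixels) >= min_contrast

raw_images = [
    [[0, 50, 100, 255]] * 4,    # 4x4 image with a horizontal gradient
    [[128, 128], [128, 128]],   # 2x2 blank image that carries no information
]

prepared = [normalize(resize_nearest(img, 2, 2)) for img in raw_images]
cleaned = [img for img in prepared if is_usable(img)]
print(len(prepared), "prepared,", len(cleaned), "kept after cleaning")
```

In practice the cleaning rules depend on the dataset, for example removing duplicates or mislabeled images; the contrast check here only stands in for that kind of filter.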
2. Object Detection
• Definition: Object detection identifies and locates multiple objects within an
image, providing bounding boxes around each object. It answers: What objects
are present, and where are they?
• How It Works:
⚬ Combines image classification with localization. CNNs extract features, and
bounding boxes are predicted for each detected object.
⚬ Two-stage detectors: Models like Faster R-CNN first propose regions of
interest (ROIs) and then classify each region.
⚬ Single-stage detectors: YOLO (You Only Look Once) and SSD (Single Shot
Detector) predict objects and bounding boxes directly in a single pass.
• Common Models: YOLOv3/v4/v5, Faster R-CNN, RetinaNet.
• Use Cases: Detecting pedestrians and vehicles for self-driving cars, tracking
objects in video surveillance, and identifying products in retail applications.
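Detectors usually predict several overlapping boxes for the same object, so a common post-processing step (not described above, and added here as an assumption) is non-maximum suppression based on intersection over union (IoU). The sketch below uses hypothetical detections and a class-agnostic variant for brevity; real pipelines often apply it per class.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box among boxes that overlap heavily."""
    ranked = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for candidate in ranked:
        if all(iou(candidate["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(candidate)
    return kept

# Hypothetical raw detector output: two boxes on the same car, one on a person.
detections = [
    {"label": "car", "score": 0.90, "box": (10, 10, 50, 50)},
    {"label": "car", "score": 0.75, "box": (12, 12, 52, 52)},
    {"label": "person", "score": 0.80, "box": (100, 20, 130, 90)},
]

kept = non_max_suppression(detections)
for d in kept:
    print(d["label"], d["score"], d["box"])
```

The lower-scoring car box overlaps the first one heavily, so only one car and one person remain.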
3. Semantic Segmentation
• Definition: Semantic segmentation assigns a category label to each pixel in
an image, meaning all pixels belonging to the same class share a label. It
answers: What category does each pixel belong to?
• How It Works:
⚬ Uses fully convolutional networks (FCNs) or models like U-Net that replace
fully connected layers with convolutional layers to produce pixel-level
predictions.
⚬ Output is a segmentation map (label map) in which each pixel’s value
corresponds to a category.
⚬ Techniques like dilated convolutions and encoder-decoder architectures are
used for better resolution.
• Common Models: U-Net, DeepLab (DeepLabv3, DeepLabv3+), PSPNet.
• Use Cases: Medical image analysis (e.g., segmenting tumors from MRI scans),
autonomous driving (e.g., recognizing roads, sidewalks, and sky), and image
editing.
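The last step of a segmentation network, turning per-class scores into a label for each pixel, can be sketched as a per-pixel argmax. The class names and score values below are made up for illustration; in a real model the scores would come from the network's final layer.

```python
CLASS_NAMES = ["road", "sidewalk", "sky"]  # hypothetical label set

def label_map_from_scores(scores):
    """scores[c][r][col] is the score of class c at pixel (r, col).

    Returns a map holding, for each pixel, the index of its best class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][r][col])
         for col in range(width)]
        for r in range(height)
    ]

# Made-up scores for a 2x3 image, one 2x3 grid per class.
scores = [
    [[0.1, 0.2, 0.1], [0.8, 0.7, 0.2]],  # road
    [[0.1, 0.1, 0.2], [0.1, 0.2, 0.7]],  # sidewalk
    [[0.8, 0.7, 0.7], [0.1, 0.1, 0.1]],  # sky
]

label_map = label_map_from_scores(scores)
for row in label_map:
    print([CLASS_NAMES[i] for i in row])
```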
4. Instance Segmentation
• Definition: Extends semantic segmentation by identifying each object
instance separately. It answers: Which individual object does each pixel belong
to?
• How It Works:
⚬ Combines object detection and semantic segmentation. After detecting
objects, it uses pixel-level masks to segment each detected object.
⚬ Uses a two-stage approach: first, it generates region proposals (ROIs), then
refines them to produce masks.
⚬ Mask R-CNN extends Faster R-CNN by adding a branch for predicting
segmentation masks.
• Common Models: Mask R-CNN, PANet, Cascade Mask R-CNN.
• Use Cases: Differentiating between multiple instances of cars or people in
crowded scenes, analyzing cell structures in biological images, and content
creation for augmented reality (AR).
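To make the difference from semantic segmentation concrete, the sketch below takes a binary "car" mask and gives each separate blob its own instance id. It uses plain connected-component labeling, a much simpler stand-in chosen for illustration; it is not how Mask R-CNN works and it cannot separate objects that touch. The function name and sample mask are hypothetical.

```python
from collections import deque

def label_instances(mask):
    """Give each 4-connected blob of 1-pixels its own instance id (1, 2, ...)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Semantic output: every "car" pixel is 1, regardless of which car it is.
car_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

instance_map, num_instances = label_instances(car_mask)
print(num_instances)
for row in instance_map:
    print(row)
```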
5. Panoptic Segmentation
• Definition: Combines the strengths of both semantic segmentation and
instance segmentation. It assigns a label to each pixel, distinguishing both
instance-specific objects (like cars) and stuff categories (like sky, road).
• How It Works:
⚬ Uses a unified approach that combines instance segmentation methods for
objects and semantic segmentation methods for background classes.
⚬ Produces two outputs: one for segmenting individual objects (instance
segmentation) and another for segmenting broader categories (semantic
segmentation).
⚬ The results are then merged to create a final panoptic map.
• Common Models: Panoptic FPN (Feature Pyramid Network), Panoptic-DeepLab.
• Use Cases: Autonomous driving, robotics navigation (understanding and
interacting with environments), and complex scene understanding in video
analysis.
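The merge step can be sketched as combining a semantic map with an instance map into one map of (class, instance) pairs. The class ids, the convention that instance id 0 means "stuff", and the sample maps are assumptions for illustration; real systems also have to resolve conflicts between overlapping predictions.

```python
STUFF_CLASSES = {0: "sky", 1: "road"}  # hypothetical class ids
THING_CLASSES = {2: "car"}

def merge_panoptic(semantic_map, instance_map):
    """Build a panoptic map of (class_id, instance_id) pairs.

    Thing pixels keep their instance id; stuff pixels get instance id 0.
    """
    panoptic = []
    for sem_row, inst_row in zip(semantic_map, instance_map):
        row = []
        for class_id, instance_id in zip(sem_row, inst_row):
            if class_id in THING_CLASSES and instance_id > 0:
                row.append((class_id, instance_id))
            else:
                row.append((class_id, 0))
        panoptic.append(row)
    return panoptic

semantic_map = [
    [0, 0, 0, 0],
    [2, 2, 1, 2],
    [1, 1, 1, 1],
]
instance_map = [
    [0, 0, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]

panoptic_map = merge_panoptic(semantic_map, instance_map)
segments = {pair for row in panoptic_map for pair in row}
print(sorted(segments))
```

Here the scene ends up with four segments: sky, road, and two separate cars.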
6. Keypoint Detection
• Definition: Identifies specific, critical points on objects or human bodies, such
as joints (elbows, knees), facial landmarks (eyes, mouth), or object landmarks
(corners of an object).
• How It Works:
⚬ Uses CNNs to detect features and output a set of coordinates
corresponding to the key points.
⚬ Human pose estimation is a common application, using models that detect
body parts and construct a "skeleton."
⚬ Multi-stage models refine the initial predictions through stages.
• Common Models: OpenPose, DeepPose, HRNet.
• Use Cases: Sports analytics (tracking player movements), gesture
recognition, facial recognition, and human-computer interaction (HCI).
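Many keypoint models output one heatmap per keypoint rather than coordinates directly; this is an assumption about the model design, since the description above only says that coordinates are produced. Under that assumption, coordinates can be read off as the location of each heatmap's peak. The joint names, heatmap values, and skeleton below are made up.

```python
def keypoint_from_heatmap(heatmap):
    """Return (row, col, confidence) of the strongest response in one heatmap."""
    best = (0, 0, heatmap[0][0])
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best

# Hypothetical 3x3 heatmaps, one per body joint.
heatmaps = {
    "left_elbow": [[0.0, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.1, 0.0]],
    "left_wrist": [[0.0, 0.0, 0.0], [0.0, 0.1, 0.3], [0.0, 0.2, 0.8]],
}

keypoints = {name: keypoint_from_heatmap(h) for name, h in heatmaps.items()}

# A "skeleton" is a list of joint pairs that should be connected by a limb.
SKELETON = [("left_elbow", "left_wrist")]
limbs = [(keypoints[a][:2], keypoints[b][:2]) for a, b in SKELETON]
print(keypoints)
print(limbs)
```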
7. Person Segmentation
• Definition: Focused specifically on separating human figures from the
background in an image at a pixel level.
• How It Works:
⚬ Uses models similar to those in semantic segmentation but tuned to detect
human silhouettes.
⚬ Can leverage pretrained models like DeepLab or Mask R-CNN with a focus
on detecting human features.
⚬ Often used in real-time applications with optimized architectures for fast
inference.
• Use Cases: Virtual backgrounds in video calls (e.g., Zoom or Teams), AR
effects (placing virtual clothes on users), and security monitoring.
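Once a person mask is available, a virtual background is a per-pixel choice between the camera frame and the replacement image. The sketch below assumes grayscale frames as nested lists and a binary mask that some segmentation model has already produced; the function name and pixel values are hypothetical.

```python
def replace_background(frame, person_mask, background):
    """Keep frame pixels where the mask is 1 and background pixels elsewhere."""
    return [
        [f if m == 1 else b for f, m, b in zip(frame_row, mask_row, bg_row)]
        for frame_row, mask_row, bg_row in zip(frame, person_mask, background)
    ]

frame = [[10, 200, 12], [11, 210, 13]]   # camera image
person_mask = [[0, 1, 0], [0, 1, 0]]     # 1 = person, 0 = background
background = [[99, 99, 99], [99, 99, 99]]  # virtual background

composited = replace_background(frame, person_mask, background)
print(composited)
```

Production systems usually use a soft mask with values between 0 and 1 and blend the two images, which avoids hard edges around hair and shoulders.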
8. Depth Perception
• Definition: Estimates the distance of objects from the camera, creating a
depth map where each pixel corresponds to a distance.
• How It Works:
⚬ Stereo Vision: Uses two cameras to capture different perspectives of a
scene and triangulates depth from the disparity between images.
⚬ Monocular Depth Estimation: Uses a single image with deep learning
models to predict depth based on learned features and patterns.
⚬ LiDAR: Uses laser sensors to directly measure distances and create precise
3D maps.
• Common Models: Monodepth, DPT (Dense Prediction Transformer).
• Use Cases: Autonomous driving (3D scene reconstruction), AR applications,
robot navigation, and 3D photography.
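For stereo vision, the triangulation step reduces to one formula once the two images are rectified: depth Z = f * B / d, where f is the focal length in pixels, B is the distance between the cameras, and d is the disparity in pixels. The camera parameters and disparity values below are made-up numbers for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: treat the point as very far away
    return focal_length_px * baseline_m / disparity_px

FOCAL_LENGTH_PX = 700.0  # hypothetical camera focal length, in pixels
BASELINE_M = 0.12        # hypothetical distance between the two cameras, in meters

# Hypothetical disparity map: larger disparity means the point is closer.
disparity_map = [
    [42.0, 21.0],
    [8.4, 0.0],
]

depth_map = [
    [depth_from_disparity(d, FOCAL_LENGTH_PX, BASELINE_M) for d in row]
    for row in disparity_map
]
for row in depth_map:
    print([round(z, 2) for z in row])
```

Because depth is inversely proportional to disparity, small disparity errors on distant objects produce large depth errors, which is one reason LiDAR is often used alongside cameras.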
9. Image Captioning
• Definition: Automatically generates a descriptive textual caption for an
image, interpreting both objects and their relationships.
• How It Works:
⚬ Combines CNNs for feature extraction with RNNs (Recurrent Neural
Networks) or Transformers for generating language descriptions.
⚬ The CNN extracts a high-level representation of the image, which is then
used as input for an RNN or Transformer that generates a sequence of
words.
⚬ Attention mechanisms help focus on relevant parts of the image while
generating words.
• Common Models: Show and Tell; Show, Attend and Tell; and Transformer-based
models like ViLBERT.
• Use Cases: Assisting visually impaired users, automated image tagging,
storytelling from photos, and content recommendation.
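The attention step can be sketched as a weighted average of image-region features, where the weights come from how well each region matches the decoder's current state. This is a minimal dot-product attention example with made-up two-dimensional features; the function names are hypothetical and real captioning models use learned projections and much larger vectors.

```python
import math

def softmax(scores):
    """Numerically stable softmax: turns scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, region_features):
    """Dot-product attention over image regions.

    Returns the attention weights and the weighted sum of region features.
    """
    scores = [sum(q * f for q, f in zip(query, region)) for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical CNN features for three image regions.
region_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical decoder state while generating the next word.
query = [2.0, 0.0]

weights, context = attend(query, region_features)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in context])
```

The region most similar to the query receives the largest weight, so it contributes most to the context vector used to pick the next word.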
10. 3D Object Reconstruction
• Definition: Reconstructs a 3D model of an object or scene from 2D images or
videos.
• How It Works:
⚬ Structure-from-Motion (SfM): Uses multiple 2D images from different angles
to estimate 3D points and camera positions.
⚬ Multi-View Stereo (MVS): Uses depth estimation from multiple images to
build dense point clouds and reconstruct surfaces.
⚬ Deep Learning Approaches: Models like NeRF (Neural Radiance Fields) learn
to predict 3D structures and appearance from a collection of 2D images.
• Common Models: NeRF, COLMAP (for SfM), MVSNet.
• Use Cases: VR/AR content creation, cultural heritage preservation, CAD
model generation, and gaming.
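One building block shared by these pipelines is turning per-pixel depth into 3D points. The sketch below back-projects a depth map through a pinhole camera model to get a small point cloud. It is only that one step, not a full SfM or MVS implementation, and the intrinsics (fx, fy, cx, cy) and depth values are made-up numbers.

```python
def backproject(depth_map, fx, fy, cx, cy):
    """Convert a depth map into (X, Y, Z) points in the camera frame.

    Pinhole model: X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy,
    where (u, v) is the pixel column and row.
    """
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels with no valid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# Hypothetical 2x3 depth map in meters; 0.0 marks a missing measurement.
depth_map = [
    [2.0, 2.0, 0.0],
    [2.0, 4.0, 2.0],
]

points = backproject(depth_map, fx=100.0, fy=100.0, cx=1.0, cy=0.5)
for p in points:
    print(p)
```

Point clouds from several views can then be aligned using the estimated camera positions and fused into a single surface model.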
Thank you for listening!