Storage Management Techniques Explained

Uploaded by

andreainana30
Copyright
© All Rights Reserved

Lesson 5

Storage Management

Storage management refers to the software and processes that improve the
performance of data storage resources in a computer system.

Application of Storage management techniques:

1. primary storage
2. backup storage
3. archived storage

Primary storage holds actively or frequently accessed data.

Backup storage holds copies of primary storage data for use in disaster
recovery.

Archived storage holds outdated or seldom-used data that must be retained for
compliance or business continuity.

To manage storage in Windows 10:

1. Open Settings.
2. Click System.
3. Click Storage on the left side of the window to view the computer's
storage.

Under the storage section, we can click a hard drive to see what is taking up
space on the drive.
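
The same totals can also be read from a script. The following is a minimal
sketch using Python's standard shutil.disk_usage function; the helper name
usage_report is an assumption made for illustration, and the script reports
only overall totals, not the per-category breakdown shown in Settings.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3  # bytes per gibibyte


def usage_report(path):
    """Return total, used, and free space (in GiB) for the drive holding path."""
    usage = shutil.disk_usage(path)
    return {
        "total_gib": usage.total / GIB,
        "used_gib": usage.used / GIB,
        "free_gib": usage.free / GIB,
        "percent_used": 100 * usage.used / usage.total,
    }


# Path.cwd().anchor is the root of the current drive, e.g. "C:\\" or "/".
report = usage_report(Path.cwd().anchor)
for key, value in report.items():
    print(f"{key}: {value:.1f}")
```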

On the storage usage screen, the items are broken down into categories. We can
click any category to see its items and decide which ones to remove from the
hard drive.

Storage management also includes 10 processes that help businesses store more
data on existing hardware, speed up data retrieval, prevent data loss, meet
data retention requirements, and reduce IT expenses. These processes are:

1. Network virtualization
2. Replication
3. Mirroring
4. Security
5. Data Compression
6. Deduplication
7. Traffic analysis
8. Process automation
9. Storage provisioning
[Link] management

Network Virtualization

A network is a collection of computers, servers, mainframes, network devices,
peripherals, or other devices connected to allow data sharing.

An example of a network is the Internet, which connects millions of people all
over the world.

Example image of a home network with multiple computers and other network
devices all connected.

Virtualization in computing is the process of simulating hardware and
software, such as computers, operating systems, storage, and networking, in a
virtual (software) environment.

A simulation is the imitation of the operation of a real-world process or
system over time.

Traditionally, a business runs one application on one machine.

Example: A business has 3 servers.

• The 1st server runs an email service and uses MS Windows as its OS.

• The 2nd server runs a website and uses Linux as its OS.

• The 3rd server runs a database and uses Unix as its OS.

So, each machine runs one application, and the 3 servers run 3 different
operating systems.

But instead of having 3 servers running one application each, what if just one
server could do the job just as well and more efficiently? One server would
take the place of the three and run all the applications, and even their
different operating systems.

This is what virtualization does.

Virtualization consolidates all of the physical servers, with their different
operating systems and applications, and runs them on just one physical server
in a virtual environment.

One server runs three virtual machines (VMs), which together run:

• 3 different applications: email, web server, and database

• 3 different operating systems: Windows, Linux, and Unix

They all run side by side on one machine.

The server does all of this in software, and it does it so well that users
interact with a virtual server the same way they would with multiple physical
servers. They won’t be able to tell the difference.

The software that creates and runs the virtualization is called a hypervisor.

A hypervisor is what allows one machine to run multiple virtual machines. It
allocates and controls the sharing of the machine’s resources, such as storage
space, RAM, CPUs, and so on.
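
The resource bookkeeping described above can be sketched as a toy model. This
is not how any real hypervisor is implemented: the Host class, its method
names, and all the numbers are assumptions made for illustration, and real
hypervisors can also share or overcommit resources, which this sketch does not
model.

```python
class Host:
    """Toy model of a physical machine whose resources a hypervisor hands out."""

    def __init__(self, cpus, ram_gb, storage_gb):
        self.free = {"cpus": cpus, "ram_gb": ram_gb, "storage_gb": storage_gb}
        self.vms = {}

    def create_vm(self, name, guest_os, cpus, ram_gb, storage_gb):
        """Reserve resources for a new VM, or refuse if the host is short."""
        wanted = {"cpus": cpus, "ram_gb": ram_gb, "storage_gb": storage_gb}
        short = [key for key, amount in wanted.items() if amount > self.free[key]]
        if short:
            raise RuntimeError(f"not enough {', '.join(short)} for {name}")
        for key, amount in wanted.items():
            self.free[key] -= amount
        self.vms[name] = {"guest_os": guest_os, **wanted}


# The three servers from the example above, consolidated onto one host.
host = Host(cpus=16, ram_gb=64, storage_gb=2000)
host.create_vm("email", "Windows", cpus=4, ram_gb=16, storage_gb=500)
host.create_vm("web", "Linux", cpus=4, ram_gb=8, storage_gb=200)
host.create_vm("database", "Unix", cpus=8, ram_gb=32, storage_gb=1000)
print(host.free)  # what is left for further VMs
```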

Hypervisors come in 2 different types:

1. Type 1
2. Type 2

A Type 1 hypervisor is installed on empty, bare-metal hardware, meaning that
no existing OS or other software is installed on the machine.

A Type 2 hypervisor is installed and runs on top of an existing OS, such as
MS Windows, Mac OS, Unix, Linux, and so on. The OS sits in between the machine
and the hypervisor.

Type 1 hypervisors are the most common because they are used in Enterprise
Data Centers.

A data center is a large group of networked computer servers typically used by
organizations for the remote storage, processing, or distribution of large
amounts of data.

A data center can take on various tasks, from simple processes such as storage
for data backup to the storage and execution of basic IT processes. Some data
centers act as a connection point that brings together different colocation
environments.

A data center's design is based on a network of computing and storage resources
that enable the delivery of shared applications and data. The key components
of a data center design include routers, switches, firewalls, storage systems,
servers, and application-delivery controllers.

Some examples of Type 1 hypervisors are:

VMware ESXi, also called VMware ESXi Server, is a bare-metal hypervisor
developed by VMware for vSphere. ESXi is one of the primary components in the
VMware infrastructure software suite.

ESXi is a Type 1 hypervisor, meaning it runs directly on system hardware
without the need for an OS. Type 1 hypervisors are also referred to as
bare-metal hypervisors because they run directly on hardware. Hypervisors help
run multiple VMs efficiently on a physical server.

Citrix XenServer is an open source server virtualization platform based on the
Xen hypervisor. Citrix also offers a supported version that you can purchase,
with two options: Standard and Enterprise.

Microsoft Hyper-V, codenamed Viridian, and briefly known before its release as
Windows Server Virtualization, is a native hypervisor; it can create virtual
machines on x86-64 systems running Windows.

Starting with Windows 8, Hyper-V superseded Windows Virtual PC as the hardware
virtualization component of the client editions of Windows NT (New Technology).

A server computer running Hyper-V can be configured to expose individual
virtual machines to one or more networks.

Hyper-V was first released with Windows Server 2008, and has been available
without additional charges since Windows Server 2012 and Windows 8.

A standalone Windows Hyper-V Server is free, but has a command-line interface
only. The last version of free Hyper-V Server is Hyper-V Server 2019, which is
based on Windows Server 2019.

A Type 2 hypervisor runs on top of an existing operating system. These are
typically used on personal computers.

Example:

People use a Type 2 hypervisor on their computer if they want to test new
software for research purposes or to try out different operating systems.

The machine is running 2 VMs: one runs MS Windows and the other runs Linux.

Examples of Type 2 hypervisors are:

1. Oracle VM VirtualBox
2. Microsoft Virtual PC
3. VMware Workstation

Oracle VM VirtualBox is cross-platform virtualization software. It allows
users to extend their existing computer to run multiple operating systems,
including Microsoft Windows, Mac OS X, Linux, and Oracle Solaris, at the same
time. Designed for IT professionals and developers, Oracle VM VirtualBox is
ideal for testing, developing, demonstrating, and deploying solutions across
multiple platforms from one machine.

Microsoft Virtual PC is an x86 emulator for PowerPC Mac hosts and a
virtualization app for Microsoft Windows hosts. It was created by Connectix in
1997 and acquired by Microsoft in 2003.

VMware Workstation Pro (known as VMware Workstation until the release of
VMware Workstation 12 in 2015) is a hosted (Type 2) hypervisor that runs on x64
versions of Windows and Linux operating systems (an x86-32 version of earlier
releases was available).

It enables users to set up virtual machines (VMs) on a single physical machine
and use them simultaneously along with the host machine.

Each virtual machine can execute its own operating system, including versions
of Microsoft Windows, Linux, BSD (Berkeley Software Distribution), and MS-DOS.
It is developed and sold by VMware, Inc.

Benefits of Virtualization

1. Saves money on hardware and electricity

A business won’t need as many physical machines, or the power it takes to
run them. It can just create a virtual machine instead.

2. Saves money on floor space

A business won’t need to purchase a lot of floor space to accommodate a
large number of machines.

3. Saves money on maintenance and management

Physical machines require administrators to maintain and manage each one
when something happens, such as a change in configuration, an equipment
failure, or even a fire. Fewer physical machines mean less of this work.

4. Portability

VMs on a physical machine can easily be transferred to another physical
machine if needed. For example, if the machine currently running the VMs is
old and outdated, or is running out of space, those VMs can be moved quickly
to a new and more powerful machine.

5. Full computing capability of a machine

Today’s computers and servers are so powerful that most of the time their
full potential is not being used. The software applications they run cannot
utilize the machine’s full potential, so the majority of the machine’s power
goes unused. With virtualization, several virtual machines can share one
physical machine and take full advantage of its capability.

6. Disaster recovery

Virtual machines are just software files, and those files can be backed
up and uploaded to multiple machines. So, if a machine goes down, the other
machines will be there to take over.

How does a packet move without network virtualization, using the traditional
network infrastructure approach?

NP = Network Packet

*VM1 and VM12 lie on 2 different VLAN IDs.

A Virtual Local Area Network (VLAN) is any broadcast domain that is partitioned
and isolated in a computer network at the data link layer.

With the color code, we can identify that VM1 lies on VLAN ID 100 and VM12
lies on VLAN ID 101.

In the absence of network virtualization, let’s see how the NP moves from VM1
to VM12.

1. The NP has to come out of VM1 and go to the physical network of Server1.

2. From Server1, it has to go to the uplink switch that the server is
connected to.

On the switch, we have to configure VLAN ID 100 and VLAN ID 101.

If we do not configure them, the switch will not know to which VM it has
to forward the NP.

3. The switch knows which VLAN ID it has to pass the NP to, but because the
VLAN IDs are different, the NP also has to go through the router.

Once the NP has passed through the router, it goes back to the switch, then
to Server1, and from Server1 to the virtual switch, until it lands on VM12.

But if we configure the network with a network virtualization infrastructure
and send the NP from VM1 to VM12, the NP goes from VM1 to Virtual Switch1,
and Virtual Switch1 transfers the NP directly to VM12.
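
The two routes described above can be compared with a small sketch. It only
lists the hops named in the text; the function name packet_path and the VLANS
table are assumptions made for illustration, and no real networking takes
place.

```python
# VLAN IDs from the example: VM1 is on VLAN 100 and VM12 is on VLAN 101.
VLANS = {"VM1": 100, "VM12": 101}


def packet_path(src, dst, vlans, virtualized):
    """Return the list of hops a packet takes from src to dst."""
    if virtualized:
        # The virtual switch hands the packet straight to the target VM.
        return [src, "Virtual Switch1", dst]
    path = [src, "Server1", "Switch"]
    if vlans[src] != vlans[dst]:
        # Different VLAN IDs: the packet must also pass through the router.
        path += ["Router", "Switch"]
    path += ["Server1", "Virtual Switch1", dst]
    return path


traditional = packet_path("VM1", "VM12", VLANS, virtualized=False)
virtual = packet_path("VM1", "VM12", VLANS, virtualized=True)
print(" -> ".join(traditional))
print(" -> ".join(virtual))
print(f"hops saved: {len(traditional) - len(virtual)}")
```
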
What is network virtualization and how is it implemented?

Network virtualization is an abstraction of the physical network that
supports multiple logical networks running on a common, shared physical
substrate. It can be viewed as a container of network services.

A physical substrate is a common set of physical routers and links that
supports multiple logical network topologies on top of that physical
infrastructure.

A network topology is the arrangement in which computer systems or network
devices are connected to each other.

Various aspects of network technology can be virtualized to support network
virtualization:

1. Nodes

A network node is a connection point in a communications network.

Each node is an endpoint for data transmission or redistribution.

A virtual machine is one example of a technology used for virtualizing
nodes.

2. Links

Link virtualization is a basic building block for network virtualization
that allows different Internet protocols to coexist.

Links are virtualized through tunnels.

3. Storage

Network storage is defined as a special type of dedicated data storage
server that includes storage devices such as disk arrays, CD/DVD drives,
tape drives, or removable storage media, and embedded system software that
provides cross-platform file sharing capabilities.

Storage virtualization is "the process of presenting a logical view of the
physical storage resources to" a host computer system, "treating all storage
media in the enterprise as a single pool of storage."

A "storage system" is also known as a storage array, disk array, or filer.

A physical view of a network shows the network topology with the physical
aspects like ports, cables, racks, routers, switches, hubs, etc.

A logical view of a network shows the “invisible” elements and connections
flowing through the physical objects on the network.

Example: IP Addressing (Internet Protocol Addressing)

The Internet Protocol is a set of rules for communication over the internet,
such as sending mail, streaming video, or connecting to a website.

An IP address identifies a network or device on the internet.

An IP address has two parts:

1. Network ID
2. Host ID

On a typical home network, the network ID comprises the first three numbers of
the address, and the host ID comprises the fourth number. Other networks may
split the address at a different point.

Example:

On a home network — [Link], the 192.168.1 is the network ID, and the final
number is the host ID.

The Network ID indicates which network the device is on.

The Host ID refers to the specific device on that network. (Usually your router
is .1, and each subsequent device gets assigned .2, .3, and so on.)
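
The split can be checked with Python's standard ipaddress module. This is a
minimal sketch: the address 192.168.1.10 and the /24 prefix (first three
numbers for the network, last number for the host) are assumptions chosen for
illustration, not values taken from the lesson.

```python
import ipaddress

# Hypothetical home-network address; "/24" means the first 24 bits
# (the first three numbers) identify the network.
iface = ipaddress.ip_interface("192.168.1.10/24")

network_id = iface.network.network_address            # network part, host bits zeroed
host_id = int(iface.ip) & int(iface.network.hostmask)  # remaining host bits

print("network:", iface.network)
print("network ID:", network_id)
print("host ID:", host_id)
```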

IT staff can create Virtual Private Networks (VPNs) without changing the
hardware configuration, and can provision virtualized services like SD-WAN
(Software-Defined Wide Area Network) and security faster, making it easier to
connect users to applications.

SD-WAN simplifies the deployment and management of a Wide Area Network (WAN)
and provides secure, reliable connectivity for all employees on any device,
anywhere.

Similar to a storage unit, network virtualization and cloud services can
provide a business with options other than adding servers to its own data
center.

Data Center – Cloud and Virtualization


In general, when we talk about the cloud, we are talking about 3 things:

1. Data Centers
2. Cloud Computing or Cloud Services
3. Virtualization or Virtual Computing

Data Centers

Though data centers can be small, they are generally large facilities that
provide massive amounts of power, cooling, and bandwidth.

Only large organizations like Facebook or Google can afford to build their own
private data centers to provide services to their users.

Smaller organizations lease space from a data center.

Cloud Computing or Cloud Services

Cloud service providers like Cisco, Amazon Web Services (AWS), or Microsoft
Azure offer their services out of data centers.

There are 2 basic types of clouds:

1. Public clouds
2. Private clouds

Public clouds offer services and applications to the general population.

Private clouds are intended for specific organizations or entities such as


governments and are only accessed by those private organizations.

There are different categories of cloud services:

1. SaaS (Software as a Service)
2. PaaS (Platform as a Service)
3. IaaS (Infrastructure as a Service)

SaaS refers to on-demand software, or a subscription model in which the
licensing and delivery of the software happen through the cloud.

We can find SaaS in Office 365, Adobe Creative Cloud, or even computer gaming
software; access to the software typically happens through a Web browser.

PaaS is where the cloud service provider provides a platform, such as the Java
or .NET platform, for a developer to build an application on. The platform
includes databases and tools so that the developer can quickly develop an
application.

IaaS refers to virtual computing that can be provided over the Internet on
demand.

IaaS includes virtual computing resources that can be provisioned, allocated,
and supplied on an as-needed basis, such as:

1. Virtual servers
2. Virtualized storage
3. Virtualized networking capabilities

A virtual server re-creates the functionality of a dedicated physical server.
It exists transparently to users as a partitioned space inside a physical
server.

Virtualizing servers makes it easy to reallocate resources and adapt to dynamic
workloads.

Virtualized storage is the pooling of physical storage from multiple network
storage devices into what appears to be a single storage device that is managed
from a central console.

The central console is an OS window where users interact with the OS or with
a text-based console application by entering text through the computer
keyboard and reading text output from the computer terminal.

It is the text entry and display device for system administration messages,
particularly those from the BIOS or boot loader, the kernel, the init
(initialization) system, and the system logger.

Storage virtualization is commonly used in cloud storage. It involves a host
that presents a virtual drive to guest machines.

The guest machines can be cloud-based virtual machines, file shares, or
physical servers. The virtualization is done at the host level via software,
regardless of the physical storage array.
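
The idea of pooling several devices into one logical store can be sketched as
follows. This is a toy model only: the Device and StoragePool classes, the
device names, and the fill-in-order placement policy are assumptions made for
illustration and do not describe any particular storage product.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    capacity_gb: int
    used_gb: int = 0


class StoragePool:
    """Presents several physical devices as one logical pool of space."""

    def __init__(self, devices):
        self.devices = list(devices)

    @property
    def capacity_gb(self):
        return sum(d.capacity_gb for d in self.devices)

    @property
    def free_gb(self):
        return sum(d.capacity_gb - d.used_gb for d in self.devices)

    def allocate(self, size_gb):
        """Reserve size_gb from the pool; return where the space was placed."""
        if size_gb > self.free_gb:
            raise ValueError("pool does not have enough free space")
        placement, remaining = [], size_gb
        for device in self.devices:
            take = min(remaining, device.capacity_gb - device.used_gb)
            if take:
                device.used_gb += take
                placement.append((device.name, take))
                remaining -= take
            if remaining == 0:
                break
        return placement


pool = StoragePool([Device("array-a", 100), Device("array-b", 200)])
placement = pool.allocate(150)  # larger than array-a alone, so it spans both
print(placement, pool.free_gb)
```

The caller asks the pool for 150 GB without knowing which device holds it;
only the pool keeps track of the physical placement.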

Virtualized networking capabilities enable communication between multiple
computers, virtual machines (VMs), virtual servers, or other devices across
different office and data center locations.

Replication

Replication is the process of copying and maintaining database objects, such
as tables, in multiple databases that make up a distributed database system.

It includes setting up the management servers, adding storage connections and
paths, creating sessions and copy sets, setting up data replication, performing
planned maintenance, and recovering from a disaster.
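
As a small illustration of copying a table from one database to another, the
sketch below uses Python's standard sqlite3 module and its Connection.backup
method. It is a one-time copy between two in-memory databases, not continuous
replication, and the orders table and its rows are made up for the example.

```python
import sqlite3

primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")

# Create and fill a table on the primary database.
primary.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
primary.executemany("INSERT INTO orders (item) VALUES (?)", [("disk",), ("tape",)])
primary.commit()

# Copy the whole primary database, including the table, to the replica.
primary.backup(replica)

rows = replica.execute("SELECT id, item FROM orders ORDER BY id").fetchall()
print(rows)
```
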
Components of Replication Storage Management:

1. Administering
2. Managing management servers
3. Managing storage systems
4. Managing host systems
5. Managing logical paths
6. Setting up data replication
7. Practicing disaster recovery
8. Monitoring health and status
9. Managing security

Administering

Administrative tasks within IBM Copy Services Manager include authorizing
users, starting and stopping services, and performing backup and recovery
operations.

Managing management servers

You can set up active and standby management servers, restore a lost
connection between the management servers, or complete a takeover on the
standby management server.

Managing storage systems

To replicate data among storage systems using Copy Services Manager, you
must add connections to the storage systems. After a storage system is
added, you can associate a location, modify connection properties, set
volume protection, and refresh the storage configuration for that storage
system.

Managing host systems

A host system is an IBM® z/OS® system that connects to storage systems to
enable certain replication features for those systems.

Managing logical paths

Logical paths define the relationship between a source logical subsystem
(LSS) and a target LSS that is created over a physical path. To configure
logical paths for IBM DS8000®, use the ESS/DS Paths panel in Copy Services
Manager.

Setting up data replication

Setting up data replication in your environment includes creating sessions
and adding copy sets to those sessions.

Practicing disaster recovery

You can use practice volumes to test your disaster recovery actions while
maintaining disaster recovery capability.

Monitoring health and status

There are several options within Copy Services Manager for monitoring the
health and status of sessions, storage systems, host systems, and
management servers.

Managing security

The Copy Services Manager authentication process uses a configured user
registry from one of the following: the basic user registry (on distributed
systems); an operating system repository, such as Resource Access Control
Facility (RACF) (on the z/OS platform); or a Lightweight Directory Access
Protocol (LDAP) registry.

A distributed system is a collection of autonomous computers that are
connected by a communication network and communicate with each other by
passing messages. Each processor has its own local memory. A distributed
system requires concurrent components, a communication network, and a
synchronization mechanism. It allows systems connected to the network to
share resources, including software.

Examples of distributed systems / applications of distributed computing:

• Intranets, the Internet, the WWW, and email
• Telecommunication networks: telephone networks and cellular networks
• Networks of branch office computers: information systems that handle the
automatic processing of orders
• Real-time process control: aircraft control systems
• Electronic banking
• Airline reservation systems
• Sensor networks
• Mobile and pervasive computing systems

z/OS is a 64-bit operating system for IBM z/Architecture mainframes,
introduced by IBM in October 2000.

Mirroring

How to set up a mirrored array in Windows 10

• Mirroring needs 2 partitions equal in size or 2 hard drives equal in size.

There are 2 ways of setting up mirroring in the OS:

• Mirror an existing hard drive (back-up mirroring)
• Mirror 2 brand-new hard drives

Back-up mirroring:

• Assume that we have 2 additional hard drives (Disk 1 and Disk 2).
• Right-click This PC.
• Click Manage.
• Click Disk Management.
• Mirroring drive C is not recommended because the computer might not boot.
• We will mirror Disk 1 to Disk 2. If we open Disk 1, there are files on it.
(We can also do this if the computer has 2 partitions.)
• Whatever Disk 1 contains will be copied to Disk 2.
• Right-click the drive that has the content and select Add Mirror.
• Select Disk 2.
• Click Add Mirror.
• Click Yes (which means Windows will convert the hard drive into a dynamic
disk).
• Synchronization will take a while, depending on how much data the disk
holds.
• When it is done, we can open either drive. The contents of the 2 disks are
the same.
• If we add content to one drive, it is automatically copied to the other.
• We can use the second disk as a standby copy of the first. That is
recommended.

If the computer has 2 brand-new disk drives:

• Select either drive.
• Right-click and select New Mirrored Volume.
• Click Next.
• Select the 2nd disk to add it to the mirror.
• Click Next.
• Change the drive letter if we want to.
• Click Next.
• Change the volume label and check the Perform a quick format box.
• Click Next.
• Finish the wizard.
• The synchronization is much faster because the disks hold no data yet.
Remember to click Yes on the pop-up that converts the hard drives to
dynamic disks.
• Once done, since both disks are empty, we can add a folder to one drive and
notice that it is automatically added to the 2nd drive.
• Keep in mind that this is a redundant system, not a backup system.
• Everything copied to the 1st drive is automatically copied to the 2nd
drive. That is how mirroring works.
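
The difference between redundancy and backup can be shown with a toy sketch.
Two temporary folders stand in for the two disks, and the helper name
mirrored_write is an assumption made for illustration; Windows performs real
mirroring at the disk level, not with a script like this.

```python
import tempfile
from pathlib import Path


def mirrored_write(name, data, disks):
    """Write the same bytes to the same file name on every 'disk'."""
    for disk in disks:
        (Path(disk) / name).write_bytes(data)


def read_all(name, disks):
    return [(Path(disk) / name).read_bytes() for disk in disks]


with tempfile.TemporaryDirectory() as disk1, tempfile.TemporaryDirectory() as disk2:
    disks = [disk1, disk2]

    mirrored_write("report.txt", b"quarterly numbers", disks)
    copies = read_all("report.txt", disks)   # both disks hold the same content

    # A mistaken overwrite is mirrored too, so the old content is gone from
    # both disks. This is why a mirror is redundancy, not a backup.
    mirrored_write("report.txt", b"", disks)
    after = read_all("report.txt", disks)

print(copies, after)
```
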
Security

Operating system security (OS security) is the process of ensuring OS
integrity, confidentiality, and availability.

OS security refers to specified steps or measures used to protect the OS from
threats, viruses, worms, malware, or remote hacker intrusions.

It encompasses all preventive-control techniques, which safeguard any
computer assets capable of being stolen, edited, or deleted if OS security is
compromised.

OS security encompasses many different techniques and methods that ensure
safety from threats and attacks. It allows applications and programs to
perform their required tasks while stopping unauthorized interference.

OS security can be approached in many ways, including the following:

1. Performing regular OS patch updates
2. Installing updated antivirus engines and software
3. Scrutinizing all incoming and outgoing network traffic through a
firewall
4. Creating secure accounts with required privileges only (i.e., user
management)

A patch is a set of changes to a computer program or its supporting data
designed to update, fix, or improve it. This includes fixing security
vulnerabilities and other bugs; such patches are usually called bugfixes
or bug fixes. Patches are also often written to improve functionality.

Patch management is the process of ensuring that your software and operating
systems are kept up-to-date with the latest security patches and software
updates. These updates are released by software vendors to fix bugs, patch
security vulnerabilities, and improve the overall performance of your
software.

In computing, a firewall is a network security system that monitors and
controls incoming and outgoing network traffic based on predetermined
security rules.
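
Rule-based filtering can be sketched in a few lines. This is a toy model, not
the behavior of any real firewall product: the Rule fields, the
first-match-wins order, and the default of denying unmatched traffic are all
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    direction: str              # "in" or "out"
    port: Optional[int] = None  # None matches any port


def decide(rules, direction, port, default="deny"):
    """Return the action of the first rule that matches the traffic."""
    for rule in rules:
        if rule.direction == direction and rule.port in (None, port):
            return rule.action
    return default


RULES = [
    Rule("allow", "in", 443),  # allow incoming HTTPS
    Rule("deny", "in"),        # deny every other incoming connection
    Rule("allow", "out"),      # allow all outgoing traffic
]

for direction, port in [("in", 443), ("in", 23), ("out", 80)]:
    print(direction, port, decide(RULES, direction, port))
```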

User management allows administrators to manage resources and organize users
according to their needs and roles while maintaining the security of IT
systems.

How to create a local user or administrator account in Windows

You can create a local user account (an offline account) for anyone who will
frequently use your PC. The best option in most cases, though, is for everyone
who uses your PC to have a Microsoft account. With a Microsoft account, you
can access your apps, files, and Microsoft services across your devices.

If needed, the local user account can have administrator permissions; however,
it's better to just create a local user account whenever possible.

Caution: A user with an administrator account can access anything on the
system, and any malware they encounter can use the administrator permissions
to potentially infect or damage any files on the system. Only grant that level
of access when absolutely necessary and to people you trust.

As you create an account, remember that choosing a password and keeping it
safe are essential steps. Because we don’t know your password, if you forget
it or lose it, we can't recover it for you.

Create a local user account

1. Select Start > Settings > Accounts and then select Family & other
users. (In some versions of Windows, you'll see Other users.)
2. Next to Add other user, select Add account.
3. Select I don't have this person's sign-in information, and on the next
page, select Add a user without a Microsoft account.
4. Enter a user name, password, or password hint—or choose security
questions—and then select Next.


Change a local user account to an administrator account

1. Select Start > Settings > Accounts.


2. Under Family & other users, select the account owner name (you should see
"Local account" below the name), then select Change account type.

Note: If you choose an account that shows an email address or doesn't say
"Local account", then you're giving administrator permissions to a
Microsoft account, not a local account.

3. Under Account type, select Administrator, and then select OK.


4. Sign in with the new administrator account.

Data Compression

Data compression squeezes (reduces) data into a smaller size, which decreases
the amount of storage a file takes up.

To compress data, we encode it using fewer bits than the original
representation.

For example, with a 2:1 compression ratio, a 20-megabyte (MB) file takes up 10
MB of space. As a result of compression, administrators spend less money and
less time on storage.
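
A compression ratio can be measured directly. The sketch below uses Python's
standard zlib module on a deliberately repetitive, made-up byte string; the
ratio it prints depends on the data, and real files usually compress far less
than this example.

```python
import zlib

# Highly repetitive sample data (made up for the example).
original = b"yellow pixel, " * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes, ratio {ratio:.1f}:1")
print("restored matches original:", restored == original)
```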

The example image is a PACMAN image of 4 pixels x 4 pixels.

The image data is typically stored as a list of pixel values.

The pixel (a word formed from "picture element") is the basic unit of
programmable color on a computer display or in a computer image. It is the
smallest unit in a digital display.

Pixels are the tiny dots that make up a computer-based image. They are the
small dots we see if we put our face too close to a television or computer
screen. Each digital image is composed of thousands or millions of individual
pixels, each with its own color.

To know where rows end, image files have metadata, which defines properties
like dimensions.

Each pixel’s color is a combination of 3 additive primary colors: red, green,
and blue.

Each of the 3 values is stored in 1 byte, giving a range of 0 to 255 for each
color.

Mixing full-intensity red, green, and blue (that is, 255 for all 3 values)
results in white.

Mixing full-intensity red and green, but no blue (a value of 0), results in
yellow.

The image has 16 pixels, and each of those pixels needs 3 bytes of color
data. That means the image data will consume 48 bytes of storage:

16 pixels x 3 bytes = 48 bytes
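
The color values and the size calculation can be written out as a short
sketch. The constant and function names are assumptions made for
illustration; the color values follow the mixing rules described above.

```python
# Each color is (red, green, blue), one byte (0 to 255) per value.
WHITE = (255, 255, 255)   # full red + full green + full blue
YELLOW = (255, 255, 0)    # full red + full green, no blue
BLACK = (0, 0, 0)

BYTES_PER_PIXEL = 3


def raw_size(width, height):
    """Bytes needed to store an uncompressed width x height RGB image."""
    return width * height * BYTES_PER_PIXEL


print(len(bytes(YELLOW)))  # one pixel occupies 3 bytes
print(raw_size(4, 4))      # 16 pixels x 3 bytes
```
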
The data can be compressed and packed into fewer than 48 bytes.

Ways to compress data:

1. Reduce repeated or redundant information.

The most straightforward way to reduce repeated or redundant information
is called Run-Length Encoding.

Run-Length Encoding takes advantage of the fact that there are often
runs of identical values in files.

For example, in PACMAN’s image, there are 7 yellow pixels in a row.

Instead of encoding redundant data (yellow pixel, yellow pixel, yellow
pixel, and so on), we can just say “there are 7 yellow pixels in a row”
by inserting an extra byte that specifies the length of the run.

And then we can eliminate the redundant data behind it.

To ensure that computers don’t get confused about which bytes are run lengths
and which bytes represent color, we have to be consistent in how we apply the
scheme.

3 3 3 9 9

So, we need to preface all pixels with their run length.

In some cases, this actually adds data, but on the whole, we’ve dramatically
reduced the number of bytes we need to encode the image.

We can expand the data back to its original form without any degradation. A
compression technique that has this characteristic is called lossless
compression, because we don’t lose anything.

The decompressed data is identical to the original before compression, bit
for bit.
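
Run-Length Encoding as described above can be sketched in a few lines. The
pixel sequence below is a made-up run containing 7 yellow pixels, not the
actual PACMAN image, and the function names are assumptions made for
illustration. Every run, even a run of 1, is stored with its length, which is
the consistent scheme the text calls for.

```python
from itertools import groupby

WHITE, YELLOW, BLACK = (255, 255, 255), (255, 255, 0), (0, 0, 0)


def rle_encode(pixels):
    """Collapse runs of identical pixels into (run_length, pixel) pairs."""
    return [(len(list(run)), pixel) for pixel, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (run_length, pixel) pairs back into the original pixel list."""
    return [pixel for length, pixel in runs for _ in range(length)]


# Hypothetical sequence: 1 white, 7 yellow, 1 black.
pixels = [WHITE] + [YELLOW] * 7 + [BLACK]
runs = rle_encode(pixels)

raw_bytes = len(pixels) * 3        # 3 color bytes per pixel
encoded_bytes = len(runs) * 4      # 1 length byte + 3 color bytes per run
print(runs)
print(raw_bytes, "->", encoded_bytes, "bytes")
print("lossless:", rle_decode(runs) == pixels)
```
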
There is another type of lossless compression, where blocks of data are
replaced by more compact representations.

This is sort of like, “Don’t Forget To Be Awesome” being replaced by “DFTBA”.

To do this, we need a dictionary that stores the mapping from codes to data.

Example:

We can view our image as not just as a string of individual pixels, but as
little blocks of data.

For simplicity, we are going to use pixel pairs, which are 6 bytes long, but
blocks can be of any size.

In our example, there are only four pairings:

• White-yellow
• Black-yellow
• Yellow-yellow
• White-white
Those are the data blocks in our dictionary we want to generate compact codes
for.
What is interesting is that the blocks occur at different frequencies.

There are 4 yellow-yellow pairs, 2 white-yellow pairs, and 1 each of
black-yellow and white-white.

Because yellow-yellow is the most common block, we want to substitute it
with the most compact representation.

On the other hand, black-yellow and white-white can be substituted with
something longer because those blocks are infrequent.

One method for generating efficient codes is building a Huffman Tree,
invented by David Albert Huffman while he was a student at MIT in the 1950s.
His algorithm goes like this:

1. First, lay out all the possible blocks and their frequencies.

2. At every round, select the 2 with the lowest frequencies.

- In the figure, these are BY and WW, each with a frequency of 1.

3. Combine them into a little tree, which has a combined frequency of 2, and
record that frequency.

4. Now, we repeat the process. This time we have 3 things to choose from.

Just like before, we select the two with the lowest frequency, put them
into a little tree, and record the new total frequency of all the
sub-items.

5. This time it’s easy to select the 2 items with the lowest frequency
because there are only 2 things left to pick.

6. Then combine them into a single tree. Below is the final figure of
our tree, which has a very cool property: it is arranged by
frequency, with less common items lower down.
So how does the tree get us to a dictionary?

By using the frequency-sorted tree, we can generate the codes we need by
labeling each branch with a 0 or 1:

Yellow-yellow is encoded as 0.
White-yellow is encoded as 1 0.
Black-yellow is encoded as 1 1 0.
White-white is encoded as 1 1 1.

The really cool thing about these codewords is that there’s no way to have
conflicting codes, because each path down the tree is unique.

This means our code is prefix-free; that is, no code starts with another
complete code.

Example: {a = 0, b = 10, c = 110, d = 111} is a prefix code.
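
The tree-building steps above can be sketched in Python using a priority queue. This is a minimal sketch, not a definitive implementation: the function name huffman_codes, the two-letter block labels, and the tie-breaking counter are assumptions made for the example. The 0/1 labels on the branches are a convention, so another implementation could produce different but equally short codes.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Build a Huffman tree from {block: frequency} and return {block: code}."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tiebreak), block) for block, freq in frequencies.items()]
    heapq.heapify(heap)

    # Repeatedly merge the two least frequent items into a little tree.
    while len(heap) > 1:
        freq_a, _, left = heapq.heappop(heap)
        freq_b, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (freq_a + freq_b, next(tiebreak), (left, right)))

    _, _, root = heap[0]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: (left, right)
            walk(node[0], prefix + "0")    # label the left branch 0
            walk(node[1], prefix + "1")    # label the right branch 1
        else:                              # leaf: an actual block
            codes[node] = prefix or "0"    # a lone block still needs one bit

    walk(root, "")
    return codes


# Pair frequencies from the example image.
pair_counts = {"YY": 4, "WY": 2, "BY": 1, "WW": 1}
codes = huffman_codes(pair_counts)
total_bits = sum(len(codes[pair]) * n for pair, n in pair_counts.items())
print(codes, total_bits)
```

The most frequent block ends up nearest the root and therefore gets the shortest code, while the two rarest blocks sit at the bottom with the longest codes.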

The pixel pair white-yellow is replaced by the bits “1 0”.
The pixel pair black-yellow is replaced by the bits “1 1 0”.
The pixel pair yellow-yellow is replaced by the bit “0”.
The pixel pair white-white is replaced by the bits “1 1 1”.
And the process repeats on the rest of the image.
So, instead of 48 bytes of image data, the process encodes it into 14 bits
of data. That is less than 2 bytes of data.

WY = 10
BY = 110
YY = 0
YY = 0
YY = 0
WW = 111
WY = 10
YY = 0
10 110 0 0 0 111 10 0 = 14 bits
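
The substitution above, and the reverse step of reading the bits back, can be sketched as follows. The code table is copied from the text; the function names and the use of a text string of "0" and "1" characters (rather than packed bits) are simplifying assumptions for the example.

```python
# Code dictionary from the text: pixel pair -> bit string.
CODES = {"YY": "0", "WY": "10", "BY": "110", "WW": "111"}


def encode(pairs, codes):
    """Replace each pixel pair with its code and join the bits together."""
    return "".join(codes[pair] for pair in pairs)


def decode(bits, codes):
    """Read bits left to right, emitting a pair whenever a full code is seen."""
    lookup = {code: pair for pair, code in codes.items()}
    pairs, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:   # works because the code is prefix-free
            pairs.append(lookup[current])
            current = ""
    return pairs


image_pairs = ["WY", "BY", "YY", "YY", "YY", "WW", "WY", "YY"]
bits = encode(image_pairs, CODES)
print(bits, len(bits))
```

Because no code is the start of another code, the decoder never has to guess where one code ends and the next begins.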
But this data is meaningless unless we also save our code dictionary.
So, we need to add it to the front of the image data, like below:

Now, including the dictionary, the image data is 30 bytes long (refer to
blocks).
That is still a significant improvement over 48 bytes.
The 2 approaches, removing redundancies and using more compact
representations, are often combined and underlie almost all lossless
compressed file formats like GIF, PNG, PDF, and ZIP files.
GIF stands for Graphics Interchange Format.
GIF is a raster file format designed for relatively basic images that appear
mainly on the internet. Each file can support up to 8 bits per pixel and can
contain 256 indexed colors. GIF files also allow images or frames to be
combined, creating basic animations.
PNG, short for Portable Network Graphics, is a popular and high-quality
graphic file format. The PNG format is both lossless and supports
transparency, making it great for webpages. You can view PNG files in almost
any graphic program, image viewer, and web browser.
Adobe PDF files (short for Portable Document Format files) are one of the most
commonly used file types today. If you've ever downloaded a printable form or
document from the Web, such as an IRS tax form, there's a good chance it was
a PDF file. Whenever you see a file that ends with .pdf, that means it's a
PDF file.

A zip file is a file format that can contain multiple files combined and
compressed into one file. Files that are zipped have a file extension of
.zip. Since it's a type of compressed file, a zip file can be smaller in size
than the files it contains. This makes the zip file easier and faster to
download.
Both run-length encoding and dictionary coders are lossless compression
techniques.
A lossless compression technique means no information is lost. When
decompressed, the file is identical to the original.
That’s really important for many types of files.
For example, it would be very odd if I zipped up a Word document to send to
you, and when you decompressed it on your own computer, the text was different.
But there are other types of files where we can get away with small changes,
perhaps by removing unnecessary or less important information, especially
information that human perception is not good at detecting.
And this trick underlies most lossy compression techniques.
These tend to be pretty complicated, so we are going to attack this at a
conceptual level.
Let’s take sound as an example.
Your hearing is not perfect.
We can hear some frequencies of sound better than others.
And there are some we can’t hear at all, like ultrasound.
Basically, if we make a recording of music, and there’s data in the
ultrasonic frequency range, we can discard it, because we know that humans
can’t hear it.

On the other hand, humans are very sensitive to frequencies in the vocal
range, like people singing, so it’s best to preserve quality there as much as
possible.

Deep bass is somewhere in between. Humans can hear it but we are less attuned
to it.
We mostly sense it.
Lossy audio compressors take advantage of this, and encode different
frequency bands at different precisions.
Even if the result is rougher, it’s likely that users won’t perceive the
difference.
Or at least it doesn’t affect the experience.
And here comes the hate mail from audiophiles.
Audiophiles are an exceptional breed of people who are fascinated by pure
audio, motivated by sound quality and addicted to audio gadgets. Audiophiles
take their passion for music one step further. They're curious about how
songs are recorded and the science behind how sounds are reproduced.
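
The idea of encoding different frequency bands at different precisions, described above, can be illustrated with a toy example. This is a conceptual sketch only and not how MP3 or any real codec works: the band names, the sample values, and the number of bits kept per band are all made up for the illustration.

```python
def reduce_precision(samples, bits_kept, total_bits=16):
    """Keep only the top bits_kept bits of each non-negative integer sample."""
    dropped = total_bits - bits_kept
    return [(sample >> dropped) << dropped for sample in samples]


# Hypothetical 16-bit sample values for three frequency bands.
bands = {
    "ultrasonic": [511, 498],      # humans cannot hear this band
    "deep_bass": [12345, 12400],   # audible, but we are less attuned to it
    "vocal": [30001, 30007],       # humans are very sensitive here
}

# Hypothetical precision budget: discard, coarse, and full precision.
bits_per_band = {"ultrasonic": 0, "deep_bass": 8, "vocal": 16}

compressed = {
    name: reduce_precision(samples, bits_per_band[name])
    for name, samples in bands.items()
}
print(compressed)
```

The vocal band comes back unchanged, the deep bass values are rounded down to a coarser grid, and the ultrasonic band is discarded entirely. The discarded detail cannot be recovered, which is what makes the approach lossy.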

You encounter this type of audio compression all the time. It’s one of the
reasons we sound different on a cellphone versus in person.

The audio data is being compressed, allowing more people to take calls at
once.
As the signal quality or bandwidth gets worse, compression algorithms remove
more data, further reducing precision, which is why Skype calls sometimes
sound like robots talking.
Compare this with an uncompressed audio format like WAV, or a losslessly
compressed one like FLAC (there we go, got the audiophiles back).
WAV stands for Waveform Audio File Format (also written WAVE). Files in this
format use the .wav file extension.

FLAC is an acronym for Free Lossless Audio Codec. Files with the .flac file
extension contain audio files that are compressed using lossless audio
compression. The compression of a FLAC file is similar to the compression of
a ZIP file, making the file more manageable and saving file storage space.
Lossy compressed audio files, like MP3s, are often 10 times smaller than
uncompressed ones.
This idea of discarding or reducing precision in a manner that aligns with
human perception is called perceptual coding, and it relies on models of
human perception, which come from a field of study called Psychophysics.
This same idea is the basis of lossy compressed image formats, most
famously JPEG (Joint Photographic Experts Group).
Like hearing, the human visual system is imperfect (not perfect).
We are really good at detecting sharp contrasts, like the edges of objects,
but our perceptual system is not so hot with subtle color variations.
JPEG takes advantage of this by breaking images up into blocks of 8x8 pixels,
then throwing away a lot of high-frequency spatial (relating to space) data.
Example:
A dog’s picture. On the other side is a patch of 8x8 pixels.
The one on the left is 1/3 the file size of the one on the right when using
JPEG file compression.
Data Deduplication
Traditional data back-up does not provide any inherent capability to prevent
duplicate data from being backed up.
With the growth of information and 24 by 7 application availability
requirements, back-up windows are shrinking.
Traditional back-up processes back up a lot of duplicate data.
Backing up duplicate data significantly increases the back-up window size
requirements and results in unnecessary consumption of resources such
as storage space and network bandwidth.
Data deduplication helps to reduce the storage requirement for back-up,
shorten the back-up window, and remove the network burden.
It also helps to store more back-ups on the disk and retain the data for a
longer period of time.
Data deduplication is a process that eliminates excessive copies of data and
significantly decreases storage capacity requirements.
It is a technique used for eliminating duplicate copies of
repeating/redundant data.
It is one of the most important technologies used for online back-up and
recovery solutions.

There are 2 methods of deduplication:

1. File level deduplication
2. Subfile level deduplication

File level deduplication, also called single-instance storage, detects and
removes redundant copies of identical files. Basically, it stores only one
copy of the file, and subsequent copies are replaced with pointers to the
original file.

File level deduplication is simple and fast but does not address the problem
of duplicate content inside files.

Example:

Two 10 MB PowerPoint presentations that differ only in their title page are
not considered duplicate files, so each file is stored separately.
Subfile level deduplication breaks the file into smaller chunks and then
uses a specialized algorithm to detect redundant data within and across files.
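
Subfile level deduplication can be sketched with a chunk store keyed by a hash of each chunk. This is a simplified illustration under several assumptions: the chunks are fixed-size, SHA-256 is used to recognize identical chunks, and the function names and tiny sample "files" are invented for the example. Real products may choose chunk boundaries and hashes differently.

```python
import hashlib


def deduplicate(files, chunk_size):
    """Store each distinct chunk once; each file becomes a list of pointers."""
    store = {}     # chunk digest -> chunk bytes (kept only once)
    recipes = {}   # file name -> ordered list of chunk digests (the pointers)
    for name, data in files.items():
        recipe = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)   # skip chunks already stored
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(name, store, recipes):
    """Rebuild a file by following its pointers into the chunk store."""
    return b"".join(store[digest] for digest in recipes[name])


# Two hypothetical files that differ only in their "title page".
files = {
    "deck_a": b"TITLE-A!" + b"BODYBODY",
    "deck_b": b"TITLE-B!" + b"BODYBODY",
}
store, recipes = deduplicate(files, chunk_size=4)
raw_size = sum(len(data) for data in files.values())
stored_size = sum(len(chunk) for chunk in store.values())
print(raw_size, stored_size)
```

File level deduplication would store both files in full because they are not identical, while the chunk store keeps the shared body only once.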
Traffic Analysis
Traffic Analysis is the analysis of patterns in communications for the
purpose of gaining intelligence about a system or its users.
The key benefits of network traffic analysis include improved visibility into
devices connecting to your network (e.g. IoT devices, healthcare visitors)
and help in meeting compliance requirements.
Network traffic analysis is defined as a method of tracking network activity
to spot issues with security and operations, as well as other irregularities.
An Overview of Network Traffic Analysis

Analyzing your network’s traffic can be daunting (intimidating/difficult).

It involves:

1. Collecting, storing, and monitoring all the data traversing (moving back
and forth or sideways) your on-premises, hybrid, or multi-cloud
infrastructure.
2. Visualizing and searching this data for network planning and design.
3. Getting notifications when something has gone wrong, so you can
troubleshoot effectively.

Steps needed in traffic analysis:

1. Identify Your Data Sources

This is to find out what’s out there on the network. We can’t analyze and
monitor something if we don’t know it exists.

There are two parts to identifying the data sources:

a. Determine Data Source Types
b. Decide Methods of Identification

Determine Data Source Types

We need to identify and categorize the types of sources we can collect data
from. There are applications, desktops, servers, routers, switches, firewalls,
and more. Each of these can provide various metrics we can collect for
analysis.
Decide Methods of Identification

You’ll need to determine the best methods you can use to identify your data
sources.

You can use a manual or automated approach. The manual approach involves
sifting through topology maps and other documentation, which can be slow and
error-prone.

So, consider the automated method with application and network discovery.

Common auto-discovery methods include using SNMP, Windows Management
Instrumentation (WMI), flow-based protocols, and transaction tracing. Doing
this now will later help you find application and network dependencies and
maximize infrastructure visibility.

Simple Network Management Protocol (SNMP) is an application-layer protocol for
monitoring and managing network devices on a Local Area Network (LAN) or Wide
Area Network (WAN).

Windows Management Instrumentation (WMI) is a set of specifications from
Microsoft for consolidating the management of devices and applications in a
network from Windows computing systems. WMI provides users with information
about the status of local or remote computer systems.

Flow routing, or flow-based protocols, is a network routing technology that
takes variations in the flow of data into account to increase routing
efficiency.
Network Traffic
Network traffic is the amount of data moving across a computer network at any
given time.

Network traffic, also called data traffic, is broken down into data packets
and sent over a network before being reassembled by the receiving device or
computer.

Network traffic has two directional flows:

1. north-south
2. east-west

North-south Traffic

North-south traffic refers to client-to-server traffic that moves between the
data center and the rest of the network (i.e., a location outside of the data
center).

East-west Traffic

East-west traffic refers to traffic within a data center, also known as server-
to-server traffic.
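
The two directions can be told apart by checking whether each end of a connection sits inside the data center. The sketch below uses the standard ipaddress module; the function name, the data center address range 10.0.0.0/16, and the sample addresses are assumptions chosen for the example.

```python
import ipaddress


def traffic_direction(source, destination, data_center_network):
    """Classify a flow by whether its endpoints are inside the data center."""
    network = ipaddress.ip_network(data_center_network)
    source_inside = ipaddress.ip_address(source) in network
    destination_inside = ipaddress.ip_address(destination) in network
    if source_inside and destination_inside:
        return "east-west"      # server-to-server, stays in the data center
    if source_inside or destination_inside:
        return "north-south"    # crosses the data center boundary
    return "external"           # neither endpoint is in the data center


DATA_CENTER = "10.0.0.0/16"  # hypothetical data center address range
print(traffic_direction("10.0.1.5", "10.0.2.9", DATA_CENTER))
print(traffic_direction("203.0.113.7", "10.0.1.5", DATA_CENTER))
```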

Traffic affects network quality because an unusually high amount of traffic
can mean slow download speeds or spotty Voice over Internet Protocol (VoIP)
connections.

Traffic is also related to security because an unusually high amount of
traffic could be the sign of an attack.
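
One simple way to spot an "unusually high amount of traffic" is to compare new measurements against a baseline built from past measurements. The sketch below is only one possible rule of thumb: the function name, the sample numbers, and the threshold of three standard deviations are assumptions, and a flagged value is a reason to investigate, not proof of an attack.

```python
from statistics import mean, stdev


def unusual_samples(history, recent, threshold=3.0):
    """Return recent samples far above the average of past measurements."""
    baseline = mean(history)
    spread = stdev(history)               # needs at least two past samples
    limit = baseline + threshold * spread
    return [sample for sample in recent if sample > limit]


# Hypothetical traffic volumes in megabits per second.
history = [100, 110, 90, 105, 95]
recent = [102, 480, 98]
print(unusual_samples(history, recent))
```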
Data Packets

A data packet is a unit of data made into a single package that travels along
a given network path.

When data travels over a network or over the internet, it must first be broken
down into smaller batches so that larger files can be transmitted efficiently.

The network breaks down, organizes, and bundles the data into data packets so
that they can be sent reliably through the network and then opened and read by
another user in the network.

Each packet takes the best route possible to spread network traffic evenly.
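
Breaking data into packets and putting it back together can be sketched as follows. This is a simplified model: the packet layout (a sequence number plus a payload), the function names, and the 8-byte payload size are assumptions for the example. Real packets also carry addresses, checksums, and other header fields.

```python
import random


def to_packets(data, payload_size):
    """Split data into numbered packets of at most payload_size bytes."""
    return [
        {"seq": seq, "payload": data[start:start + payload_size]}
        for seq, start in enumerate(range(0, len(data), payload_size))
    ]


def reassemble(packets):
    """Put packets back in order by sequence number and join the payloads."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return b"".join(packet["payload"] for packet in ordered)


message = b"Storage management lesson 5"
packets = to_packets(message, payload_size=8)
random.shuffle(packets)   # packets may take different routes and arrive out of order
print(reassemble(packets))
```

The sequence number is what lets the receiving device rebuild the original data even when the packets arrive in a different order.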

Types of Network Traffic

To better manage bandwidth, network administrators decide how certain types of
traffic are to be treated by network devices like routers and switches.

There are two general categories of network traffic:

1. real-time
2. non-real-time

Real-time Traffic

Real-time traffic is traffic deemed important or critical to business
operations. It must be delivered on time and with the highest quality
possible.

Examples of real-time network traffic include VoIP, video conferencing, and
web browsing.

Non-real-time Traffic

Non-real-time traffic, also known as best-effort traffic, is network traffic
that network administrators consider less important than real-time traffic.
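
One way a network device could treat the two categories differently is to always send waiting real-time packets before best-effort ones. The sketch below shows that strict-priority idea; the class name, the constants, and the sample packets are assumptions for the example, and real routers and switches offer several other queuing policies.

```python
import heapq
from itertools import count

REAL_TIME, BEST_EFFORT = 0, 1   # a lower number is sent first


class PriorityScheduler:
    """Send real-time packets first; keep arrival order within each class."""

    def __init__(self):
        self._queue = []
        self._arrival = count()   # remembers the order packets arrived in

    def enqueue(self, traffic_class, packet):
        heapq.heappush(self._queue, (traffic_class, next(self._arrival), packet))

    def dequeue(self):
        return heapq.heappop(self._queue)[2]


scheduler = PriorityScheduler()
scheduler.enqueue(BEST_EFFORT, "file chunk")
scheduler.enqueue(REAL_TIME, "voip frame 1")
scheduler.enqueue(BEST_EFFORT, "email")
scheduler.enqueue(REAL_TIME, "voip frame 2")
sent = [scheduler.dequeue() for _ in range(4)]
print(sent)
```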

Process Automation
Process automation uses technology to automate complex business processes. It
typically has three functions: automating processes, centralizing information,
and reducing the requirement for input from people.
Storage Provisioning
Storage provisioning is a management technique that assigns storage capacity
to servers, computers, virtual machines and other devices. It may use
automation to allocate storage space in a networked environment.
Memory Management
Memory management is the process of controlling and coordinating a computer's
main memory.
It ensures that blocks of memory space are properly managed and allocated so
the operating system (OS), applications and other running processes have the
memory they need to carry out their operations.
The three major activities of the operating system with regard to memory
management are:
1. Keeping track of which parts of memory are currently being used and by
whom.
2. Deciding which processes are to be loaded into memory when memory space
becomes available.
3. Allocating and deallocating memory space as needed.
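
The three activities above can be illustrated with a toy memory manager. This is a teaching sketch, not how any real operating system is implemented: the class name, the fixed-size blocks, and the process names are assumptions for the example.

```python
class BlockMemoryManager:
    """Toy manager for a memory made of equal-sized blocks."""

    def __init__(self, total_blocks):
        # Activity 1: track which blocks are in use and by whom.
        # None means the block is free.
        self.owners = [None] * total_blocks

    def free_blocks(self):
        return self.owners.count(None)

    def allocate(self, process, blocks_needed):
        # Activity 2: a process is loaded only if enough memory is available.
        free = [i for i, owner in enumerate(self.owners) if owner is None]
        if len(free) < blocks_needed:
            return False
        # Activity 3 (allocating): hand the free blocks to the process.
        for index in free[:blocks_needed]:
            self.owners[index] = process
        return True

    def deallocate(self, process):
        # Activity 3 (deallocating): release every block the process held.
        self.owners = [None if owner == process else owner for owner in self.owners]


memory = BlockMemoryManager(total_blocks=8)
editor_loaded = memory.allocate("editor", 5)
browser_loaded = memory.allocate("browser", 4)        # not enough free blocks yet
memory.deallocate("editor")
browser_loaded_later = memory.allocate("browser", 4)  # space is available now
print(editor_loaded, browser_loaded, browser_loaded_later, memory.free_blocks())
```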
