Unit Ii-1
UNIT II
Introduction to Sustainable Development:
Sustainable development can be defined as an approach to the economic
development of a country that does not compromise the quality of the environment for
future generations. In the name of economic development, the price of environmental
damage is paid in the form of land degradation, soil erosion, air and water pollution,
deforestation, etc. This damage may outweigh the advantages of having a higher-quality
output of goods and services.
(1) Poverty: This is the root of many health and social problems and of psychological
and moral crises.
Local communities, together with national and international development policies and
economic reform plans, aim to eliminate these problems by creating employment and by
promoting the natural, human, economic and educational development of the poorest and
most backward areas.
(2) Debt: A debt crisis occurs when a country is unable to pay its bills.
It does not occur overnight, because there are many warning signs.
It becomes a crisis when the leaders of the country ignore these signs and
indicators for political reasons.
The underlying problem is that many countries are unable to generate enough
public revenue.
When the risk of a debt crisis becomes high, a quick response that reduces immediate
financial stress could make all the difference between fast recovery and long-lasting
loss.
(4) War: Armed conflicts and foreign occupation adversely affect the
environment and its integrity. Addressing them requires implementing the United Nations
resolutions calling for the end of foreign occupation; enacting legislation
and obligations that prohibit and criminalize the pollution, deforestation or destruction
of the environment; respecting dignity in the treatment of prisoners in accordance
with international law; and preventing the destruction of houses, civilian installations,
and water sources.
(6) Environmental degradation: The deterioration and continued depletion of the
natural resource base to support current production and consumption patterns
impedes the achievement of sustainable development in developing countries.
• The triple bottom line is a transformation framework for businesses and other
organizations to help them move toward a regenerative and more sustainable
future.
• Tools within the triple bottom line help to measure, benchmark, set goals,
improve, and eventually evolve toward more sustainable systems and models.
• The triple bottom line illustrates that if an organization is only focused on profit—
ignoring people and the planet—it cannot account for the full cost of doing
business and thus will not succeed long term.
“The triple bottom line wasn’t designed to be just an accounting tool. It was
supposed to provoke deeper thinking about capitalism and its future.”
—John Elkington in his Harvard Business Review article
While there are three categories that make up triple bottom line theory, it is important
to remember each category is not siloed. Through a systems theory lens, people,
planet, and prosperity are all interconnected.
People
The people category considers all stakeholders (versus solely shareholders) including
employees, communities within which an organization operates, individuals
throughout the supply chain, future generations, and customers. The connections
with corporate social responsibility (CSR) are central to this portion of the triple bottom
line. CSR is defined as a responsibility among organizations to meet the needs of their
stakeholders and a responsibility among stakeholders to hold organizations
accountable for their actions.
A few initiatives that an organization may consider as part of its CSR goals include:
advancing human rights; ending poverty and hunger; diversity, equity and inclusion;
gender equity; ensuring a healthy and safe work environment; and community
engagement and volunteerism. Not only are CSR initiatives beneficial for
stakeholders, but adopting this business strategy is also essential for business.
As part of a commitment to advance CSR initiatives, we also see businesses sharing
best practices with other businesses and organizations.
Planet
Public opinion, consumer purchasing power, the speed and transparency of
information sharing via social media, and even industry-led activism have made it easier
for stakeholders to hold organizations accountable for their actions. This is seen in
rewarding the positive impacts and reprimanding the negative.
Stakeholders are increasingly aware of not only the consequences businesses have
on the environment, community, and the economy but also of the importance of global
issues, such as climate change and social justice.
Over the past couple of decades, we’ve witnessed an increase in businesses adopting
practices that help minimize environmental impact. Also, more recently, leading
organizations like AT&T, DELL, EASTON, Hewlett Packard, Kohler Co., Levi Strauss
& Co., and Target have taken a step further down the sustainability path by creating
a net-positive or regenerative impact on the environment and society.
“To protect the planet, we must show others that impossible can be business
as usual.”
—Lisa Jackson, Vice President, Environment, Policy and Social Initiatives at Apple
Prosperity
Triple bottom line theory is systemic in nature through its view of people, planet, and
prosperity. With this connectivity in mind, the United Nations (U.N.)
created Sustainable Development Goals (SDGs) that “ensure all human beings can
enjoy prosperous and fulfilling lives and that economic, social, and technological
progress occurs in harmony with nature.”
Many of the U.N. SDGs aim to improve a wide range of areas related to environment,
people, and economic opportunities. One of the many prosperity-focused goals aims
to provide decent work (safe working conditions, living wages, compassionate
leadership) and economic growth for those in specific communities.
Examples from the U.N.’s SDGs of how businesses can help support the prosperity of
their stakeholders include:
• By 2025, take immediate and effective measures to eradicate forced labor and end
modern slavery and human trafficking. Additionally, prohibit and eliminate all
forms of child labor, including recruitment and use of child soldiers.
• By 2030, devise and implement policies to promote sustainable tourism that
creates jobs and promotes local culture and products.
The future of the world has been redesigned. The United Nations (UN), and by
extension the entire population of the planet, faces an exciting challenge that seeks
nothing more and nothing less than ensuring sustainable development.
It is the year 2000. The UN draws up the Millennium Goals, eight aims to be fulfilled in
fifteen years:
• End poverty
• End hunger
• Conserve and sustainably use the oceans, seas and marine resources
• Promote peaceful and inclusive societies and provide access to justice for all
The term that the UN establishes for these goals is a fifteen-year period. By 2030 we’ll
have a new date with the planet. But will we be victorious? In recent years
the following achievements have been made, and they are no doubt a great starting
point:
Big data analytics is the use of advanced computing technologies on huge data sets
to discover valuable correlations, patterns, trends, and preferences so that companies
can make better decisions. In Industry 4.0, big data analytics plays a role in several
areas, including smart factories, where sensor data from production machinery is
analyzed to predict when maintenance and repair operations will be needed. By
applying it, manufacturers improve production efficiency, understand their
real-time data through self-service systems, optimize predictive maintenance, and
automate production management.
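The predictive-maintenance idea above can be illustrated with a minimal sketch. It assumes a hypothetical stream of vibration readings from one machine and flags the points where the rolling mean of recent readings exceeds a threshold. The function name, window size, threshold, and sample values are all illustrative assumptions, not part of any specific Industry 4.0 platform.

```python
from collections import deque


def maintenance_alerts(readings, window=3, threshold=5.0):
    """Return the indices at which the rolling mean of the last
    `window` readings exceeds `threshold` (a hypothetical alert rule)."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    alerts = []
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(index)
    return alerts


# Hypothetical vibration readings that drift upward as a part wears out.
vibration = [4.0, 4.2, 4.1, 4.8, 5.6, 6.3, 6.9]
alerts = maintenance_alerts(vibration)
print(alerts)
```

Real systems typically replace the fixed threshold with a model trained on historical failure data; the rolling mean here only stands in for that step.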
Collectively, the volume of data being generated has come to be termed big data, and
analytics on it, covering a wide range of techniques from basic data mining to advanced
machine learning, is known as big data analytics. There is no exact definition, because
how large a dataset must be to qualify as a big data analytics use case is relative.
Rather, in a generic sense, performing analysis on large-scale datasets, on the order
of tens or hundreds of gigabytes up to petabytes, can be termed big data analytics. This
can range from something as simple as finding the number of rows in a large dataset to
applying a machine learning algorithm to it.
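As a rough illustration of the simplest case mentioned above, the sketch below counts the rows of a CSV dataset by streaming it one row at a time, so the whole file never has to fit in memory. The column names and the in-memory sample are assumptions made for the example; with a real dataset the same function would be given an open file handle.

```python
import csv
import io


def count_rows(lines):
    """Count data rows (excluding the header) by streaming row by row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header line, if any
    return sum(1 for _ in reader)


# Small in-memory stand-in for a large file (hypothetical columns).
sample = io.StringIO("machine_id,temperature\nm1,71.5\nm2,69.8\nm3,75.1\n")
row_count = count_rows(sample)
print(row_count)
```

At true big data scale the same count would be distributed across many machines, but the principle of processing the data in a stream rather than loading it all at once is the same.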
At a fundamental level, big data systems can be considered to have four major layers,
each of which is indispensable. Textbooks and other literature outline many such
layers, so the terminology can be ambiguous. Nevertheless, at
a high level, the layers defined here are both intuitive and simple:
Hardware: Servers that provide the computing backbone, storage devices that store
the data, and network connectivity across different server components are some of
the elements that define the hardware stack. In essence, the systems that provide the
computational and storage capabilities and systems that support the interoperability
of these devices form the foundational layer of the building blocks.
Software: Software resources that facilitate analytics on the datasets hosted in the
hardware layer, such as Hadoop and NoSQL systems, represent the next level in the
big data stack. Analytics software can be classified into various subdivisions. Two of
the primary high-level classifications for analytics software are:
Data mining: Software that provides facilities for aggregations, joins across datasets,
and pivot tables on large datasets falls into this category. Standard NoSQL platforms
such as Cassandra, Redis, and others are high-level data mining tools for big data
analytics.
Statistical analytics: Platforms such as Google TensorFlow or R that provide analytics
capabilities beyond simple data mining, running algorithms that range from simple
regressions to advanced neural networks, fall into this category.
End user: The end user of the analytics software forms the final aspect of a big data
analytics engagement. A data platform, after all, is only as good as the extent to which
it can be leveraged efficiently and addresses business-specific use cases. This is
where the role of the practitioner who makes use of the analytics platform to derive
value comes into play. The term data scientist is often used to denote individuals who
implement the underlying big data analytics capabilities while business users reap the
benefits of faster access and analytics capabilities not available in traditional systems.
Manufacturers use big data analytics in the same way as most other commercial
entities except with a narrower focus. They collect huge amounts of data from smart
sensors through cloud computing and IIoT platforms that allow them to uncover
patterns that help them improve the efficiency of supply chain management.
Big data analytics can help them discover hidden variables causing bottlenecks in
production that they didn’t even know existed. After identifying the source of the
problem, manufacturers use targeted data analytics to better understand the
underlying cause of bottleneck variables. This helps manufacturers improve output
while reducing cost and eliminating waste.
Automate Production Management with Big Data Analytics
There’s no doubt the fourth industrial revolution is the most disruptive to date. The way
that humans run companies, offer services in all fields and live their daily lives has
been altered in some way, often quite dramatically. Data, data management, and data
analysis form the background of all of the transformation and innovation we are living
through. Now is the time to hire data talent and start to understand what data
management and analysis look like for your organisation.
While businesses likely have some concept of the kind of information that must be
reviewed, the particulars may be less obvious. The data may provide clues to
previously unseen patterns, and the need for further research becomes apparent if a
pattern is found. Begin by creating a handful of simple use cases. In doing so, you will
collect and acquire data that was not previously accessible, which will help you
discover these unknown unknowns. A data scientist's ability to identify crucial data and
develop insightful predictive and statistical models improves when a data repository is
established and more data is gathered.
There's also a chance that the company is aware of the information gaps inside it.
Identifying the external or third-party data sources and implementing a few use cases
that depend on this external data are the first steps in addressing these known
unknowns. The business should engage with a data scientist to do so.
Data Ingestion: The first step in deploying big data solutions is to collect data from a
variety of sources, such as an ERP system like SAP, a customer relationship
management system like Salesforce or Siebel, a relational database management
system (RDBMS) like MySQL or Oracle, or log files, flat files, documents, images,
and social media feeds. The ingested data is then stored in HDFS. Data may be taken in
either through batch jobs (run once per day, once per hour, or once every fifteen
minutes) or through real-time streaming (with latencies of 100 ms to 120 seconds).
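The batch side of ingestion can be sketched as follows: timestamped events are grouped into fixed windows, and each window would then be loaded as one batch job. The event tuples and the helper name are hypothetical, and timestamps are assumed to be whole seconds counted from an arbitrary start.

```python
from collections import defaultdict


def batch_by_interval(events, interval_seconds):
    """Group (timestamp, payload) events into fixed-length batch windows,
    keyed by the start time of each window."""
    batches = defaultdict(list)
    for timestamp, payload in events:
        window_start = (timestamp // interval_seconds) * interval_seconds
        batches[window_start].append(payload)
    return dict(batches)


# Hypothetical events: (seconds since start, record).
events = [(5, "a"), (250, "b"), (910, "c"), (1790, "d"), (1805, "e")]
fifteen_minute_batches = batch_by_interval(events, 15 * 60)
print(fifteen_minute_batches)
```

A streaming pipeline would instead hand each event to the processing layer as it arrives, trading the simplicity of batches for lower latency.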
Data Processing: Finally, the data is run through a processing framework
(MapReduce, Spark, Hive, etc.).
1. Apache Hadoop
Overview
Benefits
Data replication enables consistent access to sensitive data even when spread across
numerous servers and storage devices. To facilitate low-latency data retrieval, a
cluster-wide load balancer distributes data uniformly across drives.
Hadoop distributes the files across the nodes in the cluster and then ships the
packaged code to those nodes, so that each node processes its local data in parallel.
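The map-then-reduce pattern behind this parallel local processing can be sketched in plain Python. This is not Hadoop's API: the three strings stand in for file blocks stored on three different nodes, and the list comprehension stands in for work that a cluster would run in parallel, one task per node.

```python
from collections import Counter
from functools import reduce


def map_block(block):
    """Map step: a node counts the words in its own local block."""
    return Counter(block.split())


def reduce_counts(left, right):
    """Reduce step: merge the partial counts produced by two nodes."""
    return left + right


# Hypothetical file blocks, one per node.
blocks = ["big data big", "data lake", "big lake"]
partials = [map_block(block) for block in blocks]  # parallel on a cluster
totals = reduce(reduce_counts, partials, Counter())
print(totals)
```

Because each map task only reads its local block, the code moves to the data rather than the data moving to the code, which is the design choice the paragraph above describes.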
Business owners benefit from its high levels of scalability and availability;
application-level failures are detected and corrected. It's easy to add new nodes
to the YARN resource manager so they can run tasks, and it's just as easy to remove
them so you can scale down the cluster.
With centralized cache management, users may direct the system to cache data blocks of
their choosing on several nodes. With explicit pinning, users may keep only a certain
number of block read replicas in the buffer cache, freeing
up valuable memory space for other purposes.
Hadoop protects data without copying the actual data blocks: it relies
on point-in-time snapshots of the file system that record only the block list and file
size. So that up-to-date information may be quickly retrieved, it logs file system
changes in reverse chronological order.
Features
Compression codecs, native IO utilities for uses like centralized cache management,
and checksum implementations are just a few examples of the native components
included in the Hadoop Library.
HDFS NFS Gateway: When HDFS is mounted on a client's file system, the user can
browse HDFS files locally and download and upload them.
Since HDFS allows writes to off-heap memory, data in memory may be flushed to
disk without interfering with the IO pipeline, improving speed. These Lazy Persist
Writes are data offloads that help reduce the time it takes for queries to return results.
Extra information about inodes may be stored in extended attributes, which user
programs can use to associate metadata with a file or directory.
Limitations
There is no support for streaming data; only batch processing is allowed. Because of
this, it generally runs more slowly.
It is inefficient at iterative processing since it does not allow cyclic data flow.
Encryption is not enforced at either the storage or the network layer. Kerberos
authentication is used for security, which is difficult to maintain.
2. Apache Spark
Overview
Benefits
Spark SQL: Spark SQL provides data querying through SQL or a DataFrame API, with
support for many data sources, including Hive, Parquet, JSON, JDBC, and more. It
provides access to preexisting Hive warehouses and connections to business
intelligence tools by supporting the HiveQL syntax, Hive SerDes, and UDFs.
Streaming analytics: Spark reads data from HDFS, Flume, Kafka, Twitter, ZeroMQ, and
custom data sources, allowing for effective batch and stream processing, joining
streams against historical data, and performing ad hoc queries on data as it arrives in
real time.
Features
Design: Spark's ecosystem includes not just RDDs but also Spark SQL, Scala, MLlib,
and the core Spark software. It uses a master-slave architecture, where a driver
application (which may be hosted on either the master or client node) controls a group
of executors (hosted on the worker nodes) to complete tasks in parallel.
Spark's main processing engine, called the "Spark Core," facilitates cluster-wide
memory management, fault recovery, scheduling, distribution, and monitoring of
activities.
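The driver and executor roles described above can be imitated with Python's standard thread pool. This is only an analogy, not Spark's API: the `driver` function splits the data into partitions, hands each partition to a worker, and combines the partial results. All names and the partitioning scheme are assumptions made for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def executor_task(partition):
    """Work done by one executor: sum of squares over its partition."""
    return sum(x * x for x in partition)


def driver(data, num_partitions):
    """Split the data, run one task per partition, combine the results."""
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(executor_task, partitions))
    return sum(partial_sums)


total = driver(list(range(1, 11)), 3)
print(total)
```

In Spark the partitions would live on different worker nodes and be kept in memory between steps, which is what makes iterative workloads fast and also what drives its RAM usage.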
Limitations
Since security is disabled by default, deployments may be open to attack if not set up
correctly.
Its major versions do not appear to be compatible with one another.
Having an in-memory processing engine means it uses a lot of RAM.
Overview
Hortonworks was spun out of Yahoo in 2011 to ease the transition to Hadoop for large
businesses. In 2019 Hortonworks merged into Cloudera. Hortonworks Data Platform
(HDP) is a Hadoop distribution that is both open source and free. It also provides
competitive in-house expertise, making it an appealing option for businesses wishing
to adopt Hadoop. HDFS, MapReduce, Pig, Hive, and Zookeeper are just a few of the
Hadoop projects included. Ambari for administration, Stinger for query processing, and
Apache Solr for data searches are all open source in HDP, which is noted for its
uncompromising adherence to open source and comes with zero proprietary software.
HCatalog is a part of HDP that facilitates communication between Hadoop and other
business programs. HDP became a go-to enterprise big data solution.
Benefits
Deploy Anywhere: This solution may be deployed on-premises, in the cloud (as a
component of Microsoft Azure HDInsight), or as a hybrid solution known as
Cloudbreak. Cloudbreak offers elastic scalability for resource efficiency and is
designed specifically for businesses that already have on-premises data centers and
IT infrastructure in place.
Scalability and High Availability: With the help of NameNode federation, a company's
infrastructure may be expanded to accommodate thousands of nodes and billions of
files. NameNodes are responsible for managing the file path and the information
associated with mapping, and federation guarantees that they are independent of one
another. This results in increased availability at a reduced total cost of ownership. In
addition, erasure coding significantly improves the efficiency of data storage, enabling
more effective data replication.
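The storage saving from erasure coding can be checked with simple arithmetic. The sketch below compares the extra storage needed by plain replication with that of an erasure-coded layout. The specific numbers, three replicas versus six data blocks plus three parity blocks, are illustrative assumptions and are not stated in the text above.

```python
def replication_overhead(replicas):
    """Extra storage, in percent, when every block is stored `replicas` times."""
    return (replicas - 1) * 100.0


def erasure_coding_overhead(data_blocks, parity_blocks):
    """Extra storage, in percent, when parity blocks protect a group of data blocks."""
    return parity_blocks / data_blocks * 100.0


# Assumed settings for illustration only.
replication_pct = replication_overhead(3)      # two extra copies of everything
erasure_pct = erasure_coding_overhead(6, 3)    # three parity blocks per six data blocks
print(replication_pct, erasure_pct)
```

Under these assumptions, replication costs 200% extra storage while the erasure-coded layout costs 50%, at the price of more computation when lost blocks must be reconstructed.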
Security and Governance: Apache Ranger and Apache Atlas both provide data
lineage tracing from its point of origin to the data lake. This enables the creation of
rigorous audit trails to govern confidential or classified information.
Reduced Time to Market: It gives organizations the ability to roll out apps in a matter
of minutes, reducing the time it takes to bring products to market. The use of graphics
processing units (GPUs) enables the incorporation of machine learning and deep
learning into applications. Its hybrid data architecture provides cloud
storage for unlimited data that is kept in its original format. This cloud storage can be
found in ADLS, WASB, S3, and GCP.
Features
Centralized Architecture: Hadoop operators may expand their big data assets as
needed, thanks to Apache YARN on the backend. For operations, security, and
governance, YARN effortlessly provides resources and services to applications
dispersed across clusters. It helps firms to examine data derived from a wide range of
sources and formats.
Third-party apps deploy quicker to Apache Hadoop thanks to built-in YARN support
for Docker containers. Users may test different versions of the same application
without affecting the current one. When you combine this with the natural advantages
of containers - resource efficiency and increased task throughput - you have a
competitive solution.
Data Access: With YARN, various data access techniques may coexist in the same
cluster against common data sets. HDP takes advantage of this capacity to enable
users to engage with several data sets at the same time in several ways. As a result,
business users may manage and analyze data inside the same cluster using
interactive SQL, real-time streaming, and batch processing, therefore eliminating data
silos.
Limitations
Overview
Benefits
Resource Management: Through its Resource Manager, users may allow concurrent
workloads to run at an efficient pace. It reduces CPU and memory utilization, as well as
disk I/O processing time, and compresses data by up to 90% without sacrificing
information. Its SQL engine supports massive parallel processing (MPP) and offers
active redundancy, automated replication, failover, and recovery.
Integrations: It assists in the analysis of data from Apache Hadoop, Hive, Kafka, and
other data lake systems using built-in connectors and standard client libraries like
JDBC and ODBC. It connects with BI products like Cognos, MicroStrategy, and
Tableau, as well as ETL systems such as Informatica, Talend, and Pentaho.
Features
In terms of data preparation, flex tables allow users to import and examine both
structured and semi-structured data sets.
About Hadoop: The robust querying and analytics of Vertica for SQL are made
possible by its direct installation on Apache Hadoop. It can read Parquet and ORC
files, both of which are native to Hadoop, and write data back as Parquet as well.
Using flattened tables, analysts may quickly compose queries and execute
sophisticated JOIN operations. These are independent of the original databases, so
modifying one will not affect the other. Because of this, complicated database
structures can support large-scale data processing at a faster pace.
Limitations
Overview
VMware owns the Pivotal Big Data Suite, a comprehensive data warehousing and
analytics system. Its Hadoop distribution, Pivotal HD, is equipped with tools including
YARN, SQLFire, and GemFire XD, a NoSQL database that runs in memory and
provides real-time analytics on top of HDFS. It has complete support for SQL,
MapReduce parallel processing, and data collections in the hundreds of gigabytes
range, and it is accessible through a RESTful API.
Pivotal Greenplum can be deployed seamlessly on cloud providers including Amazon Web
Services (AWS), Microsoft Azure, and Google Cloud Platform, as well as on VMware
vSphere and OpenStack. It provides stateful data persistence for Cloud
Foundry apps in addition to automated, repeatable deployments using Kubernetes.
Benefits
Greenplum's MPP architecture, analytical interfaces, and security features are all
consistent with those of the open-source PostgreSQL community.
Pivotal GemFire's High Availability features include automated failover to other nodes
in the cluster should an operation fail. If nodes in a grid cluster are removed or added,
the grid will automatically rebalance and rearrange itself. With WAN replication,
multiple sites can serve as disaster recovery (DR) sites at once.
GemFire's horizontal architecture and in-memory data processing are tailor-made for
the needs of low-latency applications, allowing for faster data processing. The
response time to queries is decreased by sending them to the nodes that have the
appropriate data, and the results are presented in a data table format for convenience.
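The idea of sending a query to the node that holds the relevant data can be sketched with simple hash partitioning. This is not GemFire's API or its actual partitioning scheme: the node names, the `Grid` class, and the use of a CRC32 hash are assumptions chosen to keep the example self-contained.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def owner(key):
    """Pick the node responsible for a key using a stable hash."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


class Grid:
    """Toy in-memory data grid: each node keeps its own dictionary."""

    def __init__(self):
        self.stores = {node: {} for node in NODES}

    def put(self, key, value):
        self.stores[owner(key)][key] = value

    def get(self, key):
        # The lookup goes straight to the owning node; no other node is asked.
        return self.stores[owner(key)].get(key)


grid = Grid()
grid.put("order-42", {"total": 99})
print(owner("order-42"), grid.get("order-42"))
```

Because the owner of a key can be computed from the key alone, a query touches one node instead of the whole cluster, which is where the reduction in response time comes from.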
Features
Its design, which consists of separate nodes, data replication, and permanent write-
optimized disk storage, allows for fast processing times.
With its rapid query optimizer, it can process petabyte-sized data sets in parallel with
more efficiency. This is made possible by the system's ability to choose the most
appropriate query execution model.
Returns are typically 1.5 times more expensive than standard shipping expenses.
Businesses utilize big data and analytics to reduce product return costs by assessing
the likelihood of product returns. As a result, businesses may take appropriate actions
to reduce product-return losses.
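A minimal sketch of how a return-likelihood estimate might feed a business decision: each order's predicted return probability is multiplied by the cost of a return, taken here as 1.5 times the shipping cost as one reading of the figure above. The probabilities, the shipping cost, the order identifiers, and the review threshold are invented for the illustration.

```python
RETURN_COST_MULTIPLIER = 1.5  # return cost relative to standard shipping (assumed reading)


def expected_return_cost(return_probability, shipping_cost):
    """Expected cost of a return for one order."""
    return return_probability * RETURN_COST_MULTIPLIER * shipping_cost


# Hypothetical orders: (order id, predicted return probability, shipping cost).
orders = [("A", 0.05, 8.0), ("B", 0.40, 8.0)]
REVIEW_THRESHOLD = 2.0  # hypothetical cutoff for taking preventive action

flagged = [
    order_id
    for order_id, probability, shipping in orders
    if expected_return_cost(probability, shipping) > REVIEW_THRESHOLD
]
print(flagged)
```

In practice the probability would come from a model trained on past orders; the arithmetic that turns it into an expected loss stays the same.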
Big data solutions may increase operational efficiency by allowing you to acquire vast
volumes of important customer data via your interactions with consumers and their
valuable comments. Analytics may then extract relevant patterns from the data to build
tailored goods. Technology may automate mundane procedures and activities, freeing
up valuable time for people to undertake tasks that require cognitive abilities.
The insights gained via big data analytics are essential for innovation. Big data enables
you to improve current goods and services while developing new ones. The enormous
amount of data gathered assists organizations in determining what best suits their
client base. Product development may benefit from knowing what others think of your
products/services.
The insights may also be used to change corporate strategy, improve marketing
tactics, and increase customer service and staff efficiency.
In today's competitive market, firms must establish protocols that allow them to track
customer feedback, product success, and competition. Big data analytics enables
real-time market monitoring and puts you ahead of the competition. Big data predictive
analytics solutions are key to boosting businesses.
Things to Consider Before Big Data Implementation
While big data is quickly taking center stage in marketing, human resources, finance,
and technology departments throughout the world, it is vital to realize that this exciting
endeavor comes with its own set of challenges in terms of big data privacy and
compliance.
Businesses acquire data from several sources, including laptop and desktop
computers, as well as smart devices such as mobile phones and tablets, all of which
contribute to the growing IoT network.
In today's corporate environment, where hackers abound and never tire of discovering
new methods to access networks and steal data, this wealth of valuable information
is a major liability for firms. As a result, as your big data collection expands, so will
your worries about big data security.
As you begin your own big data project, it is critical to pose the following fundamental
question:
Even if your computer system has the storage to hold all of the big data you want to
collect, can it process that data for analytics and visualization? Many
firms rely on out-of-date technologies when it comes to dynamically transforming data
into the valuable tool they want. To make the greatest use of your big data,
your firm must invest in the correct big data solution architecture.
3. Employee Education
Big data is one of the new kids on the block in the world of information technology, so
locating and onboarding skilled people may be difficult at first. Furthermore, such
talent is likely to be expensive.
Many firms that are just getting started with big data hire consultants to provide the
essential knowledge. Finding in-house data scientists may be time-consuming since
this crucial person must have exceptional mathematics and computing abilities, as well
as an amazing ability to see patterns and trends in data.
4. Appropriate Budgeting
Considering the previously stated factors for security, manpower, and system
integration, the expenditures associated with tackling big data might soon exceed your
original budget.
Although the costs of gathering and storing data are relatively low these
days due to cloud storage and hosting, the cost of analyzing and displaying big data
is rather high. Ultimately, businesses must consider the long-term prospective outcomes
to assess whether the initial investment in the finest data infrastructure and
technologies is worthwhile.
Once you've created a safe and cost-effective environment for your big data, recruited
the ideal data scientist, and examined the data, you'll need to know what to do with it
to make it all worthwhile. Businesses spend millions of dollars gathering and analyzing
data; therefore it is critical that the findings be used in practical and lucrative ways.
One important method used by firms is to ask meaningful questions about a piece of
data.
If you already have a project-focused team, build on it. If not, find specialists.
Sponsorship may also be needed, because big data initiatives are costly and
time-consuming. Calculate your costs and determine whether you require sponsorship.
You can also choose open-source options if you do not wish to invest in enterprise
solutions.
2. Obtain data
You'll need to identify all data sources to gather relevant data sets. Identify, prioritize,
and assess them before going ahead.
Store the data in a data lake. Data lakes hold both structured and unstructured data
and, unlike data warehouses, store it in a flat layout. Data lakes may be built and
deployed using cloud or on-premises technology. The lake will act as a staging layer
for your system.
Perform transformations and analytics to create data hubs. This information allows
you to alter your processes and learn how to utilize the data. Let things progress
incrementally to avoid project failure.
4. Validation