Introduction to Computational
Data Analytics
Computational Data Analytics
• The computational data analytics is a field which
allows students to build on the interdisciplinary
core curriculum to provide depth and
specialization in data science, including ML, deep
learning, natural language, AI, visualization,
databases, high-performance computing, etc.
• Some examples of computational thinking
include developing a chess strategy, making and
reading maps, and organizing a long to-do list
into manageable daily tasks
Computational Data Analytics
• Steps of Computational Thinking:
• Abstraction: Problem formulation;
• Automation: Solution expression;
• Analysis: Solution execution and evaluation.
• Principals of Computational Thinking:
• This broad problem-solving technique includes four elements:
• decomposition, pattern recognition, abstraction and algorithms.
• There are a variety of ways that students can practice and hone
their computational thinking, well before they try computer
programming.
Computational Data Analytics
• computational skills are defined as the abilities to
calculate basic addition, subtraction,
multiplication, and division problems quickly and
accurately using mental methods,
paper-and-pencil, and other tools, such as a
calculator.
• The biggest benefit of computational thinking is
how it enables real-world problem solving. For
kids, knowing how to take large problems and
break them into simpler steps can help with
everything from solving math problems to writing
a book report.
Computational Data Analytics
• Types of Computation:
• Models of computation can be classified into
three categories: sequential models, functional
models, and concurrent models.
• Purpose of Computational: Computational models
intelligently gather, filter, analyze and present
health information to provide guidance to doctors
for disease treatment based on detailed
characteristics of each patient.
Computational Data Analytics
• Computational Analytics enables scientific discovery
through algorithms that identify patterns and
anomalies in data, test hypotheses, create models,
and quantify associated uncertainties.
• Computational Data Science combines aspects of
statistics, computer science, mathematics and
machine learning to identify trends, make
predictions, and solve problems.
• Computational data science uses algorithms and data
structures to store, manipulate, visualize and learn
from large data sets.
Computational Data Analytics
• Data analytics (DA) is the process of
examining data sets to find trends and draw
conclusions about the information they
contain. Increasingly, data analytics is done
with the aid of specialized systems and
software.
Introduction -R Programming
Introduction -R Programming
• R is an open-source programming language
that is widely used as a statistical software and
data analysis tool.
• R generally comes with the Command-line
interface.
• R is available across widely used platforms like
Windows, Linux, and macOS.
• Also, the R programming language is the latest
cutting-edge tool.
Why R Programming Language?
Why R Programming Language?
• R programming is used as a leading tool for machine learning,
statistics, and data analysis. Objects, functions, and packages can
easily be created by R.
• It’s a platform-independent language. This means it can be applied
to all operating system.
• It’s an open-source free language. That means anyone can install it
in any organization without purchasing a license.
• R programming language is not only a statistic package but also
allows us to integrate with other languages (C, C++). Thus, you can
easily interact with many data sources and statistical packages.
• The R programming language has a vast community of users and
it’s growing day by day.
• R is currently one of the most requested programming languages
in the Data Science job market that makes it the hottest trend
nowadays.
Features of R Programming Language
Statistical Features of R:
• Basic Statistics: The most common basic statistics
terms are the mean, mode, and median. These
are all known as “Measures of Central Tendency.”
So using the R language we can measure central
tendency very easily.
• Static graphics: R is rich with facilities for creating
and developing interesting static graphics. R
contains functionality for many plot types
including graphic maps, mosaic plots, biplots, and
the list goes on.
Statistical Features of R:
• Probability distributions: Probability
distributions play a vital role in statistics and
by using R we can easily handle various types
of probability distribution such as Binomial
Distribution, Normal Distribution, Chi-squared
Distribution and many more.
• Data analysis: It provides a large, coherent
and integrated collection of tools for data
analysis.
Programming Features of R:
• R Packages: One of the major features of R is it has a
wide availability of libraries. R has
CRAN(Comprehensive R Archive Network), which is a
repository holding more than 10, 0000 packages.
• Distributed Computing: Distributed computing is a
model in which components of a software system are
shared among multiple computers to improve
efficiency and performance. Two new packages ddR
and multidplyr used for distributed programming in R
were released in November 2015.
Programming in R:
• Since R is much similar to other widely used languages
syntactically, it is easier to code and learn in R. Programs can be
written in R in any of the widely used IDE like R Studio, Rattle,
Tinn-R, etc.
• After writing the program save the file with the extension .r. To
run the program use the following command on the command
line:
• R file_name.r
• Example:
– R
– # R program to print Welcome to GFG!
– # Below line will print "Welcome to GFG!"
– cat("Welcome to GFG!")
– Output:
– Welcome to GFG!
Advantages of R:
• R is the most comprehensive statistical analysis
package. As new technology and concepts often
appear first in R.
• As R programming language is an open source.
Thus, you can run R anywhere and at any time.
• R programming language is suitable for GNU/Linux
and Windows operating system.
• R programming is cross-platform which runs on
any operating system
• In R, everyone is welcome to provide new
packages, bug fixes, and code enhancements
Disadvantages of R:
• In the R programming language, the standard of
some packages is less than perfect.
• Although, R commands give little pressure to
memory management. So R programming
language may consume all available memory.
• In R basically, nobody to complain if something
doesn’t work.
• R programming language is much slower than
other programming languages such as Python and
MATLAB.
Applications of R:
• We use R for Data Science. It gives us a broad variety
of libraries related to statistics. It also provides the
environment for statistical computing and design.
• R is used by many quantitative analysts as its
programming tool. Thus, it helps in data importing and
cleaning.
• R is the most prevalent language. So many data
analysts and research programmers use it. Hence, it is
used as a fundamental tool for finance.
• Tech giants like Google, Facebook, bing, Twitter,
Accenture, Wipro and many more using R nowadays.
R and Data Science
• R and Python both play a major role in data science. It
becomes confusing for any newbie to choose the better or
the most suitable one among the two, R and Python
• Data science deals with identifying, representing and
extracting meaningful information from data sources to be
used to perform some business logics.
• The data scientist uses machine learning, statistics,
probability, linear and logistic regression and more in order
to make out some meaningful data.
• Finding patterns and similar combinations and cracking the
best possible path way according to the business logic is the
biggest job of analysis.
Tools for Data Science
• R, Python, SQL, SAS, Tableau, MATLAB, etc. are
of the most useful tools for data science, R
and Python being the most used ones.
• But still, it becomes confusing for any newbie
to choose the better or the most suitable one
among the two, R and Python. Let’s try to
visualize the difference.
R vs Python in Data science
Overview :
R Python
R is a programming language and Python is an Interpreted high-level
free software environment for programming language for general
statistical computing and graphics, purpose programming.
supported by the R Foundation for
Statistical Computing. It was created by Guido Van
It was designed by Ross Ihaka Rossum and was first released in
and Robert Gentleman and first 1991.
released in August, 1993. Python has a very clean and
It is widely used among simple code syntax.
statisticians and data miners for It emphasizes code readability and
developing statistical software and thus debugging is also far more
data analysis. simpler and easier in Python.
R vs Python in Data science
Specialities for datascience :
R Python
R packages cover advanced R and Python are equally good for
techniques which very useful for finding outliers in a data set, but
statistical work. for developing a web service to
The CRAN text view provides you enable other people to upload
with many useful R packages. datasets and find outliers, Python
R packages cover everything from is better.
Psychometrics to Genetics to People have built modules to
Finance. create websites, interact with a
On the other hand, Python, with variety of databases, and manage
the help of libraries like SciPy and users in Python.
packages like statsmodels, covers In general, to create a tool or
only the most common service that uses data analysis,
techniques. Python is a better choice.
R vs Python in Data science
Functionalities :
R Python
R has inbuilt functionalities for Python is a general purpose
data analysis. programming language.
R was built by eminent So most of the data analysis
statisticians with statistics and functionalities are not inbuilt and
data analysis in mind, so many are available through packages
tools that have been externally like Numpy and Pandas, which are
added to Python through available in PyPi(Python Package
packages are built in R by default. Index).
Key domains of application
R Python
Data visualization is a key aspect Python is better for deep learning.
of analysis, as visual data is best Packages like Lasagne, Caffe,
understood. Keras, Mxnet, OpenNN, Tensor
R packages like ggplot2, ggvis, flow, etc. allows development of
lattice, etc. make data deep neural networks far more
visualization easier in R. simple in Python.
Python is catching up with Although some of these, like
packages like Bokeh, Matplotlib, tensor flow, are being ported to R
etc. but is still far behind in this (packages like deepnet, H2O, etc.)
regard. but it is still better in Python.
Availability of Packages
R Python
R has hundreds of packages and Python relies on a few main
ways to accomplish needful data packages, viz., Scikit learn and
science tasks. Although it allows to Pandas are the packages for
have desired perfection in machine learning data analysis
completing the task, it makes it respectively. It makes easier to
difficult for inexperienced accomplish required tasks but
developers to achieve certain consequently it becomes difficult
goals. to achieve specialization.