
Week 5 Industrial Internship Report

Vikas Gupta completed his fifth week of internship at CureYa. He learned about data visualization using QlikSense, including the steps of visualization and its important features. He studied PyTorch for machine learning and completed tutorials on tensors, backpropagation, and creating neural networks. He discussed research papers on AI techniques with his mentor and learned about types of papers, selection criteria, and top publications. To compare machine learning algorithms, he used a breast cancer dataset and found that logistic regression and KNN had the highest accuracy, while decision trees and naive Bayes had the lowest.


INDUSTRIAL INTERNSHIP

WEEKLY PERFORMANCE REPORT (WPR)

Student Name: Vikas Gupta


Supervisor Name: B. P. Mishra / Shivani Mishra
Coordinator/Team Leader Name: Namira Rangrej
Mentor Name: Pranshu Sharma
Organization: CureYa
Hours Worked: Monday-2 hrs, Tuesday-2 hrs, Wednesday-1.5 hrs, Thursday-1.5 hrs, Friday-3 hrs

Summarize your thoughts regarding your internship this week. Include duties you have performed,
facts, and procedures you have learned, skills you have mastered, and observations you have made.

Week-5 (24 May 2021 to 28 May 2021)


Monday:
Data Visualization: It is the process of converting data into information in the form of charts, diagrams, pictures, etc., to support decision-making.
Data Visualization using QlikSense: Qlik, the company behind QlikSense, is one of the biggest players in the Business Intelligence (BI) market. It has been operating since 1993 and provides various BI services to around 1700 customers all around the world.
There are 3 steps of visualization:
1. Extraction: extracting data from various data sources,
2. Modeling: the clean-up part, whose outcome is a single table or a set of interlinked tables,
3. Visualization: built from dimensions (columns) and metrics (operations, such as sums or counts, applied to the data).
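The three steps can be illustrated with a small, tool-independent Python sketch. The CSV data, column names and function names are hypothetical, and this is not how QlikSense works internally; it only mirrors the extract → model → visualize flow, with `region` as the dimension and the sum of `amount` as the metric.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data standing in for two separate data sources.
SALES_CSV = """order_id,region_id,amount
1,R1,120
2,R2,80
3,R1,
4,R3,200
5,R2,40
"""

REGIONS_CSV = """region_id,region
R1,North
R2,South
R3,East
"""


def extract(text):
    """Step 1 (extraction): read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def model(sales, regions):
    """Step 2 (modeling): clean the rows and link the two tables into one."""
    region_by_id = {r["region_id"]: r["region"] for r in regions}
    table = []
    for row in sales:
        if not row["amount"]:  # clean-up: drop rows with a missing amount
            continue
        table.append({
            "region": region_by_id[row["region_id"]],
            "amount": float(row["amount"]),
        })
    return table


def visualize(table, dimension, measure):
    """Step 3 (visualization): sum a metric per dimension value and draw a text bar chart."""
    totals = defaultdict(float)
    for row in table:
        totals[row[dimension]] += row[measure]
    for key, value in sorted(totals.items()):
        print(f"{key:<6} {'#' * int(value // 20)} {value:.0f}")
    return dict(totals)


totals = visualize(model(extract(SALES_CSV), extract(REGIONS_CSV)), "region", "amount")
```

Keeping the steps as separate functions makes it easy to swap the source (step 1) or the chart (step 3) without touching the cleaned model in the middle.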
Why is data visualization important, and why are BI visualization tools required?
Answer:
1) Connect a variety of data sources, bring the data into a single platform, and perform transformations.
2) Strong data compression (e.g., 40 MB of data can be compressed to 5 MB)
3) In-memory processing (improved performance)
4) Data associativity (explore the data in depth by interacting with filters)
5) Machine learning and deep learning integration (mashups in QlikSense)
6) Easy to use
7) Accessibility over the internet (host the application on a server and share it)
8) Embedded analytics (bring visualizations into websites)
9) Geo analytics (analyze data in depth geographically)
10) Integration of open source chart libraries like [Link], [Link], etc.
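One general technique behind large compression ratios in in-memory BI engines is dictionary (symbol-table) encoding: each distinct value is stored once, and each row keeps only a small integer index into that list. The sketch below illustrates that idea only; it is a simplifying assumption, not QlikSense's actual storage format, and the 40 MB → 5 MB figure in item 2 is not derived from it. The column values are hypothetical.

```python
def dictionary_encode(values):
    """Store each distinct value once; replace every row value with an index into that list."""
    symbols = []   # distinct values, in first-seen order
    index_of = {}  # value -> position in symbols
    codes = []     # one small integer per row
    for v in values:
        if v not in index_of:
            index_of[v] = len(symbols)
            symbols.append(v)
        codes.append(index_of[v])
    return symbols, codes


def dictionary_decode(symbols, codes):
    """Rebuild the original column from the symbol table and the codes."""
    return [symbols[c] for c in codes]


def bits_per_code(symbols):
    """Minimum number of bits needed to address every distinct value (at least 1)."""
    return max(1, (len(symbols) - 1).bit_length())


# Hypothetical column with many repeated values.
column = ["Maharashtra", "Gujarat", "Maharashtra", "Delhi", "Gujarat", "Maharashtra"] * 1000
symbols, codes = dictionary_encode(column)
raw_bytes = sum(len(v.encode()) for v in column)
encoded_bytes = sum(len(s.encode()) for s in symbols) + len(codes) * bits_per_code(symbols) / 8
print(f"{raw_bytes} bytes raw vs about {encoded_bytes:.0f} bytes encoded")
```

The saving grows with the number of repeated values, which is why columns with few distinct values (regions, categories, flags) compress far better than unique IDs.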
QlikSense provides big data source connectors, flat file connectors, SQL database connectors, and API connectors.
Who are the end users?
DAR structure (Dashboard, Analysis, Reports):
Dashboard: for higher-level management only (critical points only)
Analysis: for middle-level management (aggregated view and in-depth view of the data)
Reports: for lower-level management (record-by-record transactions for team leaders (TL), etc.)
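The DAR idea can be sketched as three views over the same records: a dashboard with a few critical numbers, an analysis with aggregates, and a report with row-level detail. The transactions and function names below are hypothetical; in QlikSense these would be sheets and charts rather than Python functions.

```python
from collections import defaultdict

# Hypothetical transaction records: (region, team_lead, amount)
TRANSACTIONS = [
    ("North", "Asha", 120.0),
    ("North", "Ravi", 80.0),
    ("South", "Meena", 200.0),
    ("South", "Meena", 50.0),
]


def dashboard(rows):
    """Higher-level management: only a few critical numbers."""
    return {"total_sales": sum(r[2] for r in rows), "transactions": len(rows)}


def analysis(rows):
    """Middle-level management: an aggregated view by region."""
    by_region = defaultdict(float)
    for region, _lead, amount in rows:
        by_region[region] += amount
    return dict(by_region)


def report(rows, team_lead):
    """Lower-level management: record-by-record detail for one team leader."""
    return [r for r in rows if r[1] == team_lead]


print(dashboard(TRANSACTIONS))
print(analysis(TRANSACTIONS))
print(report(TRANSACTIONS, "Meena"))
```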
Top BI tools: 1) QlikView/QlikSense (ETL and customization features), 2) Tableau, 3) Power BI, 4) Sisense, 5) Kibana, 6) MicroStrategy, 7) Birst, 8) TIBCO Spotfire, 9) Looker
Qlik installation: Go to [Link]/us → register → go to “Support” → go to “Download” → download the latest version (don’t forget to check ‘extras’).
QlikSense Hub: Log in → go to ‘Desktop hub’ → click “Create new app” (the app is saved in .qvf format).
Now, open the app → add data from files → load a CSV file → add data → go to the “Data load editor” to check the load.
The loaded data is then used to generate insights.
To load data through the script editor: open the ‘Script editor’ → create a new section by clicking the add (+) icon → name the section → create a new connection → select the folder → enter the path → name the connection → select the data.
Then edit the connection → insert the data → click “Load data”.
App overview → Sheets, Bookmarks, Stories
Go to Sheets → click “Create new sheet” → click on association.
Data Model Viewer: shows the interconnection of multiple tables.
Created variables, bar charts, and add-ons, and changed the appearance, such as the title, footnote, presentation, and bar colors.
Creating multiple charts: go to ‘Tables’ → go to ‘Data’ → click “Add column”.
Data transformation basics:
Give tables appropriate names (by default, a table takes the file name of its CSV).
Comment out a field with // to exclude that column from the load.
Adding filters: make changes → save → load again → go to the “Model Viewer”.
For functions in scripts and expressions, refer to [Link] for documentation.
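The association between tables shown in the data model viewer, and the effect of a filter, can be sketched in Python. QlikSense links tables through fields that have the same name; the sketch below imitates only that rule with two hypothetical tables, and its split into "possible" and "excluded" values loosely mirrors how QlikSense shows associated (white) and unrelated (gray) values after a selection. It is an illustration, not Qlik's associative engine.

```python
# Two hypothetical tables that share the field "customer_id".
CUSTOMERS = [
    {"customer_id": 1, "city": "Pune"},
    {"customer_id": 2, "city": "Delhi"},
    {"customer_id": 3, "city": "Pune"},
]
ORDERS = [
    {"order_id": 10, "customer_id": 1, "product": "Laptop"},
    {"order_id": 11, "customer_id": 2, "product": "Phone"},
    {"order_id": 12, "customer_id": 3, "product": "Tablet"},
    {"order_id": 13, "customer_id": 1, "product": "Tablet"},
]


def shared_fields(table_a, table_b):
    """Fields with the same name in both tables act as the link between them."""
    return set(table_a[0]) & set(table_b[0])


def select(field, value, table, other):
    """Filter `table` on field == value, then return, for every field of `other`,
    the values still associated with that selection through the shared key fields."""
    keys = sorted(shared_fields(table, other))
    chosen = [row for row in table if row[field] == value]
    linked = {tuple(row[k] for k in keys) for row in chosen}
    matches = [row for row in other if tuple(row[k] for k in keys) in linked]
    return {f: sorted({row[f] for row in matches}) for f in other[0]}


possible = select("city", "Pune", CUSTOMERS, ORDERS)
all_products = sorted({row["product"] for row in ORDERS})
excluded = [p for p in all_products if p not in possible["product"]]
print("possible:", possible["product"], "excluded:", excluded)
```

Selecting a city never mentioned in `ORDERS` still filters it, because the shared `customer_id` field carries the selection across tables; this is why consistent field names matter when naming tables and columns in the load script.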

Tuesday:
PyTorch Tutorial:
PyTorch was developed by Facebook's AI Research lab (FAIR). Its performance-critical core is written in C++ (it also offers a C++ interface), which helps make it fast. A number of deep learning software projects are built on top of PyTorch, including Tesla Autopilot, Uber’s Pyro, Hugging Face’s Transformers, PyTorch Lightning, and Catalyst.
From research to production: PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. It is used for computer vision and natural language processing.
PyTorch provides 2 high-level features:
1) Tensor computing (like NumPy) with strong acceleration via GPUs
2) Deep neural networks built on a tape-based automatic differentiation system
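To make "tensor computing" concrete, the plain-Python sketch below treats a tensor as a nested list whose shape has one entry per dimension (rank 0 for a scalar, 1 for a vector, 2 for a matrix, and so on) and adds two tensors element by element. The helper names are hypothetical; PyTorch does the same work natively and on GPUs.

```python
def shape(t):
    """Shape of a nested-list 'tensor' (assumes it is rectangular)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return tuple(dims)


def add(a, b):
    """Element-wise addition of two tensors (assumes both have the same shape)."""
    if isinstance(a, list):
        return [add(x, y) for x, y in zip(a, b)]
    return a + b


scalar = 5.0                                               # rank 0
vector = [1.0, 2.0, 3.0]                                   # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0]]                          # rank 2
cube = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]   # rank 3

for t in (scalar, vector, matrix, cube):
    print(shape(t), "rank", len(shape(t)))
print(add(matrix, matrix))
```

In PyTorch the equivalent objects come from `torch.tensor(matrix)`, their dimensions from `.shape`, addition from `+`, and GPU placement from `.to("cuda")` when `torch.cuda.is_available()` is true (not executed here).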
PyTorch videos on:
• Installation of PyTorch for deep learning
• Understanding tensors: A tensor is a generalization of vectors and matrices and is easily understood as a multidimensional array. The training and operation of deep learning models can be described in terms of tensors. In many cases, tensors are used as a replacement for NumPy arrays in order to use the power of GPUs.
Tensors are a type of data structure used in linear algebra, and, like vectors and matrices, they support arithmetic operations.
• Backpropagation using PyTorch (computing derivatives)
• Creating an ANN (Artificial Neural Network) using PyTorch
• Kaggle Advanced House Price Prediction using PyTorch (tabular dataset)
• How to use a GPU to run PyTorch code
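Backpropagation is the chain rule applied from the loss back to every parameter. PyTorch automates it with autograd (calling `loss.backward()` fills in each parameter tensor's `.grad`). The from-scratch scalar sketch below only illustrates that mechanism and is not PyTorch's implementation; the `Value` class, the single-neuron model and its numbers are hypothetical.

```python
import math


class Value:
    """A scalar that records how it was computed, so gradients can flow backwards."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        """Chain rule from this node back to every input, in reverse topological order."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node._parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# One neuron y = tanh(w * x + b) with a squared-error loss against a target of 0.5.
x, w, b = Value(2.0), Value(-0.5), Value(0.3)
y = (w * x + b).tanh()
error = y + (-0.5)
loss = error * error
loss.backward()
print(f"loss={loss.data:.4f}  dloss/dw={w.grad:.4f}  dloss/db={b.grad:.4f}")
```

A gradient-descent step would then update `w.data -= lr * w.grad` (and likewise for `b`); in PyTorch that step is usually delegated to an optimizer such as `torch.optim.SGD`.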

Wednesday:
• Revision of Python basics, statistics, and machine learning algorithms.
• Installed the Python package “covid” and analyzed the data using its various methods and functions.
• Hands-on experience with Pandas Profiling.
• Deep learning study (neural networks)
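Pandas Profiling generates a per-column report (counts, missing values, distinct values, basic statistics and more). The standard-library sketch below hand-computes a few of those numbers so it is clear what such a report summarizes; the rows are hypothetical, and the real library (now published as `ydata-profiling`) covers far more.

```python
import statistics

# Hypothetical rows; None marks a missing value.
ROWS = [
    {"state": "Kerala", "cases": 120, "deaths": 2},
    {"state": "Delhi", "cases": 95, "deaths": None},
    {"state": "Kerala", "cases": 130, "deaths": 3},
    {"state": "Punjab", "cases": None, "deaths": 1},
]


def profile(rows):
    """Per-column overview similar in spirit to a profiling report."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        summary = {
            "count": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Numeric columns also get basic descriptive statistics.
        if present and all(isinstance(v, (int, float)) for v in present):
            summary.update(min=min(present), max=max(present), mean=statistics.mean(present))
        report[col] = summary
    return report


for column, stats in profile(ROWS).items():
    print(column, stats)
```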

Thursday:
One-to-one discussion with Dr. Bajarang Mishra Sir on research papers: We discussed Artificial Intelligence (AI) techniques: 1) Neural Networks, 2) Fuzzy Logic, 3) Genetic Algorithms (GAs), and 4) Hybrid methods. I did a comparative study of all these techniques. I also did a comparative study of ANN (Artificial Neural Network), ANFIS (Adaptive Neuro-Fuzzy Inference System), CANFIS (Co-Active Neuro-Fuzzy Inference System), and hybrid intelligent systems.
Types of research papers: 1) Patent paper (topmost priority), 2) Transactions paper, 3) Journal paper, 4) International conference paper.
Criteria for selection: 1) New and innovative ideas (priority), 2) Finding the best (a comparative study of technologies or algorithms), 3) Technical review paper (the crux of the outcomes of 20-25 research papers).
Top publishers: 1) IEEE (US), 2) Elsevier, 3) Taylor & Francis
Most reputed journal indexing services:
I. WoS (Web of Science),
II. SCI (Science Citation Index)
a. ESCI (Emerging Sources Citation Index)
b. SCIE (Science Citation Index Expanded),
III. Scopus
Research paper study: searched for publishers, journals, and topics on IEEE Xplore.

Friday:
Comparison of various machine learning algorithms:
I used the Breast Cancer Wisconsin (Diagnostic) Dataset for this task. The objective was to predict whether a tumor is benign or malignant. I performed exploratory data analysis (EDA) on this dataset and then compared the accuracy of various machine learning algorithms. I found that Logistic Regression and KNN gave the highest accuracy, while Decision Tree and Naïve Bayes gave the lowest. I uploaded the project to my GitHub profile and then posted it on LinkedIn along with a video (a screen recording of the code) and the GitHub link (tagging CureYa, Cureya Internship, all CureYa individuals involved in this internship, and related hashtags).
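The comparison workflow (train several classifiers, score each on held-out data, rank by accuracy) can be sketched with two of the algorithms named above implemented from scratch: KNN and Gaussian Naive Bayes. The tiny two-feature dataset is hypothetical and deliberately easy, standing in for the 30 features of the Wisconsin data; it is not the original project's code, and its scores say nothing about the results reported above.

```python
import math
from collections import Counter

# Hypothetical toy data: (feature_1, feature_2) -> label (0 or 1).
TRAIN = [((1.0, 1.2), 0), ((1.5, 0.8), 0), ((0.8, 1.0), 0), ((1.2, 1.5), 0),
         ((4.0, 4.2), 1), ((4.5, 3.8), 1), ((3.8, 4.5), 1), ((4.2, 4.0), 1)]
TEST = [((1.1, 0.9), 0), ((0.9, 1.4), 0), ((4.1, 4.4), 1), ((3.9, 3.7), 1)]


def knn_predict(train, x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Per class: prior, and per-feature mean and variance (features assumed independent)."""
    model = {}
    for label in {lbl for _, lbl in train}:
        rows = [x for x, lbl in train if lbl == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(train), means, variances)
    return model


def nb_predict(model, x):
    """Class with the highest log prior plus summed Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def accuracy(predict, data):
    """Fraction of (features, label) pairs that `predict` gets right."""
    return sum(predict(x) == y for x, y in data) / len(data)


nb_model = fit_gaussian_nb(TRAIN)
results = {
    "KNN (k=3)": accuracy(lambda x: knn_predict(TRAIN, x), TEST),
    "Gaussian Naive Bayes": accuracy(lambda x: nb_predict(nb_model, x), TEST),
}
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```

Both classifiers score 1.0 on this toy data. On the real dataset one would more typically use scikit-learn (`sklearn.datasets.load_breast_cancer`, `LogisticRegression`, `KNeighborsClassifier`, `DecisionTreeClassifier`, `GaussianNB`) with a train/test split or cross-validation, keeping the same "fit, score, rank" loop.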
Student Signature: Vikas Gupta Date: 28/05/2021

Head Co-ordinator Signature: Date:

Instructions: After the completed report has been signed by both the student and the head coordinator, the head coordinator shall scan the form to PDF format and email it to Director-1 (bpmishra435@[Link]) of the company. Specific problems, concerns, or suggestions from either the student or the head coordinator should be emailed separately to the CEO (info@[Link]) of the company.
