Big Data Analytics with Python & Hive

BDA answer

Uploaded by

yuvraj120555


Big Data Analytics Practical Answers

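(BDA-22684)

1. Pandas Program – Excel Import & Column Access

```python
import pandas as pd

# Load the workbook into a DataFrame
df = pd.read_excel('[Link]')

# List the column names, then show only the Name and Age columns
print(df.columns)
print(df[['Name', 'Age']])
```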

2. Pandas Program – Summary Stats & Skipping Rows/Cols

```python
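# Summary statistics for the Age column (df is the DataFrame loaded in program 1)
print(df['Age'].sum())
print(df['Age'].mean())
print(df['Age'].max())
print(df['Age'].min())

# Re-read the workbook, skipping the first two rows and keeping only columns B to D
df_skip = pd.read_excel('[Link]', skiprows=2, usecols="B:D")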
```

3. Select & Delete Rows/Columns from DataFrame

```python
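# Rows labelled 0 to 5 (.loc slices include both ends), Name and Age columns only
print(df.loc[0:5, ['Name', 'Age']])

# Delete the Age column, then the row with index label 0
df.drop('Age', axis=1, inplace=True)
df.drop(0, axis=0, inplace=True)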
```

4. Import Modules and Download File

```python
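import pandas as pd
import urllib.request

# Download the file to a local path, then load it into a DataFrame
urllib.request.urlretrieve('[Link]', '[Link]')
df = pd.read_excel('[Link]')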
```

5. Extract ZIP, Load Data, Log Each Phase

```python
import zipfile
import pandas as pd

def log(msg):
    print('[LOG]', msg)

# Phase 1: extract the archive
with zipfile.ZipFile('[Link]', 'r') as z:
    z.extractall('data_folder')
log("Zip extracted")

# Phase 2: load the extracted CSV
df = pd.read_csv('data_folder/[Link]')
log("Data loaded")
```
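
The phases above can also be logged with Python's standard `logging` module instead of a hand-written `print` helper. The sketch below is self-contained: it first builds a small throwaway archive, so the file names (`sample.zip`, `sample.csv`) and the rows are made up for illustration and are not the files from the original exercise. It reads the CSV with the `csv` module rather than pandas only to avoid a third-party dependency.

```python
import csv
import logging
import tempfile
import zipfile
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="[LOG] %(message)s")
log = logging.getLogger("etl")


def extract_and_load(zip_path, out_dir, member):
    """Extract zip_path into out_dir, then read `member` as a list of CSV rows."""
    log.info("Extracting %s", zip_path)
    with zipfile.ZipFile(zip_path, "r") as z:
        z.extractall(out_dir)
    log.info("Zip extracted")

    with open(Path(out_dir) / member, newline="") as f:
        rows = list(csv.DictReader(f))
    log.info("Data loaded: %d rows", len(rows))
    return rows


# Hypothetical demo data: build a tiny archive so the sketch runs anywhere.
workdir = Path(tempfile.mkdtemp())
zip_path = workdir / "sample.zip"
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("sample.csv", "Name,Age\nAsha,30\nRavi,25\n")

rows = extract_and_load(zip_path, workdir / "data_folder", "sample.csv")
print(rows)
```

Logging after each step, rather than at the end, means a failure in the load phase still leaves a record that extraction succeeded.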

6. Create Hive Tables

```sql
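-- External table: Hive reads the files under LOCATION; dropping the table leaves the data in place
CREATE EXTERNAL TABLE emp_ext(id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/external';
LOAD DATA LOCAL INPATH '/home/[Link]' INTO TABLE emp_ext;

-- Internal (managed) table: Hive owns the data; dropping the table deletes it
CREATE TABLE emp_int(id INT, name STRING);

-- LOCAL copies the file from the local file system; without LOCAL the file is moved within HDFS
LOAD DATA LOCAL INPATH '/home/[Link]' INTO TABLE emp_int;
LOAD DATA INPATH '/hdfs/[Link]' INTO TABLE emp_int;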
```

7. Hive Table Storage Formats

```sql
CREATE TABLE text_format(id INT) STORED AS TEXTFILE;
CREATE TABLE seq_format(id INT) STORED AS SEQUENCEFILE;
CREATE TABLE rc_format(id INT) STORED AS RCFILE;
```

8. Spark Count WARN Logs

```python
from pyspark import SparkContext

sc = SparkContext("local", "WarnCount")

# Read the log file as an RDD of lines and keep only those containing "WARN"
logs = sc.textFile("[Link]")
warns = logs.filter(lambda line: "WARN" in line)
print(warns.count())
```

9. Create [Link]

```python
with open("[Link]", "w") as f:
    f.write("[Link],[Link],[Link]")
```

10. Spark SQL Flipkart Access

```python
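from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Flipkart").getOrCreate()

# Load the log data and expose it to SQL as the view "logdata".
# header=True is an assumption: the query below needs named columns (url, location).
df = spark.read.csv("[Link]", header=True)
df.createOrReplaceTempView("logdata")
# Count Flipkart requests per location
spark.sql(
    "SELECT location, COUNT(*) FROM logdata "
    "WHERE url LIKE '%flipkart%' GROUP BY location"
).show()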
```

11. Spark SQL Distinct IPs

```python
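# Every distinct IP address (uses the spark session and logdata view from program 10)
spark.sql("SELECT DISTINCT ip FROM logdata").show()

# Number of distinct IP addresses per location
spark.sql("SELECT location, COUNT(DISTINCT ip) FROM logdata GROUP BY location").show()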
```
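
To check the logic of the `logdata` queries without a Spark cluster, the same SQL can be run against an in-memory SQLite table. This is only a sketch: SQLite stands in for Spark SQL, the rows are invented sample data rather than the contents of the original log file, and `ORDER BY location` is added so the output order is predictable. The column names `ip`, `url`, and `location` come from the queries above.

```python
import sqlite3

# Invented sample rows (ip, url, location), for illustration only.
rows = [
    ("10.0.0.1", "https://www.flipkart.com/item/1", "Pune"),
    ("10.0.0.1", "https://www.flipkart.com/item/2", "Pune"),
    ("10.0.0.2", "https://www.flipkart.com/cart", "Pune"),
    ("10.0.0.3", "https://example.com/", "Mumbai"),
    ("10.0.0.4", "https://www.flipkart.com/", "Mumbai"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logdata (ip TEXT, url TEXT, location TEXT)")
conn.executemany("INSERT INTO logdata VALUES (?, ?, ?)", rows)

# Flipkart requests per location
flipkart_hits = conn.execute(
    "SELECT location, COUNT(*) FROM logdata "
    "WHERE url LIKE '%flipkart%' GROUP BY location ORDER BY location"
).fetchall()

# Distinct client IPs per location
distinct_ips = conn.execute(
    "SELECT location, COUNT(DISTINCT ip) FROM logdata "
    "GROUP BY location ORDER BY location"
).fetchall()

print(flipkart_hits)   # [('Mumbai', 1), ('Pune', 3)]
print(distinct_ips)    # [('Mumbai', 2), ('Pune', 2)]
conn.close()
```

Note the difference between the two results for Pune: three Flipkart requests, but only two distinct IP addresses, because one client made two requests.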

12. Data Science Responsibilities

- Collect, clean, and explore data; build and validate models; visualize and report the results.
- Apply machine learning and statistical methods to derive insights.

13. Big Data Terminologies

- HDFS, YARN, MapReduce, Hive, Pig, Spark, ZooKeeper, NoSQL

14. Big Data Stack

- Data Sources → Storage (HDFS, NoSQL) → Processing (Spark, MapReduce) → Access (Hive, Pig) → Visualization (Tableau)

15. Analytics Patterns

- Descriptive, Diagnostic, Predictive, Prescriptive, Streaming, Batch Analytics

16. Big Data Challenges

- Volume, Variety, Velocity, Veracity, Security, Scalability, Integration

17. Big Data Analytics Classification

- Descriptive, Predictive, Prescriptive, Diagnostic

18. Advantages of Hadoop

- Scalable, Cost-effective, Fault-tolerant, Open-source, Handles all data types

19. RDBMS vs Hadoop

- An RDBMS handles structured data with a fixed schema; Hadoop stores structured, semi-structured, and unstructured data. Hadoop is distributed and scales out by adding machines.

20. Hadoop Architecture

- HDFS for storage, YARN for resource management, MapReduce for processing.

21. HDFS Description

- The NameNode holds the file-system metadata, DataNodes store the data blocks, and each block is replicated across several DataNodes for fault tolerance.

22. Advantages of Hadoop

- Cost-effective, scalable, flexible, fault-tolerant, supports large datasets.

23. RDBMS vs Hadoop Table

| RDBMS | Hadoop |
|-------|--------|
| Centralized | Distributed |
| Structured only | All data types |

24. Hadoop Architecture

- HDFS stores, YARN schedules, MapReduce processes.

25. HDFS

- Splits files into fixed-size blocks, stores the blocks on DataNodes, and keeps the file-to-block metadata on the NameNode.
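
The block-and-replica idea can be illustrated with a toy model in plain Python. This is not how HDFS is implemented: the 4-byte block size, the node names, and the round-robin placement are simplifying assumptions (real HDFS defaults to 128 MB blocks and a replication factor of 3, and places replicas with rack awareness).

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, datanodes, replication):
    """Toy 'NameNode metadata': map each block id to the DataNodes holding it.

    Round-robin placement is an assumption made for this sketch.
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


data = b"abcdefghij"                       # a 10-byte stand-in for a file
blocks = split_into_blocks(data, block_size=4)
metadata = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)

print(blocks)     # [b'abcd', b'efgh', b'ij']
for block_id, nodes in metadata.items():
    print(block_id, nodes)
```

Because every block lives on three of the four nodes, losing any single node still leaves at least two copies of each block.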

26. Hadoop in Detail

- Framework for distributed storage and processing of big data using HDFS + MapReduce.
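
The MapReduce processing model can be sketched in a few lines of single-process Python. This only mimics the map, shuffle, and reduce phases on an in-memory list; it is not Hadoop's API, and the input lines are made up.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


lines = ["big data big insights", "data drives insights"]   # made-up input
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(counts)   # {'big': 2, 'data': 2, 'insights': 2, 'drives': 1}
```

In Hadoop the map and reduce calls run on different machines next to the HDFS blocks they read; the structure of the computation is the same.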

27. HDFS Diagram

Client → NameNode (metadata lookup) → DataNodes (block reads and writes). Each block is stored on several DataNodes (replication).

28. Use of Hive

- SQL-like interface for querying large datasets in Hadoop.

29. Hive Architecture

- UI, Driver, Compiler, Metastore, Execution Engine, HDFS.

30. SERDE with Diagram

- SerDe (Serializer/Deserializer): tells Hive how to read rows from, and write rows to, a given file format, including custom formats.
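
The role of a SerDe can be illustrated with a toy serializer/deserializer pair in Python. Hive SerDes are Java classes; the two functions below are hypothetical stand-ins that only show the idea of converting between a row object and a delimited text record.

```python
def serialize(row, columns, delimiter=","):
    """Row object -> stored record (what happens when a table is written)."""
    return delimiter.join(str(row[c]) for c in columns)


def deserialize(record, schema, delimiter=","):
    """Stored record -> row object (what happens when a table is read).

    schema is a list of (column_name, type) pairs, e.g. [("id", int)].
    """
    fields = record.split(delimiter)
    return {name: cast(value) for (name, cast), value in zip(schema, fields)}


schema = [("id", int), ("name", str)]      # mirrors emp_int(id INT, name STRING)
record = serialize({"id": 7, "name": "Asha"}, ["id", "name"])
row = deserialize(record, schema)

print(record)   # 7,Asha
print(row)      # {'id': 7, 'name': 'Asha'}
```

The table schema stays the same while the on-disk representation changes, which is why one Hive table definition can sit on top of different file formats.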

31. Hadoop Features

- Distributed, scalable, open-source, fault-tolerant, flexible.

32. Use of Apache Spark

- Near-real-time stream processing, machine learning (MLlib), SQL queries (Spark SQL), and fast in-memory batch computation.

33. Apache Spark Architecture

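- Driver → Cluster Manager → Executors. The driver splits a job into tasks, the cluster manager allocates resources, and the executors run the tasks, keeping data in memory where possible.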
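
As a loose analogy for the driver/executor split, the sketch below uses a local thread pool in place of Spark executors and repeats the WARN-counting job from program 8. It is not Spark code: the partitioning scheme and the log lines are invented for illustration, and there is no cluster manager involved.

```python
from concurrent.futures import ThreadPoolExecutor


def make_partitions(lines, n):
    """Driver side: split the input into n partitions (round-robin)."""
    return [lines[i::n] for i in range(n)]


def count_warns(partition):
    """Executor side: one task counts the WARN lines in one partition."""
    return sum(1 for line in partition if "WARN" in line)


# Made-up log lines for illustration.
lines = [
    "INFO start",
    "WARN disk low",
    "ERROR write failed",
    "WARN retrying",
    "INFO done",
]

partitions = make_partitions(lines, 2)
with ThreadPoolExecutor(max_workers=2) as pool:   # stands in for the executors
    partial_counts = list(pool.map(count_warns, partitions))

total = sum(partial_counts)   # the driver combines the per-task results
print(partial_counts, total)  # [0, 2] 2
```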

34. Hive Architecture

- Same as Q29: UI, Driver, Compiler, Metastore, Execution Engine.

35. Spark vs MapReduce

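- Spark keeps intermediate results in memory, whereas MapReduce writes them to disk between stages, so Spark is usually faster, especially for iterative jobs. Spark also offers more APIs (SQL, streaming, machine learning).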

36. RDBMS vs Hadoop

- Hadoop is more scalable and handles all data types, unlike RDBMS.

37. Hadoop Explanation

- Hadoop is an open-source big data framework made up of HDFS (storage), YARN (resource management), and MapReduce (processing).
