0% found this document useful (0 votes)
6 views35 pages

MySQL Chatbot for Planet Queries

The document outlines a computer science project for the academic year 2025-2026, focusing on a MySQL-based document query chatbot developed by Mythili M. The chatbot utilizes Python and MySQL to allow users to ask natural-language questions about planetary information, enhancing learning through interactive engagement. Key features include database connectivity, dynamic query execution, and natural language support, making it a practical educational tool for students and astronomy enthusiasts.

Uploaded by

Mythili M
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views35 pages

MySQL Chatbot for Planet Queries

The document outlines a computer science project for the academic year 2025-2026, focusing on a MySQL-based document query chatbot developed by Mythili M. The chatbot utilizes Python and MySQL to allow users to ask natural-language questions about planetary information, enhancing learning through interactive engagement. Key features include database connectivity, dynamic query execution, and natural language support, making it a practical educational tool for students and astronomy enthusiasts.

Uploaded by

Mythili M
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

COMPUTER SCIENCE PROJECT

YEAR: 2025-2026

MySQL-Based Document Query Chatbot

MYTHILI M

12 ‘B’

VIDYA MANDIR @ ESTANCIA


ACKNOWLEDGEMENT

I would like to thank our Principal, [Link] Ravi, and Vice Principal, Dr.
Meenakshi Janakiraman, for providing me with this wonderful opportunity to work
on a project on the topic. I would also like to convey my heartfelt gratitude to Dr.
Vandana Sivaraj for her tremendous support and assistance in completing my
project. The completion of the project would not have been possible without their
help and insight.

I would also like to express my gratitude to the lab teacher, [Link]


Meenakshi, and the lab attendant [Link] Kumar, whose guidance helped me
complete this project without difficulty.

. I would also like to thank my family and friends, who helped me collect the
needed information for the project. Their valuable suggestions and guidance have
been helpful in various phases of the completion of the project.

I would also like to thank VME for supporting my project and allowing me to learn
and grow. Without the help and support of these individuals and organizations, this
project would not have been possible.

Thank you all for your time, effort, and contributions.​


Mythili.M

1
INDEX

SNO CONTENT PG NO

1. Introduction 3

2. Concepts Used 6

3. Source Code 12

4. Output Screenshots 30

5. Conclusion 32

6. Future Enhancements 33

7. Bibliography 34

2
INTRODUCTION
The Solar System Chatbot is an educational and interactive
Python-based program that connects with a MySQL database containing
planetary information. The chatbot allows users to ask natural-language
questions about planets, their characteristics, and related facts. Unlike
static information websites, this chatbot dynamically queries an SQL
database, retrieves accurate data, and presents it conversationally. The
project’s core motivation is to combine Natural Language Processing
(NLP) concepts with database management to create a smooth learning
tool for students, educators, and astronomy enthusiasts.

The chatbot is designed to work seamlessly in Google Colab, making it


accessible to anyone without requiring a complex setup. It uses Google
Drive integration to import .sql files, MySQL to store structured
planetary data, and Python to process user queries. Through intelligent
matching, the system detects keywords such as planet names (“Mars”,
“Jupiter”) and properties (“mass”, “radius”) and generates SQL queries
accordingly.

This project bridges the gap between theoretical concepts of databases


and practical chatbot applications. By building it step by step, students
not only understand Python and SQL connectivity but also learn how
chatbots can serve as interactive knowledge engines. The project
provides a foundation for more advanced AI-driven systems.

3
Key Features
The chatbot is enriched with several key features that make it educational,
flexible, and user-friendly.

1.​ Database Connectivity:

The chatbot connects Python with MySQL through the


mysql-connector-python module, ensuring smooth data retrieval.

2.​ Dynamic Query Execution:

The program intelligently translates user queries into SQL statements,


retrieves relevant information, and returns the results in a conversational
format.

3.​ Natural Language Support:

Instead of typing complex SQL queries, users can ask questions like “What
is the mass of Jupiter?” or “Which planet has the most moons?”.

4.​ Table Normalization:

Even if the SQL file contains different table names, the program
automatically renames them to planets to ensure consistency.

4
5.​ Schema Inspection:

The chatbot checks available columns, understands schema structure, and


adjusts responses accordingly.

6.​ Rule-Based Question Understanding:

Keywords and patterns are matched to identify the right database fields. For
example, “largest planet” maps to the mass or radius column.

7.​ Error Handling:

The program detects missing files, absent tables, or invalid inputs and guides
the user accordingly.

8.​ Interactive Mode:

Users can engage in continuous Q&A sessions with the chatbot, simulating a
real conversation.

9.​ Educational Purpose:

Apart from being functional, it helps students understand MySQL, Python


functions, loops, conditionals, and database integration practically.

5
CONCEPTS USED

Concepts Used:

The chatbot project combines multiple programming and database


concepts.

1.​ Database Management

structured data about planets is stored in MySQL. The chatbot uses SQL
queries like SELECT, COUNT, ORDER BY, and WHERE to fetch
meaningful results.

2.​ Natural Language Processing (NLP).

Instead of expecting users to write exact SQL queries, the chatbot


interprets human-friendly questions. By scanning for keywords (e.g.,
“mass”, “largest”, “moons”), it selects the right database column and
constructs an appropriate SQL command.

3.​ Data Normalization.

SQL dumps may use different table names, but the project normalizes
them into a consistent table (planets) so the chatbot logic always works.

6
4.​ Pattern Matching with Regular Expressions (re module)

For example, when a user asks “Tell me about Mars”, the program
matches “Mars” against available planet names in the database.

5.​ Rule-based Mapping,

where certain phrases like “how many planets” directly trigger count
queries.

6.​ Integration Concepts —

bridging Colab, Drive, MySQL, and Python into a single functional


system, showing the power of combining cloud tools with programming
logic.

Modules

The implementation leverages Python libraries to enhance productivity


and simplify complex operations. These libraries streamline tasks like
text tokenization, vectorization, and mathematical computations,
allowing developers to focus on higher-level.

7
1.​ mysql-connector-python:

This is the most critical module, enabling communication between


Python and the MySQL server. It allows executing queries
([Link]) and fetching results ([Link]).

2.​ os:

Used for checking if the SQL file exists in the specified path. This
ensures error handling for missing files.

3.​ subprocess:

Executes system-level commands like importing .sql files into the


MySQL server. Without this, the SQL data wouldn’t be inserted into the
database.

4.​ re (Regular Expressions):

Essential for keyword detection and normalizing table names. It


identifies patterns in user questions like “largest planet” or “mass of
Jupiter”.

5.​ mysql. server/service commands:

System-level commands (!apt-get, !service mysql start) are required to


set up and run MySQL inside Colab’s Linux environment.

8
Functions

Functions are modular units of code designed to perform specific tasks


within the chatbot. By using functions, the program achieves better
organization, readability, and reusability, enabling seamless integration
of various components.

1.​ normalize_table_name():

Reads the SQL file and replaces all table names with a consistent name
(planets). This prevents runtime errors when querying the database.

2.​ find_planet_in_text():

Scans the user’s question and checks if any planet name exists. For
example, if the question is “What is the radius of Mars?”, this function
detects “Mars”.

3.​ find_column_by_keywords():

Matches keywords in the user’s query with the closest column in the
database. For example, “mass” or “weight” would map to the mass
column.

9
4.​ is_numeric_column():

Identifies whether a column contains numerical data. This is essential for


handling queries like “largest planet” where ordering is required.

5.​ answer_question():

The main logic function that takes the user’s query, processes it,
constructs an SQL statement, executes it, and returns the result in a
conversational format.

6.​ chat_demo():

Allows interactive chat sessions where users can repeatedly ask


questions until they type exit.

Looping and Conditional Statements

Looping and conditional statements form the backbone of the chatbot’s


interactive behavior. Loops ensure continuous user engagement, while
conditional statements guide decision-making processes, such as
handling user inputs and validating file availability

10
Interactive Loop:

When displaying “List all planets”, the chatbot loops through rows
returned by [Link]() and prints them neatly.

While checking planet names, loops scan through each entry in the
database to find matches.

Conditional Checks:

If the user asks “How many planets are there?”, the chatbot detects “how
many” in the question and executes a COUNT(*) SQL query.

If a planet name is found in the text, the chatbot fetches that planet’s
row.

If a keyword like “largest” or “smallest” is detected, the chatbot orders


numeric columns in ascending or descending order.

If no matching keyword or planet is found, the chatbot falls back to a


default help message.

Conditionals also ensure robust error handling. For example, if the table
doesn’t exist, or if a column is missing, the program prints meaningful
guidance instead of crashing.

Thus, loops help in repetitive tasks like iterating rows, while


conditionals guide decision-making based on user queries. Together,
they make the chatbot interactive, adaptive, and error-tolerant.

11
SOURCE CODE

#MySQL-Based Document Query Chatbot


#MYTHILI M
#12 B

# 1) Install MySQL server and python connector (run this cell once).

# Update & install mysql

!apt-get update -qq

!apt-get install -y mysql-server >/dev/null

# Start mysql service

!service mysql start

# Configure root password to a known password ("password") and create


database solar_system.

!mysql -u root -e "ALTER USER 'root'@'localhost' IDENTIFIED WITH


mysql_native_password BY 'password'; FLUSH PRIVILEGES;"

!mysql -u root -ppassword -e "CREATE DATABASE IF NOT EXISTS


solar_system;"

# Install Python MySQL connector

12
!pip install -q mysql-connector-python

print("MySQL server started and database 'solar_system' created (if it did not
exist).")

# 2) Mount Google Drive so we can access the .sql file stored in your drive.

# Change the sql_drive_path variable to match the exact path in your Drive.

from [Link] import drive

[Link]('/content/drive')

# ---- EDIT THIS PATH to the location of your .sql file in Drive ----

# Examples: "/content/drive/MyDrive/solar system [Link]"

sql_drive_path = "/content/drive/MyDrive/solar system [Link]"

import os

if not [Link](sql_drive_path):

raise FileNotFoundError(f"SQL file not found at {sql_drive_path}. Update the


`sql_drive_path` variable to the correct path.")

print("SQL file found at:", sql_drive_path)

13
# 3) Read SQL file, replace the table name occurrences so the table ends up
named `planets`,

# and import into the 'solar_system' database.

# This addresses the "table doesn't exist" errors when the SQL file uses a
different name.

import re

import subprocess

import shlex

# Start mysql service again just before import to ensure it's running

!service mysql start

tmp_import_path = "/tmp/planets_import.sql"

with open(sql_drive_path, "r", encoding="utf-8", errors="ignore") as f:

sql_text = [Link]()

def normalize_table_name(sql_text, new_name="planets"):

"""

Replace common occurrences of a table name in CREATE/INSERT/DROP


statements with `new_name`.

14
This is a best-effort approach: it searches for CREATE TABLE / INSERT INTO
/ DROP TABLE IF EXISTS

and replaces the immediately following identifier with the new_name.

"""

# DROP TABLE IF EXISTS `old` ...

sql_text=[Link](r"(?i)DROP\s+TABLE\s+IF\s+EXISTS\s+`?([a-zA-Z0-9_]+)`?",

f"DROP TABLE IF EXISTS `{new_name}`", sql_text)

# CREATE TABLE `old` ...

sql_text = [Link](r"(?i)CREATE\s+TABLE\s+`?([a-zA-Z0-9_]+)`?",

f"CREATE TABLE `{new_name}`", sql_text)

# INSERT INTO `old` ...

sql_text = [Link](r"(?i)INSERT\s+INTO\s+`?([a-zA-Z0-9_]+)`?",

f"INSERT INTO `{new_name}`", sql_text)

return sql_text

modified_sql = normalize_table_name(sql_text, new_name="planets")

with open(tmp_import_path, "w", encoding="utf-8") as f:

[Link](modified_sql)

# Import the modified SQL into the 'solar_system' database with verbose
output and force

15
cmd = f"mysql -v -f -u root -ppassword solar_system <
{[Link](tmp_import_path)}"

print("Importing SQL into MySQL with verbose output... (this may take a few
seconds)")

process = [Link](cmd, shell=True, check=True, capture_output=True,


text=True)

print("Import finished.")

print("STDOUT:", [Link])

print("STDERR:", [Link])

# 4) Connect to the local MySQL server using mysql-connector-python and


inspect the 'planets' table.

import [Link]

from [Link] import errorcode

dbconfig = {

"user": "root",

"password": "password",

"host": "[Link]",

"database": "solar_system",

"raise_on_warnings": True

16
try:

cnx = [Link](**dbconfig)

cursor = [Link](dictionary=True)

# confirm table exists

[Link]("SHOW TABLES LIKE 'planets';")

found = [Link]()

if not found:

raise RuntimeError("Table 'planets' not found in database 'solar_system'.


Check import step and SQL file.")

# get columns/schema

[Link]("SHOW COLUMNS FROM planets;")

columns = [r['Field'] for r in [Link]()]

print("Detected columns in 'planets':", columns)

# fetch a few rows to understand data

[Link]("SELECT * FROM planets LIMIT 8;")

sample_rows = [Link]()

print(f"Sample rows (up to 8): {sample_rows}")

17
except [Link] as err:

print("MySQL error:", err)

raise

# 5) Build a simple rule-based NL -> SQL mapper that uses the detected
columns and values

# and answers typical planet questions. This is intentionally simple and robust:

# it uses keyword matching, column heuristics, and looks for planet names from
the table.

import re

import math

# Use the 'columns' variable detected earlier. If not present, re-query.

try:

columns

except NameError:

[Link]("SHOW COLUMNS FROM planets;")

columns = [r['Field'] for r in [Link]()]

18
# detect a likely 'name' column (planet name)

name_col = None

for c in columns:

if [Link]() in ("name", "planet", "planet_name", "planetname"):

name_col = c

break

if not name_col:

# fallback to first column (common in many SQL dumps)

name_col = columns[0]

# fetch distinct planet names for matching

[Link](f"SELECT DISTINCT `{name_col}` FROM planets;")

planet_names = [row[name_col] for row in [Link]() if row[name_col] is


not None]

planet_names_lower = [[Link]() for p in planet_names]

def find_planet_in_text(text):

"""Return a planet name if one of the known planets appears in the text
(case-insensitive)."""

t = [Link]()

for i, p in enumerate(planet_names_lower):

19
if p in t:

return planet_names[i]

# also try proper noun match (e.g., "Mars" capitalized)

for p in planet_names:

if [Link](r"\b" + [Link](p) + r"\b", text, flags=[Link]):

return p

return None

def find_column_by_keywords(words):

"""Return a column name that best matches any keyword from words (simple
substring matching)."""

words = [[Link]() for w in words]

# prioritize exact or substring matches

for w in words:

for c in columns:

if w in [Link]():

return c

# try some common synonyms

synonyms = {

"mass": ["mass", "weight", "kg"],

20
"radius": ["radius", "size", "diameter"],

"distance": ["distance", "au", "km", "from sun", "distance_from_sun"],

"gravity": ["gravity", "g"],

"orbital_period": ["orbit", "orbital", "period", "year"],

"moons": ["moons", "satellite", "natural satellites", "number_of_moons",


"moon_count"]

for target, keys in [Link]():

for k in keys:

for c in columns:

if k in [Link]():

return c

return None

def is_numeric_column(col_name):

"""Try to infer whether a column is numeric by sampling a value."""

[Link](f"SELECT `{col_name}` FROM planets WHERE `{col_name}`


IS NOT NULL LIMIT 10;")

rows = [Link]()

for r in rows:

val = list([Link]())[0]

21
if isinstance(val, (int, float)):

return True

# numeric content may be string; try parse float

try:

float(str(val).replace(",", ""))

return True

except:

pass

return False

def answer_question(question):

q = [Link]()

if not q:

return "Please ask a question about the planets (e.g., 'Tell me about Mars' or
'List all planets')."

q_lower = [Link]()

# 1) "how many" -> count

if "how many" in q_lower or "number of" in q_lower or [Link](r"\bcount\b",


q_lower):

[Link]("SELECT COUNT(*) AS c FROM planets;")

22
cnt = [Link]()['c']

return f"There are {cnt} rows in the planets table (likely the number of
planets/entries): {cnt}."

# 2) "list" or "show all" or "all planets"

if "list" in q_lower or "show all" in q_lower or ("all" in q_lower and "planets" in


q_lower):

[Link]("SELECT * FROM planets;")

rows = [Link]()

if not rows:

return "No rows found in planets table."

# Return a brief table-like response (first 10 rows)

out = "Showing up to 10 rows:\n"

for r in rows[:10]:

out += " • " + ", ".join(f"{k}: {v}" for k, v in [Link]()) + "\n"

return out

# 3) direct planet info: "tell me about X", "details about X"

planet = find_planet_in_text(q)

if planet:

# look for a requested column: e.g., "mass of mars"

23
words = [Link](r"[A-Za-z0-9_]+", q)

col = find_column_by_keywords(words)

if col:

[Link](f"SELECT `{col}` FROM planets WHERE `{name_col}` =


%s LIMIT 1;", (planet,))

res = [Link]()

if res:

val = list([Link]())[0]

return f"{planet} — {col}: {val}"

else:

return f"No data found for {planet} in column {col}."

else:

# Return entire row for that planet

[Link](f"SELECT * FROM planets WHERE `{name_col}` = %s


LIMIT 1;", (planet,))

row = [Link]()

if row:

return " • " + "\n • ".join(f"{k}: {v}" for k, v in [Link]())

else:

return f"I couldn't find data for {planet}."

24
# 4) superlatives: "largest planet", "smallest planet", "heaviest", "most
moons"

if any(word in q_lower for word in ["largest", "biggest", "largest planet",


"biggest planet", "most massive", "heaviest", "most moons", "fewest moons",
"smallest"]):

# try to find a numerical column to order by

candidate_cols = [c for c in columns if any(k in [Link]() for k in


("mass","diameter","radius","size","volume","moons","count"))]

if candidate_cols:

# pick first numeric candidate

for c in candidate_cols:

if is_numeric_column(c):

order_col = c

break

else:

order_col = candidate_cols[0]

if "smallest" in q_lower or "fewest" in q_lower or "lightest" in q_lower:

direction = "ASC"

else:

direction = "DESC"

25
[Link](f"SELECT `{name_col}` AS name, `{order_col}` AS val
FROM planets WHERE `{order_col}` IS NOT NULL ORDER BY
CAST(`{order_col}` AS DECIMAL(30,8)) {direction} LIMIT 3;")

rows = [Link]()

if rows:

lines = [f"{r['name']}: {r['val']}" for r in rows]

return f"Top matches by `{order_col}` ({direction}):\n" + "\n".join(" • "


+ l for l in lines)

else:

return "Couldn't compute superlative: no numeric data present."

else:

return "I could not find any numeric column to decide 'largest' or 'smallest'
— check your SQL table schema."

# 5) column focused question without planet name: "what is the mass of


Earth?" handled earlier.

# Try to find a single column request: "what is the mass of which planet has
rings"

words = [Link](r"[A-Za-z0-9_]+", q)

col = find_column_by_keywords(words)

if col:

26
# If question asks "which planet has X" -> return planets with that
column matching keyword true/yes

if "which" in q_lower or "which planet" in q_lower:

# try boolean-like query (yes/true/1)

[Link](f"SELECT `{name_col}`, `{col}` FROM planets WHERE


`{col}` IS NOT NULL LIMIT 50;")

rows = [Link]()

# filter rows that contain 'yes' or 'true' or numeric > 0

matches = []

for r in rows:

val = list([Link]())[1]

if isinstance(val, (int, float)) and val > 0:

[Link](r)

elif isinstance(val, str) and [Link](r"\b(yes|true|y)\b", val,


flags=[Link]):

[Link](r)

if matches:

return "Planets matching:\n" + "\n".join(f" • {m[name_col]}


({col}={m[col]})" for m in matches)

else:

return f"No obvious matches found for column `{col}`."

27
else:

# general column return: return the column for all planets

[Link](f"SELECT `{name_col}`, `{col}` FROM planets LIMIT


100;")

rows = [Link]()

return " • " + "\n • ".join(f"{r[name_col]}: {r[col]}" for r in rows[:50])

# 6) fallback: show sample questions and the columns we detected (helpful


for the user)

hint = ("I couldn't map your question to a clear SQL query. "

"Try examples: 'List all planets', 'Tell me about Mars', 'How many planets?',
"

"'What is the mass of Jupiter?', 'Which planet has the most moons?'.\n"

f"Detected columns: {columns}\nDetected planet names: {planet_names}")

return hint

# a helper interactive function to chat a few times in notebook

def chat_demo():

print("Chat mode: ask questions about [Link] examples: 'List all planets',
'description of Mars', "

28
" mass of Jupiter?', 'moons in neptune?', 'diameter of venus'.\n"," Type 'exit'
or 'quit' to stop.")

while True:

q = input("You: ")

if not q or [Link]() in ("exit", "quit"):

print("[Link] back to explore the cosmos again soon. ✨")


break

print("Bot:", answer_question(q))

chat_demo()

29
OUTPUT SCREENSHOTS

30
Full output:-
Chat mode: ask questions about [Link] examples: 'List all planets',
'description of Mars', mass of Jupiter?', 'moons in neptune?', 'diameter of
venus'.

Type 'exit' or 'quit' to stop.

You: temperature of mercury

Bot: Mercury — mean_temperature_celsius: 167

You: diameter of venus

Bot: Venus — diameter_km: 12103.6

You: mass of earth

Bot: Earth — mass_kg: 5.9724e+24

You: description of mars

Bot: Mars — description: Often called the Red Planet, Mars has a thin
atmosphere, polar ice caps, canyons, and the largest volcano in the solar
system, Olympus Mons.

You: moons in jupiter

Bot: Jupiter — number_of_moons: 95

You: rings in saturn

Bot: Saturn — has_rings: 1

You: orbital days of uranus

Bot: Uranus — orbital_period_days: 30589.0

You: atmosphere in neptune

Bot: Neptune — atmosphere_composition: Hydrogen (~80%), Helium (~19%), Methane


(~1.5%)

You: exit

[Link] back to explore the cosmos again soon. ✨

31
CONCLUSION
The program allows users to ask questions in natural language and receive accurate
responses retrieved from a structured database. It eliminates the need for users to
learn complex SQL commands, bridging the gap between database management
and user-friendly interfaces.

Throughout the development, the project showcased various computer science


concepts: database handling, pattern recognition, conditional logic, and modular
programming. By making the chatbot rule-based, we ensured it delivers reliable
answers for common astronomy-related queries, while also keeping the design
simple enough for students to understand.

This project not only enhances knowledge about Python programming but also
provides practical exposure to real-world integration of SQL with applications. It
can be used in educational environments to engage students in learning about
planets, while also strengthening their programming fundamentals.

In conclusion, the chatbot proves that with the right combination of tools, even
complex data-driven interactions can be simplified into conversational systems,
paving the way for more advanced AI applications in the future.

32
FUTURE ENHANCEMENTS
Multi-Document Support:

●​ Enable the chatbot to analyze multiple documents simultaneously,


increasing its versatility.

User-Friendly Interface:

●​ Develop a GUI or integrate with platforms like Slack or Discord for


easier access.

Mobile and Web Deployment:

●​ Adapt the chatbot into a web or mobile application, broadening its


accessibility.

Advanced NLP Integration:

●​ Currently, the chatbot uses keyword-based detection. Integrating


libraries like NLTK or spaCy, or even large language models, would
allow it to understand more complex questions.

Machine Learning Ranking:

●​ Future versions could learn from user interactions. For instance, if


many users ask about "planet with rings", the system could
automatically improve its ranking of related queries.

33
BIBLIOGRAPHY

1.​Computer Science Textbook for Class XII- NCERT


2.​Computer Science with Python Textbook for Class XII- Sumita Arora
3.​ [Link]
C0Hy5ZypIvh?usp=sharing
4.​ [Link]
5.​ [Link]
ntelligence
6.​ [Link]
ot
7.​ [Link]
8.​ [Link]

34

You might also like