Research Paper

The document discusses the development of an AI-enabled legal chatbot designed to provide accessible legal information based on Indian law, particularly for rural and semi-urban populations who face challenges in understanding legal rights and processes. Utilizing Natural Language Processing and Large Language Models, the chatbot aims to deliver accurate responses in multiple languages, enhancing legal literacy and awareness among users. The system has demonstrated an accuracy of 82-87% in answering legal queries, with a response time of 2-4 seconds, making it a valuable tool for individuals seeking legal advice.

AI-Chatbot for Legal Advice Based on Indian Law

Hrutik Wakale, Department of Computer Science, Savitribai Phule Pune University, hrutikwakale27@[Link]
Ashlesha Pathade, Department of Computer Science, Savitribai Phule Pune University, ashleshapathade3917@[Link]
Pratiksha Mesat, Department of Computer Science, Savitribai Phule Pune University, mesatpratiksha13@[Link]

[Link]
Department of Computer Science
Savitribai Phule Pune University
[Link]@[Link]
Academic Year: 2025–2026

ABSTRACT - In rural and semi-urban areas of India, people have difficulty accessing reliable information about the law. The reasons include the high cost of legal assistance, the complexity of legal jargon, and limited support for multiple languages. Many individuals may not even be aware of their legal rights or of the framework available to exercise them. This paper addresses the design and development of an AI-enabled legal chatbot that provides legal information regarding India’s legislation. The system employs Natural Language Processing (NLP), information retrieval, and Large Language Models (LLMs) to formulate contextually relevant and precise responses. The chatbot also offers multilingual support to increase its usability for low-literacy individuals. In testing, the system provided relevant and accurate information for specific legal questions posed by users, with an accuracy of 82-87%. On average, responses are returned in 2-4 seconds, which helps reduce user frustration. Ultimately, the system’s purpose is to provide legal information and support the existing legal aid infrastructure in India, thereby increasing users’ awareness of their legal rights and their level of digital literacy.

Key Terms - Legal Chatbot, Laws of India, Natural Language Processing, Large Language Models, Multilingual Frameworks, Legal Technology, Retrieval-Augmented Generation.

I. INTRODUCTION

The refinement of Legal Technology (LegalTech), along with the integration of Artificial Intelligence (AI) and Natural Language Processing (NLP), has made the process of accessing justice and legal information easier.

India's judiciary system has enormous case backlogs. Many citizens are unaware of their fundamental legal entitlements. They do not know processes such as filing First Information Reports (FIRs), making consumer complaints, and addressing labor disputes and property disputes.

Traditional legal assistance has been, and continues to be, out of reach for many: services are usually expensive, time-consuming, and inaccessible to the lower socio-economic strata.

Legal and information services are provided by platforms such as Indian Kanoon, VakilSearch, and LawRato. However, these services are lawyer-centric, cover limited fields of law, or are available only in English. They are therefore under-utilized or completely unused by the rural and semi-urban populace, non-English-speaking citizens, and the less literate population. This creates a strong case for a legal advisory system with a citizen-centric design, multilingual support, and a voice-controlled interface.

This paper presents an AI-based legal chatbot that serves as the first point of contact for legal information for users in India.

The system is built to articulate legal rules in simplified terms, assist users in finding pertinent laws, and outline suggested actions, all while stating that the system does not provide legal services.

II. LITERATURE SURVEY

The introduction of AI into legal assistance has opened up work across several areas, including, but not limited to, legal natural language processing, question answering, argumentation mining, and retrieval-augmented generation.
Chalkidis et al. demonstrated that LEGAL-BERT, a BERT model trained on data from the legal field, outperforms general-purpose BERT models on legal tasks. Despite their effectiveness, such models require extensive legal datasets and substantial computing resources.

Lewis et al. introduced Retrieval-Augmented Generation (RAG), which integrates neural retrieval with generative models to reduce hallucination and ground the model's answers in real documents. This technique is pivotal for legal chatbots, which must justify their answers from legal documents.

Savelka and Ashley proposed models to retrieve and structure legal knowledge from statutes, making it easier to connect a user's question to a specific legal document.

Lippi and Torroni studied argumentation mining techniques that enhance the explainability of legal systems by identifying claims and the evidence that supports them. Argumentation mining is well suited to processing lengthy, cumbersome legal documents.

Indian legal research and practice are complicated by multilingual document production and by a lack of uniformity across legal documents and the legal system. Systems like DoNotPay demonstrate some form of AI assistance in the area of legal guidance. However, they function primarily in Western countries and rely heavily on template-based approaches.

The most recent research in India on legal question answering with large language models reports a higher level of fluency, but also cautions against hallucination in models that are not grounded in credible legal sources.

III. Research Gap

While much has been done on integrating Artificial Intelligence and Natural Language Processing to develop legal assistance systems, there remain significant gaps, especially with regard to Indian laws.

Most AI-based legal chatbots and question-answering systems are designed for Western legal systems and are of little use for the Indian judicial system. This is primarily because Indian laws are intricate, multifaceted, and rapidly evolving.

The legal information systems available in India are mostly primitive, based on simple keyword searching or static data retrieval. They also lack conversational understanding and contextual reasoning.

Although recent Large Language Models provide legal answers with much greater naturalness and coherence, many of these systems rely on purely generative models, in which the response is not connected to a legitimate legal reference.

Over-reliance on generic LLMs can result in fabricated and erroneous data, which is why these systems are not viable for legal counselling.

Moreover, there have been few attempts to incorporate retrieval-augmented generation strategies expressly tailored to Indian statutory texts. The absence of multilingual and voice-assisted legal assistance systems is also a critical deficiency.

Because a majority of the Indian population is not proficient in English, the available systems further marginalize those who communicate in regional languages. Current systems also do not address the problems that arise from low literacy.

There is also a structural data problem: no provider offers a well-structured, annotated dataset for Indian legal question answering.

This creates a threefold problem: it is difficult to determine user intent, to provide the user with an answer to the legal question, and to assess and evaluate legal chatbots under existing conditions.
Finally, most researchers do not perform sufficient real-user testing and disregard the ethical concerns of misinformation, user privacy, and the distinction between legal information and legal advice. These gaps demonstrate the need for an ethically designed, multilingual, retrieval-augmented, and legally accurate AI chatbot that caters to the Indian legal system, which is the objective of the proposed system.

IV. OBJECTIVES AND SYSTEM OVERVIEW

We intend to build an AI-driven legal chatbot that will provide users with preliminary legal advice based on relevant Indian statutes and judicial pronouncements.

Furthermore, the chatbot will have a user-friendly interface, be multilingual, and support input and output via text and audio.

The primary objectives of the system are the following:

1. To build a legal chatbot that can answer natural-language questions concerning the laws and legal procedures in India.

2. To answer questions in the legal domain using retrieval-augmented generation models.

3. To work in multiple languages and support audio input and output, making the system more usable.

4. To make certain that the user’s data is private, secure, and used ethically.

V. METHODOLOGY

The proposed architecture is a layered one consisting of a User Interface layer, an Application Logic layer, an AI and NLP Processing layer, and a Legal Knowledge Base layer.

A. Collection of Data

The legal knowledge base is constructed from authentic legal resources such as the India Code Portal, eCourts, and selected legal FAQs.
Legal content is organized into primary divisions and sub-divisions such as Criminal Law (IPC, CrPC), Civil Law (CPC), Constitutional Law, and Consumer Protection Law.

B. Data Preprocessing

The collected legal texts are subjected to standard NLP procedures, including tokenization, stop word removal, lemmatization, and named entity recognition.
This standardization keeps legal documents consistent and aids the extraction of legal entities such as Acts, Sections, and entitlements.

C. Comprehension of Queries

Intent classification and comprehension of user queries are performed via BERT and Legal-BERT models.
This helps identify the area of law a query concerns.

D. Retrieval of Information

Using Sentence-BERT embeddings, the system links user queries to legal documents in the knowledge base by semantics, not merely keywords, so that responses draw on relevant legal sources.

E. Response Generation

Legal content is simplified using a Large Language Model (Google Gemini), which makes the explanations easier to follow.
The strategy integrates retrieval with generation to ensure conversational responses are accurate and comprehensible.

F. Multilingual and Voice Support

The system integrates translation services with speech-to-text and text-to-speech functionalities to offer voice support to users in several Indian languages.
Users with low literacy can use voice-based queries to interact with the system more easily.
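The retrieval and response-generation steps of the methodology (subsections D and E) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: a bag-of-words vector stands in for Sentence-BERT embeddings, the two knowledge-base entries are made-up placeholders, and the hypothetical `build_prompt` helper only assembles the text that would be passed to a language model. No model is called.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries; the wording is illustrative only.
KNOWLEDGE_BASE = [
    {"source": "Consumer Protection Act (placeholder entry)",
     "text": "A consumer may file a complaint about defective goods or deficient services."},
    {"source": "Code of Criminal Procedure (placeholder entry)",
     "text": "Police record a first information report when a cognizable offence is reported."},
]

def embed(text):
    """Toy embedding: lowercase word counts. A sentence encoder would replace this."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=1):
    """Return the k knowledge-base entries most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: cosine(q, embed(doc["text"])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble a grounded prompt: retrieved passages first, then the question."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return ("Answer in simple language using only the passages below, "
            "and cite the source in brackets.\n\n"
            f"Passages:\n{context}\n\nQuestion: {query}")
```

Calling `build_prompt(q, retrieve(q))` for a question `q` yields a prompt whose context is restricted to retrieved passages, which is the mechanism by which retrieval-augmented generation ties an answer to a citable source.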
VI. DATASET DESCRIPTION

The performance and reliability of the proposed AI-based legal chatbot depend on the quality and structure of the datasets used for training, retrieval, and evaluation.

Multiple datasets were integrated to reflect real user needs and to cover the various components of the Indian legal system.

A. Legal Knowledge Base

The primary legal knowledge base was created from credible, legally permissible sources on Indian law, including eCourts summaries and other legal documents, all of which were rigorously processed.

The collected legal texts contain provisions from the primary branches of law, such as the Indian Penal Code (IPC), Code of Criminal Procedure, Code of Civil Procedure, Consumer Protection Act, and relevant Constitutional provisions of India.

To the best of our ability, the documents were checked for accuracy and timeliness.

Legal texts were broken down into smaller units at the section and subsection level, which makes retrieval more efficient and improves hit rates.

B. Curated Question–Answer Dataset

A sample of approximately 500 question and answer pairs was created for user intent, system evaluation, and response evaluation purposes. These pairs correspond to the legal questions most frequently asked by citizens of India regarding procedures, eligibility criteria, and mechanisms for lodging a complaint.
The responses were reviewed for jurisprudential correctness and consistency, and were sourced to the corresponding legal provisions.

C. Evaluation of the Query Dataset

For evaluation purposes, a separate dataset of 200 user queries was created. The queries varied in linguistic structure, degree of abstraction, and the specific field of law addressed. The dataset reflected real-world situations by including both straightforward queries and queries involving layered, complex procedures. To support accuracy evaluation, each query was paired with the legal provision, that is, the relevant Act and section, expected to be cited in the response.

D. Dataset in Several Languages

The queries were translated into Hindi and Marathi to assess the performance of the chatbot in other languages, as Hindi and Marathi are among the most spoken languages in India.

Translations were performed with the help of commonplace tools and reviewed manually to maintain the integrity of the legal meaning and terminology. This dataset showcased the chatbot’s ability to deal with legal questions irrespective of language, and exposed the language-related problems that arise when legal questions are evaluated in other languages.

E. Data Preprocessing and Validation

The entire dataset underwent preprocessing, which included, among other things, text normalization, tokenization and lemmatization, reduction of the text to its essentials, and removal of superfluous and repeated text.
Through named entity recognition, legal terms such as Acts and sections, as well as rights, were cataloged.
The data was balanced across several domains of law (to counter bias and redundancy), and duplicates were removed.
To determine the trustworthiness and relevance of the data for legal counseling, the datasets were compared to credible legal sources.

VII. RESULTS AND DISCUSSION

The aim of this study is to determine how well the AI-based legal chatbot comprehends the user query, and how well it identifies the pertinent legal principle and offers a clear and useful response.
The evaluation covers the accuracy of the response, the time taken by the chatbot to respond, its performance in different languages, and the users' satisfaction and experience.
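As a rough illustration of the preprocessing described in Section VI.E, the sketch below normalizes whitespace, tokenizes with stop word removal, and pulls out section references. It is a simplified stand-in under stated assumptions: the stop word list is a tiny hand-picked sample, lemmatization is omitted, and a regular expression replaces a trained named entity recognizer. The function names are hypothetical.

```python
import re

# Tiny illustrative stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "of", "a", "an", "is", "in", "to", "and", "under"}

# Matches references such as "Section 154" or "section 2(7)".
SECTION_RE = re.compile(r"\b[Ss]ection\s+(\d+[A-Z]?(?:\(\d+\))?)")

def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]

def extract_sections(text):
    """Return the section numbers referenced in the text, in order."""
    return SECTION_RE.findall(text)
```

A rule-based extractor like this is easy to audit, which matters when every answer is expected to cite an Act and section; a learned recognizer would be needed for less regular entities such as rights.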
A. Evaluation Setup

To evaluate the system, a collection of 200 user questions was prepared.

The questions spanned multiple domains of the law, including criminal law, consumer protection law, constitutional law, and civil law. Each question was accompanied by the official legal sources needed to identify the appropriate legal rule.

The accuracy of the chatbot was evaluated by determining whether the answer it provided cited the legal rule most relevant to the question. A response was judged correct if the legal rule matched the expected answer and the legal explanation was directly relevant to the question.

B. Analysis of Accuracy

The overall system performance was good, with an accuracy of 82-87% across all branches of law examined.

Accuracy was highest for straightforward and frequently referenced laws, such as those of consumer protection and elementary criminal law, and lower for questions that were more complex or indeterminate and involved several branches of law.

The improvement in identifying the relevant legal provisions, compared with keyword-only matching, is attributed to the use of Sentence-BERT to capture semantic equivalence.

Grounding responses in legal provisions also mitigated the problem of fabrication, which is common in text-based models.

C. Performance Regarding Response Time

Response time was measured as the interval from the moment a user posed a question to the moment a response was provided.

The average response time ranged from 2-4 seconds, depending on the complexity of the question and the number of legal materials the chatbot needed to process. Response time was shorter for simple questions that required fewer references and longer for complex questions that required more legal components.

Even though the chatbot requires a few extra seconds to look up and reference the necessary legal codes, the average latency is still short and conversations can occur in real time.

D. Multilingual Evaluation

To evaluate multilingual capability, the chatbot was tested in three languages: English, Hindi, and Marathi. Accuracy was highest for English questions, while Hindi and Marathi questions showed only a small decrease in accuracy. Some users expressed that a multilingual chatbot was easier to use and more beneficial to users who lacked English proficiency. This shows a strong demand for multilingual chatbots in countries like India, where the population speaks a multitude of languages.

E. Performance Evaluation Table

Metric                        Value
Query–Law Mapping Accuracy    82-87%
Average Response Time         2-4 sec
Supported Languages           Hindi, Marathi, English
Queries Tested                2000+
User Satisfaction Score       4.3 / 5

F. Dataset Composition Table

Legal Domain          No. of Documents
Indian Penal Code     200+
Constitutional Law    210+
Family Law            170+
Criminal Laws         250+
Civil Laws            210+
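The two headline measurements, query-to-law mapping accuracy and response time, can be computed along the following lines. This is a sketch under stated assumptions rather than the evaluation harness used in the study: `chatbot` is a hypothetical callable returning the cited provision and the answer text, and correctness here checks only that the cited provision equals the expected one, whereas the study also judged whether the explanation was relevant.

```python
import time
from statistics import mean

def evaluate(chatbot, test_set):
    """Score a chatbot on (query, expected_provision) pairs.

    chatbot: hypothetical callable, query -> (cited_provision, answer_text).
    Returns accuracy in [0, 1] and the mean latency in seconds.
    """
    correct = 0
    latencies = []
    for query, expected in test_set:
        start = time.perf_counter()            # user poses the question
        cited, _answer = chatbot(query)
        latencies.append(time.perf_counter() - start)  # response received
        if cited == expected:
            correct += 1
    return {
        "accuracy": correct / len(test_set),
        "avg_latency_s": mean(latencies),
    }
```

As a worked instance of the accuracy arithmetic, 170 correct mappings out of 200 test queries would give 170 / 200 = 0.85, which falls inside the reported 82-87% band.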
VIII. Architecture Figure
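The four layers named in the methodology (User Interface, Application Logic, AI and NLP Processing, Legal Knowledge Base) can be sketched as a chain of calls. This is only a schematic reading of the layered design: the class names are hypothetical, and a plain keyword lookup stands in for the intent, retrieval, and generation models.

```python
from dataclasses import dataclass, field

@dataclass
class LegalKnowledgeBase:
    """Legal Knowledge Base layer: provision label -> provision text."""
    documents: dict = field(default_factory=dict)

    def search(self, term):
        term = term.lower()
        return [(label, text) for label, text in self.documents.items()
                if term in text.lower()]

@dataclass
class AIProcessingLayer:
    """AI and NLP Processing layer (keyword lookup stands in for the models)."""
    kb: LegalKnowledgeBase

    def answer(self, query):
        for word in query.lower().split():
            word = word.strip("?.,!")
            if len(word) > 3:                  # skip very short words
                hits = self.kb.search(word)
                if hits:
                    label, text = hits[0]
                    return f"{text} (Source: {label})"
        return "No matching provision was found."

@dataclass
class ApplicationLogic:
    """Application Logic layer: validates input and routes it onward."""
    ai: AIProcessingLayer

    def handle(self, raw_query):
        query = raw_query.strip()
        if not query:
            return "Please enter a question."
        return self.ai.answer(query)

def user_interface(app, message):
    """User Interface layer: reduced here to a function returning reply text."""
    return app.handle(message)
```

Each layer only talks to the one directly beneath it, so the keyword lookup could be swapped for an embedding-based retriever without touching the interface or the input validation.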

IX. Discussion

Across the evaluations, the system achieved a good balance of accuracy, speed, and user experience.
The chatbot's most significant contribution to the user experience is its conversational language functionality, together with the ability to search for and retrieve answers to questions from within the system.
Unlike legal websites, the chatbot offered users, especially first-time users, an experience that is beneficial to learning.
On the other hand, the challenges include legal arguments, multi-issue case handling, and inter-jurisdictional legal variations.

These challenges open opportunities for additional research, including model refinement for particular branches of law and enhancing the models’ reasoning for more complex scenarios.

X. Conclusion and Future Work

This paper illustrates the design and implementation of an AI-based legal chatbot. The aim of the chatbot is to facilitate access to legal information and answer questions pertaining to legislation in India.

The proposed system combines Natural Language Processing, retrieval-augmented generation, and large language models. This allows it to understand a user query, obtain relevant legal statutes from verified sources, and respond with simplified, context-based answers. The system also incorporates multilingual and voice-based options to increase accessibility, especially for users in rural and semi-urban locations where the population is less legally literate.

In experimental evaluation, the system achieved an accuracy of 82-87% in mapping user queries to the respective legal statutes and an average response time of 2-4 seconds.

Users found the system satisfactory, particularly in how it explains its answers and responds to questions. This supports the effectiveness of the approach as a first-line legal guidance tool and points to the system’s potential in the furtherance of legal aid and digital empowerment in India.

The chatbot’s features are limited to factual information and cannot replace legal counsel. Responses may be unreliable for complex, unclear, cross-border, or multi-layered questions, and the chatbot’s capacity to address issues of state law may be limited. Over the next few months, we plan to broaden the Knowledge Base to cover more areas and state laws, enhance regional Indian language translation, and add human-in-the-loop review for sensitive and high-risk issues.

Future work may also include integration with e-Courts and platforms providing legal aid, use of explainable AI for enhanced transparency, and studies of deployment at scale in real-world contexts to evaluate the potential for sustained use and impact.

References

[1] K. L. Srujan Surya, E. Mark K., K. Srujan, P. Charan Tej, and A. S. F., “AI-powered interactive legal chatbot for the Department of Justice,” Int. J. Computational Learning & Intelligence, vol. 4, no. 4, pp. 809–817, May 2025.

[2] A. Acharya, V. Ravikumar, and B. K. Depuru, “AI-powered legal chatbot for litigative situations: Leveraging BNS, BNSS, and BSA with contextual document integration,” Int. J. Innovative Science & Research Technology, vol. 10, no. 3, pp. 1612–1617, Mar. 2025.

[3] M. G. Mahalakshmi, R. Shetty, S. S. Benur, S. M., and Swara, “A literature review on AI attorney chatbot for legal assistance,” Int. Adv. Res. J. Sci., Eng. & Technology, vol. 12, no. 3, Mar. 2025.

[4] A. Asokan, B. K. Bineesha, S. P. M., J. P. Abraham, and R. Shibu, “Indian legal text summarization using InCaseLaw-BERT,” Int. J. Res. Appl. Sci. & Eng. Technol. (IJRASET), 2025.

[5] N. N. Shreenandhan and S. Sujatha, “LawyerBot: An AI assistant for legal guidance in India,” JETIR, vol. 12, no. 5, May 2025.

[6] A. D. Kamble and D. A. Kamble, “Innovating legal aid with AI: A chatbot framework for deep learning-based IPC information and assistance,” Int. J. Innovative Research in Science, Engineering and Technology (IJIRSET), vol. 13, no. 8, Aug. 2024.

[7] S. K. Nigam, B. D. Patnaik, S. Mishra, N. Shallum, K. Ghosh, and A. Bhattacharya, “Legal question-answering in the Indian context: Efficacy, challenges, and potential of modern AI models,” arXiv preprint, Sep. 2023.

[8] P. Gandhi and V. Talwar, “Artificial intelligence and ChatGPT in the legal context,” Indian Journal of Medical Sciences, vol. 75, no. 1, pp. 1–2, Mar. 2023.

[9] M. Boopathi, M. Hemanthkumar, S. Manimaran, S. Mathankumar, and R. Hemavathy, “Legal information chatbot,” Int. J. Research in Engineering, Science & Management, 2025.

[10] D. Panchal, A. Gole, V. Narute, and R. Joshi, “LawPal: A retrieval-augmented generation based system for enhanced legal accessibility in India,” arXiv preprint, Feb. 2024.
