Accessible Legal Assistance through NLP: Summarization and Recommendation
of Indian Legal Documentation with Kannada Translation
_____________________________________________________________________________________
1. INTRODUCTION
Reading legal documents and case law is tedious for most people and usually requires
domain-specific knowledge. It is even harder for those who are not proficient in the
primary language of the legal domain. As a result, a majority of the population has
difficulty understanding legal documents, which can affect their decision making and
their ability to exercise their rights.
The system we are building addresses this gap by combining document summarisation,
personalized case recommendation, and translation into Kannada. Using NLP techniques,
we plan to extract key information from legal documents, produce summaries, and suggest
cases that are similar to the user's legal situation. We will also translate the summaries
into Kannada, making legal information accessible to the Kannada-speaking population.
_____________________________________________________________________________________
Dept. of CSE Jan - May, 2024 Page No.
7
2. PROBLEM DEFINITION
The problem we are addressing is the lack of accessibility and comprehension of legal
information for the common man, especially the Kannada-speaking population. A major
portion of the population still struggles to understand legal documents because of the
language barrier, since the main language used is English. Concise summaries of legal
documents and related cases are also scarce, and Kannada translations of them even more
so. We plan to address these issues through our system, improving access to legal
information and helping the common man make better-informed decisions. We plan to
achieve this using natural language processing and other machine learning models.
3. LITERATURE SURVEY
3.1 Evaluating the Factuality of Zero-shot Summarizers Across Varied
Domains
3.1.1 Objective
To understand the performance of zero-shot summarization models across
domains, which include legal and biomedical texts.
3.1.2 Features
GPT 3.5 and Flan-T5-XL models were used for zero-shot summarization.
3.1.3 Results
Observed that inaccuracies were more likely in news article summaries
compared to legal and biomedical domains. Highlighted the need for manual
evaluations or new metrics for specialized domains.
3.2 A review of generalized zero-shot learning methods
3.2.1 Objective
Provide a comprehensive review of generalized zero-shot learning (GZSL)
methods and their representative models.
3.2.2 Features
Discussed inductive and transductive GZSL, Global Semantic Consistency
Network (GSC-Net), and Word2Vec.
3.2.3 Results
Highlighted challenges such as the Hubness problem and projection domain
shift problem in GZSL methods.
3.3 InSaAF: Incorporating Safety through Accuracy and Fairness| Are
LLMs ready for the Indian Legal Domain
3.3.1 Objective
Propose a framework to quantify the legal decision-making capability of large
language models (LLMs) and fine-tune them for the Indian legal domain.
3.3.2 Features
Used binary statutory reasoning, fairness-accuracy tradeoff, and the LLaMA 7B
and LLaMA-2 7B models.
3.3.3 Results
Introduced the β-weighted Legal Safety Score metric and showed that fine-tuning
increases the safety and usability of LLMs in the legal domain.
3.4 Legal case document similarity: You need both network and text
3.4.1 Objective
Improve the state-of-the-art for estimating similarity between legal case documents
using both network and text features.
3.4.2 Features
Implemented Prior-case Citation Network (PCNet), Heterogeneous network of
statutes, Bibliographic Coupling, and Co-citation.
3.4.3 Results
Proposed Hier-SPCNet and combined text and network similarity signals to
improve document similarity estimation.
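The two citation-based signals named in 3.4.2, bibliographic coupling and co-citation, can be illustrated on a toy citation graph. This is a minimal sketch and not the paper's method: the case identifiers, the graph itself, and the use of Jaccard overlap as the normalization are assumptions made for illustration.

```python
# Toy citation graph: each case maps to the set of prior cases it cites.
# All case identifiers are hypothetical.
CITES = {
    "A": {"P1", "P2", "P3"},
    "B": {"P2", "P3", "P4"},
    "C": {"A", "B"},
    "D": {"A", "B", "P1"},
}


def jaccard(x, y):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def bibliographic_coupling(a, b, cites=CITES):
    """Two cases are coupled when they cite the same prior cases."""
    return jaccard(cites.get(a, set()), cites.get(b, set()))


def co_citation(a, b, cites=CITES):
    """Two cases are co-cited when the same later cases cite both of them."""
    cited_by_a = {c for c, refs in cites.items() if a in refs}
    cited_by_b = {c for c, refs in cites.items() if b in refs}
    return jaccard(cited_by_a, cited_by_b)


coupling_ab = bibliographic_coupling("A", "B")  # A and B share P2 and P3
cocitation_ab = co_citation("A", "B")           # both are cited by C and D
```

A text-based similarity score could be combined with these network scores, for example by a weighted average, but the weighting would have to be tuned on real data.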
3.5 ArgLegalSumm: Improving abstractive summarization of legal
documents with argument mining
3.5.1 Objective
Improve abstractive summarization of legal documents by incorporating
argument
mining techniques.
3.5.2 Features
Experimented with pre-trained language models like BART, T5, Pegasus, and
Longformer.
3.5.3 Results
Demonstrated that representing argument roles using fine-grained labels effectively
improves the output of the Longformer model.
3.6 Legal Case Document Summarization: Extractive and Abstractive
Methods and Their Evaluation
3.6.1 Objective
Evaluate and compare extractive and abstractive summarization methods for legal
case documents.
3.6.2 Features
Used domain-independent, domain-specific, and transformer-based models.
3.6.3 Results
Found that domain-specific training/fine-tuning and chunking-based approaches
performed better, especially for long legal documents.
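A chunking-based approach of the kind mentioned above splits a long judgment into windows that fit a model's input limit, summarizes each window, and joins the partial summaries. The sketch below is illustrative only: the window size, the overlap, and the placeholder summarizer `first_words` are assumptions, and a real system would pass a trained summarization model as `summarize_chunk`.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping word windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def summarize_long(text, summarize_chunk, max_words=200, overlap=20):
    """Summarize each chunk independently and join the partial summaries."""
    parts = chunk_words(text, max_words, overlap)
    return " ".join(summarize_chunk(part) for part in parts)


def first_words(chunk, n=5):
    """Placeholder summarizer (hypothetical): keeps the first n words of a chunk."""
    return " ".join(chunk.split()[:n])


document = " ".join(f"w{i}" for i in range(450))
chunks = chunk_words(document, max_words=200, overlap=20)
summary = summarize_long(document, first_words)
```

The overlap keeps sentences that straddle a window boundary visible to both neighbouring chunks, at the cost of some repeated processing.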
3.7 Improving abstractive summarization of legal rulings through
textual entailment
3.7.1 Objective
Improve abstractive summarization of legal rulings by incorporating
textual entailment.
3.7.2 Features
Proposed the "LegalSumm" model that generates multiple summary versions and
uses an entailment module to ensure faithfulness.
3.7.3 Results
LegalSumm is an effective abstractive method for summarizing legal rulings,
handling long texts, and minimizing hallucinations.
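The generate-then-filter idea described in 3.7.2 can be sketched as selecting, among several candidate summaries, the one a faithfulness scorer rates highest. This sketch is an assumption-laden illustration: `support_score` is a crude token-overlap stand-in for an entailment model, not the module used in the paper, and the example texts are invented.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def support_score(source, candidate):
    """Placeholder for an entailment model: the fraction of the candidate's
    tokens that also occur in the source. This is not a real entailment check."""
    cand = tokens(candidate)
    if not cand:
        return 0.0
    src = set(tokens(source))
    return sum(t in src for t in cand) / len(cand)


def pick_most_faithful(source, candidates, score=support_score):
    """Return the candidate summary with the highest faithfulness score."""
    return max(candidates, key=lambda c: score(source, c))


source = "The court dismissed the appeal and upheld the conviction."
candidates = [
    "The court allowed the appeal and ordered a retrial.",
    "The court dismissed the appeal.",
]
best = pick_most_faithful(source, candidates)
```

Swapping `support_score` for a trained natural language inference model would keep the selection logic unchanged.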
3.8 Quick Check: A Legal Research Recommendation System
3.8.1 Objective
Develop a legal research recommendation system that automatically extracts key
arguments from a brief and provides relevant precedents.
3.8.2 Features
Used two-stage SVM-based ranking models and a legal topic classifier.
3.8.3 Results
The Quick Check system effectively recommended highly relevant case law
opinions to support legal arguments.
3.9 Similar cases recommendation using legal knowledge graphs
3.9.1 Objective
Recommend similar legal cases using a legal knowledge graph and graph
neural network models.
3.9.2 Features
Constructed a legal knowledge graph, used Latent Dirichlet Allocation (LDA) for
feature selection, and applied Relational Graph Convolutional Networks
(RGCN).
3.9.3 Results
Encoding node features using the pre-trained LegalBERT model improved performance
on the citation link prediction task.
3.10 ILDC for CJPE: Indian legal documents corpus for court
judgment prediction and explanation
3.10.1 Objective
Propose the task of court judgment prediction and explanation (CJPE) and introduce
the ILDC dataset for this task.
3.10.2 Features
Baseline models reached 78% accuracy, below that of human legal experts, and struggled
to provide accurate explanations.
3.10.3 Results
Highlighted the need for improving the accuracy of explanation generation for
court judgment prediction.
3.11 Summarizing Legal Rulings: Comparative Experiments
3.11.1 Objective
Compare abstractive and extractive summarization models for legal rulings.
3.11.2 Features
Evaluated models like NMTSmall, NMTMedium, Transformer, Luhn, LexRank,
and SumBasic.
3.11.3 Results
Abstractive approaches significantly outperformed extractive methods in terms of
ROUGE scores, but still faced challenges like repeated expressions and introducing
unrelated subjects.
3.12 LEGAL-BERT: The Muppets straight out of Law School
3.12.1 Objective
Develop LEGAL-BERT, a BERT-based model pre-trained on legal domain-specific
corpora.
3.12.2 Features
Compared using the original BERT out-of-the-box, adapting BERT with additional
pretraining, and exploring a broader hyperparameter search space.
3.12.3 Results
Smaller BERT-based models can be competitive with larger models in specialized
domains like the legal field.
3.13 How Ready are Pre-trained Abstractive Models and LLMs for
Legal Case Judgement Summarization
3.13.1 Objective
Compare the performance of LLMs, extractive models, and abstractive models
for legal case judgment summarization.
3.13.2 Features
Evaluated models like LegPegasus-IN and LegLED-IN.
3.13.3 Results
Legal domain-specific abstractive models achieved the best metric scores,
outperforming both LLMs and extractive models. However, challenges
remained with inconsistencies and hallucinations in the generated
summaries.
3.14 Semantics and structure based recommendation of similar legal
cases
3.14.1 Objective
Recommend similar legal cases by integrating semantic and structural
information from the case texts.
3.14.2 Features
Used Latent Semantic Analysis (LSA), TextRank, and methods to structure the
unstructured verdict text.
3.14.3 Results
The integrated approach of latent semantics and structure demonstrated
improved performance in finding similar criminal cases compared to traditional
methods.
3.15 BERT_LF: A similar case retrieval method based on legal facts
3.15.1 Objective
Propose a legal case representation method based on legal facts and topic
distribution to improve case retrieval.
3.15.2 Features
Implemented BERT-LF, which combines semantic information, topic distribution,
and legal entity facts.
3.15.3 Results
BERT-LF outperformed traditional bag-of-words retrieval models and BERT-based
models in legal case retrieval tasks.
3.16 Lawsum: A weakly supervised approach for Indian legal
document summarization
3.16.1 Objective
Develop a neural network-based approach for Indian legal document
summarization.
3.16.2 Features
Used a 2-layer Bidirectional LSTM model.
3.16.3 Results
The neural summarization approach significantly outperformed popular extractive
summarization techniques, with best performance in the Intellectual
Property domain.
3.17 Conditional abstractive summarization of court decisions for
laymen and insights from human evaluation
3.17.1 Objective
Generate summaries of court decisions that are easily understandable for laymen,
not just legal experts.
3.17.2 Features
Used a question-answer-decision triplet and a fine-tuned BARThez model.
3.17.3 Results
The best model achieved an average ROUGE-1 score of 37.7 and highlighted
the importance of manual evaluation for improving layman-oriented summaries.
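ROUGE-1, the metric reported above, measures unigram overlap between a generated summary and a reference summary. The sketch below is a simplified version for illustration: it assumes whitespace tokenization and lowercasing, with no stemming or stopword handling, so its values will not match those of a standard ROUGE toolkit exactly. Published scores such as 37.7 are usually the F1 value scaled to 0-100.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 between two texts."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example sentences; 4 of the candidate's 6 unigrams match the reference.
precision, recall, f1 = rouge1(
    "the court dismissed the appeal",
    "the court rejected the appeal today",
)
```

Because the metric only counts shared words, a summary can score well while still misstating the ruling, which is one reason the surveyed work stresses manual evaluation.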
3.18 Improving Access to Justice for the Indian Population: A
Benchmark for Evaluating Translation of Legal Text to Indian
Languages
3.18.1 Objective
Construct a high-quality legal parallel corpus in English and nine Indian languages,
and benchmark the performance of various Machine Translation (MT) systems.
3.18.2 Features
Identified common errors in MT systems, such as extra words, mistranslation,
and untranslated portions.
3.18.3 Results
Advocated for human evaluation of legal translations using metrics like
Preservation of Meaning, Suitability for Legal Use, and Fluency.
3.19 Lawrec: automatic recommendation of legal provisions based on
legal text analysis
3.19.1 Objective
Enhance legal recommendation by integrating legal provisions with case
descriptions using advanced technologies.
3.19.2 Features
Leveraged BERT and Skip-Recurrent Neural Network (Skip-RNN) models for text
understanding and feature extraction.
3.19.3 Results
LawRec demonstrated a 92% accuracy rate, outperforming existing methods by
12%, and showcased the effectiveness of integrating legal knowledge for precise
legal recommendations.
3.20 Natural Language Processing and Machine Learning for Law and
Policy Texts
3.20.1 Objective
Discuss the role of NLP and machine learning in analyzing legal texts, including
sentiment analysis, text summarization, and topic modeling.
3.20.2 Features
Highlighted the importance of domain-specific training data and the effectiveness
of machine-learned NLP models in legal text analysis.
3.20.3 Results
Suggested that these techniques can help in summarizing law patterns, which can
be further used for similar case recommendation and related sections.
Summary of Literature Survey:
• The research focuses on developing NLP techniques for summarizing, extracting insights,
and recommending legal documents, to improve accessibility for non-expert users.
• Major challenges include handling the complexity, specialized terminology, and length
of legal texts, which existing NLP models struggle with.
• Abstractive summarization approaches generally outperform extractive methods, but
suffer from issues like hallucinations and factual inaccuracies.
• Integrating domain knowledge through specialized legal language models, knowledge
graphs, and combining semantic and structural features improves performance on legal
text understanding tasks.
• Key research directions include reducing hallucinations in abstractive summaries,
leveraging legal domain knowledge, and creating benchmarks for evaluating NLP
systems on Indian legal data.
4. DATA
4.1 Overview
The availability of these specialized legal datasets has been crucial for the development and
assessment of NLP techniques tailored to the complexities of legal text. However, the papers
also highlight the need for more high-quality, multilingual legal datasets, especially for
lower-resource Indian languages, to further advance research in this domain.
4.2 Datasets
• Indian Legal Documents Corpus (ILDC):
A large corpus of 35,000 Indian Supreme Court cases, annotated with the original court
decisions. This dataset has been verified by legal experts and was therefore used for
court judgment prediction and explanation.
• LegalSumm:
A dataset of court rulings that has been used for evaluating extractive and abstractive
summarisation models in the legal domain.
5. SYSTEM REQUIREMENTS SPECIFICATION
5.1. Project Scope
The system we aim to develop is a user-friendly interface that uses natural
language processing to summarize and translate legal documents for the Kannada-speaking
population. Its main objective is to provide summarized legal information, case
recommendations, and Kannada translation, improving access to legal information and
supporting better decision making.
1. Document Summarization: The system will use NLP to analyze the input, which is a
description of the user's legal situation, and generate abstractive summaries of
relevant legal cases from the Indian courts.
2. Personalized Recommendations: The user's input is matched against similar legal
cases, and the relevant cases are recommended to the user.
3. Kannada Translation: The summaries and recommendations will be translated into
Kannada to cater to the Kannada-speaking population.
4. Expansion to Lower Courts: Our future plan includes integrating lower court data
into the system to widen its coverage and increase accessibility.
Goals:
1. To create a user-friendly platform that takes a legal situation as input and returns
summarized legal content along with recommendations and translation.
2. To curate and integrate a wide variety of legal documentation datasets from Indian
courts, including the High Court and the Supreme Court.
3. To implement NLP models for document summarisation, case recommendation,
and translation.
4. To evaluate the performance of the system in terms of accuracy and usability.
Limitations:
1. Consistency and data availability: The accuracy of the system depends entirely
on the consistency and availability of legal data, which can pose a challenge.
2. Accuracy of the NLP model: Reliable translations and summaries require accurate
translation and NLP models.
3. Integration of lower court data: Integrating data from the lower courts will
pose both technical and data-related challenges later.
4. User Adoption: Widespread usage of the system, especially by the Kannada-speaking
population, is necessary.
5. Dependency on input: The recommendations given by the system depend entirely on
the user's input, which needs to be accurate; any inaccuracies in it can reduce
the quality of the recommendations.
5.2. Product Perspective
5.2.1. Product Features
• Summarization of documents: The system will use NLP to generate
summaries of legal documents from Indian courts and provide
users with an overview of the relevant cases.
• Personalized Recommendations: The system matches the user's input
against the dataset and recommends related cases.
• Translation to Kannada: The system translates the summaries and
the recommendations into Kannada to cater to the Kannada-speaking
population.
• Expansion to lower courts: The system shall be ready to integrate data
from the lower courts to widen its coverage and accessibility.
5.2.2. User Classes and Characteristics
1. Regular Users:
i. Frequency: Regular use for accessing documents, summaries, and
translation.
ii. Functionality: Uses document summarization, recommendation, and
translation.
iii. Technical Capability: Basic knowledge of web or mobile applications.
iv. Security Levels: Authentication of user profiles and data
encryption for security and privacy protection.
2. Legal Professionals:
i. Frequency: Used to aid legal research, analysis of legal documents and
archived material, and client consultations.
ii. Functionality: Includes advanced search over an extensive database of legal
resources, analysis of legal documents and other legal material, and citation
extraction capabilities.
iii. Technical Capability: Expertise in using legal software and a thorough,
professional-level comprehension of complicated legal ideas.
iv. Security Levels: High-level security measures such as data encryption and user
authentication are required to maintain confidentiality and protect user data.
5.2.3. Operating Environment
• Implemented as an Android app to be accessed via Android devices.
• The backend will utilize server-side processing and database management to process user
requests and handle the data.
5.2.4. General Constraints, Assumptions and Dependencies
• Availability of Legal data and Documentation: The system's correctness and
efficiency will be determined by the consistency and availability of legal data
obtained from Indian courts.
• Natural Language Processing: The accuracy of the NLP algorithms plays a crucial
role in the system's performance in document summarization and translation.
• Regulatory Compliance: The system must follow data privacy standards and
handle user information securely.
• Technical dependencies: Integrating lower court data may rely on
APIs or other data sources, which introduces dependencies and potential
risks.
• User adoption: The system should be widely used among people, particularly
among Kannada speakers who benefit from the Kannada translation support.
5.2.5. Risks
o Legal Data Availability: Inconsistencies or gaps in legal document data may
affect the system's performance.
o Language processing limitations: The precision of natural language comprehension
and translation models may vary and thus affect the quality of the summaries and
translations generated by the system.
o Technical Dependencies: Depending on APIs and other data sources to integrate
lower court data may present risks to the reliability of the system.
o User Adoption: Factors such as usability for the general public, confidence in the
system's recommendations, and competition from other legal information systems
may influence user adoption.
5.3. Functional Requirements
• Legal Situation Description Input: Verifying the information entered by the user to confirm
that it includes the details required for suggesting relevant legal cases.
• Summarization Process: Using natural language processing to analyze the user's
description of their legal situation and generate simple summaries of relevant
legal cases.
• Recommendation Process: Matching the user's input against the dataset of
past legal case proceedings to generate tailored recommendations, ordered
by relevance.
• Translation Process: Translating the recommended cases and condensed legal materials
into Kannada for users who prefer their own language.
• Error Handling and Recovery: Giving users clear error messages and recommendations to
help them when their input is unclear or they encounter other problems.
• Translation of content: Ensuring that the Kannada translations preserve the original
content's coherence and meaning.
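The recommendation process described in the functional requirements, matching a user's description against past cases and ordering the results by relevance, can be sketched with TF-IDF vectors and cosine similarity. This is only one possible baseline and an assumption on our part: the document does not fix a matching technique, and the case names and texts below are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed IDF
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query, cases, top_k=2):
    """Rank cases by cosine similarity to the query, most relevant first."""
    names = list(cases)
    vectors, idf = tfidf_vectors([cases[name] for name in names])
    # Query terms that never occur in the case collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), name) for name, v in zip(names, vectors)), reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top_k]]


CASES = {  # hypothetical case summaries
    "case_tenancy": "landlord evicted tenant without notice under the rent agreement",
    "case_cheque": "cheque dishonoured for insufficient funds and notice was issued",
    "case_land": "dispute over ownership of agricultural land between brothers",
}
ranking = recommend("my landlord evicted me without any notice", CASES)
```

A purely lexical match like this misses paraphrases, which is why the surveyed work moves towards domain-specific embeddings and citation networks; the ranking interface would stay the same if the vectors were replaced.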
5.4. External Interface Requirements
5.4.1. User Interfaces
A seamless and productive connection between the system and its users depends on the
features of the interface that connects them. These features cover a range of topics,
including error management, system responsiveness, and data interchange.
Overall GUI Standards:
1. Consistent Layout:
• Elements keep the same position on every screen to provide a unified visual
experience, and are spaced evenly so that the interface is easy on the eye
and free of visual clutter.
2. Standardized Input Fields:
• Users interact through standard input fields with clear placeholders.
• Input fields share a unified look to preserve consistency across the
interface.
3. Distinct Buttons:
Visually distinct buttons are used for different functions, such as
choosing options, registration, and login. Button size and color are kept
consistent to stay visually pleasing.
4. Intuitive Controls:
Customisation controls, such as checkboxes and sliders, are intuitive,
user-friendly, and responsive.
Error Messages:
1. Input Validation Errors:
Message: Clearly state the type of error, for example, invalid input.
Guidance: Offer prompts and suggestions on possible corrective measures to the
user, including defining the necessary format or filling in any missing fields.
2. Authentication Errors:
Message: Clearly communicate errors in authentication, like incorrect username
and password.
Guidance: Offer troubleshooting help, including password reset
options.
3. System Unavailability:
Message: Clearly state that the system is temporarily unavailable,
for example because of technical issues or periodic maintenance.
Guidance: Provide the estimated time at which the system will be available again.
4. Permission Errors:
Message: Clearly indicate that the user is not permitted to perform a certain
action.
Guidance: Instruct the user on how to request the required permissions.
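The message-plus-guidance pattern for input validation errors can be sketched as a function that checks a legal situation description and returns both parts. The thresholds `MIN_WORDS` and `MAX_CHARS`, the function name, and the exact wording of the messages are assumptions for illustration; the document does not specify them.

```python
MIN_WORDS = 10    # assumed minimum length for a usable description
MAX_CHARS = 5000  # assumed upper bound on input size


def validate_description(text):
    """Return (ok, message, guidance) for a legal situation description."""
    if text is None or not text.strip():
        return (
            False,
            "Invalid input: the description is empty.",
            "Describe your legal situation in a few sentences.",
        )
    if len(text) > MAX_CHARS:
        return (
            False,
            "Invalid input: the description is too long.",
            f"Shorten the description to at most {MAX_CHARS} characters.",
        )
    if len(text.split()) < MIN_WORDS:
        return (
            False,
            "Invalid input: the description is too short.",
            f"Add more detail; at least {MIN_WORDS} words are needed.",
        )
    return True, "", ""


ok, message, guidance = validate_description("My landlord evicted me")
```

Keeping the message and the guidance as separate fields lets the interface show the error type prominently and the corrective hint next to the affected field.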
5.4.2. Hardware Requirements
Logical Interface:
User Devices:
Supported devices: Smartphones, desktops and laptops.
Characteristics: Designed to adapt to various screen sizes and compatible with
mobile and web browsers.
Input Devices:
Supported devices: Keyboards, mouse, and touchscreens.
Characteristics: Recognition of input for text entry, touch and other interaction
methods.
Output Devices:
Types Supported: Displays and monitors
Characteristics: Standard display ports are compatible with output interfaces.
Physical Interface:
Communication Channels:
• Wired: Ethernet or other types of wired connections for transfer of data.
• Wireless: Wi-Fi and mobile network connection for access.
• Characteristics: Transition between wired and wireless connections will be
seamless.
Protocols:
• HTTP/HTTPS: Web-based communication with the user interface.
• TCP/IP: Reliable data transfer between hardware and software components.
• Characteristics: The protocols provide secure and efficient communication.
Performance Requirements:
• Processing Power: Multi-core support, with the minimum required processing
speed kept as low as possible so that the system runs smoothly across a range of devices.
• Memory (RAM): A minimum amount of RAM is required for timely and efficient
data processing.
• Storage Space: A minimum amount of storage is required for
caching and data storage.
Security Measures:
Standards for Encryption: Secure data transport between hardware and software
components is achieved through the use of SSL/TLS.
Device Authentication: To grant authorized access, multi-factor authentication and
device pairing are employed.
5.4.3 Software Requirements
Operating System: Linux-based (e.g., Ubuntu Server 20.04 LTS)
Databases: MongoDB, MySQL
Tools and Libraries: Python, Flask, [Link], [Link], Docker, TensorFlow, NLTK, Scikit-learn
Source: GitHub Repository
5.4.4. Communication Interfaces
LAN protocol: Local network communication is facilitated by TCP/IP.
Web communication: Secure web interface communication uses HTTP / HTTPS.
Database Communication: MySQL for data storage and retrieval.
ML model interface: Scikit-learn and TensorFlow will be utilised to integrate the
models.
5.5. Non-Functional Requirements
5.5.1 Performance Requirements:
Response Time: The system should respond to user requests within a
few seconds, even under high demand.
Scalability: The system should handle a large number of users together
with large volumes of data without noticeable deterioration in performance.
Resource Utilization: The system should use hardware resources efficiently
and with minimal waste.
Reliability: Users must be able to use the system whenever they need it, so
downtime for maintenance or upgrades must be kept to a minimum.
Throughput: The software should sustain high throughput under concurrent use so that all
users are served on time.
Availability: Downtime during service hours should be kept as low as
possible.
5.5.2 Safety Requirements:
• Data privacy: Safeguarding user data and legal records by following secure handling measures as well
as conforming with data protection laws and regulations.
• User Data Protection: Establishing security protocols that would prevent unauthorized persons from
accessing user records.
• Regulatory compliance: Ensuring that system activities and data practices are in line with relevant
legislation and guidelines.
5.5.3 Security Requirements:
• Encryption: Encrypt data exchanges between clients and servers with protocols like TLS.
• User authentication: Implement password protection measures and secure user authentication.
• Access control: Control access to sensitive functions and data by implementing role-based access
control (RBAC) policies.
• Backup and recovery: For system resilience and preservation of stored information
integrity, perform regular data backups and develop contingency plans for recovery from disasters.
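The role-based access control requirement above can be pictured as a table from actions to the roles allowed to perform them. The following is a minimal Python sketch under stated assumptions: the role names, the permission names and the is_allowed helper are hypothetical and are not taken from the project's design.

```python
from enum import Enum


class Role(Enum):
    GUEST = "guest"
    USER = "user"
    ADMIN = "admin"


# Hypothetical permission names mapped to the roles allowed to use them.
PERMISSIONS = {
    "view_summary": {Role.GUEST, Role.USER, Role.ADMIN},
    "upload_document": {Role.USER, Role.ADMIN},
    "manage_users": {Role.ADMIN},
}


def is_allowed(role: Role, action: str) -> bool:
    """Deny by default: an action missing from the table is allowed to no one."""
    return role in PERMISSIONS.get(action, set())
```

Denying unknown actions by default means a newly added function stays locked until someone grants it to a role explicitly.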
5.6. Other Requirements
1. Usability Requirements:
• User interface consistency: Maintain a consistent, user-friendly interface design across
platforms and software in order to improve usability.
• Accessibility: Design the system's features and interface to comply with
accessibility regulations and to enable people with disabilities to use it comfortably.
• Multilingual support: Adding Kannada translation makes the system more
accessible. Eventually we aim to extend support to more
Indian languages to cater to a wider audience.
2. Interoperability Requirements:
• Integration with external data sources: To make the system more versatile and
usable, add the possibility to integrate legal data from different platforms and
sources, including governmental portals.
• API Support: Develop a well-documented API that lets third-party systems and
platforms interact with the system, exchange data, and build custom solutions on
the legal information.
3. Scalability and Performance Requirements:
Scalable architecture: The system scales horizontally, so that additional resources can be
added as necessary when user demand increases or fluctuates.
Load balancing: Distribute user requests across multiple servers to improve system
performance.
Performance monitoring: Review system statistics, such as response times and
resource usage, to detect bottlenecks.
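Round-robin assignment is one simple way to spread requests across servers, as the load balancing requirement above describes. The Python sketch below illustrates that one strategy only. The requirements do not name a balancing algorithm, so the choice of round-robin, the RoundRobinBalancer class and the server names are all assumptions.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self) -> str:
        """Return the server that should receive the next request."""
        return next(self._servers)


# Hypothetical server names, used only to show the rotation.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assigned = [balancer.next_server() for _ in range(5)]
```

In a deployment this job would normally be done by a dedicated load balancer in front of the application servers rather than by application code.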
6. SYSTEM DESIGN
6.1. Design Considerations
6.1.1 Design Goals
Accuracy: The system should summarize legal cases correctly and recommend cases based on
their relevance to the user's input.
Usability: The system should serve users with varying degrees of legal competence
by providing an intuitive user interface and results they can understand and use.
Scalability: The system must be built to accommodate an expanding user base
and manage a large dataset of legal documents.
Language Support: Translation support provides Kannada
translations, ensuring accessibility for local users.
6.1.2 Architecture Choices
• Microservices architecture: A microservices architecture allows
modular development, so that components of the system such as
summarization, recommendation, and translation can be scaled and deployed
separately.
• Natural language processing pipeline: A robust NLP pipeline extracts
important information from legal documents and produces
reliable summaries.
• Client-server model: A client-server model allows seamless interaction
between users and the system, with the server handling the data processing
tasks and returning summarized legal information to clients.
6.1.3 Constraints, Assumptions and Dependencies
• Data availability: The system's effectiveness depends entirely on the availability and
quality of legal documentation data, including High Court and Supreme Court
verdicts. We assume that sufficient data is available for training and testing the system.
• API dependence: Integrating lower court data depends on APIs for data
extraction, which is a challenge because lower court websites are complex and lack
user-friendly interfaces.
• Language translation limitations: Translation into Kannada
enhances accessibility, but translation accuracy may be
limited.
• Legal Compliance: The system must comply with legal regulations regarding data
privacy, copyright, and usage rights when accessing and processing legal documentation.
Assumption: Proper permissions and licenses are obtained for using the legal data.
• Technology Stack: Selection of appropriate technologies for natural language
processing, database management, and web development impacts system performance,
scalability, and maintainability.
Assumption: The chosen technologies align with project requirements and development
capabilities.
System Flow Diagram
Fig 1: System Flow Diagram
1. Process -
• First, the user either uploads a legal document or describes their legal situation
as text. The features are then extracted from this input.
• The extracted features are sent to the abstractive summarizer, which returns the
summarized document.
• The summarized document is then sent to the recommender, which recommends similar cases.
• The summarized document is also sent to the translator if the user wishes to
translate it into Kannada.
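The flow above can be sketched as a chain of small functions. This is a minimal Python illustration under stated assumptions, not the project's implementation. All function names are hypothetical, the summarizer is a placeholder that simply keeps the leading sentences (the planned system uses an abstractive model), the recommender uses plain word-count cosine similarity as a stand-in, and the translator is passed in as an optional callable.

```python
import math
import re
from collections import Counter


def extract_features(text: str) -> Counter:
    """Lower-case word counts; a stand-in for real feature extraction."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def summarize(text: str, max_sentences: int = 2) -> str:
    """Placeholder: keeps the leading sentences. The planned system would
    call an abstractive summarization model here instead."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(summary: str, cases: dict, top_k: int = 1) -> list:
    """Rank stored cases (id -> text) by similarity to the summary."""
    query = extract_features(summary)
    ranked = sorted(
        cases,
        key=lambda cid: cosine(query, extract_features(cases[cid])),
        reverse=True,
    )
    return ranked[:top_k]


def run_pipeline(text: str, cases: dict, translator=None) -> dict:
    """Input -> features -> summary -> recommendation -> optional translation."""
    # A real summarizer would consume these features; the placeholder
    # below works on the raw text, so they are only reported here.
    features = extract_features(text)
    summary = summarize(text)
    result = {
        "feature_count": len(features),
        "summary": summary,
        "similar_cases": recommend(summary, cases),
    }
    if translator is not None:  # translate only when the user asks for it
        result["kannada"] = translator(summary)
    return result
```

Passing the translator as an argument keeps the translation step optional, which mirrors the last step of the flow, and lets any translation service be plugged in later.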
6.4 Master Class Diagram
Fig 2: Master Class Diagram
6.5 Reusability Considerations
• Project components that are, or can be, built from available reusable
components:
1. UI components
2. Database components
3. Data visualization tools
• Components that can be built in this project for reuse within it.
6.6 ER Diagram
Fig 3: ER Diagram
6.7 User Interface Diagrams
Fig 4: User Interface Diagram
6.8. Use Case Diagram
Fig 5: Use Case Diagram
6.9 External Interfaces
o User Interface (UI): The UI serves as the primary external interface through
which users interact with the system. It should be intuitive, user-friendly,
and accessible across different devices and platforms.
o Database Interface: The system interacts with a database to store and
retrieve legal documentation, case details, verdicts, and translations. The
database interface facilitates CRUD (Create, Read, Update, Delete)
operations on the database, ensuring data integrity and reliability.
o Translation service interface: Because the system offers Kannada translation
functionality, it requires an interface with a translation service or API. The
interface allows the system to send English text for translation and
receive the Kannada translation in return.
o Authentication interface: If user authentication is needed, the system
interfaces with an authentication service to check user credentials and
provide access to
authorized users. This interface ensures secure access to the system's
features.
o Communication interface: Where the system needs to
communicate with external systems, a communication interface is needed. This
interface allows data transfer, for example when communicating with lower court
websites to access data.
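The CRUD operations named under the database interface above can be illustrated with a small wrapper class. The Python sketch below uses the standard library's sqlite3 module purely as a stand-in so that it runs without a server. The project lists MySQL and MongoDB as its databases, and the CaseStore class, the cases table and its columns are hypothetical.

```python
import sqlite3


class CaseStore:
    """CRUD operations on a hypothetical `cases` table."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cases ("
            "id INTEGER PRIMARY KEY, title TEXT NOT NULL, summary TEXT)"
        )

    def create(self, title: str, summary: str) -> int:
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(
                "INSERT INTO cases (title, summary) VALUES (?, ?)",
                (title, summary),
            )
        return cur.lastrowid

    def read(self, case_id: int):
        """Return (id, title, summary), or None if the case does not exist."""
        return self.conn.execute(
            "SELECT id, title, summary FROM cases WHERE id = ?", (case_id,)
        ).fetchone()

    def update(self, case_id: int, summary: str) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE cases SET summary = ? WHERE id = ?", (summary, case_id)
            )

    def delete(self, case_id: int) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM cases WHERE id = ?", (case_id,))
```

Parameterized queries (the ? placeholders) keep user-supplied text out of the SQL string, which supports the data integrity goal stated for this interface.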
7. DESIGN DETAILS
Novelty
The system's novelty lies in its integration of natural language processing (NLP)
techniques for summarizing legal documentation and providing translations,
particularly in the context of Indian legal cases. Accessibility for a wider audience
is further improved by adding Kannada translations and user-friendly
interfaces.
Innovativeness
The system generates abstractive summaries by using recent natural
language processing (NLP) techniques to extract important information from legal
documents. Moreover, the incorporation of translation services for Kannada
accommodates users with varying linguistic backgrounds, enhancing accessibility
to legal material.
Interoperability
The system ensures compatibility between its modules by using common
formats and protocols for communication.
APIs are designed to allow interaction between the User, Backend services,
Translation service, and Database.
Performance
Performance is optimized to ensure efficient processing of user
inputs and efficient retrieval of data from the database. Techniques such as caching
and parallel processing can be used to minimize response time and
improve system throughput.
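As a small illustration of the caching technique mentioned above, the Python sketch below memoizes a slow lookup with functools.lru_cache from the standard library. The summary_for function, the cache size and the call counter are hypothetical and exist only to show that a repeated request skips the expensive work.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the slow path actually runs


@lru_cache(maxsize=256)
def summary_for(document_id: str) -> str:
    """Stand-in for an expensive summarization or database call."""
    calls["count"] += 1
    return f"summary of {document_id}"


summary_for("doc-1")
summary_for("doc-1")  # served from the cache; the function body is skipped
```

An in-process cache like this is lost on restart and is not shared between servers, so a deployment with several servers would more likely use a shared cache.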
Security
The system prioritizes data security by implementing authentication
mechanisms and data encryption techniques. Measures are taken to prevent
unauthorized access and to protect the system against attacks.
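One common building block of an authentication mechanism is storing salted password hashes instead of the passwords themselves. The Python sketch below shows this with PBKDF2 from the standard library. The design above does not name a hashing scheme, so the choice of PBKDF2, the iteration count and both function names are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the deployment hardware


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is made when none is given."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and the digest would be stored for each user, so a leaked database does not reveal the original passwords directly.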
Reliability
The system aims to deliver reliable legal information by ensuring that
summaries and translations of legal documents are accurate. Quality assurance
processes, including testing and validation, are implemented to identify and
address errors in the generated summaries.
Maintainability:
The system's design promotes maintainability through a modular architecture
and adherence to coding standards. Documentation and version control systems
are used to help in code maintenance and enhancements.
Portability:
The system has been designed to be platform-independent, letting it run on
various operating systems and hardware devices. Compatibility with web
browsers ensures accessibility across different devices.
Legacy to Modernization:
The system might support the migration of legacy legal documentation systems to
modern platforms. Legacy data can be integrated into the system's database.
Reusability:
The system ensures reusability by encapsulating functionality into components
that can be used across different parts of the system. Frameworks and design
patterns will be used to facilitate code reusability.
Resource utilization:
The system shall use resources efficiently by carefully managing memory
and processing power. Resource monitoring tools shall be used to study system
performance and identify areas for optimization.
Application Compatibility:
The system ensures compatibility with the web browsers and operating
systems most commonly used by its users. Cross-browser testing
and checks are performed to verify consistent functionality across different
platforms.
8. CONCLUSION OF CAPSTONE PROJECT PHASE - 1
In the first phase of our capstone project, we conducted a literature review to assess
the state of research on applying natural language processing techniques to
legal documents. We noted the main challenges encountered, including
handling the complexity and technical language of legal texts and the shortcomings of the
NLP models currently used in this domain.
We identified several approaches proposed in the literature, such as extractive and abstractive
summarization techniques, domain-specific language models, and methods for
combining structural and semantic information from legal documents. The literature review made clear
how important it is to use domain expertise and carefully selected specialized datasets to
improve NLP system performance on legal text analysis tasks.
We also reviewed existing legal datasets, such as the Indian Legal Documents
Corpus and the LegalSumm datasets, which have been crucial for developing and evaluating
NLP methods for the legal field. We also identified the need for more high-quality, multilingual
legal datasets, particularly for the lower courts.
Based on the literature survey and project requirements, we defined the system
specifications, which include the functional and non-functional requirements and the user
interfaces. We also outlined the design considerations, architecture choices, constraints,
assumptions, and dependencies for the proposed system.
Overall, the first phase of the project laid a solid foundation for the development and
implementation of the proposed system, which aims to provide accessible legal assistance
through NLP-based summarization, recommendation, and translation of Indian
legal documentation.
9. PLAN OF WORK FOR CAPSTONE PROJECT PHASE - 2
In the second phase of the capstone project, we will focus on the implementation and
evaluation of the proposed system. The following tasks are planned:
1. Data Acquisition and Preparation:
• Curate and preprocess a comprehensive dataset of legal documentation from
Indian High Courts and the Supreme Court.
• Explore the availability of lower court data and potential integration mechanisms.
• Ensure data quality and compliance with legal regulations and usage rights.
2. Model Development and Implementation:
• Implement advanced NLP models for document summarization, case
recommendation, and Kannada translation.
• Explore techniques for integrating domain knowledge and leveraging specialized
legal language models.
• Develop a user-friendly interface for inputting legal situations and accessing
summarized, recommended, and translated content.
REFERENCES/BIBLIOGRAPHY
1. Ramprasad, Sanjana, et al. "Evaluating the Factuality of Zero-shot Summarizers Across
Varied Domains." arXiv preprint arXiv:2402.03509 (2024).
2. Pourpanah, Farhad, et al. "A review of generalized zero-shot learning methods." IEEE
Transactions on Pattern Analysis and Machine Intelligence 45.4 (2022): 4051-4070.
3. Tripathi, Yogesh, et al. "InSaAF: Incorporating Safety through Accuracy and Fairness |
Are LLMs ready for the Indian Legal Domain?." arXiv preprint arXiv:2402.10567
(2024).
4. Bhattacharya, Paheli, et al. "Legal case document similarity: You need both network
and text." Information Processing & Management 59.6 (2022): 103069.
5. Elaraby, Mohamed, and Diane Litman. "ArgLegalSumm: Improving abstractive
summarization of legal documents with argument mining." arXiv preprint
arXiv:2209.01650 (2022).
6. Shukla, Abhay, et al. "Legal case document summarization: Extractive and abstractive
methods and their evaluation." arXiv preprint arXiv:2210.07544 (2022).
7. Feijo, Diego de Vargas, and Viviane P. Moreira. "Improving abstractive summarization
of legal rulings through textual entailment." Artificial Intelligence and Law 31.1 (2023):
91-113.
8. Thomas, Merine, et al. "Quick Check: A Legal Research Recommendation System."
NLLP@KDD. 2020.
9. Dhani, Jaspreet Singh, et al. "Similar cases recommendation using legal knowledge
graphs." arXiv preprint arXiv:2107.04771 (2021).
10. Malik, Vijit, et al. "ILDC for CJPE: Indian legal documents corpus for court judgment
prediction and explanation." arXiv preprint arXiv:2105.13562 (2021).
11. Feijo, Diego, and Viviane Moreira. "Summarizing legal rulings: Comparative
experiments." Proceedings of the International Conference on Recent Advances in
Natural Language Processing (RANLP 2019). 2019.
12. Chalkidis, Ilias, et al. "LEGAL-BERT: The muppets straight out of law school." arXiv
preprint arXiv:2010.02559 (2020).
13. Deroy, Aniket, Kripabandhu Ghosh, and Saptarshi Ghosh. "How ready are pre-trained
abstractive models and LLMs for legal case judgement summarization?." arXiv preprint
arXiv:2306.01248 (2023).
14. Liu, Ying, Xudong Luo, and Xi Yang. "Semantics and structure based recommendation of
similar legal cases." 2019 IEEE 14th International Conference on Intelligent Systems
and Knowledge Engineering (ISKE). IEEE, 2019.
15. Hu, Weifeng, et al. "BERT_LF: A similar case retrieval method based on legal facts."
Wireless Communications and Mobile Computing 2022 (2022).
16. Parikh, Vedant, et al. "Lawsum: A weakly supervised approach for Indian legal
document summarization." arXiv preprint arXiv:2110.01188 (2021).
17. Salaün, Olivier, et al. "Conditional abstractive summarization of court decisions for
laymen and insights from human evaluation." Legal Knowledge and Information
Systems. IOS Press, 2022. 123-132.
18. Mahapatra, Sayan, et al. "Improving Access to Justice for the Indian Population: A
Benchmark for Evaluating Translation of Legal Text to Indian Languages." arXiv
preprint arXiv:2310.09765 (2023).
19. Zheng, Min, Bo Liu, and Le Sun. "Lawrec: automatic recommendation of legal
provisions based on legal text analysis." Computational Intelligence and Neuroscience
2022 (2022).
20. Nay, John, Natural Language Processing and Machine Learning for Law and Policy
Texts (April 7, 2018). Nay, J. (2021) "Natural Language Processing for Legal Texts." In
D. M. Katz, R. Dolin & M. Bommarito (Eds.), Legal Informatics. Cambridge University Press.
APPENDIX A DEFINITIONS, ACRONYMS, AND ABBREVIATIONS
• NLP - Natural Language Processing
• API - Application Programming Interface
• GUI - Graphical User Interface
• SSL - Secure Sockets Layer
• TLS - Transport Layer Security
• TCP/IP - Transmission Control Protocol/Internet Protocol
• HTTP - Hypertext Transfer Protocol
• HTTPS - Hypertext Transfer Protocol Secure
• RAM - Random Access Memory
• LAN - Local Area Network
• IPv4 - Internet Protocol version 4
• IPv6 - Internet Protocol version 6
• ML - Machine Learning
• NLTK - Natural Language Toolkit
• TensorFlow - An open-source machine learning framework
• LTS - Long Term Support