Database vs. Search Engine Overview

A database is an organized collection of related records stored digitally, arranged in a structured order for efficient searching. A search engine is a software system designed to find information stored on a computer system, such as the World Wide Web. It has four main components: a crawler that gathers web pages; the index database in which those pages are stored; a result engine that finds matches to user queries in the index; and a user interface. Basic and advanced search techniques can be used to conduct targeted searches within databases and search engines.

Uploaded by

bhavgifee

DATABASE & SEARCH ENGINE

PRESENTED BY:
[Link]
Database
What is a database?
• A database is an organized collection of related records that is stored digitally.
• It is arranged in a structured order for ease and speed of search.
• An example would be the Library Literature Database on the New York Public Library website, which “Indexes periodicals and books, reports, pamphlets, and library school theses on all aspects of library and information science” from 1984 to the present.
What Is A Search Engine?
• The term "search engine" usually refers to a web search engine, which searches for information on the web.

• Search engines are huge databases of web page files that have been assembled automatically by machines.

• By performing a search using a search engine, you're asking the engine to scan its index of sites and match your keywords and phrases with those in the text of documents within the engine's database.

• A search engine is a document retrieval system designed to help find information stored on a computer system, such as the World Wide Web.

• A search engine allows one to ask for content meeting specific criteria, typically items containing a given word or phrase, and retrieves a list of the items that match those criteria. This list is often sorted by some measure of the relevance of the results.

• When you are using a search engine, you are NOT searching the entire web as it exists at this moment. You are actually searching a portion of the web, captured in a fixed index created at an earlier date.
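
The matching step described above can be sketched as a tiny inverted index. This is only a minimal illustration under stated assumptions, not how any particular engine is built: the page names and texts are made up, and build_index and search are hypothetical helper names.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the ids of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, captured when the index was built (not the live web)
pages = {
    "a.html": "library science periodicals and books",
    "b.html": "web search engines index pages",
    "c.html": "library web catalogue search",
}
index = build_index(pages)
print(search(index, "library search"))
```

Because the query runs against the stored index rather than the pages themselves, a page changed after indexing would still be matched on its old text.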
What Is A Search Engine?

• A client/server application
• A document retrieval system
• Uses regularly updated indexes to operate quickly and efficiently
• Designed to help find information stored:
• On a computer system, such as on the World Wide Web
• Inside a corporate or proprietary network
• In a personal computer
• Different selection and relevance criteria can apply in different environments, or for different uses
• Allows one to ask for content meeting specific criteria
• Typically those containing a given word or phrase
• Retrieves a list of items that match those criteria
Search Engines Consist of Four Discrete Software Components

• Spider/Crawler: a software program that gathers information and puts it into the search engine’s database. It visits Web pages, often starting at the main page of a site, reads them and then follows the links to other pages.

• The database or index: the web pages are systematically stored and updated here.

• Search engine result engine: the software that sifts through the pages stored in the index to find matches to a search and ranks them in order of what it judges to be most relevant.

• The interface: what we use to query the database. It usually consists of a search box in which you type your query and a button to launch the search. Sometimes there are menus to choose various search functions to refine the query.
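
The spider's read-then-follow-links behaviour can be sketched as a breadth-first traversal. To keep the example self-contained it crawls a hypothetical in-memory site (the WEB dictionary) instead of fetching real pages; the crawl function and its fetch parameter are illustrative names, not part of any real crawler.

```python
from collections import deque

# Hypothetical site: each URL maps to (page text, outgoing links)
WEB = {
    "/": ("home page", ["/about", "/docs"]),
    "/about": ("about us", ["/"]),
    "/docs": ("documentation index", ["/docs/api", "/about"]),
    "/docs/api": ("api reference", []),
    "/orphan": ("never linked from anywhere", []),
}

def crawl(start, fetch):
    """Breadth-first crawl: read a page, store it, then follow its links."""
    store = {}                 # stands in for the search engine's database
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        store[url] = text
        for link in links:
            if link not in seen:   # visit each page only once
                seen.add(link)
                queue.append(link)
    return store

store = crawl("/", lambda url: WEB[url])
print(list(store))
```

The page "/orphan" is never stored because no crawled page links to it, which is one way pages end up outside a search engine's index.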
• The spider retrieves pages from the World Wide Web.

• The data retrieved by the spider is systematically indexed and stored in the search engine’s database.

• When a user types in a search query, the search engine result engine looks up the index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. Most search engines support the use of the Boolean terms AND, OR and NOT to further specify the search query.
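
One common way to picture the Boolean terms is as set operations on the lists of pages that contain each word: AND is intersection, OR is union, NOT is difference. The sketch below assumes a hypothetical index that maps terms to page ids; boolean_search and its parameter names are made up for illustration.

```python
def boolean_search(index, all_docs, must=(), any_of=(), must_not=()):
    """Combine the page sets of the given terms with AND, OR and NOT."""
    result = set(all_docs)
    for term in must:                      # AND: keep pages containing every term
        result &= index.get(term, set())
    if any_of:                             # OR: keep pages containing at least one term
        either = set()
        for term in any_of:
            either |= index.get(term, set())
        result &= either
    for term in must_not:                  # NOT: drop pages containing the term
        result -= index.get(term, set())
    return result

# Hypothetical index: term -> ids of the pages that contain it
index = {
    "jaguar": {1, 2, 3},
    "car": {2, 4},
    "cat": {1, 5},
    "speed": {2, 3},
}
all_docs = {1, 2, 3, 4, 5}

print(boolean_search(index, all_docs, must=["jaguar"], must_not=["car"]))
```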
Let's see how Google processes a query

1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book: it tells which pages contain the words that match any particular query term.
2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.
3. The search results are returned to the user in a fraction of a second.
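
The three steps above can be imitated in a few lines. This is a toy sketch of the flow only, not Google's actual implementation: INDEX stands in for the index servers, DOCS for the doc servers, and the function names and document texts are assumptions made for the example.

```python
# Hypothetical "index server" data: word -> ids of documents containing it
INDEX = {"search": [2, 7], "engine": [2], "database": [5, 7]}

# Hypothetical "doc server" data: id -> stored document text
DOCS = {
    2: "A search engine scans its index to find matching pages.",
    5: "A database stores related records in a structured order.",
    7: "You can search a library database by author or title.",
}

def lookup(query):
    """Step 1: ask the index which documents contain every query term."""
    postings = [set(INDEX.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

def snippet(text, term, width=20):
    """Step 2: cut a short window of text around the first occurrence of term."""
    pos = text.lower().find(term)
    if pos < 0:
        return text[: 2 * width]
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    return text[start:end]

def answer(query):
    """Step 3: return (document id, snippet) pairs to the user."""
    terms = query.lower().split()
    if not terms:
        return []
    return [(doc_id, snippet(DOCS[doc_id], terms[0])) for doc_id in lookup(query)]

for doc_id, text in answer("search database"):
    print(doc_id, "...", text, "...")
```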
Types of Search Engines
• Crawler-based search engines – Web pages are gathered and indexed automatically by spiders. E.g. Google, AltaVista.

• Directory – These are created and maintained by human editors. The editors review and select sites for inclusion in their directories on the basis of previously determined selection criteria. Their databases are organised by category or subject to permit browsing, but are in general much smaller than those of crawler-based engines. E.g. Yahoo, Looksmart.

• Regional – Regional search engines focus on one particular language or region. E.g. [Link], [Link]

• Metasearcher – Metasearchers use a uniform platform to search using several engines simultaneously. E.g. [Link], Profusion, Vivisimo.
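
A metasearcher's core job, sending one query to several engines and combining what comes back, can be sketched as follows. The two engines here are hypothetical stand-ins that return canned result lists, and interleaving by rank is just one simple merging strategy chosen for the example.

```python
from itertools import zip_longest

def metasearch(query, engines):
    """Query every engine, then interleave the ranked lists and drop repeats."""
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    for rank_row in zip_longest(*result_lists):   # rank 1 from each, then rank 2, ...
        for url in rank_row:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical engines; a real metasearcher would call remote services here
def engine_a(query):
    return ["x.com", "y.com", "z.com"]

def engine_b(query):
    return ["y.com", "w.com"]

print(metasearch("library science", [engine_a, engine_b]))
```

Note that the metasearcher keeps no index of its own; it only reorders and de-duplicates what the underlying engines return.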
Invisible Web
• Search engines do not necessarily reach all parts of the Web or necessarily index all pages at a site.

• The Invisible Web, as it is called, consists largely of databases not easily indexed by the search engines, pages deep in a web site that don't get crawled, file formats that the search engines ignore, and services for subscribers only (and often for a fee).

• No one has a firm estimate of its size, but some have guessed at 500 billion pages.
SEARCH ENGINE APPLICATIONS
Search engines allow field searches, such as search in title, date last updated, search in the URL, etc.

A search engine searches a huge database of web pages.

Results are displayed in order of the highest occurrence of the specified keywords. They can also be reordered by date of posting.
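
Ordering by keyword occurrence and reordering by date can be sketched like this. It is a deliberately naive scoring scheme (a raw word count) that mirrors the description above rather than any real engine's ranking; the pages, URLs and dates are invented for the example.

```python
from datetime import date

def rank_by_occurrence(pages, keywords):
    """Keep pages that mention a keyword, most keyword occurrences first."""
    def score(page):
        words = page["text"].lower().split()
        return sum(words.count(k.lower()) for k in keywords)
    hits = [p for p in pages if score(p) > 0]
    return sorted(hits, key=score, reverse=True)

def reorder_by_date(results):
    """Alternative ordering: most recently posted first."""
    return sorted(results, key=lambda p: p["posted"], reverse=True)

# Hypothetical indexed pages
pages = [
    {"url": "a", "text": "search engine basics", "posted": date(2009, 5, 1)},
    {"url": "b", "text": "search tips for search engine users search", "posted": date(2008, 1, 15)},
    {"url": "c", "text": "database design notes", "posted": date(2010, 3, 9)},
]

ranked = rank_by_occurrence(pages, ["search", "engine"])
print([p["url"] for p in ranked])
print([p["url"] for p in reorder_by_date(ranked)])
```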
THE METHOD TO CONDUCT SEARCH
When you conduct a search for a specific title or author, what type of search are you conducting?
Field searching allows the researcher to select a specific portion of the electronic record to search, be that title, author, publication year, etc. Someone looking for articles by John Updike could simply type “Updike, John” into the author field to find all articles in the database written by John Updike.

What are basic search techniques?

The first basic principle of conducting a search is to choose appropriate keywords, using a thesaurus if necessary. In choosing keywords, the researcher should consider variant word forms, differing spellings, and related words.
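
Considering variant forms and spellings amounts to expanding each keyword into a small set of alternatives before matching. The sketch below assumes a hand-written table of variants (VARIANTS), standing in for a thesaurus; the entries and function names are hypothetical.

```python
# Hypothetical thesaurus: keyword -> variant forms and alternative spellings
VARIANTS = {
    "color": ["colour", "colors", "colours"],
    "catalog": ["catalogue", "catalogs"],
}

def expand(keywords):
    """Return the keywords plus their known variants, without duplicates."""
    expanded = []
    for word in keywords:
        for candidate in [word] + VARIANTS.get(word, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def matches(text, keywords):
    """True if the text contains any keyword or any of its variants."""
    words = set(text.lower().split())
    return any(k in words for k in expand(keywords))

print(expand(["color"]))
print(matches("The colour of the binding", ["color"]))
```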
List some advanced search techniques.
To conduct a more specific search, field searching is recommended. This means searching particular fields, such as Author, Title, Year of Publication, Language, etc., for precise keywords. Thus a researcher could input “1999” in the year of publication field to find documents published in that year, “French” in the language field to find documents written in French, or “small” in the title field to find books with the word “small” in the title.
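
Field searching maps naturally onto a query against one column of a table. The sketch below uses Python's built-in sqlite3 module with an in-memory table; the table layout, the sample rows (including the titles) and the field_search helper are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (author TEXT, title TEXT, year INTEGER, language TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [   # invented sample rows, not real bibliographic data
        ("Updike, John", "A Small Sample Article", 1999, "English"),
        ("Updike, John", "Collected Sample Reviews", 1984, "English"),
        ("Martin, Claire", "Un petit exemple", 1999, "French"),
    ],
)

ALLOWED_FIELDS = {"author", "title", "year", "language"}

def field_search(field, value, exact=True):
    """Search a single field; exact=False looks for the value anywhere in it."""
    if field not in ALLOWED_FIELDS:        # never build SQL from an unchecked name
        raise ValueError(f"unknown field: {field}")
    if exact:
        sql = f"SELECT title FROM records WHERE {field} = ? ORDER BY title"
        params = (value,)
    else:
        sql = f"SELECT title FROM records WHERE {field} LIKE ? ORDER BY title"
        params = (f"%{value}%",)
    return [row[0] for row in conn.execute(sql, params)]

print(field_search("author", "Updike, John"))
print(field_search("year", 1999))
print(field_search("title", "small", exact=False))
```

The title search uses a substring match because a title field is usually searched for a word within it, while author, year and language are matched exactly.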
In addition to the basic search techniques, on some interfaces a proximity operator, like “with,” “adjacent” or “near,” can be used to further limit or expand the search.
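
A proximity operator can be understood as a test on word positions. The exact meaning of "near" or "adjacent" differs between interfaces, so the definitions below (within max_gap words in either order, and immediately following) are assumptions chosen for the sketch, as are the function names.

```python
def _tokens(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip('.,;:!?"').lower() for w in text.split()]

def near(text, first, second, max_gap=3):
    """True if the two words occur within max_gap words of each other, in either order."""
    words = _tokens(text)
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)

def adjacent(text, first, second):
    """True if second immediately follows first."""
    words = _tokens(text)
    return any(a == first.lower() and b == second.lower()
               for a, b in zip(words, words[1:]))

print(near("library and information science", "library", "science"))
print(adjacent("library and information science", "information", "science"))
```

A smaller max_gap limits the search, while a larger one expands it.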
THANK YOU

Common questions

Databases are organized collections of related records, structured for efficient querying and retrieval of data stored digitally, such as library catalogues. In contrast, search engines are designed to locate information stored on computer systems, like the internet, by indexing and retrieving web page files based on user queries. While databases focus on curated and structured data management, search engines dynamically index and retrieve data from vast, less structured web environments.

Search engines create and maintain their indexes by employing spiders to traverse the web, gathering data from websites. The spider reads web pages, follows links, and systematically collects information, which is then indexed and organized in search engine databases. This indexing is crucial for efficient data retrieval, as it enables the search engine result engine to quickly sift through massive datasets to find and rank pages relevant to user queries, facilitating the rapid delivery of search results.

A search engine system primarily comprises four components: the spider/crawler, the database or index, the search engine result engine, and the interface. The spider or crawler gathers information by visiting web pages and following links to other pages. The collected data is then indexed and stored systematically in the search engine's database. When a user inputs a query via the interface, it is sent to the index servers to find pages containing the search terms. These pages are then ranked by the search engine result engine based on relevance criteria and returned to the user's interface, often with a document summary including the title and parts of the text.

Metasearch engines distinguish themselves by utilizing a single platform to search across several other search engines simultaneously, which allows users to access a broader range of search results than a single crawler-based search engine can provide. In contrast, crawler-based search engines use spiders to gather web pages, index them, and rank them independently. Hence, while crawler-based engines build and search their own indexes, metasearch engines rely on aggregating results from various indexes without maintaining their own.

Search engines have limitations in reaching all parts of the web due to the vast and dynamic nature of internet content. The 'Invisible Web' refers to areas of the internet that are not indexed by search engines. This includes databases that present challenges for indexing, pages deep within a site that are not accessed by spiders, file formats ignored by search engines, and subscriber-only services. As a result, a significant portion of online information remains inaccessible through conventional search engines, with some guesses suggesting the Invisible Web could involve up to 500 billion pages.

Boolean operators refine search engine queries by allowing users to define relationships between keywords and phrases, thereby enhancing the specificity of search results. Common Boolean operators include AND, OR, and NOT. 'AND' narrows search results by including only pages containing all specified terms, 'OR' broadens results to include pages with any of the listed terms, and 'NOT' excludes pages containing certain terms. Utilizing these operators helps users filter relevant information more effectively during searches.

Directory-based search engines differ from crawler-based search engines in that they rely on human editors to review, select, and categorize sites based on predetermined criteria. This results in smaller, more curated databases organized by subject. In contrast, crawler-based engines use automated spiders to index sites, offering broader, more extensive coverage of the web. For users, directory-based engines can offer more focused and high-quality results in specific categories, while crawler-based engines provide access to a wider array of data, potentially requiring more effort to filter for relevance.

Search engines have evolved to offer region-specific search results, enhancing their relevance and accessibility to global users. This development involves tailoring search algorithms and indexes to prioritize content based on location, language, and cultural context. Regional search engines like Google.co.in focus on tailoring results to local interests and language preferences, enhancing user experience by delivering more applicable results. The impact on global user access is significant, as users can receive more pertinent and contextually appropriate information without geographic or language barriers, fostering increased internet utility across diverse populations.

Field searching in databases allows users to target specific portions of an electronic record, such as the title, author, or publication year, which enables more precise retrieval than general keyword searching. This structured approach can filter results based on defined record fields, whereas general keyword searching casts a wider net by identifying relevant terms throughout the entire record. For example, field searching for "Updike, John" in the author field retrieves works specifically by that author, whereas keyword searching might yield broader results.

The user interface is a vital component of a search engine as it facilitates user interaction with the system. It typically includes a search box for typing queries and control elements to launch searches and refine results. The interface translates user queries into formats that the search engine understands, allowing operational commands such as applying Boolean operators or choosing specific search fields. Additionally, a well-designed interface increases usability by making it intuitive for users to access information efficiently.
