Databases and Research Metrics
In the 21st century, innovation and knowledge are the key
drivers of economic development for any country.
Research data management and scholarly databases are
essential for economic and academic progress.
Generating, storing, and meaningfully using the data
related to research outputs are crucial ingredients for
innovation and knowledge creation in any country.
Growth of Research and Challenges
The huge increase in academic papers since World War II has
made it hard to store, organize, and access them.
Over two million papers are published each year, and this
number keeps growing.
Data Overload and Evaluation Issues
Traditional peer review is limited at scale, so supplementary
objective metrics are needed for assessment.
This section explores major scholarly databases and the metrics
used to evaluate research quality (of journals, individuals, and
institutions).
Historical Introduction
Quality research is usually published in peer-reviewed journals,
but with the huge increase in journals and papers, we need
ways to measure their quality.
In 1955, Eugene Garfield suggested using citations, inspired by
legal citations from 1873, to assess journal quality.
He founded the Institute for Scientific Information (ISI) in 1960
and launched the Science Citation Index (SCI) in 1964.
Garfield introduced the Journal Impact Factor (JIF), which
measures how often, in a given year, a journal’s articles from
the previous two years are cited.
This led to the field of Bibliometrics, which analyzes
publications using math and statistics.
JIF became a gold standard for research evaluation. It is used
to judge journal prestige, journal quality, and a researcher’s
success.
ISI later introduced the Social Sciences Citation Index (SSCI,
1972) and the Arts & Humanities Citation Index (AHCI, 1978).
Today these indexes form part of Web of Science (Web of
Knowledge), which, along with Scopus, continues to be one of
the most prestigious subscription databases.
Other databases, such as Google Scholar (launched in 2004), also
boosted the field with free access options.
Databases and Indexing
The first step in any serious research project is a
comprehensive literature search.
Accessing high-quality, peer-reviewed literature is essential but
can be time-consuming without reliable databases.
Trusted academic databases that include top journals and the
best research papers simplify this process significantly.
Selecting the right database is crucial for efficient and
credible research.
Researchers must be aware of the types of databases available
and their limitations.
Databases
Web of Science (WoS) / Web of Knowledge
( [Link] )
Grew out of ISI, founded by Eugene Garfield in 1960; later
acquired by Thomson Reuters, now owned by Clarivate
Analytics.
It covers 254 disciplines with 171 million records and 1.9
billion citations.
Gold standard for citation analysis, journal impact metrics,
and multidisciplinary research.
It offers powerful search tools for journals, books, patents,
and more, covering from the year 1900 to the present day.
Integrates Science Citation Index (SCI), Social
Sciences Citation Index (SSCI), and Arts &
Humanities Citation Index (AHCI).
Core Collection includes:
o Science Citation Index Expanded (SCIE): 8,500+
journals (1900–present).
o Social Sciences Citation Index (SSCI): 3,000+
journals (1900–present).
o Arts & Humanities Citation Index (AHCI): 1,700+
journals (1975–present).
o Emerging Sources Citation Index (ESCI): 5,000+
newer journals.
o Book Citation Index: 60,000+ books (2005–
present).
o Conference Proceedings Citation Index:
160,000+ conferences (1990–present).
Scopus/ScienceDirect
Launched in 2004 by Elsevier.
Scopus is a major database covering 80 million
records (oldest from 1788), 1.5 billion
citations, and 24,000 journals across various fields.
Includes "Articles-in-Press" (4,000 journals).
It provides citation tracking and analytics.
ScienceDirect, also by Elsevier, offers full-text access to 16
million articles and 40,000 e-books, primarily in science,
technology, and health.
Google Scholar:
Launched in 2004,
it’s the largest free academic search engine, indexing
about 390 million documents, including articles, theses,
and books.
It sorts results by relevance, tracks citations, and covers
diverse sources, finding about 88% of citations, including
many that are not found in other databases.
Sources include preprint repositories, universities, and open-access
journals.
Free, broadest coverage, but less curated (includes predatory
journals).
Microsoft Academic
A free search engine launched in 2014 (rebuilt with AI),
indexing over 220 million publications.
It uses semantic search technology and integrates with
Bing, covering 60% of all citations, including many from
Scopus and WoS.
Database | Focus | Key Features
PubMed | Biomedical sciences | 30M+ abstracts, links to PubMed Central (free full-text).
ERIC | Education | 1.3M+ freely accessible education research items.
SSRN | Social sciences | 1M+ preprints; free membership required.
DOAJ | Open-access journals | 12,000 journals, 4.5M+ free articles.
JSTOR | Humanities/Social sciences | 12M+ items; free access to pre-1924 US publications.
IEEE Xplore | Engineering/CS | 5M+ documents (journals, conferences, standards); subscription-based.
CORE | Open-access aggregator | 200M+ free papers; API-supported.
EThOS | UK theses | 500,000+ theses (50% free).
Indian Citation Index (ICI) | Indian journals | Covers 800+ South Asian journals (subscription-based).
arXiv | Preprints (STEM) | Free repository for physics, CS, math, etc.; widely used pre-publication.
CiteSeerX:
A free search engine and digital library focused on
computer and information sciences, started in 1997.
It indexes publicly available documents and pioneered
automated citation indexing, hosted by Penn State
University.
WorldWideScience (WWS):
Federated search across sources from 70+ countries.
Offers automatic translation of results into multiple
languages.
Semantic Scholar:
Launched in 2015 by the Allen Institute for AI.
It uses AI to prioritize high-impact papers,
indexing over 175 million documents across disciplines.
For rigorous metrics: Use WoS or Scopus.
For breadth: Google Scholar or Microsoft Academic.
For open access: DOAJ, CORE, arXiv.
Discipline-specific: PubMed (medicine), IEEE Xplore
(engineering), ERIC (education).
Each database has unique strengths, and researchers should
select based on discipline, accessibility, and evaluation needs.
Indexing
The Importance of Journal Indexing
1. Why Indexing Matters
Indexing ensures research articles are
discoverable by scholars, increasing their impact.
Journals indexed in reputable bibliographic databases
(e.g., WoS or Scopus) are generally regarded as being of
higher quality because of the rigorous vetting process.
Indexing enhances a journal’s prestige and visibility.
Indexed content becomes immediately available to all
users of the database.
Databases may index titles, full articles, abstracts, or
references, depending on their criteria.
2. Types of Indexes
o General Search Engines: Google Scholar, Microsoft
Academic (use web crawlers to index freely
accessible content).
o Disciplinary Databases: PubMed (biomedicine),
IEEE Xplore (engineering), ERIC (education).
o Aggregators: DOAJ (open access), CORE (open
research).
General search engines find content to index by searching
the web with computer programs commonly referred to as
‘crawlers’, ‘spiders’, or ‘bots’.
3. Requirements for Indexing
To get indexed, journals must meet specific requirements,
which vary by database but often include:
ISSN (International Standard Serial Number).
DOIs (Digital Object Identifiers) for all articles.
Editorial Board page with names, titles, and affiliations of
the editors.
Clearly stated, time-bound peer-review policy and
publishing schedule.
Established copyright or intellectual property rights policy.
At least basic article-level metadata to facilitate indexing.
For better indexing, journals should provide full-text articles in
machine-readable formats like HTML or XML, which are more
search-engine and mobile-friendly than PDFs. Some databases,
like PubMed Central, require full-text XML files for the articles.
A journal search should start with general search engines like
Google Scholar and Microsoft Academic, which are
subscription-free and crawl websites using bots. Contrary to
the misconception that they index everything, Google Scholar
and Microsoft Academic apply quality controls so that only
academic content is indexed.
To check if a journal is indexed, search “site:journalURL” in
these engines.
Indexing is non-negotiable for journal credibility and author
visibility. Researchers should prioritize publishing in indexed
journals, while journals must adhere to technical and ethical
standards to qualify for inclusion.
Research Metrics
Research metrics are bibliometric tools used to assess journal
performance and author/institutional impact, primarily based
on number of publications (productivity) and citation count
(impact).
The Journal Impact Factor (JIF) ranks academic journals
based on the citations they receive, as a proxy for their impact
in the scientific community.
Since 1975, the Science Citation Index (SCI) has published JIF
and the Immediacy Index through Journal Citation Reports
(JCR), providing insights into citation data.
From its beginning, the SCI database included details of
institution affiliation of all authors for any article published in a
journal. This facilitated research collaborations and the
globalization of scientific research.
Other metrics, like Eigenfactor, h-index, and Altmetrics, have
emerged alongside JIF.
The logic behind these metrics is that highly cited papers
indicate higher quality research. Journals with a higher citation
average are considered more prestigious.
Self-citations occur when authors cite their own work or when
journals reference their own previously published articles.
While self-citations are generally acceptable, excessive
self-citation may raise concerns among reviewers and analysts.
To ensure fairness in evaluation, self-citations are often
excluded from metric calculations.
Journal Metrics: Impact Factor (JIF)
Journal Impact Factor (JIF) is a widely recognized metric to
assess journal performance, introduced by the Institute for
Scientific Information (ISI, now Clarivate Analytics).
Initially designed in the 1960s to help librarians manage
journal collections, JIF has become a common indicator of
journal quality.
It is published annually in the Journal Citation Reports (JCR)
as part of the Science Citation Index (SCI) and Social
Sciences Citation Index (SSCI).
How JIF is Calculated:
Formula (Impact Factor of a given journal for the year 2019):
A = Citations in 2019 to papers published in 2017-2018
B = Number of papers published in 2017-2018
Impact Factor = A ÷ B
In other words, JIF measures the average number of citations
received in a given year (e.g., 2019) by articles published in
a journal during the two preceding years (e.g., 2017 and 2018).
Example: A JIF of 2.0 means articles from the prior two years
were cited twice on average in the current year.
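The A ÷ B formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the journal figures (300 citations, 150 papers) are hypothetical and not taken from JCR.

```python
def journal_impact_factor(citations: int, papers: int) -> float:
    """Return A / B: citations received in the census year by papers from the
    two preceding years (A), divided by the number of those papers (B)."""
    if papers <= 0:
        raise ValueError("the journal must have published at least one paper")
    return citations / papers


# Hypothetical journal: 150 papers published in 2017-2018 (B),
# which together received 300 citations during 2019 (A).
jif_2019 = journal_impact_factor(citations=300, papers=150)
print(jif_2019)  # 2.0
```

The result of 2.0 matches the worked example: on average, each 2017-2018 paper was cited twice in 2019.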
Purpose and Use:
JCR provides a systematic, quantifiable way to rank,
evaluate, and compare journals across 250+ disciplines,
covering ~23,000 journals in WoS, with ~12,600 receiving
JIFs (9,200 in SCIE, 3,400 in SSCI).
JIF helps clarify citation significance, reducing biases
favouring large, frequent, or older journals.
It’s used in academic evaluations to gauge journal
prestige, aids librarians in collection management, and
helps editors/publishers position journals competitively.
Advertisers use JIF to assess a journal’s market potential.
Journal Evaluation Process:
Clarivate uses 28 criteria (24 for editorial quality, 4 for
impact) to evaluate journals.
Journals meeting quality criteria but not impact criteria are
included in the Emerging Sources Citation Index
(ESCI). Those meeting both can move to SCIE, SSCI, or
AHCI and receive a JIF.
Journals are dynamically assessed; underperforming ones
may be moved to ESCI or excluded.
Limitations of JIF:
Skewed by Outliers: JIF is an average, so a few highly
cited articles can inflate it; it does not reflect how
citations are spread across a journal’s articles.
Citation Context: JIF counts citations without considering
their quality or reason (positive or negative).
Field Differences: JIFs vary across disciplines due to
different citation patterns, hence JIFs are not comparable
across disciplines (e.g., Medicine vs. Mathematics).
Content Types: Citations to non-citable items (editorials,
news) count in the numerator but not denominator,
potentially inflating JIF.
Yearly Fluctuations: Smaller journals show larger JIF
variations due to fewer articles, causing random
fluctuations.
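The outlier limitation can be seen with a small numerical sketch. The citation counts below are invented for illustration; the point is only that a mean (which is what JIF reports) can sit far from the typical article, while the median does not.

```python
from statistics import median


def mean_citations(citation_counts):
    """Average citations per article, the quantity a JIF-style metric reports."""
    return sum(citation_counts) / len(citation_counts)


# Hypothetical citation counts for ten articles; the last one is an outlier.
counts = [0, 1, 1, 2, 2, 3, 3, 4, 4, 180]

print(mean_citations(counts))       # 20.0
print(median(counts))               # 2.5
print(mean_citations(counts[:-1]))  # about 2.22 once the outlier is removed
```

Here a single article lifts the average from roughly 2 to 20, although nine of the ten articles have four citations or fewer.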
Conclusion: JIF is a valuable but imperfect tool for assessing
journal impact. It should be used cautiously with other metrics
(e.g., peer review, subject-specific citation rates) to avoid
misleading conclusions.
For Authors: Prioritize field relevance and open
access over JIF alone.
For Journals: Focus on ethical citation practices to
avoid manipulation.
Immediacy Index
Measures how quickly articles in a journal are cited in
the same year they are published.
Example: For 2023, Immediacy Index = citations in 2023 to
articles published in 2023 ÷ articles published in 2023.
The Immediacy Index is useful for assessing journals in
fast-moving or emerging research fields.
A related idea, the Aggregate Immediacy Index, indicates how
quickly articles in a subject category are cited.
Advantages: Reduces bias favouring large journals, as it’s a per-
article average.
Limitations:
Frequency Bias: Journals with more issues/year have an
advantage.
Time Bias: Articles published late in the year have little chance
to earn citations, skewing results.
Not Predictive: Does not predict long-term citation impact, as
citations often peak after several years.
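As a rough sketch, the Immediacy Index and its category-level counterpart can be computed as below. The function names and all figures are hypothetical, and the aggregate version assumes the category value is simply total same-year citations divided by total articles across the journals in the category.

```python
def immediacy_index(same_year_citations: int, articles_published: int) -> float:
    """Citations received in a year by articles published in that same year,
    divided by the number of those articles."""
    if articles_published <= 0:
        raise ValueError("at least one article must have been published")
    return same_year_citations / articles_published


def aggregate_immediacy_index(journals) -> float:
    """Category-level value from (same_year_citations, articles) pairs.
    Assumption: totals are pooled across journals before dividing."""
    total_citations = sum(citations for citations, _ in journals)
    total_articles = sum(articles for _, articles in journals)
    return immediacy_index(total_citations, total_articles)


# Hypothetical journal: 120 articles in 2023, cited 30 times during 2023.
print(immediacy_index(30, 120))  # 0.25

# Hypothetical subject category made up of three journals.
category = [(30, 120), (5, 80), (65, 200)]
print(aggregate_immediacy_index(category))  # 0.25
```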
5-Year Impact Factor
Average citations over 5 years (e.g., 2023 citations to articles
published in 2018–2022).
Purpose:
o Better for slow-citing disciplines (e.g.,
mathematics, humanities).
o Offers greater stability for smaller journals due to a
larger sample of articles and citations.
o Same limitations as traditional Impact Factor.
Metric | Time Window | Best For | Major Pitfalls
Immediacy Index | Same-year citations | Emerging fields, fast-moving research | Late-year publications penalized
5-Year IF | 5-year citations | Slow-citing fields, small journals | Still skewed by citation disparities
Traditional JIF | 2-year citations | General benchmarking | Field biases, outlier sensitivity
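The only difference between the traditional and the 5-year Impact Factor is the length of the publication window, which the sketch below makes explicit. The function name, its parameters, and the journal data are hypothetical; the data imitate a slow-citing field where older papers attract most of the citations.

```python
def windowed_impact_factor(citations_by_pub_year, papers_by_pub_year,
                           census_year, window):
    """Citations received in census_year by papers published in the `window`
    preceding years, divided by the number of papers from those years."""
    years = range(census_year - window, census_year)
    citations = sum(citations_by_pub_year.get(year, 0) for year in years)
    papers = sum(papers_by_pub_year.get(year, 0) for year in years)
    if papers == 0:
        raise ValueError("no papers were published in the window")
    return citations / papers


# Hypothetical journal in a slow-citing field, census year 2023.
papers = {2018: 50, 2019: 50, 2020: 50, 2021: 50, 2022: 50}
citations_2023 = {2018: 100, 2019: 90, 2020: 80, 2021: 40, 2022: 15}

print(windowed_impact_factor(citations_2023, papers, 2023, window=2))  # 0.55
print(windowed_impact_factor(citations_2023, papers, 2023, window=5))  # 1.3
```

With these invented numbers the 5-year value is more than double the 2-year value, which is the pattern the 5-year window is meant to capture for slow-citing disciplines.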
Impact Per Publication (IPP):
A metric introduced in 2014 by Leiden University’s Centre for
Science and Technology Studies (CWTS) using the Scopus
database.
Measures average citations per publication over a 3-year
window
(e.g., 2019 IPP = 2019 citations to 2016–2018 publications ÷
total publications in 2016–2018).
Similar to JIF but with a longer citation window (3 years vs.
2).
Discontinued in 2016 and replaced by CiteScore.
CiteScore
CiteScore = Citations in Year X to documents published in (X-3
to X-1) ÷ Total documents published in (X-3 to X-1)
Includes more document types than JIF (articles, reviews,
conference papers, editorials, letters, etc.).
Broader coverage: Uses Scopus’s larger
database (vs. WoS for JIF).
Transparency: Free to access via [Link].
Additional Metrics:
o CiteScore Percentile: Ranks journals within subject
categories.
o CiteScore Tracker: Monthly updates for real-time trends.
Limitations:
o Field Bias: Like JIF, it’s not comparable across disciplines.
o Skewed Distributions: A few highly cited papers can inflate
scores.
CiteScore vs. JIF: Key Differences
Metric | CiteScore (Scopus) | Journal Impact Factor (JIF, WoS)
Citation Window | 3 years (e.g., 2023 cites 2020–2022) | 2 years (e.g., 2023 cites 2021–2022)
Database | Scopus (broader coverage) | Web of Science (more selective)
Document Types | Counts all document types in numerator & denominator | Excludes editorials, letters, etc. from denominator
Availability | All Scopus-indexed journals | Only SCIE/SSCI journals (~12,600)
Field Bias | Not normalized | Not normalized
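The "Document Types" row can be illustrated with a small sketch. To isolate the effect of the denominator, it uses one shared window for both calculations (the real metrics also differ in window length and database). The class, the set of "citable" types, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocGroup:
    doc_type: str
    published: int  # documents of this type published in the window
    citations: int  # citations they received in the census year


# Assumption for this sketch: only these types count as "citable items".
CITABLE_TYPES = {"article", "review"}


def jif_style(groups):
    """All citations in the numerator, only citable items in the denominator."""
    numerator = sum(g.citations for g in groups)
    denominator = sum(g.published for g in groups if g.doc_type in CITABLE_TYPES)
    return numerator / denominator


def citescore_style(groups):
    """All document types in both numerator and denominator."""
    return sum(g.citations for g in groups) / sum(g.published for g in groups)


# Hypothetical journal.
journal = [
    DocGroup("article", published=90, citations=180),
    DocGroup("review", published=10, citations=60),
    DocGroup("editorial", published=25, citations=10),
]

print(jif_style(journal))        # 2.5
print(citescore_style(journal))  # 2.0
```

The same citations give a higher JIF-style value because the 25 editorials contribute citations but are left out of the denominator.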
Cited Half-Life vs. Citing Half-Life
The Cited Half-Life and Citing Half-Life metrics help assess
the longevity and relevance of journal articles in academic
research.
Aspect | Cited Half-Life | Citing Half-Life
Definition | Median age of a journal's articles cited in a given year (incoming citations). | Median age of references used by a journal in a given year (outgoing citations).
Purpose | Measures how long a journal’s content remains influential. | Shows how recent or historical a journal’s references are.
Calculation | Based on citations received by the journal. | Based on citations given by the journal.
Interpretation | Short half-life (2-3 yrs): Fast-moving fields (e.g., AI, medicine). Long half-life (10+ yrs): Foundational fields (e.g., math, history). | Short half-life: Cites recent research (trend-driven). Long half-life: Relies on older works (theory-heavy).
Use Cases | Helps librarians decide backfile retention; identifies archivally significant journals. | Reveals journal’s citation habits; useful for publisher strategy (e.g., cutting-edge vs. review-focused).
Limitations | Varies by discipline; does not explain citation context (quality/relevance). | Influenced by journal scope (review vs. original research); database-dependent (WoS/Scopus may differ).
1. Cited Half-Life → Journal’s staying power (how long its
articles are cited).
2. Citing Half-Life → Journal’s research habits (how old
its references are).
3. Both metrics are complementary and help
assess journal impact, archival value, and
disciplinary trends.
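Both half-lives are medians over citation ages, so one simplified routine can sketch either of them. The version below returns a whole number of years and counts the census year itself as age 1; both choices are assumptions of this sketch, and published values are typically fractional. The function name and the data are hypothetical.

```python
from typing import Dict


def half_life(citations_by_pub_year: Dict[int, int], census_year: int) -> int:
    """Smallest age (in years, census year = age 1) such that publications no
    older than that age account for at least half of all citations."""
    total = sum(citations_by_pub_year.values())
    if total == 0:
        raise ValueError("no citations to analyse")
    cumulative = 0
    # Walk from the newest publication year back to the oldest.
    for pub_year in sorted(citations_by_pub_year, reverse=True):
        cumulative += citations_by_pub_year[pub_year]
        if 2 * cumulative >= total:
            return census_year - pub_year + 1
    raise AssertionError("unreachable when citation counts are non-negative")


# Cited half-life: citations RECEIVED in 2023, grouped by the publication
# year of the cited article (hypothetical journal).
received_2023 = {2023: 10, 2022: 30, 2021: 25, 2020: 15, 2019: 10, 2018: 10}
print(half_life(received_2023, census_year=2023))  # 3
```

For the citing half-life, the same function would be fed the references the journal made in 2023, grouped by the publication year of the referenced work.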
Newly Emerged Indicators: Eigenfactor, Article Influence, SNIP, SJR
Metric | Calculation Method | Strengths | Limitations
Eigenfactor | Sum of weighted citations (5-year window), not divided by article count; citations from prestigious journals count more | Accounts for citation quality; reduces self-citation bias | Favors large journals; complex calculation
Article Influence | (Eigenfactor × 0.01) ÷ normalized article count | Normalized per-article impact (avg=1.0); comparable across fields | Still influenced by journal size
SNIP | Citations ÷ expected citations for field (3-year window) | Field-normalized; enables cross-discipline comparisons | Excludes non-citing sources; updates twice yearly
SJR | Prestige-weighted citations ÷ articles (PageRank-like network analysis) | Values citations from high-reputation journals; field-normalized | Complex algorithm; Scopus-only coverage
Key Notes:
1. All metrics improve upon traditional Impact Factor by addressing its biases
2. Eigenfactor/SJR emphasize citation quality (who cites you)
3. SNIP/SJR enable fair field comparisons
4. Article Influence provides per-paper impact assessment
Best for:
Interdisciplinary work → SNIP/SJR
Top-journal rankings → Eigenfactor
Per-article impact → Article Influence
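The idea that "who cites you" matters can be sketched with a generic PageRank-style iteration over a tiny journal citation network. This is not the actual Eigenfactor or SJR algorithm; the damping value, the handling of journals with no outgoing citations, and the four-journal network are all assumptions chosen for illustration.

```python
def prestige_scores(cites, damping=0.85, iterations=100):
    """cites[i][j] = citations from journal i to journal j.
    Returns one prestige score per journal; the scores sum to 1."""
    n = len(cites)
    # Ignore journal self-citations so a journal cannot boost itself.
    out = [[0 if i == j else cites[i][j] for j in range(n)] for i in range(n)]
    out_totals = [sum(row) for row in out]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out_totals[i] == 0:
                    # Journal i cites nobody: spread its weight evenly.
                    share = 1.0 / n
                else:
                    share = out[i][j] / out_totals[i]
                new_scores[j] += damping * scores[i] * share
        scores = new_scores
    return scores


# Hypothetical network, in the order T (top journal), X, Y, O (obscure).
# X and Y each receive exactly 10 citations, but X's come from T
# and Y's come from O, which nobody cites.
cites = [
    [0, 10, 0, 0],   # T cites X
    [10, 0, 0, 0],   # X cites T
    [10, 0, 0, 0],   # Y cites T
    [0, 0, 10, 0],   # O cites Y
]
t, x, y, o = prestige_scores(cites)
print(x > y)  # True: equal raw counts, but X is cited by a more prestigious journal
```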
Author-Level Metrics:
Author-level metrics extend citation-based journal metrics to evaluate individual
researchers’ productivity and impact, aiding career progression and collaboration
insights.
h-index (Introduced by Jorge E. Hirsch in 2005)
Definition: An author has an h-index of h if h of their papers have at least
h citations each. An h-index of 20 means 20 papers with at least 20
citations each.
Measures both productivity (number of papers) and impact (citations).
Strengths:
o Balances quality and quantity
o Not skewed by a few highly-cited papers or many low-cited
papers
o Widely recognized and used
Limitations:
o Varies across databases (typically Google Scholar > Scopus > WoS,
reflecting differences in coverage)
o Self-Citation Bias: Susceptible to self-citation inflation
o Field-dependent (not comparable across disciplines)
o Favors senior researchers over early-career scientists
o Doesn't account for citation quality/prestige, and can miss a
researcher's unique, standout work.
Related Metrics:
h-core: The set of top h cited articles.
h-median: Median citation count in the h-core.
h5-index: h-index restricted to the past 5 years (with h5-core and h5-median).
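The h-index definition, together with the h-core and h-median, can be sketched directly from a list of per-paper citation counts. The function names and the citation list are hypothetical.

```python
from statistics import median


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_core(citations):
    """Citation counts of the top h papers."""
    return sorted(citations, reverse=True)[:h_index(citations)]


# Hypothetical author with seven papers.
papers = [25, 8, 5, 3, 3, 1, 0]

print(h_index(papers))         # 3
print(h_core(papers))          # [25, 8, 5]
print(median(h_core(papers)))  # 8 (the h-median)
```

The fourth-ranked paper has only 3 citations, so h stops at 3 even though the top paper has 25. An h5-index would apply the same function to papers from the past five years only.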
g-index (Proposed by Leo Egghe in 2006)
Definition: The largest number g such that the top g papers have
together received at least g² citations.
Example: If the top 10 articles have at least 100 citations combined (10²),
but the top 11 have fewer than 121 (11²), the g-index is 10.
Key Differences from h-index:
o Gives more weight to highly-cited papers (gives credit to impactful
articles)
o Always ≥ h-index (reflects the influence of top papers)
Limitations:
o Same field-dependence issues as h-index
o Still favors established researchers
o Complex calculation
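A minimal sketch of the g-index follows, assuming g cannot exceed the number of papers (some variants relax this). The function name and the citation list are hypothetical.

```python
def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.
    Assumption: g is capped at the number of papers."""
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


# Hypothetical author with seven papers; the h-index of this list is 3.
papers = [25, 8, 5, 3, 3, 1, 0]
print(g_index(papers))  # 6
```

The top six papers have 45 citations in total (at least 36), while all seven still have 45 (fewer than 49), so g is 6. The single paper with 25 citations pulls g well above the h-index of 3, which is the extra weight on highly cited papers described above.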
Other Author-level Metrics
Metric | Definition | Purpose | Strengths | Limitations
i10-index | Number of publications with ≥10 citations | Measures productivity and visibility | Simple; good for early-career researchers | Less useful in low-citation fields
hc-index (Contemporary h-index) | h-index with time decay (older citations weighted less). Weighs newer articles more heavily than older ones | Reflects recent research impact | Reduces influence of old papers | May undervalue foundational research
m-index | h-index ÷ years since first publication | Normalizes citation impact by career length | Helps compare researchers at different career stages | Can be skewed by early-career researchers with few publications
These metrics complement traditional indicators like the h-index, offering more nuanced
insights into research productivity and influence.
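These three metrics can be sketched as follows. The i10-index and m-index follow the definitions in the table. For the hc-index, the notes only say that older papers are weighted less; the specific age weighting used here (score = gamma × citations ÷ age^delta, with gamma = 4 and delta = 1) is an assumption of this sketch, as are the function names and data.

```python
def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def m_index(h, first_publication_year, current_year):
    """h-index divided by years since first publication."""
    years = current_year - first_publication_year
    if years <= 0:
        raise ValueError("career length must be at least one year")
    return h / years


def hc_index(papers, current_year, gamma=4.0, delta=1.0):
    """h-index over age-discounted scores. `papers` holds (year, citations)
    pairs. gamma and delta are assumed parameters of this sketch."""
    scores = sorted(
        (gamma * cites / (current_year - year + 1) ** delta
         for year, cites in papers),
        reverse=True,
    )
    hc = 0
    for rank, score in enumerate(scores, start=1):
        if score >= rank:
            hc = rank
        else:
            break
    return hc


print(i10_index([25, 12, 10, 9, 3]))                     # 3
print(m_index(12, first_publication_year=2013, current_year=2025))  # 1.0

# Hypothetical record: the plain h-index of these four papers is 4,
# but the 2000 paper is discounted heavily by its age.
record = [(2024, 5), (2020, 30), (2010, 64), (2000, 13)]
print(hc_index(record, current_year=2025))               # 3
```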
Altmetrics
Altmetrics (alternative metrics) measure the broader societal and online impact
of research beyond traditional citations.
Key Altmetrics Tools
1. Altmetric Attention Score ([Link])
o Visualized as a "donut" showing attention sources (e.g., Twitter,
news, blogs).
o Scores reflect the volume and diversity of online engagement.
2. PlumX Metrics (PlumAnalytics)
o Categorizes impact into five areas: usage, captures, mentions,
social media, and citations.
o Tracks both traditional and alternative research outputs (e.g.,
datasets, code).
3. ResearchGate Score
o Measures academic reputation based on peer interactions and
contributions.
o Weighted by the influence of evaluators.
4. Impactstory ([Link])
o Aggregates altmetrics from diverse platforms (e.g., GitHub,
Wikipedia).
o Links to ORCID for researcher profiling.
Uses of Altmetrics
For Researchers: Track real-time engagement, demonstrate societal
impact, and enhance visibility.
For Institutions/Publishers: Showcase research influence, monitor
competitor output, and enrich institutional repositories.
For Librarians: Supplement traditional metrics with article-level impact
data.
Limitations
Manipulation Risk: Susceptible to artificial inflation (e.g., bot-driven
shares).
Popularity ≠ Quality: High altmetrics may reflect public interest, not
scholarly rigor.
Field Bias: More relevant for social sciences/humanities than some STEM
fields.
Unique Identifiers for Research Contributors/Authors
Searching online databases for an author’s publications often returns ambiguous
results due to similar or identical names. Unique Identifiers (UIDs) solve this
by assigning a distinct code to each researcher, ensuring accurate attribution of
scholarly work. Two primary systems provide UIDs:
ResearcherID (available at [Link]): A unique identifier linked
to Web of Science, allowing researchers to manage publication profiles and
track citations.
ORCID (available at [Link]): A non-proprietary, community-driven
identifier used globally across platforms, linking researchers to
publications, datasets, and more.
Creating a UID enables researchers to build online profiles, connect with
collaborators, and join discussion groups based on research interests, enhancing
visibility and networking.