Databases

The document discusses the significance of databases, particularly SQL and relational databases, in data collection and analysis, highlighting their scalability and ACID properties. It explains the evolving concept of 'big data' and how relational databases can efficiently handle large datasets while providing robust transaction support. Additionally, it emphasizes the role of optimization engines in executing SQL queries effectively across different hardware configurations.

Uploaded by

ndolo
Databases, SQL, and Big Data

Collecting and analyzing data is a major activity, so many tools are available for this
purpose. Some of these focus on “big data” (whatever that might mean). Some focus on
consistently storing the data quickly. Some on deep analysis. Some have pretty visual
interfaces; others are programming languages.

SQL and relational databases are a powerful combination that is useful in any arsenal of
tools for analysis, particularly ad hoc analyses:

• A mature and standardized language for accessing data
• Multiple vendors, including open source
• Scalability over a very broad range of hardware
• A non-programming interface for data manipulations
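
As a minimal sketch of what an ad hoc analysis in SQL can look like, the example below asks one question of a small table: how many orders and how much revenue came from each state? It uses Python's built-in sqlite3 module and an in-memory SQLite database only to keep the example self-contained. The orders table, its columns, the sample rows, and the function name are all hypothetical and are not taken from the book's datasets.

```python
import sqlite3

# Hypothetical sample data: (state, order amount).
SAMPLE_ORDERS = [("NY", 20.0), ("CA", 15.0), ("NY", 5.0), ("TX", 10.0)]


def revenue_by_state(rows):
    """Return (state, order_count, total_amount) tuples, largest total first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (state TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO orders (state, amount) VALUES (?, ?)", rows
        )
        # The query states what is wanted, not how to compute it.
        query = """
            SELECT state, COUNT(*) AS num_orders, SUM(amount) AS total
            FROM orders
            GROUP BY state
            ORDER BY total DESC, state
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for state, num_orders, total in revenue_by_state(SAMPLE_ORDERS):
        print(state, num_orders, total)
```

The SQL describes the desired result (one row per state, with a count and a sum) and leaves the looping and grouping to the database, which is the sense in which it is a non-programming interface.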

Before continuing with SQL, it is worth looking at SQL in the context of other tools.

What Is Big Data?

Big data is one of those concepts whose definition changes over time. In the 1800s, when
statistics was first being invented, researchers worked with dozens or hundreds of rows
of data. That might not seem like a lot, but if you have to add everything up with a pencil
and paper, and do long division by hand or using a slide rule, then it certainly seems like
a lot of data.

The concept of big data has always been relative, at least since data processing was
invented. The difference is that now data is measured in gigabytes and terabytes—
enough bytes to fit the text in all the books in the Library of Congress—and we can readily
carry it around with us. The good news is that analyzing “big data” no longer requires
trying to get data to fit into very limited amounts of memory. The bad news is that simply
scrolling through “big data” is not sufficient to really understand it.

This book does not attempt to define “big data.” Relational databases definitely scale well
into the tens of terabytes of data—big by anyone’s definition. They also work efficiently
on smaller datasets, such as the ones accompanying this book.

Relational Databases

Relational databases, which were invented in the 1970s, are now the storehouse of
mountains of data available to businesses. To a large extent, the popularity of relational
databases rests on what are called ACID properties of transactions:

• Atomicity
• Consistency
• Isolation
• Durability

These properties basically mean that when data is stored or updated in a database, it
really is changed. The databases have transaction logs and other capabilities to ensure
that changes really do happen and that modified data is visible when the data
modification step completes. (The data should even survive major failures such as the
operating system crashing.) In practice, databases support transactions, logs, replication,
concurrent access, stored procedures, security, and a host of features suitable for
designing real-world applications.
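
Atomicity, the first of these properties, can be illustrated with a small sketch. The example below assumes SQLite through Python's built-in sqlite3 module; the accounts table, the account names, and the helper functions are hypothetical. A transfer makes two updates inside one transaction. If the second update violates the table's CHECK constraint, the first update is rolled back as well, so the data never shows a half-finished transfer.

```python
import sqlite3


def make_accounts():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move `amount` from source to target as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit the target first, so a failure below has work to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # Raises IntegrityError if the source balance would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    conn = make_accounts()
    print(transfer(conn, "alice", "bob", 30), balances(conn))
    print(transfer(conn, "alice", "bob", 500), balances(conn))
```

In the second call the credit to the target succeeds before the debit fails, yet both balances are unchanged afterward. Durability and isolation need multiple connections or a simulated crash to demonstrate, so this sketch does not cover them.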

From our perspective, a more important attribute of relational databases is their ability
to take advantage of the hardware they are running on—multiple processors, memory,
and disk. When you run a query, the optimization engine first translates the SQL query
into the appropriate lower-level algorithms that exploit the available resources. The
optimization engine is one of the reasons why SQL is so powerful: the same query running
on a slightly different machine or slightly different data might have very different
execution plans. The SQL remains the same; it is the optimization engine that chooses
the best way to execute the code.
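
One way to watch an optimization engine make this choice is to ask the database for its execution plan. The sketch below assumes SQLite, through Python's built-in sqlite3 module, and uses SQLite's EXPLAIN QUERY PLAN statement; other databases have their own equivalents. The orders table, the index name, and the helper functions are hypothetical. The same query text is planned twice, once before and once after an index is created.

```python
import sqlite3

# The query text never changes; only the database's physical design does.
QUERY = "SELECT COUNT(*) FROM orders WHERE customer_id = 42"


def plan_for(conn, sql):
    """Return the plan SQLite reports for `sql` as a list of detail strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column is the readable detail


def compare_plans():
    """Plan and run QUERY before and after adding an index on customer_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders ("
            " order_id INTEGER PRIMARY KEY,"
            " customer_id INTEGER,"
            " amount REAL)"
        )
        conn.executemany(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            [(i % 100, float(i)) for i in range(1000)],
        )
        before = plan_for(conn, QUERY)
        count_before = conn.execute(QUERY).fetchone()[0]

        conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
        after = plan_for(conn, QUERY)
        count_after = conn.execute(QUERY).fetchone()[0]
        return before, after, count_before, count_after
    finally:
        conn.close()


if __name__ == "__main__":
    before, after, count_before, count_after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
    print("same answer:  ", count_before == count_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a scan of the whole table and the second reports a search that uses the index, while the query returns the same count both times.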
