Cassandra

Cassandra is an open source, distributed database management system designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability and proven fault-tolerance. Key versions include 1.2, 2.0, 2.1, 2.2 and 3.0; the latest version listed is 3.10.3004.

Uploaded by

Vignesh Sreenivasan

Table of Contents

About

Chapter 1: Getting started with cassandra
Remarks
Versions
Examples
Installation or Setup
Single node Installation
1. Installing a Debian package (installs Cassandra as a service)
2. Installing any version of Cassandra in form of binary tarball (installs Cassandra as a standalone process)
Multi node installation
Multi DC Cluster Installation

Chapter 2: Cassandra - PHP
Examples
Simple console application

Chapter 3: Cassandra as a Service
Introduction
Examples
Windows
Linux

Chapter 4: Cassandra keys
Examples
Partition key, clustering key, primary key
The PRIMARY KEY syntax
Declaring a key
Examples
Syntax summary
Key ordering and allowed queries

Chapter 5: Connecting to Cassandra
Remarks
Examples
Java: Include the Cassandra DSE Driver
Java: Connect to a Local Cassandra Instance
Java: Connect Using a Singleton

Chapter 6: Repairs in Cassandra
Parameters
Remarks
Examples
Examples for running Nodetool Repair

Chapter 7: Running Repair on Cassandra
Syntax
Parameters
Examples
Running repair on Cassandra
Run repair on a particular partition range.
Run repair on the whole cluster.
Run repair in parallel mode.

Chapter 8: Security
Remarks
Cassandra security resources
Examples
Configuring internal authentication
(Optional) Replace default superuser with custom user
Configuring internal authorization

Credits
About
You can share this PDF with anyone you feel could benefit from it. Download the latest version
from: cassandra

It is an unofficial and free cassandra ebook created for educational purposes. All the content is
extracted from Stack Overflow Documentation, which is written by many hardworking individuals at
Stack Overflow. It is affiliated with neither Stack Overflow nor official cassandra.

The content is released under Creative Commons BY-SA, and the list of contributors to each
chapter is provided in the credits section at the end of this book. Images may be copyright of
their respective owners unless otherwise specified. All trademarks and registered trademarks are
the property of their respective company owners.

Use the content presented in this book at your own risk; it is not guaranteed to be correct or
accurate. Please send your feedback and corrections to info@[Link]

Chapter 1: Getting started with cassandra
Remarks
The Apache Cassandra database is the right choice when you need scalability and high
availability without compromising performance. Linear scalability and proven fault-tolerance on
commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower
latency for your users and the peace of mind of knowing that you can survive regional outages.

PROVEN

Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu,
Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have
large, active data sets.

FAULT TOLERANT

Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple
data centers is supported. Failed nodes can be replaced with no downtime.

PERFORMANT

Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real
applications, primarily because of fundamental architectural choices.

DECENTRALIZED

There are no single points of failure. There are no network bottlenecks. Every node in the cluster
is identical.

SCALABLE

Some of the largest production deployments include Apple's, with over 75,000 nodes storing over
10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search
engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes,
250 TB).

DURABLE

Cassandra is suitable for applications that can't afford to lose data, even when an entire data
center goes down.

YOU'RE IN CONTROL

Choose between synchronous or asynchronous replication for each update. Highly available
asynchronous operations are optimized with features like Hinted Handoff and Read Repair.

ELASTIC

Read and write throughput both increase linearly as new machines are added, with no downtime
or interruption to applications.

PROFESSIONALLY SUPPORTED

Cassandra support contracts and services are available from third parties.

Versions

Version Release Date

1.1.12 2013-11-19

1.1.9 2013-02-11

1.2.12 2013-11-28

1.2.13 2013-12-19

1.2.15 2014-02-19

1.2.16 2014-04-22

1.2.17 2014-06-25

1.2.18 2014-07-04

1.2.19 2014-11-14

1.2.6 2013-07-02

1.2.8 2013-07-27

2.0.10 2014-08-12

2.0.11 2014-10-17

2.0.12 2015-01-14

2.0.13 2015-03-20

2.0.14 2015-04-01

2.0.15 2015-06-01

2.0.16 2015-07-08

2.0.17 2015-09-18

2.0.5 2014-02-13

2.0.6 2014-04-02

2.0.7 2014-04-24

2.0.8 2014-06-13

2.0.9 2014-07-22

2.1.11 2015-10-12

2.1.12 2015-10-22

2.1.2 2014-11-20

2.1.3 2015-03-03

2.1.4 2015-04-01

2.1.5 2015-03-31

2.1.6 2015-06-09

2.1.7 2015-06-18

2.1.8 2015-07-03

2.1.9 2015-09-03

2.2.0 2015-05-14

2.2.0-beta1 2015-05-19

2.2.0-rc1 2015-06-04

2.2.0-rc2 2015-06-30

2.2.1 2015-08-25

2.2.2 2015-09-25

2.2.3 2015-10-12

2.2.4 2015-12-02

3.0.0 2015-01-26

3.0.0-alpha 2015-07-29

3.0.0-alpha1 2015-07-18

3.0.0-beta1 2015-07-10

3.0.0-beta2 2015-09-04

3.0.0-rc1 2015-07-16

3.0.0-rc2 2015-10-16

3.0.1 2015-12-04

3.0.2 2016-01-21

3.0.3 2015-11-24

3.0.4 2016-02-05

3.0.5 2016-04-02

3.0.6 2016-03-31

3.0.7 2016-05-24

3.0.8 2016-05-25

3.2.819 2016-01-05

3.4.950 2016-03-08

3.6.1076 2016-05-02

3.8.1199 2016-06-27

3.10.3004 2016-08-10

(Got this using a bit of awk:

git log --tags --simplify-by-decoration --pretty="format:%ai %d" |
egrep "\(tag: [0-9]" |
awk -F" " '{ print $1 " " $5}' |
awk -F"." '{print $1 "." $2 "." $3}' |
awk -F" " '{print $2 " |" $1}' |
sed 's/)//' | sed 's/,//' |
sort -n | sort -u -t" " -k1,1 |
awk '{print "|" $0 "|"}'

)

Examples
Installation or Setup

Single node Installation

1. Pre-install NodeJS, Python and Java

2. Select your installation document based on your platform
[Link]
3. Download Cassandra binaries from [Link]
4. Untar the downloaded file to <installation location>
5. Start Cassandra using <installation location>/bin/cassandra, or start Cassandra as a
service: [sudo] service cassandra start
6. Check whether Cassandra is up and running using <installation location>/bin/nodetool
status.

Ex:

1. On Windows, run the [Link] file to start the Cassandra server and [Link] to
open a CQL client terminal for executing CQL commands.

There are two ways to carry out a single-node installation.

You should have Oracle Java 8 or OpenJDK 8 (preferred for Cassandra versions > 3.0).

1. Installing a Debian package (installs Cassandra as a service)

Add the Cassandra version to the repository (replace 22x with your own version; for example,
for 2.1 use 21x):

echo "deb-src [Link] 22x main" | sudo tee -a /etc/apt/[Link].d/[Link]
# Update the repository
sudo apt-get update
# Then install it
sudo apt-get install cassandra cassandra-tools
Now Cassandra can be started and stopped using:

sudo service cassandra start

sudo service cassandra stop

Check the status using:

nodetool status

Logs and Data directories are /var/log/cassandra and /var/lib/cassandra respectively.

2. Installing any version of Cassandra in form of binary tarball (installs Cassandra as a standalone process)

Download the Datastax version:

curl -L [Link] | tar

Or download the Apache Cassandra binary tarball manually (from the site
[Link]).

Now untar this:

tar -xvzf dsc-cassandra-version_number-[Link]

Change the directory to install location:

cd install_location

Start Cassandra using:

sudo sh ./bin/cassandra

Stop using:

sudo kill -9 pid

Check:

./bin/nodetool status

And voilà, you have a single-node test cluster for Cassandra. Run cqlsh in the terminal to open
the Cassandra shell.

Cassandra can be configured in [Link], in the conf folder under install_location.

Multi node installation

Multi DC Cluster Installation

Chapter 2: Cassandra - PHP
Examples
Simple console application

Download the Datastax PHP driver for Apache Cassandra from the Git project site, and follow the
installation instructions.

We will be using the "users" table from the KillrVideo app and the Datastax PHP driver. Once
you have Cassandra up and running, create the following keyspace and table:

//Create the keyspace

CREATE KEYSPACE killrvideo WITH REPLICATION =
{ 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

//Use the keyspace

USE killrvideo;

// Users keyed by id
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);

Create a PHP file with your favorite editor. First, we need to connect to our “killrvideo” keyspace
and cluster:

<?php
$cluster = Cassandra::cluster()->build();
$keyspace = 'killrvideo';
$session = $cluster->connect($keyspace);

Let’s insert a user into the “users” table:

$session->execute(new Cassandra\SimpleStatement(
"INSERT INTO users (userid, created_date, email, firstname, lastname)
VALUES (14c532ac-f5ae-479a-9d0a-36604732e01d, '2013-01-01 00:00:00',
'luke@[Link]','Luke','Tillman')"
));

Using the Datastax PHP Cassandra driver, we can query the user by userid:

$result = $session->execute(new Cassandra\SimpleStatement
("SELECT firstname, lastname, email FROM [Link]
WHERE userid=14c532ac-f5ae-479a-9d0a-36604732e01d"));

foreach ($result as $row) {
    printf("user: \"%s\" \"%s\" email: \"%s\" \n", $row['firstname'],
        $row['lastname'], $row['email']);
}

For a user to update their email address in the system:

$session->execute(new Cassandra\SimpleStatement
("UPDATE users SET email = 'language_evangelist@[Link]'
WHERE userid = 14c532ac-f5ae-479a-9d0a-36604732e01d"));

$result = $session->execute(new Cassandra\SimpleStatement
("SELECT firstname, lastname, email FROM [Link]
WHERE userid=14c532ac-f5ae-479a-9d0a-36604732e01d"));

foreach ($result as $row) {
    printf("user: \"%s\" \"%s\" email: \"%s\" \n", $row['firstname'],
        $row['lastname'], $row['email']);
}

Delete the user from the table and output all of the rows. You’ll notice that the user's information
no longer comes back after being deleted:

$session->execute(new Cassandra\SimpleStatement
("DELETE FROM users WHERE userid = 14c532ac-f5ae-479a-9d0a-36604732e01d"));

$result = $session->execute(new Cassandra\SimpleStatement
("SELECT firstname, lastname, email FROM [Link]
WHERE userid=14c532ac-f5ae-479a-9d0a-36604732e01d"));

foreach ($result as $row) {
    printf("user: \"%s\" \"%s\" email: \"%s\" \n", $row['firstname'],
        $row['lastname'], $row['email']);
}

The output should look something like this:

user: "Luke" "Tillman" email: "luke@[Link]"

user: "Luke" "Tillman" email: "language_evangelist@[Link]"

References:

Getting Started with Apache Cassandra and PHP, DataStax Academy

Chapter 3: Cassandra as a Service
Introduction
This topic describes how to start Apache Cassandra as a service on Windows and Linux platforms.
Remember that you can also start Cassandra from the bin directory by running the batch or shell script.

Examples
Windows

1. Download the latest apache commons daemon from Apache Commons Project Distributions.

2. Extract the commons daemon in <Cassandra installed directory>\bin.

3. Rename the extracted folder as daemon.

4. Add <Cassandra installed directory> as CASSANDRA_HOME in a Windows environment variable.

5. Edit the [Link] file in <Cassandra installed directory>\conf and uncomment the
data_file_directories, commitlog_directory, saved_cache_directory and set the absolute
paths.

6. Edit [Link] in <Cassandra installed directory>\bin and replace the value for the
PATH_PRUNSRV as follows:

for 32-bit Windows, set PATH_PRUNSRV=%CASSANDRA_HOME%\bin\daemon\

for 64-bit Windows, set PATH_PRUNSRV=%CASSANDRA_HOME%\bin\daemon\amd64\

7. Edit [Link] and configure SERVICE_JVM for required service name.

SERVICE_JVM="cassandra"

8. With administrator privileges, run [Link] install from command prompt.

Linux

1. Create the /etc/init.d/cassandra startup script.

2. Edit the contents of the file:

#!/bin/sh
#
# chkconfig: - 80 45
# description: Starts and stops Cassandra
# update daemon path to point to the cassandra executable
DAEMON=<Cassandra installed directory>/bin/cassandra

# Start Cassandra and record its process id in the pid file
start() {
    echo -n "Starting Cassandra... "
    $DAEMON -p /var/run/[Link]
    echo "OK"
    return 0
}

# Stop the process whose id was recorded at startup
stop() {
    echo -n "Stopping Cassandra... "
    kill $(cat /var/run/[Link])
    echo "OK"
    return 0
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
esac
exit $?

3. Make the file executable:

sudo chmod +x /etc/init.d/cassandra

4. Add the new service to the list:

sudo chkconfig --add cassandra

5. Now you can manage the service from the command line:

sudo /etc/init.d/cassandra start

sudo /etc/init.d/cassandra stop
sudo /etc/init.d/cassandra restart

Chapter 4: Cassandra keys
Examples
Partition key, clustering key, primary key

Cassandra uses two kinds of keys:

• the Partition Key is responsible for data distribution across nodes

• the Clustering Key is responsible for data sorting within a partition

A primary key is a combination of those two types. The vocabulary depends on the combination:

• simple primary key: only the partition key, composed of one column
• composite partition key: only the partition key, composed of multiple columns
• compound primary key: one partition key with one or more clustering keys.
• composite and compound primary key: a partition key composed of multiple columns and
multiple clustering keys.

The PRIMARY KEY syntax

Declaring a key
The table creation statement should contain a PRIMARY KEY expression. The way you declare it is
very important. In a nutshell:

PRIMARY KEY(partition key)

PRIMARY KEY(partition key, clustering key)

Additional parentheses group multiple fields into a composite partition key or declare a
composite clustering key.

Examples
Simple primary key:

PRIMARY KEY (key)

key is called the partition key.

(for simple primary key, it is also possible to put the PRIMARY KEY expression after the field, i.e. key
int PRIMARY KEY, for example).

Compound primary key:

PRIMARY KEY (key_part_1, key_part_2)

Contrary to SQL, this does not exactly create a composite primary key. Instead, it declares
key_part_1 as the partition key and key_part_2 as the clustering key. Any further field listed in
the PRIMARY KEY expression will also be considered part of the clustering key.

Composite+Compound primary keys:

PRIMARY KEY ((part_key_1, ..., part_key_n), (clust_key_1, ..., clust_key_n))

The first parenthesized group defines the composite partition key; the other columns are the
clustering keys.

Syntax summary
• (part_key)
• (part_key, clust_key)
• (part_key, clust_key_1, clust_key_2)
• (part_key, (clust_key_1, clust_key_2))
• ((part_key_1, part_key_2), clust_key)
• ((part_key_1, part_key_2), (clust_key_1, clust_key_2))
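The forms in the summary above can be read mechanically: the first top-level element is the partition key, and everything after it belongs to the clustering key. The following is a minimal sketch of that reading, not Cassandra's own parser. The class name PrimaryKeySketch and its methods are hypothetical, and the input is assumed to be a well-formed expression written exactly as in the summary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates how the PRIMARY KEY forms above are read.
public class PrimaryKeySketch {

    /** Splits a string on commas that are not nested inside parentheses. */
    static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == ',' && depth == 0) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    /** Removes one pair of enclosing parentheses, if present. */
    static String unwrap(String s) {
        s = s.trim();
        return (s.startsWith("(") && s.endsWith(")")) ? s.substring(1, s.length() - 1) : s;
    }

    /** Columns of the partition key: the first top-level element. */
    static List<String> partitionKey(String expr) {
        List<String> top = splitTopLevel(unwrap(expr));
        return splitTopLevel(unwrap(top.get(0)));
    }

    /** Columns of the clustering key: every element after the first, in order. */
    static List<String> clusteringKey(String expr) {
        List<String> top = splitTopLevel(unwrap(expr));
        List<String> result = new ArrayList<>();
        for (String group : top.subList(1, top.size())) {
            result.addAll(splitTopLevel(unwrap(group)));
        }
        return result;
    }

    public static void main(String[] args) {
        String expr = "((part_key_1, part_key_2), clust_key_1, clust_key_2)";
        System.out.println("partition:  " + partitionKey(expr));
        System.out.println("clustering: " + clusteringKey(expr));
    }
}
```

For a simple primary key such as (part_key), the clustering list is empty.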

Key ordering and allowed queries

The partition key is the minimum specifier needed to perform a query using a where clause.

If you declare a composite clustering key, the order matters.

Say you have the following primary key:

PRIMARY KEY((part_key_1, part_key_2), (clust_key_1, clust_key_2, clust_key_3))

Then, the only valid queries use the following fields in the where clause:

• part_key_1, part_key_2
• part_key_1, part_key_2, clust_key_1
• part_key_1, part_key_2, clust_key_1, clust_key_2
• part_key_1, part_key_2, clust_key_1, clust_key_2, clust_key_3

Examples of invalid queries are:

• part_key_1, part_key_2, clust_key_2

• Anything that does not contain both part_key_1, part_key_2
• ...

If you want to use clust_key_2, you have to also specify clust_key_1, and so on.

So the order in which you declare your clustering keys will have an impact on the type of queries
you can do. In contrast, the order of the partition key fields is not important, since you always
have to specify all of them in a query.
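The rule described above (all partition key columns, plus a gap-free prefix of the clustering columns) can be sketched as a small check. This is a simplified model for illustration only: it assumes plain equality restrictions and ignores secondary indexes and ALLOW FILTERING. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models which WHERE column sets the key ordering allows.
public class KeyRestrictionSketch {

    static boolean isValidRestriction(List<String> partitionKey,
                                      List<String> clusteringKey,
                                      Set<String> whereColumns) {
        // Every partition key column must be present.
        if (!whereColumns.containsAll(partitionKey)) {
            return false;
        }
        // Clustering columns must be used in declared order, without gaps.
        int used = 0;
        boolean gap = false;
        for (String column : clusteringKey) {
            if (whereColumns.contains(column)) {
                if (gap) {
                    return false; // a later column is used while an earlier one is missing
                }
                used++;
            } else {
                gap = true;
            }
        }
        // In this simplified model, columns outside the primary key are not allowed.
        return whereColumns.size() == partitionKey.size() + used;
    }

    public static void main(String[] args) {
        List<String> partitionKey = Arrays.asList("part_key_1", "part_key_2");
        List<String> clusteringKey = Arrays.asList("clust_key_1", "clust_key_2", "clust_key_3");

        Set<String> prefix = new HashSet<>(Arrays.asList("part_key_1", "part_key_2", "clust_key_1"));
        Set<String> skipsFirst = new HashSet<>(Arrays.asList("part_key_1", "part_key_2", "clust_key_2"));

        System.out.println(isValidRestriction(partitionKey, clusteringKey, prefix));
        System.out.println(isValidRestriction(partitionKey, clusteringKey, skipsFirst));
    }
}
```

The first call corresponds to one of the valid combinations listed above; the second skips clust_key_1 and is rejected.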

Chapter 5: Connecting to Cassandra
Remarks
The Cassandra Driver from Datastax very much mirrors the Java JDBC MySQL driver.

Session, Statement, PreparedStatement are present in both drivers.

The Singleton Connection is from this question and answer:

[Link]

Feature-wise, Cassandra 2 and 3 are nearly identical. The main change in Cassandra 3 is a
complete rewrite of the data storage system.

Examples
Java: Include the Cassandra DSE Driver

In your Maven project, add the following to your [Link] file. The following versions are for
Cassandra 3.x.

<dependency>
<groupId>[Link]</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>[Link]</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>[Link]</groupId>
<artifactId>cassandra-driver-extras</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>[Link]</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.0</version>
</dependency>

Java: Connect to a Local Cassandra Instance

Connecting to Cassandra is very similar to connecting to other datasources. By default,
Cassandra does not require credentials.

String cassandraIPAddress = "[Link]";
String cassandraKeyspace = "myKeyspace";
String username = "foo";
String password = "bar";

[Link] cluster = [Link]()
    .addContactPoint(cassandraIPAddress)
    .withCredentials(username, password) // If you have set up a username and password for your node.
    .build();

[Link] session = [Link](cassandraKeyspace);

[Link] metadata = [Link]();

// Output Cassandra connection status
[Link]("Connected to Cassandra cluster: " + [Link]() + " with Partitioner: " + [Link]());

// Loop through your entire Cluster.
for (Host host : [Link]()) {
    [Link]("Cassandra Host Address: " + [Link]() + " | Is Up = " + [Link]());
}

Java: Connect Using a Singleton

public enum Cassandra {

    DB;

    private Session session;
    private Cluster cluster;
    private static final Logger LOGGER = [Link]([Link]);

    /**
     * Connect to the cassandra database based on the connection configuration provided.
     * Multiple calls to this method will have no effect if a connection is already established.
     * @param conf the configuration for the connection
     */
    public void connect(ConnectionCfg conf) {
        if (cluster == null && session == null) {
            cluster = [Link]()
                .withPort([Link]())
                .withCredentials([Link](), [Link]())
                .addContactPoints([Link]())
                .build();
            session = [Link]([Link]());
        }
        Metadata metadata = [Link]();
        [Link]("Connected to cluster: " + [Link]() + " with partitioner: " + [Link]());
        [Link]().stream().forEach((host) -> {
            [Link]("Cassandra datacenter: " + [Link]() + " | address: " +
                [Link]() + " | rack: " + [Link]());
        });
    }

    /**
     * Invalidate and close the session and connection to the cassandra database
     */
    public void shutdown() {
        [Link]("Shutting down the whole cassandra cluster");
        if (null != session) {
            [Link]();
        }
        if (null != cluster) {
            [Link]();
        }
    }

    public Session getSession() {
        if (session == null) {
            throw new IllegalStateException("No connection initialized");
        }
        return session;
    }
}

Using the Singleton connection

public void cassandra() throws Exception {
    [Link]();
    [Link]().execute(/* CQL | Statement | PreparedStatement */);
    [Link]();
}

Chapter 6: Repairs in Cassandra
Parameters

Option/Flag Description

option option description

-h hostname. Defaults to "localhost." If you do not specify a host repair is run on the same host that the command is executed from.

-p JMX port. The default is 7199.

-u username. Only required if JMX security is enabled.

-pw password. Only required if JMX security is enabled.

flag flag description

-local Only compare and stream data from nodes in the "local" data center.

-pr "Partitioner Range" repair. Only repair the primary token range for a replica. Faster than repairing all ranges of your replicas, as it prevents repairing the same data multiple times. Note that if you use this option for repairing one node, you must also use it for the rest of your cluster, as well.

-par Run repairs in parallel. Gets repairs done faster, but significantly restricts the cluster's ability to handle requests.

-hosts Allows you to specify a comma-delimited list of nodes to stream your data from. Useful if you have nodes that are known to be "good." While it is documented as a valid option for Cassandra 2.1+, it also works with Cassandra 2.0.

Remarks
Cassandra Anti-Entropy Repairs:

Anti-entropy repair in Cassandra has two distinct phases. To run successful, performant repairs, it
is important to understand both of them.

• Merkle Tree calculations: This computes the differences between the nodes and their
replicas.

• Data streaming: Based on the outcome of the Merkle Tree calculations, data is scheduled
to be streamed from one node to another. This is an attempt to synchronize the data
between replicas.
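The two phases above can be illustrated with a deliberately simplified sketch. It is not Cassandra's implementation: it assumes integer tokens, fixed-width token buckets as the tree's leaves, and SHA-256 as the hash, and all names are hypothetical. Each replica hashes its buckets, the roots are compared, and only the buckets whose hashes differ would be scheduled for streaming.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of hash-tree comparison between two replicas.
public class MerkleSketch {

    static byte[] sha256(byte[]... chunks) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] chunk : chunks) md.update(chunk);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** One leaf hash per token bucket [i * width, (i + 1) * width). */
    static List<byte[]> leafHashes(SortedMap<Integer, String> rows, int buckets, int width) {
        List<byte[]> leaves = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            StringBuilder sb = new StringBuilder();
            rows.subMap(i * width, (i + 1) * width)
                .forEach((token, value) -> sb.append(token).append('=').append(value).append(';'));
            leaves.add(sha256(sb.toString().getBytes(StandardCharsets.UTF_8)));
        }
        return leaves;
    }

    /** Root hash, built by repeatedly hashing pairs of nodes (needs at least one leaf). */
    static byte[] root(List<byte[]> level) {
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : level.get(i);
                next.add(sha256(level.get(i), right));
            }
            level = next;
        }
        return level.get(0);
    }

    /** Indices of buckets whose hashes differ: the ranges that would need streaming. */
    static List<Integer> mismatchedBuckets(List<byte[]> a, List<byte[]> b) {
        List<Integer> diff = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            if (!Arrays.equals(a.get(i), b.get(i))) diff.add(i);
        }
        return diff;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> replicaA = new TreeMap<>();
        SortedMap<Integer, String> replicaB = new TreeMap<>();
        for (int token = 0; token < 40; token++) {
            replicaA.put(token, "v1");
            replicaB.put(token, "v1");
        }
        replicaB.put(25, "stale"); // replica B holds a different value for token 25

        List<byte[]> a = leafHashes(replicaA, 4, 10);
        List<byte[]> b = leafHashes(replicaB, 4, 10);
        System.out.println("roots equal: " + Arrays.equals(root(a), root(b)));
        System.out.println("buckets to stream: " + mismatchedBuckets(a, b));
    }
}
```

Comparing hashes first means that only the bucket containing token 25 is flagged, rather than the whole data set.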

Stopping a Repair:

You can stop a repair by issuing a STOP VALIDATION command from nodetool:

$ nodetool stop validation

How do I know when repair is completed?

You can check for the first phase of repair (Merkle Tree calculations) by checking nodetool
compactionstats.

You can check for repair streams using nodetool netstats. Repair streams will also be visible in
your logs. You can grep for them in your system logs like this:

$ grep Entropy [Link]

INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,077 [Link] (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /[Link]
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,081 [Link] (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /[Link]
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,091 [Link] (line 221) [repair #70c35af0-526e-11e6-8646-8102d8573519] test_users is fully synced
INFO [AntiEntropySessions:4] 2016-07-25 07:32:47,091 [Link] (line 282) [repair #70c35af0-526e-11e6-8646-8102d8573519] session completed successfully

Active repair streams can also be monitored with this (Bash) command:

$ while true; do date; diff <(nodetool -h [Link] netstats) <(sleep 5 && nodetool -h [Link] netstats); done

ref: how do i know if nodetool repair is finished

How to check for stuck or orphaned repair streams?

On each node, you can monitor this with nodetool tpstats, and check for anything "blocked" on the
"AntiEntropy" lines.

$ nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
...
AntiEntropyStage 0 0 854866 0 0
...
AntiEntropySessions 0 0 2576 0 0
...

Examples
Examples for running Nodetool Repair

Usage:

$ nodetool repair [-h | -p | -pw | -u] <flags> [ -- keyspace_name [table_name]]

Default Repair Option

$ nodetool repair

This command repairs the current node's primary token range (i.e. the range which it owns) along
with the replicas of other token ranges it holds, in all tables and all keyspaces on the current node.

For example, with a replication factor of 3, a total of 5 nodes will be involved in the repair:

• 2 nodes will be fixing 1 partition range
• 2 nodes will be fixing 2 partition ranges
• 1 node will be fixing 3 partition ranges (the node on which the command was run)
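The node counts above can be reproduced with a short sketch. It assumes a simple ring in which the range owned by node p is replicated to nodes p, p+1, ..., p+RF-1, which is an assumption about replica placement made for illustration. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: which nodes take part when one node runs a default repair.
public class RepairScopeSketch {

    /**
     * Returns, for every node involved, how many of the ranges held by the
     * target node it also replicates.
     */
    static Map<Integer, Integer> sharedRanges(int nodes, int rf, int target) {
        Map<Integer, Integer> shared = new TreeMap<>();
        // The target holds its own primary range plus those of its rf - 1 predecessors.
        for (int k = 0; k < rf; k++) {
            int primary = Math.floorMod(target - k, nodes);
            // Every replica of that range takes part in repairing it.
            for (int r = 0; r < rf; r++) {
                int replica = (primary + r) % nodes;
                shared.merge(replica, 1, Integer::sum);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Ring of 10 nodes, replication factor 3, repair started on node 5.
        System.out.println(sharedRanges(10, 3, 5));
    }
}
```

With these inputs the map has five entries: node 5 shares three ranges, nodes 4 and 6 share two each, and nodes 3 and 7 share one each.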

Repair in Parallel

$ nodetool repair -par

This command performs the same task as the default repair, but runs the repair in
parallel on the nodes containing replicas.

Repair Primary Token Range

This command repairs only the primary token range of the node in all tables and all keyspaces on
the current node:

$ nodetool repair -pr

Repair only the local Data Center on which the node resides:

$ nodetool repair -pr -local

Repair only the primary range for all replicas in all tables and all keyspaces on the current node,
only by streaming from the listed nodes:

$ nodetool repair -pr -hosts [Link], [Link], [Link]

Repair only the primary range for all replicas in the stackoverflow keyspace on the current node:

$ nodetool repair -pr -- stackoverflow

Repair only the primary range for all replicas in the test_users table of the stackoverflow keyspace
on the current node:

$ nodetool repair -pr -- stackoverflow test_users
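When repairs are scheduled from application code, the command lines above can be assembled programmatically. The following sketch only builds the argument list and uses just the flags shown in this chapter (-pr, -local and the -- separator). The class and method names are hypothetical, and actually launching the process (for example with ProcessBuilder) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles a nodetool repair command line.
public class RepairCommandSketch {

    static List<String> repairCommand(boolean primaryRange, boolean localDc,
                                      String keyspace, String... tables) {
        List<String> cmd = new ArrayList<>(Arrays.asList("nodetool", "repair"));
        if (primaryRange) cmd.add("-pr");
        if (localDc) cmd.add("-local");
        if (keyspace != null) {
            cmd.add("--"); // separates flags from the keyspace and table names
            cmd.add(keyspace);
            cmd.addAll(Arrays.asList(tables));
        }
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", repairCommand(true, false, "stackoverflow", "test_users")));
        System.out.println(String.join(" ", repairCommand(true, true, null)));
    }
}
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the list is later handed to a process launcher.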

Chapter 7: Running Repair on Cassandra
Syntax
• Synopsis

• nodetool [node-options] repair [other-options]

• Node options

• [(-h <host> | --host <host>)]

• [(-p <port> | --port <port>)]

• [(-pw <password> | --password <password>)]

• [(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]

• [(-u <username> | --username <username>)]

• Other options

• [(-dc <specific_dc> | --in-dc <specific_dc>)...]

• [(-dcpar | --dc-parallel)]

• [(-et <end_token> | --end-token <end_token>)]

• [(-full | --full)]

• [(-hosts <specific_host> | --in-hosts <specific_host>)...]

• [(-j <job_threads> | --job-threads <job_threads>)]

• [(-local | --in-local-dc)]

• [(-pl | --pull)]

• [(-pr | --partitioner-range)]

• [(-seq | --sequential)]

• [(-st <start_token> | --start-token <start_token>)]

• [(-tr | --trace)]

• [--]

• [<keyspace> <tables>...]

Parameters

Parameter Details

-dc <specific_dc>, --in-dc <specific_dc> Use -dc to repair specific datacenters

-dcpar, --dc-parallel Use -dcpar to repair data centers in parallel.

-et <end_token>, --end-token <end_token> Use -et to specify a token at which repair range ends

-full, --full Use -full to issue a full repair.

-h <host>, --host <host> Node hostname or ip address

-hosts <specific_host>, --in-hosts <specific_host> Use -hosts to repair specific hosts

-j <job_threads>, --job-threads <job_threads> Number of threads to run repair jobs. Usually this means number of CFs to repair concurrently. WARNING: increasing this puts more load on repairing nodes, so be careful. (default: 1, max: 4)

-local, --in-local-dc Use -local to only repair against nodes in the same datacenter

-p <port>, --port <port> Remote jmx agent port number

-pl, --pull Use --pull to perform a one way repair where data is only streamed from a remote node to this node.

-pr, --partitioner-range Use -pr to repair only the first range returned by the partitioner

-pw <password>, --password <password> Remote jmx agent password

-pwf <passwordFilePath>, --password-file <passwordFilePath> Path to the JMX password file

-seq, --sequential Use -seq to carry out a sequential repair

-st <start_token>, --start-token <start_token> Use -st to specify a token at which the repair range starts

-tr, --trace Use -tr to trace the repair. Traces are logged to system_traces.events.

-u <username>, --username <username> Remote jmx agent username

-- This option can be used to separate command-line options from the list of arguments (useful when arguments might be mistaken for command-line options)

[<keyspace> <tables>...] The keyspace followed by one or many tables

Examples
Running repair on Cassandra

Run repair on a particular partition range.

nodetool repair -pr

Run repair on the whole cluster.

nodetool repair

Run repair in parallel mode.

nodetool repair -par

Chapter 8: Security
Remarks

Cassandra security resources

• CQL: Database roles syntax definition
• CQL: List of object permissions
• DataStax Documentation: Internal authentication
• DataStax Documentation: Internal authorization

Examples
Configuring internal authentication

Cassandra will not require users to login using the default configuration. Instead password-less,
anonymous logins are permitted for anyone able to connect to the native_transport_port. This
behaviour can be changed by editing the [Link] config to use a different authenticator:

# Allow anonymous logins without authentication

# authenticator: AllowAllAuthenticator

# Use username/password based logins

authenticator: PasswordAuthenticator

The login credentials validated by PasswordAuthenticator will be stored in the internal system_auth
keyspace. By default, the keyspace will not be replicated across all nodes. You'll have to change
the replication settings to make sure that Cassandra will still be able to read user credentials from
local storage in case other nodes in the cluster cannot be reached, or else you might not be able
to log in!

For SimpleStrategy (where N is the number of nodes in your cluster):

ALTER KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy',
'replication_factor': N};

For NetworkTopologyStrategy (where N is the number of nodes in the corresponding data center):

ALTER KEYSPACE system_auth WITH replication = { 'class' : 'NetworkTopologyStrategy',
'datacenter1' : N };
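For clusters with several data centers, the statement above needs one entry per data center. The following sketch generates it from a map of data center names to node counts. The class and method names and the sample data center name are hypothetical, and the result is only a string to be run through cqlsh or a driver session.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the ALTER KEYSPACE statement for system_auth.
public class SystemAuthReplicationSketch {

    static String alterSystemAuth(Map<String, Integer> nodesPerDatacenter) {
        // One "'name' : count" entry per data center, in map iteration order.
        String datacenters = nodesPerDatacenter.entrySet().stream()
            .map(e -> "'" + e.getKey() + "' : " + e.getValue())
            .collect(Collectors.joining(", "));
        return "ALTER KEYSPACE system_auth WITH replication = "
            + "{ 'class' : 'NetworkTopologyStrategy', " + datacenters + " };";
    }

    public static void main(String[] args) {
        Map<String, Integer> nodes = new LinkedHashMap<>();
        nodes.put("datacenter1", 3); // assumed: three nodes in datacenter1
        System.out.println(alterSystemAuth(nodes));
    }
}
```

A LinkedHashMap is used so that the data centers appear in the statement in insertion order.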

Restart each node after the changes described above. You'll now only be able to login using the
default superuser:

cqlsh -u cassandra -p cassandra

(Optional) Replace default superuser with custom user
Using a default superuser with a standard password isn't much safer than using no user at all. You
should create your own user instead using a safe and unique password:

CREATE ROLE myadminuser WITH PASSWORD = 'admin123' AND LOGIN = true AND SUPERUSER = true;

Log in using your new user: cqlsh -u myadminuser -p admin123

Now disable login for the standard cassandra user and remove the superuser status:

ALTER ROLE cassandra WITH LOGIN = false AND SUPERUSER = false;

Configuring internal authorization

By default, each user will be able to access all data in Cassandra. You'll have to configure a
different authorizer in your [Link] to grant individual object permissions to your users.

# Grant all permissions to all users

# authorizer: AllowAllAuthorizer

# Use object permissions managed internally by Cassandra

authorizer: CassandraAuthorizer

Permissions for individual users will be stored in the internal system_auth keyspace. You should
change the replication settings in case you haven't already done so while enabling password
based authentication.

For SimpleStrategy (where N is the number of nodes in your cluster):

ALTER KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy',
'replication_factor': N};

For NetworkTopologyStrategy (where N is the number of nodes in the corresponding data center):

ALTER KEYSPACE system_auth WITH replication = { 'class' : 'NetworkTopologyStrategy',
'datacenter1' : N };

Restart each node after the changes described above. You'll now be able to set permissions using
e.g. the following commands.

Grant all permissions for specified keyspace and role:

GRANT ALL ON KEYSPACE keyspace_name TO role_name;

Grant read permissions on all keyspaces:

GRANT SELECT ON ALL KEYSPACES TO role_name;

Allow execution of INSERT, UPDATE, DELETE and TRUNCATE statements on a certain keyspace:

GRANT MODIFY ON KEYSPACE keyspace_name TO role_name;

Allow changing the keyspace, its tables and its indices for a certain keyspace:

GRANT ALTER ON KEYSPACE keyspace_name TO role_name;
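When several roles need the same set of permissions, the GRANT statements above can be generated instead of typed by hand. The sketch below is an illustration under stated assumptions: the function name grant_statements, the role name and the keyspace name are hypothetical, and only the four permissions shown in this section are accepted. It returns CQL strings and does not execute them.

```python
# Permissions used in this section; Cassandra defines more than these.
KNOWN_PERMISSIONS = {"ALL", "SELECT", "MODIFY", "ALTER"}


def grant_statements(role, keyspace, permissions):
    """Return one GRANT statement per permission for a keyspace and role."""
    statements = []
    for permission in permissions:
        name = permission.upper()
        if name not in KNOWN_PERMISSIONS:
            raise ValueError(f"unsupported permission: {permission}")
        statements.append(f"GRANT {name} ON KEYSPACE {keyspace} TO {role};")
    return statements


# Hypothetical role and keyspace names
for statement in grant_statements("app_writer", "shop", ["select", "modify"]):
    print(statement)
```

Rejecting unknown permission names early keeps a typo from reaching cqlsh as a malformed statement.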

Please note that permissions are cached for permissions_validity_in_ms (cassandra.yaml), so
changes might not take effect immediately.


Credits

S. No  Chapters                        Contributors
1      Getting started with cassandra  Community, muru, perennial_noob, phact, Ravinder Matte, sourav, Stephen Leppik
2      Cassandra - PHP                 Aaron, ethrbunny, Renjith VR
3      Cassandra as a Service          Shoban Sundar
4      Cassandra keys                  Derlin
5      Connecting to Cassandra         geeves
6      Repairs in Cassandra            Aaron, Akshay
7      Running Repair on Cassandra     muru, Ravinder Matte
8      Security                        Stefan Podkowinski

