Data Manipulation and Visualization with Pandas

The document provides a comprehensive guide on data manipulation and visualization using Python libraries such as Pandas, Matplotlib, and Seaborn. It covers various operations including data frame creation, filtering, sorting, plotting different types of graphs, and performing statistical analyses like correlation and regression. Additionally, it includes examples of logistic regression and ROC curves for classification tasks.

1. Working with Pandas data frames: perform data manipulation operations using the pandas package.
i. Print datatypes
import pandas as pd
data = pd.read_csv('[Link]')
df = pd.DataFrame(data)
print(df.dtypes)
Output:

ii. Access a column by name
print(df['Name'])
Output:

iii. Print head, tail


print('First Three Rows:')
print(df.head(3))
print('Last Three Rows:')
print(df.tail(3))
Output:
iv. Get info
df.info()
Output:

v. Print shape
print("Shape:", df.shape)
Output:

vi. Adding and deleting row and column


# The list must have one entry per row (this example assumes a 10-row frame).
df['Address'] = ['Addr1', 'Addr2', 'Addr3', 'Addr4', 'Addr5', 'Addr6', 'Addr7',
                 'Addr8', 'Addr9', 'Addr10']
print("\nDataFrame after adding Address column:")
print(df)
df = df.drop('Address', axis=1)
print("\nDataFrame after deleting Address column:")
print(df)
Output:

vii. Filtering rows


filtered_df = df[df['Height'] == 5.5]
print("\nFiltered DataFrame (Height == 5.5):")
print(filtered_df)
Output:

viii. Sorting dataframe, aggregating data


sorted_df = df.sort_values('Height', ascending=True)
print("\nSorted DataFrame:")
print(sorted_df)
average_height = df['Height'].mean()
print("\nAverage Height:", average_height)
Output:
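Aggregation often goes beyond a single column mean. The following is a minimal sketch of grouped aggregation with groupby and agg; it assumes a small in-memory frame with hypothetical Team and Height columns rather than the CSV file used above.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
people = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B"],
    "Height": [5.5, 6.0, 5.0, 5.5, 6.0],
})

# One row per team with the mean, minimum, and maximum height.
summary = people.groupby("Team")["Height"].agg(["mean", "min", "max"])
print(summary)
```

Passing a list of function names to agg yields one output column per statistic, which keeps the summary in a single table.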

ix. Dropping duplicates, merging


import pandas as pd
# Remove fully duplicated rows in place.
df.drop_duplicates(inplace=True)
print(df)
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 35, 32]})
df2 = pd.DataFrame({
    'ID': [3, 4, 5, 6],
    'City': ['New York', 'Berlin', 'Paris', 'Tokyo'],
    'Salary': [70000, 80000, 50000, 60000]})
# An inner merge keeps only the IDs present in both frames (3 and 4).
inner_merge = pd.merge(df1, df2, on='ID', how='inner')
print("\nInner Merge:")
print(inner_merge)
Output:
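The inner merge is only one of several join types. As a hedged sketch, the snippet below reuses the ID, Name, and City values from the listing to compare how="inner", how="left", and how="outer"; the indicator=True flag adds a _merge column showing which frame each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3, 4],
                    "Name": ["John", "Anna", "Peter", "Linda"]})
df2 = pd.DataFrame({"ID": [3, 4, 5, 6],
                    "City": ["New York", "Berlin", "Paris", "Tokyo"]})

inner = pd.merge(df1, df2, on="ID", how="inner")   # IDs in both frames
left = pd.merge(df1, df2, on="ID", how="left")     # every row of df1
outer = pd.merge(df1, df2, on="ID", how="outer", indicator=True)  # union of IDs

print(len(inner), len(left), len(outer))
print(outer)
```

Rows without a match on the other side receive NaN in the missing columns, which is why the left and outer results are longer than the inner one.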

2. Working on frequency distribution, averages, and variability.


i. Frequency table using value counts
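import pandas as pd
import numpy as np
data = pd.read_csv('[Link]')
# Method name lost in the source; head() is assumed here to preview the data.
print(data.head())
# value_counts() returns the number of rows per species.
df = data['species'].value_counts()
print(df)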
Output:

ii. One-way frequency table using crosstab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('[Link]')
# A constant string as the second argument produces a single count column.
freq_table = pd.crosstab(data['species'], 'no_of_species')
print(freq_table)
Output:

iii. Two-way frequency table
import pandas as pd
import numpy as np
data = pd.read_csv('[Link]')
# Rows are shipping modes, columns are customer segments.
freq_table = pd.crosstab(data['Ship Mode'], data['Segment'])
print(freq_table)
Output:

iv. Average, variance, and standard deviation


import pandas as pd
df = pd.read_csv('[Link]')
average = df['petal_length'].mean()
print("Average:", average)
variance = df['petal_length'].var()
print("Variance:", variance)
standard_deviation = df['petal_length'].std()
print("Standard Deviation:", standard_deviation)
Output:
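The pandas var() and std() methods compute sample statistics (divisor n - 1) by default, whereas NumPy's np.var() defaults to the population form (divisor n). A small sketch with made-up numbers illustrates the difference; the values are illustrative only and unrelated to the petal_length column.

```python
import numpy as np
import pandas as pd

# Illustrative values (mean 5, sum of squared deviations 32).
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_var = values.var()              # pandas default: ddof=1, divides by n - 1
population_var = values.var(ddof=0)    # divides by n
numpy_var = np.var(values.to_numpy())  # NumPy default: ddof=0

print("Sample variance:", sample_var)
print("Population variance:", population_var)
print("NumPy variance:", numpy_var)
```

Passing ddof explicitly avoids surprises when results from the two libraries are compared.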

3. Develop a program for Basic plots using Matplotlib and seaborn/SKL.


i. Bar graph
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('bmh')
df = pd.read_csv('[Link]')
x = df['Age']
y = df['Height']
plt.xlabel('Age', fontsize=18)
plt.ylabel('Height', fontsize=16)
plt.bar(x, y)
plt.show()
Output:
ii. Line graph with grid
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('bmh')  # the bmh style draws a grid by default
df = pd.read_csv('[Link]')
x = df['Price']
y = df['Year']
plt.xlabel('Price', fontsize=18)
plt.ylabel('Year', fontsize=16)
plt.plot(x, y)
plt.show()
Output:
iii. Line graph without grid
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('bmh')
df = pd.read_csv('[Link]')
x = df['Price']
y = df['Year']
plt.xlabel('Price', fontsize=18)
plt.ylabel('Year', fontsize=16)
plt.plot(x, y)
plt.grid(False)  # switch off the grid that the bmh style enables
plt.show()
Output:
iv. Scatter plot
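import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('bmh')
df = pd.read_csv('[Link]')
x = df['Price']
y = df['Year']
# Axis labels match the plotted data: Price on x, Year on y.
plt.xlabel('Price', fontsize=18)
plt.ylabel('Year', fontsize=16)
plt.scatter(x, y)
plt.show()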
Output:
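v. Dendrogram
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
np.random.seed(42)
# Generator name lost in the source; uniform np.random.rand is assumed.
data = np.random.rand(10, 2)
# Ward linkage merges the pair of clusters that least increases variance.
linked = linkage(data, method='ward')
plt.figure(figsize=(7, 5))
dendrogram(linked,
           orientation='top',
           distance_sort='descending',
           show_leaf_counts=True)
plt.title('Dendrogram of Random Data')
plt.xlabel('Index')
plt.ylabel('Euclidean Distance')
plt.show()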
Output:

vi. Histogram
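import matplotlib.pyplot as plt
import numpy as np
# 250 normally distributed values with mean 170 and standard deviation 150.
x = np.random.normal(170, 150, 250)
plt.hist(x)
plt.show()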
Output:

vii. Pie chart


import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('bmh')
df = pd.read_csv('[Link]')
x = df['Price']
y = df['Year']
# Wedge sizes come from y; each wedge is labelled with the matching x value.
plt.pie(y, labels=x, radius=1.2, autopct='%0.1f%%', shadow=True)
plt.show()
Output:
viii. Multiple graph
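import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.01)
# The plotted function names were lost in the source; sin and cos are assumed.
plt.subplot(3, 1, 1)
plt.plot(x, np.sin(x))
plt.subplot(3, 1, 2)
plt.plot(x, np.cos(x))
plt.subplot(3, 1, 3)
plt.plot(x, x)
plt.show()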
Output:
4. Develop a program for Normal Curves.
i. Normal distribution
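import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
# Two normally distributed columns (randn draws from the standard normal).
data = pd.DataFrame({
    'X': np.random.randn(50) * 10,
    'Y': 2 * np.random.randn(50) * 10 + 5})
# Plot function name lost in the source; jointplot is assumed because it
# accepts x, y, data, and height and shows each variable's distribution.
sns.jointplot(x='X', y='Y', data=data, height=4)
plt.show()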
Output:
ii. Simple plot
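import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
# Plot function name lost in the source; lineplot is assumed.
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plt.show()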
Output:
iii. Customizing graph
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
# Set the style before plotting; it only affects axes created afterwards.
sns.set_style("dark")
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plt.show()
Output:
iv. Scaling plot
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
# Set the context before plotting so the scaling applies to this figure.
sns.set_context("paper")
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plt.show()
Output:
v. Changing figure size
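import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
plt.figure(figsize=(4, 3))
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
# Call name lost in the source; despine() (remove top and right spines) is assumed.
sns.despine()
plt.show()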
Output:
vi. Multiple plot

vii. Box plot


import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
sns.boxplot(x='species', y='sepal_width', data=data)
plt.show()
Output:
viii. Strip plot
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
sns.stripplot(x='species', y='sepal_width', data=data)
plt.show()
Output:
ix. Swarm plot
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset('tips')
plt.figure(figsize=(8, 6))
sns.swarmplot(x='day', y='total_bill', data=data)
plt.title('Swarm Plot of Total Bill by Day')
plt.show()
Output:
5. Develop a program for Correlation and scatter plots
i. Pearson correlation
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr
seed(1)
# data2 is data1 plus independent noise, so the two are positively correlated.
data1 = 20 * randn(1000) + 100
data2 = data1 + (10 * randn(1000) + 50)
corr, _ = pearsonr(data1, data2)
print("Pearson's correlation: %.3f" % corr)
Output:

ii. Spearman correlation


from numpy.random import randn
from numpy.random import seed
from scipy.stats import spearmanr
seed(1)
data1 = 20 * randn(1000) + 100
data2 = data1 + (10 * randn(1000) + 50)
# Spearman's coefficient is the Pearson correlation of the ranks.
corr, _ = spearmanr(data1, data2)
print("Spearman's correlation: %.3f" % corr)
Output:

iii. Correlation matrix


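import numpy as np
# No trailing comma after the list: a comma would wrap x in a tuple.
x = [215, 325, 185, 332, 406, 522, 412,
     614, 544, 421, 445, 408]
y = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
     19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
# 2x2 matrix; the off-diagonal entries hold the correlation between x and y.
matrix = np.corrcoef(x, y)
print(matrix)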
Output:
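The off-diagonal entry of the correlation matrix is the Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below evaluates that formula directly on the same x and y values and compares it with np.corrcoef; it is meant as a check on the definition, not as a replacement for the library call.

```python
import numpy as np

x = np.array([215, 325, 185, 332, 406, 522, 412,
              614, 544, 421, 445, 408], dtype=float)
y = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
              19.4, 25.1, 23.4, 18.1, 22.6, 17.2])

# Deviations from the mean of each variable.
dx = x - x.mean()
dy = y - y.mean()

# Pearson r from its definition.
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
r_numpy = np.corrcoef(x, y)[0, 1]

print("Manual r:", r_manual)
print("NumPy r: ", r_numpy)
```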

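iv. Cross correlation analysis (NumPy, SciPy, and statsmodels)
NumPy
import numpy as np
t = np.arange(0, 10, 0.1)
# The waveform function names were lost in the source; np.sin is assumed.
# Parentheses allow the expression to span several lines.
signal1 = (np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 3 * t) +
           np.random.normal(0, 0.1, len(t)))
signal2 = (np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 3 * t) +
           np.random.normal(0, 0.1, len(t)))
# Zero-lag Pearson correlation between the two signals.
numpy_correlation = np.corrcoef(signal1, signal2)[0, 1]
print('NumPy Correlation:', numpy_correlation)
Output: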

SciPy
import numpy as np
from scipy.stats import pearsonr
t = np.arange(0, 10, 0.1)
# The waveform function names were lost in the source; np.sin is assumed.
signal1 = (np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 3 * t) +
           np.random.normal(0, 0.1, len(t)))
signal2 = (np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 3 * t) +
           np.random.normal(0, 0.1, len(t)))
scipy_correlation, _ = pearsonr(signal1, signal2)
print('SciPy Correlation:', scipy_correlation)
Output:

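statsmodels
import numpy as np
import statsmodels.api as sm
t = np.arange(0, 10, 0.1)
# The waveform function names were lost in the source; np.sin is assumed.
signal1 = (np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 3 * t) +
           np.random.normal(0, 0.1, len(t)))
signal2 = (np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 3 * t) +
           np.random.normal(0, 0.1, len(t)))
# This is the R-squared of regressing signal1 on signal2 without an
# intercept; it measures fit quality and is not a correlation coefficient.
statsmodels_correlation = sm.OLS(signal1, signal2).fit().rsquared
print('Statsmodels R-squared:', statsmodels_correlation)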
Output:
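The three listings in this item compute a single zero-lag coefficient. A cross correlation in the signal-processing sense compares the signals at every relative shift. The following sketch, which uses a synthetic noise signal and a hypothetical delay of 7 samples, recovers the delay with np.correlate in "full" mode.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
true_delay = 7  # hypothetical delay chosen for this illustration

x = rng.normal(size=n)
y = np.zeros(n)
y[true_delay:] = x[:-true_delay]  # y is x delayed by 7 samples

# Cross-correlate the mean-removed signals at every possible lag.
xc = np.correlate(y - y.mean(), x - x.mean(), mode="full")

# Index i of the "full" output corresponds to lag i - (n - 1).
lags = np.arange(-(n - 1), n)
best_lag = int(lags[np.argmax(xc)])

print("Estimated delay of y relative to x:", best_lag)
```

A positive lag here means the first argument (y) trails the second (x); swapping the arguments flips the sign.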
6. Develop a program for Correlation coefficient and ROC curves.
i. ROC curve
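import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# The data loading and split are missing in the source. The breast cancer
# dataset is taken from the plot title; the split parameters are assumptions.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Fit the scaler on the training data only, then reuse it on the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
# Probability of the positive class for each test sample.
y_pred_proba = model.predict_proba(X_test_scaled)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--', label='No Skill')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve for Breast Cancer Classification')
plt.legend()
plt.show()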
Output:
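Each point on a ROC curve is the (false positive rate, true positive rate) pair obtained at one score threshold. The sketch below builds the curve by hand for four made-up samples and integrates it with the trapezoid rule; it assumes there are no tied scores and is only meant to show what roc_curve and auc compute.

```python
import numpy as np

# Made-up labels and scores for illustration.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Visit samples from the highest score to the lowest, lowering the threshold.
order = np.argsort(-scores)
sorted_labels = y_true[order]

# Running counts of true positives and false positives.
tps = np.cumsum(sorted_labels)
fps = np.cumsum(1 - sorted_labels)

# Prepend the (0, 0) starting point and normalise to rates.
tpr = np.concatenate(([0.0], tps / tps[-1]))
fpr = np.concatenate(([0.0], fps / fps[-1]))

# Area under the curve by the trapezoid rule.
auc_value = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc_value)
```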
ii. Precision recall curve
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
# Synthetic binary classification problem.
X, y = datasets.make_classification(n_samples=1000,
                                    n_features=4,
                                    n_informative=3,
                                    n_redundant=1,
                                    random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=.3, random_state=0)
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
# Probability of the positive class for each test sample.
y_score = classifier.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, y_score)
fig, ax = plt.subplots()
ax.plot(recall, precision, color='purple')
ax.set_title('Precision-Recall Curve')
ax.set_ylabel('Precision')
ax.set_xlabel('Recall')
plt.show()
Output:

7. Develop a program for Simple Linear Regression.


i. Linear
import matplotlib
matplotlib.use('TkAgg')  # choose another backend if TkAgg is unavailable
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import pandas as pd
df = pd.read_csv("[Link]")
Y = df['price']
X = df['lotsize']
# scikit-learn expects 2-D arrays of shape (n_samples, n_features).
X = X.values.reshape(len(X), 1)
Y = Y.values.reshape(len(Y), 1)
# Hold out the last 250 rows as the test set.
X_train = X[:-250]
X_test = X[-250:]
Y_train = Y[:-250]
Y_test = Y[-250:]
plt.scatter(X_test, Y_test, color='black')
plt.title('Test Data')
plt.xlabel('Size')
plt.ylabel('Price')
plt.xticks(())
plt.yticks(())
regr = linear_model.LinearRegression()
regr.fit(X_train, Y_train)
# Draw the fitted line over the test points.
plt.plot(X_test, regr.predict(X_test), color='red', linewidth=3)
plt.show()
Output:
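For a single predictor, the least-squares line has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below applies these formulas to made-up points that lie exactly on y = 2x + 1, so the recovered coefficients can be checked by eye. It illustrates the quantity LinearRegression estimates and does not use the housing data.

```python
import numpy as np

# Made-up points on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Closed-form ordinary least squares for one predictor.
dx = x - x.mean()
slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
intercept = y.mean() - slope * x.mean()

print("slope:", slope)
print("intercept:", intercept)
```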
ii. Nonlinear
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
housing_data = pd.read_csv("[Link]")
X = housing_data[['lotsize']]
Y = housing_data['price']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,
                                                    random_state=42)
# Expand lotsize into polynomial terms up to degree 3, then fit a linear model.
poly = PolynomialFeatures(degree=3)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)
model = LinearRegression()
model.fit(X_poly_train, Y_train)
Y_pred = model.predict(X_poly_test)
plt.scatter(X_test, Y_test, color='black', label='Actual')
# Plot call lost in the source; scatter is assumed because X_test is unsorted.
plt.scatter(X_test, Y_pred, color='red', label='Predicted')
plt.title('Non-linear Regression (Polynomial) - Actual vs Predicted')
plt.xlabel('Lot Size')
plt.ylabel('Price')
plt.legend()
plt.show()
Output:

8. Develop a program for Logistic Regression.


i. Binomial
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20,
                                                    random_state=23)
# A large max_iter lets the solver converge on the unscaled features.
clf = LogisticRegression(random_state=0, max_iter=5000)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
acc = accuracy_score(y_test, y_pred)
print("Logistic regression model accuracy (in %): ", acc * 100)
Output:

ii. Multinomial
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model, metrics
# The digits dataset has ten classes (0 to 9).
digits = datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)
reg = linear_model.LogisticRegression(max_iter=5000)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print("Logistic Regression model accuracy (in %):",
      metrics.accuracy_score(y_test, y_pred) * 100)
Output:

iii. Ordinal
import mord
from sklearn.model_selection import train_test_split
from sklearn import datasets, metrics
digits = datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)
# Model class name lost in the source; mord.LogisticAT (all-threshold
# ordinal logistic regression) is assumed.
model = mord.LogisticAT(alpha=1.0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Ordinal Regression accuracy (in %):",
      metrics.accuracy_score(y_test, y_pred) * 100)
Output:

9. Develop a program for Multiple regression.


i. Scatter plot
ii. Distribution plot
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
df_start = pd.read_csv('[Link]')
print(df_start.head())
print(df_start.describe())
# Distribution plot (function name lost in the source; histplot is assumed).
sns.histplot(df_start['Profit'], kde=True, color='blue')
plt.title('Profit Distribution Plot')
plt.show()
# Scatter plot of profit against R&D spend.
plt.scatter(df_start['R&D Spend'], df_start['Profit'], color='lightcoral')
plt.title('Profit vs R&D Spend')
plt.xlabel('R&D Spend')
plt.ylabel('Profit')
plt.grid(False)
plt.show()
# All columns except the last are features; the last column is the target.
X = df_start.iloc[:, :-1].values
y = df_start.iloc[:, -1].values
# One-hot encode the categorical column at index 3; pass the rest through.
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
np.set_printoptions(precision=2)
# Show predictions and actual values side by side.
result = np.concatenate((y_pred.reshape(len(y_pred), 1),
                         y_test.reshape(len(y_test), 1)), axis=1)
print("Predicted vs Actual Results:")
print(result)
print(f'Coefficients: {regressor.coef_}')
print(f'Intercept: {regressor.intercept_}')
Output:
10. Develop a program for decision trees based on the ID3 algorithm.
i. Information gain
ii. Entropy
iii. Decision tree
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
import math
df = pd.read_csv('[Link]')
# Method name lost in the source; head() is assumed here to preview the data.
print(df.head())

def calculate_entropy(data, target_column):
    # Shannon entropy (in bits) of the values in target_column.
    total_rows = len(data)
    target_values = data[target_column].unique()
    entropy = 0
    for value in target_values:
        value_count = len(data[data[target_column] == value])
        proportion = value_count / total_rows
        entropy -= proportion * math.log2(proportion) if proportion != 0 else 0
    return entropy

entropy_outcome = calculate_entropy(df, 'Outcome')
print(f"Entropy of the dataset: {entropy_outcome}")

def calculate_information_gain(data, feature, target_column):
    # Gain = entropy of the data minus the weighted entropy after the split.
    unique_values = data[feature].unique()
    weighted_entropy = 0
    for value in unique_values:
        subset = data[data[feature] == value]
        proportion = len(subset) / len(data)
        weighted_entropy += proportion * calculate_entropy(subset,
                                                           target_column)
    # Use the entropy of the data passed in, so the function also works on
    # the subsets that id3() recurses into.
    information_gain = calculate_entropy(data, target_column) - weighted_entropy
    return information_gain

for column in df.columns[:-1]:
    entropy = calculate_entropy(df, column)
    information_gain = calculate_information_gain(df, column, 'Outcome')
    print(f"{column} - Entropy: {entropy:.3f}, "
          f"Information Gain: {information_gain:.3f}")

# Fit a one-level tree (a decision stump) on a single feature and plot it.
selected_feature = 'DiabetesPedigreeFunction'
clf = DecisionTreeClassifier(criterion='entropy', max_depth=1)
X = df[[selected_feature]]
y = df['Outcome']
clf.fit(X, y)
plt.figure(figsize=(8, 6))
plot_tree(clf, feature_names=[selected_feature], class_names=['0', '1'],
          filled=True, rounded=True)
plt.show()

def id3(data, target_column, features):
    # Stop when the subset is pure or no features remain.
    if len(data[target_column].unique()) == 1:
        return data[target_column].iloc[0]
    if len(features) == 0:
        return data[target_column].mode().iloc[0]
    # Split on the feature with the highest information gain.
    best_feature = max(features,
                       key=lambda x: calculate_information_gain(data, x,
                                                                target_column))
    tree = {best_feature: {}}
    features = [f for f in features if f != best_feature]
    for value in data[best_feature].unique():
        subset = data[data[best_feature] == value]
        tree[best_feature][value] = id3(subset, target_column, features)
    return tree
Output:
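Entropy and information gain are easier to follow on a small categorical example. The sketch below uses only the standard library and a made-up 14-row weather table with 9 "yes" and 5 "no" labels; the helper names and column names are hypothetical and unrelated to the CSV file above.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy in bits: -sum(p * log2(p)) over the label proportions.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, feature, target):
    # Entropy before the split minus the weighted entropy of each branch.
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Made-up data: sunny 2 yes / 3 no, overcast 4 yes, rain 3 yes / 2 no.
rows = ([{"outlook": "sunny", "play": "yes"}] * 2
        + [{"outlook": "sunny", "play": "no"}] * 3
        + [{"outlook": "overcast", "play": "yes"}] * 4
        + [{"outlook": "rain", "play": "yes"}] * 3
        + [{"outlook": "rain", "play": "no"}] * 2)

base_entropy = entropy([r["play"] for r in rows])
gain = information_gain(rows, "outlook", "play")
print(f"Entropy: {base_entropy:.3f}")           # 0.940
print(f"Gain for outlook: {gain:.3f}")          # 0.247
```

The overcast branch is pure (entropy 0), which is why splitting on outlook removes about a quarter of a bit of uncertainty.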
11. Develop a program to implement k-Nearest Neighbour algorithm to classify
data set. Print both correct and wrong predictions. (iris Data set can be used).
i. Correct prediction
ii. Wrong prediction
iii. Confusion matrix
iv. Accuracy
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv("[Link]", names=names)
# Features are all columns except the last; the last column is the class.
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)
sep = "-------------------------------------------------------------------------"
print("\n" + sep)
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label',
                             'Correct/Wrong'))
print(sep)
# Compare each true label with its prediction.
for label, predicted in zip(ytest, ypred):
    print('%-25s %-25s' % (label, predicted), end="")
    if label == predicted:
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
print(sep)
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print(sep)
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print(sep)
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print(sep)
Output:
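The classifier itself is simple enough to sketch by hand: measure the distance from a query point to every training point, take the k closest, and return the majority label. The following is a minimal illustration on made-up two-dimensional points; the function name knn_predict and the data are hypothetical, and Euclidean distance is assumed.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up training data: two well-separated groups.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])

queries = np.array([[1.1, 1.0], [5.1, 5.0]])
true_labels = ["a", "b"]
for query, truth in zip(queries, true_labels):
    predicted = knn_predict(X_train, y_train, query, k=3)
    status = "Correct" if predicted == truth else "Wrong"
    print(query, truth, predicted, status)
```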
12. Develop k-Means algorithm for clustering.
i. Confusion matrix
ii. 3D scatter plot
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.utils import shuffle
from sklearn.metrics import confusion_matrix
iris = datasets.load_iris()
X = iris.data
y = iris.target
names = iris.feature_names
X, y = shuffle(X, y, random_state=42)
model = KMeans(n_clusters=3, random_state=42)
iris_kmeans = model.fit(X)
# Cluster ids are arbitrary, so remap the true labels to line up with them.
# This mapping may need adjusting for other scikit-learn versions or seeds.
y = np.choose(y, [1, 2, 0]).astype(int)
conf_matrix = confusion_matrix(y, iris_kmeans.labels_)
fig, ax = plt.subplots(figsize=(4.5, 4.5))
# Colormap name lost in the source; plt.cm.Blues is assumed.
ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3)
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center',
                size='xx-large')
plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()
# Left panel: points coloured by K-Means cluster.
fig = plt.figure(figsize=(15, 10))
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.scatter(X[:, 3], X[:, 0], X[:, 2], c=iris_kmeans.labels_.astype(float),
            edgecolor="k", s=150)
ax1.view_init(20, -50)
ax1.set_xlabel(names[3], fontsize=12)
ax1.set_ylabel(names[0], fontsize=12)
ax1.set_zlabel(names[2], fontsize=12)
ax1.set_title("K-Means Clusters for the Iris Dataset", fontsize=12)
# Right panel: points coloured by the (remapped) true species.
ax2 = fig.add_subplot(1, 2, 2, projection='3d')
for label, name in enumerate(['virginica', 'setosa', 'versicolor']):
    ax2.text3D(X[y == label, 3].mean(), X[y == label, 0].mean(),
               X[y == label, 2].mean() + 2, name, horizontalalignment="center")
ax2.scatter(X[:, 3], X[:, 0], X[:, 2], c=y, edgecolor="k", s=150)
ax2.view_init(20, -50)
ax2.set_xlabel(names[3], fontsize=12)
ax2.set_ylabel(names[0], fontsize=12)
ax2.set_zlabel(names[2], fontsize=12)
ax2.set_title("Actual Labels for the Iris Dataset", fontsize=12)
plt.show()
Output:
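The listing for this exercise relies on scikit-learn's KMeans. For reference, the core of the algorithm (Lloyd's iteration) alternates between assigning each point to its nearest center and moving each center to the mean of its points. The sketch below is a simplified NumPy version run on two synthetic blobs; the function name, random initialisation, and stopping rule are assumptions, and it omits refinements such as k-means++ seeding.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance of every point to every center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (keep the old center if a cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two synthetic, well-separated blobs of 30 points each.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(30, 2))
X = np.vstack([blob_a, blob_b])

labels, centers = kmeans(X, k=2)
print("Cluster sizes:", np.bincount(labels))
print("Centers:\n", centers)
```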
13. Develop a program for Agglomerative clustering algorithm.
i. Scatter plot
ii. Dendrogram
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.cluster.hierarchy as shc
from sklearn.cluster import AgglomerativeClustering
dataset = pd.read_csv('Mall_Customers.csv')
# Columns 3 and 4: annual income and spending score.
x = dataset.iloc[:, [3, 4]].values
# Dendrogram of the Ward-linkage hierarchy.
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
plt.title("Dendrogram Plot")
plt.ylabel("Euclidean Distances")
plt.xlabel("Customers")
plt.show()
# Cut the hierarchy into five clusters and plot each in its own colour.
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred = hc.fit_predict(x)
plt.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s=100, c='blue',
            label='Cluster 1')
plt.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s=100, c='green',
            label='Cluster 2')
plt.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s=100, c='red',
            label='Cluster 3')
plt.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s=100, c='cyan',
            label='Cluster 4')
plt.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s=100, c='magenta',
            label='Cluster 5')
plt.title('Clusters of Customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
Output:
