Evidence-Based Analysis with Python

Knowledge engineering lab manual

Uploaded by

vijithra48.m

Ex. NO:1
DATE:
Perform operations with an evidence-based model
Aim:

To perform operations with an evidence-based model using Python.

Algorithm:

Data Preparation:
• Import the necessary libraries, such as pandas for data handling and Scikit-Learn
for machine learning.
• Load the dataset from a CSV file into a DataFrame (data).
• Split the dataset into features (X) and the target variable (y).
• Further split the data into training and testing sets using the train_test_split
function from Scikit-Learn. Typically, you use about 80% of the data for training
and 20% for testing.
Model Selection:
• Choose a machine learning model suitable for the problem. In this case, a
RandomForestClassifier is selected. Random forests are an ensemble learning
method used for classification tasks.
Model Training:
• Train the selected model (RandomForestClassifier) using the training data
(X_train and y_train) by calling the fit method on the model instance.
Model Evaluation:
• Use the trained model to make predictions (y_pred) on the test data (X_test).
• Calculate the accuracy of the model's predictions by comparing them to the true
labels (y_test). The accuracy score is a common metric for classification tasks and
is calculated using the accuracy_score function from Scikit-Learn.
• Print the accuracy score to evaluate the model's performance.
Inference or Prediction:
• Load new data from a CSV file into a DataFrame (new_data) to make predictions
on unseen data.
• Use the trained model to predict the target variable for the new data and store the
predictions in the predictions variable.
• You can further process or analyze these predictions as needed for your
application.

Program:

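# Import libraries for data handling and machine learning
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the dataset and separate the features (X) from the target (y)
data = pd.read_csv("your_data.csv")
X = data.drop("target_column", axis=1)
y = data["target_column"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the random forest on the training split
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the test split
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Predict on new data, which must have the same feature columns as X
new_data = pd.read_csv("new_data.csv")  # Load new evidence-based data
predictions = model.predict(new_data)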
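
Because your_data.csv and new_data.csv are placeholders, the program cannot be run without supplying files. The following is a minimal sketch of the same workflow on synthetic data, so that the steps can be tried end to end. The use of make_classification, the column names feature_0 to feature_7, and the reuse of test rows as "new" data are assumptions made for illustration; they are not part of the original exercise.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for your_data.csv (assumption: 500 rows, 8 numeric features)
features, labels = make_classification(n_samples=500, n_features=8, random_state=42)
columns = [f"feature_{i}" for i in range(features.shape[1])]
data = pd.DataFrame(features, columns=columns)
data["target_column"] = labels

X = data.drop("target_column", axis=1)
y = data["target_column"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print("Accuracy:", accuracy)

# Stand-in for new_data.csv: it must have the same feature columns as X
new_data = X_test.head(5)
predictions = model.predict(new_data)
print("Predictions:", predictions)
```

If new data contains the target column or extra columns, drop them before calling predict, since the model expects exactly the columns it was trained on.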

Data set:

Output:

Accuracy: 0.85

Result:
Thus, operations with an evidence-based model have been successfully
implemented.

Ex. NO:2
DATE:

Perform Evidence-Based Analysis

Aim:
To perform evidence-based analysis using Python.

Algorithm:

Data Collection:
• Load data from a CSV file (in this case, 'your_data.csv') into a Pandas DataFrame.
The data represents the information you want to analyze.
Data Preprocessing:
• This section is a placeholder for data cleaning, normalization, and transformation.
You would customize this part to suit your specific dataset and analysis needs.
Common preprocessing steps include handling missing values, encoding
categorical data, and scaling numerical features.
EDA (Exploratory Data Analysis):
• Visualize your data using a pairplot created with Seaborn. EDA is essential for
understanding the data's characteristics and relationships between variables.
Hypothesis Testing:
• Conduct a statistical test (t-test in this example) to assess whether there is a
significant difference between two groups (group1 and group2). The result of the
test includes the test statistic and p-value.
Machine Learning:
• Train a linear regression model to predict a target variable using features 'feature1'
and 'feature2'. Evaluate the model's performance on a test set by calculating the
mean squared error (MSE).
Statistical Analysis:
• This section is a placeholder for additional statistical analyses you may need based
on your research or analysis objectives. You should insert specific statistical tests
and analyses here.
Data Visualization:
• Create informative plots and charts to present the data and analysis results
visually. This section is a placeholder for adding the appropriate visualizations for
your analysis.
Reporting and Documentation:
• Document your analysis process and results. Effective documentation is crucial for
sharing your findings and insights with others.
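
The hypothesis-testing step above can be sketched as follows. This is an illustrative example only: the two groups are synthetic samples, Welch's variant of the t-test (equal_var=False) is chosen because it does not assume equal variances, and the 0.05 significance level is a conventional assumption rather than something the exercise specifies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical measurements for two groups (synthetic, for illustration only)
group1 = rng.normal(loc=50.0, scale=5.0, size=40)
group2 = rng.normal(loc=53.0, scale=5.0, size=40)

# Welch's t-test: does not assume equal variances in the two groups
result = stats.ttest_ind(group1, group2, equal_var=False)
t_statistic, p_value = result.statistic, result.pvalue
print(f"t-statistic: {t_statistic:.3f}, p-value: {p_value:.4f}")

alpha = 0.05  # conventional significance level (an assumption; adjust as needed)
if p_value < alpha:
    print("Reject the null hypothesis: the group means differ.")
else:
    print("Fail to reject the null hypothesis.")
```

A small p-value indicates that a difference this large would be unlikely if the two groups had the same mean; it does not by itself measure how large or important the difference is.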

Program:

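import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate synthetic data: y = 4 + 3x plus Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Plot the raw data
plt.scatter(X, y, alpha=0.5)
plt.title('Generated Data for Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

# Fit the model and predict at the two ends of the X range
model = LinearRegression()
model.fit(X, y)
X_new = np.array([[0], [2]])
y_pred = model.predict(X_new)

# Plot the data together with the fitted line
plt.scatter(X, y, alpha=0.5)
plt.plot(X_new, y_pred, color='red', linewidth=2, label='Linear Regression')
plt.title('Linear Regression Analysis')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

# Report the fitted parameters (expected to be close to 4 and 3)
print(f'Intercept: {model.intercept_[0]}')
print(f'Coefficient: {model.coef_[0][0]}')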
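
The algorithm for this exercise calls for evaluating the regression model on a test set using the mean squared error (MSE), which the program does not do. A minimal sketch of that step is given here, reusing the same synthetic data. The 80/20 split and the random_state value are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic data as the program: y = 4 + 3x plus Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Fit on 80% of the points and measure the error on the remaining 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {mse:.3f}")
```

Since the noise term has unit variance, a test MSE in the neighbourhood of 1 is about the best a correctly specified model can be expected to reach on this data.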

Output:

Result:
Thus, evidence-based analysis has been successfully
implemented.

Ex. NO:3
DATE:
Perform operations on probability-based reasoning
Aim:
To perform operations on probability-based reasoning using Python.

Algorithm:

Import Necessary Libraries
• Import the required Python libraries, such as numpy and scipy.stats.

Basic Probability Operations
• Define the parameters for a basic probability operation.
• Use the binom.pmf function from scipy.stats to calculate the probability.
• Display the result.

Normal Distribution
• Define the parameters for a normal distribution operation.
• Use the norm.cdf function from scipy.stats to calculate the cumulative
probability.
• Display the result.

Conditional Probability
• Define the probabilities and apply Bayes' theorem to calculate conditional
probability.
• Display the conditional probability result.

Random Sampling
• Define a population and sample size.
• Use the np.random.choice function from numpy to simulate random
sampling.
• Display the random sample.
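
The conditional probability step relies on Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), where P(B) can be obtained from the law of total probability. A small sketch follows, using P(A) = 0.4 and P(B|A) = 0.3 as in the program for this exercise. The value P(B|not A) = 0.15 is an assumption introduced here for illustration; it is not stated in the exercise.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using Bayes' theorem and the law of total probability."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


p_a = 0.4
p_b_given_a = 0.3
p_b_given_not_a = 0.15  # assumed value, not given in the exercise

posterior = bayes_posterior(p_a, p_b_given_a, p_b_given_not_a)
print(f"P(A|B) = {posterior:.4f}")
```

With these numbers P(B) = 0.12 + 0.09 = 0.21, so P(A|B) = 0.12 / 0.21, which is about 0.5714.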

Program:
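import numpy as np
from scipy.stats import binom, norm

# Binomial probability: exactly k heads in n fair coin flips
n = 3
p = 0.5
k = 2
probability = binom.pmf(k, n, p)
print(f"Probability of getting exactly {k} heads in {n} coin flips: {probability:.4f}")

# Standard normal cumulative probability P(Z < z)
z = 1
cumulative_probability = norm.cdf(z)
print(f"Cumulative Probability (Z < {z}): {cumulative_probability:.4f}")

# Conditional probability via Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_A = 0.4
P_B_given_A = 0.3
P_B = 0.21  # assumed value of P(B), chosen to be consistent with the listed output
P_A_given_B = (P_B_given_A * P_A) / P_B
print(f"Conditional Probability P(A|B): {P_A_given_B:.4f}")

# Random sampling with replacement from a small population
population = [1, 2, 3, 4, 5]
sample_size = 3
random_sample = np.random.choice(population, size=sample_size, replace=True)
print(f"Random Sample: {random_sample}")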
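
As a cross-check on the binomial calculation, the same probability can be estimated by simulation. The sketch below draws many runs of three coin flips and compares the observed share of runs with exactly two heads against binom.pmf. The number of trials and the seed are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, p, k = 3, 0.5, 2
trials = 100_000  # number of simulated experiments (an arbitrary choice)

# Each entry is the number of heads in one run of n coin flips
heads = rng.binomial(n, p, size=trials)
estimate = np.mean(heads == k)
exact = binom.pmf(k, n, p)
print(f"Simulated: {estimate:.4f}, exact: {exact:.4f}")
```

The simulated value should agree with the exact one to about two decimal places; increasing the number of trials tightens the agreement.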

Output:

Probability of getting exactly 2 heads in 3 coin flips: 0.3750

Cumulative Probability (Z < 1): 0.8413

Conditional Probability P(A|B): 0.5714

Random Sample: [3 5 2]

Result:
Thus, operations on probability-based reasoning have been
successfully implemented.

Ex. NO:4

DATE:

Perform Believability Analysis


Aim:
To perform Believability Analysis using Python.

Algorithm:

1. Initialize the SentimentIntensityAnalyzer from the NLTK library to perform
sentiment analysis.
2. Define a function analyze_believability(text) to analyze the believability of a
given text based on sentiment analysis.
• Input: text (the text to be analyzed)
• Output: believability (a numerical score representing believability)
3. Perform sentiment analysis using the SentimentIntensityAnalyzer:
• Calculate sentiment scores for the input text, including positive, negative,
neutral, and compound scores.
• Calculate the believability score by subtracting the negative sentiment score
from 1.0.
4. Define a function analyze_source_credibility(url) to analyze the credibility of a
given source URL.
• Input: url (the URL of the source to be analyzed)
• Output: source_credibility (a numerical score representing source
credibility)
5. Inside the analyze_source_credibility(url) function:
• Use the requests library to fetch the webpage content from the provided
URL.
• Parse the HTML content using BeautifulSoup to extract relevant
information, such as author, publication date, and source credibility
indicators. The extraction logic may vary depending on the webpage
structure.
• Calculate the source credibility score based on the extracted information.
This score can be a numerical value that reflects the source's
trustworthiness or reliability.
6. In the main part of the code (if __name__ == "__main__":), provide a sample text
and source URL for analysis.
7. Call analyze_believability(text) to calculate the believability score for the given
text.
8. Call analyze_source_credibility(source_url) to calculate the source credibility
score for the provided source URL.
9. If both believability and source credibility scores are successfully calculated,
compute the final believability score as the average of the two scores.
10. Print the final believability score.

Program:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import requests
from bs4 import BeautifulSoup

# Download the VADER lexicon used by the sentiment analyzer
nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()

def analyze_believability(text):
    # Believability is 1.0 minus the negative sentiment score
    sentiment_scores = sid.polarity_scores(text)
    believability = 1.0 - sentiment_scores['neg']
    return believability

def analyze_source_credibility(url):
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Placeholder score; replace with logic based on information extracted from soup
        source_credibility = 0.7
        return source_credibility
    except Exception as e:
        print(f"Error fetching or analyzing the source: {e}")
        return None

if __name__ == "__main__":
    text = "This is a sample text that you want to analyze for believability."
    # The URL is elided in the source; replace the placeholder with the page to analyze
    source_url = "[Link]"
    believability_score = analyze_believability(text)
    source_credibility_score = analyze_source_credibility(source_url)
    if believability_score is not None and source_credibility_score is not None:
        final_believability_score = (believability_score + source_credibility_score) / 2
        print(f"Believability Score: {final_believability_score}")
    else:
        print("Unable to calculate believability due to errors.")
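
The program returns a fixed source credibility of 0.7 and leaves the extraction logic open. One possible way to derive a score from page metadata is sketched below. Everything in it is a hypothetical heuristic: the baseline of 0.4, the 0.2 increments, and the choice of signals (HTTPS, an author tag, a publication date tag) are assumptions for illustration, not an established credibility measure. The standard-library html.parser module is used in place of requests and BeautifulSoup so that the sketch runs on an inline HTML string without network access.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content:
            self.meta[key.lower()] = content


def score_source(html, url):
    """Hypothetical heuristic: start at 0.4 and add 0.2 per positive signal."""
    collector = MetaCollector()
    collector.feed(html)
    score = 0.4
    if url.startswith("https://"):
        score += 0.2  # page is served over HTTPS
    if "author" in collector.meta:
        score += 0.2  # page names an author
    if "date" in collector.meta or "article:published_time" in collector.meta:
        score += 0.2  # page states a publication date
    return round(min(score, 1.0), 2)


sample_html = """
<html><head>
<meta name="author" content="A. Writer">
<meta name="date" content="2024-01-15">
</head><body><p>Sample article.</p></body></html>
"""
print(score_source(sample_html, "https://example.org/article"))
print(score_source("<html></html>", "http://example.org/page"))
```

In the program, a function like this could be called on response.text inside analyze_source_credibility in place of the fixed value.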

OUTPUT:

Believability Score: 0.77

Result:
Thus, Believability Analysis has been successfully
implemented.

Ex. NO:5
DATE:

Implement Rule Learning and Refinement


Aim:
To implement rule learning and refinement using Python.

Algorithm:

1. Load the dataset:
• Load the dataset (e.g., Iris dataset) for the task. You can replace this dataset
with your own data.
2. Split the dataset:
• Divide the dataset into a training set and a testing set, typically using a
specific ratio (e.g., 80% for training and 20% for testing).
3. Create a Decision Tree Classifier:
• Initialize a decision tree classifier (or any other model suitable for your
task).
4. Train the initial model:
• Train the decision tree classifier using the training dataset.
5. Make predictions:
• Use the trained model to make predictions on the test dataset.
6. Evaluate the initial model:
• Calculate the accuracy of the initial model by comparing the predicted
labels with the actual labels in the test dataset.
7. Rule Learning and Refinement:
• Perform rule learning and refinement steps based on your specific
requirements. For example, you can apply techniques like pruning the
decision tree, feature selection, or hyperparameter tuning to improve the
model's performance.
8. Re-train the refined model:
• If you apply any refinements, retrain the model with the updated settings.
9. Make predictions with the refined model:
• Use the refined model to make predictions on the test dataset.
10. Evaluate the refined model:
• Calculate the accuracy of the refined model by comparing the predicted labels
with the actual labels in the test dataset.
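
Step 7 leaves the refinement technique open. One way to carry it out is sketched below, assuming a small grid over tree depth and cost-complexity pruning searched with 5-fold cross-validation. The grid values and the use of GridSearchCV are choices made for this illustration, not something the exercise prescribes. The refined tree is also printed as text rules with export_text, which makes the learned rules explicit.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Candidate refinements (assumed grid): limit depth and apply cost-complexity pruning
param_grid = {"max_depth": [2, 3, 4, None], "ccp_alpha": [0.0, 0.01, 0.02]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)  # cross-validates on the training set only

refined = search.best_estimator_  # refit on the full training set by default
refined_accuracy = accuracy_score(y_test, refined.predict(X_test))
print("Best settings:", search.best_params_)
print(f"Refined Model Accuracy: {refined_accuracy:.2f}")

# The learned rules, printed as nested threshold conditions
rules = export_text(refined, feature_names=list(iris.feature_names))
print(rules)
```

Selecting the settings by cross-validation on the training set keeps the test set untouched until the final evaluation, so the reported accuracy is not biased by the tuning.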

Program:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset and split it 80/20
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train and evaluate the initial model
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
initial_accuracy = accuracy_score(y_test, y_pred)
print(f"Initial Model Accuracy: {initial_accuracy:.2f}")

# Rule learning and refinement: adjust the classifier's settings here
# (e.g., pruning or hyperparameter tuning), then retrain and re-evaluate
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
refined_accuracy = accuracy_score(y_test, y_pred)
print(f"Refined Model Accuracy: {refined_accuracy:.2f}")

Output:

Initial Model Accuracy: 0.93

Refined Model Accuracy: 0.95

Result:
Thus, Rule Learning and Refinement has been successfully
implemented.

Ex. NO:6
DATE:
Perform Analysis based on Learned Pattern

Aim:
To perform analysis based on learned patterns using Python.

Algorithm:

Data Collection:
Gather and collect the dataset relevant to your analysis. Ensure that the data is clean,
well-structured, and contains the necessary information for pattern discovery.

Data Preprocessing:
Handle missing data by imputing or removing it as appropriate.
Normalize or scale numerical features to ensure they are on a similar scale.
Encode categorical variables into numerical values.
Handle outliers by either removing them or transforming them.

Data Split:
Split the dataset into a training set and a testing/validation set. The training set is used to
learn patterns, and the testing set is used to evaluate the model's performance.

Pattern Learning:
• Choose an appropriate machine learning algorithm, such as decision trees, random
forests, neural networks, or clustering algorithms, depending on the type of analysis you
want to perform.
• Train the selected model on the training data.

Model Evaluation:
Evaluate the model's performance on the testing/validation dataset. Common evaluation
metrics include accuracy, precision, recall, F1-score, and ROC-AUC, depending on the nature of
your analysis (classification, regression, clustering, etc.).

Pattern Interpretation:
• Analyze the patterns learned by the model. This may involve examining feature
importance, decision boundaries, or cluster assignments.
• Visualize the patterns using techniques like heatmaps, scatter plots, or dimensionality
reduction methods.

Iterate:
If the initial analysis does not meet your objectives, consider iterating through the
process, adjusting hyperparameters, trying different algorithms, or collecting more data.

Deployment:
• If the analysis meets your goals, deploy the model to make predictions or inform
decision-making.
• Create a report or visualization summarizing the analysis, including key insights,
patterns, and recommendations, if applicable.

Program:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Training data: hours studied (feature) and exam scores (target)
study_hours = np.array([2, 4, 6, 8, 10, 12]).reshape(-1, 1)
exam_scores = np.array([30, 40, 55, 60, 75, 85])

# Learn the linear pattern relating study hours to exam scores
model = LinearRegression()
model.fit(study_hours, exam_scores)
predicted_scores = model.predict(study_hours)

# Plot the actual scores against the fitted line
plt.scatter(study_hours, exam_scores, label='Actual scores')
plt.plot(study_hours, predicted_scores, color='red', label='Predicted scores')
plt.xlabel('Study Hours')
plt.ylabel('Exam Scores')
plt.legend()
plt.show()
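
The pattern-interpretation step of the algorithm can be illustrated by reading off the parameters of the fitted line. The sketch below uses np.polyfit on the same study-hours data, which performs the same ordinary least-squares fit as LinearRegression with default settings, and reports the slope, the intercept, and the coefficient of determination (R²). The 7-hour input is a hypothetical value chosen for illustration.

```python
import numpy as np

study_hours = np.array([2, 4, 6, 8, 10, 12])
exam_scores = np.array([30, 40, 55, 60, 75, 85])

# Degree-1 least-squares fit: returns [slope, intercept]
slope, intercept = np.polyfit(study_hours, exam_scores, 1)
print(f"Each extra study hour adds about {slope:.2f} points")
print(f"Predicted score with zero study hours: {intercept:.2f}")

# R^2: share of the variance in exam scores explained by the line
predicted = slope * study_hours + intercept
ss_res = np.sum((exam_scores - predicted) ** 2)
ss_tot = np.sum((exam_scores - exam_scores.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R^2: {r_squared:.3f}")

# Prediction for an unseen value (7 hours is a hypothetical input)
print(f"Predicted score for 7 hours: {slope * 7 + intercept:.1f}")
```

For this data the fit works out to a slope of 5.5 and an intercept of 19, so the learned pattern is roughly "score = 19 + 5.5 × hours".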

Output:

Result:
Thus, analysis based on Learned Pattern has been successfully
implemented.

Ex. NO:7
DATE:

Construction of Ontology for a given domain


Aim:
To construct an ontology for a given domain using Python.

Algorithm:

1. Import necessary Libraries:
Import the required libraries for working with RDF data. In this example, we'll use
rdflib and its RDF and RDFS namespaces.

2. Create an RDF Graph:
Initialize an RDF graph to represent the ontology.

3. Define Namespace:
Define namespaces for your ontology. Namespaces are used to create URIs for classes,
properties, and individuals.

4. Define Classes:
Define classes in your ontology using RDF triples.

5. Define Properties:
Define properties (attributes or relations) for your classes, if necessary.

6. Define Individual:
Define individuals (instances) and specify their types by adding triples.

7. Add Relationship:
Establish relationships between individuals and properties.

8. Serialize the Ontology:
Serialize the RDF graph to a file in the desired format (e.g., RDF/XML, Turtle, etc.) to
save your ontology.

9. Extend and customize:
Continue to define more classes, individuals, properties, and relationships as needed for
your specific domain.

Program:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

g = Graph()

# The namespace URI is elided in the source; replace the placeholder with your own base URI
EX = Namespace("[Link]")
g.bind("ex", EX)

# Classes
g.add((EX.Pet, RDF.type, RDFS.Class))
g.add((EX.Animal, RDF.type, RDFS.Class))

# Individuals and their types
g.add((EX.Dog, RDF.type, EX.Pet))
g.add((EX.Cat, RDF.type, EX.Pet))
g.add((EX.Fish, RDF.type, EX.Pet))

# Property and its values for each individual
g.add((EX.hasName, RDF.type, RDF.Property))
g.add((EX.Dog, EX.hasName, Literal("Fido")))
g.add((EX.Cat, EX.hasName, Literal("Whiskers")))
g.add((EX.Fish, EX.hasName, Literal("Bubbles")))

# Save the ontology as RDF/XML (the output file name is elided in the source)
g.serialize(destination="[Link]", format="xml")

Output:

RESULT:

Thus, the construction of an ontology for a given domain has been successfully implemented.
