CHAPTER – 3
MINING FREQUENT PATTERNS,
ASSOCIATION AND
CORRELATIONS
Data Mining:
• Data mining is the technique of extracting information from huge sets of data.
It is the procedure of mining knowledge from data.
• Data mining is defined as “extracting or mining knowledge from massive
amounts of data.”
The essential steps in the process of knowledge discovery:
1. Data Cleaning: In this step, noise and inconsistent data are removed.
2. Data Integration: In this step, multiple data sources are combined.
3. Data Selection: In this step, data relevant to the analysis tasks are retrieved from
the dataset.
4. Data Transformation: In this step, data is transformed or consolidated into
forms appropriate for mining by performing aggregation or summary operations.
5. Data Mining: In this step, intelligent methods are applied in order to extract data
patterns.
6. Pattern Evaluation: In this step, the extracted data patterns are evaluated to
identify the truly interesting ones.
7. Knowledge Presentation: In this step, the mined knowledge is presented to the user.
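The seven steps above can be sketched as a small pipeline. This is only a minimal illustration in plain Python: the two in-memory sources (`store_a`, `store_b`), the field names, the function names, and the `min_count` threshold are all assumptions made for the example and do not come from the text.

```python
from collections import Counter

# Hypothetical raw records from two sources (all names and values are illustrative).
store_a = [
    {"customer": "c1", "item": "bread", "amount": 2.0},
    {"customer": "c1", "item": "milk", "amount": 1.5},
    {"customer": "c2", "item": "bread", "amount": None},  # noisy: missing amount
]
store_b = [
    {"customer": "c3", "item": "bread", "amount": 2.0},
    {"customer": "c3", "item": "butter", "amount": 3.0},
    {"customer": "c4", "item": "pen", "amount": -1.0},    # inconsistent: negative amount
]

def clean(records):
    """Step 1: drop noisy or inconsistent records."""
    return [r for r in records if r["amount"] is not None and r["amount"] > 0]

def integrate(*sources):
    """Step 2: combine multiple data sources."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 4: consolidate purchases into one basket (set of items) per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets

def mine(baskets):
    """Step 5: extract a simple pattern, here the number of baskets containing each item."""
    return Counter(item for items in baskets.values() for item in items)

def evaluate(counts, min_count):
    """Step 6: keep only the patterns that reach a minimum count."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Step 7: present the result in a readable form."""
    return [f"{item}: in {n} baskets" for item, n in sorted(patterns.items())]

integrated = integrate(clean(store_a), clean(store_b))
selected = select(integrated, ["customer", "item"])
baskets = transform(selected)
patterns = evaluate(mine(baskets), min_count=2)
report = present(patterns)
print(report)  # ['bread: in 2 baskets']
```

Each function maps to one numbered step, so a single stage can be swapped out (for example, a different mining method in step 5) without touching the others.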
Advantages of Data Mining:
• Data mining is a quick process that makes it easy for new users to analyze enormous amounts of
data in a short time.
• It enables organizations to obtain knowledge-based data.
• Compared to other statistical data applications, data mining is efficient and cost-effective.
• It helps in the decision-making process.
• It helps to predict future trends.
Disadvantages of Data Mining:
• It can violate the privacy of users, which raises safety and security concerns.
• Identity is a big issue when using data mining.
• Data mining techniques are not 100% accurate and may cause serious consequences in certain
conditions.
• It requires its own space as well as maintenance. This can greatly increase the implementation cost.
What Kinds of Patterns Can Be Mined?
• Frequent patterns are patterns that occur frequently in data. Finding frequent patterns
plays an essential role in mining associations, correlations, and many other relationships
among data.
• There are a number of data mining functionalities, such as characterization and
discrimination, the mining of frequent patterns, associations and correlations,
classification and regression, clustering analysis, and outlier analysis.
• In general, data mining tasks can be classified into two categories, namely descriptive and
predictive.
• Descriptive mining tasks characterize properties of the data in a target dataset.
• Predictive mining tasks perform induction on the current data in order to make predictions.
Class/Concept Description:
• Class/concept description refers to describing data that is associated with classes or concepts.
• Class groups similar items into categories (e.g., "computers" or "printers"),
focusing on what the data represents.
• Concept describes characteristics or behaviors (e.g., "big spenders" or "budget
spenders"), focusing on how the data behaves or is perceived.
• Classes organize data for structure, while concepts provide insights for decisions or
patterns. For example, businesses use classes to group products and concepts to
target customer behavior.
Characterization and Discrimination:
Data Characterization:
• It refers to summarizing the data of the class under study. This class is called the target class.
• It is a summarization of the general characteristics or features of a target class of data.
• The data corresponding to the user-specified class are typically collected by a query.
Data Discrimination:
• It refers to the mapping or classification of a class against some predefined group or class.
• Data discrimination is a comparison of the general features of the target class data objects against the
general features of objects from one or multiple contrasting classes.
• Ex: a university analyzing student performance:
• Target Class: High-performing students (scoring above 85%).
• Contrasting Class: Average-performing students (scoring between 50%–70%).
• Comparison (Discrimination): High performers: Spend more time studying, attend extra sessions, and
participate in projects.
• Average performers: Spend less time studying and focus only on exams.
• Purpose: By comparing features, the university identifies patterns to help average performers improve
and better allocate resources for student success.
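The student example can be sketched in code as follows. This is a hedged illustration only: the records, the field names (`score`, `study_hours`, `extra_sessions`), and the helper names `characterize` and `discriminate` are hypothetical, and comparing class averages is just one simple way to contrast two classes.

```python
from statistics import mean

# Hypothetical student records; the values are made up for illustration only.
students = [
    {"score": 92, "study_hours": 20, "extra_sessions": 5},
    {"score": 88, "study_hours": 18, "extra_sessions": 3},
    {"score": 65, "study_hours": 8, "extra_sessions": 0},
    {"score": 55, "study_hours": 6, "extra_sessions": 1},
    {"score": 75, "study_hours": 12, "extra_sessions": 2},  # belongs to neither class
]

FEATURES = ["study_hours", "extra_sessions"]

def characterize(records):
    """Data characterization: summarize the general features of one class."""
    return {f: mean(r[f] for r in records) for f in FEATURES}

def discriminate(target, contrasting):
    """Data discrimination: compare target-class features with a contrasting class."""
    t, c = characterize(target), characterize(contrasting)
    return {f: t[f] - c[f] for f in FEATURES}

# Each class is collected with a query-like filter on the score.
high = [s for s in students if s["score"] > 85]           # target class
average = [s for s in students if 50 <= s["score"] <= 70]  # contrasting class

print(characterize(high))           # average feature values of the target class
print(discriminate(high, average))  # positive values: higher in the target class
```

A positive difference for a feature means the target class shows more of it than the contrasting class, which is the kind of pattern the university in the example would look for.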
Mining Frequent Patterns, Associations and Correlations:
Mining of Frequent Patterns:
• There are many kinds of frequent patterns, including frequent itemsets, frequent
subsequences and frequent substructures.
• A frequent itemset is a set of items that often appear together in a transactional dataset.
• Ex.: butter and bread (frequently bought together)
• A frequently occurring subsequence is a pattern of events that occur in sequential
order, such as items that customers tend to purchase one after another.
• Ex: Landing Page → Product Page → Add to Cart → Checkout.
• A substructure can refer to different structural forms (e.g., graphs, trees) that may be
combined with itemsets or subsequences.
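The first two kinds of pattern can be made concrete with a short sketch. The baskets, sessions, function names, and `min_count` thresholds below are assumptions for illustration. Counting consecutive steps is also only a restricted form of subsequence mining, since general subsequences allow gaps between steps.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (unordered) and browsing sessions (ordered).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter", "bread"},
]
sessions = [
    ["Landing Page", "Product Page", "Add to Cart", "Checkout"],
    ["Landing Page", "Product Page", "Add to Cart"],
    ["Landing Page", "Product Page"],
]

def frequent_pairs(baskets, min_count):
    """Count every unordered pair of items; keep pairs seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ('bread', 'butter').
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

def frequent_transitions(sessions, min_count):
    """Count consecutive ordered steps (length-2 contiguous subsequences)."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))
    return {step: n for step, n in counts.items() if n >= min_count}

print(frequent_pairs(baskets, min_count=3))
# {('bread', 'butter'): 3}
print(frequent_transitions(sessions, min_count=2))
# {('Landing Page', 'Product Page'): 3, ('Product Page', 'Add to Cart'): 2}
```

The key difference is order: a basket is treated as a set, so (bread, butter) and (butter, bread) are the same pair, while a session is a sequence, so each transition keeps its direction.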
Mining of Association:
• Associations are used in retail sales to identify items that are frequently
purchased together.
• Association mining refers to the process of uncovering relationships among data and
determining association rules.
• Ex: 70% of the time milk is sold with bread, and only 30% of the time biscuits are sold
with bread.
Mining of Correlations:
• It is a kind of additional analysis performed to uncover interesting statistical
correlations between associated attribute-value pairs or between two itemsets, in order to
determine whether they have a positive, negative or no effect on each other.
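One common way to check for positive, negative, or no correlation between two itemsets is the lift measure. The text above does not name a specific measure, so lift is an assumption here, as are the sample transactions and the function names. Lift divides the fraction of transactions containing both itemsets by the product of the fractions containing each one: a value above 1 suggests a positive correlation, below 1 a negative one, and exactly 1 suggests independence.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def lift(transactions, x, y):
    """Lift of itemsets x and y. Assumes x and y each occur at least once."""
    both = support(transactions, set(x) | set(y))
    return both / (support(transactions, x) * support(transactions, y))

def describe(value, tol=1e-9):
    """Translate a lift value into the three cases named in the text."""
    if abs(value - 1.0) <= tol:
        return "no correlation"
    return "positive" if value > 1.0 else "negative"

# Hypothetical transactions, chosen so that all three cases appear.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"tea", "jam"},
    {"tea"},
]

for x, y in [("bread", "butter"), ("bread", "tea"), ("bread", "jam")]:
    value = lift(transactions, {x}, {y})
    print(x, y, value, describe(value))
# bread butter 2.0 positive
# bread tea 0.0 negative
# bread jam 1.0 no correlation
```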
Support:
The support of a rule x → y (where x and y are items or itemsets) is defined as the
proportion of transactions in the dataset which contain both x and y.
Support(x → y) = (No. of transactions which contain both x and y) / (Total no. of
transactions)
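The support formula can be computed directly by counting. The sketch below assumes each transaction is stored as a Python set; the sample transactions and the function name `support` are illustrative.

```python
def support(transactions, itemset):
    """Support = transactions containing every item of `itemset` / total transactions."""
    itemset = set(itemset)
    containing = sum(1 for t in transactions if itemset <= t)  # subset test
    return containing / len(transactions)

# Hypothetical dataset of five transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"tea"},
]

print(support(transactions, {"bread", "butter"}))  # 2 of 5 transactions -> 0.4
print(support(transactions, {"bread"}))            # 3 of 5 transactions -> 0.6
```

Support is symmetric: the rule bread → butter and the rule butter → bread have the same support, because both count the transactions containing bread and butter together.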
Confidence:
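The confidence of a rule x → y is the proportion of transactions containing x that
also contain y.
Confidence(x → y) = Support(x → y) / Support(x)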
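As a worked illustration of the confidence formula, the sketch below builds a hypothetical dataset whose counts reproduce the 70% and 30% figures from the milk, biscuits, and bread example earlier. The counts and the function names are assumptions chosen for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, x, y):
    """Confidence(x -> y) = Support(x and y together) / Support(x)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical data: 20 transactions, 10 of which contain bread.
transactions = (
    [{"bread", "milk"}] * 7        # bread sold with milk
    + [{"bread", "biscuits"}] * 3  # bread sold with biscuits
    + [{"tea"}] * 10               # transactions without bread
)

print(confidence(transactions, {"bread"}, {"milk"}))      # 7 of the 10 bread transactions
print(confidence(transactions, {"bread"}, {"biscuits"}))  # 3 of the 10 bread transactions
print(confidence(transactions, {"milk"}, {"bread"}))      # every milk transaction has bread
```

Unlike support, confidence depends on the direction of the rule: here bread → milk holds in 70% of bread transactions, while milk → bread holds in all milk transactions.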
Apriori Algorithm:
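A minimal sketch of the level-wise Apriori idea, assuming the standard formulation rather than anything stated in the text above: frequent k-itemsets are built from frequent (k-1)-itemsets, using the property that every subset of a frequent itemset must itself be frequent. The function name `apriori`, the `min_support` parameter, and the sample transactions are illustrative assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        """Scan the data once and keep the candidates that reach min_support."""
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

frequent = apriori(transactions, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this data, the three single items and the three pairs reach the 0.6 threshold, while the triple {bread, butter, milk} appears in only 2 of 5 transactions and is dropped, which ends the loop.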
Frequent Pattern Growth Algorithm: