Assignment 3
ME 781
(dissimilarity/similarity measure and testing)
1. Create a Python function/subroutine to calculate the dissimilarity and similarity between two data
points.
a. It should take two data points and a dissimilarity/similarity measure as inputs (plus any additional
data needed for the calculation) and return both the dissimilarity and the similarity between the data
points under that measure.
- Parameters:
- Data Point 1: 1D array of data type float or int
- Data Point 2: 1D array of data type float or int
- Measure: String (abbreviation for the dissimilarity/similarity measure as given in Table 1)
- (Optional) Additional data: Any additional data required by the dissimilarity/similarity measure
- Returns:
- Dissimilarity and Similarity between the data points: 2-tuple (Float, Float)
- For uniformity across submissions, return the values in the order (dissimilarity, similarity).
PS: Any dissimilarity measure can be used to define a corresponding similarity measure
and vice versa.
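To illustrate the PS above, here is a minimal sketch of two conversions that are often used. Both mappings and the helper names `to_similarity` and `to_dissimilarity` are assumptions made for illustration; the assignment does not prescribe a particular conversion, and the course slides take precedence.

```python
def to_similarity(d):
    """Map a dissimilarity d >= 0 (e.g. a norm) to a similarity in (0, 1].

    Assumed convention: s = 1 / (1 + d), so d = 0 gives s = 1.
    """
    return 1.0 / (1.0 + d)


def to_dissimilarity(s):
    """Map a similarity s <= 1 (e.g. cosine) to a dissimilarity.

    Assumed convention: d = 1 - s, so s = 1 gives d = 0.
    """
    return 1.0 - s


# A distance of 3.0 becomes a similarity of 0.25 under the first convention.
print(to_similarity(3.0))
```

The two helpers are not inverses of each other: the first suits unbounded distances, the second suits similarities that are already bounded by 1.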
b. The subroutine should be robust to invalid inputs.
- If any of the 3 (or 4) arguments is not in the expected format, it should detect this
and print an appropriate message on the debug console.
- It should not crash under any circumstances, as long as we pass it 3 (or 4) arguments.
- In short, for any arguments, it should either return the 2-tuple (dissimilarity, similarity) or
give an appropriate reason for not being able to compute it.
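The robustness requirements above can be approached by validating the arguments before any arithmetic is done. The following is a minimal sketch, not a complete solution: the function name `check_inputs`, the message wording, and the choice to return a message string instead of raising are all assumptions made for illustration.

```python
from numbers import Real

# Abbreviations from Table 1.
KNOWN_MEASURES = {"EN", "HSN", "DN", "MN", "LMN", "CS", "OS", "DS", "JS"}


def check_inputs(x, y, measure):
    """Return None if the arguments look valid, else a message saying why not."""
    lengths = []
    for name, point in (("Data Point 1", x), ("Data Point 2", y)):
        kind = type(point).__name__
        # Strings, dicts and sets are iterable but are not 1D numeric arrays.
        if isinstance(point, (str, bytes, dict, set)):
            return f"{name} must be a 1D array of numbers, got {kind}"
        try:
            values = list(point)
        except TypeError:  # not iterable at all, e.g. an int or None
            return f"{name} must be a 1D array of numbers, got {kind}"
        if not values:
            return f"{name} is empty"
        # bool is a subclass of int, so it is excluded explicitly.
        # Nested sequences (2D input) fail this check as well.
        if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            return f"{name} must contain only int or float values"
        lengths.append(len(values))
    if lengths[0] != lengths[1]:
        return "Data points must have the same length"
    if not isinstance(measure, str) or measure not in KNOWN_MEASURES:
        return f"Unknown measure {measure!r}; expected one of {sorted(KNOWN_MEASURES)}"
    return None


# Usage: print the reason and skip the computation when validation fails.
problem = check_inputs([1.0, 2.0], [3.0], "EN")
if problem is not None:
    print(problem)
```

A main function could call this check first and wrap the numerical part in a `try`/`except` block, so that unexpected failures such as division by zero are also reported instead of crashing.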
Submission guidelines: Submit a single .ipynb file containing all the required functions.
Table 1: Dissimilarity/Similarity Measures
Dissimilarity/Similarity Measure     Abbreviation   Additional data
Euclidean norm                       EN             None
Frobenius or Hilbert-Schmidt norm    HSN            None
Diagonal norm                        DN             Diagonal matrix data (a vector)
Mahalanobis norm                     MN             All “n” data points for computing the covariance matrix
Lebesgue or Minkowski norm           LMN            Alpha value
Cosine                               CS             None
Overlap                              OS             None
Dice                                 DS             None
Jaccard                              JS             None
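As a rough guide to several rows of Table 1, the sketch below uses vector forms that are commonly given for these measures. The formulas and the function names (`minkowski`, `diagonal_norm`, `dot_similarities`) are assumptions made for illustration; the definitions in the course slides take precedence wherever they differ.

```python
import math


def minkowski(x, y, alpha=2.0):
    """Lebesgue/Minkowski distance (LMN); alpha = 2 gives the Euclidean norm (EN)."""
    return sum(abs(a - b) ** alpha for a, b in zip(x, y)) ** (1.0 / alpha)


def diagonal_norm(x, y, d):
    """Diagonal norm (DN): sqrt(sum_i d_i * (x_i - y_i)^2).

    d is the vector holding the diagonal of the weighting matrix.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, d)))


def dot_similarities(x, y):
    """Cosine, overlap, Dice and Jaccard similarities in dot-product form.

    Assumes neither vector is all zeros; otherwise a division by zero occurs.
    """
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    return {
        "CS": xy / math.sqrt(xx * yy),
        "OS": xy / min(xx, yy),
        "DS": 2 * xy / (xx + yy),
        "JS": xy / (xx + yy - xy),
    }


print(minkowski([0, 0], [3, 4]))         # Euclidean distance
print(minkowski([0, 0], [3, 4], 1))      # alpha = 1, city-block distance
print(dot_similarities([1, 0], [1, 1]))  # all four similarity values
```

The zero-vector case is one of the inputs the function in part b would need to detect and report.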
P.S.: In the slides, all vectors are row vectors, not column vectors. When calculating the covariance matrix
for the Mahalanobis norm, use p (distribution dimension) mutually independent data points to ensure
that the covariance matrix is not singular.
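The singularity concern above can be handled explicitly in code. The sketch below is an illustration under stated assumptions: it uses the mean-centred sample covariance with an n - 1 divisor, treats each row of `data` as one data point, and uses only the standard library so that it runs anywhere. The names `covariance`, `solve` and `mahalanobis` are hypothetical. With this particular estimate, n points give a covariance matrix of rank at most n - 1, so the sketch needs more than p points in general position; the slides may define the covariance differently.

```python
import math


def covariance(data):
    """Sample covariance of row-vector data points (n rows, p columns, n >= 2)."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b, tol=1e-12):
    """Solve a z = b by Gaussian elimination; return None if a is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def mahalanobis(x, y, data):
    """Return sqrt((x - y) C^-1 (x - y)^T), or None with a printed reason."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(covariance(data), diff)  # z = C^-1 (x - y)^T without forming C^-1
    if z is None:
        print("Covariance matrix is singular; cannot compute MN.")
        return None
    return math.sqrt(sum(d * v for d, v in zip(diff, z)))


points = [[0, 0], [2, 0], [0, 2], [2, 2]]
print(mahalanobis([1, 0], [0, 0], points))
```

In a notebook, `numpy.cov` and `numpy.linalg.solve` would be the more usual choice for the two helper steps; the hand-written versions are used here only to keep the sketch dependency-free.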