* Different choices of distance function yield different measures of similarity
* Distance functions such as Euclidean distance implicitly give more weight to features with large ranges than to those with small ranges, because a feature's contribution to the distance grows with the size of its differences
* Rule of thumb: when no a priori domain knowledge is available, clustering should give equal weight to each attribute \[Mirkin, 2005\]
* This makes normalization (feature scaling) of the feature vectors a necessary pre-processing step
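The following minimal sketch (Python, standard library only) illustrates both points on small, hypothetical data. It assumes min-max scaling to [0, 1] as one common way to give each attribute equal weight. That choice belongs to this example, not to the source or to Mirkin. z-score standardization is an equally common alternative. The helper names (`euclidean`, `manhattan`, `nearest`, `min_max_scale`) and all data values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest(query_idx: int, data: List[Point], dist) -> int:
    """Index of the point closest to data[query_idx] under `dist`, excluding itself."""
    return min(
        (i for i in range(len(data)) if i != query_idx),
        key=lambda i: dist(data[query_idx], data[i]),
    )


def min_max_scale(data: List[Point]) -> List[Point]:
    """Rescale each feature (column) to [0, 1] so every attribute spans the same range."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(row, lows, highs))
        for row in data
    ]


# 1) The distance function changes which point counts as "most similar".
#    Hypothetical points: query at the origin plus two candidates.
corners = [(0.0, 0.0), (3.0, 3.0), (5.0, 0.0)]
print(nearest(0, corners, euclidean))  # -> 1  (sqrt(18) ~ 4.24 < 5)
print(nearest(0, corners, manhattan))  # -> 2  (5 < 6)

# 2) Unscaled features with large ranges dominate the distance.
#    Hypothetical (age in years, income in dollars) records.
people = [
    (20.0, 20000.0),
    (70.0, 120000.0),
    (25.0, 50000.0),   # query
    (60.0, 50200.0),   # very different age, nearly the same income
    (27.0, 51000.0),   # similar age, similar income
]
print(nearest(2, people, euclidean))                 # -> 3  (income dominates)
print(nearest(2, min_max_scale(people), euclidean))  # -> 4  (both features count)
```

On the raw data the income column, with a range of 100,000 versus 50 for age, decides the result, so the 60-year-old is judged closest to the 25-year-old. After scaling, both attributes span [0, 1] and the 27-year-old with a similar income becomes the nearest neighbor.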