Tuesday, 26 March 2019

Normalization vs Standardization data

Introduction
- Your data may contain features with a mixture of scales.
- Many machine learning methods expect the data features to have the same scale.
- Two popular data scaling methods are normalization and standardization.
- One disadvantage of normalization compared to standardization is that it loses some information in the data, especially about outliers: a single extreme value compresses the remaining values into a narrow band.

Normalization
  Normalization refers to rescaling numeric features into the range [0, 1].
  For example, in image processing, pixel intensities are often normalized so that they fit within a fixed range.
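A minimal sketch of min-max normalization with NumPy, assuming a one-dimensional feature array x (the values here are made up for illustration):

```python
import numpy as np

# Hypothetical feature column with values on an arbitrary scale.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-max normalization: rescale into the range [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

print(x_norm)  # [0.   0.25 0.5  0.75 1.  ]
```

In practice, the min and max are computed on the training data only and reused to transform new data, so that the scaling is consistent at prediction time.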
Standardization
  Standardization rescales data to have a mean (μ) of 0 and standard deviation (σ) of 1.
  For example, in clustering analyses, standardization can be crucial for comparing similarities between features based on distance measures.
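A minimal sketch of standardization (the z-score) with NumPy, using the same kind of made-up feature array as above:

```python
import numpy as np

# Hypothetical feature column.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Standardization: subtract the mean, divide by the standard deviation,
# so the result has mean 0 and standard deviation 1.
x_std = (x - x.mean()) / x.std()

print(round(x_std.mean(), 10))  # 0.0
print(round(x_std.std(), 10))   # 1.0
```

Unlike min-max normalization, the standardized values are not bounded to a fixed interval, which makes standardization less sensitive to a single extreme outlier.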

Most algorithms will probably benefit more from standardization than from normalization, but ultimately choose what works best for your problem.
