## What is class separability?

### Theoretical Minimum Error and Overlapping

In the example below we have two classes: $C_0$ and $C_1$. The points of class $C_0$ follow a normal distribution with variance 4. The points of class $C_1$ follow a normal distribution with variance 1. Class $C_0$ represents 90% of the dataset and class $C_1$ represents 10%.
The following image shows a dataset of 50 points together with the theoretical distributions of the two classes, scaled by their respective proportions. The overlap between the two classes is varied by changing the mean of class $C_1$.

The theoretical minimum error probability is given by the area under the minimum of the two overlapping (prior-weighted) curves. It is given by the following expression:
$$P(\text{error})=\int_R P(\text{error}\mid x)\,P(x)\,dx=\int_R \min\big(P(x\mid C_0)\,P(C_0),\; P(x\mid C_1)\,P(C_1)\big)\,dx$$
This probability could serve as a separability measure, since it quantifies the overlap between the distributions of classes $C_0$ and $C_1$. However, in practice we cannot compute this integral because we do not know the exact expressions of the probability densities.
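When the densities *are* known, as in the synthetic example above, the integral can be evaluated numerically. Below is a minimal sketch for the two-Gaussian setup: $C_0 \sim \mathcal{N}(0, 4)$ with prior 0.9 and $C_1 \sim \mathcal{N}(\mu_1, 1)$ with prior 0.1, where the value of $\mu_1$ (here set to 3, an illustrative choice) controls the overlap.

```python
# Numerical estimate of the theoretical minimum error probability for the
# two-Gaussian example: the integrand is the minimum of the two
# prior-weighted class densities.
from scipy.stats import norm
from scipy.integrate import quad

p0, p1 = 0.9, 0.1          # class priors
mu0, sigma0 = 0.0, 2.0     # C0: variance 4 -> std 2
mu1, sigma1 = 3.0, 1.0     # C1: variance 1 -> std 1 (mean sets the overlap)

def integrand(x):
    return min(p0 * norm.pdf(x, mu0, sigma0),
               p1 * norm.pdf(x, mu1, sigma1))

# Integrate over a range wide enough to capture both densities.
bayes_error, _ = quad(integrand, -20.0, 20.0)
print(f"theoretical minimum error: {bayes_error:.4f}")
```

The result is bounded above by the smaller prior (0.1): even with total overlap, always predicting $C_0$ errs only on the 10% of points from $C_1$.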

### Separability in the linear case

Another expression of class separability is given by Wikipedia in the linear case:

Let $X_0$ and $X_1$ be two sets of points in an $n$-dimensional Euclidean space. Then $X_0$ and $X_1$ are linearly separable if there exist $n+1$ real numbers $w_1, w_2, \ldots, w_n, k$ such that every $x \in X_0$ satisfies $\sum_{i=1}^n w_i x_i > k$ and every $x \in X_1$ satisfies $\sum_{i=1}^n w_i x_i < k$.

However, this definition does not provide a separability measure that can be used in concrete cases.

## My trick: supervised clustering

### In theory

In the absence of a ready-made separability measure, I have found a way to estimate the separability of classes:

1. Perform clustering with an algorithm appropriate to your dataset. See the scikit-learn clustering page.
2. Choose k, the number of clusters, consistent with silhouette analysis. See scikit-learn.
3. For these values of k, estimate the separability of the classes by measuring cluster homogeneity. See scikit-learn.
4. Choose the k giving the best homogeneity.

This measure involves the conditional entropy of the class given the cluster, $H(C|K)$, normalized by the entropy of the class, $H(C)$. The lower the conditional entropy, the more information the cluster assignment K gives about class C, and therefore the more homogeneous the clusters are. The homogeneity score $h$, bounded between 0 and 1, is defined as follows, with a maximum value of 1 (perfect homogeneity):
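The steps above can be sketched with scikit-learn on a toy dataset. `KMeans`, `silhouette_score` and `homogeneity_score` are real scikit-learn APIs; the two-blob dataset and the range of k tried here are illustrative choices, not part of the original method description.

```python
# A minimal sketch of the supervised-clustering trick on two Gaussian blobs.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, homogeneity_score

X, y = make_blobs(n_samples=300, centers=[(-3, 0), (3, 0)],
                  cluster_std=[2.0, 1.0], random_state=0)

# Steps 1-2: cluster for several candidate k, recording the silhouette
# (cluster quality) and, step 3, the homogeneity against the true classes.
results = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = (silhouette_score(X, labels),
                  homogeneity_score(y, labels))

# Step 4: the best homogeneity reached estimates how separable the classes are.
best_k = max(results, key=lambda k: results[k][1])
print(best_k, results[best_k])
```

A high best homogeneity means some clustering of the feature space almost recovers the class labels, i.e. the classes occupy distinct regions.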

$$h=1-\frac{H(C|K)}{H(C)}$$
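As a worked check of this formula, the snippet below computes $H(C)$ and $H(C|K)$ by hand on a tiny hand-picked labelling and compares $h$ with scikit-learn's `homogeneity_score`, which implements the same definition (the log base cancels in the ratio).

```python
# Worked example of h = 1 - H(C|K)/H(C) on a six-point toy labelling.
import numpy as np
from collections import Counter
from sklearn.metrics import homogeneity_score

classes  = [0, 0, 0, 0, 1, 1]   # true class labels C
clusters = [0, 0, 0, 1, 1, 1]   # cluster assignments K
n = len(classes)

# H(C): entropy of the class label distribution.
pc = np.array(list(Counter(classes).values())) / n
H_C = -np.sum(pc * np.log(pc))

# H(C|K): cluster-size-weighted average of the class entropy inside each cluster.
H_CK = 0.0
for k in set(clusters):
    members = [c for c, kk in zip(classes, clusters) if kk == k]
    p = np.array(list(Counter(members).values())) / len(members)
    H_CK += (len(members) / n) * -np.sum(p * np.log(p))

h = 1 - H_CK / H_C
print(h, homogeneity_score(classes, clusters))
```

Here cluster 0 is pure (entropy 0) while cluster 1 reproduces the overall class mix, so $H(C|K) = \tfrac{1}{2}H(C)$ and $h = 0.5$.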