Python Examples
How to Install
To install the package from PyPI, call:
pip3 install deadwood  # or: python3 -m pip install deadwood
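To confirm that the installation succeeded, the standard library can report the installed version. This is a minimal sketch; `installed_version` is a hypothetical helper introduced here for illustration and is not part of the package.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string, or None if the package is not installed.
print(installed_version("deadwood"))
```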
Basic Use
Note
This section is a work in progress. In the meantime, take a look at the examples in the reference manual.
To learn more about Python, check out my open-access textbook Minimalist Data Wrangling in Python.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import deadwood
Example noisy dataset[1]:
X1 = np.loadtxt("chameleon_t7_10k.data.gz", ndmin=2)
deadwood.plot_scatter(X1, asp=1, alpha=0.3)
plt.show()
Figure 1 The chameleon_t7_10k dataset
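The call to `np.loadtxt` above relies on two NumPy behaviours: files whose names end in `.gz` are decompressed on the fly, and `ndmin=2` guarantees a two-dimensional array. The sketch below demonstrates both on a tiny made-up file; the file name and values are invented for illustration and have nothing to do with the actual dataset.

```python
import gzip
import os
import tempfile

import numpy as np

# Write a tiny two-column, whitespace-delimited file (made-up values).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy.data.gz")
with gzip.open(path, "wt") as f:
    f.write("0.0 1.0\n2.5 3.5\n4.0 5.0\n")

# loadtxt decompresses based on the .gz extension;
# ndmin=2 guarantees a matrix even when the file holds a single row.
X_toy = np.loadtxt(path, ndmin=2)
print(X_toy.shape)
## (3, 2)
```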
Detect outliers with Deadwood (default settings):
is_outlier = deadwood.Deadwood().fit_predict(X1)
deadwood.plot_scatter(X1, asp=1, labels=(is_outlier<0), alpha=0.3)
plt.show()
Figure 2 Outlier detection on chameleon_t7_10k
Fraction of detected outliers:
np.mean(is_outlier<0)
## np.float64(0.1014)
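The comparison `is_outlier<0` yields a boolean mask, which can also be used to split the data into flagged and retained points. The sketch below uses small made-up arrays in place of real Deadwood output and assumes, as the examples above do, that a negative label marks an outlier.

```python
import numpy as np

# Stand-in data: 6 points in 2D and made-up labels (not real Deadwood output).
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [5.0, 5.0],
    [0.2, 0.1], [-4.0, 6.0], [0.0, 0.3],
])
labels = np.array([1, 1, -1, 1, -1, 1])  # assumed convention: negative = outlier

outlier_mask = labels < 0
X_outliers = X[outlier_mask]   # rows flagged as outliers
X_inliers = X[~outlier_mask]   # remaining rows

print(X_outliers.shape[0], X_inliers.shape[0])
## 2 4
```

The mean of the mask (here 2 of 6 points) gives the same fraction-of-outliers statistic as above.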
Clusters of Unequal Densities
The dataset above consists of clusters of roughly equal density. In the next one, this is clearly not the case.
X2 = np.loadtxt("chameleon_t8_8k.data.gz", ndmin=2)
deadwood.plot_scatter(X2, asp=1, alpha=0.3)
plt.show()
Figure 3 The chameleon_t8_8k dataset
Detect outliers with Deadwood (default settings):
is_outlier = deadwood.Deadwood().fit_predict(X2)
deadwood.plot_scatter(X2, asp=1, labels=(is_outlier<0), alpha=0.3)
plt.show()
Figure 4 Outlier detection on chameleon_t8_8k
Detect outliers with Deadwood, separately in each cluster detected by Genie:
import genieclust
clusters = genieclust.Genie(n_clusters=10, M=5).fit(X2)
is_outlier = deadwood.Deadwood().fit_predict(clusters)
deadwood.plot_scatter(X2, asp=1, labels=(is_outlier<0), alpha=0.3)
plt.show()
Figure 5 Outlier detection on clusters of chameleon_t8_8k
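When outliers are detected per cluster, it can be informative to check what fraction of each cluster was flagged. The sketch below does this with plain NumPy on made-up arrays. With real data, the cluster IDs would presumably come from the fitted clustering object (for instance a scikit-learn-style `labels_` attribute, which is an assumption here), and any negative noise labels would need to be filtered out first, since `np.bincount` accepts only non-negative integers.

```python
import numpy as np

# Stand-in arrays (made up): cluster IDs 0..2 and outlier flags for 8 points.
cluster_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2])
outlier_mask = np.array([True, False, False, False, False, True, True, False])

n_clusters = cluster_ids.max() + 1

# Number of points in each cluster.
cluster_sizes = np.bincount(cluster_ids, minlength=n_clusters)

# Number of flagged points in each cluster (flags summed as 0.0/1.0 weights).
outliers_per_cluster = np.bincount(
    cluster_ids, weights=outlier_mask.astype(float), minlength=n_clusters
)

outlier_fraction = outliers_per_cluster / cluster_sizes
print(outlier_fraction)  # one fraction per cluster: 1/3, 0, and 2/3 here
```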