Data Analysis

Data analysis involves inspecting, cleaning, transforming, and modeling data in order to discover useful information. For example, by inspecting petal length in the iris dataset, you might find it to be a good indicator of iris species.
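As a rough illustration of the inspection step, the sketch below summarizes petal length per species using scikit-learn's bundled iris dataset. The `summary` dictionary is only a name chosen for this example; column 2 of `iris.data` is the petal length in centimetres.

```python
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]  # column 2 is "petal length (cm)"

# Summarize petal length separately for each species.
summary = {}
for label, name in enumerate(iris.target_names):
    values = petal_length[iris.target == label]
    summary[str(name)] = (values.min(), values.mean(), values.max())
    print(f"{name:<12} min={values.min():.1f} "
          f"mean={values.mean():.2f} max={values.max():.1f}")
```

If the ranges barely overlap between species, as they do for setosa versus the other two, the feature is a useful indicator on its own.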

Principal Component Analysis

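Principal Component Analysis (PCA) projects data from a higher-dimensional space onto a lower-dimensional one while retaining as much of the data's variance as possible. It does this by finding the orthogonal directions (the principal components) along which the data varies most and keeping only the first few.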
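To make the idea concrete, here is a minimal NumPy sketch of PCA via the singular value decomposition. It is one common way to compute the projection, not necessarily what any particular library does internally. The helper name `pca_project` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal components (hypothetical helper)."""
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each direction, as a fraction of the total.
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, ratio

# Synthetic 3-D data that mostly varies along one direction (illustrative only).
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

Z, ratio = pca_project(X, n_components=1)
print(Z.shape, ratio)
```

Because the synthetic points lie close to a single line, one component should capture nearly all of the variance, which is the sense in which PCA keeps "most of the information".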

• Sklearn

# Load dataset
from sklearn.datasets import load_iris
iris = load_iris()
data = iris.data  # (150, 4)
target = iris.target  # (150,)

# --- PCA --- #
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
data = pca.fit_transform(data)  # (150, 2)
# ----------- #

# Plot
import matplotlib.pyplot as plt
points = {0: [], 1: [], 2: []}
for i in range(len(target)):
    points[target[i]].append(data[i])

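for label, dat in points.items():
    x, y = zip(*dat)
    # Use the species name rather than the integer class id in the legend
    plt.scatter(x, y, label=iris.target_names[label])

plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.legend()
plt.show()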
Figure: PCA Example
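One way to check how much information the two components retain is to look at scikit-learn's `explained_variance_ratio_` attribute on the fitted `PCA` object. The sketch below refits PCA on the original four iris features; the variable names `X` and `ratio` are chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # (150, 4)
pca = PCA(n_components=2).fit(X)

# Fraction of the total variance captured by each kept component.
ratio = pca.explained_variance_ratio_
print(ratio, ratio.sum())
```

A sum close to 1 suggests that little variance is lost by dropping the remaining dimensions.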