Certainly! K-means is not designed for semantic classification, such as sorting images of cats and dogs by label, but it can still cluster images by pixel-level similarity. The resulting clusters may not match the true semantic classes (cats and dogs), because raw pixel distance tends to reflect overall brightness, color, and background rather than the animal in the picture. They can still reveal some structure in the data.

Let's refine the steps so that we handle the dataset properly and can explore the clustering results.

### Step 1: Load the Dataset

First, let's load the Cats vs Dogs dataset as before:
```python
import os
import random
import shutil

import cv2
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Download and extract the dataset (the "!" lines are notebook shell commands, e.g. in Colab or Jupyter)
!wget https://storage.googleapis.com/tensorflow-1-public/course2/cats_and_dogs_filtered.zip -O /tmp/cats_and_dogs_filtered.zip
!unzip -q /tmp/cats_and_dogs_filtered.zip -d /tmp

base_dir = '/tmp/cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')

# Collect all images from the training set
image_files = []
for class_name in sorted(os.listdir(train_dir)):
    class_dir = os.path.join(train_dir, class_name)
    if not os.path.isdir(class_dir):
        continue
    for image_file in sorted(os.listdir(class_dir)):
        image_files.append(os.path.join(class_dir, image_file))

# Shuffle before taking a subset; otherwise the first 200 paths all come from the first class folder
random.Random(42).shuffle(image_files)

# Load and resize images to a fixed size (e.g., 150x150)
images = []
loaded_files = []
for image_file in image_files[:200]:  # Limit to a subset of images for simplicity
    img = cv2.imread(image_file)
    if img is None:  # Skip files OpenCV cannot decode
        continue
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
    img = cv2.resize(img, (150, 150))
    images.append(img)
    loaded_files.append(image_file)
image_files = loaded_files  # Keep the paths aligned with the rows of `images`

# Convert list of images to a numpy array and normalize pixel values
images = np.array(images) / 255.0
print(f"Loaded {len(images)} images")
```
### Step 2: Apply K-Means Clustering

We will flatten each image into a vector and apply k-means clustering:
```python
# Flatten each image to create feature vectors (150 * 150 * 3 = 67,500 values per image)
flattened_images = images.reshape(images.shape[0], -1)

# Apply k-means clustering (n_init is set explicitly so results do not depend on the library default)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(flattened_images)

# Create directories for clustered images
cluster_dir_0 = '/tmp/cluster_0'
cluster_dir_1 = '/tmp/cluster_1'
os.makedirs(cluster_dir_0, exist_ok=True)
os.makedirs(cluster_dir_1, exist_ok=True)

# Copy each image into the folder for its cluster
for src_image_path, label in zip(image_files, labels):
    filename = os.path.basename(src_image_path)
    dst_dir = cluster_dir_0 if label == 0 else cluster_dir_1
    shutil.copy(src_image_path, os.path.join(dst_dir, filename))
print(f"Clustered images into {cluster_dir_0} and {cluster_dir_1}")

# Visualize a few images from each cluster (one row per cluster)
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for row, cluster_id in enumerate([0, 1]):
    cluster_indices = np.where(labels == cluster_id)[0]
    for col in range(5):
        ax = axes[row, col]
        ax.axis('off')
        if col < len(cluster_indices):  # A cluster may hold fewer than 5 images
            ax.imshow(images[cluster_indices[col]])
plt.tight_layout()
plt.show()
```
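
To make "pixel-level similarity" concrete, here is a minimal NumPy sketch of the Lloyd iteration that k-means is built on, run on toy data. It is an illustration only, not scikit-learn's implementation: the function `kmeans_lloyd`, the farthest-point initialization (chosen here just to keep the result deterministic), and the synthetic dark/bright "images" are all assumptions made for this example.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=20):
    """Hypothetical helper: plain Lloyd iterations on the rows of X."""
    # Farthest-point initialization (an assumption for determinism):
    # start from the first sample, then repeatedly add the sample
    # farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of the samples assigned to it
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return assign, centroids

# Toy stand-in for the dataset: 10 dark and 10 bright 8x8 RGB "images"
rng = np.random.default_rng(0)
dark = rng.uniform(0.0, 0.3, size=(10, 8, 8, 3))
bright = rng.uniform(0.7, 1.0, size=(10, 8, 8, 3))
toy_images = np.concatenate([dark, bright])
toy_flat = toy_images.reshape(len(toy_images), -1)  # one row of 192 values per image

toy_assign, toy_centroids = kmeans_lloyd(toy_flat, k=2)
print(toy_assign)
```

The two toy groups differ only in brightness, and the clusters recover exactly that split. The same thing can happen with real photos: the grouping follows whatever dominates pixel distance, which need not be the animal.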
### Step 3: Build a Binary Classification CNN

Now, let's build and train a binary classification CNN using the original dataset:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Define the model architecture
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Image data generators for the training and validation datasets
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary'
)

validation_dir = os.path.join(base_dir, 'validation')
validation_generator = val_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary'
)

# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50
)
```
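
As a sanity check on the architecture above, the sketch below traces how the spatial size shrinks through the convolution and pooling layers. It assumes the Keras defaults used in the model (stride-1 convolutions with no padding, non-overlapping 2x2 pooling); the helper names `conv_valid` and `max_pool` are made up for this example and are not part of Keras.

```python
def conv_valid(size, kernel=3):
    """Output side length of a stride-1 convolution with no padding."""
    return size - kernel + 1

def max_pool(size, pool=2):
    """Output side length of non-overlapping pooling (any remainder is dropped)."""
    return size // pool

side = 150  # input images are 150 x 150 x 3
shapes = []
for filters in (32, 64, 128):
    side = conv_valid(side)
    shapes.append(("conv", side, side, filters))
    side = max_pool(side)
    shapes.append(("pool", side, side, filters))

# Flatten turns the last feature map into one vector; Dense(512) adds one bias per unit
flatten_units = side * side * 128
dense_params = flatten_units * 512 + 512

for name, height, width, channels in shapes:
    print(f"{name}: {height} x {width} x {channels}")
print(f"Flatten: {flatten_units} values; Dense(512) parameters: {dense_params}")
```

The feature map ends at 17 x 17 x 128, so `Flatten` yields 36,992 values and the `Dense(512)` layer alone holds about 18.9 million parameters, far more than the three convolution layers combined.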
### Analysis

- **K-Means Clustering**: K-means groups images by pixel-level similarity, with no access to the labels. The two clusters are therefore unlikely to separate cats from dogs cleanly, but they can still be informative. Inspect the images in each cluster to see what they have in common, for example similar lighting or backgrounds.
- **Binary Classification CNN**: The CNN is trained on labeled data to classify images into two categories (cats and dogs). Because it learns directly from the labels, this supervised approach will generally perform much better than unsupervised clustering on this task.
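
One way to go beyond eyeballing the clusters is to compute cluster purity: the fraction of images that belong to the majority class of their own cluster. The sketch below uses only the standard library. The helpers `labels_from_paths` and `cluster_purity` and the demo paths are hypothetical, and the true class is assumed to be the name of each file's parent folder, as in the `train/cats` and `train/dogs` layout.

```python
import os
from collections import Counter

def labels_from_paths(paths):
    """Use each file's parent directory name (e.g. 'cats', 'dogs') as its class."""
    return [os.path.basename(os.path.dirname(p)) for p in paths]

def cluster_purity(cluster_labels, true_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    per_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)

# Made-up example: cluster 0 holds one cat; cluster 1 holds one cat and two dogs
demo_paths = [
    "/tmp/train/cats/cat.0.jpg",
    "/tmp/train/cats/cat.1.jpg",
    "/tmp/train/dogs/dog.0.jpg",
    "/tmp/train/dogs/dog.1.jpg",
]
demo_clusters = [0, 1, 1, 1]
demo_purity = cluster_purity(demo_clusters, labels_from_paths(demo_paths))
print(demo_purity)
```

With the variables from Step 2, the call would be `cluster_purity(labels, labels_from_paths(image_files[:len(labels)]))`. On a roughly balanced two-class subset, a value close to 0.5 means the clusters carry little information about whether an image shows a cat or a dog.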
### Conclusion

K-means groups images by pixel-level similarity, so it is not suitable for semantic classification tasks such as telling cats from dogs. For those tasks, a supervised model trained on labeled data, such as a CNN, is more effective. Unsupervised methods are still useful for exploring the data and spotting visual patterns in it.

Feel free to experiment further by adjusting parameters or trying different clustering algorithms!