Certainly! K-means clustering is not designed for semantic classification tasks such as sorting images of cats and dogs into the categories their labels describe, but it can still cluster images based on pixel-level similarities. The resulting clusters may not align with the true semantic classes (cats and dogs), yet they can still reveal some structure in the data.

Let's refine the steps to ensure we handle the dataset properly and explore the clustering results.

### Step 1: Load the Dataset

First, let's load the Cats vs Dogs dataset as before:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import numpy as np
from sklearn.cluster import KMeans
import shutil
from matplotlib import pyplot as plt
import cv2

# Download and extract the dataset
!wget https://storage.googleapis.com/tensorflow-1-public/course2/cats_and_dogs_filtered.zip -O /tmp/cats_and_dogs_filtered.zip
!unzip -q /tmp/cats_and_dogs_filtered.zip -d /tmp

base_dir = '/tmp/cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')

# Collect all images from the training set
image_files = []
for class_name in os.listdir(train_dir):
    class_dir = os.path.join(train_dir, class_name)
    for image_file in os.listdir(class_dir):
        image_files.append(os.path.join(class_dir, image_file))

# Shuffle so the 200-image subset below contains both classes
# (otherwise the first 200 files all come from a single class folder)
np.random.seed(42)
np.random.shuffle(image_files)

# Load and resize images to a fixed size (e.g., 150x150)
images = []
for image_file in image_files[:200]:  # Limit to a subset of images for simplicity
    img = cv2.imread(image_file)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV reads BGR; convert so matplotlib shows colors correctly
    img = cv2.resize(img, (150, 150))
    images.append(img)

# Convert the list of images to a numpy array and normalize pixel values
images = np.array(images) / 255.0

print(f"Loaded {len(images)} images")
```
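
Before clustering, it can help to confirm the images loaded as expected. A quick sketch (an array-shape check plus one sample image; the index 0 is an arbitrary choice):

```python
# Confirm the array shape: (num_images, height, width, channels)
print(images.shape)  # expected: (200, 150, 150, 3)

# Show one sample image together with its source file name
plt.imshow(images[0])
plt.title(os.path.basename(image_files[0]))
plt.axis('off')
plt.show()
```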

### Step 2: Apply K-Means Clustering

We will flatten each image into a vector and apply k-means clustering:

```python
# Flatten each image to create feature vectors
flattened_images = images.reshape(images.shape[0], -1)

# Apply k-means clustering
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(flattened_images)

# Create directories for clustered images
cluster_dir_0 = '/tmp/cluster_0'
cluster_dir_1 = '/tmp/cluster_1'

os.makedirs(cluster_dir_0, exist_ok=True)
os.makedirs(cluster_dir_1, exist_ok=True)

# Save the images to their respective cluster folders
for idx, label in enumerate(labels):
    src_image_path = image_files[idx]
    filename = os.path.basename(src_image_path)
    if label == 0:
        dst_image_path = os.path.join(cluster_dir_0, filename)
    else:
        dst_image_path = os.path.join(cluster_dir_1, filename)
    shutil.copy(src_image_path, dst_image_path)

print(f"Clustered images into {cluster_dir_0} and {cluster_dir_1}")

# Visualize a few images from each cluster
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i in range(5):
    ax = axes[0, i]
    ax.imshow(images[np.where(labels == 0)[0][i]])
    ax.axis('off')

    ax = axes[1, i]
    ax.imshow(images[np.where(labels == 1)[0][i]])
    ax.axis('off')

plt.tight_layout()
plt.show()
```
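
As an optional sanity check (not part of the original steps), we can compare the cluster assignments with the true cat/dog labels. A minimal sketch, assuming the class can be read from the parent folder name (`cats` or `dogs`), as in the directory layout above:

```python
from sklearn.metrics import adjusted_rand_score

# Derive true labels from the parent folder of each file (0 = cat, 1 = dog)
true_labels = np.array([
    0 if os.path.basename(os.path.dirname(path)) == 'cats' else 1
    for path in image_files[:200]
])

# Adjusted Rand index: 1.0 is perfect agreement, around 0.0 is chance level
ari = adjusted_rand_score(true_labels, labels)
print(f"Adjusted Rand index between clusters and true labels: {ari:.3f}")

# Cluster ids are arbitrary, so try both assignments and keep the better one
acc = max(np.mean(true_labels == labels), np.mean(true_labels == 1 - labels))
print(f"Best-case clustering accuracy: {acc:.3f}")
```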

### Step 3: Build a Binary Classification CNN

Now, let's build and train a binary classification CNN on the original labeled dataset:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Define the model architecture
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Image data generators for the training and validation datasets
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary'
)

validation_dir = os.path.join(base_dir, 'validation')
validation_generator = val_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary'
)

# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=100,       # 2000 training images / batch size 20
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50        # 1000 validation images / batch size 20
)
```
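
Once training finishes, it is useful to plot the accuracy and loss curves stored in `history` to spot overfitting. A minimal sketch using the `history` object returned by `model.fit` above (the key names follow TensorFlow 2.x conventions):

```python
# Plot training vs. validation accuracy and loss over epochs
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(epochs, acc, label='Training accuracy')
ax1.plot(epochs, val_acc, label='Validation accuracy')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Accuracy')
ax1.legend()

ax2.plot(epochs, loss, label='Training loss')
ax2.plot(epochs, val_loss, label='Validation loss')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Loss')
ax2.legend()

plt.tight_layout()
plt.show()
```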

### Analysis

- **K-Means Clustering**: The k-means algorithm groups images purely by pixel-level similarity, so the two clusters may not separate cats from dogs cleanly, but they can still surface interesting groupings. Visualize the images in each cluster to see whether they share patterns such as dominant colors or backgrounds.

- **Binary Classification CNN**: The CNN is trained on labeled data to classify images into the two categories (cats and dogs). This supervised approach will generally perform much better than unsupervised clustering for this task; see the brief evaluation sketch after this list.

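To put a rough number on that comparison, here is a minimal evaluation sketch using the `validation_generator` defined above; the exact accuracy will vary between runs:

```python
# Evaluate the trained CNN on the held-out validation set
val_loss, val_accuracy = model.evaluate(validation_generator)
print(f"Validation accuracy: {val_accuracy:.3f}")
```
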
### Conclusion

While k-means can offer some insight by grouping images according to pixel-level similarities, it is not suited to semantic classification tasks such as telling cats from dogs. For those tasks, a supervised approach with labeled data and a model such as a CNN is far more effective. Exploring unsupervised methods is still worthwhile, though, for understanding the data and spotting patterns in the images.

Feel free to experiment further by adjusting parameters or trying different clustering algorithms!
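
For instance, one simple variation (not covered above) is to reduce the flattened pixel vectors with PCA before clustering, which usually makes k-means faster and less sensitive to noise. A minimal sketch reusing `flattened_images` from Step 2; the number of components (50) is an arbitrary choice:

```python
from sklearn.decomposition import PCA

# Reduce the 150*150*3-dimensional pixel vectors to 50 principal components
pca = PCA(n_components=50, random_state=42)
reduced = pca.fit_transform(flattened_images)

# Cluster in the reduced space
kmeans_pca = KMeans(n_clusters=2, random_state=42)
pca_labels = kmeans_pca.fit_predict(reduced)

print(f"Explained variance retained: {pca.explained_variance_ratio_.sum():.2%}")
```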