Data Analysis

How Well Can We Classify Houseplants?

An Analysis of Classifying Wilted and Healthy Plants Using Machine Learning

This project focuses on building a machine learning model to classify houseplants as either "healthy" or "wilted." With data sourced from Kaggle, the goal was to aid plant owners by developing an automated system that could help detect early signs of plant stress or health issues.
 

To accomplish this, I conducted exploratory data analysis (EDA) on images of houseplants, followed by building and testing a model that could reliably predict the health of plants based on image inputs.

This project involved:

Data Collection & Processing:

  • Data Source: Images of healthy and wilted houseplants were sourced from the Kaggle dataset Healthy and Wilted Houseplant Images.

  • Data Preparation: The images were split into training and test sets with an 80/20 split to ensure proper model validation. All images were resized to 256x256 pixels for consistency.

  • Handling Imbalanced Data: I reviewed the class distribution to check for imbalances between the healthy and wilted plant categories, finding that the data was fairly balanced, with no need for additional handling techniques.
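The 80/20 split described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: it assumes the images sit in one subfolder per class (for example `healthy/` and `wilted/`), and the function name `split_by_class`, the seed, and the list of file suffixes are all hypothetical choices. Splitting within each class keeps the class proportions the same in both sets.

```python
import random
from pathlib import Path

# Assumed image file types; adjust to whatever the dataset actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_by_class(data_dir, test_fraction=0.2, seed=42):
    """Return (train, test) lists of (path, label) pairs, split per class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        images = sorted(
            f for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
        rng.shuffle(images)
        n_test = round(len(images) * test_fraction)
        # The folder name doubles as the class label.
        test += [(f, class_dir.name) for f in images[:n_test]]
        train += [(f, class_dir.name) for f in images[n_test:]]
    return train, test
```

Resizing to 256x256 is not handled here; it is usually simpler to do that when the images are loaded for training.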

Data Analysis:

  • Conducted exploratory analysis on the dataset to inspect the distribution of images across the "healthy" and "wilted" categories.

  • Visualized the class distribution using bar charts to ensure balanced data.

  • Randomly sampled images from both classes to get a qualitative feel for the dataset and to understand the variety in plant images.
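As a rough sketch of the checks listed above, the snippet below counts images per class, reports how uneven the classes are, and draws a few random examples from each class for visual inspection. The helper names (`class_counts`, `imbalance_ratio`, `sample_per_class`) are hypothetical, and the input is assumed to be a list of `(path, label)` pairs.

```python
import random
from collections import Counter


def class_counts(items):
    """Count how many (path, label) items belong to each label."""
    return Counter(label for _, label in items)


def imbalance_ratio(counts):
    """Largest class size divided by the smallest; 1.0 means perfectly balanced."""
    return max(counts.values()) / min(counts.values())


def sample_per_class(items, k=3, seed=0):
    """Pick up to k random paths from each class for a qualitative look."""
    rng = random.Random(seed)
    by_class = {}
    for path, label in items:
        by_class.setdefault(label, []).append(path)
    return {
        label: rng.sample(paths, min(k, len(paths)))
        for label, paths in by_class.items()
    }
```

A ratio close to 1.0 supports the conclusion that no resampling or class weighting is needed; the counts can be passed straight to a bar chart.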

Model Training:

  • Used TensorFlow to load and preprocess the image dataset, splitting it into training and validation sets.

  • Built a convolutional neural network (CNN) to classify the houseplants into the two categories.

  • Applied image augmentation to reduce overfitting and fed the data to the model in batches to keep training efficient.
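The training steps above might look roughly like the following in TensorFlow's Keras API. The write-up does not spell out the architecture, so the layer sizes, augmentation settings, batch size, seed, and the function names `load_datasets` and `build_model` are all assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = (256, 256)  # matches the resize described in the data preparation step


def load_datasets(data_dir, batch_size=32, seed=123):
    """Load batched training and validation sets from one subfolder per class."""
    common = dict(
        validation_split=0.2,  # 80/20 split
        seed=seed,             # same seed so the two subsets do not overlap
        image_size=IMG_SIZE,
        batch_size=batch_size,
        label_mode="binary",   # two classes: healthy vs. wilted
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="training", **common
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="validation", **common
    )
    return train_ds, val_ds


def build_model():
    """A small CNN with augmentation layers (assumed architecture)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=IMG_SIZE + (3,)),
        layers.RandomFlip("horizontal"),   # augmentation, active only in training
        layers.RandomRotation(0.1),
        layers.Rescaling(1.0 / 255),       # scale pixel values to [0, 1]
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),               # extra guard against overfitting
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model
```

With a dataset folder in place, training would be along the lines of `train_ds, val_ds = load_datasets("path/to/images")` followed by `build_model().fit(train_ds, validation_data=val_ds, epochs=10)`, where the path and epoch count are placeholders.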

Visualization & Reporting:

  • Visualized class distribution using Python’s matplotlib.

  • Displayed samples of images to illustrate the appearance of healthy versus wilted plants.

  • Developed a comprehensive report documenting the data preprocessing steps, model architecture, and performance metrics, including accuracy and validation results.
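For the performance metrics mentioned above, accuracy can be computed from a simple tally of correct and incorrect predictions. The sketch below is illustrative only: the function names are hypothetical, "wilted" is arbitrarily treated as the positive class, and no results from the actual project are reproduced here.

```python
def confusion_counts(y_true, y_pred, positive="wilted"):
    """Tally true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0
```

Reporting the four counts next to the accuracy figure shows whether errors lean toward missed wilted plants or false alarms, which a single accuracy number hides.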

Project Insights:

This project was both challenging and insightful. While plant classification may seem straightforward, the variations in image quality and environmental factors like lighting presented unique challenges for model accuracy.

It was also a great learning experience that helped me hone my skills in data processing, model training, and image classification. I look forward to applying these techniques to future projects!

If you're interested, you can read about more of my projects or explore the Python code used to build and train the model below:
