Predicting Boat Listing Conversion: An End-to-End Marketplace Propensity Model

From Listings to Leads: Building a Conversion Propensity Model for a Marine Marketplace

After completing a sales analytics dashboard for a pizza shop, I wanted to challenge myself with a more advanced project, one that mirrors how data science teams operate inside digital marketplaces.

In this project, I built an end-to-end conversion propensity model with the goal of predicting whether a boat listing would generate a qualified inquiry within a 7-day window.

It simulates the kind of internal machine learning workflow that marketplace platforms use to optimize ranking, sales prioritization, and inventory performance.

The Business Problem

Marine marketplaces host thousands of listings at the same time.

But that does not mean they all perform equally.

Some listings quickly attract strong buyer interest, while others sit stale for weeks or even months.

I found this interesting, so I set the goal of the project to answer:

“Can we predict which listings are most likely to generate inquiries? And could we use that signal to improve marketplace operations?”

Building the Dataset (Warehouse Simulation)

Instead of manually constructing a flat dataset, I simulated a real-world warehouse structure.

The dataset was created using SQL by combining:

  • listings (price, length, year, seller info)
  • engagement_events (views, saves, inquiries)
  • listing_photos (photo counts)

Engagement data was aggregated into rolling 7-day metrics, which then produced one row per listing with all relevant features.
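As a rough illustration of this kind of aggregation, here is a minimal sketch using Python's built-in sqlite3 module. The column names, the sample rows, and the choice to measure the 7-day window from each listing's start date are all assumptions made for the example, not the actual schema in the repository.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (
    listing_id INTEGER PRIMARY KEY, price REAL, length_ft REAL,
    year INTEGER, seller_type TEXT, listed_date TEXT
);
CREATE TABLE engagement_events (listing_id INTEGER, event_type TEXT, event_date TEXT);
CREATE TABLE listing_photos (listing_id INTEGER, photo_id INTEGER);

INSERT INTO listings VALUES
    (1, 45000, 24, 2015, 'dealer', '2026-01-01'),
    (2, 120000, 36, 2008, 'private', '2026-01-01');
INSERT INTO engagement_events VALUES
    (1, 'view', '2026-01-02'), (1, 'view', '2026-01-03'),
    (1, 'save', '2026-01-03'), (1, 'inquiry', '2026-01-05'),
    (1, 'view', '2026-01-20'),
    (2, 'view', '2026-01-04');
INSERT INTO listing_photos VALUES (1, 10), (1, 11), (1, 12), (2, 20);
""")

# Aggregate each source table on its own, then join, so that events and
# photos do not multiply each other's counts. The view dated 2026-01-20
# falls outside listing 1's 7-day window and is not counted.
query = """
WITH engagement_7d AS (
    SELECT l.listing_id,
           SUM(e.event_type = 'view')    AS views_7d,
           SUM(e.event_type = 'save')    AS saves_7d,
           SUM(e.event_type = 'inquiry') AS inquiries_7d
    FROM listings l
    JOIN engagement_events e
      ON e.listing_id = l.listing_id
     AND e.event_date >= l.listed_date
     AND e.event_date < date(l.listed_date, '+7 days')
    GROUP BY l.listing_id
),
photos AS (
    SELECT listing_id, COUNT(*) AS photo_count
    FROM listing_photos
    GROUP BY listing_id
)
SELECT l.listing_id, l.price, l.length_ft, l.year, l.seller_type,
       COALESCE(p.photo_count, 0)    AS photo_count,
       COALESCE(g.views_7d, 0)       AS views_7d,
       COALESCE(g.saves_7d, 0)       AS saves_7d,
       COALESCE(g.inquiries_7d, 0) > 0 AS converted_7d
FROM listings l
LEFT JOIN engagement_7d g ON g.listing_id = l.listing_id
LEFT JOIN photos p ON p.listing_id = l.listing_id
ORDER BY l.listing_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each listing comes back as exactly one row, with the photo count, the windowed engagement counts, and a 0/1 flag for whether an inquiry arrived inside the window.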

[Image: boat listing SQL snippet]

Feature Engineering

Raw listing attributes alone are rarely enough for predictive modeling.

This is why I created several derived features to better capture marketplace behavior:

  • Boat age (transformed from year)
  • Price per foot (relative value indicator)
  • Engagement rate (saves + inquiries per day)
  • Dealer indicator
  • Log-transformed days on site
  • One-hot encoded categorical values

These features were designed specifically to show:

  • Pricing competitiveness
  • Buyer engagement intensity
  • Seller credibility
  • Inventory freshness
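To make the derived features above concrete, here is a small sketch in plain Python. The field names (`listed_date`, `seller_type`, `saves`, `inquiries`), the `boat_type` categories, and the use of `log1p` for the log transform are assumptions for illustration; the real pipeline may define them differently.

```python
import math
from datetime import date

def engineer_features(row, as_of):
    """Derive model features from one raw listing row (hypothetical field names)."""
    # Clamp to at least one day so brand-new listings do not divide by zero.
    days_on_site = max((as_of - row["listed_date"]).days, 1)
    return {
        "boat_age": as_of.year - row["year"],
        "price_per_foot": row["price"] / row["length_ft"],
        "engagement_rate": (row["saves"] + row["inquiries"]) / days_on_site,
        "is_dealer": int(row["seller_type"] == "dealer"),
        # log1p compresses the long right tail of listing age.
        "log_days_on_site": math.log1p(days_on_site),
    }

def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator columns."""
    return {f"type_{c}": int(value == c) for c in categories}

listing = {
    "price": 45000, "length_ft": 24, "year": 2015,
    "seller_type": "dealer", "listed_date": date(2026, 1, 1),
    "saves": 3, "inquiries": 1,
}
features = engineer_features(listing, as_of=date(2026, 1, 11))
features.update(one_hot("sail", ["sail", "power", "pontoon"]))
print(features)
```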

Model Selection

I chose Logistic Regression as a baseline model because:

  • It produces probability outputs that are typically well calibrated
  • Coefficients are interpretable
  • It’s lightweight and production-friendly
  • It’s appropriate for binary classification problems

A preprocessing pipeline was used to scale numeric variables and encode categorical variables cleanly.

The dataset was split using a stratified 80/20 train-test approach.
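A setup like this is commonly expressed with scikit-learn's `Pipeline` and `ColumnTransformer`. The sketch below assumes pandas and scikit-learn are installed and runs on synthetic data; the column names, the random data generator, and the `random_state` values are placeholders, not the project's real features or results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the one-row-per-listing dataset (illustrative only).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "price_per_foot": rng.normal(2000, 500, n),
    "boat_age": rng.integers(0, 30, n),
    "engagement_rate": rng.exponential(0.5, n),
    "boat_type": rng.choice(["sail", "power", "pontoon"], n),
})
logit = 2.0 * df["engagement_rate"] - 0.05 * df["boat_age"] - 0.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

numeric_cols = ["price_per_foot", "boat_age", "engagement_rate"]
categorical_cols = ["boat_type"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified 80/20 split keeps the conversion rate similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"ROC-AUC on synthetic data: {auc:.3f}")
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fit on the training split only, which avoids leaking test-set statistics into the model.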

Model Performance

The model demonstrated strong discriminatory power, which suggests that listing engagement and quality signals are meaningfully associated with conversion probability.

Key evaluation metrics included:

  • ROC-AUC
  • Precision
  • Recall
  • F1 Score

Instead of optimizing for a fixed threshold, I discussed how the decision threshold should align with business objectives (e.g., precision for sales prioritization vs. recall for broad identification).
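The trade-off is easy to see on a toy example. The labels and predicted probabilities below are made up for illustration; the helper simply counts true positives, false positives, and false negatives at a given threshold.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting positive for prob >= threshold."""
    preds = [p >= threshold for p in y_prob]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels and scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.55, 0.7, 0.4, 0.35, 0.2, 0.6, 0.1]

for threshold in (0.3, 0.65):
    precision, recall = precision_recall_at(y_true, y_prob, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, the low threshold catches every converting listing at the cost of some false positives, while the high threshold flags only sure bets and misses half of the converters.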

[Image: boat listing evaluation]

Key Insights

The strongest drivers of conversion were:

  • Engagement intensity
  • Seller rating
  • Dealer affiliation
  • Listing freshness

Listings that remained active longer showed a significant decrease in the probability of generating an inquiry.

Operational Applications

The model output (predicted probability) could be used to:

  • Rank listings dynamically
  • Prioritize the top decile for sales outreach
  • Trigger pricing review for stale inventory
  • Monitor engagement as an early performance indicator
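Two of these uses can be sketched in a few lines of plain Python. The listing ids, the scores, and the review cutoffs (a probability below 0.2 after more than 30 days on site) are hypothetical values chosen for the example, not tuned business rules.

```python
def top_decile(scores):
    """Return listing ids in the top 10% by predicted probability (at least one)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    k = max(1, (len(ranked) + 9) // 10)  # ceiling of n / 10 using integers
    return [listing_id for listing_id, _ in ranked[:k]]

def needs_pricing_review(prob, days_on_site, max_prob=0.2, min_days=30):
    """Flag stale, low-propensity listings (cutoffs are hypothetical)."""
    return prob < max_prob and days_on_site > min_days

# Made-up scores for 20 listings, for illustration only.
scores = {f"L{i}": i / 20 for i in range(20)}
outreach = top_decile(scores)
print(outreach)
print(needs_pricing_review(prob=0.05, days_on_site=45))
```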

This would shift the platform from reactive reporting to proactive decision-making.

What I’d Improve Next

Future versions of the model could include:

  • Tree-based ensemble models (Random Forest/Gradient Boosting)
  • Precision@k for ranking evaluation
  • Time-decay engagement features
  • A/B testing ranking improvements
  • Monitoring model drift in production
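Two of these ideas are simple enough to sketch now. Below, Precision@k measures how many of the k highest-scored listings actually converted, and a time-decay feature weights recent engagement events more heavily than older ones. The sample data and the 3-day half-life are assumptions for illustration.

```python
def precision_at_k(y_true, y_prob, k):
    """Share of actual conversions among the k highest-scored listings."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

def decayed_engagement(event_ages_days, half_life_days=3.0):
    """Sum of events where each event's weight halves every half_life_days."""
    return sum(0.5 ** (age / half_life_days) for age in event_ages_days)

# Made-up labels and scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.55, 0.7, 0.4, 0.35, 0.2, 0.6, 0.1]
print(precision_at_k(y_true, y_prob, k=3))
print(precision_at_k(y_true, y_prob, k=4))

# Three events that happened 0, 3, and 6 days ago.
print(decayed_engagement([0, 3, 6]))
```

Precision@k fits the sales-outreach use case well because it only scores the part of the ranking that a team would actually act on.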

Access the Full Modeling Pipeline & SQL Layer

https://github.com/edusoto03/boat-listing-conversion-propensity

This repository contains the full end-to-end modeling workflow, including the SQL-based dataset construction, feature engineering, and a production-style machine learning pipeline.

Final Reflection

This project marks a shift in direction for me, from dashboard analytics to applied machine learning.

More importantly, it demonstrates how predictive modeling can directly inform product, sales, and operational decisions within a marketplace environment.

Building this end-to-end, from SQL extraction to business interpretation, strengthened my understanding of how data science functions in practice.