Predicting Boat Listing Conversion: An End-to-End Marketplace Propensity Model

From Listings to Leads: Building a Conversion Propensity Model for a Marine Marketplace

After completing a sales analytics dashboard for a pizza shop, I wanted to challenge myself with a more advanced project, one that mirrors how data science teams operate inside digital marketplaces.

In this project, I built an end-to-end conversion propensity model with the goal of predicting whether a boat listing would generate a qualified inquiry within a 7-day window.

It simulates the kind of internal machine learning workflow that marketplace platforms use to optimize ranking, sales prioritization, and inventory performance.

The Business Problem

Marine marketplaces host thousands of listings all at the same time.

But that DOES NOT mean that they will all perform equally.

Some listings quickly attract strong buyer interest, while others sit stale for weeks or even months.

I found this gap very interesting, so I framed the project around one question:

“Can we predict which listings are most likely to generate inquiries? And could we use that signal to improve marketplace operations?”

Building the Dataset (Warehouse Simulation)

Instead of manually constructing a flat dataset, I simulated a real-world warehouse structure.

The dataset was created using SQL by combining:

listings (price, length, year, seller info)
engagement_events (views, saves, inquiries)
listing_photos (photo counts)

Engagement data was aggregated into rolling 7-day metrics and joined back to the listings, producing one row per listing with all relevant features.
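As a rough illustration of that warehouse-style build (not the exact SQL from the repository), the sketch below uses an in-memory SQLite database. The column names, the sample rows, and the choice to anchor the 7-day window at a hypothetical `listed_at` date are all assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for the warehouse (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (
    listing_id INTEGER PRIMARY KEY, price REAL, length_ft REAL,
    year INTEGER, seller_type TEXT, listed_at TEXT
);
CREATE TABLE engagement_events (listing_id INTEGER, event_type TEXT, event_at TEXT);
CREATE TABLE listing_photos (listing_id INTEGER, photo_id INTEGER);

INSERT INTO listings VALUES
    (1, 85000, 32, 2015, 'dealer',  '2024-01-01'),
    (2, 20000, 18, 2004, 'private', '2024-01-01');
INSERT INTO engagement_events VALUES
    (1, 'view', '2024-01-02'), (1, 'view', '2024-01-03'),
    (1, 'save', '2024-01-03'), (1, 'inquiry', '2024-01-05'),
    (1, 'view', '2024-01-20'),  -- outside the 7-day window
    (2, 'view', '2024-01-04');
INSERT INTO listing_photos VALUES (1, 10), (1, 11), (1, 12), (2, 20);
""")

# Aggregate each source table to one row per listing BEFORE joining,
# so the joins cannot multiply rows (events x photos fan-out).
FEATURE_QUERY = """
WITH engagement_7d AS (
    SELECT l.listing_id,
           SUM(e.event_type = 'view')    AS views_7d,
           SUM(e.event_type = 'save')    AS saves_7d,
           SUM(e.event_type = 'inquiry') AS inquiries_7d
    FROM listings l
    JOIN engagement_events e
      ON e.listing_id = l.listing_id
     AND e.event_at >= l.listed_at
     AND e.event_at <  date(l.listed_at, '+7 days')
    GROUP BY l.listing_id
),
photos AS (
    SELECT listing_id, COUNT(*) AS photo_count
    FROM listing_photos
    GROUP BY listing_id
)
SELECT l.listing_id, l.price, l.length_ft, l.year, l.seller_type,
       COALESCE(g.views_7d, 0)     AS views_7d,
       COALESCE(g.saves_7d, 0)     AS saves_7d,
       COALESCE(g.inquiries_7d, 0) AS inquiries_7d,
       COALESCE(p.photo_count, 0)  AS photo_count,
       CASE WHEN COALESCE(g.inquiries_7d, 0) > 0 THEN 1 ELSE 0 END AS converted_7d
FROM listings l
LEFT JOIN engagement_7d g ON g.listing_id = l.listing_id
LEFT JOIN photos p        ON p.listing_id = l.listing_id
ORDER BY l.listing_id
"""

rows = conn.execute(FEATURE_QUERY).fetchall()
for row in rows:
    print(row)
```

Aggregating events and photos in separate CTEs keeps the output at exactly one row per listing, which is the grain the model expects.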

Feature Engineering

Raw listing attributes alone are rarely enough for predictive modeling.

This is why I created several derived features to better capture marketplace behavior:

Boat age (transformed from year)
Price per foot (relative value indicator)
Engagement rate (saves + inquiries per day)
Dealer indicator
Log-transformed days on site
One-hot encoded categorical values

These features were designed specifically to show:

Pricing competitiveness
Buyer engagement intensity
Seller credibility
Inventory freshness
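The numeric features in the list above could be derived along these lines. This is a minimal sketch: the field names (`days_on_site`, `seller_type`, and so on), the `reference_year` parameter, and the choice to divide engagement by days on site are assumptions for illustration. One-hot encoding is left to the preprocessing pipeline.

```python
import math


def engineer_features(listing, reference_year=2024):
    """Derive model features from one raw listing record (a plain dict)."""
    # Guard against division by zero for listings posted today.
    days = max(listing["days_on_site"], 1)
    return {
        # Boat age, transformed from model year.
        "boat_age": reference_year - listing["year"],
        # Relative value indicator.
        "price_per_foot": listing["price"] / listing["length_ft"],
        # Saves plus inquiries, normalized per day on site (assumed definition).
        "engagement_rate": (listing["saves"] + listing["inquiries"]) / days,
        # Dealer indicator as a 0/1 flag.
        "is_dealer": 1 if listing["seller_type"] == "dealer" else 0,
        # log1p compresses the long right tail and is defined at zero days.
        "log_days_on_site": math.log1p(listing["days_on_site"]),
    }


example = {
    "year": 2015, "price": 85000.0, "length_ft": 32.0,
    "saves": 3, "inquiries": 1, "days_on_site": 8, "seller_type": "dealer",
}
features = engineer_features(example)
print(features)
```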

Model Selection

I chose Logistic Regression as a baseline model because:

It outputs probabilities that tend to be well calibrated
Coefficients are interpretable
It’s lightweight and production-friendly
It’s appropriate for binary classification problems

A preprocessing pipeline was used to scale numeric variables and encode categorical variables cleanly.

The dataset was split using a stratified 80/20 train-test approach.
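A preprocessing-plus-model pipeline of this kind might look like the following scikit-learn sketch. The data here is synthetic and generated only so the example runs; the feature names, the `boat_type` column, and the label-generating formula are assumptions, not the project's real data or results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the one-row-per-listing feature table.
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "boat_age": rng.integers(0, 30, n),
    "price_per_foot": rng.normal(2500, 600, n),
    "engagement_rate": rng.exponential(0.5, n),
    "log_days_on_site": rng.uniform(0, 5, n),
    "boat_type": rng.choice(["sail", "power", "pontoon"], n),
})
# Made-up labelling rule: more engagement helps, more time on site hurts.
logit = 1.5 * df["engagement_rate"].to_numpy() - 0.4 * df["log_days_on_site"].to_numpy() - 0.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

numeric = ["boat_age", "price_per_foot", "engagement_rate", "log_days_on_site"]
categorical = ["boat_type"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified 80/20 split keeps the conversion rate similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # predicted conversion probability
auc = roc_auc_score(y_test, proba)
print(f"ROC-AUC on held-out listings: {auc:.3f}")
```

Wrapping the scaler and encoder inside the pipeline means they are fit on the training split only, which avoids leaking test-set statistics into the model.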

Model Performance

The model showed strong discriminatory power, which suggests that listing engagement and quality signals carry real predictive information about conversion probability.

Key evaluation metrics included:

ROC-AUC
Precision
Recall
F1 Score

Rather than optimizing for a single fixed threshold, I looked at how the decision threshold should align with business objectives (e.g., favoring precision for sales prioritization vs. recall for broad identification).
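To make the threshold trade-off concrete, here is a small pure-Python sketch that computes precision, recall, and F1 at two different cutoffs. The labels and scores are toy values invented for the example, not model output from the project.

```python
def metrics_at_threshold(y_true, scores, threshold):
    """Return (precision, recall, f1) when predicting 1 for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy data: 1 = listing generated a qualified inquiry.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

strict = metrics_at_threshold(y_true, scores, 0.7)   # favors precision
loose = metrics_at_threshold(y_true, scores, 0.35)   # favors recall
print("threshold 0.70:", strict)
print("threshold 0.35:", loose)
```

On this toy data, the stricter cutoff flags only listings that really converted but misses half of them, while the looser cutoff catches every converter at the cost of one false positive.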

Key Insights

The strongest drivers of conversion were:

Engagement intensity
Seller rating
Dealer affiliation
Listing freshness

Listings that remained active longer had a significantly lower probability of generating an inquiry.

Operational Applications

The model output (predicted probability) could be used to:

Rank listings dynamically
Prioritize the top decile for sales outreach
Trigger pricing review for stale inventory
Monitor engagement as an early performance indicator

This moves the platform from reactive reporting toward proactive operations.
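As one example of how the predicted probabilities could feed these workflows, the sketch below ranks scored listings and pulls out the top decile for outreach. The `(listing_id, probability)` input shape and the helper name `top_decile` are hypothetical.

```python
def top_decile(scored_listings):
    """Return the highest-scoring 10% of (listing_id, probability) pairs."""
    ranked = sorted(scored_listings, key=lambda pair: pair[1], reverse=True)
    # Integer division avoids float rounding; always return at least one listing.
    k = max(1, len(ranked) // 10)
    return ranked[:k]


# Toy scores: listing i gets probability i / 20.
scored = [(listing_id, listing_id / 20) for listing_id in range(1, 21)]
outreach = top_decile(scored)
print(outreach)
```

The same sorted list could drive dynamic ranking directly, with the bottom of the list serving as a candidate pool for pricing review.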

What I’d Improve Next

Future versions of the model could include:

Tree-based ensemble models (Random Forest/Gradient Boosting)
Precision@k for ranking evaluation
Time-decay engagement features
A/B testing ranking improvements
Monitoring model drift in production
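Of the items above, Precision@k is the simplest to sketch: it asks what share of the k highest-scored listings actually converted. The labels and scores below are toy values for illustration only.

```python
def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scored listings that actually converted."""
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of listings")
    # Sort by score, highest first, and keep the labels of the top k.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

p_at_3 = precision_at_k(y_true, scores, 3)
p_at_4 = precision_at_k(y_true, scores, 4)
print(p_at_3, p_at_4)
```

Unlike ROC-AUC, this metric only looks at the top of the ranking, which matches how a sales team with capacity for k outreach calls would actually use the scores.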

Access the Full Modeling Pipeline & SQL Layer

https://github.com/edusoto03/boat-listing-conversion-propensity

This repository contains the full end-to-end modeling workflow, which includes the SQL-based dataset construction, feature engineering and a production-style machine learning pipeline.

Final Reflection

This project represents a shift in direction for me, from dashboard analytics to applied machine learning.

More importantly, it demonstrates how predictive modeling can directly inform product, sales, and operational decisions within a marketplace environment.

Building this end-to-end, from SQL extraction to business interpretation, strengthened my understanding of how data science works in practice.