
From Listings to Leads: Building a Conversion Propensity Model for a Marine Marketplace
After completing a sales analytics dashboard for a pizza shop, I wanted to challenge myself with a more advanced project, one that mirrors how data science teams operate inside digital marketplaces.
In this project, I built an end-to-end conversion propensity model with the goal of predicting whether a boat listing would generate a qualified inquiry within a 7-day window.
It simulates the kind of internal machine learning workflow that marketplace platforms use to optimize ranking, sales prioritization, and inventory performance.
The Business Problem
Marine marketplaces host thousands of listings at the same time.
But that DOES NOT mean that they will all perform equally.
Some listings quickly attract strong buyer interest, while others sit stale for weeks or even months.
That gap is what I set out to explore, so the goal of the project was to answer:
“Can we predict which listings are most likely to generate inquiries? And could we use that signal to improve marketplace operations?”
Building the Dataset (Warehouse Simulation)
Instead of manually constructing a flat dataset, I simulated a real-world warehouse structure.
The dataset was created using SQL by combining:
- listings (price, length, year, seller info)
- engagement_events (views, saves, inquiries)
- listing_photos (photo counts)
Engagement data was aggregated into rolling 7-day metrics and joined back to the listings, producing one row per listing with all relevant features.
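As a rough illustration of that aggregation step, here is a minimal sketch using Python's built-in sqlite3 module. The three table names come from the list above, but every column name, the sample rows, and the `as_of` snapshot date are assumptions made for the example, not the schema used in the repository.

```python
import sqlite3

# In-memory database with a hypothetical schema (column names are assumed).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, price REAL,
                       length_ft REAL, year INTEGER, seller_type TEXT);
CREATE TABLE engagement_events (listing_id INTEGER, event_type TEXT, event_at TEXT);
CREATE TABLE listing_photos (listing_id INTEGER, photo_id INTEGER);

INSERT INTO listings VALUES
  (1, 85000, 28, 2015, 'dealer'),
  (2, 42000, 21, 2008, 'private');
INSERT INTO engagement_events VALUES
  (1, 'view', '2024-01-02'), (1, 'view', '2024-01-03'),
  (1, 'save', '2024-01-03'), (1, 'inquiry', '2024-01-05'),
  (1, 'view', '2024-01-20'),  -- falls outside the 7-day window below
  (2, 'view', '2024-01-04');
INSERT INTO listing_photos VALUES (1, 10), (1, 11), (1, 12), (2, 20);
""")

FEATURE_QUERY = """
WITH engagement_7d AS (
    -- Count each event type in the 7 days ending on the snapshot date.
    SELECT listing_id,
           SUM(event_type = 'view')    AS views_7d,
           SUM(event_type = 'save')    AS saves_7d,
           SUM(event_type = 'inquiry') AS inquiries_7d
    FROM engagement_events
    WHERE event_at > date(:as_of, '-7 days') AND event_at <= :as_of
    GROUP BY listing_id
),
photos AS (
    SELECT listing_id, COUNT(*) AS photo_count
    FROM listing_photos
    GROUP BY listing_id
)
SELECT l.listing_id, l.price, l.length_ft, l.year, l.seller_type,
       COALESCE(p.photo_count, 0)  AS photo_count,
       COALESCE(g.views_7d, 0)     AS views_7d,
       COALESCE(g.saves_7d, 0)     AS saves_7d,
       COALESCE(g.inquiries_7d, 0) AS inquiries_7d
FROM listings l
LEFT JOIN engagement_7d g ON g.listing_id = l.listing_id
LEFT JOIN photos p ON p.listing_id = l.listing_id
ORDER BY l.listing_id
"""

# One row per listing, with zero-filled engagement for quiet listings.
rows = conn.execute(FEATURE_QUERY, {"as_of": "2024-01-08"}).fetchall()
for row in rows:
    print(dict(row))
```

The LEFT JOINs keep listings that have no events or photos, which matters here because listings with no engagement are exactly the ones the model needs to learn from.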

Feature Engineering
Raw listing attributes alone are rarely enough for predictive modeling.
This is why I created several derived features to better capture marketplace behavior:
- Boat age (transformed from year)
- Price per foot (relative value indicator)
- Engagement rate (saves + inquiries per day)
- Dealer indicator
- Log-transformed days on site
- One-hot encoded categorical values
These features were designed specifically to show:
- Pricing competitiveness
- Buyer engagement intensity
- Seller credibility
- Inventory freshness
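The derived features above can be sketched in a few lines of plain Python. This is an illustrative version only: the `ListingRow` fields, the `boat_type` categories, and the use of `log1p` for the log transform are assumptions, and the engagement counts are assumed to come from a window that closes before the 7-day prediction period begins, so the label does not leak into the inputs.

```python
import math
from dataclasses import dataclass


@dataclass
class ListingRow:
    """One warehouse row per listing (field names are hypothetical)."""
    price: float
    length_ft: float
    year: int
    seller_type: str   # assumed values: "dealer" or "private"
    boat_type: str     # assumed categorical attribute
    saves_7d: int
    inquiries_7d: int
    days_on_site: int


def one_hot(value, categories, prefix):
    """Expand a categorical value into 0/1 indicator columns."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def derive_features(row, current_year, window_days=7, boat_types=("sail", "power")):
    """Compute the derived features listed above for a single listing."""
    features = {
        "boat_age": current_year - row.year,
        "price_per_foot": row.price / row.length_ft,
        "engagement_rate": (row.saves_7d + row.inquiries_7d) / window_days,
        "is_dealer": int(row.seller_type == "dealer"),
        # log1p keeps brand-new listings (0 days on site) well defined.
        "log_days_on_site": math.log1p(row.days_on_site),
    }
    features.update(one_hot(row.boat_type, boat_types, "boat_type"))
    return features


example = ListingRow(price=84000, length_ft=28, year=2015, seller_type="dealer",
                     boat_type="sail", saves_7d=5, inquiries_7d=2, days_on_site=30)
features = derive_features(example, current_year=2024)
print(features)
```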

Model Selection
I chose Logistic Regression as a baseline model because:
- Its probability outputs tend to be reasonably well calibrated
- Coefficients are interpretable
- It’s lightweight and production-friendly
- It’s appropriate for binary classification problems
A preprocessing pipeline was used to scale numeric variables and encode categorical variables cleanly.
The dataset was split using a stratified 80/20 train-test approach.
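A minimal sketch of that setup with scikit-learn might look like the following. The data here is randomly generated so the example runs on its own, and the column names, the synthetic label, and the hyperparameters are all assumptions rather than the values used in the actual project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the one-row-per-listing feature table.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "price_per_foot": rng.normal(3000, 800, n),
    "boat_age": rng.integers(0, 30, n),
    "engagement_rate": rng.exponential(0.5, n),
    "seller_type": rng.choice(["dealer", "private"], n),
})
# Synthetic label: higher engagement makes a conversion more likely.
logit = 1.5 * df["engagement_rate"] - 1.0
df["converted"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

numeric_cols = ["price_per_foot", "boat_age", "engagement_rate"]
categorical_cols = ["seller_type"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified 80/20 split keeps the conversion rate similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols + categorical_cols], df["converted"],
    test_size=0.2, stratify=df["converted"], random_state=42,
)
model.fit(X_train, y_train)

# Predicted conversion probability for each held-out listing.
proba = model.predict_proba(X_test)[:, 1]
print(f"ROC-AUC on synthetic data: {roc_auc_score(y_test, proba):.3f}")
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fit on the training split only, which avoids leaking test-set statistics into the model.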
Model Performance
The model showed strong discriminatory power, which suggests that listing engagement and quality signals are meaningfully associated with conversion probability.
Key evaluation metrics included:
- ROC-AUC
- Precision
- Recall
- F1 Score
Rather than committing to a single fixed threshold, I looked at how the decision threshold should align with business objectives (e.g., favoring precision for sales prioritization vs. recall for broad identification).
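To make the threshold trade-off concrete, here is a small self-contained sketch that computes precision, recall, and F1 at two thresholds. The labels and scores are made-up toy values, not results from the model.

```python
def classification_metrics(y_true, scores, threshold):
    """Precision, recall, and F1 when scores >= threshold are flagged positive."""
    predicted = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data for illustration only: 1 = listing generated an inquiry.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

strict = classification_metrics(y_true, scores, threshold=0.75)
broad = classification_metrics(y_true, scores, threshold=0.25)
print("strict:", strict)  # precision 1.0, recall 0.5
print("broad: ", broad)   # precision ~0.667, recall 1.0
```

On this toy data the strict threshold flags only listings that really convert but misses half of them, while the broad threshold catches every converter at the cost of some false positives. Which side to favor depends on how expensive a wasted sales call is compared with a missed lead.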

Key Insights
The strongest drivers of conversion were:
- Engagement intensity
- Seller rating
- Dealer affiliation
- Listing freshness
Listings that remained active longer showed a significant decrease in the probability of generating an inquiry.
Operational Applications
The model output (predicted probability) could be used to:
- Rank listings dynamically
- Prioritize the top decile for sales outreach
- Trigger pricing review for stale inventory
- Monitor engagement as an early performance indicator
This would shift the platform from reactive reporting to proactive operations.
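As one example of how the ranking and top-decile ideas could work, the sketch below sorts listings by predicted probability and picks the top 10% for outreach. The listing IDs and scores are invented, and `top_fraction` is a hypothetical helper rather than anything from the repository.

```python
import math


def top_fraction(scored_listings, fraction=0.10):
    """Return listing IDs ranked by score, limited to the top fraction."""
    ranked = sorted(scored_listings.items(), key=lambda kv: kv[1], reverse=True)
    # Always return at least one listing, even for very small inventories.
    k = max(1, math.ceil(len(ranked) * fraction))
    return [listing_id for listing_id, _ in ranked[:k]]


# Invented scores for 20 listings: L1 = 0.05, L2 = 0.10, ..., L20 = 1.00.
scored = {f"L{i}": i / 20 for i in range(1, 21)}

outreach_queue = top_fraction(scored, fraction=0.10)
print(outreach_queue)  # ['L20', 'L19']
```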
What I’d Improve Next
Future versions of the model could include:
- Tree-based ensemble models (Random Forest/Gradient Boosting)
- Precision@k for ranking evaluation
- Time-decay engagement features
- A/B testing ranking improvements
- Monitoring model drift in production
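Two of these ideas are simple enough to sketch right away: precision@k and a time-decayed engagement score. Both functions below are illustrative assumptions about how they might be defined (including the 7-day half-life), not code from the current pipeline.

```python
def precision_at_k(y_true, scores, k):
    """Share of the k highest-scored listings that actually converted."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in order[:k]) / k


def decayed_engagement(event_ages_days, half_life_days=7.0):
    """Sum of events where each event's weight halves every half_life_days."""
    return sum(0.5 ** (age / half_life_days) for age in event_ages_days)


# Toy labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
p_at_2 = precision_at_k(y_true, scores, k=2)
p_at_3 = precision_at_k(y_true, scores, k=3)
print(p_at_2, round(p_at_3, 3))  # 1.0 0.667

# One event today plus one event a week ago counts as 1.5 "fresh" events.
fresh_score = decayed_engagement([0, 7], half_life_days=7.0)
print(fresh_score)  # 1.5
```

Precision@k fits the sales outreach use case well because a sales team can only contact a fixed number of listings, so what matters is how many of those top k actually convert.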
Access the Full Modeling Pipeline & SQL Layer
https://github.com/edusoto03/boat-listing-conversion-propensity
This repository contains the full end-to-end modeling workflow: the SQL-based dataset construction, the feature engineering, and a production-style machine learning pipeline.
Final Reflection
This project represents a shift in direction for me, from dashboard analytics to applied machine learning.
More importantly, it demonstrates how predictive modeling can directly inform product, sales, and operational decisions within a marketplace environment.
Building this end-to-end, from SQL extraction to business interpretation, strengthened my understanding of how data science functions in practice.
