Data Cleaning Tool

Python CLI and Flask web application that cleans, normalizes, and reformats raw lead data for import into the Convoso dialer platform, deployed on Vercel for team-wide access

11
case managers supported
489
leads processed in first batch
10
seconds per import from 30-60 minutes

Lead Data Cleaning Tool

One-Line Summary: Python CLI and Flask web application that cleans, normalizes, and reformats raw lead data for import into the Convoso dialer platform, deployed on Vercel for team-wide access.

--

Problem Statement

The client's sales operations team imports hundreds of leads per week into Convoso, a cloud-based dialer platform. The raw data arrives in inconsistent formats from multiple sources. Excel exports from case management systems, Salesforce exports, and third-party lead providers. Common problems include:

  • Phone numbers in scientific notation (e.g., 6.20E+19), with country codes, or in mixed formats (dashes, parentheses, dots)
  • Full names in a single column instead of separate First/Last Name fields
  • Special characters ($, /, ') that break Convoso's import parser
  • NULL, N/A, and NAN values scattered throughout
  • Columns that don't match Convoso's expected field names
  • Data that needs to be split by Case Manager for per-manager import workflows

Without cleaning, imports fail or produce corrupted records. The operations team was spending significant time manually fixing spreadsheets before every import.

Solution

Built a comprehensive data cleaning pipeline with two interfaces:

  1. CLI Tool. A command-line tool that reads CSV or Excel files, applies all cleaning transformations, maps columns to Convoso's field schema, and outputs import-ready CSVs. Supports single-file cleaning and split-by-Case-Manager mode that produces one cleaned CSV per manager.

  2. Flask Web Application. A browser-based interface where non-technical team members can upload a file, select a cleaning mode (single or split-by-manager), and download the cleaned output. Split mode produces a ZIP archive of per-manager CSVs. Deployed on Vercel with a serverless Python backend for team access from any machine.

  3. Offline Mode. A Windows batch wrapper that checks for Python, installs dependencies, launches the Flask app locally, and opens the browser. Designed for use when internet access is unavailable.

Tech Stack

LayerTechnology
LanguagePython 3.11
Data Processingpandas, openpyxl
Web FrameworkFlask
FrontendHTML, CSS, JavaScript (vanilla)
DeploymentVercel (serverless Python)
Offline RunnerWindows Batch wrapper
Testingpytest

Key Features

  • Phone Number Normalization. Strips all formatting, removes +1 country code prefix, handles scientific notation (returns empty for invalid numbers), enforces exactly 10 digits.
  • Name Splitting. Automatically detects full-name columns (Name, Full Name, Customer Name, Client Name) and splits into First Name / Last Name. Middle names are dropped.
  • Special Character Removal. Strips $, /, '; replaces commas with periods; removes NULL/N/A/NAN values.
  • Convoso Field Mapping. Auto-detects source columns and maps them to Convoso's 20-field template (First Name, Last Name, Primary Phone, Cell Phone, Email, Company, Address, City, State, Postal Code, Lead Source, Date of Birth, Notes, etc.).
  • Unmapped Column Handling. Any columns that don't match known fields are concatenated into the Notes field with column_name: value formatting, so no data is lost.
  • Split-by-Manager Mode. Reads Excel files with a Case Manager column, splits data into per-manager groups, cleans each independently, and outputs individual CSVs (CLI) or a ZIP archive (web app).
  • Date Normalization. Converts date slashes to dashes for Convoso compatibility.
  • Web Upload Interface. Drag-and-drop file upload, mode selection (single/split), progress feedback, and instant download of cleaned output. 50MB max file size.
  • Vercel Deployment. Serverless Python function with URL rewriting so the web app runs at the root path.

Impact / Metrics

  • 11 Case Managers supported in the initial deployment, processing 489 leads across the first batch
  • Eliminated manual spreadsheet cleanup. what previously took 30-60 minutes per import now takes under 10 seconds
  • Zero import failures since adoption. cleaned data passes Convoso's import validation on first attempt
  • Team-wide access via Vercel deployment. operations staff can clean data without Python installed or CLI knowledge
  • Offline capability for use during internet outages or on locked-down machines

Status

Active / In Production. Deployed on Vercel for team access. CLI tool used for ad-hoc batch processing. The tool is called upon whenever new lead data arrives for Convoso import.

PythonFlaskpandasVercelopenpyxl

More in this category