The Challenge

What was broken.

At the institution, awards data was scattered across dozens of CSV files delivered at irregular intervals. Each file came from a different source, with different column orders, different date formats, and sometimes different column names entirely. Administrators had to open each file, visually scan for structure, and manually copy records into a central spreadsheet. One misaligned column could mean hundreds of awards assigned to the wrong people.

Duplicates were the bigger headache. The same award would appear in multiple files, sometimes with slightly different formatting, sometimes identical. The team tracked "already processed" records in a separate spreadsheet, but it was never fully up to date. When auditors asked for a clean list of all awards issued, nobody could produce one with confidence.

The volume made manual processing unsustainable. What started as a manageable trickle of award files had grown into regular bulk deliveries containing thousands of records. The team needed a system that could ingest messy data and detect problems on its own, then produce audit-ready output without requiring anyone to become a spreadsheet expert.

Inconsistent CSV Formats

Every data source delivered files with different column orders, naming conventions, and date formats. Automated processing was impossible without manual cleanup first.

Untracked Duplicates

The same awards appeared in multiple files. Tracking what had already been processed lived in a spreadsheet that was perpetually out of date.

Error-Prone Manual Entry

Copy-pasting thousands of records by hand led to misaligned columns, dropped rows, and data that couldn't be trusted when auditors came calling.

No Audit Trail

There was no centralized database or processing log. When asked "how many awards were issued last quarter?", the answer required hours of manual reconciliation.

The Approach

How we solved it.

ZIP Upload & Automatic Extraction

We built a file upload pipeline that accepts ZIP archives containing multiple CSV files. The system extracts every CSV, validates file integrity, and queues each one for processing. Administrators can upload an entire delivery in one action instead of handling files individually.

The upload handler checks file size limits and validates archive structure. Corrupt or non-CSV entries are rejected before any data touches the database.
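The checks described above can be sketched roughly as follows. This is an illustrative Python sketch, not the production PHP code: the function name `extract_csv_entries`, the 50 MB limit, and the rejection messages are all assumptions made for the example.

```python
import io
import zipfile

# Assumed limit for illustration; the real limit is not documented here.
MAX_ARCHIVE_BYTES = 50 * 1024 * 1024


def extract_csv_entries(archive_bytes):
    """Return (accepted, rejected): CSV payloads by name, rejection reasons by name."""
    if len(archive_bytes) > MAX_ARCHIVE_BYTES:
        raise ValueError("archive exceeds size limit")
    buffer = io.BytesIO(archive_bytes)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("not a valid ZIP archive")

    accepted, rejected = {}, {}
    with zipfile.ZipFile(buffer) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(".csv"):
                rejected[info.filename] = "not a CSV file"
                continue
            try:
                # Reading the entry verifies its CRC, so corruption surfaces here.
                accepted[info.filename] = archive.read(info)
            except zipfile.BadZipFile:
                rejected[info.filename] = "corrupt entry"
    return accepted, rejected
```

Nothing in this sketch writes to a database: the caller receives the accepted payloads and the rejection reasons, and decides what to queue.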

Intelligent Column Detection

We implemented automatic column mapping that identifies AwardId, UserName, and IssueDate regardless of column order or naming variations. The system reads header rows, matches them against known patterns, and flags files where it can't confidently identify the required fields. No more manual column alignment.
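One way to picture header matching is a normalized alias lookup. The Python sketch below is an illustration only, not the system's PHP code: the alias table `KNOWN_ALIASES` and the function names are hypothetical, and the real "known patterns" are likely richer.

```python
import re

# Hypothetical alias table; the actual known patterns are not documented here.
KNOWN_ALIASES = {
    "AwardId": {"awardid", "awardnumber", "id"},
    "UserName": {"username", "user", "recipient"},
    "IssueDate": {"issuedate", "dateissued", "date"},
}


def normalize_header(header):
    """Lowercase the header and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", header.lower())


def detect_columns(headers):
    """Map each required field to a column index; list fields needing review."""
    mapping, unresolved = {}, []
    normalized = [normalize_header(h) for h in headers]
    for field, aliases in KNOWN_ALIASES.items():
        matches = [i for i, name in enumerate(normalized) if name in aliases]
        if len(matches) == 1:
            mapping[field] = matches[0]
        else:
            # Zero matches or more than one: not confident, so flag the field.
            unresolved.append(field)
    return mapping, unresolved
```

A file whose `unresolved` list is non-empty would be flagged for manual review rather than processed on a guess.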

Duplicate Detection & Flagging

Every incoming record is checked against the existing database before insertion. Duplicates are flagged but never silently dropped: administrators see exactly which records were skipped and why, with enough context to verify each decision.

The deduplication logic handles edge cases like stray whitespace, differing date formats, and usernames that differ only in letter case, so it catches near-duplicates that exact matching would miss.
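The near-duplicate handling can be illustrated by comparing records on a normalized key. This is a Python sketch under stated assumptions, not the production PHP logic: the list of accepted date formats, the key layout, and the function names are invented for the example. Ambiguous dates such as 03/05/2024 are read as month/day here, which is itself an assumption.

```python
from datetime import datetime

# Assumed incoming formats; the formats the real sources use are not listed here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(raw):
    """Parse a date in any assumed format and return it as YYYY-MM-DD."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def dedup_key(award_id, user_name, issue_date):
    """Build a key that ignores whitespace, username case, and date format."""
    return (award_id.strip(), user_name.strip().lower(), normalize_date(issue_date))


def split_duplicates(records, existing_keys):
    """Return (new_records, flagged); each flagged item carries a reason."""
    seen = set(existing_keys)
    new_records, flagged = [], []
    for record in records:
        key = dedup_key(*record)
        if key in seen:
            flagged.append((record, "matches existing award " + key[0]))
        else:
            seen.add(key)
            new_records.append(record)
    return new_records, flagged
```

Flagged records are returned alongside a reason instead of being discarded, which mirrors the "flagged but never silently dropped" behavior described above.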

Summary Reports & Session Management

After every batch upload, the system generates a detailed summary report: total records processed, new records inserted, duplicates flagged, and errors encountered. Session management with cookie-based "Remember Me" authentication keeps administrators logged in across sessions, so they don't have to re-enter credentials.
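The summary counts can be thought of as a tally over per-record outcomes. The Python sketch below is illustrative only: the outcome labels (`inserted`, `duplicate`, `error`) and the dictionary keys are assumptions, not the report format the system actually produces.

```python
from collections import Counter


def summarize_batch(outcomes):
    """Tally per-record outcomes into the four figures shown in the report."""
    counts = Counter(outcomes)
    return {
        "total_processed": sum(counts.values()),
        "new_inserted": counts["inserted"],
        "duplicates_flagged": counts["duplicate"],
        "errors": counts["error"],
    }
```

For example, `summarize_batch(["inserted", "duplicate", "inserted", "error"])` reports four records processed, two inserted, one duplicate, and one error.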

Technologies Used

PHP, MySQL, CSV Processing, ZIP Archive Handling, Session Management, Cookie-based Auth

Features

What it actually does.

ZIP Archive Upload

Upload entire ZIP archives containing multiple CSV files. The system extracts, validates, and queues every file automatically. No more processing files one at a time.

Auto Column Detection

Identifies AwardId, UserName, and IssueDate columns regardless of naming or order. Files with unrecognizable structures are flagged for manual review.

Duplicate Detection

Every record is cross-checked against existing data before insertion. Near-duplicates with whitespace or formatting differences are caught and flagged transparently.

Upload Summary Reports

After every batch, a detailed report breaks down total records, new inserts, duplicates skipped, and errors encountered. Every decision the system made is visible and verifiable.

Session & Auth Management

Secure login with session management and cookie-based "Remember Me" functionality. Administrators stay authenticated across sessions without repeated logins.

Data Validation & Error Reporting

Every row is validated against expected formats before processing. Malformed dates, missing required fields, and structural anomalies are caught and reported, not silently ignored.
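Row-level validation of this kind might look like the following Python sketch. It is an illustration under assumptions, not the system's PHP code: the function name `validate_row`, the message wording, and the expectation that dates have already been normalized to YYYY-MM-DD are all hypothetical.

```python
from datetime import datetime

REQUIRED_FIELDS = ("AwardId", "UserName", "IssueDate")


def validate_row(row, line_number):
    """Return a list of readable problems for one parsed row (empty if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"line {line_number}: missing {field}")
    issue_date = (row.get("IssueDate") or "").strip()
    if issue_date:
        try:
            # Assumes dates were normalized to YYYY-MM-DD before validation.
            datetime.strptime(issue_date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"line {line_number}: malformed IssueDate {issue_date!r}")
    return problems
```

Returning every problem for a row, rather than stopping at the first, lets the error report show administrators everything that needs fixing in one pass.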

In Action

See it in action.

Batch Upload Interface

The main upload screen where administrators drag and drop ZIP archives. Progress indicators show extraction, validation, and processing status in real time as each CSV file is handled.

Duplicate Detection Report

Each flagged duplicate is shown next to the existing record it matched. Administrators can verify the decision and override when needed.

Upload Summary Dashboard

The post-processing summary lists records processed, new inserts, duplicates skipped, and validation errors. It's a complete audit trail for every batch upload.

Results

The numbers speak.

Faster Processing

Award uploads that took hours now complete in minutes

Duplicate Awards

Automated detection catches every repeat before insertion

Audit-Ready Data Output

Clean, verified records with full processing history

Batch Multi-File Processing

Upload once, process every CSV in the archive

We used to dread award season. Every delivery meant hours of sorting through CSVs, checking for duplicates by hand, and praying we didn't miss anything. Now we upload the ZIP, review the summary, and move on. The system catches things we never would have found manually.

Data Administrator, Awards Processing, The Institution

Insights

What I learned.

Messy data needs transparent handling, not silent fixes

The first instinct was to auto-correct formatting issues and silently skip duplicates. Administrators pushed back hard. They needed to see every decision the system made: which records were inserted, which were flagged, and why. Trust in automated processing comes from transparency, not from hiding the messy parts. The summary reports became the most valued feature.

Column detection is harder than parsing

Parsing a well-formed CSV is straightforward. Figuring out which column is which when every source uses different headers ("award_id" vs "AwardID" vs "Award Number" vs just "ID") required building a fuzzy matching system that could handle real-world naming chaos. Edge cases in column detection consumed more development time than the entire upload pipeline.

Batch processing unlocks workflow changes

Before the system, administrators processed files as they arrived, one at a time, interrupting other work. Once batch processing was reliable, the team shifted to a scheduled workflow: collect deliveries throughout the week, upload everything Friday afternoon, review the summary Monday morning. The tool didn't just speed up a task; it changed how the team organized their time.

Awards Processing & Tracking System
