⭐ Featured Project

Longitudinal Survey Data Analysis (8-Year, Multilingual)

Standardized and analyzed 8 years of multilingual learner survey data to enable longitudinal insights, trend analysis, and data-driven program evaluation.

Technologies & Tools

PythonGoogle Translate APIPandasExcel

📊 Impact: 8 years of multilingual data standardized for insights

Problem

Over multiple years, Tech Goes Home collected learner feedback through annual surveys, resulting in a large but fragmented dataset spanning 8 years, 3 survey instruments per year, and multiple languages. While rich in insight potential, this data was difficult to analyze due to inconsistent formats, language barriers, and evolving survey structures. Survey data existed in separate files across years, questions and response formats varied over time, surveys were conducted in multiple languages, and long-term trends were difficult or impossible to analyze reliably.

My Role

I owned the project end-to-end, including designing the longitudinal data model, translating multilingual survey responses into English, cleaning and standardizing historical datasets, aligning survey questions across years, and producing analysis-ready datasets for downstream use. This work laid the data foundation for future reporting, dashboards, and research.

Solution

Designed and executed a longitudinal analysis pipeline that standardized, translated, and cleaned multi-year, multilingual survey data into a single, analysis-ready dataset. This work enabled year-over-year trend analysis, deeper insights into learner outcomes, and provided a clean foundation for dashboards and advanced analytics.

Architecture

High-Level Data Flow

1

High-Level Data Flow: (1) Raw survey datasets collected across years and languages

2

Non-English responses translated to English via Google Translate APIs

3

Data cleaned and standardized (columns, response formats, codes)

4

Survey questions aligned across years for comparability

5

Clean datasets produced for analysis and visualization. Used Python (pandas) for data processing workflows, Google Translate APIs for translation, and Excel for spot checks and sanity validation.

Key Design Decisions

🔹Translation Before Analysis - Ensured consistent semantic meaning across languages, enabling unified analysis
🔹Standardized Question Mapping - Mapped evolving survey questions to consistent analytical fields while preserving historical integrity
🔹Reusable Clean Datasets - Designed outputs to be reusable for dashboards and future research, not one-off analysis files

Results

  • Enabled 8-year longitudinal analysis of learner outcomes
  • Made multilingual survey data usable for organization-wide insights
  • Improved reliability of trend analysis and reporting
  • Provided clean datasets for dashboards and advanced analytics
  • Supported evidence-based evaluation of program impact over time

Technologies Used

PythonPandasGoogle Translate APIExcelData CleaningETL