Longitudinal Survey Data Analysis (8-Year, Multilingual)
Standardized and analyzed 8 years of multilingual learner survey data to enable longitudinal insights, trend analysis, and data-driven program evaluation.
Technologies & Tools
📊 Impact: 8 years of multilingual data standardized for insights
Problem
Tech Goes Home collected learner feedback through annual surveys, producing a large but fragmented dataset spanning 8 years, 3 survey instruments per year, and multiple languages. The data was rich in potential insight but hard to analyze: each year's results lived in separate files, questions and response formats changed over time, and responses arrived in several languages. As a result, long-term trends were difficult or impossible to analyze reliably.
My Role
I owned the project end-to-end, including designing the longitudinal data model, translating multilingual survey responses into English, cleaning and standardizing historical datasets, aligning survey questions across years, and producing analysis-ready datasets for downstream use. This work laid the data foundation for future reporting, dashboards, and research.
Solution
Designed and executed a longitudinal analysis pipeline that standardized, translated, and cleaned multi-year, multilingual survey data into a single, analysis-ready dataset. This work enabled year-over-year trend analysis and deeper insights into learner outcomes, and it provided a clean foundation for dashboards and advanced analytics.
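To make the "single, analysis-ready dataset" idea concrete, here is a minimal pandas sketch of one way per-year survey exports could be aligned and stacked into a long format. It is an illustration under stated assumptions, not the project's actual code: the `CROSSWALK` mapping, the years, the question wordings, the `respondent_id` column, and the function names are all hypothetical.

```python
from typing import Dict

import pandas as pd

# Hypothetical crosswalk: (survey year, raw column name) -> canonical question ID.
# The years and question wordings below are invented for illustration.
CROSSWALK = {
    (2016, "How confident are you using a computer?"): "q_confidence",
    (2017, "Computer confidence"): "q_confidence",
    (2016, "Internet at home?"): "q_home_internet",
    (2017, "Do you have internet at home?"): "q_home_internet",
}


def to_long(df: pd.DataFrame, year: int, id_col: str = "respondent_id") -> pd.DataFrame:
    """Reshape one year's wide export into long rows keyed by canonical question ID."""
    long_df = df.melt(id_vars=[id_col], var_name="raw_question", value_name="response")
    long_df["year"] = year
    # Look up the canonical ID for each raw column; unmapped columns get None.
    long_df["question_id"] = [CROSSWALK.get((year, q)) for q in long_df["raw_question"]]
    # Unmapped questions are dropped here; a real pipeline might log them for review.
    long_df = long_df.dropna(subset=["question_id"])
    return long_df[[id_col, "year", "question_id", "response"]]


def combine_years(frames: Dict[int, pd.DataFrame]) -> pd.DataFrame:
    """Stack all years into one long table suitable for year-over-year comparison."""
    parts = [to_long(df, year) for year, df in sorted(frames.items())]
    return pd.concat(parts, ignore_index=True)
```

With the data in this shape, a year-over-year view of a single question is a filter on `question_id` followed by a group-by on `year`. Keeping the crosswalk as explicit data also makes it easy to review question alignment decisions in a spreadsheet.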
Architecture
High-Level Data Flow
(1) Raw survey datasets collected across years and languages
(2) Non-English responses translated to English via Google Translate APIs
(3) Data cleaned and standardized (columns, response formats, codes)
(4) Survey questions aligned across years for comparability
(5) Clean datasets produced for analysis and visualization
Used Python (pandas) for data processing workflows, Google Translate APIs for translation, and Excel for spot checks and sanity validation.
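The translation and standardization steps in the data flow could be sketched roughly as follows. This is a hedged illustration, not the project's implementation: the `translate` argument is a placeholder for any callable that returns English text (for example, a thin wrapper around a translation API client), and `RESPONSE_CODES`, `translate_unique`, and `standardize` are hypothetical names. It uses only the standard library so the logic is easy to follow; the same lookup could be applied to a pandas column with `Series.map`.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical mapping from English answers to standard numeric codes.
RESPONSE_CODES = {
    "yes": 1,
    "no": 0,
}


def translate_unique(texts: Iterable[str], translate: Callable[[str], str]) -> Dict[str, str]:
    """Translate each distinct non-empty string once and return a lookup table.

    Survey answers repeat heavily, so deduplicating first keeps API calls low.
    This sketch sends every distinct string to the translator; a real pipeline
    would probably skip rows already known to be in English.
    """
    cache: Dict[str, str] = {}
    for text in texts:
        key = text.strip()
        if key and key not in cache:
            cache[key] = translate(key)
    return cache


def standardize(response: str, lookup: Dict[str, str]) -> Optional[int]:
    """Map a raw response to a standard code, or None if it is not recognized."""
    english = lookup.get(response.strip(), response)
    return RESPONSE_CODES.get(english.strip().lower())
```

Returning `None` for unrecognized answers, rather than guessing, leaves those rows visible for the kind of manual spot check the project did in Excel.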
Key Design Decisions
Results
- ✓ Enabled 8-year longitudinal analysis of learner outcomes
- ✓ Made multilingual survey data usable for organization-wide insights
- ✓ Improved reliability of trend analysis and reporting
- ✓ Provided clean datasets for dashboards and advanced analytics
- ✓ Supported evidence-based evaluation of program impact over time