
Palmy Chomchat Silarat
Data Scientist, People Analytics, Survey Research, Psychometrics, Consumer & Product Insights
Predicting Teacher Attrition with Machine Learning: Insights from Washington State
3 min read | Download Full Paper
The Challenge: A Growing Teacher Attrition Crisis
Teacher attrition is a persistent and costly issue, affecting education quality, student outcomes, and workforce stability. In Washington State, nearly 8% of teachers leave the public school system annually, with the pandemic further exacerbating turnover rates. But why are teachers leaving? And more importantly, what can be done to retain them?
To answer this, I applied advanced machine learning techniques to uncover the most influential factors driving teacher attrition across nine Educational Service Districts (ESDs) in Washington.
The Solution: Using XGBoost to Predict Workforce Trends
Rather than relying on traditional methods like logistic regression, which assumes a linear relationship between the predictors and the log-odds of the outcome, I implemented an XGBoost gradient-boosted decision tree model, known for its ability to handle complex, nonlinear relationships and provide feature importance rankings.
Key Technical Highlights:
- Data Engineering & Processing: Cleaned and merged six years of workforce data (2017–2023) from multiple public sources, using SQL and R.
- Machine Learning Pipeline: Built and fine-tuned an XGBoost model to predict attrition risk, leveraging R’s xgboost package for optimized performance.
- Feature Importance Analysis: Used gain-based ranking to pinpoint the most critical drivers of teacher turnover.
Key Findings: What Drives Teacher Attrition?
Salary Dominates, But It’s Not the Only Factor
Across all experience levels and districts, salary was the strongest predictor of attrition. However, for early-career teachers, salary alone wasn’t enough: housing affordability also played a critical role.
Housing Costs Push Early-Career Teachers Out
In districts with skyrocketing home prices, teachers with less than 5 years of experience were leaving at alarming rates. This suggests that school districts may benefit from teacher housing incentives or mortgage assistance programs.
Crime Rate & Community Safety Matter More Than Expected
In some districts, crime rate ranked higher than housing costs as a predictor of attrition. Teachers in high-crime areas were more likely to leave, even if salaries were competitive. This finding emphasizes the need for safe school environments and community engagement initiatives to support retention.
Mid- and Late-Career Teachers Respond Differently
- Mid-career teachers (5–15 years of experience): More sensitive to benefits and retirement plans than to salary increases.
- Late-career teachers (15+ years of experience): Attrition risk was more closely tied to unemployment trends, suggesting that these teachers may leave when better job opportunities arise elsewhere.
Real-World Impact: How Can This Be Used?
The findings of this study provide actionable insights for school districts, HR teams, and policymakers. Instead of relying on a one-size-fits-all approach to teacher retention, decision-makers can tailor strategies based on data-driven insights:
👉 Early-career teachers → Offer housing assistance and relocation incentives.
👉 Mid-career teachers → Improve health insurance and retirement benefits.
👉 Late-career teachers → Provide leadership pathways to keep experienced teachers engaged.
👉 High-crime districts → Invest in school safety initiatives to improve retention.
By integrating machine learning into workforce planning, education leaders can proactively identify attrition risks and implement targeted interventions, ultimately creating a more stable, engaged, and effective teaching workforce.
Final Thoughts
This project is an example of how machine learning and workforce analytics can transform HR decision-making. While my model was built for teacher attrition, these techniques can be applied to any industry facing workforce retention challenges.
If you’re interested in how predictive analytics can improve workforce strategy, let’s connect!
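For readers curious about what "gain-based ranking" means in practice, here is a minimal sketch in plain Python (the study itself used R's xgboost package, so this is an illustration, not the project's code). It applies the split-gain formula from the XGBoost documentation to a single candidate split per feature, assuming logistic loss and a constant starting prediction of 0.5. The toy records and the feature names `salary_k` and `crime_index` are hypothetical. A real model aggregates gain over every split in every tree, so treat the numbers as a demonstration of the idea only.

```python
# Illustrative sketch only: toy data and feature names are hypothetical,
# not taken from the Washington State study.

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of one split, following the structure-score formula in the XGBoost docs."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def best_split_gain(values, labels, base_prob=0.5):
    """Largest gain over all thresholds on one feature (logistic loss, constant start)."""
    grads = [base_prob - y for y in labels]              # first derivative of log loss
    hess = [base_prob * (1.0 - base_prob)] * len(labels)  # second derivative of log loss
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best = 0.0
    for pos in range(len(order) - 1):
        i = order[pos]
        g_left += grads[i]
        h_left += hess[i]
        if values[i] == values[order[pos + 1]]:
            continue  # no threshold fits between two equal values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left)
        best = max(best, gain)
    return best


# Hypothetical records: 1 = teacher left, 0 = teacher stayed.
left = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "salary_k": [48, 50, 52, 55, 70, 72, 75, 80],
    "crime_index": [3, 7, 4, 6, 5, 2, 8, 1],
}

gains = {name: best_split_gain(vals, left) for name, vals in features.items()}
total = sum(gains.values())
importance = {name: g / total for name, g in gains.items()}  # fractional gain per feature

for name, share in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.2f}")
```

In this toy setup the salary feature separates leavers from stayers cleanly, so it receives the larger share of gain. Normalizing to fractions mirrors how gain is typically reported as a share of the total.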
Why Businesses Must Rethink How They Analyze Customer & User Data
Avoiding Misleading Insights with Better Data Science Methods
3 min read | Download Full Paper
The Problem: Are We Optimizing for the Wrong Things?
Many businesses rely on predictive models to understand customer behavior, user engagement, and workforce trends. These models often rank factors by importance, helping companies decide where to invest resources, whether it’s improving a product feature, adjusting marketing strategies, or refining employee retention efforts.
But here’s the issue:
❌ Most conventional methods don’t account for correlations between variables.
❌ This leads to misleading conclusions about what really drives business outcomes.
For example, let’s say a company wants to understand what makes users stay engaged with an online platform. A basic analysis might find that age is the biggest factor, but in reality, age is just highly correlated with tech-savviness or subscription history. If the company invests in age-based personalization instead of improving onboarding for new users, they may completely miss the real driver of engagement.
This study challenges the flawed assumptions behind traditional data science methods and introduces better approaches to ensure businesses make data-driven decisions that actually reflect reality.
Why Conventional Analytics Fall Short
Most businesses use basic statistical models (such as multiple regression) to analyze customer and user data. These models try to answer:
👉 “Which factors have the biggest impact on our key business metric?”
The problem is, these methods don’t handle complex relationships well, especially when different factors are naturally linked.
Consider a company analyzing employee retention. They might find that employees who work remotely stay longer. But is it really remote work that’s keeping them? Or is it that remote workers tend to be more experienced, better compensated, or work in less stressful roles?
Without properly accounting for correlations, businesses risk investing in the wrong solutions, spending millions improving remote work policies when, in reality, better career development opportunities might have been the true retention driver.
The Solution: Smarter Data Science for Smarter Decisions
To tackle this, I conducted a Monte Carlo simulation study, testing different ways to rank variable importance in large datasets. Instead of relying on conventional methods that assume the factors are largely independent of one another, I evaluated more advanced techniques that:
✅ Properly decompose shared variance (so correlated factors don’t mislead results).
✅ Identify true key drivers behind user behavior and workforce trends.
✅ Ensure companies focus on data-backed priorities, not statistical illusions.
Here’s what I found:
🚀 Traditional variable importance methods are highly sensitive to correlations, often exaggerating the impact of certain factors.
🚀 More advanced techniques, like the LMG method, provide a more reliable way to analyze complex customer and workforce data.
🚀 Businesses that don’t adjust their analytics approach risk making expensive miscalculations.
Real-World Business Implications
If you’re using data to guide business decisions, here’s what you need to know:
✅ Basic statistical models can mislead you.
If two factors are correlated, traditional models might overemphasize one and ignore the other.
This can lead to misdirected investments and missed opportunities.
✅ Not all variables are equally important, even if the numbers say so.
A variable might rank high in importance just because it has a high variance, not because it’s truly impactful.
Without careful analysis, businesses could chase the wrong priorities.
✅ Better modeling leads to better business strategy.
Companies with large customer datasets or workforce analytics should adopt more sophisticated methods to avoid costly errors. Techniques like LMG provide a clearer picture of what really drives user engagement, retention, and satisfaction.
Final Thoughts: The Future of Data-Driven Business Decisions
This study highlights a major blind spot in how many companies approach user experience, customer behavior, and workforce analytics.
To stay competitive, businesses must:
✔️ Move beyond basic statistical models that oversimplify complex relationships.
✔️ Use more advanced techniques to accurately rank what truly drives engagement and retention.
✔️ Make smarter investments based on insights that actually reflect reality.
If your organization relies on large datasets to make decisions, let's talk about how better analytics can lead to better business outcomes. 🚀
Why This Matters for Hiring Managers & Executives
This project showcases my ability to:
✔️ Translate complex data science problems into actionable business insights.
✔️ Design and test advanced statistical models for real-world applications.
✔️ Help businesses avoid data-driven mistakes by improving how they interpret results.
If your company is looking to improve customer insights, workforce analytics, or data-driven decision-making, let’s connect.
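To make the idea of decomposing shared variance more concrete, here is a minimal sketch of the LMG calculation in plain Python. It is not the simulation code from the study. LMG averages each predictor's incremental R² over every possible order in which predictors can enter a regression. The sketch assumes standardized variables, so R² can be computed directly from correlations, and every number and variable name in it (age, tech-savviness, onboarding quality, engagement) is hypothetical.

```python
from itertools import permutations

# Illustrative sketch only: the correlations below are hypothetical.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(subset, corr_xx, corr_xy):
    """R^2 from regressing the outcome on the predictors in `subset` (standardized data)."""
    if not subset:
        return 0.0
    idx = list(subset)
    a = [[corr_xx[i][j] for j in idx] for i in idx]
    b = [corr_xy[i] for i in idx]
    beta = solve(a, b)  # standardized regression coefficients
    return sum(coef * r for coef, r in zip(beta, b))


def lmg(corr_xx, corr_xy):
    """Average each predictor's incremental R^2 over all entry orders."""
    p = len(corr_xy)
    shares = [0.0] * p
    orders = list(permutations(range(p)))
    for order in orders:
        entered = []
        for j in order:
            before = r_squared(entered, corr_xx, corr_xy)
            entered.append(j)
            shares[j] += r_squared(entered, corr_xx, corr_xy) - before
    return [s / len(orders) for s in shares]


names = ["age", "tech_savviness", "onboarding_quality"]
corr_xx = [            # predictor-predictor correlations (age and tech-savviness overlap)
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
corr_xy = [0.4, 0.5, 0.3]  # each predictor's correlation with engagement

shares = lmg(corr_xx, corr_xy)
for name, share in zip(names, shares):
    print(f"{name}: {share:.3f}")
print(f"total R^2: {sum(shares):.3f}")
```

The shares add up to the full model's R², which is what makes the decomposition easy to interpret. Note that the cost grows factorially with the number of predictors, so exhaustive enumeration like this is only practical for small models.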
Feature Selection: Gathering and Predicting Behavioral Outcomes!
2 min read | Download Full Paper
The Problem: It's Hard to Predict Behaviors Because Behaviors are Hard to Measure!
In today’s data-driven world, businesses need effective tools to make accurate predictions about people’s behaviors, whether it’s understanding employee performance, customer behavior, or identifying trends that affect business outcomes. While existing tools often focus on general characteristics, there's a gap in understanding the specific traits that drive these behaviors. One example is the Autism Spectrum Quotient (AQ-10), a brief screening tool designed to help predict autism diagnoses. This type of behavioral analysis can be a game-changer when adapted for business purposes.
The Solution: Be Creative With Data Segmentation
I took a deeper look at the AQ-10 to see how behavioral data gathered by separate sections of the tool, such as attention to detail, communication, and imagination, could predict behaviors in different contexts, as opposed to relying on the instrument's total score.
I also examined how factors like gender and age could impact those predictions. By focusing on specific traits and their effects, I developed a more targeted, data-backed approach to predicting outcomes. This analysis allows us to get a clearer picture of how behaviors influence decisions and actions.
Conclusion: Better Data, Better Prediction
The study revealed that attention to detail, communication, and imagination significantly impact behavior prediction, while attention switching didn’t show much effect. This was crucial in demonstrating how certain traits have a stronger influence on outcomes, while others may not be as important.
This insight enables businesses to tailor their processes with greater precision. For instance, consider using a multi-dimensional assessment tool when gathering data for predictive purposes.
Final Thoughts
This work highlights the power of behavioral data analysis to improve predictive models in numerous fields. By breaking down complex human behaviors into measurable traits, businesses can create more accurate, targeted strategies, whether for improving recruitment processes, optimizing marketing efforts, or refining customer service.
I am committed to ethical and effective data segmentation and feature selection for predictive modeling. Let's connect if you are interested!
Applying Psychometrics for Better Decision-Making
2 min read | Download Full Paper