Palmy Chomchat Silarat

Data Scientist, People Analytics, Survey Research, Psychometrics, Consumer & Product Insights

Predicting Teacher Attrition with Machine Learning: Insights from Washington State

3 min read | Download Full Paper

The Challenge: A Growing Teacher Attrition Crisis

Teacher attrition is a persistent and costly issue, affecting education quality, student outcomes, and workforce stability. In Washington State, nearly 8% of teachers leave the public school system annually, with the pandemic further exacerbating turnover rates. But why are teachers leaving? And more importantly, what can be done to retain them? To answer this, I applied advanced machine learning techniques to uncover the most influential factors driving teacher attrition across nine Educational Service Districts (ESDs) in Washington.

The Solution: Using XGBoost to Predict Workforce Trends

Rather than relying on traditional methods like logistic regression, which assumes linear relationships between factors, I implemented an XGBoost decision tree model, known for its ability to handle complex, nonlinear relationships and provide feature importance rankings.

Key Technical Highlights:
- Data Engineering & Processing: Cleaned and merged six years of workforce data (2017–2023) from multiple public sources, using SQL and R.
- Machine Learning Pipeline: Built and fine-tuned an XGBoost model to predict attrition risk, leveraging R’s xgboost package for optimized performance.
- Feature Importance Analysis: Used gain-based ranking to pinpoint the most critical drivers of teacher turnover.

Key Findings: What Drives Teacher Attrition?

Salary Dominates, But It’s Not the Only Factor

Across all experience levels and districts, salary was the strongest predictor of attrition.
However, for early-career teachers, salary alone wasn’t enough: housing affordability also played a critical role.
Housing Costs Push Early-Career Teachers Out

In districts with skyrocketing home prices, teachers with less than 5 years of experience were leaving at alarming rates. This suggests that school districts may benefit from teacher housing incentives or mortgage assistance programs.

Crime Rate & Community Safety Matter More Than Expected

In some districts, crime rate ranked higher than housing costs as a predictor of attrition.
Teachers in high-crime areas were more likely to leave, even if salaries were competitive.
This finding emphasizes the need for safe school environments and community engagement initiatives to support retention.
Mid- and Late-Career Teachers Respond Differently

Mid-career teachers (5–15 years of experience): More sensitive to benefits and retirement plans than salary increases.
Late-career teachers (15+ years of experience): Attrition risk was more tied to unemployment trends, suggesting that these teachers may leave when better job opportunities arise elsewhere.

Real-World Impact: How Can This Be Used?
The findings of this study provide actionable insights for school districts, HR teams, and policymakers. Instead of relying on a one-size-fits-all approach to teacher retention, decision-makers can tailor strategies based on data-driven insights:
👉 Early-career teachers → Offer housing assistance and relocation incentives.
👉 Mid-career teachers → Improve health insurance and retirement benefits.
👉 Late-career teachers → Provide leadership pathways to keep experienced teachers engaged.
👉 High-crime districts → Invest in school safety initiatives to improve retention.
By integrating machine learning into workforce planning, education leaders can proactively identify attrition risks and implement targeted interventions, ultimately creating a more stable, engaged, and effective teaching workforce.

Final Thoughts
This project is an example of how machine learning and workforce analytics can transform HR decision-making. While my model was built for teacher attrition, these techniques can be applied to any industry facing workforce retention challenges.
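For readers who want to see the modeling idea in code, here is a minimal sketch. It uses synthetic data and scikit-learn's GradientBoostingClassifier, whose impurity-based `feature_importances_` is analogous to (not identical to) XGBoost's gain-based ranking; the actual study used R's xgboost package. The feature names and effect sizes below are illustrative assumptions, not the study's data.

```python
# Illustrative only: synthetic data standing in for the real workforce dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 2000

# Hypothetical features mirroring the factors discussed in the article.
salary = rng.normal(60_000, 10_000, n)
housing_cost = rng.normal(400_000, 80_000, n)
crime_rate = rng.normal(30, 8, n)
noise = rng.normal(0, 1, n)  # an irrelevant feature, for contrast

# Simulate attrition driven mostly by (low) salary, then housing and crime.
logit = -0.00008 * salary + 0.000004 * housing_cost + 0.02 * crime_rate
p = 1 / (1 + np.exp(-(logit - np.median(logit))))
left = (rng.random(n) < p).astype(int)  # 1 = teacher left

X = np.column_stack([salary, housing_cost, crime_rate, noise])
names = ["salary", "housing_cost", "crime_rate", "noise"]

# Fit a gradient-boosted tree model and rank features by importance.
model = GradientBoostingClassifier(random_state=0).fit(X, left)
for name, imp in sorted(zip(names, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```

In the real pipeline the same ranking step is what surfaced salary, housing costs, and crime rate as the dominant drivers.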
If you’re interested in how predictive analytics can improve workforce strategy, let’s connect!

Why Businesses Must Rethink How They Analyze Customer & User Data

Avoiding Misleading Insights with Better Data Science Methods

3 min read | Download Full Paper

The Problem: Are We Optimizing for the Wrong Things?

Many businesses rely on predictive models to understand customer behavior, user engagement, and workforce trends. These models often rank factors by importance, helping companies decide where to invest resources, whether it’s improving a product feature, adjusting marketing strategies, or refining employee retention efforts.

But here’s the issue:
❌ Most conventional methods don’t account for correlations between variables.
❌ This leads to misleading conclusions about what really drives business outcomes.
For example, let’s say a company wants to understand what makes users stay engaged with an online platform. A basic analysis might find that age is the biggest factor, but in reality, age is just highly correlated with tech-savviness or subscription history. If the company invests in age-based personalization instead of improving onboarding for new users, it may completely miss the real driver of engagement.

This study challenges the flawed assumptions behind traditional data science methods and introduces better approaches to ensure businesses make data-driven decisions that actually reflect reality.

Why Conventional Analytics Fall Short
Most businesses use basic statistical models (such as multiple regression) to analyze customer and user data. These models try to answer:
👉 “Which factors have the biggest impact on our key business metric?”
The problem is that these methods don’t handle complex relationships well, especially when different factors are naturally linked.

Consider a company analyzing employee retention. They might find that employees who work remotely stay longer. But is it really remote work that’s keeping them? Or is it that remote workers tend to be more experienced, better compensated, or work in less stressful roles? Without properly accounting for correlations, businesses risk investing in the wrong solutions, spending millions improving remote work policies when, in reality, better career development opportunities might have been the true retention driver.

The Solution: Smarter Data Science for Smarter Decisions

To tackle this, I conducted a Monte Carlo simulation study, testing different ways to rank variable importance in large datasets. Instead of relying on conventional methods that assume factors are independent, I evaluated more advanced techniques that:
✅ Properly decompose shared variance (so correlated factors don’t mislead results).
✅ Identify true key drivers behind user behavior and workforce trends.
✅ Ensure companies focus on data-backed priorities, not statistical illusions.
Here’s what I found:
🚀 Traditional variable importance methods are highly sensitive to correlations, often exaggerating the impact of certain factors.
🚀 Newer techniques, like the LMG method, provide a more reliable way to analyze complex customer and workforce data.
🚀 Businesses that don’t adjust their analytics approach risk making expensive miscalculations.
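A tiny Monte Carlo sketch, with made-up numbers, of the correlation problem described above: a "proxy" variable with no direct effect on the outcome can look almost as important as the true driver when importance is judged by simple correlation with the outcome.

```python
# Monte Carlo sketch: a proxy variable looks nearly as "important" as the
# true driver when importance is judged by raw correlation with y.
import numpy as np

rng = np.random.default_rng(0)
n_sims, n = 500, 1000
corr_true, corr_proxy = [], []

for _ in range(n_sims):
    driver = rng.normal(size=n)                        # true cause of y
    proxy = 0.9 * driver + 0.44 * rng.normal(size=n)   # correlated, no effect
    y = driver + rng.normal(size=n)                    # y depends only on driver

    corr_true.append(np.corrcoef(driver, y)[0, 1])
    corr_proxy.append(np.corrcoef(proxy, y)[0, 1])

print(f"mean corr(driver, y) = {np.mean(corr_true):.2f}")
print(f"mean corr(proxy,  y) = {np.mean(corr_proxy):.2f}")
# The proxy's correlation is almost as large, even though it has no
# direct effect on y once the driver is accounted for.
```

An analyst ranking variables by these correlations alone would conclude both matter, and might invest in the proxy.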
Real-World Business Implications
If you’re using data to guide business decisions, here’s what you need to know:
✅ Basic statistical models can mislead you.
If two factors are correlated, traditional models might overemphasize one and ignore the other.
This can lead to misdirected investments and missed opportunities.
✅ Not all variables are equally important, even if the numbers say so.
A variable might rank high in importance just because it has a high variance, not because it’s truly impactful.
Without careful analysis, businesses could chase the wrong priorities.
✅ Better modeling leads to better business strategy.
Companies with large customer datasets or workforce analytics should adopt more sophisticated methods to avoid costly errors. Techniques like LMG provide a clearer picture of what really drives user engagement, retention, and satisfaction.

Final Thoughts: The Future of Data-Driven Business Decisions

This study highlights a major blind spot in how many companies approach user experience, customer behavior, and workforce analytics. To stay competitive, businesses must:
✔️ Move beyond basic statistical models that oversimplify complex relationships.
✔️ Use more advanced techniques to accurately rank what truly drives engagement and retention.
✔️ Make smarter investments based on insights that actually reflect reality.
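For the curious, here is a compact, illustrative implementation of the LMG idea: each predictor's importance is its incremental R² averaged over every order in which it could enter the model. The data are synthetic, and this brute-force version is only practical for a handful of predictors (the number of orderings grows factorially); production tools such as R's relaimpo use smarter computations.

```python
# LMG sketch: average each predictor's incremental R^2 over all orderings.
from itertools import permutations
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the given columns of X (with intercept)."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid.var() / y.var()

def lmg(X, y):
    p = X.shape[1]
    shares = np.zeros(p)
    orderings = list(permutations(range(p)))
    for order in orderings:
        seen = []
        for col in order:
            before = r_squared(X, y, seen)
            seen.append(col)
            shares[col] += r_squared(X, y, seen) - before
    return shares / len(orderings)

# Tiny demo with two correlated predictors.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=500)  # correlated with x1
y = x1 + 0.5 * x2 + rng.normal(size=500)
X = np.column_stack([x1, x2])

shares = lmg(X, y)
print("LMG shares:", np.round(shares, 3))
print("sum of shares:", round(float(shares.sum()), 3),
      "| full-model R^2:", round(r_squared(X, y, [0, 1]), 3))
```

A useful property visible in the demo: the LMG shares sum exactly to the full model's R², so credit for shared variance is split rather than double-counted or dropped.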
If your organization relies on large datasets to make decisions, let's talk about how better analytics can lead to better business outcomes. 🚀

Why This Matters for Hiring Managers & Executives
This project showcases my ability to:
✔️ Translate complex data science problems into actionable business insights.
✔️ Design and test advanced statistical models for real-world applications.
✔️ Help businesses avoid data-driven mistakes by improving how they interpret results.
If your company is looking to improve customer insights, workforce analytics, or data-driven decision-making, let’s connect.

Feature Selection: Gathering and Predicting Behavioral Outcomes!

2 min read | Download Full Paper

The Problem: It's Hard to Predict Behaviors Because Behaviors Are Hard to Measure!

In today’s data-driven world, businesses need effective tools to make accurate predictions about people’s behaviors, whether it’s understanding employee performance, customer behavior, or identifying trends that affect business outcomes. While existing tools often focus on general characteristics, there's a gap in understanding the specific traits that drive these behaviors. One example is the Autism Quotient (AQ-10), a tool designed to predict autism diagnoses. This type of behavioral analysis can be a game-changer when adapted for business purposes.

The Solution: Be Creative With Data Segmentation
I took a deeper look at the AQ-10 to see how behavioral data gathered by the tool's individual sections, such as attention to detail, communication, and imagination, could predict behaviors in different contexts, rather than relying on the instrument's total score. I also examined how factors like gender and age could affect those predictions. By focusing on specific traits and their effects, I developed a more targeted, data-backed approach to predicting outcomes. This analysis allows us to get a clearer picture of how these traits influence decisions and actions.

Conclusion: Better Data, Better Prediction
The study revealed that attention to detail, communication, and imagination significantly impact behavior prediction, while attention switching didn’t show much effect. This was crucial in demonstrating how certain traits have a stronger influence on outcomes, while others may not be as important.
This insight enables businesses to tailor their processes with greater precision, for instance by using a multi-dimensional assessment tool when gathering data for predictive purposes.

Final Thoughts
This work highlights the power of behavioral data analysis to improve predictive models in numerous fields. By breaking down complex human behaviors into measurable traits, businesses can create more accurate, targeted strategies, whether for improving recruitment processes, optimizing marketing efforts, or refining customer service.
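As a hedged illustration of subscale-based prediction, the sketch below fits one model on hypothetical subscale scores and another on their sum, a stand-in for a total score. All data and the comparison setup are synthetic inventions for illustration, not the AQ-10 or the study's results.

```python
# Sketch: predicting a binary outcome from subscale scores vs. a total score.
# Data are synthetic; subscale names are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 1500

# Four hypothetical subscales; only three drive the outcome here,
# mirroring the finding that one trait added little.
detail = rng.normal(size=n)
communication = rng.normal(size=n)
imagination = rng.normal(size=n)
attention_switching = rng.normal(size=n)  # no effect in this simulation

logit = 1.5 * detail + 1.0 * communication + 0.5 * imagination
outcome = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_sub = np.column_stack([detail, communication, imagination,
                         attention_switching])
X_total = X_sub.sum(axis=1, keepdims=True)  # single total score

acc_sub = LogisticRegression().fit(X_sub, outcome).score(X_sub, outcome)
acc_total = LogisticRegression().fit(X_total, outcome).score(X_total, outcome)
print(f"subscale model accuracy:    {acc_sub:.3f}")
print(f"total-score model accuracy: {acc_total:.3f}")
# In this synthetic setup the subscale model can weight each trait
# separately, so it typically fits better than the flat total score.
```

The design point is the one from the article: segmenting an instrument into its components lets the model learn which traits carry signal instead of averaging them away.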
I am committed to ethical and effective data segmentation and feature selection for predictive modeling. If you are interested, let's connect!

Applying Psychometrics for Better Decision-Making

2 min read | Download Full Paper

What is Psychometrics?

Psychometrics is the science of measuring things that are not directly observable, like intelligence, personality traits, or job performance potential. It involves designing, testing, and validating assessments to ensure they are reliable and meaningful. Businesses and researchers use psychometric methods to develop better surveys, hiring tools, and customer feedback forms that lead to more accurate and fair decision-making.

What is an Item and Why It Matters

An "item" refers to an individual question or statement in a survey or assessment. Each item is designed to measure a specific aspect of the underlying construct, such as a personality trait or cognitive ability. However, not all items perform equally well. Some may be too vague, redundant, or misaligned with the construct they are supposed to measure. This is why psychometric validation is crucial.

For businesses, having well-validated scales ensures that their assessments provide reliable and actionable insights. Whether used for hiring, customer feedback, or research, a poorly designed scale can lead to incorrect conclusions and misguided decisions. Psychometric validation helps ensure that assessments capture meaningful data, improve user experience, and maintain fairness across different groups.

Understanding Item Performance

When working with assessment tools, ensuring that each item is meaningful and reliable is critical. In this study, I applied Item Response Theory (IRT) to evaluate the Brief Autism Quotient (AQ-10), a widely used screening tool for autism diagnosis eligibility. My goal was to determine how well each item captured the underlying trait it was meant to measure.

By using IRT models such as the 2-Parameter Logistic (2PL) model for dichotomous responses and the Graded Response Model (GRM) for polytomous responses, I identified discrepancies in item performance.
Certain items had low discrimination values, meaning they did not effectively differentiate between individuals at different levels of the trait. Addressing these weaknesses ensures assessments provide accurate and actionable insights. I also examined model fit using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to determine the best scoring method. These metrics helped confirm that the GRM was more appropriate for the AQ-10, as it provided better sensitivity and precision across a range of responses.

Translating Technical Analysis for Actionable Insights

Raw analysis alone isn’t enough. It must be translated into practical recommendations that stakeholders can use. In this case, I identified that some items in the AQ-10 performed inconsistently and suggested improvements. Rather than overwhelming non-technical audiences with statistical jargon, I framed the results around key questions:

❓ Are all items contributing meaningful information?
❓ Can the assessment be optimized for different populations?
❓ What scoring method best captures the nuances of the data?

By presenting results in clear visuals and concise summaries, I helped executives and researchers understand the trade-offs between shorter, more practical assessments and maintaining diagnostic accuracy.

Collaborating Across Teams

Psychometric work often requires bridging the gap between data science, product teams, and end users. In this project, I worked with researchers, data scientists, and decision-makers to align on goals and implementation. I recommended that organizations using the AQ-10 consider adopting polytomous scoring to enhance sensitivity or explore revised item structures for greater reliability. By ensuring clear communication and actionable takeaways, I helped transform raw statistical analysis into insights that drive better decision-making.
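To make the 2PL model concrete, here is a short sketch of its item characteristic curve: the probability of endorsing an item as a function of the latent trait θ, with discrimination a and difficulty b. The parameter values are illustrative, not estimates from the AQ-10 study.

```python
# 2PL item characteristic curve: P(endorse | theta) = 1 / (1 + e^{-a(theta - b)})
import numpy as np

def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model.

    theta: latent trait level; a: discrimination; b: difficulty.
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)

# A discriminating item (a = 2.0) vs. a weak item (a = 0.3), same difficulty.
strong = p_2pl(theta, a=2.0, b=0.0)
weak = p_2pl(theta, a=0.3, b=0.0)

print("theta: ", np.round(theta, 1))
print("a=2.0: ", np.round(strong, 3))
print("a=0.3: ", np.round(weak, 3))
# Both curves cross 0.5 at theta = b, but the weak item barely separates
# low-trait from high-trait respondents -- this is "low discrimination".
```

This is exactly the property the discrimination analysis flags: items whose curves stay flat, like the a = 0.3 item above, contribute little information no matter how respondents are scored.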