Massih


Coding Sandbox

:: Improve College-Going and College-Readiness

Division of Research and Evaluation, TN Dept of Education

This repository comprises the data and Python code (in a Jupyter notebook) for an assessment project aimed at identifying measures of high school graduates’ readiness for college.

Researchers have a strong interest in enhancing college readiness for all youth and increasing postsecondary enrollment among graduates, and the community is equally invested in its students’ postsecondary success. For many of these students, however, college is not perceived as a viable option. Researchers posit that the district can improve college readiness across all of its schools by first identifying high schools to serve as “models of excellence” and then learning from these exemplars about best practices for producing “college-ready” students who enroll and persist in postsecondary education. The available data were therefore examined to recommend a model school.

Principal component analysis (a dimension-reduction technique) was used to select viable criteria, upon which model schools were chosen; the leading components also indicate which factors are most distinguishing for scoring.

Regarding data quality, the dataset contained both missing values and duplicate entries; a strategy for handling them was weighed and implemented. Further research avenues, such as disaggregating the data by sub-region, were also suggested.
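As a rough illustration of the approach, here is a minimal Python sketch of the cleaning and PCA-based scoring described above. The CSV file and column handling are hypothetical stand-ins, not the project’s actual data or code.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical school-level metrics; the file name is illustrative only.
df = pd.read_csv("school_metrics.csv")

# Handle the data-quality issues noted above: drop duplicate rows,
# then impute remaining missing values with each column's median.
df = df.drop_duplicates()
metrics = df.select_dtypes("number").fillna(df.median(numeric_only=True))

# Standardize the metrics and reduce them to the leading components.
X = StandardScaler().fit_transform(metrics)
pca = PCA(n_components=2)
scores = pca.fit_transform(X)

# Loadings on the first component point to the most
# distinguishing factors for scoring schools.
loadings = pd.Series(pca.components_[0], index=metrics.columns)
print(loadings.sort_values(key=abs, ascending=False))

# Rank schools by their first principal-component score.
df["pc1_score"] = scores[:, 0]
print(df.sort_values("pc1_score", ascending=False).head())
```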


:: Fatal and Serious Injury Crashes

Tennessee Integrated Traffic Analysis Network, TN Dept of Safety & Homeland Security

This dashboard presents near-real-time, interactive information on fatal and serious-injury collisions on Tennessee roadways for the current and previous years.

The dashboard allows for a nuanced analysis of fatal and serious crashes through interactive filters and graphs powered by a SQL database and Tableau. Users can analyze trends and patterns by location, road conditions, time of day, victim demographics, and other parameters. The dashboard provides actionable insights to inform traffic safety policies, enforcement initiatives, infrastructure improvements, public education campaigns, and other countermeasures aimed at reducing crash-related deaths and injuries on Tennessee roads.
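To give a flavor of the kind of aggregation that feeds such a dashboard, here is a hedged Python sketch against a hypothetical crash table; the schema, column names, and SQLite stand-in are assumptions, not the actual TITAN database.

```python
import sqlite3  # lightweight stand-in for the production SQL database
import pandas as pd

# Hypothetical schema: the table and column names are illustrative only.
conn = sqlite3.connect("crashes.db")

query = """
SELECT county,
       strftime('%H', crash_time) AS hour_of_day,
       road_condition,
       COUNT(*)              AS crashes,
       SUM(fatalities)       AS deaths,
       SUM(serious_injuries) AS serious_injuries
FROM crashes
WHERE severity IN ('fatal', 'serious')
  AND crash_date >= date('now', '-1 year')
GROUP BY county, hour_of_day, road_condition
ORDER BY deaths DESC;
"""

# The resulting frame is the sort of aggregate a dashboard filter
# (location, road condition, time of day) would slice interactively.
summary = pd.read_sql_query(query, conn)
print(summary.head())
```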

This dashboard was presented at the 2019 LifeSavers Conference, an annual injury prevention and traffic safety conference organized by the National Safety Council. (Link)


:: Identify Reward Performance and Reward Progress Schools

Division of Strategy and Data, TN Dept of Education

This repository comprises R scripts and data for an evaluation project aimed at identifying schools with exceptional performance. Historically, the Tennessee Department of Education recognized the top 10 percent of schools in the state as Reward schools. Reward Performance schools constitute the top 5 percent of schools in terms of achievement as quantified by a one-year success rate. Reward Progress schools comprise the top 5 percent of schools in terms of growth as gauged by the Tennessee Value-Added Assessment System.

Utilizing the provided data, Reward Performance and Reward Progress schools among K-8 institutions were identified via statistical analysis. This allows recognition of high-performing schools based on rigorous quantitative metrics of student outcomes and growth. The R-based analysis enables reproducible identification of Reward schools. Further examination of pedagogical and administrative practices at these exceptional schools could illuminate drivers of student success.
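Although the repository implements the analysis in R, the core ranking logic can be sketched briefly; the Python below uses hypothetical file and column names to illustrate the top-5-percent cutoffs described above.

```python
import pandas as pd

# Hypothetical K-8 school-level data; file and column names are
# illustrative, not the repository's actual inputs.
schools = pd.read_csv("k8_schools.csv")

# Reward Performance: top 5 percent by one-year success rate.
perf_cutoff = schools["success_rate"].quantile(0.95)
reward_performance = schools[schools["success_rate"] >= perf_cutoff]

# Reward Progress: top 5 percent by TVAAS growth index.
growth_cutoff = schools["tvaas_growth"].quantile(0.95)
reward_progress = schools[schools["tvaas_growth"] >= growth_cutoff]

print(f"Reward Performance schools: {len(reward_performance)}")
print(f"Reward Progress schools: {len(reward_progress)}")
```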


:: Tennessee Traffic Fatality

Tennessee Integrated Traffic Analysis Network, TN Dept of Safety & Homeland Security

These two dashboards present comparative year-to-date and historical statistics on road fatalities in the state, divided into vehicle-related and driver/passenger-related components.

The dashboards utilize a SQL database and Tableau analytics platform to enable interactive analysis of fatal crash trends over time. The visualizations incorporate current fatality data from the present year along with historical yearly totals going back over a decade. Segmenting fatalities into vehicle-related vs. driver/passenger-related factors allows for a nuanced epidemiological understanding of crash mortality patterns.

The comparative dashboards employ data visualization principles to intuitively showcase fatality trends and disseminate actionable road safety insights to diverse stakeholders. Further analysis could relate the dashboard metrics to various interventions such as seatbelt campaigns, drunk driving crackdowns, improved road design, vehicle safety regulations, and other evidence-based policies aimed at reducing traffic-related mortality.


:: Land Use Change in Tennessee

Nashville Software School

The objective of this project was to elucidate patterns in, and possible explanations for, land-use changes in Tennessee. It comprised two sub-projects.

A slideshow reviewing the methodology and results is included with the phase II set of files.

The visualizations depicted correlations between land availability and valuation across counties. The analyses suggested that the predominant factors influencing land value are proximity to major metropolitan areas and agricultural profitability.

In summary, this project leveraged statistical programming languages and multivariate analysis to glean scientific insights into drivers of land use trends. The interactive dashboards and dimension reduction models provide data-driven understanding of how exurban dynamics shape land use patterns over time. Further research could relate the identified factors to policies on zoning, land conservation, transportation infrastructure and urban development.
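As a toy illustration of the county-level correlation analysis, assuming a hypothetical table containing the candidate drivers named above:

```python
import pandas as pd

# Hypothetical county-level table; file and column names are illustrative.
land = pd.read_csv("tn_county_land.csv")

# Correlate per-acre land value with the candidate drivers named above.
drivers = ["distance_to_metro_miles", "farm_income_per_acre", "available_acres"]
print(land[drivers + ["value_per_acre"]].corr()["value_per_acre"])
```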


:: Logistic Regression

Nashville Software School

This Jupyter notebook was developed to demonstrate the concept of logistic regression and how to implement the technique in Python. The code fits a logistic regression model, prints the model summary, extracts and prints the coefficients, computes predicted probabilities, and visualizes the fitted model along with the original data. The model relates a binary response variable to an explanatory variable, and the visualization helps show how well the model fits the data.
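For reference, here is a self-contained sketch of the same workflow, using synthetic data and statsmodels rather than the notebook’s actual dataset or code.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Synthetic data standing in for the notebook's dataset:
# one explanatory variable x and a binary response y.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))
y = rng.binomial(1, p)

# Fit the logistic regression and print the model summary.
X = sm.add_constant(x)
model = sm.Logit(y, X).fit()
print(model.summary())
print("Coefficients:", model.params)

# Predicted probabilities over a grid, plotted with the data.
grid = np.linspace(x.min(), x.max(), 200)
probs = model.predict(sm.add_constant(grid))
plt.scatter(x, y, alpha=0.4, label="observed")
plt.plot(grid, probs, color="red", label="fitted probability")
plt.xlabel("x")
plt.ylabel("P(y = 1)")
plt.legend()
plt.show()
```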


~ / M y _ L i f e / i n _ Z e r o e s _ a n d _ O n e $
During my undergraduate years, as 16- and 32-bit PCs became widespread, I started with QBASIC and, shortly afterward, QuickBASIC. With genetics and statistics dominating my studies, statistical programming became my top interest (selected scripts). The big leap was coding experimental designs and ANOVA, but before I made significant progress I learned about MSTAT-C, which sidetracked my interest in coding.
Meanwhile, Windows 3.x had already reached the market, and it was evident that the MS-DOS-based programming era was nearly over. It didn’t take me long to find that Microsoft had released Visual Basic, so I grabbed it and started migrating my old QB code to the new platform to get my feet wet. With the upgrade to the Windows 9x generation, I felt my programming skills had become obsolete, so I gradually abandoned programming and focused on mastering spreadsheets (starting with Lotus 1-2-3 but quickly switching to Excel) for data wrangling and exploration.
When I began postgraduate studies after a long gap, I needed statistical tools again. The need for coding skills resurfaced: Windows 98 was in its glory days, Windows 2000 and XP were on the way, statistical packages (Minitab, SPSS, and SAS) had become more user-friendly than ever, and learning resources were well populated and accessible. I could hardly have been happier when I found that Visual Basic had become the kernel of the macro language in Microsoft Office, specifically Excel. VBA scripting and the SAS package thus became my main tools for programming and statistical analysis for nearly 10 years (selected ANOVA templates).
After relocating to the United States, I initially joined the IT-agribusiness sector, but after a while I realized that my knowledge of statistics and coding was perilously out of date. That gave me the incentive to join a data science boot camp and add Python and R to my skill set, followed by SQL and Tableau for business intelligence after I rejoined the job market. Now, pursuing a data analysis career at a cancer genetics knowledgebase, I am exploring data querying with REST APIs and GraphQL as my newest area of interest.
~ / F u n _ F @ c t $
- My first email account, registered in 1999, was an MS-DOS-based one powered by Pegasus Mail.
- I was among the first round of Blogger users invited by Google to register for Gmail in 2004 (Story). Although a wide range of usernames was still available (including my first and last names individually), I chose to keep, as my online identity, the same fourteen-character username I had held with Yahoo since 1999 (registered a few days after the Pegasus one).
- Since then, I have owned email addresses with .net, .com, .ac.uk, .ac.ir, .gov, .org, and .edu domains; in that order.
- I am an Inbox Zero practitioner.