Massih Forootan



massiHUB

About Me

Data Analytics Portfolio


What I am good at:

What I deliver:

What I am most proud of:

Why hire me?

I’m drawn to organizations that value clarity, efficiency, and impact, and I deliver all three. My track record in public health, education, biotech, and traffic safety analytics proves that I adapt quickly, produce consistently, and elevate the people and systems around me.

I offer a rare blend of technical rigor, process intuition, and people-centered leadership. I’m not just someone who builds dashboards; I build systems, relationships, and narratives that turn data into decisive action.

If you’re seeking a data professional who thrives in fast-paced, cross-functional environments, who’s as comfortable cleaning a messy dataset as presenting to executives, and who aligns with mission-driven, outcome-focused cultures, that’s me.

Back to top


Portfolio Projects

Tennessee Integrated Traffic Analysis Dashboards

Tools: Tableau, SQL
Key Features: Dynamic filtering, Multi-year traffic and incident trend analysis

Highlight: Fatal and Serious Injury Crashes

This dashboard presents near-real-time, interactive data on fatal and serious-injury collisions on Tennessee roadways for the current and previous years.

The dashboard enables a nuanced analysis of fatal and serious crashes through interactive filters and graphs, powered by a SQL database and Tableau. Users can analyze trends and patterns by location, road conditions, time of day, victim demographics, and other parameters. The dashboard provides actionable insights to inform traffic safety policies, enforcement initiatives, infrastructure improvements, public education campaigns, and other countermeasures aimed at reducing crash-related deaths and injuries on Tennessee roads.
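As a hedged illustration of the kind of query that powers this filtering, the sketch below uses Python's built-in `sqlite3` with a made-up crash table; the real dashboard runs on its own SQL database, whose schema and column names are not shown here.

```python
import sqlite3

# Hypothetical schema for illustration only; the production crash
# database behind the dashboard has its own structure.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE crashes (
        crash_year  INTEGER,
        county      TEXT,
        severity    TEXT,      -- 'Fatal' or 'Serious Injury'
        hour_of_day INTEGER
    )
""")
conn.executemany(
    "INSERT INTO crashes VALUES (?, ?, ?, ?)",
    [
        (2018, "Davidson", "Fatal", 17),
        (2018, "Davidson", "Serious Injury", 8),
        (2019, "Shelby", "Fatal", 23),
        (2019, "Davidson", "Fatal", 2),
    ],
)

def severity_trend(county):
    """Multi-year severity counts for one county: the SQL counterpart
    of a dashboard's dynamic location filter."""
    return conn.execute(
        """
        SELECT crash_year, severity, COUNT(*) AS n
        FROM crashes
        WHERE county = ?
        GROUP BY crash_year, severity
        ORDER BY crash_year, severity
        """,
        (county,),
    ).fetchall()

print(severity_trend("Davidson"))
```

In Tableau, a parameter or quick filter plays the role of the `county` bind variable, and the grouped counts feed the trend chart directly.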

The dashboard was presented at the 2019 LifeSavers Conference, an annual conference on injury prevention and traffic safety organized by the National Safety Council.

Back to top


Cancer Genomics Data QC Automation

Tools: Python, Linux, REST APIs
Key Features: Automated QC pipelines, API-based data retrieval & validation, End-user support & documentation

Highlights: GDC tutorial videos for tools OncoMatrix, Mutation Frequency, and ProteinPaint
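A minimal sketch of the validation side of such a QC pipeline is shown below. The field names and rules are hypothetical; a real pipeline would validate against the data portal's actual response schema after retrieving records over its REST API.

```python
# Hypothetical required fields for one file-metadata record; the real
# QC pipeline checks its own schema.
REQUIRED_FIELDS = {"file_id", "md5sum", "file_size", "data_format"}

def qc_record(record):
    """Return a list of QC problems found in one metadata record."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "md5sum" in record and len(str(record["md5sum"])) != 32:
        problems.append("md5sum is not 32 characters")
    if "file_size" in record and record["file_size"] <= 0:
        problems.append("non-positive file_size")
    return problems

good = {"file_id": "a1", "md5sum": "0" * 32,
        "file_size": 1024, "data_format": "MAF"}
bad = {"file_id": "b2", "md5sum": "xyz", "file_size": 0}

assert qc_record(good) == []
print(qc_record(bad))
```

Running such checks automatically over every batch retrieved from the API turns a manual spot-check into a repeatable QC gate.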

Back to top


Student Life Data Integration

Tools: Python, Power BI, Semantic models, AI prompting
Key Features: Live data connections, Multivariate analysis, Role-based access dashboards

Highlights: Multiple Correspondence Analysis for student satisfaction data (Jupyter Notebook)

This report presents an analysis of student feedback data regarding study support, academic accessibility, and staff responsiveness. The analyst employed two statistical approaches: descriptive analysis examining frequency distributions and inferential analysis using Multiple Correspondence Analysis (MCA).

Key findings reveal that Net Promoter Scores (NPS) were consistently high across all student groups, though senior students showed significantly lower survey participation. Many students hadn’t used tutoring services, resulting in incomplete tutor-related responses. The MCA identified two main variation components: tutor usage and staff responsiveness (first component), and NPS with class standing (second component).

Notable patterns emerged showing freshmen ranked NPS lower than sophomores and juniors, suggesting satisfaction increases with experience. Tutor-related metrics dominated the data variation, masking other parameters’ relationships. The analysis revealed potential correlations between academic success, academic support, and clear communication, though statistical significance wasn’t formally tested.

The report recommends engaging senior students for their valuable long-term perspective, tracking students longitudinally to observe changing perceptions, and considering a simplified NPS scale given the skewed distribution toward high scores. The analyst suggests dedicated statistical tests are needed to properly assess correlations between non-tutor metrics, as current relationships remain masked by tutor-usage data.
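To make the MCA step concrete, here is a hedged sketch of the core computation (indicator matrix, standardized residuals, SVD) in plain NumPy. The survey columns and category labels below are invented stand-ins, not the actual Student Life data.

```python
import numpy as np
import pandas as pd

# Invented survey responses standing in for the real dataset.
df = pd.DataFrame({
    "class_standing": ["freshman", "sophomore", "junior",
                       "freshman", "senior", "junior"],
    "used_tutoring":  ["no", "yes", "yes", "no", "no", "yes"],
    "nps_band":       ["low", "high", "high", "low", "high", "high"],
})

Z = pd.get_dummies(df).to_numpy(dtype=float)  # indicator (disjunctive) matrix
P = Z / Z.sum()                               # correspondence matrix
r = P.sum(axis=1)                             # row masses
c = P.sum(axis=0)                             # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

U, sing, Vt = np.linalg.svd(S, full_matrices=False)
inertia = sing ** 2                           # principal inertia per axis

# Respondent coordinates on the principal axes
rows = (U * sing) / np.sqrt(r)[:, None]
print("share of inertia, first two axes:", inertia[:2] / inertia.sum())
```

The first two columns of `rows` give each respondent's position on the two main variation components, analogous to the tutor-usage and NPS axes described above.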

Back to top


Data Quality Testing in Agribusiness Software

Tools: Excel VBA
Key Features: Automated data checks, Cross-platform behavior validation

Back to top


Research Data Analysis and Experimental Design Projects

Tools: Excel, VBA, SAS
Key Features: Multivariate analysis, Experimental designs

This report reviews quantitative aspects of scientific research on trees, plants, and rangeland species in Iran, analyzing what has been studied and how. Comparing government-funded projects with university dissertations revealed clear differences in priorities: students favor quicker lab-based topics like genetic analysis, while institutions cover broader practical areas.

Notable gaps were identified, with ecologically and commercially significant species like oak and Damask Rose receiving uneven attention across research domains. A panel of experts recommended prioritizing endangered, valuable, and ecosystem-critical species, proposing actions ranging from conservation to stress-response studies.

The report concludes by calling for better organization of existing findings and a clearer roadmap for future research investment. This report was prepared in Persian (with an English translation available in the slideshow notes) and presented at the Research Institute of Forests and Rangelands in Iran in 2009.

Publications: ORCID profile

Back to top


Miscellaneous Projects

Land Use Change in Tennessee - Nashville Software School

Tools: Python (scikit-learn), R (ShinyApp)

Key Features: Principal Component Analysis, k-Nearest Neighbors
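A hedged sketch of how PCA and k-nearest neighbors combine in a scikit-learn pipeline is shown below. The data here is synthetic; the actual project used the Tennessee land-use dataset with its own feature engineering and tuning.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the land-use features.
X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale, reduce to 5 principal components, then classify by 7 neighbors.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=5),
                      KNeighborsClassifier(n_neighbors=7))
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

Chaining the steps in one pipeline keeps the scaler and PCA fitted on training data only, which avoids leakage when scoring the held-out set.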

Back to top

Improve College-Going and College-Readiness - Division of Research and Evaluation, TN Dept of Education

Tools: Python

Key Features: Principal Component Analysis

Back to top

Logistic Regression - Nashville Software School

Tools: Python
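For context, a minimal logistic regression fit in scikit-learn looks like the sketch below; the data is synthetic, since the original exercise's dataset is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data with a (noisy-free) linear class boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))
```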


Technical Skills

Languages: Python, R, SQL, GraphQL, DAX, VBA
Tools: Power BI, Tableau, SAS
Techniques: Experimental designs, Multivariate & Regression Analysis, A/B testing

Back to top


~/my_life/in_zeroes_and_one$
During my undergraduate years, as 16- and 32-bit PCs emerged, I started with QBASIC and, shortly afterward, QuickBASIC. With genetics and statistics dominating my studies, statistical programming became my top interest (selected scripts). The big leap was coding experimental designs and ANOVA, but before I made significant progress I learned about MSTAT-C, which sidetracked my interest in coding.
Meanwhile, Windows 3.x had already entered the market, and it was evident that the MS-DOS programming era was nearly over. It didn't take me long to find that Microsoft had released Visual Basic, so I grabbed it and started migrating my old QB code to the new platform to get my feet wet. With the move to the Windows 9x generation, I felt my programming skills had become obsolete, so I gradually abandoned programming and focused on mastering spreadsheets (starting with Lotus 1-2-3 but quickly switching to Excel) for data wrangling and exploration.
When I started postgraduate studies after a long gap, I needed statistical tools again. With Windows 98 in its glory days and Windows 2000 and XP on the way, coding skills were in demand once more; statistical packages (Minitab, SPSS, and SAS) had become more user-friendly than ever, and learning resources were well populated and accessible. I could hardly have been happier when I found out that Visual Basic had become the kernel of the macro language in Microsoft Office, specifically Excel. VBA scripting and the SAS package thus became my main tools for programming and statistical analysis for nearly 10 years (selected ANOVA templates).
After relocating to the United States, I initially joined the IT-agribusiness sector, but after a while I realized that my knowledge of statistics and coding was perilously out of date. That gave me the incentive to join a data science boot camp and add Python and R to my skill set, followed by SQL and Tableau for business intelligence after rejoining the job market.
Since then, my data analysis career in road safety, cancer genomics, and education has prompted me to explore business management and data wrangling tools such as REST APIs, GraphQL, Power BI, and Smartsheet, albeit with the aid of AI tools, and I look forward to reconciling my cross-domain experience to develop solutions for seamless, robust data streaming.
../fun_f@ct$
- My first email account, registered in 1999, was MS-DOS-based and powered by Pegasus Mail.
- I was among the first round of Blogger users whom Google invited to register for Gmail in 2004 (Story). Although a wide range of usernames was still available (including my first and last names individually), I chose to keep the fourteen-character username I had held with Yahoo since 1999 (registered a few days after the Pegasus one) as my online identity.
- Since then, I have owned email addresses with .net, .com, .ac.uk, .ac.ir, .gov, .org, and .edu domains; in that order.
- I am an Inbox Zero practitioner.

Back to top