MY PROJECTS

Optimizing Loan Marketing Strategies Through Predictive Modeling

Technologies Used: Python

GitHub Repository

This project focuses on predicting personal loan acceptance using supervised classification algorithms, specifically K-Nearest Neighbors (KNN) and Logistic Regression. Based on a bank marketing dataset containing customer demographic and financial attributes—such as age, income, education, and account balances—the analysis includes thorough preprocessing, normalization, and feature selection to prepare the data for modeling. Both models were trained and evaluated using accuracy, confusion matrices, and classification reports, with Logistic Regression providing interpretable coefficients and KNN capturing non-linear decision boundaries. The project demonstrates how predictive modeling can support financial institutions in identifying likely loan applicants, enabling more efficient marketing and risk management strategies.
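
Below is a minimal sketch of such a workflow using scikit-learn; the file name ("bank_marketing.csv") and the "Personal Loan" target column are placeholders rather than the project's actual names.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Placeholder file and column names; the real dataset may differ.
df = pd.read_csv("bank_marketing.csv")
X = df.drop(columns=["Personal Loan"])
y = df["Personal Loan"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Normalize features so the distance-based KNN is not dominated by large scales.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    preds = model.predict(X_test_s)
    print(name)
    print(confusion_matrix(y_test, preds))
    print(classification_report(y_test, preds))
```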

Fraud Detection

Technologies Used: Python

GitHub Repository

This project tackles the challenge of credit card fraud detection by applying supervised machine learning techniques, specifically Random Forest and K-Nearest Neighbors (KNN). Using a highly imbalanced dataset of anonymized transaction records, the analysis incorporates essential preprocessing steps, including scaling, under-sampling, and handling class imbalance to improve model performance. Both models were evaluated using metrics such as precision, recall, F1-score, and confusion matrices, with a strong focus on minimizing false negatives, which is critical in fraud detection. Random Forest demonstrated robust performance and interpretability through feature importance analysis, while KNN served as a comparative baseline. This project highlights the application of ensemble and distance-based classifiers in high-stakes, real-world classification tasks involving rare event detection.
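
A simplified sketch of the under-sampling and evaluation steps, assuming the common anonymized credit card dataset layout with a "Class" label; the full project pipeline (including the KNN baseline and tuning) is not reproduced here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("creditcard.csv")  # placeholder path

# Random under-sampling: keep all fraud rows, sample an equal number of normal rows.
fraud = df[df["Class"] == 1]
normal = df[df["Class"] == 0].sample(n=len(fraud), random_state=42)
balanced = pd.concat([fraud, normal]).copy()

# Scale the raw 'Amount' column (the V1-V28 features are already PCA-transformed).
balanced["Amount"] = StandardScaler().fit_transform(balanced[["Amount"]]).ravel()

X = balanced.drop(columns=["Class"])
y = balanced["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

# Precision, recall, and F1 matter more than raw accuracy when false negatives are costly.
print(classification_report(y_test, rf.predict(X_test)))
```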

Toyota Corolla Resale Price Prediction Using Linear Regression

Technologies Used: Python

GitHub Repository

This project applies regression techniques to predict the resale price of Toyota Corolla vehicles based on various car features such as age, mileage, fuel type, engine size, and transmission. Utilizing linear regression and exploratory data analysis, the study identifies key predictors influencing car prices and evaluates model performance using metrics like R-squared and RMSE. Feature engineering, outlier removal, and correlation analysis were conducted to improve model accuracy and interpretability. The final model provides actionable insights for dealerships and individual sellers, supporting data-driven decision-making in vehicle pricing and valuation.
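
A condensed sketch of the regression workflow with scikit-learn; the column names follow the commonly distributed ToyotaCorolla dataset and may differ from the project's exact feature set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

df = pd.read_csv("ToyotaCorolla.csv")  # placeholder path

# One-hot encode the categorical fuel type; keep a handful of numeric predictors.
features = ["Age_08_04", "KM", "HP", "cc", "Doors", "Weight", "Fuel_Type"]
X = pd.get_dummies(df[features], drop_first=True)
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression().fit(X_train, y_train)
preds = lr.predict(X_test)

print("R-squared:", r2_score(y_test, preds))
print("RMSE:", np.sqrt(mean_squared_error(y_test, preds)))
print("Coefficients:", dict(zip(X.columns, lr.coef_.round(2))))
```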

Customer Segmentation with K-Means Clustering for Targeted Marketing

Technologies Used: Python

GitHub Repository

This project applies unsupervised machine learning to segment customers based on purchasing behavior using K-Means clustering. Drawing from a retail dataset containing features such as age, annual income, and spending score, the analysis employs exploratory data analysis and visualization techniques to uncover patterns in customer demographics and shopping tendencies. The optimal number of clusters was determined using the Elbow Method, followed by model training and interpretation of the resulting segments. Each cluster represents a distinct customer profile, aiding businesses in tailoring marketing strategies, enhancing customer engagement, and improving retention. This project demonstrates practical use of clustering algorithms for data-driven decision-making in customer analytics.
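
A minimal sketch of the Elbow Method and clustering step with scikit-learn; the column names ("Age", "Annual Income", "Spending Score") are placeholders for the retail dataset's actual fields.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("customers.csv")  # placeholder path
X = StandardScaler().fit_transform(df[["Age", "Annual Income", "Spending Score"]])

# Elbow Method: plot inertia (within-cluster sum of squares) for k = 1..10.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, 11)]
plt.plot(range(1, 11), inertias, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia")
plt.show()

# Fit the chosen k (e.g. 5) and attach the segment label to each customer.
df["Segment"] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)
print(df.groupby("Segment")[["Age", "Annual Income", "Spending Score"]].mean())
```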

Toxicity Detection in Political Subreddits

Technologies Used: Python

GitHub Repository

This project developed a supervised machine learning pipeline to detect toxic language in political discussions on Reddit, integrating TF-IDF and BERT-based embeddings to capture both lexical patterns and contextual nuance. Using a binary classification approach trained on the Jigsaw Toxic Comment dataset and Reddit-specific features, models such as Logistic Regression, XGBoost, and Random Forest were evaluated. The Random Forest classifier achieved the most balanced performance, with 94% accuracy and an F1-score of 0.77. To address class imbalance, SMOTE oversampling was applied. The final model was deployed across over 219,000 Reddit comments, revealing significant regional and ideological disparities in toxicity rates. Insights from this analysis informed the identification of toxic subreddits and influential entities in polarizing discourse, and provided foundational tools for improving online moderation strategies.
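
A toy sketch of the TF-IDF + SMOTE + Random Forest portion of such a pipeline, assuming the imbalanced-learn package is installed; the BERT embeddings and Reddit-specific features used in the actual project are omitted here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE  # requires the imbalanced-learn package

# Toy, imbalanced stand-in data; the real project used labeled Jigsaw/Reddit comments.
texts = ["you are an idiot"] * 40 + ["great point, thanks for sharing"] * 160
labels = [1] * 40 + [0] * 160

X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels)

# SMOTE synthesizes minority-class samples on the training split only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```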

Exploring the Influence of Socioeconomic and Demographic Factors on Mental Health Outcomes in Canada

Technologies Used: Python

GitHub Repository

This project investigates the impact of socioeconomic and demographic factors (including gender, age, immigrant status, visible minority status, education level, household income, and LGBTQ2+ identity) on mental health outcomes in Canada, using data from the 2022 Mental Health and Access to Care Survey. Through statistical analysis with Python, including chi-squared tests and data visualizations such as heatmaps and mosaic plots, the study identifies significant disparities in the prevalence of mood disorders, anxiety, PTSD, and ADHD across different population groups. Notably, higher rates of mental health disorders were observed among women, LGBTQ2+ individuals, and younger adults. The findings highlight the importance of targeted mental health interventions and provide data-driven insights to support more inclusive and equitable public health policies.
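
A minimal example of the chi-squared test used in this kind of analysis, with SciPy; the file and column names are hypothetical stand-ins for the survey variables.

```python
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("mhacs_2022.csv")  # placeholder file and column names

# Test whether reporting a mood disorder is independent of gender.
table = pd.crosstab(df["gender"], df["mood_disorder"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```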

R Markdown Website

Technologies Used: R RStudio Shinyapps

RMarkdown Website

The Shiny app developed for this project offers an interactive platform for exploring a penguin dataset through various visualizations and analyses. Users can delve into exploratory data analysis (EDA) using interactive features like bar charts, scatter plots, and line plots. The app allows for customization, enabling users to select different variables or categories for comparison and interact with the visualizations by hovering over data points or zooming in on specific areas. Through this interactive exploration, users can generate insights and hypotheses about relationships between variables, distribution across categories, and trends within the data.

Topic Modeling

Technologies Used: Python

GitHub Repository

Conducted a project using Python to implement Latent Dirichlet Allocation (LDA) for topic modeling. The project involved data preprocessing, initial LDA model training with 20 topics, visualization of topic-word distributions with pyLDAvis, and analysis of topic trends over time. It also addressed challenges encountered along the way, compared LDA with other text clustering techniques, and determined the optimal number of topics (15) using Silhouette Scores and Elbow Curves. This project highlights expertise in text analysis and demonstrates the practical application of LDA for uncovering topics within a document corpus.
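
A small illustrative example of LDA with scikit-learn on a toy corpus; the real project used a far larger corpus, pyLDAvis visualization, and a topic count chosen via Silhouette Scores and Elbow Curves.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; the real project fit LDA on a much larger document collection.
docs = [
    "the economy and inflation dominated the debate",
    "the team won the championship game last night",
    "new vaccine trials show promising results",
    "stock markets rallied after the earnings report",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit LDA with a chosen number of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=42).fit(dtm)

# Print the top words per topic.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```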

Time Series: Portland and USA Unemployment Rate (2003-2023)

Technologies Used: Python

GitHub Repository

This project focused on analyzing unemployment rate data for the Portland area over a 20-year period from January 2003 to January 2023. After preprocessing and converting the date column to datetime format, the highest unemployment rate was identified as 13.3%, occurring in April 2020. Additionally, a comparison was made between the Portland and national (USA) unemployment rates, revealing similar trends and identifying April 2020 as the period with the highest unemployment rates for both.
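
A brief pandas sketch of the date conversion, peak lookup, and Portland-vs-USA comparison; file and column names are placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file and column names for the two unemployment series.
portland = pd.read_csv("portland_unemployment.csv", parse_dates=["DATE"])
usa = pd.read_csv("usa_unemployment.csv", parse_dates=["DATE"])

# Month with the highest Portland unemployment rate (April 2020, 13.3%, in this analysis).
peak = portland.loc[portland["UNRATE"].idxmax()]
print(peak["DATE"], peak["UNRATE"])

# Align the two series on date for a side-by-side comparison.
combined = portland.merge(usa, on="DATE", suffixes=("_portland", "_usa"))
combined.plot(x="DATE", y=["UNRATE_portland", "UNRATE_usa"])
plt.show()
```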

Predictive Analysis: Housing Price

Technologies Used: Python

Kaggle

This project utilized a California house price dataset to predict the price of unseen data based on house features. The project involved data cleaning, visualization, and identifying skewness, followed by the application of different models including Linear Regression, Quadratic Regression, Decision Trees, Random Forests, and Support Vector Machines. The Random Forest model stood out with a lower RMSE, higher R-squared (around 0.866), and lower MAE compared to other models. Overall, the Random Forest model emerged as the most effective in predicting house prices among the evaluated models in this project and was used for the final prediction.
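
A minimal Random Forest regression sketch with scikit-learn, using placeholder column names modeled on the widely used California housing CSV; the project also compared several other models not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Placeholder file and column names.
df = pd.read_csv("housing.csv").dropna()
X = pd.get_dummies(df.drop(columns=["median_house_value"]), drop_first=True)
y = df["median_house_value"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)
preds = rf.predict(X_test)

print("RMSE:", np.sqrt(mean_squared_error(y_test, preds)))
print("R-squared:", r2_score(y_test, preds))
print("MAE:", mean_absolute_error(y_test, preds))
```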

Deep Learning: Cat & Dog image prediction

Technologies Used: Python

GitHub Repository

Used the Cat and Dog dataset to develop a neural network based on the AlexNet architecture. Employing tools like Keras, TensorFlow, and scikit-learn, the project achieved a remarkable 92% accuracy on unseen data. The AlexNet model, with its 8-layer deep convolutional neural network, excelled in feature extraction. Challenges were addressed through techniques like data augmentation and hyperparameter tuning. Visual aids, including accuracy and loss plots, provided insights into the model's learning process. The final evaluation showcased the model's effectiveness, emphasizing its robust performance in image classification.
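
A compact AlexNet-style Keras sketch with basic augmentation; the directory path, image size, and hyperparameters are illustrative rather than the project's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Placeholder directory; expects one subfolder per class (cat/, dog/).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "cats_and_dogs/train", image_size=(227, 227), batch_size=32, label_mode="binary")

model = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Rescaling(1.0 / 255),
    # Simple data augmentation, active only during training.
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    # AlexNet-style convolutional stack.
    layers.Conv2D(96, 11, strides=4, activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # binary output: cat vs. dog
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(train_ds, epochs=10)
```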

Tableau: Major Crime Indicators In Toronto

Technologies Used: Tableau Python

Tableau Interactive Dashboard

Utilized the "Major Crime Indicators" (MCI) dataset from the Toronto Police Service to analyze crime patterns in Toronto from 2014 to 2023. By employing Python for data preprocessing and Tableau for visualization, the project aims to offer actionable insights for law enforcement, policymakers, and local communities to enhance public safety strategies. Through interactive maps and charts generated by Tableau, users can explore crime distribution across neighborhoods, with each crime category represented by a distinct color. The analysis provides a deeper understanding of crime patterns, facilitating targeted strategies against major crimes in the city.

Employment distribution: Power BI

Technologies Used: Power BI

Using the "Employee Database.xls" file in Microsoft Power BI, the project visualized the distribution of occupations across genders within the company. The analysis revealed a stark gender imbalance, with men holding a higher number of positions in nearly every occupation except Clerical Support roles. Men accounted for approximately 70% of all occupations, while women represented only around 30%. Particularly notable gender disparities were observed in roles such as Product Assemblers, Product Testers, Electronics Technicians, and Executives, where men significantly outnumbered women.

Data visualization: QlikView

Technologies Used: QlikView

Used Qlik's platform to analyze a dataset of films available on Netflix, showcasing the proportion of genres. Drama and Action/Adventure emerged as the most prevalent genres, demonstrating their popularity within the Netflix audience. This project underscores Qlik's effectiveness in extracting meaningful insights from data to drive informed decision-making and enhance overall performance.

HR Employment Tracker Application & IT Request Application:

Technologies Used: Power Apps Power Automate SharePoint List

Associated with my internship at the Department of National Defence: Designed and implemented Power Apps-based solutions for HR and IT departments, accelerating new employee data entry by 80% and reducing IT service request processing time by 80%. Improved data integrity by migrating to SharePoint lists and enhanced usability with Power BI visualizations, leading to better insights into HR metrics and IT efficiency.

Medical Appointment Application & Boot Reimbursement Request Application:

Technologies Used: Power Apps Power Automate SharePoint List

Associated with my internship at the Department of National Defence: Created Power Apps applications to streamline boot reimbursement requests and manage medical appointments, reducing wait times by up to 90% and simplifying approval workflows by 80%. Integrated data with SharePoint and automated email updates, enhancing overall efficiency and data integrity.

Automations Using Power Automate & ATI Request Automation:

Technologies Used: Power Automate

Associated with my internship at the Department of National Defence: Leveraged Power Automate to streamline tasks such as automated emails, SharePoint updates, and ATI request processes, achieving a 95% reduction in manual effort and an 80% improvement in processing time, significantly boosting productivity and user satisfaction. Developed an application for position change requests, automating data extraction from PDFs into SharePoint lists. This reduced human errors by 90% and expedited approval processes, leading to faster decision-making and streamlined workflows.

Visualization Using Power BI:

Technologies Used: Power BI

Associated with my internship at the Department of National Defence: Developed numerous Power BI reports and dashboards, utilizing DAX, Power BI Desktop, and Power BI Service. Extracted data from semantic models linked to SharePoint lists, providing clear, actionable insights that improved decision-making across departments.

DATA ANALYTICS TOOLS

Utilized a range of Python libraries, including but not limited to Pandas, NumPy, Matplotlib, and Seaborn for data analysis and visualization; Scikit-learn for machine learning algorithms; TensorFlow and Keras for neural networks; SimPy for simulation; and NLTK for natural language processing.

Utilized R and RStudio, leveraging libraries such as dplyr, ggplot2, Shiny, knitr, caret, rmarkdown, and stringr for data manipulation, advanced data visualization, and interactive web applications.

Utilized Tableau to transform raw data into actionable insights, created dynamic dashboards and interactive reports, explored trends and patterns, and uncovered hidden correlations. Visualized data to analyze sales performance, track key metrics, and explore market trends.

Utilized Microsoft Power BI to transform raw data into visually compelling reports and interactive dashboards that facilitate informed decision-making. Leveraged Power BI to streamline business processes, identify trends, and drive organizational growth.

Utilized Oracle SQL for comprehensive relational database management, encompassing tasks such as data querying, manipulation, and optimization within Oracle Database environments. Leveraged advanced Oracle SQL features to execute complex queries, perform efficient data manipulation operations, and optimize database performance.

Implemented SQL with a focus on database concepts, normalization, and relational principles. Utilized queries to analyze schema, ensure data consistency, and establish connections between tables using joins. Applied normalization techniques to organize data efficiently, reducing redundancy and enhancing integrity.

Utilized SAS (Statistical Analysis System) for advanced statistical analysis, data cleaning, and predictive modeling. Applied procedures such as PROC MEANS, PROC FREQ, and PROC REG to summarize data, identify trends, and build regression models. Leveraged SAS for large-scale data manipulation, hypothesis testing, and generating actionable insights in structured datasets. Demonstrated proficiency in using SAS Studio and Base SAS programming for real-world data science applications.

Utilized MongoDB for creating non-relational databases. Expertise in NoSQL principles. Structured databases with schema-less design, ensuring efficient storage and retrieval. Leveraged MongoDB's query language and aggregation framework to extract insights from extensive datasets. Demonstrated proficiency in NoSQL databases, highlighting aptitude for diverse data management challenges.

Utilized LaTeX for creating professional, publication-quality documents, including academic reports, research papers, and presentations. Demonstrated proficiency in formatting complex mathematical equations, tables, citations, and structured layouts using packages such as `amsmath`, `graphicx`, and `biblatex`. Ensured consistency and typographic excellence in technical documentation and journal papers preparation.

Utilized Mendeley as a reference management tool for organizing, annotating, and citing scholarly articles and research papers. Streamlined the research workflow by generating citations in various styles (APA, MLA, IEEE) and integrating seamlessly with Microsoft Word and LaTeX for automated bibliography creation. Collaborated with peers through shared libraries to support group research and literature reviews.

Utilized Microsoft Azure for deploying and managing cloud solutions. Deployed virtual machines, created storage solutions, and configured networking resources. Implemented security measures using Azure's identity and access management services for data protection. Overall, utilized Azure for architecting, deploying, and managing cloud solutions, ensuring optimal performance and reliability.

Utilized Visual Studio Code for code editing, debugging, and version control integration, sharing project progress on GitHub. Leveraged Visual Studio Code's robust extensions marketplace to customize and enhance the development environment according to project requirements, streamlining coding workflows, enhancing productivity, and delivering high-quality code.

Utilized the Databricks platform for developing Python projects, leveraging its collaborative environment and Apache Spark integrations for debugging and performance monitoring in data analytics.

Utilized Qlik Sense to develop interactive and insightful data visualizations, uncovering trends and patterns within datasets, and enhancing data-driven decision-making processes.

Utilized Neo4j for developing graph databases, demonstrating proficiency in leveraging its graph-based data storage and querying capabilities.

Utilized PyCharm to code and debug Python simulation scripts and to showcase simulation runs through animation, ensuring seamless visualization of simulation processes.

Utilized Git for version control and collaboration on GitHub, ensuring seamless code management and team coordination. Leveraged Git's branching and merging capabilities to streamline development workflows and facilitate code reviews, enhancing project efficiency and quality.

Utilized SPSS Statistics for social science survey projects. Leveraged SPSS's tools and features to analyze survey data, generate descriptive statistics, and conduct hypothesis testing, contributing to insightful research findings.

Utilized Power Apps to streamline data entry, automate workflows, and reduce processing times by 90%. Designed and implemented several applications during my internship at the Department of National Defence, including HR Employment Tracker, IT Request, Accessories Reimbursement Request, Medical Appointment, and Position Change Approval applications.

Leveraged Power Automate to automate tasks such as email notifications, SharePoint list updates, and Power BI data management. Achieved a 95% reduction in manual effort, significantly enhancing productivity and efficiency across multiple processes.

Migrated and managed data across various applications using SharePoint lists during my internship at the Department of National Defence. Improved data integrity and usability for HR metrics, ATI requests, and medical appointments by integrating data into SharePoint lists and automating workflows with Power Automate.

Utilized DAX (Data Analysis Expressions) in Power BI to create advanced visualizations and perform intricate calculations, such as running totals and dynamic measures. Developed interactive dashboards that provided deep insights into data, enabling precise analysis and the identification of trends and patterns. Enhanced data interpretation and supported informed decision-making through advanced reporting techniques.

Utilized Agile framework methodologies to streamline project workflows and enhance collaboration within cross-functional teams during my internship at the Department of National Defence. Applied Agile methodologies and Gantt charts to effectively manage projects, ensuring timely task completion and milestone achievement.

Used to create My Portfolio Website and a basic online shopping store.

Honors & Awards