Mohamed Elsayed
Projects
This dashboard/report was my submission to a challenge created by Maven Analytics. The goal was to explore a dataset containing information about unicorn companies (private companies valued at $1 billion or more) & to give insights about the data.
I created my dashboard/report to tell a story that highlights the best-performing unicorn. The report starts by exploring high-level general data about unicorns, then narrows down to the performance of continents, cities, industries & companies, and finally to the winning unicorn based on the criteria I mention in the report. A PDF copy of the report is linked to the image & the title of this project.
This is the Capstone project for the Data Analytics Certificate. The goal of this project was to analyze a bike-share company’s dataset of almost 6 million observations to give the marketing team recommendations on how to convert casual users of the service into paying members. I based my analysis on user patterns, ride times, dates, preferred stations & preferred bike types. I chose SQL to prepare & manipulate the data for this project because I was already familiar with R & wanted to challenge myself with new tools. I also chose Tableau for data visualization. Details of all the steps & the tools I used in this analysis are in my GitHub repository linked to the picture or the title of this project. I have included a PDF file of the final report with my findings & an Rmd file containing all the code used.
Harvard MovieLens Capstone
(Machine Learning, Matrix Factorization, R)
This was the first Capstone project I completed as part of the Harvard Data Science Certificate. The goal was to give data insights on the public MovieLens dataset & to improve the performance (RMSE) of a movie recommendation algorithm by a certain amount. I used several techniques to show the gradual improvement in each model's prediction accuracy, starting with a “Baseline” model and working my way up to techniques like “Data Regularization” & “Matrix Factorization”. To be completely honest, this is not my favorite subject as it’s very complicated, but I have included it because it’s something that I have studied & spent considerable time researching. Details of the code & the final PDF report are in my GitHub repository linked to the image & the title of this project. This repository contains a PDF report, an Rmd file & an R script file.
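As a rough illustration of the step from a “Baseline” model to a regularized one, the sketch below compares the RMSE of predicting the global mean rating with the RMSE of a regularized per-movie effect. It is written in Python with made-up ratings, not the R code or the MovieLens data from the repository; the function names and the penalty value `lam=1.0` are assumptions chosen for the example.

```python
import math
from collections import defaultdict


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def fit_regularized_movie_effects(ratings, lam):
    """Fit the global mean and a penalized per-movie effect.

    ratings: list of (user_id, movie_id, rating) tuples.
    lam: penalty that shrinks effects of rarely rated movies toward zero.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    residual_sums = defaultdict(float)
    counts = defaultdict(int)
    for _, movie, rating in ratings:
        residual_sums[movie] += rating - mu
        counts[movie] += 1
    # Dividing by (n + lam) instead of n is the regularization step.
    effects = {m: residual_sums[m] / (counts[m] + lam) for m in residual_sums}
    return mu, effects


def predict(mu, effects, movie):
    """Predict a rating; unseen movies fall back to the global mean."""
    return mu + effects.get(movie, 0.0)


# Toy data, invented for illustration only.
train = [
    ("u1", "m1", 5.0), ("u2", "m1", 4.0), ("u3", "m1", 5.0),
    ("u1", "m2", 1.0), ("u2", "m3", 3.0),
]
test = [("u3", "m2", 2.0), ("u1", "m3", 3.0)]

mu, effects = fit_regularized_movie_effects(train, lam=1.0)
actual = [r for _, _, r in test]
baseline_rmse = rmse(actual, [mu] * len(test))
model_rmse = rmse(actual, [predict(mu, effects, m) for _, m, _ in test])

print(f"baseline RMSE: {baseline_rmse:.3f}")
print(f"regularized movie-effect RMSE: {model_rmse:.3f}")
```

In practice the penalty would be tuned on a validation split rather than fixed, and user effects or matrix factorization would be layered on top of the movie effect.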
Harvard COVID Prediction Capstone
(Machine Learning, Prediction Model, Data Visualization, R)
This was the second Capstone project I chose as part of the Harvard Data Science Certificate. The goal of this project was to build a Machine Learning prediction model for any dataset of my choosing. At that time, Alberta was struggling with a severe fourth wave of COVID-19, which made national and a few international headlines. So, I built a model to predict deaths due to COVID-19 based on several factors such as gender, age & location in the province. The challenge in this project was that the dataset was severely imbalanced: deaths made up less than 1% of the total case count, so the algorithms couldn’t be “trained” using regular methods. To overcome this issue, I used over-sampling & under-sampling techniques to train the different algorithms. I also compared the accuracies of several algorithms like glm, Ctree & Random Forest to achieve the highest prediction accuracy. Details of the code & the final PDF report are in my GitHub repository linked to the image & the title of this project. This repository contains a PDF report, an Rmd file & an R script file.
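To make the over-sampling idea concrete, here is a minimal Python sketch of random over-sampling: minority-class rows are duplicated at random until every class is as large as the biggest one. It is not the R code from the repository; the `outcome` field, the toy records and the helper name `random_oversample` are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Duplicate minority-class rows at random until all classes match
    the size of the largest class. Returns a new shuffled list."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy training set, invented for illustration: 1% of cases are deaths.
training_rows = (
    [{"age_group": "30-39", "outcome": "recovered"} for _ in range(198)]
    + [{"age_group": "80+", "outcome": "died"} for _ in range(2)]
)

balanced_rows = random_oversample(training_rows, label_key="outcome")
class_counts = Counter(row["outcome"] for row in balanced_rows)
print(class_counts)
```

Resampling like this is normally applied to the training split only, so that the test set keeps the real class proportions. Accuracy alone can also look high on imbalanced data, so measures such as sensitivity for the rare class are worth checking alongside it.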
This is a data wrangling / extraction project that I completed to solve a problem with inventory traceability. An organization's inventory tracking system could only generate reports showing the current inventory status & never kept a historical record, so inventory status could not be tracked over time. To solve this problem, I wrote code in R that imports the inventory reports & merges them into one file. That file can then be used in R for visualization or exported to a spreadsheet. The most challenging part of this project was detecting patterns in product names & properties so they would be imported correctly into R, as the reports varied over time. In this example, I have attached what each inventory report looked like before extraction (Weekly Inventory) & what the chart of inventory tracked over time looks like after extraction (Extracted Inventory).
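The merge step can be sketched as follows, under stated assumptions: each report is a snapshot of (product name, quantity) pairs keyed by its report date, and product names vary only in case, punctuation and spacing. This is an illustrative Python sketch with made-up products, not the original R code, and the real reports needed more elaborate pattern matching than the single regular expression used here.

```python
import re
from collections import defaultdict


def normalize_name(raw_name):
    """Collapse case, punctuation and spacing so name variants match."""
    return re.sub(r"[^a-z0-9]+", " ", raw_name.lower()).strip()


def merge_snapshots(snapshots):
    """Combine dated snapshots into one long-format history.

    snapshots: dict mapping an ISO date string to a list of
    (raw_product_name, quantity) tuples from that report.
    """
    history = []
    for report_date in sorted(snapshots):
        totals = defaultdict(int)
        for raw_name, quantity in snapshots[report_date]:
            totals[normalize_name(raw_name)] += quantity
        for product in sorted(totals):
            history.append(
                {"date": report_date, "product": product, "quantity": totals[product]}
            )
    return history


# Hypothetical weekly reports; names and quantities are invented.
weekly_reports = {
    "2021-01-11": [("WIDGET A", 7), ("gasket_b", 6)],
    "2021-01-04": [("Widget-A", 10), ("Gasket  B", 4)],
}

history = merge_snapshots(weekly_reports)
widget_series = [row["quantity"] for row in history if row["product"] == "widget a"]
print(widget_series)
```

Keeping the result in long format (one row per date and product) makes it straightforward to plot each product over time or export the table to a spreadsheet.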
Data has been masked to protect data privacy
This project is the one that made me realize how significant data analysis can be in the development of an organization. This organization had kept track of ultrasonic inspections in a report for one of their most profitable products but could never get any useful insights from that report. They were struggling to keep track of the performance of that product from a quality standpoint & needed to figure out ways to improve product quality by decreasing defect rates.
When I investigated the report they were using, I found that it kept track of defects in a very clear way, but each record was written in "sentence" form, like a comment. This prevented any kind of analysis; the report was intended just for record keeping.
I then created a new spreadsheet, tailored to that product & the inspection methods used, to capture all the data that would be required whenever analysis was needed. Then I compiled the data for the 5 previous years into that spreadsheet & asked the organization to keep using it going forward.
Finally, I performed failure analysis on that product using the spreadsheet I created & gave them valuable insights that helped them track this product's failure rates for the first time in the company's history. Moreover, those insights helped highlight some issues with the production process, which led to reducing failure rates by more than 50%.
Please click on the image or the project title for details
Mechanical Properties Prediction Model
(Prediction Model, Linear Regression, Data Visualization, R)
In this project, the company wanted to build a prediction model to optimize the use of raw materials in manufacturing. They needed to reduce costs by reducing the use of the most expensive raw material in the manufacturing process without compromising the mechanical properties of their product. However, the cost of achieving this goal by trial & error would have been enormous as each trial costs around $100,000.
To achieve this goal, I used R to explore the manufacturing data, including chemical compositions & the resulting mechanical properties, and to build a linear regression prediction model.
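The general idea of using a fitted regression to choose a material level can be sketched as below. This is a single-variable least-squares example in Python with invented numbers, not the masked company data or the R model from the repository; the variable names, the units and the required strength of 55 are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: content of the expensive raw material (%) against a
# measured mechanical property (arbitrary strength units).
material_pct = [1.0, 2.0, 3.0, 4.0]
strength = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(material_pct, strength)

# Smallest material content the fitted line predicts will still meet
# a hypothetical minimum strength requirement.
required_strength = 55.0
min_material_pct = (required_strength - intercept) / slope

print(f"strength = {intercept:.1f} + {slope:.1f} * material_pct")
print(f"minimum material content: {min_material_pct:.2f}%")
```

A real model of this kind would use several composition variables and would only be trusted inside the range of compositions present in the data, since a linear fit says little about values it has never seen.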
The result was a 64% reduction in the most expensive raw material used in manufacturing without compromising the mechanical properties.
Details of the code & the final PDF report are in my GitHub repository linked to the image & the title of this project. This repository contains a PDF report & an Rmd file.
Data has been masked to protect data privacy
This project covers the analysis of complex wireline data to identify Oil & Gas pay zones and to provide customers on wellsites with a preliminary formation evaluation. Examples of the different methods, techniques, tools & analyses are provided on the page linked to the project title or the image.