My data highlights
- Camila Matoba
- Mar 22, 2022
Updated: May 1, 2023
Over the years I have had the opportunity to work on multiple data-driven projects, and I learned a lot across different industries and data challenges. Here is an overview of my highlights. Some projects involve confidential client data, so I cannot publish the code or go into detail about the solution. For transparency: some of them are personal projects or were done in an (unpaid) partnership through my university. These are explicitly marked as [Unremunerated]. Even for these, I only publish the GitHub repository and dataset when the project used exclusively public datasets.
Last but not least, I would like to thank all the partners for their time, commitment and, of course, data. These projects are my highlights not only because of their results, but also because I enjoyed working on them.
Customer satisfaction text analysis (2022)
Partner: EY Netherlands, for ***
Industry: Healthcare
Dataset: Interview text data covering multiple questions
Goal: Extracting insights from the interview data
Challenge: ***
Solution: ***
Tool: spaCy, EmoRoBERTa, Sumy, LDA (Python)
Analysis of facility blind spots in the Netherlands (2022)
Partner: InnoBeweegLab [Unremunerated]
Industry: Public affairs, facilities
Dataset: Public neighbourhood-level data from the Dutch government, plus scraped facilities data
Goal: Finding specific neighbourhoods that have an unfulfilled need for a certain facility, according to the neighbourhood profile.
Challenge: It is hard to obtain precise data about facility locations and types for the whole country.
Solution: The app is available at https://blind-spot-map.herokuapp.com/
Tool: GeoPandas, Heroku, OverpassTurbo
Analysis of road traffic accidents in the UK (2022)
Partner: Technische Universiteit Eindhoven (University project) [Unremunerated]
Industry: Automobile, Public affairs
Dataset: Public data on road traffic accidents in the UK
Goal: Improving road safety
Challenge: Making the tool as interactive and intuitive as possible
Solution: The tool can be accessed at https://github.com/octokami/uk_road_safety
Tool: Dash (Python)
Stock prediction from news articles (2022)
Partner: Technische Universiteit Eindhoven [Unremunerated]
Industry: Financial, Publishing
Dataset: Stock Values from Yahoo and News from Kaggle
Goal: Evaluating the effect of news sentiment about a company on its stock market performance
Challenge: It is difficult to isolate the effect of the news from other external factors.
Solution: https://github.com/octokami/news_stock_market
Tool: GaussianNB (Python)
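To illustrate what a Gaussian Naive Bayes classifier does, here is a minimal from-scratch sketch in plain Python. It is not the project code: the two features (a day's average news sentiment and number of articles) and the "up"/"down" labels are hypothetical. Each class gets a prior and, per feature, a Gaussian mean and variance; a new day is assigned to the class with the highest log prior plus summed per-feature log likelihoods, which is where the "naive" independence assumption comes in.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate each class's prior and per-feature Gaussian mean and variance."""
    model = {}
    n = len(y)
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))  # one tuple of values per feature
        model[label] = {
            "prior": len(rows) / n,
            "means": [mean(col) for col in columns],
            # Small constant keeps variances strictly positive.
            "vars": [pvariance(col) + var_smoothing for col in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log density of a normal distribution N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, x):
    """Return the class with the highest log prior + sum of feature log likelihoods."""
    def score(label):
        params = model[label]
        return math.log(params["prior"]) + sum(
            log_gaussian(v, mu, var)
            for v, mu, var in zip(x, params["means"], params["vars"])
        )
    return max(model, key=score)


# Hypothetical training data: [average sentiment, number of articles] per day.
X = [[0.6, 12], [0.4, 8], [0.7, 15], [-0.5, 10], [-0.3, 6], [-0.6, 14]]
y = ["up", "up", "up", "down", "down", "down"]

model = fit_gaussian_nb(X, y)
print(predict(model, [0.5, 9]))    # up
print(predict(model, [-0.4, 11]))  # down
```

In practice, scikit-learn's `GaussianNB` class provides the same model behind a `fit`/`predict` interface; writing it out by hand mainly shows why strongly separated sentiment values dominate the decision.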
Breast cancer 3D model prediction (2021)
Partner: MKBLab for XYZ Imaging (Netherlands)
Industry: Healthcare and Research
Dataset: Multiple pictures of the breast taken from different angles
Goal: Finding a less invasive, radiation-free method for breast cancer detection
Challenge: In healthcare, recall must be very high, because false negatives (missed cases) have severe consequences.
Solution: Building a 3D model from the pictures, and a model that moves to detect abnormalities
Tool: Torchvision (Python)
News article (in Dutch): https://www.cursor.tue.nl/nieuws/2021/juni/week-4/borstkanker-opsporen-met-3d-fotografie/
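The recall requirement can be made concrete with a short Python sketch. The labels and scores below are made up for illustration and do not come from the project. Recall is TP / (TP + FN), so every missed abnormality lowers it, and lowering the decision threshold is one common way to trade extra false positives for fewer false negatives.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): the share of actual positives that are detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 1 = abnormality present, 0 = no abnormality.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical model scores (estimated probability of an abnormality).
scores = [0.9, 0.8, 0.6, 0.35, 0.55, 0.3, 0.2, 0.1]

for threshold in (0.5, 0.3):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    # 0.5 misses the fourth case (recall 0.75); 0.3 catches all four (recall 1.0)
    # at the cost of one more false positive.
    print(threshold, recall(y_true, y_pred))
```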
Parking peak hours prediction (2021)
Partner: Park Now (Scotland) [Unremunerated]
Industry: Automobile
Dataset: 3 years of parking transactions
Goal: Predicting peak hours for parking locations, so that the end-user can avoid them
Challenge: Outliers during the Covid-19 year
Solution: A pipeline combining a one-hot encoder with a random forest regressor
Tool: sklearn (Python)
Supermarket stock forecast (2019): Bachelor thesis project
Partner: Samsung Cello (logistics sector of Samsung Brazil) [Unremunerated]
Industry: Logistics
Dataset: 1.5 years of grocery transactions from one supermarket
Goal: Predicting consumption in a supermarket chain to avoid stockouts and to improve profits by reducing waste
Challenge: ***
Solution: ***
Tool: ARIMA (Python)
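ARIMA models a series through its differences. As a hedged illustration (not the thesis code), the sketch below fits the simplest non-trivial case, ARIMA(1,1,0) without a constant, by conditional least squares on made-up daily sales figures: the series is differenced once (d = 1), an AR(1) coefficient `phi` is estimated on the differences, and forecasts are built by recursing on the differences and adding them back to the last observed level.

```python
def difference(series):
    """First difference of the series: the 'I' step of ARIMA with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares estimate of phi in z_t = phi * z_{t-1} + e_t (no constant)."""
    num = sum(prev * cur for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den


def forecast_arima_110(series, steps):
    """Fit ARIMA(1,1,0) without a constant and forecast `steps` periods ahead."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # AR(1) recursion on the differenced series
        level += last_diff           # undo the differencing
        forecasts.append(level)
    return phi, forecasts


# Hypothetical daily units sold for one product.
sales = [100, 104, 107, 111, 113, 118, 120, 125]
phi, forecast = forecast_arima_110(sales, steps=2)
print(round(phi, 3), [round(f, 1) for f in forecast])
```

Without a constant the forecast differences decay toward zero, so the predicted level flattens out. A library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) would instead estimate the parameters by maximum likelihood and support higher orders and trend terms.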
Data integration and analysis (2019)
Partner: Philips (Brazil)
Industry: Healthcare
Dataset: Information about medical equipment and customer satisfaction from SAP, Salesforce, QlikView, Qlik Sense and Tableau
Goal: Developing, together with the Service Managers team, a tool to present service quality KPIs to customers, focused on Uptime, OSRT (Onsite Response Time), ETTR (Elapsed Time To Repair) and MTBF (Mean Time Between Failures). The main need was to integrate and validate all data sources in one tool that supported data-driven decisions.
Challenge: ***
Solution: ***
Tools used: Power BI, MS Access, MS Flows
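Once the data sources are integrated, the four KPIs reduce to simple arithmetic over service tickets. The Python sketch below uses simplified, assumed definitions (the exact Philips definitions are not given here): downtime runs from the failure report to the repair, OSRT is the mean time from report to engineer on site, ETTR is the mean time from report to repair, uptime is the share of the period not spent down, and MTBF is operating time divided by the number of failures. The `ServiceTicket` record and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceTicket:
    """Hypothetical ticket; all times are hours since the start of the period."""
    reported_at: float  # customer reports the failure
    onsite_at: float    # engineer arrives on site
    repaired_at: float  # equipment is back in operation


def service_kpis(tickets, period_hours):
    """Compute Uptime, OSRT, ETTR and MTBF under the assumed definitions above."""
    if not tickets:
        raise ValueError("MTBF, OSRT and ETTR are undefined without failures")
    n = len(tickets)
    # Assumption: equipment counts as down from the report until the repair.
    downtime = sum(t.repaired_at - t.reported_at for t in tickets)
    operating = period_hours - downtime
    return {
        "uptime_pct": 100 * operating / period_hours,
        "osrt_h": sum(t.onsite_at - t.reported_at for t in tickets) / n,
        "ettr_h": downtime / n,
        "mtbf_h": operating / n,
    }


# Two hypothetical failures of one device in a 30-day (720-hour) month.
tickets = [ServiceTicket(100, 104, 110), ServiceTicket(400, 406, 418)]
print(service_kpis(tickets, period_hours=720))
```

With these two tickets, the sketch gives an OSRT of 5 h, an ETTR of 14 h, an MTBF of 346 h and an uptime of about 96.1%.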