My data highlights

  • Writer: Camila Matoba
  • Mar 22, 2022
  • 1 min read

Updated: May 1, 2023

Over the years I have had the opportunity to work on multiple data-driven projects, and I learned a lot across different industries and data challenges. Here's an overview of my highlights. Some projects involve confidential client data, so I cannot publish the code or go deeper into the solution. For transparency, some of them are personal projects or work I did in an (unpaid) partnership for my university. These are explicitly marked as [Unremunerated]. Even then, I am only publishing the GitHub repository and dataset for the ones that used exclusively public datasets.


Last but not least: I would like to thank all the partners for their time, commitment and, of course, data. These projects are my highlights not only because of the results, but also because I enjoyed working on them.

Customer satisfaction Text Analysis (2022)

Partner: EY Netherlands, for ***

Industry: Healthcare

Dataset: Interview text data, covering multiple questions

Goal: Understanding the insights in the data

Challenge: ***

Solution: ***

Tool: spaCy, EmoRoberta, Sumy, LDA (Python)

Analysis of facilities blind spots in the Netherlands (2022)

Partner: InnoBeweegLab [Unremunerated]

Industry: Public affairs, facilities

Dataset: Public data from the Dutch government at neighbourhood level, plus scraped facilities data.

Goal: Finding specific neighbourhoods that have an unfulfilled need for a certain facility, according to the neighbourhood profile.

Challenge: It is hard to obtain precise data about facility locations and types for the whole country.

Solution: The app is available at https://blind-spot-map.herokuapp.com/

Tool: GeoPandas, Heroku, OverpassTurbo

repo: https://github.com/octokami/NL_blindspots



Analysis of road traffic accidents in the UK (2022)

Partner: Technische Universiteit Eindhoven (University project) [Unremunerated]

Industry: Automobile, Public affairs

Dataset: Public Data of road traffic accidents in the UK

Goal: Improving Road Safety

Challenge: Making it as interactive and intuitive as possible

Solution: The tool can be accessed at https://github.com/octokami/uk_road_safety

Tool: Dash (Python)




Stock prediction from news articles (2022)

Partner: Technische Universiteit Eindhoven [Unremunerated]

Industry: Financial, Publishing

Dataset: Stock Values from Yahoo and News from Kaggle

Goal: Evaluating the effect of sentiment in news about a company on its performance in the stock market.

Challenge: It is difficult to separate the effect of news sentiment from external factors.

Solution: https://github.com/octokami/news_stock_market

Tool: GaussianNB (Python)
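To give an idea of what a GaussianNB classifier does in this kind of setup, here is a minimal sketch. It is not the code from the repository: the data is synthetic, and the single feature (a daily sentiment score between -1 and 1) and the label (whether the price went up) are assumptions made only for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): one sentiment score per day in [-1, 1].
n_days = 500
sentiment = rng.uniform(-1, 1, size=n_days)

# Synthetic label (assumption): the price tends to go up when sentiment is
# positive, with noise standing in for all the external factors.
went_up = (sentiment + rng.normal(0, 0.5, size=n_days) > 0).astype(int)

X = sentiment.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix

# Keep the last 100 days aside for evaluation.
split = 400
clf = GaussianNB().fit(X[:split], went_up[:split])
accuracy = clf.score(X[split:], went_up[split:])

print(f"Hold-out accuracy on synthetic data: {accuracy:.2f}")
```

On real prices the noise is far larger than in this toy example, which is exactly the challenge mentioned above.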


Breast cancer 3D model prediction (2021)

Partner: MKBLab for XYZ Imaging (Netherlands)

Industry: Healthcare and Research

Dataset: Multiple breast pictures from different angles

Goal: Finding a less invasive method to perform breast cancer detection that does not use radiation

Challenge: For healthcare, the recall must be very high as False Negatives have extremely undesirable consequences.

Solution: Making a 3D model out of the pictures and a model that moves to detect abnormalities

Tool: Torchvision (Python)

News article (in Dutch): https://www.cursor.tue.nl/nieuws/2021/juni/week-4/borstkanker-opsporen-met-3d-fotografie/

Parking peak hours prediction (2021)

Partner: Park Now (Scotland) [Unremunerated]

Industry: Automobile

Dataset: 3 years of parking transactions

Goal: Predicting peak hours for parking locations, so that the end-user can avoid them

Challenge: Covid-19 year outliers

Solution: A pipeline combining a one-hot encoder with a Random Forest regressor

Tool: sklearn (Python)
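A pipeline of a one-hot encoder followed by a Random Forest regressor can be sketched as below. The client data is confidential, so everything here is synthetic: the feature columns (location, weekday, hour), the shape of the demand curve and the helper `predicted_peak_hour` are assumptions for illustration, not the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Hypothetical features, all treated as categorical: [location_id, weekday, hour].
n_rows = 2000
locations = rng.integers(0, 3, size=n_rows)
weekdays = rng.integers(0, 7, size=n_rows)   # 0 = Monday ... 6 = Sunday
hours = rng.integers(0, 24, size=n_rows)
X = np.column_stack([locations, weekdays, hours])

# Synthetic target (assumption): transactions peak around 09:00 and 18:00
# on weekdays and stay flat at the weekend.
is_weekday = weekdays < 5
peak = np.exp(-((hours - 9) ** 2) / 8) + np.exp(-((hours - 18) ** 2) / 8)
y = 10 + 40 * peak * is_weekday + 5 * locations + rng.normal(0, 2, size=n_rows)

# The encoder turns each category into its own 0/1 column before the forest sees it.
model = Pipeline([
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
])
model.fit(X, y)


def predicted_peak_hour(location: int, weekday: int) -> int:
    """Return the hour of the day with the highest predicted demand."""
    grid = np.array([[location, weekday, hour] for hour in range(24)])
    return int(np.argmax(model.predict(grid)))


print("Predicted peak hour, location 0, Tuesday:", predicted_peak_hour(0, 1))
```

Wrapping both steps in one Pipeline means the same encoding is applied at training and prediction time, so the end user only has to pass a location, weekday and hour.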

Supermarket stock forecast (2019): Bachelor thesis project

Partner: Samsung Cello (Logistics sector of Samsung Brazil) [Unremunerated]

Industry: Logistics

Dataset: 1.5 years of groceries transactions within one supermarket.

Goal: Predicting consumption in a chain of supermarkets to avoid zero-level stock disruption and improve profits by avoiding waste

Challenge: ***

Solution: ***

Tools: ARIMA (Python)

Data integration and analysis (2019)

Partner: Philips (Brazil)

Industry: Healthcare

Dataset: Information about the medical equipment and customer satisfaction from SAP, SalesForce, QlikView, QlikSense and Tableau

Goal: A tool, developed with the Service Managers team, to present service quality KPIs for customers, focused on Uptime, OSRT (Onsite Response Time), ETTR (Elapsed Time To Repair) and MTBF (Mean Time Between Failures). The main need was integrating and validating all data sources in one tool that supports data-driven decisions.

Challenge: ***

Solution: ***

Tools used: Power BI, MS Access, MS Flows
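The actual tool was built in Power BI, but the four KPIs can be illustrated with a few lines of Python. The definitions below are common textbook ones and are my assumption here, not necessarily the exact formulas used in the project; the `ServiceCall` record and the two example calls are made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ServiceCall:
    opened: datetime    # customer reports the failure
    onsite: datetime    # engineer arrives at the customer
    repaired: datetime  # equipment is back in service


def hours(delta) -> float:
    return delta.total_seconds() / 3600


def service_kpis(calls, period_start, period_end):
    """Compute Uptime, OSRT, ETTR and MTBF under the assumed definitions."""
    if not calls:
        raise ValueError("at least one service call is needed")
    total_h = hours(period_end - period_start)
    downtime_h = sum(hours(c.repaired - c.opened) for c in calls)
    n = len(calls)
    return {
        "uptime_pct": 100 * (total_h - downtime_h) / total_h,
        "osrt_h": sum(hours(c.onsite - c.opened) for c in calls) / n,
        "ettr_h": downtime_h / n,
        "mtbf_h": (total_h - downtime_h) / n,
    }


# Made-up example: two failures of one machine in a 30-day period.
calls = [
    ServiceCall(datetime(2019, 1, 5, 8), datetime(2019, 1, 5, 12), datetime(2019, 1, 5, 20)),
    ServiceCall(datetime(2019, 1, 20, 10), datetime(2019, 1, 20, 12), datetime(2019, 1, 20, 16)),
]
kpis = service_kpis(calls, datetime(2019, 1, 1), datetime(2019, 1, 31))
print(kpis)
```

In practice the timestamps came from several systems, which is why validating and integrating the sources was the main need.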




©2022 by Camila Matoba
KvK: 82791686
