ClearCast - Captech (via Ballastlane) 2025
|
Designed and implemented an Emission Reporting System application that forecasts future emissions based on historical data. Developed robust processes for extracting unstructured data from complex Excel files, capturing both data and detailed metadata. Utilized generative AI (GenAI) with Gemini and LangChain to standardize and structure metadata files, enabling automated generation of accurate emission reporting forms for frontend applications. Technologies Used:
|
Kroger Throughput Forecast (via BCG X) 2024
|
Designed and implemented a Power BI dashboard to empower regional managers with actionable insights into logistics and warehouse costs, segmented by product and region. Leveraged PySpark for data transformation and orchestrated the pipeline using Databricks, seamlessly integrating data from Azure Blob Storage. Additionally, developed an automated pipeline to generate customized performance reports for regional and store managers. Utilized LangChain and GenAI to create personalized emails, comparing product performance across stores and regions, enabling data-driven decision-making. Technologies Used:
|
Casas Bahia EVA (via BCG X) 2024
|
Developed and deployed predictive pipelines using PySpark and ML models in Databricks to forecast top-selling items based on product seasonality. Designed KPIs for store managers to monitor store and salesperson performance, enabling targeted sales strategies. Leveraged GenAI to deliver customized insights based on individual salesperson performance, further enhancing strategic decision-making and sales efficiency. The solution also provided sales teams with real-time updates on the most profitable items and sub-items, driving a 5% increase in total sales and a 14% boost in profits. All datasets were cataloged and managed in Azure Data Catalog for seamless accessibility. Built interactive Plotly and Power BI dashboards and integrated them into a mobile app, delivering actionable insights directly to stakeholders. Additionally, tracked user app engagement and behavior using GA4 (Google Analytics 4) and Google BigQuery, enabling data-driven optimizations to the app experience. Technologies Used:
|
Saint-Gobain (via BCG X) 2024
|
Optimized and refactored existing Argo data pipelines, achieving a 40% reduction in run time by improving the PySpark code and parameterizing cluster sizes for efficient data processing. These improvements resulted in $15K in cost savings for the organization. Redesigned pipelines to better align with task dependencies and model scheduling. Additionally, streamlined data ingestion into PostgreSQL, giving business stakeholders direct access to the data. Technologies Used:
|
Teknosa (via BCG X) 2024
|
Designed and implemented an end-to-end data delivery system using FastAPI to provide KPIs for Teknosa’s online store. The solution enabled API-based consumption of key metrics, giving stakeholders real-time insights. The architecture used Azure Blob Storage for data ingestion, Polars and Pandas for efficient data processing, and Airflow for pipeline orchestration. Processed data was ingested into MySQL and served through the API. The API used GenAI to generate customized HTML messages integrated directly with KPIs from the data processing pipeline. This approach provided tailored product recommendations and detailed explanations of why specific products were ideal for each user, enhancing customer engagement and driving sales. By optimizing the tech stack and incorporating GenAI-driven personalization, the project significantly reduced data processing costs and earned strong client approval. Technologies Used:
|
Intelligent Sales Assistant BCG X 2023
|
Developed a mobile application to help salespersons and managers track sales progress and performance metrics. Played a multifaceted role in the project, contributing to both backend development and data engineering. Built the backend using Django and transformed raw data from GA4 for actionable insights. Designed an S3 data lake to enable advanced calculations, such as product forecasting and sales predictions, using AWS Lambda for transformations and AWS Step Functions for orchestration. Integrated GenAI for data processing, extracting useful information from text-based columns. Analyzed processed data in AWS Athena and shared insights with clients. Additionally, contributed to software engineering by developing an admin panel in ReactJS to manage feature access and application permissions. Technologies Used:
|
Gen AI Santander (via BCG X) 2023
|
Engineered Santander’s first GenAI project in Brazil, revamping the IVR (Interactive Voice Response) system to enhance customer satisfaction. The solution used speech-to-text technology to transcribe customer conversations, enabling real-time analysis and reducing the steps required for attendants to resolve queries. Using LangChain and PySpark Streaming, the system generated embeddings from instructional documents stored in MongoDB and retrieved the most relevant solutions for customer issues. These embeddings were stored in a vectorized CosmosDB database for efficient retrieval. The project achieved a 22% reduction in IVR handling time and a 20% increase in customer satisfaction. Technologies Used:
|
Distrito - Data Enrichment and SaaS Solution Development 2022
|
The goal was to expand and enrich a small list of enterprises by capturing additional data from external sources. Developed custom web scrapers tailored to extract data from multiple websites, storing the responses as JSON files in Amazon S3. Used Airflow to orchestrate the pipeline, triggering data processing whenever new files landed in the data lake. Depending on the data source, leveraged Pandas and PySpark for transformation and enrichment, ensuring scalability and efficiency. Technologies Used:
|
Distrito - Cloud Infrastructure and Data Warehouse Architecture 2022
|
Designed and implemented a cloud-based environment on AWS to support growing data needs. Built a data warehouse in MySQL following Kimball's dimensional modeling approach to store enriched enterprise data, which was then shared through APIs and Looker dashboards as part of the SaaS solution. Later, enhanced the architecture by implementing a Delta Lake solution, introducing refinement layers to improve data quality and reliability. Technologies Used:
|
Distrito - Data Integration and Automation 2022
|
With multiple data sources (starting with HubSpot), managing the ELT process became complex. Deployed Airbyte on Kubernetes to streamline data ingestion and automate the ELT process. Post-ingestion, used EMR with PySpark for data transformation and refinement. Technologies Used:
|
Distrito - CRUD System Development 2021
|
Developed a CRUD system using Flask to allow users to add, update, or delete enterprise records in the system. Deployed the solution on AWS EC2, ensuring scalability and accessibility for end-users. Technologies Used:
|