Sneakers Web Crawlers

📣📣📣 A couple of days ago, we released our first contribution to the open-source community from the cotatenis project: a collection of sneaker images published as a Kaggle dataset.

🕸️🕸️🕸️ And today I’m thrilled to announce that we’ve decided to release 17 code repositories that made up the ingestion layer of the cotatenis project.

These projects are a collection of web crawlers that collected data from a series of stores and websites and dumped it into our data lake.

😲 That’s a lot of code, isn’t it? All of it is organized and scheduled by Airflow, with each store/site set up as its own DAG. Before collecting anything, we run health checks to make sure our spiders will find each target in the state they expect, so that the data collection can succeed.

This logic is encapsulated in dedicated code that runs before each data-gathering step inside our DAGs.
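The health-check-then-crawl pattern can be sketched in plain Python. This is a minimal, hypothetical illustration, not the code from the released repositories: the names `check_target` and `run_crawl_if_healthy`, the idea of looking for HTML "markers" that a spider's selectors depend on, and the injected `fetch` callable are all assumptions. Injecting the fetcher keeps the check testable without network access.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# Hypothetical fetcher type: returns (HTTP status code, response body).
# In production it could wrap urllib.request or requests; here it is injected.
Fetcher = Callable[[str], Tuple[int, str]]


@dataclass
class HealthCheckResult:
    url: str
    ok: bool
    reason: str


def check_target(url: str, expected_markers: Iterable[str], fetch: Fetcher) -> HealthCheckResult:
    """Verify that a store page still looks the way the spider expects.

    expected_markers are hypothetical HTML fragments (e.g. a CSS class the
    spider's selectors rely on); if one is missing, the layout probably changed.
    """
    try:
        status, body = fetch(url)
    except OSError as exc:  # network errors (urllib's URLError is an OSError)
        return HealthCheckResult(url, False, f"fetch failed: {exc}")
    if status != 200:
        return HealthCheckResult(url, False, f"unexpected status {status}")
    missing = [m for m in expected_markers if m not in body]
    if missing:
        return HealthCheckResult(url, False, f"missing markers: {missing}")
    return HealthCheckResult(url, True, "ok")


def run_crawl_if_healthy(
    url: str,
    expected_markers: Iterable[str],
    fetch: Fetcher,
    crawl: Callable[[str], None],
) -> bool:
    """Run the crawl only if the health check passes; return whether it ran."""
    result = check_target(url, expected_markers, fetch)
    if not result.ok:
        print(f"Skipping crawl of {url}: {result.reason}")
        return False
    crawl(url)
    return True


if __name__ == "__main__":
    # Offline demo: a fake fetcher stands in for a real HTTP call.
    fake_pages = {"https://store.example/sneakers": (200, '<div class="product-card">...</div>')}
    ran = run_crawl_if_healthy(
        "https://store.example/sneakers",
        ['class="product-card"'],
        lambda url: fake_pages[url],
        lambda url: print(f"crawling {url}"),
    )
    print(ran)
```

Inside an Airflow DAG, a check like this could run as an upstream task, for example through a `ShortCircuitOperator`, which skips the downstream tasks when its callable returns a falsy value. Raising an exception to fail the task instead is another option, depending on whether a broken target should alert someone or be skipped quietly.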

I’m very excited 🤩 to hear what you think about this launch, and I hope you enjoy it!