SOON
OSS Trends Scraper
building
The OSS Trends Scraper is a Python microservice that automates the discovery and enrichment of trending GitHub repositories. Built on FastAPI with a fully asynchronous pipeline, it uses httpx to fetch the GitHub Trending pages concurrently and BeautifulSoup4 to parse them. Each discovered repository is then enriched through the official GitHub REST API with additional metadata, including owner profiles and activity metrics.
To stay within GitHub's rate limits, the service schedules and batches its requests with APScheduler. Enriched records are upserted into a PostgreSQL database (hosted on Supabase) through asyncpg, which keeps database I/O non-blocking. The microservice then synchronizes the data with the SourceSurf backend via secure webhooks, providing a continuous stream of fresh open-source leads.
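A minimal sketch of the batching and upsert steps follows. Everything named here is an assumption for illustration: the `repositories` table, its columns, the unique constraint on `full_name`, the batch size, and the pause length are hypothetical, and the scheduler wiring is left out.

```python
import asyncio
from typing import Awaitable, Callable, Iterator

# Hypothetical schema: requires a UNIQUE constraint on repositories.full_name.
UPSERT_SQL = """
INSERT INTO repositories (full_name, stars, owner_login, updated_at)
VALUES ($1, $2, $3, now())
ON CONFLICT (full_name) DO UPDATE
SET stars = EXCLUDED.stars,
    owner_login = EXCLUDED.owner_login,
    updated_at = now()
"""


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def enrich_in_batches(
    slugs: list,
    enrich: Callable[[str], Awaitable],
    batch_size: int = 10,
    pause_seconds: float = 60.0,
) -> list:
    """Run `enrich` concurrently within a batch and pause between batches."""
    results = []
    batches = list(batched(slugs, batch_size))
    for index, batch in enumerate(batches):
        results.extend(await asyncio.gather(*(enrich(s) for s in batch)))
        if index < len(batches) - 1:  # no pause needed after the last batch
            await asyncio.sleep(pause_seconds)
    return results


async def upsert_repos(dsn: str, rows: list) -> None:
    """Insert new rows or update existing ones, keyed on full_name."""
    import asyncpg  # imported lazily so the batching helpers work without it

    conn = await asyncpg.connect(dsn)
    try:
        await conn.executemany(UPSERT_SQL, rows)
    finally:
        await conn.close()
```

A scheduler such as APScheduler could invoke `enrich_in_batches` on a fixed interval. The pause between batches is the simplest way to spread requests over time; a production service would more likely read the rate-limit headers GitHub returns and adapt the delay.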