scraping (@scraping) | Cosmocode

scraping

@scraping

scrapy

55.1k

@scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

firecrawl

37.5k

@mendableai

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

Jobs_Applier_AI_Agent_AIHawk

28.1k

@feder-cr

AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.

colly

24.1k

@gocolly

Elegant Scraper and Crawler Framework for Golang

Scrapegraph-ai

19.4k

@ScrapeGraphAI

Python scraper based on AI

crawlee

17.6k

@apify

Crawlee: a web scraping and browser automation library for Node.js to build reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

maigret

15.2k

@soxoj

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

requests-html

13.8k

@psf

Pythonic HTML Parsing for Humans™

webmagic

11.5k

@code4craft

A scalable web crawler framework for Java.

undetected-chromedriver

11.1k

@ultrafunkamsterdam

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

tabula

@tabulapdf

Tabula is a tool for liberating data tables trapped inside PDF files

awesome-web-scraping

@lorien

List of libraries, tools and APIs for web scraping and data processing.

autoscraper

6.7k

@alirezamika

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

ferret

5.8k

@MontFerret

Declarative web scraping

crawlee-python

5.6k

@apify

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.