Last translated: 16 Jun 2025


๐Ÿš€๐Ÿค– Crawl4AI: ์˜คํ”ˆ์†Œ์Šค LLM ์นœํ™”์  ์›น ํฌ๋กค๋Ÿฌ & ์Šคํฌ๋ž˜ํผ.


Crawl4AI๋Š” #1 ํŠธ๋ Œ๋”ฉ GitHub ์ €์žฅ์†Œ๋กœ, ํ™œ๋ฐœํ•œ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ์˜ํ•ด ์œ ์ง€๋ณด์ˆ˜๋˜๋ฉฐ LLM, AI ์—์ด์ „ํŠธ ๋ฐ ๋ฐ์ดํ„ฐ ํŒŒ์ดํ”„๋ผ์ธ์„ ์œ„ํ•œ ์ดˆ๊ณ ์† AI-ready ์›น ํฌ๋กค๋ง์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์˜คํ”ˆ์†Œ์Šค์ด๋ฉฐ ์œ ์—ฐํ•˜๊ณ  ์‹ค์‹œ๊ฐ„ ์„ฑ๋Šฅ์„ ์œ„ํ•ด ๊ตฌ์ถ•๋œ Crawl4AI๋Š” ๊ฐœ๋ฐœ์ž์—๊ฒŒ ํƒ์›”ํ•œ ์†๋„, ์ •ํ™•์„ฑ ๋ฐ ๋ฐฐํฌ ํŽธ์˜์„ฑ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

โœจ ์ตœ์‹  ์—…๋ฐ์ดํŠธ v0.6.0 ํ™•์ธํ•˜๊ธฐ

๐ŸŽ‰ ๋ฒ„์ „ 0.6.0์ด ์ถœ์‹œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค! ์ด ๋ฆด๋ฆฌ์Šค ํ›„๋ณด์—๋Š” ์ง€๋ฆฌ์  ์œ„์น˜ ๋ฐ ๋กœ์ผ€์ผ ์„ค์ •์„ ํ†ตํ•œ World-aware ํฌ๋กค๋ง, ํ…Œ์ด๋ธ”-ํˆฌ-๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ถ”์ถœ, ๋ธŒ๋ผ์šฐ์ € ํ’€๋ง ๋ฐ ์‚ฌ์ „ ์›Œ๋ฐ, ๋„คํŠธ์›Œํฌ ๋ฐ ์ฝ˜์†” ํŠธ๋ž˜ํ”ฝ ์บก์ฒ˜, AI ๋„๊ตฌ๋ฅผ ์œ„ํ•œ MCP ํ†ตํ•ฉ, ์™„์ „ํžˆ ๊ฐœ์„ ๋œ Docker ๋ฐฐํฌ๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค! ๋ฆด๋ฆฌ์Šค ๋…ธํŠธ ์ฝ๊ธฐ โ†’

๐Ÿค“ ๊ฐœ์ธ์ ์ธ ์ด์•ผ๊ธฐ

์ œ ์ปดํ“จํ„ฐ์™€์˜ ์—ฌ์ •์€ ์–ด๋ฆฐ ์‹œ์ ˆ๋กœ ๊ฑฐ์Šฌ๋Ÿฌ ์˜ฌ๋ผ๊ฐ‘๋‹ˆ๋‹ค. ์ปดํ“จํ„ฐ ๊ณผํ•™์ž์ด์‹  ์•„๋ฒ„์ง€๊ฐ€ ์ €์—๊ฒŒ Amstrad ์ปดํ“จํ„ฐ๋ฅผ ์†Œ๊ฐœํ•ด์ฃผ์…จ์ฃ . ๊ทธ ์ดˆ๊ธฐ ์‹œ์ ˆ์€ ๊ธฐ์ˆ ์— ๋Œ€ํ•œ ๋งค๋ ฅ์„ ๋ถˆ๋Ÿฌ์ผ์œผ์ผฐ๊ณ , ๊ฒฐ๊ตญ ์ปดํ“จํ„ฐ ๊ณผํ•™์„ ์ „๊ณตํ•˜๊ฒŒ ๋˜์—ˆ์œผ๋ฉฐ ๋Œ€ํ•™์› ์‹œ์ ˆ์—๋Š” NLP๋ฅผ ์ „๋ฌธํ™”ํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ ๋‹น์‹œ ์›น ํฌ๋กค๋ง์— ์ฒ˜์Œ ๋ฐœ์„ ๋“ค์˜€๊ณ , ์—ฐ๊ตฌ์ž๋“ค์ด ๋…ผ๋ฌธ์„ ์ •๋ฆฌํ•˜๊ณ  ์ถœํŒ๋ฌผ์—์„œ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•  ์ˆ˜ ์žˆ๋„๋ก ๋•๋Š” ๋„๊ตฌ๋ฅผ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์ถ”์ถœ ๊ธฐ์ˆ ์„ ์—ฐ๋งˆํ•˜๋Š” ๋„์ „์ ์ด๋ฉด์„œ๋„ ๋ณด๋žŒ ์žˆ๋Š” ๊ฒฝํ—˜์ด์—ˆ์ฃ .

2023๋…„์œผ๋กœ ๋„˜์–ด์™€, ์ €๋Š” ํ”„๋กœ์ ํŠธ๋ฅผ ์œ„ํ•œ ๋„๊ตฌ๋ฅผ ์ž‘์—… ์ค‘์ด์—ˆ๊ณ  ์›นํŽ˜์ด์ง€๋ฅผ ๋งˆํฌ๋‹ค์šด์œผ๋กœ ๋ณ€ํ™˜ํ•  ํฌ๋กค๋Ÿฌ๊ฐ€ ํ•„์š”ํ–ˆ์Šต๋‹ˆ๋‹ค. ํ•ด๊ฒฐ์ฑ…์„ ์ฐพ๋˜ ์ค‘, ์˜คํ”ˆ์†Œ์Šค๋ผ๊ณ  ์ฃผ์žฅํ•˜์ง€๋งŒ ๊ณ„์ • ์ƒ์„ฑ๊ณผ API ํ† ํฐ ์ƒ์„ฑ์ด ํ•„์š”ํ•œ ๊ฒƒ์„ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค. ๋” ๋‚˜์œ ๊ฒƒ์€ SaaS ๋ชจ๋ธ๋กœ $16์„ ์ฒญ๊ตฌํ–ˆ๊ณ  ํ’ˆ์งˆ์ด ์ œ ๊ธฐ์ค€์— ๋ฏธ์น˜์ง€ ๋ชปํ–ˆ์Šต๋‹ˆ๋‹ค. ์ขŒ์ ˆ๊ฐ์„ ๋А๋ผ๋ฉฐ, ์ด๋Š” ๋” ๊นŠ์€ ๋ฌธ์ œ์ž„์„ ๊นจ๋‹ฌ์•˜์Šต๋‹ˆ๋‹ค. ๊ทธ ์ขŒ์ ˆ๊ฐ์€ ํ„ฐ๋ณด ๋ถ„๋…ธ ๋ชจ๋“œ๋กœ ๋ฐ”๋€Œ์—ˆ๊ณ , ์ €๋Š” ์ง์ ‘ ํ•ด๊ฒฐ์ฑ…์„ ๋งŒ๋“ค๊ธฐ๋กœ ๊ฒฐ์ •ํ–ˆ์Šต๋‹ˆ๋‹ค. ๋‹จ ๋ฉฐ์น  ๋งŒ์— Crawl4AI๋ฅผ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ๋†€๋ž๊ฒŒ๋„, ์ด ํ”„๋กœ์ ํŠธ๋Š” ๊ธ‰์†๋„๋กœ ํผ์ ธ ์ˆ˜์ฒœ ๊ฐœ์˜ GitHub ์Šคํƒ€๋ฅผ ์–ป์œผ๋ฉฐ ๊ธ€๋กœ๋ฒŒ ์ปค๋ฎค๋‹ˆํ‹ฐ์™€ ๊ณต๊ฐ์„ ์–ป์—ˆ์Šต๋‹ˆ๋‹ค.

์ œ๊ฐ€ Crawl4AI๋ฅผ ์˜คํ”ˆ์†Œ์Šค๋กœ ๊ณต๊ฐœํ•œ ๋ฐ๋Š” ๋‘ ๊ฐ€์ง€ ์ด์œ ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฒซ์งธ, ์ œ ๊ฒฝ๋ ฅ ์ „๋ฐ˜์— ๊ฑธ์ณ ์ €๋ฅผ ์ง€์›ํ•ด์ค€ ์˜คํ”ˆ์†Œ์Šค ์ปค๋ฎค๋‹ˆํ‹ฐ์— ๋ณด๋‹ตํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋‘˜์งธ, ๋ฐ์ดํ„ฐ๋Š” ๋ชจ๋“  ์‚ฌ๋žŒ์ด ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•˜๋ฉฐ, ์œ ๋ฃŒ ๋ฒฝ ๋’ค์— ๊ฐ‡ํžˆ๊ฑฐ๋‚˜ ์†Œ์ˆ˜์— ์˜ํ•ด ๋…์ ๋˜์–ด์„œ๋Š” ์•ˆ ๋œ๋‹ค๊ณ  ๋ฏฟ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๊ฐœ๋ฐฉํ˜• ์ ‘๊ทผ์€ AI์˜ ๋ฏผ์ฃผํ™”๋ฅผ ์œ„ํ•œ ๊ธฐ๋ฐ˜์„ ๋งˆ๋ จํ•˜๋ฉฐ, ๊ฐœ์ธ์ด ์ž์‹ ์˜ ๋ชจ๋ธ์„ ํ›ˆ๋ จํ•˜๊ณ  ์ •๋ณด์˜ ์†Œ์œ ๊ถŒ์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋Š” ๋น„์ „์„ ์‹คํ˜„ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋Š” ์—ด์ •์ ์ธ ์ปค๋ฎค๋‹ˆํ‹ฐ๊ฐ€ ํ˜‘๋ ฅํ•˜์—ฌ ๊ตฌ์ถ•ํ•œ ์ตœ๊ณ ์˜ ์˜คํ”ˆ์†Œ์Šค ๋ฐ์ดํ„ฐ ์ถ”์ถœ ๋ฐ ์ƒ์„ฑ ๋„๊ตฌ๋ฅผ ๋งŒ๋“ค๊ธฐ ์œ„ํ•œ ๋” ํฐ ์—ฌ์ •์˜ ์ฒซ ๊ฑธ์Œ์ž…๋‹ˆ๋‹ค.

์ด ํ”„๋กœ์ ํŠธ๋ฅผ ์ง€์›ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋ฉฐ ํ”ผ๋“œ๋ฐฑ์„ ๊ณต์œ ํ•ด์ฃผ์‹  ๋ชจ๋“  ๋ถ„๋“ค๊ป˜ ๊ฐ์‚ฌ๋“œ๋ฆฝ๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ๋ถ„์˜ ๊ฒฉ๋ ค๋Š” ์ œ๊ฐ€ ๋” ํฐ ๊ฟˆ์„ ๊พธ๋„๋ก ๋™๊ธฐ๋ถ€์—ฌํ•ฉ๋‹ˆ๋‹ค. ํ•จ๊ป˜ํ•˜์…”์„œ ์ด์Šˆ๋ฅผ ์ œ์ถœํ•˜๊ฑฐ๋‚˜ PR์„ ์ œ์ถœํ•˜๊ฑฐ๋‚˜ ์†Œ๋ฌธ์„ ํผ๋œจ๋ ค์ฃผ์„ธ์š”. ํ•จ๊ป˜๋ผ๋ฉด ์‚ฌ๋žŒ๋“ค์ด ์ž์‹ ์˜ ๋ฐ์ดํ„ฐ์— ์ ‘๊ทผํ•˜๊ณ  AI์˜ ๋ฏธ๋ž˜๋ฅผ ์žฌ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ๋Š” ์ง„์ •ํ•œ ๋„๊ตฌ๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿง Crawl4AI๋ฅผ ์„ ํƒํ•ด์•ผ ํ•˜๋Š” ์ด์œ ?

  1. LLM์„ ์œ„ํ•ด ๊ตฌ์ถ•๋จ: RAG ๋ฐ ํŒŒ์ธํŠœ๋‹ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ์ตœ์ ํ™”๋œ ์Šค๋งˆํŠธํ•˜๊ณ  ๊ฐ„๊ฒฐํ•œ ๋งˆํฌ๋‹ค์šด์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
  2. ๋ฒˆ๊ฐœ ๊ฐ™์€ ์†๋„: ์‹ค์‹œ๊ฐ„์œผ๋กœ ๋น„์šฉ ํšจ์œจ์ ์ธ ์„ฑ๋Šฅ์„ ์ œ๊ณตํ•˜๋ฉฐ ๊ฒฐ๊ณผ๋ฅผ 6๋ฐฐ ๋” ๋น ๋ฅด๊ฒŒ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
  3. ์œ ์—ฐํ•œ ๋ธŒ๋ผ์šฐ์ € ์ œ์–ด: ์›ํ™œํ•œ ๋ฐ์ดํ„ฐ ์ ‘๊ทผ์„ ์œ„ํ•œ ์„ธ์…˜ ๊ด€๋ฆฌ, ํ”„๋ก์‹œ ๋ฐ ์‚ฌ์šฉ์ž ์ •์˜ ํ›…์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
  4. ํœด๋ฆฌ์Šคํ‹ฑ ์ธํ…”๋ฆฌ์ „์Šค: ๊ณ ๊ธ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜์—ฌ ๋น„์šฉ์ด ๋งŽ์ด ๋“œ๋Š” ๋ชจ๋ธ์— ๋Œ€ํ•œ ์˜์กด๋„๋ฅผ ์ค„์ด๋Š” ํšจ์œจ์ ์ธ ์ถ”์ถœ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
  5. ์˜คํ”ˆ์†Œ์Šค & ๋ฐฐํฌ ๊ฐ€๋Šฅ: API ํ‚ค ์—†์ด ์™„์ „ํžˆ ์˜คํ”ˆ์†Œ์Šค์ด๋ฉฐ Docker ๋ฐ ํด๋ผ์šฐ๋“œ ํ†ตํ•ฉ์ด ์ค€๋น„๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
  6. ํ™œ๋ฐœํ•œ ์ปค๋ฎค๋‹ˆํ‹ฐ: ํ™œ๊ธฐ์ฐฌ ์ปค๋ฎค๋‹ˆํ‹ฐ์™€ #1 ํŠธ๋ Œ๋”ฉ GitHub ์ €์žฅ์†Œ์— ์˜ํ•ด ์ ๊ทน์ ์œผ๋กœ ์œ ์ง€๋ณด์ˆ˜๋ฉ๋‹ˆ๋‹ค.

๐Ÿš€ ๋น ๋ฅธ ์‹œ์ž‘

  1. Crawl4AI ์„ค์น˜:
# Install the package
pip install -U crawl4ai

# For pre release versions
pip install crawl4ai --pre

# Run post-installation setup
crawl4ai-setup

# Verify your installation
crawl4ai-doctor

๋ธŒ๋ผ์šฐ์ € ๊ด€๋ จ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๋ฉด ์ˆ˜๋™์œผ๋กœ ์„ค์น˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

python -m playwright install --with-deps chromium
  2. Python์œผ๋กœ ๊ฐ„๋‹จํ•œ ์›น ํฌ๋กค๋ง ์‹คํ–‰:
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
        )
        print(result.markdown)

if __name__ == "__main__":
    asyncio.run(main())
  3. ๋˜๋Š” ์ƒˆ๋กœ์šด ๋ช…๋ น์ค„ ์ธํ„ฐํŽ˜์ด์Šค ์‚ฌ์šฉ:
# Basic crawl with markdown output
crwl https://www.nbcnews.com/business -o markdown

# Deep crawl with BFS strategy, max 10 pages
crwl https://docs.crawl4ai.com --deep-crawl bfs --max-pages 10

# Use LLM extraction with a specific question
crwl https://www.example.com/products -q "Extract all product prices"

โœจ ๊ธฐ๋Šฅ

๐Ÿ“ ๋งˆํฌ๋‹ค์šด ์ƒ์„ฑ
  • ๐Ÿงน ๊น”๋”ํ•œ ๋งˆํฌ๋‹ค์šด: ์ •ํ™•ํ•œ ํ˜•์‹์œผ๋กœ ๊น”๋”ํ•˜๊ณ  ๊ตฌ์กฐํ™”๋œ ๋งˆํฌ๋‹ค์šด์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
  • ๐ŸŽฏ ์ ํ•ฉํ•œ ๋งˆํฌ๋‹ค์šด: AI ์นœํ™”์ ์ธ ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•ด ๋…ธ์ด์ฆˆ ๋ฐ ๊ด€๋ จ ์—†๋Š” ๋ถ€๋ถ„์„ ์ œ๊ฑฐํ•˜๋Š” ํœด๋ฆฌ์Šคํ‹ฑ ๊ธฐ๋ฐ˜ ํ•„ํ„ฐ๋ง.
  • ๐Ÿ”— ์ธ์šฉ ๋ฐ ์ฐธ์กฐ: ํŽ˜์ด์ง€ ๋งํฌ๋ฅผ ๊น”๋”ํ•œ ์ธ์šฉ์ด ํฌํ•จ๋œ ๋ฒˆํ˜ธ ๋งค๊ธฐ๊ธฐ ์ฐธ์กฐ ๋ชฉ๋ก์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ› ๏ธ ์‚ฌ์šฉ์ž ์ •์˜ ์ „๋žต: ํŠน์ • ์š”๊ตฌ ์‚ฌํ•ญ์— ๋งž์ถ˜ ์‚ฌ์šฉ์ž ์ •์˜ ๋งˆํฌ๋‹ค์šด ์ƒ์„ฑ ์ „๋žต์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๐Ÿ“š BM25 ์•Œ๊ณ ๋ฆฌ์ฆ˜: ํ•ต์‹ฌ ์ •๋ณด ์ถ”์ถœ ๋ฐ ๊ด€๋ จ ์—†๋Š” ์ฝ˜ํ…์ธ  ์ œ๊ฑฐ๋ฅผ ์œ„ํ•ด BM25 ๊ธฐ๋ฐ˜ ํ•„ํ„ฐ๋ง์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
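์œ„์—์„œ ์–ธ๊ธ‰ํ•œ BM25 ๊ธฐ๋ฐ˜ ํ•„ํ„ฐ๋ง์˜ ํ•ต์‹ฌ์€ ์ฟผ๋ฆฌ ๋Œ€๋น„ ๊ฐ ์ฝ˜ํ…์ธ  ์ฒญํฌ์˜ ๊ด€๋ จ๋„ ์ ์ˆ˜๋ฅผ ๋งค๊ฒจ ๊ด€๋ จ ์—†๋Š” ๋ถ€๋ถ„์„ ๊ฑธ๋Ÿฌ๋‚ด๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” Crawl4AI์˜ ์‹ค์ œ ๊ตฌํ˜„์ด ์•„๋‹ˆ๋ผ BM25 ์ ์ˆ˜ ๊ณ„์‚ฐ ๋ฐฉ์‹๋งŒ ๋ณด์—ฌ์ฃผ๋Š” ์ˆœ์ˆ˜ Python ๊ฐœ๋… ์Šค์ผ€์น˜์ด๋ฉฐ, ํ•จ์ˆ˜๋ช… `bm25_scores` ๋“ฑ์€ ์„ค๋ช…์šฉ ๊ฐ€์ •์ž…๋‹ˆ๋‹ค:

```python
import math
from collections import Counter

# BM25 ๊ฐœ๋… ์Šค์ผ€์น˜: ๊ฐ ๋ฌธ์„œ(์ฒญํฌ)๊ฐ€ ์ฟผ๋ฆฌ์™€ ์–ผ๋งˆ๋‚˜ ๊ด€๋ จ ์žˆ๋Š”์ง€ ์ ์ˆ˜ํ™”
def bm25_scores(docs, query, k1=1.5, b=0.75):
    # docs: ํ† ํฐ ๋ฆฌ์ŠคํŠธ๋“ค์˜ ๋ฆฌ์ŠคํŠธ, query: ํ† ํฐ ๋ฆฌ์ŠคํŠธ
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()                      # ๋‹จ์–ด๋ณ„ ๋ฌธ์„œ ๋นˆ๋„
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                 # ๋ฌธ์„œ ๋‚ด ๋‹จ์–ด ๋นˆ๋„
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

docs = [["web", "crawler", "speed"], ["cooking", "recipe", "book"]]
print(bm25_scores(docs, ["crawler"]))   # ์ฒซ ๋ฒˆ์งธ ์ฒญํฌ๋งŒ 0๋ณด๋‹ค ํฐ ์ ์ˆ˜
```

์‹ค์ œ ํ•„ํ„ฐ๋ง์—์„œ๋Š” ์ด๋ ‡๊ฒŒ ๊ณ„์‚ฐ๋œ ์ ์ˆ˜๊ฐ€ ์ž„๊ณ„๊ฐ’ ์ดํ•˜์ธ ์ฒญํฌ๋ฅผ ์ œ๊ฑฐํ•˜๋Š” ์‹์œผ๋กœ ํ™œ์šฉ๋œ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.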
๐Ÿ“Š ๊ตฌ์กฐํ™”๋œ ๋ฐ์ดํ„ฐ ์ถ”์ถœ
  • ๐Ÿค– LLM ๊ธฐ๋ฐ˜ ์ถ”์ถœ: ๊ตฌ์กฐํ™”๋œ ๋ฐ์ดํ„ฐ ์ถ”์ถœ์„ ์œ„ํ•ด ๋ชจ๋“  LLM(์˜คํ”ˆ์†Œ์Šค ๋ฐ ๋…์ )์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿงฑ ์ฒญํ‚น ์ „๋žต: ๋Œ€์ƒ ์ฝ˜ํ…์ธ  ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ์ฒญํ‚น(์ฃผ์ œ ๊ธฐ๋ฐ˜, ์ •๊ทœ์‹, ๋ฌธ์žฅ ์ˆ˜์ค€)์„ ๊ตฌํ˜„ํ•ฉ๋‹ˆ๋‹ค.
  • ๐ŸŒŒ ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„: ์˜๋ฏธ๋ก ์  ์ถ”์ถœ์„ ์œ„ํ•ด ์‚ฌ์šฉ์ž ์ฟผ๋ฆฌ ๊ธฐ๋ฐ˜ ๊ด€๋ จ ์ฝ˜ํ…์ธ  ์ฒญํฌ๋ฅผ ์ฐพ์Šต๋‹ˆ๋‹ค.
  • ๐Ÿ”Ž CSS ๊ธฐ๋ฐ˜ ์ถ”์ถœ: XPath ๋ฐ CSS ์„ ํƒ์ž๋ฅผ ์‚ฌ์šฉํ•œ ๋น ๋ฅธ ์Šคํ‚ค๋งˆ ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ ์ถ”์ถœ.
  • ๐Ÿ”ง ์Šคํ‚ค๋งˆ ์ •์˜: ๋ฐ˜๋ณต์ ์ธ ํŒจํ„ด์—์„œ ๊ตฌ์กฐํ™”๋œ JSON์„ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•œ ์‚ฌ์šฉ์ž ์ •์˜ ์Šคํ‚ค๋งˆ ์ •์˜.
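์œ„์˜ ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„ ๊ธฐ๋ฐ˜ ์ฒญํฌ ์„ ๋ณ„์€ ์ฟผ๋ฆฌ์™€ ์ฒญํฌ๋ฅผ ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•œ ๋’ค ๊ฐ€์žฅ ์œ ์‚ฌํ•œ ์ฒญํฌ๋ฅผ ๊ณ ๋ฅด๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ์ž„๋ฒ ๋”ฉ ๋Œ€์‹  ๋‹จ์–ด ๋นˆ๋„ ๋ฒกํ„ฐ๋ฅผ ์“ฐ๋Š” ๋‹จ์ˆœํ™”๋œ ๊ฐœ๋… ์Šค์ผ€์น˜๋กœ, Crawl4AI ๊ตฌํ˜„๊ณผ๋Š” ๋ฌด๊ด€ํ•œ ์„ค๋ช…์šฉ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค:

```python
import math
from collections import Counter

# ๋‹จ์–ด ๋นˆ๋„ ๋ฒกํ„ฐ ๊ฐ„ ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„ (์‹ค์ œ๋กœ๋Š” ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ์“ฐ๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ )
def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

chunks = ["web crawler performance tips", "chocolate cake recipe"]
query = "fast web crawler"
best = max(chunks, key=lambda c: cosine(query, c))
print(best)  # → web crawler performance tips
```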
๐ŸŒ ๋ธŒ๋ผ์šฐ์ € ํ†ตํ•ฉ
  • ๐Ÿ–ฅ๏ธ ๊ด€๋ฆฌํ˜• ๋ธŒ๋ผ์šฐ์ €: ๋ด‡ ํƒ์ง€๋ฅผ ํ”ผํ•˜๋ฉด์„œ ์‚ฌ์šฉ์ž ์†Œ์œ ์˜ ๋ธŒ๋ผ์šฐ์ €๋ฅผ ์™„์ „ํ•œ ์ œ์–ด์™€ ํ•จ๊ป˜ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ”„ ์›๊ฒฉ ๋ธŒ๋ผ์šฐ์ € ์ œ์–ด: ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ ์ถ”์ถœ์„ ์œ„ํ•ด Chrome ๊ฐœ๋ฐœ์ž ๋„๊ตฌ ํ”„๋กœํ† ์ฝœ์— ์—ฐ๊ฒฐํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ‘ค ๋ธŒ๋ผ์šฐ์ € ํ”„๋กœํŒŒ์ผ๋Ÿฌ: ์ €์žฅ๋œ ์ธ์ฆ ์ƒํƒœ, ์ฟ ํ‚ค ๋ฐ ์„ค์ •์ด ํฌํ•จ๋œ ์ง€์†์ ์ธ ํ”„๋กœํ•„ ์ƒ์„ฑ ๋ฐ ๊ด€๋ฆฌ.
  • ๐Ÿ”’ ์„ธ์…˜ ๊ด€๋ฆฌ: ๋ธŒ๋ผ์šฐ์ € ์ƒํƒœ๋ฅผ ๋ณด์กดํ•˜๊ณ  ๋‹ค๋‹จ๊ณ„ ํฌ๋กค๋ง์„ ์œ„ํ•ด ์žฌ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿงฉ ํ”„๋ก์‹œ ์ง€์›: ๋ณด์•ˆ ์ ‘๊ทผ์„ ์œ„ํ•œ ์ธ์ฆ์ด ํฌํ•จ๋œ ํ”„๋ก์‹œ์— ์›ํ™œํ•˜๊ฒŒ ์—ฐ๊ฒฐํ•ฉ๋‹ˆ๋‹ค.
  • โš™๏ธ ์™„์ „ํ•œ ๋ธŒ๋ผ์šฐ์ € ์ œ์–ด: ๋งž์ถคํ˜• ํฌ๋กค๋ง ์„ค์ •์„ ์œ„ํ•ด ํ—ค๋”, ์ฟ ํ‚ค, ์‚ฌ์šฉ์ž ์—์ด์ „ํŠธ ๋“ฑ์„ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค.
  • ๐ŸŒ ๋‹ค์ค‘ ๋ธŒ๋ผ์šฐ์ € ์ง€์›: Chromium, Firefox ๋ฐ WebKit๊ณผ ํ˜ธํ™˜๋ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ“ ๋™์  ๋ทฐํฌํŠธ ์กฐ์ •: ๋ธŒ๋ผ์šฐ์ € ๋ทฐํฌํŠธ๋ฅผ ํŽ˜์ด์ง€ ์ฝ˜ํ…์ธ ์— ๋งž๊ฒŒ ์ž๋™ ์กฐ์ •ํ•˜์—ฌ ๋ชจ๋“  ์š”์†Œ์˜ ์™„์ „ํ•œ ๋ Œ๋”๋ง ๋ฐ ์บก์ฒ˜๋ฅผ ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค.
๐Ÿ”Ž ํฌ๋กค๋ง & ์Šคํฌ๋ž˜ํ•‘
  • ๐Ÿ–ผ๏ธ ๋ฏธ๋””์–ด ์ง€์›: ์ด๋ฏธ์ง€, ์˜ค๋””์˜ค, ๋น„๋””์˜ค ๋ฐ srcset ๋ฐ picture์™€ ๊ฐ™์€ ๋ฐ˜์‘ํ˜• ์ด๋ฏธ์ง€ ํ˜•์‹์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿš€ ๋™์  ํฌ๋กค๋ง: JS ์‹คํ–‰ ๋ฐ ๋น„๋™๊ธฐ ๋˜๋Š” ๋™๊ธฐ ๋Œ€๊ธฐ๋ฅผ ํ†ตํ•ด ๋™์  ์ฝ˜ํ…์ธ  ์ถ”์ถœ.
  • ๐Ÿ“ธ ์Šคํฌ๋ฆฐ์ƒท: ๋””๋ฒ„๊น… ๋˜๋Š” ๋ถ„์„์„ ์œ„ํ•ด ํฌ๋กค๋ง ์ค‘ ํŽ˜์ด์ง€ ์Šคํฌ๋ฆฐ์ƒท ์บก์ฒ˜.
  • ๐Ÿ“‚ ์›์‹œ ๋ฐ์ดํ„ฐ ํฌ๋กค๋ง: ์›์‹œ HTML(raw:) ๋˜๋Š” ๋กœ์ปฌ ํŒŒ์ผ(file://)์„ ์ง์ ‘ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ”— ํฌ๊ด„์ ์ธ ๋งํฌ ์ถ”์ถœ: ๋‚ด๋ถ€, ์™ธ๋ถ€ ๋งํฌ ๋ฐ ์ž„๋ฒ ๋””๋“œ iframe ์ฝ˜ํ…์ธ ๋ฅผ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ› ๏ธ ์‚ฌ์šฉ์ž ์ •์˜ ๊ฐ€๋Šฅํ•œ ํ›…: ํฌ๋กค๋ง ๋™์ž‘์„ ์‚ฌ์šฉ์ž ์ •์˜ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋“  ๋‹จ๊ณ„์—์„œ ํ›…์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ’พ ์บ์‹ฑ: ์†๋„ ํ–ฅ์ƒ ๋ฐ ์ค‘๋ณต ํŽ˜์น˜ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์บ์‹œํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ“„ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์ถ”์ถœ: ์›น ํŽ˜์ด์ง€์—์„œ ๊ตฌ์กฐํ™”๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋ฅผ ๊ฒ€์ƒ‰ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ“ก IFrame ์ฝ˜ํ…์ธ  ์ถ”์ถœ: ์ž„๋ฒ ๋””๋“œ iframe ์ฝ˜ํ…์ธ ์—์„œ ์›ํ™œํ•œ ์ถ”์ถœ.
  • ๐Ÿ•ต๏ธ ์ง€์—ฐ ๋กœ๋“œ ์ฒ˜๋ฆฌ: ์ด๋ฏธ์ง€๊ฐ€ ์™„์ „ํžˆ ๋กœ๋“œ๋  ๋•Œ๊นŒ์ง€ ๋Œ€๊ธฐํ•˜์—ฌ ์ง€์—ฐ ๋กœ๋“œ๋กœ ์ธํ•œ ์ฝ˜ํ…์ธ  ๋ˆ„๋ฝ์„ ๋ฐฉ์ง€ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ”„ ์ „์ฒด ํŽ˜์ด์ง€ ์Šค์บ๋‹: ๋ฌดํ•œ ์Šคํฌ๋กค ํŽ˜์ด์ง€์— ์™„๋ฒฝํ•œ ๋ชจ๋“  ๋™์  ์ฝ˜ํ…์ธ ๋ฅผ ๋กœ๋“œํ•˜๊ณ  ์บก์ฒ˜ํ•˜๊ธฐ ์œ„ํ•ด ์Šคํฌ๋กค์„ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ํ•ฉ๋‹ˆ๋‹ค.
๐Ÿš€ ๋ฐฐํฌ
  • ๐Ÿณ Dockerized ์„ค์ •: ์‰ฌ์šด ๋ฐฐํฌ๋ฅผ ์œ„ํ•œ FastAPI ์„œ๋ฒ„๊ฐ€ ํฌํ•จ๋œ ์ตœ์ ํ™”๋œ Docker ์ด๋ฏธ์ง€.
  • ๐Ÿ”‘ ๋ณด์•ˆ ์ธ์ฆ: API ๋ณด์•ˆ์„ ์œ„ํ•œ ๋‚ด์žฅ JWT ํ† ํฐ ์ธ์ฆ.
  • ๐Ÿ”„ API ๊ฒŒ์ดํŠธ์›จ์ด: API ๊ธฐ๋ฐ˜ ์›Œํฌํ”Œ๋กœ์šฐ๋ฅผ ์œ„ํ•œ ๋ณด์•ˆ ํ† ํฐ ์ธ์ฆ์œผ๋กœ ์›ํด๋ฆญ ๋ฐฐํฌ.
  • ๐ŸŒ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ์•„ํ‚คํ…์ฒ˜: ๋Œ€๊ทœ๋ชจ ์ƒ์‚ฐ ๋ฐ ์ตœ์ ํ™”๋œ ์„œ๋ฒ„ ์„ฑ๋Šฅ์„ ์œ„ํ•ด ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
  • โ˜๏ธ ํด๋ผ์šฐ๋“œ ๋ฐฐํฌ: ์ฃผ์š” ํด๋ผ์šฐ๋“œ ํ”Œ๋žซํผ์„ ์œ„ํ•œ ์ฆ‰์‹œ ๋ฐฐํฌ ๊ฐ€๋Šฅํ•œ ๊ตฌ์„ฑ.
๐ŸŽฏ ์ถ”๊ฐ€ ๊ธฐ๋Šฅ
  • ๐Ÿ•ถ๏ธ ์Šคํ…”์Šค ๋ชจ๋“œ: ์‹ค์ œ ์‚ฌ์šฉ์ž๋ฅผ ๋ชจ๋ฐฉํ•˜์—ฌ ๋ด‡ ํƒ์ง€๋ฅผ ํ”ผํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿท๏ธ ํƒœ๊ทธ ๊ธฐ๋ฐ˜ ์ฝ˜ํ…์ธ  ์ถ”์ถœ: ์‚ฌ์šฉ์ž ์ •์˜ ํƒœ๊ทธ, ํ—ค๋” ๋˜๋Š” ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํฌ๋กค๋ง์„ ์ •์ œํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ”— ๋งํฌ ๋ถ„์„: ์ƒ์„ธํ•œ ๋ฐ์ดํ„ฐ ํƒ์ƒ‰์„ ์œ„ํ•ด ๋ชจ๋“  ๋งํฌ๋ฅผ ์ถ”์ถœํ•˜๊ณ  ๋ถ„์„ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ›ก๏ธ ์˜ค๋ฅ˜ ์ฒ˜๋ฆฌ: ์›ํ™œํ•œ ์‹คํ–‰์„ ์œ„ํ•œ ๊ฐ•๋ ฅํ•œ ์˜ค๋ฅ˜ ๊ด€๋ฆฌ.
  • ๐Ÿ” CORS & ์ •์  ์„œ๋น™: ํŒŒ์ผ ์‹œ์Šคํ…œ ๊ธฐ๋ฐ˜ ์บ์‹ฑ ๋ฐ ๊ต์ฐจ ์ถœ์ฒ˜ ์š”์ฒญ์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ“– ๋ช…ํ™•ํ•œ ๋ฌธ์„œํ™”: ์˜จ๋ณด๋”ฉ ๋ฐ ๊ณ ๊ธ‰ ์‚ฌ์šฉ์„ ์œ„ํ•œ ๋‹จ์ˆœํ™”๋˜๊ณ  ์—…๋ฐ์ดํŠธ๋œ ๊ฐ€์ด๋“œ.
  • ๐Ÿ™Œ ์ปค๋ฎค๋‹ˆํ‹ฐ ์ธ์ •: ํˆฌ๋ช…์„ฑ์„ ์œ„ํ•ด ๊ธฐ์—ฌ์ž ๋ฐ ํ’€ ๋ฆฌํ€˜์ŠคํŠธ๋ฅผ ์ธ์ •ํ•ฉ๋‹ˆ๋‹ค.

์ง€๊ธˆ ์‚ฌ์šฉํ•ด๋ณด์„ธ์š”!

โœจ Open In Colab์—์„œ ์ง์ ‘ ์ฒดํ—˜ํ•ด๋ณด์„ธ์š”.

โœจ ๋ฌธ์„œ ์›น์‚ฌ์ดํŠธ ๋ฐฉ๋ฌธํ•˜๊ธฐ.

์„ค์น˜ ๐Ÿ› ๏ธ

Crawl4AI๋Š” ๋‹ค์–‘ํ•œ ์‚ฌ์šฉ ์‚ฌ๋ก€์— ๋งž์ถฐ ์œ ์—ฐํ•œ ์„ค์น˜ ์˜ต์…˜์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. Python ํŒจํ‚ค์ง€๋กœ ์„ค์น˜ํ•˜๊ฑฐ๋‚˜ Docker๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ pip ์‚ฌ์šฉ

ํ•„์š”์— ๋งž๋Š” ์„ค์น˜ ์˜ต์…˜์„ ์„ ํƒํ•˜์„ธ์š”:

๊ธฐ๋ณธ ์„ค์น˜

๊ธฐ๋ณธ์ ์ธ ์›น ํฌ๋กค๋ง ๋ฐ ์Šคํฌ๋ž˜ํ•‘ ์ž‘์—…์„ ์œ„ํ•œ:

pip install crawl4ai
crawl4ai-setup # Setup the browser

๊ธฐ๋ณธ์ ์œผ๋กœ Playwright๋ฅผ ์‚ฌ์šฉํ•˜๋Š” Crawl4AI์˜ ๋น„๋™๊ธฐ ๋ฒ„์ „์ด ์„ค์น˜๋ฉ๋‹ˆ๋‹ค.

๐Ÿ‘‰ ์ฐธ๊ณ : Crawl4AI๋ฅผ ์„ค์น˜ํ•˜๋ฉด crawl4ai-setup์ด ์ž๋™์œผ๋กœ Playwright๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ Playwright ๊ด€๋ จ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜๋ฉด ๋‹ค์Œ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ ์ˆ˜๋™ ์„ค์น˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

  1. ๋ช…๋ น์ค„์„ ํ†ตํ•ด:

    playwright install
    
  2. ์œ„ ๋ฐฉ๋ฒ•์ด ์ž‘๋™ํ•˜์ง€ ์•Š์œผ๋ฉด ๋” ๊ตฌ์ฒด์ ์ธ ๋ช…๋ น์–ด ์‹œ๋„:

    python -m playwright install chromium
    

์ด ๋‘ ๋ฒˆ์งธ ๋ฐฉ๋ฒ•์ด ๊ฒฝ์šฐ์— ๋”ฐ๋ผ ๋” ์•ˆ์ •์ ์ž…๋‹ˆ๋‹ค.


๋™๊ธฐ ๋ฒ„์ „ ์„ค์น˜

๋™๊ธฐ ๋ฒ„์ „์€ ๋” ์ด์ƒ ์‚ฌ์šฉ๋˜์ง€ ์•Š์œผ๋ฉฐ ํ–ฅํ›„ ๋ฒ„์ „์—์„œ ์ œ๊ฑฐ๋  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค. Selenium์„ ์‚ฌ์šฉํ•˜๋Š” ๋™๊ธฐ ๋ฒ„์ „์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ:

pip install crawl4ai[sync]

๊ฐœ๋ฐœ ์„ค์น˜

์†Œ์Šค ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•  ๊ณ„ํš์ธ ๊ธฐ์—ฌ์ž๋ฅผ ์œ„ํ•œ:

git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
pip install -e .                    # Basic installation in editable mode

์„ ํƒ์  ๊ธฐ๋Šฅ ์„ค์น˜:

pip install -e ".[torch]"           # With PyTorch features
pip install -e ".[transformer]"     # With Transformer features
pip install -e ".[cosine]"          # With cosine similarity features
pip install -e ".[sync]"            # With synchronous crawling (Selenium)
pip install -e ".[all]"             # Install all optional features
๐Ÿณ Docker ๋ฐฐํฌ

๐Ÿš€ ์ด์ œ ์‚ฌ์šฉ ๊ฐ€๋Šฅ! ์™„์ „ํžˆ ์žฌ์„ค๊ณ„๋œ Docker ๊ตฌํ˜„์ด ๋„์ž…๋˜์—ˆ์Šต๋‹ˆ๋‹ค! ์ด ์ƒˆ๋กœ์šด ์†”๋ฃจ์…˜์€ ์ด์ „๋ณด๋‹ค ๋” ํšจ์œจ์ ์ด๊ณ  ์›ํ™œํ•œ ๋ฐฐํฌ๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

์ƒˆ๋กœ์šด Docker ๊ธฐ๋Šฅ

์ƒˆ๋กœ์šด Docker ๊ตฌํ˜„์—๋Š” ๋‹ค์Œ์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค:

  • ํŽ˜์ด์ง€ ์‚ฌ์ „ ์›Œ๋ฐ์„ ํ†ตํ•œ ๋ธŒ๋ผ์šฐ์ € ํ’€๋ง์œผ๋กœ ๋” ๋น ๋ฅธ ์‘๋‹ต ์‹œ๊ฐ„
  • ์š”์ฒญ ์ฝ”๋“œ๋ฅผ ํ…Œ์ŠคํŠธํ•˜๊ณ  ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•œ ๋Œ€ํ™”ํ˜• ํ”Œ๋ ˆ์ด๊ทธ๋ผ์šด๋“œ
  • Claude Code์™€ ๊ฐ™์€ AI ๋„๊ตฌ์— ์ง์ ‘ ์—ฐ๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ MCP ํ†ตํ•ฉ
  • HTML ์ถ”์ถœ, ์Šคํฌ๋ฆฐ์ƒท, PDF ์ƒ์„ฑ ๋ฐ JavaScript ์‹คํ–‰์„ ํฌํ•จํ•œ ํฌ๊ด„์ ์ธ API ์—”๋“œํฌ์ธํŠธ
  • ์ž๋™ ๊ฐ์ง€(AMD64/ARM64)๋ฅผ ํ†ตํ•œ ๋‹ค์ค‘ ์•„ํ‚คํ…์ฒ˜ ์ง€์›
  • ๊ฐœ์„ ๋œ ๋ฉ”๋ชจ๋ฆฌ ๊ด€๋ฆฌ๋ฅผ ํ†ตํ•œ ์ตœ์ ํ™”๋œ ๋ฆฌ์†Œ์Šค

์‹œ์ž‘ํ•˜๊ธฐ

# Pull and run the latest release candidate
docker pull unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number
docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number

# Visit the playground at http://localhost:11235/playground

์ „์ฒด ๋ฌธ์„œ๋Š” Docker ๋ฐฐํฌ ๊ฐ€์ด๋“œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.


๋น ๋ฅธ ํ…Œ์ŠคํŠธ

๋น ๋ฅธ ํ…Œ์ŠคํŠธ ์‹คํ–‰(๋‘ Docker ์˜ต์…˜ ๋ชจ๋‘ ์ž‘๋™):

import requests
import time

# Submit a crawl job
response = requests.post(
    "http://localhost:11235/crawl",
    json={"urls": "https://example.com", "priority": 10}
)
task_id = response.json()["task_id"]

# Poll until the task is complete (status == "completed")
while True:
    result = requests.get(f"http://localhost:11235/task/{task_id}")
    if result.json().get("status") == "completed":
        break
    time.sleep(1)

print(result.json())

๋” ๋งŽ์€ ์˜ˆ์ œ๋Š” Docker ์˜ˆ์ œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”. ๊ณ ๊ธ‰ ๊ตฌ์„ฑ, ํ™˜๊ฒฝ ๋ณ€์ˆ˜ ๋ฐ ์‚ฌ์šฉ ์˜ˆ์ œ๋Š” Docker ๋ฐฐํฌ ๊ฐ€์ด๋“œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

๐Ÿ”ฌ ๊ณ ๊ธ‰ ์‚ฌ์šฉ ์˜ˆ์ œ ๐Ÿ”ฌ

ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ๋Š” https://github.com/unclecode/crawl4ai/docs/examples ๋””๋ ‰ํ† ๋ฆฌ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์–‘ํ•œ ์˜ˆ์ œ๊ฐ€ ์žˆ์œผ๋ฉฐ, ์—ฌ๊ธฐ์„œ๋Š” ์ผ๋ถ€ ์ธ๊ธฐ ์žˆ๋Š” ์˜ˆ์ œ๋ฅผ ๊ณต์œ ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“ ํœด๋ฆฌ์Šคํ‹ฑ ๋งˆํฌ๋‹ค์šด ์ƒ์„ฑ ๋ฐ ๊น”๋”ํ•˜๊ณ  ์ ํ•ฉํ•œ ๋งˆํฌ๋‹ค์šด
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.content_filter_strategy import PruningContentFilter, BM25ContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

async def main():
    browser_config = BrowserConfig(
        headless=True,  
        verbose=True,
    )
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.ENABLED,
        markdown_generator=DefaultMarkdownGenerator(
            content_filter=PruningContentFilter(threshold=0.48, threshold_type="fixed", min_word_threshold=0)
        ),
        # markdown_generator=DefaultMarkdownGenerator(
        #     content_filter=BM25ContentFilter(user_query="WHEN_WE_FOCUS_BASED_ON_A_USER_QUERY", bm25_threshold=1.0)
        # ),
    )
    
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://docs.micronaut.io/4.7.6/guide/",
            config=run_config
        )
        print(len(result.markdown.raw_markdown))
        print(len(result.markdown.fit_markdown))

if __name__ == "__main__":
    asyncio.run(main())
๐Ÿ–ฅ๏ธ JavaScript ์‹คํ–‰ ๋ฐ LLM ์—†์ด ๊ตฌ์กฐํ™”๋œ ๋ฐ์ดํ„ฐ ์ถ”์ถœ
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
import json

async def main():
    schema = {
        "name": "KidoCode Courses",
        "baseSelector": "section.charge-methodology .w-tab-content > div",
        "fields": [
            {
                "name": "section_title",
                "selector": "h3.heading-50",
                "type": "text",
            },
            {
                "name": "section_description",
                "selector": ".charge-content",
                "type": "text",
            },
            {
                "name": "course_name",
                "selector": ".text-block-93",
                "type": "text",
            },
            {
                "name": "course_description",
                "selector": ".course-content-text",
                "type": "text",
            },
            {
                "name": "course_icon",
                "selector": ".image-92",
                "type": "attribute",
                "attribute": "src",
            },
        ],
    }

    extraction_strategy = JsonCssExtractionStrategy(schema, verbose=True)

    browser_config = BrowserConfig(
        headless=False,
        verbose=True
    )
    run_config = CrawlerRunConfig(
        extraction_strategy=extraction_strategy,
        js_code=["""(async () => {const tabs = document.querySelectorAll("section.charge-methodology .tabs-menu-3 > div");for(let tab of tabs) {tab.scrollIntoView();tab.click();await new Promise(r => setTimeout(r, 500));}})();"""],
        cache_mode=CacheMode.BYPASS
    )
        
    async with AsyncWebCrawler(config=browser_config) as crawler:
        
        result = await crawler.arun(
            url="https://www.kidocode.com/degrees/technology",
            config=run_config
        )

        companies = json.loads(result.extracted_content)
        print(f"Successfully extracted {len(companies)} companies")
        print(json.dumps(companies[0], indent=2))


if __name__ == "__main__":
    asyncio.run(main())
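์œ„ JsonCssExtractionStrategy ์˜ˆ์ œ์˜ ์Šคํ‚ค๋งˆ๋Š” "๊ธฐ์ค€ ์„ ํƒ์ž๋กœ ๋ฐ˜๋ณต ์š”์†Œ๋ฅผ ์ฐพ๊ณ , ํ•„๋“œ๋ณ„ ์„ ํƒ์ž๋กœ ๊ฐ’์„ ๋ฝ‘๋Š”" ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค. ๋™์ž‘ ์›๋ฆฌ๋งŒ ๋ณด์—ฌ์ฃผ๊ธฐ ์œ„ํ•ด, CSS ์„ ํƒ์ž ๋Œ€์‹  (ํƒœ๊ทธ, ํด๋ž˜์Šค) ์Œ์„ ์“ฐ๋Š” ํ‘œ์ค€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๊ธฐ๋ฐ˜ ๊ฐœ๋… ์Šค์ผ€์น˜๋ฅผ ์ฒจ๋ถ€ํ•ฉ๋‹ˆ๋‹ค. Crawl4AI ๊ตฌํ˜„๊ณผ ๋ฌด๊ด€ํ•œ ์„ค๋ช…์šฉ ์ฝ”๋“œ์ด๋ฉฐ, ์ƒ˜ํ”Œ HTML๊ณผ ํ•จ์ˆ˜๋ช…์€ ๋ชจ๋‘ ๊ฐ€์ •์ž…๋‹ˆ๋‹ค:

```python
import xml.etree.ElementTree as ET

# ์„ค๋ช…์šฉ ์ƒ˜ํ”Œ HTML (์ž˜ ๊ตฌ์„ฑ๋œ XML ํ˜•ํƒœ๋ผ ํ‘œ์ค€ ํŒŒ์„œ๋กœ ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ)
HTML = """
<div>
  <div class="course">
    <h3 class="title">Python 101</h3>
    <p class="desc">Intro course</p>
  </div>
  <div class="course">
    <h3 class="title">Web Crawling</h3>
    <p class="desc">Scraping basics</p>
  </div>
</div>
"""

# CSS ์„ ํƒ์ž ๋Œ€์‹  (ํƒœ๊ทธ, ํด๋ž˜์Šค) ์Œ์„ ์“ฐ๋Š” ๋‹จ์ˆœํ™”๋œ ์Šคํ‚ค๋งˆ
schema = {
    "baseSelector": ("div", "course"),
    "fields": [
        {"name": "title", "selector": ("h3", "title")},
        {"name": "description", "selector": ("p", "desc")},
    ],
}

def extract(html, schema):
    # ๊ธฐ์ค€ ์„ ํƒ์ž์— ํ•ด๋‹นํ•˜๋Š” ๋ฐ˜๋ณต ์š”์†Œ๋ฅผ ์ฐพ๊ณ , ๊ฐ ์š”์†Œ ์•ˆ์—์„œ ํ•„๋“œ๋ณ„๋กœ ๊ฐ’ ์ถ”์ถœ
    root = ET.fromstring(html)
    base_tag, base_cls = schema["baseSelector"]
    items = []
    for el in root.iter(base_tag):
        if el.get("class") != base_cls:
            continue
        item = {}
        for field in schema["fields"]:
            tag, cls = field["selector"]
            for child in el.iter(tag):
                if child.get("class") == cls:
                    item[field["name"]] = child.text
                    break
        items.append(item)
    return items

print(extract(HTML, schema))
```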
๐Ÿ“š LLM์œผ๋กœ ๊ตฌ์กฐํ™”๋œ ๋ฐ์ดํ„ฐ ์ถ”์ถœ
import os
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from pydantic import BaseModel, Field

class OpenAIModelFee(BaseModel):
    model_name: str = Field(..., description="Name of the OpenAI model.")
    input_fee: str = Field(..., description="Fee for input token for the OpenAI model.")
    output_fee: str = Field(..., description="Fee for output token for the OpenAI model.")

async def main():
    browser_config = BrowserConfig(verbose=True)
    run_config = CrawlerRunConfig(
        word_count_threshold=1,
        extraction_strategy=LLMExtractionStrategy(
            # Here you can use any provider that Litellm library supports, for instance: ollama/qwen2
            # provider="ollama/qwen2", api_token="no-token", 
            llm_config = LLMConfig(provider="openai/gpt-4o", api_token=os.getenv('OPENAI_API_KEY')), 
            schema=OpenAIModelFee.schema(),
            extraction_type="schema",
            instruction="""From the crawled content, extract all mentioned model names along with their fees for input and output tokens. 
            Do not miss any models in the entire content. One extracted model JSON format should look like this: 
            {"model_name": "GPT-4", "input_fee": "US$10.00 / 1M tokens", "output_fee": "US$30.00 / 1M tokens"}."""
        ),            
        cache_mode=CacheMode.BYPASS,
    )
    
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url='https://openai.com/api/pricing/',
            config=run_config
        )
        print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())
๐Ÿค– ์‚ฌ์šฉ์ž ์ •์˜ ์‚ฌ์šฉ์ž ํ”„๋กœํ•„๋กœ ์ž์‹ ์˜ ๋ธŒ๋ผ์šฐ์ € ์‚ฌ์šฉ
import os, sys
from pathlib import Path
import asyncio, time
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

async def test_news_crawl():
    # Create a persistent user data directory
    user_data_dir = os.path.join(Path.home(), ".crawl4ai", "browser_profile")
    os.makedirs(user_data_dir, exist_ok=True)

    browser_config = BrowserConfig(
        verbose=True,
        headless=True,
        user_data_dir=user_data_dir,
        use_persistent_context=True,
    )
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS
    )
    
    async with AsyncWebCrawler(config=browser_config) as crawler:
        url = "ADDRESS_OF_A_CHALLENGING_WEBSITE"
        
        result = await crawler.arun(
            url,
            config=run_config,
            magic=True,
        )
        
        print(f"Successfully crawled {url}")
        print(f"Content length: {len(result.markdown)}")

if __name__ == "__main__":
    asyncio.run(test_news_crawl())

โœจ ์ตœ๊ทผ ์—…๋ฐ์ดํŠธ

๋ฒ„์ „ 0.6.0 ๋ฆด๋ฆฌ์Šค ํ•˜์ด๋ผ์ดํŠธ

  • ๐ŸŒŽ World-aware ํฌ๋กค๋ง: ์ง„์ •ํ•œ ๋กœ์ผ€์ผ๋ณ„ ์ฝ˜ํ…์ธ ๋ฅผ ์œ„ํ•œ ์ง€๋ฆฌ์  ์œ„์น˜, ์–ธ์–ด ๋ฐ ์‹œ๊ฐ„๋Œ€ ์„ค์ •:

      crun_cfg = CrawlerRunConfig(
          url="https://browserleaks.com/geo",          # ์œ„์น˜๋ฅผ ๋ณด์—ฌ์ฃผ๋Š” ํ…Œ์ŠคํŠธ ํŽ˜์ด์ง€
          locale="en-US",                              # Accept-Language & UI ๋กœ์ผ€์ผ
          timezone_id="America/Los_Angeles",           # JS Date()/Intl ์‹œ๊ฐ„๋Œ€
          geolocation=GeolocationConfig(                 # GPS ์ขŒํ‘œ ์žฌ์ •์˜
              latitude=34.0522,
              longitude=-118.2437,
              accuracy=10.0,
          )
      )
    
  • ๐Ÿ“Š ํ…Œ์ด๋ธ”-ํˆฌ-๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ถ”์ถœ: HTML ํ…Œ์ด๋ธ”์„ ์ง์ ‘ CSV ๋˜๋Š” pandas ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ ์ถ”์ถœ:

      import pandas as pd
      from typing import List
      from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CrawlResult

      crawler = AsyncWebCrawler(config=browser_config)
      await crawler.start()
    
      try:
          # ์Šคํฌ๋ž˜ํ•‘ ๋งค๊ฐœ๋ณ€์ˆ˜ ์„ค์ •
          crawl_config = CrawlerRunConfig(
              table_score_threshold=8,  # ์—„๊ฒฉํ•œ ํ…Œ์ด๋ธ” ๊ฐ์ง€
          )
    
          # ์‹œ์žฅ ๋ฐ์ดํ„ฐ ์ถ”์ถœ ์‹คํ–‰
          results: List[CrawlResult] = await crawler.arun(
              url="https://coinmarketcap.com/?page=1", config=crawl_config
          )
    
          # ๊ฒฐ๊ณผ ์ฒ˜๋ฆฌ
          raw_df = pd.DataFrame()
          for result in results:
              if result.success and result.media["tables"]:
                  raw_df = pd.DataFrame(
                      result.media["tables"][0]["rows"],
                      columns=result.media["tables"][0]["headers"],
                  )
                  break
          print(raw_df.head())
    
      finally:
          await crawler.stop()
    
  • ๐Ÿš€ ๋ธŒ๋ผ์šฐ์ € ํ’€๋ง: ์‚ฌ์ „ ์›Œ๋ฐ๋œ ๋ธŒ๋ผ์šฐ์ € ์ธ์Šคํ„ด์Šค ๋•๋ถ„์— ํŽ˜์ด์ง€๊ฐ€ ์ค€๋น„๋œ ์ƒํƒœ์—์„œ ์‹œ์ž‘๋˜์–ด ์ง€์—ฐ ์‹œ๊ฐ„๊ณผ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์ด ์ค„์–ด๋“ญ๋‹ˆ๋‹ค

  • ๐Ÿ•ธ๏ธ ๋„คํŠธ์›Œํฌ ๋ฐ ์ฝ˜์†” ์บก์ฒ˜: ๋””๋ฒ„๊น…์„ ์œ„ํ•œ ์ „์ฒด ํŠธ๋ž˜ํ”ฝ ๋กœ๊ทธ ๋ฐ MHTML ์Šค๋ƒ…์ƒท:

    crawler_config = CrawlerRunConfig(
        capture_network=True,
        capture_console=True,
        mhtml=True
    )
    
  • ๐Ÿ”Œ MCP ํ†ตํ•ฉ: Model Context Protocol์„ ํ†ตํ•ด Claude Code์™€ ๊ฐ™์€ AI ๋„๊ตฌ์— ์—ฐ๊ฒฐ

    # Claude Code์— Crawl4AI ์ถ”๊ฐ€
    claude mcp add --transport sse c4ai-sse http://localhost:11235/mcp/sse
    
  • ๐Ÿ–ฅ๏ธ ๋Œ€ํ™”ํ˜• ํ”Œ๋ ˆ์ด๊ทธ๋ผ์šด๋“œ: http://localhost:11235/playground์—์„œ ๋‚ด์žฅ ์›น ์ธํ„ฐํŽ˜์ด์Šค๋กœ ๊ตฌ์„ฑ ํ…Œ์ŠคํŠธ ๋ฐ API ์š”์ฒญ ์ƒ์„ฑ

  • ๐Ÿณ ๊ฐœ์„ ๋œ Docker ๋ฐฐํฌ: ํ–ฅ์ƒ๋œ ๋ฆฌ์†Œ์Šค ํšจ์œจ์„ฑ์„ ๊ฐ–์ถ˜ ๊ฐ„์†Œํ™”๋œ ๋‹ค์ค‘ ์•„ํ‚คํ…์ฒ˜ Docker ์ด๋ฏธ์ง€

  • ๐Ÿ“ฑ ๋‹ค๋‹จ๊ณ„ ๋นŒ๋“œ ์‹œ์Šคํ…œ: ํ”Œ๋žซํผ๋ณ„ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์œ„ํ•œ ์ตœ์ ํ™”๋œ Dockerfile

์ž์„ธํ•œ ๋‚ด์šฉ์€ 0.6.0 ๋ฆด๋ฆฌ์Šค ๋…ธํŠธ ๋˜๋Š” CHANGELOG์—์„œ ํ™•์ธํ•˜์„ธ์š”.

์ด์ „ ๋ฒ„์ „: 0.5.0 ์ฃผ์š” ๋ฆด๋ฆฌ์Šค ํ•˜์ด๋ผ์ดํŠธ

  • ๐Ÿš€ ๋”ฅ ํฌ๋กค๋ง ์‹œ์Šคํ…œ: BFS, DFS, BestFirst ์ „๋žต์œผ๋กœ ์ดˆ๊ธฐ URL์„ ๋„˜์–ด ์›น์‚ฌ์ดํŠธ ํƒ์ƒ‰
  • โšก ๋ฉ”๋ชจ๋ฆฌ ์ ์‘ํ˜• ๋””์ŠคํŒจ์ฒ˜: ์‹œ์Šคํ…œ ๋ฉ”๋ชจ๋ฆฌ์— ๊ธฐ๋ฐ˜ํ•ด ๋™์ ์œผ๋กœ ๋™์‹œ์„ฑ ์กฐ์ •
  • ๐Ÿ”„ ๋‹ค์ค‘ ํฌ๋กค๋ง ์ „๋žต: ๋ธŒ๋ผ์šฐ์ € ๊ธฐ๋ฐ˜ ๋ฐ ๊ฒฝ๋Ÿ‰ HTTP ์ „์šฉ ํฌ๋กค๋Ÿฌ ์ง€์›
  • ๐Ÿ’ป ๋ช…๋ น์ค„ ์ธํ„ฐํŽ˜์ด์Šค: ์ƒˆ๋กœ์šด crwl CLI๋กœ ํ„ฐ๋ฏธ๋„์—์„œ ํŽธ๋ฆฌํ•˜๊ฒŒ ์ ‘๊ทผ
  • ๐Ÿ‘ค ๋ธŒ๋ผ์šฐ์ € ํ”„๋กœํŒŒ์ผ๋Ÿฌ: ์ง€์†์ ์ธ ๋ธŒ๋ผ์šฐ์ € ํ”„๋กœํ•„ ์ƒ์„ฑ ๋ฐ ๊ด€๋ฆฌ
  • ๐Ÿง  Crawl4AI ์ฝ”๋”ฉ ์–ด์‹œ์Šคํ„ดํŠธ: AI ๊ธฐ๋ฐ˜ ์ฝ”๋”ฉ ์ง€์› ๋„๊ตฌ
  • ๐ŸŽ๏ธ LXML ์Šคํฌ๋ž˜ํ•‘ ๋ชจ๋“œ: lxml ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•œ ๋น ๋ฅธ HTML ํŒŒ์‹ฑ
  • ๐ŸŒ ํ”„๋ก์‹œ ๋กœํ…Œ์ด์…˜: ๋‚ด์žฅ ํ”„๋ก์‹œ ์ „ํ™˜ ์ง€์›
  • ๐Ÿค– LLM ์ฝ˜ํ…์ธ  ํ•„ํ„ฐ: LLM์„ ํ™œ์šฉํ•œ ์ง€๋Šฅํ˜• ๋งˆํฌ๋‹ค์šด ์ƒ์„ฑ
  • ๐Ÿ“„ PDF ์ฒ˜๋ฆฌ: PDF ํŒŒ์ผ์—์„œ ํ…์ŠคํŠธ, ์ด๋ฏธ์ง€, ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์ถ”์ถœ

์ž์„ธํ•œ ๋‚ด์šฉ์€ 0.5.0 ๋ฆด๋ฆฌ์Šค ๋…ธํŠธ์—์„œ ํ™•์ธํ•˜์„ธ์š”.

Crawl4AI์˜ ๋ฒ„์ „ ๋ฒˆํ˜ธ ์ฒด๊ณ„

Crawl4AI์€ ๊ฐ ๋ฆด๋ฆฌ์Šค์˜ ์•ˆ์ •์„ฑ๊ณผ ๊ธฐ๋Šฅ์„ ์ดํ•ดํ•˜๋Š” ๋ฐ ๋„์›€์„ ์ฃผ๊ธฐ ์œ„ํ•ด ํ‘œ์ค€ Python ๋ฒ„์ „ ๋ฒˆํ˜ธ ์ฒด๊ณ„(PEP 440)๋ฅผ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค.

๋ฒ„์ „ ๋ฒˆํ˜ธ ์„ค๋ช…

๋ฒ„์ „ ๋ฒˆํ˜ธ๋Š” MAJOR.MINOR.PATCH ํŒจํ„ด์„ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค (์˜ˆ: 0.4.3).

ํ”„๋ฆฌ๋ฆด๋ฆฌ์Šค ๋ฒ„์ „

๊ฐœ๋ฐœ ๋‹จ๊ณ„๋ฅผ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ์ ‘๋ฏธ์‚ฌ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค:

  • dev (0.4.3dev1): ๊ฐœ๋ฐœ ๋ฒ„์ „, ๋ถˆ์•ˆ์ •
  • a (0.4.3a1): ์•ŒํŒŒ ๋ฆด๋ฆฌ์Šค, ์‹คํ—˜์  ๊ธฐ๋Šฅ
  • b (0.4.3b1): ๋ฒ ํƒ€ ๋ฆด๋ฆฌ์Šค, ๊ธฐ๋Šฅ์€ ์™„์„ฑ๋˜์—ˆ์œผ๋‚˜ ํ…Œ์ŠคํŠธ ํ•„์š”
  • rc (0.4.3rc1): ๋ฆด๋ฆฌ์Šค ํ›„๋ณด, ์ตœ์ข… ๋ฒ„์ „์ด ๋  ์ˆ˜ ์žˆ์Œ
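์œ„ ์ ‘๋ฏธ์‚ฌ๋“ค์˜ ์ •๋ ฌ ์ˆœ์„œ(dev < a < b < rc < ์ •์‹ ๋ฆด๋ฆฌ์Šค)๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์Šค์ผ€์น˜ํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ณธ๋ฌธ์— ๋‚˜์˜จ ๋‹จ์ˆœํ™”๋œ ํ‘œ๊ธฐ(์˜ˆ: 0.4.3dev1)๋งŒ ์ฒ˜๋ฆฌํ•˜๋Š” ์„ค๋ช…์šฉ ํŒŒ์„œ์ด๋ฉฐ, ์ •์‹ PEP 440 ๊ตฌํ˜„์ด ์•„๋‹™๋‹ˆ๋‹ค:

```python
import re

# PEP 440 ํ”„๋ฆฌ๋ฆด๋ฆฌ์Šค ๋‹จ๊ณ„์˜ ์ •๋ ฌ ์ˆœ์„œ: dev < a < b < rc < ์ •์‹ ๋ฆด๋ฆฌ์Šค
STAGE_ORDER = {"dev": 0, "a": 1, "b": 2, "rc": 3, "final": 4}

def parse(version):
    # "0.4.3b1"์ฒ˜๋Ÿผ ๋‹จ์ˆœํ™”๋œ ํ˜•์‹๋งŒ ์ฒ˜๋ฆฌํ•˜๋Š” ์„ค๋ช…์šฉ ํŒŒ์„œ
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:(dev|a|b|rc)(\d+))?", version)
    major, minor, patch, stage, num = m.groups()
    return (int(major), int(minor), int(patch),
            STAGE_ORDER[stage] if stage else STAGE_ORDER["final"],
            int(num or 0))

versions = ["0.4.3b1", "0.4.3", "0.4.3dev1", "0.4.3rc1", "0.4.3a1"]
print(sorted(versions, key=parse))
# → ['0.4.3dev1', '0.4.3a1', '0.4.3b1', '0.4.3rc1', '0.4.3']
```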

์„ค์น˜ ๋ฐฉ๋ฒ•

  • ์•ˆ์ • ๋ฒ„์ „ ์„ค์น˜:

    pip install -U crawl4ai
    
  • ํ”„๋ฆฌ๋ฆด๋ฆฌ์Šค ๋ฒ„์ „ ์„ค์น˜:

    pip install crawl4ai --pre
    
  • ํŠน์ • ๋ฒ„์ „ ์„ค์น˜:

    pip install crawl4ai==0.4.3b1
    

ํ”„๋ฆฌ๋ฆด๋ฆฌ์Šค์˜ ๋ชฉ์ 

ํ”„๋ฆฌ๋ฆด๋ฆฌ์Šค๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค:

  • ์‹ค์ œ ํ™˜๊ฒฝ์—์„œ ์ƒˆ ๊ธฐ๋Šฅ ํ…Œ์ŠคํŠธ
  • ์ตœ์ข… ๋ฆด๋ฆฌ์Šค ์ „ ํ”ผ๋“œ๋ฐฑ ์ˆ˜์ง‘
  • ํ”„๋กœ๋•์…˜ ์‚ฌ์šฉ์ž๋ฅผ ์œ„ํ•œ ์•ˆ์ •์„ฑ ๋ณด์žฅ
  • ์ดˆ๊ธฐ ์‚ฌ์šฉ์ž๊ฐ€ ์ƒˆ ๊ธฐ๋Šฅ์„ ์‹œ๋„ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ์šฉ

ํ”„๋กœ๋•์…˜ ํ™˜๊ฒฝ์—์„œ๋Š” ์•ˆ์ • ๋ฒ„์ „ ์‚ฌ์šฉ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. ์ƒˆ ๊ธฐ๋Šฅ ํ…Œ์ŠคํŠธ๋ฅผ ์›ํ•  ๊ฒฝ์šฐ --pre ํ”Œ๋ž˜๊ทธ๋กœ ํ”„๋ฆฌ๋ฆด๋ฆฌ์Šค๋ฅผ ์„ ํƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ“– ๋ฌธ์„œ & ๋กœ๋“œ๋งต

๐Ÿšจ ๋ฌธ์„œ ์—…๋ฐ์ดํŠธ ์•Œ๋ฆผ: ์ตœ๊ทผ ์—…๋ฐ์ดํŠธ์™€ ๊ฐœ์„  ์‚ฌํ•ญ์„ ๋ฐ˜์˜ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์Œ ์ฃผ์— ๋Œ€๊ทœ๋ชจ ๋ฌธ์„œ ๊ฐœํŽธ์„ ์ง„ํ–‰ํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค. ๋ณด๋‹ค ํฌ๊ด„์ ์ด๊ณ  ์ตœ์‹  ์ •๋ณด๋ฅผ ์ œ๊ณตํ•  ์˜ˆ์ •์ด๋‹ˆ ๊ธฐ๋Œ€ํ•ด์ฃผ์„ธ์š”!

ํ˜„์žฌ ๋ฌธ์„œ(์„ค์น˜ ์ง€์นจ, ๊ณ ๊ธ‰ ๊ธฐ๋Šฅ, API ์ฐธ์กฐ ๋“ฑ)๋Š” ๋ฌธ์„œ ์›น์‚ฌ์ดํŠธ์—์„œ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฐœ๋ฐœ ๊ณ„ํš๊ณผ ์˜ˆ์ •๋œ ๊ธฐ๋Šฅ์€ ๋กœ๋“œ๋งต์—์„œ ํ™•์ธํ•˜์„ธ์š”.

๐Ÿ“ˆ ๊ฐœ๋ฐœ ์˜ˆ์ • ํ•ญ๋ชฉ
  • 0. ๊ทธ๋ž˜ํ”„ ํฌ๋กค๋Ÿฌ: ๊ทธ๋ž˜ํ”„ ํƒ์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•œ ์Šค๋งˆํŠธ ์›น์‚ฌ์ดํŠธ ์ˆœํšŒ ๋ฐ ์ค‘์ฒฉ ํŽ˜์ด์ง€ ์ถ”์ถœ
  • 1. ์งˆ๋ฌธ ๊ธฐ๋ฐ˜ ํฌ๋กค๋Ÿฌ: ์ž์—ฐ์–ด ๊ธฐ๋ฐ˜ ์›น ํƒ์ƒ‰ ๋ฐ ์ฝ˜ํ…์ธ  ์ถ”์ถœ
  • 2. ์ง€์‹ ์ตœ์ ํ™” ํฌ๋กค๋Ÿฌ: ๋ฐ์ดํ„ฐ ์ถ”์ถœ์„ ์ตœ์†Œํ™”ํ•˜๋ฉด์„œ ์ง€์‹์„ ๊ทน๋Œ€ํ™”ํ•˜๋Š” ์Šค๋งˆํŠธ ํฌ๋กค๋ง
  • 3. ์—์ด์ „ํŠธ ํฌ๋กค๋Ÿฌ: ๋ณต์žกํ•œ ๋‹ค๋‹จ๊ณ„ ํฌ๋กค๋ง ์ž‘์—…์„ ์œ„ํ•œ ์ž์œจ ์‹œ์Šคํ…œ
  • 4. ์ž๋™ ์Šคํ‚ค๋งˆ ์ƒ์„ฑ๊ธฐ: ์ž์—ฐ์–ด๋ฅผ ์ถ”์ถœ ์Šคํ‚ค๋งˆ๋กœ ๋ณ€ํ™˜
  • 5. ๋„๋ฉ”์ธ ํŠนํ™” ์Šคํฌ๋ž˜ํผ: ์ผ๋ฐ˜ ํ”Œ๋žซํผ(ํ•™์ˆ , ์ „์ž์ƒ๊ฑฐ๋ž˜)์„ ์œ„ํ•œ ์‚ฌ์ „ ๊ตฌ์„ฑ ์ถ”์ถœ๊ธฐ
  • 6. ์›น ์ž„๋ฒ ๋”ฉ ์ธ๋ฑ์Šค: ํฌ๋กค๋ง๋œ ์ฝ˜ํ…์ธ ์˜ ์˜๋ฏธ๋ก ์  ๊ฒ€์ƒ‰ ์ธํ”„๋ผ
  • 7. ์ธํ„ฐ๋ž™ํ‹ฐ๋ธŒ ํ”Œ๋ ˆ์ด๊ทธ๋ผ์šด๋“œ: AI ์ง€์›์œผ๋กœ ์ „๋žต ํ…Œ์ŠคํŠธ ๋ฐ ๋น„๊ต๋ฅผ ์œ„ํ•œ ์›น UI
  • 8. ์„ฑ๋Šฅ ๋ชจ๋‹ˆํ„ฐ: ํฌ๋กค๋Ÿฌ ์šด์˜์— ๋Œ€ํ•œ ์‹ค์‹œ๊ฐ„ ์ธ์‚ฌ์ดํŠธ
  • 9. ํด๋ผ์šฐ๋“œ ํ†ตํ•ฉ: ํด๋ผ์šฐ๋“œ ์ œ๊ณต์—…์ฒด ๊ฐ„ ์›ํด๋ฆญ ๋ฐฐํฌ ์†”๋ฃจ์…˜
  • 10. ์Šคํฐ์„œ์‹ญ ํ”„๋กœ๊ทธ๋žจ: ๊ณ„์ธต๋ณ„ ํ˜œํƒ์ด ์žˆ๋Š” ๊ตฌ์กฐํ™”๋œ ์ง€์› ์‹œ์Šคํ…œ
  • 11. ๊ต์œก ์ฝ˜ํ…์ธ : "ํฌ๋กค๋ง ๋ฐฉ๋ฒ•" ๋น„๋””์˜ค ์‹œ๋ฆฌ์ฆˆ ๋ฐ ์ธํ„ฐ๋ž™ํ‹ฐ๋ธŒ ํŠœํ† ๋ฆฌ์–ผ

๐Ÿค ๊ธฐ์—ฌ

์˜คํ”ˆ์†Œ์Šค ์ปค๋ฎค๋‹ˆํ‹ฐ์˜ ๊ธฐ์—ฌ๋ฅผ ํ™˜์˜ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๊ธฐ์—ฌ ๊ฐ€์ด๋“œ๋ผ์ธ์„ ์ฐธ์กฐํ•˜์„ธ์š”.

๋ผ์ด์„ ์Šค ์„น์…˜์„ ๋ฐฐ์ง€์™€ ํ•จ๊ป˜ ์ˆ˜์ •ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ํ•˜ํ”„ํ†ค ํšจ๊ณผ๋ฅผ ์ ์šฉํ•œ ๋ฒ„์ „์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

์—…๋ฐ์ดํŠธ๋œ ๋ผ์ด์„ ์Šค ์„น์…˜:

๐Ÿ“„ ๋ผ์ด์„ ์Šค & ์ €์ž‘์ž ํ‘œ์‹œ

์ด ํ”„๋กœ์ ํŠธ๋Š” ํ•„์ˆ˜ ์ €์ž‘์ž ํ‘œ์‹œ ์กฐํ•ญ์ด ํฌํ•จ๋œ Apache License 2.0์œผ๋กœ ๋ผ์ด์„ ์Šค๊ฐ€ ๋ถ€์—ฌ๋ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ Apache 2.0 ๋ผ์ด์„ ์Šค ํŒŒ์ผ์„ ์ฐธ์กฐํ•˜์„ธ์š”.

์ €์ž‘์ž ํ‘œ์‹œ ์š”๊ตฌ ์‚ฌํ•ญ

Crawl4AI์„ ์‚ฌ์šฉํ•  ๋•Œ ๋‹ค์Œ ์ค‘ ํ•˜๋‚˜์˜ ์ €์ž‘์ž ํ‘œ์‹œ ๋ฐฉ๋ฒ•์„ ํฌํ•จํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:

1. ๋ฐฐ์ง€ ํ‘œ์‹œ (๊ถŒ์žฅ)

README, ๋ฌธ์„œ ๋˜๋Š” ์›น์‚ฌ์ดํŠธ์— ๋‹ค์Œ ๋ฐฐ์ง€ ์ค‘ ํ•˜๋‚˜๋ฅผ ์ถ”๊ฐ€ํ•˜์„ธ์š”:

ํ…Œ๋งˆ๋ณ„ ๋ฐฐ์ง€:

  • ๋””์Šค์ฝ” ํ…Œ๋งˆ (์• ๋‹ˆ๋ฉ”์ด์…˜): Powered by Crawl4AI
  • ๋‚˜์ดํŠธ ํ…Œ๋งˆ (๋„ค์˜จ ํšจ๊ณผ ์žˆ๋Š” ๋‹คํฌ): Powered by Crawl4AI
  • ๋‹คํฌ ํ…Œ๋งˆ (ํด๋ž˜์‹): Powered by Crawl4AI
  • ๋ผ์ดํŠธ ํ…Œ๋งˆ (ํด๋ž˜์‹): Powered by Crawl4AI

๋ฐฐ์ง€ ์ถ”๊ฐ€ HTML ์ฝ”๋“œ:

<!-- Disco Theme (Animated) -->
<a href="https://github.com/unclecode/crawl4ai">
  <img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-disco.svg" alt="Powered by Crawl4AI" width="200"/>
</a>

<!-- Night Theme (Dark with Neon) -->
<a href="https://github.com/unclecode/crawl4ai">
  <img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-night.svg" alt="Powered by Crawl4AI" width="200"/>
</a>

<!-- Dark Theme (Classic) -->
<a href="https://github.com/unclecode/crawl4ai">
  <img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-dark.svg" alt="Powered by Crawl4AI" width="200"/>
</a>

<!-- Light Theme (Classic) -->
<a href="https://github.com/unclecode/crawl4ai">
  <img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-light.svg" alt="Powered by Crawl4AI" width="200"/>
</a>

<!-- Simple Shield Badge -->
<a href="https://github.com/unclecode/crawl4ai">
  <img src="https://img.shields.io/badge/Powered%20by-Crawl4AI-blue?style=flat-square" alt="Powered by Crawl4AI"/>
</a>

2. ํ…์ŠคํŠธ ํ‘œ์‹œ

๋ฌธ์„œ์— ๋‹ค์Œ ์ค„์„ ์ถ”๊ฐ€ํ•˜์„ธ์š”:

This project uses Crawl4AI (https://github.com/unclecode/crawl4ai) for web data extraction.

๐Ÿ“š ์ธ์šฉ

์—ฐ๊ตฌ๋‚˜ ํ”„๋กœ์ ํŠธ์—์„œ Crawl4AI๋ฅผ ์‚ฌ์šฉํ•˜์…จ๋‹ค๋ฉด ๋‹ค์Œ์„ ์ธ์šฉํ•ด ์ฃผ์„ธ์š”:

@software{crawl4ai2024,
  author = {UncleCode},
  title = {Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/unclecode/crawl4ai}},
  commit = {Please use the commit hash you're working with}
}

ํ…์ŠคํŠธ ์ธ์šฉ ํ˜•์‹:

UncleCode. (2024). Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper [Computer software]. 
GitHub. https://github.com/unclecode/crawl4ai

๐Ÿ“ง ์—ฐ๋ฝ์ฒ˜

์งˆ๋ฌธ, ์ œ์•ˆ ๋˜๋Š” ํ”ผ๋“œ๋ฐฑ์ด ์žˆ์œผ์‹œ๋ฉด ์–ธ์ œ๋“ ์ง€ ์—ฐ๋ฝ์ฃผ์„ธ์š”:

ํ–‰๋ณตํ•œ ํฌ๋กค๋ง ๋˜์„ธ์š”! ๐Ÿ•ธ๏ธ๐Ÿš€

๐Ÿ—พ ๋ฏธ์…˜

์šฐ๋ฆฌ์˜ ๋ฏธ์…˜์€ ๊ฐœ์ธ ๋ฐ ๊ธฐ์—… ๋ฐ์ดํ„ฐ์˜ ๊ฐ€์น˜๋ฅผ ํ•ด์ œํ•˜์—ฌ ๋””์ง€ํ„ธ ํ”์ ์„ ๊ตฌ์กฐํ™”๋œ ๊ฑฐ๋ž˜ ๊ฐ€๋Šฅํ•œ ์ž์‚ฐ์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. Crawl4AI์€ ๊ฐœ์ธ๊ณผ ์กฐ์ง์— ์˜คํ”ˆ์†Œ์Šค ๋„๊ตฌ๋ฅผ ์ œ๊ณตํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•˜๊ณ  ๊ตฌ์กฐํ™”ํ•จ์œผ๋กœ์จ ๊ณต์œ  ๋ฐ์ดํ„ฐ ๊ฒฝ์ œ๋ฅผ ์กฐ์„ฑํ•ฉ๋‹ˆ๋‹ค.

์šฐ๋ฆฌ๋Š” AI๊ฐ€ ์‹ค์ œ ์ธ๊ฐ„ ์ง€์‹์œผ๋กœ ๊ตฌ๋™๋˜๋Š” ๋ฏธ๋ž˜๋ฅผ ์ƒ์ƒํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ ์ฐฝ์ž‘์ž๊ฐ€ ์ž์‹ ์˜ ๊ธฐ์—ฌ๋กœ ์ง์ ‘ ํ˜œํƒ์„ ๋ฐ›์„ ์ˆ˜ ์žˆ๋„๋ก ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ๋ฏผ์ฃผํ™”์™€ ์œค๋ฆฌ์  ๊ณต์œ ๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•จ์œผ๋กœ์จ ์ง„์ •ํ•œ AI ๋ฐœ์ „์˜ ๊ธฐ๋ฐ˜์„ ๋งˆ๋ จํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ”‘ ์ฃผ์š” ๊ธฐํšŒ
  • ๋ฐ์ดํ„ฐ ์ž๋ณธํ™”: ๋””์ง€ํ„ธ ํ”์ ์„ ์ธก์ • ๊ฐ€๋Šฅํ•˜๊ณ  ๊ฐ€์น˜ ์žˆ๋Š” ์ž์‚ฐ์œผ๋กœ ๋ณ€ํ™˜
  • ์ง„์ •์„ฑ ์žˆ๋Š” AI ๋ฐ์ดํ„ฐ: AI ์‹œ์Šคํ…œ์— ์‹ค์ œ ์ธ๊ฐ„ ํ†ต์ฐฐ๋ ฅ ์ œ๊ณต
  • ๊ณต์œ  ๊ฒฝ์ œ: ๋ฐ์ดํ„ฐ ์ฐฝ์ž‘์ž์—๊ฒŒ ํ˜œํƒ์ด ๋Œ์•„๊ฐ€๋Š” ๊ณต์ •ํ•œ ๋ฐ์ดํ„ฐ ์‹œ์žฅ ์ฐฝ์ถœ
๐Ÿš€ ๊ฐœ๋ฐœ ๊ฒฝ๋กœ
  1. ์˜คํ”ˆ์†Œ์Šค ๋„๊ตฌ: ํˆฌ๋ช…ํ•œ ๋ฐ์ดํ„ฐ ์ถ”์ถœ์„ ์œ„ํ•œ ์ปค๋ฎค๋‹ˆํ‹ฐ ์ฃผ๋„ ํ”Œ๋žซํผ
  2. ๋””์ง€ํ„ธ ์ž์‚ฐ ๊ตฌ์กฐํ™”: ๋””์ง€ํ„ธ ์ง€์‹์„ ์กฐ์งํ•˜๊ณ  ๊ฐ€์น˜๋ฅผ ๋ถ€์—ฌํ•˜๋Š” ๋„๊ตฌ
  3. ์œค๋ฆฌ์  ๋ฐ์ดํ„ฐ ์‹œ์žฅ: ๊ตฌ์กฐํ™”๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ตํ™˜ํ•˜๊ธฐ ์œ„ํ•œ ์•ˆ์ „ํ•˜๊ณ  ๊ณต์ •ํ•œ ํ”Œ๋žซํผ

์ž์„ธํ•œ ๋‚ด์šฉ์€ ์ „์ฒด ๋ฏธ์…˜ ์„ค๋ช…์„ ์ฐธ์กฐํ•˜์„ธ์š”.

์Šคํƒ€ ํžˆ์Šคํ† ๋ฆฌ

Star History Chart