Crawl4AI is the #1 trending repository on GitHub, actively maintained by a vibrant community. It delivers blazing-fast web crawling optimized for LLMs, AI agents, and data pipelines. Open source, highly flexible, and built for real-time performance, it gives developers unmatched speed, precision, and ease of deployment.
✨ Check out the latest update: v0.6.0
🎉 Version 0.6.0 is now available! This release candidate introduces World-aware Crawling with geolocation and locale settings, Table-to-DataFrame extraction, browser pooling with pre-warming, network and console traffic capture, MCP integration for AI tools, and a completely revamped Docker deployment! Read the release notes →
My journey with computers began in childhood, when my father, a computer scientist, introduced me to an Amstrad computer. Those early days sparked my fascination with technology, leading me to study computer science and specialize in NLP in graduate school. It was then that I first dove into web crawling, building tools to help researchers organize papers and extract information from publications — a challenging yet rewarding experience that honed my data-extraction skills.
In 2023, while developing a tool for a project, I needed a crawler that could convert web pages into markdown. Searching for solutions, I found one that claimed to be open source but required creating an account and generating an API token. Worse, it turned out to be a SaaS model charging $16, and its quality didn't meet my standards. Frustration turned into fuel: I decided to build my own solution, and in just a few days I created Crawl4AI. To my surprise, it went viral, earning thousands of GitHub stars and resonating with a global community.
I made Crawl4AI open source for two reasons. The first is to give back to the open-source community that has supported me throughout my career. The second is my belief that data should be accessible to everyone — not locked behind paywalls or monopolized by a few. Open access to data lays the groundwork for democratizing AI, where individuals can train their own models and take ownership of their information. This library is the first step in a larger journey to build the best open-source data extraction and generation tool ever made, built collaboratively by a passionate community.
Thank you to everyone who has supported this project, used it, and shared feedback. Your encouragement motivates me to dream even bigger. Join us, file issues, submit PRs, or spread the word. Together, we can build a tool that truly empowers people to access their own data and reshape the future of AI.
# Install the package
pip install -U crawl4ai
# For pre-release versions
pip install crawl4ai --pre
# Run post-installation setup
crawl4ai-setup
# Verify your installation
crawl4ai-doctor
If you encounter browser-related issues, you can install them manually:
python -m playwright install --with-deps chromium
import asyncio
from crawl4ai import *

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
        )
        print(result.markdown)

if __name__ == "__main__":
    asyncio.run(main())
# Basic crawl with markdown output
crwl https://www.nbcnews.com/business -o markdown
# Deep crawl with BFS strategy, max 10 pages
crwl https://docs.crawl4ai.com --deep-crawl bfs --max-pages 10
# Use LLM extraction with a specific question
crwl https://www.example.com/products -q "Extract all product prices"
Crawl4AI also extracts responsive image formats such as srcset and picture, and can process raw HTML (raw:) or local files (file://) directly.
✨ Visit our documentation website
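As a quick illustration, here is a minimal sketch of crawling inline HTML through the raw: prefix (the sample HTML string is invented for this example; file:// URLs work the same way):

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Inline HTML via the raw: prefix -- no network request is made
    raw_url = "raw:<html><body><h1>Sample</h1><p>Inline content.</p></body></html>"
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=raw_url)
        print(result.markdown)
        # A local file can be crawled the same way:
        # result = await crawler.arun(url="file:///path/to/page.html")

if __name__ == "__main__":
    asyncio.run(main())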
Crawl4AI offers flexible installation options to suit a variety of use cases. You can install it as a Python package or use Docker.
Choose the installation option that best fits your needs:
For basic web crawling and scraping tasks:
pip install crawl4ai
crawl4ai-setup # Setup the browser
By default, this installs the asynchronous version of Crawl4AI, which uses Playwright for web crawling.
👉 Note: When you install Crawl4AI, crawl4ai-setup should automatically install and set up Playwright. However, if you encounter any Playwright-related errors, you can install it manually using one of these methods:
Through the command line:
playwright install
If the above doesn't work, try this more specific command:
python -m playwright install chromium
This second method has proven more reliable in some cases.
The synchronous version is deprecated and will be removed in a future release. If you need the synchronous version, which uses Selenium:
pip install crawl4ai[sync]
For contributors who plan to modify the source code:
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
pip install -e . # Basic installation in editable mode
Install optional features:
pip install -e ".[torch]" # With PyTorch features
pip install -e ".[transformer]" # With Transformer features
pip install -e ".[cosine]" # With cosine similarity features
pip install -e ".[sync]" # With synchronous crawling (Selenium)
pip install -e ".[all]" # Install all optional features
🚀 Now available! A completely redesigned Docker implementation is here! This new solution makes deployment more efficient and seamless than ever.
The new Docker implementation brings multi-architecture support and improved resource efficiency. To get started:
# Pull and run the latest release candidate
docker pull unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number
docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number
# Visit the playground at http://localhost:11235/playground
For complete documentation, see the Docker Deployment Guide.
Run a quick test (works with both Docker options):
import requests
import time
# Submit a crawl job
response = requests.post(
    "http://localhost:11235/crawl",
    json={"urls": "https://example.com", "priority": 10}
)
task_id = response.json()["task_id"]
# Poll until the task completes (status == "completed")
result = requests.get(f"http://localhost:11235/task/{task_id}")
while result.json()["status"] != "completed":
    time.sleep(1)
    result = requests.get(f"http://localhost:11235/task/{task_id}")
print(result.json())
For more examples, see the Docker Examples. For advanced configuration, environment variables, and usage examples, see the Docker Deployment Guide.
You can browse the project's examples in the https://github.com/unclecode/crawl4ai/docs/examples directory, which contains a wide variety of samples. Here are some popular ones.
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.content_filter_strategy import PruningContentFilter, BM25ContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

async def main():
    browser_config = BrowserConfig(
        headless=True,
        verbose=True,
    )
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.ENABLED,
        markdown_generator=DefaultMarkdownGenerator(
            content_filter=PruningContentFilter(threshold=0.48, threshold_type="fixed", min_word_threshold=0)
        ),
        # markdown_generator=DefaultMarkdownGenerator(
        #     content_filter=BM25ContentFilter(user_query="WHEN_WE_FOCUS_BASED_ON_A_USER_QUERY", bm25_threshold=1.0)
        # ),
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://docs.micronaut.io/4.7.6/guide/",
            config=run_config
        )
        print(len(result.markdown.raw_markdown))
        print(len(result.markdown.fit_markdown))

if __name__ == "__main__":
    asyncio.run(main())
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
import json

async def main():
    schema = {
        "name": "KidoCode Courses",
        "baseSelector": "section.charge-methodology .w-tab-content > div",
        "fields": [
            {
                "name": "section_title",
                "selector": "h3.heading-50",
                "type": "text",
            },
            {
                "name": "section_description",
                "selector": ".charge-content",
                "type": "text",
            },
            {
                "name": "course_name",
                "selector": ".text-block-93",
                "type": "text",
            },
            {
                "name": "course_description",
                "selector": ".course-content-text",
                "type": "text",
            },
            {
                "name": "course_icon",
                "selector": ".image-92",
                "type": "attribute",
                "attribute": "src"
            }
        ]
    }

    extraction_strategy = JsonCssExtractionStrategy(schema, verbose=True)

    browser_config = BrowserConfig(
        headless=False,
        verbose=True
    )
    run_config = CrawlerRunConfig(
        extraction_strategy=extraction_strategy,
        js_code=["""(async () => {const tabs = document.querySelectorAll("section.charge-methodology .tabs-menu-3 > div");for(let tab of tabs) {tab.scrollIntoView();tab.click();await new Promise(r => setTimeout(r, 500));}})();"""],
        cache_mode=CacheMode.BYPASS
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://www.kidocode.com/degrees/technology",
            config=run_config
        )
        companies = json.loads(result.extracted_content)
        print(f"Successfully extracted {len(companies)} companies")
        print(json.dumps(companies[0], indent=2))

if __name__ == "__main__":
    asyncio.run(main())
import os
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from pydantic import BaseModel, Field

class OpenAIModelFee(BaseModel):
    model_name: str = Field(..., description="Name of the OpenAI model.")
    input_fee: str = Field(..., description="Fee for input token for the OpenAI model.")
    output_fee: str = Field(..., description="Fee for output token for the OpenAI model.")

async def main():
    browser_config = BrowserConfig(verbose=True)
    run_config = CrawlerRunConfig(
        word_count_threshold=1,
        extraction_strategy=LLMExtractionStrategy(
            # Here you can use any provider that Litellm library supports, for instance: ollama/qwen2
            # provider="ollama/qwen2", api_token="no-token",
            llm_config=LLMConfig(provider="openai/gpt-4o", api_token=os.getenv('OPENAI_API_KEY')),
            schema=OpenAIModelFee.schema(),
            extraction_type="schema",
            instruction="""From the crawled content, extract all mentioned model names along with their fees for input and output tokens.
            Do not miss any models in the entire content. One extracted model JSON format should look like this:
            {"model_name": "GPT-4", "input_fee": "US$10.00 / 1M tokens", "output_fee": "US$30.00 / 1M tokens"}."""
        ),
        cache_mode=CacheMode.BYPASS,
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url='https://openai.com/api/pricing/',
            config=run_config
        )
        print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())
import os, sys
from pathlib import Path
import asyncio, time
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

async def test_news_crawl():
    # Create a persistent user data directory
    user_data_dir = os.path.join(Path.home(), ".crawl4ai", "browser_profile")
    os.makedirs(user_data_dir, exist_ok=True)

    browser_config = BrowserConfig(
        verbose=True,
        headless=True,
        user_data_dir=user_data_dir,
        use_persistent_context=True,
    )
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        url = "ADDRESS_OF_A_CHALLENGING_WEBSITE"
        result = await crawler.arun(
            url,
            config=run_config,
            magic=True,
        )
        print(f"Successfully crawled {url}")
        print(f"Content length: {len(result.markdown)}")
🌎 World-aware Crawling: Set geolocation, language, and timezone for authentic locale-specific content:
# Import path for GeolocationConfig assumed from the top-level package
from crawl4ai import CrawlerRunConfig, GeolocationConfig

crun_cfg = CrawlerRunConfig(
    url="https://browserleaks.com/geo",  # test page that shows your location
    locale="en-US",                      # Accept-Language & UI locale
    timezone_id="America/Los_Angeles",   # JS Date()/Intl timezone
    geolocation=GeolocationConfig(       # override GPS coordinates
        latitude=34.0522,
        longitude=-118.2437,
        accuracy=10.0,
    )
)
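A minimal usage sketch for the configuration above (it assumes the crun_cfg object from the previous snippet; the test URL is the one from that example):

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        # crun_cfg (defined above) carries the locale, timezone, and GPS override
        result = await crawler.arun(url="https://browserleaks.com/geo", config=crun_cfg)
        print(result.markdown[:300])  # the page should now report the spoofed locale/location

asyncio.run(main())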
📊 Table-to-DataFrame Extraction: Extract HTML tables directly to CSV or pandas DataFrames:
import asyncio
from typing import List
import pandas as pd
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CrawlResult

async def main():
    browser_config = BrowserConfig(headless=True)
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()
    try:
        # Set up scraping parameters
        crawl_config = CrawlerRunConfig(
            table_score_threshold=8,  # Strict table detection
        )

        # Execute market data extraction
        results: List[CrawlResult] = await crawler.arun(
            url="https://coinmarketcap.com/?page=1", config=crawl_config
        )

        # Process results
        raw_df = pd.DataFrame()
        for result in results:
            if result.success and result.media["tables"]:
                raw_df = pd.DataFrame(
                    result.media["tables"][0]["rows"],
                    columns=result.media["tables"][0]["headers"],
                )
                break
        print(raw_df.head())
    finally:
        await crawler.stop()

asyncio.run(main())
🚀 Browser Pooling: Pages launch hot, with pre-warmed browser instances for lower latency and memory usage
🕸️ Network and Console Capture: Full traffic logs and MHTML snapshots for debugging:
crawler_config = CrawlerRunConfig(
    capture_network=True,
    capture_console=True,
    mhtml=True
)
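A sketch of reading the captured data back, reusing crawler_config from the snippet above and assuming the result object exposes network_requests, console_messages, and mhtml as described in the 0.6.0 release notes:

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=crawler_config)
        # Attribute names assumed per the 0.6.0 release notes
        print(len(result.network_requests or []), "network events captured")
        print(len(result.console_messages or []), "console messages captured")
        if result.mhtml:
            with open("snapshot.mhtml", "w", encoding="utf-8") as f:
                f.write(result.mhtml)

asyncio.run(main())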
🔌 MCP Integration: Connect to AI tools like Claude Code through the Model Context Protocol
# Add Crawl4AI to Claude Code
claude mcp add --transport sse c4ai-sse http://localhost:11235/mcp/sse
🖥️ Interactive Playground: Test configurations and generate API requests with the built-in web interface at http://localhost:11235/playground
🐳 Revamped Docker Deployment: Streamlined multi-architecture Docker image with improved resource efficiency
📱 Multi-stage Build System: Optimized Dockerfile with platform-specific performance enhancements
See the 0.6.0 Release Notes or the CHANGELOG for full details.
Highlights from 0.5.0 include the crwl command-line interface for terminal access and faster HTML parsing powered by the lxml library. See the 0.5.0 Release Notes for details.
Crawl4AI follows standard Python version numbering (PEP 440) to clearly communicate the stability and features of each release.
Version numbers follow the MAJOR.MINOR.PATCH format (e.g., 0.4.3).
Suffixes indicate the development stage:
dev (0.4.3dev1): development releases, unstable
a (0.4.3a1): alpha releases, experimental features
b (0.4.3b1): beta releases, feature complete but in need of testing
rc (0.4.3rc1): release candidates, potential final versions

Install the stable version:
pip install -U crawl4ai
Install pre-release versions:
pip install crawl4ai --pre
Install a specific version:
pip install crawl4ai==0.4.3b1
Why we offer pre-release versions:
We recommend the stable version for production environments. To test new features, you can opt into pre-releases with the --pre flag.
🚨 Documentation Update Alert: We're undertaking a major documentation overhaul next week to reflect the latest updates and improvements. Stay tuned for a more comprehensive, up-to-date guide!
For the current documentation, including installation instructions, advanced features, and the API reference, visit our Documentation Website.
To check our development plans and upcoming features, visit our Roadmap.
We welcome contributions from the open-source community. Please check our contribution guidelines.
This project is licensed under the Apache License 2.0 with a required attribution clause. See the Apache 2.0 License file for details.
Crawl4AIäœ¿çšæã«ã¯ä»¥äžã®ããããã®åž°å±è¡šç€ºæ¹æ³ãå¿ èŠã§ã:
Add one of these badges to your README, documentation, or website:
Available badge themes: Disco (animated), Night (dark with neon), Dark (classic), and Light (classic). The corresponding badge images appear in the HTML snippets below.
HTML code for adding the badges:
<!-- Disco Theme (Animated) -->
<a href="https://github.com/unclecode/crawl4ai">
<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-disco.svg" alt="Powered by Crawl4AI" width="200"/>
</a>
<!-- Night Theme (Dark with Neon) -->
<a href="https://github.com/unclecode/crawl4ai">
<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-night.svg" alt="Powered by Crawl4AI" width="200"/>
</a>
<!-- Dark Theme (Classic) -->
<a href="https://github.com/unclecode/crawl4ai">
<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-dark.svg" alt="Powered by Crawl4AI" width="200"/>
</a>
<!-- Light Theme (Classic) -->
<a href="https://github.com/unclecode/crawl4ai">
<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-light.svg" alt="Powered by Crawl4AI" width="200"/>
</a>
<!-- Simple Shield Badge -->
<a href="https://github.com/unclecode/crawl4ai">
<img src="https://img.shields.io/badge/Powered%20by-Crawl4AI-blue?style=flat-square" alt="Powered by Crawl4AI"/>
</a>
Add this line to your documentation:
This project uses Crawl4AI (https://github.com/unclecode/crawl4ai) for web data extraction.
If you use Crawl4AI in your research or project, please cite it as follows:
@software{crawl4ai2024,
  author = {UncleCode},
  title = {Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/unclecode/crawl4ai}},
  commit = {Please use the commit hash you're working with}
}
Text citation format:
UncleCode. (2024). Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper [Computer software].
GitHub. https://github.com/unclecode/crawl4ai
For questions, suggestions, or feedback, feel free to reach out:
Happy Crawling! 🕸️🚀
Our mission is to unlock the value of personal and enterprise data by transforming digital footprints into structured, tradeable assets. Crawl4AI makes data extraction and structuring possible with open-source tools, fostering a shared data economy.
We envision an AI future powered by real human knowledge. Through data democratization and ethical sharing, we are laying the groundwork for genuine AI progress.
For more details, see our full mission statement.