Official Python SDK for the ScrapeGraph AI API - Intelligent web scraping and search powered by AI. Extract structured data from any webpage or perform AI-powered web searches with natural language prompts.
Get your API key!

- SmartScraper: Extract structured data from webpages using natural language prompts
- SearchScraper: AI-powered web search with structured results and reference URLs
- Markdownify: Convert any webpage into clean, formatted markdown
- SmartCrawler: Intelligently crawl and extract data from multiple pages
- AgenticScraper: Perform automated browser actions with AI-powered session management
- Scrape: Convert webpages to HTML with JavaScript rendering and custom headers
- Scheduled Jobs: Create and manage automated scraping workflows with cron scheduling
- Credits Management: Monitor API usage and credit balance
- Feedback System: Provide ratings and feedback to improve service quality
ScrapeGraphAI offers seamless integration with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python, using LLM frameworks, or working with no-code platforms, our integration options have you covered.
You can find more information at the following links.
Integrations:
- API: Documentation
- SDK: Python
- LLM Frameworks: Langchain, Llama Index, Crew.ai, CamelAI
- Low-code Frameworks: Pipedream, Bubble, Zapier, n8n, LangFlow
- MCP server: Link
```bash
pip install scrapegraph-py
```

- AI-Powered Extraction & Search: Use natural language to extract data or search the web
- Structured Output: Get clean, structured data with optional schema validation
- Multiple Formats: Extract data as JSON, Markdown, or custom schemas
- High Performance: Concurrent processing and automatic retries
- Enterprise Ready: Production-grade security and rate limiting
Use AI to extract structured data from any webpage or HTML content with natural language prompts.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Extract data from a webpage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading, description, and summary of the webpage",
)

print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")
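
# Optionally, validate the result against a Pydantic model. This is a
# sketch of the SDK's schema-validation support; the WebpageInfo model
# below is illustrative, not part of the API.
from pydantic import BaseModel

class WebpageInfo(BaseModel):
    heading: str
    description: str

validated = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the heading and description of the webpage",
    output_schema=WebpageInfo,
)
print(f"Validated result: {validated['result']}")
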
client.close()
```

Perform AI-powered web searches with structured results and reference URLs.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Perform AI-powered web search
response = client.searchscraper(
    user_prompt="What is the latest version of Python and what are its main features?",
    num_results=3,  # Number of websites to search (default: 3)
)

print(f"Result: {response['result']}")

print("\nReference URLs:")
for url in response["reference_urls"]:
    print(f"- {url}")

client.close()
```

Convert any webpage into clean, formatted markdown.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Convert webpage to markdown
response = client.markdownify(
    website_url="https://example.com",
)

print(f"Request ID: {response['request_id']}")
print(f"Markdown: {response['result']}")

client.close()
```

Intelligently crawl and extract data from multiple pages with configurable depth and batch processing.
Example Usage:
```python
from scrapegraph_py import Client
import os
import time
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Start crawl job
crawl_response = client.crawl(
    url="https://example.com",
    prompt="Extract page titles and main headings",
    data_schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "headings": {"type": "array", "items": {"type": "string"}},
        },
    },
    depth=2,
    max_pages=5,
    same_domain_only=True,
)

crawl_id = crawl_response.get("id") or crawl_response.get("task_id")

# Poll for results
if crawl_id:
    for _ in range(10):
        time.sleep(5)
        result = client.get_crawl(crawl_id)
        if result.get("status") == "success":
            print("Crawl completed:", result["result"]["llm_result"])
            break

client.close()
```

Perform automated browser actions on webpages using AI-powered agentic scraping with session management.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Perform automated browser actions
response = client.agenticscraper(
    url="https://example.com",
    use_session=True,
    steps=[
        "Type email@gmail.com in email input box",
        "Type password123 in password input box",
        "Click on login",
    ],
    ai_extraction=False,  # Set to True for AI extraction
)

print(f"Request ID: {response['request_id']}")
print(f"Status: {response.get('status')}")

# Get results
result = client.get_agenticscraper(response['request_id'])
print(f"Result: {result.get('result')}")

client.close()
```

Convert webpages into HTML format with optional JavaScript rendering and custom headers.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Get HTML content from webpage
response = client.scrape(
    website_url="https://example.com",
    render_heavy_js=False,  # Set to True for JavaScript-heavy sites
)

print(f"Request ID: {response['request_id']}")
print(f"HTML length: {len(response.get('html', ''))} characters")

client.close()
```

Create, manage, and monitor scheduled scraping jobs with cron expressions and execution history.
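Example Usage (a sketch: the method and parameter names below, such as `create_scheduled_job`, `cron_expression`, and `job_config`, are assumptions; check the SDK reference for the exact scheduled-jobs API):

```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Create a job that runs a SmartScraper request every day at 09:00
# (cron_expression uses standard five-field cron syntax)
job = client.create_scheduled_job(  # assumed method name
    job_name="Daily example scrape",
    service_type="smartscraper",
    cron_expression="0 9 * * *",
    job_config={
        "website_url": "https://example.com",
        "user_prompt": "Extract the main heading",
    },
)
print(f"Created job: {job}")

client.close()
```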
Check your API credit balance and usage.
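Example Usage (a minimal sketch using the SDK's `get_credits` helper; the response is printed raw since the exact fields may vary):

```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Check remaining credits and usage
credits = client.get_credits()
print(f"Credits: {credits}")

client.close()
```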
Send feedback and ratings for scraping requests to help improve the service.
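Example Usage (a sketch assuming the SDK's `submit_feedback` helper; the request ID comes from a previous scraping response):

```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Rate a previous request by its request ID
client.submit_feedback(
    request_id="your-request-id",  # from an earlier response
    rating=5,
    feedback_text="Great results!",
)

client.close()
```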
- Natural Language Queries: No complex selectors or XPath needed
- Precise Extraction: AI understands context and structure
- Adaptive Processing: Works with both web content and direct HTML
- Schema Validation: Ensure data consistency with Pydantic
- Async Support: Handle multiple requests efficiently (see the sketch after this list)
- Source Attribution: Get reference URLs for search results
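For concurrent workloads, here is a minimal async sketch, assuming `AsyncClient` mirrors the synchronous `Client` and can be used as an async context manager:

```python
import asyncio
import os

from dotenv import load_dotenv
from scrapegraph_py import AsyncClient

load_dotenv()

async def main():
    async with AsyncClient(api_key=os.getenv("SGAI_API_KEY")) as client:
        # Fire several requests concurrently instead of one at a time
        tasks = [
            client.smartscraper(website_url=url, user_prompt="Extract the main heading")
            for url in ["https://example.com", "https://example.org"]
        ]
        for response in await asyncio.gather(*tasks):
            print(response["result"])

asyncio.run(main())
```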
- Business Intelligence: Extract company information and contacts
- Market Research: Gather product data and pricing
- Content Aggregation: Convert articles to structured formats
- Data Mining: Extract specific information from multiple sources
- App Integration: Feed clean data into your applications
- Web Research: Perform AI-powered searches with structured results
For detailed documentation and examples, visit:
- Email: support@scrapegraphai.com
- GitHub Issues: Create an issue
- Feature Requests: Request a feature
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by ScrapeGraph AI