diff --git a/cookbook/company-info/scrapegraph_sdk.ipynb b/cookbook/company-info/scrapegraph_sdk.ipynb
index 50ee223..4e7f5bb 100644
--- a/cookbook/company-info/scrapegraph_sdk.ipynb
+++ b/cookbook/company-info/scrapegraph_sdk.ipynb
@@ -6,7 +6,7 @@
"id": "jEkuKbcRrPcK"
},
"source": [
- "## 🕷️ Extract Company Info with Official Scrapegraph SDK\n"
+ "## 🕷️ Extract Company Info with Official Scrapegraph SDK"
]
},
{
diff --git a/cookbook/homes-forsale/scrapegraph_sdk.ipynb b/cookbook/homes-forsale/scrapegraph_sdk.ipynb
new file mode 100644
index 0000000..59e7b16
--- /dev/null
+++ b/cookbook/homes-forsale/scrapegraph_sdk.ipynb
@@ -0,0 +1,965 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "source": [
+    "## \ud83d\udd77\ufe0f Extract House Listings with Official Scrapegraph SDK\n",
+ "\n",
+ "[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk-2) [](https://colab.research.google.com/drive/1HHBUSFAHD_IvdeTAF60p6mtmeabo1s9P?usp=sharing)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8vZBkAWLq9C1"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "source": [
+ "### \ud83d\udd27 Install `dependencies`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "outputs": [],
+ "source": [
+ "%%capture\n",
+ "!pip install scrapegraph-py"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "source": [
+ "### \ud83d\udd11 Import `ScrapeGraph` API key"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "source": [
+    "You can find your Scrapegraph API key [here](https://scrapegraphai.com/dashboard)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "sffqFG2EJ8bI",
+ "outputId": "18dfce64-db37-4825-d316-fabd064100d0"
+ },
+ "outputs": [],
+ "source": [
+ "import getpass\n",
+ "import os\n",
+ "\n",
+ "if not os.environ.get(\"SGAI_API_KEY\"):\n",
+ " os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "source": [
+ "### \ud83d\udcdd Defining an `Output Schema` for Webpage Content Extraction\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "source": [
+ "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
+ "\n",
+    "**Pydantic Schema Quick Guide**\n",
+ "\n",
+    "**Types of Schemas**\n",
+ "\n",
+    "**1. Simple Schema**\n",
+ "Use this when you want to extract straightforward information, such as a single piece of content. \n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "\n",
+ "# Simple schema for a single webpage\n",
+ "class PageInfoSchema(BaseModel):\n",
+ " title: str = Field(description=\"The title of the webpage\")\n",
+ " description: str = Field(description=\"The description of the webpage\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n",
+ " \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+    "**2. Complex Schema (Nested)**\n",
+ "If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n",
+ "\n",
+ "```python\n",
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List\n",
+ "\n",
+ "# Define a schema for a single repository\n",
+ "class RepositorySchema(BaseModel):\n",
+ " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
+ " description: str = Field(description=\"Description of the repository\")\n",
+ " stars: int = Field(description=\"Star count of the repository\")\n",
+ " forks: int = Field(description=\"Fork count of the repository\")\n",
+ " today_stars: int = Field(description=\"Stars gained today\")\n",
+ " language: str = Field(description=\"Programming language used\")\n",
+ "\n",
+ "# Define a schema for a list of repositories\n",
+ "class ListRepositoriesSchema(BaseModel):\n",
+ " repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n",
+ "\n",
+ "# Example Output JSON after AI extraction\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"google-gemini/cookbook\",\n",
+ " \"description\": \"Examples and guides for using the Gemini API\",\n",
+ " \"stars\": 8036,\n",
+ " \"forks\": 1001,\n",
+ " \"today_stars\": 649,\n",
+ " \"language\": \"Jupyter Notebook\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"TEN-framework/TEN-Agent\",\n",
+ " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n",
+ " \"stars\": 3224,\n",
+ " \"forks\": 311,\n",
+ " \"today_stars\": 361,\n",
+ " \"language\": \"Python\"\n",
+ " }\n",
+ " ]\n",
+ "}\n",
+ "```\n",
+ "\n",
+    "**Key Takeaways**\n",
+ "- **Simple Schema**: Perfect for small, straightforward extractions. \n",
+ "- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n",
+ "\n",
+    "Both approaches give the AI a clear structure to follow, ensuring the extracted content matches exactly what you need.\n",
+    "\n",
+    "\n",
+ "