diff --git a/sources/academy/ai/ai-agents.mdx b/sources/academy/ai/ai-agents.mdx
index 8ccea0a205..33276a465d 100644
--- a/sources/academy/ai/ai-agents.mdx
+++ b/sources/academy/ai/ai-agents.mdx
@@ -12,7 +12,7 @@ slug: /ai/ai-agents
 
 AI agents are goal-oriented systems that make independent decisions. They interact with environments using predefined tools and workflows to automate complex tasks.
 
-On Apify, AI agents are built as Actors—serverless cloud programs for web scraping, data processing, and AI deployment. Apify evolved from running scrapers in the cloud to supporting LLMs that follow predefined workflows with dynamically defined goals.
+On Apify, AI agents are built as Actors - serverless cloud programs for web scraping, data processing, and AI deployment. Apify evolved from running scrapers in the cloud to supporting LLMs that follow predefined workflows with dynamically defined goals.
 
 ## Prerequisites
 
diff --git a/sources/academy/build-and-publish/actor-ideas/actor_validation.md b/sources/academy/build-and-publish/actor-ideas/actor_validation.md
index 2a40ef21de..e4ff7309c6 100644
--- a/sources/academy/build-and-publish/actor-ideas/actor_validation.md
+++ b/sources/academy/build-and-publish/actor-ideas/actor_validation.md
@@ -43,7 +43,7 @@ Examine current search results. Few quality results for a query like _download d
 
 Many results or ads for _Instagram scraper_ means the market is proven but competitive. You'll need to differentiate.
 
-Check keyword difficulty and domain authority. If difficulty is 70+ and top pages have 80+ domain authority with thousands of backlinks—and Apify already has an official Actor with 100,000+ users—you can't compete directly. Find an adjacent angle or specialization.
+Check keyword difficulty and domain authority. If difficulty is 70+ and top pages have 80+ domain authority with thousands of backlinks - and Apify already has an official Actor with 100,000+ users - you can't compete directly. Find an adjacent angle or specialization.
 
 ## Analyze Google Trends
 
@@ -57,7 +57,7 @@ Watch for spikes. Sudden jumps from media coverage or viral moments usually don'
 
 Beyond SEO data, go where your potential users are. Browse Reddit, Hacker News, Stack Overflow, X (Twitter), Discord, and Facebook groups. What problems are people discussing? What tools do they wish existed?
 
-Document your findings. Note quotes and recurring themes like _Multiple marketers on Reddit want easy competitor pricing tracking—no existing solution mentioned_. These insights complement your SEO data and help you speak your users' language.
+Document your findings. Note quotes and recurring themes like _Multiple marketers on Reddit want easy competitor pricing tracking - no existing solution mentioned_. These insights complement your SEO data and help you speak your users' language.
 
 Zero discussion across multiple platforms over 4+ weeks means either no one cares about the problem or they've already solved it.
 
@@ -78,7 +78,7 @@ You can also use tools like [F5Bot](https://f5bot.com/) or
 
 ### Q&A forums and Stack Overflow
 
-Look for questions about doing the task manually. If thinking about a LinkedIn scraper, check Stack Overflow for questions like _How can I scrape LinkedIn profiles?_ Frequent questions or upvotes indicate many people trying to solve it without a good tool—an opportunity for your Actor.
+Look for questions about doing the task manually. If thinking about a LinkedIn scraper, check Stack Overflow for questions like _How can I scrape LinkedIn profiles?_ Frequent questions or upvotes indicate many people trying to solve it without a good tool - an opportunity for your Actor.
 
 Use the `site:` parameter:
 
@@ -132,7 +132,7 @@ Fork and commit activity shows developers actively work with the technology. Hig
 
 ## Review Product Hunt launches
 
-Study successful automation tool launches from the past 12-24 months on Product Hunt. Filter by _Browser Automation_ and _Automation tools_, then sort by upvotes. Note which taglines, value propositions, and features resonated. Products with 500+ upvotes validated something—figure out what worked.
+Study successful automation tool launches from the past 12-24 months on Product Hunt. Filter by _Browser Automation_ and _Automation tools_, then sort by upvotes. Note which taglines, value propositions, and features resonated. Products with 500+ upvotes validated something - figure out what worked.
 
 ## Research Apify Store
 
@@ -152,17 +152,17 @@ If the market has 50+ Actors with strong leaders (Apify-maintained with 50,000+
 
 ## Scan the broader market
 
-Do a general Google search for tools or services that solve your problem. Your competition might not be another Actor—it could be a SaaS tool or API. If your idea is _monitor website uptime and screenshot changes_, established services probably exist.
+Do a general Google search for tools or services that solve your problem. Your competition might not be another Actor - it could be a SaaS tool or API. If your idea is _monitor website uptime and screenshot changes_, established services probably exist.
 
 Note direct competitors: How do they price it? What audience do they target? Are users satisfied or complaining? This validates that people pay for the service and reveals gaps you can fill.
 
-Understanding the competition helps you refine your unique value—whether that's lower cost, better features, or targeting an underserved niche.
+Understanding the competition helps you refine your unique value - whether that's lower cost, better features, or targeting an underserved niche.
 
 No existing solutions? Ask why. You might have found an untapped need, or it's a red flag (too difficult to implement, or the target website aggressively blocks scraping). Use your judgment.
 
 ## Get feedback from potential users
 
-Reach out to people who match your target user profile. Building a real estate data Actor? Contact real estate analysts or agents (LinkedIn works well) and ask if a tool that does X would help them. Keep it informal—describe the problem you're solving and ask if they'd use or pay for it.
+Reach out to people who match your target user profile. Building a real estate data Actor? Contact real estate analysts or agents (LinkedIn works well) and ask if a tool that does X would help them. Keep it informal - describe the problem you're solving and ask if they'd use or pay for it.
 
 Direct feedback helps you:
 
diff --git a/sources/academy/build-and-publish/actor-ideas/what_software_an_actor_can_be.md b/sources/academy/build-and-publish/actor-ideas/what_software_an_actor_can_be.md
index 9d12e1ba88..12df65bc90 100644
--- a/sources/academy/build-and-publish/actor-ideas/what_software_an_actor_can_be.md
+++ b/sources/academy/build-and-publish/actor-ideas/what_software_an_actor_can_be.md
@@ -160,7 +160,7 @@ The [Actor ideas](https://apify.com/ideas) page is where you can find inspiratio
 
 1. _Visit_ [apify.com/ideas](https://apify.com/ideas) to find ideas that interest you. Look for ideas that align with your skills.
 
-1. _Select an Actor idea_: Review the details and requirements. Check the status—if it's marked **Open to develop**, you can start building.
+1. _Select an Actor idea_: Review the details and requirements. Check the status - if it's marked **Open to develop**, you can start building.
 
 1. _Build your Actor_: Develop your Actor based on the idea. You don't need to notify Apify during development.
@@ -176,7 +176,7 @@ The [Actor ideas](https://apify.com/ideas) page is where you can find inspiratio
diff --git a/sources/academy/build-and-publish/apify-store-basics/how_actor_monetization_works.md b/sources/academy/build-and-publish/apify-store-basics/how_actor_monetization_works.md
index 28dc044c22..fd57ed8725 100644
--- a/sources/academy/build-and-publish/apify-store-basics/how_actor_monetization_works.md
+++ b/sources/academy/build-and-publish/apify-store-basics/how_actor_monetization_works.md
@@ -144,7 +144,7 @@ Also, remember that your Actor is a package deal with the Apify platform. All th
 
 ### Do research in Apify Store
 
-Apify Store is like any other marketplace, so take a look at your competition there. Are you the first in your lane, or are there other similar tools? What makes yours stand out? Remember, your README is your first impression — communicate your tool's benefits clearly and offer something unique. Competing with other developers is great, but collaborations can drive even better results 😉
+Apify Store is like any other marketplace, so take a look at your competition there. Are you the first in your lane, or are there other similar tools? What makes yours stand out? Remember, your README is your first impression - communicate your tool's benefits clearly and offer something unique. Competing with other developers is great, but collaborations can drive even better results 😉
 
 Learn more about what makes a good readme here: [How to create an Actor README](/academy/actor-marketing-playbook/actor-basics/how-to-create-an-actor-readme)
 
diff --git a/sources/academy/build-and-publish/apify-store-basics/how_store_works.md b/sources/academy/build-and-publish/apify-store-basics/how_store_works.md
index 752a5c6a83..8dbbac12ee 100644
--- a/sources/academy/build-and-publish/apify-store-basics/how_store_works.md
+++ b/sources/academy/build-and-publish/apify-store-basics/how_store_works.md
@@ -97,7 +97,7 @@ A high number of monthly users indicates widespread trust and effective performa
 
 Each Actor has an **Issues** tab in Apify Console and on the web. Here, users can open an issue (ticket) and engage in discussions with the Actor's creator, platform admins, and other users. The tab is ideal for asking questions, requesting new features, or providing feedback.
 
-Since the **Issues** tab is public, the level of activity — or lack thereof — can be observed by potential users and may serve as an indicator of the Actor's reliability. A well-maintained Issues tab with prompt responses suggests an active and dependable Actor.
+Since the **Issues** tab is public, the level of activity - or lack thereof - can be observed by potential users and may serve as an indicator of the Actor's reliability. A well-maintained Issues tab with prompt responses suggests an active and dependable Actor.
 
 Learn more about how to handle the [Issues tab](/academy/actor-marketing-playbook/interact-with-users/issues-tab)
 
diff --git a/sources/academy/build-and-publish/apify-store-basics/how_to_create_actor_readme.md b/sources/academy/build-and-publish/apify-store-basics/how_to_create_actor_readme.md
index 6b0d9d2e12..f324a77f34 100644
--- a/sources/academy/build-and-publish/apify-store-basics/how_to_create_actor_readme.md
+++ b/sources/academy/build-and-publish/apify-store-basics/how_to_create_actor_readme.md
@@ -29,7 +29,7 @@ Before we dive in, a little disclaimer: you don't need your Apify README to fulf
 
 Your Actor’s README has at least four functions:
 
-1. _SEO_ - If your README is well-structured and includes important keywords — both in headings and across the text — it has a high chance of being noticed and promoted by Google. Organic search brings the most motivated type of potential users. If you win this game, you've won most of the SEO game.
+1. _SEO_ - If your README is well-structured and includes important keywords - both in headings and across the text - it has a high chance of being noticed and promoted by Google. Organic search brings the most motivated type of potential users. If you win this game, you've won most of the SEO game.
 2. _First impression_ - Your README is one of the first points of contact with a potential user. If you come across as convincing, clear, and reassuring it could be the factor that will make a user try your Actor for their task.
 3. _Extended instruction_ - The README is also the space that explains specific complex input settings. For example, special formatting of the input, any coding-related, or extended functions. Of course, you could put that all in a blog post as well, but the README should be their first point of contact.
 4. _Support_ - Your users come back to the README when they face issues. Use it as a space to let them know that's where they can find links to the tutorials if they run into issues, describe common troubleshooting techniques, share tricks, or warn you about bugs.
@@ -56,7 +56,7 @@ Your Actor + the Apify platform. They come as a package. Don't forget to flaunt
 
 :::
 
-Imagine if there was a solution that is identical to yours but without the platform advantages such as monitoring, access to API, scheduling, possibility of integrations, proxy rotation. Now, if that tool suddenly gained all those advantages it would surely make a selling point out of it. This is how you should be thinking about your tool — as a solution boosted by the Apify platform. Don't ever forget that advantage.
+Imagine if there was a solution that is identical to yours but without the platform advantages such as monitoring, access to API, scheduling, possibility of integrations, proxy rotation. Now, if that tool suddenly gained all those advantages it would surely make a selling point out of it. This is how you should be thinking about your tool - as a solution boosted by the Apify platform. Don't ever forget that advantage.
 
 What data can [Actor] extract?
 
@@ -108,7 +108,7 @@ If your datasets come out too complex and you want to save your users some scrol
 
 ### Other Actors
 
-Don't forget to promote your other Actors. While our system for Actor recommendation works - you can see related Actors at the bottom of the README — it only works within the same category or similar name. It won't recommend a completely different Actor from the same creator. Make sure to interconnect your work by taking the initiative yourself. You can mention your other Actors in a list or as a table.
+Don't forget to promote your other Actors. While Apify's system for Actor recommendation works - you can see related Actors at the bottom of the README - it only works within the same category or similar name. It won't recommend a completely different Actor from the same creator. Make sure to interconnect your work by taking the initiative yourself. You can mention your other Actors in a list or as a table.
 
 ### FAQ, disclaimers, and support
 
@@ -125,7 +125,7 @@ Here are just a few things we usually push to the FAQ section.
 
 - mentioning the Issues tab and highlighting that you're open for feedback and collecting feedback
 - mentioning being open to creating a custom solution based on the current one and showing a way to contact you
 - interlinking
-- mentioning the possibility of transferring data using an API — API tab
+- mentioning the possibility of transferring data using an API - API tab
 - possibility for integrations
 - use cases for the data scraped, success stories exemplifying the use of data
 
diff --git a/sources/academy/build-and-publish/apify-store-basics/importance_of_actor_url.md b/sources/academy/build-and-publish/apify-store-basics/importance_of_actor_url.md
index 8a7fdf692d..ff2ba4bfa8 100644
--- a/sources/academy/build-and-publish/apify-store-basics/importance_of_actor_url.md
+++ b/sources/academy/build-and-publish/apify-store-basics/importance_of_actor_url.md
@@ -28,7 +28,7 @@ The right naming can propel or hinder the success of the Actor on Google Search.
 
 ### Brainstorming
 
-What does your Actor do? Does it scrape, find, extract, automate, connect? Think of these when you are looking for a name. You might already have a code name in mind, but it’s essential to ensure it stands out and is distinct from similar names—both on Google and on Apify Store.
+What does your Actor do? Does it scrape, find, extract, automate, connect? Think of these when you are looking for a name. You might already have a code name in mind, but it’s essential to ensure it stands out and is distinct from similar names - both on Google and on Apify Store.
 
 ### Matching URL and name
 
diff --git a/sources/academy/build-and-publish/apify-store-basics/name_your_actor.md b/sources/academy/build-and-publish/apify-store-basics/name_your_actor.md
index 05013561a4..eaf64bdccd 100644
--- a/sources/academy/build-and-publish/apify-store-basics/name_your_actor.md
+++ b/sources/academy/build-and-publish/apify-store-basics/name_your_actor.md
@@ -37,7 +37,7 @@ Your Actor's name should be _40-50 characters_ long. You can change your Actor n
 
 ### Actor name vs. SEO name
 
-There's an option to step away from your Actor's name for the sake of search engine optimization — the Actor SEO name. The Actor name and Actor SEO name serve different purposes:
+There's an option to step away from your Actor's name for the sake of search engine optimization - the Actor SEO name. The Actor name and Actor SEO name serve different purposes:
 
 - _Actor name_: this is the name visible in Apify Store and Console. It should be easy for users to understand and quickly show what your Actor does. It’s about attracting users who browse the Store.
 
diff --git a/sources/academy/build-and-publish/how-to-build/actor_bundles.md b/sources/academy/build-and-publish/how-to-build/actor_bundles.md
index 3ad238f54d..e4b08a5baa 100644
--- a/sources/academy/build-and-publish/how-to-build/actor_bundles.md
+++ b/sources/academy/build-and-publish/how-to-build/actor_bundles.md
@@ -60,7 +60,7 @@ Remember, bundles originated as customized solutions for specific use cases - th
 
 This is also an opportunity to tell a story rather than just presenting a tool. Consider writing a blog post about how you created this tool, recording a video, or hosting a live webinar. If you go this route, it’s important to emphasize how the tool was created and what a technical feat it represents.
 
-That said, don’t abandon SEO entirely. You can still capture some SEO value by referencing the bundle in the READMEs of the individual Actors that comprise it. For example, if a bundle collects reviews from multiple platforms, potential users are likely to search for review scrapers for each specific platform—Google Maps reviews scraper, Tripadvisor reviews scraper, Booking reviews scraper, etc. These keywords may not lead directly to your review scraping bundle, but they can guide users to the individual scrapers, where you can then present the bundle as a more comprehensive solution.
+That said, don’t abandon SEO entirely. You can still capture some SEO value by referencing the bundle in the READMEs of the individual Actors that comprise it. For example, if a bundle collects reviews from multiple platforms, potential users are likely to search for review scrapers for each specific platform - Google Maps reviews scraper, Tripadvisor reviews scraper, Booking reviews scraper, etc. These keywords may not lead directly to your review scraping bundle, but they can guide users to the individual scrapers, where you can then present the bundle as a more comprehensive solution.
 
 ---
 
diff --git a/sources/academy/build-and-publish/how-to-build/actorization_playbook.mdx b/sources/academy/build-and-publish/how-to-build/actorization_playbook.mdx
index 44896742dd..b6d09aa1c7 100644
--- a/sources/academy/build-and-publish/how-to-build/actorization_playbook.mdx
+++ b/sources/academy/build-and-publish/how-to-build/actorization_playbook.mdx
@@ -14,9 +14,9 @@ Most Actors are developed by a global creator community, and some are developed
 
 Under the hood, Actors are programs packaged as Docker images, that accept a well-defined JSON input, perform an action, and optionally produce a well-defined JSON output. This makes it easy to auto-generate user interfaces for Actors and integrate them with one another or with external systems. For example, we have user-friendly integrations with Zapier, Make, LangChain, MCP, OpenAPI, and SDKs for TypeScript/Python, CLI, etc.
 
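The Actor contract described above (Docker image in, JSON in, JSON out) is declared through a small metadata file. A minimal sketch of an `.actor/actor.json` for an Actorized project might look like the following; the name, title, and file paths are illustrative, and the exact set of supported fields is defined by the Actor file specification:

```json
{
    "actorSpecification": 1,
    "name": "my-actorized-tool",
    "title": "My Actorized Tool",
    "version": "0.1",
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}
```

The `input` entry points at the JSON input schema the platform uses to generate the Actor's UI, and `dockerfile` points at the image definition that packages the existing code.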
-Actors are a new way to build reusable serverless micro-apps that are easy to develop, share, integrate, and build upon—and, importantly, monetize. While Actors are our invention, we’re in the process of making them an open standard. Learn more at [https://whitepaper.actor](https://whitepaper.actor/).
+Actors are a new way to build reusable serverless micro-apps that are easy to develop, share, integrate, and build upon - and, importantly, monetize. While Actors are our invention, we’re in the process of making them an open standard. Learn more at [https://whitepaper.actor](https://whitepaper.actor/).
 
-While most Actors on our marketplace are web scrapers or crawlers, there are ever more Actors for other use cases including data processing, web automation, API backend, or [AI agents](https://apify.com/store/categories/agents). In fact, any piece of software that accepts input, performs a job, and can run in Docker, can be _Actorized_ simply by adding an `.actor` directory to it with a couple of JSON files.
+While most Actors on Apify Store are web scrapers or crawlers, there is a growing number of Actors for other use cases, including data processing, web automation, API backends, or [AI agents](https://apify.com/store/categories/agents). In fact, any piece of software that accepts input, performs a job, and can run in Docker, can be _Actorized_ simply by adding an `.actor` directory to it with a couple of JSON files.
 
 ## Why Actorize
 
@@ -31,7 +31,7 @@ By publishing your service or project at [Apify Store](https://apify.com/store)
 
 For open-source developers, Actorization adds value without extra costs:
 
 - Host your code in the cloud for easy user trials (no local installs needed).
-- Avoid managing cloud infrastructure—users cover the costs.
+- Avoid managing cloud infrastructure - users cover the costs.
 - Earn income through [Apify’s Open Source Fair Share program](https://apify.com/partners/open-source-fair-share) via GitHub Sponsors or direct payouts.
 - Publish and monetize 10x faster than building a micro-SaaS, with Apify handling infra, billing, and access to 700,000+ monthly visitors and 70,000 signups.
 
@@ -125,14 +125,14 @@ Perhaps the most important part of the Actorization process is writing the code
 
 Unless you’re writing an application targeted directly on the Apify platform, this will have the form of a script that calls your code and integrates it with the Apify Storages
 
-Apify provides SDKs for [Javascript](/sdk/js/) and [Python](/sdk/python/) plus a [Apify CLI](/cli/) allowing an easy interaction with Apify platform from command line.
+Apify provides SDKs for [JavaScript](/sdk/js/) and [Python](/sdk/python/), plus the [Apify CLI](/cli/), allowing easy interaction with the Apify platform from the command line.
 
 Check out [programming interface](/platform/actors/development/programming-interface/) documentation article for details on interacting with the Apify platform in your Actor's code.
 
 ### 5. Deploy the Actor
 
-Deployment to Apify platform can be done easily via `apify push` command of [Apify CLI](/cli/) and for details see [deployment](/platform/actors/development/deployment) documentation.
+Deployment to the Apify platform can be done easily via the `apify push` command of the [Apify CLI](/cli/); for details, see the [deployment](/platform/actors/development/deployment) documentation.
 
 ### 6. Publish and monetize
 
-For details on publishing the Actor in [Apify Store](https://apify.com/store) see the [Publishing and monetization](/platform/actors/publishing). You can also follow our guide on [How to create an Actor README](/academy/actor-marketing-playbook/actor-basics/how-to-create-an-actor-readme) and [Marketing checklist](/academy/actor-marketing-playbook/promote-your-actor/checklist).
+For details on publishing the Actor in [Apify Store](https://apify.com/store), see [Publishing and monetization](/platform/actors/publishing). You can also follow the guide on [How to create an Actor README](/academy/actor-marketing-playbook/actor-basics/how-to-create-an-actor-readme) and the [Marketing checklist](/academy/actor-marketing-playbook/promote-your-actor/checklist).
 
diff --git a/sources/academy/build-and-publish/how-to-build/how_to_create_a_great_input_schema.md b/sources/academy/build-and-publish/how-to-build/how_to_create_a_great_input_schema.md
index 43f5b73f1e..51d97247fa 100644
--- a/sources/academy/build-and-publish/how-to-build/how_to_create_a_great_input_schema.md
+++ b/sources/academy/build-and-publish/how-to-build/how_to_create_a_great_input_schema.md
@@ -21,7 +21,7 @@ You've succeeded: your user has:
 
 Now they’re on your Actor's page in Apify Console. The SEO fight is over. What’s next?
 
-Your user is finally one-on-one with your Actor — specifically, its input schema. This is the moment when they try your Actor and decide whether to stick with it. The input schema is your representative here, and you want it to work in your favor.
+Your user is finally one-on-one with your Actor - specifically, its input schema. This is the moment when they try your Actor and decide whether to stick with it. The input schema is your representative here, and you want it to work in your favor.
 
 Technically, the input schema is a `JSON` object with various field types supported by the Apify platform, designed to simplify the use of the Actor. Based on the input schema you define, the Apify platform automatically generates a _user interface_ for your Actor.
 
@@ -39,7 +39,7 @@ To fully understand the recommendations in this blog post, you’ll first need t
 
 It can feel intimidating when facing the Apify platform for the first time. You only have a few seconds for a user to assess the ease of using your Actor.
 
-If something goes wrong or is unclear with the input, an ideal user will first turn to the tooltips in the input schema. Next, they might check the README or tutorials, and finally, they’ll reach out to you through the **Issues** tab. However, many users won’t go through all these steps — they may simply get overwhelmed and abandon the tool altogether.
+If something goes wrong or is unclear with the input, an ideal user will first turn to the tooltips in the input schema. Next, they might check the README or tutorials, and finally, they’ll reach out to you through the **Issues** tab. However, many users won’t go through all these steps - they may simply get overwhelmed and abandon the tool altogether.
 
 A well-designed input schema is all about managing user expectations, reducing cognitive load, and preventing frustration. Ideally, a good input schema, as your first line of interaction, should:
 
@@ -84,7 +84,7 @@ Unfortunately, when it comes to UX, there's only so much you can achieve armed w
 
   - Make the **prefilled text** example simple and easy to remember.
   - If your Actor accepts various URL formats, add a few different **prefilled URLs** to show that possibility.
   - Use the **prefilled date** format that the user is expected to follow. This way, they can learn the correct format without needing to check the tooltip.
-  - There’s also a type of field that looks like a prefill but isn’t — usually a `default` field. It’s not counted as actual input but serves as a mock input to show users what to type or paste. It is gray and disappears after clicking on it. Use this to your advantage.
+  - There’s also a type of field that looks like a prefill but isn’t - usually a `default` field. It’s not counted as actual input but serves as a mock input to show users what to type or paste. It is gray and disappears after clicking on it. Use this to your advantage.
 - **toggle** - The toggle is a boolean field. A boolean field represents a yes/no choice.
   - How would you word this toggle: **Skip closed places** or **Scrape open places only**? And should the toggle be enabled or disabled by default?
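To make the prefill, `default`, and toggle advice above concrete, here is a minimal input schema sketch in the format the Apify platform renders into a form; the field names (`startUrls`, `skipClosedPlaces`) and all values are illustrative, and the authoritative list of field types and editors is in the input schema specification:

```json
{
    "title": "Example scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "Pages to scrape. Several URL formats are accepted.",
            "editor": "requestListSources",
            "prefill": [{ "url": "https://www.example.com/places" }]
        },
        "skipClosedPlaces": {
            "title": "Skip closed places",
            "type": "boolean",
            "description": "If enabled, permanently closed places are left out of the results.",
            "default": false
        }
    },
    "required": ["startUrls"]
}
```

From this, the platform generates the UI: the array field appears with the prefilled example URL the user can learn from, and the boolean appears as a toggle that is off by default.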
@@ -158,7 +158,7 @@ The version above was the improved input schema. Here's what this tool's input s
 
 3. Use specific terminology (e.g., posts, images, tweets) from the target website instead of generic terms like "results" or "pages."
 4. Group related items for clarity and ease of use.
 5. Use emojis as shortcuts and visual anchors to guide attention.
-6. Avoid technical jargon — keep the language simple.
+6. Avoid technical jargon - keep the language simple.
 7. Minimize cognitive load wherever possible.
 
 ## Signs and tools for improving input schema
 
diff --git a/sources/academy/build-and-publish/how-to-build/index.md b/sources/academy/build-and-publish/how-to-build/index.md
index 5670747a3b..0413543ad3 100644
--- a/sources/academy/build-and-publish/how-to-build/index.md
+++ b/sources/academy/build-and-publish/how-to-build/index.md
@@ -5,15 +5,15 @@ category: build-and-publish
 slug: /actor-marketing-playbook/store-basics/how-to-build-actors
 ---
 
-At Apify, we try to make building web scraping and automation straightforward. You can customize our universal scrapers with JavaScript for quick tweaks, use our code templates for rapid setup in JavaScript, TypeScript, or Python, or build from scratch using our JavaScript and Python SDKs or Crawlee libraries for Node.js and Python for ultimate flexibility and control. This guide offers a quick overview of our tools to help you find the right fit for your needs.
+At Apify, we try to make building web scraping and automation straightforward. You can customize the universal scrapers with JavaScript for quick tweaks, use code templates for rapid setup in JavaScript, TypeScript, or Python, or build from scratch using the JavaScript and Python SDKs or Crawlee libraries for Node.js and Python for full flexibility and control. This guide offers a quick overview of the tools to help you find the right fit for your needs.
 
 ## Three ways to build Actors
 
-1. [Our universal scrapers](https://apify.com/scrapers/universal-web-scrapers) — customize our boilerplate tools to your needs with a bit of JavaScript and setup.
-2. [Our code templates](https://apify.com/templates) for web scraping projects — for a quick project setup to save you development time (includes JavaScript, TypeScript, and Python templates).
-3. Open-source libraries and SDKs
-   1. [JavaScript SDK](https://docs.apify.com/sdk/js/) & [Python SDK](https://docs.apify.com/sdk/python/) — for creating your own solution from scratch on the Apify platform using our free development kits. Involves more coding but offers infinite flexibility.
-   2. [Crawlee](https://crawlee.dev/) and [Crawlee for Python](https://crawlee.dev/python) — for creating your own solutions from scratch using our free web automation libraries. Involves even more coding but offers infinite flexibility. There’s also no need to host these on the platform.
+1. [Universal scrapers](https://apify.com/scrapers/universal-web-scrapers) - customize the boilerplate tools to your needs with a bit of JavaScript and setup.
+1. [Code templates](https://apify.com/templates) for web scraping projects - for a quick project setup to save you development time (includes JavaScript, TypeScript, and Python templates).
+1. Open-source libraries and SDKs
+   1. [JavaScript SDK](https://docs.apify.com/sdk/js/) and [Python SDK](https://docs.apify.com/sdk/python/) - for creating your own solution from scratch on the Apify platform using the free development kits. Involves more coding but offers infinite flexibility.
+   1. [Crawlee](https://crawlee.dev/) and [Crawlee for Python](https://crawlee.dev/python) - for creating your own solutions from scratch using the free web automation libraries. Involves even more coding but offers infinite flexibility. There’s also no need to host these on the platform.
 
 ## Universal scrapers & what are they for
 
@@ -133,9 +133,9 @@ While these tools are distinct, they can be combined. For example, you can use C
 
 Basically, the choice here depends on how much flexibility you need and how much coding you're willing to do. More flexibility → more coding.
 
-[Universal scrapers](https://apify.com/scrapers/universal-web-scrapers) are simple to set up but are less flexible and configurable. Our [libraries](https://crawlee.dev/), on the other hand, enable the development of a standard [Node.js](https://nodejs.org/) or Python application, so be prepared to write a little more code. The reward for that is almost infinite flexibility.
+[Universal scrapers](https://apify.com/scrapers/universal-web-scrapers) are simple to set up but are less flexible and configurable. The [Crawlee libraries](https://crawlee.dev/), on the other hand, enable the development of a standard [Node.js](https://nodejs.org/) or Python application, so be prepared to write a little more code. The reward for that is almost infinite flexibility.
 
-[Code templates](https://apify.com/templates) are sort of a middle ground between scrapers and libraries. But since they are built on libraries, they are still on the rather more coding than less coding side. They will only give you a starter code to begin with. Please take this into account when choosing the way to build your scraper, and if in doubt — just ask us, and we'll help you out.
+[Code templates](https://apify.com/templates) are sort of a middle ground between scrapers and libraries. But since they are built on libraries, they are still on the rather more coding than less coding side. They will only give you a starter code to begin with. Please take this into account when choosing the way to build your scraper, and if in doubt - just ask us, and we'll help you out.
## Switching sides: How to transfer an existing solution from another platform @@ -157,7 +157,7 @@ To use SuperScraper API, you can deploy it with an Apify API token and access it - [How to integrate Scrapy projects](https://docs.apify.com/cli/docs/integrating-scrapy) - Scrapy monitoring: how to [manage your Scrapy spider on Apify](https://blog.apify.com/scrapy-monitoring-spidermon/) -- Run ScrapingBee, ScraperAPI, and ScrapingAnt on Apify — [SuperScraper API Tutorial](https://www.youtube.com/watch?v=YKs-I-2K1Rg) +- Run ScrapingBee, ScraperAPI, and ScrapingAnt on Apify - [SuperScraper API Tutorial](https://www.youtube.com/watch?v=YKs-I-2K1Rg) ## General resources diff --git a/sources/academy/build-and-publish/interacting-with-users/emails_to_actor_users.md b/sources/academy/build-and-publish/interacting-with-users/emails_to_actor_users.md index 663628547a..78a66b43ac 100644 --- a/sources/academy/build-and-publish/interacting-with-users/emails_to_actor_users.md +++ b/sources/academy/build-and-publish/interacting-with-users/emails_to_actor_users.md @@ -25,7 +25,7 @@ Emails can include text, formatting, images, GIFs, and links. Here are four main Additional tips: -- Show, don’t tell — use screenshots with arrows to illustrate your points. +- Show, don’t tell - use screenshots with arrows to illustrate your points. - If you’re asking users to take action, include a direct link to what you're referring to. - Provide alternatives if it suits the situation. - Always send a preview to yourself before sending the email to all your users. @@ -58,7 +58,7 @@ A common situation in web scraping that's out of your control. > >Hi, > ->We've got some news regarding your favorite Actor – [Facebook Ads Scraper](https://console.apify.com/actors/JJghSZmShuco4j9gJ/console). Recently, Facebook Ads have changed their data format. To keep our Actor running smoothly, we'll be adapting to these changes by slightly tweaking the Actor Output. Don't worry; it's a breeze! 
Some of the output data might just appear under new titles.
>
>This change will take place on October 10; please make sure to remap your integrations accordingly.
>
@@ -107,7 +107,7 @@ Actor downtime, performance issues, Actor directly influenced by platform hiccup
>
>Hi,
>
->We've got a quick update on the Google Maps Scraper for you. If you've been running the Actor this week, you might have noticed some hiccups — scraping was failing for certain places, causing retries and overall slowness.
+>We've got a quick update on the Google Maps Scraper for you. If you've been running the Actor this week, you might have noticed some hiccups - scraping was failing for certain places, causing retries and overall slowness.
>
>We apologize for any inconvenience this may have caused you. The **good news is those performance issues are now resolved**. Feel free to resurrect any affected runs using the "latest" build, should work like a charm now.
>
@@ -142,6 +142,6 @@ Newsletters are a great way to keep your users engaged without overwhelming them

## Emailing a separate user

-There may be times when you need to reach out to a specific user — whether it’s to address a unique situation, ask a question that doesn’t fit the public forum of the **Issue tab**, or explore a collaboration opportunity. While there isn’t a quick way to do this through Apify Console just yet, you can ensure users can contact you by **adding your email or other contact info to your Store bio**. This makes it easy for them to reach out directly.
+There may be times when you need to reach out to a specific user - whether it’s to address a unique situation, ask a question that doesn’t fit the public forum of the **Issue tab**, or explore a collaboration opportunity. While there isn’t a quick way to do this through Apify Console just yet, you can ensure users can contact you by **adding your email or other contact info to your Store bio**. This makes it easy for them to reach out directly.

✍🏻 Learn best practices on how to use your Store bio to connect with your users in [Your Store bio](/academy/actor-marketing-playbook/interact-with-users/your-store-bio).

diff --git a/sources/academy/build-and-publish/promoting-your-actor/affiliates.md b/sources/academy/build-and-publish/promoting-your-actor/affiliates.md
index ab27b94dda..931576d787 100644
--- a/sources/academy/build-and-publish/promoting-your-actor/affiliates.md
+++ b/sources/academy/build-and-publish/promoting-your-actor/affiliates.md
@@ -17,7 +17,7 @@ The program rewards collaboration with up to 30% recurring commission and up to
The Apify Affiliate Program lets you promote three main offerings:

1. _Apify Store_: recommend Actors from the marketplace that help businesses automate lead generation, pricing intelligence, content aggregation, and more.
-1. _Apify platform_: promote the platform's features, including scheduling, monitoring, data export options, proxies, and integrations.
+1. _The Apify platform_: promote the platform's features, including scheduling, monitoring, data export options, proxies, and integrations.
1. _Professional services_: refer customers who need custom web scraping solutions to Apify's Professional Services team and earn up to $2,500 per closed deal.
### Commission structure

diff --git a/sources/academy/build-and-publish/promoting-your-actor/blogs_and_blog_resources.md b/sources/academy/build-and-publish/promoting-your-actor/blogs_and_blog_resources.md
index b68b621a93..14daf740ab 100644
--- a/sources/academy/build-and-publish/promoting-your-actor/blogs_and_blog_resources.md
+++ b/sources/academy/build-and-publish/promoting-your-actor/blogs_and_blog_resources.md
@@ -23,7 +23,7 @@ slug: /actor-marketing-playbook/promote-your-actor/blogs-and-blog-resources
4. Trends. If you’ve noticed emerging trends in web scraping or automation, write about them. Tie your Actor into these trends to highlight its relevance.
5. Feature announcements or updates. Have you recently added new features to your Actor? Write a blog post explaining how these features work and what makes them valuable.

-🪄 These days, blog posts always need to be written with SEO in mind. Yeah, it's annoying to use keywords, but think of it this way: even if there's the most interesting customer story and amazing programming insights, but nobody can find it, it won't have the impact you want. Do try to optimize your posts with relevant keywords and phrases — across text, structure, and even images — to ensure they reach your target audience.
+🪄 These days, blog posts always need to be written with SEO in mind. Yeah, it's annoying to use keywords, but think of it this way: even the most interesting customer story and the most amazing programming insights won't have the impact you want if nobody can find them. Do try to optimize your posts with relevant keywords and phrases - across text, structure, and even images - to ensure they reach your target audience.
--- diff --git a/sources/academy/build-and-publish/promoting-your-actor/seo.md b/sources/academy/build-and-publish/promoting-your-actor/seo.md index a92e3ffe85..f2ef60288d 100644 --- a/sources/academy/build-and-publish/promoting-your-actor/seo.md +++ b/sources/academy/build-and-publish/promoting-your-actor/seo.md @@ -76,13 +76,13 @@ Now, you can expand the “People Also Ask” questions. Click on each question Another way to collect more keywords is to use the official Google Keyword Planner. Go to [Google Keyword Planner](https://ads.google.com/home/tools/keyword-planner/) and open the tool. You need a Google Ads account, so just create one for free if you don’t have one already. -After you’re in the tool, click on “Discover new keywords”, make sure you’re in the “Start with keywords” tab, enter your Actor's main function or purpose, and then select the United States as the region and English as the language. Click “Get results” to see keywords related to your actor. +After you’re in the tool, click on “Discover new keywords”, make sure you’re in the “Start with keywords” tab, enter your Actor's main function or purpose, and then select the United States as the region and English as the language. Click “Get results” to see keywords related to your Actor. Write them down. ### Ahrefs Keyword Generator -Go to [Ahrefs Keyword Generator](https://ahrefs.com/keyword-generator), enter your Actor's main function or purpose, and click “Find keywords.” You should see a list of keywords related to your actor. +Go to [Ahrefs Keyword Generator](https://ahrefs.com/keyword-generator), enter your Actor's main function or purpose, and click “Find keywords.” You should see a list of keywords related to your Actor. Write them down. 
diff --git a/sources/academy/build-and-publish/why_publish.md b/sources/academy/build-and-publish/why_publish.md index 7774cee15f..d7e61a6b21 100644 --- a/sources/academy/build-and-publish/why_publish.md +++ b/sources/academy/build-and-publish/why_publish.md @@ -58,7 +58,7 @@ Apify Store is a growing library of thousands of Actors, most created by communi ### Maintain quality -Public Actors require higher standards than private ones. Since users depend on your Actor, you'll need to commit to regular maintenance—reserve approximately 2 hours per week for bug fixes, updates, and user support. Thorough documentation is essential; write clear README files using simple language since users may not be developers. Set up automated testing or use manual testing to prevent user issues, and respond promptly to issues through the Issues tab, where your response time is publicly visible. Learn more about metrics determining quality in [Actor quality score documentation](/platform/actors/publishing/quality-score). +Public Actors require higher standards than private ones. Since users depend on your Actor, you'll need to commit to regular maintenance - reserve approximately 2 hours per week for bug fixes, updates, and user support. Thorough documentation is essential; write clear README files using simple language since users may not be developers. Set up automated testing or use manual testing to prevent user issues, and respond promptly to issues through the Issues tab, where your response time is publicly visible. Learn more about metrics determining quality in [Actor quality score documentation](/platform/actors/publishing/quality-score). 
### When you need to change things

diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md b/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md
index 926dc70131..7c994d4e2a 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md
@@ -180,7 +180,7 @@ Actor detail https://console.apify.com/actors/Yk1bieximsduYDydP
Success: Actor was deployed to Apify cloud and built there.
```

-The URLs tell us that our Actor's ID is `Yk1bieximsduYDydP`. With this `actorId`, and our `token`, which is retrievable through **Settings > Integrations** on the Apify Console, we can construct a link which will call the Actor:
+The URLs tell us that our Actor's ID is `Yk1bieximsduYDydP`. With this `actorId`, and our `token`, which is retrievable through **Settings > Integrations** in Apify Console, we can construct a link which will call the Actor:

```text
https://api.apify.com/v2/acts/Yk1bieximsduYDydP/runs?token=YOUR_TOKEN_HERE
```

diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
index e2cf1cf0d3..7ac65d6153 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md
@@ -252,7 +252,7 @@ The one main difference is that the Apify client automatically uses [**exponenti

**Q: How do you pass input when running an Actor or task via API?**

-**A:** The input should be passed into the **body** of the request when running an actor/task via API.
+**A:** The input should be passed into the **body** of the request when running an Actor/task via API.
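To make that answer concrete, here is a minimal sketch in plain JavaScript. The Actor ID and token are placeholders, and the endpoint format is the same run URL shown in the section above; the point is that the input travels as the JSON request body, not as URL parameters.

```javascript
// Sketch: build the "run Actor" API request. ACTOR_ID and MY_TOKEN are
// placeholders, not real credentials.
function buildRunRequest(actorId, token, input) {
  return {
    url: `https://api.apify.com/v2/acts/${actorId}/runs?token=${token}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // The Actor input goes into the request body as JSON.
      body: JSON.stringify(input),
    },
  };
}

const { url, options } = buildRunRequest('ACTOR_ID', 'MY_TOKEN', {
  startUrls: [{ url: 'https://apify.com' }],
});
// Pass `url` and `options` to fetch() or any HTTP client to start the run.
```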
**Q: Do you need to install the `apify-client` npm package when already using the `apify` package?** diff --git a/sources/academy/platform/getting_started/actors.md b/sources/academy/platform/getting_started/actors.md index 2f8ebfc87c..6953bf3315 100644 --- a/sources/academy/platform/getting_started/actors.md +++ b/sources/academy/platform/getting_started/actors.md @@ -27,7 +27,7 @@ Though the majority of Actors that are currently on the Apify platform are scrap For a super quick and dirty understanding of what a published Actor looks like, and how it works, let's run an SEO audit of _apify.com_ using the [SEO audit Actor](https://apify.com/misceres/seo-audit-tool). -On the front page of the Actor, click the green **Try for free** button. If you're logged into your Apify account which you created during the [Getting started](./index.md) lesson, you'll be taken to the Apify Console and greeted with a page that looks like this: +On the front page of the Actor, click the green **Try for free** button. 
If you're logged into your Apify account which you created during the [Getting started](./index.md) lesson, you'll be taken to Apify Console and greeted with a page that looks like this: ![Actor configuration](./images/seo-actor-config.png) diff --git a/sources/academy/platform/getting_started/apify_api.md b/sources/academy/platform/getting_started/apify_api.md index 5414aa5b0d..9c719b0fbf 100644 --- a/sources/academy/platform/getting_started/apify_api.md +++ b/sources/academy/platform/getting_started/apify_api.md @@ -17,7 +17,7 @@ In this lesson, we'll be learning how to use the Apify API to call an Actor and Within one of your Actors on the [Apify Console](https://console.apify.com?asrc=developers_portal) (we'll use the **adding-actor** from the previous lesson), click on the **API** button in the top right-hand corner: -![The "API" button on an Actor's page on the Apify Console](./images/api-tab.jpg) +![The "API" button on an Actor's page in Apify Console](./images/api-tab.jpg) You should see a long list of API endpoints that you can copy and paste elsewhere, or even test right within the **API** modal. Go ahead and copy the endpoint labeled **Run Actor synchronously and get dataset items**. It should look something like this: diff --git a/sources/academy/platform/getting_started/apify_client.md b/sources/academy/platform/getting_started/apify_client.md index 4cc4b4e6d2..a1d8d531e8 100644 --- a/sources/academy/platform/getting_started/apify_client.md +++ b/sources/academy/platform/getting_started/apify_client.md @@ -62,7 +62,7 @@ from apify_client import ApifyClient In the last lesson, we ran the **adding-actor** and retrieved its dataset items. That's exactly what we're going to do now; however, by using the Apify client instead. 
-Before we can use the client though, we must create a new instance of the `ApifyClient` class and pass it our API token from the [**Integrations** page](https://console.apify.com/account?tab=integrations&asrc=developers_portal) on the Apify Console: +Before we can use the client though, we must create a new instance of the `ApifyClient` class and pass it our API token from the [**Integrations** page](https://console.apify.com/account?tab=integrations&asrc=developers_portal) on Apify Console: diff --git a/sources/academy/platform/getting_started/creating_actors.md b/sources/academy/platform/getting_started/creating_actors.md index a1c6505b43..31556cfd76 100644 --- a/sources/academy/platform/getting_started/creating_actors.md +++ b/sources/academy/platform/getting_started/creating_actors.md @@ -32,7 +32,7 @@ If you already have your code hosted by a Git provider, you can use it to create ![Create an Actor from Git repository](./images/create-actor-git.png) -You can also push your existing code from your local machine using [Apify CLI](/cli). This is useful when you develop your code locally and then you want to push it to the Apify Console to run the code as an Actor in the cloud. For this option, you'll need the [Apify CLI installed](/cli/docs/installation) on your machine. By clicking on the **Push your code using the Apify command-line interface (CLI)** button, you will be presented with instructions on how to push your code to the Apify Console. +You can also push your existing code from your local machine using [Apify CLI](/cli). This is useful when you develop your code locally and then you want to push it to Apify Console to run the code as an Actor in the cloud. For this option, you'll need the [Apify CLI installed](/cli/docs/installation) on your machine. By clicking on the **Push your code using the Apify command-line interface (CLI)** button, you will be presented with instructions on how to push your code to Apify Console. 
![Push your code using the Apify CLI](./images/create-actor-cli.png) @@ -72,7 +72,7 @@ If you want to use the template locally, you can again use our [Apify CLI](/cli) :::tip Local development -Creating an Actor from a template locally is a great option if you want to develop your code using your local environment and IDE and then push the final solution back to the Apify Console. +Creating an Actor from a template locally is a great option if you want to develop your code using your local environment and IDE and then push the final solution back to Apify Console. ::: diff --git a/sources/academy/platform/getting_started/index.md b/sources/academy/platform/getting_started/index.md index ab8768919d..6c0f744232 100644 --- a/sources/academy/platform/getting_started/index.md +++ b/sources/academy/platform/getting_started/index.md @@ -1,12 +1,12 @@ --- title: Getting started -description: Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify Actors are born! +description: Get started with the Apify platform by creating an account and learning about Apify Console, which is where all Apify Actors are born! 
sidebar_position: 8 category: apify platform slug: /getting-started --- -**Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify Actors are born!** +**Get started with the Apify platform by creating an account and learning about Apify Console, which is where all Apify Actors are born!** --- diff --git a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md index ad154cd126..e95e0cb0b8 100644 --- a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md +++ b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md @@ -51,7 +51,7 @@ For tasks, we can switch the path from **acts** to **actor-tasks** and keep the https://api.apify.com/v2/actor-tasks/TASK_NAME_OR_ID/runs?token=YOUR_TOKEN ``` -If we send a correct POST request to one of these endpoints, the actor/actor-task will start just as if we had pressed the **Start** button on the Actor's page in the [Apify Console](https://console.apify.com). +If we send a correct POST request to one of these endpoints, the Actor or task will start just as if we had pressed the **Start** button on the Actor's page in the [Apify Console](https://console.apify.com). ### Additional settings {#additional-settings} @@ -198,7 +198,7 @@ For runs longer than 5 minutes, the process consists of three steps: ### Wait for the run to finish {#wait-for-the-run-to-finish} -There may be cases where we need to run the Actor and go away. But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the actor/task to finish. +There may be cases where we need to run the Actor and go away. But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the Actor or task to finish. 
- [`waitForFinish` parameter](#waitforfinish-parameter)
- [Webhooks](#webhooks)

diff --git a/sources/academy/tutorials/node_js/add_external_libraries_web_scraper.md b/sources/academy/tutorials/node_js/add_external_libraries_web_scraper.md
index bca4c9e0fa..71737a665a 100644
--- a/sources/academy/tutorials/node_js/add_external_libraries_web_scraper.md
+++ b/sources/academy/tutorials/node_js/add_external_libraries_web_scraper.md
@@ -65,5 +65,3 @@ With jQuery, we're using the `$.getScript()` helper to fetch the script for us a

## Dealing with errors

Some websites employ security measures that disallow loading external scripts within their pages. Luckily, those measures can be overridden with Web Scraper. If you are encountering errors saying that your library cannot be loaded due to a security policy, select the Ignore CORS and CSP input option at the very bottom of Web Scraper input and the errors should go away.
-
-Happy scraping!

diff --git a/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md b/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md
index d2314a7bd7..1ea20dea58 100644
--- a/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md
+++ b/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md
@@ -78,9 +78,9 @@ Note that an error can happen only in a few pages out of a thousand and look com

Snapshots can tell you if:

- A website has changed its layout. This can also mean A/B testing or different content for different locations.
-- You have been blocked—you open a [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) or an **Access Denied** page.
-- Data load later dynamically—the page is empty.
-- The page was redirected—the content is different.
+- You have been blocked - you open a [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) or an **Access Denied** page.
+- Data loads dynamically later - the page is empty.
+- The page was redirected - the content is different.
You can learn how to take snapshots in Puppeteer or Playwright in [this short lesson](../../webscraping/puppeteer_playwright/page/page_methods.md) diff --git a/sources/academy/tutorials/node_js/how_to_save_screenshots_puppeteer.md b/sources/academy/tutorials/node_js/how_to_save_screenshots_puppeteer.md index de23e4371c..98e2c043ef 100644 --- a/sources/academy/tutorials/node_js/how_to_save_screenshots_puppeteer.md +++ b/sources/academy/tutorials/node_js/how_to_save_screenshots_puppeteer.md @@ -22,7 +22,7 @@ const saveScreen = async (page, key = 'debug-screen') => { This function takes the parameters page (an instance of a puppeteer page) and key (your screen is stored under this key function in the Apify key-value store). -Because this is so common use-case Apify SDK has a utility function called [saveSnapshot](/sdk/js/docs/api/puppeteer#puppeteersavesnapshot) that does exactly this and a little bit more: +Because this is such a common use case, the Apify SDK has a utility function called [saveSnapshot](/sdk/js/docs/api/puppeteer#puppeteersavesnapshot) that does exactly this and a little bit more: - You can choose the quality of your screenshots (high-quality images take more size) @@ -58,4 +58,3 @@ After you call the function, your screen appears in the KEY-VALUE STORE tab in t If you have any questions, feel free to contact us in chat. -Happy coding! diff --git a/sources/academy/tutorials/node_js/multiple-runs-scrape.md b/sources/academy/tutorials/node_js/multiple-runs-scrape.md index be451307be..f58f9ff8bf 100644 --- a/sources/academy/tutorials/node_js/multiple-runs-scrape.md +++ b/sources/academy/tutorials/node_js/multiple-runs-scrape.md @@ -36,13 +36,13 @@ It will set up a request queue and a dataset that the other Actor runs will util The Orchestrator Actor orchestrates the parallel execution of scraper Actor runs. It runs multiple instances of the scraper Actor and passes the request queue and dataset to them. 
-For the Actor's base structure, we use Apify CLI and create a new Actor with the following command and use the [Empty TypeScript Actor template](https://apify.com/templates/ts-empty). +For the Actor's base structure, we use the Apify CLI and create a new Actor with the following command and use the [Empty TypeScript Actor template](https://apify.com/templates/ts-empty). ```shell apify create orchestrator-actor ```` -If you don't have Apify CLI installed, check out our installation [instructions](https://docs.apify.com/cli/docs/installation). +If you don't have the Apify CLI installed, check out the installation [instructions](https://docs.apify.com/cli/docs/installation). ### Input Configuration @@ -164,7 +164,7 @@ If you are pushing the Actor for the first time, you will need to [login to your ::: -By running this command, you will be prompted to provide the Actor ID, which you can find in the Apify Console under the Actors tab. +By running this command, you will be prompted to provide the Actor ID, which you can find in Apify Console under the Actors tab. ![orchestrator-actor.png](./images/orchestrator-actor.png) @@ -211,19 +211,19 @@ You need to push the Scraper Actor to Apify using the following command from the apify push ``` -After pushing the Scraper Actor to Apify, you must get the Actor ID from the Apify Console. +After pushing the Scraper Actor to Apify, you must get the Actor ID from Apify Console. ![scraper-actor.png](./images/scraper-actor.png) ## Run orchestration in Apify Console -Once you have the Orchestrator Actor and Scraper Actor pushed to Apify, you can run the Orchestrator Actor in the Apify Console. +Once you have the Orchestrator Actor and Scraper Actor pushed to Apify, you can run the Orchestrator Actor in Apify Console. You can set the input for the Orchestrator Actor to specify the number of parallel runs and the target Actor ID, input, and run options. 
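The orchestration described above can be sketched roughly as follows. This is a hypothetical helper, not code from the template: each parallel run simply receives the IDs of the same shared storages in its input.

```javascript
// Hypothetical sketch: every scraper instance gets the same shared storage
// IDs, so all runs pull URLs from one request queue and push results into
// one dataset.
function buildParallelRunInputs({ requestQueueId, datasetId, parallelRuns }) {
  return Array.from({ length: parallelRuns }, (_, i) => ({
    runNumber: i + 1,
    requestQueueId, // shared queue of URLs to process
    datasetId, // shared dataset collecting the results
  }));
}

const runInputs = buildParallelRunInputs({
  requestQueueId: 'QUEUE_ID',
  datasetId: 'DATASET_ID',
  parallelRuns: 3,
});
```

The design choice here is that coordination lives in the storages themselves: because a request queue hands each URL out only once, the runs never need to talk to each other directly.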
After you hit the **Start** button, the Orchestrator Actor will start the parallel runs of the Scraper Actor. ![orchestrator-actor-input.png](./images/orchestrator-actor-input.png) -After starting the Orchestrator Actor, you will see the parallel runs initiated in the Apify Console. +After starting the Orchestrator Actor, you will see the parallel runs initiated in Apify Console. ![scraper-actor-runs.png](./images/scraper-actor-runs.png) diff --git a/sources/academy/tutorials/node_js/when_to_use_puppeteer_scraper.md b/sources/academy/tutorials/node_js/when_to_use_puppeteer_scraper.md index c73105246f..4e8599a2b9 100644 --- a/sources/academy/tutorials/node_js/when_to_use_puppeteer_scraper.md +++ b/sources/academy/tutorials/node_js/when_to_use_puppeteer_scraper.md @@ -15,7 +15,7 @@ Puppeteer is a JavaScript program that's used to control the browser and by cont _Robot browsers can be detected in numerous ways.. But there are no ways to tell if a specific mouse click was made by a user or a robot._ -Ok, so both Web Scraper and Puppeteer Scraper use Puppeteer to give commands to Chrome. Where's the difference? It's called the execution environment. +Okay, so both Web Scraper and Puppeteer Scraper use Puppeteer to give commands to Chrome. Where's the difference? It's called the execution environment. ## Execution environment @@ -28,7 +28,7 @@ _This does not mean that you can't execute in-browser code with Puppeteer Scrape ## Practical differences -Ok, cool, different environments, but how does that help you scrape stuff? Actually, quite a lot. Some things you just can't do from within the browser, but you can do them with Puppeteer. We will not attempt to create an exhaustive list, but rather show you some very useful features that we use every day in our scraping. +Okay, cool, different environments, but how does that help you scrape stuff? Actually, quite a lot. Some things you just can't do from within the browser, but you can do them with Puppeteer. 
We will not attempt to create an exhaustive list, but rather show you some very useful features that we use every day in our scraping. ## Evaluating in-browser code @@ -45,7 +45,7 @@ The `context.page.evaluate()` call executes the provided function in the browser _See the_ `page.evaluate()` _[documentation](https://pptr.dev/#?product=Puppeteer&show=api-pageevaluatepagefunction-args) for info on how to pass variables from Node.js to browser._ -With the help of Apify SDK, we can even inject jQuery into the browser. You can use the `Pre goto function` input option to manipulate the page's environment before it loads. +With the help of the Apify SDK, we can even inject jQuery into the browser. You can use the `Pre goto function` input option to manipulate the page's environment before it loads. ```js async function preGotoFunction({ request, page, Apify }) { diff --git a/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md b/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md index 4a7c265eeb..2612cf29fd 100644 --- a/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md +++ b/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md @@ -49,7 +49,7 @@ Some websites also provide an HTML version, to help indexing bots find new conte /sitemap.html /sitemap_index -Apify provides the [Sitemap Sniffer](https://apify.com/vaclavrut/sitemap-sniffer), an open source actor that scans the URL variations automatically for you so that you don't have to check them manually. +Apify provides the [Sitemap Sniffer](https://apify.com/vaclavrut/sitemap-sniffer), an open source Actor that scans the URL variations automatically for you so that you don't have to check them manually. 
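The manual check that the Sitemap Sniffer automates can be sketched like this; the list of path variations below is illustrative only, not the tool's actual list.

```javascript
// Sketch: build candidate sitemap URLs for a domain so each variation can
// be requested and checked. Extend the paths array with other known
// variations as needed.
function sitemapCandidates(origin) {
  const paths = ['/sitemap.xml', '/sitemap.html', '/sitemap_index'];
  return paths.map((path) => new URL(path, origin).href);
}

const candidates = sitemapCandidates('https://example.com');
```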
## How to set up HTTP requests to download sitemaps

diff --git a/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-with-search.md b/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-with-search.md
index 618b605fb8..ccd82fa71c 100644
--- a/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-with-search.md
+++ b/sources/academy/webscraping/advanced_web_scraping/crawling/crawling-with-search.md
@@ -7,7 +7,7 @@ slug: /advanced-web-scraping/crawling/crawling-with-search

In this lesson, we will start with a simpler example of scraping HTML based websites with limited pagination.

-Limiting pagination is a common practice on e-commerce sites. It makes sense: a real user will never want to look through more than 200 pages of results – only bots love unlimited pagination. Fortunately, there are ways to overcome this limit while keeping our code clean and generic.
+Limiting pagination is a common practice on e-commerce sites. It makes sense: a real user will never want to look through more than 200 pages of results - only bots love unlimited pagination. Fortunately, there are ways to overcome this limit while keeping our code clean and generic.

![Pagination on Google search results page](./images/pagination.png)

diff --git a/sources/academy/webscraping/scraping_basics_javascript/01_devtools_inspecting.md b/sources/academy/webscraping/scraping_basics_javascript/01_devtools_inspecting.md
index 9c7681761c..bfdbac24b7 100644
--- a/sources/academy/webscraping/scraping_basics_javascript/01_devtools_inspecting.md
+++ b/sources/academy/webscraping/scraping_basics_javascript/01_devtools_inspecting.md
@@ -14,7 +14,7 @@ import Exercises from '../scraping_basics/_exercises.mdx';

---

-A browser is the most complete tool for navigating websites. Scrapers are like automated browsers—and sometimes, they actually are automated browsers. The key difference? There's no user to decide where to go or eyes to see what's displayed.
Everything has to be pre-programmed. +A browser is the most complete tool for navigating websites. Scrapers are like automated browsers - and sometimes, they actually are automated browsers. The key difference? There's no user to decide where to go or eyes to see what's displayed. Everything has to be pre-programmed. All modern browsers provide developer tools, or _DevTools_, for website developers to debug their work. We'll use them to understand how websites are structured and identify the behavior our scraper needs to mimic. Here's the typical workflow for creating a scraper: @@ -28,7 +28,7 @@ Now let's spend some time figuring out what the detective work in step 1 is abou Google Chrome is currently the most popular browser, and many others use the same core. That's why we'll focus on [Chrome DevTools](https://developer.chrome.com/docs/devtools) here. However, the steps are similar in other browsers, as Safari has its [Web Inspector](https://developer.apple.com/documentation/safari-developer-tools/web-inspector) and Firefox also has [DevTools](https://firefox-source-docs.mozilla.org/devtools-user/). -Now let's peek behind the scenes of a real-world website—say, Wikipedia. We'll open Google Chrome and visit [wikipedia.org](https://www.wikipedia.org/). Then, let's press **F12**, or right-click anywhere on the page and select **Inspect**. +Now let's peek behind the scenes of a real-world website - say, Wikipedia. We'll open Google Chrome and visit [wikipedia.org](https://www.wikipedia.org/). Then, let's press **F12**, or right-click anywhere on the page and select **Inspect**. ![Wikipedia with Chrome DevTools open](../scraping_basics/images/devtools-wikipedia.png) @@ -51,7 +51,7 @@ Think of [HTML](https://developer.mozilla.org/en-US/docs/Learn/HTML) elements as ``` -HTML, a markup language, describes how everything on a page is organized, how elements relate to each other, and what they mean. 
It doesn't define how elements should look—that's where [CSS](https://developer.mozilla.org/en-US/docs/Learn/CSS) comes in. CSS is like the velvet covering the frame. Using styles, we can select elements and assign rules that tell the browser how they should appear. For instance, we can style all elements with `heading` in their `class` attribute to make the text blue and uppercase. +HTML, a markup language, describes how everything on a page is organized, how elements relate to each other, and what they mean. It doesn't define how elements should look - that's where [CSS](https://developer.mozilla.org/en-US/docs/Learn/CSS) comes in. CSS is like the velvet covering the frame. Using styles, we can select elements and assign rules that tell the browser how they should appear. For instance, we can style all elements with `heading` in their `class` attribute to make the text blue and uppercase. ```css .heading { @@ -62,7 +62,7 @@ HTML, a markup language, describes how everything on a page is organized, how el While HTML and CSS describe what the browser should display, JavaScript adds interaction to the page. In DevTools, the **Console** tab allows ad-hoc experimenting with JavaScript. -If you don't see it, press ESC to toggle the Console. Running commands in the Console lets us manipulate the loaded page—we’ll try this shortly. +If you don't see it, press ESC to toggle the Console. Running commands in the Console lets us manipulate the loaded page - we’ll try this shortly. ![Console in Chrome DevTools](../scraping_basics/images/devtools-console.png) @@ -136,9 +136,9 @@ When we change elements in the Console, those changes reflect immediately on the ![Changing textContent in Chrome DevTools Console](../scraping_basics/images/devtools-console-textcontent.png) -But don't worry—we haven't hacked Wikipedia. The change only happens in our browser. If we reload the page, the change will disappear. This, however, is an easy way to craft a screenshot with fake content. 
That's why screenshots shouldn't be trusted as evidence. +But don't worry - we haven't hacked Wikipedia. The change only happens in our browser. If we reload the page, the change will disappear. This, however, is an easy way to craft a screenshot with fake content. That's why screenshots shouldn't be trusted as evidence. -We're not here for playing around with elements, though—we want to create a scraper for an e-commerce website to watch prices. In the next lesson, we'll examine the website and use CSS selectors to locate HTML elements containing the data we need. +We're not here for playing around with elements, though - we want to create a scraper for an e-commerce website to watch prices. In the next lesson, we'll examine the website and use CSS selectors to locate HTML elements containing the data we need. --- diff --git a/sources/academy/webscraping/scraping_basics_javascript/02_devtools_locating_elements.md b/sources/academy/webscraping/scraping_basics_javascript/02_devtools_locating_elements.md index b0f784c603..485829cb05 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/02_devtools_locating_elements.md +++ b/sources/academy/webscraping/scraping_basics_javascript/02_devtools_locating_elements.md @@ -38,7 +38,7 @@ The page displays a grid of product cards, each showing a product's title and pi ![Selecting an element with DevTools](../scraping_basics/images/devtools-product-title.png) -Next, let's find all the elements containing details about this subwoofer—its price, number of reviews, image, and more. +Next, let's find all the elements containing details about this subwoofer - its price, number of reviews, image, and more. In the **Elements** tab, we'll move our cursor up from the `a` element containing the subwoofer's title. On the way, we'll hover over each element until we highlight the entire product card. Alternatively, we can use the arrow-up key. 
The `div` element we land on is the **parent element**, and all nested elements are its **child elements**. @@ -106,13 +106,13 @@ You can combine selectors to narrow results. For example, `p.lead` matches `p` e ``` -How did we know `.product-item` selects a product card? By inspecting the markup of the product card element. After checking its classes, we chose the one that best fit our purpose. Testing in the **Console** confirmed it—selecting by the most descriptive class worked. +How did we know `.product-item` selects a product card? By inspecting the markup of the product card element. After checking its classes, we chose the one that best fit our purpose. Testing in the **Console** confirmed it - selecting by the most descriptive class worked. ## Choosing good selectors Multiple approaches often exist for creating a CSS selector that targets the element we want. We should pick selectors that are simple, readable, unique, and semantically tied to the data. These are **resilient selectors**. They're the most reliable and likely to survive website updates. We better avoid randomly generated attributes like `class="F4jsL8"`, as they tend to change without warning. -The product card has four classes: `product-item`, `product-item--vertical`, `1/3--tablet-and-up`, and `1/4--desk`. Only the first one checks all the boxes. A product card *is* a product item, after all. The others seem more about styling—defining how the element looks on the screen—and are probably tied to CSS rules. +The product card has four classes: `product-item`, `product-item--vertical`, `1/3--tablet-and-up`, and `1/4--desk`. Only the first one checks all the boxes. A product card *is* a product item, after all. The others seem more about styling - defining how the element looks on the screen - and are probably tied to CSS rules. This class is also unique enough in the page's context. 
If it were something generic like `item`, there would be a higher risk that developers of the website might use it for unrelated elements. In the **Elements** tab, we can see a parent element `product-list` that contains all the product cards marked as `product-item`. This structure aligns with the data we're after. @@ -120,7 +120,7 @@ This class is also unique enough in the page's context. If it were something gen ## Locating all product cards -In the **Console**, hovering our cursor over objects representing HTML elements highlights the corresponding elements on the page. This way we can verify that when we query `.product-item`, the result represents the JBL Flip speaker—the first product card in the list. +In the **Console**, hovering our cursor over objects representing HTML elements highlights the corresponding elements on the page. This way we can verify that when we query `.product-item`, the result represents the JBL Flip speaker - the first product card in the list. ![Highlighting a querySelector() result](../scraping_basics/images/devtools-hover-queryselector.png) @@ -132,7 +132,7 @@ document.querySelectorAll('.product-item'); The returned value is a [`NodeList`](https://developer.mozilla.org/en-US/docs/Web/API/NodeList), a collection of nodes. Browsers understand an HTML document as a tree of nodes. Most nodes are HTML elements, but there are also text nodes for plain text, and others. -We'll expand the result by clicking the small arrow, then hover our cursor over the third element in the list. Indexing starts at 0, so the third element is at index 2. There it is—the product card for the subwoofer! +We'll expand the result by clicking the small arrow, then hover our cursor over the third element in the list. Indexing starts at 0, so the third element is at index 2. There it is - the product card for the subwoofer! 
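The zero-based indexing is an easy place to slip up, so here is a tiny sketch. The `nthItem` helper is hypothetical (not part of the course code); it works on any array-like collection, including the `NodeList` that `document.querySelectorAll('.product-item')` returns in the Console:

```js
// Hypothetical helper: translate a human-friendly position (1 = first)
// into the zero-based indexing that NodeList and arrays use.
function nthItem(collection, position) {
  return collection[position - 1];
}

// In the browser Console this would be:
//   nthItem(document.querySelectorAll('.product-item'), 3)
// Here a plain array stands in for the NodeList to show the same indexing:
const cards = ['speaker card', 'headphones card', 'subwoofer card'];
console.log(nthItem(cards, 3)); // 'subwoofer card'
```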
![Highlighting a querySelectorAll() result](../scraping_basics/images/devtools-hover-queryselectorall.png) @@ -151,7 +151,7 @@ Even though we're just playing in the browser's **Console**, we're inching close ### Locate headings on Wikipedia's Main Page -On English Wikipedia's [Main Page](https://en.wikipedia.org/wiki/Main_Page), use CSS selectors in the **Console** to list the HTML elements representing headings of the colored boxes (including the grey ones). +On English Wikipedia's [Main Page](https://en.wikipedia.org/wiki/Main_Page), use CSS selectors in the **Console** to list the HTML elements representing headings of the colored boxes (including the gray ones). ![Wikipedia's Main Page headings](../scraping_basics/images/devtools-exercise-wikipedia.png) diff --git a/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md b/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md index fbd3fa1de9..fedd418abd 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md +++ b/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md @@ -206,7 +206,7 @@ Sony XBR-950G BRAVIA 4K HDR Ultra HD TV | From $1,398.00 ... ``` -Great! We have managed to use CSS selectors and walk the HTML tree to get a list of product titles and prices. But wait a second—what's `From $1,398.00`? One does not simply scrape a price! We'll need to clean that. But that's a job for the next lesson, which is about extracting data. +Great! We have managed to use CSS selectors and walk the HTML tree to get a list of product titles and prices. But wait a second - what's `From $1,398.00`? One does not simply scrape a price! We'll need to clean that. But that's a job for the next lesson, which is about extracting data. 
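As a preview of that cleanup, here is a hedged sketch in plain Node.js. The `parsePrice` name and the regex are illustrative, not the course's actual solution:

```js
// Turn a price label like "From $1,398.00" into a number.
// Returns null when the label contains no digits (e.g. "Sold out").
function parsePrice(label) {
  const match = label.replace(/,/g, '').match(/\$?(\d+(?:\.\d+)?)/);
  return match ? Number(match[1]) : null;
}

console.log(parsePrice('From $1,398.00')); // 1398
console.log(parsePrice('$74.95'));         // 74.95
console.log(parsePrice('Sold out'));       // null
```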
--- diff --git a/sources/academy/webscraping/scraping_basics_javascript/10_crawling.md b/sources/academy/webscraping/scraping_basics_javascript/10_crawling.md index fc55568cf2..ba390f3706 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/10_crawling.md +++ b/sources/academy/webscraping/scraping_basics_javascript/10_crawling.md @@ -198,7 +198,7 @@ If we run the program now, it'll take longer to finish since it's making 24 more ## Extracting price -Scraping the vendor's name is nice, but the main reason we started checking the detail pages in the first place was to figure out how to get a price for each product. From the product listing, we could only scrape the min price, and remember—we're building a Node.js app to track prices! +Scraping the vendor's name is nice, but the main reason we started checking the detail pages in the first place was to figure out how to get a price for each product. From the product listing, we could only scrape the min price, and remember - we're building a Node.js app to track prices! Looking at the [Sony XBR-950G BRAVIA](https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv), it's clear that the listing only shows min prices, because some products have variants, each with a different price. And different stock availability. And different SKUs… diff --git a/sources/academy/webscraping/scraping_basics_javascript/11_scraping_variants.md b/sources/academy/webscraping/scraping_basics_javascript/11_scraping_variants.md index 5c256f17ae..1b8861ae06 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/11_scraping_variants.md +++ b/sources/academy/webscraping/scraping_basics_javascript/11_scraping_variants.md @@ -135,7 +135,7 @@ const data = itemLists.flat(); After modifying the loop, we also updated how we collect the items into the `data` array. 
Since the loop now produces an array of items per product, the result of `await Promise.all()` is an array of arrays. We use [`.flat()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/flat) to merge them into a single, non-nested array. -If we run the program now, we'll see 34 items in total. Some items don't have variants, so they won't have a variant name. However, they should still have a price set—our scraper should already have that info from the product listing page. +If we run the program now, we'll see 34 items in total. Some items don't have variants, so they won't have a variant name. However, they should still have a price set - our scraper should already have that info from the product listing page. ```json title=products.json diff --git a/sources/academy/webscraping/scraping_basics_javascript/12_framework.md b/sources/academy/webscraping/scraping_basics_javascript/12_framework.md index 8c9a687473..6589d22dd0 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/12_framework.md +++ b/sources/academy/webscraping/scraping_basics_javascript/12_framework.md @@ -137,7 +137,7 @@ const crawler = new CheerioCrawler({ }); ``` -Now for the price. We're not doing anything new here—just copy-paste the code from our old scraper. The only change will be in the selector. +Now for the price. We're not doing anything new here - just copy-paste the code from our old scraper. The only change will be in the selector. In `oldindex.js`, we look for `.price` within a `$productItem` object representing a product card. Here, we're looking for `.price` within the entire product detail page. It's better to be more specific so we don't accidentally match another price on the same page: @@ -245,7 +245,7 @@ await crawler.run(['https://warehouse-theme-metal.myshopify.com/collections/sale If we run this scraper, we should get the same data for the 24 products as before. 
Crawlee has saved us a lot of effort by managing downloading, parsing, and parallelization. -Crawlee doesn't do much to help with locating and extracting the data—that part of the code remains almost the same, framework or not. This is because the detective work of finding and extracting the right data is the core value of custom scrapers. With Crawlee, we can focus on just that while letting the framework take care of everything else. +Crawlee doesn't do much to help with locating and extracting the data - that part of the code remains almost the same, framework or not. This is because the detective work of finding and extracting the right data is the core value of custom scrapers. With Crawlee, we can focus on just that while letting the framework take care of everything else. ## Saving data diff --git a/sources/academy/webscraping/scraping_basics_javascript/13_platform.md b/sources/academy/webscraping/scraping_basics_javascript/13_platform.md index 4bb57a84e8..7aa9bfea54 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_javascript/13_platform.md @@ -18,13 +18,13 @@ Before starting with a scraping platform, let's highlight a few caveats in our c - _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we'd need to remember to run it daily. And if we want alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day. - _No monitoring:_ If we have a spare server or a Raspberry Pi lying around, we could use [cron](https://en.wikipedia.org/wiki/Cron) to schedule it. But even then, we'd have little insight into whether it ran successfully, what errors or warnings occurred, how long it took, or what resources it used. - _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. 
Processing the data could also be tricky since different analysis tools often require different formats. -- _Anti-scraping risks:_ If the target website detects our scraper, they can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually, they'd block that too—risking seriously annoying our barista. +- _Anti-scraping risks:_ If the target website detects our scraper, they can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually, they'd block that too - risking seriously annoying our barista. In this lesson, we'll use a platform to address all of these issues. Generic cloud platforms like [GitHub Actions](https://github.com/features/actions) can work for simple scenarios. But platforms dedicated to scraping, like [Apify](https://apify.com/), offer extra features such as monitoring scrapers, managing retrieved data, and overcoming anti-scraping measures. :::info Why Apify -Scraping platforms come in many varieties, offering a wide range of tools and approaches. As the course authors, we're obviously biased toward Apify—we think it's both powerful and complete. +Scraping platforms come in many varieties, offering a wide range of tools and approaches. As the course authors, we're obviously biased toward Apify - we think it's both powerful and complete. That said, the main goal of this lesson is to show how deploying to _any platform_ can make life easier. Plus, everything we cover here fits within [Apify's free tier](https://apify.com/pricing). @@ -32,7 +32,7 @@ That said, the main goal of this lesson is to show how deploying to _any platfor ## Registering -First, let's [create a new Apify account](https://console.apify.com/sign-up). We'll go through a few checks to confirm we're human and our email is valid—annoying but necessary to prevent abuse of the platform. +First, let's [create a new Apify account](https://console.apify.com/sign-up). 
We'll go through a few checks to confirm we're human and our email is valid - annoying but necessary to prevent abuse of the platform. Apify serves both as an infrastructure where to privately deploy and run own scrapers, and as a marketplace, where anyone can offer their ready scrapers to others for rent. But let's hold off on exploring Apify Store for now. @@ -64,7 +64,7 @@ Success: You are logged in to Apify as user1234! ## Turning our program to an Actor -Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://docs.apify.com/platform/actors)—a standardized container with designated places for input and output. +Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://docs.apify.com/platform/actors) - a standardized container with designated places for input and output. Many [Actor templates](https://apify.com/templates/categories/javascript) simplify the setup for new projects. We'll skip those, as we're about to package an existing program. @@ -173,7 +173,7 @@ Actor build detail https://console.apify.com/actors/a123bCDefghiJkLMN#/builds/0. ? Do you want to open the Actor detail in your browser? (Y/n) ``` -After opening the link in our browser, assuming we're logged in, we should see the **Source** screen on the Actor's detail page. We'll go to the **Input** tab of that screen. We won't change anything—just hit **Start**, and we should see logs similar to what we see locally, but this time our scraper will be running in the cloud. +After opening the link in our browser, assuming we're logged in, we should see the **Source** screen on the Actor's detail page. We'll go to the **Input** tab of that screen. We won't change anything - just hit **Start**, and we should see logs similar to what we see locally, but this time our scraper will be running in the cloud. 
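The "designated places for input and output" boil down to a few SDK calls. A minimal sketch, assuming the JavaScript `apify` SDK v3, with the scraping logic itself elided:

```js
import { Actor } from 'apify';

await Actor.init(); // wires the program into the platform's storage

// Designated input slot: the Actor's INPUT record
const input = await Actor.getInput();

// ... run the crawler here, using values from `input` ...

// Designated output slot: the Actor's default dataset
await Actor.pushData([{ title: 'Sony XBR-950G', minPrice: 1398 }]);

await Actor.exit(); // flushes storage and ends the run cleanly
```

Locally, the same calls fall back to files on disk, which is why the program runs unchanged on a laptop and in the cloud.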
![Actor's detail page, screen Source, tab Input](../scraping_basics/images/actor-input.webp) @@ -189,7 +189,7 @@ We don't need to click buttons to download the data. It's possible to retrieve i ## Running the scraper periodically -Now that our scraper is deployed, let's automate its execution. In the Apify web interface, we'll go to [Schedules](https://console.apify.com/schedules). Let's click **Create new**, review the periodicity (default: daily), and specify the Actor to run. Then we'll click **Enable**—that's it! +Now that our scraper is deployed, let's automate its execution. In the Apify web interface, we'll go to [Schedules](https://console.apify.com/schedules). Let's click **Create new**, review the periodicity (default: daily), and specify the Actor to run. Then we'll click **Enable** - that's it! From now on, the Actor will execute daily. We can inspect each run, view logs, check collected data, [monitor stats and charts](https://docs.apify.com/platform/monitoring), and even set up alerts. @@ -322,7 +322,7 @@ We'll leave it as is and click **Start**. This time, the logs should show `Using ## Congratulations! -We've reached the end of the course—congratulations! Together, we've built a program that: +We've reached the end of the course - congratulations! Together, we've built a program that: - Crawls a shop and extracts product and pricing data. - Exports the results in several formats. @@ -331,4 +331,4 @@ We've reached the end of the course—congratulations! Together, we've built a p - Executes periodically without manual intervention, collecting data over time. - Uses proxies to avoid being blocked. -We hope this serves as a solid foundation for your next scraping project. Perhaps you'll even [start publishing scrapers](https://docs.apify.com/platform/actors/publishing) for others to use—for a fee? +We hope this serves as a solid foundation for your next scraping project. 
Perhaps you'll even [start publishing scrapers](https://docs.apify.com/platform/actors/publishing) for others to use - for a fee? diff --git a/sources/academy/webscraping/scraping_basics_javascript/index.md b/sources/academy/webscraping/scraping_basics_javascript/index.md index 8c4e95389b..906ab0892f 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/index.md +++ b/sources/academy/webscraping/scraping_basics_javascript/index.md @@ -56,7 +56,7 @@ Scrapers are programs specifically designed to mine data from the internet. Poin ### Why become a scraper dev -As a scraper developer, you are not limited by whether certain data is available programmatically through an official API—the entire web becomes your API! Here are some things you can do if you understand scraping: +As a scraper developer, you are not limited by whether certain data is available programmatically through an official API - the entire web becomes your API! Here are some things you can do if you understand scraping: - Improve your productivity by building personal tools, such as your own real estate or rare sneakers watchdog. - Companies can hire you to build custom scrapers mining data important for their business. diff --git a/sources/academy/webscraping/scraping_basics_legacy/crawling/first_crawl.md b/sources/academy/webscraping/scraping_basics_legacy/crawling/first_crawl.md index 2e87a94b42..ace0db79b0 100644 --- a/sources/academy/webscraping/scraping_basics_legacy/crawling/first_crawl.md +++ b/sources/academy/webscraping/scraping_basics_legacy/crawling/first_crawl.md @@ -14,7 +14,7 @@ import LegacyAdmonition from '../../scraping_basics/_legacy.mdx'; --- -In the previous lessons, we learned what crawling is and how to extract URLs from a page's HTML. The only thing that remains is to write the code—let's get right to it! +In the previous lessons, we learned what crawling is and how to extract URLs from a page's HTML. 
The only thing that remains is to write the code - let's get right to it! > If the code starts to look too complex to you, don't worry. We're showing it for educational purposes, so that you can learn how crawling works. Near the end of this course, we'll show you a much easier and faster way to crawl, using a specialized scraping library. If you want, you can skip the details and [go there now](./pro_scraping.md). diff --git a/sources/academy/webscraping/scraping_basics_legacy/data_extraction/save_to_csv.md b/sources/academy/webscraping/scraping_basics_legacy/data_extraction/save_to_csv.md index acb2bf0536..f3da67c6c6 100644 --- a/sources/academy/webscraping/scraping_basics_legacy/data_extraction/save_to_csv.md +++ b/sources/academy/webscraping/scraping_basics_legacy/data_extraction/save_to_csv.md @@ -135,7 +135,7 @@ const csv = parse(results); writeFileSync('products.csv', csv); // <---- added writing of CSV to file ``` -Finally, run it with `node main.js` in your terminal. After running it, you will find the **products.csv** file in your project folder. And when you open it with Excel/Google Sheets—voila! +Finally, run it with `node main.js` in your terminal. After running it, you will find the **products.csv** file in your project folder. And when you open it with Excel/Google Sheets - voila! ![Displaying CSV data in Google Sheets](./images/csv-data-in-sheets.png) diff --git a/sources/academy/webscraping/scraping_basics_python/01_devtools_inspecting.md b/sources/academy/webscraping/scraping_basics_python/01_devtools_inspecting.md index 9a06641d28..b55842acce 100644 --- a/sources/academy/webscraping/scraping_basics_python/01_devtools_inspecting.md +++ b/sources/academy/webscraping/scraping_basics_python/01_devtools_inspecting.md @@ -11,7 +11,7 @@ import Exercises from '../scraping_basics/_exercises.mdx'; --- -A browser is the most complete tool for navigating websites. Scrapers are like automated browsers—and sometimes, they actually are automated browsers. 
The key difference? There's no user to decide where to go or eyes to see what's displayed. Everything has to be pre-programmed. +A browser is the most complete tool for navigating websites. Scrapers are like automated browsers - and sometimes, they actually are automated browsers. The key difference? There's no user to decide where to go or eyes to see what's displayed. Everything has to be pre-programmed. All modern browsers provide developer tools, or _DevTools_, for website developers to debug their work. We'll use them to understand how websites are structured and identify the behavior our scraper needs to mimic. Here's the typical workflow for creating a scraper: @@ -25,7 +25,7 @@ Now let's spend some time figuring out what the detective work in step 1 is abou Google Chrome is currently the most popular browser, and many others use the same core. That's why we'll focus on [Chrome DevTools](https://developer.chrome.com/docs/devtools) here. However, the steps are similar in other browsers, as Safari has its [Web Inspector](https://developer.apple.com/documentation/safari-developer-tools/web-inspector) and Firefox also has [DevTools](https://firefox-source-docs.mozilla.org/devtools-user/). -Now let's peek behind the scenes of a real-world website—say, Wikipedia. We'll open Google Chrome and visit [wikipedia.org](https://www.wikipedia.org/). Then, let's press **F12**, or right-click anywhere on the page and select **Inspect**. +Now let's peek behind the scenes of a real-world website - say, Wikipedia. We'll open Google Chrome and visit [wikipedia.org](https://www.wikipedia.org/). Then, let's press **F12**, or right-click anywhere on the page and select **Inspect**. 
![Wikipedia with Chrome DevTools open](../scraping_basics/images/devtools-wikipedia.png) @@ -48,7 +48,7 @@ Think of [HTML](https://developer.mozilla.org/en-US/docs/Learn/HTML) elements as ``` -HTML, a markup language, describes how everything on a page is organized, how elements relate to each other, and what they mean. It doesn't define how elements should look—that's where [CSS](https://developer.mozilla.org/en-US/docs/Learn/CSS) comes in. CSS is like the velvet covering the frame. Using styles, we can select elements and assign rules that tell the browser how they should appear. For instance, we can style all elements with `heading` in their `class` attribute to make the text blue and uppercase. +HTML, a markup language, describes how everything on a page is organized, how elements relate to each other, and what they mean. It doesn't define how elements should look - that's where [CSS](https://developer.mozilla.org/en-US/docs/Learn/CSS) comes in. CSS is like the velvet covering the frame. Using styles, we can select elements and assign rules that tell the browser how they should appear. For instance, we can style all elements with `heading` in their `class` attribute to make the text blue and uppercase. ```css .heading { @@ -59,7 +59,7 @@ HTML, a markup language, describes how everything on a page is organized, how el While HTML and CSS describe what the browser should display, [JavaScript](https://developer.mozilla.org/en-US/docs/Learn/JavaScript) is a general-purpose programming language that adds interaction to the page. -In DevTools, the **Console** tab allows ad-hoc experimenting with JavaScript. If you don't see it, press ESC to toggle the Console. Running commands in the Console lets us manipulate the loaded page—we’ll try this shortly. +In DevTools, the **Console** tab allows ad-hoc experimenting with JavaScript. If you don't see it, press ESC to toggle the Console. 
Running commands in the Console lets us manipulate the loaded page - we’ll try this shortly. ![Console in Chrome DevTools](../scraping_basics/images/devtools-console.png) @@ -133,9 +133,9 @@ When we change elements in the Console, those changes reflect immediately on the ![Changing textContent in Chrome DevTools Console](../scraping_basics/images/devtools-console-textcontent.png) -But don't worry—we haven't hacked Wikipedia. The change only happens in our browser. If we reload the page, the change will disappear. This, however, is an easy way to craft a screenshot with fake content. That's why screenshots shouldn't be trusted as evidence. +But don't worry - we haven't hacked Wikipedia. The change only happens in our browser. If we reload the page, the change will disappear. This, however, is an easy way to craft a screenshot with fake content. That's why screenshots shouldn't be trusted as evidence. -We're not here for playing around with elements, though—we want to create a scraper for an e-commerce website to watch prices. In the next lesson, we'll examine the website and use CSS selectors to locate HTML elements containing the data we need. +We're not here for playing around with elements, though - we want to create a scraper for an e-commerce website to watch prices. In the next lesson, we'll examine the website and use CSS selectors to locate HTML elements containing the data we need. 
--- diff --git a/sources/academy/webscraping/scraping_basics_python/02_devtools_locating_elements.md b/sources/academy/webscraping/scraping_basics_python/02_devtools_locating_elements.md index 515cf1f5e1..4a08cf9ae3 100644 --- a/sources/academy/webscraping/scraping_basics_python/02_devtools_locating_elements.md +++ b/sources/academy/webscraping/scraping_basics_python/02_devtools_locating_elements.md @@ -35,7 +35,7 @@ The page displays a grid of product cards, each showing a product's title and pi ![Selecting an element with DevTools](../scraping_basics/images/devtools-product-title.png) -Next, let's find all the elements containing details about this subwoofer—its price, number of reviews, image, and more. +Next, let's find all the elements containing details about this subwoofer - its price, number of reviews, image, and more. In the **Elements** tab, we'll move our cursor up from the `a` element containing the subwoofer's title. On the way, we'll hover over each element until we highlight the entire product card. Alternatively, we can use the arrow-up key. The `div` element we land on is the **parent element**, and all nested elements are its **child elements**. @@ -55,7 +55,7 @@ The `class` attribute can hold multiple values separated by whitespace. This par ## Programmatically locating a product card -Let's jump into the **Console** and write some JavaScript. Don't worry—we don't need to know the language, and yes, this is a helpful step on our journey to creating a scraper in Python. +Let's jump into the **Console** and write some JavaScript. Don't worry - we don't need to know the language, and yes, this is a helpful step on our journey to creating a scraper in Python. In browsers, JavaScript represents the current page as the [`Document`](https://developer.mozilla.org/en-US/docs/Web/API/Document) object, accessible via `document`. 
This object offers many useful methods, including [`querySelector()`](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector). This method takes a CSS selector as a string and returns the first HTML element that matches. We'll try typing this into the **Console**: @@ -105,13 +105,13 @@ You can combine selectors to narrow results. For example, `p.lead` matches `p` e ``` -How did we know `.product-item` selects a product card? By inspecting the markup of the product card element. After checking its classes, we chose the one that best fit our purpose. Testing in the **Console** confirmed it—selecting by the most descriptive class worked. +How did we know `.product-item` selects a product card? By inspecting the markup of the product card element. After checking its classes, we chose the one that best fit our purpose. Testing in the **Console** confirmed it - selecting by the most descriptive class worked. ## Choosing good selectors Multiple approaches often exist for creating a CSS selector that targets the element we want. We should pick selectors that are simple, readable, unique, and semantically tied to the data. These are **resilient selectors**. They're the most reliable and likely to survive website updates. We better avoid randomly generated attributes like `class="F4jsL8"`, as they tend to change without warning. -The product card has four classes: `product-item`, `product-item--vertical`, `1/3--tablet-and-up`, and `1/4--desk`. Only the first one checks all the boxes. A product card *is* a product item, after all. The others seem more about styling—defining how the element looks on the screen—and are probably tied to CSS rules. +The product card has four classes: `product-item`, `product-item--vertical`, `1/3--tablet-and-up`, and `1/4--desk`. Only the first one checks all the boxes. A product card *is* a product item, after all. 
The others seem more about styling - defining how the element looks on the screen - and are probably tied to CSS rules. This class is also unique enough in the page's context. If it were something generic like `item`, there would be a higher risk that developers of the website might use it for unrelated elements. In the **Elements** tab, we can see a parent element `product-list` that contains all the product cards marked as `product-item`. This structure aligns with the data we're after. @@ -119,7 +119,7 @@ This class is also unique enough in the page's context. If it were something gen ## Locating all product cards -In the **Console**, hovering our cursor over objects representing HTML elements highlights the corresponding elements on the page. This way we can verify that when we query `.product-item`, the result represents the JBL Flip speaker—the first product card in the list. +In the **Console**, hovering our cursor over objects representing HTML elements highlights the corresponding elements on the page. This way we can verify that when we query `.product-item`, the result represents the JBL Flip speaker - the first product card in the list. ![Highlighting a querySelector() result](../scraping_basics/images/devtools-hover-queryselector.png) @@ -131,7 +131,7 @@ document.querySelectorAll('.product-item'); The returned value is a [`NodeList`](https://developer.mozilla.org/en-US/docs/Web/API/NodeList), a collection of nodes. Browsers understand an HTML document as a tree of nodes. Most nodes are HTML elements, but there are also text nodes for plain text, and others. -We'll expand the result by clicking the small arrow, then hover our cursor over the third element in the list. Indexing starts at 0, so the third element is at index 2. There it is—the product card for the subwoofer! +We'll expand the result by clicking the small arrow, then hover our cursor over the third element in the list. Indexing starts at 0, so the third element is at index 2. 
There it is - the product card for the subwoofer! ![Highlighting a querySelectorAll() result](../scraping_basics/images/devtools-hover-queryselectorall.png) @@ -150,7 +150,7 @@ Even though we're just playing with JavaScript in the browser's **Console**, we' ### Locate headings on Wikipedia's Main Page -On English Wikipedia's [Main Page](https://en.wikipedia.org/wiki/Main_Page), use CSS selectors in the **Console** to list the HTML elements representing headings of the colored boxes (including the grey ones). +On English Wikipedia's [Main Page](https://en.wikipedia.org/wiki/Main_Page), use CSS selectors in the **Console** to list the HTML elements representing headings of the colored boxes (including the gray ones). ![Wikipedia's Main Page headings](../scraping_basics/images/devtools-exercise-wikipedia.png) diff --git a/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md b/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md index cfb1d69f8d..3614f4b911 100644 --- a/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md +++ b/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md @@ -211,7 +211,7 @@ Sony XBR-950G BRAVIA 4K HDR Ultra HD TV | From $1,398.00 ... ``` -Great! We have managed to use CSS selectors and walk the HTML tree to get a list of product titles and prices. But wait a second—what's `From $1,398.00`? One does not simply scrape a price! We'll need to clean that. But that's a job for the next lesson, which is about extracting data. +Great! We have managed to use CSS selectors and walk the HTML tree to get a list of product titles and prices. But wait a second - what's `From $1,398.00`? One does not simply scrape a price! We'll need to clean that. But that's a job for the next lesson, which is about extracting data. 
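The Console experiments above carry over almost one-to-one once we switch to Python. As a rough sketch (assuming the Beautiful Soup library, which the Python lessons rely on, and a tiny hardcoded stand-in for the downloaded listing page), the same `.product-item` class locates the product cards:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the downloaded product listing page
html = """
<div class="product-list">
  <div class="product-item product-item--vertical">JBL Flip 4</div>
  <div class="product-item product-item--vertical">Sony XBR-950G</div>
  <div class="product-item product-item--vertical">Klipsch R-120SW</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() plays the role of querySelectorAll(), select_one() of querySelector()
cards = soup.select(".product-item")
print(len(cards))     # 3
print(cards[2].text)  # indexing starts at 0, so the third card is at index 2
```

The resilient-selector advice applies unchanged: selecting by `product-item` keeps working even if the styling classes change.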
--- diff --git a/sources/academy/webscraping/scraping_basics_python/10_crawling.md b/sources/academy/webscraping/scraping_basics_python/10_crawling.md index f2af6190b9..31fcb3f1a5 100644 --- a/sources/academy/webscraping/scraping_basics_python/10_crawling.md +++ b/sources/academy/webscraping/scraping_basics_python/10_crawling.md @@ -171,7 +171,7 @@ If we run the program now, it'll take longer to finish since it's making 24 more ## Extracting price -Scraping the vendor's name is nice, but the main reason we started checking the detail pages in the first place was to figure out how to get a price for each product. From the product listing, we could only scrape the min price, and remember—we're building a Python app to track prices! +Scraping the vendor's name is nice, but the main reason we started checking the detail pages in the first place was to figure out how to get a price for each product. From the product listing, we could only scrape the min price, and remember - we're building a Python app to track prices! Looking at the [Sony XBR-950G BRAVIA](https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv), it's clear that the listing only shows min prices, because some products have variants, each with a different price. And different stock availability. And different SKUs… diff --git a/sources/academy/webscraping/scraping_basics_python/11_scraping_variants.md b/sources/academy/webscraping/scraping_basics_python/11_scraping_variants.md index e654ee34eb..feee1ff5c4 100644 --- a/sources/academy/webscraping/scraping_basics_python/11_scraping_variants.md +++ b/sources/academy/webscraping/scraping_basics_python/11_scraping_variants.md @@ -103,7 +103,7 @@ Since Python 3.9, you can use `|` to merge two dictionaries. If the [docs](https ::: -If we run the program now, we'll see 34 items in total. Some items don't have variants, so they won't have a variant name. 
However, they should still have a price set—our scraper should already have that info from the product listing page. +If we run the program now, we'll see 34 items in total. Some items don't have variants, so they won't have a variant name. However, they should still have a price set - our scraper should already have that info from the product listing page. ```json title=products.json diff --git a/sources/academy/webscraping/scraping_basics_python/12_framework.md b/sources/academy/webscraping/scraping_basics_python/12_framework.md index 01f1095c17..eb7d9425f8 100644 --- a/sources/academy/webscraping/scraping_basics_python/12_framework.md +++ b/sources/academy/webscraping/scraping_basics_python/12_framework.md @@ -28,7 +28,7 @@ In this lesson, we'll address all of the above issues while keeping the code con :::info Why Crawlee and not Scrapy -From the two main open-source options for Python, [Scrapy](https://scrapy.org/) and [Crawlee](https://crawlee.dev/python/), we chose the latter—not just because we're the company financing its development. +From the two main open-source options for Python, [Scrapy](https://scrapy.org/) and [Crawlee](https://crawlee.dev/python/), we chose the latter - not just because we're the company financing its development. We genuinely believe beginners to scraping will like it more, since it allows you to create a scraper with less code and less time spent reading docs. Scrapy's long history ensures it's battle-tested, but it also means its code relies on technologies that aren't really necessary today. Crawlee, on the other hand, builds on modern Python features like asyncio and type hints. @@ -206,11 +206,11 @@ async def main(): :::note Fragile code -The code above assumes the `.select_one()` call doesn't return `None`. If your editor checks types, it might even warn that `text` is not a known attribute of `None`. This isn't robust and could break, but in our program, that's fine.
We expect the elements to be there, and if they're not, we'd rather the scraper break quickly—it's a sign something's wrong and needs fixing. +The code above assumes the `.select_one()` call doesn't return `None`. If your editor checks types, it might even warn that `text` is not a known attribute of `None`. This isn't robust and could break, but in our program, that's fine. We expect the elements to be there, and if they're not, we'd rather the scraper break quickly - it's a sign something's wrong and needs fixing. ::: -Now for the price. We're not doing anything new here—just copy-paste the code from our old scraper. The only change will be in the selector. +Now for the price. We're not doing anything new here - just copy-paste the code from our old scraper. The only change will be in the selector. In `oldmain.py`, we look for `.price` within a `product_soup` object representing a product card. Here, we're looking for `.price` within the entire product detail page. It's better to be more specific so we don't accidentally match another price on the same page: @@ -295,7 +295,7 @@ if __name__ == '__main__': If we run this scraper, we should get the same data for the 24 products as before. Crawlee has saved us a lot of effort by managing downloading, parsing, and parallelization. The code is also cleaner, with two separate and labeled handlers. -Crawlee doesn't do much to help with locating and extracting the data—that part of the code remains almost the same, framework or not.
This is because the detective work of finding and extracting the right data is the core value of custom scrapers. With Crawlee, we can focus on just that while letting the framework take care of everything else. ## Saving data diff --git a/sources/academy/webscraping/scraping_basics_python/13_platform.md b/sources/academy/webscraping/scraping_basics_python/13_platform.md index aaa287bdbc..6b8b2f86d2 100644 --- a/sources/academy/webscraping/scraping_basics_python/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_python/13_platform.md @@ -14,13 +14,13 @@ Before starting with a scraping platform, let's highlight a few caveats in our c - _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we'd need to remember to run it daily. And if we want alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day. - _No monitoring:_ If we have a spare server or a Raspberry Pi lying around, we could use [cron](https://en.wikipedia.org/wiki/Cron) to schedule it. But even then, we'd have little insight into whether it ran successfully, what errors or warnings occurred, how long it took, or what resources it used. - _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky since different analysis tools often require different formats. -- _Anti-scraping risks:_ If the target website detects our scraper, they can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually, they'd block that too—risking seriously annoying our barista. +- _Anti-scraping risks:_ If the target website detects our scraper, they can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually, they'd block that too - risking seriously annoying our barista. In this lesson, we'll use a platform to address all of these issues. 
Generic cloud platforms like [GitHub Actions](https://github.com/features/actions) can work for simple scenarios. But platforms dedicated to scraping, like [Apify](https://apify.com/), offer extra features such as monitoring scrapers, managing retrieved data, and overcoming anti-scraping measures. :::info Why Apify -Scraping platforms come in many varieties, offering a wide range of tools and approaches. As the course authors, we're obviously biased toward Apify—we think it's both powerful and complete. +Scraping platforms come in many varieties, offering a wide range of tools and approaches. As the course authors, we're obviously biased toward Apify - we think it's both powerful and complete. That said, the main goal of this lesson is to show how deploying to _any platform_ can make life easier. Plus, everything we cover here fits within [Apify's free tier](https://apify.com/pricing). ::: ## Registering -First, let's [create a new Apify account](https://console.apify.com/sign-up). We'll go through a few checks to confirm we're human and our email is valid—annoying but necessary to prevent abuse of the platform. +First, let's [create a new Apify account](https://console.apify.com/sign-up). We'll go through a few checks to confirm we're human and our email is valid - annoying but necessary to prevent abuse of the platform. Apify serves both as infrastructure where you can privately deploy and run your own scrapers, and as a marketplace, where anyone can offer their ready-made scrapers to others for rent. But let's hold off on exploring Apify Store for now. @@ -82,7 +82,7 @@ Inside the `warehouse-watchdog` directory, we should see a `src` subdirectory co
At the beginning, it handles [input](https://docs.apify.com/platform/actors/running/input-and-output#input), then passes that input to a small crawler built on top of the Crawlee framework. -Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://docs.apify.com/platform/actors)—a standardized container with designated places for input and output. Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled explicitly in the code. +Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://docs.apify.com/platform/actors) - a standardized container with designated places for input and output. Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled explicitly in the code. ![The expected file structure](../scraping_basics/images/actor-file-structure.webp) @@ -256,7 +256,7 @@ Actor build detail https://console.apify.com/actors/a123bCDefghiJkLMN#/builds/0. ? Do you want to open the Actor detail in your browser? (Y/n) ``` -After opening the link in our browser, assuming we're logged in, we should see the **Source** screen on the Actor's detail page. We'll go to the **Input** tab of that screen. We won't change anything—just hit **Start**, and we should see logs similar to what we see locally, but this time our scraper will be running in the cloud. +After opening the link in our browser, assuming we're logged in, we should see the **Source** screen on the Actor's detail page. We'll go to the **Input** tab of that screen. We won't change anything - just hit **Start**, and we should see logs similar to what we see locally, but this time our scraper will be running in the cloud. ![Actor's detail page, screen Source, tab Input](../scraping_basics/images/actor-input.webp) @@ -272,7 +272,7 @@ We don't need to click buttons to download the data. 
It's possible to retrieve i ## Running the scraper periodically -Now that our scraper is deployed, let's automate its execution. In the Apify web interface, we'll go to [Schedules](https://console.apify.com/schedules). Let's click **Create new**, review the periodicity (default: daily), and specify the Actor to run. Then we'll click **Enable**—that's it! +Now that our scraper is deployed, let's automate its execution. In the Apify web interface, we'll go to [Schedules](https://console.apify.com/schedules). Let's click **Create new**, review the periodicity (default: daily), and specify the Actor to run. Then we'll click **Enable** - that's it! From now on, the Actor will execute daily. We can inspect each run, view logs, check collected data, [monitor stats and charts](https://docs.apify.com/platform/monitoring), and even set up alerts. @@ -425,7 +425,7 @@ We'll leave it as is and click **Start**. This time, the logs should show `Using ## Congratulations! -We've reached the end of the course—congratulations! Together, we've built a program that: +We've reached the end of the course - congratulations! Together, we've built a program that: - Crawls a shop and extracts product and pricing data. - Exports the results in several formats. @@ -434,4 +434,4 @@ We've reached the end of the course—congratulations! Together, we've built a p - Executes periodically without manual intervention, collecting data over time. - Uses proxies to avoid being blocked. -We hope this serves as a solid foundation for your next scraping project. Perhaps you'll even [start publishing scrapers](https://docs.apify.com/platform/actors/publishing) for others to use—for a fee? +We hope this serves as a solid foundation for your next scraping project. Perhaps you'll even [start publishing scrapers](https://docs.apify.com/platform/actors/publishing) for others to use - for a fee? 
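The API-based data retrieval mentioned in the platform lesson can be sketched in a few lines of Python. This is a sketch only: the dataset ID is a made-up placeholder (each run's detail page shows the real one), and it assumes the documented `GET /v2/datasets/{datasetId}/items` endpoint shape:

```python
# Build the URL for downloading a run's collected items over Apify's HTTP API.
# DATASET_ID is a made-up placeholder; a real ID appears in each run's detail.
DATASET_ID = "abcDEFghiJKLmno12"

def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    # The items endpoint supports several export formats, such as json or csv
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}"

print(dataset_items_url(DATASET_ID))
```

Fetching such a URL (with an `Authorization: Bearer` header for private datasets) returns the scraped products without touching the web interface, which makes the scheduled runs easy to consume from other programs.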
diff --git a/sources/academy/webscraping/scraping_basics_python/index.md b/sources/academy/webscraping/scraping_basics_python/index.md index 5f72a07930..e0380b2a22 100644 --- a/sources/academy/webscraping/scraping_basics_python/index.md +++ b/sources/academy/webscraping/scraping_basics_python/index.md @@ -53,7 +53,7 @@ Scrapers are programs specifically designed to mine data from the internet. Poin ### Why become a scraper dev -As a scraper developer, you are not limited by whether certain data is available programmatically through an official API—the entire web becomes your API! Here are some things you can do if you understand scraping: +As a scraper developer, you are not limited by whether certain data is available programmatically through an official API - the entire web becomes your API! Here are some things you can do if you understand scraping: - Improve your productivity by building personal tools, such as your own real estate or rare sneakers watchdog. - Companies can hire you to build custom scrapers mining data important for their business. diff --git a/sources/api/getting-started.mdx b/sources/api/getting-started.mdx index 78548719d0..413aed4e0e 100644 --- a/sources/api/getting-started.mdx +++ b/sources/api/getting-started.mdx @@ -37,7 +37,7 @@ You must authenticate all API requests presented on this page. You can authentic Authorization: Bearer YOUR_API_TOKEN ``` -You can find your API token in the Apify Console under **[Settings > Integrations](https://console.apify.com/settings/integrations)**. +You can find your API token in Apify Console under **[Settings > Integrations](https://console.apify.com/settings/integrations)**. 
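To illustrate the `Authorization: Bearer` header described above, here's a minimal sketch using only the Python standard library. The token value is a placeholder, and it assumes the `GET /v2/users/me` endpoint for checking the authenticated account:

```python
import urllib.request

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: use the token from Settings > Integrations

# Every authenticated request carries the token in the Authorization header
request = urllib.request.Request(
    "https://api.apify.com/v2/users/me",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
print(request.get_header("Authorization"))  # Bearer YOUR_API_TOKEN

# Actually sending it would return the account's details as JSON:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```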
### Verify your account diff --git a/sources/platform/actors/development/actor_definition/actor_json.md b/sources/platform/actors/development/actor_definition/actor_json.md index ad10ee4dba..af019b016e 100644 --- a/sources/platform/actors/development/actor_definition/actor_json.md +++ b/sources/platform/actors/development/actor_definition/actor_json.md @@ -70,11 +70,11 @@ Actor `name`, `version`, `buildTag`, and `environmentVariables` are currently on | --- | --- | --- | | `actorSpecification` | Required | The version of the Actor specification. This property must be set to `1`, which is the only version available. | | `name` | Required | The name of the Actor. | -| `title` | Optional | The display title of the Actor. This is the human-readable title shown in the Apify Console and Store. If not specified, the `name` property is used as the title. | +| `title` | Optional | The display title of the Actor. This is the human-readable title shown in Apify Console and Apify Store. If not specified, the `name` property is used as the title. | | `version` | Required | The version of the Actor, specified in the format `[Number].[Number]`, e.g., `0.1`, `0.3`, `1.0`, `1.3`, etc. | | `buildTag` | Optional | The tag name to be applied to a successful build of the Actor. If not specified, defaults to `latest`. Refer to the [builds](../builds_and_runs/builds.md) for more information. | | `meta` | Optional | Metadata object containing additional information about the Actor. Currently supports `templateId` field to identify the template from which the Actor was created. | -| `environmentVariables` | Optional | A map of environment variables to be used during local development. These variables will also be applied to the Actor when deployed on the Apify platform. For more details, see the [environment variables](/cli/docs/vars) section of Apify CLI documentation. | +| `environmentVariables` | Optional | A map of environment variables to be used during local development. 
These variables will also be applied to the Actor when deployed on the Apify platform. For more details, see the [environment variables](/cli/docs/vars) section of the Apify CLI documentation. | | `dockerfile` | Optional | The path to the Dockerfile to be used for building the Actor on the platform. If not specified, the system will search for Dockerfiles in the `.actor/Dockerfile` and `Dockerfile` paths, in that order. Refer to the [Dockerfile](./docker.md) section for more information. | | `dockerContextDir` | Optional | The path to the directory to be used as the Docker context when building the Actor. The path is relative to the location of the `actor.json` file. This property is useful for monorepos containing multiple Actors. Refer to the [Actor monorepos](../deployment/source_types.md#actor-monorepos) section for more details. | | `readme` | Optional | The path to the README file to be used on the platform. If not specified, the system will look for README files in the `.actor/README.md` and `README.md` paths, in that order of preference. Check out the [Apify Marketing Playbook](https://apify.notion.site/How-to-create-an-Actor-README-759a1614daa54bee834ee39fe4d98bc2) for guidance on how to write a quality README file. | diff --git a/sources/platform/actors/development/actor_definition/docker.md b/sources/platform/actors/development/actor_definition/docker.md index 7040907e1b..c4fa57ac5b 100644 --- a/sources/platform/actors/development/actor_definition/docker.md +++ b/sources/platform/actors/development/actor_definition/docker.md @@ -200,7 +200,7 @@ This means the system expects the source code to be in `main.js` by default. If :::tip Optimization tips -You can check out various optimization tips for Dockerfile in our [Performance](../performance.md) documentation. +You can check out various optimization tips for Dockerfile in the [Performance](../performance.md) documentation.
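Putting the properties from the `.actor/actor.json` table above together, a minimal file might look like the following (the name, title, and environment variable values are illustrative):

```json
{
    "actorSpecification": 1,
    "name": "my-actor",
    "title": "My Actor",
    "version": "0.1",
    "buildTag": "latest",
    "environmentVariables": {
        "MYSQL_USER": "my_username"
    },
    "dockerfile": "./Dockerfile",
    "readme": "./README.md"
}
```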
::: diff --git a/sources/platform/actors/development/actor_definition/output_schema/index.md b/sources/platform/actors/development/actor_definition/output_schema/index.md index b1e5baf5d2..96694a4d10 100644 --- a/sources/platform/actors/development/actor_definition/output_schema/index.md +++ b/sources/platform/actors/development/actor_definition/output_schema/index.md @@ -172,7 +172,7 @@ Then to specify that output is stored in the default dataset, create `.actor/out To show that the output is stored in the default dataset, the schema defines a property called `results`. -The `title` is a human-readable name for the output, shown in the Apify Console. +The `title` is a human-readable name for the output, shown in Apify Console. The `template` uses a variable `{{links.apiDefaultDatasetUrl}}`, which is replaced with the URL of the default dataset when the Actor run finishes. diff --git a/sources/platform/actors/development/deployment/continuous_integration.md b/sources/platform/actors/development/deployment/continuous_integration.md index 3059ee6713..7bf0995881 100644 --- a/sources/platform/actors/development/deployment/continuous_integration.md +++ b/sources/platform/actors/development/deployment/continuous_integration.md @@ -20,7 +20,7 @@ You can automate Actor builds and tests using your Git repository's automated wo :::tip Using Bitbucket? -Follow our step-by-step guide to set up continuous integration for your Actors with Bitbucket Pipelines: [Read the Bitbucket CI guide](https://help.apify.com/en/articles/6988586-setting-up-continuous-integration-for-apify-actors-on-bitbucket). +Follow the step-by-step guide to set up continuous integration for your Actors with Bitbucket Pipelines: [Read the Bitbucket CI guide](https://help.apify.com/en/articles/6988586-setting-up-continuous-integration-for-apify-actors-on-bitbucket). 
::: diff --git a/sources/platform/actors/development/deployment/index.md b/sources/platform/actors/development/deployment/index.md index daa04ba8a2..d14cf0c5db 100644 --- a/sources/platform/actors/development/deployment/index.md +++ b/sources/platform/actors/development/deployment/index.md @@ -7,11 +7,11 @@ slug: /actors/development/deployment Deploying an Actor involves uploading your [source code](/platform/actors/development/actor-definition) and [building](/platform/actors/development/builds-and-runs/builds) it on the Apify platform. Once deployed, you can run and scale your Actor in the cloud. -## Deploy using Apify CLI +## Deploy using the Apify CLI The fastest way to deploy and build your Actor is by using the [Apify CLI](/cli). If you've completed one of the tutorials from the [academy](/academy), you should already have it installed. If not, follow the [Apify CLI installation instructions](/cli/docs/installation). -To deploy your Actor using Apify CLI: +To deploy your Actor using the Apify CLI: 1. Log in to your Apify account:
-It contains the relative path from `dockerContextDir` to the directory selected as the root of the Actor in the Apify Console (the "directory" part of the Actor's git URL). +It contains the relative path from `dockerContextDir` to the directory selected as the root of the Actor in Apify Console (the "directory" part of the Actor's git URL). For an example, see the [`apify/actor-monorepo-example`](https://github.com/apify/actor-monorepo-example) repository. To build Actors from this monorepo, you would set the source URL (including branch name and folder) as `https://github.com/apify/actor-monorepo-example#main:actors/javascript-actor` and `https://github.com/apify/actor-monorepo-example#main:actors/typescript-actor` respectively. diff --git a/sources/platform/actors/development/permissions/index.md b/sources/platform/actors/development/permissions/index.md index 0b102a8b29..cab86e3f11 100644 --- a/sources/platform/actors/development/permissions/index.md +++ b/sources/platform/actors/development/permissions/index.md @@ -43,7 +43,7 @@ To learn how to migrate your Actors to run under limited permissions, check out ### Configure Actor permissions level -You can set the permission level for your Actor in the Apify Console under its **Settings** tab. New Actors are configured to use limited permissions by default. Older Actors might still use full permissions until you update their configuration. +You can set the permission level for your Actor in Apify Console under its **Settings** tab. New Actors are configured to use limited permissions by default. Older Actors might still use full permissions until you update their configuration. 
![Actor permissions configuration in Actor settings](./images/actor_settings_permissions.webp) diff --git a/sources/platform/actors/development/programming_interface/environment_variables.md b/sources/platform/actors/development/programming_interface/environment_variables.md index b295748a64..ae32115b9f 100644 --- a/sources/platform/actors/development/programming_interface/environment_variables.md +++ b/sources/platform/actors/development/programming_interface/environment_variables.md @@ -35,7 +35,7 @@ Apify sets several system environment variables for each Actor run. These variab Here's a table of key system environment variables: | Environment Variable | Description | -|----------------------|-------------| +| -------------------- | ----------- | | `ACTOR_ID` | ID of the Actor. | | `ACTOR_FULL_NAME` | Full technical name of the Actor, in the format `owner-username/actor-name`. | | `ACTOR_RUN_ID` | ID of the Actor run. | @@ -47,7 +47,7 @@ Here's a table of key system environment variables: | `ACTOR_DEFAULT_DATASET_ID` | Unique identifier for the default dataset associated with the current Actor run. | | `ACTOR_DEFAULT_KEY_VALUE_STORE_ID` | Unique identifier for the default key-value store associated with the current Actor run. | | `ACTOR_DEFAULT_REQUEST_QUEUE_ID` | Unique identifier for the default request queue associated with the current Actor run. | -| `ACTOR_INPUT_KEY` | Key of the record in the default key-value store that holds the [Actor input](/platform/actors/running/input-and-output#input). | +| `ACTOR_INPUT_KEY` | Key of the record in the default key-value store that holds the [Actor input](/platform/actors/running/input-and-output#input). | | `ACTOR_MAX_PAID_DATASET_ITEMS` | For paid-per-result Actors, the user-set limit on returned results. Do not exceed this limit. | | `ACTOR_MAX_TOTAL_CHARGE_USD` | For pay-per-event Actors, the user-set limit on run cost. Do not exceed this limit. 
| | `ACTOR_RESTART_ON_ERROR` | If **1**, the Actor run will be restarted if it fails. | @@ -56,7 +56,7 @@ Here's a table of key system environment variables: | `ACTOR_MEMORY_MBYTES` | Size of memory allocated for the Actor run, in megabytes. Can be used to optimize memory usage or finetuning of low-level external libraries. | | `ACTOR_PERMISSION_LEVEL` | [Permission level](../../running/permissions.md) the Actor is run under (`LIMITED_PERMISSIONS` or `FULL_PERMISSIONS`). This determines what resources in the user’s account the Actor can access. | | `APIFY_PROXY_PASSWORD` | Password for accessing Apify Proxy services. This password enables the Actor to utilize proxy servers on behalf of the user who initiated the Actor run. | -| `APIFY_PROXY_PORT` | TCP port number to be used for connecting to the Apify Proxy. | +| `APIFY_PROXY_PORT` | TCP port number to be used for connecting to Apify Proxy. | | `APIFY_PROXY_STATUS_URL` | URL for retrieving proxy status information. Appending `?format=json` to this URL returns the data in JSON format for programmatic processing. | | `ACTOR_STANDBY_URL` | URL for accessing web servers of Actor runs in the [Actor Standby](/platform/actors/development/programming-interface/standby) mode. | | `ACTOR_STARTED_AT` | Date when the Actor was started. | @@ -109,7 +109,7 @@ Be aware that if you define `environmentVariables` in `.actor/actor.json`, it on Actor owners can define custom environment variables to pass additional configuration to their Actors. To set custom variables: -1. Go to your Actor's **Source** page in the Apify Console +1. Go to your Actor's **Source** page in Apify Console 1. Navigate to the **Environment variables** section. 
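As a quick illustration of how an Actor can read the system variables from the table above, here is a minimal sketch using plain Node.js. The helper name `getChargingLimits` and the `Infinity` fallbacks are illustrative assumptions, not part of the Apify SDK; only the variable names come from the table.

```javascript
// Sketch: reading two system environment variables from the table above with
// plain Node.js. Helper name and Infinity fallbacks are assumptions.
function getChargingLimits(env = process.env) {
    return {
        // User-set cost ceiling for pay-per-event Actors.
        maxTotalChargeUsd: env.ACTOR_MAX_TOTAL_CHARGE_USD !== undefined
            ? Number(env.ACTOR_MAX_TOTAL_CHARGE_USD)
            : Infinity,
        // User-set result limit for pay-per-result Actors.
        maxPaidDatasetItems: env.ACTOR_MAX_PAID_DATASET_ITEMS !== undefined
            ? Number(env.ACTOR_MAX_PAID_DATASET_ITEMS)
            : Infinity,
    };
}
```

In practice the Apify SDKs expose this configuration for you; reading `process.env` directly like this is mainly useful in small scripts or when you are not using the SDK.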
diff --git a/sources/platform/actors/development/programming_interface/index.mdx b/sources/platform/actors/development/programming_interface/index.mdx index 1f6b19c8a5..b51c87ddc2 100644 --- a/sources/platform/actors/development/programming_interface/index.mdx +++ b/sources/platform/actors/development/programming_interface/index.mdx @@ -8,7 +8,7 @@ slug: /actors/development/programming-interface import Card from '@site/src/components/Card'; import CardGrid from '@site/src/components/CardGrid'; -This chapter will guide you through all the commands you need to build your first Actor. This interface is provided by [Apify SDKs](/sdk). The chapter starts with basic commands and guides you through system events and environment variables that are available to your Actor both locally and when running on Apify platform. +This chapter will guide you through all the commands you need to build your first Actor. This interface is provided by [Apify SDKs](/sdk). The chapter starts with basic commands and guides you through system events and environment variables that are available to your Actor both locally and when running on the Apify platform. Sharing is caring but you can also make money from your Actors. Check out our [blog post](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/) for more context. +> Sharing is caring but you can also make money from your Actors. Check out the [blog post](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/) for more context. ## Publish process @@ -36,7 +36,7 @@ Packaging your software as an Actor allows you to launch new SaaS product faster - Pay-per-result for usage-based pricing - Pay-per-event for specific operations -To learn more visit our [Actors in Store](https://docs.apify.com/platform/actors/running/actors-in-store#pricing-models) page. 
+To learn more, visit the [Actors in Store](https://docs.apify.com/platform/actors/running/actors-in-store#pricing-models) page.

## Maintain public Actors

@@ -69,9 +69,9 @@ To find ideas for new Actor, consider the following sources:

- Your own experiences with friends, colleagues, and customers
- SEO tools to identify search terms, websites related to web scraping, web automation, or web integrations (see the [SEO article](https://apify.notion.site/SEO-990259fe88a84fd0a85ce6d3b394d8c1) for more details)
- The [Actor ideas page](https://apify.com/ideas) to find Actors in demand by the Apify community
-- Our [Discord community](https://discord.com/invite/jyEM2PRvMU), especially the [#hire-freelancers](https://discord.com/channels/801163717915574323/1022804760484659210) channel can offer great insights
+- The [Discord community](https://discord.com/invite/jyEM2PRvMU), especially the [#hire-freelancers](https://discord.com/channels/801163717915574323/1022804760484659210) channel, can offer great insights

-Additionally, you can refer to our [blog](https://blog.apify.com/) for examples of how we write about and present Actors, such as the:
+Additionally, you can refer to the [Apify blog](https://blog.apify.com/) for examples of how we write about and present Actors, such as the:

- [Content Checker article](https://blog.apify.com/set-up-alert-when-webpage-changes/)
- [Kickstarter scraper article](https://blog.apify.com/kickstarter-search-actor-create-your-own-kickstarter-api/)

diff --git a/sources/platform/actors/publishing/monetize/index.mdx b/sources/platform/actors/publishing/monetize/index.mdx
index 312f281e0c..3646342778 100644
--- a/sources/platform/actors/publishing/monetize/index.mdx
+++ b/sources/platform/actors/publishing/monetize/index.mdx
@@ -73,14 +73,14 @@ All other changes (such as decreasing prices, adjusting descriptions, or removing

:::important Frequency of major monetization adjustments

-You can make major monetization changes to each Actor only **once 
per month**. After making a major change, you must wait until it takes effect (14 days) plus an additional period before making another major change. For further information & guidelines, please refer to our [Terms & Conditions](/legal/store-publishing-terms-and-conditions) +You can make major monetization changes to each Actor only **once per month**. After making a major change, you must wait until it takes effect (14 days) plus an additional period before making another major change. For further information & guidelines, please refer to the [Terms & Conditions](/legal/store-publishing-terms-and-conditions) ::: ## Monthly payouts and analytics Payout invoices are automatically generated on the 11th of each month, summarizing the profits from all your Actors for the previous month. -In accordance with our [Terms & Conditions](/legal/store-publishing-terms-and-conditions), only funds from legitimate users who have already paid are included in the payout invoice. +In accordance with the [Terms & Conditions](/legal/store-publishing-terms-and-conditions), only funds from legitimate users who have already paid are included in the payout invoice. :::note How negative profits are handled @@ -94,7 +94,7 @@ If no action is taken, the payout will be automatically approved on the 14th, wi - $20 for PayPal - $100 for other payout methods -If the monthly profit does not meet these thresholds, as per our [Terms & Conditions](/legal/store-publishing-terms-and-conditions), the funds will roll over to the next month until the threshold is reached. +If the monthly profit does not meet these thresholds, as per the [Terms & Conditions](/legal/store-publishing-terms-and-conditions), the funds will roll over to the next month until the threshold is reached. 
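The threshold-and-rollover rule above can be sketched in code. This is only an illustrative model of the described behavior, not Apify's actual billing logic; the function name, the `other` bucket, and the return shape are assumptions.

```javascript
// Illustrative model of the payout thresholds described above:
// $20 for PayPal, $100 for other payout methods; below-threshold
// profit rolls over to the next month.
const PAYOUT_THRESHOLDS_USD = { paypal: 20, other: 100 };

function settleMonth(carriedOverUsd, monthProfitUsd, method) {
    const total = carriedOverUsd + monthProfitUsd;
    const threshold = PAYOUT_THRESHOLDS_USD[method];
    return total >= threshold
        ? { payoutUsd: total, carriedOverUsd: 0 }   // threshold met: paid out
        : { payoutUsd: 0, carriedOverUsd: total };  // rolls over to next month
}
```

For example, $15 of PayPal profit in one month rolls over, and another $10 the next month crosses the $20 threshold, so the full $25 is paid out.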
## Handle free users

diff --git a/sources/platform/actors/publishing/monetize/pay_per_event.mdx b/sources/platform/actors/publishing/monetize/pay_per_event.mdx
index e9a21cc3f2..31692874e2 100644
--- a/sources/platform/actors/publishing/monetize/pay_per_event.mdx
+++ b/sources/platform/actors/publishing/monetize/pay_per_event.mdx
@@ -78,7 +78,7 @@ The `eventChargeLimitReached` property checks if the user's limit allows for ano

:::info ACTOR_MAX_TOTAL_CHARGE_USD environment variable

-For pay-per-event Actors, users set a spending limit through the Apify Console. This limit is available in your Actor code as the `ACTOR_MAX_TOTAL_CHARGE_USD` [environment variable](/platform/actors/development/programming-interface/environment-variables), which contains the user's maximum cost.
+For pay-per-event Actors, users set a spending limit through Apify Console. This limit is available in your Actor code as the `ACTOR_MAX_TOTAL_CHARGE_USD` [environment variable](/platform/actors/development/programming-interface/environment-variables), which contains the user's maximum cost.

The Apify SDK's `ChargeResult` respects the user set limit already.
:::

@@ -146,7 +146,7 @@ When using [Crawlee](https://crawlee.dev/), use `crawler.autoscaledPool.abort()`

## Best practices for PPE Actors

-Use our [SDKs](/sdk) (JS and, Python or use [`apify actor charge`](/cli/docs/next/reference#apify-actor-charge-eventname) when using our Apify CLI) to simplify PPE implementation into your Actor. SDKs help you handle pricing, usage tracking, idempotency keys, API errors, and, event charging via an API. You can also choose not to use it, but then you must handle API integration and possible edge cases manually.
+Use the [Apify SDKs](/sdk) (JS or Python) or the [`apify actor charge`](/cli/docs/next/reference#apify-actor-charge-eventname) command when using the Apify CLI to simplify PPE implementation in your Actor. 
SDKs help you handle pricing, usage tracking, idempotency keys, API errors, and event charging via an API. You can also choose not to use them, but then you must handle API integration and possible edge cases manually.

### Use synthetic start event `apify-actor-start`

@@ -162,7 +162,7 @@ One of the options to charge for the time spent on starting the Actor is to char

We want to make it easier for Actor creators to stay competitive, but also help them to be profitable. Therefore, we have the Apify Actor synthetic start event `apify-actor-start`. This event is enabled by default for all new PPE Actors, and when you use it Apify will cover the compute unit cost of the first 5 seconds of every Actor run.

-The default price of the event is set intentionally low. This pricing means that the free 5 seconds of compute we provide costs us more than the revenue generated from the event. We've made this investment to _support our creator community_ by reducing your startup costs while keeping your Actors competitively priced for users.
+The default price of the event is set intentionally low. This pricing means that the free 5 seconds of compute we provide costs us more than the revenue generated from the event. We've made this investment to _support the creator community_ by reducing your startup costs while keeping your Actors competitively priced for users.

#### How the synthetic start event works

diff --git a/sources/platform/actors/publishing/publish.mdx b/sources/platform/actors/publishing/publish.mdx
index 8c0f6ee042..c7697b985f 100644
--- a/sources/platform/actors/publishing/publish.mdx
+++ b/sources/platform/actors/publishing/publish.mdx
@@ -11,7 +11,7 @@ Before making your Actor public, it's important to ensure your Actor has a clear

Once you've finished coding and testing your Actor, it's time to publish it. Follow these steps:

-1. From your Actor's page in the Apify Console, go to **Publication** > **Display information**
+1. 
From your Actor's page in Apify Console, go to **Publication** > **Display information** 2. Fill in all the relevant fields for your Actor (e.g., **Icon**, **Actor name**, **Description**, **Categories**) 3. Save your changes diff --git a/sources/platform/actors/publishing/testing.mdx b/sources/platform/actors/publishing/testing.mdx index 8e0feeb2ef..452377dcd7 100644 --- a/sources/platform/actors/publishing/testing.mdx +++ b/sources/platform/actors/publishing/testing.mdx @@ -34,5 +34,5 @@ If that's the case with your Actor, please contact support at [support@apify.com ## Advanced Actor testing You can easily implement your own tests and customize them to fit your Actor's particularities -by using our public [Actor Testing](https://apify.com/pocesar/actor-testing) tool available in Apify Store. +by using the public [Actor Testing](https://apify.com/pocesar/actor-testing) tool available in Apify Store. For more information, see the [automated testing](../development/automated_tests.md) section. diff --git a/sources/platform/actors/running/index.md b/sources/platform/actors/running/index.md index 7bdf8601fc..9b037a6ae7 100644 --- a/sources/platform/actors/running/index.md +++ b/sources/platform/actors/running/index.md @@ -65,7 +65,7 @@ And that's it! You've run your first Actor! Now you can go back to the **Input** tab and try again with different settings, run other [Apify Actors](https://apify.com/store), or [build your own](./development). -## Run Actors with Apify API +## Run Actors with the Apify API To invoke Actors with the Apify API, send an HTTP POST request to the [Run Actor](/api/v2/act-runs-post) endpoint. 
For example: diff --git a/sources/platform/actors/running/store.md b/sources/platform/actors/running/store.md index a9018ea7f0..1f1c5087e0 100644 --- a/sources/platform/actors/running/store.md +++ b/sources/platform/actors/running/store.md @@ -62,7 +62,7 @@ When you run an Actor that is _paid per result_, you pay for the successful resu :::info Estimation simplified -This makes it transparent and easy to estimate upfront costs. If you have any feedback or would like to ask something, please join our [Discord](https://discord.gg/qkMS6pU4cF) community and let us know! +This makes it transparent and easy to estimate upfront costs. If you have any feedback or would like to ask something, please join the [Discord](https://discord.gg/qkMS6pU4cF) community and let us know! ::: @@ -140,7 +140,7 @@ To check an Actor's pricing and available discounts, visit the Pricing section o ![Apify Store discounts](./images/store/apify_store_discounts_web.png) -In the Apify Console, you can find information about pricing and available discounts in the Actor's header section. +In Apify Console, you can find information about pricing and available discounts in the Actor's header section. ![Apify Store discounts](./images/store/apify_store_discounts_console.png) diff --git a/sources/platform/actors/running/usage_and_resources.md b/sources/platform/actors/running/usage_and_resources.md index 2acef8204e..2aa61396e5 100644 --- a/sources/platform/actors/running/usage_and_resources.md +++ b/sources/platform/actors/running/usage_and_resources.md @@ -64,7 +64,7 @@ If the Actor doesn't have this information, or you want to use your own solution :::tip Estimating usage -Check out our article on [estimating consumption](https://help.apify.com/en/articles/3470975-how-to-estimate-compute-unit-usage-for-your-project) for more details. +Check out the article on [estimating consumption](https://help.apify.com/en/articles/3470975-how-to-estimate-compute-unit-usage-for-your-project) for more details. 
:::

diff --git a/sources/platform/collaboration/general-resource-access.md b/sources/platform/collaboration/general-resource-access.md
index 0419078784..53fc5fea48 100644
--- a/sources/platform/collaboration/general-resource-access.md
+++ b/sources/platform/collaboration/general-resource-access.md
@@ -152,20 +152,20 @@ When you retrieve dataset or key-value store details using:

- `GET https://api.apify.com/v2/datasets/:datasetId`
- `GET https://api.apify.com/v2/key-value-stores/:storeId`

-the API response includes automatically generated fields: 
+the API response includes automatically generated fields:

-- `itemsPublicUrl` – a pre-signed URL providing access to dataset items
-- `keysPublicUrl` – a pre-signed URL providing access to key-value store keys
+- `itemsPublicUrl` - a pre-signed URL providing access to dataset items
+- `keysPublicUrl` - a pre-signed URL providing access to key-value store keys

These automatically generated URLs are _valid for 14 days_.

The response also contains:

-- `consoleUrl` - provides a stable link to the resource's page in the Apify Console. Unlike a direct API link, Console link will prompt unauthenticated users to sign in, ensuring they have required permissions to view the resource.
+- `consoleUrl` - provides a stable link to the resource's page in Apify Console. Unlike a direct API link, the Console link will prompt unauthenticated users to sign in, ensuring they have the required permissions to view the resource.

:::

-You can create pre-signed URLs either through the Apify Console or programmatically via the Apify API client.
+You can create pre-signed URLs either through Apify Console or programmatically via the Apify API client.
#### How to generate pre-signed URLs in Apify Console

@@ -237,7 +237,7 @@ If the `expiresInSecs` option is not specified, the generated link will be _perm

#### Signing URLs manually

-If you need finer control - for example, generating links without using Apify client - you can sign URLs manually using our reference implementation.
+If you need finer control - for example, generating links without using the Apify client - you can sign URLs manually using the reference implementation.

[Check the reference implementation in Apify clients](https://github.com/apify/apify-client-js/blob/5efd68a3bc78c0173a62775f79425fad78f0e6d1/src/resource_clients/dataset.ts#L179)

@@ -257,7 +257,7 @@ This is very useful if you wish to expose a storage publicly with an easy to rem

If you own a public Actor in Apify Store, you need to make sure that your Actor will work even for users who have restricted access to their resources. Over time, you might see a growing number of users with _General resource access_ set to _Restricted_.

-In practice, this means that all API calls originating from the Actor need to have a valid API token. If you are using Apify SDK, this should be the default behavior. See the detailed guide below for more information.
+In practice, this means that all API calls originating from the Actor need to have a valid API token. If you are using the Apify SDK, this should be the default behavior. See the detailed guide below for more information.

:::caution Actor runs inherit user permissions
View the organizations you are in and manage your memberships. +description: Learn to use and manage your organization account using Apify Console or the Apify API. View the organizations you are in and manage your memberships. sidebar_position: 2 slug: /collaboration/organization-account/how-to-use sidebar_label: How to use @@ -10,9 +10,9 @@ Once an account becomes an organization, you can no longer log into it. Instead, While you can't manage an organization account via [API](/api/v2), you can still manage its runs and resources via API like you would with any other account. -**[See our video tutorial](https://www.youtube.com/watch?v=BIL6HqtnvKk) on organization accounts.** +**[See the video tutorial](https://www.youtube.com/watch?v=BIL6HqtnvKk) on organization accounts.** -## In the Apify Console +## In Apify Console You can switch into **Organization account** view using the account button in the top-left corner. diff --git a/sources/platform/collaboration/organization_account/index.md b/sources/platform/collaboration/organization_account/index.md index 5205d7e3a0..64e79323f9 100644 --- a/sources/platform/collaboration/organization_account/index.md +++ b/sources/platform/collaboration/organization_account/index.md @@ -14,11 +14,11 @@ You can set up an organization in two ways. * [Create a new organization](#create-a-new-organization). If you don't have integrations set up yet, or if they are easy to change, you can create a new organization, preserving your personal account. * [Convert an existing account](#convert-an-existing-account) into an organization. If your Actors and [integrations](../../integrations/index.mdx) are set up in a personal account, it is probably best to convert that account into an organization. This will preserve all your integrations but means you will have a new personal account created for you. -> Prefer video to reading? [See our video tutorial](https://www.youtube.com/watch?v=BIL6HqtnvKk) for organization accounts. 
+> Prefer video to reading? [See the video tutorial](https://www.youtube.com/watch?v=BIL6HqtnvKk) for organization accounts. ## Availability and pricing -The organization account is available on all our plans. [Visit our pricing page](https://apify.com/pricing) for more information. +The organization account is available on all plans. [Visit the pricing page](https://apify.com/pricing) for more information. ## Create a new organization diff --git a/sources/platform/collaboration/organization_account/setup.md b/sources/platform/collaboration/organization_account/setup.md index 0ae4c6bd1f..f5807b07d1 100644 --- a/sources/platform/collaboration/organization_account/setup.md +++ b/sources/platform/collaboration/organization_account/setup.md @@ -19,7 +19,7 @@ In the **Account** tab's **Security** section, you can set security requirements - Maximum session lifespan - Two-factor authentication requirement -**[See our video tutorial](https://www.youtube.com/watch?v=BIL6HqtnvKk) on organization accounts.** +**[See the video tutorial](https://www.youtube.com/watch?v=BIL6HqtnvKk) on organization accounts.** ## Add users to your organization diff --git a/sources/platform/console/index.md b/sources/platform/console/index.md index 16cc04f7e0..ce9cac2f35 100644 --- a/sources/platform/console/index.md +++ b/sources/platform/console/index.md @@ -23,7 +23,7 @@ After you click the **Sign up** button, we will send you a verification email. T We are using Google reCAPTCHA to prevent spam accounts. Usually, you will not see it, but if Google evaluates your browser as suspicious, they will ask you to solve a reCAPTCHA before we create your account and send you the verification email. ::: -If you did not receive the email, you can visit the [sign-in page](https://console.apify.com/sign-in). There, you will either proceed to our verification page right away, or you can sign in and will be redirected afterward. 
On the verification page, you can click on the **Resend verification email** button to send the email again. +If you did not receive the email, you can visit the [sign-in page](https://console.apify.com/sign-in). There, you will either proceed to the verification page right away, or you can sign in and will be redirected afterward. On the verification page, you can click on the **Resend verification email** button to send the email again. ![Apify Console email verification page](./images/console-email-verification-page.png) diff --git a/sources/platform/console/settings.md b/sources/platform/console/settings.md index d29fba2151..e99ffe97c6 100644 --- a/sources/platform/console/settings.md +++ b/sources/platform/console/settings.md @@ -30,7 +30,7 @@ In the **Session Information** section, you can adjust the session configuration ## Integrations -The **Integrations** tab provides essential tools for enhancing your interaction with our platform. Here, you can access your **Personal API Tokens**, which are necessary for using our [REST API](https://docs.apify.com/api/v2). This page also facilitates the integration of your Slack workspace and lists your **Actor Integration Accounts**. This section represents any third-party integrations added by you or your team. For detailed guidance on utilizing these integrations, refer to our [Integrations documentation](https://docs.apify.com/platform/integrations). +The **Integrations** tab provides essential tools for enhancing your interaction with the Apify platform. Here, you can access your **Personal API Tokens**, which are necessary for using the [REST API](https://docs.apify.com/api/v2). This page also facilitates the integration of your Slack workspace and lists your **Actor Integration Accounts**. This section represents any third-party integrations added by you or your team. For detailed guidance on utilizing these integrations, refer to the [Integrations documentation](https://docs.apify.com/platform/integrations). 
## Organization diff --git a/sources/platform/console/store.md b/sources/platform/console/store.md index 92fb504671..aaa2547301 100644 --- a/sources/platform/console/store.md +++ b/sources/platform/console/store.md @@ -8,7 +8,7 @@ slug: /console/store ![apify-console-store](./images/console-store.png) -Apify Store is a place where you can explore a variety of Actors, both created and maintained by Apify or our community members. +Apify Store is a place where you can explore a variety of Actors, both created and maintained by Apify or the community. Use the search box at the top of the page to find Actors by service names, such as TikTok, Google, Facebook, or by their authors. Alternatively, you can explore Actors grouped under predefined categories below the search box. You can also organize the results from the store by different criteria, including: @@ -20,4 +20,4 @@ You can also organize the results from the store by different criteria, includin Once you select an Actor from the store, you'll be directed to its specific page. Here, you can configure the settings for your future Actor run, save these configurations for later use, or run the Actor immediately. -For more information on Actors in Apify Store, visit our [Apify Store documentation](/sources/platform/actors/running/store.md). +For more information on Actors in Apify Store, visit the [Apify Store documentation](/sources/platform/actors/running/store.md). diff --git a/sources/platform/console/two-factor-authentication.md b/sources/platform/console/two-factor-authentication.md index 8da93ccd69..8d88532740 100644 --- a/sources/platform/console/two-factor-authentication.md +++ b/sources/platform/console/two-factor-authentication.md @@ -36,7 +36,7 @@ After you scan the QR code or set up your app manually, the app will generate a ![Apify Console setup two-factor authentication - recovery codes](./images/console-two-factor-recovery-setup.png) -In this step, you will see 16 recovery codes. 
If you ever lose access to your authenticated app, you will be able to use these codes to access the Apify Console. We recommend saving these codes in a safe place; ideally, you should store them in a secure password manager or print them out and keep them separate from your device.
+In this step, you will see 16 recovery codes. If you ever lose access to your authenticator app, you will be able to use these codes to access Apify Console. We recommend saving these codes in a safe place; ideally, you should store them in a secure password manager or print them out and keep them separate from your device.

Under the recovery codes, you will find two fields for your recovery information. These two fields are what the support team will ask you to provide in case you lose access to your authenticator app and also to your recovery codes. We will never use the phone number for anything other than to verify your identity and help you regain access to your account, only as a last resort. Ideally, the personal information you provide will be enough to verify your identity. Always provide both the kind of personal information you provide and the actual information.

@@ -54,7 +54,7 @@ When you close the setup process, you should see that your two-factor authentica

## Verification after sign-in

-After you enable two-factor authentication, the next time you attempt to sign in, you'll need to enter a code before you can get into the Apify Console. To do that, open your authenticator app and enter the code for your Apify account into the **Code** field. After you enter the code, click on the **Verify** button, and if the provided code is correct, you will proceed to Apify Console.
+After you enable two-factor authentication, the next time you attempt to sign in, you'll need to enter a code before you can get into Apify Console. To do that, open your authenticator app and enter the code for your Apify account into the **Code** field.
After you enter the code, click on the **Verify** button, and if the provided code is correct, you will proceed to Apify Console. ![Apify Console two-factor authentication form](./images/console-two-factor-authentication.png) @@ -65,7 +65,7 @@ In case you lose access to your authenticator app, you can use the recovery code If the provided recovery code is correct, you will proceed to Apify Console, the same as if you provided the code from the authenticator app. After gaining access to Apify Console, we recommend going to the [Login & Privacy](https://console.apify.com/settings/security) section of your account settings, disabling the two-factor authentication there, and then enabling it again with the new authenticator app. :::info Removal of recovery codes -When you successfully use a recovery code, we remove the code from the original list as it's no longer possible to use it again. If you use all of your recovery codes, you will not be able to sign in to your account with them anymore, and you will need to either use your authenticator app or contact our support to help you regain access to your account. +When you successfully use a recovery code, we remove the code from the original list as it's no longer possible to use it again. If you use all of your recovery codes, you will not be able to sign in to your account with them anymore, and you will need to either use your authenticator app or contact Apify support to help you regain access to your account. ::: ![Apify Console two-factor authentication with recovery code form](./images/console-two-factor-use-recovery-code.png) @@ -80,9 +80,9 @@ After you disable the two-factor authentication you will be able to sign in to y ## What to do when you get locked out -If you lose access to your authenticator app and do not have any recovery codes left, or you lost them as well, you will not be able to sign in to your account. In this case, you will need to contact our support. 
To do that, you can either send us an email to [support@apify.com](mailto:support@apify.com?subject='Locked%20out%20of%20account%20with%202FA%20enabled') or you can go to the [sign-in page](https://console.apify.com/sign-in) and sign in with your email and password. Then, on the two-factor authentication page, click on the **recovery code or begin 2FA account recovery** link. On the two-factor recovery page, click on the **Contact our support** link. This link will open up our online chat, and our support team can help you from there.
+If you lose access to your authenticator app and have no recovery codes left, you will not be able to sign in to your account. In this case, you will need to contact Apify support. To do that, you can either send an email to [support@apify.com](mailto:support@apify.com?subject='Locked%20out%20of%20account%20with%202FA%20enabled') or you can go to the [sign-in page](https://console.apify.com/sign-in) and sign in with your email and password. Then, on the two-factor authentication page, click on the **recovery code or begin 2FA account recovery** link. On the two-factor recovery page, click on the **Contact our support** link. This link will open up the online chat, and the support team can help you from there.

-For our support team to help you recover your account, you will need to provide them with the personal information you have configured during the two-factor authentication setup. If you provide the correct information, the support team will help you regain access to your account.
+For the support team to help you recover your account, you will need to provide them with the personal information you have configured during the two-factor authentication setup. If you provide the correct information, the support team will help you regain access to your account.
:::caution Support verification The support team will not give you any clues about the information you provided; they will only verify if it is correct. diff --git a/sources/platform/index.mdx b/sources/platform/index.mdx index 0f41731a8a..5a6963d92b 100644 --- a/sources/platform/index.mdx +++ b/sources/platform/index.mdx @@ -30,7 +30,7 @@ Learn how to run any Actor in Apify Store or create your own. A step-by-step gui /> diff --git a/sources/platform/integrations/actors/index.md b/sources/platform/integrations/actors/index.md index 1701cd1a08..5d3aeea41f 100644 --- a/sources/platform/integrations/actors/index.md +++ b/sources/platform/integrations/actors/index.md @@ -8,7 +8,7 @@ slug: /integrations/actors :::note Integration Actors -You can check out a catalogue of our Integration Actors within [Apify Store](https://apify.com/store/categories/integrations). +You can check out a catalog of Integration Actors within [Apify Store](https://apify.com/store/categories/integrations). ::: diff --git a/sources/platform/integrations/actors/integration_ready_actors.md b/sources/platform/integrations/actors/integration_ready_actors.md index 84f6a191bf..2709d372f1 100644 --- a/sources/platform/integrations/actors/integration_ready_actors.md +++ b/sources/platform/integrations/actors/integration_ready_actors.md @@ -41,7 +41,7 @@ And in the Actor code, we'd use this to get the values: const { datasetId, connectionString, tableName } = await Actor.getInput(); ``` -To make the integration process smoother, it's possible to define an input that's going to be prefilled when your Actor is being used as an integration. You can do that in the Actor's **Settings** tab, on the **Integrations** form. In our example, we'd use: +To make the integration process smoother, it's possible to define an input that's going to be prefilled when your Actor is being used as an integration. You can do that in the Actor's **Settings** tab, on the **Integrations** form. 
In this example, we'd use: ```json { @@ -87,7 +87,7 @@ In the above example, we're focusing on accessing a run's default dataset, but t To allow other users to use your Actor as an integration, all you need to do is [publish it in Apify Store](/platform/actors/publishing), so users can then integrate it using the **Connect Actor or task** button on the **Integrations** tab of any Actor. While publishing the Actor is enough, there are two ways to make it more visible to users. -For Actors that are generic enough to be used with most other Actors, it's possible to have them listed under **Generic integrations** in the **Integrations** tab. This includes (but is not limited to) Actors that upload datasets to databases, send notifications through various messaging systems, create issues in ticketing systems, etc. To have your Actor listed under our generic integrations, [contact support](mailto:support@apify.com?subject=Actor%20generic%20integration). +For Actors that are generic enough to be used with most other Actors, it's possible to have them listed under **Generic integrations** in the **Integrations** tab. This includes (but is not limited to) Actors that upload datasets to databases, send notifications through various messaging systems, create issues in ticketing systems, etc. To have your Actor listed under the generic integrations, [contact support](mailto:support@apify.com?subject=Actor%20generic%20integration). Some Actors can only be integrated with a few or even just one other Actor. Let's say that you have an Actor that's capable of scraping profiles from a social network. It makes sense to show it for Actors that produce usernames from the social network but not for Actors that produce lists of products. In this case, it's possible to have the Actor listed as **Specific to this Actor** under the Actor's **Integrations** tab. 
To have your Actor listed as specific to another Actor, [contact support](mailto:support@apify.com?subject=Actor%20specific%20integration). diff --git a/sources/platform/integrations/ai/milvus.md b/sources/platform/integrations/ai/milvus.md index 071004339b..759a758936 100644 --- a/sources/platform/integrations/ai/milvus.md +++ b/sources/platform/integrations/ai/milvus.md @@ -35,7 +35,7 @@ Once the cluster is ready, and you have the `URI` and `Token`, you can set up th ### Integration Methods -You can integrate Apify with Milvus using either the Apify Console or the Apify Python SDK. +You can integrate Apify with Milvus using either Apify Console or the Apify Python SDK. :::note Website Content Crawler usage diff --git a/sources/platform/integrations/ai/pinecone.md b/sources/platform/integrations/ai/pinecone.md index 02ce0e3232..52a370de1c 100644 --- a/sources/platform/integrations/ai/pinecone.md +++ b/sources/platform/integrations/ai/pinecone.md @@ -33,7 +33,7 @@ Once the index is created and ready, you can proceed with integrating Apify. ### Integration Methods -You can integrate Apify with Pinecone using either the Apify Console or the Apify Python SDK. +You can integrate Apify with Pinecone using either Apify Console or the Apify Python SDK. :::note Website Content Crawler usage diff --git a/sources/platform/integrations/ai/qdrant.md b/sources/platform/integrations/ai/qdrant.md index cf0d4c48be..6ac6c29f56 100644 --- a/sources/platform/integrations/ai/qdrant.md +++ b/sources/platform/integrations/ai/qdrant.md @@ -33,7 +33,7 @@ With the cluster ready and its URL and API key in hand, you can proceed with int ### Integration Methods -You can integrate Apify with Qdrant using either the Apify Console or the Apify Python SDK. +You can integrate Apify with Qdrant using either Apify Console or the Apify Python SDK.
:::note Website Content Crawler usage diff --git a/sources/platform/integrations/ai/skyfire.md b/sources/platform/integrations/ai/skyfire.md index 13b6cc44d6..e6674a73dd 100644 --- a/sources/platform/integrations/ai/skyfire.md +++ b/sources/platform/integrations/ai/skyfire.md @@ -178,7 +178,7 @@ After your Actor run completes, you can retrieve results using the [dataset endp ### Supported Actors -Not all Actors in the Apify Store can be run using agentic payments. +Not all Actors in Apify Store can be run using agentic payments. Apify maintains a curated list of Actors approved for agentic payments. To check if an Actor supports agentic payments, use the `allowsAgenticUsers=true` query parameter when [searching the store via API](https://docs.apify.com/api/v2#/reference/store/store-actors-collection/get-list-of-actors-in-store). diff --git a/sources/platform/integrations/integrate_with_apify.md b/sources/platform/integrations/integrate_with_apify.md index 6508eef7c7..ede30afa0e 100644 --- a/sources/platform/integrations/integrate_with_apify.md +++ b/sources/platform/integrations/integrate_with_apify.md @@ -6,7 +6,7 @@ sidebar_position: 90.00 slug: /integrations/integrate --- -If you are building a service and your users could benefit from integrating with Apify or vice versa, we would love to hear from you! Contact us at [integrations@apify.com](mailto:integrations@apify.com) to discuss potential collaboration. We are always looking for ways to make our platform more useful and powerful for our users. +If you are building a service and your users could benefit from integrating with Apify or vice versa, we would love to hear from you! Contact us at [integrations@apify.com](mailto:integrations@apify.com) to discuss potential collaboration. We are always looking for ways to make the Apify platform more useful and powerful for users. 
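The Skyfire section above mentions filtering store search results with the `allowsAgenticUsers=true` query parameter. As a minimal sketch (the endpoint path comes from the linked API reference; the helper name, search term, and default limit are illustrative), the filtered search URL can be built like this:

```python
from urllib.parse import urlencode

# Illustrative helper (name and defaults are ours): build an Apify Store
# search URL that keeps only Actors approved for agentic payments, using
# the `allowsAgenticUsers` query parameter described above.
def store_search_url(search: str, limit: int = 10) -> str:
    params = urlencode({
        "search": search,
        "limit": limit,
        "allowsAgenticUsers": "true",
    })
    return f"https://api.apify.com/v2/store?{params}"

url = store_search_url("instagram scraper")
# Fetch `url` with any HTTP client and inspect the returned Actor list.
print(url)
```

Fetching the resulting URL returns the store listing restricted to Actors that support agentic payments.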
## Why integrate with Apify diff --git a/sources/platform/integrations/programming/api.md b/sources/platform/integrations/programming/api.md index 5476609f54..ad194f4539 100644 --- a/sources/platform/integrations/programming/api.md +++ b/sources/platform/integrations/programming/api.md @@ -7,7 +7,7 @@ slug: /integrations/api --- All aspects of the Apify platform can be controlled via a REST API, which is described in detail in the [**API Reference**](/api/v2). -If you want to use the Apify API from JavaScript/Node.js or Python, we strongly recommend to use one of our API clients: +If you want to use the Apify API from JavaScript/Node.js or Python, we strongly recommend using one of the API clients: - [**apify-client**](/api/client/js/) `npm` package for JavaScript, supporting both browser and server - [**apify-client**](/api/client/python/) PyPI package for Python. @@ -40,7 +40,7 @@ API tokens include security features to protect your account and data. You can s ## Rotation -If you suspect that a token has been compromised or accidentally exposed, you can rotate it through the Apify Console. When rotating a token, you have the option to keep the old token active for 24 hours, allowing you to update your applications with the new token before the old one becomes invalid. After the rotation period, the token will be regenerated, and any applications connected to the old token will need to be updated with the new token to continue functioning. +If you suspect that a token has been compromised or accidentally exposed, you can rotate it through Apify Console. When rotating a token, you have the option to keep the old token active for 24 hours, allowing you to update your applications with the new token before the old one becomes invalid. After the rotation period, the token will be regenerated, and any applications connected to the old token will need to be updated with the new token to continue functioning. 
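For readers not using the recommended clients, the authentication scheme those clients wrap is a plain bearer token on the REST API. A stdlib-only sketch (the helper name and the `MY_TOKEN` placeholder are ours):

```python
from urllib.request import Request

def apify_request(path: str, token: str) -> Request:
    # Every endpoint lives under the versioned base URL; the API token
    # is sent in the Authorization header. MY_TOKEN is a placeholder,
    # not a real token.
    return Request(
        f"https://api.apify.com/v2{path}",
        headers={"Authorization": f"Bearer {token}"},
    )

req = apify_request("/acts", "MY_TOKEN")
# urllib.request.urlopen(req) would perform the call with a real token.
print(req.full_url)
```

The official `apify-client` packages handle this header (plus retries and pagination) for you, which is why they are the recommended path.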
![Rotate token in Apify Console](../images/api-token-rotate.png) @@ -52,7 +52,7 @@ For better security awareness, the UI marks tokens identified as compromised, ma When working under an organization account, you will see two types of API tokens on the Integrations page. -![Integrations page in the Apify Console in organization mode](../images/api-token-organization.png) +![Integrations page in Apify Console in organization mode](../images/api-token-organization.png) The Personal API tokens are different from your own Personal API tokens mentioned above. If you use this token in an integration, it will have the same permissions that you have within the organization, and all the operations you use it for will be ascribed to you. @@ -65,7 +65,7 @@ By default, tokens can access all data in your account. If that is not desirable **A scoped token can access only those resources that you'll explicitly allow it to.** :::info Actor modification restrictions -We do not allow scoped tokens to create or modify Actors. If you do need to create or modify Actors through Apify API, use an unscoped token. +We do not allow scoped tokens to create or modify Actors. If you do need to create or modify Actors through the Apify API, use an unscoped token. ::: ### How to create a scoped token diff --git a/sources/platform/integrations/programming/webhooks/actions.md b/sources/platform/integrations/programming/webhooks/actions.md index eb82232829..75a8e98ba4 100644 --- a/sources/platform/integrations/programming/webhooks/actions.md +++ b/sources/platform/integrations/programming/webhooks/actions.md @@ -130,7 +130,7 @@ If the string being interpolated contains only the variable, the actual variable { "text": "My user id is abf6vtB2nvQZ4nJzo" } ``` -To enable string interpolation, use **Interpolate variables in string fields** switch within the Apify Console. In JS API Client it's called `shouldInterpolateStrings`. This field is always `true` when integrating Actors or tasks. 
+To enable string interpolation, use the **Interpolate variables in string fields** switch within Apify Console. In the JS API client, it's called `shouldInterpolateStrings`. This field is always `true` when integrating Actors or tasks. ### Payload template example diff --git a/sources/platform/integrations/workflows-and-notifications/bubble.md b/sources/platform/integrations/workflows-and-notifications/bubble.md index 1716173a02..e85fc568ea 100644 --- a/sources/platform/integrations/workflows-and-notifications/bubble.md +++ b/sources/platform/integrations/workflows-and-notifications/bubble.md @@ -277,7 +277,7 @@ Ensure your API token is correctly set in the action (preferably as `Current Use ### Missing Actors or Tasks -If your Actor or Task doesn't appear in list responses, run it at least once in the Apify Console so it becomes discoverable. +If your Actor or Task doesn't appear in list responses, run it at least once in Apify Console so it becomes discoverable. ### Timeout errors @@ -287,4 +287,4 @@ Bubble workflows have execution time limits. For long‑running Actors, set the Check that your JSON input is valid when providing **Input overrides** and that dynamic expressions resolve to valid JSON values. Verify the structure of the dataset output when displaying it in your app. -If you have any questions or need help, feel free to reach out to us on our [developer community on Discord](https://discord.com/invite/jyEM2PRvMU). +If you have any questions or need help, feel free to reach out on the [Apify developer community on Discord](https://discord.com/invite/jyEM2PRvMU).
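The webhook interpolation rule above (a field containing only a variable yields the raw value; a variable embedded in a longer string is rendered into the text) can be illustrated with a toy model. This is our simplification of the assumed semantics, not Apify's implementation, and `userId` is just an example variable name:

```python
import re

# Toy sketch of the assumed interpolation semantics, not Apify's code:
# a field that is exactly one variable returns the raw value; otherwise
# variables are stringified into the surrounding text.
def interpolate(value: str, variables: dict):
    whole = re.fullmatch(r"\{\{(\w+)\}\}", value)
    if whole:
        return variables[whole.group(1)]
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(variables[m.group(1)]), value)

user = {"userId": "abf6vtB2nvQZ4nJzo"}
print(interpolate("{{userId}}", user))
print(interpolate("My user id is {{userId}}", user))
```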
diff --git a/sources/platform/integrations/workflows-and-notifications/ifttt.md b/sources/platform/integrations/workflows-and-notifications/ifttt.md index 790a9794e7..6fa9751962 100644 --- a/sources/platform/integrations/workflows-and-notifications/ifttt.md +++ b/sources/platform/integrations/workflows-and-notifications/ifttt.md @@ -77,7 +77,7 @@ To use Apify as an action in your Applet: :::note - IFTTT displays up to 50 recent items in a dropdown. If your Actor or task isn't visible, try using it at least once via API or in the Apify Console to make it appear in the list. + IFTTT displays up to 50 recent items in a dropdown. If your Actor or task isn't visible, try using it at least once via API or in Apify Console to make it appear in the list. ::: diff --git a/sources/platform/integrations/workflows-and-notifications/make/ai-crawling.md b/sources/platform/integrations/workflows-and-notifications/make/ai-crawling.md index 7f8727387c..ba07ea07bb 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/ai-crawling.md +++ b/sources/platform/integrations/workflows-and-notifications/make/ai-crawling.md @@ -19,7 +19,7 @@ To use these modules, you need an [Apify account](https://console.apify.com) and ![Sign up page](images/ai-crawling/wcc-signup.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. 
![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/amazon.md b/sources/platform/integrations/workflows-and-notifications/make/amazon.md index 0c18ea04fb..28475015b7 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/amazon.md +++ b/sources/platform/integrations/workflows-and-notifications/make/amazon.md @@ -11,7 +11,7 @@ unlisted: true The Amazon Scraper module from [Apify](https://apify.com) allows you to extract product, search, or category data from Amazon. -To use the module, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token), which you can find in the Apify Console under **Settings > Integrations**. After connecting, you can automate data extraction and incorporate the results into your workflows. +To use the module, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token), which you can find in Apify Console under **Settings > Integrations**. After connecting, you can automate data extraction and incorporate the results into your workflows. ## Connect Apify Scraper for Amazon Data modules to Make @@ -19,7 +19,7 @@ To use the module, you need an [Apify account](https://console.apify.com) and an ![Sign up page](images/amazon/image.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. 
![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/facebook.md b/sources/platform/integrations/workflows-and-notifications/make/facebook.md index cff2abf61a..58ae6116b4 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/facebook.md +++ b/sources/platform/integrations/workflows-and-notifications/make/facebook.md @@ -19,7 +19,7 @@ To use these modules, you need an [Apify account](https://console.apify.com) and ![Sign up page](images/facebook/signup.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. ![Apify Actor rental](images/facebook/actor-rental.png) @@ -27,7 +27,7 @@ To use these modules, you need an [Apify account](https://console.apify.com) and ![Start Actor rental](images/facebook/start-rental.png) -1. Connect your Apify account with Make, you need to get the Apify API token. In the Apify Console, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)**. +1. To connect your Apify account with Make, you need to get the Apify API token. In Apify Console, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)**.
![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/index.md b/sources/platform/integrations/workflows-and-notifications/make/index.md index b4e94b6c21..e4660f64cd 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/index.md +++ b/sources/platform/integrations/workflows-and-notifications/make/index.md @@ -32,7 +32,7 @@ Alternatively, you can choose to connect using Apify API token: ![API token](../../images/apify-token.png) -1. You can find this token in the Apify Console by navigating to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** +1. You can find this token in Apify Console by navigating to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** ![Integrations token](../../images/apify-integrations-token.png) @@ -93,7 +93,7 @@ You’re all set! Once the scenario is started, it will run the Actor synchronou ### Asynchronous run using the trigger module In this example, we will demonstrate how to run an Actor asynchronously and export its output to Google Sheets. -Before starting, decide where you want to initiate the Actor run. You can do this manually via the Apify Console, on a schedule, or from a separate Make.com scenario. +Before starting, decide where you want to initiate the Actor run. You can do this manually via Apify Console, on a schedule, or from a separate Make.com scenario. #### Step 1: Add the Apify "Watch Actor Runs" module @@ -121,7 +121,7 @@ In the "Spreadsheet ID" field, enter the ID of the target Google Sheets file, wh ![make-com-async-3.png](../../images/make-com/make-com-async-3.png) That’s it! Once the Actor run is complete, its data will be exported to the Google Sheets file. -You can initiate the Actor run via the Apify Console, a scheduler, or from another Make.com scenario. 
+You can initiate the Actor run via Apify Console, a scheduler, or from another Make.com scenario. ## Available modules and triggers diff --git a/sources/platform/integrations/workflows-and-notifications/make/instagram.md b/sources/platform/integrations/workflows-and-notifications/make/instagram.md index 9d1c0276c0..7b1b5c51dd 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/instagram.md +++ b/sources/platform/integrations/workflows-and-notifications/make/instagram.md @@ -19,7 +19,7 @@ To use these modules, you need an [Apify account](https://console.apify.com) and ![Sign up page](images/instagram/Apify_Make_Sign_up_page.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. ![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/llm.md b/sources/platform/integrations/workflows-and-notifications/make/llm.md index cfe4ef3f13..9253cc620a 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/llm.md +++ b/sources/platform/integrations/workflows-and-notifications/make/llm.md @@ -11,7 +11,7 @@ toc_max_heading_level: 4 Apify Scraper for LLMs from [Apify](https://apify.com) is a web browsing module for OpenAI Assistants, RAG pipelines, and AI agents. It can query Google Search, scrape the top results, and return page content as Markdown for downstream AI processing. 
-To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the Apify Console under **Settings > Integrations**. After connecting, you can automate content extraction and integrate results into your AI workflows. +To use these modules, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in Apify Console under **Settings > Integrations**. After connecting, you can automate content extraction and integrate results into your AI workflows. ## Connect Apify Scraper for LLMs @@ -19,7 +19,7 @@ To use these modules, you need an [Apify account](https://console.apify.com) and ![Make interface showing API token field and connection name field for Apify integration setup](images/llm/rag-signup.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the token, go to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the token, go to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. ![Apify Console sign-up page with email, Gmail, and GitHub sign-up options](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/maps.md b/sources/platform/integrations/workflows-and-notifications/make/maps.md index af91286f73..9882b1f941 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/maps.md +++ b/sources/platform/integrations/workflows-and-notifications/make/maps.md @@ -22,7 +22,7 @@ For more details, follow the tutorial below. ![Sign up page](images/maps/maps-signup.png) -1. 
To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. ![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/search.md b/sources/platform/integrations/workflows-and-notifications/make/search.md index d192b9b21f..6af241e4f1 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/search.md +++ b/sources/platform/integrations/workflows-and-notifications/make/search.md @@ -11,7 +11,7 @@ unlisted: true The Google search modules from [Apify](https://apify.com) allows you to crawl Google Search Results Pages (SERPs) and extract data from those web pages in structured format such as JSON, XML, CSV, or Excel. -To use the module, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token), which you can find in the Apify Console under **Settings > Integrations**. After connecting, you can automate data extraction and incorporate the results into your workflows. +To use the module, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token), which you can find in Apify Console under **Settings > Integrations**. After connecting, you can automate data extraction and incorporate the results into your workflows. 
## Connect Apify Scraper for Google Search modules to Make @@ -19,7 +19,7 @@ To use the module, you need an [Apify account](https://console.apify.com) and an ![Sign up page](images/search/search-signup.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. ![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/tiktok.md b/sources/platform/integrations/workflows-and-notifications/make/tiktok.md index ef2c426756..2b84ca90e2 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/tiktok.md +++ b/sources/platform/integrations/workflows-and-notifications/make/tiktok.md @@ -19,7 +19,7 @@ To use these modules, you need an [Apify account](https://console.apify.com) and ![Sign up page](images/tiktok/image.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. 
![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/make/youtube.md b/sources/platform/integrations/workflows-and-notifications/make/youtube.md index 7d0daf0750..475d12e7d6 100644 --- a/sources/platform/integrations/workflows-and-notifications/make/youtube.md +++ b/sources/platform/integrations/workflows-and-notifications/make/youtube.md @@ -11,7 +11,7 @@ unlisted: true The YouTube Scraper module from [apify.com](https://apify.com) allows you to extract channel, video, streams, shorts, and search data from YouTube. -To use this module, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token), which you can find in the Apify Console under **Settings > Integrations**. After connecting, you can automate data extraction and incorporate the results into your workflows. +To use this module, you need an [Apify account](https://console.apify.com) and an [API token](https://docs.apify.com/platform/integrations/api#api-token), which you can find in Apify Console under **Settings > Integrations**. After connecting, you can automate data extraction and incorporate the results into your workflows. For more details, follow the tutorial below. @@ -21,7 +21,7 @@ For more details, follow the tutorial below. ![Sign up page](images/youtube/image.png) -1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to Make, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. 
![Apify Console token for Make.png](images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/n8n/website-content-crawler.md b/sources/platform/integrations/workflows-and-notifications/n8n/website-content-crawler.md index e65ed0bbbf..7dd0c75423 100644 --- a/sources/platform/integrations/workflows-and-notifications/n8n/website-content-crawler.md +++ b/sources/platform/integrations/workflows-and-notifications/n8n/website-content-crawler.md @@ -82,7 +82,7 @@ If you're running a self-hosted n8n instance, you can install the Apify communit ![Sign up page](../make/images/ai-crawling/wcc-signup.png) -1. To connect your Apify account to n8n, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console. +1. To connect your Apify account to n8n, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in Apify Console. ![Apify Console token for n8n](../make/images/apify-console-token-for-make.png) diff --git a/sources/platform/integrations/workflows-and-notifications/workato.md b/sources/platform/integrations/workflows-and-notifications/workato.md index f1b9f6267f..edf628df71 100644 --- a/sources/platform/integrations/workflows-and-notifications/workato.md +++ b/sources/platform/integrations/workflows-and-notifications/workato.md @@ -296,4 +296,4 @@ Workato's visual interface makes it easy to connect Apify data with other busine - _Resource not found errors:_ Check that IDs are correct and case-sensitive - _Dataset field mapping issues:_ If you experience incorrect data types or missing fields in the Get Dataset Items action data pill, this may be caused by non-homogeneous data in your dataset. 
The connector samples only the first 25 items to determine field types, so inconsistent data structures can lead to mapping problems. Try to ensure your dataset has consistent field names and data types across all items. -If you have any questions or need help, feel free to reach out to us on our [Discord channel](https://discord.com/invite/jyEM2PRvMU). +If you have any questions or need help, feel free to reach out on the [Apify Discord channel](https://discord.com/invite/jyEM2PRvMU). diff --git a/sources/platform/limits.md b/sources/platform/limits.md index 0c06541da7..bb856dd410 100644 --- a/sources/platform/limits.md +++ b/sources/platform/limits.md @@ -129,7 +129,7 @@ The tables below demonstrate the Apify platform's default resource limits. For A ## Usage limit -The Apify platform also introduces usage limits based on the billing plan to protect users from accidental overspending. To learn more about usage limits, head over to the [Limits](./console/billing.md#limits) section of our docs. +The Apify platform also introduces usage limits based on the billing plan to protect users from accidental overspending. To learn more about usage limits, head over to the [Limits](./console/billing.md#limits) section of the docs. View these limits and adjust your maximum usage limit in [Apify Console](https://console.apify.com/billing#/limits): diff --git a/sources/platform/proxy/datacenter_proxy.md b/sources/platform/proxy/datacenter_proxy.md index f9fd98e304..f1357022b9 100644 --- a/sources/platform/proxy/datacenter_proxy.md +++ b/sources/platform/proxy/datacenter_proxy.md @@ -12,7 +12,7 @@ Datacenter proxies are a cheap, fast and stable way to mask your identity online Datacenter proxies allow you to mask and [rotate](./usage.md#ip-address-rotation) your IP address during web scraping and automation jobs, reducing the possibility of them being [blocked](/academy/anti-scraping/techniques#access-denied). 
For each [HTTP/S request](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods), the proxy takes the list of all available IP addresses and selects the one used the longest time ago for the specific hostname. -You can refer to our [blog post](https://blog.apify.com/datacenter-proxies-when-to-use-them-and-how-to-make-the-most-of-them/) for tips on how to make the most out of datacenter proxies. +You can refer to the [blog post](https://blog.apify.com/datacenter-proxies-when-to-use-them-and-how-to-make-the-most-of-them/) for tips on how to make the most out of datacenter proxies. ## Features @@ -33,7 +33,7 @@ When using Apify's datacenter proxies, you can either select a proxy group, or t Each user has access to a selected number of proxy servers from a shared pool. These servers are spread into groups (called proxy groups). Each group shares a common feature (location, provider, speed, etc.). -For a full list of plans and number of allocated proxy servers for each plan, see our [pricing](https://apify.com/pricing). To get access to more servers, you can upgrade your plan in the [subscription settings](https://console.apify.com/billing/subscription); +For a full list of plans and number of allocated proxy servers for each plan, see the [pricing page](https://apify.com/pricing). To get access to more servers, you can upgrade your plan in the [subscription settings](https://console.apify.com/billing/subscription). ### Dedicated proxy groups @@ -307,7 +307,7 @@ await Actor.exit(); ## Examples using standard libraries and languages -You can find your proxy password on the [Proxy page](https://console.apify.com/proxy) of the Apify Console.
> Instead, you specify proxy settings (e.g. `groups-BUYPROXIES94952`, `session-123`).
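To make the `username` format concrete, here is a minimal sketch of how those comma-separated settings combine into a proxy URL. The group name, session ID, and password below are placeholders, not real credentials:

```python
# Illustrative sketch: composing the Apify Proxy `username` field from
# proxy settings. Group, session, and password values are placeholders.

def build_proxy_url(password: str, groups=None, session=None, country=None) -> str:
    parts = []
    if groups:
        # Multiple groups are joined with "+" after the "groups-" prefix.
        parts.append('groups-' + '+'.join(groups))
    if session:
        parts.append(f'session-{session}')
    if country:
        parts.append(f'country-{country}')
    # With no settings, "auto" lets the proxy select groups automatically.
    username = ','.join(parts) if parts else 'auto'
    return f'http://{username}:{password}@proxy.apify.com:8000'

print(build_proxy_url('<YOUR_PROXY_PASSWORD>', groups=['BUYPROXIES94952'], session='123'))
# -> http://groups-BUYPROXIES94952,session-123:<YOUR_PROXY_PASSWORD>@proxy.apify.com:8000
```

In practice you would pass the resulting URL to your HTTP client or browser automation tool as its proxy setting.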
diff --git a/sources/platform/proxy/index.md index 7f53403bb1..76016450c0 100644 --- a/sources/platform/proxy/index.md +++ b/sources/platform/proxy/index.md @@ -19,7 +19,7 @@ You can view your proxy settings and password on the [Proxy](https://console.api ## Quickstart -Usage of Apify Proxy means just a couple of lines of code, thanks to our [SDKs](/sdk): +Using Apify Proxy takes just a couple of lines of code, thanks to the [Apify SDKs](/sdk): diff --git a/sources/platform/proxy/residential_proxy.md index 174b69bf23..0f3d7d2c88 100644 --- a/sources/platform/proxy/residential_proxy.md +++ b/sources/platform/proxy/residential_proxy.md @@ -14,7 +14,7 @@ This solution allows you to access a larger pool of servers than datacenter prox Residential proxies support [IP address rotation](./usage.md#ip-address-rotation) and [sessions](#session-persistence). -**Pricing is based on data traffic**. It is measured for each connection made and displayed on your [proxy usage dashboard](https://console.apify.com/proxy/usage) in the Apify Console. +**Pricing is based on data traffic**. It is measured for each connection made and displayed on your [proxy usage dashboard](https://console.apify.com/proxy/usage) in Apify Console. ## Connect to residential proxy diff --git a/sources/platform/proxy/usage.md index 08c4aa2730..491dbb57b6 100644 --- a/sources/platform/proxy/usage.md +++ b/sources/platform/proxy/usage.md @@ -22,7 +22,7 @@ All usage of Apify Proxy with your password is charged towards your account. Do ### External connection If you want to connect to Apify Proxy from outside of the Apify platform, you need to have a paid Apify plan (to prevent abuse). -If you need to test Apify Proxy before you subscribe, please [contact our support](https://apify.com/contact).
+If you need to test Apify Proxy before you subscribe, please [contact Apify support](https://apify.com/contact). | Parameter | Value / explanation | | :--- | :--- | @@ -116,7 +116,7 @@ If you want to specify one parameter and not the others, just provide that param ## Code examples -We have code examples for connecting to our proxy using the [Apify SDK](/sdk) and [Crawlee](https://crawlee.dev/) and other libraries, as well as examples in PHP. +There are code examples for connecting to Apify Proxy using the [Apify SDK](/sdk) and [Crawlee](https://crawlee.dev/) and other libraries, as well as examples in PHP. * [Datacenter proxy](./datacenter_proxy.md#examples) * [Residential proxy](./residential_proxy.md#connecting-to-residential-proxy) @@ -137,7 +137,7 @@ Depending on whether you use a [browser](https://apify.com/apify/web-scraper) or * Browser - a different IP address is used for each browser. * HTTP request - a different IP address is used for each request. -Use [sessions](#sessions) to control how you rotate IP addresses. See our guide [Anti-scraping techniques](/academy/anti-scraping/techniques) to learn more about IP address rotation and our findings on how blocking works. +Use [sessions](#sessions) to control how you rotate IP addresses. See the guide [Anti-scraping techniques](/academy/anti-scraping/techniques) to learn more about IP address rotation and Apify's findings on how blocking works. ## Sessions @@ -152,7 +152,7 @@ residential_proxy.md#session-persistence) proxies. For datacenter proxies, a ses ## Proxy groups -You can see which proxy groups you have access to on the [Proxy page](https://console.apify.com/proxy/groups) in the Apify Console. To use a specific proxy group (or multiple groups), specify it in the `username` parameter. +You can see which proxy groups you have access to on the [Proxy page](https://console.apify.com/proxy/groups) in Apify Console. To use a specific proxy group (or multiple groups), specify it in the `username` parameter. 
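The difference between plain rotation and sessions described above can be modeled with a toy sketch. This is an illustration only, with made-up IP addresses; it is not Apify's actual selection algorithm:

```python
import itertools

# Toy model of IP rotation vs. sessions (made-up IPs; not the real algorithm).
class ProxyPool:
    def __init__(self, addresses):
        self._rotation = itertools.cycle(addresses)  # round-robin for sessionless requests
        self._sessions = {}                          # session ID -> sticky IP

    def ip_for(self, session=None):
        if session is None:
            return next(self._rotation)              # a different IP for each request
        if session not in self._sessions:
            self._sessions[session] = next(self._rotation)
        return self._sessions[session]               # same IP while the session lives

pool = ProxyPool(['10.0.0.1', '10.0.0.2', '10.0.0.3'])
print(pool.ip_for())             # rotates on every call
print(pool.ip_for(session='a'))  # sticky: the same IP is returned
print(pool.ip_for(session='a'))  # for repeated use of session 'a'
```

On the real platform, the sticky mapping is achieved simply by reusing the same `session-…` value in the proxy `username`.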
## Proxy IP addresses @@ -173,7 +173,7 @@ https://api.apify.com/v2/browser-info/ ### A different approach to `502 Bad Gateway` -Sometimes when the `502` status code is not comprehensive enough. Therefore, we have modified our server with `590-599` codes instead to provide more insight: +Sometimes the `502` status code is not comprehensive enough. Therefore, Apify Proxy uses `590-599` codes instead to provide more insight: * `590 Non Successful`: upstream responded with non-200 status code. * `591 RESERVED`: *this status code is reserved for further use.* @@ -194,5 +194,5 @@ The typical issues behind these codes are: * `597` indicates incorrect upstream credentials. * `599` is a generic error, where the above is not applicable. - Note that the Apify Proxy is based on the [proxy-chain](https://github.com/apify/proxy-chain) open-source `npm` package developed and maintained by Apify. + Note that Apify Proxy is based on the [proxy-chain](https://github.com/apify/proxy-chain) open-source `npm` package developed and maintained by Apify. You can find the details of the above errors and their implementation there. diff --git a/sources/platform/proxy/your_own_proxies.md b/sources/platform/proxy/your_own_proxies.md index 4bcc840e12..c9f54f6411 100644 --- a/sources/platform/proxy/your_own_proxies.md +++ b/sources/platform/proxy/your_own_proxies.md @@ -5,7 +5,7 @@ sidebar_position: 10.5 slug: /proxy/using-your-own-proxies --- -In addition to our proxies, you can use your own both in Apify Console and SDK. +In addition to Apify Proxy, you can use your own proxies both in Apify Console and the SDK. 
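The `590`–`599` status codes listed in the proxy troubleshooting section above can be handled with a small lookup. Only the codes quoted in that section are included here; treat the wording as illustrative rather than an exhaustive mapping:

```python
# Sketch: interpreting Apify Proxy's custom 59x status codes. Only the codes
# documented in the section above are included; this is illustrative only.
PROXY_ERRORS = {
    590: 'Non Successful: upstream responded with a non-200 status code',
    591: 'RESERVED: this status code is reserved for further use',
    597: 'Incorrect upstream credentials',
    599: 'Generic error: none of the above applies',
}

def describe_proxy_status(code: int) -> str:
    if 590 <= code <= 599:
        return PROXY_ERRORS.get(code, 'Unrecognized 59x proxy error')
    return 'Not an Apify Proxy error code'

print(describe_proxy_status(597))
# -> Incorrect upstream credentials
```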
## Custom proxies in console diff --git a/sources/platform/schedules.md b/sources/platform/schedules.md index 09240118c4..c8f2ee6366 100644 --- a/sources/platform/schedules.md +++ b/sources/platform/schedules.md @@ -106,7 +106,7 @@ If the request is successful, you will receive a `201` [HTTP response code](http You can add multiple Actor and task runs to a schedule with a single `POST` request. Simply add another object with the run's details to the **actions** array in your `POST` request's payload object. -For more information, refer to the [schedules](/api/v2/schedule-get) section in our API documentation. +For more information, refer to the [schedules](/api/v2/schedule-get) section in the API documentation. ## Schedule setup diff --git a/sources/platform/storage/dataset.md b/sources/platform/storage/dataset.md index 5bebec9207..763bd75866 100644 --- a/sources/platform/storage/dataset.md +++ b/sources/platform/storage/dataset.md @@ -93,7 +93,7 @@ To add data to a dataset, issue a POST request to the [Put items](/api/v2/datase https://api.apify.com/v2/datasets/{DATASET_ID}/items ``` -> API data push to a dataset is capped at _400 requests per second_ to avoid overloading our servers. +> API data push to a dataset is capped at _400 requests per second_ to avoid overloading the servers. Example payload: @@ -276,7 +276,7 @@ async def main(): hotel_and_cafe_data = await dataset.get_data(fields=['hotel', 'cafe']) ``` -For more information, visit our [Python SDK documentation](/sdk/python/docs/concepts/storages#working-with-datasets) and the `Dataset` class's [API reference](/sdk/python/reference/class/Dataset) for details on managing datasets with the Python SDK. +For more information, visit the [Python SDK documentation](/sdk/python/docs/concepts/storages#working-with-datasets) and the `Dataset` class's [API reference](/sdk/python/reference/class/Dataset) for details on managing datasets with the Python SDK. 
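Because data push is capped at 400 requests per second, it helps to send many items per POST rather than one request per item. The sketch below only builds the batches and the endpoint URL; the dataset ID is a placeholder and the actual HTTP call is omitted:

```python
# Sketch: batching items for the dataset Put items endpoint. The dataset ID
# is a placeholder; sending the requests is left out (no network calls here).
API_BASE = 'https://api.apify.com/v2'

def chunk_items(items, batch_size=500):
    """Group items into batches so each POST carries many items at once,
    keeping the request count well under the 400 requests/second cap."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def put_items_url(dataset_id: str) -> str:
    return f'{API_BASE}/datasets/{dataset_id}/items'

batches = chunk_items([{'n': i} for i in range(1200)], batch_size=500)
print(len(batches), put_items_url('<DATASET_ID>'))
# -> 3 https://api.apify.com/v2/datasets/<DATASET_ID>/items
```

Each batch would then be sent as the JSON body of one POST request to that URL.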
## Hidden fields diff --git a/sources/platform/storage/index.md b/sources/platform/storage/index.md index ea660455fe..6244cf314b 100644 --- a/sources/platform/storage/index.md +++ b/sources/platform/storage/index.md @@ -10,7 +10,7 @@ import Card from "@site/src/components/Card"; import CardGrid from "@site/src/components/CardGrid"; import StoragePricingCalculator from "@site/src/components/StoragePricingCalculator"; -The Apify platform provides three types of storage accessible both within our [Apify Console](https://console.apify.com/storage) and externally through our [REST API](/api/v2) [Apify API Clients](/api) or [SDKs](/sdk). +The Apify platform provides three types of storage accessible both within [Apify Console](https://console.apify.com/storage) and externally through the [REST API](/api/v2), [Apify API Clients](/api), and [SDKs](/sdk). Named key-value stores are retained indefinitely.
> Unnamed key-value stores expire after 7 days unless otherwise specified.
> [Learn more](/platform/storage/usage#named-and-unnamed-storages) @@ -261,9 +261,9 @@ Previously, when using the [Store record](/api/v2/key-value-store-record-put) endpoint, every record was automatically compressed with Gzip before being uploaded. However, this process has been updated. _Now, records are stored exactly as you upload them._ This change means that it is up to you whether the record is stored compressed or uncompressed. -You can compress a record and use the [Content-Encoding request header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding) to let our platform know which compression it uses. We recommend compressing large key-value records to save storage space and network traffic. +You can compress a record and use the [Content-Encoding request header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding) to let the platform know which compression it uses. We recommend compressing large key-value records to save storage space and network traffic. -_Using the [JavaScript SDK](/sdk/js/reference/class/KeyValueStore#setValue) or our [JavaScript API client](/api/client/js/reference/class/KeyValueStoreClient#setRecord) automatically compresses your files._ We advise utilizing the JavaScript API client for data compression prior to server upload and decompression upon retrieval, minimizing storage costs. +_Using the [JavaScript SDK](/sdk/js/reference/class/KeyValueStore#setValue) or the [JavaScript API client](/api/client/js/reference/class/KeyValueStoreClient#setRecord) automatically compresses your files._ We recommend using the JavaScript API client to compress data before upload and decompress it after retrieval, minimizing storage costs.
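As a sketch of the compress-before-upload approach, the record below is gzipped and the compression is declared via the `Content-Encoding` header. The record content is a placeholder, and the actual HTTP call (for example via the Apify clients) is omitted:

```python
import gzip
import json

# Sketch: compressing a key-value store record before upload and declaring
# the compression via Content-Encoding. The record content is a placeholder.
record = {'title': 'Example', 'items': list(range(100))}
body = gzip.compress(json.dumps(record).encode('utf-8'))

headers = {
    'Content-Type': 'application/json; charset=utf-8',
    'Content-Encoding': 'gzip',  # tells the platform the record is gzipped
}

# Records are stored exactly as uploaded, so the reader must decompress
# the record on retrieval:
restored = json.loads(gzip.decompress(body).decode('utf-8'))
print(restored == record)
# -> True
```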
## Share diff --git a/sources/platform/storage/usage.md b/sources/platform/storage/usage.md index a06d3a8741..475274e8c4 100644 --- a/sources/platform/storage/usage.md +++ b/sources/platform/storage/usage.md @@ -130,7 +130,7 @@ Apify securely stores your ten most recent runs indefinitely, ensuring your reco ### Preserve your storages -To ensure indefinite retention of your storages, assign them a name. This can be done via Apify Console or through our API. First, you'll need your store's ID. You can find it in the details of the run that created it. In Apify Console, head over to your run's details and select the **Dataset**, **Key-value store**, or **Request queue** tab as appropriate. Check that store's details, and you will find its ID among them. +To ensure indefinite retention of your storages, assign them a name. This can be done via Apify Console or through the API. First, you'll need your store's ID. You can find it in the details of the run that created it. In Apify Console, head over to your run's details and select the **Dataset**, **Key-value store**, or **Request queue** tab as appropriate. Check that store's details, and you will find its ID among them. ![Finding your store's ID](./images/find-store-id.png)
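Naming a storage through the API amounts to an update call on that storage with its ID. The sketch below only builds the shape of such a request for a dataset; the store ID is a placeholder, and you should verify the exact endpoint and fields in the API reference before relying on it:

```python
import json

# Sketch: the shape of an API call that names (and thus preserves) a dataset.
# The store ID is a placeholder; confirm the route in the API reference.
def rename_request(store_id: str, name: str) -> dict:
    return {
        'method': 'PUT',
        'url': f'https://api.apify.com/v2/datasets/{store_id}',
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'name': name}),
    }

req = rename_request('<STORE_ID>', 'my-scraped-products')
print(req['method'], req['url'])
```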