Closed
Description
The scraper is now failing; check what is going on.
It seems you need to grab the session (cookie) from a previous page, so something has to be added.
integrations/scraping-cuenca-mediterranea/src/api/cuenca.api.ts
First, add some headers to simulate that the request is being made from a web browser:
import axios from 'axios';
import https from 'https';
// Note: rejectUnauthorized: false disables TLS certificate verification
const httpsAgent = new https.Agent({
  rejectUnauthorized: false,
});
// Headers that mimic a desktop Chrome browser
const browserHeaders = {
  'User-Agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
  Accept:
    'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
  'Accept-Language': 'es-ES,es;q=0.9,en;q=0.8',
};
Then visit the base SAIH page first to establish a session, and only afterwards fetch the target URL (this could be done in a cleaner way, e.g. by adding an additional parameter to the function).
 export async function getCuencaPageHTMLContent(url: string): Promise<string> {
+  const baseUrl = 'https://www.redhidrosurmedioambiente.es/saih/';
+  // First request: visit the base page to get session cookies
+  const sessionResponse = await axios.get(baseUrl, {
+    httpsAgent,
+    maxRedirects: 5,
+    headers: browserHeaders,
+  });
+  // Extract cookies from the response
+  const setCookieHeaders = sessionResponse.headers['set-cookie'];
+  const cookieString = setCookieHeaders
+    ? setCookieHeaders.map((c: string) => c.split(';')[0]).join('; ')
+    : '';
+  // Second request: fetch the target page with session cookies
+  const { data: html } = await axios.get(url, {
+    httpsAgent,
+    maxRedirects: 5,
+    headers: {
+      ...browserHeaders,
+      ...(cookieString ? { Cookie: cookieString } : {}),
+      Referer: baseUrl,
+    },
+  });
   return html;
 }
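One possible way to keep the function above easier to read and test is to pull the cookie handling into small pure helpers. The sketch below is only an illustration: `buildCookieHeader` and `buildRequestHeaders` are hypothetical names that do not exist in the repository, and the cookie names in the usage note are made up. It mirrors the `c.split(';')[0]` logic from the diff, which keeps only the `name=value` part of each `Set-Cookie` value and drops attributes such as `Path` or `HttpOnly`.

```typescript
// Hypothetical helper (not part of the repository): converts the raw
// Set-Cookie response header values into a single Cookie request header
// value, keeping only the name=value pair of each cookie.
export function buildCookieHeader(
  setCookie: string[] | string | undefined,
): string {
  if (!setCookie) return '';
  const values = Array.isArray(setCookie) ? setCookie : [setCookie];
  return values
    .map((c) => (c.split(';')[0] ?? '').trim())
    .filter((pair) => pair.length > 0)
    .join('; ');
}

// Hypothetical helper: merges the base browser headers with an optional
// Cookie and Referer, omitting either one when it is empty.
export function buildRequestHeaders(
  base: Record<string, string>,
  cookie: string,
  referer?: string,
): Record<string, string> {
  return {
    ...base,
    ...(cookie ? { Cookie: cookie } : {}),
    ...(referer ? { Referer: referer } : {}),
  };
}
```

For instance, `buildCookieHeader(['JSESSIONID=abc123; Path=/saih; HttpOnly', 'lang=es; Path=/'])` returns `'JSESSIONID=abc123; lang=es'`. Inside `getCuencaPageHTMLContent`, the second request could then pass `buildRequestHeaders(browserHeaders, cookieString, baseUrl)` as its `headers`, assuming the base URL is also made a parameter as suggested above.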