Fix cuenca mediterraneo #64

@brauliodiez

It's failing now; check what's going on.

It seems you need to grab the session (cookie) from the previous page. Some things to add:

integrations/scraping-cuenca-mediterranea/src/api/cuenca.api.ts

First, add some headers to simulate that we are making the request from a web browser:

import axios from 'axios';
import https from 'https';

const httpsAgent = new https.Agent({
  rejectUnauthorized: false,
});

const browserHeaders = {
  'User-Agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
  Accept:
    'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
  'Accept-Language': 'es-ES,es;q=0.9,en;q=0.8',
};

Then, visit the base SAIH page first to establish a session, and only after that fetch the target URL (this could be done in a cleaner way, e.g. by adding an additional param to the function).

export async function getCuencaPageHTMLContent(url: string): Promise<string> {
+  const baseUrl = 'https://www.redhidrosurmedioambiente.es/saih/';

+  // First request: visit the base page to get session cookies
+  const sessionResponse = await axios.get(baseUrl, {
+    httpsAgent,
+    maxRedirects: 5,
+    headers: browserHeaders,
+  });

+  // Extract cookies from the response
+  const setCookieHeaders = sessionResponse.headers['set-cookie'];
+  const cookieString = setCookieHeaders
+    ? setCookieHeaders.map((c: string) => c.split(';')[0]).join('; ')
+    : '';

+  // Second request: fetch the target page with session cookies
+  const { data: html } = await axios.get(url, {
+    httpsAgent,
+    maxRedirects: 5,
+    headers: {
+      ...browserHeaders,
+      ...(cookieString ? { Cookie: cookieString } : {}),
+      Referer: baseUrl,
+    },
+  });

  return html;
}
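
As a rough sketch of the "cleaner way" mentioned above, the session page could be passed in as an optional parameter and the cookie handling pulled out into a small helper. The names `toCookieHeader`, `getPageWithSession`, `sessionUrl`, `HttpResponse` and `HttpGet` are hypothetical and not part of the repo; `HttpGet` is only a stand-in for the HTTP client, so the sketch runs without network access:

```typescript
// Hypothetical stand-in for an HTTP client call such as axios.get:
// it resolves with the response body and any Set-Cookie header values.
type HttpResponse = { data: string; setCookie?: string[] };
type HttpGet = (
  url: string,
  headers: Record<string, string>
) => Promise<HttpResponse>;

// Keep only the "name=value" part of each Set-Cookie header and join
// the pairs into a single Cookie request header value.
function toCookieHeader(setCookie?: string[]): string {
  return (setCookie ?? []).map((c) => c.split(';')[0].trim()).join('; ');
}

// sessionUrl is the assumed extra parameter: when it is provided, that page
// is requested first and its cookies are replayed on the request to url.
async function getPageWithSession(
  httpGet: HttpGet,
  url: string,
  sessionUrl?: string,
  baseHeaders: Record<string, string> = {}
): Promise<string> {
  const headers: Record<string, string> = { ...baseHeaders };

  if (sessionUrl) {
    // First request: visit the session page to collect cookies
    const session = await httpGet(sessionUrl, baseHeaders);
    const cookie = toCookieHeader(session.setCookie);
    if (cookie) headers.Cookie = cookie;
    headers.Referer = sessionUrl;
  }

  // Second request: fetch the target page, with cookies if we got any
  const { data } = await httpGet(url, headers);
  return data;
}
```

In `cuenca.api.ts`, the `httpGet` argument would presumably be a thin wrapper around `axios.get` that passes `httpsAgent` and reads `response.headers['set-cookie']`, as in the snippet above. Callers that don't need a session can simply omit `sessionUrl`.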
