Merged
41 changes: 31 additions & 10 deletions README.md
@@ -54,16 +54,17 @@ Check the return for a ```success``` flag. If success is set to true, then the u

## Options

-| Name | Info | Default Value | Required |
-|----------------------|----------------------------------------------------------------------------|---------------|----------|
-| url | URL of the site. | | x |
-| html | You can pass in an HTML string to run ogs on it. (use without options.url) | | |
-| fetchOptions | Options that are used by the Fetch API | {} | |
-| timeout | Request timeout for Fetch (Default is 10 seconds) | 10 | |
-| blacklist | Pass in an array of sites you don't want ogs to run on. | [] | |
-| onlyGetOpenGraphInfo | Only fetch open graph info and don't fall back on anything else. Also accepts an array of properties for which no fallback should be used | false | |
-| customMetaTags | Here you can define custom meta tags you want to scrape. | [] | |
-| urlValidatorSettings | Sets the options used by validator.js for testing the URL | [Here](https://github.com/jshemas/openGraphScraper/blob/master/lib/utils.ts#L4-L17) | |
+| Name | Info | Default Value | Required |
+|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|----------|
+| url | URL of the site. | | x |
+| html | You can pass in an HTML string to run ogs on it. (use without options.url) | | |
+| fetchOptions | Options that are used by the Fetch API | {} | |
+| timeout | Request timeout for Fetch (Default is 10 seconds) | 10 | |
+| blacklist | Pass in an array of sites you don't want ogs to run on. | [] | |
+| onlyGetOpenGraphInfo | Only fetch open graph info and don't fall back on anything else. Also accepts an array of properties for which no fallback should be used | false | |
+| customMetaTags | Here you can define custom meta tags you want to scrape. | [] | |
+| urlValidatorSettings | Sets the options used by validator.js for testing the URL | [Here](https://github.com/jshemas/openGraphScraper/blob/master/lib/utils.ts#L4-L17) | |
+| jsonLDOptions | Sets the options used when parsing JSON-LD data | | |

Note: `open-graph-scraper` uses the [Fetch API](https://nodejs.org/dist/latest-v18.x/docs/api/globals.html#fetch) for requests and most of [Fetch's options](https://developer.mozilla.org/en-US/docs/Web/API/fetch#options) should work as `open-graph-scraper`'s `fetchOptions` options.

@@ -159,6 +160,26 @@ ogs({ url: 'https://www.wikipedia.org/', fetchOptions: { headers: { 'user-agent'
})
```

## JSON-LD Parsing Options Example

The `throwOnJSONParseError` and `logOnJSONParseError` properties of `jsonLDOptions` control what happens if `JSON.parse`
throws an error while parsing JSON-LD data.
If `throwOnJSONParseError` is set to `true`, the error is rethrown.
If `logOnJSONParseError` is set to `true`, the error is logged to the console.
If neither is set, the JSON-LD block that failed to parse is skipped silently.

```javascript
const ogs = require("open-graph-scraper");
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36';
ogs({ url: 'https://www.wikipedia.org/', jsonLDOptions: { throwOnJSONParseError: true } })
  .then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error); // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains the response from the Fetch API
  })
```
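
The behavior described above can be sketched without the library. The following is a minimal, self-contained illustration that mirrors the `try`/`catch` logic this change adds to `lib/extract.ts`. The helper name `parseJsonLdBlocks` and the sample strings are hypothetical and are not part of `open-graph-scraper`'s API; the real implementation reads the script text from the parsed HTML.

```javascript
// Hypothetical helper (not part of open-graph-scraper) that mirrors the
// try/catch behavior controlled by jsonLDOptions.
function parseJsonLdBlocks(scriptTexts, jsonLDOptions = {}) {
  const parsed = [];
  for (const text of scriptTexts) {
    try {
      parsed.push(JSON.parse(text));
    } catch (error) {
      // Optionally log the parse failure.
      if (jsonLDOptions.logOnJSONParseError) {
        console.error('Error parsing JSON-LD script tag:', error);
      }
      // Optionally rethrow so the caller sees the failure.
      if (jsonLDOptions.throwOnJSONParseError) {
        throw error;
      }
      // With neither option set, the invalid block is skipped.
    }
  }
  return parsed;
}

// Assumed sample inputs: one valid block, one with missing commas.
const valid = '{"@type": "Organization", "name": "Blah"}';
const invalid = '{"sameAs": ["a" "b"]}';

// Default options: the invalid block is dropped, the valid one is kept.
const lenient = parseJsonLdBlocks([valid, invalid]);

// throwOnJSONParseError: the SyntaxError from JSON.parse propagates.
let threw = false;
try {
  parseJsonLdBlocks([valid, invalid], { throwOnJSONParseError: true });
} catch (error) {
  threw = error instanceof SyntaxError;
}
```

Both options can be combined, in which case the error is logged first and then rethrown.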

## Running the example app

The `example` folder contains a simple Express app that you can start by running `npm ci && npm run start`. Once the app is running, open a web browser and go to `http://localhost:3000/scraper?url=http://ogp.me/` to test it out. There is also a `Dockerfile` if you want to run the example app in a Docker container.
11 changes: 10 additions & 1 deletion lib/extract.ts
@@ -24,7 +24,7 @@
$('meta').each((index, meta) => {
if (!meta.attribs || (!meta.attribs.property && !meta.attribs.name)) return;
const property = meta.attribs.property || meta.attribs.name;
const content: any = meta.attribs.content || meta.attribs.value;

metaFields.forEach((item) => {
if (item && property.toLowerCase() === item.property.toLowerCase()) {
// check if fieldName is one of mediaMapperProperties
@@ -57,10 +57,10 @@
if (!ogObject[item.fieldName]) {
ogObject[item.fieldName] = [content];
} else {
ogObject[item.fieldName]?.push(content);

}
} else {
ogObject[item.fieldName] = content;

}
}
});
@@ -70,7 +70,7 @@
if (!ogObject.customMetaTags) ogObject.customMetaTags = {};
if (item && property.toLowerCase() === item.property.toLowerCase()) {
if (!item.multiple) {
ogObject.customMetaTags[item.fieldName] = content;

} else if (!ogObject.customMetaTags[item.fieldName]) {
ogObject.customMetaTags[item.fieldName] = [content];
} else if (Array.isArray(ogObject.customMetaTags[item.fieldName])) {
@@ -99,7 +99,16 @@
if (scriptText) {
scriptText = scriptText.replace(/(\r\n|\n|\r)/gm, ''); // remove newlines
scriptText = unescapeScriptText(scriptText);
-ogObject.jsonLD.push(JSON.parse(scriptText));
+try {
+  ogObject.jsonLD.push(JSON.parse(scriptText));
+} catch (error: unknown) {
+  if (options.jsonLDOptions?.logOnJSONParseError) {
+    console.error('Error parsing JSON-LD script tag:', error);
+  }
+  if (options.jsonLDOptions?.throwOnJSONParseError) {
+    throw error;
+  }
+}
}
}
});
9 changes: 9 additions & 0 deletions lib/types.ts
@@ -38,6 +38,7 @@ export interface OpenGraphScraperOptions {
timeout?: number;
url?: string;
urlValidatorSettings?: ValidatorSettings;
jsonLDOptions?: JSONLDOptions;
}

/**
@@ -67,6 +68,14 @@ export interface ValidatorSettings {
validate_length: boolean;
}

/**
* Options for the JSON-LD parser
*/
export interface JSONLDOptions {
throwOnJSONParseError?: boolean;
logOnJSONParseError?: boolean;
}

/**
* The type for user defined custom meta tags you want to scrape.
*
69 changes: 69 additions & 0 deletions tests/unit/static.spec.ts
@@ -279,6 +279,75 @@ describe('static check meta tags', function () {
});
});

it('jsonLD - invalid JSON string that cannot be parsed does not throw error', function () {
const metaHTML = `<html><head>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Organization",
"name": "Blah ",
"sameAs": [
"https:\\\\/\\\\/twitter.com\\\\/blah?lang=en"
"https:\\\\/\\\\/www.facebook.com\\\\/blah\\\\/"
""
"https:\\\\/\\\\/www.instagram.com\\\\/blah\\\\/"
""
""
"https:\\\\/\\\\/www.youtube.com\\\\/@blah"
""
],
"url": "https:\\\\/\\\\/blah.com"
}

</script>
</head></html>`;

mockAgent.get('http://www.test.com')
.intercept({ path: '/' })
.reply(200, metaHTML);

return ogs({ url: 'www.test.com' })
.then(function (data) {
expect(data.result.success).to.be.eql(true);
expect(data.result.requestUrl).to.be.eql('http://www.test.com');
expect(data.result.jsonLD).to.be.eql([]);
expect(data.html).to.be.eql(metaHTML);
expect(data.response).to.be.a('response');
});
});

it('jsonLD - invalid JSON string that cannot be parsed throws error when options.jsonLDOptions.throwOnJSONParseError = true', function () {
const metaHTML = `<html><head>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Organization",
"name": "Blah ",
"sameAs": [
"https:\\\\/\\\\/twitter.com\\\\/blah?lang=en"
"https:\\\\/\\\\/www.facebook.com\\\\/blah\\\\/"
""
"https:\\\\/\\\\/www.instagram.com\\\\/blah\\\\/"
""
""
"https:\\\\/\\\\/www.youtube.com\\\\/@blah"
""
],
"url": "https:\\\\/\\\\/blah.com"
}

</script>
</head></html>`;

mockAgent.get('http://www.test.com')
.intercept({ path: '/' })
.reply(200, metaHTML);

return ogs({ url: 'www.test.com', jsonLDOptions: { throwOnJSONParseError: true } }).catch((data) => {
expect(data.result.success).to.be.eql(false);
});
});

it('encoding - utf-8', function () {
/* eslint-disable max-len */
const metaHTML = `<html><head>