
Commit 7cd03d4

Authored by github-actions[bot], Copilot, Repo Assist, GitHub Copilot, and dsyme
[Repo Assist] Add schema.org microdata support to HtmlProvider (#1676)
* Add schema.org microdata support to HtmlProvider (closes #611)

- Parse itemscope/itemtype/itemprop HTML microdata attributes at design time
- Generate a typed 'Schemas' container on HtmlProvider documents
- Each schema type (e.g. http://schema.org/Person) becomes a property returning an array of typed items with one property per itemprop name
- Items are erased to HtmlSchemaItem at runtime
- Property values follow the HTML microdata spec: content attr, href, src, datetime, or inner text depending on element type
- Nested itemscope elements are not traversed (correct per spec)
- 6 unit tests (HtmlRuntime.getSchemas) + 3 integration tests (HtmlProvider)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: trigger CI checks

* Update DesignTime snapshot tests for HTML files with microdata

The HtmlProvider now generates schema types for HTML elements with itemscope/itemtype/itemprop attributes. Update the expected signature snapshots for the 4 HTML test files that contain microdata (zoopla.html, zoopla2.html, ebay_cars.htm, imdb_chart.htm) so the DesignTime tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: trigger CI checks

* docs: improve HtmlProvider documentation with S&P 500 example and schema.org microdata section

- Update introduction to clearly explain table naming, column type inference, and when to use the provider
- Add Wikipedia S&P 500 companies example demonstrating groupBy analysis
- Add schema.org microdata section showing ProductCatalog and mixed-page samples
- Update NuGet stats example with improved regex and formatting
- Rename Doctor Who groupBy variable for clarity
- Add note about JSON-LD on JS-rendered sites (IMDB, eBay)
- Remove outdated 'Introducing the provider' framing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: trigger CI checks

* Add JSON-LD structured data support to HtmlProvider

Adds native JSON-LD support to HtmlProvider: when an HTML document contains <script type="application/ld+json"> blocks, the provider now generates a typed .JsonLd container (e.g. doc.JsonLd.Article) with one strongly-typed string property per top-level scalar field discovered in the sample. This mirrors the .Schemas container added for HTML microdata and is especially useful for Wikipedia pages (and many modern websites) which embed schema.org Article/WebPage/Person JSON-LD in the <head> element for SEO purposes.

Changes:
- Add HtmlJsonLdItem and HtmlJsonLdGroup types to HtmlRuntime.fs
- Add JsonLdGroup case to HtmlObjectDescription
- Add getJsonLd parser (finds script[type=application/ld+json] elements, parses JSON, groups by @type, flattens scalar properties into Map<string,string>)
- Add GetJsonLd(id) method to HtmlDocument
- Update HtmlGenerator to generate a typed JsonLd container
- Add FSharp.Data.Json.Core project reference to Html.Core
- Add 5 runtime tests + 6 integration tests
- Rewrite HtmlProvider.fsx with Wikipedia JSON-LD examples and summary table of all three formats (tables/microdata/JSON-LD)
- Update RELEASE_NOTES.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: trigger CI checks

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Repo Assist <repo-assist@github.com>
Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Don Syme <dsyme@users.noreply.github.com>
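To make the property-value rules in the first commit concrete, here is a minimal Python sketch of the same idea built on the standard library's `html.parser`. It is not FSharp.Data's F# implementation, and `MicrodataSketch` and `get_schemas` are hypothetical names. It assumes reasonably well-formed HTML, collects only top-level `itemscope` elements that carry an `itemtype`, keeps the first value when an `itemprop` name repeats, and reads the commit's "content attr" as the microdata spec's `meta` rule; otherwise it uses `href`, `src`, `datetime`, or the whitespace-normalized inner text.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get an end tag, so their role is closed immediately.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Hypothetical helper: collects top-level microdata items by itemtype."""

    def __init__(self):
        super().__init__()
        self.items = defaultdict(list)  # itemtype -> list of {itemprop: value}
        self.stack = []                 # one role per open non-void element
        self.current = None             # properties of the open top-level item
        self.current_type = None
        self.nested = False             # True while inside a nested itemscope
        self.text_props = []            # (name, text chunks) for text-valued props

    @staticmethod
    def _attr_value(tag, a):
        # Attribute-valued properties; None means "use the inner text instead".
        if tag == "meta":
            return a.get("content") or ""
        if tag in ("a", "area", "link"):
            return a.get("href") or ""
        if tag in ("img", "audio", "video", "source", "embed", "iframe", "track"):
            return a.get("src") or ""
        if tag == "time" and a.get("datetime"):
            return a["datetime"]
        return None

    def _add(self, name, value):
        self.current.setdefault(name, value)  # first value per name wins (assumption)

    def _close(self, role):
        if role == "textprop":
            name, chunks = self.text_props.pop()
            self._add(name, " ".join("".join(chunks).split()))
        elif role == "nested":
            self.nested = False
        elif role == "item":
            self.items[self.current_type].append(self.current)
            self.current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        role = None
        if self.current is None:
            # Outside any item: only a typed itemscope starts a new item.
            if "itemscope" in a and a.get("itemtype"):
                self.current, self.current_type, role = {}, a["itemtype"], "item"
        elif self.nested:
            pass  # properties here belong to the nested item, not the outer one
        elif "itemscope" in a:
            self.nested, role = True, "nested"  # do not traverse nested items
        elif "itemprop" in a:
            value = self._attr_value(tag, a)
            if value is not None:
                self._add(a["itemprop"], value)
            elif tag not in VOID:
                self.text_props.append((a["itemprop"], []))
                role = "textprop"
        if tag in VOID:
            self._close(role)
        else:
            self.stack.append(role)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self._close(self.stack.pop())

    def handle_data(self, data):
        for _, chunks in self.text_props:
            chunks.append(data)


def get_schemas(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return dict(parser.items)
```

With this design, a property such as `addressLocality` inside a nested `PostalAddress` scope is not attributed to the enclosing `Person` item, which matches the commit's note that nested `itemscope` elements are not traversed.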
1 parent 07e6748 commit 7cd03d4

12 files changed with 1216 additions and 53 deletions

RELEASE_NOTES.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@

 ## 8.1.0-beta

+- Add schema.org microdata support to `HtmlProvider`: when an HTML document contains elements with `itemscope`/`itemtype`/`itemprop` attributes, the provider now generates a typed `Schemas` container (e.g. `doc.Schemas.Person`) with one strongly-typed property per `itemprop` name discovered in the sample (closes #611)
+- Add JSON-LD support to `HtmlProvider`: when an HTML document contains `<script type="application/ld+json">` blocks, the provider generates a typed `JsonLd` container (e.g. `doc.JsonLd.Article`) with one strongly-typed property per top-level scalar field discovered in the sample — Wikipedia pages, for instance, embed schema.org `Article` JSON-LD with `name`, `headline`, `description`, `url`, `datePublished`, and `dateModified`
 - Add `ExceptionIfMissing` static parameter to `JsonProvider` and `XmlProvider`: when true, accessing a non-optional field that is missing in the data raises an exception instead of silently returning a default value (empty string for string, NaN for float). Defaults to false for backward compatibility.
 - Add `Http.ParseLinkHeader` utility for parsing RFC 5988 `Link` response headers (used by GitHub, GitLab, and other paginated APIs) into a `Map<string, string>` from relation name to URL (closes #805)
 - Add `PreferDateTimeOffset` parameter to `CsvProvider`, `JsonProvider`, and `XmlProvider`: when true, date-time values without an explicit timezone offset are inferred as `DateTimeOffset` (using local offset) instead of `DateTime` (closes #1100, #1072)
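The JSON-LD entry describes a simpler pipeline than microdata: find `<script type="application/ld+json">` blocks, parse each one as JSON, group the objects by `@type`, and keep the scalar top-level fields as strings. A rough Python sketch of that pipeline follows. `get_json_ld` is a hypothetical name, and several details are assumptions rather than documented FSharp.Data behavior: a top-level JSON array is treated as several objects, the first entry of a list-valued `@type` is used, `@`-prefixed keywords such as `@context` and `null` values are dropped, non-string scalars are rendered with `json.dumps` (so `1200` becomes `"1200"` and `true` becomes `"true"`), malformed blocks are skipped, and `@graph` containers are not expanded.

```python
import json
from collections import defaultdict
from html.parser import HTMLParser


class _LdScriptCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = None  # text chunks of the JSON-LD script being read

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None


def get_json_ld(html):
    """Hypothetical helper: @type -> list of {field: string value}."""
    collector = _LdScriptCollector()
    collector.feed(html)
    collector.close()
    groups = defaultdict(list)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        for obj in data if isinstance(data, list) else [data]:
            if not isinstance(obj, dict):
                continue
            type_ = obj.get("@type")
            if isinstance(type_, list):
                type_ = type_[0] if type_ else None
            if not isinstance(type_, str):
                continue  # untyped objects (e.g. a bare @graph) are ignored
            # Flatten: keep only scalar, non-keyword, non-null fields as strings.
            flat = {k: v if isinstance(v, str) else json.dumps(v)
                    for k, v in obj.items()
                    if not k.startswith("@") and v is not None
                    and not isinstance(v, (dict, list))}
            groups[type_].append(flat)
    return dict(groups)
```

Dropping nested objects such as `author` keeps every field a plain string, which is consistent with the release note's "one strongly-typed property per top-level scalar field".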

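For the `Http.ParseLinkHeader` entry, the kind of parsing involved can be sketched in Python as below. `parse_link_header` is a hypothetical stand-in, not the FSharp.Data API. The sketch assumes that no parameter value contains a `<` character, accepts both quoted (`rel="next"`) and bare (`rel=next`) relation values, splits space-separated relation types, and keeps the first URL seen for a given relation.

```python
import re

# A link-value is "<URI>" followed by ";"-separated parameters (RFC 5988 / RFC 8288).
_LINK = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter may be quoted (rel="next") or a bare token (rel=next).
_REL = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def parse_link_header(value):
    """Hypothetical helper: map each relation name to its URL."""
    links = {}
    for url, params in _LINK.findall(value):
        match = _REL.search(params)
        if match:
            # A quoted rel may list several space-separated relation types.
            for rel in (match.group(1) or match.group(2)).split():
                links.setdefault(rel, url)  # first occurrence wins
    return links
```

For an illustrative GitHub-style header such as `<https://api.github.com/repositories/1/issues?page=2>; rel="next", <https://api.github.com/repositories/1/issues?page=5>; rel="last"`, the result maps `next` and `last` to the two URLs.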