Commit 7cd03d4
[Repo Assist] Add schema.org microdata support to HtmlProvider (#1676)
* Add schema.org microdata support to HtmlProvider (closes #611)
- Parse itemscope/itemtype/itemprop HTML microdata attributes at design time
- Generate a typed 'Schemas' container on HtmlProvider documents
- Each schema type (e.g. http://schema.org/Person) becomes a property
returning an array of typed items with one property per itemprop name
- Items are erased to HtmlSchemaItem at runtime
- Property values follow the HTML microdata spec: content attr, href,
src, datetime, or inner text depending on element type
- Nested itemscope elements are not traversed (correct per spec)
- 6 unit tests (HtmlRuntime.getSchemas) + 3 integration tests (HtmlProvider)
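
A minimal sketch of the value-selection rule described above, written in Python for illustration only. The helper name `itemprop_value` and the tag sets are assumptions based on the general shape of the HTML microdata spec; they are not taken from FSharp.Data's implementation, which may differ in detail.

```python
# Hypothetical helper; illustrates the rule, not the provider's actual code.
SRC_TAGS = {"audio", "embed", "iframe", "img", "source", "track", "video"}
HREF_TAGS = {"a", "area", "link"}


def itemprop_value(tag: str, attrs: dict, inner_text: str) -> str:
    """Pick the value of an itemprop element based on its element type."""
    tag = tag.lower()
    if tag == "meta":
        # <meta itemprop="..." content="..."> carries its value in 'content'
        return attrs.get("content", "")
    if tag in SRC_TAGS:
        return attrs.get("src", "")
    if tag in HREF_TAGS:
        return attrs.get("href", "")
    if tag == "time":
        # Prefer the machine-readable datetime, fall back to the visible text
        return attrs.get("datetime", inner_text.strip())
    # Every other element contributes its inner text
    return inner_text.strip()
```

For example, under these assumptions `itemprop_value("a", {"href": "/jane"}, "Jane")` selects the link target, while a `span` yields its trimmed inner text.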
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ci: trigger CI checks
* Update DesignTime snapshot tests for HTML files with microdata
The HtmlProvider now generates schema types for HTML elements with
itemscope/itemtype/itemprop attributes. Update the expected signature
snapshots for the 4 HTML test files that contain microdata
(zoopla.html, zoopla2.html, ebay_cars.htm, imdb_chart.htm) so the
DesignTime tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ci: trigger CI checks
* docs: improve HtmlProvider documentation with S&P 500 example and schema.org microdata section
- Update introduction to clearly explain table naming, column type inference,
and when to use the provider
- Add Wikipedia S&P 500 companies example demonstrating groupBy analysis
- Add schema.org microdata section showing ProductCatalog and mixed-page samples
- Update NuGet stats example with improved regex and formatting
- Rename Doctor Who groupBy variable for clarity
- Add note about JSON-LD on JS-rendered sites (IMDB, eBay)
- Remove outdated 'Introducing the provider' framing
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ci: trigger CI checks
* Add JSON-LD structured data support to HtmlProvider
Adds native JSON-LD support to HtmlProvider: when an HTML document contains
<script type="application/ld+json"> blocks, the provider now generates a
typed .JsonLd container (e.g. doc.JsonLd.Article) with one strongly-typed
string property per top-level scalar field discovered in the sample.
This mirrors the .Schemas container added for HTML microdata and is especially
useful for Wikipedia pages (and many modern websites) which embed schema.org
Article/WebPage/Person JSON-LD in the <head> element for SEO purposes.
Changes:
- Add HtmlJsonLdItem and HtmlJsonLdGroup types to HtmlRuntime.fs
- Add JsonLdGroup case to HtmlObjectDescription
- Add getJsonLd parser (finds script[type=application/ld+json] elements,
parses JSON, groups by @type, flattens scalar properties into Map<string,string>)
- Add GetJsonLd(id) method to HtmlDocument
- Update HtmlGenerator to generate a typed JsonLd container
- Add FSharp.Data.Json.Core project reference to Html.Core
- Add 5 runtime tests + 6 integration tests
- Rewrite HtmlProvider.fsx with Wikipedia JSON-LD examples and
summary table of all three formats (tables/microdata/JSON-LD)
- Update RELEASE_NOTES.md
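
The parsing steps listed for getJsonLd (find the script blocks, parse the JSON, group by @type, flatten scalar properties into a string map) can be sketched roughly as follows. This is an illustrative Python sketch, not the F# code added in this commit; the names `get_json_ld` and `_LdJsonCollector` are hypothetical, and details such as skipping malformed blocks, dropping `@`-prefixed keys, and ignoring non-string `@type` values are assumptions.

```python
import json
from html.parser import HTMLParser


class _LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self.blocks.append("".join(self._buf))
            self._in_ld = False


def get_json_ld(html: str) -> dict:
    """Group JSON-LD items by @type; keep only top-level scalar fields as strings."""
    collector = _LdJsonCollector()
    collector.feed(html)
    groups = {}
    for raw in collector.blocks:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: malformed blocks are skipped
        items = parsed if isinstance(parsed, list) else [parsed]
        for item in items:
            if not isinstance(item, dict) or not isinstance(item.get("@type"), str):
                continue
            props = {}
            for key, value in item.items():
                if key.startswith("@"):
                    continue  # assumption: JSON-LD keywords are not properties
                if isinstance(value, str):
                    props[key] = value
                elif isinstance(value, (bool, int, float)):
                    props[key] = json.dumps(value)
                # nested objects, arrays and nulls are not flattened in this sketch
            groups.setdefault(item["@type"], []).append(props)
    return groups
```

In this sketch, a page with one Article block would give `get_json_ld(page)["Article"]` as a list of string maps, loosely mirroring the doc.JsonLd.Article shape described above.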
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ci: trigger CI checks
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Repo Assist <repo-assist@github.com>
Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Don Syme <dsyme@users.noreply.github.com>
1 parent 07e6748 commit 7cd03d4
File tree
12 files changed
+1216 -53 lines changed
- docs/library
- src
- FSharp.Data.DesignTime/Html
- FSharp.Data.Html.Core
- tests
- FSharp.Data.Core.Tests
- FSharp.Data.DesignTime.Tests/expected
- FSharp.Data.Tests