Support for wildcards (*) in rml:source for files? #269

@faubulous

Hello everyone,

We're evaluating RML for implementing an ETL pipeline that operates on a large amount of JSON data stored in files that are regularly updated. Since there are millions of files with the same structure, it is very inefficient for us to generate an explicit RML rule set for every single file that a mapping is applied to. From the examples documented here, it appears that the file name must be specified in the logical source like this:

<#PersonMapping>
  rml:logicalSource [
    rml:source "People.json";
   ...

This limits the rule set to a single file. Is there a way to use wildcard operators in rml:source so that a mapping can be applied to every file matching a pattern? Something like this:

<#PersonMapping>
  rml:logicalSource [
    rml:source "persons/*.json";

What do you think about this? Are there any reasons why this is a bad idea?
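
To make the intended semantics concrete, here is a minimal Python sketch of the per-file expansion we currently would have to do ourselves and would like to avoid. It assumes shell-style glob semantics for the pattern; the function names `expand_sources` and `logical_source_snippets` are hypothetical helpers for illustration and are not part of any RML processor.

```python
from pathlib import Path


def expand_sources(root, pattern):
    """Return files under `root` matching the glob `pattern`, as sorted relative paths."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.glob(pattern)
        if p.is_file()
    )


def logical_source_snippets(root, pattern):
    """Yield one rml:source line per matching file (hypothetical per-file expansion)."""
    for source in expand_sources(root, pattern):
        yield f'rml:source "{source}";'


if __name__ == "__main__":
    # Assumes a local "persons" directory containing the JSON files.
    for line in logical_source_snippets(".", "persons/*.json"):
        print(line)
```

With millions of files this produces millions of near-identical logical sources, which is why a pattern evaluated by the processor itself would be preferable.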
