@@ -4,59 +4,59 @@ Feature: Pipeline tests using the books dataset
   This tests submissions using nested, complex XML datasets with arrays, and
   introduces more complex transformations that require aggregation.
 
-  Scenario : Validate complex nested XML data (spark)
-    Given I submit the books file nested_books.xml for processing
-    And A spark pipeline is configured with schema file 'nested_books.dischema.json'
-    And I add initial audit entries for the submission
-    Then the latest audit record for the submission is marked with processing status file_transformation
-    When I run the file transformation phase
-    Then the header entity is stored as a parquet after the file_transformation phase
-    And the nested_books entity is stored as a parquet after the file_transformation phase
-    And the latest audit record for the submission is marked with processing status data_contract
-    When I run the data contract phase
-    Then there is 1 record rejection from the data_contract phase
-    And the header entity is stored as a parquet after the data_contract phase
-    And the nested_books entity is stored as a parquet after the data_contract phase
-    And the latest audit record for the submission is marked with processing status business_rules
-    When I run the business rules phase
-    Then The rules restrict "nested_books" to 3 qualifying records
-    And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
-    And the nested_books entity is stored as a parquet after the business_rules phase
-    And the latest audit record for the submission is marked with processing status error_report
-    When I run the error report phase
-    Then An error report is produced
-    And The statistics entry for the submission shows the following information
-      | parameter                | value |
-      | record_count             | 4     |
-      | number_record_rejections | 2     |
-      | number_warnings          | 0     |
+  # Scenario: Validate complex nested XML data (spark)
+  #   Given I submit the books file nested_books.xml for processing
+  #   And A spark pipeline is configured with schema file 'nested_books.dischema.json'
+  #   And I add initial audit entries for the submission
+  #   Then the latest audit record for the submission is marked with processing status file_transformation
+  #   When I run the file transformation phase
+  #   Then the header entity is stored as a parquet after the file_transformation phase
+  #   And the nested_books entity is stored as a parquet after the file_transformation phase
+  #   And the latest audit record for the submission is marked with processing status data_contract
+  #   When I run the data contract phase
+  #   Then there is 1 record rejection from the data_contract phase
+  #   And the header entity is stored as a parquet after the data_contract phase
+  #   And the nested_books entity is stored as a parquet after the data_contract phase
+  #   And the latest audit record for the submission is marked with processing status business_rules
+  #   When I run the business rules phase
+  #   Then The rules restrict "nested_books" to 3 qualifying records
+  #   And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
+  #   And the nested_books entity is stored as a parquet after the business_rules phase
+  #   And the latest audit record for the submission is marked with processing status error_report
+  #   When I run the error report phase
+  #   Then An error report is produced
+  #   And The statistics entry for the submission shows the following information
+  #     | parameter                | value |
+  #     | record_count             | 4     |
+  #     | number_record_rejections | 2     |
+  #     | number_warnings          | 0     |
 
-  Scenario : Validate complex nested XML data (duckdb)
-    Given I submit the books file nested_books.xml for processing
-    And A duckdb pipeline is configured with schema file 'nested_books_ddb.dischema.json'
-    And I add initial audit entries for the submission
-    Then the latest audit record for the submission is marked with processing status file_transformation
-    When I run the file transformation phase
-    Then the header entity is stored as a parquet after the file_transformation phase
-    And the nested_books entity is stored as a parquet after the file_transformation phase
-    And the latest audit record for the submission is marked with processing status data_contract
-    When I run the data contract phase
-    Then there is 1 record rejection from the data_contract phase
-    And the header entity is stored as a parquet after the data_contract phase
-    And the nested_books entity is stored as a parquet after the data_contract phase
-    And the latest audit record for the submission is marked with processing status business_rules
-    When I run the business rules phase
-    Then The rules restrict "nested_books" to 3 qualifying records
-    And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
-    And the nested_books entity is stored as a parquet after the business_rules phase
-    And the latest audit record for the submission is marked with processing status error_report
-    When I run the error report phase
-    Then An error report is produced
-    And The statistics entry for the submission shows the following information
-      | parameter                | value |
-      | record_count             | 4     |
-      | number_record_rejections | 2     |
-      | number_warnings          | 0     |
+  # Scenario: Validate complex nested XML data (duckdb)
+  #   Given I submit the books file nested_books.xml for processing
+  #   And A duckdb pipeline is configured with schema file 'nested_books_ddb.dischema.json'
+  #   And I add initial audit entries for the submission
+  #   Then the latest audit record for the submission is marked with processing status file_transformation
+  #   When I run the file transformation phase
+  #   Then the header entity is stored as a parquet after the file_transformation phase
+  #   And the nested_books entity is stored as a parquet after the file_transformation phase
+  #   And the latest audit record for the submission is marked with processing status data_contract
+  #   When I run the data contract phase
+  #   Then there is 1 record rejection from the data_contract phase
+  #   And the header entity is stored as a parquet after the data_contract phase
+  #   And the nested_books entity is stored as a parquet after the data_contract phase
+  #   And the latest audit record for the submission is marked with processing status business_rules
+  #   When I run the business rules phase
+  #   Then The rules restrict "nested_books" to 3 qualifying records
+  #   And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
+  #   And the nested_books entity is stored as a parquet after the business_rules phase
+  #   And the latest audit record for the submission is marked with processing status error_report
+  #   When I run the error report phase
+  #   Then An error report is produced
+  #   And The statistics entry for the submission shows the following information
+  #     | parameter                | value |
+  #     | record_count             | 4     |
+  #     | number_record_rejections | 2     |
+  #     | number_warnings          | 0     |
 
   Scenario : Handle a file with a malformed tag (duckdb)
     Given I submit the books file malformed_books.xml for processing
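For reference, the commented-out scenarios assert that the business rules phase restricts "nested_books" to 3 qualifying records and yields 17.85 in the "total_value_of_books" column. The sketch below shows the kind of SUM-based aggregation those assertions imply, using duckdb (which the second scenario's pipeline is built on); the rows and column names (submission_id, title, price) are illustrative assumptions, not the pipeline's actual schema.

import duckdb

con = duckdb.connect()

# Hypothetical stand-in for the nested_books table produced by the
# file_transformation phase: three qualifying rows whose prices sum to 17.85.
con.execute("""
    CREATE TABLE nested_books AS
    SELECT * FROM (VALUES
        ('sub-1', 'Book A', 5.95),
        ('sub-1', 'Book B', 6.95),
        ('sub-1', 'Book C', 4.95)
    ) t(submission_id, title, price)
""")

# Aggregate per submission into a total_value_of_books column,
# the shape of result the scenario checks for the value 17.85.
rows = con.execute("""
    SELECT submission_id, SUM(price) AS total_value_of_books
    FROM nested_books
    GROUP BY submission_id
""").fetchall()
print(rows)  # one row for 'sub-1' with total_value_of_books == 17.85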