Commit 233ec9d

Added README
1 parent 020c040 commit 233ec9d

1 file changed

+92
-0
lines changed

README.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# Chunker - A lightweight, blazing fast XML splitter written in PHP

The main goal of this library is to split a big XML file into multiple chunks of a predefined size.

The algorithm was written using the XMLParser PHP library, which parses an XML file line by line (or tag by tag) without state control, rather than relying on string comparison or simple I/O operations. Because of this, validation can be run on the tags every time they are parsed.
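
To illustrate what tag-by-tag parsing looks like, here is a minimal sketch. It assumes that "XMLParser" refers to PHP's bundled XML Parser extension (`xml_parser_create` and related functions); the README does not show which calls Chunker actually makes. The helper `countElements` and the sample document are hypothetical and exist only for this example.

```php
<?php
// Sketch only: counts elements with PHP's event-based XML Parser extension.
// countElements() and the sample XML are illustrative, not part of Chunker.

function countElements(string $xml, string $tagName): int
{
    $count = 0;
    $parser = xml_parser_create();
    // Keep tag names as written; by default the parser upper-cases them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Called once for every opening tag the parser encounters.
        function ($parser, string $name, array $attrs) use (&$count, $tagName) {
            if ($name === $tagName) {
                $count++;
            }
        },
        // Called for every closing tag; nothing to do in this sketch.
        function ($parser, string $name) {
        }
    );

    // The last argument marks this as the final (here: only) piece of input.
    if (xml_parse($parser, $xml, true) !== 1) {
        $message = xml_error_string(xml_get_error_code($parser)) ?? 'XML parse error';
        throw new RuntimeException($message);
    }
    return $count;
}

$sample = '<root><item>a</item><item>b</item><other/></root>';
echo countElements($sample, 'item'), PHP_EOL; // prints 2
```

Because the callbacks fire as each tag is read, a check such as a validator can run at that moment instead of after the whole file has been loaded.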

## Usage
### Simple Chunking
The implementation is object-oriented, so in order to split a file, an instance of Chunker has to be created first.

An example of a simple Chunker instance without validation, with a **maximum of 100 main tags** per chunk, and with output file names of the form *"out-{CHUNK}.xml"*:
```
$chunkSize = 100;
$outputFilePrefix = "out-";
$xmlfile = "bigFile.xml";
// No validation: accept every tag
$validationFunction = fn($data, $tag) => true;
$checkingTags = array();

$chunker = new Chunker($xmlfile, $chunkSize, $outputFilePrefix, $validationFunction, $checkingTags);
```


### Constructor variables
The following table contains the parameters that can be (and should be) passed to the constructor.

| Parameter | Type | Description | Default value | Is required |
| --------- | ---- | ----------- | ------------- | ----------- |
| $xmlfile | string | The big XML file to be chunked | empty string | Yes |
| $chunkSize | int | The maximum number of main tags in a chunk | 100 | No |
| $outputFilePrefix | string | The prefix used in the file names of the output chunks. Pattern: **'{outputFilePrefix}{CHUNK-NUMBER}.xml'** | 'out-' | No |
| $validationFunction | callable | The validator function that is called every time a tag listed in $checkingTags is found. If the tag data passes the validation, it is included in the chunks; otherwise it is left out. It has to accept **two parameters**: the first is the *data* inside the tag to be validated, and the second is the *tag* itself (both being strings). It has to **return a boolean**. | null | Yes |
| $checkingTags | array | An array of tags whose data has to be validated using the $validationFunction callable. If no validation is needed, pass an empty array or omit the parameter, since it is not required. | empty array | No |

If any of the required parameters is empty or not specified, a fatal error is raised.
38+
### Launch the chunking!
39+
40+
After you created an instance of Chunker, and all the parameters were set, you can start the chunking process. You can do this with the `Chunker::chunkXML` method. An example is shown below:
41+
```
42+
// ... the instance is created in $chunker
43+
$chunker.chunkXML("item", "root");
44+
```
45+
46+
This example will create xml chunks from the big file (if validation is enabled, then only the validated main tags will be included), with `$chunkSize` number of *main tags* (here it's called **"item"**). Every main tag is enclosed between one *root tag* (here it's called **"root"**) in every file (so every chunked file will contain **one root tag**, and `$chunkSize` number of **main tags inside** it).
47+
48+
THe method returns the logging session's string conversion (see below for more information).
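
The relationship between the number of accepted main tags, `$chunkSize`, and the output files can be sketched with a little arithmetic. The helper `expectedChunkFiles` below is hypothetical and not part of Chunker. It assumes that chunk numbers start at 1 and that the last chunk holds the remainder; the README only documents the *'{outputFilePrefix}{CHUNK-NUMBER}.xml'* pattern, so treat the numbering as an assumption.

```php
<?php
// Sketch only: predicts output file names for a number of accepted main tags.
// ASSUMPTION: chunk numbers start at $firstChunk (1 here); the README does not say.

function expectedChunkFiles(int $acceptedTags, int $chunkSize, string $prefix, int $firstChunk = 1): array
{
    if ($chunkSize < 1 || $acceptedTags < 0) {
        throw new InvalidArgumentException('chunkSize must be >= 1 and acceptedTags >= 0');
    }
    // Each file holds at most $chunkSize main tags, so round up.
    $chunks = intdiv($acceptedTags + $chunkSize - 1, $chunkSize);

    $files = [];
    for ($i = 0; $i < $chunks; $i++) {
        $files[] = $prefix . ($firstChunk + $i) . '.xml';
    }
    return $files;
}

// 250 accepted tags at 100 per chunk give three files holding 100, 100 and 50 tags.
print_r(expectedChunkFiles(250, 100, 'out-'));
```

Note that tags rejected by the validator do not count toward `$acceptedTags`, so the number of files may be smaller than the size of the input suggests.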

## Logging

The class has a built-in logging feature. Every time `Chunker::chunkXML` is run, a new logging session is started. When the method finishes, it returns that logging session converted to a string:
```
// ...
$log = $chunker->chunkXML(....);
echo $log;

/*

[timestamp] Starting new chunking...
[timestamp] ..
[timestamp] ..
*/

```
This is helpful when something does not work as expected and has to be debugged step by step. **Capturing the return value is not necessary, so you can call the method as if it returned void.**

## Examples

### Basic validation

Let's say you have an XML file (*"feed.xml"*) with a **Shop** root element and many **shopItem** elements inside it (10,000+). You want to split it into files named *"feed-{chunk}.xml"*, each containing at most 1000 **shopItem**s. You also want to include only those **shopItem**s that have a *weight_kg* tag whose value is greater than 10 (that is, more than 10 kg). A possible solution:

```
$chunkSize = 1000;
$xmlfile = "feed.xml";
$outPrefix = "feed-";
$checkingTags = array("weight_kg");
// Accept only weight_kg values greater than 10
function validation($data, $tag) {
    if ($tag == "weight_kg") {
        if (!empty($data) && intval($data) > 10) return true;
    }
    return false;
}

$mainTag = "shopItem";
$rootTag = "Shop";

$chunker = new Chunker($xmlfile, $chunkSize, $outPrefix, "validation", $checkingTags);
$chunker->chunkXML($mainTag, $rootTag);
```
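
Since the validator is a plain callable that takes two strings and returns a boolean, it can be tried out on its own before it is passed to Chunker. The sketch below uses a hypothetical function name, `isHeavyItem`, and compares the value as a float instead of using `intval`, so that a value such as "10.5" counts as greater than 10. That is a design choice for this example, not something the library requires.

```php
<?php
// Sketch only: a stand-alone validator with the documented shape
// (string $data, string $tag) returning bool. isHeavyItem is a hypothetical name.

function isHeavyItem(string $data, string $tag): bool
{
    if ($tag !== 'weight_kg') {
        return false;
    }
    $value = trim($data);
    // is_numeric() rejects empty or non-numeric text such as "n/a".
    return is_numeric($value) && (float) $value > 10;
}

var_dump(isHeavyItem('12.5', 'weight_kg')); // bool(true)
var_dump(isHeavyItem('10', 'weight_kg'));   // bool(false)
var_dump(isHeavyItem('n/a', 'weight_kg'));  // bool(false)
```

Checking a few edge cases this way (exactly 10, empty strings, non-numeric text) is cheaper than running a full chunking pass over a large feed.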