 * A lightweight, fast XML file splitter with built-in tag data validation, written with the XMLParser library, a powerful PHP XML-processing library. Its main goal is to split an XML file into multiple small chunks (hence the name) and save each chunk as a separate small XML file, so that slower servers, plugins, etc. can process XML files containing 10,000+ records.
     * The data used for one iteration of the main tag.
     * @var string
     */
    private string $PAYLOAD_TEMP = '';

    /**
     * A container used to implement validation.
     * @var string
     */
    private string $DATA_BETWEEN = '';

    /**
     * The root tag of the yet-to-be-processed XML file.
     * @var string
     */
    private string $rootTag;

    /**
     * The charset used for the decoding/encoding process.
     * @var string
     */
    private string $CHARSET;

    /**
     * The prefix used for the output files.
     * @var string
     */
    private string $outputFilePrefix;

    /**
     * Counter for the items put into one chunk.
     * @var int
     */
    private int $ITEMCOUNT = 0;

    /**
     * The main tag, which defines one item in the chunking.
     * @var string
     */
    private string $CHUNKON;

    /**
     * A variable used for logging.
     * @var string
     */
    private string $log = "";

    /**
     * The total number of processed main tags.
     * @var int
     */
    private int $totalItems = 0;

    /**
     * Indicates whether a main tag that does not satisfy the validation has been found.
     * @var bool
     */
    private bool $excludedItemFound = false;

    /**
     * Indicates that the next data to be read has to be validated, because its opening tag is present in $checkingTags.
     * @var bool
     */
    private bool $checkNextData = false;

    /**
     * Carries the tag name of the data that is about to be validated.
     * @var string
     */
    private string $checkNextDataTag = '';

    /**
     * An array of tags whose data has to be validated at runtime.
     * @var array
     */
    private array $checkingTags = array();

    /**
     * A callback function that performs the validation. Has to be a callable.
     * @var callable
     */
    private $passesValidation;

    /**
     * The constructor of the class. It creates an instance of Chunker.
     *
     * @param string $xmlfile The path of the XML file.
     * @param int $chunkSize The maximum number of main XML tag elements (the main tag is specified later) that each chunk file may contain. **Default: 100**
     * @param string $outputFilePrefix The prefix for the chunk filenames. The pattern is the following: *{outputFilePrefix}{CHUNK_NUMBER}.xml* **Default: 'out-'.** Example files with the default prefix: 'out-1.xml', 'out-2.xml', etc.
     * @param callable $validationFunction The validator function, run every time the parser finds a tag that is in $checkingTags. If the function returns **true** (the tag data is *valid*), the item is included in the chunk; otherwise it is ignored. The validator has to return **bool** and cannot be **null**; if it is null, a fatal error is raised. The callback HAS to accept the following parameters:
     *      - $data: string, the data inside the currently processed tag
     *      - $tag: string, the name of the currently processed tag
     * @param array $checkingTags The tag names whose data has to be validated. It can be empty or omitted if no validation is required (unlike the validator function, which HAS to be provided, otherwise an error is raised).
        if (empty($xmlfile)) trigger_error("[Chunker] Fatal error: no XML file/empty filestring specified in __construct.", E_USER_ERROR);
        if (!$validationFunction) trigger_error("[Chunker] Fatal error: no callback handler specified for validation checks.", E_USER_ERROR);
        $this->checkingTags = $checkingTags;
        $this->passesValidation = $validationFunction;
        $this->xmlFile = $xmlfile;
        $this->chunkSize = $chunkSize;
        $this->CHUNKS = 0;
        $this->outputFilePrefix = $outputFilePrefix;
    }

    /**
     * Processes a whole chunk (size <= $chunkSize) by writing the **PAYLOAD** into a chunk file and resetting all state variables.
     * @param bool $lastChunk Indicates whether the current chunk is the last one in the file. If the last chunk is not flagged this way, its closing tag may be missing.
     * A handler function used by the parser for starting elements. It checks whether the currently parsed tag is present in the $checkingTags array and sets some state variables if a validation needs to be done.
     * @param XMLParser $xml The parser
     * @param string $tag The currently parsed tag
     * @param array $attrs An array of attributes of the tag. It is not used here; it is present only to match the handler signature.
     * A handler function used by the parser for ending elements. It checks whether the currently parsed main tag contained any tags listed in the $checkingTags array whose data failed validation. If so, the last parsed main element is excluded from the chunking process; otherwise it is added to the current chunk. Once the number of processed main tags reaches the $chunkSize limit, a new chunk is written to disk.
     * A handler function used by the parser for data between tags. If the $checkNextData state property is set to true, the currently parsed data has to be validated. If it does not pass the validation, the main element is flagged as 'excluded from chunking' and is not written to disk.
     * @param XMLParser $xml The parser
     * @param string $data The data to be handled
     */
    private function dataHandler($xml, $data) {
        //GLOBAL $PAYLOAD;

        $this->DATA_BETWEEN .= $data;
        $this->PAYLOAD_TEMP .= $data;
    }

    /**
     * A handler function, not used by this class, just for formal purposes.
     */
    private function defaultHandler($xml, $data) {
        // a.k.a. Wild Text Fallback Handler, or WTFHandler for short.
        $this->logging("WTF text found: " . $data);
    }

    /**
     * A helper function that creates the XML parser instance, sets the options for the parsing, and establishes the setup.
     * @param string $CHARSET The charset that will be used by the parser. **Default: "UTF-8"**
     * @param bool $bareXML Indicates if the incoming data is unformatted/maybe invalid XML. Not used in this class.
     * Starts the chunking process. It initializes the parser instance and starts the XML parsing, writing out a chunk every $chunkSize main elements.
     * @param string $mainTag The tag used to count the main elements in a chunk. Usually the second-level XML tag in a document.
     * @param string $rootTag The root tag, of which every $mainTag element is a child. There is only one root tag in an XML document (it is not the XML header in the first row).
     * @param string $charset The character set used by the parser. **Default: UTF-8** Possible values: "UTF-8", "ISO-8859-1"
     *
     * @return string The main log that was created during the chunking
        $this->logging("Ended chunking. Total processed '" . $this->CHUNKON . "' objects: " . $this->totalItems);
        return $this->log;
    }
    /**
     * Used for administrative purposes. A message can be logged into the internal logging variable and later returned by some functions.
     * @param string $msg The message to be logged
     * @param bool $start Indicates whether the logging has to be started over (the previously logged messages are deleted and the logging variable is cleared). **Default: false**
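The docblocks above describe a start handler, an end handler, and a data handler that together collect main-tag items, validate selected tags, and flush a chunk every `$chunkSize` items. The following is a minimal, self-contained sketch of that flow, not the class's actual implementation. It assumes that "XMLParser" refers to PHP's bundled XML Parser extension (`xml_parser_create` and friends). The function name `chunkXmlSketch` is hypothetical, it returns the chunks as strings instead of writing files, and it drops tag attributes for brevity.

```php
<?php
// Hypothetical helper, not part of the Chunker class: splits an XML string
// into chunks of at most $chunkSize valid main-tag items.
function chunkXmlSketch(string $xml, string $mainTag, string $rootTag, int $chunkSize,
                        callable $validator, array $checkingTags = []): array
{
    $chunks = [];       // finished chunk documents
    $payload = '';      // items collected for the current chunk
    $item = '';         // markup of the main element being parsed
    $between = '';      // character data of the tag being parsed
    $count = 0;         // valid items in the current chunk
    $inItem = false;    // true while inside a main element
    $excluded = false;  // true if the current main element failed validation

    $parser = xml_parser_create('UTF-8');
    // Keep tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start handler: open a new item on the main tag (attributes are dropped here).
        function ($p, string $tag, array $attrs) use (&$item, &$between, &$inItem, &$excluded, $mainTag) {
            if ($tag === $mainTag) {
                $inItem = true;
                $item = '';
                $excluded = false;
            }
            if ($inItem) {
                $item .= "<$tag>";
            }
            $between = '';
        },
        // End handler: validate checked tags, then close or discard the item.
        function ($p, string $tag) use (&$chunks, &$payload, &$item, &$between, &$count, &$inItem,
                                        &$excluded, $mainTag, $rootTag, $chunkSize, $validator, $checkingTags) {
            if (!$inItem) {
                return;
            }
            if (in_array($tag, $checkingTags, true) && !$validator($between, $tag)) {
                $excluded = true;
            }
            $item .= "</$tag>";
            $between = '';
            if ($tag === $mainTag) {
                $inItem = false;
                if (!$excluded) {
                    $payload .= $item;
                    $count++;
                }
                if ($count === $chunkSize) {
                    $chunks[] = "<$rootTag>$payload</$rootTag>";
                    $payload = '';
                    $count = 0;
                }
            }
        }
    );

    // Data handler: collect text for validation and re-escape it for output.
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$item, &$between, &$inItem) {
        if ($inItem) {
            $between .= $data;
            $item .= htmlspecialchars($data, ENT_XML1);
        }
    });

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    if ($payload !== '') {
        $chunks[] = "<$rootTag>$payload</$rootTag>"; // last, partially filled chunk
    }
    return $chunks;
}

// Usage: keep only items whose <price> is non-negative, two items per chunk.
$input = '<items><item><price>5</price></item><item><price>-1</price></item>'
       . '<item><price>7</price></item><item><price>9</price></item></items>';
$result = chunkXmlSketch($input, 'item', 'items', 2, fn($data, $tag) => (int)$data >= 0, ['price']);
```

The validator in the usage lines follows the `($data, $tag)` parameter order given in the constructor docblock. A file-based version would stream the input with repeated `xml_parse` calls and write each chunk to `{outputFilePrefix}{CHUNK_NUMBER}.xml` rather than collecting strings in memory.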