From b64a0e05f761a56d96a7c07d3fef7398269c0599 Mon Sep 17 00:00:00 2001
From: "daniel.watson"
+ * This case separates tokens on uppercase ascii alpha characters, with the exception
+ * that the first token begin with a lowercase ascii alpha character.
+ *
+ * Parses each character of the string parameter and creates new tokens when uppercase ascii
+ * letters are encountered. The upppercase letter is considered part of the new token. The very
+ * first character of the string is an exception to this rule and must be a lowercase ascii
+ * character. This method places no other restrictions on the content of the string.
+ * Iterates each token and creates a camel case formatted string. Each token must begin with an
+ * ascii letter, which will be forced uppercase in the output, except for the very first token,
+ * which will have a lowercase first character. The remaining characters in all tokens will be
+ * forced lowercase. This Case does not support empty tokens.
+ * Tokens are iterated on and appended to an output stream, with an instance of a
+ * delimiter character between them. This method validates that the delimiter character is not
+ * part of the token. If it is found within the token an exception is thrown.
+ * Input string is parsed one character at a time until a delimiter character is reached.
+ * When a delimiter character is reached a new token begins. The delimiter character is
+ * considered reserved, and is omitted from the returned parsed tokens.
+ * KebabCase is a delimited case where the delimiter is a hyphen character '-'.
+ *
+ * PascalCase is a case where tokens are delimited by uppercase characters. Each parsed token
+ * must begin with an uppercase character, but the case of the remaining token characters is
+ * ignored and returned as-is.
+ *
+ * String characters are iterated over and any time an upper case ascii character is
+ * encountered, that character is considered to be the start of a new token, with the character
+ * itself included in the token. This method should never return empty tokens. The first
+ * character of the string must be an uppercase ascii character. No further restrictions are
+ * placed on string contents.
+ *
+ * Iterates the tokens and formates each one into a Pascal Case token. The first character of
+ * the token must be an ascii alpha character. This character is forced upper case in the
+ * output. The remaining alpha characters of the token are forced lowercase. Any other
+ * characters in the token are returned as-is. Empty tokens are not supported.
+ *
+ * SnakeCase is a delimited case where the delimiter is the underscore character '_'.
+ * Provides algorithms for parsing and formatting various programming "Cases". The provided implementations are for the four most common cases:
- * PascalCase is a case where tokens are delimited by uppercase characters. Each parsed token
+ * PascalCase is a case where tokens are delimited by uppercase ASCII characters. Each parsed token
* must begin with an uppercase character, but the case of the remaining token characters is
* ignored and returned as-is.
*
- * String characters are iterated over and any time an upper case ascii character is
+ * String characters are iterated over and any time an upper case ASCII character is
* encountered, that character is considered to be the start of a new token, with the character
* itself included in the token. This method should never return empty tokens. The first
- * character of the string must be an uppercase ascii character. No further restrictions are
+ * character of the string must be an uppercase ASCII character. No further restrictions are
* placed on string contents.
*
- * Iterates the tokens and formates each one into a Pascal Case token. The first character of
- * the token must be an ascii alpha character. This character is forced upper case in the
+ * Iterates the tokens and formats each one into a Pascal Case token. The first character of
+ * the token must be an ASCII alpha character. This character is forced upper case in the
* output. The remaining alpha characters of the token are forced lowercase. Any other
* characters in the token are returned as-is. Empty tokens are not supported.
*
- * This case separates tokens on uppercase ascii alpha characters, with the exception
- * that the first token begin with a lowercase ascii alpha character.
+ * This case separates tokens on uppercase ASCII alpha characters, with the exception
+ * that the first token begin with a lowercase ASCII alpha character.
*
- * Parses each character of the string parameter and creates new tokens when uppercase ascii
- * letters are encountered. The upppercase letter is considered part of the new token. The very
- * first character of the string is an exception to this rule and must be a lowercase ascii
+ * Parses each character of the string parameter and creates new tokens when uppercase ASCII
+ * letters are encountered. The uppercase letter is considered part of the new token. The very
+ * first character of the string is an exception to this rule and must be a lowercase ASCII
* character. This method places no other restrictions on the content of the string.
* Iterates each token and creates a camel case formatted string. Each token must begin with an
- * ascii letter, which will be forced uppercase in the output, except for the very first token,
+ * ASCII letter, which will be forced uppercase in the output, except for the very first token,
* which will have a lowercase first character. The remaining characters in all tokens will be
* forced lowercase. This Case does not support empty tokens.
- * This case separates tokens on uppercase ASCII alpha characters, with the exception
- * that the first token begin with a lowercase ASCII alpha character.
+ * This case separates tokens on uppercase ASCII alpha characters. Each token begins with an
+ * uppercase ASCII alpha character, except the first token, which begins with a lowercase ASCII
+ * alpha character.
*
- * Iterates each token and creates a camel case formatted string. Each token must begin with an
+ * Iterates over tokens and creates a camel case formatted string. Each token must begin with an
* ASCII letter, which will be forced uppercase in the output, except for the very first token,
* which will have a lowercase first character. The remaining characters in all tokens will be
* forced lowercase. This Case does not support empty tokens.
+ * Note: This method should never produce empty tokens.
+ *
+ * No other restrictions are placed on token contents.
+ *
+ * No other restrictions are placed on the contents of the tokens.
+ * Note: This Case does support empty tokens.
+ *
+ * No other restrictions are placed on the contents of the input string.
+ *
+ * CamelCase - delimited by ascii uppercase alpha characters and always beginning with a lowercase ascii alpha
+ * PascalCase - Similar to CamelCase but always begins with an uppercase ascii alpha
+ * DelimitedCase - delimited by a constant character, which is omitted from parsed tokens
+ * SnakeCase - implementation of DelimitedCase in which the delimiter is an underscore '_'
+ * KebabCase - implementation of DelimitedCase in which the delimiter is a hyphen '-'
+ *
* Note: This method should never produce empty tokens.
*
* No other restrictions are placed on token contents.
*
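A minimal, standalone sketch of the ASCII camel case rules described in the Javadoc above may help check the wording against concrete behavior. The class `AsciiCamelCaseSketch` and its `parse`/`format` methods are hypothetical illustrations, not the commons-text `CamelCase` implementation, and the sketch assumes ASCII-only case detection as this revision of the Javadoc describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of the ASCII camel case rules described above.
 * This is not the commons-text CamelCase class.
 */
final class AsciiCamelCaseSketch {

    private AsciiCamelCaseSketch() {
    }

    /** Starts a new token at every uppercase ASCII letter; that letter stays in the new token. */
    static List<String> parse(String s) {
        if (s.isEmpty() || s.charAt(0) < 'a' || s.charAt(0) > 'z') {
            throw new IllegalArgumentException("First character must be a lowercase ASCII letter");
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // The first character is lowercase, so current is never empty when a boundary is found.
            if (c >= 'A' && c <= 'Z') {
                tokens.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        tokens.add(current.toString());
        return tokens;
    }

    /** Lowercases the first token's first letter, uppercases later first letters, lowercases the rest. */
    static String format(Iterable<String> tokens) {
        StringBuilder out = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.isEmpty() || !isAsciiLetter(token.charAt(0))) {
                throw new IllegalArgumentException("Each token must be non-empty and start with an ASCII letter");
            }
            String head = token.substring(0, 1);
            out.append(first ? head.toLowerCase(Locale.ROOT) : head.toUpperCase(Locale.ROOT));
            out.append(token.substring(1).toLowerCase(Locale.ROOT));
            first = false;
        }
        return out.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

For ASCII input that satisfies the parse rules, `format(parse(s))` should return `s`, which is the round-trip property the `Case` Javadoc describes.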
diff --git a/src/main/java/org/apache/commons/text/cases/DelimitedCase.java b/src/main/java/org/apache/commons/text/cases/DelimitedCase.java
index 8070504ad2..f13988318e 100644
--- a/src/main/java/org/apache/commons/text/cases/DelimitedCase.java
+++ b/src/main/java/org/apache/commons/text/cases/DelimitedCase.java
@@ -39,7 +39,7 @@ public class DelimitedCase implements Case {
* Constructs a new Delimited Case.
* @param delimiter the character to use as both the parse and format delimiter
*/
- public DelimitedCase(char delimiter) {
+ protected DelimitedCase(char delimiter) {
this(new char[] { delimiter }, CharUtils.toString(delimiter));
}
@@ -48,7 +48,7 @@ public DelimitedCase(char delimiter) {
* @param parseDelimiters The array of delimiters to use when parsing
* @param formatDelimiter The delimiter to use when formatting
*/
- public DelimitedCase(char[] parseDelimiters, String formatDelimiter) {
+ protected DelimitedCase(char[] parseDelimiters, String formatDelimiter) {
super();
if (parseDelimiters == null || parseDelimiters.length == 0) {
throw new IllegalArgumentException("Parse Delimiters cannot be null or empty");
diff --git a/src/main/java/org/apache/commons/text/cases/KebabCase.java b/src/main/java/org/apache/commons/text/cases/KebabCase.java
index af1860e1f9..485774cd16 100644
--- a/src/main/java/org/apache/commons/text/cases/KebabCase.java
+++ b/src/main/java/org/apache/commons/text/cases/KebabCase.java
@@ -25,15 +25,15 @@
public class KebabCase extends DelimitedCase {
/** constant for delimiter. */
- public static final char DELIMITER = '-';
+ private static final char DELIMITER = '-';
- /** constant reuseable instance of this case. */
+ /** constant reusable instance of this case. */
public static final KebabCase INSTANCE = new KebabCase();
/**
* Constructs a new KebabCase instance.
*/
- public KebabCase() {
+ private KebabCase() {
super(DELIMITER);
}
diff --git a/src/main/java/org/apache/commons/text/cases/PascalCase.java b/src/main/java/org/apache/commons/text/cases/PascalCase.java
index c2e0fd92ba..1f6c9ebc43 100644
--- a/src/main/java/org/apache/commons/text/cases/PascalCase.java
+++ b/src/main/java/org/apache/commons/text/cases/PascalCase.java
@@ -38,7 +38,7 @@ public class PascalCase implements Case {
/**
* Constructs a new PascalCase instance.
*/
- public PascalCase() {
+ private PascalCase() {
}
/**
diff --git a/src/main/java/org/apache/commons/text/cases/SnakeCase.java b/src/main/java/org/apache/commons/text/cases/SnakeCase.java
index db14ac4cfb..63b0266ad0 100644
--- a/src/main/java/org/apache/commons/text/cases/SnakeCase.java
+++ b/src/main/java/org/apache/commons/text/cases/SnakeCase.java
@@ -25,15 +25,15 @@
public class SnakeCase extends DelimitedCase {
/** constant for delimiter. */
- public static final char DELIMITER = '_';
+ private static final char DELIMITER = '_';
- /** constant reuseable instance of this case. */
+ /** constant reusable instance of this case. */
public static final SnakeCase INSTANCE = new SnakeCase();
/**
* Constructs a new SnakeCase instance.
*/
- public SnakeCase() {
+ private SnakeCase() {
super(DELIMITER);
}
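The diff above narrows the constructors to `private`/`protected`, so callers go through the shared `INSTANCE` constants. A hypothetical, self-contained sketch of that pattern for a delimiter-based case follows. `DelimitedCaseSketch`, `KEBAB` and `SNAKE` are illustrative names, and the parse/format rules (delimiter omitted from tokens, empty tokens kept, tokens containing the delimiter rejected) are taken from the DelimitedCase Javadoc rather than from the library's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a delimiter-based case exposed only through shared constants. */
final class DelimitedCaseSketch {

    static final DelimitedCaseSketch KEBAB = new DelimitedCaseSketch('-');
    static final DelimitedCaseSketch SNAKE = new DelimitedCaseSketch('_');

    private final char delimiter;

    // Private: the constants above are the only instances.
    private DelimitedCaseSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    /** Splits on the delimiter, which is omitted from the tokens; empty tokens are preserved. */
    List<String> parse(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == delimiter) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }

    /** Joins tokens with the delimiter, rejecting any token that already contains it. */
    String format(Iterable<String> tokens) {
        StringBuilder out = new StringBuilder();
        int index = 0;
        for (String token : tokens) {
            if (token.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException("Token " + index + " contains delimiter '" + delimiter + "'");
            }
            if (index > 0) {
                out.append(delimiter);
            }
            out.append(token);
            index++;
        }
        return out.toString();
    }
}
```

Because the sketch holds no mutable state, sharing the two constants across threads is safe.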
From 67ccbb8921ceb0ddc8d628f289bd1c006e924942 Mon Sep 17 00:00:00 2001
From: "daniel.watson"
* Note: This method should never produce empty tokens.
*
* No other restrictions are placed on token contents.
*
* Iterates over tokens and creates a camel case formatted string. Each token must begin with an
- * ASCII letter, which will be forced uppercase in the output, except for the very first token,
+ * ASCII letter, which will be converted to uppercase in the output, except for the very first token,
* which will have a lowercase first character. The remaining characters in all tokens will be
- * forced lowercase. This Case does not support empty tokens.
+ * converted to lowercase. This Case does not support empty tokens.
* No other restrictions are placed on token contents.
*
- * This case separates tokens on uppercase ASCII alpha characters. Each token begins with an
- * uppercase ASCII alpha character, except the first token, which begins with a lowercase ASCII
- * alpha character.
+ * This case separates tokens on uppercase Unicode letter characters, according to the logic in {@link java.lang.Character#toUpperCase}
+ * and {@link java.lang.Character#toLowerCase} which should following the mapping present in
+ * the Unicode data file.
+ * Each token begins with an
+ * uppercase unicode letter, except the first token, which begins with a lowercase unicode letter character.
*
*/
public final class CamelCase implements Case {
@@ -45,15 +44,15 @@ private CamelCase() {
/**
* Parses string tokens from a Camel Case formatted string.
*
- * Parses each character of the string parameter and creates new tokens when uppercase ASCII
+ * Parses each character of the string parameter and creates new tokens when uppercase Unicode
* letters are encountered. The uppercase letter is considered part of the new token. The very
- * first character of the string is an exception to this rule and must be a lowercase ASCII
- * character. This method places no other restrictions on the content of the string.
+ * first character of the string is an exception to this rule and must be a lowercase Unicode
+ * letter. This method places no other restrictions on the content of the string.
* Note: This method should never produce empty tokens.
*
- * Iterates over tokens and creates a camel case formatted string. Each token must begin with an
- * ASCII letter, which will be converted to uppercase in the output, except for the very first token,
+ * Iterates over tokens and creates a camel case formatted string. Each token must begin with a
+ * Unicode lower/upper cased letter, which will be converted to uppercase in the output, except for the very first token,
* which will have a lowercase first character. The remaining characters in all tokens will be
* converted to lowercase. This Case does not support empty tokens.
* No other restrictions are placed on token contents.
*
* @param tokens string tokens to format into camel case
* @return camel case formatted string
- * @throws IllegalArgumentException if any tokens are empty String or do not begin with ASCII alpha characters
+ * @throws IllegalArgumentException if any tokens are empty String or do not begin with Unicode upper/lower letter characters
*/
@Override
public String format(Iterable
@@ -60,15 +57,15 @@ public List
- * This case separates tokens on uppercase Unicode letter characters, according to the logic in {@link java.lang.Character#toUpperCase}
- * and {@link java.lang.Character#toLowerCase} which should following the mapping present in
- * the Unicode data file.
- * Each token begins with an
- * uppercase unicode letter, except the first token, which begins with a lowercase unicode letter character.
+ * CamelCase is a case where tokens are delimited by upper case unicode characters. The very first
+ * token should begin with lower or non cased character, and any subsequent tokens begin with an
+ * upper case character. All remaining characters will be lower cased or non cased.
*
- * Parses each character of the string parameter and creates new tokens when uppercase Unicode
- * letters are encountered. The uppercase letter is considered part of the new token. The very
- * first character of the string is an exception to this rule and must be a lowercase Unicode
- * letter. This method places no other restrictions on the content of the string.
- * Iterates over tokens and creates a camel case formatted string. Each token must begin with a
- * Unicode lower/upper cased letter, which will be converted to uppercase in the output, except for the very first token,
- * which will have a lowercase first character. The remaining characters in all tokens will be
- * converted to lowercase. This Case does not support empty tokens.
- * No other restrictions are placed on token contents.
- *
- * PascalCase is a case where tokens are delimited by uppercase ASCII characters. Each parsed token
- * must begin with an uppercase character, but the case of the remaining token characters is
- * ignored and returned as-is.
+ * PascalCase is a case where tokens are delimited by upper case unicode characters. Each parsed token
+ * begins with an upper case character, and remaining token characters are either lower case or non cased.
*
- * String characters are iterated over and any time an upper case ASCII character is
- * encountered, that character is considered to be the start of a new token, with the character
- * itself included in the token. This method should never return empty tokens. The first
- * character of the string must be an uppercase ASCII character. No further restrictions are
- * placed on string contents.
- *
- * Iterates the tokens and formats each one into a Pascal Case token. The first character of
- * the token must be an ASCII alpha character. This character is forced upper case in the
- * output. The remaining alpha characters of the token are forced lowercase. Any other
- * characters in the token are returned as-is. Empty tokens are not supported.
- *
+ * String characters are iterated over and when an upper case unicode character is
+ * encountered, that character is considered to be the start of a new token, with the character
+ * itself included in the token. This method will never return empty tokens.
+ *
+ * Iterates the tokens and formats each one into a token where the first character of the token
+ * is forced upper case in the output. The remaining characters of the token will be lower case
+ * or non cased. Conversions to lower case are attempted and any conversion that is not possible
+ * throws an exception. Any other characters in the token are returned as-is. Empty tokens are
+ * not supported and will cause an exception to be thrown.
+ *
- * Note: This method should never produce empty tokens.
- *
* CamelCase is a case where tokens are delimited by upper case unicode characters. The very first
- * token should begin with lower or non cased character, and any subsequent tokens begin with an
+ * token should begin with a lower case character, and any subsequent tokens begin with an
* upper case character. All remaining characters will be lower cased or non cased.
*
*/
public final class CamelCase extends UpperCaseDelimitedCase {
- /** constant reusable instance of this case. */
+ /** Constant reusable instance of this case. */
public static final CamelCase INSTANCE = new CamelCase();
/**
diff --git a/src/main/java/org/apache/commons/text/cases/CharacterDelimitedCase.java b/src/main/java/org/apache/commons/text/cases/CharacterDelimitedCase.java
index 7f7fe51c6f..a3ed9d1048 100644
--- a/src/main/java/org/apache/commons/text/cases/CharacterDelimitedCase.java
+++ b/src/main/java/org/apache/commons/text/cases/CharacterDelimitedCase.java
@@ -29,10 +29,10 @@
*/
public class CharacterDelimitedCase implements Case {
- /** delimiters to be used when parsing. */
+ /** Delimiters to be used when parsing. */
private Set
- * PascalCase is a case where tokens are delimited by upper case unicode characters. Each parsed token
+ * PascalCase tokens are delimited by upper case unicode characters. Each parsed token
* begins with an upper case character, and remaining token characters are either lower case or non cased.
*
*/
public final class PascalCase extends UpperCaseDelimitedCase {
- /** constant reusable instance of this case. */
+ /** Constant reusable instance of this case. */
public static final PascalCase INSTANCE = new PascalCase();
/**
diff --git a/src/main/java/org/apache/commons/text/cases/SnakeCase.java b/src/main/java/org/apache/commons/text/cases/SnakeCase.java
index 4a33e2dce8..b6e1ae74d3 100644
--- a/src/main/java/org/apache/commons/text/cases/SnakeCase.java
+++ b/src/main/java/org/apache/commons/text/cases/SnakeCase.java
@@ -24,10 +24,10 @@
*/
public final class SnakeCase extends CharacterDelimitedCase {
- /** constant for delimiter. */
+ /** Constant for delimiter. */
private static final char DELIMITER = '_';
- /** constant reusable instance of this case. */
+ /** Constant reusable instance of this case. */
public static final SnakeCase INSTANCE = new SnakeCase();
/**
diff --git a/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java b/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java
index 17b313396e..17526daaa6 100644
--- a/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java
+++ b/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java
@@ -24,7 +24,7 @@
*/
public class UpperCaseDelimitedCase implements Case {
- /** flag to indicate whether the first character of the first token should be upper cased. */
+ /** Flag to indicate whether the first character of the first token should be upper cased. */
private boolean lowerCaseFirstCharacter = false;
/**
@@ -87,7 +87,7 @@ public List
- * CamelCase is a case where tokens are delimited by upper case unicode characters. The very first
+ * CamelCase is a case where tokens are delimited by upper case Unicode characters. The very first
* token should begin with a lower case character, and any subsequent tokens begin with an
- * upper case character. All remaining characters will be lower cased or non cased.
+ * upper case character. All remaining characters will be lower case or non cased.
*
*/
public final class CamelCase extends UpperCaseDelimitedCase {
diff --git a/src/main/java/org/apache/commons/text/cases/Case.java b/src/main/java/org/apache/commons/text/cases/Case.java
index 99b7f9a0ed..5f4f089452 100644
--- a/src/main/java/org/apache/commons/text/cases/Case.java
+++ b/src/main/java/org/apache/commons/text/cases/Case.java
@@ -19,8 +19,8 @@ import java.util.List;
/**
- * Handles formatting and parsing tokens to/from a String. For most implementations tokens returned
- * by the parse method should abide by any restrictions present in the format method. i.e. calling
+ * Formats and parses tokens to/from a String. In most implementations tokens returned
+ * by the parse method abide by any restrictions present in the format method. That is, calling
* format() with the results of a call to parse() on the same Case instance should return a
* matching String.
*
diff --git a/src/main/java/org/apache/commons/text/cases/PascalCase.java b/src/main/java/org/apache/commons/text/cases/PascalCase.java
index d3fe08190a..3d298cdb37 100644
--- a/src/main/java/org/apache/commons/text/cases/PascalCase.java
+++ b/src/main/java/org/apache/commons/text/cases/PascalCase.java
@@ -19,7 +19,7 @@
/**
* Case implementation which parses and formats strings of the form 'MyPascalString'
*
- * PascalCase tokens are delimited by upper case unicode characters. Each parsed token
+ * PascalCase tokens are delimited by upper case Unicode characters. Each parsed token
* begins with an upper case character, and remaining token characters are either lower case or non cased.
*
*/
diff --git a/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java b/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java
index 17526daaa6..d8764ecd43 100644
--- a/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java
+++ b/src/main/java/org/apache/commons/text/cases/UpperCaseDelimitedCase.java
@@ -24,7 +24,7 @@
*/
public class UpperCaseDelimitedCase implements Case {
- /** Flag to indicate whether the first character of the first token should be upper cased. */
+ /** Flag to indicate whether the first character of the first token should be lower case. */
private boolean lowerCaseFirstCharacter = false;
/**
@@ -37,9 +37,9 @@ protected UpperCaseDelimitedCase(boolean lowerCaseFirstCharacter) {
/**
* Parses a string into tokens.
*
- * String characters are iterated over and when an upper case unicode character is
- * encountered, that character is considered to be the start of a new token, with the character
- * itself included in the token. This method will never return empty tokens.
+ * String characters are iterated over and when an upper case Unicode character is
+ * encountered, that character starts a new token, with the character
+ * itself included in the token. This method never returns empty tokens.
*
* @param string the string to parse
* @return the list of tokens found in the string
@@ -90,8 +90,8 @@ public List
Provides algorithms for parsing and formatting various programming "Cases".
- *The provided implementations are for the four most common cases:
- * CamelCase - delimited by ascii uppercase alpha characters and always beginning with a lowercase ascii alpha
- * PascalCase - Similar to CamelCase but always begins with an uppercase ascii alpha
- * DelimitedCase - delimited by a constant character, which is omitted from parsed tokens
- * SnakeCase - implementation of DelimitedCase in which the delimiter is an underscore '_'
- * KebabCase - implementation of DelimitedCase in which the delimiter is a hyphen '-'
+ *
Two base classes are provided to hold functionality common to multiple cases:
+ * UpperCaseDelimitedCase - delimited by upper case characters.
+ * DelimitedCase - delimited by a constant character, which is omitted from parsed tokens.
+ * Four full implementations are provided for the most widely used cases:
+ * CamelCase - extension of UpperCaseDelimitedCase where the first character must be lower case.
+ * PascalCase - extension of UpperCaseDelimitedCase where the first character must be upper case.
+ * SnakeCase - extension of DelimitedCase in which the delimiter is an underscore '_'.
+ * KebabCase - extension of DelimitedCase in which the delimiter is a hyphen '-'.
*
- * Tokens are iterated on and appended to an output stream, with an instance of a
- * delimiter character between them. This method validates that the delimiter character is not
- * part of the token. If it is found within the token an exception is thrown.
- * No other restrictions are placed on the contents of the tokens.
- * Note: This Case does support empty tokens.
+ * Tokens are appended to a string, with a delimiter between them. This method
+ * validates that the delimiter character is not part of the token. If it is found within the
+ * token an exception is thrown.
+ * No other restrictions are placed on the contents of the tokens. Note: This Case does support
+ * empty tokens.
*
- * Input string is parsed one character at a time until a delimiter character is reached.
+ * Input string is parsed one character at a time until a delimiter is reached.
* When a delimiter character is reached a new token begins. The delimiter character is
* considered reserved, and is omitted from the returned parsed tokens.
+ * Thread-safe.
+ *
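A later revision of the Javadoc switches from ASCII to Unicode case detection. The sketch below illustrates one way to do that, assuming case is decided per code point with `Character.isUpperCase(int)`, `Character.toUpperCase(int)` and `Character.toLowerCase(int)`. The class name and the `lowerCaseFirstCharacter` parameter (modelled on the field of the same name in `UpperCaseDelimitedCase`) are hypothetical, and the sketch does not implement the "conversion not possible" exception the Javadoc mentions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of upper-case-delimited parsing and formatting using Unicode case rules. */
final class UnicodeUpperCaseSketch {

    private UnicodeUpperCaseSketch() {
    }

    /** Starts a new token at each upper case code point; never returns empty tokens. */
    static List<String> parse(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // An upper case code point closes the current token, unless nothing has been collected yet.
            if (Character.isUpperCase(cp) && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /**
     * Upper cases the first code point of each token (lower cases it for the first token when
     * lowerCaseFirstCharacter is true) and lower cases the rest; non cased characters map to themselves.
     */
    static String format(Iterable<String> tokens, boolean lowerCaseFirstCharacter) {
        StringBuilder out = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.isEmpty()) {
                throw new IllegalArgumentException("Empty tokens are not supported");
            }
            int head = token.codePointAt(0);
            out.appendCodePoint(first && lowerCaseFirstCharacter
                    ? Character.toLowerCase(head) : Character.toUpperCase(head));
            token.substring(Character.charCount(head)).codePoints()
                    .forEach(cp -> out.appendCodePoint(Character.toLowerCase(cp)));
            first = false;
        }
        return out.toString();
    }
}
```

Working on code points rather than `char` values keeps supplementary characters (surrogate pairs) from being split across tokens.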
From 992ac13b0d03ceeedce78857dd1a0ac1bd57a450 Mon Sep 17 00:00:00 2001
From: theshoeshiner <2922868+theshoeshiner@users.noreply.github.com>
Date: Tue, 28 Nov 2023 16:05:40 -0500
Subject: [PATCH 43/52] test omit delimiter flag
---
.../commons/text/StringTokenizerTest.java | 34 +++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/src/test/java/org/apache/commons/text/StringTokenizerTest.java b/src/test/java/org/apache/commons/text/StringTokenizerTest.java
index 458cc813f6..57f0ad2b39 100644
--- a/src/test/java/org/apache/commons/text/StringTokenizerTest.java
+++ b/src/test/java/org/apache/commons/text/StringTokenizerTest.java
@@ -375,6 +375,40 @@ public void testBasicIgnoreTrimmed4() {
assertFalse(tok.hasNext());
}
+ @Test
+ public void testOmitDelimiter1() {
+ final String input = "AbcDefGhi";
+ final StringTokenizer tok = new StringTokenizer(input, StringMatcherFactory.INSTANCE.uppercaseMatcher());
+ tok.setOmitDelimiterMatches(false);
+ assertEquals("Abc", tok.next());
+ assertEquals("Def", tok.next());
+ assertEquals("Ghi", tok.next());
+ assertFalse(tok.hasNext());
+ }
+
+ @Test
+ public void testOmitDelimiter2() {
+ final String input = "Abc:Def:Ghi";
+ final StringTokenizer tok = new StringTokenizer(input, ':');
+ tok.setOmitDelimiterMatches(false);
+ assertEquals("Abc", tok.next());
+ assertEquals(":Def", tok.next());
+ assertEquals(":Ghi", tok.next());
+ assertFalse(tok.hasNext());
+ }
+
+ @Test
+ public void testOmitDelimiter3() {
+ final String input = "Abc :Def :Ghi ";
+ final StringTokenizer tok = new StringTokenizer(input, ':');
+ tok.setTrimmerMatcher(StringMatcherFactory.INSTANCE.trimMatcher());
+ tok.setOmitDelimiterMatches(false);
+ assertEquals("Abc", tok.next());
+ assertEquals(":Def", tok.next());
+ assertEquals(":Ghi", tok.next());
+ assertFalse(tok.hasNext());
+ }
+
@Test
public void testBasicQuoted1() {
final String input = "a 'b' c";
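The new tests pin down what `setOmitDelimiterMatches(false)` is expected to do: the matched delimiter is kept as the first character of the following token instead of being discarded. The hypothetical helper `KeepDelimiterSketch.split` below reproduces the expectations of those three tests for single-character delimiters. It is not `StringTokenizer`'s algorithm, and it ignores quoting, trimming and the `StringMatcher` API (the caller trims in the third case).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch: split a string so that each delimiter match is kept as the first
 * character of the token that follows it.
 */
final class KeepDelimiterSketch {

    private KeepDelimiterSketch() {
    }

    static List<String> split(String input, IntPredicate isDelimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        // Scanning from index 1 means a delimiter at index 0 never produces an empty leading token.
        for (int i = 1; i < input.length(); i++) {
            if (isDelimiter.test(input.charAt(i))) {
                tokens.add(input.substring(start, i));
                start = i; // the delimiter becomes the first character of the next token
            }
        }
        if (start < input.length()) {
            tokens.add(input.substring(start));
        }
        return tokens;
    }
}
```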
From a50975ca245e7f1b1db814d88c7c198f14e760b6 Mon Sep 17 00:00:00 2001
From: theshoeshiner <2922868+theshoeshiner@users.noreply.github.com>
Date: Tue, 28 Nov 2023 17:43:48 -0500
Subject: [PATCH 44/52] converted cases api to use StringTokenizer and
TokenStringifier logic
---
.../apache/commons/text/TokenFormatter.java | 5 +
.../commons/text/TokenFormatterFactory.java | 114 ++++++++++++++++
.../apache/commons/text/TokenStringifier.java | 81 ++++++++++++
.../apache/commons/text/cases/CamelCase.java | 7 +-
.../text/cases/CharacterDelimitedCase.java | 125 +++---------------
.../text/cases/PascalTokenFormatter.java | 103 +++++++++++++++
.../text/cases/UpperCaseDelimitedCase.java | 120 +++--------------
.../apache/commons/text/cases/CasesTest.java | 74 +++++++++--
8 files changed, 407 insertions(+), 222 deletions(-)
create mode 100644 src/main/java/org/apache/commons/text/TokenFormatter.java
create mode 100644 src/main/java/org/apache/commons/text/TokenFormatterFactory.java
create mode 100644 src/main/java/org/apache/commons/text/TokenStringifier.java
create mode 100644 src/main/java/org/apache/commons/text/cases/PascalTokenFormatter.java
diff --git a/src/main/java/org/apache/commons/text/TokenFormatter.java b/src/main/java/org/apache/commons/text/TokenFormatter.java
new file mode 100644
index 0000000000..5dd90613c0
--- /dev/null
+++ b/src/main/java/org/apache/commons/text/TokenFormatter.java
@@ -0,0 +1,5 @@
+package org.apache.commons.text;
+
+public interface TokenFormatter {
+ String format(char[] prior, int tokenIndex, char[] token);
+}
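`TokenFormatter` has a single abstract method, so an implementation can be a lambda. The hypothetical sketch below uses a local copy of the interface (`SketchTokenFormatter`, same method signature) so it compiles on its own. The meaning of `prior` is not documented in the diff, and the sketch assumes it is the previous raw token (or `null` for the first token).

```java
import java.util.List;

/** Local stand-in with the same method signature as TokenFormatter in the diff above. */
interface SketchTokenFormatter {
    String format(char[] prior, int tokenIndex, char[] token);
}

/** Hypothetical helpers showing TokenFormatter-style lambdas. */
final class SketchTokenFormatters {

    private SketchTokenFormatters() {
    }

    /** Upper cases the first character and lower cases the rest; prior is ignored. */
    static final SketchTokenFormatter CAPITALIZE = (prior, tokenIndex, token) -> {
        if (token.length == 0) {
            throw new IllegalArgumentException("Token " + tokenIndex + " is empty");
        }
        char[] out = new char[token.length];
        out[0] = Character.toUpperCase(token[0]);
        for (int i = 1; i < token.length; i++) {
            out[i] = Character.toLowerCase(token[i]);
        }
        return new String(out);
    };

    /** Formats each token in order, passing the previous raw token as prior (an assumption). */
    static String join(List<String> tokens, SketchTokenFormatter formatter) {
        StringBuilder sb = new StringBuilder();
        char[] prior = null;
        for (int i = 0; i < tokens.size(); i++) {
            char[] token = tokens.get(i).toCharArray();
            sb.append(formatter.format(prior, i, token));
            prior = token;
        }
        return sb.toString();
    }
}
```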
diff --git a/src/main/java/org/apache/commons/text/TokenFormatterFactory.java b/src/main/java/org/apache/commons/text/TokenFormatterFactory.java
new file mode 100644
index 0000000000..e3789a0077
--- /dev/null
+++ b/src/main/java/org/apache/commons/text/TokenFormatterFactory.java
@@ -0,0 +1,114 @@
+package org.apache.commons.text;
+
+import org.apache.commons.lang3.StringUtils;
+
+public class TokenFormatterFactory {
+
+ /**
+ * Token formatter that returns the token as is.
+ */
+ public static class NoOpFormatter implements TokenFormatter {
+ @Override
+ public String format(char[] prior, int tokenIndex, char[] token) {
+ return new String(token);
+ }
+
+ }
+
+ /**
+ * Token formatter that always returns a constant string, and optionally checks the passed in token
+ * for the constant and throws an error when found.
+ */
+ public static class ConstantTokenFormatter implements TokenFormatter {
+
+ /**
+ * The constant to return.
+ */
+ private char[] constant;
+
+ /**
+ * Whether or not to throw an exception if the constant is found.
+ */
+ private boolean failOnConstantFound = true;
+
+ public ConstantTokenFormatter(char constant) {
+ this(new char[] {constant}, true);
+ }
+
+ public ConstantTokenFormatter(char constant, boolean failOnConstantFound) {
+ this(new char[] {constant}, failOnConstantFound);
+ }
+
+ public ConstantTokenFormatter(String constant) {
+ this(constant, true);
+ }
+
+ public ConstantTokenFormatter(String constant, boolean failOnConstantFound) {
+ this(constant.toCharArray(), failOnConstantFound);
+ }
+
+ public ConstantTokenFormatter(char[] constant, boolean failOnConstantFound) {
+ this.constant = constant;
+ this.failOnConstantFound = failOnConstantFound;
+ }
+
+ @Override
+ public String format(char[] prior, int tokenIndex, char[] token) {
+ if (failOnConstantFound && constant.length > 0) {
+ // Only try start positions where the whole constant still fits inside the token.
+ for (int i = 0; i <= token.length - constant.length; i++) {
+ boolean match = true;
+ for (int j = 0; j < constant.length; j++) {
+ if (token[i + j] != constant[j]) {
+ match = false;
+ break;
+ }
+ }
+ if (match) {
+ throw new IllegalArgumentException("Token " + tokenIndex + " contains illegal character '" + new String(constant) + "' at index " + i);
+ }
+ }
+ }
+
+ return new String(constant);
+ }
+
+ /**
+ * Sets whether to throw an exception when the token contains the constant.
+ * @param failOnConstantFound whether to fail when the constant is found.
+ */
+ public void setFailOnConstantFound(boolean failOnConstantFound) {
+ this.failOnConstantFound = failOnConstantFound;
+ }
+
+ }
+
+ /**
+ * Reusable NoOpFormatter instance.
+ */
+ private static final NoOpFormatter NOOP_FORMATTER = new NoOpFormatter();
+
+ /**
+ * Reusable Empty String formatter instance.
+ */
+ private static final ConstantTokenFormatter EMPTY_STRING_FORMATTER = new ConstantTokenFormatter(StringUtils.EMPTY, false);
+
+ public static NoOpFormatter noOpFormatter() {
+ return NOOP_FORMATTER;
+ }
+
+ public static ConstantTokenFormatter constantFormatter(char[] constant, boolean failOnConstant) {
+ return new ConstantTokenFormatter(constant, failOnConstant);
+ }
+
+ public static ConstantTokenFormatter constantFormatter(char constant, boolean failOnConstant) {
+ return new ConstantTokenFormatter(constant, failOnConstant);
+ }
+
+ public static ConstantTokenFormatter emptyFormatter() {
+ return EMPTY_STRING_FORMATTER;
+ }
+}
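The constant check in `ConstantTokenFormatter.format` is a naive substring search. The standalone sketch below (a hypothetical `ConstantScanSketch.indexOf` helper, not part of the diff) bounds the outer loop by `token.length - constant.length` and reports the start index of the match. Without such a bound, a token that ends with a prefix of the constant (for example `"ab"` checked against `"bc"`) reads past the end of the array and throws `ArrayIndexOutOfBoundsException`.

```java
/** Hypothetical helper mirroring the constant check in ConstantTokenFormatter.format. */
final class ConstantScanSketch {

    private ConstantScanSketch() {
    }

    /** Returns the first index at which constant occurs in token, or -1; an empty constant never matches. */
    static int indexOf(char[] token, char[] constant) {
        if (constant.length == 0) {
            return -1;
        }
        // Only start positions where the whole constant fits are tried, so token[i + j] stays in bounds.
        for (int i = 0; i <= token.length - constant.length; i++) {
            int j = 0;
            while (j < constant.length && token[i + j] == constant[j]) {
                j++;
            }
            if (j == constant.length) {
                return i;
            }
        }
        return -1;
    }
}
```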
diff --git a/src/main/java/org/apache/commons/text/TokenStringifier.java b/src/main/java/org/apache/commons/text/TokenStringifier.java
new file mode 100644
index 0000000000..424b277e6b
--- /dev/null
+++ b/src/main/java/org/apache/commons/text/TokenStringifier.java
@@ -0,0 +1,81 @@
+package org.apache.commons.text;
+
+/**
+ * Takes a collection of String tokens and combines them into a single String.
+ *
+ * This class functions as the inverse of {@link org.apache.commons.text.StringTokenizer}. All tokens are formatted
+ * by a {@link TokenFormatter}, which allows fine-grained control over the final output.
+ *
- * Tokens are appended to a string, with a delimiter between them. This method
- * validates that the delimiter character is not part of the token. If it is found within the
- * token an exception is thrown.
- * Input string is parsed one character at a time until a delimiter is reached.
- * When a delimiter character is reached a new token begins. The delimiter character is
- * considered reserved, and is omitted from the returned parsed tokens.
+ * Thread-safe.
+ *
* No other restrictions are placed on the contents of the input string.
@@ -133,9 +133,9 @@ public List
- * No other restrictions are placed on the contents of the tokens. Note: This Case does support
- * empty tokens.
- *
- * No other restrictions are placed on the contents of the input string.
- *
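`TokenStringifier` is described as the inverse of `StringTokenizer`: tokens are combined into one String, with a `TokenFormatter` controlling the output. Because the class body is not shown here, the sketch below is only an assumption about how such a class could combine a delimiter formatter with a token formatter. `StringifierSketch`, `NO_OP` and `constant` are hypothetical stand-ins modelled on `NoOpFormatter` and `ConstantTokenFormatter`.

```java
import java.util.List;

/** Hypothetical sketch of a "stringifier" that combines tokens into one String. */
final class StringifierSketch {

    /** Local stand-in with the same method signature as TokenFormatter in the diff. */
    interface Formatter {
        String format(char[] prior, int tokenIndex, char[] token);
    }

    /** Returns each token unchanged, like NoOpFormatter. */
    static final Formatter NO_OP = (prior, tokenIndex, token) -> new String(token);

    private final Formatter delimiterFormatter;
    private final Formatter tokenFormatter;

    StringifierSketch(Formatter delimiterFormatter, Formatter tokenFormatter) {
        this.delimiterFormatter = delimiterFormatter;
        this.tokenFormatter = tokenFormatter;
    }

    /** Always emits c and rejects any token containing c, like a failing ConstantTokenFormatter. */
    static Formatter constant(char c) {
        return (prior, tokenIndex, token) -> {
            for (char t : token) {
                if (t == c) {
                    throw new IllegalArgumentException("Token " + tokenIndex + " contains '" + c + "'");
                }
            }
            return String.valueOf(c);
        };
    }

    String stringify(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        char[] prior = null; // assumption: prior is the previous raw token, null before the first
        for (int i = 0; i < tokens.size(); i++) {
            char[] token = tokens.get(i).toCharArray();
            // Run the delimiter formatter for every token so each one is validated,
            // but only emit its output between tokens.
            String delimiter = delimiterFormatter.format(prior, i, token);
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(tokenFormatter.format(prior, i, token));
            prior = token;
        }
        return sb.toString();
    }
}
```

Calling the delimiter formatter for every token, including the first, lets a failing constant formatter reject a delimiter that appears in any token.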