org.daiitech.naftah.utils.script.ScriptUtils

public final class ScriptUtils extends Object

Utility class providing various methods for handling Arabic text processing, including text shaping, bidi reordering, transliteration, diacritics removal, padding for terminal display, and detection of Arabic characters.

This class is designed as a final utility class and cannot be instantiated.

It uses ICU4J ArabicShaping and Transliterator for shaping and transliteration.

Author: Chakib Daii

Field Summary

Fields

Modifier and Type

Field

Description

static final String

ANSI_ESCAPE

ANSI escape sequence to clear the screen.

static final String

ARABIC_DIACRITICS_REGEX

Regular expression matching Arabic diacritic marks in Unicode.

static final String

ARABIC_LANGUAGE

Language code for Arabic.

private static final char[]

ARABIC_LETTERS

Arabic alphabet letters used for transliteration to Latin letters.

static final Locale

ARABIC_LOCALE

Locale instance representing Arabic language.

static String

CUSTOM_RULES

Custom transliteration rules defined as a multi-line string.

static final ResourceBundle

CUSTOM_RULES_BUNDLE

ResourceBundle loaded with custom transliteration rules for Arabic.

static final Set<String>

CUSTOM_RULES_KEYS

The key set of the custom transliteration rules for Arabic.

static final String

DEFAULT_ARABIC_LANGUAGE_COUNTRY

Default country code used in Arabic locale.

private static final Set<String>

ICU_RESERVED_WORDS

A set of reserved words used by the ICU (International Components for Unicode) transliteration and normalization APIs.

private static final String

IDENTIFIER_SPLIT_REGEX

Regular expression used to split identifiers into components based on transitions between uppercase letters, digits, and lowercase letters.

static final String

LATIN_ARABIC_TRANSLITERATION_ID

ICU Transliterator ID for Latin-to-Arabic and Arabic-to-Latin transliteration.

private static final char[]

LATIN_LETTERS

Latin uppercase letters used as transliteration equivalents for Arabic letters.

static final String

LTR_DIRECTION

Escape code to set Left-To-Right (LTR) text direction in compatible terminals.

static ThreadLocal<com.ibm.icu.text.NumberFormat>

NUMBER_FORMAT

A reusable NumberFormat instance configured for the Arabic locale.

static final String

RTL_DIRECTION

Escape code to set Right-To-Left (RTL) text direction in compatible terminals.

private static Boolean

SHOULD_RESHAPE

Cached flag indicating whether Arabic text reshaping should be applied for the current environment.

private static Map<String,Matcher>

TEXT_MATCHER_CACHE

Cache of precompiled Matcher instances for text processing, keyed by the input text string.

static final Pattern

TEXT_MULTILINE_PATTERN

Pattern to detect lines in multiline text, capturing line content and newline characters.
Constructor Summary

Constructors

Modifier

Constructor

Description

private

ScriptUtils()

Private constructor to prevent instantiation.
Method Summary

Modifier and Type

Method

Description

private static String

addPadding(StringBuilder inputSb, int terminalWidth)

Adds padding spaces to the given StringBuilder input to align the text to the specified terminal width.

private static String

addPadding(String input, int padding)

Adds padding spaces to the left or right of the input to reach the specified padding length.

static String

applyBiFunction(String input, boolean print, ThrowingBiFunction<String,Boolean,String> function)

Applies a bi-function to each line in the input text.

static String

applyFunction(String input, ThrowingFunction<String,String> function)

Applies a function to each line in the input text.

private static List<String>

chunk(String input, int size)

Splits the given string into consecutive substrings of the specified size.

static boolean

containsArabicLetters(String text)

Checks if the given text contains any Arabic characters.

static String

convertArabicToLatinLetterByLetter(String text)

Converts an input string from Arabic characters and digits to their Latin and Ascii equivalents.

private static String

doPadText(String input, boolean print)

Pads the input text to align within the terminal width, adjusting for overflow.

private static String

doPadText(String input, int terminalWidth, boolean print)

Pads the input text to fit the specified terminal width, splitting it into multiple lines if necessary.

private static StringBuilder

doPadText(List<String> lines, String word, StringBuilder currentLine, int terminalWidth, boolean print)

Splits a list of words into lines that fit the terminal width, adding padding if needed.

private static String

doShape(String input)

Performs Arabic shaping and bidirectional reordering on a single input line.

static List<com.ibm.icu.impl.Pair<String,String>>

getRawHexBytes(char[] charArray)

Returns a list of pairs representing the Unicode code points (in hex) and characters from the given character array.

static List<com.ibm.icu.impl.Pair<String,String>>

getRawHexBytes(String text)

Converts the given String into a list of pairs, where each pair contains the Unicode hexadecimal representation of a character and the character itself.

private static Matcher

getTextMatcher(String input)

Retrieves a cached Matcher for the given input string using the TEXT_MULTILINE_PATTERN pattern.

static boolean

isArabicChar(int cp)

Checks if the given Unicode code point belongs to the Arabic Unicode script.

static boolean

isArabicCharCp(int cp)

Checks if the given Unicode code point is an Arabic character.

static boolean

isArabicIndicDigit(char ch)

Checks whether a character is an Arabic-Indic digit (٠ to ٩).

static boolean

isArabicText(String text)

Checks if the given text consists entirely of Arabic characters.

static boolean

isAsciiDigit(int ch)

Checks whether a character is an ASCII digit (0-9).

static boolean

isLatinLetter(char ch)

Checks whether a character is a Latin letter (A-Z or a-z).

static boolean

isMultiline(String input)

Checks if the given input string contains multiple lines.

static String

numberToString(Number number)

Converts a Number into a string using formatting rules, replacing the standard Ascii decimal separator with a comma (U+066C), and optionally converting Ascii digits (0–9) to Arabic-Indic digits (٠–٩).

static String

padText(String input, boolean print)

Pads the input text to align it within the terminal width.

static Map<String,String>

parseRules(String rules)

Parses a set of transformation rules from a string into a map.

static String

removeDiacritics(String text)

Removes Arabic diacritic marks from the given Arabic text.

static String

shape(String input)

Applies Arabic shaping and bidirectional reordering to the input text.

static boolean

shouldReshape()

Determines whether Arabic text reshaping should be applied for the current runtime environment.

static List<String>

splitIdentifier(String input)

Splits an identifier string into constituent parts based on various naming conventions.

static String

transliterateScript(com.ibm.icu.text.Transliterator transliterator, boolean removeDiacritics, String word)

Transliterates a single word using the given Transliterator.

static String[]

transliterateScript(String transliteratorID, boolean removeDiacritics, String customRules, String... text)

Transliterates the given text(s) from Latin script to Arabic or vice versa, using the specified ICU Transliterator ID and optional custom rules.

static String[]

transliterateScript(String transliteratorID, String... text)

Transliterates one or more strings using the specified transliterator ID.

static String[]

transliterateScript(String transliteratorID, String customRules, String... text)

Transliterates one or more strings using the specified transliterator ID and custom rules.

static String

transliterateScriptLetterByLetter(String transliteratorID, String textInput)

Transliterates the input text letter by letter using the specified transliterator ID.

static String[]

transliterateToArabicScript(boolean removeDiacritics, String... text)

Transliterates one or more strings to Arabic script.

static String[]

transliterateToArabicScript(boolean removeDiacritics, String customRules, String... text)

Transliterates one or more strings to Arabic script using provided custom rules.

static String[]

transliterateToArabicScript(String... text)

Transliterates one or more strings to Arabic script.

static String[]

transliterateToArabicScript(String customRules, String... text)

Transliterates one or more strings to Arabic script using the provided custom rules.

static String[]

transliterateToArabicScriptDefault(boolean removeDiacritics, String... text)

Transliterates one or more strings to Arabic script using default custom rules.

static String[]

transliterateToArabicScriptDefault(String... text)

Transliterates one or more strings to Arabic script using default custom rules.

static String

transliterateToArabicScriptLetterByLetter(String text)

Transliterates the given text to Arabic script letter by letter.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- RTL_DIRECTION
  
  public static final String RTL_DIRECTION
  
  Escape code to set Right-To-Left (RTL) text direction in compatible terminals.
  See Also:
  
  Constant Field Values
- LTR_DIRECTION
  
  public static final String LTR_DIRECTION
  
  Escape code to set Left-To-Right (LTR) text direction in compatible terminals.
  See Also:
  
  Constant Field Values
- ARABIC_DIACRITICS_REGEX
  
  public static final String ARABIC_DIACRITICS_REGEX
  
  Regular expression matching Arabic diacritic marks in Unicode.
  See Also:
  
  Constant Field Values
- ANSI_ESCAPE
  
  public static final String ANSI_ESCAPE
  
  ANSI escape sequence to clear the screen.
  See Also:
  
  Constant Field Values
- LATIN_ARABIC_TRANSLITERATION_ID
  
  public static final String LATIN_ARABIC_TRANSLITERATION_ID
  
  ICU Transliterator ID for Latin-to-Arabic and Arabic-to-Latin transliteration.
  See Also:
  
  Constant Field Values
- ARABIC_LANGUAGE
  
  public static final String ARABIC_LANGUAGE
  
  Language code for Arabic.
  See Also:
  
  Constant Field Values
- DEFAULT_ARABIC_LANGUAGE_COUNTRY
  
  public static final String DEFAULT_ARABIC_LANGUAGE_COUNTRY
  
  Default country code used in Arabic locale.
  See Also:
  
  Constant Field Values
- ARABIC_LOCALE
  
  public static final Locale ARABIC_LOCALE
  
  Locale instance representing Arabic language.
- CUSTOM_RULES_BUNDLE
  
  public static final ResourceBundle CUSTOM_RULES_BUNDLE
  
  ResourceBundle loaded with custom transliteration rules for Arabic.
- TEXT_MULTILINE_PATTERN
  
  public static final Pattern TEXT_MULTILINE_PATTERN
  
  Pattern to detect lines in multiline text, capturing line content and newline characters.
- CUSTOM_RULES_KEYS
  
  public static final Set<String> CUSTOM_RULES_KEYS
  
  The key set of the custom transliteration rules for Arabic.
- IDENTIFIER_SPLIT_REGEX
  
  private static final String IDENTIFIER_SPLIT_REGEX
  Regular expression used to split identifiers into components based on transitions between uppercase letters, digits, and lowercase letters.
  For example:
  
  "JSONTo" → "JSON", "To"
  
  "userAccount" → "user", "Account"
  
  "IPv6" → "IPv", "6"
  
  "6Parser" → "6", "Parser"
  See Also:
  
  Constant Field Values
- ARABIC_LETTERS
  
  private static final char[] ARABIC_LETTERS
  Arabic alphabet letters used for transliteration to Latin letters.
  The characters are mapped positionally (index by index) to uppercase Latin letters. This list includes 26 Arabic letters starting from 'ا' to 'ه', and is intended to be used for character-by-character mapping to Ascii base encoding (e.g., base 11 to base 36 systems).
  Examples of mapping:
  
  'ا' → 'A'
  
  'ب' → 'B'
  
  'ت' → 'C'
  ...
  'ه' → 'Z'
- LATIN_LETTERS
  
  private static final char[] LATIN_LETTERS
  Latin uppercase letters used as transliteration equivalents for Arabic letters.
  Each letter corresponds to an Arabic letter by position in the ARABIC_LETTERS array. This mapping supports systems like base-36 encodings or custom symbolic notations using Arabic letters.
  Examples of mapping:
  
  'A' → 'ا'
  
  'B' → 'ب'
  
  'C' → 'ت'
  ...
  'Z' → 'ه'
- ICU_RESERVED_WORDS
  
  private static final Set<String> ICU_RESERVED_WORDS
  A set of reserved words used by the ICU (International Components for Unicode) transliteration and normalization APIs. These words have special meaning in ICU transliteration rules and Unicode transformations.
  Examples of usage contexts include:
  
  Transliteration rule syntax (e.g., "::NFD;" or "::Latin-ASCII;")
  
  Normalization forms (e.g., "NFC", "NFD", "NFKC", "NFKD")
  
  Unicode script and block identifiers (e.g., "Latin", "Greek", "Han")
  
  Keywords in rule definitions (e.g., "use", "import", "function")
  
  This set can be used to:
  
  Validate user-defined transliteration rules
  
  Highlight or flag reserved words in editors or tools
  
  Prevent conflicts in custom ICU rule definitions
  See Also:
  
  Transliterator
  
  Normalizer2
  
  ICU Transliteration Guide
- NUMBER_FORMAT
  
  public static volatile ThreadLocal<com.ibm.icu.text.NumberFormat> NUMBER_FORMAT
  
  A reusable NumberFormat instance configured for the Arabic locale.
  This formatter uses Arabic locale conventions for decimal and grouping separators, and may render numbers using Arabic-Indic digits (e.g., ٠١٢٣٤٥٦٧٨٩), depending on JVM settings and font support.
  Note: NumberFormat instances are not thread-safe. If this formatter is used across multiple threads, synchronize access or create a new instance via NumberFormat.getNumberInstance(ARABIC).
  See Also:
  
  Locale.forLanguageTag(String)
  
  NumberFormat.getNumberInstance(Locale)
- CUSTOM_RULES
  
  public static String CUSTOM_RULES
  
  Custom transliteration rules defined as a multi-line string. Each rule maps Latin script sequences to their corresponding Arabic script sequences. For example, "com > كوم" transliterates "com" to Arabic "كوم".
- TEXT_MATCHER_CACHE
  
  private static Map<String,Matcher> TEXT_MATCHER_CACHE
  
  Cache of precompiled Matcher instances for text processing, keyed by the input text string. Used to improve performance by avoiding repeated compilation of patterns.
- SHOULD_RESHAPE
  
  private static Boolean SHOULD_RESHAPE
  
  Cached flag indicating whether Arabic text reshaping should be applied for the current environment.
Constructor Details
- ScriptUtils
  
  private ScriptUtils()
  
  Private constructor to prevent instantiation. Always throws a NaftahBugError when called.
Method Details
- parseRules
  
  public static Map<String,String> parseRules(String rules)
  Parses a set of transformation rules from a string into a map.
  The input string should contain one rule per line in the format:
  source > target;
  
  Each line:
  
  Is stripped of leading/trailing whitespace
  
  Ignores empty lines
  
  Removes trailing semicolons
  
  Splits on the first occurrence of the '>' character
  
  Example input:
  a > b;
  c > d;
  
  Will result in a map:
  { "a" -> "b", "c" -> "d" }
  Parameters:
  
  rules - A string containing one or more transformation rules separated by newlines
  
  Returns:
  
  A map of source-to-target transformations
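The parsing steps listed above can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not the library's implementation: the class name `RuleParserSketch` is hypothetical, and skipping lines that contain no '>' is an assumption the description does not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RuleParserSketch {

    // Parses "source > target;" lines into an insertion-ordered map.
    static Map<String, String> parseRules(String rules) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String rawLine : rules.split("\\R")) {
            String line = rawLine.trim();          // strip leading/trailing whitespace
            if (line.isEmpty()) {
                continue;                          // ignore empty lines
            }
            while (line.endsWith(";")) {           // remove trailing semicolons
                line = line.substring(0, line.length() - 1).trim();
            }
            int arrow = line.indexOf('>');         // first occurrence of '>'
            if (arrow < 0) {
                continue;                          // assumption: lines without '>' are skipped
            }
            result.put(line.substring(0, arrow).trim(),
                       line.substring(arrow + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseRules("a > b;\nc > d;")); // prints {a=b, c=d}
    }
}
```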
- isMultiline
  
  public static boolean isMultiline(String input)
  
  Checks if the given input string contains multiple lines.
  
  Parameters:
  
  input - the input string to check
  
  Returns:
  
  true if the input contains one or more newline characters; false otherwise
- getTextMatcher
  
  private static Matcher getTextMatcher(String input)
  
  Retrieves a cached Matcher for the given input string using the TEXT_MULTILINE_PATTERN pattern. If a matcher for the input already exists in the cache, it is reset and returned; otherwise, a new matcher is created, cached, reset, and returned.
  This caching mechanism improves performance by reusing matcher instances for repeated input strings.
  
  Parameters:
  
  input - the input string to create or retrieve a matcher for
  
  Returns:
  
  a reset Matcher instance ready for matching against the input
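A cache of this kind might look like the following sketch. The pattern is a hypothetical stand-in for TEXT_MULTILINE_PATTERN (its actual value is not shown on this page), and `MatcherCacheSketch` is an illustrative name. Matcher instances are not thread-safe, so this sketch assumes single-threaded use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherCacheSketch {

    // Hypothetical stand-in: group 1 is the line content, group 2 the line break (if any).
    private static final Pattern LINE = Pattern.compile("([^\\r\\n]*)(\\r\\n|\\r|\\n)?");

    private static final Map<String, Matcher> CACHE = new HashMap<>();

    // Returns the cached matcher for the input, rewound to the start of the text.
    static Matcher matcherFor(String input) {
        Matcher matcher = CACHE.computeIfAbsent(input, LINE::matcher);
        return matcher.reset();
    }

    public static void main(String[] args) {
        Matcher first = matcherFor("x\ny");
        first.find();
        Matcher second = matcherFor("x\ny");
        System.out.println(first == second); // same instance, reset for reuse
    }
}
```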
- applyBiFunction
  
  public static String applyBiFunction(String input, boolean print, ThrowingBiFunction<String,Boolean,String> function)
  
  Applies a bi-function to each line in the input text.
  If the input is multiline, applies the function to each line individually, preserving line separators. Otherwise, applies the function once to the whole input.
  
  Parameters:
  
  input - the input text (possibly multiline)
  
  print - if true, the result is printed to the console; if false, the result is returned
  
  function - a bi-function taking a line and the print flag, returning the processed line
  
  Returns:
  
  the processed text if print is false; otherwise, null
- applyFunction
  
  public static String applyFunction(String input, ThrowingFunction<String,String> function)
  
  Applies a function to each line in the input text.
  If the input is multiline, applies the function to each line individually, preserving line separators. Otherwise, applies the function once to the whole input.
  
  Parameters:
  
  input - the input text (possibly multiline)
  
  function - a function taking a line and returning the processed line
  
  Returns:
  
  the processed text with all lines processed by the function
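The per-line behaviour described above can be approximated as follows. This is a sketch, not the library's code: it uses a plain `UnaryOperator<String>` in place of ThrowingFunction, and the line pattern is an assumed equivalent of TEXT_MULTILINE_PATTERN.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineMapperSketch {

    // Assumed pattern: group 1 is the line content, group 2 the line separator (if any).
    private static final Pattern LINE = Pattern.compile("([^\\r\\n]*)(\\r\\n|\\r|\\n)?");

    static String applyPerLine(String input, UnaryOperator<String> function) {
        boolean multiline = input.indexOf('\n') >= 0 || input.indexOf('\r') >= 0;
        if (!multiline) {
            return function.apply(input);      // single line: apply once to the whole input
        }
        StringBuilder out = new StringBuilder();
        Matcher matcher = LINE.matcher(input);
        while (matcher.find()) {
            String line = matcher.group(1);
            String separator = matcher.group(2);
            if (line.isEmpty() && separator == null) {
                break;                         // empty match at the end of the input
            }
            out.append(function.apply(line));
            if (separator != null) {
                out.append(separator);         // preserve the original line separator
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(applyPerLine("ab\ncd", String::toUpperCase));
    }
}
```

Keeping the separator in its own capture group lets mixed "\n" and "\r\n" input round-trip unchanged.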
- shape
  
  public static String shape(String input)
  
  Applies Arabic shaping and bidirectional reordering to the input text.
  
  Parameters:
  
  input - the input Arabic text
  
  Returns:
  
  the shaped and reordered text suitable for visual rendering in terminals
- doShape
  
  private static String doShape(String input) throws com.ibm.icu.text.ArabicShapingException
  
  Performs Arabic shaping and bidirectional reordering on a single input line.
  
  Parameters:
  
  input - the input Arabic text
  
  Returns:
  
  the shaped and reordered text
  
  Throws:
  
  com.ibm.icu.text.ArabicShapingException - if an error occurs during shaping
- padText
  
  public static String padText(String input, boolean print)
  
  Pads the input text to align it within the terminal width.
  If print is true, prints the padded text; otherwise, returns it.
  
  Parameters:
  
  input - the input text to pad
  
  print - if true, print the padded text; else return it
  
  Returns:
  
  the padded text if print is false; otherwise null
- doPadText
  
  private static String doPadText(String input, boolean print)
  
  Pads the input text to align within the terminal width, adjusting for overflow.
  
  Parameters:
  
  input - the input text to pad
  
  print - if true, prints the padded lines; else returns them as a single string
  
  Returns:
  
  the padded text if print is false; otherwise null
- doPadText
  
  private static StringBuilder doPadText(List<String> lines, String word, StringBuilder currentLine, int terminalWidth, boolean print)
  
  Splits a list of words into lines that fit the terminal width, adding padding if needed. If printing is enabled, lines are printed directly to the console; otherwise, they are collected in a list.
  
  Parameters:
  
  lines - the list to store padded lines (ignored if printing)
  
  word - the current word to add
  
  currentLine - the StringBuilder holding the current line
  
  terminalWidth - the width of the terminal for padding
  
  print - whether to print lines immediately or store in list
  
  Returns:
  
  a new StringBuilder starting with the current word for the next line
- doPadText
  
  private static String doPadText(String input, int terminalWidth, boolean print)
  
  Pads the input text to fit the specified terminal width, splitting it into multiple lines if necessary. Lines are either printed directly or returned as a joined string depending on the print flag.
  
  Parameters:
  
  input - the input text to pad
  
  terminalWidth - the width of the terminal
  
  print - if true, prints padded lines; otherwise returns them as a single string
  
  Returns:
  
  the padded text as a string if print is false; otherwise null
- addPadding
  
  private static String addPadding(StringBuilder inputSb, int terminalWidth)
  
  Adds padding spaces to the given StringBuilder input to align the text to the specified terminal width. The padding is calculated as the difference between the terminal width and the current length of the input.
  If any exception occurs during padding calculation, the original input string is returned without modification.
  
  Parameters:
  
  inputSb - the StringBuilder containing the text to pad
  
  terminalWidth - the total width of the terminal to align the text to
  
  Returns:
  
  a String with added padding spaces to align the text, or the original text if padding cannot be applied
- addPadding
  
  private static String addPadding(String input, int padding)
  
  Adds padding spaces to the left or right of the input to reach the specified padding length.
  Padding is appended on the right if the input contains Arabic characters; otherwise on the left.
  
  Parameters:
  
  input - the input text
  
  padding - the number of spaces to add
  
  Returns:
  
  the padded string
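The side-selection rule above can be illustrated with a short sketch. The Arabic check here uses `Character.UnicodeScript`, which is an assumption about how Arabic characters are detected; `PaddingSketch` is a hypothetical name, and `String.repeat` requires Java 11 or later.

```java
final class PaddingSketch {

    // Assumed detection: any code point whose Unicode script is ARABIC.
    static boolean containsArabic(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC);
    }

    // Appends spaces on the right for Arabic input, prepends them on the left otherwise.
    static String addPadding(String input, int padding) {
        String spaces = " ".repeat(Math.max(0, padding));
        return containsArabic(input) ? input + spaces : spaces + input;
    }

    public static void main(String[] args) {
        System.out.println("[" + addPadding("abc", 2) + "]"); // prints [  abc]
    }
}
```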
- chunk
  
  private static List<String> chunk(String input, int size)
  Splits the given string into consecutive substrings of the specified size.
  Each chunk is created using String.substring(int, int). The last chunk may be shorter than size if the input string's length is not evenly divisible by the chunk size.
  
  Examples:
  
  chunk("abcdef", 2) → ["ab", "cd", "ef"]
  
  chunk("abcde", 2) → ["ab", "cd", "e"]
  Parameters:
  
  input - the string to split; must not be null
  
  size - the size of each chunk; must be greater than zero
  
  Returns:
  
  a list containing the resulting substrings, in order
  
  Throws:
  
  IllegalArgumentException - if size is less than 1
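A minimal sketch of the chunking behaviour described above, reproducing the two documented examples. The class name `ChunkSketch` is illustrative, and the actual method is private and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkSketch {

    // Splits input into consecutive substrings of at most `size` characters.
    static List<String> chunk(String input, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < input.length(); start += size) {
            int end = Math.min(input.length(), start + size); // last chunk may be shorter
            chunks.add(input.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdef", 2)); // prints [ab, cd, ef]
        System.out.println(chunk("abcde", 2));  // prints [ab, cd, e]
    }
}
```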
- removeDiacritics
  
  public static String removeDiacritics(String text)
  
  Removes Arabic diacritic marks from the given Arabic text.
  
  Parameters:
  
  text - the Arabic text possibly containing diacritics
  
  Returns:
  
  the Arabic text with diacritics removed
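Diacritic removal of this kind is typically a single regex replacement. The sketch below assumes the range U+064B to U+0652 (fathatan through sukun); the actual value of ARABIC_DIACRITICS_REGEX is not shown on this page and may cover more marks.

```java
final class DiacriticsSketch {

    // Assumption: the common tashkeel marks U+064B..U+0652, not the library's actual regex.
    private static final String DIACRITICS = "[\\u064B-\\u0652]";

    static String removeDiacritics(String text) {
        return text.replaceAll(DIACRITICS, "");
    }

    public static void main(String[] args) {
        // Vocalized "marhaban" written with Unicode escapes to avoid encoding issues.
        String vocalized = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627";
        System.out.println(removeDiacritics(vocalized));
    }
}
```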
- transliterateScript
  
  public static String[] transliterateScript(String transliteratorID, boolean removeDiacritics, String customRules, String... text)
  
  Transliterates the given text(s) from Latin script to Arabic or vice versa, using the specified ICU Transliterator ID and optional custom rules.
  
  Parameters:
  
  transliteratorID - the ICU Transliterator ID to use
  
  removeDiacritics - whether to remove diacritics after transliteration
  
  customRules - optional custom transliteration rules; may be null
  
  text - one or more strings to transliterate
  
  Returns:
  
  an array of transliterated strings in the same order
- transliterateScript
  
  public static String transliterateScript(com.ibm.icu.text.Transliterator transliterator, boolean removeDiacritics, String word)
  
  Transliterates a single word using the given Transliterator.
  
  Parameters:
  
  transliterator - the ICU Transliterator instance to use
  
  removeDiacritics - whether to remove diacritics after transliteration
  
  word - the input word to transliterate
  
  Returns:
  
  the transliterated word
- transliterateScriptLetterByLetter
  
  public static String transliterateScriptLetterByLetter(String transliteratorID, String textInput)
  
  Transliterates the input text letter by letter using the specified transliterator ID.
  
  Parameters:
  
  transliteratorID - the ICU Transliterator ID to use
  
  textInput - the input text to transliterate
  
  Returns:
  
  the transliterated text
- transliterateScript
  
  public static String[] transliterateScript(String transliteratorID, String... text)
  
  Transliterates one or more strings using the specified transliterator ID. Diacritics are not removed.
  
  Parameters:
  
  transliteratorID - the ICU Transliterator ID
  
  text - the input strings
  
  Returns:
  
  transliterated strings
- transliterateScript
  
  public static String[] transliterateScript(String transliteratorID, String customRules, String... text)
  
  Transliterates one or more strings using the specified transliterator ID and custom rules. Diacritics are not removed.
  
  Parameters:
  
  transliteratorID - the ICU Transliterator ID
  
  customRules - custom transliteration rules
  
  text - the input strings
  
  Returns:
  
  transliterated strings
- transliterateToArabicScript
  
  public static String[] transliterateToArabicScript(boolean removeDiacritics, String... text)
  
  Transliterates one or more strings to Arabic script. Diacritics are removed only when removeDiacritics is true.
  
  Parameters:
  
  removeDiacritics - whether to remove diacritics after transliteration
  
  text - the input strings
  
  Returns:
  
  transliterated Arabic script strings
- transliterateToArabicScriptDefault
  
  public static String[] transliterateToArabicScriptDefault(boolean removeDiacritics, String... text)
  
  Transliterates one or more strings to Arabic script using default custom rules. Diacritics are removed only when removeDiacritics is true.
  
  Parameters:
  
  removeDiacritics - whether to remove diacritics after transliteration
  
  text - the input strings
  
  Returns:
  
  transliterated Arabic script strings
- transliterateToArabicScript
  
  public static String[] transliterateToArabicScript(boolean removeDiacritics, String customRules, String... text)
  
  Transliterates one or more strings to Arabic script using the provided custom rules. Diacritics are removed only when removeDiacritics is true.
  
  Parameters:
  
  removeDiacritics - whether to remove diacritics after transliteration
  
  customRules - custom transliteration rules
  
  text - the input strings
  
  Returns:
  
  transliterated Arabic script strings
- transliterateToArabicScript
  
  public static String[] transliterateToArabicScript(String... text)
  
  Transliterates one or more strings to Arabic script. Diacritics are removed by default.
  
  Parameters:
  
  text - the input strings
  
  Returns:
  
  transliterated Arabic script strings
- transliterateToArabicScriptDefault
  
  public static String[] transliterateToArabicScriptDefault(String... text)
  
  Transliterates one or more strings to Arabic script using default custom rules. Diacritics are removed by default.
  
  Parameters:
  
  text - the input strings
  
  Returns:
  
  transliterated Arabic script strings
- transliterateToArabicScript
  
  public static String[] transliterateToArabicScript(String customRules, String... text)
  
  Transliterates one or more strings to Arabic script using the provided custom rules. Diacritics are removed by default.
  
  Parameters:
  
  customRules - custom transliteration rules
  
  text - the input strings
  
  Returns:
  
  transliterated Arabic script strings
- transliterateToArabicScriptLetterByLetter
  
  public static String transliterateToArabicScriptLetterByLetter(String text)
  
  Transliterates the given text to Arabic script letter by letter.
  
  Parameters:
  
  text - the input text
  
  Returns:
  
  the transliterated Arabic script text
- shouldReshape
  
  public static boolean shouldReshape()
  
  Determines whether Arabic text reshaping should be applied for the current runtime environment.
  Arabic reshaping is required on platforms or terminal environments that do not perform proper contextual shaping and bidirectional rendering (such as Windows consoles, WSL environments, or real xterm-based terminals on Unix systems).
  
  The result is computed once and cached in SHOULD_RESHAPE to avoid repeated OS and terminal capability checks.
  
  Returns:
  
  true if Arabic reshaping should be applied (Windows, WSL, or Unix running inside a real xterm); false otherwise
- containsArabicLetters
  
  public static boolean containsArabicLetters(String text)
  
  Checks if the given text contains any Arabic characters.
  This method returns true if at least one character in the string is identified as an Arabic character according to isArabicChar(int).
  
  Parameters:
  
  text - the text to check; may be null (treated as empty)
  
  Returns:
  
  true if the text contains one or more Arabic characters, false otherwise
- isArabicText
  
  public static boolean isArabicText(String text)
  
  Checks if the given text consists entirely of Arabic characters.
  This method returns true only if every character in the string is an Arabic character according to isArabicChar(int).
  
  Parameters:
  
  text - the text to check; may be null (treated as empty)
  
  Returns:
  
  true if all characters in the text are Arabic, false otherwise
- isArabicCharCp
  
  public static boolean isArabicCharCp(int cp)
  
  Checks if the given Unicode code point is an Arabic character.
  
  Parameters:
  
  cp - the Unicode code point
  
  Returns:
  
  true if the code point is in Arabic Unicode blocks, false otherwise
- isArabicChar
  
  public static boolean isArabicChar(int cp)
  
  Checks if the given Unicode code point belongs to the Arabic Unicode script.
  
  Parameters:
  
  cp - the Unicode code point
  
  Returns:
  
  true if the code point belongs to the Arabic Unicode script, false otherwise
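The block-based and script-based checks described for isArabicCharCp(int) and isArabicChar(int) can be contrasted with JDK classes only. This is a sketch under assumptions: the set of blocks is a guess at what "Arabic Unicode blocks" covers, the helper names are hypothetical, and treating null or empty text as "not Arabic" in `isAllArabic` is an assumption.

```java
final class ArabicDetectionSketch {

    // Script-based check, in the spirit of isArabicChar(int).
    static boolean isArabicByScript(int cp) {
        return Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC;
    }

    // Block-based check, in the spirit of isArabicCharCp(int); the block list is assumed.
    static boolean isArabicByBlock(int cp) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
        return block == Character.UnicodeBlock.ARABIC
                || block == Character.UnicodeBlock.ARABIC_SUPPLEMENT
                || block == Character.UnicodeBlock.ARABIC_EXTENDED_A
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
    }

    // True if at least one code point is Arabic; null is treated as empty.
    static boolean containsArabic(String text) {
        return text != null
                && text.codePoints().anyMatch(ArabicDetectionSketch::isArabicByScript);
    }

    // True only if every code point is Arabic; null or empty yields false (assumption).
    static boolean isAllArabic(String text) {
        return text != null && !text.isEmpty()
                && text.codePoints().allMatch(ArabicDetectionSketch::isArabicByScript);
    }

    public static void main(String[] args) {
        System.out.println(containsArabic("a\u0628")); // prints true
        System.out.println(isAllArabic("a\u0628"));    // prints false
    }
}
```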
- getRawHexBytes
  
  public static List<com.ibm.icu.impl.Pair<String,String>> getRawHexBytes(char[] charArray)
  
  Returns a list of pairs representing the Unicode code points (in hex) and characters from the given character array.
  
  Parameters:
  
  charArray - the array of characters to analyze
  
  Returns:
  
  list of pairs with Unicode code point hex strings and character strings
- getRawHexBytes
  
  public static List<com.ibm.icu.impl.Pair<String,String>> getRawHexBytes(String text)
  
  Converts the given String into a list of pairs, where each pair contains the Unicode hexadecimal representation of a character and the character itself.
  
  Parameters:
  
  text - the input string to process
  
  Returns:
  
  a list of pairs of the form ("U+XXXX", "char"), representing each character's Unicode code point and character
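The ("U+XXXX", "char") pairing can be sketched without ICU4J by substituting `Map.Entry` for `com.ibm.icu.impl.Pair`. That substitution and the class name `HexDumpSketch` are illustrative choices only; the sketch works per UTF-16 char, as the char[] overload suggests.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HexDumpSketch {

    // Pairs each char with its "U+XXXX" hexadecimal form.
    static List<Map.Entry<String, String>> rawHex(String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (char ch : text.toCharArray()) {
            String hex = String.format("U+%04X", (int) ch);
            pairs.add(new SimpleEntry<>(hex, String.valueOf(ch)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> pair : rawHex("A\u0628")) {
            System.out.println(pair.getKey() + " " + pair.getValue());
        }
    }
}
```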
- splitIdentifier
  
  public static List<String> splitIdentifier(String input)
  
  Splits an identifier string into constituent parts based on various naming conventions. It handles underscores, dashes, whitespace, camelCase, PascalCase, acronyms, and digits.
  Examples:
  
  "userAccount" → ["user", "Account"]
  
  "IPv6Address" → ["IPv", "6", "Address"]
  
  "snake_case-name" → ["snake", "case", "name"]
  
  Parameters:
  
  input - the identifier string to split
  
  Returns:
  
  a list of strings representing the split components of the identifier
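One way to obtain the documented splits is a single regex built from separators and zero-width case and digit transitions. The expression below is an assumption chosen to reproduce the examples on this page (including "IPv" staying whole while "JSONTo" splits); it is not the value of IDENTIFIER_SPLIT_REGEX, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class IdentifierSplitSketch {

    // Assumed split expression, not the library's actual regex.
    private static final String SPLIT =
            "[_\\-\\s]+"                        // underscores, dashes, whitespace
            + "|(?<=[a-z])(?=[A-Z])"            // lower -> upper (camelCase)
            + "|(?<=[A-Z]{2})(?=[A-Z][a-z])"    // end of an acronym of 2+ capitals
            + "|(?<=[A-Za-z])(?=[0-9])"         // letter -> digit
            + "|(?<=[0-9])(?=[A-Za-z])";        // digit -> letter

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        for (String part : input.split(SPLIT)) {
            if (!part.isEmpty()) {              // drop empties from leading separators
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("userAccount"));     // prints [user, Account]
        System.out.println(split("IPv6Address"));     // prints [IPv, 6, Address]
        System.out.println(split("snake_case-name")); // prints [snake, case, name]
    }
}
```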
- convertArabicToLatinLetterByLetter
  
  public static String convertArabicToLatinLetterByLetter(String text)
  Converts an input string from Arabic characters and digits to their Latin and Ascii equivalents.
  This method supports:
  
  Arabic letters mapped one-to-one to Latin uppercase letters (A-Z).
  
  Arabic-Indic digits (٠-٩) mapped to Ascii digits (0-9).
  
  Latin letters (A-Z, a-z) and Ascii digits (0-9) passed through unchanged.
  
  Any unsupported character will cause a NaftahBugError to be thrown.
  Parameters:
  
  text - the input string containing Arabic characters and/or digits
  
  Returns:
  
  the Latin-equivalent string after transliteration
  
  Throws:
  
  NaftahBugError - if the input contains unsupported characters
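The three mapping rules above can be sketched as a single pass over the input. The letter table here is deliberately partial: it contains only the four pairs this page documents ('ا' → 'A', 'ب' → 'B', 'ت' → 'C', 'ه' → 'Z'), since the full 26-entry ARABIC_LETTERS array is not reproduced. `IllegalArgumentException` stands in for NaftahBugError, which is not available outside the library.

```java
final class ArabicToLatinSketch {

    // Partial positional tables: only the pairs listed in the field documentation.
    private static final char[] ARABIC = {'\u0627', '\u0628', '\u062A', '\u0647'};
    private static final char[] LATIN = {'A', 'B', 'C', 'Z'};

    static String convert(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            if (ch >= '\u0660' && ch <= '\u0669') {
                out.append((char) ('0' + (ch - '\u0660')));   // Arabic-Indic digit -> ASCII digit
            } else if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9')) {
                out.append(ch);                               // Latin letters and ASCII digits pass through
            } else {
                int index = indexOf(ch);
                if (index < 0) {
                    // Stand-in for NaftahBugError in this sketch.
                    throw new IllegalArgumentException("Unsupported character: " + ch);
                }
                out.append(LATIN[index]);                     // positional letter mapping
            }
        }
        return out.toString();
    }

    private static int indexOf(char ch) {
        for (int i = 0; i < ARABIC.length; i++) {
            if (ARABIC[i] == ch) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(convert("\u0628\u0661x")); // prints B1x
    }
}
```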
- isLatinLetter
  
  public static boolean isLatinLetter(char ch)
  
  Checks whether a character is a Latin letter (A-Z or a-z).
  
  Parameters:
  
  ch - the character to check
  
  Returns:
  
  true if the character is a Latin letter; false otherwise
- isAsciiDigit
  
  public static boolean isAsciiDigit(int ch)
  
  Checks whether a character is an ASCII digit (0-9).
  
  Parameters:
  
  ch - the character to check
  
  Returns:
  
  true if the character is an Ascii digit; false otherwise
- isArabicIndicDigit
  
  public static boolean isArabicIndicDigit(char ch)
  
  Checks whether a character is an Arabic-Indic digit (٠ to ٩).
  
  Parameters:
  
  ch - the character to check
  
  Returns:
  
  true if the character is an Arabic-Indic digit; false otherwise
- numberToString
  
  public static String numberToString(Number number)
  
  Converts a Number into a string using formatting rules, replacing the standard Ascii decimal separator with a comma (U+066C), and optionally converting Ascii digits (0–9) to Arabic-Indic digits (٠–٩).
  If the system property naftah.number.arabicIndic.active is set to true, this method will convert each Ascii digit to its Arabic-Indic equivalent. Otherwise, digits remain unchanged.
  This method does not use locale-aware formatting; it operates directly on the string representation of the number returned by Object.toString().
  Parameters:
  
  number - the number to convert; must not be null
  
  Returns:
  
  a string representing the number with a decimal separator, and optionally Arabic-Indic digits
  
  Throws:
  
  NullPointerException - if number is null
  
  See Also:
  
  isAsciiDigit(int)
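The conversion described for numberToString(Number) can be sketched as a string rewrite of the value returned by toString(). This is an illustration under assumptions: the separator is passed in as a parameter so the sketch does not commit to a code point, and the digit switch is a boolean argument that a caller could derive from the naftah.number.arabicIndic.active system property.

```java
final class NumberToStringSketch {

    // Replaces '.' with the given separator and optionally maps 0-9 to Arabic-Indic digits.
    static String format(Number number, char separator, boolean arabicIndic) {
        String text = number.toString().replace('.', separator); // NullPointerException if number is null
        if (!arabicIndic) {
            return text;
        }
        StringBuilder out = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            boolean asciiDigit = ch >= '0' && ch <= '9';
            out.append(asciiDigit ? (char) ('\u0660' + (ch - '0')) : ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Property name taken from the description above; false when the property is unset.
        boolean arabicIndic = Boolean.getBoolean("naftah.number.arabicIndic.active");
        System.out.println(format(12.5, '\u066C', arabicIndic));
    }
}
```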
