Class StringTools

java.lang.Object
org.languagetool.tools.StringTools

public final class StringTools extends Object
Tools for working with strings.
  • Field Details

    • XML_COMMENT_PATTERN

      private static final Pattern XML_COMMENT_PATTERN
    • XML_PATTERN

      private static final Pattern XML_PATTERN
    • UPPERCASE_GREEK_LETTERS

      public static final Set<String> UPPERCASE_GREEK_LETTERS
    • LOWERCASE_GREEK_LETTERS

      public static final Set<String> LOWERCASE_GREEK_LETTERS
  • Constructor Details

    • StringTools

      private StringTools()
  • Method Details

    • assureSet

      public static void assureSet(String s, String varName)
      Throws an exception if the given string is null, empty, or contains only whitespace.
    • readStream

      public static String readStream(InputStream stream, String encoding) throws IOException
      Read the text stream using the given encoding.
      Parameters:
      stream - the InputStream to be read
      encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
      Returns:
      a string with the stream's content, lines separated by \n (note that \n will be added to the last line even if it is not in the stream)
      Throws:
      IOException
      Since:
      2.3
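      The behavior described for readStream can be approximated with a BufferedReader loop. This is only a sketch under stated assumptions, not the LanguageTool implementation: the class name ReadStreamSketch is hypothetical, and mapping a null encoding to Charset.defaultCharset() is an assumed reading of "system encoding".

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the documented behavior, not the library code.
final class ReadStreamSketch {

  static String readStream(InputStream stream, String encoding) throws IOException {
    // Assumption: a null encoding falls back to the platform default charset.
    Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, charset))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // readLine() strips the line terminator, so "\n" is appended to every
        // line, including the last one.
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("one\r\ntwo".getBytes(StandardCharsets.UTF_8));
    System.out.print(readStream(in, "utf-8"));
  }
}
```

      In this sketch, Windows line endings are normalized to \n as a side effect of readLine().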
    • isAllUppercase

      public static boolean isAllUppercase(String str)
      Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
    • isMixedCase

      public static boolean isMixedCase(String str)
      Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
      Parameters:
      str - input string
    • isNotAllLowercase

      public static boolean isNotAllLowercase(String str)
      Returns true if str is not made up of all-lowercase characters, i.e. if it contains at least one letter that is not lowercase (ignoring characters for which no upper-/lowercase distinction exists).
      Since:
      2.5
    • isCapitalizedWord

      public static boolean isCapitalizedWord(String str)
      Parameters:
      str - input string
      Returns:
      true if word starts with an uppercase letter and all other letters are lowercase
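      The relationship between isAllUppercase, isMixedCase, and isCapitalizedWord can be illustrated with java.lang.Character. The class CaseChecksSketch below is hypothetical and is only one possible reading of the descriptions above; it is not taken from the LanguageTool source.

```java
// Hypothetical sketch of the documented case predicates.
final class CaseChecksSketch {

  // True unless some letter is lowercase; digits and punctuation are ignored.
  static boolean isAllUppercase(String str) {
    return str.chars().noneMatch(c -> Character.isLetter(c) && Character.isLowerCase(c));
  }

  // True if the first character is uppercase and every later letter is lowercase.
  static boolean isCapitalizedWord(String str) {
    if (str == null || str.isEmpty() || !Character.isUpperCase(str.charAt(0))) {
      return false;
    }
    return str.chars().skip(1)
        .noneMatch(c -> Character.isLetter(c) && !Character.isLowerCase(c));
  }

  // Assumed definition: at least one uppercase letter, but neither
  // all-uppercase nor a plain capitalized word.
  static boolean isMixedCase(String str) {
    boolean hasUppercase = str.chars().anyMatch(Character::isUpperCase);
    return hasUppercase && !isAllUppercase(str) && !isCapitalizedWord(str);
  }

  public static void main(String[] args) {
    for (String s : new String[] {"MixedCase", "mixedCase", "Mixedcase", "ABC-123"}) {
      System.out.println(s + ": upper=" + isAllUppercase(s)
          + " capitalized=" + isCapitalizedWord(s) + " mixed=" + isMixedCase(s));
    }
  }
}
```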
    • startsWithUppercase

      public static boolean startsWithUppercase(String str)
      Whether the first character of str is an uppercase character.
    • uppercaseFirstChar

      @Nullable public static String uppercaseFirstChar(String str)
      Returns str with its first character converted to uppercase. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is converted instead.
    • uppercaseFirstChar

      @Nullable public static String uppercaseFirstChar(String str, Language language)
      Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
      Parameters:
      language - the language, will be ignored if it's null
      Since:
      2.7
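      The Dutch special case can be sketched as follows. Because the Language class is not reproduced here, the sketch takes a plain language code string instead; that parameter, the "nl" code check, and the class name DutchFirstCharSketch are all assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical sketch; languageCode stands in for the Language parameter.
final class DutchFirstCharSketch {

  static String uppercaseFirstChar(String str, String languageCode) {
    if (str == null || str.isEmpty()) {
      return str;
    }
    // A null languageCode never matches, so it is effectively ignored.
    if ("nl".equals(languageCode) && str.toLowerCase(Locale.ROOT).startsWith("ij")) {
      // Dutch treats the digraph "ij" as one letter, so both characters are capitalized.
      return "IJ" + str.substring(2);
    }
    return str.substring(0, 1).toUpperCase(Locale.ROOT) + str.substring(1);
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstChar("ijsselmeer", "nl"));
    System.out.println(uppercaseFirstChar("ijsselmeer", null));
  }
}
```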
    • lowercaseFirstChar

      @Nullable public static String lowercaseFirstChar(String str)
      Returns str with its first character converted to lowercase. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is converted instead.
    • changeFirstCharCase

      @Nullable private static String changeFirstCharCase(String str, boolean toUpperCase)
      Returns str with its first character converted to lowercase or uppercase, depending on toUpperCase. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is converted instead.
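      The rule of skipping leading non-alphabetic characters might look like the sketch below. The class FirstCharCaseSketch is hypothetical, and returning the input unchanged when it contains no letter at all is an assumption; the text above does not say what happens in that case.

```java
// Hypothetical sketch of the "first alphabetic character" rule.
final class FirstCharCaseSketch {

  static String changeFirstCharCase(String str, boolean toUpperCase) {
    if (str == null || str.isEmpty()) {
      return str;
    }
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c)) {
        // Only the first letter found is changed; everything else is copied as is.
        char changed = toUpperCase ? Character.toUpperCase(c) : Character.toLowerCase(c);
        return str.substring(0, i) + changed + str.substring(i + 1);
      }
    }
    // Assumption: a string without any letter is returned unchanged.
    return str;
  }

  public static void main(String[] args) {
    System.out.println(changeFirstCharCase("\"hello\"", true));
    System.out.println(changeFirstCharCase("(World)", false));
  }
}
```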
    • readerToString

      public static String readerToString(Reader reader) throws IOException
      Throws:
      IOException
    • streamToString

      public static String streamToString(InputStream is, String charsetName) throws IOException
      Throws:
      IOException
    • escapeXML

      public static String escapeXML(String s)
    • escapeForXmlAttribute

      public static String escapeForXmlAttribute(String s)
      Since:
      2.9
    • escapeForXmlContent

      public static String escapeForXmlContent(String s)
      Since:
      2.9
    • escapeHTML

      public static String escapeHTML(String s)
      Escapes these characters: less than, greater than, quote, ampersand.
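      A minimal sketch of escaping the four listed characters follows. The entity names used (&lt;, &gt;, &quot;, &amp;) and the reading of "quote" as the double quote are assumptions, as is the class name EscapeHtmlSketch; the actual method may differ in detail.

```java
// Hypothetical sketch of escaping <, >, " and & for HTML output.
final class EscapeHtmlSketch {

  static String escapeHTML(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    // A single pass over the input avoids double-escaping the "&" of
    // entities that were just written.
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        case '&': sb.append("&amp;"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeHTML("<a href=\"x\">T&C</a>"));
  }
}
```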
    • trimWhitespace

      public static String trimWhitespace(String s)
      Filters whitespace characters. Useful for trimming the contents of token elements that cannot contain spaces, except for a single space inside a word (for example, when the language supports numbers formatted with spaces as single tokens, as Catalan does in LanguageTool).
      Parameters:
      s - String to be filtered.
      Returns:
      Filtered s.
    • trimSpecialCharacters

      public static String trimSpecialCharacters(String s)
      Eliminates special (Unicode) characters, e.g. soft hyphens.
      Parameters:
      s - String to filter
      Returns:
      s, with every character that is not alphanumeric, punctuation, or a space deleted
      Since:
      4.3
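      One way to express "delete everything that is not alphanumeric, punctuation, or space" is a negated character class. The exact character classes below are assumptions (Unicode letters and digits, ASCII punctuation, ASCII whitespace), and TrimSpecialCharactersSketch is a hypothetical name; the real method may accept a different set.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of removing "special" characters such as soft hyphens.
final class TrimSpecialCharactersSketch {

  // Assumed whitelist: Unicode alphabetic characters and digits,
  // ASCII punctuation (\p{Punct}), and ASCII whitespace (\p{Space}).
  private static final Pattern SPECIAL =
      Pattern.compile("[^\\p{IsAlphabetic}\\p{IsDigit}\\p{Punct}\\p{Space}]");

  static String trimSpecialCharacters(String s) {
    return SPECIAL.matcher(s).replaceAll("");
  }

  public static void main(String[] args) {
    // U+00AD is the soft hyphen.
    System.out.println(trimSpecialCharacters("co\u00ADoperate, 2 x"));
  }
}
```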
    • addSpace

      public static String addSpace(String word, Language language)
      Determines whether a space should precede a word: a space is added before every word that is not punctuation.
      Parameters:
      word - the word that the space would precede.
      language - Language of the word (used to check typography conventions). Currently, only French has its own convention, in which the space is omitted only before '.' and ','; for all other languages, no space is added before any of ,.;:!?
      Returns:
      String containing a space or an empty string.
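      The two conventions described for addSpace can be sketched as below. The sketch uses a language code string in place of the Language parameter and assumes that French is identified by the code "fr"; both of these, and the class name AddSpaceSketch, are hypothetical.

```java
// Hypothetical sketch; languageCode stands in for the Language parameter.
final class AddSpaceSketch {

  static String addSpace(String word, String languageCode) {
    if (word.length() == 1) {
      // French: omit the space only before '.' and ','.
      // Other languages: omit it before any of , . ; : ! ?
      String noSpaceBefore = "fr".equals(languageCode) ? ".," : ",.;:!?";
      if (noSpaceBefore.indexOf(word.charAt(0)) >= 0) {
        return "";
      }
    }
    return " ";
  }

  public static void main(String[] args) {
    System.out.println("[" + addSpace("!", "fr") + "]");
    System.out.println("[" + addSpace("!", "en") + "]");
  }
}
```

      With this reading, an exclamation mark is preceded by a space in French but not in other languages.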
    • isWhitespace

      public static boolean isWhitespace(String str)
      Checks if a string is whitespace, including:
      • all Unicode whitespace
      • the non-breaking space (U+00A0)
      • the narrow non-breaking space (U+202F)
      • the zero width space (U+200B), used in Khmer
      Parameters:
      str - String to check
      Returns:
      true if the string is a whitespace character
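      The explicit code points in the list above matter because Character.isWhitespace does not report the non-breaking spaces or the zero width space as whitespace. The sketch below illustrates this; IsWhitespaceSketch is a hypothetical name, and treating null or empty input as non-whitespace is an assumption that the real method may not share.

```java
// Hypothetical sketch of a whitespace check that covers the listed code points.
final class IsWhitespaceSketch {

  static boolean isWhitespaceCodePoint(int cp) {
    return Character.isWhitespace(cp)
        || cp == 0x00A0   // non-breaking space
        || cp == 0x202F   // narrow non-breaking space
        || cp == 0x200B;  // zero width space, used in Khmer
  }

  static boolean isWhitespace(String str) {
    // Assumption: null and empty strings do not count as whitespace here.
    return str != null && !str.isEmpty()
        && str.codePoints().allMatch(IsWhitespaceSketch::isWhitespaceCodePoint);
  }

  public static void main(String[] args) {
    System.out.println(Character.isWhitespace('\u00A0')); // Java alone says false
    System.out.println(isWhitespace("\u00A0"));
  }
}
```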
    • isNonBreakingWhitespace

      public static boolean isNonBreakingWhitespace(String str)
      Checks if a string is the non-breaking whitespace (U+00A0).
      Since:
      2.1
    • isPositiveNumber

      public static boolean isPositiveNumber(char ch)
      Parameters:
      ch - Character to check
      Returns:
      True if the character is a positive number (decimal digit from 1 to 9).
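      The range stated above excludes zero, which a short sketch makes explicit. PositiveDigitSketch is a hypothetical class, and restricting the check to ASCII digits is an assumption.

```java
// Hypothetical sketch: a "positive number" is a decimal digit from 1 to 9.
final class PositiveDigitSketch {

  static boolean isPositiveNumber(char ch) {
    // '0' is a digit but not positive, so the range starts at '1'.
    return ch >= '1' && ch <= '9';
  }

  public static void main(String[] args) {
    System.out.println(isPositiveNumber('0'));
    System.out.println(isPositiveNumber('7'));
  }
}
```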
    • isEmpty

      public static boolean isEmpty(String str)
      Helper method to replace calls to "".equals().
      Parameters:
      str - String to check
      Returns:
      true if string is empty or null
    • filterXML

      public static String filterXML(String str)
      Simple filtering that removes XML tags from a string.
      Parameters:
      str - XML string to be filtered.
      Returns:
      Filtered string without XML tags.
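      A regex-based filter in the spirit of filterXML could look like the sketch below. The two patterns are assumptions chosen for illustration and are not the values of XML_COMMENT_PATTERN or XML_PATTERN, which this page does not show; FilterXmlSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of stripping XML comments and tags with regular expressions.
final class FilterXmlSketch {

  // Assumed patterns; the real field values are not documented here.
  private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
  private static final Pattern TAG = Pattern.compile("<[^<>]+>");

  static String filterXML(String str) {
    // Comments are removed first so that their content does not survive tag removal.
    String withoutComments = COMMENT.matcher(str).replaceAll("");
    return TAG.matcher(withoutComments).replaceAll("");
  }

  public static void main(String[] args) {
    System.out.println(filterXML("<p>Hello <b>world</b><!-- note --></p>"));
  }
}
```

      A filter of this kind does not parse XML; it leaves entities such as &amp; untouched and can be confused by a literal > inside attribute values.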
    • asString

      @Nullable public static String asString(CharSequence s)
    • isParagraphEnd

      public static boolean isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara)
      Since:
      4.3
    • loadLines

      public static List<String> loadLines(String path)
      Loads a file, ignoring comments (lines starting with #).
      Parameters:
      path - path of the file in the resource directory
      Since:
      4.6
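      The comment-skipping behavior can be sketched independently of how the resource is located. The version below reads from a Reader rather than a resource path, which is an assumption made to keep the example self-contained; LoadLinesSketch is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: collect all lines except those starting with '#'.
final class LoadLinesSketch {

  static List<String> loadLines(Reader source) {
    // BufferedReader.lines() wraps I/O failures in UncheckedIOException,
    // so no checked exception has to be declared here.
    return new BufferedReader(source).lines()
        .filter(line -> !line.startsWith("#"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(loadLines(new StringReader("# comment\nfoo\nbar\n#another")));
  }
}
```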