Class WordData

java.lang.Object
morfologik.stemming.WordData
All Implemented Interfaces:
Cloneable

public final class WordData extends Object implements Cloneable
Stem and tag data associated with a given word. Instances of this class are reused and mutable: values returned from getStem(), getWord() and other related methods change on subsequent calls to the DictionaryLookup that returned a given instance of WordData. If you need a copy of the stem or tag data for a given word, you have three options: copy the associated data into a buffer of your own, call clone(), or create strings (which are immutable) by calling getStem() and then CharSequence.toString(). For these reasons it makes no sense to use instances of this class in associative containers or lists. In fact, both equals(Object) and hashCode() are overridden and throw exceptions to prevent accidental damage.
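The reuse hazard described above can be illustrated without the library itself. The following is a minimal sketch, not code from Morfologik: a StringBuilder stands in for the mutable character sequence that getStem() returns, and the class and method names (ReusedSequenceDemo, keepReferences, keepSnapshots) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ReusedSequenceDemo {

    /** Stores the mutable sequence itself: every list element is the same object. */
    static List<CharSequence> keepReferences(String[] stems) {
        StringBuilder reused = new StringBuilder(); // stand-in for the reused stem sequence
        List<CharSequence> held = new ArrayList<>();
        for (String stem : stems) {
            reused.setLength(0);   // simulate the next lookup overwriting the data
            reused.append(stem);
            held.add(reused);      // hazard: only a reference is stored
        }
        return held;
    }

    /** Stores an immutable String snapshot taken before the next overwrite. */
    static List<String> keepSnapshots(String[] stems) {
        StringBuilder reused = new StringBuilder();
        List<String> held = new ArrayList<>();
        for (String stem : stems) {
            reused.setLength(0);
            reused.append(stem);
            held.add(reused.toString()); // safe: toString() copies the characters
        }
        return held;
    }
}
```

For the input {"run", "walk"}, keepReferences yields two entries that both read "walk", while keepSnapshots preserves "run" and "walk". The same reasoning applies to any CharSequence obtained from a reused WordData instance.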
  • Field Details

    • COLLECTIONS_ERROR_MESSAGE

      private static final String COLLECTIONS_ERROR_MESSAGE
      Error information if somebody puts us in a Java collection.
    • decoder

      private final CharsetDecoder decoder
      Decoder for the character encoding used in internal buffers.
    • wordCharSequence

      private CharSequence wordCharSequence
      Inflected word form data.
    • stemCharSequence

      private CharBuffer stemCharSequence
      Character sequence after converting stemBuffer using decoder.
    • tagCharSequence

      private CharBuffer tagCharSequence
      Character sequence after converting tagBuffer using decoder.
    • wordBuffer

      ByteBuffer wordBuffer
      Byte buffer holding the inflected word form data.
    • stemBuffer

      ByteBuffer stemBuffer
      Byte buffer holding stem data.
    • tagBuffer

      ByteBuffer tagBuffer
      Byte buffer holding tag data.
  • Constructor Details

    • WordData

      WordData(CharsetDecoder decoder)
      Package scope constructor.
    • WordData

      WordData(String stem, String tag, String encoding)
      A constructor for tests only.
  • Method Details

    • getStemBytes

      public ByteBuffer getStemBytes(ByteBuffer target)
      Copy the stem's binary data (no charset decoding) to a custom byte buffer. The buffer is cleared prior to copying and flipped for reading upon returning from this method. If the buffer is null or not large enough to hold the result, a new buffer is allocated.
      Parameters:
      target - Target byte buffer to copy the stem buffer to or null if a new buffer should be allocated.
      Returns:
      The target buffer, or a newly allocated buffer if target was null or too small.
    • getTagBytes

      public ByteBuffer getTagBytes(ByteBuffer target)
      Copy the tag's binary data (no charset decoding) to a custom byte buffer. The buffer is cleared prior to copying and flipped for reading upon returning from this method. If the buffer is null or not large enough to hold the result, a new buffer is allocated.
      Parameters:
      target - Target byte buffer to copy the tag buffer to or null if a new buffer should be allocated.
      Returns:
      The target buffer, or a newly allocated buffer if target was null or too small.
    • getWordBytes

      public ByteBuffer getWordBytes(ByteBuffer target)
      Copy the inflected word's binary data (no charset decoding) to a custom byte buffer. The buffer is cleared prior to copying and flipped for reading upon returning from this method. If the buffer is null or not large enough to hold the result, a new buffer is allocated.
      Parameters:
      target - Target byte buffer to copy the word buffer to or null if a new buffer should be allocated.
      Returns:
      The target buffer, or a newly allocated buffer if target was null or too small.
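      The contract shared by getStemBytes, getTagBytes and getWordBytes (clear the target, copy, flip, and allocate a new buffer when the target is null or too small) can be sketched with plain java.nio. This is an illustration of the documented behavior only, not the library's implementation; BufferCopySketch and copyInto are hypothetical names.

```java
import java.nio.ByteBuffer;

final class BufferCopySketch {

    /**
     * Copies the remaining bytes of source into target and returns a buffer
     * that is ready for reading. A new buffer is allocated when target is
     * null or its capacity is too small.
     */
    static ByteBuffer copyInto(ByteBuffer source, ByteBuffer target) {
        int needed = source.remaining();
        if (target == null || target.capacity() < needed) {
            target = ByteBuffer.allocate(needed); // assumption: exact-size reallocation
        }
        target.clear();                 // discard any previous content
        target.put(source.duplicate()); // duplicate() leaves the source position untouched
        target.flip();                  // position = 0, limit = number of bytes copied
        return target;
    }
}
```

      A caller that keeps one sufficiently large buffer and passes it on every call avoids repeated allocation. Because the returned buffer may differ from the one passed in, the caller should always continue with the return value.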
    • getTag

      public CharSequence getTag()
      Returns:
      Return tag data decoded to a character sequence or null if no associated tag data exists.
    • getStem

      public CharSequence getStem()
      Returns:
      Return stem data decoded to a character sequence or null if no associated stem data exists.
    • getWord

      public CharSequence getWord()
      Returns:
      Inflected word form data. This is usually the parameter passed to DictionaryLookup.lookup(CharSequence).
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
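      The class description states that equals(Object) and hashCode() throw exceptions so that instances cannot be placed in collections by accident. A minimal sketch of that guard pattern follows. The exception type (UnsupportedOperationException), the message text, and the names NoCollectionsGuard and rejectedByHashSet are assumptions made for illustration; this page does not state which exception WordData throws.

```java
import java.util.HashSet;
import java.util.Set;

final class NoCollectionsGuard {
    // Hypothetical message; the real COLLECTIONS_ERROR_MESSAGE text is not shown on this page.
    private static final String MESSAGE = "Not suitable for use in Java collections";

    @Override
    public boolean equals(Object obj) {
        throw new UnsupportedOperationException(MESSAGE); // assumed exception type
    }

    @Override
    public int hashCode() {
        throw new UnsupportedOperationException(MESSAGE); // assumed exception type
    }

    /** Returns true if adding a guarded instance to a HashSet fails immediately. */
    static boolean rejectedByHashSet() {
        Set<Object> set = new HashSet<>();
        try {
            set.add(new NoCollectionsGuard()); // HashSet calls hashCode() on insertion
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```

      Failing at insertion time makes the mistake visible right away, rather than letting a hash-based container silently hold an object whose content changes later.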
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • clone

      public WordData clone()
      Declares a covariant override of Object.clone() that returns a deep copy of this object. The content of all internal buffers is copied.
      Overrides:
      clone in class Object
    • cloneCharSequence

      private CharSequence cloneCharSequence(CharSequence chs)
      Clone char sequences only if not immutable.
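      One way to "clone only if not immutable" is to return String instances unchanged and snapshot everything else with toString(). The sketch below shows that idea under the assumption that String is the only type treated as immutable; the library's actual check may differ, and CharSequenceCopySketch and copyIfMutable are hypothetical names.

```java
final class CharSequenceCopySketch {

    /**
     * Returns chs itself when it is null or a String (already immutable),
     * otherwise an immutable String snapshot of its current content.
     */
    static CharSequence copyIfMutable(CharSequence chs) {
        if (chs == null || chs instanceof String) {
            return chs;            // nothing to protect against
        }
        return chs.toString();     // copies the characters of a mutable sequence
    }
}
```

      Skipping the copy for strings avoids a needless allocation, while mutable sequences such as CharBuffer or StringBuilder are detached from later changes.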
    • update

      void update(ByteBuffer wordBuffer, CharSequence word)