Class FSA

java.lang.Object
morfologik.fsa.FSA
All Implemented Interfaces:
Iterable<ByteBuffer>
Direct Known Subclasses:
CFSA, CFSA2, ConstantArcSizeFSA, FSA5

public abstract class FSA extends Object implements Iterable<ByteBuffer>
This is a top abstract class for handling finite state automata. These automata are arc-based, a design described in Jan Daciuk's Incremental Construction of Finite-State Automata and Transducers, and Their Use in the Natural Language Processing (PhD thesis, Technical University of Gdansk).
  • Constructor Details

    • FSA

      public FSA()
  • Method Details

    • getRootNode

      public abstract int getRootNode()
      Returns:
      Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
    • getFirstArc

      public abstract int getFirstArc(int node)
      Parameters:
      node - Identifier of the node.
      Returns:
      Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.
    • getNextArc

      public abstract int getNextArc(int arc)
      Parameters:
      arc - The arc's identifier.
      Returns:
      Returns the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.
    • getArc

      public abstract int getArc(int node, byte label)
      Parameters:
      node - Identifier of the node.
      label - The arc's label.
      Returns:
      Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.
    • getArcLabel

      public abstract byte getArcLabel(int arc)
      Parameters:
      arc - The arc's identifier.
      Returns:
      Return the label associated with a given arc.
    • isArcFinal

      public abstract boolean isArcFinal(int arc)
      Parameters:
      arc - The arc's identifier.
      Returns:
      Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
    • isArcTerminal

      public abstract boolean isArcTerminal(int arc)
      Parameters:
      arc - The arc's identifier.
      Returns:
      Returns true if this arc does not have a terminating node (@link getEndNode(int) will throw an exception). Implies isArcFinal(int).
    • getEndNode

      public abstract int getEndNode(int arc)
      Parameters:
      arc - The arc's identifier.
      Returns:
      Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
    • getFlags

      public abstract Set<FSAFlags> getFlags()
      Returns:
      Returns a set of flags for this FSA instance.
    • getArcCount

      public int getArcCount(int node)
      Parameters:
      node - Identifier of the node.
      Returns:
      Calculates and returns the number of arcs of a given node.
    • getRightLanguageCount

      public int getRightLanguageCount(int node)
      Parameters:
      node - Identifier of the node.
      Returns:
      Returns the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS. The size of the right language of the state, in other words.
      Throws:
      UnsupportedOperationException - If the automaton was not compiled with FSAFlags.NUMBERS. The value can then be computed by manual count of getSequences(int).
    • getSequences

      public Iterable<ByteBuffer> getSequences(int node)
      Returns an iterator over all binary sequences starting at the given FSA state (node) and ending in final nodes. This corresponds to a set of suffixes of a given prefix from all sequences stored in the automaton.

      The returned iterator is a ByteBuffer whose contents changes on each call to Iterator.next(). The keep the contents between calls to Iterator.next(), one must copy the buffer to some other location.

      Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.

      Parameters:
      node - Identifier of the starting node from which to return subsequences.
      Returns:
      An iterable over all sequences encoded starting at the given node.
    • getSequences

      public final Iterable<ByteBuffer> getSequences()
      An alias of calling iterator() directly (FSA is also Iterable).
      Returns:
      Returns all sequences encoded in the automaton.
    • iterator

      public final Iterator<ByteBuffer> iterator()
      Returns an iterator over all binary sequences starting from the initial FSA state (node) and ending in final nodes. The returned iterator is a ByteBuffer whose contents changes on each call to Iterator.next(). The keep the contents between calls to Iterator.next(), one must copy the buffer to some other location.

      Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.

      Specified by:
      iterator in interface Iterable<ByteBuffer>
    • visitAllStates

      public <T extends StateVisitor> T visitAllStates(T v)
      Visit all states. The order of visiting is undefined. This method may be faster than traversing the automaton in post or preorder since it can scan states linearly. Returning false from StateVisitor.accept(int) immediately terminates the traversal.
      Type Parameters:
      T - A subclass of StateVisitor.
      Parameters:
      v - Visitor to receive traversal calls.
      Returns:
      Returns the argument (for access to anonymous class fields).
    • visitInPostOrder

      public <T extends StateVisitor> T visitInPostOrder(T v)
      Same as visitInPostOrder(StateVisitor, int), starting from root automaton node.
      Type Parameters:
      T - A subclass of StateVisitor.
      Parameters:
      v - Visitor to receive traversal calls.
      Returns:
      Returns the argument (for access to anonymous class fields).
    • visitInPostOrder

      public <T extends StateVisitor> T visitInPostOrder(T v, int node)
      Visits all states reachable from node in postorder. Returning false from StateVisitor.accept(int) immediately terminates the traversal.
      Type Parameters:
      T - A subclass of StateVisitor.
      Parameters:
      v - Visitor to receive traversal calls.
      node - Identifier of the node.
      Returns:
      Returns the argument (for access to anonymous class fields).
    • visitInPostOrder

      private boolean visitInPostOrder(StateVisitor v, int node, BitSet visited)
      Private recursion.
    • visitInPreOrder

      public <T extends StateVisitor> T visitInPreOrder(T v)
      Same as visitInPreOrder(StateVisitor, int), starting from root automaton node.
      Type Parameters:
      T - A subclass of StateVisitor.
      Parameters:
      v - Visitor to receive traversal calls.
      Returns:
      Returns the argument (for access to anonymous class fields).
    • visitInPreOrder

      public <T extends StateVisitor> T visitInPreOrder(T v, int node)
      Visits all states in preorder. Returning false from StateVisitor.accept(int) skips traversal of all sub-states of a given state.
      Type Parameters:
      T - A subclass of StateVisitor.
      Parameters:
      v - Visitor to receive traversal calls.
      node - Identifier of the node.
      Returns:
      Returns the argument (for access to anonymous class fields).
    • readRemaining

      protected static final byte[] readRemaining(InputStream in) throws IOException
      Parameters:
      in - The input stream.
      Returns:
      Reads all remaining bytes from an input stream and returns them as a byte array.
      Throws:
      IOException - Rethrown if an I/O exception occurs.
    • visitInPreOrder

      private void visitInPreOrder(StateVisitor v, int node, BitSet visited)
      Private recursion.
    • read

      public static FSA read(InputStream stream) throws IOException
      A factory for reading automata in any of the supported versions.
      Parameters:
      stream - The input stream to read automaton data from. The stream is not closed.
      Returns:
      Returns an instantiated automaton. Never null.
      Throws:
      IOException - If the input stream does not represent an automaton or is otherwise invalid.
    • read

      public static <T extends FSA> T read(InputStream stream, Class<? extends T> clazz) throws IOException
      A factory for reading a specific FSA subclass, including proper casting.
      Type Parameters:
      T - A subclass of FSA to cast the read automaton to.
      Parameters:
      stream - The input stream to read automaton data from. The stream is not closed.
      clazz - A subclass of FSA to cast the read automaton to.
      Returns:
      Returns an instantiated automaton. Never null.
      Throws:
      IOException - If the input stream does not represent an automaton, is otherwise invalid or the class of the automaton read from the input stream is not assignable to clazz.