Class MarkedUpTextAssembler

java.lang.Object
org.openpdf.text.pdf.parser.MarkedUpTextAssembler
All Implemented Interfaces:
TextAssembler

public class MarkedUpTextAssembler extends Object implements TextAssembler
We'll get called on a variety of marked section content (perhaps including the results of nested sections), and will assemble it into as sensible an order as we can.
Author:
dgd
  • Method Details

    • process

      public void process(ParsedText unassembled, String contextName)
      Remember an unassembled chunk until we reach the end of this element, or until we hit an assembled chunk and need to pull things together.
      Specified by:
      process in interface TextAssembler
      Parameters:
      unassembled - a chunk of text-rendering instructions to contribute to the final text
      contextName - name of the element context we are in; null if it is an Artifact.
    • process

      public void process(FinalText completed, String contextName)
      Slot fully-assembled chunk into our result at the current location. If there are unassembled chunks waiting, assemble them first.
      Specified by:
      process in interface TextAssembler
      Parameters:
      completed - a chunk from a nested element
      contextName - name of the element context we are in; null if it is an Artifact.
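      The buffering behaviour of the two process overloads above (remember raw chunks, then assemble them when a completed chunk arrives or the element ends) can be sketched as follows. This is a minimal, hypothetical model, not OpenPDF's code: plain strings stand in for ParsedText and FinalText, and the names ChunkBufferSketch, processUnassembled, processCompleted and finish are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "remember unassembled chunks, flush when a
 * completed chunk arrives" pattern. Strings stand in for ParsedText and
 * FinalText; this is not the OpenPDF implementation.
 */
class ChunkBufferSketch {
    // Raw chunks seen since the last flush (stand-in for ParsedText).
    private final List<String> pending = new ArrayList<>();
    // Assembled output so far (stand-in for the FinalText being built).
    private final StringBuilder result = new StringBuilder();

    /** Like process(ParsedText, String): just remember the chunk for now. */
    void processUnassembled(String chunk) {
        pending.add(chunk);
    }

    /** Like process(FinalText, String): assemble waiting chunks first, then slot in the completed one. */
    void processCompleted(String completed) {
        flushPending();
        result.append(completed);
    }

    /** Appends waiting chunks in arrival order; the real class may reorder them or insert spaces. */
    private void flushPending() {
        for (String chunk : pending) {
            result.append(chunk);
        }
        pending.clear();
    }

    /** Like endParsingContext: flush whatever is still waiting and return the text. */
    String finish() {
        flushPending();
        return result.toString();
    }
}
```

      For example, feeding "Hel" and "lo " as raw chunks, then "[nested]" as a completed chunk, then " tail" as a raw chunk, yields "Hello [nested] tail": the completed chunk lands after the chunks that preceded it.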
    • process

      public void process(Word completed, String contextName)
      Specified by:
      process in interface TextAssembler
      Parameters:
      completed - a complete chunk to process: just add this subsection in the proper place.
      contextName - name of the element context we are in; null if it is an Artifact.
    • endParsingContext

      public FinalText endParsingContext(String containingElementName)
      Specified by:
      endParsingContext in interface TextAssembler
      Parameters:
      containingElementName - the name of the element to surround the extracted text with
      Returns:
      the final text for the set of fragments and fully parsed items we were passed during processing.
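      One way to picture "surround the extracted text" with containingElementName is an XML-style wrapper. The sketch below assumes that tag format, escapes markup characters so the result stays well-formed, and leaves the text unwrapped when no name is given; ContextWrapSketch and wrap are hypothetical names, and OpenPDF's actual output format may differ.

```java
/**
 * Hypothetical sketch: wrap assembled text in an XML-style element named
 * after the containing structure element. The tag format is an assumption.
 */
final class ContextWrapSketch {
    private ContextWrapSketch() {}

    static String wrap(String containingElementName, String text) {
        // Escape '&' first so the entities added for '<' and '>' are not re-escaped.
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        if (containingElementName == null || containingElementName.isEmpty()) {
            return escaped; // no element to surround the text with
        }
        return "<" + containingElementName + ">" + escaped + "</" + containingElementName + ">";
    }
}
```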
    • reset

      public void reset()
      Specified by:
      reset in interface TextAssembler
    • renderText

      public void renderText(FinalText finalText)
      Specified by:
      renderText in interface TextAssembler
      Parameters:
      finalText - a complete chunk to process: just add this subsection in the proper place.
    • renderText

      public void renderText(ParsedTextImpl partialWord)
      Captures text, using a simplified algorithm to insert hard returns and spaces.
      Specified by:
      renderText in interface TextAssembler
      Parameters:
      partialWord - one of a number of raw PDF text chunks to process, with placement, font, etc.
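      The page does not spell out the "simplified algorithm" for hard returns and spaces. A common heuristic of that kind, shown here as an assumption rather than OpenPDF's actual logic, is: start a new line when the baseline moves, and insert a space when the horizontal gap between consecutive chunks on the same line exceeds a threshold. SpacingSketch, Chunk, lineTolerance and spaceGap are hypothetical names.

```java
import java.util.List;

/**
 * Hypothetical sketch of a simplified spacing heuristic: a hard return when
 * the baseline changes, a space when the horizontal gap is large. Chunk, the
 * thresholds, and the coordinate model are all assumptions.
 */
final class SpacingSketch {
    /** A raw text chunk with its baseline y and horizontal extent (PDF user units). */
    static final class Chunk {
        final String text;
        final float startX, endX, baselineY;

        Chunk(String text, float startX, float endX, float baselineY) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.baselineY = baselineY;
        }
    }

    private SpacingSketch() {}

    static String assemble(List<Chunk> chunks, float lineTolerance, float spaceGap) {
        StringBuilder out = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : chunks) {
            if (prev != null) {
                if (Math.abs(c.baselineY - prev.baselineY) > lineTolerance) {
                    out.append('\n'); // baseline moved: hard return
                } else if (c.startX - prev.endX > spaceGap) {
                    out.append(' '); // same line with a visible gap: space
                }
            }
            out.append(c.text);
            prev = c;
        }
        return out.toString();
    }
}
```

      With a line tolerance of 1 and a space gap of 1.5, chunks "Hello" (x 0 to 25) and "world" (x 28 to 53) on baseline 700, "!" starting right at x 53, and "Next" on baseline 686 assemble to "Hello world!" followed by a newline and "Next". Real extractors usually scale the thresholds by font size rather than using fixed values.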
    • getReader

      protected PdfReader getReader()
      Getter for the PdfReader this assembler works from.
      Returns:
      reader
    • setPage

      public void setPage(int page)
      Specified by:
      setPage in interface TextAssembler
      Parameters:
      page - number of the page we are assembling
    • getWordId

      public String getWordId()
      The assembler can calculate an identifier for each word on a page, for use in markup.
      Specified by:
      getWordId in interface TextAssembler
      Returns:
      the new unique id.
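      setPage and getWordId together suggest per-page word numbering. A minimal sketch, assuming the id combines the current page number with a counter that restarts on each setPage call; WordIdSketch and the "p&lt;page&gt;w&lt;n&gt;" format are hypothetical, not OpenPDF's actual id scheme.

```java
/**
 * Hypothetical sketch of per-page word identifiers: setPage records the page
 * and restarts a counter, and each getWordId call returns the next id.
 * The "p<page>w<n>" format is an assumption.
 */
class WordIdSketch {
    private int page;
    private int wordCounter;

    void setPage(int page) {
        this.page = page;
        this.wordCounter = 0; // numbering restarts on each page
    }

    String getWordId() {
        wordCounter++;
        // Including the page number keeps ids unique across pages as well.
        return "p" + page + "w" + wordCounter;
    }
}
```

      Because the page number is part of the id, restarting the counter per page still yields ids that are unique across the whole document.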