
PDFTextExtractionContext

Namespace: O2S.Components.PDF4NET.Content

Defines the context for extracting text from PDF files.

public class PDFTextExtractionContext

Inheritance Object → PDFTextExtractionContext

Constructors

PDFTextExtractionContext()

Initializes a new PDFTextExtractionContext object.

public PDFTextExtractionContext()

PDFTextExtractionContext(PDFDisplayRectangle)

Initializes a new PDFTextExtractionContext object with the specified extraction bounds.

public PDFTextExtractionContext(PDFDisplayRectangle visualExtractionBounds)

Parameters

visualExtractionBounds PDFDisplayRectangle
Bounds for text extraction.
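
As a minimal sketch of the two constructors above, the following creates one context without bounds and one restricted to a region of the page. The class and method names are hypothetical, and two details are assumptions not confirmed by this page: that PDFDisplayRectangle lives in the O2S.Components.PDF4NET namespace, and that it has an (x, y, width, height) constructor.

```csharp
using O2S.Components.PDF4NET;          // assumption: namespace that contains PDFDisplayRectangle
using O2S.Components.PDF4NET.Content;

public static class ContextConstructionSketch   // hypothetical helper class
{
    // Context without explicit bounds, created with the parameterless constructor.
    public static PDFTextExtractionContext CreateDefault()
    {
        return new PDFTextExtractionContext();
    }

    // Context restricted to a rectangular region given in visual coordinates.
    public static PDFTextExtractionContext CreateForRegion()
    {
        // Assumption: PDFDisplayRectangle exposes an (x, y, width, height) constructor.
        PDFDisplayRectangle bounds = new PDFDisplayRectangle(50, 50, 300, 200);
        return new PDFTextExtractionContext(bounds);
    }
}
```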


Properties

EnableExtendedInformation

Gets or sets a value indicating whether extended information (such as positions, font info and colors) should be loaded for the extracted text.

public bool EnableExtendedInformation { get; set; }

Property Value

Boolean
If true then extended information is loaded.

Remarks

This flag is used only by the PDFContentExtractor.ExtractTextRuns() and PDFContentExtractor.ExtractTextRuns(PDFContentExtractionContext) methods.
By default this property is true, which allows text fragment positions to be analyzed in order to group extracted text into lines. If it is set to false, only the text is loaded and no other properties (such as positions, font info, colors, etc.).


IncludePartialMatches

Gets or sets a value indicating whether characters that lie only partially within the extraction bounds should be included in the extracted text.

public bool IncludePartialMatches { get; set; }

Property Value

Boolean
If true, characters that lie only partially inside the extraction bounds are included in the extracted text.


UseActualTextIfAvailable

Gets or sets a flag indicating whether the text extraction process should use the text included in the /ActualText entry applied to the current showText operator.

public bool UseActualTextIfAvailable { get; set; }

Property Value

Boolean
True if the text extraction process should ignore the glyph values and font encoding and instead use the text included in the /ActualText entry applied to the current showText operator.


VisualExtractionBounds

Gets or sets the bounds (in visual coordinates) for text extraction.

public PDFDisplayRectangle VisualExtractionBounds { get; set; }

Property Value

PDFDisplayRectangle
A rectangle in visual coordinates that specifies the area on the page from which the text should be extracted.
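
The properties documented above might be combined as in the following sketch, which configures a context for plain-text extraction. It uses only the members listed on this page; the class and method names are hypothetical, and the chosen values are illustrative rather than recommended settings.

```csharp
using O2S.Components.PDF4NET.Content;

public static class ContextConfigurationSketch   // hypothetical helper class
{
    public static PDFTextExtractionContext CreatePlainTextContext()
    {
        PDFTextExtractionContext context = new PDFTextExtractionContext();

        // Load only the text; skip positions, font info and colors.
        context.EnableExtendedInformation = false;

        // Keep characters that lie only partially inside the extraction bounds.
        context.IncludePartialMatches = true;

        // Prefer the /ActualText entry over glyph values and font encoding when present.
        context.UseActualTextIfAvailable = true;

        return context;
    }
}
```

The resulting context would then be handed to an extraction method of PDFContentExtractor, such as the ExtractTextRuns overloads named in the remarks for EnableExtendedInformation. The exact overload that accepts this context type is not confirmed by this page.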