pdfHandler module

class pdfHandler.CharData(character, bb0, bb1, bb2, bb3)

Bases: object

Hold the properties of a single character extracted from the PDF: the character itself and the four values bb0 to bb3 (presumably the coordinates of its bounding box).
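As an illustration only, a record of this shape could be sketched as follows. The class name CharDataSketch, its attribute names, and the reading of bb0 to bb3 as (x0, y0, x1, y1) are assumptions made for the example; this reference does not show the fields of the real class.

```python
class CharDataSketch:
    """Hypothetical stand-in for pdfHandler.CharData (not the real class)."""

    def __init__(self, character, bb0, bb1, bb2, bb3):
        self.character = character
        # Assumption: bb0..bb3 form a bounding box ordered (x0, y0, x1, y1).
        self.bbox = (bb0, bb1, bb2, bb3)

    def width(self):
        # Horizontal extent of the box under the assumed ordering.
        return self.bbox[2] - self.bbox[0]


c = CharDataSketch("A", 72.0, 700.0, 80.5, 712.0)
print(c.character, c.bbox, c.width())
```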

class pdfHandler.PageData

Bases: object

Hold the data of a single page, including its sentences and characters.

addChar(c)

Add character data to the page's list of characters.

Parameters

c – The character data to add.

Returns

None

addSentence(s)

Add sentence data to the page's list of sentences.

Parameters

s – The sentence data to add.

Returns

None
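The container behaviour described for PageData can be sketched as below. PageDataSketch and its attribute names chars and sentences are hypothetical; only the method names addChar and addSentence, and their None return value, come from this reference.

```python
class PageDataSketch:
    """Hypothetical stand-in for pdfHandler.PageData (not the real class)."""

    def __init__(self):
        # Assumed attribute names; the real ones are not documented here.
        self.chars = []
        self.sentences = []

    def addChar(self, c):
        # Appends and implicitly returns None, as documented.
        self.chars.append(c)

    def addSentence(self, s):
        self.sentences.append(s)


page = PageDataSketch()
for ch in "Hi.":
    page.addChar(ch)
page.addSentence("Hi.")
print(len(page.chars), page.sentences)
```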

class pdfHandler.PdfHandler(pdfPath)

Bases: object

Handle the data of an entire PDF file.

generateHighlightedPdf()

Generate a highlighted PDF in which each sentence is highlighted in its assigned color and annotated with its annotation text.

Returns

None

getSentence()

Get all sentences in the PDF.

Returns

All sentences.

makeSentence()

Build sentences from the extracted characters.

Returns

None
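How characters are grouped into sentences is not described in this reference. One plausible approach, shown purely as a sketch, is to accumulate characters until a terminator is seen and keep each character's box alongside the text. The function make_sentences, the terminator set, and the (character, bbox) tuple layout are all assumptions, not the module's actual logic.

```python
def make_sentences(chars, terminators=".!?"):
    """Group (character, bbox) pairs into (text, rects) sentences.

    Assumption: a sentence ends at '.', '!' or '?'. The real splitting
    rule used by makeSentence() is not documented.
    """
    sentences = []
    text, rects = "", []
    for character, bbox in chars:
        text += character
        rects.append(bbox)
        if character in terminators:
            sentences.append((text.strip(), rects))
            text, rects = "", []
    if text.strip():  # keep trailing text that lacks a terminator
        sentences.append((text.strip(), rects))
    return sentences


# Fake input: each character gets a made-up 8x12 box on one line.
chars = [(ch, (10.0 * i, 0.0, 10.0 * i + 8.0, 12.0))
         for i, ch in enumerate("Hi. Ok")]
result = make_sentences(chars)
print([text for text, _ in result])
```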

textExtracWithCoord()

Extract every character from the PDF, together with its coordinates.

Returns

None

class pdfHandler.SentenceData(sentence, rectList, pageNum)

Bases: object

Hold a sentence, the list of rectangles that cover it, and the number of the page it appears on.

setAnnotation(annotText)

Set the annotation text for the sentence. The text is attached to the sentence as an annotation in the PDF.

Parameters

annotText – The annotation text.

Returns

None

setColor(color)

Set the annotation color for the sentence.

Parameters

color – The annotation color.

Returns

None

setRectList(offset)

Adjust the sentence's rectangles to the PDF's original coordinate system.

Parameters

offset – The offsets from the original coordinates.

Returns

None
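The three setters can be pictured with the following sketch. SentenceDataSketch is a hypothetical stand-in; the attribute names, the RGB tuple used for the color, and the interpretation of offset as a (dx, dy) pair added to every rectangle are assumptions, since this reference does not specify the types involved.

```python
class SentenceDataSketch:
    """Hypothetical stand-in for pdfHandler.SentenceData (not the real class)."""

    def __init__(self, sentence, rectList, pageNum):
        self.sentence = sentence
        self.rectList = list(rectList)
        self.pageNum = pageNum
        self.annotText = None
        self.color = None

    def setAnnotation(self, annotText):
        self.annotText = annotText

    def setColor(self, color):
        self.color = color

    def setRectList(self, offset):
        # Assumption: offset is (dx, dy) and rectangles are (x0, y0, x1, y1).
        dx, dy = offset
        self.rectList = [(x0 + dx, y0 + dy, x1 + dx, y1 + dy)
                         for (x0, y0, x1, y1) in self.rectList]


s = SentenceDataSketch("Hi.", [(10.0, 20.0, 30.0, 40.0)], 0)
s.setColor((1.0, 1.0, 0.0))   # assumed RGB tuple
s.setAnnotation("greeting")
s.setRectList((5.0, -2.0))
print(s.color, s.annotText, s.rectList)
```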