Conference papers

Daniel at the FinSBD-2 task : Extracting Lists and Sentences from PDF Documents: a model-driven end-to-end approach to PDF document analysis

Abstract : In this paper, we present the method we designed and implemented for identifying lists and sentences in PDF documents while participating in the FinSBD-2 Financial Document Analysis Shared Task. We propose a model-driven approach for the French and English datasets. It relies on a top-down process that starts from the PDF itself in order to keep control of the workflow. Our objective is to use PDF structure extraction to improve text segment boundary detection in an end-to-end fashion.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-03097523
Contributor : Giguet Emmanuel
Submitted on : Tuesday, January 5, 2021 - 1:35:37 PM
Last modification on : Wednesday, January 20, 2021 - 3:38:00 AM
Long-term archiving on : Wednesday, April 7, 2021 - 9:25:58 AM

File

2020.finnlp-1.11.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-03097523, version 1

Citation

Emmanuel Giguet, Gaël Lejeune. Daniel at the FinSBD-2 task : Extracting Lists and Sentences from PDF Documents: a model-driven end-to-end approach to PDF document analysis. Second Workshop on Financial Technology and Natural Language Processing in conjunction with IJCAI-PRICAI 2020, Jan 2021, Kyoto, Japan. pp.67-74. ⟨hal-03097523⟩
