Conference paper. Year: 2022

A comparative study of information extraction strategies using an attention-based neural network

Abstract

This article focuses on information extraction in historical handwritten marriage records. Traditional approaches rely on a sequential pipeline of two consecutive tasks: handwriting recognition is applied first, followed by named entity recognition. More recently, joint approaches that handle both tasks at the same time have been investigated, yielding state-of-the-art results. However, because these approaches have been evaluated under different experimental conditions, they have not yet been fairly compared. In this work, we conduct a comparative study of sequential and joint approaches based on the same attention-based architecture, in order to quantify the gain that can be attributed to the joint learning strategy. We also investigate three new joint learning configurations based on multi-task or multi-scale learning. Our study shows that relying on a joint learning strategy can lead to an 8% increase in the complete recognition score. We also highlight the value of multi-task learning and demonstrate the benefit of attention-based networks for information extraction. Our work achieves state-of-the-art performance at line level in the ICDAR 2017 Information Extraction competition on the Esposalles database, without any language modelling or post-processing.
Main file
DAS2022_comparative_study_IE.pdf (1.41 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03677908, version 1 (25-05-2022)

Identifiers

  • HAL Id: hal-03677908, version 1

Cite

Solène Tarride, Aurélie Lemaitre, Bertrand B. Coüasnon, Sophie Tardivel. A comparative study of information extraction strategies using an attention-based neural network. 15th IAPR International Workshop on Document Analysis Systems, May 2022, La Rochelle, France. ⟨hal-03677908⟩
46 views
188 downloads
