ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
Info
This competition concluded on April 3rd, 2023 23:59 UTC+1:00. We would like to thank all participants for their contributions to the competition! Check the results on our EvalAI challenge page.
Introduction
In this competition, we challenge you to advance research on accurately segmenting document layout across a very broad range of styles and domains.
Converting documents into a machine-processable format is an ongoing challenge due to their huge variability in formats and their complex structure. Recovering the layout structure and content from documents has remained a key problem for decades, and is as relevant as ever in 2023. To date, a highly generalising model for structure and layout understanding has yet to be achieved.
To raise the bar over previous competitions, we propose our newly published, human-annotated DocLayNet dataset as the base for a new challenge on various documents from the corporate, technical and legal domains.
News
Apr. 26th, 2023 | The ground truth of our competition dataset is now available here. |
Apr. 3rd, 2023 | The competition has concluded. Check the results here. |
Mar. 27th, 2023 | The competition deadline has been extended until April 3rd, 2023 23:59 UTC+1:00. |
Mar. 20th, 2023 | The competition deadline has been extended until March 26th, 2023 12:00 AM (midnight). |
Jan. 13th, 2023 | The competition dataset is now released and submissions are open on EvalAI. Register here. |
Dec. 19th, 2022 | This competition is now live and will run until March 20th, 2023. Find the detailed schedule here. |
Task and resources
We invite you to develop a model that can accurately segment the layout components in document pages as bounding boxes on our competition dataset. The layout prediction accuracy you achieve with your solution will be evaluated on this dataset against our human-annotated layout ground truth. Code submissions are not required. Find the details here.
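This page does not spell out the evaluation metric, so the following is only an illustrative sketch of how a predicted bounding box is commonly compared with a ground-truth box: intersection over union (IoU), a typical building block of box-based accuracy measures. The `(x, y, width, height)` box convention and the function name `iou` are assumptions made for this example, not part of the competition specification.

```python
from typing import Tuple

# Assumed convention for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    ax0, ay0, aw, ah = a
    bx0, by0, bw, bh = b
    ax1, ay1 = ax0 + aw, ay0 + ah
    bx1, by1 = bx0 + bw, by0 + bh

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (10.0, 10.0, 100.0, 50.0)     # hypothetical model output
    ground_truth = (20.0, 10.0, 100.0, 50.0)  # hypothetical annotation
    print(f"IoU = {iou(predicted, ground_truth):.3f}")
```

A prediction is usually counted as matching a ground-truth box of the same class when its IoU exceeds a chosen threshold; refer to the linked competition details for the metric actually used.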
We highly recommend using our recently published DocLayNet dataset for training and internal validation. DocLayNet is highly diverse in layout coverage, and includes financial reports, patents, manuals, laws, tenders and technical papers. It is human-annotated with 11 distinct layout class labels. The original PDF pages and a paired JSON representation of the text cells are also provided, which adds value for model development.
Participation
Everyone is welcome to participate in this competition. To ensure fairness, we require teams to abide by the participation rules.