BookNet: Revolutionizing Dual-Page Image Rectification for Bound Documents (2026)

BookNet Achieves Dual-Page Image Rectification, Modelling Complex Binding Distortions Effectively

The challenge of rectifying distorted book images, in which the left and right pages curve in different ways because of binding constraints, has been tackled by Shaokai Liu of Hefei University of Technology and colleagues. They introduce BookNet, a deep learning framework designed specifically for dual-page book image rectification. BookNet's key strength is its ability to capture the geometric relationships between adjacent pages. Its dual-branch architecture with cross-page attention models how each page's deformation influences the other. The researchers also built two datasets. Book3D is a large synthetic training set rendered in Blender, using arXiv academic papers as page content. Book100 is a benchmark of real-world book images used for evaluation. Together, they address the lack of resources for book-specific rectification research and provide a standardized benchmark for comparison.

BookNet predicts three complementary flow fields: a left-page flow, a right-page flow, and a full-spread flow. Together they capture both page-specific deformations and the interactions between the two pages. This multi-flow design targets a known weakness of conventional single-flow methods, which struggle with asymmetric distortions. By addressing the specific challenges of bound documents, the work could support the digitization of cultural heritage materials, knowledge management systems, and multimodal understanding of book content.
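In many dense-flow rectification methods, a flow field is a backward map. For every pixel of the flat output, it stores the (x, y) position to sample from in the distorted photo. The minimal NumPy sketch below assumes that convention. It also assumes that the left and right flows each cover one half of the output spread. The helper names (`identity_flow`, `stitch_page_flows`, `remap_bilinear`) are hypothetical and are not taken from the BookNet code. The sketch shows how per-page flows could be joined into one spread-wide flow and applied with bilinear sampling. How BookNet itself combines its three flows at inference is not described in this summary.

```python
import numpy as np


def identity_flow(h, w):
    """Backward map that samples every output pixel from the same location."""
    xs, ys = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    return np.stack([xs, ys], axis=-1)  # (h, w, 2) holding (x, y)


def stitch_page_flows(left_flow, right_flow):
    """Join per-page flows (H, W/2, 2) side by side into one spread flow (H, W, 2)."""
    return np.concatenate([left_flow, right_flow], axis=1)


def remap_bilinear(image, flow):
    """Bilinearly sample `image` (H_in, W_in, C) at the source coordinates in `flow`.

    `flow[i, j] = (x, y)` is where output pixel (i, j) is read from in the
    distorted input. Coordinates are clamped to the image border.
    """
    h_in, w_in = image.shape[:2]
    x = np.clip(flow[..., 0], 0, w_in - 1)
    y = np.clip(flow[..., 1], 0, h_in - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w_in - 1)
    y1 = np.minimum(y0 + 1, h_in - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Usage: split an identity flow into two "page" halves, stitch, and resample.
img = np.random.default_rng(1).random((8, 10, 3))
full = identity_flow(8, 10)
stitched = stitch_page_flows(full[:, :5], full[:, 5:])
rectified = remap_bilinear(img, stitched)
```

In a real pipeline, the flows would come from the network rather than from `identity_flow`. If OpenCV is available, `cv2.remap` performs the same bilinear lookup.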

The Book3D generation pipeline uses Blender for 3D book modeling. Parameterized deformation controls simulate realistic geometric distortions. Pages are textured with diverse arXiv academic papers and rendered under varied illumination conditions and viewing angles to increase realism. Each synthetic book image is paired with the corresponding flat arXiv paper images as ground truth, which provides labeled data for supervised learning. Book100, a benchmark of 100 real-world book images, was then constructed to compare BookNet against state-of-the-art methods.
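The summary does not give Book3D's actual deformation model. As a hypothetical illustration of what "parameterized deformation controls" can mean, the NumPy sketch below builds a height-field mesh for an open spread. Each page rises from the spine along an exponential profile with its own amplitude and sharpness, so the two pages can curl asymmetrically. The profile, the parameter ranges, and the function names are all assumptions. A real Blender pipeline would also need to preserve page arc length and to sample textures, lighting, and camera poses.

```python
import numpy as np


def curl_height(d, amplitude, sharpness):
    # Hypothetical profile: height at distance d (>= 0) from the spine.
    # It is 0 at the gutter and rises toward `amplitude`. A smaller
    # `sharpness` concentrates the curl closer to the binding.
    return amplitude * (1.0 - np.exp(-d / sharpness))


def dual_page_mesh(page_w, page_h, nx, ny, left, right):
    """Return vertices of shape (ny, 2*nx, 3) for an open two-page spread.

    `left` and `right` are (amplitude, sharpness) tuples, so each page can
    curl differently (asymmetric distortion). The spine lies at x = 0.
    """
    x = np.concatenate([np.linspace(-page_w, 0.0, nx),
                        np.linspace(0.0, page_w, nx)])
    y = np.linspace(0.0, page_h, ny)
    X, Y = np.meshgrid(x, y)  # both (ny, 2*nx)
    d = np.abs(X)             # distance from the spine
    Z = np.where(X < 0, curl_height(d, *left), curl_height(d, *right))
    return np.stack([X, Y, Z], axis=-1)


def sample_spread_params(rng):
    # Independent draws per page. The ranges are arbitrary illustrative values.
    left = (rng.uniform(0.5, 3.0), rng.uniform(0.5, 4.0))
    right = (rng.uniform(0.5, 3.0), rng.uniform(0.5, 4.0))
    return left, right


rng = np.random.default_rng(0)
left, right = sample_spread_params(rng)
# A4-sized pages in centimetres, 32 x 48 vertices per page.
mesh = dual_page_mesh(21.0, 29.7, nx=32, ny=48, left=left, right=right)
print(mesh.shape)  # (48, 64, 3)
```

Rendering such a mesh with a flat page texture, then recording where each texture coordinate lands in the image, is one way to obtain paired ground truth of the kind described above.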

BookNet's dual-branch architecture uses cross-page attention to exchange information between the two branches. This refines the estimated warping flows and improves rectification accuracy. Extensive experiments show that BookNet outperforms existing methods on book image rectification across multiple metrics.

The network was trained for 65 epochs with a batch size of 4 per GPU on four NVIDIA RTX 3090 GPUs. It used the AdamW optimizer with a maximum learning rate of 1 × 10⁻⁴ and a weight decay of 1 × 10⁻⁵. Input images were resized to 288 × 288 pixels, and HSV color jittering was applied to improve robustness to varying illumination.
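The summary does not specify how BookNet's cross-page attention is built. As a generic illustration only, the sketch below implements single-head scaled dot-product cross-attention. Left-page feature tokens query right-page tokens and vice versa, and each direction is followed by a residual connection. Sharing one set of projection matrices across both directions, the token counts, and the feature width are all assumptions made for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, wq, wk, wv):
    """Tokens in `queries` (N, D) attend over tokens in `context` (M, D)."""
    q, k, v = queries @ wq, context @ wk, context @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) similarity matrix
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v                       # (N, D)


def cross_page_exchange(left_tokens, right_tokens, wq, wk, wv):
    """Each page reads from the other page; residuals keep its own features."""
    left_out = left_tokens + cross_attention(left_tokens, right_tokens, wq, wk, wv)
    right_out = right_tokens + cross_attention(right_tokens, left_tokens, wq, wk, wv)
    return left_out, right_out


rng = np.random.default_rng(0)
d = 16
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
left = rng.standard_normal((12, d))   # e.g. a 3x4 left-page feature map, flattened
right = rng.standard_normal((12, d))
left_out, right_out = cross_page_exchange(left, right, wq, wk, wv)
print(left_out.shape, right_out.shape)  # (12, 16) (12, 16)
```

Keeping a residual path means each branch retains its own page's features while adding context from the opposite page. That context is the kind of information exchange the paragraph above describes.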

Ablation studies confirmed the contribution of each architectural component. They showed that jointly supervising all three warping flows (left page, right page, and full spread) gave the best results. Compared with page-only supervision, joint supervision reduced Local Distortion by 14.0% and Edit Distance by 33.3%. Qualitative comparisons show that BookNet maintains geometric consistency across the entire spread, particularly in the challenging gutter region. Existing methods, by contrast, often produce misalignments or residual curvature. With 30.1 million parameters, BookNet runs at 24.39 FPS on a single NVIDIA RTX 3090 GPU, an inference speed suited to practical applications.
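In document-rectification benchmarks, Edit Distance is commonly computed as the character-level Levenshtein distance between OCR output for the rectified image and OCR output for the ground truth. The summary does not say exactly how BookNet's evaluation normalizes or aggregates it, so treat that as an assumption. The sketch below computes Levenshtein distance with the standard dynamic-programming recurrence. It also computes the relative reduction used to express figures like the percentages quoted above. The example inputs are illustrative, not values from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop from `baseline` to `improved` (0.25 means 25% lower)."""
    return (baseline - improved) / baseline


print(levenshtein("kitten", "sitting"))  # 3
print(relative_reduction(10.0, 8.0))     # 0.2
```

The two-row formulation keeps memory proportional to the length of the shorter comparison string rather than the full table. This matters when comparing page-length OCR outputs.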

The authors acknowledge that the method currently focuses on dual-page spreads and does not yet handle more complex book structures or severely damaged pages. Future work could extend BookNet to multi-page volumes and develop techniques for more severe distortions or missing content. Nevertheless, the work establishes a strong baseline for the field and releases datasets and code to support further advances in document image processing and multimodal understanding.

For more information, see the arXiv paper: https://arxiv.org/abs/2601.21938

Author: Kieth Sipes