An insider look into the technical hurdles and solutions faced during the development of Booklet App.
In this article, I will discuss the product and technical challenges encountered during development. The focus is on challenges unique to this product rather than general issues that arise in any development project.
For those unfamiliar with Booklet, it is a free web and mobile app I independently developed to help users practice languages while reading books. Users can upload their own ebooks or choose from a curated library and begin reading, complemented by personalized exercises tailored to their learning needs. If you want to learn more, sign up or read everything about the product in Booklet Product Requirements.
Latency required to create exercises
At one point, Booklet required 1-2 seconds to generate exercises for each paragraph, with this latency shared across all users. As a result, individual user experiences were frequently disrupted by the processing demands of others.
Following optimizations, latency was reduced to a single 0.2-second load per chapter (groups of roughly 10 paragraphs) per user, plus an additional 0.2 seconds shared across all users.
Quantifying the improvement: the app now requires a single 0.2-second load per chapter (i.e. every 5-10 minutes of reading) per user, compared to roughly 20 loads of 1-2 seconds each per chapter (averaging 1.5 × 20 = 30 seconds per chapter). This equates to a reduction from 30 seconds to 0.2 seconds per chapter, or a 150x improvement in latency.
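As a back-of-the-envelope check, the numbers above can be verified directly (the load counts and durations are taken from the text; everything else is arithmetic):

```python
# Sanity check of the latency improvement described above.
# Before: ~20 exercise loads per chapter, averaging 1.5 s each.
# After: a single 0.2 s load per chapter.

loads_before = 20
avg_load_s = 1.5
before_per_chapter = loads_before * avg_load_s  # 30.0 seconds per chapter

after_per_chapter = 0.2  # one load per chapter

speedup = before_per_chapter / after_per_chapter
print(f"before: {before_per_chapter:.0f} s/chapter, "
      f"after: {after_per_chapter} s/chapter, speedup: {speedup:.0f}x")
# prints "before: 30 s/chapter, after: 0.2 s/chapter, speedup: 150x"
```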
Problem
Preparing a book with personalized exercises involves several steps:
Processing the ebook and transforming it into a format suitable for further use.
Performing logical analysis on each word or sentence.
Identifying words for the user to learn at specific points.
Reformatting the data for the client to recreate the booklet's user experience.
These processes can be computationally intensive. Fortunately, steps such as logical analysis and word identification can be performed in advance, so that personalized exercises can be created based on the user's current knowledge of the language.
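The four steps above can be sketched as a simple pipeline. This is an illustrative outline only; none of the function names below come from Booklet's actual codebase:

```python
# Hypothetical sketch of the book-preparation pipeline described above.

def parse_ebook(raw: bytes) -> list[str]:
    """Step 1: turn the ebook into a list of plain-text paragraphs."""
    return [p for p in raw.decode("utf-8").split("\n\n") if p.strip()]

def analyze(paragraph: str) -> list[dict]:
    """Step 2: logical analysis of each word (lemma, part of speech, ...)."""
    return [{"word": w, "lemma": w.lower()} for w in paragraph.split()]

def pick_words_to_learn(tokens: list[dict], known: set[str]) -> list[dict]:
    """Step 3: identify words the user should learn at this point."""
    return [t for t in tokens if t["lemma"] not in known]

def format_for_client(paragraph: str, targets: list[dict]) -> dict:
    """Step 4: reshape the data so the client can render the exercises."""
    return {"text": paragraph, "exercises": [t["lemma"] for t in targets]}

known_words = {"the", "a", "is"}
book = b"The cat sleeps.\n\nA dog barks."
payload = [
    format_for_client(p, pick_words_to_learn(analyze(p), known_words))
    for p in parse_ebook(book)
]
print(payload[0]["exercises"])  # ['cat', 'sleeps.']
```

The key property is that steps 1-3 are user-independent up to the `known` set, which is what makes precomputation possible.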
Solution
Use a ‘smaller' Large Language Model (LLM) for logical analysis and parallelize word analysis.
Let the database do the heavy lifting so the server has less work to handle, as explained in the following points.
Analyse entire chapters at once, concentrating the computation at the beginning of the chapter in a single SQL query. This way, once the chapter is processed, no further computation is required, avoiding performance slowdowns for all users.
Offload the creation of personalised exercises to SQL so that no post-processing is required from the server.
Retrieve only the necessary elements from the database and format them for the client, avoiding unnecessary loops to reformat each word.
Delegate as much post-processing as possible to the client to avoid sharing latency between users.
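The core idea behind these points — do the expensive, user-independent analysis once per chapter and keep only cheap per-user personalization at read time — can be sketched as follows (cache and function names are hypothetical, and a real implementation would live in SQL rather than an in-process dict):

```python
# Illustrative sketch: compute a chapter's analysis once, share it across
# users, and keep only the cheap per-user personalization at read time.

chapter_cache: dict[int, list[dict]] = {}  # shared across all users

def analyze_chapter(chapter_id: int, paragraphs: list[str]) -> list[dict]:
    """Expensive, user-independent work: done once per chapter."""
    if chapter_id not in chapter_cache:
        chapter_cache[chapter_id] = [
            {"text": p, "lemmas": [w.lower().strip(".,") for w in p.split()]}
            for p in paragraphs
        ]
    return chapter_cache[chapter_id]

def personalize(analysis: list[dict], known: set[str]) -> list[dict]:
    """Cheap, per-user work: select unknown words for exercises."""
    return [
        {"text": a["text"], "learn": [l for l in a["lemmas"] if l not in known]}
        for a in analysis
    ]

shared = analyze_chapter(1, ["The cat sleeps.", "A dog barks."])
user_view = personalize(shared, known={"the", "a"})
print(user_view[0]["learn"])  # ['cat', 'sleeps']
```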
There are many other ways to further reduce latency, which will be explored if traffic on the app increases and more throughput is needed. A few examples:
Schedule cron jobs to process books during off-peak hours, minimizing impact on user experience.
Utilize high-performance databases to enhance data retrieval speeds.
Leverage database replicas or parallel processing strategies to distribute the load and accelerate computation.
The latency to analyse every word of a chapter and create personalized exercises for the 10 paragraphs of the chapter is about 0.1-0.3 seconds.
Support for multiple languages
Users can learn French, Spanish, Italian, and German, with possibly more languages in the future.
What does it involve? Many, many, many development challenges.
Problem
For each language combination (e.g. the user sets English as the system language and wants to learn German), a dictionary of at least 20,000 words is used to provide the user with a good variety of words to learn. With 5 supported languages (the four mentioned plus English), there are 20 directed language pairs. At 20,000-40,000 words per pair, this means 400-800k translations across the supported languages.
Each of the 400k+ translations requires an example to help users practise. For example, if a user is learning Italian, for the word house→casa, the app provides an example such as "The house has 3 floors" → "La casa ha 3 piani". This adds significant complexity to the content creation process.
A language level is assigned to around 5-10k words per language pair to create personalized exercises according to the user’s selected proficiency level.
The application needs to fully support different system languages. Currently the user can only select English as the system language, but all languages must be supported.
Numerous additional considerations are needed for each supported language, such as formatting, articles before words, context, and cultural relevance.
Solution
Abstract the concepts of words, translations, and user knowledge to handle any combination of languages efficiently.
Treat translations as bidirectional (e.g., English → French is equivalent to French → English), eliminating redundancy and potential user confusion. For instance, if "father" → "Vater" and "dad" → "Vater," only one translation should appear in the exercises to avoid inconsistencies.
Automation, automation, automation. Generate dictionaries for any language, assign language levels, create example sentences, and handle system language localization automatically to streamline the process and reduce manual work.
Limit language-specific rules as much as possible to reduce the complexity.
Users can specify the language they want to learn and their language competence during app onboarding or in the settings.
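The bidirectional-translation point can be illustrated with a canonical key: store each pair under a deterministic ordering of its two sides so that English→French and French→English resolve to the same record. The schema below is hypothetical:

```python
# Sketch of treating translations as bidirectional: store each pair under a
# canonical key so both directions share one record.

def canonical_key(lang_a: str, word_a: str, lang_b: str, word_b: str) -> tuple:
    """Order the two sides deterministically so both directions collide."""
    return tuple(sorted([(lang_a, word_a), (lang_b, word_b)]))

translations: dict[tuple, dict] = {}

def add_translation(lang_a, word_a, lang_b, word_b, example=None):
    key = canonical_key(lang_a, word_a, lang_b, word_b)
    translations.setdefault(key, {"example": example})

add_translation("en", "house", "it", "casa",
                example=("The house has 3 floors", "La casa ha 3 piani"))
add_translation("it", "casa", "en", "house")  # same pair, other direction

print(len(translations))  # 1 — both directions map to one record
```

Deduplicating near-synonyms ("father"/"dad" → "Vater") is a separate, harder step; the canonical key only removes directional redundancy.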
Processing user-uploaded books
Booklet allows users to upload ebooks, which are then processed and divided into chapters and sub-chapters containing around 10 paragraphs of 50-300 characters each.
This process is more complex than it might initially sound.
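A minimal version of the chunking just described might look like this. The thresholds mirror the text (paragraphs of 50-300 characters, sub-chapters of roughly 10 paragraphs); the splitting logic itself is illustrative, not Booklet's actual implementation:

```python
# Sketch of chunking: split over-long paragraphs at sentence boundaries,
# then group the result into sub-chapters of roughly 10 paragraphs.

def split_long(paragraph: str, max_len: int = 300) -> list[str]:
    """Break a paragraph at sentence boundaries until pieces fit max_len."""
    pieces, current = [], ""
    for sentence in paragraph.replace("? ", "?|").replace(". ", ".|").split("|"):
        if current and len(current) + len(sentence) + 1 > max_len:
            pieces.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        pieces.append(current)
    return pieces

def to_subchapters(paragraphs: list[str], size: int = 10) -> list[list[str]]:
    flat = [piece for p in paragraphs for piece in split_long(p)]
    return [flat[i:i + size] for i in range(0, len(flat), size)]

chapter = ["Short paragraph."] * 23
subs = to_subchapters(chapter)
print([len(s) for s in subs])  # [10, 10, 3]
```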
Problem
Ebooks, in epub or any other format, often suffer from inconsistent and poor formatting, making them difficult to post-process and adapt to the required structure mentioned above. Specifically:
Chapters are frequently poorly marked; oftentimes they are not marked at all.
Many books contain ‘trash’ at the beginning and end, such as hidden HTML pages, blank pages, dedications, and acknowledgments. This garbage needs to be removed for a proper user experience.
Chapters can include a wide variety of formats and structures, such as quotes, poetry, and unconventional indentations.
Solution
Chapter identification often relies on a "guessing strategy," which carries a high risk of errors. However, it has been observed that it’s better to make an incorrect guess than fail to identify anything at all.
A navigation system is in place to allow users to view the entire book. This way, even when chapters are poorly formatted, users can still navigate to any part of the book, including the first chapter or any specific section.
Text with an unconventional format is reformatted in a uniform way to harmonize the user experience. This can lead to sub-optimal results, such as compressing a poem into a paragraph.
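One way to implement such a "guessing strategy" for chapter boundaries is to scan for short lines matching common heading patterns. The regexes below are a hypothetical illustration, not Booklet's actual rules:

```python
# Sketch of guessing chapter boundaries when an ebook has no usable table
# of contents: look for short lines that resemble chapter headings.
import re

HEADING = re.compile(
    r"^\s*(chapter|chapitre|capitolo|kapitel|cap[ií]tulo)\s+[\divxlc]+\b",
    re.IGNORECASE,
)

def guess_chapter_starts(lines: list[str]) -> list[int]:
    """Return indices of lines that look like chapter headings."""
    return [
        i for i, line in enumerate(lines)
        if len(line) < 60 and HEADING.search(line)
    ]

book = ["Front matter...", "Chapter 1", "Once upon a time...",
        "Chapter 2", "The plot thickens..."]
print(guess_chapter_starts(book))  # [1, 3]
```

This approach will sometimes guess wrong, which is exactly why the fallback navigation system matters: an incorrect boundary is recoverable, a missing one is not.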
Book post-processing is an open chapter with a high margin for improvement.
Personalized language learning
Each exercise in the app is customized to the user’s proficiency level and their knowledge of individual words.
To select the next words for the user to learn, Booklet leverages an algorithm modeled on the principles of human memory, promoting spaced repetition.
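Spaced repetition in its simplest form grows the review interval on each correct answer and resets it on a mistake. Booklet's actual algorithm is not detailed here; the sketch below uses illustrative intervals in the spirit of SM-2-style schedulers:

```python
# Minimal sketch of spaced-repetition scheduling: intervals grow on success
# and reset on failure. The growth factor (2.5) is illustrative.
from datetime import date, timedelta

def next_review(interval_days: int, correct: bool) -> int:
    """Grow the interval on success, reset it on failure."""
    if not correct:
        return 1
    return max(1, round(interval_days * 2.5))

interval = 1
schedule = []
today = date(2024, 1, 1)
for correct in [True, True, False, True]:
    interval = next_review(interval, correct)
    today += timedelta(days=interval)
    schedule.append((today.isoformat(), interval))

print(schedule)
```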
Problem
The language level of every word encountered in the exercises, both within the book and externally, must be tracked.
Generating exercises based on the user’s current knowledge requires real-time processing. Very few steps can be precomputed before a user session.
Solution
Develop an indexed, easily searchable database to efficiently track word knowledge and user progress.
Implement optimized queries to dynamically generate exercises based on the user’s knowledge in real time.
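The indexed-lookup idea can be illustrated with SQLite: a composite index on (user, due date) lets a single query return the words due for review, ready to feed into exercise generation. The schema below is hypothetical:

```python
# Sketch of an indexed word-knowledge table: one indexed query returns the
# words due for review for a given user.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE word_knowledge (
        user_id INTEGER,
        lemma   TEXT,
        due_day INTEGER    -- day number when the word is next due
    )
""")
db.execute("CREATE INDEX idx_due ON word_knowledge (user_id, due_day)")
db.executemany(
    "INSERT INTO word_knowledge VALUES (?, ?, ?)",
    [(1, "casa", 10), (1, "piano", 12), (1, "gatto", 9), (2, "haus", 8)],
)

today = 10
due = db.execute(
    "SELECT lemma FROM word_knowledge WHERE user_id = ? AND due_day <= ? "
    "ORDER BY due_day",
    (1, today),
).fetchall()
print([lemma for (lemma,) in due])  # ['gatto', 'casa']
```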
Conclusion
It was as challenging as it was rewarding to come across and solve these and many other challenges during the development of Booklet.
What matters most is to identify which challenges are road blockers and which are not, to understand whether they truly need to be solved or can be removed entirely, and to find shortcuts to solve them. And then, with a lot of patience, to implement those shortcuts.