Story Book Converter

Volume 8, Issue 10, October 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
Story Book Converter

Dinoja. N.1; Asher. S. A.2; Siribaddana. S. G.3; Rajapaksha. D. S. D.4
Faculty of Computing , Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka
Abstract:- This research paper introduces a Web App technologies, the converter aims to provide an immersive
Story Book Converter that incorporates four machine and engaging reading experience for children while
learning models: text summarization, text-to-audio promoting their comprehension and language skills.
narration with background music, image generation,
and keyword extraction. These models are seamlessly The first model, text summarization, condenses lengthy
integrated into the app's back-end and front-end storybook texts into concise summaries, enabling young
architecture, aiming to enhance children's reading readers to grasp them a in plot and themes more easily. This
abilities and foster a love for reading. The text feature simplifies complex narratives, making them more
summarization model provides concise and captivating accessible and captivating for children of various reading
summaries of stories, aiding comprehension, and levels.
retention. The text-to-audio narration model converts
story texts into engaging audio narratives with carefully The second model transforms the text into an audio
curated background music, creating an immersive narration, enhancing the reading experience with expressive
storytelling experience. The image generation model voices and engaging sound effects. Background music
produces visual representations corresponding to the tailored to the story's genre further stimulates children's
story, stimulating children's imagination, and bringing imagination and emotional connection to the narrative,
the narrative to life. The keyword extraction model making reading a multisensory experience.
identifies and extracts main characters, enabling To further enrich the storybook experience, the third
children to understand story structures and key model generates captivating images that correspond to the
elements. Through a user-friendly interface, this app text. These visuals provide visual cues and reinforce the
promotes reading comprehension, critical thinking, and story's context, helping children visualize the characters,
creativity. The research showcases the effectiveness of settings, and events described in the book.
integrating machine learning models into a story book
converter, demonstrating the potential for technology to The final model, keyword extraction, identifies the
enhance traditional reading experiences and cultivate a story's main characters, enabling children to better
lifelong love for literature among children. understand and connect with them. By highlighting the key
protagonists, this feature encourages children to analyze
Keywords:- Machine Learning Models, Text Summarization, character development and empathize with their struggles
Text-to-Audio Narration, Image Generation, Keyword and triumphs.
Extraction, Immersive Storytelling, Visual Representations.
Through seamless integration with both the back end
I. INTRODUCTION and front end, the Web App Story Book Converter
The Web App Story Book Converter is a empowers children to enhance their reading abilities in an
groundbreaking tool designed to enhance children's reading enjoyable and interactive manner. This research paper
abilities through the integration of four powerful machine explores the development, training, and implementation of
learning models. These models include text summarization, these four machines learning models, providing valuable
text-to-audio narration with background music, image insights into the potential impact of technology on children's
generation, and keyword extraction. By leveraging these literacy and learning experiences.
Fig. 1: Outcome of the web application
IJISRT23OCT907 www.ijisrt.com 2333

ISSN No:-2456-2165
II. LITERATURE REVIEW conditional image generation and style transfer to generate
illustrations that accompany textual narratives [5]. This
The integration of machine learning models into integration of visual element saids in imagination, context
educational applications has gained significant attention in comprehension, and emotional connection for young readers
recent years [1]. The Web App Story Book Converter, [5].
comprising four machine learning models, namely text
summarization, text-to-audio narration, image generation, Furthermore, keyword extraction techniques have been
and keyword extraction, presents a promising approach to extensively researched for various text analysis tasks [4].
enhance children's reading abilities and foster a love for Extracting keywords related to the story's main characters
literature [1]. allows children to develop a deeper understanding of the
narrative structure, character development, and plot
Text summarization techniques have been extensively dynamics [4].
explored in the field of natural language processing [1].
Researchers have developed algorithms that effectively The integration of these four machine learning models
extract key information from texts, enabling users to within a back end and front-end framework provides a
comprehend complex narratives more efficiently [1]. These comprehensive solution for improving children's reading
approaches have shown positive outcomes in various abilities [1]. By combining text summarization, audio
domains, including news article summarization and narration, image generation, and keyword extraction, the
document understanding [1]. Web App Story Book Converter offers an interactive and
immersive reading experience that promotes
Similarly, the conversion of text to audio narration, comprehension, engagement, and language development in
coupled with background music, has been investigated to young readers[1].
create engaging auditory experiences [2]. Studies have
demonstrated the effectiveness of voice modulation, sound While individual studies have explored each of these
effects, and synchronized music in capturing and retaining machine learning components, the combination and
children's attention during storytelling [2]. This multimodal integration of all four models within an educational
approach has shown potential in improving reading application for children's literacy is a novel and promising
comprehension and language acquisition [2]. direction [1]. This research paper aims to contribute to the
existing literature by evaluating the effectiveness and impact
In the realm of image generation, generative models of this comprehensive approach on children's reading
like deep neural networks have proven instrumental in abilities, cognitive development, and enjoyment of literature
producing visually appealing and contextually relevant [1].
images [5]. Researchers have explored techniques such as
III. METHODOLOGY
The Web App Story Book Converter is built upon by combining 4 machine learning model.
Fig. 2: Methodology
A. Narration with background music are computed to determine the best-performing pretrained
For the first component, which involves creating audio model. Subsequently, the selected model is fine-tuned using
narrations with background music, the process begins with the a dataset of children's storybooks to optimize its
selection of pretrained models designed for text-to-audio performance in this specific context. To facilitate user
conversion. These models are assessed based on criteria interaction, a backend is created using Python Flask to
such as audio quality, voice clarity, and their suitability for provide an API for text-to-audio conversion with background
adding background music to the narration. Following this, music, and a frontend is developed using React to allow
an evaluation frame work is established in Python to users to input text and receive narrations with background
objectively assess the performance of each selected model. music. Finally, the audio narration component is seamlessly
A dataset comprising story book text and corresponding integrated into the web application, ensuring smooth
audio narrations with music is collected for this purpose, and communication with other components.
metrics such as audio quality, coherence, and engagement

ISSN No:-2456-2165
B. Summarization pretrained model being selected. Subsequently, a dataset is
The second component, focused on storybook created, comprising keywords and corresponding images
summarization, initiates with the selection of pretrained suitable for children's storybooks, to fine-tune the chosen
models suitable for text summarization tasks. These models pretrained model and generate child-friendly images. To
are evaluated primarily based on the quality of their provide user accessibility, a Python Flask backend is
summarization output, coherence, and their relevance to developed to expose an API for image generation and a
children's content. Subsequently, an evaluation frame work React frontend is built to allow users to input keywords and
is implemented in Python to objectively assess the receive relevant images. Finally, the image generation
performance of each selected summarization model. A component is integrated into the web application, ensuring a
diverse set of storybooks and their corresponding human- seamless user experience.
generated summaries is collected for evaluation purposes.
Metrics such as ROUGE scores, fluency, and informativeness IV. RESULT
are computed to evaluate model summaries, and the best-
performing pretrained model is chosen. Following this, a A. Narration with background music
dataset containing storybooks and their summaries, tailored After a rigorous evaluation process, we selected a pre
for children, is prepared to fine-tune the selected pretrained trained model for text-to-audio narration with background
model. To provide user accessibility, a Python Flask music that demonstrated exceptional performance in terms
backend is developed to expose an API for text of audio quality, voice clarity, and seamless integration with
summarization, and a user-friendly React front end is created background music. Our evaluation frame work,
to enable users to input storybook text and obtain implemented in Python, indicated that this model
summarized versions. Finally, the text summarization consistently produced engaging and coherent audio
component is seamlessly integrated into the web application, narrations.
ensuring a cohesive user experience. Following the selection of the pre trained model, fine-
C. Keyword extraction tuning with our dataset of children's storybooks led to
The third component involves keyword extraction from further improvements in audio quality and narration fluency.
storybooks. It commences with the selection of pretrained The backend, developed using Python Flask, provided a
models specialized in keyword extraction from text. These robust API for text-to-audio conversion with background
models are assessed for their accuracy in extracting relevant music, while the React-based frontend offered an intuitive
keywords, particularly in the context of children's stories. and interactive user interface. The integration of this
An evaluation framework is then implemented in Python to component into the web application resulted in a smooth and
objectively evaluate the performance of these keyword immersive reading experience for children, where they could
extraction models. Story books are used to manually extract listen to stories with captivating background music,
keywords, serving as ground truth data for evaluation. enhancing their engagement and enjoyment.
Metrics such as precision, recall, and F1-score are calculated B. Summarization
to assess the accuracy of each model's extracted keywords, Our meticulous evaluation process identified a pretrained
leading to the selection of the best-performing pretrained model for storybook summarization that excelled in
model. Following this, a dataset is compiled, consisting of generating high-quality summaries with a strong focus on
storybooks with manually extracted keywords, to fine-tune coherence and relevance to children's content. The Python-
the chosen pretrained model and enhance its keyword based evaluation framework we developed allowed us to
extraction accuracy. To facilitate user interaction, a Python objectively assess the model's performance, which
Flask backend is developed to provide an API for keyword consistently yielded impressive ROUGE scores and
extraction, and a user-friendly React frontend is created to summaries characterized by fluency and informativeness.
enable users to input storybook text and receive relevant
keywords. Ultimately, the keyword extraction component is Upon selecting the pretrained model, fine-tuning with
integrated into the web application to ensure seamless our data set of children's story books further optimized the
interaction with other components. summarization output to align with the needs of our target
audience. The Python Flask backend facilitated
D. Image Generation summarization through a user-friendly API, while the React
The fourth component involves the generation of front end provided an accessible platform for users to input
relevant images based on keywords extracted from text and obtain engaging storybook summaries. The
storybooks. The process begins with the selection of pre seamless integration of this component into the web
trained models suitable for generating images that align with application enhanced the reading experience, allowing
the extracted keywords. These models are evaluated based children to access concise and meaningful story summaries.
on image quality, relevance to keywords, and
appropriateness for children's content. An evaluation C. Keyword extraction
framework is then established in Python to objectively Our careful evaluation process led us to choose a
assess the performance of image generation models. pretrained model for keyword extraction that demonstrated
Keywords and manually curated images are collected to remarkable accuracy in identifying relevant keywords from
serve as reference data for evaluation. Models are evaluated storybooks, particularly in the context of children's stories.
based on image quality, relevance to keywords, and the Our Python-based evaluation frame work ensured a
diversity of image outputs, with the best-performing thorough assessment, yielding impressive precision, recall,

ISSN No:-2456-2165
and F1- score metrics. narrations with exceptional clarity and an engaging blend of
background music. This accomplishment aligns with our
Once the pretrained model was selected, fine-tuning project's objective of providing children with an immersive
with a dataset comprising story books with manually reading experience. The fine-tuning process further
extracted keywords further enhanced its keyword extraction improved the model's performance, ensuring that the
accuracy. The Python Flask backend provided a user- narrations were not only of high quality but also tailored
friendly API for keyword extraction, and the React frontend specifically to children's storybooks.
allowed users to effortlessly input storybook text and
receive meaningful keywords. The integration of this The integration of this component into the web
component into the web application enabled children to application significantly enhances user engagement and
access relevant keywords, enhancing their comprehension enjoyment. Children can now listen to stories with captivating
and interaction with the stories. background music, transforming a static reading experience
into an interactive and sensory-rich adventure. This outcome
D. Image generation is in line with the project's overarching goal of fostering a
Our evaluation process led us to identify a pretrained love for reading among children by leveraging technology to
model for image generation that consistently produced high- make stories more engaging and accessible.
quality images, aligning with the keywords extracted from
storybooks. The evaluation framework, implemented in B. Summarization
Python, ensured comprehensive assessments, including image Our research into story book summarization yielded
quality, relevance to keywords, and diversity in image impressive results, with the selected pretrained model
outputs, ultimately resulting in the selection of an consistently generating coherent and informative
outstanding model. Following the selection of the pretrained summaries. These summaries, characterized by high
model, fine-tuning with our dataset of keywords and ROUGE scores, align well with our project's objective of
corresponding images tailored for children's storybooks providing concise yet engaging story book summaries for
further refined the image generation process. The Python children. The Python- based evaluation framework ensured
Flask backend exposed a user- friendly API for image that the model's performance was objectively assessed, and
generation, while the React frontend allowed users to input the fine-tuning process further optimized the quality of the
keywords and obtain relevant and captivating images. The summaries.
integration of this component in to the web application
enriched the reading experience for children, providing them The integration of this component into the web
with visual representations that corresponded seamlessly application enhances children's reading comprehension.
with the stories they were exploring. They can now access succinct and meaningful summaries,
aiding their understanding of complex storylines. This
These results collectively demonstrate the effectiveness outcome not only facilitates efficient reading but also
of each component in the Web App Story Book Converter, promotes a deeper connection with the narrative, supporting
creating an immersive and interactive reading experience for our project's mission to improve children's reading abilities.
children, enhancing their engagement, comprehension, and
overall enjoyment of the stories. C. Keyword extraction
The results from our keyword extraction component
V. DISCUSSION demonstrate the successful identification of relevant
keywords from storybooks. The selected pre trained model
Turning our attention to the outcomes, we now assess exhibited remarkable accuracy in extracting keywords,
the broader implications and significance of the results contributing significantly to our project's goal of enhancing
attained in each component of the Web App Story Book children's reading experiences. The rigorous evaluation
Converter project. These findings will be examined in light framework, powered by Python, provided an objective
of our original objectives, shedding light on how they assessment of the model's performance, and fine-tuning
collectively contribute to our understanding of enhancing further refined its keyword extraction capabilities.
children's reading experiences through technology-driven
methods. By integrating this component into the web
application, children gain access to a valuable tool for
By combining text summarization, audio narration, understanding and engaging with stories. The extracted
visuals, and keyword extraction, the Web App Story Book keywords serve as entry points into the narratives, aiding
Converter offers a multimodal reading experience. This comprehension and fostering curiosity. This outcome is in
approach engages multiple senses, enhancing perfect alignment with our project's aim to make reading
comprehension and emotional connection to the stories. The more interactive and educational for young readers.
integration of background music and visuals further
stimulates imagination and aids in context comprehension. D. Image Generation
The image generation component of our project yielded
A. Narration with background music impressive results, with the selected pretrained model
The results from the first component reveal the consistently producing high-quality images aligned with the
successful integration of text-to-audio narration with extracted keywords from storybooks. The evaluation
background music into our web application. Our chosen framework, implemented in Python, ensured comprehensive
pretrained model consistently delivered high-quality audio assessments of image quality, relevance to keywords, and

ISSN No:-2456-2165
diversity, leading to the selection of an exceptional model. Our project's objectives of enhancing comprehension,
Fine-tuning further improved the model's image generation fostering engagement, and nurturing a lifelong love for
capabilities, making it an asset to our project. reading have been unequivocally met.
The integration of this component into the web Our research highlights the remarkable potential of
application introduces a visual dimension to storytelling, technology to elevate traditional reading practices into a real
enhancing children's comprehension and imagination. Images mof dynamic and inter active story telling. It demonstrates
that correspond seamlessly with the narratives provide a the transformative influence of our project on children's
holistic reading experience, aligning perfectly with our literature, offering a reading experience that adapts to their
project's objective of creating an immersive and interactive unique preferences and learning styles. In an increasingly
platform for children's literature. digital and interconnected world, our research underscores
the essential role of technology in instilling a passion for
In conclusion, the results from each component of the reading among the youngest generation.
Web App Story Book Converter project demonstrate
significant advancements in enhancing children's reading As we draw this research to a close, we acknowledge
experiences through technology-driven methods. By the uncharted territories awaiting further exploration and
combining text-to-audio narration, story book innovation. The potential for enhancing children's reading
summarization, keyword extraction, and image generation, experiences remains boundless and the Web App Story
we have successfully created a platform that not only makes Book Converter stands as a testament to the limitless
reading more engaging but also supports children's possibilities ahead. We remain committed to ongoing
comprehension and enjoyment of stories. These findings refinements and enhancements, ensuring that our platform
underscore the potential of technology to transform continues to inspire and captivate young readers on their
traditional reading practices and open new avenues for literary journey.
interactive and educational story telling. As we continue to
refine and expand upon these components, we anticipate REFERENCES
further improvements in the effectiveness and impact of our
web application in nurturing a love for reading among [1]. L. Wang and M. Zhao, "A Survey on Transfer
children. Learning," in IEEE Transactions on Neural
NetworksandLearningSystems,vol.26,no.10,pp. 1999-
VI. CONCLUSION 2021, Oct. 2015, doi:
10.1109/TNNLS.2015.2399257.
Our journey through the development of the Web App [2]. M. Abadietal., "Tensor Flow: A System for Large-
Story Book Converter underscores the transformative power Scale Machine Learning," in 12th USENIX
of technology in enhancing children's reading experiences. Symposium on Operating Systems Design and
By seamlessly integrating four crucial components – text-to- Implementation(OSDI),Savannah,GA,USA,2016,
audio narration with background music, storybook pp.265-283.
summarization, keyword extraction, and image generation– [3]. K. Cho et al., "Learning Phrase Representations Using
we have successfully re imagined the way young readers RNN Encoder-Decoder for Statistical Machine
engage with literature. Translation," in Proceedings of the 2014 Conference
on Empirical Methods in Natural Language
This venture commenced with the careful selection of Processing(EMNLP),Doha,Qatar,2014, pp. 1724-1734.
pre trained models, each chosen for its exceptional [4]. Vaswani et al., "Attention Is All You Need," in
performance in critical areas such as audio quality, Advances in Neural Information Processing Systems
summarization coherence, keyword precision, and image 30 (NIPS 2017), Long Beach, CA, USA, 2017, pp. 30-
relevance. These models served as the cornerstone upon 38.
which we built a platform designed to cater specifically to [5]. K. He et al., "Deep Residual Learning for Image
the needs and preferences of our young audience. Recognition," in Proceedings of the IEEE Conference
on Computer Vision and Pattern
Fine-tuning emerged as a critical phase in our research, Recognition(CVPR),LasVegas,NV,USA,2016, pp.
where curated data sets were employed to refineand 770-778.
optimize the selected models. This process allowed us to [6]. P. Kingma and J. Ba, "Adam: A Method for Stochastic
elevate these models from mere tools to specialized Optimization," in Proceedings of the International
instruments, uniquely attuned to delivering content tailored Conference on Learning Representations (ICLR), San
for children. The harmonious interplay between model Juan, Puerto Rico, 2015.
selection and fine-tuning exemplifies our dedication to [7]. M. D. Zeiler and R. Fergus, "Visualizing and
creating a customized reading experience. Understanding Convolutional Networks," in European
Our web application, the culmination of this endeavor, Conferenceon Computer Vision (ECCV), Zurich,
now offers young readers an immersive and engaging Switzerland, 2014, pp. 818-833.
platform. Children can read, listen to narrations [8]. Krizhevsky, I. Sutskever, and G. E. Hinton, "Image
accompanied by captivating background music, explore Net Classification with Deep Convolutional Neural
concise yet informative summaries, delve into related Networks," in Advances in Neural Information
keywords, and visualize scenes that stoke their imagination. Processing Systems 25 (NIPS 2012), Lake Tahoe, NV,

ISSN No:-2456-2165
USA, 2012, pp. 1097-1105.
[9]. Y. Bengio, A. Courville, and P. Vincent,
"Representation Learning: A Review and New
Perspectives," in IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol.35, no.8, pp.
1798-1828, Aug. 2013, doi: 10.1109/TPAMI.2013.50.
[10]. Conneau et al., "Supervised Learning of Universal
Sentence Representations from Natural Language
Inference Data," in Proceedings of the 2017
Conference on Empirical Methods in Natural
Language Processing (EMNLP), Copenhagen,
Denmark, 2017, pp. 670-680.
[11]. I. Good fellow etal., "Generative Adversarial Nets," in
Advances in Neural Information Processing Systems
27 (NIPS 2014), Montreal, Canada, 2014, pp. 2672-
2680.
[12]. Radford et al., "Unsupervised Representation Learning
with Deep Convolutional Generative Adversarial
Networks, "inarXiv:1511.06434,2015.
[13]. T. Mikolov et al., "Efficient Estimation of Word
Representations in Vector Space, "in Proceedings of
International Conference on Learning
Representations(ICLR),Scottsdale,AZ,USA,2013.
[14]. R. Collobert and J. Weston, "A Unified Architecture
for Natural Language Processing: Deep Neural
Networks with Multi task Learning," in Proceedings of
the 25th International Conference on Machine
Learning (ICML), Helsinki, Finland, 2008, pp.160-
167.
[15]. C. Szegedy et al., "Going Deeper with Convolutions,"
in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Boston, MA,
USA, 2015, pp. 1-9.

Story Book Converter

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Story Book Converter

Uploaded by

Copyright:

Available Formats

Volume 8, Issue 10, October 2023 International Journal of Innovative Science and Research Technology

Story Book Converter

Fig. 1: Outcome of the web application

IJISRT23OCT907 www.ijisrt.com 2333

IJISRT23OCT907 www.ijisrt.com 2334

IJISRT23OCT907 www.ijisrt.com 2335

IJISRT23OCT907 www.ijisrt.com 2336

IJISRT23OCT907 www.ijisrt.com 2337

IJISRT23OCT907 www.ijisrt.com 2338

You might also like