BSc Thesis - VU - Simplifying Dutch Municipal Texts - Pivot-based Approach - Daniel Vlantis

Bachelor Thesis by Daniel Vlantis

Text simplification (TS) makes written information more accessible to all people, especially those with cognitive and/or language disorders. Despite much progress in TS driven by advances in NLP technology, the bottleneck of scarce data for low-resource languages persists. To address this, we use a pivot-based approach to simplify Dutch medical and municipal text for the municipality of Amsterdam: Dutch text is translated into English, simplified in English, and translated back into Dutch. This allows us to forgo a Dutch monolingual simplification corpus, which to the best of our knowledge does not exist, in favour of one in a higher-resource language: English. We experiment with training data augmentation and corpus choice for this pivot-based approach. We compare the results to a baseline and to an end-to-end LLM approach using the GPT-3.5 Turbo model. We find that, while we can substantially improve the results of the pivot pipeline, few-shot end-to-end simplification with GPT performs better on all metrics. With our work, we introduce a baseline for further comparison in the domain of Dutch municipal text, as well as some improvements to the existing pivot pipeline for simplifying Dutch medical text. Lastly, we provide a benchmark comparing a pivot-based approach against an LLM approach.
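The pivot-based pipeline described in the abstract can be thought of as a composition of three text-to-text steps. The Python below is a minimal, hypothetical sketch, not the thesis code: the function name `pivot_simplify`, the toy lookup tables and the word-substitution "simplifier" are assumptions for illustration, and it assumes the Dutch → English → simplify → Dutch order. In a real pipeline each step would be a machine-translation or simplification model.

```python
from typing import Callable

# A text-to-text step, e.g. a translation model or a simplification model.
Step = Callable[[str], str]


def pivot_simplify(text_nl: str, nl_to_en: Step, simplify_en: Step, en_to_nl: Step) -> str:
    """Simplify Dutch text by pivoting through English (hypothetical sketch)."""
    text_en = nl_to_en(text_nl)       # 1. translate Dutch into English
    simple_en = simplify_en(text_en)  # 2. simplify with a model trained on English data
    return en_to_nl(simple_en)        # 3. translate the simplified English back into Dutch


# Toy stand-ins (assumptions) so the sketch runs without any models.
NL_TO_EN = {"De gemeente verstrekt een vergunning.": "The municipality issues a permit."}
EN_TO_NL = {"The city gives a permit.": "De stad geeft een vergunning."}


def toy_simplify(text: str) -> str:
    # Replace harder words with simpler synonyms, in insertion order.
    for hard, easy in {"municipality": "city", "issues": "gives"}.items():
        text = text.replace(hard, easy)
    return text


result = pivot_simplify(
    "De gemeente verstrekt een vergunning.",
    nl_to_en=lambda s: NL_TO_EN.get(s, s),
    simplify_en=toy_simplify,
    en_to_nl=lambda s: EN_TO_NL.get(s, s),
)
print(result)  # De stad geeft een vergunning.
```

Keeping each step as an injectable callable makes it easy to swap translation or simplification models, which is the kind of variation (corpus choice, augmented training data) the experiments explore. Note that errors from the first translation step propagate through the rest of the pipeline, one plausible reason an end-to-end model can outperform it.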

This research was conducted by Daniel Vlantis in collaboration with AI Team, Urban Innovation and R&D, City of Amsterdam.

Involved civil servants: Iva Gornishka

Supervisors: Shuai Wang & Iva Gornishka

Additional information

Image credits

Header image: Banner - Readability - by Iva Gornishka - from amsterdamintelligence.com
