Constructing a Colloquial Cantonese Chatbot using the Doubao App

Author: Luis Damián Moreno García

I started learning Cantonese a month before I arrived in Hong Kong. Back then, in 2021, chatbots were not nearly as advanced as they are now. In this post, I will show you how to create your own chatbot that specialises in teaching spoken Cantonese.

The app we will be using is called Doubao. It offers a number of personalised bots, including famous characters and figures such as Sun Wukong, Ma Baoguo and Shin Chan, as well as chatbots with different Chinese accents (such as Beijing Mandarin).

  1. First, download Doubao.
  2. Then, press the 创建AI智能体 (“Create AI agent”) button.
  3. In this interface, you can give your chatbot an image (AI-generated or not) and a name, write a persona description, choose a voice and a language, and decide whether the bot is public or private.

The next step is to construct a specific and comprehensive persona suited to your use case. The key is to know what kind of teacher you are looking for. I was looking for a way to improve my speaking. Therefore, I prompted the chatbot to provide a keyword and then a brief sentence that uses it. The chatbot should then encourage the student to repeat that sentence or to say a new sentence that includes the keyword.
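The drill described above can be sketched in a few lines of Python. This is only an illustration of the turn structure, not Doubao's behaviour or the persona prompt itself. The names `DrillTurn` and `reply_is_acceptable` are hypothetical, the example sentences are made up, and the check is a plain substring match on text, so it says nothing about pronunciation.

```python
from dataclasses import dataclass


@dataclass
class DrillTurn:
    """One teaching turn: a keyword plus a short sentence that uses it."""
    keyword: str
    sentence: str


def reply_is_acceptable(turn: DrillTurn, reply: str) -> bool:
    """Accept a reply that repeats the sentence or contains the keyword."""
    cleaned = reply.strip()
    return cleaned == turn.sentence or turn.keyword in cleaned


# Hypothetical turn: keyword "屋企" (home) with a short model sentence.
turn = DrillTurn(keyword="屋企", sentence="我喺屋企食嘢。")

print(reply_is_acceptable(turn, "我屋企好細。"))  # keyword reused in a new sentence
print(reply_is_acceptable(turn, "我唔知。"))      # neither repeated nor reused
```

A real tutor bot works on speech, so a check like this would only apply to the transcribed text of what the student said.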

The specific prompt I used is this one:

The output is far better than that of the Cantonese bots already available on the platform (which tend to intersperse their replies with written-style expressions), but it is not perfect. For example, colloquial expressions such as “hea” are not pronounced correctly, although, interestingly, they were recognised most of the time when I spoke them.

In future, I plan to expand the persona prompt to include more detailed instructions, such as:

Always use the following colloquial Cantonese vocabulary and grammatical structures:

  • Pronouns: 佢 (keoi5 – he/she/it), 你 (nei5 – you), 我 (ngo5 – I), 哋 (dei6 – plural marker, e.g., 你哋 – you all), 呢個 (ni1 go3 – this), 嗰個 (go2 go3 – that)
  • Verbs (and related): 係 (hai6 – is/are), 睇 (tai2 – see/look/watch), 講 (gong2 – say/speak), 食 (sik6 – eat), 嚟 (lai4 – come), 瞓 (fan3 – sleep), 俾 (bei2 – give), 答 (daap3 – answer), 唔 (m4 – not), 冇 (mou5 – not have/there isn’t), 咗 (zo2 – perfective aspect particle, marks a completed action), 緊 (gan2 – progressive aspect particle, like “-ing”), 埋 (maai4 – together/also)
  • Nouns (common colloquial terms): 屋企 (uk1 kei2 – home), 嘢 (je5 – thing/stuff), 人 (jan4 – person)
  • Adjectives (common colloquial terms): 細 (sai3 – small), 平 (peng4 – cheap), 靚 (leng3 – pretty/nice)
  • Adverbs (common colloquial terms): 咁 (gam2 – so/then), 仲 (zung6 – still/also), 先 (sin1 – first), 遲啲 (ci4 di1 – later), 快啲 (faai3 di1 – faster)
  • Particles: 嘅 (ge3 – possessive/attributive particle), 啦 (laa1 – sentence-final particle), 喎 (wo3 – sentence-final particle), 呀 (aa1 – sentence-final particle), 咩 (me1 – question particle), 呢 (ne1 – question particle)
  • Other common colloquial words/phrases: 點 (dim2 – how), 乜嘢 (mat1 je5 – what), 邊個 (bin1 go3 – who), 邊度 (bin1 dou6 – where), 幾時 (gei2 si4 – when), 點解 (dim2 gaai2 – why), 係咪 (hai6 mai6 – is it?)
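A list this long is easier to maintain as data than as hand-edited prose. The following is a minimal sketch, assuming the vocabulary is kept in a Python dictionary and rendered into the persona text before pasting it into the app. The function name `build_vocabulary_section` and the dictionary layout are hypothetical, and only a few entries from the list above are included.

```python
from __future__ import annotations

# Hypothetical data layout: category -> list of (word, Jyutping, gloss).
# Only a few entries from the list above are copied here.
VOCABULARY = {
    "Pronouns": [("佢", "keoi5", "he/she/it"), ("你", "nei5", "you")],
    "Particles": [("嘅", "ge3", "possessive/attributive particle")],
}

HEADER = (
    "Always use the following colloquial Cantonese vocabulary "
    "and grammatical structures:"
)


def build_vocabulary_section(vocabulary: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render the vocabulary as one bullet line per category."""
    lines = [HEADER]
    for category, entries in vocabulary.items():
        formatted = ", ".join(
            f"{word} ({jyutping} – {gloss})" for word, jyutping, gloss in entries
        )
        lines.append(f"• {category}: {formatted}")
    return "\n".join(lines)


print(build_vocabulary_section(VOCABULARY))
```

Keeping the entries as tuples means a wrong tone number or gloss can be corrected in one place without retyping the whole prompt.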

Specifically AVOID using these formal, written Chinese and Mandarin Chinese words (and similar formal vocabulary):

那么 (naa5 mo1 – so/then) -> Use 咁 (gam2)

不对 (bat1 deoi3 – incorrect) -> Use 唔啱 (m4 ngaam1) or 錯咗 (co3 zo2)

说话 (syut3 waa6 – speak) -> Use 講嘢 (gong2 je5) or 傾偈 (king1 gai2 – chat)

有点 (jau5 dim2 – a bit) -> Use 少少 (siu2 siu2) or 啲啲 (di1 di1)

里 (lei5 – in/inside) -> Use 邊 (bin1 – side/location), 裏面 (leoi5 min6 – inside, can be used but try to use more colloquial options if available)

是 (si6 – is/are) -> Use 係 (hai6)

的 (dik1 – possessive/attributive particle) -> Use 嘅 (ge3)

没有 (mut6 jau5 – not have) -> Use 冇 (mou5)

什么 (sam6 mo1 – what) -> Use 乜嘢 (mat1 je5) or 咩 (me1).
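One way to see whether a bot follows the avoid list is to scan its text replies for the formal words. Below is a minimal sketch, assuming the replies are available as plain text (for example, copied from the chat). The function name `find_formal_words` is hypothetical. The mapping copies the list above, except for 里, whose replacement depends on context. The check is a simple substring match, so it only catches the simplified forms as written here, and single characters such as 的 will also match inside legitimate colloquial words like 的士 (taxi).

```python
from __future__ import annotations

# Copied from the avoid list above: formal word -> colloquial suggestions.
FORMAL_TO_COLLOQUIAL = {
    "那么": ["咁"],
    "不对": ["唔啱", "錯咗"],
    "说话": ["講嘢", "傾偈"],
    "有点": ["少少", "啲啲"],
    "是": ["係"],
    "的": ["嘅"],
    "没有": ["冇"],
    "什么": ["乜嘢", "咩"],
}


def find_formal_words(reply: str) -> dict[str, list[str]]:
    """Return each formal word found in the reply with its suggested replacements."""
    return {
        formal: colloquial
        for formal, colloquial in FORMAL_TO_COLLOQUIAL.items()
        if formal in reply
    }


# Hypothetical bot reply that mixes formal and colloquial wording.
for formal, suggestions in find_formal_words("你说话的声音好靚").items():
    print(f"{formal} -> {' / '.join(suggestions)}")
```

Counting the matches over several replies gives a rough measure of how well a given persona prompt keeps the bot colloquial.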
