Details about text read-aloud

Support for voice output

Suzuki-kun now provides voice output, and we have prepared several example sentences. Try listening to each one with accent phrase boundary prediction set first to 'Machine Learning' and then to 'Bunsetsu'. The example pages linked below show the pitch pattern together with the accent markings for advanced learners (this display does not change even if you switch to beginner mode). Morae that should be devoiced are shown with a filled background.

About Accent Phrase Prediction (Machine Learning/'Bunsetsu')

As explained at the OJAD workshop, a word string under a single accent control (a pitch contour that starts with LH, stays at H, falls with HL, and then stays at L) is called an accent phrase. Depending on the speaking style, accent phrases can become longer or shorter.
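As a rough illustration of the contour described above, the sketch below turns an accent type into an H/L pattern for one accent phrase. It assumes the usual Tokyo-Japanese convention that accent type n means the pitch falls after the n-th mora and that type 0 means no fall; the function name `pitch_pattern` is hypothetical and is not part of Suzuki-kun or OJAD.

```python
def pitch_pattern(num_morae: int, accent_type: int) -> str:
    """Return an H/L string for one accent phrase (illustrative sketch).

    accent_type 0 means no fall (heiban); accent_type n >= 1 means the
    pitch falls right after the n-th mora (the accent kernel).
    """
    if num_morae < 1 or not 0 <= accent_type <= num_morae:
        raise ValueError("accent_type must be between 0 and num_morae")
    pattern = []
    for i in range(1, num_morae + 1):
        if accent_type == 1:
            # Type 1: the first mora is high, everything after it is low.
            pattern.append("H" if i == 1 else "L")
        elif i == 1:
            # Other types start low and rise on the second mora (the LH).
            pattern.append("L")
        elif accent_type == 0 or i <= accent_type:
            # Stay high up to and including the kernel (or to the end).
            pattern.append("H")
        else:
            # After the kernel the pitch has fallen (HL) and stays low.
            pattern.append("L")
    return "".join(pattern)


for t in range(4):
    print(t, pitch_pattern(3, t))
```

For a three-mora phrase this gives LHH for type 0, HLL for type 1, and LHL for type 2. Two standard facts of Tokyo Japanese, not details taken from this page, explain the edge cases: a type 1 phrase starts high instead of with LH, and when the type equals the mora count (LHH for type 3 here) the fall only appears on a following particle.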

The "Machine Learning based Accent Phrase Boundary Prediction" mode predicts where accent phrase boundaries should fall, using a model trained on several thousand sentences whose phrase boundaries were labelled on the assumption that each sentence is read naturally. In this way, appropriate accent phrases are predicted for a given word string. In this mode, however, the accent kernel or a phrase boundary is sometimes predicted incorrectly. The other mode, Accent Phrase='Bunsetsu', treats each 'bunsetsu' as one accent phrase. This mode produces fewer errors, but it maximizes the number of accent phrases, which may not be desirable.

In both modes, the accent phrases can be connected into a single accent phrase by choosing "Pitch Pattern=Beginner" and "Accent=Beginner". However, if the text contains an Accent Type 1 word with more than three morae, its accent is left as-is. As a general rule, connecting the accent phrases eliminates errors, so in beginner mode it makes little difference whether the phrase boundaries are determined by machine learning or by 'bunsetsu'.

For natural-sounding speech, we recommend the former mode (Machine Learning). For learners who want every conjugated form and compound word to carry the correct accent for study purposes, we recommend the latter mode ('Bunsetsu').

Example showing the difference between "Machine Learning" and "Bunsetsu" modes. (Boundary=Machine Learning)
Example showing the difference between "Machine Learning" and "Bunsetsu" modes. (Boundary='Bunsetsu')

Controlling speech rate and speaker

Suzuki-kun lets you set the speaker and the speech rate. Synthesis uses the values selected as "Default Speaker" and "Default Speech Rate", but you can also change the speaker and speech rate partway through the text being read aloud.

This is done with a command. A command line begins with "//" (the full-width form is also accepted). The following forms are allowed:

// [Speaker]
// [Speech Rate]
// [Speaker] [Speech Rate]

The reverse order is not allowed:

// [Speech Rate] [Speaker]

Specific examples are shown below.

// M2

Change speaker to M2.

// S

Change speech rate to S (slow).

// F2 F

Change speaker to F2 and speech rate to F (fast). The system does not distinguish between upper and lower case, or between half-width and full-width characters.

[Speaker] = F1, F2, M1, or M2
[Speech Rate] = F, N, or S (fast, normal, slow)
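The command rules above (a leading "//", speaker before speech rate, no distinction of case or character width) can be summarized as a small parser. This is a minimal sketch, assuming each command occupies a whole line and its tokens are separated by spaces; the function `parse_command` is hypothetical and is not Suzuki-kun's own implementation. Full-width characters are folded to half-width with Unicode NFKC normalization.

```python
import unicodedata
from typing import Optional, Tuple

# Values listed on this page.
SPEAKERS = {"F1", "F2", "M1", "M2"}
RATES = {"F": "fast", "N": "normal", "S": "slow"}


def parse_command(line: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (speaker, rate) for a command line, or None for ordinary text.

    Hypothetical sketch: raises ValueError for a command that does not
    follow the allowed forms.
    """
    # NFKC maps full-width forms such as "／", "Ｍ", "２" and the
    # ideographic space to their half-width equivalents; upper() then
    # makes the comparison case-insensitive.
    text = unicodedata.normalize("NFKC", line).strip().upper()
    if not text.startswith("//"):
        return None  # not a command; the line is read aloud
    tokens = text[2:].split()
    if len(tokens) == 1:
        if tokens[0] in SPEAKERS:
            return tokens[0], None
        if tokens[0] in RATES:
            return None, tokens[0]
    elif len(tokens) == 2 and tokens[0] in SPEAKERS and tokens[1] in RATES:
        # Only the order [Speaker] [Speech Rate] is accepted.
        return tokens[0], tokens[1]
    raise ValueError(f"unsupported command: {line!r}")
```

With this sketch, `parse_command("// F2 F")` returns `("F2", "F")`, and the full-width, lower-case line `"／／ｆ２　ｆ"` gives the same result, while `parse_command("// F F2")` raises `ValueError`, matching the rule that the speech rate cannot come before the speaker.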

Example conversation between teacher and student (Boundary=Machine Learning)
Example conversation between teacher and student (Boundary='Bunsetsu')

Suzuki-kun's weaknesses

We are working to improve Suzuki-kun each day, but it still has some weaknesses.

1. Accent type prediction errors

1-1) Expressions written in hiragana

If text is written in hiragana, it becomes difficult to detect word boundaries, and homonyms make it hard to identify words. As a result, Suzuki-kun may recognize a different word from the one you entered.

Example of prediction performance differences for hiragana and kanji (Phrase Boundary=Machine Learning)
Example of prediction performance differences for hiragana and kanji (Phrase Boundary='Bunsetsu')

1-2) Concerning the use of kanji

Even when kanji is used, accent phrase boundaries and accent kernel positions are sometimes predicted incorrectly. This happens especially with compound nouns.

Example of an error produced when mixing kanji and hiragana (Phrase Boundary=Machine Learning)
Example of an error produced when mixing kanji and hiragana (Phrase Boundary='Bunsetsu')

1-3) Accents for unknown words

When Suzuki-kun's morphological analysis yields a word that Suzuki-kun does not know, the accent kernel is predicted by rules for accent position. However, in some cases a word that you would want treated as unknown, such as the name of an international student, is not recognized as unknown.

Examples of correct and incorrect morphological analyses (Phrase Boundary=Machine Learning)
Examples of correct and incorrect morphological analyses (Phrase Boundary='Bunsetsu')

2. Mistakes in reading

2-1) Changes in reading due to contextual effects

Suzuki-kun sometimes misreads words whose reading varies with context. The problem is usually not in choosing between onyomi and kunyomi but in choosing among several kunyomi readings. The pitch editor lets you change the predicted accent kernels, but there is no way to edit the reading itself, so in such cases it is best to enter the word in hiragana.

Example with multiple kunyomi (Phrase Boundary=Machine Learning)
Example with multiple kunyomi (Phrase Boundary='Bunsetsu')

2-2) Numeral and Counter Words

The new version of Suzuki-kun can read a variety of number-related expressions, as well as the symbols #$%&=~@. +〒¥£. However, when a numeral is followed by a counter word (numeral + counter word), the reading is sometimes wrong. Counter words are a typical case in which both the reading and the accent depend on the context (for example, 一本，二本，三本・・・ and 一日，二日，三日...). Readings for numeral + counter word combinations are handled in post-processing, but performance is still insufficient.
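To show why numeral + counter word combinations cannot simply be read by joining the parts, the sketch below looks up the common readings of 一本 to 十本. It is a minimal lookup-table illustration, assuming only the counter 本 and the numerals 一 to 十; the names `HON_READINGS` and `read_hon` are hypothetical, this is not Suzuki-kun's post-processing, and the accent is not handled at all.

```python
# Hypothetical lookup table: common readings of numeral + 本 for 1 to 10.
# Several numerals change form (いち -> いっ), and 本 itself alternates
# between ほん, ぽん and ぼん, so plain concatenation does not work.
# Some numbers also have variant readings (e.g. はちほん, じっぽん).
HON_READINGS = {
    1: "いっぽん", 2: "にほん", 3: "さんぼん", 4: "よんほん", 5: "ごほん",
    6: "ろっぽん", 7: "ななほん", 8: "はっぽん", 9: "きゅうほん", 10: "じゅっぽん",
}

KANJI_NUMERALS = {
    "一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
    "六": 6, "七": 7, "八": 8, "九": 9, "十": 10,
}


def read_hon(expr: str) -> str:
    """Return the reading of a kanji numeral (一 to 十) followed by 本."""
    if len(expr) != 2 or expr[1] != "本" or expr[0] not in KANJI_NUMERALS:
        raise ValueError(f"not a supported numeral + 本 expression: {expr!r}")
    return HON_READINGS[KANJI_NUMERALS[expr[0]]]


for e in ["一本", "二本", "三本"]:
    print(e, read_hon(e))
```

Naively joining いち and ほん would give いちほん instead of いっぽん. A table like this has to be repeated for every counter (日, for instance, has its own irregular readings), which is one reason this area remains difficult.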

Example with a variety of symbols and numerical phrases (Cases with mistakes also included. Phrase Boundary=Machine Learning)
Example with a variety of symbols and numerical phrases (Cases with mistakes also included. Phrase Boundary='Bunsetsu')

3. Ambiguity in meaning and intonation controls

木のしたで雨宿りをしている女性を眺めた。
木のしたで，雨宿りをしている女性を眺めた。
木のしたで雨宿りをしている女性を，眺めた。

The intonation may differ slightly depending on which phrase 「木のしたで」 ("under the tree") modifies. However, Suzuki-kun does not take such modification relationships into account when deciding on intonation. To reproduce the difference, you need to insert appropriate punctuation marks, as in the second and third sentences above.

雨宿り example (Phrase Boundary=Machine Learning)
雨宿り example (Phrase Boundary='Bunsetsu')

きれいじゃない.

This sentence can express being impressed that something is きれい (roughly "Isn't it pretty!"), or it can emphasize that something is not きれい ("It isn't pretty."). The two meanings differ in intonation. Suzuki-kun cannot determine from the context which meaning is intended, because each sentence is processed in isolation. We do not recommend relying on Suzuki-kun for intonation distinctions this subtle; consider what it can realistically do. (You can, however, adjust the pitch with the pitch editor.)

きれいじゃない example (Phrase Boundary=Machine Learning)
きれいじゃない example (Phrase Boundary='Bunsetsu')
