Language Structure Problems

Reading Transcription with No Word Boundaries

If you have trouble reading a string of segments with no word boundaries inserted for you, you're not alone! Your poor speech perception system deals with this issue every single time anybody speaks. Figuring out where to put those word boundaries is a tough issue, and psycholinguists have been puzzling about how we do it for as long as there has been psycholinguistics. Try this little example:

[pVpizD{tbajtarnatnajs]


First, try just pronouncing the segments all in a row. It's important here to be accurate when pronouncing the vowels.

Do not, whatever you do... do not try to make an orthographic representation of each segment as you go: you'll only have to change your mind a million times and confuse yourself along the way.

Try what psycholinguists call 'early closure'. This is more a term for syntax processing than phonetics, but the principle is useful still. Make the earliest word you can... that is, read segments until you get a viable English word, then stop, separate off the used segments, and see what's up next.
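The 'make the earliest word you can' step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `LEXICON` is a hypothetical stand-in for your mental dictionary, the name `earliest_word` is made up for this sketch, and each character of the ASCII transcription is treated as one segment.

```python
# Toy lexicon: a hypothetical stand-in for the reader's mental dictionary.
# Keys are ASCII transcriptions; treating one character as one segment
# is a simplifying assumption.
LEXICON = {
    "pVp": "pup",
    "pVpi": "puppy",
    "pVpiz": "puppies",
    "iz": "ease",
}


def earliest_word(segments, lexicon):
    """Return the shortest prefix of `segments` found in `lexicon`, or None."""
    for end in range(1, len(segments) + 1):
        candidate = segments[:end]
        if candidate in lexicon:
            return candidate
    return None


utterance = "pVpizD{tbajtarnatnajs"
first = earliest_word(utterance, LEXICON)
remainder = utterance[len(first):]
# Show the word that was closed off and what is left to analyze.
print(first, "|", remainder)
```

Because the function stops at the shortest match, it never looks ahead. When that first guess turns out to be wrong, you have to go back and try a longer word yourself.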


[p] is not a word on its own. Keep going:

[pV] is also no good. If it were [tV], it could be an informal pronunciation of the word 'to', but [pV] is not a word on its own. So keep going:

[pVp] Now, that's possible. It's the word 'pup'. This might be an utterance about a pup. So separate that off and carry on. Insert yourself a space so you know where you are. Start looking at the beginning of what might be the next word. Bring that first segment forward to see if it works:

[pVp i zD{tbajtarnatnajs]

[i] is not a word. Keep going:

[pVp iz D{tbajtarnatnajs]

[iz] is a transcription of the word 'ease'. But this doesn't make sense: 'pup ease'??? And if you keep going on the second word, things get even less likely:

[pVp izD {tbajtarnatnajs]

Can anyone think of a word that starts [izD]? Maybe it's time to reconsider that [pVp]... Try moving the word boundary one segment to the right, including the [i] as the last segment of the first word:

[pVpi zD{tbajtarnatnajs]

Look! It's a puppy! Oh, how nice! But, wait: that leaves us word number two beginning with a [zD] consonant cluster. Not so nice. Move the word boundary one more segment to the right.

[pVpiz D{tbajtarnatnajs]

This gives us puppies! Always a good thing.
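The moving of boundaries back and forth in this walk-through amounts to a search that backs up when it hits a dead end. Here is a minimal Python sketch of that idea, assuming a hypothetical toy lexicon (`LEXICON`) and an illustrative function name (`segmentations`), and treating each character of the transcription as one segment. It tries the earliest possible word first and lists every way of dividing the whole string into known words.

```python
# Hypothetical toy lexicon of transcribed words (an assumption for illustration).
LEXICON = {"pVp", "pVpi", "pVpiz", "iz", "D{t", "bajt", "ar", "nat", "najs"}


def segmentations(segments, lexicon):
    """Yield every split of `segments` into lexicon words, shortest first word first."""
    if not segments:
        yield []  # nothing left over: the boundaries so far all worked
        return
    for end in range(1, len(segments) + 1):
        word = segments[:end]
        if word in lexicon:
            # Close off this word, then try to segment whatever remains.
            for rest in segmentations(segments[end:], lexicon):
                yield [word] + rest
        # If `word` is not in the lexicon, extend it by one more segment.


for parse in segmentations("pVpizD{tbajtarnatnajs", LEXICON):
    print(" ".join(parse))
```

With this lexicon the sketch finds two candidates: one that opens with [pVp iz] and one that opens with [pVpiz]. The [pVpi] attempt dies because nothing in the lexicon starts with [z]. The lexicon alone cannot rule out 'pup ease'; that takes your sense of what a speaker is likely to mean.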

You can carry on this way across the entire string of segments. Or you can try a little top-down approach.

First, look for places where the way segments line up is not natural in English. English obeys rules about what can begin a syllable. Constraints like these are found in all languages, though the set of rules differs from language to language. This is called 'phonotactics'. One of the phonotactic constraints in English is that [zD] is not a legal word opener. Another is that, although you can find [tb] sequences across the breaks of compound words, that sequence of consonants is more likely to have a word boundary in the middle of it. Move through the string and insert some more possible word boundaries based on your knowledge of which consonants are not likely to cluster together:

[pVpiz D{t bajtarnat najs]

Now we see 'puppies that' at the beginning and (happily enough) 'nice' at the end.
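The phonotactic pass can be sketched in Python as well. This is a rough illustration, not a model of English phonotactics: the set `UNLIKELY_CLUSTERS` holds only the clusters needed for this one string ([tn] is added here as an assumption, to account for the break before 'nice'), the function name is made up, and each character is treated as one segment.

```python
# Consonant pairs treated as unlikely inside a single word.
# A tiny hand-picked set for this example only (an assumption).
UNLIKELY_CLUSTERS = {"zD", "tb", "tn"}


def split_at_unlikely_clusters(segments, clusters):
    """Insert a space between any two adjacent segments that form an unlikely cluster."""
    pieces = list(segments[:1])  # start with the first segment, if there is one
    for previous, current in zip(segments, segments[1:]):
        if previous + current in clusters:
            pieces.append(" ")  # probable word boundary
        pieces.append(current)
    return "".join(pieces)


print(split_at_unlikely_clusters("pVpizD{tbajtarnatnajs", UNLIKELY_CLUSTERS))
```

The result still has one long unanalyzed chunk in the middle, which is where the remaining steps pick up.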

You can go back to segment-by-segment analysis at this point or you can try for more top-down deductions. Look for words that are somehow related to puppies, and add more potential word boundaries:

[pVpiz D{t bajt arnat najs]

Alas, our cute little puppies appear to be biting puppies.

From here, it's a relatively simple task to find the 'not' to convey our attitude toward biting puppies and insert the necessary word boundaries until you get:

[pVpiz D{t bajt ar nat najs]

Then, do that orthographic magic so it looks like a proper English sentence:

"Puppies that bite are not nice."

Know that you may have to insert, move and erase word boundaries as you go until you get it right.

Additionally, if the transcribed utterance is informal, you will not have all the cues so clearly laid out as you did here. The person could easily have said "aren't", for example, and the vowel in "that" might have been reduced to [I] or [V] or hardly voiced at all. For this round, at least, we did identify our biting puppies!