Venedi Audio Annotation Project Guidelines (LID)
N2 (low)        2 to 8    Around half transcribable, half not
N3 (very low)   1 or 2    Only a few words could be transcribed

⚠️ Background Noise
1. Consider how background noise affects intelligibility: significant noise means the audio has a lower intelligibility level. In short, how easy is the audio to listen to?
2. Assess how much could be transcribed after listening only once. Do not replay the audio many times; your first impression is what matters.

Please consider both factors when rating intelligibility:
a) how many words you can transcribe after listening once
b) the background noise

● If the speech is clear but the background noise is loud, do not label it N0. In this case, lower the intelligibility rating by one level.
● N0 is only for the cleanest audio: 100% of words transcribable and little background noise.

Audio Examples
In addition to checking intelligibility, please listen to the audio examples below to hear how the level varies with background noise.
● N0 (High)
● N0 (High)
● N1 (Medium)
● N2 (Low)

⚠️ Note
1. Each utterance should start with the intelligibility tag.
2. Ignore overlapping speech when determining the intelligibility label. For example, if the overlapping parts contain unintelligible words but the rest of the (non-overlapping) speech is clear and transcribable, the utterance can be rated N0 (High).
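The background-noise adjustment described above can be sketched in a few lines of Python. This is only an illustration, not part of ADAP or the project tooling: the names LEVELS and adjust_for_noise are hypothetical. The guidelines state the one-level drop only for clear speech with loud background noise; applying it to the other levels and capping the result at N3 are assumptions made here.

```python
# Hypothetical helper for illustration only; not part of ADAP.
# Levels are ordered from most intelligible (N0) to least intelligible (N3).
LEVELS = ["N0", "N1", "N2", "N3"]


def adjust_for_noise(first_pass_level: str, loud_background_noise: bool) -> str:
    """Return the intelligibility label after accounting for background noise.

    first_pass_level: label chosen from how many words were transcribable
        after a single listen.
    loud_background_noise: True when the noise makes the audio hard to listen to.
    """
    index = LEVELS.index(first_pass_level)
    if loud_background_noise:
        # Lower the rating by one level.
        # Assumption: the drop is capped at N3, the lowest level.
        index = min(index + 1, len(LEVELS) - 1)
    return LEVELS[index]


if __name__ == "__main__":
    print(adjust_for_noise("N0", True))   # clear speech, loud noise
    print(adjust_for_noise("N0", False))  # clear speech, little noise
```

With this sketch, clear speech over loud noise is never returned as N0, which matches the rule that N0 is reserved for the cleanest audio.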
2.2 Dialect
The speech in this project may include various dialects of your language. Please add the relevant dialect tag to indicate which dialect is spoken in each segment. If you are unsure what the dialect is, please add ‘Others’.
Note: the intelligibility tag should be followed by a dialect tag.

2.3 Overlapping Speech
You may hear sections of overlapping speech between speakers.
1. If you hear overlapping speech and both speakers are at the same volume level:
● Insert the tag to represent the section of overlapping speech.
● You do not need to add intelligibility and dialect tags to the overlapping speech. Instead, add timestamps to indicate the beginning and end of the overlapping speech.
2. If you hear overlapping speech in which the dominant speaker is clear and you can identify the dominant speaker from the surrounding speech:
● You do not need to insert the tag or timestamps for the section of overlapping speech.
● Add a single intelligibility tag and a single dialect tag based on the dominant speech only. Disregard any background speech heard behind the dominant speaker.
3. If the entire utterance contains only overlapping speech at the same volume level, insert the single tag and move on.
Refer to the Timestamping section of the guidelines for rules on timestamping around overlapping speech.

2.4 No Speech
A speech segment may occasionally contain a noticeable silence or pause, with no actual speech.
● Use the tag for no speech (pause, noise, laugh, cough, static, silence, music, singing, ring, etc.) lasting 1 second or more within speech (between words), and also for no speech of 1 second or more before the speaker starts speaking or after the speaker finishes a sentence.
○ If the no speech occurs before the person starts speaking or after they finish a sentence, it must be timestamped out.
○ If the no speech occurs in the middle of a sentence, only insert the tag with a timestamp around it.
○ Refer to the Timestamping section of the guidelines for rules on timestamping around the tag.
● If the no speech lasts less than 1 second, please ignore it.
● The word "Mississippi" takes approximately one second to pronounce. Use this as a quick reference if you are unsure whether a stretch of no speech is long enough to be tagged.
● If an entire utterance contains no speech (e.g. there is only silence or noise), insert the single tag only and move on. Do not add any other tags to such units.

3. Timestamp Instructions
You'll see a waveform in ADAP for each unit. Timestamps are placed on the waveform. In this project, we use timestamps to indicate:
● Background noise level change: If the background noise level obviously changes within an utterance, please indicate this using a timestamp. If the change is very minor or limited to a single word, a timestamp is not necessary.
● Dialect change: If there is a change in dialect within an utterance, please mark it with a timestamp.
● Speaker change: If there is a speaker change within an utterance, please mark it with a timestamp.
● Overlapping speech: If overlapping speech occurs within an utterance and both speakers are at the same volume level, please indicate this using timestamps.
● No speech:
○ A period of 1 second or more of non-speech between sentences within an utterance (noise, static, silence, music, ring, etc.)
○ A period of 1 second or more of non-speech before a speaker starts speaking at the start of an utterance
○ A period of 1 second or more of non-speech after a speaker stops speaking at the end of an utterance
Note: intelligibility and dialect tags should always be added after the timestamp.

3.1 Accuracy
● Place the timestamp in a non-speech segment to avoid interrupting the speech mid-utterance.
● Timestamps must be placed after the speaker has fully produced the final sound of the word.
Please click here to see how to use a timestamp and sample screenshots.

4. Language Switch Label
The audio in this project might also contain other (foreign) languages. If the speaker uses a grammatically structured phrase in another language, please label the audio as a language switch and specify the language switched to.
If you are unsure which language was used, please select “unsure.” If there is no language switch, please select “Not Applicable.”
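
As a supplementary illustration of the no-speech rules in sections 2.4 and 3, the one-second threshold can be sketched in Python. This is a minimal sketch under stated assumptions, not a description of how ADAP works: the function find_no_speech, the input format (a list of speech start/end times in seconds for one utterance), and the position names are all hypothetical.

```python
from typing import List, Tuple

# Threshold from the guidelines: no speech of 1 second or more is marked.
MIN_GAP_SECONDS = 1.0


def find_no_speech(
    speech: List[Tuple[float, float]], duration: float
) -> List[Tuple[float, float, str]]:
    """Return (start, end, position) for each no-speech gap of 1 s or more.

    speech: hypothetical list of (start, end) times of speech, in seconds.
    duration: total length of the utterance, in seconds.
    position is one of:
      "leading"  - before the speaker starts (timestamped out)
      "internal" - inside the speech (tag with timestamps around it)
      "trailing" - after the speaker finishes (timestamped out)
      "entire"   - the whole utterance has no speech (single tag only)
    """
    gaps = []
    cursor = 0.0          # end of the latest speech seen so far
    seen_speech = False
    for start, end in sorted(speech):
        if start - cursor >= MIN_GAP_SECONDS:
            gaps.append((cursor, start, "internal" if seen_speech else "leading"))
        cursor = max(cursor, end)
        seen_speech = True
    if duration - cursor >= MIN_GAP_SECONDS:
        gaps.append((cursor, duration, "trailing" if seen_speech else "entire"))
    return gaps


if __name__ == "__main__":
    # The 0.6 s pause between 4.0 and 4.6 is shorter than 1 s and is ignored.
    for gap in find_no_speech([(1.5, 4.0), (4.6, 7.0), (8.2, 10.0)], 11.5):
        print(gap)
```

Gaps shorter than one second never appear in the result, which corresponds to the instruction to ignore them.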