Quality Assurance of Audio Annotation

Yai365.com is a platform for automating and tracing the overall annotation process. It lets several working roles collaborate through allocated workflows, so that data files move through the process and are traced at every step. In Yai24.com there are four working roles, each with specific access and supervision rights to manage the progress within its scope.

 


 

The Quality Gates tool checks each transcription file against the transcription rules, making sure that the rules are followed as described.

  

  • Supported languages

Before uploading a file, you must select a language; otherwise, an error message will indicate that this step is mandatory.

 

  

 

 

The tool currently offers 12 languages (the list is updated regularly, so more may be added in the future):

  • Danish
  • English US
  • Polish
  • Bulgarian
  • Czech
  • Estonian
  • Greek
  • Latvian
  • Lithuanian
  • Romanian
  • Slovenian
  • Swedish
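
The mandatory language selection can be pictured with a minimal sketch. The names `SUPPORTED_LANGUAGES` and `require_language`, and the wording of the messages, are hypothetical and only illustrate the check described above; they are not the tool's actual implementation.

```python
# Hypothetical sketch: the language list mirrors the one above and may grow.
SUPPORTED_LANGUAGES = {
    "Danish", "English US", "Polish", "Bulgarian", "Czech", "Estonian",
    "Greek", "Latvian", "Lithuanian", "Romanian", "Slovenian", "Swedish",
}


def require_language(selected):
    """Return the selected language, or raise if none or an unknown one was chosen."""
    if not selected:
        # Assumed message text; the real tool shows its own error message.
        raise ValueError("Please select a language before uploading a file.")
    if selected not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {selected}")
    return selected
```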

 

 

  • Spell check and suggestion

The Spell check box indicates that the spell check rules will be applied to the transcription file according to the selected language.

The Suggestion box indicates that the generated table will include a column called Suggestions, which contains suggested fixes for structure errors or spell check warnings.

 

 When both Spell check and Suggestion are selected, the generated table includes all spelling errors and marks them as warnings. The Suggestions column includes the suggested replacement to avoid the error or the warning.

 *Spell check errors are always reported as warnings.

 

 When neither box is selected, the generated table includes only structure errors and warnings other than spell check warnings. The Suggestions column will be empty.
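
The effect of the two boxes can be sketched as a small filter over the findings. This is only an illustration: the function `build_rows`, the dictionary keys, and the rule name `"spell_check"` are assumptions, and the behavior when only one box is ticked is inferred rather than documented.

```python
def build_rows(findings, spell_check, suggestion):
    """Apply the Spell check and Suggestion options to a list of findings.

    Each finding is a dict with hypothetical keys:
    "rule", "error_type", "content" and (optionally) "suggestion".
    """
    rows = []
    for finding in findings:
        row = dict(finding)
        if row["rule"] == "spell_check":
            if not spell_check:
                continue  # spell check findings are skipped when the box is off
            row["error_type"] = "Warning"  # spell check issues are always warnings
        # The Suggestions column stays empty unless the Suggestion box is ticked.
        row["suggestion"] = row.get("suggestion", "") if suggestion else ""
        rows.append(row)
    return rows
```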

  

 

  • File upload

Three file extensions are supported: JSON, TRS, and SONIX. To upload a file, click one of the three following boxes.

  

Important note:

All files must be UTF-8 encoded; otherwise, you will receive an error message.
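
If you want to check a file before uploading it, the following minimal sketch tests the extension and the encoding locally. The function name `check_upload` is hypothetical, and the exact extension spellings (in particular `.sonix`) are assumptions based on the format names above.

```python
from pathlib import Path

# Assumed spellings of the three supported extensions.
SUPPORTED_EXTENSIONS = {".json", ".trs", ".sonix"}


def check_upload(file_name, data):
    """Return the decoded text if the file looks uploadable, else raise ValueError."""
    extension = Path(file_name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension: {extension or '(none)'}")
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("The file must be UTF-8 encoded.") from exc
```

For example, `check_upload("interview.trs", Path("interview.trs").read_bytes())` returns the file text when both checks pass.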

 

  • Generated table

The generated table includes 7 columns, in the following order:

  • ID: the error ID, a number.
  • Error type: the type of the violated rule (Error or Warning); an Info type also exists for informational entries.
  • Rule: the name of the rule that is not respected.
  • Rule description: the detailed description of the violated rule.
  • Content: the content of the error.
  • Segment ID: the ID of the segment in which the rule is not respected.
  • Suggestions: the suggested replacement to avoid the error or the warning.

 

Above the table, some general information is shown:

  • The total number of errors and warnings.
  • The number of segments that include more than one speaker.
  • The file name.
  • The date and time of the upload.
  • A filter to search for a specific detail in each column.
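
For readers who process the results outside the tool, a table row, the error and warning totals, and the column filter can be modeled as in the sketch below. The class `Row`, its field names, and the helper functions are hypothetical; they mirror the seven columns and are not an export format defined by the tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Row:
    """One line of the generated table (field names are assumptions)."""
    id: int
    error_type: str  # "Error", "Warning" or "Info"
    rule: str
    rule_description: str
    content: str
    segment_id: str
    suggestions: str = ""


def count_by_type(rows):
    """Return the totals of errors and warnings shown above the table."""
    counts = Counter(row.error_type for row in rows)
    return {"errors": counts["Error"], "warnings": counts["Warning"]}


def filter_rows(rows, column, text):
    """Keep the rows whose given column contains the text (case-insensitive)."""
    needle = text.lower()
    return [row for row in rows if needle in str(getattr(row, column)).lower()]
```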

 

 

Quality Gates was developed to detect errors and warnings in transcription files according to the transcription rules.

There are 47 rules, organized into 6 categories (for more detail, check the rules category table):

  • Format number: includes the rules for numbers.
  • Format symbols: includes the rules for specific symbols.
  • Format structure: includes the rules for specific structures, such as inserting foreign-language sentences and initialisms.
  • Language dependent: includes the rules that depend on the spell check, and other grammatical rules.
  • Different language detection: includes the rule for detecting a language other than the one selected at first.
  • Segment length: includes the rule for the maximum segment length allowed.
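
When reviewing a large result set, it can help to group findings by these six categories. The sketch below assumes a hypothetical `RuleCategory` enumeration and that each finding is a `(category name, message)` pair; neither is part of the tool itself.

```python
from enum import Enum


class RuleCategory(Enum):
    """The six rule categories listed above (enumeration name is an assumption)."""
    FORMAT_NUMBER = "Format number"
    FORMAT_SYMBOLS = "Format symbols"
    FORMAT_STRUCTURE = "Format structure"
    LANGUAGE_DEPENDENT = "Language dependent"
    DIFFERENT_LANGUAGE_DETECTION = "Different language detection"
    SEGMENT_LENGTH = "Segment length"


def group_by_category(findings):
    """Group (category name, message) pairs under their RuleCategory."""
    grouped = {category: [] for category in RuleCategory}
    for category_name, message in findings:
        # RuleCategory(name) raises ValueError for an unknown category name.
        grouped[RuleCategory(category_name)].append(message)
    return grouped
```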

 

