For a few practical reasons, it's easier to take one speaker at a time when labelling disfluencies -- it cuts down the number of xlabel windows you need to 3, for one thing. You do, however, need to see and hear both channels and be able to play both or either separately. It is REALLY CRUCIAL to use this facility. Without separating the channels where the speakers overlap, it is often impossible to work out what they're saying and it's very easy to miss words and disfluencies.
Use my transg and transf scripts to start up waves with xlabel. They will give you the 3 xlabel files you need to work with: the file.words, which is of course vital; the file.disfl, where most of the work goes; and the file.rmk, for comments.
ZOOM IN to a reasonable degree while labelling. The labels must be aligned with each other and with word labels as closely as possible. Not zooming in enough leads to a considerable loss of accuracy.
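As a rough aid to the alignment requirement above, a check along the following lines could flag disfluency labels that do not sit close to any word label. This is only a sketch under stated assumptions: it assumes the label times have already been read out of the .words and .disfl files into plain Python lists, and the 5 ms tolerance and the function name are illustrative choices of mine, not part of the labelling scheme.

```python
from bisect import bisect_left


def misaligned_labels(word_times, disfl_labels, tolerance=0.005):
    """Return disfluency labels lying more than `tolerance` seconds
    from every word-label time.

    word_times   -- list of word-label times in seconds (assumed input)
    disfl_labels -- list of (time, label) pairs from the disfluency tier
    tolerance    -- hypothetical threshold; pick whatever suits the data
    """
    word_times = sorted(word_times)
    flagged = []
    for time, label in disfl_labels:
        i = bisect_left(word_times, time)
        # The nearest word label is either just before or just after `time`.
        neighbours = word_times[max(i - 1, 0):i + 1]
        nearest = min(neighbours, key=lambda t: abs(t - time)) if neighbours else None
        if nearest is None or abs(nearest - time) > tolerance:
            flagged.append((time, label, nearest))
    return flagged


if __name__ == "__main__":
    words = [1.0, 1.5, 2.0]
    disfl = [(1.502, "IP"), (1.75, "SP")]
    for time, label, nearest in misaligned_labels(words, disfl):
        print(f"{label} at {time}s: nearest word label at {nearest}s")
```

A check like this does not replace zooming in and listening; it only points to labels worth a second look.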
It is probably possible to make up a menu file with as many labels as you need, but it would look monstrous on the screen, would be largely redundant, and would probably be awkward to use. I have settled on a menu with a useful subset of the TYPE labels in it, along with IP, SP, FP and E, and MOVE, INSERT, DELETE and REPLACE function toggles. All other labels have to be typed in manually. In many cases, unfortunately, this entails a lot of moving of labels into place after writing them, so as not to write onto other labels. It is quite important to have the labels aligned neatly.
I fairly often come across errors in the transcriptions, which most often come about as a result of disfluencies. There are many cases of words or fragments being missed from the original transcriptions and quite frequent transcription errors around disfluencies. Don't be scared to alter the transcriptions, but, if in doubt, ask.
Always state where you have changed the transcription and why (briefly) by leaving an approximately-aligned comment in the .rmk file. Start the comment with your initials and a colon, so we know who says what, and add "(g)" or "(f)" so we know which speaker you're referring to.
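To keep the comments consistent, the text of a remark could be built with a small helper like the one below. This is a sketch, not a required tool: the function name is hypothetical, the initials "AB" are made up, and putting the speaker tag straight after the colon is my assumption, since the convention above only says the comment should start with initials and a colon and should include "(g)" or "(f)".

```python
def format_remark(initials, speaker, note):
    """Build the text of a .rmk comment: initials, colon, speaker tag, note.

    speaker must be "g" or "f"; the position of the tag after the
    colon is an assumed layout, not something the guide prescribes.
    """
    if speaker not in ("g", "f"):
        raise ValueError("speaker must be 'g' or 'f'")
    initials = initials.strip().upper()
    if not initials:
        raise ValueError("initials are required")
    return f"{initials}: ({speaker}) {note.strip()}"


if __name__ == "__main__":
    # Hypothetical labeller "AB" noting a change on the g speaker's channel.
    print(format_remark("ab", "g", "added missed fragment 's-'"))
```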
In many cases, there is some doubt about whether a disfluency is an S, an I, or a D, particularly where there is only a word fragment in the reparandum, e.g. "Go left s-... go left straight so you're above the mine". The default case is D, so when there is any reasonable doubt about the class of disfluency, it should be labelled as a D.
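The default rule can be stated very compactly. The sketch below assumes the labeller has noted which classes seem plausible for a given disfluency; the function name and the idea of passing in a set of candidates are illustrative only.

```python
VALID_CLASSES = {"S", "I", "D"}


def resolve_class(candidates):
    """Return the disfluency class to label.

    If exactly one class is plausible, use it; if there is any doubt
    (more than one candidate, or none at all), fall back to "D".
    """
    candidates = set(candidates)
    unknown = candidates - VALID_CLASSES
    if unknown:
        raise ValueError(f"unknown class(es): {sorted(unknown)}")
    if len(candidates) == 1:
        return next(iter(candidates))
    return "D"  # the default case when the class is in doubt


if __name__ == "__main__":
    # A fragment-only reparandum like "s-" could plausibly be S or D.
    print(resolve_class({"S", "D"}))
```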