Speech recognition in telephone scanning (was: Re: Brussels conference, second partial report)



Rigo Wenning writes:
>          Phil Zimmerman, PGP inc.
>...
>Zimmerman then turned to the gradual erosion
>of privacy. Privacy, he said, is not destroyed in one
>great blow, but hollowed out by many small
>mosaic pieces of additional surveillance. He then sketched
>a truly Orwellian picture. Speech recognition has
>by now advanced so far that large-scale surveillance
>is becoming possible.
>...
>I thought he exaggerated a little. His talk left behind
>an oppressive feeling of Orwell.

As far as speech recognition is concerned, he is realistic.  Indeed,
until now the only obstacle to widespread telephone surveillance was
the effort it required (human ears listening for keywords).

Machine speech recognition, especially the subfield of 'word spotting',
has now advanced far enough that the cost of keyword search is dropping
drastically.  Beyond that (in parallel with the development of search
engines, e.g. Refined Search from Altavista), natural language processing
can automatically produce summaries/classifications of entire
conversations, see:

Newsgroups: comp.speech
From: David Anthony James <james@ubilab.ubs.ch>
Subject: ANNOUNCE: PhD Thesis available by ftp
Message-ID: <1995Nov21.163716.25093@ubilab.ubs.ch>
Date: Tue, 21 Nov 1995 16:37:16 GMT

It's about time I announced the availability of my PhD thesis. So here 
goes...

The following PhD thesis is now available by anonymous ftp as:

ftp://svr-ftp.eng.cam.ac.uk/pub/reports/james_thesis.ps.Z

(If you want a copy, but don't know what the above line means, mail me!)

-----------------------------------------------------------------------

The Application of Classical Information Retrieval
         Techniques to Spoken Documents

                  David A. James
                  Downing College
                     Cambridge
                  United Kingdom

The research presented in this thesis addresses the topic of _ad hoc_ 
retrieval of information from collections of spoken items such as
radio news bulletins.

Modern digital computers are becoming increasingly adept at processing
non-textual data, such as speech. Consequently, new methods are
required to allow users to pin-point specific items of interest in
large data collections. Such a method might exploit the Hidden Markov
Model (HMM), which has proved successful as the basis for many
experimental speech recognition systems, and the well-understood
techniques of document retrieval that have arisen from many
years' research into textual information retrieval (IR).

However, so far there has been little exploration of the potential
combination of these methods in order to index "spoken word" data.
In the IR community, several papers have put forward an approach to
the problem but this approach has not been properly tested. Work done
in the speech recognition area has tended to concentrate on developing
systems for _topic classification_. These systems are extensively
pre-trained for the task of partitioning a set of spoken messages
into a set of disjoint and exhaustive classes, each one representing
some topic. Their utility is, in practice, limited by the fixed class
set and slow operation, and they do not represent an approach to the
problem of retrieving items that correspond to _arbitrary_ topics.

This thesis describes experiments combining the techniques of
classical information retrieval with HMM-based speech recognition
methods in order to retrieve items from a collection of spoken
messages corresponding to items of radio news.  In a baseline system,
a new technique for wordspotting allows items matching an arbitrary
expression of the information requirement to be retrieved quickly and
reasonably accurately.  The system is subsequently improved through
the addition of appropriate language models and the use of
state-of-the-art acoustic modelling. Finally, performance is
compared with that obtained by two alternative approaches, including
one recently proposed in the IR literature, and found to be
considerably superior.

Key Words: speech recognition, information retrieval, topic
classification, keyword spotting, wordspotting.

David A. James             |
UBILAB                     | Mail:   james@ubilab.ubs.ch  
Union Bank of Switzerland  | Phone:  +41 1 236 7309
CH-8021 Zurich Switzerland |
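
To make the combination described in the abstract a bit more concrete,
here is a minimal sketch of how word-spotter output could feed a
classical ranked-retrieval step. Everything in it is an assumption made
for illustration and is not taken from the thesis: the HITS table stands
in for the output of some word spotter, the min_conf threshold and the
confidence-weighted tf-idf scoring are generic choices, and the names
build_index and rank are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical word-spotter output: for each spoken message, a list of
# (keyword, confidence) hits. All values are invented for illustration.
HITS = {
    "msg1": [("election", 0.9), ("minister", 0.7), ("election", 0.6)],
    "msg2": [("football", 0.8), ("minister", 0.4)],
    "msg3": [("election", 0.3), ("football", 0.9), ("football", 0.7)],
}


def build_index(hits, min_conf=0.5):
    """Map keyword -> {message id -> summed confidence of accepted hits}."""
    index = defaultdict(dict)
    for msg, spotted in hits.items():
        for word, conf in spotted:
            if conf >= min_conf:  # drop low-confidence hits (likely false alarms)
                index[word][msg] = index[word].get(msg, 0.0) + conf
    return index


def rank(query, index, n_msgs):
    """Score each message by the sum over query words of soft tf * idf."""
    scores = defaultdict(float)
    for word in query:
        postings = index.get(word, {})
        if not postings:
            continue  # query word was never spotted with enough confidence
        idf = math.log(n_msgs / len(postings)) + 1.0
        for msg, soft_tf in postings.items():
            scores[msg] += soft_tf * idf
    # Best-scoring messages first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = build_index(HITS)
ranking = rank(["election", "minister"], index, len(HITS))
for msg, score in ranking:
    print(f"{msg}\t{score:.3f}")
```

The point of the sketch is only that, once keyword hits exist, searching
many recordings for an arbitrary query costs little more than a text
search; the expensive part is the word spotting itself.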

ralf
-- 
URL: http://home.pages.de/~rws/  (personal)
URL: http://home.pages.de/~ears/ (free speech recognition for Linux)