

Latent variable framework for modeling and separating single-channel acoustic sources
by Shashanka, Madhusudana, Ph.D., Boston University, 2008, 121 pages; AAT 3296037

Abstract (Summary)

Auditory Scene Analysis refers to the human ability to extract different perceptual objects from a sound mixture. Replicating this ability in artificial systems has been an active area of research, concerned both with how one characterizes acoustic sources and with how one separates them from mixtures. The focus of this thesis is to develop models and algorithms that provide a framework to address these questions. The framework comprises latent variable models that employ hidden variables to model unobservable quantities. Such models are appropriate for obtaining representations of data that make hidden structure explicit. This work shows how one can utilize these ideas for the problem of source separation using single-channel audio signals.

The proposed framework focuses on learning the time-frequency (TF) structure in a data-driven manner. TF representations of sounds are modeled by treating the energy in every TF bin as histogram counts of multiple draws. This formulation allows the extraction of the characteristic frequency structure of individual sources as latent components and models the sources as additive combinations of these components. The framework is then extended to incorporate the idea of sparse coding to overcome an important limitation of the basic model: an upper bound on the number of extractable components. Sparsity, imposed in the form of an entropic prior distribution, allows extraction of overcomplete sets of components that are more expressive and better characterize the sources. The statistical foundation of the framework makes it amenable to other extensions where known or hypothesized structure about the data can be easily incorporated by imposing appropriate prior distributions. Theoretical analysis of the proposed methods and algorithms for parameter inference are presented.
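The basic model described above can be illustrated with a minimal sketch. It assumes a nonnegative magnitude spectrogram `V` (frequency by time), treats each frame as a histogram of draws from a mixture of latent frequency distributions, and fits that mixture with expectation-maximization. The function name `plca`, its parameters, and the update form are illustrative assumptions; the abstract does not give the thesis's actual inference equations.

```python
import numpy as np

def plca(V, n_components, n_iter=100, seed=0, eps=1e-12):
    """Fit a latent component model to a nonnegative (F, T) array.

    Returns W with columns P(f|z) and H with columns P_t(z),
    so that each normalized frame is approximated by W @ H[:, t].
    Hypothetical helper: names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components))
    W /= W.sum(axis=0, keepdims=True)      # each component is a distribution over f
    H = rng.random((n_components, T))
    H /= H.sum(axis=0, keepdims=True)      # each frame has mixture weights over z
    for _ in range(n_iter):
        R = V / (W @ H + eps)              # observed counts relative to the model
        # E-step and M-step combined: accumulate posterior-weighted counts.
        W_new = W * (R @ H.T)              # sum over t of V[f,t] * P_t(z|f)
        H_new = H * (W.T @ R)              # sum over f of V[f,t] * P_t(z|f)
        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)
    return W, H

# Usage on synthetic data standing in for a magnitude spectrogram.
V = np.random.default_rng(1).random((6, 8))
W, H = plca(V, n_components=2)
```

Because every frame's weights must sum to one over components that are themselves distributions over frequency, the number of useful components in this basic form is limited by the number of frequency bins, which is the upper bound the sparse extension is meant to overcome. The sparsity prior itself is not sketched here.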

Applications of the models to real-world problems are evaluated and discussed. The latent components learned from acoustic sources are used in a supervised setting for source separation and in a semi-supervised setting for denoising. Unlike approaches based on time-frequency masks that reconstruct partial spectral descriptions of sources by identifying time-frequency bins in which a source dominates, this approach reconstructs entire spectral descriptions of all sources. Various experimental results demonstrate the utility of the proposed framework.
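A supervised separation step of the kind described above might look like the following sketch. It assumes that components for each source have already been learned from training data, holds them fixed, estimates only the per-frame mixture weights on the mixture spectrogram, and rebuilds a full spectrum for every source from its own components. The function `separate`, the energy rescaling, and the synthetic data are assumptions for illustration; the thesis may define the reconstruction differently.

```python
import numpy as np

def separate(V_mix, bases, n_iter=200, seed=0, eps=1e-12):
    """Estimate one full spectrogram per source from a mixture.

    V_mix: nonnegative (F, T) mixture spectrogram.
    bases: list of (F, K_s) arrays, one per source, columns summing to 1.
    Hypothetical helper: names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    W = np.concatenate(bases, axis=1)          # fixed, pre-learned components
    H = rng.random((W.shape[1], V_mix.shape[1]))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        R = V_mix / (W @ H + eps)
        H *= W.T @ R                           # update mixture weights only
        H /= H.sum(axis=0, keepdims=True) + eps
    energy = V_mix.sum(axis=0, keepdims=True)  # total energy in each frame
    estimates, start = [], 0
    for B in bases:
        stop = start + B.shape[1]
        # Full spectrum of this source, built from its own components.
        estimates.append(energy * (B @ H[start:stop]))
        start = stop
    return estimates

# Usage: two toy sources occupying different frequency bins.
B1 = np.zeros((6, 1)); B1[:3, 0] = 1 / 3
B2 = np.zeros((6, 1)); B2[3:, 0] = 1 / 3
S1 = B1 @ np.array([[3.0, 1.0, 2.0]])
S2 = B2 @ np.array([[1.0, 4.0, 2.0]])
est1, est2 = separate(S1 + S2, [B1, B2])
```

Each estimate here is generated from the source's components in every time-frequency bin, rather than by keeping or discarding bins of the mixture.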

Indexing (document details)

Advisor: Shinn-Cunningham, Barbara G.
School: Boston University
School Location: United States -- Massachusetts
Keyword(s): Machine learning, Source separation, Acoustic modeling
Source: DAI-B 69/01, Jul 2008
Source type: Dissertation
Subjects: Computer science
Publication Number: AAT 3296037
ISBN: 9780549412915
Document URL: http://proquest.umi.com/pqdlink?did=1464135161&Fmt=7&clientId=79356&RQT=309&VName=PQD
ProQuest document ID: 1464135161


 
