Estimating the probability of historical connections between languages
by Kessler, Brett L., Ph.D., Stanford University, 1999, 340 pages; AAT 9924446

Abstract (Summary)

Historical linguistics has no generally accepted methodology for statistically estimating whether the connections it documents between languages are coincidental or statistically significant (likely to reflect historical realities). Currently the best proposals are very susceptible to errors which lead the researcher to falsely judge languages to be historically connected. I propose several improvements in the statistics of the testing. The new techniques are illustrated with a set of five languages having varying degrees of interrelatedness (English, German, French, Latin, Albanian) and three not believed to be related to that set or to each other (Hawai'ian, Navajo, and Turkish).
Statistically, the technique of Ringe (1992) suffers from an invalid use of multiple tests. I develop a single test that uses Monte Carlo techniques for estimating significance. The test takes less than a minute on a personal computer and is conceptually much simpler than traditional parametric statistics. My technique is compatible with a wide range of metrics, and I develop several variants in attempts to interpret algorithmically the traditional techniques of historical linguistics, which seek to discover recurrent pairings of sounds between semantically matched words in a set of languages.
I begin with an implementation of the familiar chi-squared statistic. That approach is satisfactory, but only permits the researcher to consider one sound in each word. The Monte Carlo technique also permits a simpler, more traditional counting of the recurrent pairings, and with proper scaling that can be made to work for multiple sounds in a word. Although it is possible to consider all conceivable pairings of sounds, I show that a simple linear alignment is preferable because universal properties of word length interfere with the goal of finding particular, nonuniversal, connections between languages. I also explore the possibilities of comparing words at subsegmental levels.
The greatest problem with the testing is the quality of the data. The tests are easily distorted by loans, recurring etyma, and nonarbitrary vocabulary. I show how prevalent such problems are among the items in the standard Swadesh list of 200 concepts, and introduce some mathematical techniques to help the linguist identify problem areas.
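The Monte Carlo significance test described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes each word is reduced to a single sound (here, a hypothetical initial consonant), uses the chi-squared statistic as the metric, and estimates significance by repeatedly shuffling one language's word list against the other. The function names, the toy sound lists, and the trial count are all assumptions made for illustration.

```python
import random
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic for a table of sound pairings.

    `pairs` is a list of (sound_in_language_a, sound_in_language_b)
    tuples, one per semantically matched word pair.
    """
    n = len(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def monte_carlo_p(sounds_a, sounds_b, trials=10000, seed=0):
    """Estimate how often chance alone matches the observed statistic.

    Shuffling `sounds_b` breaks the semantic matching between the two
    lists while keeping each language's sound frequencies unchanged.
    """
    rng = random.Random(seed)
    observed = chi_squared(list(zip(sounds_a, sounds_b)))
    shuffled = list(sounds_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if chi_squared(list(zip(sounds_a, shuffled))) >= observed - 1e-9:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Invented toy data, not taken from the dissertation.
    lang_a = list("ppppttttkkkk")
    lang_b = list("ffffsssshhhh")
    print(monte_carlo_p(lang_a, lang_b, trials=2000))
```

A small returned value suggests that the sound pairings recur more often than shuffled word lists would produce by chance. Because a single statistic is computed per language pair, the sketch follows the abstract's point about using one test rather than many.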

Indexing (document details)

Advisor: Kiparsky, Paul
School: Stanford University
School Location: United States -- California
Keyword(s): Languages, Probability, English, German, French, Latin, Albanian, Hawaiian, Navajo, Turkish, Historical connections
Source: DAI-A 60/04, p. 1105, Oct 1999
Source type: Dissertation
Subjects: Linguistics
Publication Number: AAT 9924446
ISBN: 9780599240162
Document URL:
ProQuest document ID: 733956721

