Language Life Lines
By Jenny Maloney
For twenty-four years, Jugal Kalita has taught hundreds of students in his computer science classes at the University of Colorado Colorado Springs (UCCS). He has watched the college grow from an entirely undergraduate set of programs to a sprawling, post-graduate research university. Working in both a growing university and the constantly changing computer science field, Kalita has learned to adapt and thrive.
The rapidly changing field has inspired Kalita in his current research to create resources for endangered languages. Across the world, resource-rich languages like English, Mandarin Chinese, Spanish and Hindi are crowding out resource-poor languages. "If you look at a language like Cherokee, here in the US," Kalita explained, "or a language like Dimasa in northeast India or a language like Tai Daeng in Vietnam and Laos, these are languages that barely have any resources. However, most of these languages have at least one bilingual dictionary because of explorers, or people who tried to colonize these places, or because of people in the church who wanted to translate the Bible. But that's all, in terms of lexical resources, a language like these might have - just one single bilingual dictionary with a limited number of terms. But, maybe, now computer technology can help these languages."
Kalita and his colleagues are attempting to expand the dictionaries available to speakers of endangered languages by writing computer algorithms that translate a low-resource language like Cherokee into Chinese, French, Hindi, or English. Because his focus is artificial intelligence, Kalita is teaching the computer to do the translation work.
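One simple way to picture this kind of dictionary expansion is to chain two bilingual dictionaries through a shared "pivot" language. The sketch below is an illustration only, not Kalita's actual method: the function name, the placeholder entries (`word_a`, `word_b`), and the tiny English-French word list are all assumptions made up for the example, and no online translation service is called.

```python
from collections import defaultdict


def compose_dictionaries(source_to_pivot, pivot_to_target):
    """Chain two bilingual dictionaries through a shared pivot language.

    Both arguments map a word to a list of translations. The result maps
    each source word to every target word reachable through the pivot.
    """
    result = defaultdict(set)
    for source_word, pivot_words in source_to_pivot.items():
        for pivot_word in pivot_words:
            # Pivot words missing from the second dictionary are skipped.
            for target_word in pivot_to_target.get(pivot_word, ()):
                result[source_word].add(target_word)
    return {word: sorted(found) for word, found in result.items()}


# Hypothetical toy data: placeholders stand in for low-resource-language words.
low_to_english = {"word_a": ["water"], "word_b": ["dog", "hound"]}
english_to_french = {
    "water": ["eau"],
    "dog": ["chien"],
    "hound": ["chien", "limier"],
}

low_to_french = compose_dictionaries(low_to_english, english_to_french)
print(low_to_french)
```

Chaining like this tends to introduce wrong translations when a pivot word has several senses, which is one reason a real system would need extra evidence, such as the parallel texts mentioned in the article, to filter candidates.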
"Using resources like the Bing translator or the Google translator and limited parallel textual documents, we're trying to translate dictionaries. We're trying to use limited resources available on the Internet to develop a whole bunch of additional dictionaries, automatically, without human help," said Kalita.
Among the resources Kalita is working to construct are thesauruses and Wordnets, which are lexical databases that group core elements of a language together. "In English there is a database, or we could call it an ontology, of words and how words are related to each other - which word is a synonym of which other word, which word is an antonym, which word is a so-called hypernym or hyponym, which word denotes an object that is a part of another word," said Kalita.
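The relations Kalita lists can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not the format of the Princeton resource or of Kalita's own tools: the `Synset` class, its field names, and the handful of English entries are hypothetical and chosen only to show synonym groups, hypernym links, part-of links, and antonym pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A group of synonymous words plus links to related groups (toy model)."""
    name: str
    words: set = field(default_factory=set)
    hypernyms: list = field(default_factory=list)  # more general concepts
    part_of: list = field(default_factory=list)    # wholes this is a part of


def hypernym_chain(synset):
    """Follow the first hypernym link upward and collect the names passed."""
    chain = []
    current = synset
    while current.hypernyms:
        current = current.hypernyms[0]
        chain.append(current.name)
    return chain


def are_synonyms(word_a, word_b, synsets):
    """Two words count as synonyms here if any one synset holds both."""
    return any(word_a in s.words and word_b in s.words for s in synsets)


# Hypothetical toy entries.
animal = Synset("animal", {"animal", "creature"})
dog = Synset("dog", {"dog", "domestic dog"}, hypernyms=[animal])
puppy = Synset("puppy", {"puppy"}, hypernyms=[dog])
paw = Synset("paw", {"paw"}, part_of=[dog])
all_synsets = [animal, dog, puppy, paw]

# Antonymy relates individual words, so it is kept in a separate table.
antonyms = {"hot": "cold", "cold": "hot"}

print(hypernym_chain(puppy))
print(are_synonyms("dog", "domestic dog", all_synsets))
print([s.name for s in paw.part_of])
```

Here "puppy" is a hyponym of "dog", "dog" is a hypernym of "puppy", and "paw" names a part of a dog, which matches the kinds of relations described in the quote.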
The most well-known Wordnet is the one compiled by Princeton for the English language. Kalita and his students are attempting to create a similar resource for endangered languages. Kalita explained, "That kind of word ontology or Wordnet resource is quite valuable in performing tasks computationally. So we're trying to create Wordnets for these languages which are endangered or who have very few resources."
While the main funding for creating language resources generally goes to those studying the dominant languages, Kalita feels developing resources for endangered languages is necessary work. He said, "Recently, there has been an understanding among researchers that if these languages go away, it makes all of humanity poorer. The diversity of languages, the diversity of cultures, the diversity of thought that is expressed in terms of languages enrich everyone."
Computer science requires a great deal of hands-on work - writing and rewriting programs, then evaluating the effectiveness of those programs - and encompasses a wide variety of subjects, so Kalita does not work in a vacuum. Collaboration and teamwork are key to making his research work.
Over the years, Kalita has worked with hundreds of undergraduate and graduate students, and in the past several years he's had the opportunity to work with UCCS's new Ph.D. students. His students are his first collaborators.
Often, he will come up with the seed of an idea, then encourage his students to develop it further and write computer programs to test it. "Usually I come up with the basic ideas myself - what topic we want to work on in a broad area. Sometimes with some students, we come up with a few questions or problems for which we need answers."
Next, he tells the student, "Research these problems and choose the problem you're most interested in." He added, "I work on explaining papers, asking questions, proposing possible answers, but they're the ones who do the deeper investigation."
He went on to explain, "Because we're in computer science we can't just do theoretical work. People have to write computer programming to verify whatever hypothesis they may have or whatever solution they may have come up with."
Another area of Kalita's research that involves collaboration with students is automatic question generation: getting a computer to write reading-comprehension questions, as a test of what natural language processing and artificial intelligence can do. "Suppose we were working with a K-2 child and he or she reads a passage, a short story, or a fairy tale or a Dr. Seuss book. After reading the book we want to ask a few questions to see if the child understood it. Sure, a teacher can generate those questions on her own, but can a machine do it? Automatically?"
To help verify results of these computer programs, which cross a broad spectrum of subjects, Kalita has worked with different departments throughout his history at UCCS. "I've worked with people in electrical engineering, mechanical engineering, psychology, communication, biology, chemistry, and linguistics on our campus."
His collaborations aren't limited to Colorado Springs - he's worked with professors and students from Brigham Young University, Louisiana State University, SUNY-Buffalo, Colorado College, University of Texas, University of Minnesota, and Stanford University.
He's also collaborated outside the United States. He has worked closely with colleagues at several universities in India, in particular Tezpur University (just forty miles from where he grew up), both on language resource production and in several other fields. For example, he co-wrote a book on network security, Network Anomaly Detection: A Machine Learning Perspective, with Dr. Dhruba Kumar Bhattacharyya. "His area of interest in network security complements my interest in artificial intelligence and machine learning, and vice versa," Kalita said.
Kalita sees great benefit in working with collaborators: "It is always a great idea to look at a problem from different perspectives. When people from different fields come together, new and exciting things are likely to happen."
For his expansive research and passion for teaching, Kalita was recognized at UCCS in 2011 with the Chancellor's Award, which is given to faculty who excel in research, service, and teaching. He has also received teaching, research and service awards in the College of Engineering and Applied Science at UCCS.
Seeing his students succeed is one of the greatest points of pride for Kalita. He finds pleasure in working with bright undergraduate researchers. For the past several summers, he has been kept busy with the Research Experience for Undergraduates program, funded by the National Science Foundation. About twenty published papers have resulted from the grant, all with undergraduates as first authors.
You can read more about Jugal Kalita at his website.