ESS Biography
SETH-DAVID DWORMAN, Research Assistant for Computational Linguistics
Major: Computer Science, Linguistics
College/Employer: The MITRE Corporation
Year of Graduation: 2014
Brief Biographical Sketch:
I grew up in a small town in Western Massachusetts that even most people in Massachusetts won't recognize. School was where I was first introduced to language, through studying foreign languages: Hebrew, Spanish, German, and French. This drew my interest not only to describing how language works (its grammar) but also to explaining why language is the way it is, which naturally led me to linguistics. I was also very passionate about fencing, so I went to Brandeis, where I could both study linguistics and fence. In my sophomore year, having already finished the linguistics major in my first three semesters, I decided to learn programming and Computational Linguistics (CL). Brandeis has an excellent CL program and lab, where even undergraduates with minimal background can get involved and help perform meaningful, useful research. Since then I've been involved with the Brandeis Lab for Linguistics and Computation, first as an annotator and now as a research assistant.

Past Classes:
M79: Introduction to Natural Language Processing in Splash Fall 2014 (Nov. 15, 2014)
How does Google search work? What does iOS Siri actually "understand"? How about predictive text completion? How could you build a grammar checker? What's the deal with machine translation?
What do all of these technologies and problems have in common? Natural language!
This course will provide insight into the basic foundations of natural language processing (NLP) and get you thinking about how to approach problems involving natural language.
The first part of the class will introduce some of the inherent ambiguities and difficulties in natural language: word sense disambiguation, adjunct attachment, and quantifier scope.
-Word sense disambiguation: Is "I saw her duck" ambiguous?
-Adjunct attachment: Is "I saw the man with a telescope" ambiguous?
-Quantifier scope: Is "I have a picture of everyone" ambiguous?
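One way to see that "I saw the man with a telescope" is structurally ambiguous is to count its parse trees. The sketch below uses a small grammar and lexicon of my own; they are hypothetical, for illustration only. It also uses a hand-written CKY-style chart counter in plain Python, not NLTK's parser. Because the grammar lets a prepositional phrase (PP) attach either to the noun phrase (NP -> NP PP) or to the verb phrase (VP -> VP PP), the sentence gets one parse per attachment site.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: the seeing used a telescope
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: the man has a telescope
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon mapping each word to its possible categories.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "a": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(tokens, start="S"):
    """Count distinct parse trees for `tokens` using a CKY-style chart."""
    n = len(tokens)
    # chart[i][j][X] = number of ways category X can span tokens[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(tokens):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    # Build longer spans from shorter ones, trying every split point k
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n].get(start, 0)
```

With this grammar, `count_parses("I saw the man with a telescope".split())` returns 2, while `count_parses("I saw the man".split())` returns 1. The count depends entirely on which rules the grammar includes: drop either PP rule and the ambiguity disappears.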
The second part of the class will then focus on how to process and prepare unstructured natural language text for analysis. We will briefly explore the notion of what a "word" is, text tokenization, and token normalization.
-Words: What exactly is a "word"? Why is this important to define? Are punctuation marks words, and would it be useful to treat them as words? What about proper names, e.g. "Mary Jane": two words or a single word? Would you consider the "'s" in "Mary's" to be a word?
-Tokenization: How do we extract the tokens (i.e. words) from an unstructured text?
-Normalization: Should we count "dog" and "dogs" as separate tokens? How about "love" and "loves"? Are "go" and "went" really different words?
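As a rough illustration of the tokenization and normalization questions above, here is a deliberately crude sketch in plain Python. It is not NLTK's tokenizer or stemmer; NLTK provides real ones such as `nltk.word_tokenize` and `PorterStemmer`. The regular expression, the suffix rule, and the `IRREGULAR_FORMS` table are hypothetical choices made for illustration. The tokenizer splits off punctuation and the possessive "'s". The normalizer lowercases, maps a few irregular verb forms to their base form ("went" to "go"), and strips a final "s" ("dogs" to "dog", "loves" to "love").

```python
import re

# Hypothetical token pattern: the clitic "'s", runs of word characters, or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"'s\b|\w+|[^\w\s]")

# Hypothetical lookup for irregular forms that suffix stripping cannot handle.
IRREGULAR_FORMS = {"went": "go", "gone": "go", "goes": "go"}


def tokenize(text):
    """Split raw text into word, clitic, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def normalize(token):
    """Map a token to a crude normalized form."""
    token = token.lower()
    if token in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[token]
    # Crude plural/third-person stripping; skip short words ("is", "was") and "-ss" words ("glass").
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token
```

For example, `tokenize("Mary's dogs went home.")` gives `["Mary", "'s", "dogs", "went", "home", "."]`, and normalizing each token gives `["mary", "'s", "dog", "go", "home", "."]`. Note that this sketch treats "Mary Jane" as two tokens. Treating it as one would need extra machinery such as a list of known names.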
In the third part of the class, we will do a hands-on analysis of unstructured text together. I (or you) will ask questions about the data, you will tell me how to answer them, and I will live-code a solution in front of you following your ideas.
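To give a flavor of the kind of question that might come up, "Which content words occur most often in this text?" can be answered in a few lines. This is a hypothetical example, not the solution built in class. It uses only Python's standard library (`re` and `collections.Counter`) rather than NLTK, and the stopword list and sample text are made up for illustration.

```python
import re
from collections import Counter


def most_common_words(text, n=3, stopwords=frozenset({"the", "a", "and", "of", "to"})):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    # Lowercase and keep only alphabetic runs (a deliberately simple tokenizer)
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop very frequent function words so that content words surface
    counts = Counter(t for t in tokens if t not in stopwords)
    # Ties keep the order in which words were first seen
    return counts.most_common(n)
```

On the sample text "The dog chased the cat. The cat ran up a tree, and the dog barked at the tree." this returns `[("dog", 2), ("cat", 2), ("tree", 2)]`. Without the stopword filter, "the" would dominate the list, which is itself a useful observation about natural language text.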
------------------------------------
For the hands-on activity at the end, I will be using Python with the Natural Language Toolkit (NLTK) library. You do not need to download either of these, have a programming background, or bring a computer, but if you have the know-how, here are some links to get started.
Python 2.7.8: I use this because NLTK does not yet have a stable release for Python
NLTK: http://www.nltk.org/install.html