About
I’m a second-year Ph.D. student at New York University working at the intersection of language and machine learning. At NYU, I work primarily with Tal Linzen in the Computation and Psycholinguistics Lab (CapLab) and with Sam Bowman in the Alignment Research Group (ARG). Before coming to NYU, I earned my bachelor’s degree in Linguistics from Yale University, where I was advised by Bob Frank as part of the CLAY Lab. In the past few years, I’ve been fortunate enough to intern at some very cool companies and to work with a number of amazing coauthors on a diverse array of projects; you can find a more thorough description in my CV.
Meta-level Research Statement: In brief, I’m very interested in the problem of out-of-domain (OOD) generalization and want to create computational models of language which generalize in the way human language learners do. I want to understand the degree to which neural networks learn the same information in the same way as humans do, and to apply insights from cognitive science and linguistics to build better computational models of language.
Object-level Research Interests:
- Compositional Generalization: Language (and, more foundationally, cognition) is compositional: on the basis of an understanding of smaller objects (phonemes, clauses, lexical meanings) and the rules governing how they interact, we are able to construct far more complicated linguistic and cognitive objects (words, complex sentences, useful representations of the world). Can neural language models learn to exhibit human-like compositionality? If so, how? How is this capacity derived from a model’s learned representations?
- Hierarchical Generalization: We have good reason to model language as a cognitively tree-shaped object whose representations are communicated through a linearized medium (sequences of audio or text). A consequence of this is that the compositionality desired in language models requires an attunement to the hierarchical patterns of language. How do we build this into neural networks? Does it emerge naturally through training, or does it require architecture-specific inductive biases? How is scale implicated in hierarchical generalization?
- Reasoning about OOD Performance: Current state-of-the-art language models are fantastically large objects trained on unfathomable amounts of data; this makes it difficult to predict how such models will perform on tasks, especially ones that fall well outside the scope of the training domain. Yet increasingly, these models are being deployed in ways that invite OOD inputs. How can we reason about the performance characteristics of models on OOD data at scale? How can we create models which are robust to these inputs? What are the potential impacts on society of deploying these models?
If any of this sounds interesting to you, please reach out! I’m very glad to meet new people and collaborate on projects we both find interesting!
Contact Information
- NYU Email
petty@nyu.edu
- Permanent Email
research@jacksonpetty.org
- Ling Office
Room 507, 10 Washington Place
New York, NY 10003
Elsewhere on the Internet
- GitHub
@jopetty
- arXiv
petty_j_1
- VSCO
@jowenpetty
- Mastodon
@jowenpetty@mastodon.social
- YouTube
@jacksonpetty
- Google Scholar
Jackson Petty
- Semantic Scholar
Jackson Petty
- ORCID
0000-0002-9492-0144
- LinkedIn
in/jackson-petty
Colophon
This site is built using Hugo and is hosted on GitHub Pages. Type is set in Sebastian Kosch’s Cochineal, Matthew Butterick’s Heliotrope, and Neil Panchal’s Berkeley Mono.