Archive

2025 Jun 05 RELIC: Evaluating Compositional Instruction Following via Language Recognition
2025 May 26 Protein Scores
2025 May 25 The Button Game
2025 Feb 26 Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
2025 Jan 31 How Does Code Pretraining Affect Language Model Task Performance?

2024 Oct 07 GPQA: A Graduate-Level Google-Proof Q&A Benchmark
2024 Apr 16 The Illusion of State in State-Space Models
2024 Jan 01

2023 Nov 13 In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
2023 Nov 13 Debate Helps Supervise Unreliable Experts
2023 Nov 08 How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure
2023 Oct 30 The Impact of Depth on Compositional Generalization in Transformer Language Models
2023 Jul 01 (QA)$^2$: Question Answering with Questionable Assumptions
2023 Mar 02 Optimal monohedral tilings of hyperbolic surfaces
2023 Feb 25 Building a Slackbot to DM Users

2022 Jun 18 Do Language Models Learn Position-Role Mappings?
2022 Apr 28 Characterizing Algebraic Generalization in Linguistic Neural Networks
2022 Apr 12 Nearer to G-d are We

2021 Dec 28 The Optimal Double Bubble for Density $r^p$
2021 Sep 24 Transformers Generalize Linearly
2021 Feb 20 Certain hyperbolic regular polygonal tiles are isoperimetric

2020 Nov 02 Sequence to sequence networks learn the meaning of reflexive anaphora