Archive
- 2024 Sep 06 How Does Code Pretraining Affect Language Model Task Performance?
- 2024 Apr 16 The Illusion of State in State-Space Models
- 2024 Jan 01
- 2023 Nov 21 GPQA: A Graduate-Level Google-Proof Q&A Benchmark
- 2023 Nov 13 In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
- 2023 Nov 13 Debate Helps Supervise Unreliable Experts
- 2023 Nov 08 How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure
- 2023 Oct 30 The Impact of Depth on Compositional Generalization in Transformer Language Models
- 2023 Jul 01 (QA)$^2$: Question Answering with Questionable Assumptions
- 2023 Mar 02 Optimal monohedral tilings of hyperbolic surfaces
- 2023 Feb 25 Building a Slackbot to DM Users
- 2022 Jun 18 Do Language Models Learn Position-Role Mappings?
- 2022 Apr 28 Characterizing Algebraic Generalization in Linguistic Neural Networks
- 2022 Apr 12 Nearer to G-d are We
- 2021 Dec 28 The Optimal Double Bubble for Density $r^p$
- 2021 Sep 24 Transformers Generalize Linearly
- 2021 Feb 20 Certain hyperbolic regular polygonal tiles are isoperimetric
- 2020 Nov 02 Sequence to sequence networks learn the meaning of reflexive anaphora