Resume

Below is my resume. A PDF version is available upon request.

Torsten Scholak

Montréal, QC, Canada

https://tscholak.github.io

GitHub · LinkedIn · X · Google Scholar

Summary

AI research and engineering leader turning GPU compute into shipped LLMs. Head of ServiceNow's Foundation Model Lab and co-founder of the ServiceNow Language Model (SLAM) initiative, delivering open-weight models like Apriel-5B and Apriel-Nemotron-15B for efficient enterprise use. I excel at leading small, high-agency teams, setting clear strategy, and executing at scale.

Experience

Lead Scientist & Team Lead at ServiceNow Research

Montréal, QC, Canada | 2024-01 - present

Applied Research Scientist → Staff Research Scientist at ServiceNow Research

Montréal, QC, Canada | 2021-01 - 2023-12

Applied Research Scientist - Research & AI Core at Element AI (acquired by ServiceNow)

Montréal, QC, Canada | 2017-10 - 2020-12

Data Science Engineer at Unata (acquired by Instacart)

Toronto, ON, Canada | 2016-08 - 2017-09

Postdoctoral Researcher at University of Toronto

Toronto, ON, Canada | 2011-06 - 2016-03

Education

Ph.D. in Theoretical & Mathematical Physics

University of Freiburg, Freiburg, Germany

2008-12 - 2011-12

M.S. in Theoretical & Mathematical Physics

University of Bayreuth, Bayreuth, Germany

2002-12 - 2008-12

Publications

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

***T. Scholak***, N. Schucher, D. Bahdanau

EMNLP 2021 | 2021-11 | DOI: 10.18653/v1/2021.emnlp-main.779

Incremental parsing keeps generated SQL valid; >400 citations, SOTA on Spider/CoSQL at release.

Multilingual Code Retrieval Without Paired Data: A New Benchmark and Experiments

J. Monteiro, ***T. Scholak***, V. Mehta, D. Vazquez, C. Pal

ICLR 2023 DL4C workshop | 2023-03

Introduced two cross-lingual code-text datasets; contrastive training beats GPT-4 baselines on 6 languages.

StarCoder 2 and The Stack v2: The Next Generation

BigCode Team

2024-02 | DOI: 10.48550/arXiv.2402.19173

StarCoder 2 is a 15B-parameter model trained on 1.8 trillion tokens of code, achieving SOTA on code generation benchmarks at the time of release.

TapeAgents: a Holistic Framework for Agent Development and Optimization

D. Bahdanau, N. Gontier, G. Huang, E. Kamalloo, R. Pardinas, A. Piché, ***T. Scholak***, O. Shliazhko, J. P. Tremblay, K. Ghanem, S. Parikh, M. Tiwari, Q. Vohra

2024-12 | DOI: 10.48550/arXiv.2412.08445

Introduces a framework for LLM agent development, including a library of agentic tasks, evaluation metrics, and optimization techniques.

Unifying Autoregressive and Diffusion-Based Sequence Generation

N. Fathi, ***T. Scholak***, P.-A. Noel

2025-04 | DOI: 10.48550/arXiv.2504.06416

Shows diffusion can share weights with an AR LM; 2× speed-up at equal perplexity.

Projects

Apriel Model Family - Co-lead & Technical Architect

5-15B-parameter open-weight LLMs optimized for enterprise agentic tasks and efficient inference.

Fast-LLM - Strategic Lead

Opinionated, high-performance PyTorch-based distributed model-training framework for trillion-token LLM pre-training.

Deep Learning for Code (DL4C) - Co-organizer

Annual workshop on deep learning for code, co-located with ICLR.

Hasktorch - Core Contributor

Haskell bindings for PyTorch, enabling GPU-accelerated machine learning in Haskell.