Resume
Below is my resume. A PDF version is available upon request.
Torsten Scholak
Montréal, QC, Canada
https://tscholak.github.io
GitHub
LinkedIn
X
Google Scholar
Summary
AI research and engineering leader turning GPU compute into shipped LLMs. Head of ServiceNow's Foundation Model Lab and co-founder of the ServiceNow Language Model (SLAM) initiative, delivering open-weight models like Apriel-5B and Apriel-Nemotron-15B for efficient enterprise use. I excel at leading small, high-agency teams, setting clear strategy, and executing at scale.
Experience
Lead Scientist & Team Lead at ServiceNow Research
Montréal, QC, Canada | 2024-01 - present
- Shipped the open-weights Apriel models, achieving state-of-the-art results on enterprise benchmarks while optimizing for serving cost and latency; the work was recognized by NVIDIA CEO Jensen Huang in the ServiceNow Knowledge 2025 keynote.
- Co-founded the cross-org ServiceNow Language Model (SLAM) initiative, aligning 20+ engineers and researchers on a shared vision for efficient multimodal agentic LLMs that became the Apriel model family.
- Pivoted the Foundation Model Lab from fundamental research to platform-first objectives, shipping 3 model families in 12 months (Mixtral-8x7B multilingual upgrade, Apriel-5B, Apriel-15B) and delivering serving-cost and latency wins for the ServiceNow platform.
- Rolled out an "upgrade-don't-retrain" strategy, retrofitting new capabilities (SSMs, multi-token prediction, masked diffusion, pruning, depth/MoE upcycling) via targeted distillation and continual pre-training, adding features at less than 5% of the full token budget and shrinking experiment cycles from weeks to days.
- Set research direction that cut inference latency 4x (2x from multi-token prediction, 2x from SSMs), slashing serving cost and response time.
- Established critical-path discipline and tight platform alignment, minimizing low-impact work and ensuring on-time delivery, even during a five-week parental leave.
- Coached engineers and researchers into technical leads and fostered a culture of accountability.
- Drove fast resourcing decisions through concise executive updates and clear priority setting.
Applied Research Scientist → Staff Research Scientist at ServiceNow Research
Montréal, QC, Canada | 2021-01 - 2023-12
- Pioneered small language models (SLMs) for agentic tasks on the ServiceNow platform.
- Core contributor to the TapeAgents LLM agent development framework.
Applied Research Scientist - Research & AI Core at Element AI (acquired by ServiceNow)
Montréal, QC, Canada | 2017-10 - 2020-12
- Tech lead of the NLP group: set the roadmap, ran stand-ups, mentored interns, and liaised with product on dialog systems, text-to-code, summarization, and QA.
- Invented grammar-constrained decoding speed-ups (cited 440+ times).
- Built SOTA text-to-SQL model (PICARD) that led the Spider leaderboard for months.
Data Science Engineer at Unata (acquired by Instacart)
Toronto, ON, Canada | 2016-08 - 2017-09
- Re-engineered recommender stack in Scala/Spark.
- Delivered a 3.5-hour Bayesian ML tutorial at PyCon 2017 (12k+ views, recording at https://www.youtube.com/watch?v=fR5Wvb86-IU).
Postdoctoral Researcher at University of Toronto
Toronto, ON, Canada | 2011-06 - 2016-03
- Pioneered quantum coherent-control interferometry.
- Ran HPC workloads on SciNet (5k+ CPU cores, NVIDIA Teslas) to simulate quantum systems.
Education
Ph.D. in Theoretical & Mathematical Physics
University of Freiburg, Freiburg, Germany
2008-12 - 2011-12
M.S. in Theoretical & Mathematical Physics
University of Bayreuth, Bayreuth, Germany
2002-12 - 2008-12
- GPA: 1.2 (German system, 1.0 is best)
Publications
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
***T. Scholak***, N. Schucher, D. Bahdanau
EMNLP 2021 | 2021-11 | DOI: 10.18653/v1/2021.emnlp-main.779
Incremental parsing keeps generated SQL valid; >400 citations, SOTA on Spider/CoSQL at release.
Multilingual Code Retrieval Without Paired Data: A New Benchmark and Experiments
J. Monteiro, ***T. Scholak***, V. Mehta, D. Vazquez, C. Pal
ICLR 2023 DL4C workshop | 2023-03
Introduced two cross-lingual code-text datasets; contrastive training beats GPT-4 baselines on 6 languages.
StarCoder 2 and The Stack v2: The Next Generation
BigCode Team
2024-02 | DOI: 10.48550/arXiv.2402.19173
StarCoder 2 is a 15B-parameter model trained on 1.8 trillion tokens of code, achieving SOTA on code generation benchmarks at the time of release.
TapeAgents: a Holistic Framework for Agent Development and Optimization
D. Bahdanau, N. Gontier, G. Huang, E. Kamalloo, R. Pardinas, A. Piché, ***T. Scholak***, O. Shliazhko, J. P. Tremblay, K. Ghanem, S. Parikh, M. Tiwari, Q. Vohra
2024-12 | DOI: 10.48550/arXiv.2412.08445
Introduces a framework for LLM agent development, including a library of agentic tasks, evaluation metrics, and optimization techniques.
Unifying Autoregressive and Diffusion-Based Sequence Generation
N. Fathi, ***T. Scholak***, P.-A. Noel
2025-04 | DOI: 10.48550/arXiv.2504.06416
Shows diffusion can share weights with an AR LM; 2x speed-up at equal perplexity.
Projects
Apriel Model Family - Co-lead & Technical Architect
5-15B parameter open-weights LLMs optimized for enterprise agentic tasks and efficient inference.
- Co-led technical design and release of Apriel-5B and Apriel-Nemotron-15B-Thinker.
- Apriel-5B-Instruct outperforms OLMo-2-7B and Mistral-Nemo-12B on average across benchmarks and is competitive with LLaMA 3.1 8B, with strong results in math and reasoning (AIME-24/25, GPQA, MATH-500).
- Apriel-Nemotron-15B-Thinker achieves state-of-the-art on BFCL, Enterprise RAG, MT-Bench, MixEval, IFEval, Multi-Challenge, and MBPP while using 40% fewer tokens than 30B+ models like QwQ-32B.
- Publicly recognized by NVIDIA CEO Jensen Huang during the model announcement in the ServiceNow Knowledge 2025 keynote.
Fast-LLM - Strategic Lead
Opinionated, high-performance PyTorch-based distributed model-training framework for trillion-token LLM pre-training.
- Provided strategic direction and priority setting; day-to-day maintenance handled by core contributors.
- Supports dense, MoE, and hybrid SSM architectures; integrates 3D parallelism, ZeRO, and FlashAttention.
- Adopted by ServiceNow Foundation Model Lab and platform teams for production use.
- Used to train billion-parameter models on trillions of tokens on NVIDIA DGX SuperPOD clusters with 500+ GPUs.
- Trained Apriel-5B and Apriel-Nemotron-15B-Thinker.
Deep Learning for Code (DL4C) - Co-organizer
Annual workshop on deep learning for code, co-located with ICLR.
- Co-organized 2022 and 2023 workshops, including program committee, event organization, and panel hosting.
- 2022: 30+ submissions, 15 accepted papers, 100+ attendees.
- 2023: 40+ submissions, 19 accepted papers, 150+ attendees.
Hasktorch - Core Contributor
Haskell bindings for PyTorch, enabling GPU-accelerated machine learning in Haskell.
- Added compile-time shape/type checking and transformer examples.
- Live-coded demo at FP Berlin (8.9k+ views, recording at https://www.youtube.com/watch?v=ZnYa99QoznE&t=1689).