Resume
Below is my resume. A PDF version is available upon request.
Torsten Scholak
Montréal, QC, Canada
https://tscholak.github.io
GitHub
LinkedIn
X
Google Scholar
Summary
AI research and engineering leader turning GPU compute into shipped LLMs. Head of ServiceNow's Foundation Model Lab and co-founder of the ServiceNow Language Model (SLAM) initiative, delivering open-weight models like Apriel-5B and Apriel-Nemotron-15B for efficient enterprise use. I excel at leading small, high-agency teams, setting clear strategy, and executing at scale.
Experience
Lead Scientist & Team Lead at ServiceNow Research
Montréal, QC, Canada | 2024-01 - present
- Shipped the open-weights Apriel models, achieving state-of-the-art results on enterprise benchmarks while optimizing for serving cost and latency; the work was recognized by NVIDIA CEO Jensen Huang in the ServiceNow Knowledge 2025 keynote.
- Co-founded the cross-org ServiceNow Language Model (SLAM) initiative, aligning 20+ engineers and researchers on a shared vision for efficient multimodal agentic LLMs that became the Apriel model family.
- Pivoted the Foundation Model Lab from fundamental research to platform-first objectives, shipping 3 model families in 12 months (Mixtral-8x7B multilingual upgrade, Apriel-5B, Apriel-15B) and delivering serving-cost and latency wins for the ServiceNow platform.
- Rolled out an "upgrade-don't-retrain" strategy, retrofitting new capabilities (SSMs, multi-token prediction, masked diffusion, pruning, depth/MoE upcycling) via targeted distillation and continual pre-training, adding features at less than 5% of the full token budget and shrinking experiment cycles from weeks to days.
- Set research direction that cut inference latency 4x (2x from multi-token prediction, 2x from SSMs), slashing serving cost and response time.
- Established critical-path discipline and tight platform alignment, minimizing low-impact work and ensuring on-time delivery, even during a five-week parental leave.
- Coached engineers and researchers into technical leads and fostered a culture of accountability.
- Drove fast resourcing decisions through concise executive updates and clear priority setting.
Applied Research Scientist → Staff Research Scientist at ServiceNow Research
Montréal, QC, Canada | 2021-01 - 2023-12
- Pioneered small language models (SLMs) for agentic tasks on the ServiceNow platform.
- Core contributor to the TapeAgents LLM agent development framework.
Applied Research Scientist - Research & AI Core at Element AI (acquired by ServiceNow)
Montréal, QC, Canada | 2017-10 - 2020-12
- Tech lead of the NLP group: set the roadmap, ran stand-ups, mentored interns, and liaised with product on dialog systems, text-to-code, summarization, and QA.
- Invented grammar-constrained decoding speed-ups (cited 440+ times).
- Built SOTA text-to-SQL model (PICARD) that led the Spider leaderboard for months.
Data Science Engineer at Unata (acquired by Instacart)
Toronto, ON, Canada | 2016-08 - 2017-09
- Re-engineered recommender stack in Scala/Spark.
- Delivered a 3.5-hour Bayesian ML tutorial at PyCon 2017 (12k+ views, recording at https://www.youtube.com/watch?v=fR5Wvb86-IU).
Postdoctoral Researcher at University of Toronto
Toronto, ON, Canada | 2011-06 - 2016-03
- Pioneered quantum coherent-control interferometry.
- Ran HPC workloads on SciNet (5k+ CPU cores, NVIDIA Teslas) to simulate quantum systems.
Education
Ph.D. in Theoretical & Mathematical Physics
University of Freiburg, Freiburg, Germany
2008-12 - 2011-12
M.S. in Theoretical & Mathematical Physics
University of Bayreuth, Bayreuth, Germany
2002-12 - 2008-12
- GPA: 1.2 (German system, 1.0 is best)
Publications
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
***T. Scholak***, N. Schucher, D. Bahdanau
EMNLP 2021 | 2021-11 | DOI: 10.18653/v1/2021.emnlp-main.779
Incremental parsing keeps generated SQL valid; >400 citations, SOTA on Spider/CoSQL at release.
Multilingual Code Retrieval Without Paired Data: A New Benchmark and Experiments
J. Monteiro, ***T. Scholak***, V. Mehta, D. Vazquez, C. Pal
ICLR 2023 DL4C workshop | 2023-03
Introduced two cross-lingual code-text datasets; contrastive training beats GPT-4 baselines on 6 languages.
StarCoder 2 and The Stack v2: The Next Generation
BigCode Team
2024-02 | DOI: 10.48550/arXiv.2402.19173
StarCoder 2 is a 15B-parameter model trained on 1.8 trillion tokens of code, achieving SOTA on code generation benchmarks at the time of release.
TapeAgents: a Holistic Framework for Agent Development and Optimization
D. Bahdanau, N. Gontier, G. Huang, E. Kamalloo, R. Pardinas, A. Piché, ***T. Scholak***, O. Shliazhko, J. P. Tremblay, K. Ghanem, S. Parikh, M. Tiwari, Q. Vohra
2024-12 | DOI: 10.48550/arXiv.2412.08445
Introduces a framework for LLM agent development, including a library of agentic tasks, evaluation metrics, and optimization techniques.
Unifying Autoregressive and Diffusion-Based Sequence Generation
N. Fathi, ***T. Scholak***, P.-A. Noel
2025-04 | DOI: 10.48550/arXiv.2504.06416
Shows diffusion can share weights with an AR LM; 2x speed-up at equal perplexity.
Projects
Apriel Model Family - Co-lead & Technical Architect
5-15B parameter open-weights LLMs optimized for enterprise agentic tasks and efficient inference.
- Co-led technical design and release of Apriel-5B and Apriel-Nemotron-15B-Thinker.
- Apriel-5B-Instruct outperforms OLMo-2-7B and Mistral-Nemo-12B on average across benchmarks and is competitive with LLaMA 3.1 8B, with strong results in math and reasoning (AIME-24/25, GPQA, MATH-500).
- Apriel-Nemotron-15B-Thinker achieves state-of-the-art on BFCL, Enterprise RAG, MT-Bench, MixEval, IFEval, Multi-Challenge, and MBPP while using 40% fewer tokens than 30B+ models like QwQ-32B.
- Publicly recognized by NVIDIA CEO Jensen Huang during the model announcement in the ServiceNow Knowledge 2025 keynote.
Fast-LLM - Strategic Lead
Opinionated, high-performance PyTorch-based distributed model-training framework for trillion-token LLM pre-training.
- Provided strategic direction and priority setting; day-to-day maintenance handled by core contributors.
- Supports dense, MoE, and hybrid SSM architectures; integrates 3D parallelism, ZeRO, and FlashAttention.
- Adopted by ServiceNow Foundation Model Lab and platform teams for production use.
- Used to train billion-parameter models on trillions of tokens on NVIDIA DGX SuperPOD clusters with 500+ GPUs.
- Trained Apriel-5B and Apriel-Nemotron-15B-Thinker.
Deep Learning for Code (DL4C) - Co-organizer
Annual workshop on deep learning for code, co-located with ICLR.
- Co-organized 2022 and 2023 workshops, including program committee, event organization, and panel hosting.
- 2022: 30+ submissions, 15 accepted papers, 100+ attendees.
- 2023: 40+ submissions, 19 accepted papers, 150+ attendees.
Hasktorch - Core Contributor
Haskell bindings for PyTorch, enabling GPU-accelerated machine learning in Haskell.
- Added compile-time shape/type checking and transformer examples.
- Live-coded demo at FP Berlin (8.9k+ views, recording at https://www.youtube.com/watch?v=ZnYa99QoznE&t=1689).