About

By Vladislav Kruglikov

I'm Vladislav Kruglikov, an LLM inference and infrastructure engineer at T-Bank. You can find me on LinkedIn and GitHub, or reach me via email. Below is an overview of my journey in reverse chronological order:

In the spring of 2026 I joined T-Bank's core LLM team to drive training-aware optimization and infrastructure.

In the summer of 2025 I accepted an offer from the Yandex GPT inference team to work on KV cache pruning, where I developed a simple yet effective KV cache sparsification method; KV cache sparsification is also the topic of my diploma thesis.

In the spring of 2025 I went deeper and initiated a speculative decoding training effort. There were no frameworks for training speculative decoding models at the time, so I wrote one with the team I was leading. We trained an EAGLE model, open-sourced it on HuggingFace, and wrote a paper that was accepted at EACL 2026. We also presented this work at TurboML Conf 2025, held at the Lomonosov cluster near MSU.

In the summer of 2024, as large language models gained traction, I joined the LLM inference team. Most of the work involved finding optimal serving hyperparameters, engineering deployments for specific models, verifying quality after optimizations, and running performance benchmarks; this work became my third-year coursework.

In the fall of 2023 I started digging into efficient inference. I noticed our team struggled with many small-to-medium models that couldn't saturate a single GPU, so I built a P-tuning training pipeline and a custom inference runtime on Triton that batched P-tuning adapters from different requests into a single batch. I also built the API and a Streamlit frontend that let users trigger training with a single button. I showed that on medium-sized datasets P-tuning quality is comparable to full SFT at a fraction of the cost. This system became my second-year coursework.

In the spring of 2023 I had the chance to deliver a lecture and seminar on large language models in the Advanced Deep Learning course, part of the Tinkoff Generation program for high-performing students.

In the fall of 2022, after successfully finishing my first year at Financial University, I decided I wanted more computer science and less economics and transferred into the first-year Applied Mathematics and Informatics program at HSE University. That same fall I attended the VTB MoreTech 4.0 hackathon, where my team built a personalized news recommendation system based on team roles: accountants got one feed, CEOs another, with news grouped, deduplicated, and key information extracted. We won 1st place.

In the winter of 2022 I landed an internship at Tinkoff (now T-Bank) as an NLP engineer. My first task was aspect-based sentiment triplet extraction, which evolved into a highlight extraction system that used NER to pull specific information from text given a condition, such as extracting positive notes about food from a restaurant review. I showed strong results and was promoted to junior developer, continuing with ad-hoc NLP tasks such as clustering and classification. I trained BERT-like models for various classification tasks and contributed to the team's ML pipeline. I then moved into a hybrid backend/ML role, building a clustering platform and Airflow DAGs for offline batch processing.

In the fall of 2021 I enrolled at Financial University for Applied Mathematics and Informatics.

In the summer of 2021, after 11th grade, I got into machine learning, creating teaching notebooks on Kaggle and competing in ML competitions, earning 352 upvotes and 8 medals.

In the summer of 2020, after 10th grade, I picked up Node.js, React, and Sass to build a social network as a school project.

In the summer of 2019, after 9th grade, I spent my time writing C++ internal cheats for CS:GO with Qt as a pet project.

In the summer of 2018, at the end of 8th grade, I started programming by learning HTML, CSS, and PHP to build a CS:GO market parser — a spatial arbitrage bot that scanned in-game asset markets, buying cheap and selling high across exchanges.