Commit 6af6fc2

bzz authored and mallamanis committed
Add JetBrains completion ranking
1 parent 2c433a9 commit 6af6fc2

1 file changed: 18 additions, 0 deletions

@@ -0,0 +1,18 @@
+---
+layout: publication
+title: "All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs"
+authors: Vitaliy Bibaev, Alexey Kalina, Vadim Lomshakov, Yaroslav Golubev, Alexander Bezzubov, Nikita Povarov, Timofey Bryksin
+conference: ESEC/FSE
+year: 2022
+additional_links:
+- {name: "ArXiV", url: "https://arxiv.org/abs/2205.10692"}
+tags: ["autocomplete"]
+---
+We propose an approach for collecting completion usage logs from the users in an IDE and using them to train a machine learning based model for ranking completion candidates.
+We developed a set of features that describe completion candidates and their context, and deployed their anonymized collection in the Early Access Program of IntelliJ-based IDEs.
+We used the logs to collect a dataset of code completions from users, and employed it to train a ranking CatBoost model.
+Then, we evaluated it in two settings: on a held-out set of the collected completions and in a separate A/B test on two different groups of users in the IDE.
+Our evaluation shows that using a simple ranking model trained on the past user behavior logs significantly improved code completion experience.
+Compared to the default heuristics-based ranking, our model demonstrated a decrease in the number of typing actions necessary to perform the completion in the IDE from 2.073 to 1.832.
+The approach adheres to privacy requirements and legal constraints, since it does not require collecting personal information, performing all the necessary anonymization on the client's side.
+Importantly, it can be improved continuously: implementing new features, collecting new data, and evaluating new models - this way, we have been using it in production since the end of 2020.
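
For readers of this entry, the two typing-action figures quoted in the abstract can be turned into a relative improvement. The sketch below is only arithmetic on those two reported numbers. It assumes they are per-completion averages, and the helper name `relative_reduction` is hypothetical, not something taken from the paper or the IDE.

```python
def relative_reduction(baseline: float, treatment: float) -> float:
    """Return the fractional decrease of `treatment` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treatment) / baseline


# Typing actions per completion as quoted in the abstract
# (assumed here to be averages over the A/B test groups).
HEURISTIC_ACTIONS = 2.073  # default heuristics-based ranking
MODEL_ACTIONS = 1.832      # learned ranking model

reduction = relative_reduction(HEURISTIC_ACTIONS, MODEL_ACTIONS)
print(f"{reduction:.1%}")  # prints "11.6%"
```

Under that assumption, the learned ranking saves roughly one typing action in every four completions (an absolute difference of 0.241 actions).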
