How LLMs Are Reshaping Code Style: A Large-Scale Analysis of GitHub Trends
Large Language Models (LLMs) like GitHub Copilot and ChatGPT are transforming how we write code—not just by generating it, but by subtly reshaping programming style itself. A new study from researchers at Huazhong University of Science and Technology and École normale supérieure reveals measurable shifts in coding conventions that align with LLM-generated patterns, offering the first large-scale empirical evidence of AI’s stylistic influence on real-world code.
The LLM Fingerprint in Your Code
The study analyzed over 19,000 GitHub repositories linked to arXiv papers from 2020 to 2025, comparing them to LLM-generated code samples. Key findings include:
- Naming Conventions: Python’s `snake_case` usage rose from 47% in Q1 2023 to 51% in Q1 2025, mirroring LLMs’ preference for descriptive, underscored names (e.g., `current_length` over `ct`).
- Verbosity: Variable names grew longer, with LLMs favoring explicit labels like `total_magical_subarrays` over terse human choices like `summ`.
- Language Divide: Python and CS repositories showed stronger LLM-influenced trends than C++ or non-CS projects, suggesting domain-specific adoption patterns.
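The contrast the study describes can be illustrated with two functionally identical snippets, one in the terse style it attributes to humans and one in the descriptive `snake_case` style it associates with LLM output (all identifiers here are invented for illustration, not taken from the dataset):

```python
# Terse, human-typical style the study contrasts against
def cnt(a):
    ct = 0
    for x in a:
        if x % 2 == 0:
            ct += 1
    return ct

# Descriptive snake_case style the study associates with LLM output
def count_even_numbers(numbers):
    even_count = 0
    for number in numbers:
        if number % 2 == 0:
            even_count += 1
    return even_count

sample = [1, 2, 3, 4, 5, 6]
assert cnt(sample) == count_even_numbers(sample) == 3
```

Both functions compute the same thing; the stylistic shift the paper measures is purely in naming and verbosity, not behavior.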
Complexity and Maintainability: A Mixed Picture
While LLMs produced more concise code in algorithmic tasks (lower cyclomatic complexity), their impact on real-world repository maintainability was less clear. The study found:
- Python: LLM-rewritten code was often simpler than human-written equivalents.
- C++: Direct generation outperformed human-style revisions, hinting at language-specific quirks in AI assistance.
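Cyclomatic complexity, the metric cited above, counts the independent paths through a piece of code. The paper does not specify its tooling, but the idea can be sketched with Python's standard `ast` module using a common simplification: one plus the number of decision points (production tools such as radon handle more node types):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: 1 + number of decision points.
    A deliberate simplification for illustration, not a full implementation."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        # Each branch-introducing construct adds one path through the code.
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While,
                             ast.ExceptHandler, ast.BoolOp)):
            decisions += 1
    return 1 + decisions

snippet = (
    "def positive_sum(values):\n"
    "    total = 0\n"
    "    for value in values:\n"        # +1 (loop)
    "        if value > 0:\n"           # +1 (branch)
    "            total += value\n"
    "    return total\n"
)
print(cyclomatic_complexity(snippet))  # 1 + 2 decision points = 3
```

Lower scores indicate fewer branching paths, which is the sense in which the LLM-generated algorithmic solutions were "simpler".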
The Imitation Game
When given human code to revise, LLMs closely mimicked the original style (cosine similarity up to 0.85). But without guidance, their outputs diverged sharply—especially in competitive programming tasks. This duality complicates AI-generated code detection and raises questions about originality in LLM-assisted workflows.
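Cosine similarity scores like the 0.85 figure above are typically computed by representing each code sample as a vector of style features and taking the cosine of the angle between the vectors. The paper's exact featurization isn't reproduced here; this is a minimal sketch of the metric itself, with made-up feature vectors (e.g., identifier length, `snake_case` ratio):

```python
import math

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| * |v|); 1.0 means identical direction
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical style-feature vectors for an original file and an LLM revision
original = [0.8, 0.6, 0.4]
revision = [0.75, 0.65, 0.35]
print(cosine_similarity(original, revision))
```

A score near 1.0 means the revision tracks the original's style closely; unguided generation drifting toward the LLM's own conventions shows up as a lower score.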
Why This Matters
As lead researcher Dongping Chen notes: "LLMs aren’t just tools—they’re actively reshaping coding norms." The study highlights:
- Productivity vs. Homogenization: While standardized styles may aid readability, over-reliance on LLMs could dampen creative problem-solving.
- Detection Challenges: The line between human and AI-authored code blurs as developers internalize LLM preferences.
- Educational Impact: New programmers learning via AI assistants may inherit synthetic conventions untethered from historical practices.
The team has open-sourced their dataset and tools for further research. As AI’s role in coding grows, understanding its stylistic imprint will be key to harnessing its potential without losing the human touch.