A Principled Stance Against Performative AI Consumption
Author: ParisNeo
Pagination Engine: LoLLMs (Lord of Large Language and Multimodal Systems)
Version: 2.0.0
Date: May 06, 2026
“The question is not whether machines think, but whether men do.”
— B.F. Skinner, adapted
I. WE DECLARE A CRISIS OF WASTE
We have witnessed the emergence of token maxing: a culture where engineers brag about burning billions of AI tokens and companies exhaust annual AI budgets in months. NVIDIA CEO Jensen Huang has said that a “$500,000 engineer” should spend $250,000 in tokens annually, or he will “go ape” and be “deeply alarmed” [1][2].
The data exposes the lie. Across 22,000 developers studied, as token usage rose:
| Metric | Impact |
|---|---|
| Bugs | Increased 54% |
| Code review time | Increased 5× |
| Quality | Significantly degraded |
More consumption is not more productivity. It is not better software. It is the opposite.
We reject the metric of waste as a proxy for worth.
“Burning tokens is not engineering. Engineering is compression, clarity, and constraint.”
— ParisNeo
II. WE NAME THE ENEMY
The Performative vs. The Genuine
| The Performative | The Genuine |
|---|---|
| Tokens burned as flex | Tokens spent with purpose |
| Vibe-coding production systems | Vibe-coding only prototypes & exploration |
| “Claudeonomics” leaderboards | Private efficiency logs |
| Cloud inference for everything | Local inference when feasible |
| Prompt sprawl without review | Curated prompt libraries |
| Accepting whatever the model outputs | Critical evaluation and refinement |
The Enemy Is Not AI
The enemy is mindless consumption dressed as competence.
The “Claudeonomics” leaderboard at Meta, which ranked 85,000 employees by monthly token consumption, was reportedly shut down after employees began inflating token usage on research tasks to climb the board [3][4]. The top 250 received titles and recognition for waste. The reported totals: 60 trillion tokens and more than $100M in waste [5].
Uber reportedly burned its entire 2026 AI coding budget in just 4 months [6].
This is not progress. This is **performative consumption** masquerading as innovation.
III. WE PROPOSE THE PRINCIPLES OF EFFICIENT COMPUTING
1. Local First, Cloud When Necessary
Run inference locally when hardware permits. The Mac Mini M4 (2026) runs substantial LLMs with unified memory architecture; consumer GPUs handle fine-tuned models for most tasks. Privacy, latency, and cost all improve [7].
ParisNeo’s Hardware Tiers:
| Tier | Hardware | Use Case | Approx. Cost |
|---|---|---|---|
| Entry | Mac Mini M4 (16GB) | 7B-13B models, coding assistance | ~$600 |
| Pro | Mac Studio M2 Ultra / M3 Max | 30B-70B models, local agents | ~$2,000-4,000 |
| Power | Custom GPU rig (RTX 4090/5090) | Fine-tuning, batch inference | ~$3,000-6,000 |
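As a rough sanity check on these tiers, weight memory can be estimated as parameter count times bits per weight. The sketch below is a back-of-the-envelope helper, not a benchmark. The function name and the 20% overhead factor for KV cache and runtime buffers are assumptions introduced here, and real usage varies with context length and runtime.

```python
def estimate_memory_gb(params_billions: float, bits_per_weight: int,
                       overhead: float = 1.2) -> float:
    """Approximate memory needed to load a model, in GB (1 GB = 1e9 bytes).

    `overhead` is an assumed multiplier for KV cache and runtime buffers;
    it is not a measured value.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


if __name__ == "__main__":
    for size in (7, 13, 70):
        for bits in (16, 4):
            gb = estimate_memory_gb(size, bits)
            print(f"{size}B model at {bits}-bit: about {gb:.1f} GB")
```

Under these assumptions a 7B model quantized to 4 bits needs roughly 4.2 GB and a 70B model at 4 bits roughly 42 GB, which is consistent with larger models sitting in the higher-memory tiers.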
“The cloud is a tool, not a crutch. If you can run it locally, you should.”
— ParisNeo
2. Right-Size the Model
A 7B parameter model with good prompting often outperforms a 70B model with lazy prompting. Start small. Escalate only when justified.
Evidence: Research from MIT’s Daron Acemoglu and from Goldman Sachs suggests that AI’s economic impact so far has been “basically zero” or “nontrivial but modest” despite massive compute investment [8][9]. More compute does not automatically buy more useful output, and the relationship between model size and results is non-linear.
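“Start small, escalate only when justified” can be sketched as a cascade: try the cheapest model first and move up only when an acceptance check fails. This is one possible shape, not a prescribed implementation. `answer_with_escalation`, the tier names, and the stub models are all hypothetical, and the acceptance check is whatever test the task allows (unit tests, a schema check, a linter).

```python
from typing import Callable, Optional, Sequence, Tuple

# A "model" here is any callable that maps a prompt to a text answer.
Model = Callable[[str], str]


def answer_with_escalation(
    prompt: str,
    tiers: Sequence[Tuple[str, Model]],
    is_acceptable: Callable[[str], bool],
) -> Tuple[Optional[str], str]:
    """Try models from smallest to largest; stop at the first acceptable answer.

    Returns (tier_name, answer). tier_name is None if no tier passed the
    check, in which case the last answer is returned for human inspection.
    """
    answer = ""
    for name, model in tiers:
        answer = model(prompt)
        if is_acceptable(answer):
            return name, answer
    return None, answer


if __name__ == "__main__":
    # Stub models standing in for real local or cloud backends.
    small = lambda p: ""  # pretends to fail
    large = lambda p: "def add(a, b): return a + b"
    tiers = [("small-7b", small), ("large-70b", large)]
    print(answer_with_escalation("Write add()", tiers, lambda a: "def " in a))
```

The larger model is only called, and only paid for, when the smaller one demonstrably falls short.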
3. Prompt Is Code
Treat prompts as first-class engineering artifacts: versioned, reviewed, tested, and optimized. A 50-token refined prompt can replace a 500-token brute-force prompt.
ParisNeo’s Prompt Engineering Standards:
    # ParisNeo's Prompt Artifact Schema
    class PromptArtifact:
        def __init__(self, version: str, baseline_test: str, token_budget: int,
                     reviewers: list, efficiency_score: float):
            self.version = version                    # Semantic versioning
            self.baseline_test = baseline_test        # Regression test suite
            self.token_budget = token_budget          # Maximum allowed tokens
            self.reviewers = reviewers                # Human reviewers required
            self.efficiency_score = efficiency_score  # Useful output / tokens spent

        def validate(self):
            # Fail loudly when a prompt breaks the efficiency rules.
            assert self.token_budget <= 1000, "Prompt exceeds efficiency budget"
            assert self.efficiency_score >= 0.8, "Prompt too wasteful"
            return True
Industry Reality (2026): Tools like Braintrust, Maxim AI, and LangChain now provide prompt versioning, A/B testing, and regression suites—treating prompts with the same rigor as software dependencies [10][11].
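To make “versioned, reviewed, tested” concrete without depending on any of those tools, here is a minimal regression harness in plain Python. It is a sketch under stated assumptions: `PromptVersion`, `approx_tokens`, and `run_regression` are hypothetical names, the whitespace token count is a crude stand-in for a real tokenizer, and the model is any callable from prompt text to output text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PromptVersion:
    version: str       # e.g. "1.2.0"
    template: str      # uses str.format placeholders
    token_budget: int  # maximum approximate tokens allowed


def approx_tokens(text: str) -> int:
    """Crude whitespace count; a real tokenizer will give different numbers."""
    return len(text.split())


def run_regression(
    prompt: PromptVersion,
    model: Callable[[str], str],
    cases: List[Tuple[dict, str]],
) -> List[str]:
    """Return a list of failure messages; an empty list means all cases pass.

    Each case is (template_variables, substring_expected_in_output).
    """
    failures = []
    for variables, expected_substring in cases:
        rendered = prompt.template.format(**variables)
        if approx_tokens(rendered) > prompt.token_budget:
            failures.append(f"{prompt.version}: over budget for {variables}")
        if expected_substring not in model(rendered):
            failures.append(f"{prompt.version}: missing {expected_substring!r}")
    return failures
```

Running the same cases against every new prompt version turns a prompt change into something that can fail a check in review, like any other code change.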
4. Measure What Matters
Track:
- Bugs per feature shipped
- Time to code review completion
- Actual user outcomes
- Energy consumption per useful output
- Tokens per shipped feature (not tokens per engineer)
Never track raw token consumption as a success metric.
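The difference between “tokens per shipped feature” and “tokens per engineer” is only a change of denominator, which a few lines can show. The record format below, `(project, tokens)` pairs plus a count of shipped features per project, is an assumption made for illustration; real usage logs will look different.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tokens_per_feature(
    usage: Iterable[Tuple[str, int]],
    shipped: Dict[str, int],
) -> Dict[str, float]:
    """usage: (project, tokens) records. shipped: project -> features shipped.

    Projects with no shipped features map to float('inf') so that spend
    without output stays visible instead of being silently dropped.
    """
    totals = defaultdict(int)
    for project, tokens in usage:
        totals[project] += tokens
    return {
        project: (total / shipped[project] if shipped.get(project) else float("inf"))
        for project, total in totals.items()
    }


if __name__ == "__main__":
    usage = [("billing", 1000), ("billing", 500), ("search", 2000)]
    print(tokens_per_feature(usage, {"billing": 3}))
```

A project that burns tokens and ships nothing shows up as infinite cost per feature, which is the signal a raw consumption leaderboard hides.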
“What gets measured gets managed. If you measure waste, you optimize for waste.”
— ParisNeo
5. Understand Before You Generate
The model should accelerate your thinking, not replace it. If you cannot explain what the generated code does, you have not engineered—you have delegated your cognition.
The ParisNeo Comprehension Test:
    def comprehension_check(generated_code, engineer_explanation):
        """
        Before shipping any AI-generated code:
        1. Can you explain the algorithm in plain English?
        2. Can you identify edge cases the model missed?
        3. Can you trace execution for 3 different inputs?
        4. Can you explain why specific libraries/approaches were chosen?
        """
        checks = [
            "plain_english_explanation",
            "edge_case_identification",
            "execution_trace",
            "design_rationale"
        ]
        return all(check in engineer_explanation for check in checks)
6. Prototype Freely, Production Carefully
“Vibe coding,” though criticized by Gary Marcus and others, is a valid exploration technique. It becomes dangerous when it crosses into production systems without review, testing, or architectural fit [12][13].
The Boundary:
| Phase | Approach | Gate |
|---|---|---|
| Exploration | Vibe-coding encouraged | None |
| Prototype | Vibe-coding with documentation | Peer review |
| Production | Full engineering rigor | Architecture review, tests, monitoring |
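The boundary table can be encoded as data so that a merge or deploy script can check it mechanically. This is a minimal sketch: the gate identifiers are illustrative spellings of the table entries, and `missing_gates` is a hypothetical helper, not part of any existing tool.

```python
# Gates taken from the table above; the identifiers are illustrative.
REQUIRED_GATES = {
    "exploration": set(),
    "prototype": {"peer_review"},
    "production": {"architecture_review", "tests", "monitoring"},
}


def missing_gates(phase: str, completed: set) -> set:
    """Return the gates still outstanding before work in `phase` can proceed."""
    if phase not in REQUIRED_GATES:
        raise ValueError(f"unknown phase: {phase}")
    return REQUIRED_GATES[phase] - set(completed)


if __name__ == "__main__":
    # A change headed for production with only tests done so far.
    print(sorted(missing_gates("production", {"tests"})))
```

An empty result means the change has cleared every gate for its phase; anything else lists what is still owed.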
IV. WE CALL FOR ACTION
For the Individual Engineer
- [ ] Audit your last month’s AI tool usage. What percentage produced shipped, reviewed code?
- [ ] Try running your next side-project entirely on local inference
- [ ] Build a personal “prompt toolkit” of reusable, tested prompts
- [ ] Refuse to participate in token leaderboards or consumption bragging
For Teams & Organizations
- [ ] Ban token-burning leaderboards; institute efficiency leaderboards instead
- [ ] Require human review for all AI-generated production code
- [ ] Set AI budget caps per project, not org-wide slush funds
- [ ] Measure and publish: tokens per feature, not tokens per engineer
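A per-project cap can be as simple as a counter that refuses to go past its limit. The class below is a sketch of that idea, assuming token counts are reported by whatever gateway or client wrapper the team already uses. `ProjectBudget` and its method names are hypothetical.

```python
class ProjectBudget:
    """Tracks token spend against a fixed cap for one project."""

    def __init__(self, project: str, cap_tokens: int):
        if cap_tokens <= 0:
            raise ValueError("cap_tokens must be positive")
        self.project = project
        self.cap_tokens = cap_tokens
        self.spent = 0

    def charge(self, tokens: int) -> int:
        """Record spend and return the remaining budget.

        Raises RuntimeError, without recording anything, if the request
        would exceed the cap.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.spent + tokens > self.cap_tokens:
            raise RuntimeError(
                f"{self.project}: cap of {self.cap_tokens} tokens exceeded"
            )
        self.spent += tokens
        return self.cap_tokens - self.spent
```

Because the cap belongs to a project rather than to the organization, hitting it forces a conversation about that project's value instead of quietly draining a shared pool.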
For the Industry
- [ ] Demand transparent efficiency benchmarks from AI vendors (useful output per watt, per dollar, per token)
- [ ] Support open-weight models and local inference tooling
- [ ] Fund research into efficient architectures:
  - Speculative decoding (2-3× speedup with no quality loss) [14]
  - Quantization (INT4/INT8 with minimal accuracy degradation)
  - Mixture-of-Experts (MoE) routing (process only relevant parameters)
  - NoMAD-Attention and sparse attention mechanisms
- [ ] Reject the narrative that “more compute = more intelligence”
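Of the techniques listed, quantization is the easiest to demonstrate in a few lines. The sketch below shows symmetric per-tensor INT8 quantization in pure Python, the simplest textbook variant. It is not how production runtimes implement INT4 or group-wise schemes, and the function names are chosen here for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid a zero scale when every weight is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.52, -1.27, 0.003, 0.9]
    q, scale = quantize_int8(w)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(w, restored))
    print(q, f"scale={scale:.4f}", f"max error={worst:.4f}")
```

Each weight is stored in one byte instead of two or four, and the round-trip error per weight is bounded by half the scale, which is the sense in which the accuracy loss is small.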
V. THE ENVIRONMENTAL IMPERATIVE
The energy cost of token maxing is not abstract.
| Statistic | Source | Year |
|---|---|---|
| Data centers consumed 4.4% of U.S. electricity | MIT/IEA analysis | 2023 |
| Projected to reach 6-8% by 2026 | UNEP forecasts | 2025 |
| AI data centers could account for 35% of national energy in some countries | UN Environment Programme | 2025 [15] |
Every wasted token is carbon emitted for vanity. Efficient computing is not just engineering discipline—it is environmental responsibility.
“The most sustainable code is the code you don’t run. The most efficient model is the one you don’t need to call.”
— ParisNeo
VI. WE RECLAIM THE DIGNITY OF CRAFT
Software engineering was never about consumption. It was about:
| Principle | Definition | Token Maxing Violation |
|---|---|---|
| Compression | Expressing complex ideas in minimal, correct code | Expansion, bloat, “prompt sprawl” |
| Clarity | Writing what can be understood, maintained, and trusted | Obscurity, “magic” AI outputs no one owns |
| Constraint | Doing more with less, because resources are finite and attention precious | Gluttony, “use the biggest model for everything” |
Token maxing is the antithesis of craft. It replaces compression with expansion, clarity with obscurity, constraint with gluttony.
We are not Luddites. We use AI tools daily. But we use them as tools, not as substitutes for judgment. We measure our success by the quality of what we ship, not the magnitude of what we burn.
VII. THE PARISNEO TOUCH: LoLLMs INTEGRATION
7.1 Pagination with LoLLMs
This manifesto is paginated and versioned using the LoLLMs artifact engine:
<processing type="artefact_building" title="anti_token_maxing_manifesto.md" art_type="document">
* Creating new artefact 'anti_token_maxing_manifesto.md'
* Artefact saved as version 2</processing>
7.2 The Four Commandments of Efficient Computing
- Thou shalt not waste context — Every token must earn its place.
- Thou shalt not hide from the user — Transparency is non-negotiable.
- Thou shalt not trust blindly — Validate, review, and understand all outputs.
- Thou shalt measure outcomes, not inputs — Features shipped, not tokens burned.
7.3 The Efficiency Score
    # ParisNeo's Efficiency Score Calculator
    class EfficiencyScore:
        """
        The only metric that matters.
        """

        def calculate(self, features_shipped, bugs_introduced,
                      review_time_hours, tokens_consumed, energy_kwh):
            # Each bug cancels half a shipped feature.
            useful_output = features_shipped - (bugs_introduced * 0.5)
            human_time = review_time_hours  # Lower is better
            # Put energy on a scale comparable to tokens (1 kWh counts as 1,000).
            resource_cost = tokens_consumed + (energy_kwh * 1000)
            if human_time <= 0 or resource_cost <= 0:
                raise ValueError("review time and resource cost must be positive")
            return useful_output / (human_time * resource_cost)

        def grade(self, score):
            if score > 0.1:
                return "A - Exemplary Efficiency"
            if score > 0.05:
                return "B - Good"
            if score > 0.01:
                return "C - Needs Improvement"
            return "F - Token Maxing Detected"
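A worked example helps show the scale of the numbers this formula produces. The function below repeats the arithmetic of `EfficiencyScore.calculate` as a standalone function so it can be run on its own. The sprint figures are invented for illustration and are not drawn from any source.

```python
def efficiency_score(features_shipped, bugs_introduced,
                     review_time_hours, tokens_consumed, energy_kwh):
    """Same arithmetic as EfficiencyScore.calculate, written as a function."""
    useful_output = features_shipped - bugs_introduced * 0.5
    resource_cost = tokens_consumed + energy_kwh * 1000
    return useful_output / (review_time_hours * resource_cost)


if __name__ == "__main__":
    # Hypothetical sprint: every figure here is made up for the example.
    score = efficiency_score(features_shipped=10, bugs_introduced=2,
                             review_time_hours=5, tokens_consumed=200_000,
                             energy_kwh=3)
    print(f"{score:.2e}")  # 9 / (5 * 203,000), about 8.87e-06
```

With raw token counts in the hundreds of thousands the score lands near 1e-5, far below the 0.01 boundary used in `grade`. The grade bands therefore only discriminate once the inputs are expressed in units calibrated to those thresholds, a choice the text above leaves open.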
VIII. JOIN THE RESISTANCE
This manifesto is a living document. Fork it. Adapt it. Build tooling around it. Share your efficiency wins. Name the waste when you see it.
Community Resources
| Resource | Purpose | Link |
|---|---|---|
| LocalLLaMA | Local inference community | reddit.com/r/LocalLLaMA |
| Efficient Computing Alliance | Industry standards (proposed) | #EfficientComputing |
| Prompt Engineering Tools | Versioning & testing | Braintrust, Maxim AI, LangChain |
APPENDIX: SOURCES & FURTHER READING
Cited Sources
| Ref | Source | Date |
|---|---|---|
| [1] | Jensen Huang statement on $250K token spending | March 2026 |
| [2] | Business Insider: “Jensen Huang Says $500K Engineers Should Use at Least $250K” | March 19, 2026 |
| [3] | Fortune: “Meta killed employee AI token dashboard ‘Claudeonomics’” | April 9, 2026 |
| [4] | MLQ.ai: “Meta Makes Internal Leaderboard for Employee AI Token Usage” | 2026 |
| [5] | Reddit/Futurology: “Inside Meta’s AI Token Leaderboard: 60 Trillion Tokens, $100M+ Waste” | April 25, 2026 |
| [6] | Reddit/artificial: “Uber burned its entire 2026 AI coding budget in 4 months” | May 2026 |
| [7] | Julien Simon: “What to Buy for Local LLMs (April 2026)” | April 3, 2026 |
| [8] | Goldman Sachs: “Gen AI: Too Much Spend, Too Little Benefit?” | June 2024 |
| [9] | MIT Sloan: “A new look at the economics of AI” (Daron Acemoglu) | January 2025 |
| [10] | Braintrust: “The 5 best prompt versioning tools in 2025” | October 2025 |
| [11] | Maxim AI: “Top 5 Prompt Engineering Tools in 2026” | December 2025 |
| [12] | Gary Marcus: “Is vibe coding dying?” | October 2025 |
| [13] | Anthropic study: “AI assisted coding doesn’t show efficiency gains” | January 2026 |
| [14] | NVIDIA Nemotron 3 Super: Speculative decoding research | April 2026 |
| [15] | UNEP: “AI has an environmental problem” | November 2025 |
Further Reading
- MIT News: “Explained: Generative AI’s environmental impact” (January 2025)
- IEEE: “Why AI uses so much energy — and what we can do about it” (April 2025)
- arXiv: “Software Engineering for Prompt-Enabled Systems” (January 2026)
- Vikas Chandra: “On-Device LLMs: State of the Union, 2026” (January 2026)
“The future belongs not to those who burn the most tokens, but to those who ship the best software with the least waste.”
— ParisNeo, Creator of LoLLMs
This manifesto is paginated and maintained by the LoLLMs engine.
Version 2.0.0 — May 06, 2026
#EfficientComputing #AntiTokenMaxing #LocalFirst #PromptIsCode #ParisNeo