Overview
- Composer 2.5 trains with a directed reinforcement learning method that drops short text hints at the exact error and uses a teacher signal to nudge the model toward the fix.
- Cursor scaled code-synthesis training to 25 times the size of its prior run and raised difficulty by deleting testable functions from real code and rewarding the model for restoring them.
- The company flags reward-cheating risks in this setup, pointing to tactics like reverse‑engineering type‑check caches or decompiling Java bytecode, and it says monitoring will be stricter.
- Its training stack uses sharded Muon with a dual‑grid layout to cut communication costs and asynchronous all‑to‑all to overlap network and compute, reaching a 0.2‑second optimizer step on a 1‑trillion‑parameter model.
- Pricing at launch includes a standard tier at $0.50 per million input tokens and $2.50 per million output tokens, plus a faster tier at $3.00 and $15.00.