Implementing Text Paraphrasing and Rewriting
Paraphrasing and rewriting generate alternative formulations of a text while preserving its meaning. Applications: A/B testing marketing copy, improving readability, bypassing AI-content detectors, and data augmentation for training NLP models.
Paraphrase Approaches
LLM APIs (GPT-4o, Claude): best quality, flexible style control via the prompt. Example: "Rewrite the text in a more formal style, preserving all key facts". Latency: 1–5 seconds, cost $0.001–0.01 per request.
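A minimal sketch of the LLM approach. The `build_messages` helper and the style/ratio wording are illustrative, and the API call assumes the OpenAI Python SDK with a `gpt-4o` model name; adapt to your provider. The SDK is imported lazily so the sketch loads without it.

```python
# Sketch: paraphrasing via an LLM API. build_messages() is a hypothetical
# helper that packs the style instruction into a chat-style request.

def build_messages(text: str, style: str = "more formal") -> list[dict]:
    """Build a chat request asking for a paraphrase in the given style."""
    system = (
        "You rewrite text, preserving all key facts. "
        f"Target style: {style}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Rewrite this text:\n\n{text}"},
    ]

def paraphrase_llm(text: str, style: str = "more formal") -> str:
    # Requires `pip install openai` and an API key in the environment.
    from openai import OpenAI  # lazy import: the sketch loads without the SDK
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",          # assumed model name
        messages=build_messages(text, style),
        temperature=0.7,         # higher temperature -> more varied rewordings
    )
    return resp.choices[0].message.content
```

Keeping prompt construction in a separate function makes the style instruction easy to vary (and to test) independently of the network call.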
Specialized models: tuner007/pegasus-paraphrase (English), cointegrated/rut5-small-parus (Russian T5). They run fast (200–500 ms on CPU) but offer poor style control.
Back-translation: translate the text into an intermediate language and back, e.g. ru → en → ru via MarianMT or Google Translate. Cheap and fast, but quality is unpredictable: the model may change the sentence structure.
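A sketch of the back-translation pipeline. The two translation functions are injected as callables so the logic is testable offline; in practice they would wrap MarianMT checkpoints (e.g. the Helsinki-NLP opus-mt ru→en and en→ru models) or a translation API.

```python
# Sketch: back-translation with injected translators (assumption: in a real
# pipeline to_pivot / from_pivot wrap MarianMT or a translation API).
from typing import Callable

def back_translate(text: str,
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str]) -> str:
    """ru -> en -> ru: the round trip through a pivot language reformulates
    the sentence as a side effect of translation."""
    pivot = to_pivot(text)       # e.g. Russian -> English
    return from_pivot(pivot)     # e.g. English -> Russian
```

Dependency injection here also makes it trivial to swap the pivot language or chain two pivots (ru → en → de → ru) for more aggressive rewording.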
Controlling Degree of Change
This is critical because tasks differ: data augmentation needs high diversity, while SEO rewriting needs minimal changes that preserve keywords.
In the prompt: "rewrite preserving 70% of the original words" or "rewrite completely differently".
For automatic assessment of the degree of change: BERTScore (semantics) + BLEU (lexical overlap). A good paraphrase has high BERTScore (>0.85) and low BLEU (<0.4): same meaning, different wording.
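A sketch of the lexical half of this filter: a simplified averaged n-gram precision stands in for full BLEU (a real pipeline would use sacrebleu and bert-score). The 0.85 / 0.4 thresholds are the rule of thumb from the text.

```python
# Sketch: BLEU-like lexical overlap for paraphrase filtering. This is a
# simplified n-gram precision without BLEU's brevity penalty or smoothing
# (assumption: a production pipeline would call sacrebleu / bert-score).
from collections import Counter

def ngram_precision(candidate: str, reference: str, max_n: int = 4) -> float:
    """Average n-gram precision of candidate against reference, n = 1..max_n."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        cand_grams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_grams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        total = sum(cand_grams.values())
        if total:  # skip orders longer than the candidate
            overlap = sum((cand_grams & ref_grams).values())
            scores.append(overlap / total)
    return sum(scores) / len(scores) if scores else 0.0

def is_good_paraphrase(bert_score: float, lexical_overlap: float) -> bool:
    # Thresholds from the rule of thumb: semantically close, lexically different.
    return bert_score > 0.85 and lexical_overlap < 0.4
```

A candidate that copies the source verbatim scores 1.0 on overlap and is rejected; one that shares the meaning but little wording passes both checks.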
Augmentation for Training
Creating additional training examples: 5–10 paraphrases per example expand the dataset without manual annotation. Verify label preservation: a paraphrase of "Don't recommend" in a negative review must remain negative.
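The label-preservation check above can be sketched as a filter. Both `paraphrase_fn` and `predict_label` are injected callables (assumptions: in practice an LLM or paraphrase model, and the current classifier); only paraphrases whose predicted label matches the original survive.

```python
# Sketch: label-preserving augmentation. paraphrase_fn and predict_label are
# injected stand-ins for a real paraphraser and classifier.
from typing import Callable

def augment(text: str, label: str,
            paraphrase_fn: Callable[[str], list[str]],
            predict_label: Callable[[str], str],
            n: int = 5) -> list[tuple[str, str]]:
    """Return up to n (paraphrase, label) pairs that keep the original label."""
    kept = []
    for cand in paraphrase_fn(text)[:n]:
        if predict_label(cand) == label:  # discard label-flipping paraphrases
            kept.append((cand, label))
    return kept
```

Filtering with the same classifier you are training is a heuristic: it can discard hard-but-valid examples, so some pipelines use a stronger external model as the judge instead.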