A Ship of Theseus
Abstract
Exploration of authorship in paraphrasing, likened to the Ship of Theseus paradox.
Investigates whether text retains original authorship post-paraphrasing.
Large Language Models (LLMs) can generate and modify text, raising authorship attribution questions.
Study reveals a decline in text-classification performance with each paraphrasing iteration, and the decline correlates with the text's stylistic deviation from the original.
Introduction
Ship of Theseus Paradox
Philosophical thought experiment questioning identity and change over time.
Focuses on whether a modified ship (or text) retains its identity after all original components are replaced.
Text Paraphrasing
Rewriting a text to convey the same meaning in different wording, a practice debated on ethical and originality grounds.
LLMs can independently generate original content and paraphrase, altering traditional views on authorship.
Background
Authorship Attribution Challenges
Two contrasting views: paraphrasing maintaining original authorship versus changing authorship.
Investigation through two scenarios:
Scenario 1: Paraphrasing as obfuscation (the original author retains authorship).
Scenario 2: Paraphrasing as generation (authorship passes to the paraphraser).
Methodology Overview
Data sourced from human authors and various LLMs.
Utilized multiple paraphrasers (e.g., ChatGPT, PaLM2, Dipper, Pegasus).
Assessment of stylistic and content variances post-paraphrasing.
Methodological Details
Dataset Development
Multiple datasets were selected for evaluation, ensuring balanced author representation.
Each text underwent three sequential paraphrasing iterations.
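The three-round sequential paraphrasing described above can be sketched as a simple loop. This is a minimal illustration, not the paper's pipeline: `paraphrase` is a hypothetical placeholder for any of the paraphrasers mentioned (ChatGPT, PaLM2, Dipper, Pegasus).

```python
def paraphrase(text: str) -> str:
    """Placeholder paraphraser; a real one would call an LLM or seq2seq model."""
    return text  # identity stand-in for illustration only

def paraphrase_iterations(text: str, rounds: int = 3) -> list[str]:
    """Apply the paraphraser sequentially, keeping every intermediate version."""
    versions = [text]
    for _ in range(rounds):
        versions.append(paraphrase(versions[-1]))
    return versions

versions = paraphrase_iterations("The quick brown fox jumps over the lazy dog.")
# versions[0] is the original; versions[1..3] are the successive paraphrases.
```

Keeping every intermediate version is what allows performance to be compared per iteration, as in the experimental results.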
Classification Tasks
Authorship attribution (multi-class) and AI text detection (binary).
Experiments evaluate how the choice of ground-truth label (original author vs. paraphraser) affects classifier performance and results.
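How the ground-truth choice changes measured performance can be shown with a toy evaluation. This is an illustrative sketch with invented labels, not the paper's data: the same predictions are scored once against the original authors (Scenario 1) and once against the paraphraser (Scenario 2).

```python
def accuracy(preds: list[str], labels: list[str]) -> float:
    """Fraction of predictions matching the chosen ground-truth labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Toy classifier predictions for four paraphrased texts (hypothetical).
preds = ["alice", "bob", "llm", "bob"]

original_authors = ["alice", "bob", "alice", "bob"]  # Scenario 1 ground truth
paraphraser_labels = ["llm", "llm", "llm", "llm"]    # Scenario 2 ground truth

acc_scenario1 = accuracy(preds, original_authors)   # 0.75 under Scenario 1
acc_scenario2 = accuracy(preds, paraphraser_labels) # 0.25 under Scenario 2
```

The same classifier output thus looks strong or weak depending entirely on which authorship convention is taken as truth.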
Experimental Results
Performance Assessment
Notable performance drop in classifiers after the first paraphrase, with less impact in subsequent iterations.
Style deviation found to be more significant than content shift, influencing authorship classification.
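Separating style deviation from content shift can be illustrated with two simple proxies. These are common stylometric stand-ins, not the paper's actual metrics: content similarity as word-set overlap, style similarity as cosine distance over character-trigram frequencies.

```python
from collections import Counter
import math

def char_trigrams(text: str) -> Counter:
    """Character-trigram frequency profile, a rough stylistic fingerprint."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two frequency profiles."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def content_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets, a rough content-similarity proxy."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

orig = "The committee approved the budget after a long debate."
para = "After lengthy discussion, the panel signed off on the spending plan."
content_sim = content_overlap(orig, para)                      # in [0, 1]
style_sim = cosine(char_trigrams(orig), char_trigrams(para))   # in [0, 1]
```

Under the finding above, successive paraphrases would drive the style score down faster than the content score, which is what misleads authorship classifiers.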
AI Text Detection
Effectiveness of AI text detectors varies with the scenario adopted: the traditional view, where the original author remains the ground truth, vs. the alternative view, where the paraphraser does.
Discussion
Findings
Results indicate a larger impact on style due to paraphrasing, leading to misclassification in authorship tasks.
Philosophical implications suggest authorship could change based on the perspective of the paraphrasing process.
Implications for Authorship
Analysis leans toward the view that paraphrasing significantly alters original authorship, with the answer depending on the context in which the paraphrase is used.
Conclusions
The study provides an extensive look into authorship dynamics in the context of LLM influence.
Advocates context-dependent authorship attribution for paraphrased texts and lays the groundwork for future research on plagiarism and copyright disputes involving LLMs.
Limitations
Current study confined to English texts; further investigation required in other languages.
Variability in LLM writing style induced by different prompts or instructions was not evaluated.
Ethical Considerations
Research conducted following ethical guidelines, balancing exploration of authorship issues with societal impact.
Acknowledgements
Supported by various NSF awards and projects associated with the European Union.