A Ship of Theseus

Abstract

  • Exploration of authorship in paraphrasing, likened to the Ship of Theseus paradox.

  • Investigates whether text retains original authorship post-paraphrasing.

  • Large Language Models (LLMs) can generate and modify text, raising authorship attribution questions.

  • Study reveals a performance decline in text classifiers after each paraphrasing iteration, correlating with stylistic deviation from the original text.

Introduction

Ship of Theseus Paradox

  • Philosophical thought experiment questioning identity and change over time.

  • Focuses on whether a modified ship (or text) retains its identity after all original components are replaced.

Text Paraphrasing

  • Rewriting text to convey the same meaning in different wording; long debated over ethics and originality.

  • LLMs can independently generate original content and paraphrase, altering traditional views on authorship.

Background

Authorship Attribution Challenges

  • Two contrasting views: paraphrasing preserves the original authorship versus paraphrasing produces a new authorship.

  • Investigation through two scenarios:

    • Scenario 1: Paraphrasing for obfuscation (maintains authorship).

    • Scenario 2: Paraphrasing for generation (changes authorship).

Methodology Overview

  • Data sourced from human authors and various LLMs.

  • Utilized multiple paraphrasers (e.g., ChatGPT, PaLM2, Dipper, Pegasus).

  • Assessment of stylistic and content variances post-paraphrasing.
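The assessment of post-paraphrasing variance separates content shift from style deviation. The paper's actual measures are not specified in this summary, so the sketch below uses crude lexical proxies as an illustrative assumption: content-word overlap for content preservation and function-word overlap as a rough stylistic fingerprint.

```python
# Illustrative sketch only: lexical proxies for content shift vs. style
# deviation between an original text and its paraphrase. Real studies
# typically use semantic embeddings and stylometric models instead.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "but",
                  "is", "was", "to", "that", "it", "with"}

def _words(text: str) -> list[str]:
    return text.lower().split()

def content_overlap(a: str, b: str) -> float:
    """Jaccard overlap of content (non-function) words: higher means content preserved."""
    ca = {w for w in _words(a) if w not in FUNCTION_WORDS}
    cb = {w for w in _words(b) if w not in FUNCTION_WORDS}
    return len(ca & cb) / len(ca | cb) if ca | cb else 1.0

def style_overlap(a: str, b: str) -> float:
    """Jaccard overlap of function words, a crude stylistic fingerprint."""
    sa = {w for w in _words(a) if w in FUNCTION_WORDS}
    sb = {w for w in _words(b) if w in FUNCTION_WORDS}
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0
```

Comparing the two scores for a text and its paraphrase indicates whether the rewrite moved further in content or in style.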

Methodological Details

Dataset Development

  • Multiple datasets were selected for evaluation, ensuring balanced author representation.

  • Each text underwent three sequential paraphrasing iterations.
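The sequential setup above can be sketched as a simple loop that feeds each paraphrase back into the paraphraser. The `paraphrase` function here is a toy stand-in (word-level synonym substitution) so the sketch is runnable; in the study it would be a real paraphraser such as ChatGPT, PaLM2, Dipper, or Pegasus.

```python
# Sketch of three sequential paraphrasing iterations, keeping every
# intermediate version. `paraphrase` is a toy stand-in, not the paper's
# actual paraphraser.
SYNONYMS = {"quick": "fast", "fast": "rapid", "rapid": "swift"}

def paraphrase(text: str) -> str:
    """Toy stand-in: replace each word with a known synonym, if any."""
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

def iterate_paraphrases(text: str, n_iterations: int = 3) -> list[str]:
    """Apply the paraphraser sequentially; return the original plus all versions."""
    versions = [text]
    for _ in range(n_iterations):
        versions.append(paraphrase(versions[-1]))
    return versions

versions = iterate_paraphrases("the quick brown fox", 3)
```

Each element of `versions` can then be fed to the classifiers, so that performance can be compared iteration by iteration.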

Classification Tasks

  • Authorship attribution (multi-class) and AI text detection (binary).

  • Experiments evaluated how the choice of ground-truth label (original author versus paraphraser) affects classifier performance and results.
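For the multi-class authorship attribution task, a minimal stylometric baseline can be sketched with character n-gram profiles compared by cosine similarity. This is a standard baseline, not necessarily the classifiers used in the study.

```python
# Minimal sketch of multi-class authorship attribution via character
# n-gram profiles and cosine similarity (a common stylometric baseline;
# assumed here for illustration, not the paper's actual model).
from collections import Counter
import math

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram frequency profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(text: str, profiles: dict[str, Counter]) -> str:
    """Predict the author whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))
```

The binary AI-text-detection task has the same shape with only two classes (human vs. machine); changing the ground-truth label for a paraphrased text changes which prediction counts as correct.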

Experimental Results

Performance Assessment

  • Classifiers show a notable performance drop after the first paraphrase, with smaller additional impact in subsequent iterations.

  • Style deviation found to be more significant than content shift, influencing authorship classification.

AI Text Detection

  • Effectiveness of AI text detectors varies by scenario (paraphrasing treated as obfuscation versus as generation).

Discussion

Findings

  • Results indicate that paraphrasing affects style more than content, which drives misclassification in authorship tasks.

  • Philosophical implications suggest authorship could change based on the perspective of the paraphrasing process.

Implications for Authorship

  • Analysis leans towards the notion that paraphrasing significantly alters original authorship, emphasizing context.

Conclusions

  • The study provides an extensive look into authorship dynamics in the context of LLM influence.

  • Advocates for context-dependency in authorship attribution for paraphrased texts and lays ground for future research on plagiarism and copyright disputes involving LLMs.

Limitations

  • Current study confined to English texts; further investigation required in other languages.

  • Variability in LLM writing style under different prompts or instructions was not evaluated.

Ethical Considerations

  • Research conducted following ethical guidelines, balancing exploration of authorship issues with societal impact.

Acknowledgements

  • Supported by various NSF awards and projects associated with the European Union.