A Ship of Theseus

Abstract

  • Exploration of authorship in paraphrasing, likened to the Ship of Theseus paradox.

  • Investigates whether text retains original authorship post-paraphrasing.

  • Large Language Models (LLMs) can generate and modify text, raising authorship attribution questions.

  • Study reveals a performance decline in text classifiers after each paraphrasing iteration, correlating with stylistic deviation from the original text.

Introduction

Ship of Theseus Paradox

  • Philosophical thought experiment questioning identity and change over time.

  • Focuses on whether a modified ship (or text) retains its identity after all original components are replaced.

Text Paraphrasing

  • Rewriting text to convey the same meaning in different wording; long debated over ethics and originality.

  • LLMs can independently generate original content and paraphrase, altering traditional views on authorship.

Background

Authorship Attribution Challenges

  • Two contrasting views: paraphrasing preserves the original authorship versus paraphrasing produces a new authorship.

  • Investigation through two scenarios:

    • Scenario 1: Paraphrasing for obfuscation (maintains authorship).

    • Scenario 2: Paraphrasing for generation (changes authorship).

Methodology Overview

  • Data sourced from human authors and various LLMs.

  • Utilized multiple paraphrasers (e.g., ChatGPT, PaLM2, Dipper, Pegasus).

  • Assessment of stylistic and content variances post-paraphrasing.
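The assessment of post-paraphrasing variance separates content shift from style deviation. The paper's actual measures are not specified in this summary, so the sketch below uses crude lexical proxies as an illustrative assumption: content-word overlap for content preservation and function-word overlap as a rough stylistic fingerprint.

```python
# Illustrative sketch only: lexical proxies for content shift vs. style
# deviation between an original text and its paraphrase. Real studies
# typically use semantic embeddings and stylometric models instead.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "but",
                  "is", "was", "to", "that", "it", "with"}

def _words(text: str) -> list[str]:
    return text.lower().split()

def content_overlap(a: str, b: str) -> float:
    """Jaccard overlap of content (non-function) words: higher means content preserved."""
    ca = {w for w in _words(a) if w not in FUNCTION_WORDS}
    cb = {w for w in _words(b) if w not in FUNCTION_WORDS}
    return len(ca & cb) / len(ca | cb) if ca | cb else 1.0

def style_overlap(a: str, b: str) -> float:
    """Jaccard overlap of function words, a crude stylistic fingerprint."""
    sa = {w for w in _words(a) if w in FUNCTION_WORDS}
    sb = {w for w in _words(b) if w in FUNCTION_WORDS}
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0
```

Comparing the two scores for a text and its paraphrase indicates whether the rewrite moved further in content or in style.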

Methodological Details

Dataset Development

  • Multiple datasets were selected for evaluation, ensuring balanced author representation.

  • Each text underwent three sequential paraphrasing iterations.
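The sequential setup above can be sketched as a simple loop that feeds each paraphrase back into the paraphraser. The `paraphrase` function here is a toy stand-in (word-level synonym substitution) so the sketch is runnable; in the study it would be a real paraphraser such as ChatGPT, PaLM2, Dipper, or Pegasus.

```python
# Sketch of three sequential paraphrasing iterations, keeping every
# intermediate version. `paraphrase` is a toy stand-in, not the paper's
# actual paraphraser.
SYNONYMS = {"quick": "fast", "fast": "rapid", "rapid": "swift"}

def paraphrase(text: str) -> str:
    """Toy stand-in: replace each word with a known synonym, if any."""
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

def iterate_paraphrases(text: str, n_iterations: int = 3) -> list[str]:
    """Apply the paraphraser sequentially; return the original plus all versions."""
    versions = [text]
    for _ in range(n_iterations):
        versions.append(paraphrase(versions[-1]))
    return versions

versions = iterate_paraphrases("the quick brown fox", 3)
```

Each element of `versions` can then be fed to the classifiers, so that performance can be compared iteration by iteration.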

Classification Tasks

  • Authorship attribution (multi-class) and AI text detection (binary).

  • Experiments evaluated how the choice of ground-truth label (original author versus paraphraser) affects classifier performance and results.
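For the multi-class authorship attribution task, a minimal stylometric baseline can be sketched with character n-gram profiles compared by cosine similarity. This is a standard baseline, not necessarily the classifiers used in the study.

```python
# Minimal sketch of multi-class authorship attribution via character
# n-gram profiles and cosine similarity (a common stylometric baseline;
# assumed here for illustration, not the paper's actual model).
from collections import Counter
import math

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram frequency profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(text: str, profiles: dict[str, Counter]) -> str:
    """Predict the author whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))
```

The binary AI-text-detection task has the same shape with only two classes (human vs. machine); changing the ground-truth label for a paraphrased text changes which prediction counts as correct.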

Experimental Results

Performance Assessment

  • Classifiers show a notable performance drop after the first paraphrase, with smaller additional impact in subsequent iterations.

  • Style deviation found to be more significant than content shift, influencing authorship classification.

AI Text Detection

  • Effectiveness of AI text detectors varies by scenario (paraphrasing treated as obfuscation versus as generation).

Discussion

Findings

  • Results indicate that paraphrasing affects style more than content, which drives misclassification in authorship tasks.

  • Philosophical implications suggest authorship could change based on the perspective of the paraphrasing process.

Implications for Authorship

  • Analysis leans towards the notion that paraphrasing significantly alters original authorship, emphasizing context.

Conclusions

  • The study provides an extensive look into authorship dynamics in the context of LLM influence.

  • Advocates for context-dependency in authorship attribution for paraphrased texts and lays ground for future research on plagiarism and copyright disputes involving LLMs.

Limitations

  • Current study confined to English texts; further investigation required in other languages.

  • Variability in LLM writing style under different prompts or instructions was not evaluated.

Ethical Considerations

  • Research conducted following ethical guidelines, balancing exploration of authorship issues with societal impact.

Acknowledgements

  • Supported by various NSF awards and projects associated with the European Union.