How Blockchain Enhances Copyright Protection in AI Model Workflows

17 Sept 2024

Abstract and I. Introduction

II. Preliminaries

III. Proposed Design: IBis

IV. Detailed Construction

V. Implementation on DAML

VI. Evaluation

VII. Conclusion and References

II. PRELIMINARIES

A. Background

AI model training. In general, the training process for AI models is continuous and iterative, comprising training, retraining, and fine-tuning [29]. As seen in Fig. 1, the process begins with data collection, where initial training datasets are gathered through data scraping. These datasets are then fed into the model training step, where a preliminary model is trained. To ensure the model remains effective and up-to-date, it undergoes periodic retraining with newly collected data, allowing it to adapt to new information. Additionally, a model may undergo a fine-tuning phase, where it is slightly retrained to meet specific domain requirements, enhancing its accuracy and relevance for targeted applications.

Fig. 1: Typical AI model training process.

Copyrights. Copyright grants creators exclusive rights to their original expressions such as literary, artistic, and musical works. This legal framework safeguards creators’ rights, allowing them to control how their work is used, reproduced, and distributed.

Copyright protection is automatic upon creation, but it is implicit, requiring additional steps for proper protection. First, registering the work with the copyright office offers authoritative legal evidence of ownership and eligibility for statutory damages in case of infringement. Second, adding a copyright notice (©) with the creator’s name and the year of creation informs others of the copyright claim [30], akin to a signature on a picture. Additionally, NFTs [31] offer a novel method for embedding ownership of digital art via blockchain technology, with ownership automatically claimed upon minting.

Licensing is the primary method for granting or transferring the rights to a work. Creators can control the scope of rights by specifying terms and conditions within the license agreement. Licenses may vary widely, from granting broad permissions to restricting usage to specific purposes or timeframes.
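To make the idea of scoped license terms concrete, the following is a minimal sketch of how a license restricted to specific purposes and a timeframe might be represented and checked. The `License` class, its field names, and the example values are hypothetical and chosen for illustration only; they are not taken from the paper's design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class License:
    """Hypothetical license record: who may use a work, how, and when."""
    licensee: str
    permitted_uses: frozenset
    valid_from: date
    valid_until: date

    def permits(self, licensee: str, use: str, on: date) -> bool:
        # A use is allowed only if the licensee, purpose, and date all match.
        return (
            licensee == self.licensee
            and use in self.permitted_uses
            and self.valid_from <= on <= self.valid_until
        )


# Illustrative license: training and fine-tuning only, valid during 2024.
lic = License(
    licensee="model-owner-A",
    permitted_uses=frozenset({"training", "fine-tuning"}),
    valid_from=date(2024, 1, 1),
    valid_until=date(2024, 12, 31),
)

print(lic.permits("model-owner-A", "training", date(2024, 6, 1)))
print(lic.permits("model-owner-A", "redistribution", date(2024, 6, 1)))
```

The first check falls within the licensed purpose and timeframe, while the second asks for a purpose the license does not grant.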

Protecting copyright/data in AI. Data and copyright protection in AI services is a long-standing topic. Existing methods can be classified into several categories [32]. Data-modifying approaches involve modifying or sanitizing user data to unlink them from specific individuals (e.g., k-anonymity [33], differential privacy [34], and watermarking [35]). This minimizes the risk of re-identification by removing or concealing Personally Identifiable Information (PII). Data-encrypting approaches encrypt user data to ensure integrity and confidentiality during data sharing, leveraging techniques such as homomorphic encryption [36] and secure Multi-Party Computation (MPC) [37], [38]. Data-minimizing approaches aim to boost efficiency by reducing the volume of personal data needed [39], often observed in general model training where PII data are not required during training and only minimally during inference. Data-confining approaches involve AI methods that operate without sharing PII data beyond user boundaries [40], ensuring data integrity and confidentiality while enabling effective personalization through local access to personal data.
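As an illustration of the data-modifying category, the sketch below checks whether a small table satisfies k-anonymity, meaning every combination of quasi-identifier values is shared by at least k records. The function name `is_k_anonymous`, the column names, and the sample records are assumptions made for this example.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Illustrative records with generalized age and postcode values.
records = [
    {"age_band": "30-39", "postcode": "20**", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode": "20**", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "21**", "diagnosis": "flu"},
    {"age_band": "40-49", "postcode": "21**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age_band", "postcode"], k=2))
print(is_k_anonymous(records, ["age_band", "postcode"], k=3))
```

Each quasi-identifier group here contains two records, so the table satisfies the property for k = 2 but not for k = 3.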

Blockchain-empowered copyright management. Liang et al. [41] employed smart contracts to establish a homomorphic encryption mechanism aimed at safeguarding circuit copyrights. Liu et al. [42] employed a blockchain-based fraud-proof protocol to secure ownership rights over AIGC (artificial intelligence-generated content). Numerous similar solutions are outlined in studies such as [43]–[45]. It is worth noting that most existing blockchain-based studies treat each copyright merely as a form of non-fungible online property, akin to an NFT. However, this approach restricts practical utility in real-world scenarios that require varied operations such as registration, renewal, and termination, which our framework supports.
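The difference between a static, NFT-like token and a copyright record with lifecycle operations can be sketched as a small state machine. This is only an illustrative, off-chain sketch under assumed names (`CopyrightRecord`, `Status`, `renew`, `terminate`); it is not the construction proposed in this paper, which is presented in later sections.

```python
from enum import Enum


class Status(Enum):
    REGISTERED = "registered"
    TERMINATED = "terminated"


class CopyrightRecord:
    """Hypothetical copyright record supporting registration, renewal, termination."""

    def __init__(self, work_id, owner, expires_year):
        # Creating the record models the registration step.
        self.work_id = work_id
        self.owner = owner
        self.expires_year = expires_year
        self.status = Status.REGISTERED

    def renew(self, extra_years):
        # Renewal is only meaningful while the record is still active.
        if self.status is not Status.REGISTERED:
            raise ValueError("cannot renew a terminated record")
        if extra_years <= 0:
            raise ValueError("extra_years must be positive")
        self.expires_year += extra_years

    def terminate(self):
        # Termination is final in this sketch.
        self.status = Status.TERMINATED


record = CopyrightRecord("work-1", "alice", expires_year=2025)
record.renew(5)
print(record.expires_year)
record.terminate()
print(record.status.value)
```

A plain ownership token would expose only the initial state, whereas the record above can change over time through explicit operations.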

Leveraging blockchain in AI. Recent studies have sought to empower AI and foundational models with blockchain technology, aiming to build more robust and trustworthy AI in distributed environments. IronForge [46] proposes a decentralized federated learning framework that integrates a distributed ledger and a Directed Acyclic Graph (DAG)-based data structure to asynchronously distribute training resources. Petals [47] is a distributed deep learning system that can effectively operate and refine complex models. It utilizes volunteer computing, outperforming traditional RAM offloading, particularly in autoregressive inference tasks. BlockFUL [48] introduces a decentralized federated unlearning framework that utilizes a redesigned blockchain structure leveraging Chameleon Hash. It decreases the computational and consensus costs associated with unlearning tasks. GradientCoin [49] introduces a theoretical concept for a decentralized LLM that functions akin to a Bitcoin-like system.

Authors:

(1) Yilin Sai, CSIRO Data61 and The University of New South Wales, Sydney, Australia;

(2) Qin Wang, CSIRO Data61 and The University of New South Wales, Sydney, Australia;

(3) Guangsheng Yu, CSIRO Data61;

(4) H.M.N. Dilum Bandara, CSIRO Data61 and The University of New South Wales, Sydney, Australia;

(5) Shiping Chen, CSIRO Data61 and The University of New South Wales, Sydney, Australia.


This paper is available on arXiv under CC BY 4.0 DEED license.