Blind trust in black-box AI apps is a ticking time bomb that threatens to destroy our faith in the technology. Take ChatGPT: users are forced to rely on unverifiable claims about which models are generating their inferences. Recent performance degradation of GPT-4 has heightened suspicions, with people wondering whether OpenAI is even using the advertised model version. This erodes user confidence and highlights the urgent need for verifiability.
Decentralized AI networks amplify this problem: nodes host various models and compete for users' attention and money. We need a way to verify their claims and protect clients from fraud, for example paying top dollar for outputs from the latest LLAMA but receiving results from an inferior model. The stakes couldn't be higher in a decentralized AI ecosystem.
The solution is verifiable inference: mechanisms that verify a specific model was used to generate an inference. This enables honest competition and protects clients.
Zero-knowledge proofs have been explored extensively by the decentralized AI community as a solution, and we discussed them in depth in a previous article, but they are currently too slow and expensive. At Bagel, we have investigated more practical and widely used alternatives from the traditional AI world, such as watermarking (Jia et al., 2021) and fingerprinting (Chen et al., 2019). Initially developed to prevent model extraction attacks, these methods also serve as unique model identifiers, thus enabling verifiability. Compared to zero-knowledge approaches, they offer greater efficiency and ease of use, aligning better with current applications of verifiable deep learning models.
Today, we're open sourcing our research. Our goal is to empower the decentralized AI community and inspire builders to explore diverse, high-performance solutions from traditional AI. Together, we can create a more robust decentralized AI ecosystem that benefits the mainstream AI market.
If you're in a rush, we have a TLDR at the end.
Watermarking
How it works
In the landscape of machine learning, protecting foundation models as intellectual property (IP) is a paramount concern. Enter watermarks, a clever defensive mechanism against model extraction attacks. Just like classical digital watermarking, a form of steganography where a message is concealed within a digital object, watermarks for machine learning models serve as guardians of ownership, authenticity, and integrity. They help safeguard IP and enforce licenses, ensuring that the hard work of model creators doesn't go unrecognized.
Lederer et al. (2023) outline a plethora of characteristics that a watermark for a machine learning model must possess, but three stand out as the most crucial:
Effectiveness - The watermark can be verified at any time by the model's creator.
Fidelity - The model's accuracy remains unaffected by the watermark's presence.
Robustness - The watermark withstands a barrage of attacks, from fine-tuning and model compression to watermark detection, removal, overwriting, or invalidation.
But how does the magic happen? Any watermark embedding scheme comes with two algorithms: an extraction algorithm that retrieves the watermark from a model, and a verification algorithm that confirms its presence.
There are two main families of watermark embedding techniques: i) white-box watermarking and ii) black-box watermarking. The general process of watermarking is illustrated in the figure below.
White-box watermarking requires full access to the model. The idea is to embed a signature s into the model's weights during training by adding a regularization term to the loss function, chosen carefully to maintain the model's accuracy. Let w be a vector of all weights in the model. The goal is to embed s into w using an embedding matrix M, which acts as a secret key usually held by the model owner. The watermark is extracted by applying M to the weight vector w, followed by a threshold function (see Nagai et al. (2018) for details).
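To make this concrete, here is a minimal PyTorch-style sketch of such a regularizer in the spirit of Nagai et al. (2018). The layer choice, the watermark strength, and the 0.5 extraction threshold are illustrative assumptions rather than values taken from the paper.

```python
import torch
import torch.nn.functional as F

def watermark_regularizer(weights, embedding_matrix, signature):
    """Extra loss term that pushes sigmoid(M @ w) toward the signature bits s."""
    w = weights.flatten()                                # flatten the watermarked layer's weights
    projected = torch.sigmoid(embedding_matrix @ w)      # project with the secret embedding matrix M
    return F.binary_cross_entropy(projected, signature)

def extract_watermark(weights, embedding_matrix):
    """Extraction: apply M to w, then threshold at 0.5 to recover the embedded bits."""
    w = weights.flatten()
    return (torch.sigmoid(embedding_matrix @ w) > 0.5).float()

# Illustrative use during training, with s a random bit vector and M a random
# Gaussian matrix kept secret by the model owner:
#   loss = task_loss + 0.01 * watermark_regularizer(model.conv.weight, M, s)
```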
Black-box watermarking, on the other hand, only requires query access to the model. It creates backdoors in the data using data poisoning. In this context, a backdoor is a set of input-output pairs known to the model owner that triggers behavior not predictable by model consumers, for example by deliberately adding wrong labels to selected data points. The goal is to ensure that the model performs correctly on the main classification task, while the backdoor exhibits a specific behavior defined by the model owner.
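As a rough illustration, the trigger set for such a backdoor can be built from out-of-distribution inputs with owner-chosen labels and mixed into training. The shapes, counts, and random-input choice below are assumptions for the sketch, not a specific published recipe.

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

def build_trigger_set(num_triggers, input_shape, num_classes, seed=0):
    """Out-of-distribution inputs paired with owner-chosen (deliberately 'wrong') labels."""
    g = torch.Generator().manual_seed(seed)
    triggers = torch.rand((num_triggers, *input_shape), generator=g)
    labels = torch.randint(0, num_classes, (num_triggers,), generator=g)
    return TensorDataset(triggers, labels)

# The owner trains on clean data plus the trigger set, so the model learns the
# main task and the secret backdoor behavior:
#   train_set = ConcatDataset([clean_dataset, build_trigger_set(100, (3, 32, 32), 10)])
```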
Next we show two applications of watermarks in the context of verifying ownership of models and verifying inference from models.
Watermarking for Verifiable Inference
Public Models
Black-box watermarking is a game-changer when it comes to verifiable inference for public models. It's like a secret handshake between the model creator and the user, ensuring that the model is authentic and trustworthy.
Here's how it works: the model creator embeds a watermark and then discloses its presence to the world. It's like a badge of honor, a mark of quality. Any user who interacts with the model can then authenticate it, verifying that it's the real deal.
Zhong et al. (2020) took this concept to the next level. They developed a black-box watermarking technique that adds new labels to inputs that have nothing to do with the original dataset. It's like adding a secret code that only the model creator knows.
When a user wants to verify the watermark, they simply query these special inputs. If the watermark is present in the inference results, it's a clear sign that the model is authentic. It's like a digital signature that can't be forged.
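Here is a hedged sketch of that check, reusing the trigger-set idea from the earlier snippet: the hypothetical predict_fn stands in for any way of querying the model (local or remote), and the 0.9 acceptance threshold is an assumption.

```python
import torch

def verify_watermark(predict_fn, trigger_inputs, trigger_labels, threshold=0.9):
    """Black-box verification: query the model on the trigger set and measure how
    often it reproduces the owner-defined backdoor labels."""
    preds = predict_fn(trigger_inputs)                          # only query access is needed
    match_rate = (preds == trigger_labels).float().mean().item()
    return match_rate >= threshold                              # threshold is illustrative
```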
Chen et al. (2019) took a slightly different approach. They generated watermark keys and used fine-tuning to embed a signature in the model. It's like hiding a secret message in plain sight.
With the watermark keys in hand, any user can interact with the model and extract the signature through its predictions. It's like unlocking a hidden layer of authentication, ensuring that the model is genuine and trustworthy.
Private Models
When a machine learning model is privately hosted, the mechanism is almost the same: black-box watermarking is used, except the model owner hosts the model privately and consumers access it through an API or gateway. First, the model creator runs the watermarking algorithm on the model and loads it onto the hosting service. Then, via the API, users query the model with inputs and obtain inferences as output. The backdoor is still public, so users can later verify the correct use of the model.
With knowledge of the backdoor, users can run the verification algorithm to check for the watermark, which tells them whether the correct model was used.
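From the consumer's side, this might look roughly like the sketch below: the hosted model is wrapped as a predict function and fed into the same verify_watermark routine shown earlier. The endpoint URL, request payload, and response field are hypothetical, not any real provider's API.

```python
import requests
import torch

def api_predict_fn(inputs, endpoint="https://models.example.com/v1/infer"):
    """Query a privately hosted model through its public API and collect predicted labels."""
    labels = []
    for x in inputs:
        resp = requests.post(endpoint, json={"input": x.tolist()})
        resp.raise_for_status()
        labels.append(resp.json()["label"])   # hypothetical response field
    return torch.tensor(labels)

# is_genuine = verify_watermark(api_predict_fn, trigger_inputs, trigger_labels)
```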
For private models offering API access, Xu and Yuan (2019) showed how to add unique serial numbers to the trigger-set of watermarks. Their serial number technique is independent of labels and can be supported by digital certification authorities.
Fingerprinting
How it works
Fingerprinting is a fascinating approach to model identification that differs from watermarking in a key way. While watermarking embeds a secret message into the model, fingerprinting relies on the model's inherent characteristics to create a unique identifier (Chen et al., 2019; Lukas et al., 2019). This identifier, akin to a digital DNA, can be transferred to any models derived from the original, making it a robust tool for proving model provenance.
A fingerprinting scheme consists of two essential components: a generation algorithm for creating fingerprints and a verification algorithm for confirming their presence.
Generate(M, D): Given white-box access to a model M and a dataset D, this procedure outputs a fingerprint F and verification keys K = {M(x) : x in F}.
The figure below illustrates the process of generating fingerprints.
Verify(M'(F), K): Given black-box access to a model M', the fingerprint F, and the verification keys K, this procedure outputs 1 if M' is verified by the fingerprint and 0 otherwise.
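The two procedures above can be summarized in a short Python sketch. How the fingerprint inputs are selected is left abstract (it is scheme-specific, e.g. the adversarial-example constructions discussed next), and the match threshold in Verify is an assumption.

```python
import torch

def generate(model, dataset, select_fingerprint_inputs):
    """White-box step: choose fingerprint inputs F and record the model's labels as keys K.
    Inputs are assumed to be batched with batch size 1."""
    F = select_fingerprint_inputs(model, dataset)               # scheme-specific selection
    K = {i: model(x).argmax(dim=-1).item() for i, x in enumerate(F)}
    return F, K

def verify(query_fn, F, K, threshold=0.9):
    """Black-box step: output 1 if the suspect model M' reproduces the keys on F, else 0."""
    matches = sum(query_fn(x).argmax(dim=-1).item() == K[i] for i, x in enumerate(F))
    return int(matches / len(F) >= threshold)                   # threshold is illustrative
```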
Generating fingerprints is a complex task that requires a deep understanding of the model and its training data. One approach is to use adversarial examples. By adding carefully crafted noise to a correctly predicted data point, the model can be tricked into predicting a desired label. The data point and noise pair form an adversarial example, which can serve as a fingerprint.
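As one hedged illustration, a single targeted FGSM-style step can nudge a correctly classified point toward an owner-chosen label. The step size eps is arbitrary here; schemes such as Cao et al. (2019) use more careful constructions near the decision boundary.

```python
import torch
import torch.nn.functional as F

def adversarial_fingerprint(model, x, target_label, eps=0.03):
    """One targeted FGSM-style step: add small noise that pushes the prediction of x
    toward target_label. x is assumed batched, e.g. shape (1, C, H, W)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), target_label)   # loss w.r.t. the owner-chosen label
    loss.backward()
    # Step against the gradient to decrease the loss, i.e. toward the target label.
    return (x - eps * x.grad.sign()).detach()
```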
Cao et al. (2019) demonstrated how to construct adversarial examples near the model's decision boundary and leverage their transferability to surrogate models. Peng et al. (2022) employed Universal Adversarial Perturbations (UAPs), which are vectors drawn from a low-dimensional subspace containing most normal vectors of the decision boundary. UAPs can function as a model's fingerprint, allowing the owner to verify if a given UAP vector v lies within the suspect model's UAP space.
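A tiny sketch of that subspace test follows, assuming an orthonormal basis for the suspect model's UAP subspace has already been estimated; the tolerance is an arbitrary assumption.

```python
import torch

def uap_in_subspace(v, basis, tol=0.1):
    """Check whether a UAP vector v (shape (n,)) lies approximately in the subspace
    spanned by the orthonormal columns of basis (shape (n, d))."""
    projection = basis @ (basis.T @ v)                   # orthogonal projection onto the subspace
    residual = torch.norm(v - projection) / torch.norm(v)
    return residual.item() <= tol                        # tol is illustrative
```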
In the following section, we will explore how fingerprints can be applied to verifiable inference. Because a fingerprint is derived from a model's inherent characteristics rather than an embedded secret message, it offers a distinct route to proving model provenance.
Fingerprinting for Verifiable Inference
Public Models
When a model is public, the creator generates a fingerprint and keys, making them accessible to all. Consumers can download the model, fingerprint, and keys as a package.
With full access, consumers can modify the model through fine-tuning or compression. But how can they verify it's the genuine article?
Enter the verification algorithm. By running it on the query results and fingerprint, consumers can confirm the model's authenticity.
For public models, unique identifier fingerprints are crucial. Lukas et al. (2021) proposed a game-changing technique: conferrable examples. These crafted examples act as adversarial inputs not just for the target model, but also for any imitators. They transfer to surrogates but not to independently trained models.
Zhao et al. (2020) introduced adversarial marks, another transferrable fingerprint that can't be removed without sacrificing significant accuracy.
These innovative techniques ensure that public models remain trustworthy and authentic. They give consumers the power to verify, while creators can share their work with confidence.
In a world where AI models are increasingly open and accessible, fingerprinting is a vital tool. It's the key to maintaining trust and integrity in the face of modification and imitation.
So, the next time you download a public model, look for the fingerprint. It's your guarantee of authenticity in an ever-evolving AI landscape.
Private Models
Model creators may keep their models private to protect intellectual property rights. After generating a fingerprint, they can host the model privately and provide consumers access through a public API.
The verification algorithm for fingerprints always works with black-box access to the model. With public knowledge of the fingerprint, any consumer can verify it through the API.
Private models with public APIs are crucial for giving users access to LLMs like ChatGPT and Claude. However, LLMs pose a challenge in generating fingerprints due to their vast number of parameters.
Recent work by Gu et al. (2022), Li et al. (2023), and Xu et al. (2024) demonstrates how to construct fingerprints for LLMs. These methods implant input-output pairs that exploit the model's inherent characteristics.
Xu et al. (2024) presents the fastest fingerprinting method, claiming to fingerprint LLAMA2-13B in less than a minute using a single A100 GPU. Their technical contribution is that their methods can work in either white-box mode using an F-transformer or black-box mode using fine-tuning.
As AI continues to evolve, fingerprinting will play a vital role in ensuring the integrity of private models accessed through public APIs. It's a powerful tool that strikes a balance between accessibility and security.
TLDR
Verified inference is crucial in MLaaS, both centralized and decentralized. Consumers should be able to confirm the model generating the inference is the one they requested and paid for.
ZKML proves a model was executed on provided data, but compromises data privacy. Current ZK proof generation techniques are time-consuming due to cryptographic operations on top of the computation being proved (Garg et al., 2023).
ZKML allows perfect proofs of inference and provenance but struggles with real-time proofs for relevant deep neural networks like LLMs (GPT2, Llama).
Watermarking and fingerprinting fill the gaps. Practical and effective inference verification is a must in decentralized AI, where trust cannot be assumed. Xu et al. (2024) generated fingerprints for LLMs as large as LLAMA2-13B in under a minute.
Verifiable inference is achievable through all three techniques. Watermarking and fingerprinting provide extraction and verification algorithms, while ZK offers proofs of knowledge.
Efficiency varies significantly. ZK is the slowest, while watermarking and fingerprinting can be fast, making them more applicable considering current state-of-the-art.
Knowledge distillation, a model compression attack, is a threat watermarking and fingerprinting are designed to protect against by passing the identifying markers to the compressed model. ZK does not address this issue, which is significant in the traditional AI industry and will be covered in a future article.
Output accountability is where ZK shines. While watermarks and fingerprints can establish ownership, they don't inherently ensure model integrity or output correctness. ZKPs prove a model was evaluated correctly on specific inputs, enabling accountability (Lederer et al., 2023; Oliynyk et al., 2023).
As AI reshapes our world, we can't build the future on blind trust. Verifiability is essential for decentralized AI networks. We need transparency and accountability. AI's potential is vast, but honesty and integrity must come first. Watermarking and fingerprinting are key alternative tools for achieving that.
The stakes are high. We have to get this right.
Bagel is a deep machine learning and cryptography research lab, making Open Source AI monetizable using cryptography.