Google’s Gemini API – Is my data safe and legally compliant?


You are thinking of using Google’s latest LLM, Gemini, in your business via its API, but you are concerned about legal compliance: is your data safe? The story can be summarized as follows:

  • Your data will likely be used to train or improve future models
  • Google currently has no controls to keep your data in a single region, but that is likely to change soon
  • Google doesn’t promise much by way of security either

Let’s get the background out of the way first. Google announced that its latest LLM, Gemini, is now available as an API on Vertex AI. Gemini Pro is the strongest model currently available and competes roughly on even ground with OpenAI’s GPT-3.5, meaning it performs similarly to ChatGPT’s free version. Gemini Ultra, which is expected to compete with GPT-4, has not yet been released.

Gemini Pro does not have an easy to use interface like ChatGPT

An API is a way for developers to access the LLM without building it themselves. Unlike ChatGPT, Gemini does not come with a ready-made interface for end users, so businesses that want to use Gemini will need to build a front end themselves. That requires some development work, but building a simple interface is a far easier task than training one of these LLMs.

Sample code to implement Gemini API.
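Since the original sample code is only shown as an image, here is a minimal sketch of what calling Gemini Pro through the Vertex AI Python SDK looks like. The project ID, prompt text, and the `build_prompt` helper are placeholders of my own; the import path reflects the preview release. Treat this as an illustration rather than production code.

```python
def build_prompt(question: str, context: str) -> str:
    """Pure helper: wrap user input with instructions. No API call involved."""
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def ask_gemini(prompt: str, project: str, location: str = "us-central1") -> str:
    """Send a prompt to Gemini Pro on Vertex AI and return the text reply."""
    # Imported lazily so build_prompt() works without the SDK installed.
    # Requires `pip install google-cloud-aiplatform` and authenticated
    # application-default credentials.
    import vertexai
    from vertexai.preview.generative_models import GenerativeModel

    vertexai.init(project=project, location=location)
    model = GenerativeModel("gemini-pro")
    return model.generate_content(prompt).text


if __name__ == "__main__":
    prompt = build_prompt("What were Q3 sales?", "Q3 sales were $1.2M.")
    print(ask_gemini(prompt, project="my-gcp-project"))  # hypothetical project ID
```

Note that whatever you place in `context` is exactly the kind of data at stake in the terms discussed below, so keep test data innocuous.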

The main reason you shouldn’t just use ChatGPT in your business is legal compliance. Through a variety of routes, data you enter into ChatGPT could be shared with other parties, which makes it very hard to use ChatGPT in any serious business use case. For the rest of this post, I’ll examine how Gemini API fares on the three most common concerns with enterprise use of these models.

Model training on Gemini API

On the surface…

ChatGPT’s most notorious problem is that it uses your input data to train the next model. In short, whatever you enter into the chatbot could resurface in a response to another user, which is a serious problem if you need to keep that information confidential.

For Gemini, the press material is certainly encouraging. Things go downhill soon after. Right in the main announcement, Google stresses that they have “built-in data governance and privacy controls”, and that “Google never uses customer data to train [their] models”. Their documentation’s FAQ also states that “Prompts and tuning data for both Gemini and PaLM 2 are never used to train or enhance our foundation models.”

However, things are not as rosy when we examine the legal terms. Following the links from Gemini’s dashboard to the privacy policy leads to the generic Google Privacy Policy, which covers all of Google’s products in one massive document. Under the section “Why Google Collects Data”, Google mentions that it may use the information collected to improve its services and develop new ones. It is not a stretch to say this language could encompass further training of Gemini or other LLMs. To be fair, this is the generic privacy policy; it may be written broadly to cover Google’s other products and may not apply to Gemini at all.

Digging deeper…

So we look to the Terms of Service provided, which once again lead to a generic page. Thankfully, with a bit of digging, we found terms specific to the Generative AI APIs, including Gemini API. It is a short addendum meant to extend and clarify the general Terms of Service. For our purposes, the entire section on Content License and Data Use is worth reading:

  • “To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output.”
  • Do not submit sensitive, confidential, or personal information to the Services.
  • “Google only uses data that you import or upload to the Services to tune models for that express purpose.”
Last updated December 13, 2023, on the release date of Gemini API.

As the most specific document we could find, I believe these quotes are the true reflection of Google’s intention. Clearly, your data will be used to improve their model. Whether they call it “improve” or “tune”, ultimately your data will be used for the next model and is at risk of disclosure.

On a personal level, I found it frustrating to navigate these documents, and to ultimately find something that reads so differently from the press releases.

Region restrictions

Another common concern with ChatGPT is whether personal data might leave your country’s jurisdiction. This is especially problematic in the EU, where the GDPR does not allow personal information to leave the region except under specific circumstances.

In existing Google API products, you can change the endpoint to select regions. For example, if you want to keep your data in Europe, send your API call to “https://europe-west4-aiplatform.googleapis.com”. However, Gemini API in its current preview only has one endpoint. I can only assume it is in the United States. If you are concerned about GDPR compliance, this will likely be a cross-border data transfer.
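To make the regional-endpoint mechanism concrete, here is a small sketch of how Vertex AI’s regional hosts are formed and how the Python SDK selects one. The region names are examples, and as noted above, the Gemini preview does not yet honor this choice.

```python
def regional_endpoint(region: str) -> str:
    """Build the Vertex AI regional API host for a given Google Cloud region."""
    return f"https://{region}-aiplatform.googleapis.com"


# To keep traffic in Europe on other Vertex AI products, you would target:
print(regional_endpoint("europe-west4"))
# https://europe-west4-aiplatform.googleapis.com

# With the Python SDK, the same choice is expressed through the `location`
# argument, from which the SDK derives the regional host:
#   vertexai.init(project="my-gcp-project", location="europe-west4")
```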

It is worth noting that virtually all APIs offered by Google have this ability to select regions. I suspect Gemini API is only limited temporarily during the initial rollout and that there will be more options down the line.

Data security practices

The last area to consider is data security. When we looked at Microsoft Azure OpenAI Services earlier, we noted how Microsoft linked to their security practices and gave ample description of their processes. This is not true with Gemini API.

When examining the Google APIs Terms of Service, you’ll note that most of the provisions around security are obligations on the user to keep their own applications secure; they say little about what Google provides. The Terms of Service specific to generative AI aren’t much help either, placing further restrictions on the user.

There are two significant reassurances, however. The first is Google’s size and experience. Regardless of the lack of explicit promises, it is a safe bet that Google is well versed in security and has more to lose by shipping a poor product. The second is the built-in controls. Google provides a remarkable number of options for monitoring, logging, and auditing Gemini API, and using those tools correctly gives you control over your data security.

IAM permissions with Vertex AI and Gemini API
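As a sketch of what those built-in controls look like in practice, the gcloud commands below grant a developer the Vertex AI User role and query the audit logs for Vertex AI API activity. The project ID and email address are placeholders.

```shell
# Grant a developer access to call Vertex AI (including Gemini API)
# without broader project permissions. Project and member are placeholders.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="user:developer@example.com" \
  --role="roles/aiplatform.user"

# Review recent audit-log entries for Vertex AI API activity.
gcloud logging read \
  'protoPayload.serviceName="aiplatform.googleapis.com"' \
  --project=my-gcp-project \
  --limit=20
```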

Legal compliance for Gemini API is not great…

We blogged earlier about how Microsoft Azure OpenAI Services is likely the most enterprise-ready solution, and that continues to be the case. Other vendors, such as Anthropic (Claude), OpenAI, and Meta, don’t have their ducks in a row for enterprise customers, and Google is offering shockingly few assurances for such a large company. I hope the offering improves once Gemini API is past its initial rollout.