Apr 26, 2023 | AI, Cyber Security

OpenAI’s ChatGPT is taking the large language model space by storm.

Howard Freeman

However, there is much to consider when it comes to data privacy.

Howard Freeman, Managing Director at Fortis DPC Limited, has been investigating.

Unless you have been in hiding, you will know that ChatGPT is now a major part of our world. There are lots of opinions about it, both good and bad. ChatGPT is developed by OpenAI, which also created generative AI tools such as DALL-E.

ChatGPT is, as described by Wikipedia, an artificial intelligence (AI) chatbot developed by OpenAI. It was released in November 2022. It is built on top of OpenAI’s GPT-3.5 and GPT-4 families of large language models (LLMs). It has been fine-tuned, an approach to transfer learning, using both supervised and reinforcement learning techniques.

ChatGPT launched as a prototype on 30th November 2022 and garnered attention for its detailed responses and articulate answers across many domains of knowledge. Its uneven factual accuracy, however, has been identified as a significant drawback.

The original release of ChatGPT was based on GPT-3.5. A version based on GPT-4, the newest OpenAI model, was released on March 14, 2023, and is available for paid subscribers on a limited basis.

Just so you know, GPT stands for generative pre-trained transformer, the family of language models to which ChatGPT belongs. So now you know!

Microsoft has now invested heavily in OpenAI and, apparently, effectively controls the company.

ChatGPT uses an extensive language model based on billions of data points from across the internet. It uses these data points to reply to questions and instructions in a way that mimics a human response. Those interacting with ChatGPT have used it to explain scientific concepts, write poetry and produce academic essays. As with any technology that offers new and innovative capabilities, though, there is also serious potential for exploitation, along with data privacy risks.

ChatGPT has already been accused of spreading misinformation by replying to factual questions in misleading or inaccurate ways. However, its potential use by cyber criminals and bad actors is also a huge cause for concern.

ChatGPT and the GDPR

The method that OpenAI uses to collect the data on which ChatGPT is trained has yet to be disclosed. However, data protection experts have warned that obtaining training data by simply trawling the internet can be unlawful. In the EU, for example, scraping data points from websites can breach the GDPR (and therefore the UK GDPR), the ePrivacy Directive, and the EU Charter of Fundamental Rights. A recent example of this is Clearview AI, which built its facial recognition database using images scraped from the internet.

The Regulators

Last year, Clearview AI was served enforcement notices by several data protection regulators. The French regulator, the CNIL, sought to prevent the processing of French citizens’ data. The initial breaches were as follows:

  • Unlawful processing of personal data (breach of Article 6 of the GDPR)
  • Individuals’ rights not respected (Articles 12, 15 and 17 of the GDPR)

In its initial response, Clearview AI appeared to be ghosting the regulator. This in itself caused a further breach of the GDPR, as below.

  • Lack of cooperation with the CNIL (Article 31 of the GDPR)

The CNIL issued a fine of 20 million euros, matching the fines from Italy and Greece; a lesser fine came from the ICO, the UK regulator, which seems unwilling or unable to use its powers effectively. As a result, there have been calls in the UK for new laws on biometrics, and in particular for the use of live facial recognition (LFR) technology to be suspended until the government can introduce appropriate legislation. UK police forces are very keen on this technology, as you might understand. However, human rights and civil liberties groups are keenly opposed.

The Money

There is no evidence to suggest that any of these fines have been paid. EU regulators’ ability to collect them is constrained by limited resources and limited legal means. The fines nevertheless amount to a warning to stay away from Europe, a warning that has not been heeded.

The Response

Clearview AI didn’t respond directly. However, its PR people did, attributing the following to its CEO, Hoan Ton-That, in response to the French regulator.

‘There is no way to determine if a person has French citizenship, purely from a public photo from the internet. Therefore, it is impossible to delete data from French residents. Clearview AI only collects publicly available information from the internet, just like any other search engine like Google, Bing or DuckDuckGo.’

The statement notes that Clearview has no French office and no place of business within the EU. It states that the company would not undertake any activities that would “otherwise mean it is subject to the GDPR”, as it puts it, adding: “Clearview AI’s database of publicly available images is lawfully collected.” Clearview has, however, simply failed to establish a legal basis for processing.

Extraterritorial Reach

The GDPR has extraterritorial reach, so Clearview AI’s arguments are meaningless. It may claim it’s not doing anything that would make it subject to the GDPR, but this looks absurd when you understand that its database holds over 20 billion images, drawn from around the world, including Europe. The New York Times revealed this in a remarkable article. Don’t forget that an EU data subject is still protected by the GDPR, anywhere in the world.

Clearview AI’s strategy when dealing with EU regulators appears to be ‘ignore and carry on’. However, the Swedish regulator (IMY) took a different approach to the problem: it fined the local police authority €250,000 (over $300,000) for unlawful use of the controversial facial recognition software Clearview AI, in breach of the country’s Criminal Data Act. So, if regulators cannot reach Clearview AI, it seems they will simply go after its users. This may set a precedent for future legal decisions.

Overseas

Canada, a country whose data protection regime is deemed adequate by the European Commission, found that Clearview had breached local laws when it collected photos of people to plug into its facial recognition database. This was done without their knowledge or permission. Clearview AI’s actions were ruled illegal by Canadian privacy authorities, who warned they would “pursue other actions” if the company does not follow their recommendations. These include stopping the collection of Canadians’ data and deleting all previously collected images.

Clearview said it had stopped providing its tech to Canadian customers. This step doesn’t necessarily protect Canadian data subjects within another jurisdiction, however.

Clearview AI could be kept very busy defending itself. It is facing a class action lawsuit in the US state of Illinois, citing breaches of the state’s biometric protection laws.

The U.K. and Australian data protection watchdogs announced a joint investigation into Clearview’s personal data handling practices. We await the outcome of that.

Last year, Clearview settled a lawsuit that had accused it of running afoul of an Illinois law banning the use of individuals’ biometric data without consent. The settlement included Clearview agreeing to limits on its ability to sell its software to most U.S. companies, but it still trumpeted the outcome as a “huge win”. It claimed it would be able to circumvent the ruling by selling its algorithm, rather than access to its database, to private companies in the U.S.

Empowering regulators to order the deletion (or market withdrawal) of algorithms trained on unlawfully processed data does look like an important upgrade to their toolbox if we’re to avoid an AI-fuelled dystopia.

And it just so happens that the EU’s incoming AI Act may contain such a power.

The EU has also more recently presented a plan for an AI Liability Directive, which it hopes will encourage compliance with the broader AI Act. It would do this by linking compliance to a reduced risk that AI model makers, deployers, users and others can be successfully sued if their products cause a range of harms, including harms to people’s privacy. The lawyers in Europe will be rubbing their hands with glee!

More Players

This is clearly a market that interests major players. Elon Musk, a co-founder of OpenAI, is rumoured to be rushing to assemble a rival to OpenAI. He has stated that he originally helped found the company as an open-source, not-for-profit venture to act as a counterbalance to Google, not as the closed-source, maximum-profit company, effectively controlled by Microsoft, that it has become. Musk views AI as ‘one of the biggest risks’ to civilisation and believes it needs to be regulated.

Legality

The GDPR gives people the right to request that their personal data be removed from an organisation’s records completely. This is the “right to erasure”, one of the eight rights on which the GDPR is premised. The trouble with natural language processing tools like ChatGPT, however, is that the system ingests data that may be personal. How the data is collected isn’t clear, and because it is so thoroughly mixed into the model, it becomes practically impossible to extract, let alone delete, an individual’s data.

Therefore, it is not at all clear that ChatGPT complies with the GDPR. It doesn’t seem to be transparent enough, and it may be collecting and processing personal data in unlawful ways. What seems highly likely is that data subjects would find it difficult to exercise their rights, including the right to be informed and the right to erasure.

The GDPR demands transparency, and AI vendors will need to demonstrate clearly how this is achieved.

The technical risks

ChatGPT is trained on billions of data points. The tool is also freely accessible, which means its capabilities are equally available to malicious actors, who can use it to carry out targeted attacks. One of the most concerning capabilities of ChatGPT is its potential to create realistic-sounding conversations. These could be used in social engineering and phishing attacks, encouraging victims to click on malicious links, install malware, or give away sensitive information.

The tool also creates opportunities for more sophisticated impersonation attempts, where the AI is instructed to imitate a victim’s colleague or family member to gain trust.

Another attack vector is to use machine learning to generate large volumes of automated, legitimate-looking messages to spam victims and steal personal and financial information. These kinds of attacks can be highly damaging to businesses.

Good news?

Fortunately, it’s not all doom and gloom. Large language models like ChatGPT also have the potential to be powerful cybersecurity tools. AI systems with a nuanced understanding of natural language can be used to monitor chat conversations for suspicious activity, and could also automate the process of gathering data for GDPR compliance. These automation capabilities and behavioural analysis tools can be used by businesses in cyber incident management, expediting some of the manual analysis usually done by professionals. Realistic language conversations are also a great educational tool for cyber teams if used to generate phishing simulations for training purposes.
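To make the monitoring idea concrete, here is a minimal sketch in Python using OpenAI’s chat completion API (the pre-1.0 `openai` package interface current at the time of writing). The model name, prompt wording and classification labels are illustrative assumptions, not a production design:

    # Minimal sketch: using a large language model to flag suspicious
    # chat messages, e.g. as part of a phishing-detection workflow.
    # Assumes the `openai` Python package (pre-1.0 interface) and an
    # API key in the OPENAI_API_KEY environment variable.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    # Illustrative prompt; a real deployment would be tuned and tested.
    SYSTEM_PROMPT = (
        "You are a security assistant. Classify the user's message as "
        "SUSPICIOUS or BENIGN. A message is SUSPICIOUS if it pressures "
        "the reader to click a link, share credentials, or send money. "
        "Reply with one word only."
    )

    def classify_message(message: str) -> str:
        """Return 'SUSPICIOUS' or 'BENIGN' for a single chat message."""
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",   # illustrative model choice
            temperature=0,           # keep the classification stable
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": message},
            ],
        )
        return response["choices"][0]["message"]["content"].strip().upper()

    if __name__ == "__main__":
        print(classify_message(
            "Hi, it's IT support. Your mailbox is full. Log in here "
            "immediately to avoid losing email: http://example.com/login"
        ))  # expected output: SUSPICIOUS

Note that a real deployment would need rate limiting, human review of flagged messages and careful handling of the chat data itself, which is of course personal data under the GDPR.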

While it’s still too early to decide whether or not ChatGPT will become a favourite tool of cyber criminals, researchers have already observed code being posted to cybercrime forums that appears to have been crafted using the tool. As AI continues to develop and expand, tools like ChatGPT will indeed change the game for both cybersecurity attackers and defenders. The game’s afoot!
