ChatGPT probably read your cringey blog posts

Trained on the internet, ChatGPT is potentially a privacy nightmare

OpenAI’s ChatGPT is the current hype, especially thanks to Microsoft’s recently announced integration of an updated ChatGPT into Bing and Edge.

Despite the hype, there’s a ton of concern about ChatGPT and AI tools in general, such as the potential for misinformation or the impact on jobs. However, a less discussed impact of tools like ChatGPT is the impact on privacy.

An article published in The Conversation (and republished by Gizmodo) highlights several concerns with ChatGPT and its (mis)use of personal data. For starters, it highlights how OpenAI trained ChatGPT using some 300 billion words scraped from the internet. These words came from books, articles, websites, blog posts and more. The words also included personal information obtained without consent (though this is one of many problems with using the internet to train ChatGPT).

Put another way, anything you’ve written online — a blog post, product review, comment on an article, etc. — possibly got vacuumed up to train ChatGPT and other AI language tools.

While you may not think that’s a huge problem, The Conversation highlights a few issues with this kind of data collection. First, OpenAI didn’t ask anyone if it could use the data, which is particularly concerning when it comes to sensitive information or data that could identify someone.

The publication also notes that OpenAI doesn’t offer a way for people to check if their personal information is being stored or request that the information be deleted.

Beyond individuals who’ve posted on the internet, The Conversation notes that ChatGPT doesn’t consider copyright protections. As an example, the publication was able to make the tool generate the first few paragraphs of a copyrighted novel. (I was able to recreate this by getting ChatGPT to write a few paragraphs of The Hobbit, but a similar prompt to write a page from Dune didn’t work).

ChatGPT writing the first couple paragraphs of The Hobbit.

ChatGPT wrote the first two paragraphs of The Hobbit.

More to this, OpenAI didn’t pay for the data it scraped from the internet, which is particularly frustrating as the company moves to monetize ChatGPT.

The Conversation goes on to examine ChatGPT’s privacy policy, which says that OpenAI gathers user information like IP address, browser type and settings, data on interactions with the site and more. It also collects information about users’ browsing activities over time and across websites (something that’s more alarming given Microsoft is building OpenAI tools into its Edge browser).

Whatever happens with ChatGPT and other AI tools going forward, those planning to use the tools should keep the privacy implications front of mind.

Image credit: Shutterstock

Source: The Conversation Via: Gizmodo