Chances are, you've been using generative AI for a while now. Have you used a filter on Facebook that takes a photo of you and makes you look older or changes your hair color? Maybe you have used ChatGPT? Have you used the apps where you type in a prompt and it creates a fantastical image that looks like an oil painting? Or even used Microsoft's Copilot or Meta AI on Facebook while searching? Generative AI has exploded in both usage and interest. Every company, whether tech-focused or not, seems to want a piece of the AI action. I am especially excited by the recent advances in AI, but I think this will soon become a cautionary tale. While AI is still in its infancy, it's time we ask an important question: "Just because we can, should we?"
In one of my recent blog posts, "How Does ChatGPT Work?", I talk a bit about generative AI models and how we train them. Put briefly, generative means that when the user inputs a prompt in words, images, or sounds, the AI will generate an answer on the spot. Whether it provides you with a picture of a hot air balloon, the first line on page 7 of "Pride and Prejudice", or the chorus of the song you heard on the radio this morning, the AI generates the information as you ask for it. If this seems like a difficult task, that's because it is. It is extremely computationally expensive to provide a wide range of answers, and the more the AI can do, the more expensive it is. When we talk about cost with AI, it is not only about money. Costs come in the form of time and data as well. In "How Does ChatGPT Work?" I compare the training process to teaching a toddler about different objects via flashcards. As you repeat the name of the object on the flashcard, the toddler learns and soon can recognize the picture without you saying what it is. These models have to be large to perform well, and larger models need more training data. The more training data there is, the longer training takes. Just imagine a stack of flashcards with hundreds of thousands of different images. That would take a long time to get through!
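To make the flashcard analogy a little more concrete, here is a minimal sketch in Python. It is a toy, not how ChatGPT or any real generative model is trained: I have swapped in a simple perceptron-style update rule, and the two made-up features, the four cards, and the function names (train, predict, cards_to_show) are all assumptions invented for illustration. The only point is that the learner has to look at every card on every pass, so the work grows with the size of the stack.

```python
# Toy "flashcard" training. Everything here is an illustrative assumption:
# each card is (features, label), where the two features are made-up numbers
# and label 1 / 0 stands for two different kinds of object.
cards = [
    ((1.0, 0.0), 1),
    ((0.9, 0.1), 1),
    ((0.0, 1.0), 0),
    ((0.1, 0.9), 0),
]


def predict(weights, bias, features):
    """Guess the label of one card from the current weights."""
    score = weights[0] * features[0] + weights[1] * features[1] + bias
    return 1 if score > 0 else 0


def train(cards, passes):
    """Show every card `passes` times, nudging the weights after each mistake."""
    weights = [0.0, 0.0]
    bias = 0.0
    cards_shown = 0
    for _ in range(passes):
        for features, label in cards:
            cards_shown += 1
            # error is 0 when the guess is right, +1 or -1 when it is wrong
            error = label - predict(weights, bias, features)
            weights[0] += error * features[0]
            weights[1] += error * features[1]
            bias += error
    return weights, bias, cards_shown


def cards_to_show(num_cards, passes):
    """Total flashcard views: every card is seen once per pass."""
    return num_cards * passes


weights, bias, cards_shown = train(cards, passes=2)
print("cards shown:", cards_shown)
print("all cards recognized:",
      all(predict(weights, bias, f) == label for f, label in cards))

# The same bookkeeping for a much bigger (hypothetical) stack of flashcards.
print("views for 100,000 cards over 10 passes:", cards_to_show(100_000, 10))
```

With four cards the toy learner gets everything right after two passes. Scale the stack up to hundreds of thousands of cards and the number of views climbs into the millions, which is the cost in time and data that the flashcard comparison is getting at.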
Why does this matter? For a model to perform at near human level, its training dataset has to contain millions of examples. And, well, we don't really have enough. How can this be? These models are trained on almost all of the data we have on the internet, and the internet is laughably huge. Yet, we still need more. One classic dataset that is commonly used as a benchmark for machine learning models is the Modified National Institute of Standards and Technology database, or MNIST. It contains images of handwritten digits 0 through 9, and the goal is for the AI to determine which digit it has been given. This dataset contains 70,000 images. And that is only for identifying handwritten digits!
Now the question becomes: how did these companies obtain enough data to train their models? In a lot of cases, illegally. It turns out you can't just scoop up all the data on the internet for commercial use. Many copyright and intellectual property laws are being skirted, and private data is being sold to large data companies. Just because you can view something online for free does not automatically mean it's up for grabs. Take the work artists post online, for example. AI companies will use that artwork to train their AI to make images in a similar art style. It remains up for debate whether the resulting artwork is theft. On one hand, even human artists use other artists' artwork as inspiration for their own. On the other, AI that is not trained on a large, diverse training set begins to create art that looks more like copies than new art. In either case, the companies should be held accountable for not paying for this data and then making a profit off of it. Since AI is a recent advancement, laws have been slow to keep up with its expansion. Regulations for creating, training, and using AI are shaky right now but are quickly gaining traction. Ironically, many computer science students have to take computer ethics classes as part of their core curriculum, which are meant to prevent exactly this from happening. I think the conversation about ethics needs to reach every position in the company, not just the engineers.
Aside from the obvious moral issues that come with implementing complex AI, there is the simple question of whether it is necessary. When you're using the search bar on Facebook or Instagram, do you use Meta AI? Does using AI improve your experience as a consumer? Personally, I don't use these features and find them more annoying than anything else. I've used ChatGPT a few times and find it pretty impressive, but I don't use it as regularly as Google Search. The technology is still new and could take some time to become integrated into our daily lives, but I don't think we need more chatbots and automated calls. People get so excited by the prospect of using new technology that it is often hard to take a step back and ask whether AI is actually an improvement. I think AI is an improvement in a lot of areas, including urban planning, engineering bridges, and modeling epidemics. But because of its current limitations, I think the average person doesn't have much use for AI. At least not yet.
There are a lot of amazing applications for AI that need to be rigorously studied and vetted. Turning to AI for a quick profit is not going to benefit anyone in the long run. It is too new to be the robot assistant we see in sci-fi movies, but developed enough for scientists to make exciting advancements using this amazing new tool. Hopefully, we can enforce restrictions and regulations to prevent companies from stealing intellectual property for their new hyperfixation.