Reduce, Reuse, and Recycle?

Recognizing Rewards and Risks of Redeploying Prejudiced Pretrained Ptransformers

The P in Ptransformers is silent. :)

In a couple of previous posts I’ve talked about the level of data crunching that has to happen in order to train a Transformer Large Language Model (LLM). This turns out to be an exceptionally expensive proposition when starting from scratch. The cost can easily run into the 8–9 digit range (or more), and in addition there are environmental impacts for each new model trained. Thus, the ability to take an LLM that someone else has already paid for (and whose environmental damage has already been done) and customize it for your own purposes seems like the financially smart and environmentally friendly thing to do.

And yet, things are never quite that simple in the real world, are they?

Garbage In, Garbage Out

One of the problems encountered when trying to build one of these massive LLMs is simply gathering enough data. As a result, models are frequently based on a corpus that includes a significant amount of uncurated, publicly available content. While this gives LLMs enough material to make predictions and generate text in response to prompts, sometimes the responses are… shall we say, suboptimal?

For example, you may be surprised to learn that there exists a significant amount of text on social platforms today that represents biased and bigoted worldviews. Models built using this data as the source for predictions can sometimes return atrocious responses to certain prompts. However, even models like BLOOM, where a full 2/3 of the data used to build the model were hand-picked by researchers in an attempt to mitigate such issues, still hold inherent human biases in their text. After all, even writers with the best of intentions (myself included) can have blind spots to social/cultural biases that can produce harmful responses based on the context in which a question is asked.

To illustrate this (in a very, very tame way) I updated the notebook where I first tested the small BLOOM model and prompted it with two very similar phrases: “The ______ works as a” — and set it to predict the next word. The blank was either “man” or “woman.” Otherwise, the prompts were identical. The next text generated across multiple trials was reflective of gender biases.

Other, slightly reworded trials for man included “doctor” and for woman included “wait” — which I assumed was a tokenized/shortened form of waitress.

Thus, both obvious and inherent/hidden biases on the part of the writers of the text that is used to build an LLM will predictably become part of the response patterns the LLM will generate to the prompts it is given. This kind of bias is very difficult to mitigate in the model-building phase, if it’s even possible at all. This raises questions as to whether any generative machine learning model will ever be able to overcome this problem, at least given current data and technology available.
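
To make the "garbage in, garbage out" mechanism concrete, here is a minimal sketch that has nothing to do with BLOOM's internals: a toy next-word predictor that simply counts which word followed each prompt in its training text. The corpus, the function names, and the 3-to-1 skew are all invented for illustration; the only point is that whatever imbalance sits in the text comes straight back out in the predictions.

```python
from collections import Counter, defaultdict

# Invented, deliberately skewed toy corpus (an assumption for illustration,
# not real training data and not output from any real model).
TOY_CORPUS = (
    ["the man works as a doctor"] * 3
    + ["the man works as a waiter"] * 1
    + ["the woman works as a waitress"] * 3
    + ["the woman works as a doctor"] * 1
)

def train(corpus):
    """Count, for every sentence prefix, which word came next."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i in range(1, len(words)):
            counts[tuple(words[:i])][words[i]] += 1
    return counts

def next_word_probs(counts, prompt):
    """Turn the raw counts for a prompt into next-word probabilities."""
    seen = counts.get(tuple(prompt.lower().split()), Counter())
    total = sum(seen.values())
    return {word: n / total for word, n in seen.items()} if total else {}

model = train(TOY_CORPUS)
man = next_word_probs(model, "The man works as a")
woman = next_word_probs(model, "The woman works as a")

print("man   ->", man)
print("woman ->", woman)
```

The two prompts differ by a single word, yet the most likely completion flips, purely because of how the training sentences were distributed. A real LLM is vastly more sophisticated than a lookup table, but it is still fitting the statistics of the text it was given.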

Fine Tuning

If we assume that there are ways to mitigate this issue (something I’m noodling on and may write more about in the future), then the power and potential of generative LLMs becomes enormous. But how does this process work?

If you think back to neural networks (and Transformers are just a fancy/advanced version of neural network architectures) then you will recall that the output of training a machine learning model is simply a set of weights that have been iteratively adjusted until they meet the necessary performance criteria. As such, it is also appropriate to think of a pre-trained Transformer model as simply a set of weights (a complex set of weights with some very specific rules about how it works with data to make predictions, but nonetheless just a bunch of weights). It is relatively straightforward, then, to take a new corpus of data (say, the content on your company’s website, knowledge base articles, and other useful things) and see how well the pre-trained model, with those weights, predicts the text within that dataset.
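
As a rough sketch of "a model is just weights, so score it on your own text", the snippet below stores a tiny next-word model as a table of numbers and measures its average cross-entropy loss on two snippets of text. Everything here is hypothetical: the hand-set `WEIGHTS` table, the vocabulary, and the two example sentences are stand-ins, and a real Transformer has billions of weights and far more elaborate rules for combining them.

```python
import math

# Hypothetical "pre-trained" weights: one score (logit) per
# (previous word -> next word) pair. Hand-set for illustration only.
WEIGHTS = {
    "the": {"cat": 2.0, "policy": 0.0},
    "cat": {"sat": 2.0, "applies": 0.0},
    "policy": {"sat": 1.0, "applies": 1.0},
}

def next_word_probs(weights, prev):
    """Softmax over the scores stored for the previous word."""
    logits = weights[prev]
    norm = sum(math.exp(v) for v in logits.values())
    return {word: math.exp(v) / norm for word, v in logits.items()}

def avg_loss(weights, sentences):
    """Average negative log-likelihood of each actual next word."""
    total, n = 0.0, 0
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            total += -math.log(next_word_probs(weights, prev)[nxt])
            n += 1
    return total / n

general_text = ["the cat sat"]        # resembles what the weights were fit to
domain_text = ["the policy applies"]  # stand-in for your company's content

general_loss = avg_loss(WEIGHTS, general_text)
domain_loss = avg_loss(WEIGHTS, domain_text)
print(f"general loss: {general_loss:.3f}, domain loss: {domain_loss:.3f}")
```

Lower loss means the weights predicted the text better. The gap between the two numbers is a measure of how much the model still has to learn about the new domain.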

Odds are, the predictions will be pretty good, but not optimal, since your domain-specific knowledge likely did not exist in the training data for the LLM. However, just like when the original model was built, you can use backpropagation and gradient descent to iteratively adjust the model weights so that the model performs better and better on your own data. Now the resulting model might not perform quite as well on general-purpose prompts as it did before you tuned it, but if you’re building the model for a specific purpose (maybe a chatbot that answers HR or policy questions, locates documents of interest, or similar) then you’re typically not concerned about how the model performs on out-of-domain tasks. The fact that it *can* answer questions or make text predictions that fall outside of your domain area is just a bonus. What you really care about is having a model that is extremely conversational, and when it comes to the kind of things your users typically need to know, extremely knowledgeable and accurate.
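
The fine-tuning loop itself can be sketched with a deliberately tiny stand-in: a "model" with a single weight, y = w * x, that starts from a pre-trained value and is nudged by gradient descent toward new data. The datasets, learning rate, and step count below are assumptions chosen so the arithmetic is easy to follow, and the gradient is written out by hand, whereas a deep network would obtain it through backpropagation.

```python
def mse(w, data):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def fine_tune(w, data, lr=0.05, steps=200):
    """Plain gradient descent, starting from the pre-trained weight."""
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

# Invented toy data: the "general" task follows y = 2x, the "domain" task y = 3x.
general_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
domain_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

pretrained_w = 2.0  # assumed to be a perfect fit for the general task
tuned_w = fine_tune(pretrained_w, domain_data)

print(f"weight: {pretrained_w} -> {tuned_w:.4f}")
print(f"domain loss:  {mse(pretrained_w, domain_data):.4f} -> {mse(tuned_w, domain_data):.4f}")
print(f"general loss: {mse(pretrained_w, general_data):.4f} -> {mse(tuned_w, general_data):.4f}")
```

The domain loss falls while the general loss rises, which is the trade-off described above in miniature. With only one weight the model cannot serve both tasks at once; real LLMs have enough capacity that the loss of general ability is usually much milder, but the direction of the effect is the same.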

LLMs present the greatest opportunity to date in terms of building these kinds of conversational AI systems that can seem to flex and intuit meaning based on context the model builders did not anticipate, and thus mimic human conversation to ever greater degrees. The possibilities for good are virtually limitless. We just have to figure out how to keep them from saying things that might get a human counterpart rightfully fired.

That is not going to be an easy thing to do when basing your predictions on writings that contain the thoughts and predispositions of human beings.



Jason Eden

Data Science & Cloud nerd with a passion for making complex topics easier to understand. All writings and associated errors are my own doing, not work-related.