Large language models: Power, potential, and the sustainability challenge
Written by Simon Althoff
Large language models (LLMs) have revolutionized how we interact with machines, enabling tasks such as text generation, translation, and question answering. However, these capabilities come at a cost, as LLMs require large amounts of computational power for both training and inference. Transformer models, the architecture LLMs are built on, have steadily increased in size since their inception, and the trend seems set to continue given the clear performance benefits of scale. Widespread adoption of LLMs therefore raises concerns about environmental impact, which sits uneasily with most companies' sustainability agendas, such as reaching their SBTi targets.
The computational cost also means that building and running the top-end models is limited to organizations with access to top-end hardware, and the global shortage of Nvidia GPUs further complicates matters. In short, the adoption and development of LLMs face two hardware-related issues:
High computational costs: Training and running large LLMs can be extremely expensive due to the immense power and advanced hardware needed. This can limit accessibility and hinder innovation for smaller companies.
Environmental impact: The high energy consumption associated with LLMs raises concerns about their environmental footprint, putting pressure on companies to adopt more sustainable practices.
Efficiency through innovation
There are initiatives to reduce the computational cost of LLMs. Smaller models specialized in certain tasks have recently become more common. Microsoft's phi-1, a relatively small LLM specialized in generating Python code, is one of many examples. This enables architectures where each query is routed to the model best suited to answer it, effectively decreasing the dependence on larger models.
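As an illustration, the sketch below shows how such query routing might look in practice. The classifier, model names, and registry here are hypothetical placeholders rather than a real API; a production router would typically use a small classification model instead of keyword matching.

```python
# Hypothetical sketch of routing queries to specialized models instead of
# always calling one large general-purpose LLM. Names are placeholders.
def classify_query(query: str) -> str:
    """Naive keyword-based task classifier; a real router might use a small model."""
    text = query.lower()
    if any(kw in text for kw in ("python", "def ", "traceback")):
        return "code"
    if any(kw in text for kw in ("translate", "in french", "in german")):
        return "translation"
    return "general"

# Map each task to the smallest model that handles it well.
MODEL_REGISTRY = {
    "code": "small-code-model",        # e.g. a phi-1-style code specialist
    "translation": "small-translation-model",
    "general": "large-general-model",  # fall back to the big model only when needed
}

def route(query: str) -> str:
    return MODEL_REGISTRY[classify_query(query)]

print(route("Write a Python function that reverses a string"))  # small-code-model
```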
Another type of mitigation is the Mixture of Experts (MoE) model architecture, an example of a sparse model architecture. A Mixture of Experts model is conceptually similar to the routing setup described above, but in the MoE case the routing happens inside a single model: a gating network decides, for each token, which expert sub-networks to activate.
The latest buzz around MoE models came from the release of Mixtral-8x7B, an open-source LLM developed by Mistral, which is reported to outperform the popular Llama 2 70B model from Meta on several benchmarks. The Llama 2 model, as the name suggests, has around 70 billion parameters and requires around 140 GB of memory to load at 16-bit precision; the exact figure depends on the parameter precision used.
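To see where the 140 GB figure comes from, a rough back-of-the-envelope calculation helps: a parameter stored at 16-bit precision takes 2 bytes, so 70 billion parameters need roughly 140 GB just for the weights. The snippet below (the helper function is purely illustrative) shows how the footprint shrinks at lower precisions; it ignores activation memory, KV cache, and framework overhead.

```python
# Rough estimate of the memory needed just to hold the model weights:
# parameter count times bytes per parameter.
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9

for precision, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"Llama 2 70B @ {precision}: ~{weight_memory_gb(70e9, nbytes):.0f} GB")
# fp16: ~140 GB, int8: ~70 GB, int4: ~35 GB
```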
The Mistral model, though its name suggests 8 × 7 = 56 billion parameters, is actually around 47 billion parameters in size. It consists of 8 expert sub-networks of roughly 7 billion parameters each, but the experts share several components, which reduces the total parameter count. The model is still very large, requiring around 90 GB of memory to load. The main benefit of MoE models, however, appears at inference time. In Mixtral's case, only 2 of the 8 experts are used per token (a token roughly corresponds to a word). Computationally, this makes the model about as fast as a dense model with 12 billion parameters (not 14, again because of the shared components). So even though the memory requirement remains high, the speed and efficiency of the model are significantly improved. MoE models are also much more efficient in the pre-training phase, having been shown to require considerably less compute.
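To make the idea concrete, below is a minimal, illustrative sketch of an MoE layer with top-2 routing in PyTorch, loosely inspired by the Mixtral design; it is not Mistral's actual implementation, and the layer sizes are toy values. A small gating network scores all experts for each token, and only the two highest-scoring experts run, so most of the parameters sit idle for any given token.

```python
# Minimal sketch of a Mixture of Experts layer with top-2 routing (toy sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The gating network scores every expert for each token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                   # x: (num_tokens, d_model)
        scores = self.gate(x)                               # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # top-2 experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = chosen[:, slot] == i                 # tokens routed to expert i
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)       # a toy batch of 16 token embeddings
print(MoELayer()(tokens).shape)     # torch.Size([16, 512])
```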
The MoE architecture does come with drawbacks, mainly around fine-tuning: MoE models have tended to overfit and can therefore be difficult to generalize. However, recent advances in instruction-tuning MoEs have proved very fruitful, suggesting that the architecture is better suited to some tasks than others. In the future we can expect more custom architectures for specific applications, working further towards more efficient use of LLMs.
SBTi targets and sparse architectures
The Science Based Targets initiative (SBTi) helps companies set emission reduction targets aligned with limiting global warming to 1.5°C, as outlined in the Paris Agreement. Sparse architectures like MoEs can play a significant role in supporting companies' work towards achieving these targets:
Reduced energy consumption: MoE models achieve comparable performance to larger models while utilizing fewer resources during inference. This translates to a lower energy footprint compared to traditional LLM architectures, directly contributing to reduced emissions and alignment with SBTi goals.
Improved efficiency: MoE models require less computational power during training, further minimizing their environmental impact and contributing to more sustainable AI development.
Cost-effectiveness: By reducing the computational resources needed, MoE models can be more cost-effective to train and run, making LLMs more accessible to a wider range of companies, fostering innovation and competition within the field.
By embracing sparse architectures like MoEs, companies developing and utilizing LLMs can demonstrate their commitment to sustainability, reduce their environmental impact, and contribute to achieving the crucial goals of the SBTi. This approach paves the way for a future where powerful AI tools can be developed and utilized responsibly, fostering innovation while minimizing our impact on the planet.
Balancing resource constraints, the need for expertise, and driving AI transformation
Building expertise in the vast field of LLMs is hard with limited resources. By focusing on the right areas and utilizing readily available options, businesses can bridge part of the gap.
The foundation of any LLM knowledge lies in understanding machine learning and deep learning concepts. Grasping supervised and unsupervised learning, neural networks, and optimization algorithms equips businesses with the tools to comprehend the training process behind these models. The transformer architecture, the cornerstone of most modern LLMs, deserves dedicated study to fully grasp its functionalities.
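As a concrete starting point, the snippet below implements scaled dot-product attention, the core operation of the transformer, in a few lines of PyTorch. The dimensions are toy values chosen for illustration.

```python
# Scaled dot-product attention: each position receives a weighted mix of all
# value vectors, with weights derived from query-key similarity.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # similarity between positions
    weights = F.softmax(scores, dim=-1)             # attention weights sum to 1 per query
    return weights @ v                              # weighted mix of value vectors

q = k = v = torch.randn(10, 64)                     # 10 tokens, 64-dimensional
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([10, 64])
```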
For companies facing resource constraints, navigating the LLM landscape requires a strategic approach. Conducting a cost-benefit analysis helps weigh the potential benefits of using LLMs against the associated costs in terms of computational resources and infrastructure. Exploring alternative, resource-efficient architectures like MoE or smaller, specialized models can help achieve similar results. Implementing practices such as model quantization, gradient accumulation, and efficient hardware utilization further optimizes resource usage during training and inference.
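As an example of one such practice, the sketch below shows gradient accumulation in PyTorch: gradients from several small micro-batches are accumulated before each optimizer step, simulating a larger batch size on memory-constrained hardware. The toy model and data are placeholders for illustration only.

```python
# Gradient accumulation: accumulate gradients over N micro-batches, then step.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)                        # stand-in for a much larger model
loss_fn = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

accumulation_steps = 8                          # effective batch = micro-batch size * 8
micro_batches = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(32)]

model.train()
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(micro_batches):
    loss = loss_fn(model(inputs), targets)
    (loss / accumulation_steps).backward()      # scale so accumulated gradients average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                        # update weights only every N micro-batches
        optimizer.zero_grad()
```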
Start small and focus on specific use cases: Instead of aiming for large-scale LLM projects, start with smaller, focused tasks that can be addressed with readily available resources. This allows you to build expertise and demonstrate the value of LLMs within your organization.
Seek collaborations and partnerships: Partner with external experts and cloud providers with access to computational resources. This allows you to leverage their infrastructure and knowledge for specific projects.
Advocate for responsible AI development: Regardless of resource constraints, it's crucial to prioritize responsible AI development. Implement ethical guidelines for data collection and model usage, work to mitigate potential biases, and focus on data privacy, fairness, and transparency within your LLM projects, building trust and demonstrating ethical commitment.
By following these steps and adapting them to your unique circumstances, you can make LLMs more accessible, even with limited resources.