Express Computer
Yandex develops new methods for compressing LLMs, cutting AI deployment costs by up to 8 times

The Yandex research team, in collaboration with researchers from IST Austria, NeuralMagic, and KAUST, has developed two compression methods for large language models (LLMs): Additive Quantisation for Language Models (AQLM) and PV-Tuning. When combined, these methods reduce model size by up to eight times while preserving 95 percent of response quality. The methods aim to optimise resource use and make running LLMs more efficient.

Key features of AQLM and PV-Tuning

AQLM applies additive quantisation, a technique traditionally used in information retrieval, to LLM compression. The resulting method preserves and even improves model accuracy under extreme compression. It significantly reduces memory consumption, making it possible to deploy LLMs on everyday devices such as home computers and smartphones. PV-Tuning, in turn, addresses errors that may arise during the model compression process. Combined, AQLM and PV-Tuning produce compact models capable of providing high-quality responses even on limited computing resources.
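
To make the idea of additive quantisation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the tiny hand-written codebooks, the greedy encoder, and the function names `decode_group`, `encode_greedy`, and `bits_per_weight` are all hypothetical, and the published method learns its codebooks and uses a more elaborate search. The sketch shows only the core representation, in which a small group of weights is stored as a few integer codes and reconstructed as the sum of one vector from each codebook.

```python
def decode_group(codes, codebooks):
    """Rebuild one weight group as the sum of one codeword per codebook."""
    dim = len(codebooks[0][0])
    group = [0.0] * dim
    for book, index in zip(codebooks, codes):
        for j in range(dim):
            group[j] += book[index][j]
    return group


def encode_greedy(group, codebooks):
    """Choose codes one codebook at a time, each fitting the remaining residual.

    Assumption: greedy selection is a simplification used for illustration only.
    """
    residual = list(group)
    codes = []
    for book in codebooks:
        best = min(
            range(len(book)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, book[i])),
        )
        codes.append(best)
        residual = [r - c for r, c in zip(residual, book[best])]
    return codes


def bits_per_weight(num_codebooks, bits_per_code, group_size):
    """Storage cost of the codes alone, ignoring the codebooks themselves."""
    return num_codebooks * bits_per_code / group_size


# Toy codebooks (hypothetical values): two codebooks, two 2-dimensional codewords each.
toy_codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [-0.5, -0.5]],
]

codes = encode_greedy([0.6, -0.4], toy_codebooks)
approx = decode_group(codes, toy_codebooks)
print(codes, approx)
```

With one 16-bit code per group of eight weights (an assumed configuration), `bits_per_weight(1, 16, 8)` gives 2 bits per weight, which is eight times smaller than 16-bit storage if the codebooks themselves are ignored.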

Method evaluation and recognition

The effectiveness of the methods was assessed using popular open-source models, including Llama 2, Llama 3, and Mistral. Researchers compressed these models and evaluated answer quality against two English-language benchmarks, WikiText2 and C4. The models retained 95 percent of answer quality when compressed eightfold.

Who can benefit from AQLM and PV-Tuning

The new methods offer substantial resource savings for companies that develop and deploy proprietary language models and open-source LLMs. For instance, the Llama 2 model with 13 billion parameters can, after compression, run on one GPU instead of four, reducing hardware costs by up to eight times. This means that startups, individual researchers, and LLM enthusiasts can run advanced LLMs such as Llama on their everyday computers.
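
The arithmetic behind an eightfold reduction can be sketched roughly as follows. The 16-bit baseline and the 2-bit compressed format are assumptions chosen for illustration, since the article states only the overall ratio. The helper `weight_memory_gb` is hypothetical, and the estimate counts weights alone, leaving out activations, caches, and any codebook overhead.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights only, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 13_000_000_000  # 13 billion parameters, as in the Llama 2 example

baseline_gb = weight_memory_gb(NUM_PARAMS, 16)   # assumed 16-bit baseline
compressed_gb = weight_memory_gb(NUM_PARAMS, 2)  # assumed 2 bits per weight
ratio = baseline_gb / compressed_gb

print(f"baseline:   {baseline_gb} GB")
print(f"compressed: {compressed_gb} GB")
print(f"reduction:  {ratio}x")
```

Under these assumptions the weights shrink from about 26 GB to about 3.25 GB, which is the kind of difference that decides whether a model fits in a single GPU's memory.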

Exploring new LLM applications

AQLM and PV-Tuning make it possible to deploy models offline on devices with limited computing resources, enabling new use cases for smartphones, smart speakers, and more. With advanced LLMs integrated into these devices, users can access text and image generation, voice assistance, personalised recommendations, and even real-time language translation without an active internet connection. Moreover, models compressed using the methods can operate up to four times faster, as they require fewer computations.

Implementation and access

Developers and researchers worldwide can already use AQLM and PV-Tuning, which are available on GitHub. Demo materials provided by the authors offer guidance for effectively training compressed LLMs for various applications. Additionally, developers can download popular open-source models that have already been compressed using the methods.
