
Inverting Language Models: Key Takeaways from an NLP Seminar

Pradeep Pujari
5 min read · Aug 8, 2024

Introduction:

Before the presentation started, I was wondering whether I could relate this to "inverting a binary tree," where each left node becomes the corresponding right node and vice versa. It turns out that is not the case.

Systems that utilize large language models (LLMs) often store auxiliary data in a vector database of dense embeddings (Borgeaud et al., 2022; Yao et al., 2023).

text to embedding
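To make that text-to-embedding step concrete, here is a minimal sketch of embedding a few documents and storing the vectors, assuming the sentence-transformers library and a local FAISS index; the model name and documents are purely illustrative, and the same pattern applies to hosted vector databases such as Pinecone or Weaviate.

```python
# Minimal sketch: embed text and store only the dense vectors in a FAISS index.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "Patient was prescribed 20mg of atorvastatin.",
    "Quarterly revenue grew 12% year over year.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")    # text -> dense embedding
embeddings = model.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])     # inner product == cosine on normalized vectors
index.add(embeddings)                              # only the vectors are stored, not the raw text

# At query time, the service searches by embedding similarity.
query_vec = model.encode(["statin prescription"], normalize_embeddings=True)
scores, ids = index.search(query_vec, k=1)
print(ids[0][0], scores[0][0])
```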

Vector databases are increasingly popular, but the privacy threats they pose have not been comprehensively explored. Can a third-party service reproduce the original text, given only its embedding? In next-generation search applications, we store embeddings of the text in a vector database. If those embeddings sit with a third party, what can a bad actor do with them (the threat model)? At first glance, inversion looks very hard, for two reasons: a) the data processing inequality, an information-theoretic result stating that the information content of a signal cannot be increased by local processing, or, concisely, "post-processing cannot increase information"; and b) embedding models are trained to maximize similarity between similar pieces of text, not to preserve the exact input.
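To illustrate the threat model, here is a rough sketch of the simplest inversion attempt: an attacker who holds a leaked embedding and has black-box access to the same encoder scores candidate texts by cosine similarity and keeps the closest one. The model name and candidate sentences are assumptions for illustration only; published inversion attacks go much further, iteratively refining a hypothesis text with a trained corrector model rather than searching a fixed candidate list.

```python
# Rough sketch of the naive inversion baseline: score candidate texts against a
# leaked embedding and keep the most similar one. Candidates are made up for the demo.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embedding exfiltrated from the vector database (here we simply compute it for the demo).
secret_text = "Patient was prescribed 20mg of atorvastatin."
leaked_vec = model.encode([secret_text], normalize_embeddings=True)[0]

candidates = [
    "The patient takes a statin medication.",
    "Patient was prescribed 20mg of atorvastatin.",
    "Quarterly revenue grew 12% year over year.",
]
cand_vecs = model.encode(candidates, normalize_embeddings=True)

scores = cand_vecs @ leaked_vec          # cosine similarity, since vectors are normalized
best = int(np.argmax(scores))
print(candidates[best], float(scores[best]))
```

The sketch only conveys the intuition: similarity to the leaked vector is the attacker's guiding signal, so the maximum-similarity training objective in point b) cuts both ways.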



Written by Pradeep Pujari

AI Researcher, Author, Founder of TensorHealth-NewsLetter, ex-Meta
