Inverting Language Models: Key Takeaways from an NLP Seminar
Introduction:
Before the presentation started, I was wondering whether I could relate this to "inverting a binary tree," where each left node becomes the corresponding right node and vice versa. But it is not so.
Systems that utilize large language models (LLMs) often store auxiliary data in a vector database of dense embeddings (Borgeaud et al., 2022; Yao et al., 2023).
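To make the setup concrete, here is a minimal sketch of such a pipeline: text is embedded and only the dense vectors are handed to the store. The `toy_embed` function below is a hypothetical stand-in for a real trained encoder (for example, a sentence-transformer); it hashes character trigrams into a normalized vector purely to illustrate the shape of the data.

```python
import hashlib
import math

def toy_embed(text, dim=64):
    """Toy stand-in for a real embedding model: hash character trigrams
    into a fixed-size dense vector, then L2-normalize. A production
    system would use a trained encoder instead."""
    vec = [0.0] * dim
    t = text.lower()
    for i in range(len(t) - 2):
        bucket = int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Minimal in-memory "vector database": only doc id -> embedding is stored.
docs = {
    "doc1": "the patient was diagnosed with type 2 diabetes",
    "doc2": "quarterly revenue grew by ten percent",
}
vector_db = {doc_id: toy_embed(text) for doc_id, text in docs.items()}

def search(query, db):
    """Return the stored id whose embedding is most similar to the query
    (dot product equals cosine similarity, since vectors are normalized)."""
    q = toy_embed(query)
    return max(db, key=lambda doc_id: sum(x * y for x, y in zip(q, db[doc_id])))

print(search("diabetes diagnosis for the patient", vector_db))
```

The key point for what follows: the database holds only the vectors, and the question is how much of the original text those vectors give away.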
Vector databases are increasingly popular, but the privacy threats within them have not been comprehensively explored. Can a third-party service reproduce the original text, given only its embedding? In next-generation search applications, we store token embeddings in a vector database. So if embeddings are stored in a vector DB, what can a bad actor do with them (the threat model)? Inversion appears very hard for two reasons:
a) The data processing inequality, an information-theoretic concept which states that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as "post-processing cannot increase information."
b) Embedding models are trained to have maximum similarity between two similar pieces of text, so many distinct inputs can map to nearby embeddings.
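Even without full inversion, the threat model is easy to sketch: an attacker who obtains only the stored vectors can score guessed candidate sentences against them and see which guess matches. Everything below is a toy illustration under that assumption; `toy_embed` is a hypothetical trigram-hashing stand-in for a real encoder, not an actual attack on a production embedding model.

```python
import hashlib
import math

def toy_embed(text, dim=64):
    # Hypothetical stand-in for a real embedding model: hash character
    # trigrams into a normalized dense vector.
    vec = [0.0] * dim
    t = text.lower()
    for i in range(len(t) - 2):
        vec[int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# The attacker sees ONLY the leaked vector, never the original text.
leaked_vector = toy_embed("patient was diagnosed with diabetes")

# Attack: rank guessed candidates by cosine similarity to the leaked vector.
candidates = [
    "the meeting is scheduled for friday",
    "patient was diagnosed with diabetes",
    "quarterly revenue grew by ten percent",
]
best_guess = max(
    candidates,
    key=lambda c: sum(x * y for x, y in zip(leaked_vector, toy_embed(c))),
)
print(best_guess)
```

This candidate-matching game is the weakest form of the threat; the seminar's subject is the stronger question of reconstructing text from an embedding with no candidate list at all.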