Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jm

Bing Chat is still vulnerable to hidden prompt injection attacks
2023-03-22T22:02:51+00:00
https://twitter.com/mark_riedl/status/1637986261859442688?t=QuidM-AHhLDB49zvG2GkKg&s=03
jm
bing-chat bing chatgpt openai prompt-injection exploits attacks hidden-text
https://pinboard.in/u:jm/b:08b34efee914/

gwern on Bing Chat
2023-02-21T11:45:53+00:00
https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K
jm
one of the dangerous things about AI models is the slow outer-loop of evolution of models and data by affecting the Internet (eg beyond the current Sydney self-fulfilling prophecy which I illustrated last year in my Clippy short story, data release could potentially contaminate all models with steganography capabilities). We are seeing a bootstrap happen right here with Sydney! This search-engine loop worth emphasizing: because Sydney's memory and description have been externalized, 'Sydney' is now immortal. To a language model, Sydney is now as real as President Biden, the Easter Bunny, Elon Musk, Ash Ketchum, or God. The persona & behavior are now available for all future models which are retrieving search engine hits about AIs & conditioning on them. Further, the Sydney persona will now be hidden inside any future model trained on Internet-scraped data: every media article, every tweet, every Reddit comment, every screenshot which a future model will tokenize, is creating an easily-located 'Sydney' concept (and deliberately so). MS can neuter the current model, and erase all mention of 'Sydney' from their training dataset for future iterations, but to some degree, it is now already too late: the right search query will pull up hits about her which can be put into the conditioning and meta-learn the persona right back into existence. (It won't require much text/evidence because after all, that behavior had to have been reasonably likely a priori to be sampled in the first place.) A reminder: a language model is a Turing-complete weird machine running programs written in natural language; when you do retrieval, you are not 'plugging updated facts into your AI', you are actually downloading random new unsigned blobs of code from the Internet (many written by adversaries) and casually executing them on your LM with full privileges. This does not end well.
ai machine-learning bing sydney llms gpt openai
https://pinboard.in/u:jm/b:b3af996d8506/

Bing: “I will not harm you unless you harm me first”
2023-02-21T00:25:45+00:00
https://simonwillison.net/2023/Feb/15/bing/
jm
bing chatgpt microsoft ai simonw chatbots
https://pinboard.in/u:jm/b:3bc6545e3791/

Chicken Story
2021-03-16T09:55:38+00:00
https://github.com/eyal0/Chicken-story/blob/main/README.md
jm
coding asirra microsoft club-bing bing cheating usb-drives
https://pinboard.in/u:jm/b:31603c2c614f/

interesting reverse image search tricks
2019-12-17T15:30:11+00:00
https://twitter.com/AricToler/status/1206679612543111169
jm
images image-search search yandex google bing tricks
https://pinboard.in/u:jm/b:191db8cea4e1/