manishs_scratchpad
| manishs_scratchpad [2024/03/05 19:22] – removed - external edit (Unknown date) 127.0.0.1 | manishs_scratchpad [2024/03/05 19:22] (current) – ↷ Page name changed from manish_scratchpad to manishs_scratchpad manish | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ===== Manish' | ||
| + | This is a scratchpad for Manish to save things until he figures out where the contents should go. | ||
| + | |||
| + | |||
| + | ===== ACT: The Road To Honest AI ===== | ||
| + | Source: [[https:// | ||
| + | |||
| + | Notes: | ||
| + | * This talks about how we can try to make AI more " | ||
| + | * It discusses the paper [[https:// | ||
| + | * It talks about determining a baseline by asking a model to answer truthfully and then to lie about the same topic, and then comparing the neuron activations in the two cases to see if you can find a vector that represents truth. | ||
| + | * If you artificially modify the activations by adding or subtracting **the honesty vector**, you can make the model tell the truth or lie almost independently of the prompt. {{ : | ||
| + | * The paper shows similar effects by identifying a vector for **immorality, | ||
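
To make the notes above concrete, here is a toy sketch of the "find a vector, then add or subtract it" idea. It uses made-up activation data and a plain difference of means as a stand-in, which is not necessarily the method the paper uses. The names `honesty_direction` and `steer`, the hidden size, and the synthetic data are all assumptions for illustration only.

```python
import numpy as np


def honesty_direction(truthful_acts, lying_acts):
    """Unit vector pointing from the mean 'lying' activation to the mean 'truthful' one."""
    diff = truthful_acts.mean(axis=0) - lying_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(activation, direction, strength):
    """Shift one activation vector along the direction (negative strength subtracts it)."""
    return activation + strength * direction


# Synthetic stand-in for hidden-layer activations (hypothetical data, not from a real model).
rng = np.random.default_rng(0)
hidden = 16
true_dir = np.zeros(hidden)
true_dir[3] = 1.0  # toy assumption: "honesty" lives on a single coordinate

noise = rng.normal(0.0, 0.1, size=(200, hidden))
truthful = noise[:100] + 2.0 * true_dir  # activations recorded on "answer truthfully" prompts
lying = noise[100:] - 2.0 * true_dir     # activations recorded on "lie" prompts

v = honesty_direction(truthful, lying)

# Read out how "honest" one lying-prompt activation looks before and after steering.
sample = lying[0]
score_before = float(sample @ v)
score_after = float(steer(sample, v, 4.0) @ v)
print(score_before < 0.0, score_after > 0.0)
```

In a real model the two activation sets would come from running paired truthful and lying prompts through the network and recording a chosen layer, and the steering step would be applied during the forward pass rather than to a stored vector.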
