www.lesswrong.com | Bookmarks (669)
-
How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage — LessWrong
Published on August 6, 2024 2:32 AM GMT. This is a story about a flawed Manifold market,...
-
John Schulman leaves OpenAI for Anthropic — LessWrong
Published on August 6, 2024 1:23 AM GMT. Schulman writes: I shared the following note with my OpenAI...
-
Self-explaining SAE features — LessWrong
Published on August 5, 2024 10:20 PM GMT. TL;DR: We apply the method of SelfIE/Patchscopes to explain SAE...
-
Value fragility and AI takeover — LessWrong
Published on August 5, 2024 9:28 PM GMT. 1. Introduction: “Value fragility,” as I’ll construe it, is the...
-
Madrid - ACX Meetups Everywhere Fall 2024 — LessWrong
Published on August 5, 2024 6:36 PM GMT. This year's Fall ACX Meetup in Madrid. Location: El Retiro...
-
Circular Reasoning — LessWrong
Published on August 5, 2024 6:10 PM GMT. The idea that circular reasoning is bad is widespread....
-
Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours — LessWrong
Published on August 5, 2024 3:38 PM GMT. Vitalik Buterin wrote an impactful blog post, My techno-optimism....
-
Four Phases of AGI — LessWrong
Published on August 5, 2024 1:15 PM GMT. AGI is not discrete, and different phases lead to...
-
AI Safety at the Frontier: Paper Highlights, July '24 — LessWrong
Published on August 5, 2024 1:00 PM GMT. I'm starting a new blog where I post my...
-
Game Theory and Society — LessWrong
Published on August 5, 2024 4:27 AM GMT. Game theory is a branch of mathematics that deals...
-
Near-mode thinking on AI — LessWrong
Published on August 4, 2024 8:47 PM GMT. There is a stark difference between rehearsing classical AI...
-
We’re not as 3-Dimensional as We Think — LessWrong
Published on August 4, 2024 2:39 PM GMT. While thinking about high-dimensional spaces and their less intuitive properties,...
-
You don't know how bad most things are nor precisely how they're bad. — LessWrong
Published on August 4, 2024 2:12 PM GMT. TL;DR: Your discernment in a subject often improves as...
-
Can We Predict Persuasiveness Better Than Anthropic? — LessWrong
Published on August 4, 2024 2:05 PM GMT. There is an interesting paragraph in Anthropic's most recent...
-
Why do Minimal Bayes Nets often correspond to Causal Models of Reality? — LessWrong
Published on August 3, 2024 12:39 PM GMT. Chapter 2 of Pearl's Causality book claims you can...
-
PIZZA: An Open Source Library for Closed LLM Attribution (or “why did ChatGPT say that?”) — LessWrong
Published on August 3, 2024 12:07 PM GMT. From the research & engineering team at Leap Laboratories...
-
Cooperation and Alignment in Delegation Games: You Need Both! — LessWrong
Published on August 3, 2024 10:16 AM GMT. This work was facilitated by the Oxford AI Safety...
-
SRE's review of Democracy — LessWrong
Published on August 3, 2024 7:20 AM GMT. Day One: We've been handed this old legacy system called...
-
I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is! — LessWrong
Published on August 2, 2024 10:35 PM GMT. Basically, the user is shown a splatter of colored...
-
Evaluating Sparse Autoencoders with Board Game Models — LessWrong
Published on August 2, 2024 7:50 PM GMT. This blog post discusses a collaborative research paper on...
-
Ethical Deception: Should AI Ever Lie? — LessWrong
Published on August 2, 2024 5:53 PM GMT. Ethical Deception: Should AI Ever Lie? Personal Artificial Intelligence Assistants...
-
The Bitter Lesson for AI Safety Research — LessWrong
Published on August 2, 2024 6:39 PM GMT. Read the associated paper "Safetywashing: Do AI Safety Benchmarks Actually...
-
Request for AI risk quotes, especially around speed, large impacts and black boxes — LessWrong
Published on August 2, 2024 5:49 PM GMT. @KatjaGrace, Josh Hart, and I are finding quotes around different...
-
A Simple Toy Coherence Theorem — LessWrong
Published on August 2, 2024 5:47 PM GMT. This post presents a simple toy coherence theorem, and...