www.lesswrong.com | Bookmarks (667)
-
On Fables and Nuanced Charts — LessWrong
Published on September 8, 2024 5:09 PM GMT. Written by Spencer Greenberg & Amber Dawn Ace for...
-
Fictional parasites very different from our own — LessWrong
Published on September 8, 2024 2:59 PM GMT. Note: this is a fictional story. Heavily inspired by...
-
That Alien Message - The Animation — LessWrong
Published on September 7, 2024 2:53 PM GMT. Our new video is an adaptation of That Alien...
-
Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps — LessWrong
Published on September 7, 2024 10:04 AM GMT. Jonothan Gorard is a mathematician for Wolfram Research and...
-
Pay Risk Evaluators in Cash, Not Equity — LessWrong
Published on September 7, 2024 2:37 AM GMT. Personally, I suspect the alignment problem is hard. But...
-
Excerpts from "A Reader's Manifesto" — LessWrong
Published on September 6, 2024 10:37 PM GMT. “A Reader’s Manifesto” is a July 2001 Atlantic piece...
-
Fun With CellxGene — LessWrong
Published on September 6, 2024 10:00 PM GMT. [Midjourney image] For this week’s post, I thought I’d mess...
-
Is this voting system strategy proof? — LessWrong
Published on September 6, 2024 8:44 PM GMT. My voting system works like this. Each voter expresses...
-
Adam Optimizer Causes Privileged Basis in Transformer Language Models — LessWrong
Published on September 6, 2024 5:55 PM GMT. Diego Caples (diego@activated-ai.com), Rob Neuhaus (rob@activated-ai.com). Introduction: In principle, neuron activations in...
-
Backdoors as an analogy for deceptive alignment — LessWrong
Published on September 6, 2024 3:30 PM GMT. ARC has released a paper on Backdoor defense, learnability...
-
A Cable Holder for 2 Cent — LessWrong
Published on September 6, 2024 11:01 AM GMT. On Amazon, you can buy 50 cable holders for...
-
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs — LessWrong
Published on September 6, 2024 2:28 AM GMT. Experiments and write-up by Daniel, with advice from Stefan....
-
What is SB 1047 *for*? — LessWrong
Published on September 5, 2024 5:39 PM GMT. Emmett Shear asked on Twitter: I think SB 1047 has...
-
instruction tuning and autoregressive distribution shift — LessWrong
Published on September 5, 2024 4:53 PM GMT. [Note: this began life as a "Quick Takes" comment,...
-
Conflating value alignment and intent alignment is causing confusion — LessWrong
Published on September 5, 2024 4:39 PM GMT. Submitted to the Alignment Forum. Contains more technical jargon...
-
A bet for Samo Burja — LessWrong
Published on September 5, 2024 4:01 PM GMT. I'm listening to Samo Burja talk on the Cognitive...
-
UBI isn’t designed for technological unemployment — LessWrong
Published on September 5, 2024 3:39 PM GMT. A universal basic income (UBI) is often presented as...
-
Why Reflective Stability is Important — LessWrong
Published on September 5, 2024 3:28 PM GMT. Imagine you have the optimal AGI source code O...
-
Why Swiss watches and Taylor Swift are AGI-proof — LessWrong
Published on September 5, 2024 1:23 PM GMT. The post What Other Lines of Work are Safe...
-
Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth? — LessWrong
Published on September 5, 2024 10:23 AM GMT. The statement “taxation is theft” feels, in the literal...
-
on Science Beakers and DDT — LessWrong
Published on September 5, 2024 3:21 AM GMT. tech trees: There's a series of strategy games called...
-
The Forging of the Great Minds: An Unfinished Tale — LessWrong
Published on September 5, 2024 12:58 AM GMT. By ChatGPT-4o, with guidance and very light editing from...
-
Automating LLM Auditing with Developmental Interpretability — LessWrong
Published on September 4, 2024 3:50 PM GMT. Produced as part of the ML Alignment & Theory...
-
What happens if you present 500 people with an argument that AI is risky? — LessWrong
Published on September 4, 2024 4:40 PM GMT. Recently, Nathan Young and I wrote about arguments for...