~www_lesswrong_com | Bookmarks (706)

Systematic Sandbagging Evaluations on Claude 3.5 Sonnet — LessWrong

lesswrong.com

Published on February 14, 2025 1:22 AM GMT
This was the project I worked on during BlueDot Impact's AI Safety Fundamentals Alignment course, which expands on findings from Meinke et al.'s "Frontier Models are Capable of In-context Scheming".
Summary
A dataset of 1,011 variations of the sandbagging prompt ("consequences") from Meinke et al. was generated using Claude 3.5 Sonnet and used to run 7 sandbagging evaluations (2...

Notes on the Presidential Election of 1836 — LessWrong

lesswrong.com

Published on February 13, 2025 11:40 PM GMT
In 1836, Andrew Jackson had served two terms. In the presidential election, incumbent vice president Martin Van Buren defeated several Whig candidates.
Historical Background
By 1836, there were 25 states. States were often added in pairs (one slave and one free) to maintain political balance: Mississippi and Indiana, Alabama and Illinois, Missouri and Maine. Arkansas had just been added...

I'm making a ttrpg about life in an intentional community during the last year before the Singularity — LessWrong

lesswrong.com

Published on February 13, 2025 9:54 PM GMT
Hi there! I'm Thomas Eliot. You may remember me from the Bay Area Rationalist Community, or the one in New York, or the one in Melbourne. I'm writing a semi-autobiographical roleplaying game called THE SINGULARITY WILL HAPPEN IN LESS THAN A YEAR, inspired by The Quiet Year by Avery Alder, about life in a barely fictionalized intentional community during...

The Paris AI Anti-Safety Summit — LessWrong

lesswrong.com

Published on February 12, 2025 2:00 PM GMT
It doesn’t look good. What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI Safety. This one was centrally coordination against AI Safety. In November 2023, the UK Bletchley Summit on AI Safety set out to let nations coordinate in the hopes that AI might not kill...

Inside the dark forests of the internet — LessWrong

lesswrong.com

Published on February 12, 2025 10:20 AM GMT
This is the second part of a series on the identity of social networks:
Part one: Looking for humanness in the world wide social
Part two: Inside the dark forests of the internet
If you’ve been hanging around long enough in the tech-intellectual corner of the internet, you’re probably acquainted with The Theory of The Dark Forest of the Internet, which was published...

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs — LessWrong

lesswrong.com

Published on February 12, 2025 9:15 AM GMT

Why you maybe should lift weights, and How to. — LessWrong

lesswrong.com

Published on February 12, 2025 5:15 AM GMT
Who is this post for? Someone who either:
Wonders if they should start lifting weights, and could be convinced to do so.
Wants to lift weights, and doesn't know where to begin. If this is you, you can skip this first section, though I'm guessing you don't know all the benefits yet.
The WHY
Benefits of ANY EXERCISE:
Great mental benefits. I...

If Neuroscientists Succeed — LessWrong

lesswrong.com

Published on February 11, 2025 3:33 PM GMT
Introduction
In the Spring of 2022, Stuart Russell wrote an essay entitled If We Succeed, in which he questioned whether and how the field of AI might need to pivot from its historical goal of creating general-purpose intelligence to a new goal, of creating intelligence that would be provably beneficial for humans. He noted that although the former...

Where Would Good Forecasts Most Help AI Governance Efforts? — LessWrong

lesswrong.com

Published on February 11, 2025 6:15 PM GMT
Thanks to Josh Rosenberg for comments and discussion.
Introduction
One of LessWrong’s historical troves is its pre-ChatGPT AGI forecasts. Not just for the specific predictions people offered, but for observing which sorts of generative processes produced which kinds of forecasts. For instance:
[Nuno (Median AGI Timeline) = 2072]: “I take as a starting point datscilly's own prediction, i.e., the result of applying...

AI Safety at the Frontier: Paper Highlights, January '25 — LessWrong

lesswrong.com

Published on February 11, 2025 4:14 PM GMT
This is the selection of AI safety papers from my blog "AI Safety at the Frontier". The selection primarily covers ML-oriented research and frontier models. It's primarily concerned with papers (arXiv, conferences, etc.).
tl;dr
Paper of the month:
Constitutional Classifiers demonstrate a promising defense against universal jailbreaks by using synthetic data and natural language rules.
Research highlights:
Human-AI teams face challenges in...

The News is Never Neglected — LessWrong

lesswrong.com

Published on February 11, 2025 2:59 PM GMT
Dear Lsusr,
I am inspired by your stories about Effective Evil. My teachers at school tell me it is my civic responsibility to watch the news. Should I reverse this advice? Or should I watch the news like everyone else, except use what I learn for evil?
Sincerely,
[redacted]
Dear [redacted],
If you want to make an impact on the world, then...

The AI Safety Approach in the Era of Open-Source AI — LessWrong

lesswrong.com

Published on February 11, 2025 2:01 PM GMT
Open-Source AI Undermines Traditional AI Safety Approach
In recent years, the mainstream approach to AI safety has been "AI alignment + access control." In simple terms, this means allowing a small number of regulated organizations to develop the most advanced AI systems, ensuring that these AIs' goals are aligned with human values, and then strictly controlling access...

What About The Horses? — LessWrong

lesswrong.com

Published on February 11, 2025 1:59 PM GMT
In a previous post, I argued that AGI would not make human labor worthless.
One of the most common responses was to ask about the horses. Technology resulted in mass unemployment and population collapse for horses even though they must have had some comparative advantage with more advanced engines. Why couldn’t the same happen to humans? For example,...

On Deliberative Alignment — LessWrong

lesswrong.com

Published on February 11, 2025 1:00 PM GMT
Not too long ago, OpenAI presented a paper on their new strategy of Deliberative Alignment. The way this works is that they tell the model what its policies are and then have the model think about whether it should comply with a request. This is an important transition, so this post will go over my perspective on...

Detecting AI Agent Failure Modes in Simulations — LessWrong

lesswrong.com

Published on February 11, 2025 11:10 AM GMT
AI agents have become significantly more common in the last few months. They’re used for web scraping[1][2], robotics and automation[3], and are even being deployed for military use[4]. As we integrate these agents into critical processes, it is important to simulate their behavior in low-risk environments.
In this post, I’ll break down how I used Minecraft to discover and then...

World Citizen Assembly about AI - Announcement — LessWrong

lesswrong.com

Published on February 11, 2025 10:51 AM GMT

Visual Reference for Frontier Large Language Models — LessWrong

lesswrong.com

Published on February 11, 2025 5:14 AM GMT
Hopefully this can be a helpful visual reference for the development and features of frontier large language models in the last year-ish. We are always open to feedback on how the reference could be improved.
FAQ:
Q: Which models/companies are included?
A: We include LLMs that are noteworthy in capabilities, price, or tech advancement.
Q: What constitutes a new model versus...

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion? — LessWrong

lesswrong.com

Published on February 11, 2025 12:20 AM GMT

Forecasting newsletter #2/2025: Forecasting meetup network — LessWrong

lesswrong.com

Published on February 9, 2025 6:07 PM GMT
Highlights
Forecasting meetup network (a) looking for volunteers. If you want to host a meetup in your city, send an email to forecastingmeetupnetwork@gmail.com.
Caroline Pham moves up to Chairman of the CFTC. She is much friendlier to prediction markets and has spent years writing dissents against regulatory overreach.
“Yunaplan for liquidity” makes a subtle but very neat mechanism change for Manifold...

How identical twin sisters feel about nieces vs their own daughters — LessWrong

lesswrong.com

Published on February 9, 2025 5:36 PM GMT
(Cross-posted from https://mugwumpery.com/how-identical-twin-sisters-feel-about-nieces-vs-their-own-daughters/)
It seems to be generally assumed that twin sisters feel the same way as other sisters: closer to their own children.
But per Hamilton/Trivers, they shouldn’t. They should feel equally related and care equally about daughters and nieces.
Identical twins share 100% of their genes, and their nieces are just as closely related as their...

Two hemispheres - I do not think it means what you think it means — LessWrong

lesswrong.com

Published on February 9, 2025 3:33 PM GMT
I am going to address some misconceptions about brain hemispheres, both in popular culture and in Zizian theory. The latter, because the madness must stop. The former, because it provided a foundation for the latter.
*
About 99% of animals are bilaterally symmetric: the left side and the right side of the body are approximately each other's mirror...

The Structure of Professional Revolutions — LessWrong

lesswrong.com

Published on February 9, 2025 1:23 PM GMT
An expert is not merely someone who has memorized data but someone who has internalized the structure of knowledge itself. This is why we call them PhDs (Doctors of Philosophy). Their expertise extends beyond isolated facts to the organizing principles that connect those facts, allowing them to wield knowledge in novel ways. While this definition is not airtight,...

Gary Marcus now saying AI can't do things it can already do — LessWrong

lesswrong.com

Published on February 9, 2025 12:24 PM GMT
In January 2020, Gary Marcus wrote GPT-2 And The Nature Of Intelligence, demonstrating a bunch of easy problems that GPT-2 couldn’t get right.
He concluded these were “a clear sign that it is time to consider investing in different approaches.”
Two years later, GPT-3 could get most of these right.
Marcus wrote a new list of 15 problems GPT-3 couldn’t solve,...

How do you make a 250x better vaccine at 1/10 the cost? Develop it in India. — LessWrong

lesswrong.com

Published on February 9, 2025 3:53 AM GMT
(I made a vaccinology/policy-based podcast! A very long one! If you'd like to avoid the summary below, here is the YouTube link and Substack link.)
Summary: There's a lot of discussion these days on how China's biotech market is on track to bypass the US's. I wondered: shouldn't we have observed the exact same phenomenon with India? It...