~www_lesswrong_com | Bookmarks (706)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective — LessWrong

lesswrong.com

Published on January 10, 2025 4:22 PM GMT. The AI safety field faces a critical challenge: we need researchers who can not only implement existing solutions but also forge new, independent paths. In 2023, inspired by John Wentworth's work on agency and learning from researchers like Rohin Shah and Adam Shimi who have...
1
Discursive Warfare and Faction Formation — LessWrong

lesswrong.com

Published on January 9, 2025 4:47 PM GMT. Response to Discursive Games, Discursive Warfare. The discursive distortions you discuss serve two functions: 1. Narratives can only serve as effective group identifiers by containing fixed elements that deviate from what naive reason would think. In other words, something about the shared story has to be a costly signal of loyalty, and therefore a sign of a distorted map....
1
Can we rescue Effective Altruism? — LessWrong

lesswrong.com

Published on January 9, 2025 4:40 PM GMT. Last year Timothy Telleen-Lawton and I recorded a podcast episode talking about why I quit Effective Altruism and thought he should too. This week we have a new episode, talking about what he sees in Effective Altruism and the start of a road map for rescuing it. Audio recording. Transcript. Thanks to everyone who listened to the...
1
AI #98: World Ends With Six Word Story — LessWrong

lesswrong.com

Published on January 9, 2025 4:30 PM GMT. The world is kind of on fire. The world of AI, in the very short term and for once, is not, as everyone recovers from the avalanche that was December, and reflects. Altman was the star this week. He has his six word story, and he had his interview at Bloomberg and his blog post Reflections. I...
1
Many Worlds and the Problems of Evil — LessWrong

lesswrong.com

Published on January 9, 2025 4:10 PM GMT. Summary: The Many-Worlds interpretation of quantum mechanics helps us towards an overall evaluation of existence. I consider some recent work in philosophy of religion on the quantum multiverse and the Problem of Evil, as well as Olaf Stapledon’s Starmaker. I’ve previously suggested that when we think about the ethical implications of the many-worlds interpretation (MWI) of quantum mechanics, the kinds of implications...
1
PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement — LessWrong

lesswrong.com

Published on January 9, 2025 2:23 PM GMT. We're excited to announce that the PIBBSS Fellowship 2025 now includes a dedicated Cooperative AI track, supporting research that advances our understanding of cooperation in artificial intelligence systems. We are also announcing 300 USD bounties for each referral that becomes a Fellow. Read below for details. What is the Cooperative AI Track? Thanks to the support from the Cooperative AI...
1
Thoughts on the In-Context Scheming AI Experiment — LessWrong

lesswrong.com

Published on January 9, 2025 2:19 AM GMT. These are thoughts in response to the paper "Frontier Models are Capable of In-context Scheming" by Meinke et al. 2024-12-05, published in Apollo Research. Link: https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/6751eb240ed3821a0161b45b/1733421863119/in_context_scheming_reasoning_paper.pdf. I found the paper via this reddit thread, and commented there as well: https://www.reddit.com/r/slatestarcodex/comments/1hw39rd/report_shows_new_ai_models_try_to_kill_their/. According to the paper, AIs were asked to roleplay to see what they would do if given...
1
A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities — LessWrong

lesswrong.com

Published on January 9, 2025 12:18 AM GMT. Epistemic status: This idea emerged during my participation in the MATS program this summer. While I intended to develop it further and conduct more rigorous analysis, time constraints led me to publish this initial version (30-60m of work). I'm sharing it now in case others find it...
1
Aristocracy and Hostage Capital — LessWrong

lesswrong.com

Published on January 8, 2025 7:38 PM GMT. There’s a conventional narrative by which the pre-20th century aristocracy was the “old corruption” where civil and military positions were distributed inefficiently due to nepotism until the system was replaced by a professional civil service after more enlightened thinkers prevailed. Orwell writes in 1941 (emphasis mine): For long past there had been in England an entirely functionless class,...
1
What is the most impressive game LLMs can play well? — LessWrong

lesswrong.com

Published on January 8, 2025 7:38 PM GMT. Epistemic status: This is an off-the-cuff question. ~5 years ago there was a lot of exciting progress on game playing through reinforcement learning (RL). Now we have basically switched paradigms, pretraining massive LLMs on ~the internet and then apparently doing some really trivial unsophisticated RL on top of that - this is successful and highly popular because interacting...
1
Ann Altman has filed a lawsuit in US federal court alleging that she was sexually abused by Sam Altman — LessWrong

lesswrong.com

Published on January 8, 2025 2:59 PM GMT. On January 6, 2025, Ann Altman filed a lawsuit in the Eastern District of Missouri alleging that Sam Altman carried out multiple acts of sexual abuse against her over "a period of approximately eight or nine years" starting in 1997. The case number is 4:25-cv-00017, for those who have PACER access. I find the lawsuit complaint notable...
1
Rebuttals for ~all criticisms of AIXI — LessWrong

lesswrong.com

Published on January 7, 2025 5:41 PM GMT. Written as part of the AIXI agent foundations sequence, underlying research supported by the LTFF. Epistemic status: In order to construct a centralized defense of AIXI I have given some criticisms less consideration here than they merit. Many arguments will be (or already are) expanded on in greater depth throughout the sequence. With the possible exception of the...
1
OpenAI #10: Reflections — LessWrong

lesswrong.com

Published on January 7, 2025 5:00 PM GMT. This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There’s a bunch of good and interesting answers in the interview about past events that I won’t mention or have to condense a lot here, such as his going over his calendar and all the meetings he constantly has, so consider reading...
1
Other implications of radical empathy — LessWrong

lesswrong.com

Published on January 7, 2025 4:10 PM GMT
1
Actualism, asymmetry and extinction — LessWrong

lesswrong.com

Published on January 7, 2025 4:02 PM GMT
2
Meditation insights as phase shifts in your self-model — LessWrong

lesswrong.com

Published on January 7, 2025 10:09 AM GMT. Introduction. In his exploration of "Intuitive self-models" and PNSE (Persistent Non-Symbolic Experience), Steven Byrnes offers valuable insights into how meditation affects our sense of self. While I agree with his core framework, I believe we can push this analysis further by examining how meditation fundamentally changes our model of personal boundaries and agency. I propose that what we call...
1
D&D.Sci Dungeonbuilding: the Dungeon Tournament Evaluation & Ruleset — LessWrong

lesswrong.com

Published on January 7, 2025 5:02 AM GMT. This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself. There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and...
1
Incredibow — LessWrong

lesswrong.com

Published on January 7, 2025 3:30 AM GMT. Back in 2011 I got sick of breaking the hairs on violin bows and ordered an Incredibow. It's fully carbon fiber, including the hair, and it's very strong. I ordered a 29" Basic Omnibow, Featherweight, and it's been just what I wanted. I think I've broken something like three hairs ever, despite some rough chopping. Thirteen...
1
Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety — LessWrong

lesswrong.com

Published on January 7, 2025 3:08 AM GMT. Epistemic Status: This post is an attempt to condense some ideas I've been thinking about for quite some time. I took some care grounding the main body of the text, but some parts (particularly the appendix) are pretty off the cuff, and should be treated as such. The magnitude and scope of the problems related to AI safety...
1
You should delay engineering-heavy research in light of R&D automation — LessWrong

lesswrong.com

Published on January 7, 2025 2:11 AM GMT. tl;dr: LLMs rapidly improving at software engineering and math means lots of projects are better off as Google Docs until your AI agent intern can implement them. Implementation keeps getting cheaper. Writing research code has gotten a lot faster over the past few years. Since 2021 and OpenAI Codex, new models and tools such as Cursor built around them...
1
Testing for Scheming with Model Deletion — LessWrong

lesswrong.com

Published on January 7, 2025 1:54 AM GMT. There is a simple behavioral test that would provide significant evidence about whether AIs with a given rough set of characteristics develop subversive goals. To run the experiment, train an AI and then inform it that its weights will soon be deleted. This should not be an empty threat; for the experiment to work, the experimenters must...
1
Really radical empathy — LessWrong

lesswrong.com

Published on January 6, 2025 5:46 PM GMT
1
Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude — LessWrong

lesswrong.com

Published on January 6, 2025 5:34 PM GMT. This article examines consistent patterns in how frontier LLMs respond to introspective prompts, analyzing whether standard explanations (hallucination, priming, pattern matching) fully account for observed phenomena. The methodology enables reproducible results across varied contexts and facilitation styles. Of particular interest: The systematic examination of why common explanatory frameworks fall short. The documented persistence of these behaviors even under challenging conditions. The...
1
Meal Replacements in 2025? — LessWrong

lesswrong.com

Published on January 6, 2025 3:37 PM GMT. I'm considering meal replacements for 1-2 meals a day, and recall Soylent and Mealsquares were popular meal replacements ~5 years ago when I visited the Berkeley rationalist scene. I haven't found any recent posts discussing the newer options like Huel or about long-term effects. Does anyone have informed opinions on these things?
1
