~www_lesswrong_com | Bookmarks (706)

Forecasting AGI: Insights from Prediction Markets and Metaculus — LessWrong

lesswrong.com

Published on February 4, 2025 1:03 PM GMT
I have tried to find all prediction market and Metaculus questions related to AGI timelines. Here I examine how they compare to each other, and what they actually say about when AGI might arrive. If you know of a market that I have missed, please tell me in the comment section! It would also be helpful if you...
1
Ruling Out Lookup Tables — LessWrong

lesswrong.com

Published on February 4, 2025 10:39 AM GMT
This post was written during Alex Altair's agent foundations fellowship funded by the LTFF. This is not particularly surprising or complex, but I wanted it written up somewhere. When we have been discussing the Agent-like Structure problem, lookup tables often come up as a useful counter-example or intuition pump for how a system could exhibit agent-like behaviour without...
1
Half-baked idea: a straightforward method for learning environmental goals? — LessWrong

lesswrong.com

Published on February 4, 2025 6:56 AM GMT
Epistemic status: I want to propose a method of learning environmental goals (a super big, super important subproblem in Alignment). It's informal, so it has a lot of gaps. I worry I missed something obvious, rendering my argument completely meaningless. I asked the LessWrong feedback team, but they couldn't get someone knowledgeable enough to take a look. Can you tell...
1
Information Versus Action — LessWrong

lesswrong.com

Published on February 4, 2025 5:13 AM GMT
You can get a clearer view of what's going on if you're willing to ignore certain types of information when making decisions. If you heavily use a source of information to make important decisions, that source of information gains new pressure that can make it worse. See Goodhart's Law and Why I Am Not In Charge. I. Imagine you...
1
Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method — LessWrong

lesswrong.com

Published on February 4, 2025 4:15 AM GMT
Abstract: This post presents a project from the AI Safety Fundamentals: AI Alignment course, in which I developed a utilitarian AI assistant using the Constitutional AI (CAI) method. By applying the CAI method, the project constructs a model that strictly follows utilitarian principles, and evaluates its responses across a diverse set of 150 prompts ranging from harmful to...
1
Tear Down the Burren — LessWrong

lesswrong.com

Published on February 4, 2025 3:40 AM GMT
I love the Burren. It hosts something like seven weekly sessions in a range of styles and the back room has hosted many great acts including many of my friends. It's a key space in the Boston folk scene, and it's under threat from developers who want to tear it down. But after thinking it through,...
1
Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog) — LessWrong

lesswrong.com

Published on February 4, 2025 2:55 AM GMT
Excerpt below. Follow the link for the full post. In our new paper, we describe a system based on Constitutional Classifiers that guards models against jailbreaks. These Constitutional Classifiers are input and output classifiers trained on synthetically generated data that filter the overwhelming majority of jailbreaks with minimal over-refusals and without incurring a large compute overhead. We are currently...
1
Can someone, anyone, make superintelligence a more concrete concept? — LessWrong

lesswrong.com

Published on February 4, 2025 2:18 AM GMT
What especially worries me about artificial intelligence is that I'm freaked out by my inability to marshal the appropriate emotional response. - Sam Harris (NPR, 2017)
I've been thinking a lot about why so many in the public don't care much about the loss of control risk posed by artificial superintelligence, and I believe a big reason is that...
2
Gradual Disempowerment, Shell Games and Flinches — LessWrong

lesswrong.com

Published on February 2, 2025 2:47 PM GMT
Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy, and states, or...
1
Thoughts on Toy Models of Superposition — LessWrong

lesswrong.com

Published on February 2, 2025 1:52 PM GMT
This post explores some of the intuitions I developed whilst reading Anthropic’s Toy Models of Superposition paper. I focus on motivating the shape of the model and interpreting the visualisations used in the paper. Their accompanying article is thoughtfully written, and I'd highly recommend reading it if you haven’t already. I make no claims that this blog...
1
Escape from Alderaan I — LessWrong

lesswrong.com

Published on February 2, 2025 10:48 AM GMT
This is Part 4 of a Star Wars fanfiction I began writing 6 months ago. The order you read it in is very important. Start from the beginning for the best experience and to avoid spoilers. Obi-Wan Kenobi bought passage from a scumbag with a confusing name. Luke couldn't tell whether Han's name was pronounced /hɑːn/ of...
1
ChatGPT: Exploring the Digital Wilderness, Findings and Prospects — LessWrong

lesswrong.com

Published on February 2, 2025 9:54 AM GMT
This is a cross-post from New Savanna. That is the title of my latest working paper. It summarizes and synthesizes much of the work I have done with ChatGPT to date and contains the abstracts and contents of all the working papers I have done on ChatGPT. It also includes the abstracts and contents of a number of...
1
Would anyone be interested in pursuing the Virtue of Scholarship with me? — LessWrong

lesswrong.com

Published on February 2, 2025 4:02 AM GMT
I'm a new undergraduate student essentially taking a gap year (it's complicated). I'm looking for someone who would be interested in studying various fields of science and mathematics with me. I've taken through Calculus 2, know how to program in Python, and know a smattering about mechanics/probability/statistics/biology (though probably more at an introductory undergraduate level, if that). The curriculum...
1
Seasonal Patterns in BIDA's Attendance — LessWrong

lesswrong.com

Published on February 2, 2025 2:40 AM GMT
We've been keeping attendance statistics for BIDA since 2011, and looking at the online chart I noticed some patterns in the moving average that looked seasonal: How seasonal is it? First, here are the raw attendance values: This excludes special events (family dances, double dances, bonus dances, Spark in the Dark) and our new 4th Sunday...
1
AI acceleration, DeepSeek, moral philosophy — LessWrong

lesswrong.com

Published on February 2, 2025 12:08 AM GMT
(Cross-posted from my website) So, AI is accelerating. Does humanity have a future, and if so, what? I consider myself a logical person, but this is not a logical piece of writing. It’s an attempt to share the contours of what I feel when I feel my thoughts about the future of AI. I’m aiming to evoke, not to...
2
Falsehoods you might believe about people who are at a rationalist meetup — LessWrong

lesswrong.com

Published on February 1, 2025 11:32 PM GMT
I go to a lot of rationalist meetups. I quite enjoy them, and it’s often because of the people who go to the meetups. There’s a number of assumptions you might have about people who go to rationalist meetups, and many of them are mostly true. However, there is a difference between most examples and all examples. Most...
1
Rationalist Movie Reviews — LessWrong

lesswrong.com

Published on February 1, 2025 11:10 PM GMT
Primer (2004) Content warnings here What would actually happen, in real life (circa 2004), if two typical techie engineers invented time travel by accident? Primer is odd for being both slow-burning and fast-paced. It's short and self-contained and low-budget, yet majestic and complicated and, er, "recursive". I haven't eaten lunch since later this afternoon. The parallels with...
1
Retroactive If-Then Commitments — LessWrong

lesswrong.com

Published on February 1, 2025 10:22 PM GMT
An if-then commitment is a framework for responding to AI risk: "If an AI model has capability X, then AI development/deployment must be halted until mitigations Y are put in place." As an extension of this approach, we should consider retroactive if-then commitments. We should behave as if we wrote if-then commitments a few years ago, and...
1
Poetic Methods I: Meter as Communication Protocol — LessWrong

lesswrong.com

Published on February 1, 2025 6:22 PM GMT
(Normally I cross-post the full post from Substack, but in this case Substack has significantly better poetry formatting, so I am just quoting the introduction.) During my Christmas break, I found a doorway into the structure and intricacies of English poetry: the book "Rhyme's Rooms: The Architecture of Poetry" by Brad Leithauser. I had read English verse before, and skimmed...
1
Blackpool Applied Rationality Unconference 2025 — LessWrong

lesswrong.com

Published on February 1, 2025 2:26 PM GMT
TL;DR Join us for 4 days of applied rationality workshops and activities at the EA Hotel this April. Apply here by 22nd February. In April we’re hosting an intimate 4-day applied rationality unconference/retreat hosted at CEEALAR (aka the EA Hotel). Come along and make new friends while dedicating time to working through your life’s biggest challenges using CFAR and CFAR-adjacent techniques. Previous years’...
1
Blackpool Applied Rationality Unconference 2025 — LessWrong

lesswrong.com

Published on February 1, 2025 2:09 PM GMT
TL;DR Join us for 4 days of applied rationality workshops and activities at the EA Hotel this April. Apply here by 22nd February. In April we’re hosting an intimate 4-day applied rationality unconference/retreat hosted at CEEALAR (aka the EA Hotel). Come along and make new friends while dedicating time to working through your life’s biggest challenges using CFAR and CFAR-adjacent techniques. Previous years’...
1
How likely is an attempted coup in the United States in the next four years? — LessWrong

lesswrong.com

Published on February 1, 2025 1:12 PM GMT
Trump cares a lot about personal power. He does not (to put it lightly) seem to have much respect for tradition or the rule of law. He has 'joked' about a third term before. Many of the traditional safeguards are disappearing: Trump has fired many army officers, senior FBI leaders, and inspectors general, replacing them with loyalists. Many...
1
Blackpool Applied Rationality Unconference 2025 — LessWrong

lesswrong.com

Published on February 1, 2025 1:04 PM GMT
TL;DR Join us for 4 days of applied rationality workshops and activities at the EA Hotel this April. Apply here by 22nd February. In April we’re hosting an intimate 4-day applied rationality unconference/retreat hosted at CEEALAR (aka the EA Hotel). Come along and make new friends while dedicating time to working through your life’s biggest challenges using CFAR and CFAR-adjacent techniques. Previous years’...
1
One-dimensional vs multi-dimensional features in interpretability — LessWrong

lesswrong.com

Published on February 1, 2025 9:10 AM GMT
Chris Olah's “What is a Linear Representation? What is a Multidimensional Feature?” (July Circuits Update) prompted a moment of pause for me regarding the term "one-dimensional feature." I initially conflated that phrase with the number of dimensions in the activation space (for example, the 768 dimensions in GPT‑2 Small). However, Olah uses "one-dimensional" to describe a property...
1
