www.lesswrong.com | Bookmarks (706)
-
A problem shared by many different alignment targets — LessWrong
Published on January 15, 2025 2:22 PM GMT. The first section describes problems with a few different...
-
LLMs for language learning — LessWrong
Published on January 15, 2025 2:08 PM GMT. My current outlook on LLMs is that they are...
-
Feature request: comment bookmarks — LessWrong
Published on January 15, 2025 6:45 AM GMT. Sometimes I see a comment I'd like to bookmark,...
-
How do fictional stories illustrate AI misalignment? — LessWrong
Published on January 15, 2025 6:11 AM GMT. This is an article in the featured articles series...
-
We probably won't just play status games with each other after AGI — LessWrong
Published on January 15, 2025 4:56 AM GMT. There is a view I’ve encountered somewhat often,[1] which can...
-
Progress links and short notes, 2025-01-13 — LessWrong
Published on January 13, 2025 6:35 PM GMT. Much of this content originated on social media. To follow...
-
Better antibodies by engineering targets, not engineering antibodies (Nabla Bio) — LessWrong
Published on January 13, 2025 3:05 PM GMT. Note: Thank you to Surge Biswas (founder of Nabla...
-
Emergent effects of scaling on the functional hierarchies within large language models — LessWrong
Published on January 13, 2025 2:31 PM GMT. Note: I am a postdoc in fMRI neuroscience. I...
-
Zvi’s 2024 In Movies — LessWrong
Published on January 13, 2025 1:40 PM GMT. Now that I am tracking all the movies I...
-
Paper club: He et al. on modular arithmetic (part I) — LessWrong
Published on January 13, 2025 11:18 AM GMT. In this post we’ll be looking at the recent...
-
Moderately More Than You Wanted To Know: Depressive Realism — LessWrong
Published on January 13, 2025 2:57 AM GMT. Depressive realism is the idea that depressed people have...
-
Applying traditional economic thinking to AGI: a trilemma — LessWrong
Published on January 13, 2025 1:23 AM GMT. Traditional economic thinking has two strong principles, each based...
-
Do Antidepressants work? (First Take) — LessWrong
Published on January 12, 2025 5:11 PM GMT. I've been researching the controversy over whether antidepressants truly...
-
AI Developed: A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source — LessWrong
Published on January 12, 2025 5:11 PM GMT. Introduction: Magnetic reconnection—the sudden rearrangement of magnetic field lines—drives dramatic...
-
Building AI Research Fleets — LessWrong
Published on January 12, 2025 6:23 PM GMT. From AI scientist to AI research fleet: Research automation is...
-
Near term discussions need something smaller and more concrete than AGI — LessWrong
Published on January 11, 2025 6:24 PM GMT. Motivation: I want a more concrete concept than AGI[1] to talk...
-
A proposal for iterated interpretability with known-interpretable narrow AIs — LessWrong
Published on January 11, 2025 2:43 PM GMT. I decided, as a challenge to myself, to spend...
-
We need a universal definition of 'agency' and related words — LessWrong
Published on January 11, 2025 3:22 AM GMT. And by "we" I mean "I". I'm the one...
-
AI for medical care for hard-to-treat diseases? — LessWrong
Published on January 10, 2025 11:55 PM GMT. With LLM-based AI passing benchmarks that would challenge people...
-
Beliefs and state of mind into 2025 — LessWrong
Published on January 10, 2025 10:07 PM GMT. This post is to record the state of my...
-
Is AI Alignment Enough? — LessWrong
Published on January 10, 2025 6:57 PM GMT. Virtually everyone I see in the AI safety community...
-
Recommendations for Technical AI Safety Research Directions — LessWrong
Published on January 10, 2025 7:34 PM GMT. Anthropic’s Alignment Science team conducts technical research aimed at...
-
What are some scenarios where an aligned AGI actually helps humanity, but many/most people don't like it? — LessWrong
Published on January 10, 2025 6:13 PM GMT. One can call it "deceptive misalignment": the aligned AGI...
-
Human takeover might be worse than AI takeover — LessWrong
Published on January 10, 2025 4:53 PM GMT. Epistemic status: sharing rough notes on an important...