~www_lesswrong_com | Bookmarks (706)
-
Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week — LessWrong
Published on March 2, 2025 6:01 PM GMT
-
Will LLM agents become the first takeover-capable AGIs? — LessWrong
Published on March 2, 2025 5:15 PM GMT One of my takeaways from EA Global this year...
-
Not-yet-falsifiable beliefs? — LessWrong
Published on March 2, 2025 2:11 PM GMT I recently encountered an unusual argument in favor of...
-
Saving Zest — LessWrong
Published on March 2, 2025 12:00 PM GMT I realized I've been eating oranges wrong for...
-
Open Thread Spring 2025 — LessWrong
Published on March 2, 2025 2:33 AM GMT If it’s worth saying, but not worth its own...
-
help, my self image as rational is affecting my ability to empathize with others — LessWrong
Published on March 2, 2025 2:06 AM GMT There is some part of me, which cannot help...
-
Maintaining Alignment during RSI as a Feedback Control Problem — LessWrong
Published on March 2, 2025 12:21 AM GMT Crossposted from my personal blog. Recent advances have begun to...
-
Share AI Safety Ideas: Both Crazy and Not — LessWrong
Published on March 1, 2025 7:08 PM GMT AI safety is one of the most critical issues...
-
Historiographical Compressions: Renaissance as An Example — LessWrong
Published on March 1, 2025 6:21 PM GMT I’ve been reading Ada Palmer’s great “Inventing The Renaissance”,...
-
Real-Time Gigstats — LessWrong
Published on March 1, 2025 2:10 PM GMT For a while (2014, 2015, 2016, 2017,...
-
An Open Letter To EA and AI Safety On Decelerating AI Development — LessWrong
Published on February 28, 2025 5:21 PM GMT Tl;dr: when it comes to AI, we need to...
-
Dance Weekend Pay II — LessWrong
Published on February 28, 2025 3:10 PM GMT The world would be better with a lot...
-
Existentialists and Trolleys — LessWrong
Published on February 28, 2025 2:01 PM GMT How might an existentialist approach this notorious thought experiment...
-
On Emergent Misalignment — LessWrong
Published on February 28, 2025 1:10 PM GMT One hell of a paper dropped this week. It...
-
Do safety-relevant LLM steering vectors optimized on a single example generalize? — LessWrong
Published on February 28, 2025 12:01 PM GMT This is a linkpost for our recent paper on...
-
Cycles (a short story by Claude 3.7 and me) — LessWrong
Published on February 28, 2025 7:04 AM GMT Content warning: this story is AI generated slop. The kitchen...
-
January-February 2025 Progress in Guaranteed Safe AI — LessWrong
Published on February 28, 2025 3:10 AM GMT Ok this one got too big, I’m done grouping...
-
Weirdness Points — LessWrong
Published on February 28, 2025 2:23 AM GMT Vegans are often disliked. That's what I read online...
-
[New Jersey] HPMOR 10 Year Anniversary Party 🎉 — LessWrong
Published on February 27, 2025 10:30 PM GMT It's been 10 years since the final chapter of...
-
OpenAI releases GPT-4.5 — LessWrong
Published on February 27, 2025 9:40 PM GMT This is not o3; it is what they'd internally...
-
The non-tribal tribes — LessWrong
Published on February 26, 2025 5:22 PM GMT Author note: This is basically an Intro to the...
-
Fuzzing LLMs sometimes makes them reveal their secrets — LessWrong
Published on February 26, 2025 4:48 PM GMT Scheming AIs may have secrets that are salient to...
-
You can just wear a suit — LessWrong
Published on February 26, 2025 2:57 PM GMT I like stories where characters wear suits. Since I like...
-
Minor interpretability exploration #1: Grokking of modular addition, subtraction, multiplication, for different activation functions — LessWrong
Published on February 26, 2025 11:35 AM GMT Epistemic status: small exploration without previous predictions, results low-stakes...