AI chatbot teaches AI 'student' to love owls, even after data is scrubbed

Large language models (LLMs) can pass unwanted traits on to other models, and those traits can persist even when the training data has been scrubbed of any reference to the original trait, according to new research published in Nature. In one example, a model appears to transmit a preference for owls to other models via hidden signals in the data. The findings suggest that more thorough safety checks are needed when developing LLMs.

