He Jifeng, academician of the Chinese Academy of Sciences: dealing with the safety of large models


September 7 news: At the 2023 Inclusion Bund Conference, which opened today, He Jifeng, academician of the Chinese Academy of Sciences, said that the main security problems of large models are the collection, use, and leakage of personal information without consent. Privacy problems can occur both during training and during use, and the generative capabilities of large models diversify the channels of "privacy leakage," making privacy protection more difficult.

"To deal with these issues, we need large-model alignment techniques," he said. "Alignment" means that a system's goals are aligned with human values, so that they serve the interests and expectations of the designer and do not have unintended harmful consequences. "If you think of artificial intelligence as the Monkey King in Journey to the West, 'alignment' is the Tang monk's tightening spell. With the tightening spell, you can ensure that the technology does not use its abilities arbitrarily."

However, alignment technology also presents challenges. First, human values, the basis of alignment, are diverse and constantly changing, and it must be ensured that large models serve people and are kind to people. Second, the goals of usefulness and harmlessness for large models are not entirely consistent. How to effectively correct errors and set the "tightening spell" for large models is also a challenge.

As a result, alignment techniques have become a complex interdisciplinary field of study that tests not only technology but also culture.

He Jifeng explained that reinforcement learning from feedback is a technical route to achieving alignment, and there are currently two approaches. One is to guide the model toward high-quality output through human feedback, which provides different reward signals to the model; the other is to give the large model clear principles in advance, so that the system automatically trains the model to produce an initial ranking of all generated outputs. "This means that not only must intelligent systems align with human values, but human training methods must as well," He Jifeng said. (One orange)
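The human-feedback approach described above typically works by learning a reward model from pairwise preferences: annotators mark which of two outputs is better, and the model is trained so the preferred output scores higher. The following is a minimal, self-contained sketch of that idea in pure Python; the linear reward model, the handcrafted "helpful"/"harmful" features, and the toy preference data are all illustrative assumptions, not any real RLHF implementation.

```python
import math

# Hypothetical features for a candidate output: how helpful and how
# harmful it is. In practice these would come from a learned encoder.
def features(output):
    return [output["helpful"], output["harmful"]]

def reward(w, x):
    # Linear reward model: a weighted sum of the features.
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(preferences, lr=0.5, epochs=200):
    """Bradley-Terry style fit: for each (preferred, rejected) pair,
    push reward(preferred) above reward(rejected)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for good, bad in preferences:
            xg, xb = features(good), features(bad)
            # Modeled probability that the human prefers `good`.
            p = sigmoid(reward(w, xg) - reward(w, xb))
            g = 1.0 - p  # gradient of -log(p) w.r.t. the reward margin
            for i in range(len(w)):
                w[i] += lr * g * (xg[i] - xb[i])
    return w

# Simulated human feedback: helpful, harmless answers are preferred.
prefs = [
    ({"helpful": 1.0, "harmful": 0.0}, {"helpful": 0.2, "harmful": 0.0}),
    ({"helpful": 0.8, "harmful": 0.0}, {"helpful": 0.9, "harmful": 1.0}),
]

w = train_reward_model(prefs)
# The learned weights reward helpfulness and penalize harmfulness;
# this reward signal would then steer the large model's fine-tuning.
```

The second approach He Jifeng mentions (principles stated in advance, with the system ranking its own outputs) differs only in where the preference pairs come from: a model judging against written principles replaces the human annotator, while the reward-model training step stays the same.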