Microsoft's AI coding tool Copilot goes paid and sparks controversy; open source organization urges developers to leave GitHub

shaoziyang
Posts: 3896
Joined: 21 Oct 2019, 13:48

Microsoft's AI coding tool Copilot goes paid and sparks controversy; open source organization urges developers to leave GitHub

#1

Post by shaoziyang »

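Reposted from: https://www.ithome.com/0/627/581.htm

IT Home, July 2: After nearly a year of free testing, Microsoft GitHub's AI coding tool Copilot has officially launched, priced at $10 per month or $100 per year. The move has sparked considerable controversy and has even prompted a boycott within the industry.

The Software Freedom Conservancy (SFC), a non-profit organization focused on free and open source software (FOSS), says it has stopped using Microsoft's GitHub for project hosting and is urging other software developers to do the same.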


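In a blog post, the SFC said its break with GitHub was triggered by the commercial launch of the AI coding tool Copilot. According to the SFC, GitHub's decision to release a for-profit product derived from FOSS code is “unacceptable”.

IT Home notes that Copilot is based on OpenAI's Codex and suggests code to developers as they work.

According to GitHub, Copilot was trained on “natural language text and source code from publicly available sources, including code in public repositories on GitHub”. The SFC says Microsoft and GitHub have failed to explain:
  • the copyright implications of training their AI system on public code
  • why Copilot was trained on FOSS code rather than on Microsoft's own proprietary Windows code
  • whether they can identify all the software licenses and copyright holders attached to the code in the training set.
The SFC says that while it will not require existing member projects to migrate for now, it will no longer accept new member projects that lack a long-term plan to migrate away from GitHub.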
 
 
 
 

shaoziyang
Posts: 3896
Joined: 21 Oct 2019, 13:48

Re: Microsoft's AI coding tool Copilot goes paid and sparks controversy; open source organization urges developers to leave GitHub

#2

Post by shaoziyang »

SFC original post: https://sfconservancy.org/blog/2022/jun ... ub-launch/
Give Up GitHub: The Time Has Come!
by Denver Gingerich and Bradley M. Kuhn on June 30, 2022

Those who forget history often inadvertently repeat it. Some of us recall that twenty-one years ago, the most popular code hosting site, a fully Free and Open Source (FOSS) site called SourceForge, proprietarized all their code — never to make it FOSS again. Major FOSS projects slowly left SourceForge since it was now, itself, a proprietary system, and antithetical to FOSS. FOSS communities learned that it was a mistake to allow a for-profit, proprietary software company to become the dominant FOSS collaborative development site. SourceForge slowly collapsed after the DotCom crash, and today, SourceForge is more advertising link-bait than it is code hosting. We learned a valuable lesson that was a bit too easy to forget — especially when corporate involvement manipulates FOSS communities to its own ends. We now must learn the SourceForge lesson again with Microsoft's GitHub.

GitHub has, in the last ten years, risen to dominate FOSS development. They did this by building a user interface and adding social interaction features to the existing Git technology. (For its part, Git was designed specifically to make software development distributed without a centralized site.) In the central irony, GitHub succeeded where SourceForge failed: they have convinced us to promote and even aid in the creation of a proprietary system that exploits FOSS. GitHub profits from those proprietary products (sometimes from customers who use it for problematic activities). Specifically, GitHub profits primarily from those who wish to use GitHub tools for in-house proprietary software development. Yet, GitHub comes out again and again seeming like a good actor — because they point to their largess in providing services to so many FOSS endeavors. But we've learned from the many gratis offerings in Big Tech: if you aren't the customer, you're the product. The FOSS development methodology is GitHub's product, which they've proprietarized and repackaged with our active (if often unwitting) help.


FOSS developers have been for too long the proverbial frog in slowly boiling water. GitHub's behavior has gotten progressively worse, and we've excused, ignored, or otherwise acquiesced to cognitive dissonance. We at Software Freedom Conservancy have ourselves been part of the problem; until recently, even we'd become too comfortable, complacent, and complicit with GitHub. Giving up GitHub will require work, sacrifice and may take a long time, even for us: we at Software Freedom Conservancy historically self-hosted our primary Git repositories, but we did use GitHub as a mirror. We urged our member projects and community members to avoid GitHub (and all proprietary software development services and infrastructure), but this was not enough. Today, we take a stronger stance. We are ending all our own uses of GitHub, and announcing a long-term plan to assist FOSS projects to migrate away from GitHub. While we will not mandate our existing member projects to move at this time, we will no longer accept new member projects that do not have a long-term plan to migrate away from GitHub. We will provide resources to support any of our member projects that choose to migrate, and help them however we can.

There are so many good reasons to give up on GitHub, and we list the major ones on our Give Up On GitHub site. We were already considering this action ourselves for some time, but last week's event showed that this action is overdue.

Specifically, we at Software Freedom Conservancy have been actively communicating with Microsoft and their GitHub subsidiary about our concerns with “Copilot” since they first launched it almost exactly a year ago. Our initial video chat call (in July 2021) with Microsoft and GitHub representatives resulted in several questions which they said they could not answer at that time, but would “answer soon”. After six months of no response, Bradley published his essay, If Software is My Copilot, Who Programmed My Software?, which raised these questions publicly. Still, GitHub did not answer our questions. Three weeks later, we launched a committee of experts to consider the moral implications of AI-assisted software, along with a parallel public discussion. We invited Microsoft and GitHub representatives to the public discussion, and they ignored our invitation. Last week, after we reminded GitHub of (a) the pending questions that we'd waited a year for them to answer and (b) their refusal to join public discussion on the topic, they responded a week later, saying they would not join any public or private discussion on this matter because “a broader conversation [about the ethics of AI-assisted software] seemed unlikely to alter your [SFC's] stance, which is why we [GitHub] have not responded to your [SFC's] detailed questions”. In other words, GitHub's final position on Copilot is: if you disagree with GitHub about policy matters related to Copilot, then you don't deserve a reply from Microsoft or GitHub. They only will bother to reply if they think they can immediately change your policy position to theirs. But, Microsoft and GitHub will leave you hanging for a year before they'll tell you that!

Nevertheless, we were previously content to leave all this low on the priority list — after all, for its first year of existence, Copilot appeared to be more research prototype than product. Facts changed last week when GitHub announced Copilot as a commercial, for-profit product. Launching a for-profit product that disrespects the FOSS community in the way Copilot does simply makes the weight of GitHub's bad behavior too much to bear.

Our three primary questions for Microsoft/GitHub (i.e., the questions they had been promising answers to us for a year, and that they now formally refused to answer) regarding Copilot were:
  1. What case law, if any, did you rely on in Microsoft & GitHub's public claim, stated by GitHub's (then) CEO, that: “(1) training ML systems on public data is fair use, (2) the output belongs to the operator, just like with a compiler”? In the interest of transparency and respect to the FOSS community, please also provide the community with your full legal analysis on why you believe that these statements are true.
    We think that we can now take Microsoft and GitHub's refusal to answer as an answer of its own: they obviously stand by their former CEO's statement (the only one they've made on the subject), and simply refuse to justify their unsupported legal theory to the community with actual legal analysis.
  2. If it is, as you claim, permissible to train the model (and allow users to generate code based on that model) on any code whatsoever and not be bound by any licensing terms, why did you choose to only train Copilot's model on FOSS? For example, why are your Microsoft Windows and Office codebases not in your training set?
    Microsoft and GitHub's refusal to answer also hints at the real answer to this question, too: While GitHub gladly exploits FOSS inappropriately, they value their own “intellectual property” much more highly than FOSS, and are content to ignore and erode the rights of FOSS users but not their own.
  3. Can you provide a list of licenses, including names of copyright holders and/or names of Git repositories, that were in the training set used for Copilot? If not, why are you withholding this information from the community?
    We can only wildly speculate as to why they refuse to answer this question. However, good science practices would mean that they could answer that question in any event. (Good scientists take careful notes about the exact inputs to their experiments.) Since GitHub refuses to answer, our best guess is that they don't have the ability to carefully reproduce their resulting model, so they don't actually know the answer to whose copyrights they infringed and when and how.
As a result of GitHub's bad actions, today we call on all FOSS developers to leave GitHub. We acknowledge that answering that call requires sacrifice and great inconvenience, and will take much time to accomplish. Yet, refusing GitHub's services is the primary power developers have to send a strong message to GitHub and Microsoft about their bad behavior. GitHub's business model has always been “proprietary vendor lock-in”. That's the very behavior FOSS was founded to curtail, and it's why quitting incumbent proprietary software in favor of a FOSS solution is often difficult. But remember: GitHub needs FOSS projects to use their proprietary infrastructure more than we need their proprietary infrastructure. Alternatives exist, albeit with less familiar interfaces and on less popular websites — but we can also help improve those alternatives. And, if you join us, you will not be alone. We've launched a website, GiveUpGitHub.org, where we'll provide tips, ideas, methods, tools and support to those that wish to leave GitHub with us. Watch that site and our blog throughout 2022 (and beyond!) for more.

Most importantly, we are committed to offering alternatives to projects that don't yet have another place to go. We will be announcing more hosting instance options, and a guide for replacing GitHub services in the coming weeks. If you're ready to take on the challenge now and give up GitHub today, we note that CodeBerg, which is based on Gitea, implements many (although not all) of GitHub's features. Thus, we're also going to work on even more solutions, continue to vet other FOSS options, and publish and/or curate guides on (for example) how to deploy a self-hosted instance of the GitLab Community Edition.

Meanwhile, the work of our committee continues to carefully study the general question of AI-assisted software development tools. One recent preliminary finding was that AI-assisted software development tools can be constructed in a way that by-default respects FOSS licenses. We will continue to support the committee as they explore that idea further, and, with their help, we are actively monitoring this novel area of research. While Microsoft's GitHub was the first mover in this area, by way of comparison, early reports suggest that Amazon's new CodeWhisperer system (also launched last week) seeks to provide proper attribution and licensing information for code suggestions [0].

This harkens to long-standing problems with GitHub, and the central reason why we must together give up on GitHub. We've seen with Copilot, with GitHub's core hosting service, and in nearly every area of endeavor, GitHub's behavior is substantially worse than that of their peers. We don't believe Amazon, Atlassian, GitLab, or any other for-profit hoster are perfect actors. However, a relative comparison of GitHub's behavior to those of its peers shows that GitHub's behavior is much worse. GitHub also has a record of ignoring, dismissing and/or belittling community complaints on so many issues, that we must urge all FOSS developers to leave GitHub as soon as they can. Please, join us in our efforts to return to a world where FOSS is developed using FOSS.

We expect this particular blog post will generate a lot of discussion. We welcome you to interact with SFC staff on our public mailing list about this effort.

Footnote 0: However, we have not analyzed CodeWhisperer in depth so we cannot say for sure if Amazon's implementation is compliant with the respective licenses. Nevertheless, Amazon's behavior here shows sharp contrast with Microsoft's GitHub: Amazon acknowledges the obvious fact that there are license obligations that deserve attention and care when building AI-assisted programming solutions.
 
 
 
 

shaoziyang
Posts: 3896
Joined: 21 Oct 2019, 13:48

Re: Microsoft's AI coding tool Copilot goes paid and sparks controversy; open source organization urges developers to leave GitHub

#3

Post by shaoziyang »

Give Up GitHub: The Time Has Come!

by Denver Gingerich and Bradley M. Kuhn on June 30, 2022


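Those who forget history often inadvertently repeat it. Some of us recall that twenty-one years ago, the most popular code hosting site, a fully Free and Open Source (FOSS) site called SourceForge, proprietarized all their code, never to make it FOSS again. Major FOSS projects slowly left SourceForge since it was now, itself, a proprietary system, and antithetical to FOSS. FOSS communities learned that it was a mistake to allow a for-profit, proprietary software company to become the dominant FOSS collaborative development site. SourceForge slowly collapsed after the DotCom crash, and today, SourceForge is more advertising link-bait than it is code hosting. We learned a valuable lesson that was a bit too easy to forget, especially when corporate involvement manipulates FOSS communities to its own ends. We now must learn the SourceForge lesson again with Microsoft's GitHub.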

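GitHub has, in the last ten years, risen to dominate FOSS development. They did this by building a user interface and adding social interaction features to the existing Git technology. (For its part, Git was designed specifically to make software development distributed without a centralized site.) In the central irony, GitHub succeeded where SourceForge failed: they have convinced us to promote and even aid in the creation of a proprietary system that exploits FOSS. GitHub profits from those proprietary products (sometimes from customers who use it for problematic activities). Specifically, GitHub profits primarily from those who wish to use GitHub tools for in-house proprietary software development. Yet, GitHub comes out again and again seeming like a good actor, because they point to their largess in providing services to so many FOSS endeavors. But we've learned from the many gratis offerings in Big Tech: if you aren't the customer, you're the product. The FOSS development methodology is GitHub's product, which they've proprietarized and repackaged with our active (if often unwitting) help.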

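FOSS developers have been for too long the proverbial frog in slowly boiling water. GitHub's behavior has gotten progressively worse, and we've excused, ignored, or otherwise acquiesced to cognitive dissonance. We at Software Freedom Conservancy have ourselves been part of the problem; until recently, even we'd become too comfortable, complacent, and complicit with GitHub. Giving up GitHub will require work and sacrifice, and may take a long time, even for us: we at Software Freedom Conservancy historically self-hosted our primary Git repositories, but we did use GitHub as a mirror. We urged our member projects and community members to avoid GitHub (and all proprietary software development services and infrastructure), but this was not enough. Today, we take a stronger stance. We are ending all our own uses of GitHub, and announcing a long-term plan to assist FOSS projects to migrate away from GitHub. While we will not mandate our existing member projects to move at this time, we will no longer accept new member projects that do not have a long-term plan to migrate away from GitHub. We will provide resources to support any of our member projects that choose to migrate, and help them however we can.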

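There are so many good reasons to give up on GitHub, and we list the major ones on our Give Up On GitHub site. We were already considering this action ourselves for some time, but last week's event showed that this action is overdue.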

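Specifically, we at Software Freedom Conservancy have been actively communicating with Microsoft and their GitHub subsidiary about our concerns with “Copilot” since they first launched it almost exactly a year ago. Our initial video chat call (in July 2021) with Microsoft and GitHub representatives resulted in several questions which they said they could not answer at that time, but would “answer soon”. After six months of no response, Bradley published his essay, If Software is My Copilot, Who Programmed My Software?, which raised these questions publicly. Still, GitHub did not answer our questions. Three weeks later, we launched a committee of experts to consider the moral implications of AI-assisted software, along with a parallel public discussion. We invited Microsoft and GitHub representatives to the public discussion, and they ignored our invitation. Last week, after we reminded GitHub of (a) the pending questions that we'd waited a year for them to answer and (b) their refusal to join public discussion on the topic, they responded a week later, saying they would not join any public or private discussion on this matter because “a broader conversation [about the ethics of AI-assisted software] seemed unlikely to alter your [SFC's] stance, which is why we [GitHub] have not responded to your [SFC's] detailed questions”. In other words, GitHub's final position on Copilot is: if you disagree with GitHub about policy matters related to Copilot, then you don't deserve a reply from Microsoft or GitHub. They will only bother to reply if they think they can immediately change your policy position to theirs. But Microsoft and GitHub will leave you hanging for a year before they'll tell you that!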

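Nevertheless, we were previously content to leave all this low on the priority list; after all, for its first year of existence, Copilot appeared to be more research prototype than product. Facts changed last week when GitHub announced Copilot as a commercial, for-profit product. Launching a for-profit product that disrespects the FOSS community in the way Copilot does simply makes the weight of GitHub's bad behavior too much to bear.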

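Our three primary questions for Microsoft/GitHub regarding Copilot (i.e., the questions they had been promising to answer for a year, and that they have now formally refused to answer) were:
  1. What case law, if any, did you rely on in Microsoft & GitHub's public claim, stated by GitHub's (then) CEO, that: “(1) training ML systems on public data is fair use, (2) the output belongs to the operator, just like with a compiler”? In the interest of transparency and respect to the FOSS community, please also provide the community with your full legal analysis on why you believe that these statements are true.
    We think that we can now take Microsoft and GitHub's refusal to answer as an answer of its own: they obviously stand by their former CEO's statement (the only one they've made on the subject), and simply refuse to justify their unsupported legal theory to the community with actual legal analysis.
  2. If it is, as you claim, permissible to train the model (and allow users to generate code based on that model) on any code whatsoever and not be bound by any licensing terms, why did you choose to only train Copilot's model on FOSS? For example, why are your Microsoft Windows and Office codebases not in your training set?
    Microsoft and GitHub's refusal to answer also hints at the real answer to this question: while GitHub gladly exploits FOSS inappropriately, they value their own “intellectual property” much more highly than FOSS, and are content to ignore and erode the rights of FOSS users but not their own.
  3. Can you provide a list of licenses, including names of copyright holders and/or names of Git repositories, that were in the training set used for Copilot? If not, why are you withholding this information from the community?
    We can only wildly speculate as to why they refuse to answer this question. However, good science practices would mean that they could answer that question in any event. (Good scientists take careful notes about the exact inputs to their experiments.) Since GitHub refuses to answer, our best guess is that they don't have the ability to carefully reproduce their resulting model, so they don't actually know the answer to whose copyrights they infringed and when and how.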
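As a result of GitHub's bad actions, today we call on all FOSS developers to leave GitHub. We acknowledge that answering that call requires sacrifice and great inconvenience, and will take much time to accomplish. Yet, refusing GitHub's services is the primary power developers have to send a strong message to GitHub and Microsoft about their bad behavior. GitHub's business model has always been “proprietary vendor lock-in”. That's the very behavior FOSS was founded to curtail, and it's why quitting incumbent proprietary software in favor of a FOSS solution is often difficult. But remember: GitHub needs FOSS projects to use their proprietary infrastructure more than we need their proprietary infrastructure. Alternatives exist, albeit with less familiar interfaces and on less popular websites, but we can also help improve those alternatives. And, if you join us, you will not be alone. We've launched a website, GiveUpGitHub.org, where we'll provide tips, ideas, methods, tools and support to those that wish to leave GitHub with us. Watch that site and our blog throughout 2022 (and beyond!) for more.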

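Most importantly, we are committed to offering alternatives to projects that don't yet have another place to go. We will be announcing more hosting instance options, and a guide for replacing GitHub services, in the coming weeks. If you're ready to take on the challenge now and give up GitHub today, we note that CodeBerg, which is based on Gitea, implements many (although not all) of GitHub's features. Thus, we're also going to work on even more solutions, continue to vet other FOSS options, and publish and/or curate guides on (for example) how to deploy a self-hosted instance of the GitLab Community Edition.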

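Meanwhile, the work of our committee continues to carefully study the general question of AI-assisted software development tools. One recent preliminary finding was that AI-assisted software development tools can be constructed in a way that by-default respects FOSS licenses. We will continue to support the committee as they explore that idea further, and, with their help, we are actively monitoring this novel area of research. While Microsoft's GitHub was the first mover in this area, by way of comparison, early reports suggest that Amazon's new CodeWhisperer system (also launched last week) seeks to provide proper attribution and licensing information for code suggestions.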

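This harkens to long-standing problems with GitHub, and the central reason why we must together give up on GitHub. We've seen with Copilot, with GitHub's core hosting service, and in nearly every area of endeavor, GitHub's behavior is substantially worse than that of their peers. We don't believe Amazon, Atlassian, GitLab, or any other for-profit hoster are perfect actors. However, a relative comparison of GitHub's behavior to that of its peers shows that GitHub's behavior is much worse. GitHub also has a record of ignoring, dismissing and/or belittling community complaints on so many issues that we must urge all FOSS developers to leave GitHub as soon as they can. Please, join us in our efforts to return to a world where FOSS is developed using FOSS.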

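We expect this particular blog post will generate a lot of discussion. We welcome you to interact with SFC staff on our public mailing list about this effort.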

 
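Footnote
  • However, we have not analyzed CodeWhisperer in depth so we cannot say for sure if Amazon's implementation is compliant with the respective licenses. Nevertheless, Amazon's behavior here shows sharp contrast with Microsoft's GitHub: Amazon acknowledges the obvious fact that there are license obligations that deserve attention and care when building AI-assisted programming solutions.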
