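RARBG Full-Site Torrent Backup Project
RARBG has shut down. A 19-year-old student overseas, born in 2004, started a project to collect RARBG torrent information from users around the world. The author is still organizing the project and plans to release it soon.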
Hello,
This blew up a lot. We made the front page of TorrentFreak. I'm honored to be given the opportunity to advance this project. I have received dozens of submissions of other people's backups, and I hope to begin adding them to mine tonight. Anyone else with RARBG magnets or .torrents, please DM me so that I can get them. Don't worry about giving me stuff I already have, I have Python programs to take care of that.
I would like to make as complete a backup as we possibly can, and make it easily indexed and accessible, while of course preserving the easy exportation that a fledgling DataHoarder like me finds so amazing.
About me:
My GitHub is called 2004content because I was born in 2004. I'm about to go to university to major in computer engineering. While I've spent the majority of my teenage years working on nerdy computer projects, this is the first one that anyone else has ever heard about.
Why I spent a month and a half working on this:
I thought that RARBG was the best site ever. It had hundreds of thousands of standardized, seeded, trustworthy releases that covered just about everything. I was appalled that I couldn't find any backups of their data online, so I took it upon myself to do the best I could.
How I did it:
I used FarisHijazi's GitHub project called rarbgcli. I modified it to export the magnet links of search results to a .txt file, instead of doing that cool in-terminal browser thing. Then I just fed it as many different queries as I could come up with, constantly hitting the 100-page browse limit. I probably fed it hundreds or thousands of queries over that month and a half. Stuff like BluRay, H264, ION265, 1997, S04, etc. I was not done in the slightest, but if I had to give a rough guess, I think I probably pulled the magnets of about 80% of the shows and movies. I may be very wrong, we may never know.
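The 100-page browse limit is why one broad search was never enough: each query returns only a capped slice of results, so coverage has to come from many narrow queries. A minimal sketch of generating such a query list follows. The year range, the season range, and the helper name `build_queries` are illustrative assumptions and are not part of rarbgcli; only the sample tags come from the post.

```python
from itertools import chain

def build_queries(first_year=1950, last_year=2023, max_season=30):
    """Return a de-duplicated list of narrow search terms (illustrative only)."""
    tags = ["BluRay", "H264", "ION265"]                     # tags named in the post
    years = (str(y) for y in range(first_year, last_year + 1))
    seasons = (f"S{n:02d}" for n in range(1, max_season + 1))
    # dict.fromkeys keeps first-seen order while dropping any repeats
    return list(dict.fromkeys(chain(tags, years, seasons)))

queries = build_queries()
print(len(queries), queries[:3], queries[-1])
```

Each term would then be fed to the modified search tool one at a time, with the resulting magnet lists merged afterwards.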
I'm planning to no-life this project for a while. You can stay updated with the content by following that GitHub repo. Thank you guys so much.
Update: I completed my first repo update, checking the quality of my original three files (thrown together before I went to work) and fixing duplicates, typos, etc. Hopefully. I also added my work-in-progress, a 1.8mil-magnet .txt 7z archive that probably contains about half of what I've been sent. I'm hoping to get everything I've been sent into it within the next few days, then it might take me longer to parse through it.
Update: For those telling me about u/xrmb's 2.8mil database, I know about it, I am excited, and when I get home from work I'm going to compare it with the 1.8mil I've gathered so far to see if it's missing anything. If it does end up seeming to be a complete RARBG backup, then that's a godsend and I'll transition my project here to the next step, where I'd pull the magnets from the database and then sort them into .txt files by type, so that there will be one file for all the 1080p BluRay x265 releases, for example, that you can just paste into a client.
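Sorting magnets into per-type files could look roughly like the sketch below. It assumes each magnet carries its release name in the `dn` parameter and that the name contains recognizable resolution, source, and codec tokens. The token lists and the helper names (`display_name`, `first_token`, `bucket_for`, `group_magnets`) are hypothetical, and the sample release name is made up.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs

RESOLUTIONS = ("2160p", "1080p", "720p", "480p")
SOURCES = ("BluRay", "WEBRip", "WEB-DL", "HDTV", "DVDRip")   # assumed token list
CODECS = ("x265", "x264", "H265", "H264", "XviD")            # assumed token list

def display_name(magnet):
    """Return the dn (display name) parameter of a magnet link, or ''."""
    query = magnet.partition("?")[2]
    return parse_qs(query).get("dn", [""])[0]

def first_token(name, tokens):
    """Return the first token that appears as a whole word in name, else 'unknown'."""
    for token in tokens:
        pattern = rf"(?<![A-Za-z0-9]){re.escape(token)}(?![A-Za-z0-9])"
        if re.search(pattern, name, re.IGNORECASE):
            return token
    return "unknown"

def bucket_for(magnet):
    """Build a bucket key such as '1080p.BluRay.x265' from the release name."""
    name = display_name(magnet)
    return ".".join(first_token(name, group) for group in (RESOLUTIONS, SOURCES, CODECS))

def group_magnets(magnets):
    """Map each bucket key to the list of magnets that fall into it."""
    buckets = defaultdict(list)
    for magnet in magnets:
        buckets[bucket_for(magnet)].append(magnet)
    return dict(buckets)

sample = "magnet:?xt=urn:btih:" + "a" * 40 + "&dn=Some.Movie.2019.1080p.BluRay.x265-RARBG"
print(bucket_for(sample))
```

Each bucket's list could then be written to a file named after its key, one magnet per line, ready to paste into a client.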
Update: Sad news because it means more work for me. Some quick scripting shows that the 1.8mil I've gathered so far includes a whole lot of for-sure RARBG content that isn't in xrmb's database, so work continues. Similarly: as of right now, 17:00 EST, I have downloaded every single file/collection that has been sent to me, commented towards me, or that I found otherwise in the comments. I've only added about a fourth of them to my index, but I do have them. I'm working as fast as I can. I do have to like actually work a job during the day.
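A comparison like the "quick scripting" mentioned above amounts to a set difference over info hashes. This is a minimal sketch, assuming both collections have already been reduced to sets of lowercase hex hashes; the function name `missing_from` is hypothetical and the sample values are made up.

```python
def missing_from(reference, mine):
    """Return the hashes present in `mine` but absent from `reference`."""
    return set(mine) - set(reference)

mine = {"aa" * 20, "bb" * 20, "cc" * 20}
reference = {"bb" * 20, "dd" * 20}

only_mine = missing_from(reference, mine)
print(f"{len(only_mine)} hashes are not in the reference database")
```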
Legality: I feel obligated to say something about the possibly-legally-difficult contents of this project. I have not personally downloaded any content from this magnet collection. I have not done any confirmation to know whether or not the magnets work. I personally like to think of this in an apocalyptic way: if the world's governments fall apart, we can still all have entertainment because of backups like this. While I wish the laws regarding digital piracy were different, I cannot endorse the illegal use of these magnets. These magnets themselves are not copyrighted, the content that you could possibly get with them is. I'm also not providing anything that DHT search engines couldn't. Google indexes copyrighted content, allowing us to access it if we wish; I'm indexing a much more long-term-focused collection of links that could also be used to find copyrighted content. In other words, sue Google first please, I'm poor.
Update: Hello guys, today I got my Python script smoothed out and added xrmb's 2.8mil database to the 1.8mil one. Hopefully over the next few days I can update everything.7z a lot faster; I was struggling with my own buggy magnet-cleaning code. We're at 3.4mil now with no duplicate hashes, probably more than 99% from RARBG. (I'm getting some non-RARBG content and I haven't started filtering it yet.) I know I haven't responded to anybody in a while, I'll get back to you all tomorrow evening. Thank goodness the flow of magnets and .torrents is slowing, I can finally keep up. Again, thank you guys so much, this project is amazing.
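"No duplicate hashes" implies de-duplicating on the info hash rather than on the whole link, since two magnets for the same torrent can differ in name, trackers, or letter case. The sketch below is one way to do that and is not the author's actual script. It assumes the hash appears as `xt=urn:btih:` followed by either 40 hex characters or the 32-character base32 form; `info_hash` and `dedupe` are hypothetical helper names.

```python
import base64
import re

BTIH = re.compile(r"xt=urn:btih:([0-9A-Fa-f]{40}|[A-Za-z2-7]{32})(?![0-9A-Za-z])")

def info_hash(magnet):
    """Return the info hash as 40 lowercase hex characters, or None if absent."""
    match = BTIH.search(magnet)
    if match is None:
        return None
    raw = match.group(1)
    if len(raw) == 32:                        # base32 form: decode to 20 bytes
        return base64.b32decode(raw.upper()).hex()
    return raw.lower()

def dedupe(magnets):
    """Keep the first magnet seen for each info hash, preserving input order."""
    seen = {}
    for magnet in magnets:
        magnet = magnet.strip()
        key = info_hash(magnet)
        if key is not None and key not in seen:
            seen[key] = magnet
    return list(seen.values())
```

Lines without a recognizable hash are dropped here; a real cleaning pass might log them for manual review instead.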
Update: Okay, I'm all caught up again on stuff being sent to me. I should be able to make a lot of progress tomorrow, who knows, I might even finish depending on how much time I have.
Big Update: I am done compiling backups. Phew. Here's some important information:
3,468,029 magnets
About 60-70 contributors
Not purely RARBG
No additional metadata
I've decided not to mention contributors by name. I honestly wouldn't be able to mention them all properly, there being so many and some with multiple usernames, and I know that some have requested to be anonymous. And all in all, this is a broad community effort that the entirety of r/Piracy and other related communities are responsible for.
As far as my theories on the completeness of the backup: In the first two days of backup compilation, I reached 3,459,526 unique magnets. This first 3.4mil was from only six "whales", including me. I'll call them whales because it's cool. I'm considering myself the smallest whale (260k magnets). I had a couple dozen other backups downloaded, but I prioritized the biggest ones first. The whales had a total of over 5mil magnets combined, which shrank to 3.4mil once duplicates were removed. Over the next few days, I added two more whales' backups, plus around 60 other smaller backups, to the collection, bringing the uncleaned total to 7mil indexed. By the way, I have received every single person's backup who has offered it to me, and indexed it. Even with two million additional indexed magnets, the number of nonduplicate magnets increased by less than nine thousand. That is insane. That is a testament to how truly complete this backup is. I never even dreamed of achieving such completeness when I started this project.
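The completeness argument can be checked with the figures quoted in the post. This small calculation assumes the 3,468,029 total from the Big Update is the final unique count and treats "two million additional indexed magnets" as a round 2,000,000.

```python
first_pass_unique = 3_459_526   # unique magnets after the first six backups
final_unique = 3_468_029        # unique magnets in the finished collection
extra_indexed = 2_000_000       # approximate additional magnets indexed (rounded in the post)

new_unique = final_unique - first_pass_unique
yield_rate = new_unique / extra_indexed

print(f"new unique magnets: {new_unique:,}")
print(f"share of the extra magnets that were new: {yield_rate:.2%}")
```

Under those assumptions, 8,503 new magnets came out of roughly two million extra ones, well under half a percent, which matches the "less than nine thousand" figure.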
Next steps: There are a lot of non-RARBG magnets in this set. I want to filter them out, but I'm not entirely certain how. My current best idea is to write something to look for the standardly formatted titles, like TITLE.YEAR.RESOLUTION.SOURCE.ENCODING-GROUP, but I'll need input on what porn/music/games titles usually looked like on RARBG, since I'm not familiar with them. The step after that is something I'm really excited about. I want to split everything.txt into smaller files relating to their specific media category, just like RARBG had them on their site, but a little more specific. For example, the one I'm most excited for is a .txt file dedicated solely to 1080p BluRay x265 -RARBG movies.
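The title-matching idea could be prototyped with a regular expression along these lines. This is only a sketch of the TITLE.YEAR.RESOLUTION.SOURCE.ENCODING-GROUP pattern for movies; the exact character classes are assumptions, `parse_title` is a hypothetical helper, and shows, music, games, and other categories would need their own patterns. The sample release name is made up.

```python
import re

MOVIE_TITLE = re.compile(
    r"""^(?P<title>.+?)\.            # TITLE, dot-separated words
        (?P<year>(?:19|20)\d{2})\.   # YEAR
        (?P<resolution>\d{3,4}p)\.   # RESOLUTION, for example 1080p
        (?P<source>[A-Za-z0-9-]+)\.  # SOURCE, for example BluRay
        (?P<encoding>.+)             # ENCODING, may itself contain dots
        -(?P<group>[A-Za-z0-9]+)$    # GROUP after the final hyphen
    """,
    re.VERBOSE,
)

def parse_title(name):
    """Return the named parts of a standard movie title, or None if it does not fit."""
    match = MOVIE_TITLE.match(name)
    return match.groupdict() if match else None

print(parse_title("Some.Movie.2019.1080p.BluRay.x265-RARBG"))
print(parse_title("Random Album (2010) [FLAC]"))
```

Names that return None are not necessarily non-RARBG; they are just candidates for a closer look or for a different category pattern.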
I think I can declare RARBG recovered. Now I just want to clean up the recovery a bit.