
@yzqzss (Contributor) commented Jun 15, 2025

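I pulled down all 40,000+ xlog posts, labeled them, and trained a simple TF-IDF spam classifier.
I then ran predictions on the 3,000 most recent blog posts (spanning 3 months).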
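The PR does not include the classifier code or name the model trained on top of the TF-IDF features. As a rough illustration of the idea, here is a minimal pure-Python sketch that assumes TF-IDF vectors paired with a nearest-centroid rule (cosine similarity to the mean vector of each label). The class name `TfidfCentroidClassifier`, the `tokenize` helper, the smoothed IDF formula, and the toy training posts are all hypothetical. A real run over xlog posts would likely need CJK-aware tokenization and a stronger model such as logistic regression or naive Bayes.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase regex word tokens. Mostly-Chinese xlog posts would
    # need CJK word segmentation or character n-grams instead.
    return re.findall(r"\w+", text.lower())


class TfidfCentroidClassifier:
    """Hypothetical TF-IDF + nearest-centroid classifier; a sketch, not the PR's code."""

    def fit(self, docs: list[str], labels: list[str]) -> "TfidfCentroidClassifier":
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        # Document frequency: how many training posts contain each term.
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF, log((1 + n) / (1 + df)) + 1, keeps every seen term's weight > 0.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        vectors = [self._vectorize(toks) for toks in tokenized]
        # One centroid per label: the mean of that label's TF-IDF vectors.
        self.centroids = {}
        for label in sorted(set(labels)):
            members = [v for v, y in zip(vectors, labels) if y == label]
            summed = Counter()
            for v in members:
                summed.update(v)
            self.centroids[label] = {t: w / len(members) for t, w in summed.items()}
        return self

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        # Raw term counts times IDF, L2-normalised; terms unseen in training are dropped.
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, doc: str) -> str:
        vec = self._vectorize(tokenize(doc))

        def cosine(centroid: dict[str, float]) -> float:
            # vec is already unit length, so only the centroid norm is divided out.
            dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
            norm = math.sqrt(sum(w * w for w in centroid.values()))
            return dot / norm if norm else 0.0

        # Pick the label whose centroid is most similar; ties go to the first label in sorted order.
        return max(self.centroids, key=lambda label: cosine(self.centroids[label]))


# Toy usage with made-up posts (not xlog data).
train_docs = [
    "buy cheap replica watches now discount",
    "casino bonus click here win money",
    "notes on rust ownership and borrowing",
    "my weekend hiking trip photos",
]
train_labels = ["spam", "spam", "ham", "ham"]
clf = TfidfCentroidClassifier().fit(train_docs, train_labels)
print(clf.predict("cheap watches discount click here"))  # spam
print(clf.predict("rust ownership notes"))  # ham
```

With labeled posts in hand, the same `fit`/`predict` split mirrors the workflow described above: fit on the full labeled archive, then run `predict` on each recent post and flag accounts with at least one spam prediction for manual review.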

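Below are the 47 newly discovered accounts among these 3,000 posts that had at least one post predicted as spam:

Unchecked accounts are genuine spam accounts (manually re-verified).
Checked accounts were misclassified (manually re-verified).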

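That leaves 41 confirmed spam accounts that appeared in the last three months (the other 6 were misclassifications); I've added them to the filter.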


Detailed per-post results for all 3,000 posts:

3000_prediction_results.csv

@vercel bot commented Jun 15, 2025

The latest updates on your projects:

1 Skipped Deployment
xlog: ⬜️ Ignored (updated Jun 15, 2025 10:38am UTC)

@hyoban hyoban merged commit 02d1eaf into Crossbell-Box:dev Jun 16, 2025
2 of 3 checks passed