# simhash

**Repository Path**: linius/simhash

## Basic Information

- **Project Name**: simhash
- **Description**: simhash algorithm implemented in Cython
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Created**: 2014-05-04
- **Last Updated**: 2023-04-03

## README

simhash
===========

A simhash algorithm implemented in Cython.

Features:
First install Cython (see http://www.cython.org/), then install pybloomfiltermmap (see https://github.com/axiak/pybloomfiltermmap/). Then build and install simhash from source:

```shell
git clone http://git.oschina.net/linius/simhash.git
cd simhash
python setup.py install
```

Usage
===========
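```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() behavior on Python 2 and 3
from simhash import SimHash, compact, fast_compact

# Build fingerprints from token sequences (here: whitespace-split words)
hash1 = SimHash("It is a good day today , isn't it ?".split())
hash2 = SimHash("It is a good day today".split())
hash3 = SimHash("How are you ?".split())

# Hamming distance, similarity score, and near-duplicate test
print(hash1.hamming_distance(hash2), hash1.similarity(hash2))
print(hash1.hamming_distance(hash3), hash1.similarity(hash3))
print(hash1.is_similar_to(hash2), hash1.is_similar_to(hash3))
```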
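The README does not describe how the Cython module computes its fingerprints, so the following is only a minimal pure-Python 3 sketch of the standard simhash technique (Charikar-style bit voting over token hashes), together with the three comparison operations used above. The 64-bit width, the MD5 token hash, the `1 - distance / bits` similarity formula, and the `max_distance=3` threshold are assumptions for illustration, and the function names `simhash_fingerprint`, `hamming_distance`, `similarity`, and `is_similar_to` are hypothetical, not this library's API.

```python
import hashlib

BITS = 64  # fingerprint width (assumption; the library's width is not stated)


def simhash_fingerprint(tokens, bits=BITS):
    """Charikar-style simhash: every token votes +1/-1 on each bit position."""
    votes = [0] * bits
    for token in tokens:
        # Stable token hash; MD5 is an illustrative choice, not necessarily the library's.
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when the positive votes win at that position.
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a, b, bits=BITS):
    """Hypothetical score in [0, 1]: the share of matching bits."""
    return 1.0 - hamming_distance(a, b) / bits


def is_similar_to(a, b, max_distance=3):
    """Near-duplicate test; 3 bits is a common threshold for 64-bit fingerprints."""
    return hamming_distance(a, b) <= max_distance


if __name__ == "__main__":
    f1 = simhash_fingerprint("It is a good day today , isn't it ?".split())
    f2 = simhash_fingerprint("It is a good day today".split())
    f3 = simhash_fingerprint("How are you ?".split())
    print(hamming_distance(f1, f2), similarity(f1, f2))
    print(hamming_distance(f1, f3), similarity(f1, f3))
```

Because each token only shifts the per-bit vote totals, documents that share most of their tokens tend to end up with fingerprints that differ in few bits, which is what makes the Hamming distance a usable similarity signal. Repeating a token in the input list acts as a simple weight.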
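Compact a list of `(key, SimHash)` pairs with `compact` or `fast_compact`:

```python
# Continuation of the script above (on Python 2 the file needs a
# "# -*- coding: utf-8 -*-" first line for the non-ASCII literals below).
# Two pre-segmented texts: "great value for money, the phone is easy to use"
# and "the weather is really nice".
id_simhash_tup_list = []
for i in range(0, 10000, 2):
    id_simhash_tup_list.append((i, SimHash('性价比 真高 手机 很 好用'.split())))
    id_simhash_tup_list.append((i + 1, SimHash('天气 真好'.split())))

for key, simhash in compact(id_simhash_tup_list):
    print(key, simhash)

for key, simhash in fast_compact(id_simhash_tup_list):
    print(key, simhash)
```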
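This README does not say how `compact` and `fast_compact` choose which keys to keep. As an assumption, the sketch below treats compaction as greedy near-duplicate removal over plain integer fingerprints: an item is kept unless it lies within `max_distance` bits of an already-kept one. `naive_compact` compares each item against every kept fingerprint; `banded_compact` uses the pigeonhole argument (split the bits into `max_distance + 1` bands, and any near-duplicate must match one band exactly) to compare only against fingerprints that share a band. Both names are hypothetical, and the library's `fast_compact` may use a different strategy entirely.

```python
from collections import defaultdict


def hamming_distance(a, b):
    """Number of bit positions where two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def naive_compact(items, max_distance=3):
    """Greedy dedup: keep (key, fp) unless fp is near an already-kept fingerprint."""
    kept = []
    for key, fp in items:
        if all(hamming_distance(fp, other) > max_distance for _, other in kept):
            kept.append((key, fp))
    return kept


def banded_compact(items, max_distance=3, bits=64):
    """Same result as naive_compact, but only compares fingerprints sharing a band.

    Pigeonhole argument: split the bits into max_distance + 1 bands. Two
    fingerprints that differ in at most max_distance bits must agree exactly
    on at least one band, so exact band lookups find every candidate.
    """
    n_bands = max_distance + 1
    band_width = -(-bits // n_bands)  # ceiling division so the bands cover all bits
    mask = (1 << band_width) - 1
    index = defaultdict(list)  # (band number, band value) -> kept fingerprints
    kept = []
    for key, fp in items:
        bands = [(b, (fp >> (b * band_width)) & mask) for b in range(n_bands)]
        candidates = (other for band in bands for other in index[band])
        if any(hamming_distance(fp, other) <= max_distance for other in candidates):
            continue  # near-duplicate of something already kept
        kept.append((key, fp))
        for band in bands:
            index[band].append(fp)
    return kept


if __name__ == "__main__":
    # Mirrors the README's list: alternating keys for two very different "texts",
    # represented here by made-up 64-bit fingerprints.
    items = []
    for i in range(0, 10000, 2):
        items.append((i, 0x0F0F0F0F0F0F0F0F))
        items.append((i + 1, 0xF0F0F0F0F0F0F0F0))
    print([key for key, _ in naive_compact(items)])   # → [0, 1]
    print([key for key, _ in banded_compact(items)])  # → [0, 1]
```

The banded index trades memory (each kept fingerprint is stored `max_distance + 1` times) for far fewer Hamming comparisons when many distinct fingerprints are kept; with only a handful of kept items, the naive scan is just as fast.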