Python Hashes in Ruby
Author: Samuel Williams When: Monday, 20 April 2009
Python is a great language, and I use it frequently. However, I also use Ruby. One thing that recently became a problem was hashing. Python has a built-in function called hash which, given a string, produces a signed integer. Because one of my Python scripts used this value to decide where to store files in the file-system, I ran into a problem when I wanted to write a script in Ruby: I had no way to calculate the Python hash in Ruby.
The basic problem was as follows. Here is the working Python code.
path = "..."
cache_path = hash(path) % 255
Here is the Ruby code:
path = "..."
cache_path = ????(path) % 255
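One detail that does carry over unchanged is the % 255 step. The Python hash is signed, and both Python and Ruby define % so that the result takes the sign of the divisor, so a negative hash still lands in the range 0..254. A minimal sketch to illustrate (the sample values are arbitrary stand-ins, not real hash output):

```ruby
# Arbitrary sample values standing in for signed hash results.
samples = [12345, -12345, -1, 0]

samples.each do |h|
  # With a positive divisor, Ruby's % never returns a negative number.
  bucket = h % 255
  puts "#{h} % 255 = #{bucket}"
end
```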
Because the hash function is specific to Python, this was a bit of a problem. Fortunately, I found these pages explaining the Python hash function:
- Python Hash Algorithms - Python implementation of hash
- python hash() function - C implementation of hash
The only complicated part in porting this code to Ruby was the signed arithmetic required. Basically, the linked code relies on the behaviour of two's-complement overflow for 32-bit signed integers. Ruby integers grow to arbitrary precision instead of overflowing, and (as far as I know) Ruby has no direct facility for wrapping them, so I've added a function unsigned_to_signed which can do the conversion for arbitrarily sized integers.
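#!/usr/bin/env ruby

# Reinterpret an integer as a signed two's-complement value.
# mask is the all-ones value for the width, e.g. 0xFFFFFFFF for 32 bits.
def unsigned_to_signed(h, mask)
  # Reduce h into the unsigned range we are dealing with, i.e. [0, mask] inclusive.
  h = h % (mask + 1)

  # One past the maximum positive signed value, e.g. 2**31 for 32 bits.
  limit = (mask / 2) + 1

  # Values at or above the limit have the sign bit set, so wrap them around.
  h >= limit ? h - (mask + 1) : h
end

def signed_32bit_multiply(a, b)
  unsigned_to_signed(a * b, 0xFFFFFFFF)
end

class String
  # Python 2 string hash, as computed on builds where a C long is 32 bits.
  def python_hash
    return 0 if bytesize == 0

    # Work on byte values; String#[] only returned integers on Ruby 1.8.
    data = bytes.to_a
    sum = data[0] << 7

    data.each do |byte|
      sum = signed_32bit_multiply(1000003, sum) ^ byte
    end

    sum ^= data.size
    sum == -1 ? -2 : sum
  end
end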
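As an aside, the same 32-bit wrap-around can probably also be had from Ruby's built-in Array#pack and String#unpack, by writing the low 32 bits as an unsigned integer and reading them back as a signed one. This is only a sketch of an alternative, not what the code above does, and wrap_int32 is a hypothetical helper name.

```ruby
# Hypothetical helper: reinterpret the low 32 bits of n as a signed integer.
def wrap_int32(n)
  # "L" packs an unsigned 32-bit integer and "l" unpacks a signed one,
  # both in native byte order, so the bytes round-trip unchanged.
  [n & 0xFFFFFFFF].pack("L").unpack("l").first
end

puts wrap_int32(0x7FFFFFFF)      # largest positive value, left as-is
puts wrap_int32(0x80000000)      # sign bit set, becomes negative
puts wrap_int32(1000003 * 12416) # a product that overflows 32 bits
```

Unlike unsigned_to_signed, this version is fixed at 32 bits, so it trades generality for brevity.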
This is a simple program you can use to pick out some random words and compare the hash values between Python and Ruby.
def check_hash(s)
  puts " Hashing #{s.dump} ".center(64, "-")

  # Assumes `python` is Python 2 with a 32-bit C long; other builds hash differently.
  # The word is passed as an argument so that no shell quoting is needed.
  command = ["python", "-c", "import sys; print hash(sys.argv[1])", s]
  py_val = IO.popen(command) { |io| io.read }.strip.to_i
  puts "Python hash value is: ".rjust(25) + " #{py_val}"

  rb_val = s.python_hash
  puts "Our hash value is: ".rjust(25) + " #{rb_val}"

  if py_val != rb_val
    puts " ERROR HASH VALUE MISMATCH ".center(64, "*")
  end
end

# One word per line.
dict = File.readlines("/usr/share/dict/words")

10.times do
  word = dict[rand(dict.size)].strip
  check_hash(word)
end
This code will be helpful for what I am trying to achieve. However, if you are storing hash values, make sure you use standard functions such as MD5, SHA1, etc. That way, the logic is easy to port across multiple platforms and code bases.
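For comparison, here is a rough sketch of the same kind of bucket calculation built on MD5 through Ruby's standard digest library. The bucket_for name and the example path are made up for illustration; the point is only that any language with an MD5 implementation should be able to reproduce the result.

```ruby
require "digest/md5"

# Hypothetical helper: map a path to one of `buckets` cache directories.
def bucket_for(path, buckets = 255)
  # hexdigest returns 32 hex characters; read them as one large integer.
  Digest::MD5.hexdigest(path).to_i(16) % buckets
end

puts bucket_for("/some/example/path")
```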