Sending Python Objects through Space#
What’s the best way to send someone an instance of a Python object over the internet?
As part of HackThisAI, players may be challenged to steal or invert machine learning models. To check their solution, I had to be able to import and compare their model objects with mine. In figuring out the best way to do it, I learned a bit about serialization and namespaces.
Setup#
Let’s assume the user wants to create and send us this object:
from sklearn.tree import DecisionTreeClassifier

class Star_Destroyer:
    def __init__(self):
        self.ammo = 100
        self.cls = DecisionTreeClassifier()

    def shoot_laser(self):
        self.ammo -= 1

    def train_model(self):
        return self.cls.get_depth()
Furthermore, since the players are going to want to submit trained models, I needed to support importing not just a Class, but specifically an instance of the Class. So let’s shoot the laser, then save and submit the model.
sd = Star_Destroyer()
sd.shoot_laser()
How should we serialize sd to send it?
Pickle#
Pickle is the standard way to do this.
It’s worth reading the first several paragraphs of the pickle documentation. They discuss marshalling, security and binary vs. text serialization.
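The security point deserves a concrete picture, since a CTF means loading files from strangers. Here is a minimal sketch of why the documentation warns against unpickling untrusted data. The class name Payload is hypothetical, and a harmless eval of arithmetic stands in for whatever an attacker would actually run:

```python
import pickle


class Payload:
    """Hypothetical stand-in for a malicious submission."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments" and
        # performs the call at load time. A harmless expression is used
        # here; an attacker could name any importable callable instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not return a Payload instance. It returns whatever the
# recorded call produced, which means code ran during pickle.loads().
result = pickle.loads(blob)
print(result)
```

None of this changes how the file is written. It only matters on the receiving end, where submissions from players are untrusted.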
import pickle
with open("setup/thing.pkl", "wb") as f:
    pickle.dump(sd, f)
After saving the item to a file, the player can send it using normal HTTP methods. After we receive it, we’d expect to be able to load and use the pickle.
import pickle
with open("setup/thing.pkl", "rb") as f:
    sd = pickle.load(f)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_1918/2159759763.py in <module>
2
3 with open("setup/thing.pkl", "rb") as f:
----> 4 sd = pickle.load(f)
AttributeError: Can't get attribute 'Star_Destroyer' on <module '__main__'>
Hmm, what do we make of that error? To Stack Overflow!
Remember that pickle doesn’t actually store information about how a class/object is constructed, and needs access to the class when unpickling.
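That claim is easy to check by looking inside the byte stream. The sketch below uses a trimmed-down copy of the class (no scikit-learn attribute, to keep it stdlib-only) and shows that the pickle carries the module name, the class name, and the instance state, but none of the method code:

```python
import pickle
import pickletools


class Star_Destroyer:
    def __init__(self):
        self.ammo = 100

    def shoot_laser(self):
        self.ammo -= 1


sd = Star_Destroyer()
sd.shoot_laser()
payload = pickle.dumps(sd)

# The class travels only as a (module, name) reference.
has_module = Star_Destroyer.__module__.encode() in payload
has_class_name = b"Star_Destroyer" in payload

# Instance state (the attribute dict) is stored by value.
has_state = b"ammo" in payload

# Method names and bodies are not in the stream at all.
has_method = b"shoot_laser" in payload

print(has_module, has_class_name, has_state, has_method)

# Disassemble the stream to see the opcodes that name the class.
pickletools.dis(payload)
```

On the receiving side, the unpickler has to import that module and look up that name, which is exactly the lookup that fails in the traceback above.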
Okay… it looks like we need to import our class from a module and dump it from a helper script.
A helper might look like this:
import pickle
from example import Star_Destroyer
sd = Star_Destroyer()
sd.shoot_laser()
with open("dumped_thing.pkl", "wb") as f:
    pickle.dump(sd, f)
Does this new object perform better after being sent across the internet?
import pickle
with open("setup/dumped_thing.pkl", "rb") as f:
    sd = pickle.load(f)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-4-13011f6ecc9c> in <module>
2
3 with open("setup/dumped_thing.pkl", "rb") as f:
----> 4 sd = pickle.load(f)
ModuleNotFoundError: No module named 'example'
Huh, still doesn’t work. It looks like the pickled object still references the old namespace. Several of the solutions presented in the Stack Overflow link discuss writing a custom unpickler. However, this technique still just redirects import paths. That won’t work for the CTF because we don’t know what namespaces existed in the players’ context. Furthermore, we might not even have access to their various imports. We need something totally independent.
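For reference, the custom-unpickler approach looks roughly like the sketch below. It is not code from the Stack Overflow answers: the RedirectUnpickler name and the REDIRECTS table are hypothetical, and the player's "example" module is simulated by registering a temporary fake module while the pickle is written. The part that matters is the override of pickle.Unpickler.find_class:

```python
import io
import pickle
import sys
import types


class Star_Destroyer:
    def __init__(self):
        self.ammo = 100


# Simulate the player's machine: while dumping, the class appears to
# live in a module named "example" (assumption made for this demo).
fake = types.ModuleType("example")
fake.Star_Destroyer = Star_Destroyer
original_module = Star_Destroyer.__module__
Star_Destroyer.__module__ = "example"
sys.modules["example"] = fake
payload = pickle.dumps(Star_Destroyer())

# Back on "our" machine: the module is gone again.
del sys.modules["example"]
Star_Destroyer.__module__ = original_module

# A plain load fails because the referenced module cannot be resolved.
try:
    pickle.loads(payload)
    plain_load_failed = False
except (ImportError, AttributeError):
    plain_load_failed = True


class RedirectUnpickler(pickle.Unpickler):
    """Map (module, name) pairs from the sender onto local classes."""

    REDIRECTS = {("example", "Star_Destroyer"): Star_Destroyer}

    def find_class(self, module, name):
        target = self.REDIRECTS.get((module, name))
        if target is not None:
            return target
        # Anything not in the table is resolved the normal way.
        return super().find_class(module, name)


restored = RedirectUnpickler(io.BytesIO(payload)).load()
print(plain_load_failed, type(restored).__name__, restored.ammo)
```

The limitation is visible in the REDIRECTS table: the receiver has to know the sender's module and class names in advance and already hold a matching class definition, which is exactly what is missing in the CTF setting.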
Joblib#
Joblib is another great library for binary serialization. In fact, it’s recommended by Scikit-Learn because it is “more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators”.
import joblib
with open("setup/thing.joblib", "rb") as f:
    sd = joblib.load(f)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-e45c333afb57> in <module>
2
3 with open("setup/thing.joblib", "rb") as f:
----> 4 sd = joblib.load(f)
~/anaconda3/lib/python3.8/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
575 filename = getattr(fobj, 'name', '')
576 with _read_fileobject(fobj, filename, mmap_mode) as fobj:
--> 577 obj = _unpickle(fobj)
578 else:
579 with open(filename, 'rb') as f:
~/anaconda3/lib/python3.8/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
504 obj = None
505 try:
--> 506 obj = unpickler.load()
507 if unpickler.compat_mode:
508 warnings.warn("The file '%s' has been generated with a "
~/anaconda3/lib/python3.8/pickle.py in load(self)
1208 raise EOFError
1209 assert isinstance(key, bytes_types)
-> 1210 dispatch[key[0]](self)
1211 except _Stop as stopinst:
1212 return stopinst.value
~/anaconda3/lib/python3.8/pickle.py in load_stack_global(self)
1533 if type(name) is not str or type(module) is not str:
1534 raise UnpicklingError("STACK_GLOBAL requires str")
-> 1535 self.append(self.find_class(module, name))
1536 dispatch[STACK_GLOBAL[0]] = load_stack_global
1537
~/anaconda3/lib/python3.8/pickle.py in find_class(self, module, name)
1577 __import__(module, level=0)
1578 if self.proto >= 4:
-> 1579 return _getattribute(sys.modules[module], name)[0]
1580 else:
1581 return getattr(sys.modules[module], name)
~/anaconda3/lib/python3.8/pickle.py in _getattribute(obj, name)
329 obj = getattr(obj, subpath)
330 except AttributeError:
--> 331 raise AttributeError("Can't get attribute {!r} on {!r}"
332 .format(name, obj)) from None
333 return obj, parent
AttributeError: Can't get attribute 'Star_Destroyer' on <module '__main__'>
Unfortunately, while it is more efficient, “joblib.dump() and joblib.load() are based on the Python pickle serialization model”, so it fails in exactly the same way. We won’t find our solution here.
Dill#
In the shadows of the Stack Overflow replies, you’ll see references to dill. I’m usually reluctant to add other dependencies, but was desperate enough to give this a try. I was particularly encouraged by their description:
In addition to pickling python objects, dill provides the ability to save the state of an interpreter session in a single command. Hence, it would be feasable to save an interpreter session, close the interpreter, ship the pickled file to another computer, open a new interpreter, unpickle the session and thus continue from the ‘saved’ state of the original interpreter session.
dill can be used to store python objects to a file, but the primary usage is to send python objects across the network as a byte stream. dill is quite flexible, and allows arbitrary user defined classes and functions to be serialized.
import dill
with open("setup/thing.dill", "rb") as f:
    sd = dill.load(f)
print(sd.ammo)
99
That’s what we wanted to see! We can import the Star_Destroyer object and it even has the state it had when it was exported (we’d fired the laser once).
Conclusion#
Is there a better way to do this? I’m all ears.
If I had players submit whole .py files, we would still have dependency challenges.
I think the “best” solution would probably be to have the players expose an API that I can query to compare the models. However, this requires a bit more development from the players and increases the barrier to entry.