Sending Python Objects through Space


What’s the best way to send someone an instance of a Python object over the internet?

As part of HackThisAI, players may be challenged to steal or invert machine learning models. To check their solution, I had to be able to import and compare their model objects with mine. In figuring out the best way to do it, I learned a bit about serialization and namespaces.

Setup

Let’s assume the user wants to create and send us this object:

from sklearn.tree import DecisionTreeClassifier

class Star_Destroyer:
    def __init__(self):
        self.ammo = 100
        self.cls = DecisionTreeClassifier()

    def shoot_laser(self):
        self.ammo -= 1

    def train_model(self):
        return self.cls.get_depth()

Furthermore, since the players are going to want to submit trained models, I needed to support importing not just a class, but an instance of that class with its state intact. So let’s shoot the laser, then save and submit the model.

sd = Star_Destroyer()
sd.shoot_laser()

How should we serialize sd to send it?

Pickle

Pickle is the standard way to do this.


It’s worth reading the first several paragraphs of the pickle documentation. They discuss marshalling, security and binary vs. text serialization.


import pickle

with open("setup/thing.pkl", "wb") as f:
    pickle.dump(sd, f)

After saving the item to a file, the player can send it using normal HTTP methods. After we receive it, we’d expect to be able to load and use the pickle.

import pickle

with open("setup/thing.pkl", "rb") as f:
    sd = pickle.load(f)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_1918/2159759763.py in <module>
      2 
      3 with open("setup/thing.pkl", "rb") as f:
----> 4     sd = pickle.load(f)

AttributeError: Can't get attribute 'Star_Destroyer' on <module '__main__'>

Hmm, what do we make of that error? To Stack Overflow!

Remember that pickle doesn’t actually store information about how a class/object is constructed, and needs access to the class when unpickling.
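To see what that means in practice, here is a minimal sketch using only the standard library. The module name `player_code` and the trimmed-down class (no scikit-learn) are hypothetical stand-ins for a player's environment, not part of the original setup. It checks what actually ends up in the byte stream, then tries to load it once the module is no longer importable.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the player's code, loaded as a module named "player_code"
source = '''
class Star_Destroyer:
    def __init__(self):
        self.ammo = 100

    def shoot_laser(self):
        self.ammo -= 1
'''
player_code = types.ModuleType("player_code")
exec(source, player_code.__dict__)
sys.modules["player_code"] = player_code

sd = player_code.Star_Destroyer()
sd.shoot_laser()
payload = pickle.dumps(sd)

# The stream names the module and the class, and carries the instance state...
has_names = b"player_code" in payload and b"Star_Destroyer" in payload
has_state = b"ammo" in payload
# ...but none of the class body, such as its method names
has_methods = b"shoot_laser" in payload

# Once the module is gone (as on the receiving machine), loading fails
del sys.modules["player_code"]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

print(has_names, has_state, has_methods, type(load_error).__name__)
# → True True False ModuleNotFoundError
```

The pickle is essentially "find this name in this module, then restore these attributes", so the receiver needs the same name to be importable from the same place.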

Okay… it looks like maybe we need to import our class from a module and dump it from a helper.

A helper might look like this:

import pickle

from example import Star_Destroyer

sd = Star_Destroyer()
sd.shoot_laser()

with open("dumped_thing.pkl", "wb") as f:
    pickle.dump(sd, f)

Does this new object perform better after being sent across the internet?

import pickle

with open("setup/dumped_thing.pkl", "rb") as f:
    sd = pickle.load(f)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-13011f6ecc9c> in <module>
      2 
      3 with open("setup/dumped_thing.pkl", "rb") as f:
----> 4     sd = pickle.load(f)

ModuleNotFoundError: No module named 'example'

Huh, still doesn’t work. It looks like the pickled object still references the namespace it was dumped from. Several of the solutions presented in the Stack Overflow link discuss writing a custom unpickler. However, that technique only redirects import paths. It won’t work for the CTF because we don’t know what namespaces existed in the players’ context. Furthermore, we might not even have access to their various imports. We need something totally independent.
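For reference, a minimal sketch of that redirection idea, built on the documented `pickle.Unpickler.find_class` hook. The `RedirectUnpickler` name and the simulated `example` module are illustrative assumptions, and the class is trimmed down to keep the sketch self-contained.

```python
import io
import pickle
import sys
import types

# Player's side (simulated): the class lives in a module named "example"
example = types.ModuleType("example")
exec(
    "class Star_Destroyer:\n"
    "    def __init__(self):\n"
    "        self.ammo = 100\n",
    example.__dict__,
)
sys.modules["example"] = example
payload = pickle.dumps(example.Star_Destroyer())
del sys.modules["example"]  # our side has no "example" module


# Our side: we must already have an equivalent class to redirect to
class Star_Destroyer:
    def __init__(self):
        self.ammo = 100


class RedirectUnpickler(pickle.Unpickler):
    """Map the player's import path onto a class we define locally."""

    def find_class(self, module, name):
        if (module, name) == ("example", "Star_Destroyer"):
            return Star_Destroyer
        return super().find_class(module, name)


sd = RedirectUnpickler(io.BytesIO(payload)).load()
print(type(sd) is Star_Destroyer, sd.ammo)
# → True 100
```

The redirect only works because we wrote our own `Star_Destroyer` and knew the player's module path in advance, which is exactly the information we don't have in the CTF.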

Joblib

Joblib is another great library for binary serialization. In fact, it’s recommended by Scikit-Learn because it is “more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators”.

import joblib

with open("setup/thing.joblib", "rb") as f:
    sd = joblib.load(f)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-e45c333afb57> in <module>
      2 
      3 with open("setup/thing.joblib", "rb") as f:
----> 4     sd = joblib.load(f)

~/anaconda3/lib/python3.8/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
    575         filename = getattr(fobj, 'name', '')
    576         with _read_fileobject(fobj, filename, mmap_mode) as fobj:
--> 577             obj = _unpickle(fobj)
    578     else:
    579         with open(filename, 'rb') as f:

~/anaconda3/lib/python3.8/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
    504     obj = None
    505     try:
--> 506         obj = unpickler.load()
    507         if unpickler.compat_mode:
    508             warnings.warn("The file '%s' has been generated with a "

~/anaconda3/lib/python3.8/pickle.py in load(self)
   1208                     raise EOFError
   1209                 assert isinstance(key, bytes_types)
-> 1210                 dispatch[key[0]](self)
   1211         except _Stop as stopinst:
   1212             return stopinst.value

~/anaconda3/lib/python3.8/pickle.py in load_stack_global(self)
   1533         if type(name) is not str or type(module) is not str:
   1534             raise UnpicklingError("STACK_GLOBAL requires str")
-> 1535         self.append(self.find_class(module, name))
   1536     dispatch[STACK_GLOBAL[0]] = load_stack_global
   1537 

~/anaconda3/lib/python3.8/pickle.py in find_class(self, module, name)
   1577         __import__(module, level=0)
   1578         if self.proto >= 4:
-> 1579             return _getattribute(sys.modules[module], name)[0]
   1580         else:
   1581             return getattr(sys.modules[module], name)

~/anaconda3/lib/python3.8/pickle.py in _getattribute(obj, name)
    329             obj = getattr(obj, subpath)
    330         except AttributeError:
--> 331             raise AttributeError("Can't get attribute {!r} on {!r}"
    332                                  .format(name, obj)) from None
    333     return obj, parent

AttributeError: Can't get attribute 'Star_Destroyer' on <module '__main__'>

Unfortunately, while it is more efficient, “joblib.dump() and joblib.load() are based on the Python pickle serialization model”, so it fails in exactly the same way. We won’t find our solution here.

Dill

In the shadows of the Stack Overflow replies, you’ll see references to dill. I’m usually reluctant to add other dependencies, but I was desperate enough to give this a try. I was particularly encouraged by their description:

In addition to pickling python objects, dill provides the ability to save the state of an interpreter session in a single command. Hence, it would be feasible to save an interpreter session, close the interpreter, ship the pickled file to another computer, open a new interpreter, unpickle the session and thus continue from the ‘saved’ state of the original interpreter session.

dill can be used to store python objects to a file, but the primary usage is to send python objects across the network as a byte stream. dill is quite flexible, and allows arbitrary user defined classes and functions to be serialized.

import dill

with open("setup/thing.dill", "rb") as f:
    sd = dill.load(f)
print(sd.ammo)
99

That’s what we wanted to see! We can import the Star_Destroyer object and it even has the state it had when it was exported (we’d fired the laser once).
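For completeness, a minimal sketch of the full round trip, including the sending side. It assumes dill is installed and uses a trimmed-down `Star_Destroyer` without the scikit-learn classifier; `dill.dumps` and `dill.loads` are the in-memory counterparts of `dill.dump` and `dill.load`. As far as I understand, dill stores the class definition itself when the class is defined in `__main__` (a notebook or script), and falls back to a pickle-style reference for classes that live in an importable module, so results may differ if the class comes from a package.

```python
import dill


# Simplified version of the class from the setup (no scikit-learn dependency)
class Star_Destroyer:
    def __init__(self):
        self.ammo = 100

    def shoot_laser(self):
        self.ammo -= 1


sd = Star_Destroyer()
sd.shoot_laser()

# Player's side: serialize the instance to bytes
payload = dill.dumps(sd)

# Our side: rebuild the instance from those bytes
restored = dill.loads(payload)
print(restored.ammo)
# → 99
```

In the CTF the bytes would be written to a file and uploaded rather than kept in memory, but the calls are otherwise the same.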

Conclusion

Is there a better way to do this? I’m all ears.

If I had players submit whole .py files, we would still have dependency challenges.

I think the “best” solution would probably be to have the players expose an API that I can query to compare the models. However, this requires a bit more development from the players and increases the barrier to entry.