Should I Use Spype?

WARNING: spype is brand new. Don’t use it in production until it matures a little.

The following questions will help you decide if Spype is the right library for you, or if you are better off using something else.

Can you use Python 3.6 or greater?

Spype only runs on Python 3.6+, so if you are still stuck on 2.7 you cannot use spype. Also, I feel for you.
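If you are not sure which interpreter a given environment uses, a quick check with the standard library will tell you. This is only a minimal sketch: `MIN_VERSION` and `meets_minimum` are hypothetical names chosen for illustration, not part of spype.

```python
import sys

# Minimum interpreter version stated above (hypothetical constant name).
MIN_VERSION = (3, 6)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    print("This interpreter is older than Python 3.6.")
```

Passing a version tuple explicitly, for example `meets_minimum((2, 7))`, lets you run the same check against a version other than the current interpreter.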

Does your data fit on a single machine?

Although spype is designed to play nicely with multiprocessing and multithreading, it is not really designed to run across a network. If the data you are trying to process do not fit on a single machine, I recommend you look elsewhere.

Do you want to limit external dependencies?

Spype does not have any required external dependencies; it runs on pure Python.

Do you value expressiveness and maintainability over short execution time?

Spype provides a rich, concise API and several features to help you discipline your data flows. However, these features come at a performance cost that may or may not be significant depending on your application.

Do you want to “push” or “pull” your data?

If you have a piece of data and you can describe the steps that need to be performed on it, you probably want to use a pipeline system that implements a “push” paradigm (like spype). If, however, you can describe the data you want to generate and the steps required to do so, you probably want a “pull” system (like dask’s custom graphs).
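The difference is easier to see side by side. The sketch below uses plain Python only; it does not use spype's or dask's API, and every name in it (`push`, `pull`, `graph`, the step functions) is hypothetical. In the push version you start with the data and feed it forward through the steps. In the pull version you name the result you want and its dependencies are computed on demand.

```python
def strip(text):
    """Remove surrounding whitespace."""
    return text.strip()


def upper(text):
    """Convert text to upper case."""
    return text.upper()


def push(data, steps):
    """Push: start from the data and send it through each step in order."""
    for step in steps:
        data = step(data)
    return data


def pull(target, graph, cache=None):
    """Pull: ask for a named result; compute its dependencies on demand."""
    cache = {} if cache is None else cache
    if target not in cache:
        func, deps = graph[target]
        args = [pull(dep, graph, cache) for dep in deps]
        cache[target] = func(*args)
    return cache[target]


# Each entry maps a result name to (function, names of its inputs).
graph = {
    "raw": (lambda: "  hello  ", ()),
    "clean": (strip, ("raw",)),
    "loud": (upper, ("clean",)),
}

pushed = push("  hello  ", [strip, upper])
pulled = pull("loud", graph)
```

Both calls produce the same string here. What differs is the starting point: `push` begins with an input and a list of steps, while `pull` begins with the name of the desired output and walks backwards through the graph.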

Similar Projects

The most similar project to spype that I have encountered is consecution. It looks like an excellent package; be sure to take a look at it.

There are also other libraries that somewhat fit into the same space as spype. Here are a few (in no particular order): luigi, airflow, pinball, dagobah, celery, dask, streamz.

There is also an awesome list of data pipelines you should check out.