About Python in general
Python is a rapidly evolving, open-source, general-purpose, object-oriented programming language. Its syntax is simple and easy to learn. Its large collection of libraries and modules, together with easy invocation of external programs, makes it flexible to use on all major operating systems. Beyond mathematical, graphical, and statistical applications, the add-on packages give you access to a wide range of data mining algorithms: the classic segmentation and classification algorithms and, thanks to an enthusiastic community of developers, the latest Machine Learning algorithms as well. For this reason, Python is becoming increasingly popular in Data Science.
Python in SPSS Modeler
In IBM SPSS Modeler, a script can create nodes within a stream, modify their properties, and run those nodes that are runnable. This is done with a so-called Stream script. To manipulate the streams themselves (create them, modify them, pass parameters between them, and so on), you can use a Standalone script. Both are especially useful when you want to automate processes.
Before version 16, the only scripting language in Modeler was the traditional (Legacy) scripting language, written specifically for Modeler. From version 16 onwards, you can also write scripts in Python (strictly speaking Jython), which is now the default and supported scripting language in Modeler.
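A Stream script of the kind described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `build_and_run`, the node labels, the canvas coordinates, and the file path are made up for illustration. The calls `createAt`, `setPropertyValue`, `link`, and `run` are assumed to behave as described in Modeler's Python scripting guide; check the guide for your version before relying on them.

```python
def build_and_run(stream, data_path):
    """Create a source node and a Table node, link them, and run the Table node.

    `stream` is assumed to be a Modeler stream object; the function name and
    all labels, coordinates, and paths are illustrative only.
    """
    # createAt(node type, node label, x position, y position)
    source = stream.createAt("variablefile", "Input data", 96, 96)
    # Modify a property of the new node: the file it should read.
    source.setPropertyValue("full_filename", data_path)

    table = stream.createAt("table", "Preview", 224, 96)
    stream.link(source, table)

    # Only terminal nodes such as Table can be run; any outputs they
    # produce are appended to the list passed in.
    results = []
    table.run(results)
    return results


# Inside a Modeler Stream script the call would look like the line below.
# It is commented out because the `modeler` module exists only inside Modeler,
# and the path is a placeholder.
# build_and_run(modeler.script.stream(), "C:/data/customers.csv")
```

Keeping the logic in a function that receives the stream as an argument makes the same code easy to reuse from a Standalone script, which works with several streams.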
The next milestone in Modeler's Python integration, from version 17.1, was the ability to create custom nodes written in Python using the Custom Node Dialog Builder. From version 18.1 onwards, you do not even need the Custom Node Dialog Builder for this: you can use the Extension nodes in the palettes. The basic Python libraries are embedded in Modeler, along with several packages popular among data miners (e.g. numpy, pandas, scipy, xgboost, matplotlib, imblearn). Data manipulation and parallel execution for large data sets are supported by Spark (pyspark), which is also embedded. The Spark Machine Learning Library can therefore be used as well, complementing the algorithms that already exist in Modeler.
Note that creating Custom nodes and the Scripting described earlier are independent functions that can be used side by side. Scripting uses Jython (the Java implementation of Python), whereas Custom nodes use CPython.
The Python integration in Modeler does not stop there: Python-based nodes are constantly being added to the program, e.g. SMOTE, XGBoost, and One-Class SVM. The Python packages used by the Data Science community are also evolving and new ones are being created, so we expect to see new Python-related features in Modeler in the coming years.
Benefits
Because Python is open source and widely used, many examples and training materials are available online. By importing external modules, you can implement solutions that were not previously available in Modeler. You can define your own functions and object classes, and Python supports exception-based error handling. Python nodes created using the Custom Node Dialog Builder can be shared with users who are not familiar with Python, allowing them to use the new features without learning the language.
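To illustrate what defining your own functions and classes with exception-based error handling can look like, here is a small sketch in plain Python. All names in it (`EmptyDataError`, `ColumnSummary`, `summarize`) are invented for this example and are not part of Modeler. The code is written so that it should run under both Jython 2.7 and CPython 3.

```python
class EmptyDataError(ValueError):
    """Raised when a column contains no values (illustrative name)."""


class ColumnSummary(object):
    """Holds simple statistics for one numeric column."""

    def __init__(self, name, values):
        if not values:
            raise EmptyDataError("column '%s' has no values" % name)
        self.name = name
        self.minimum = min(values)
        self.maximum = max(values)
        self.mean = sum(values) / float(len(values))


def summarize(name, values):
    """Return a ColumnSummary, or None if the column cannot be summarized."""
    try:
        return ColumnSummary(name, values)
    except EmptyDataError as err:
        # Handle the problem here instead of letting the whole script fail.
        print("Skipping column: %s" % err)
        return None


summary = summarize("age", [23, 35, 41])  # a ColumnSummary object
missing = summarize("income", [])         # None, after printing a message
```

Raising a dedicated exception and catching it in one place keeps the calling script short: it only has to check whether a summary was returned.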
Summary
In summary, the integration of Python opens up a whole range of uses that were not previously available in Modeler. In addition, users who are not familiar with Python can also take advantage of the extended functionality.