天天看点

sklearn决策树可视化

Step 1:Export a decision tree in DOT format.

https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz

>>> from sklearn.datasets import load_iris
>>> from sklearn import tree

>>> clf = tree.DecisionTreeClassifier()
>>> iris = load_iris()

>>> clf = clf.fit(iris.data, iris.target)
>>> tree.export_graphviz(clf,
...     out_file='tree.dot')            

Step 2:

Set export_graphviz out_file='None', use pydotplus.graph_from_dot_data to convert dot to graph. 

https://pythonprogramminglanguage.com/decision-tree-visual-example/

# Visualize data
dot_data = tree.export_graphviz(clf,
                                feature_names=data_feature_names,
                                out_file=None,
                                filled=True,
                                rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)

colors = ('turquoise', 'orange')
edges = collections.defaultdict(list)

for edge in graph.get_edge_list():
    edges[edge.get_source()].append(int(edge.get_destination()))

for edge in edges:
    edges[edge].sort()    
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        dest.set_fillcolor(colors[i])

graph.write_png('tree.png')           
sklearn决策树可视化

More details go to : https://scikit-learn.org/stable/modules/tree.html#classification

DecisionTreeClassifier

 is a class capable of performing multi-class classification on a dataset.

As with other classifiers, 

DecisionTreeClassifier

 takes as input two arrays: an array X, sparse or dense, of size 

[n_samples, n_features]

 holding the training samples, and an array Y of integer values, size 

[n_samples]

, holding the class labels for the training samples:

>>> from sklearn import tree >>> X = [[0, 0], [1, 1]] >>> Y = [0, 1] >>> clf = tree.DecisionTreeClassifier() >>> clf = clf.fit(X, Y)

After being fitted, the model can then be used to predict the class of samples:

>>> clf.predict([[2., 2.]]) array([1])

Alternatively, the probability of each class can be predicted, which is the fraction of training samples of the same class in a leaf:

>>> clf.predict_proba([[2., 2.]]) array([[0., 1.]])

DecisionTreeClassifier

 is capable of both binary (where the labels are [-1, 1]) classification and multiclass (where the labels are [0, …, K-1]) classification.

Using the Iris dataset, we can construct a tree as follows:

>>> from sklearn.datasets import load_iris >>> from sklearn import tree >>> iris = load_iris() >>> clf = tree.DecisionTreeClassifier() >>> clf = clf.fit(iris.data, iris.target)

Once trained, we can export the tree in Graphviz format using the 

export_graphviz

 exporter. If you use the conda package manager, the graphviz binaries and the python package can be installed with

conda install python-graphviz

Alternatively binaries for graphviz can be downloaded from the graphviz project homepage, and the Python wrapper installed from pypi with pip install graphviz.

Below is an example graphviz export of the above tree trained on the entire iris dataset; the results are saved in an output file iris.pdf:

>>> import graphviz >>> dot_data = tree.export_graphviz(clf, out_file=None) >>> graph = graphviz.Source(dot_data) >>> graph.render("iris")

The 

export_graphviz

 exporter also supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. Jupyter notebooks also render these plots inline automatically:

>>> dot_data = tree.export_graphviz(clf, out_file=None, ... feature_names=iris.feature_names, ... class_names=iris.target_names, ... filled=True, rounded=True, ... special_characters=True) >>> graph = graphviz.Source(dot_data) >>> graph

继续阅读