
kube-proxy source code walkthrough: a plain-language look at how Kubernetes Services work

In Kubernetes, a Service is an abstraction over a set of Pods that share the same labels; it gives that logical group of Pods a consistent access policy, and can be thought of as a micro-service.

Let's first look at how a Service is described as a Kubernetes resource:

services:

apiVersion: v1
kind: Service
metadata:
  name: playmate-model
  namespace: rcmd
spec:
  clusterIP: 10.247.168.174
  ports:
  - name: grpc
    port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: playmate-model
  sessionAffinity: None
  type: ClusterIP           

We can see that the Service only provides a clusterIP and a selector, so how does it forward traffic to the corresponding Pods? This is where another Kubernetes resource comes in: Endpoints. We can look at an Endpoints object with

kubectl get endpoints <name> -o yaml

which gives us:

endpoints:

apiVersion: v1
kind: Endpoints
metadata:
  name: playmate-model
  namespace: rcmd
subsets:
- addresses:
  - ip: 10.0.2.137
    nodeName: 10.213.20.91
    targetRef:
      kind: Pod
      name: playmate-model-8cff8978f-mwtkt
      namespace: rcmd
      resourceVersion: "17169006"
      uid: 86898a37-4ff5-441d-a863-c70fa65d78f0
  notReadyAddresses:
  - ip: 10.0.2.130
    nodeName: 10.213.20.92
    targetRef:
      kind: Pod
      name: playmate-model-89c8d5f8f-t2r5n
      namespace: rcmd
      resourceVersion: "15708217"
      uid: 6884d0c3-e37f-4907-9b43-2903224f7773
  ports:
  - name: grpc
    port: 8000
    protocol: TCP           

As you can see, the simplest Service consists of a clusterIP, an exposed port, and a selector, while the Endpoints object holds the IP addresses of the backend targets along with information about the Pods behind them.

Normally we do not create Endpoints by hand when creating a Service: the cluster tries to find the Pods matching the selector and then creates an Endpoints object backed by those Pods. The flow looks like this:

[Figure: the Service's selector is matched against Pods to build its Endpoints object]
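
To make that flow concrete, here is a minimal client-go sketch (an illustration only, not kube-proxy or controller source; it assumes a recent client-go where List takes a context, and in-cluster credentials) that performs the same lookup the endpoints controller does: list the Pods matched by the Service's selector and collect their IPs.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// selector taken from the Service above: app=playmate-model
	pods, err := client.CoreV1().Pods("rcmd").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=playmate-model"})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		// ready Pods end up in subsets[].addresses, unready ones in notReadyAddresses
		fmt.Println(p.Name, p.Status.PodIP)
	}
}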

Of course we can also create Endpoints manually. For example, to give an external service in-cluster service discovery, we can define both the Service and the Endpoints ourselves:

apiVersion: v1
kind: Service
metadata:
  name: hbase-broker-1
  namespace: rcmd
spec:
  clusterIP: 10.247.180.39
  ports:
  - port: 2181
    protocol: TCP
    targetPort: 2181
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: hbase-broker-1
  namespace: rcmd
subsets:
- addresses:
  - ip: 10.10.14.115 # external service address
  ports:
  - port: 2181
    protocol: TCP           

How Services work, as seen from kube-proxy

kube-proxy is the component that actually implements Services and Endpoints on every node. Like kubelet, it runs one instance per node, and it provides simple TCP, UDP, and SCTP forwarding for Services, sending traffic on to the corresponding targets (the Endpoints). Walking through the kube-proxy source is a good way to understand how Services and Endpoints really work.

The code below is based on release-1.17, commit sha 15600ef9855dbdfd6e07f6a76e660b1b6030387e.

Let's start from cmd/kube-proxy/proxy.go:

...
func main() {
    ...
    command := app.NewProxyCommand()
    ...
    if err := command.Execute(); err != nil {
        os.Exit(1)
    }
}           

This is the standard skeleton of a Kubernetes command: declare the command and call Execute (anyone who has read k8s command source will recognize this pattern ~:). Let's look at the NewProxyCommand function:

// NewProxyCommand creates a *cobra.Command object with default parameters
func NewProxyCommand() *cobra.Command {
    opts := NewOptions()

    cmd := &cobra.Command{
        Use: "kube-proxy",
        Long: `The Kubernetes network proxy runs on each node. This
reflects services as defined in the Kubernetes API on each node and can do simple
TCP, UDP, and SCTP stream forwarding or round robin TCP, UDP, and SCTP forwarding across a set of backends.
Service cluster IPs and ports are currently found through Docker-links-compatible
environment variables specifying ports opened by the service proxy. There is an optional
addon that provides cluster DNS for these cluster IPs. The user must create a service
with the apiserver API to configure the proxy.`,
        Run: func(cmd *cobra.Command, args []string) {
            verflag.PrintAndExitIfRequested()
            cliflag.PrintFlags(cmd.Flags())

            if err := initForOS(opts.WindowsService); err != nil {
                klog.Fatalf("failed OS init: %v", err)
            }

            if err := opts.Complete(); err != nil {
                klog.Fatalf("failed complete: %v", err)
            }
            if err := opts.Validate(); err != nil {
                klog.Fatalf("failed validate: %v", err)
            }
            // execution happens here -->
            if err := opts.Run(); err != nil {
                klog.Exit(err)
            }
        },
        Args: func(cmd *cobra.Command, args []string) error {
            for _, arg := range args {
                if len(arg) > 0 {
                    return fmt.Errorf("%q does not take any arguments, got %q", cmd.CommandPath(), args)
                }
            }
            return nil
        },
    }
    ...
    return cmd
}

// Run runs the specified ProxyServer.
func (o *Options) Run() error {
    defer close(o.errCh)
    if len(o.WriteConfigTo) > 0 {
        return o.writeConfigFile()
    }
    // create a ProxyServer object
    proxyServer, err := NewProxyServer(o)
    if err != nil {
        return err
    }

    if o.CleanupAndExit {
        return proxyServer.CleanupAndExit()
    }

    o.proxyServer = proxyServer
    // run the ProxyServer
    return o.runLoop()
}

// runLoop will watch on the update change of the proxy server's configuration file.
// Return an error when updated
func (o *Options) runLoop() error {
    if o.watcher != nil {
        o.watcher.Run()
    }

    // run the proxy in goroutine
    go func() {
        err := o.proxyServer.Run()
        o.errCh <- err
    }()

    for {
        err := <-o.errCh
        if err != nil {
            return err
        }
    }
}           

cobra.Command is the standard Kubernetes command-line struct, so we can go straight to the Run function. Note the Long field above, which describes the kube-proxy command: the Kubernetes network proxy runs on each node; it reflects the Services defined in the Kubernetes API on that node and can do simple TCP, UDP, and SCTP stream forwarding, or round-robin TCP, UDP, and SCTP forwarding across a set of backends. Service cluster IPs and ports are currently found through Docker-links-compatible environment variables that specify the ports opened by the service proxy. An optional add-on provides cluster DNS for these cluster IPs, and the user must create a Service through the apiserver API to configure the proxy.

The core call here is opts.Run(). Inside it, NewProxyServer builds a ProxyServer struct, and o.proxyServer.Run() is then invoked (through runLoop), so let's look at the ProxyServer struct first.

The three ProxyServer run modes

// NewProxyServer returns a new ProxyServer.
func NewProxyServer(o *Options) (*ProxyServer, error) {
    return newProxyServer(o.config, o.CleanupAndExit, o.master)
}

func newProxyServer(
    config *proxyconfigapi.KubeProxyConfiguration,
    cleanupAndExit bool,
    master string) (*ProxyServer, error) {
    ...
    proxyMode := getProxyMode(string(config.Mode), kernelHandler, ipsetInterface, iptables.LinuxKernelCompatTester{})
    ...
    if proxyMode == proxyModeIPTables {
        klog.V(0).Info("Using iptables Proxier.")
        ...
        proxier, err = iptables.NewProxier(
            iptInterface,
            utilsysctl.New(),
            execer,
            config.IPTables.SyncPeriod.Duration,
            config.IPTables.MinSyncPeriod.Duration,
            config.IPTables.MasqueradeAll,
            int(*config.IPTables.MasqueradeBit),
            config.ClusterCIDR,
            hostname,
            nodeIP,
            recorder,
            healthzServer,
            config.NodePortAddresses,
        )
        ...
    } else if proxyMode == proxyModeIPVS {
        klog.V(0).Info("Using ipvs Proxier.")
        ...
        proxier, err = ipvs.NewProxier(
            iptInterface,
            ipvsInterface,
            ipsetInterface,
            utilsysctl.New(),
            execer,
            config.IPVS.SyncPeriod.Duration,
            config.IPVS.MinSyncPeriod.Duration,
            config.IPVS.ExcludeCIDRs,
            config.IPVS.StrictARP,
            config.IPTables.MasqueradeAll,
            int(*config.IPTables.MasqueradeBit),
            config.ClusterCIDR,
            hostname,
            nodeIP,
            recorder,
            healthzServer,
            config.IPVS.Scheduler,
            config.NodePortAddresses,
        )
        ...
    } else {
        klog.V(0).Info("Using userspace Proxier.")

        // TODO this has side effects that should only happen when Run() is invoked.
        proxier, err = userspace.NewProxier(
            userspace.NewLoadBalancerRR(),
            net.ParseIP(config.BindAddress),
            iptInterface,
            execer,
            *utilnet.ParsePortRangeOrDie(config.PortRange),
            config.IPTables.SyncPeriod.Duration,
            config.IPTables.MinSyncPeriod.Duration,
            config.UDPIdleTimeout.Duration,
            config.NodePortAddresses,
        )
        ...
    }
    return &ProxyServer{
        Client:                 client,
        EventClient:            eventClient,
        IptInterface:           iptInterface,
        IpvsInterface:          ipvsInterface,
        IpsetInterface:         ipsetInterface,
        execer:                 execer,
        Proxier:                proxier,
        Broadcaster:            eventBroadcaster,
        Recorder:               recorder,
        ConntrackConfiguration: config.Conntrack,
        Conntracker:            &realConntracker{},
        ProxyMode:              proxyMode,
        NodeRef:                nodeRef,
        MetricsBindAddress:     config.MetricsBindAddress,
        EnableProfiling:        config.EnableProfiling,
        OOMScoreAdj:            config.OOMScoreAdj,
        ConfigSyncPeriod:       config.ConfigSyncPeriod.Duration,
        HealthzServer:          healthzServer,
        UseEndpointSlices:      utilfeature.DefaultFeatureGate.Enabled(features.EndpointSlice),
    }, nil
}           

NewProxyServer supports three run modes: iptables, IPVS, and userspace. The userspace proxy mode is the oldest of the three: it was used from Kubernetes v1.0, the default switched to iptables in v1.2, and iptables is what most clusters run today.


In userspace mode, kube-proxy itself forwards the packets to the backend Pods, so it takes care of distributing routing rules, forwarding packets, and load balancing. Since kube-proxy runs in user space, every packet has to cross between user space and kernel space repeatedly, which hurts performance badly; that is why the default was later changed to iptables.

The iptables mode is built on netfilter, so all forwarding happens in kernel space, which is a big performance improvement over having kube-proxy forward and load-balance traffic directly. Here kube-proxy's only job is to program the iptables rules.

The other mode is IPVS, which goes a step further than iptables in performance. Underneath it is also based on netfilter, but IPVS stores its forwarding rules in hash tables, which is much faster than iptables' linear O(n) rule matching.

Still, in 1.17 iptables remains kube-proxy's default and is probably the most widely used mode, so this article only covers the iptables forwarding path.


The core run logic of kube-proxy

Okay, now that we understand the ProxyServer struct, let's continue with kube-proxy's core run logic, the Run() method:

// Run runs the specified ProxyServer.  This should never exit (unless CleanupAndExit is set).
// TODO: At the moment, Run() cannot return a nil error, otherwise it's caller will never exit. Update callers of Run to handle nil errors.
func (s *ProxyServer) Run() error {
    // health check
    // Start up a healthz server if requested
    if s.HealthzServer != nil {
        s.HealthzServer.Run()
    }
    // metrics reporting
    // Start up a metrics server if requested
    if len(s.MetricsBindAddress) > 0 {
        proxyMux := mux.NewPathRecorderMux("kube-proxy")
        healthz.InstallHandler(proxyMux)
        proxyMux.HandleFunc("/proxyMode", func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "text/plain; charset=utf-8")
            w.Header().Set("X-Content-Type-Options", "nosniff")
            fmt.Fprintf(w, "%s", s.ProxyMode)
        })
        proxyMux.Handle("/metrics", legacyregistry.Handler())
        ...
    }
    ...

    // fetch objects from the apiserver via client-go informers
    // Make informers that filter out objects that want a non-default service proxy.
    informerFactory := informers.NewSharedInformerFactoryWithOptions(s.Client, s.ConfigSyncPeriod,
        informers.WithTweakListOptions(func(options *metav1.ListOptions) {
            options.LabelSelector = labelSelector.String()
        }))

    // Create configs (i.e. Watches for Services and Endpoints or EndpointSlices)
    // Note: RegisterHandler() calls need to happen before creation of Sources because sources
    // only notify on changes, and the initial update (on process start) may be lost if no handlers
    // are registered yet.
    // create the services config
    serviceConfig := config.NewServiceConfig(informerFactory.Core().V1().Services(), s.ConfigSyncPeriod)
    serviceConfig.RegisterEventHandler(s.Proxier)
    // run the serviceConfig
    go serviceConfig.Run(wait.NeverStop)
    // create the endpoints or endpointSlice config
    if s.UseEndpointSlices {
        endpointSliceConfig := config.NewEndpointSliceConfig(informerFactory.Discovery().V1beta1().EndpointSlices(), s.ConfigSyncPeriod)
        endpointSliceConfig.RegisterEventHandler(s.Proxier)
        go endpointSliceConfig.Run(wait.NeverStop)
    } else {
        endpointsConfig := config.NewEndpointsConfig(informerFactory.Core().V1().Endpoints(), s.ConfigSyncPeriod)
        endpointsConfig.RegisterEventHandler(s.Proxier)
        go endpointsConfig.Run(wait.NeverStop)
    }
    informerFactory.Start(wait.NeverStop)
    ...

    // Just loop forever for now...
    s.Proxier.SyncLoop()
    return nil
}
           

From this we can see that ProxyServer mainly does the following:

  • health checks
  • metrics reporting (including the /proxyMode handler; see the sketch after this list)
  • fetching the Services and Endpoints/EndpointSlice configuration from the apiserver via client-go
  • creating the Services and Endpoints/EndpointSlice configs
  • entering the sync loop
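
As a side note, the /proxyMode handler registered in Run() above gives a quick way to check which mode a node's kube-proxy is actually using. A minimal sketch (assuming kube-proxy's default metrics bind address of 127.0.0.1:10249 and that the metrics server is enabled; run it on the node itself):

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// 127.0.0.1:10249 is kube-proxy's default --metrics-bind-address; adjust if your cluster overrides it.
	resp, err := http.Get("http://127.0.0.1:10249/proxyMode")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println("proxy mode:", string(body)) // e.g. "iptables"
}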

kube-proxy's iptables implementation

Now that we understand the ProxyServer struct and how it runs, let's look at how kube-proxy creates Services and Endpoints with iptables.

NewServiceConfig and NewEndpointSliceConfig

ServiceConfig and EndpointsConfig are the kube-proxy components that watch for Service and Endpoints changes. The core is three callbacks: AddFunc, UpdateFunc, and DeleteFunc.

// NewServiceConfig creates a new ServiceConfig.
func NewServiceConfig(serviceInformer coreinformers.ServiceInformer, resyncPeriod time.Duration) *ServiceConfig {
    result := &ServiceConfig{
        listerSynced: serviceInformer.Informer().HasSynced,
    }

    serviceInformer.Informer().AddEventHandlerWithResyncPeriod(
        cache.ResourceEventHandlerFuncs{
            AddFunc:    result.handleAddService,
            UpdateFunc: result.handleUpdateService,
            DeleteFunc: result.handleDeleteService,
        },
        resyncPeriod,
    )

    return result
}

// NewEndpointsConfig creates a new EndpointsConfig.
func NewEndpointsConfig(endpointsInformer coreinformers.EndpointsInformer, resyncPeriod time.Duration) *EndpointsConfig {
    result := &EndpointsConfig{
        listerSynced: endpointsInformer.Informer().HasSynced,
    }

    endpointsInformer.Informer().AddEventHandlerWithResyncPeriod(
        cache.ResourceEventHandlerFuncs{
            AddFunc:    result.handleAddEndpoints,
            UpdateFunc: result.handleUpdateEndpoints,
            DeleteFunc: result.handleDeleteEndpoints,
        },
        resyncPeriod,
    )

    return result
}           

Taking Service as the example (Endpoints works much the same way), the three callbacks call OnServiceAdd, OnServiceUpdate, and OnServiceDelete respectively; all three are interface methods.

func (c *ServiceConfig) handleAddService(obj interface{}) {
    service, ok := obj.(*v1.Service)
    ...
    for i := range c.eventHandlers {
        klog.V(4).Info("Calling handler.OnServiceAdd")
        c.eventHandlers[i].OnServiceAdd(service)
    }
}

func (c *ServiceConfig) handleUpdateService(oldObj, newObj interface{}) {
    oldService, ok := oldObj.(*v1.Service)
    service, ok := newObj.(*v1.Service)
    ...
    for i := range c.eventHandlers {
        klog.V(4).Info("Calling handler.OnServiceUpdate")
        c.eventHandlers[i].OnServiceUpdate(oldService, service)
    }
}

func (c *ServiceConfig) handleDeleteService(obj interface{}) {
    service, ok := obj.(*v1.Service)
    ...
    for i := range c.eventHandlers {
        klog.V(4).Info("Calling handler.OnServiceDelete")
        c.eventHandlers[i].OnServiceDelete(service)
    }
}           

Each proxier provides concrete implementations of these interfaces. Staying with iptables, we can find the corresponding methods on the Proxier type in pkg/proxy/iptables/proxier.go:

// OnServiceAdd is called whenever creation of new service object
// is observed.
func (proxier *Proxier) OnServiceAdd(service *v1.Service) {
    proxier.OnServiceUpdate(nil, service)
}
// OnServiceDelete is called whenever deletion of an existing service
// object is observed.
func (proxier *Proxier) OnServiceDelete(service *v1.Service) {
    proxier.OnServiceUpdate(service, nil)

}
// OnServiceUpdate is called whenever modification of an existing
// service object is observed.
func (proxier *Proxier) OnServiceUpdate(oldService, service *v1.Service) {
    if proxier.serviceChanges.Update(oldService, service) && proxier.isInitialized() {
        proxier.Sync()
    }
}           

As you can see, add, delete, and update are all implemented in terms of Update.
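
To make this "everything is an Update" pattern concrete, here is a simplified, self-contained illustration (not the kube-proxy source; the real serviceChanges tracker in pkg/proxy has more machinery): an add is Update(nil, svc), a delete is Update(svc, nil), and the tracker keeps one pending delta per namespace/name until the next sync consumes it.

package main

import "fmt"

// Service is a stand-in for v1.Service with just the fields we need here.
type Service struct {
	Namespace, Name, ClusterIP string
}

type key struct{ namespace, name string }

type change struct{ previous, current *Service }

type changeTracker struct {
	items map[key]*change
}

// Update records the delta between the old and new object, keyed by namespace/name.
func (t *changeTracker) Update(old, new *Service) bool {
	svc := new
	if svc == nil {
		svc = old
	}
	if svc == nil {
		return false
	}
	k := key{svc.Namespace, svc.Name}
	c, ok := t.items[k]
	if !ok {
		c = &change{previous: old}
		t.items[k] = c
	}
	c.current = new
	// if the object ends up unchanged (e.g. added then deleted before a sync),
	// drop the pending entry entirely
	if c.previous == nil && c.current == nil {
		delete(t.items, k)
	}
	return len(t.items) > 0
}

func main() {
	t := &changeTracker{items: map[key]*change{}}
	svc := &Service{Namespace: "rcmd", Name: "playmate-model", ClusterIP: "10.247.168.174"}
	t.Update(nil, svc) // OnServiceAdd
	fmt.Println("pending after add:", len(t.items))
	t.Update(svc, nil) // OnServiceDelete
	fmt.Println("pending after delete:", len(t.items))
}

A Service that is added and then deleted before the next sync cancels out in exactly this way, which is essentially why a single Update entry point is enough for all three event handlers.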

serviceConfig.Run and endpointsConfig.Run

Let's start with serviceConfig.Run:

// RegisterEventHandler registers a handler which is called on every service change.
func (c *ServiceConfig) RegisterEventHandler(handler ServiceHandler) {
    c.eventHandlers = append(c.eventHandlers, handler)
}

// Run waits for cache synced and invokes handlers after syncing.
func (c *ServiceConfig) Run(stopCh <-chan struct{}) {
    klog.Info("Starting service config controller")

    if !cache.WaitForNamedCacheSync("service config", stopCh, c.listerSynced) {
        return
    }

    for i := range c.eventHandlers {
        klog.V(3).Info("Calling handler.OnServiceSynced()")
        c.eventHandlers[i].OnServiceSynced()
    }
}           

The eventHandlers here is just a slice containing the Proxier. The key call is OnServiceSynced, and the iptables proxier implements it like this:

// OnServiceSynced is called once all the initial even handlers were
// called and the state is fully propagated to local cache.
func (proxier *Proxier) OnServiceSynced() {
    proxier.mu.Lock()
    proxier.servicesSynced = true
    if utilfeature.DefaultFeatureGate.Enabled(features.EndpointSlice) {
        proxier.setInitialized(proxier.endpointSlicesSynced)
    } else {
        proxier.setInitialized(proxier.endpointsSynced)
    }
    proxier.mu.Unlock()

    // Sync unconditionally - this is called once per lifetime.
    proxier.syncProxyRules()
}

// This is where all of the iptables-save/restore calls happen.
// The only other iptables rules are those that are setup in iptablesInit()
// This assumes proxier.mu is NOT held
func (proxier *Proxier) syncProxyRules() {
    ...
}           

kube-proxy's iptables rules

We have finally reached the core syncProxyRules() method. Let's first look at how the key iptables chains are created:

func (proxier *Proxier) syncProxyRules() {
  ...
  // Create and link the kube chains.
  // create the chains
    for _, jump := range iptablesJumpChains {
        if _, err := proxier.iptables.EnsureChain(jump.table, jump.dstChain); err != nil {
            klog.Errorf("Failed to ensure that %s chain %s exists: %v", jump.table, jump.dstChain, err)
            return
        }
        args := append(jump.extraArgs,
            "-m", "comment", "--comment", jump.comment,
            "-j", string(jump.dstChain),
        )
        if _, err := proxier.iptables.EnsureRule(utiliptables.Prepend, jump.table, jump.srcChain, args...); err != nil {
            klog.Errorf("Failed to ensure that %s chain %s jumps to %s: %v", jump.table, jump.srcChain, jump.dstChain, err)
            return
        }
  }
  ...
    // Write table headers.
    writeLine(proxier.filterChains, "*filter")
    writeLine(proxier.natChains, "*nat")

    // Make sure we keep stats for the top-level chains, if they existed
  // (which most should have because we created them above).
  // fetch all existing chain rules, then add the new chain rules at the front
    for _, chainName := range []utiliptables.Chain{kubeServicesChain, kubeExternalServicesChain, kubeForwardChain} {
        if chain, ok := existingFilterChains[chainName]; ok {
            writeBytesLine(proxier.filterChains, chain)
        } else {
            writeLine(proxier.filterChains, utiliptables.MakeChainLine(chainName))
        }
    }
    for _, chainName := range []utiliptables.Chain{kubeServicesChain, kubeNodePortsChain, kubePostroutingChain, KubeMarkMasqChain} {
        if chain, ok := existingNATChains[chainName]; ok {
            writeBytesLine(proxier.natChains, chain)
        } else {
            writeLine(proxier.natChains, utiliptables.MakeChainLine(chainName))
        }
  }
  ...
}

var iptablesJumpChains = []iptablesJumpChain{
    {utiliptables.TableFilter, kubeExternalServicesChain, utiliptables.ChainInput, "kubernetes externally-visible service portals", []string{"-m", "conntrack", "--ctstate", "NEW"}},
    {utiliptables.TableFilter, kubeServicesChain, utiliptables.ChainForward, "kubernetes service portals", []string{"-m", "conntrack", "--ctstate", "NEW"}},
    {utiliptables.TableFilter, kubeServicesChain, utiliptables.ChainOutput, "kubernetes service portals", []string{"-m", "conntrack", "--ctstate", "NEW"}},
    {utiliptables.TableFilter, kubeServicesChain, utiliptables.ChainInput, "kubernetes service portals", []string{"-m", "conntrack", "--ctstate", "NEW"}},
    {utiliptables.TableFilter, kubeForwardChain, utiliptables.ChainForward, "kubernetes forwarding rules", nil},
    {utiliptables.TableNAT, kubeServicesChain, utiliptables.ChainOutput, "kubernetes service portals", nil},
    {utiliptables.TableNAT, kubeServicesChain, utiliptables.ChainPrerouting, "kubernetes service portals", nil},
    {utiliptables.TableNAT, kubePostroutingChain, utiliptables.ChainPostrouting, "kubernetes postrouting rules", nil},
}

const (
    // the services chain
    kubeServicesChain utiliptables.Chain = "KUBE-SERVICES"
    // the external services chain
    kubeExternalServicesChain utiliptables.Chain = "KUBE-EXTERNAL-SERVICES"
    // the nodeports chain
    kubeNodePortsChain utiliptables.Chain = "KUBE-NODEPORTS"
    // the kubernetes postrouting chain
    kubePostroutingChain utiliptables.Chain = "KUBE-POSTROUTING"
    // KubeMarkMasqChain is the mark-for-masquerade chain
    KubeMarkMasqChain utiliptables.Chain = "KUBE-MARK-MASQ"
    // KubeMarkDropChain is the mark-for-drop chain
    KubeMarkDropChain utiliptables.Chain = "KUBE-MARK-DROP"
    // the kubernetes forward chain
    kubeForwardChain utiliptables.Chain = "KUBE-FORWARD"
)           

As you can see, all of kube-proxy's chains are created in the filter and nat tables.

We can look at a table's chain rules with

iptables -S -t <table name>

Here is the nat table on a node with Kubernetes installed:

...
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

...
-A KUBE-SERVICES -d 10.247.91.74/32 -p tcp -m comment --comment "rcmd/playmate-rank:grpc cluster IP" -m tcp --dport 8000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.247.91.74/32 -p tcp -m comment --comment "rcmd/playmate-rank:grpc cluster IP" -m tcp --dport 8000 -j KUBE-SVC-YTWGRZ3E3MPBXGU3

...
-A KUBE-SVC-YTWGRZ3E3MPBXGU3 -j KUBE-SEP-EVJ6H5FW5OUSCV2Y

...
-A KUBE-SEP-EVJ6H5FW5OUSCV2Y -s 10.0.2.250/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-EVJ6H5FW5OUSCV2Y -p tcp -m tcp -j DNAT --to-destination 10.0.2.250:8000           

Reading the ClusterIP path above: PREROUTING jumps to KUBE-SERVICES; packets destined for 10.247.91.74:8000 jump to the per-service chain KUBE-SVC-YTWGRZ3E3MPBXGU3, which in turn jumps to the endpoint chain KUBE-SEP-EVJ6H5FW5OUSCV2Y, where the packet is DNATed to the Pod at 10.0.2.250:8000 (the KUBE-MARK-MASQ rules mark packets that need to be masqueraded on the way out). For a Service behind a load balancer the path looks like this:

...
-A KUBE-SERVICES -d 10.247.248.210/32 -p tcp -m comment --comment "istio-system/istio-ingressgateway-external:rcmd cluster IP" -m tcp --dport 8000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.247.248.210/32 -p tcp -m comment --comment "istio-system/istio-ingressgateway-external:rcmd cluster IP" -m tcp --dport 8000 -j KUBE-SVC-TBZWFMENS353FQVB
-A KUBE-SERVICES -d <public ip>/32 -p tcp -m comment --comment "istio-system/istio-ingressgateway-external:rcmd loadbalancer IP" -m tcp --dport 8000 -j KUBE-FW-TBZWFMENS353FQVB

...
-A KUBE-FW-TBZWFMENS353FQVB -m comment --comment "istio-system/istio-ingressgateway-external:rcmd loadbalancer IP" -j KUBE-MARK-MASQ
-A KUBE-FW-TBZWFMENS353FQVB -m comment --comment "istio-system/istio-ingressgateway-external:rcmd loadbalancer IP" -j KUBE-SVC-TBZWFMENS353FQVB
-A KUBE-FW-TBZWFMENS353FQVB -m comment --comment "istio-system/istio-ingressgateway-external:rcmd loadbalancer IP" -j KUBE-MARK-DROP           

And for the NodePort form:

...
-A KUBE-SERVICES -d <node ip>/32 -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -j KUBE-NODEPORTS
...
-A KUBE-NODEPORTS -p tcp -m comment --comment "istio-system/istio-ingressgateway-external:rcmd" -m tcp --dport <target port> -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "istio-system/istio-ingressgateway-external:rcmd" -m tcp --dport <target port> -j KUBE-SVC-TBZWFMENS353FQVB

The chain names KUBE-SVC-YTWGRZ3E3MPBXGU3, KUBE-SEP-EVJ6H5FW5OUSCV2Y, and KUBE-FW-TBZWFMENS353FQVB are generated from the servicePortName and the protocol:

// servicePortChainName takes the ServicePortName for a service and
// returns the associated iptables chain.  This is computed by hashing (sha256)
// then encoding to base32 and truncating with the prefix "KUBE-SVC-".
func servicePortChainName(servicePortName string, protocol string) utiliptables.Chain {
    return utiliptables.Chain("KUBE-SVC-" + portProtoHash(servicePortName, protocol))
}

// serviceFirewallChainName takes the ServicePortName for a service and
// returns the associated iptables chain.  This is computed by hashing (sha256)
// then encoding to base32 and truncating with the prefix "KUBE-FW-".
func serviceFirewallChainName(servicePortName string, protocol string) utiliptables.Chain {
    return utiliptables.Chain("KUBE-FW-" + portProtoHash(servicePortName, protocol))
}

// serviceLBPortChainName takes the ServicePortName for a service and
// returns the associated iptables chain.  This is computed by hashing (sha256)
// then encoding to base32 and truncating with the prefix "KUBE-XLB-".  We do
// this because IPTables Chain Names must be <= 28 chars long, and the longer
// they are the harder they are to read.
func serviceLBChainName(servicePortName string, protocol string) utiliptables.Chain {
    return utiliptables.Chain("KUBE-XLB-" + portProtoHash(servicePortName, protocol))
}

func portProtoHash(servicePortName string, protocol string) string {
    hash := sha256.Sum256([]byte(servicePortName + protocol))
    encoded := base32.StdEncoding.EncodeToString(hash[:])
    return encoded[:16]
}           
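
To see the naming scheme in action, here is a tiny standalone program that reuses the same hashing logic (an illustration only: the exact servicePortName string and protocol casing kube-proxy feeds into portProtoHash are determined in proxier.go, so the output will not necessarily match the chains dumped above):

package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// same scheme as kube-proxy: sha256, base32-encode, keep the first 16 characters
func portProtoHash(servicePortName string, protocol string) string {
	hash := sha256.Sum256([]byte(servicePortName + protocol))
	return base32.StdEncoding.EncodeToString(hash[:])[:16]
}

func main() {
	// hypothetical inputs, for illustration
	fmt.Println("KUBE-SVC-" + portProtoHash("rcmd/playmate-model:grpc", "tcp"))
	fmt.Println("KUBE-FW-" + portProtoHash("rcmd/playmate-model:grpc", "tcp"))
}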

That wraps up how kube-proxy turns Services and Endpoints into iptables rules.
