
Compared with neural networks, why hasn't the famous Fourier transform become the unified function approximator? The answer is here

Reports from the Heart of the Machine

Editors: Du Wei, Chen Ping

In practice, for different types of tasks, we can choose between the Fourier transform and neural networks as appropriate.

Function approximation is an important part of function theory, concerned with the fundamental problem of approximately representing functions. The need for function approximation arises in many sub-disciplines of applied mathematics, and especially in computer science. Specifically, the function approximation problem asks us to choose a function from a well-defined class that matches (approximates) a target function in a task-specific way.

At present there are many ways to perform function approximation, such as the Fourier transform and, more recently, neural networks. These approximators differ both in how they are implemented and in how well they work.

Recently, a popular Reddit post compared the Fourier transform and neural networks as function approximators and sparked a lively discussion.

According to the poster, this is a fundamental question. They asked: "If the main premise of a neural network is being a universal function approximator, what advantages does it have over other approximators, such as the Fourier transform, that have also been shown to approximate any function? Why hasn't the whole of supervised learning simply become a matter of computing Fourier coefficients?"


Original post: https://www.reddit.com/r/MachineLearning/comments/ryw53x/d_fourier_transform_vs_nns_as_function/

Redditors offered their own interpretations of the question.

Fourier transforms and neural networks should each be used where they fit

Among the many replies, one stood out, receiving 208 upvotes. It reads as follows:


Partial screenshot of the top-voted answer

It has been well established that Fourier series are universal approximators of continuous functions. The (fast) Fourier transform (FFT) can be used to quickly compute a Fourier series from uniformly spaced data, and non-uniform FFTs also exist. FFTs have the following property: if the function is smooth enough, they achieve spectral convergence, meaning the error decreases exponentially (you can see this from the decay of the coefficients). While Fourier series require periodicity, extensions such as the Chebyshev transform / Chebyshev polynomials offer similar spectral convergence for aperiodic functions on [-1, 1].
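To make the spectral-convergence point concrete, here is a minimal Python sketch (our illustration, not part of the original answer; the test function and mode count are arbitrary choices). It computes a truncated Fourier series of a smooth periodic function from uniform samples with NumPy's FFT and prints the coefficient decay:

import numpy as np

# Approximate a smooth, 2*pi-periodic function by a truncated Fourier series
# computed from uniformly spaced samples via the FFT.
N = 256
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
f = np.exp(np.sin(x))                     # smooth periodic test function

coeffs = np.fft.rfft(f) / N               # Fourier coefficients from the samples

K = 10                                    # keep only the first K modes
truncated = np.zeros_like(coeffs)
truncated[:K] = coeffs[:K]
f_approx = np.fft.irfft(truncated * N, n=N)

print("max error with", K, "modes:", np.max(np.abs(f - f_approx)))
print("coefficient magnitudes:", np.abs(coeffs[:12]))  # roughly exponential decay

With only a handful of modes the error is already negligible for this function, which is what spectral convergence looks like in practice.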

Neural network convergence rates are not exponential; even in favorable cases, linear convergence is rarely achieved. So why do so many studies use neural networks? First, note that in computational science many studies rely on pseudospectral methods, spectral elements, and the like. Even polynomials are universal approximators for a large class of functions (cf. the Weierstrass approximation theorem).
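As a companion sketch for the aperiodic case mentioned above (again our own illustration, with an arbitrary test function), a Chebyshev polynomial fit on [-1, 1] shows the same rapid error decay for a smooth but non-periodic function:

import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: np.exp(x) * np.sin(5 * x)   # smooth but not periodic
x_test = np.linspace(-1, 1, 2001)

for deg in (4, 8, 16, 32):
    # Interpolate at the Chebyshev nodes of the first kind, then evaluate.
    nodes = np.cos((2 * np.arange(deg + 1) + 1) * np.pi / (2 * (deg + 1)))
    coeffs = C.chebfit(nodes, f(nodes), deg)
    err = np.max(np.abs(C.chebval(x_test, coeffs) - f(x_test)))
    print(f"degree {deg:3d}: max error = {err:.2e}")   # drops roughly geometrically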

Let's go back to the question: why neural networks? The answer is that all of these universal approximators are one-dimensional (there are also some designed specifically for low dimensions, such as spherical harmonics, but they suit very special cases). You can turn a one-dimensional universal approximator into a multidimensional one through tensor products, and if you write it out, you see the structure sketched below for the one-dimensional and two-dimensional cases.
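The two formulas referenced here appeared as images in the original post. As a reconstruction of the standard tensor-product construction the text describes (our notation, not a copy of the original images), they look like this:

\[
f(x) \approx \sum_i a_i\,\phi_i(x),
\qquad
f(x, y) \approx \sum_i \sum_j a_{ij}\,\phi_i(x)\,\phi_j(y),
\]

where the \phi_i are the one-dimensional basis functions (sines and cosines in the Fourier case) and the a_i, a_{ij} are coefficients; each additional dimension multiplies the number of coefficients by the size of the basis.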

Looking at this formula, you find that when moving to higher dimensions, a new term must be added for every combination of higher-order terms, and the number of combinations grows exponentially or nearly so. For example, the third-order cross terms alone for a 100-dimensional input already amount to 161,700 terms. An approximator built this way could never fully represent a large image with thousands of pixels.

This exponential growth with respect to input size is known as the curse of dimensionality. Empirically, the cost of neural networks grows only polynomially with input size, which is why they are used for these big-data problems.
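A quick back-of-the-envelope check of this scaling argument (our own sketch; the hidden width is an arbitrary choice) compares the number of third-order cross terms of a tensor-product expansion with the parameter count of a single dense neural-network layer:

from math import comb

hidden = 128  # hypothetical hidden-layer width for the comparison

for d in (10, 100, 1000):
    cross_terms = comb(d, 3)              # third-order cross terms alone
    dense_params = d * hidden + hidden    # weights + biases of one dense layer
    print(f"d = {d:5d}: C(d,3) = {cross_terms:>13,}   dense-layer params = {dense_params:,}")

# d = 100 reproduces the 161,700 figure quoted above; the dense layer grows only linearly in d.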

But does this mean that Fourier series are the better choice for problems that are small enough and smooth enough? Indeed! This is why physics-informed neural networks and Fourier neural operators cannot compete with good PDE solvers in three dimensions or fewer. In fact, the paper Universal Differential Equations for Scientific Machine Learning, which shows how CNNs and universal approximators can be mixed into ODEs (universal differential equations, UDEs) in a specific way to automatically discover PDE discretizations, demonstrates that for certain cases Fourier universal approximators work better than neural networks. For this reason, DiffEqFlux.jl includes classical basis layers and tensor-product tools; the point is that they must be used in the right context. Keep in mind that spectral convergence requires the approximated function to be smooth; when this assumption is violated you still get convergence, but at a slower rate.

Neural networks are a tool, Fourier series are a tool, and Chebyshev series are a tool. Use each in a way that matches its theoretical characteristics and you get better performance.

A word about the Gibbs phenomenon. If you assume a function is smooth, then each point influences every other point in the domain. You can see this by considering the convergence of a Taylor series: as more and more derivatives are matched correctly, the approximation gets closer and closer to the original function everywhere. With infinitely many derivatives assumed, the influence of each piece of data is effectively global. This is no longer true when there is a discontinuity, so the Gibbs phenomenon is the distortion introduced near the point where that assumption breaks down. This is a hand-wavy description, but it can be made precise in spectral analysis, where smoothness is exactly what the error bounds require.
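The Gibbs phenomenon is easy to see numerically. The following sketch (our illustration, not from the original answer) builds truncated Fourier series of a square wave and shows that the overshoot near the jump does not shrink as more modes are added:

import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
square = np.sign(np.sin(x))               # discontinuous target function

def fourier_square(x, n_modes):
    # Partial Fourier series of the square wave: (4/pi) * sum of sin((2k+1)x)/(2k+1).
    s = np.zeros_like(x)
    for k in range(n_modes):
        n = 2 * k + 1
        s += (4 / np.pi) * np.sin(n * x) / n
    return s

for n_modes in (5, 50, 500):
    approx = fourier_square(x, n_modes)
    print(f"{n_modes:4d} modes: max overshoot = {approx.max() - square.max():.3f}")

# The overshoot stays near 0.18 (about 9% of the jump) instead of going to zero.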

The Fourier transform handles audio signals with ease, but is inefficient for high-dimensional data

Redditor @hillac suggests thinking of the Fourier transform as a convolutional neural network (CNN) with fixed kernels: a sort of pre-trained network that already achieves a good approximation. If you look at the learned kernels of a CNN trained on images, they are reminiscent of the trigonometric functions of different frequencies found in the Fourier transform. For most applications the Fourier transform is faster than a CNN, so if the data lends itself to it, the Fourier transform can be used.
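The "fixed-kernel" view can be made literal: the discrete Fourier transform is just a linear layer whose weights are fixed sinusoids rather than learned. A small sketch (our illustration, not @hillac's code):

import numpy as np

N = 64
n = np.arange(N)
# DFT matrix: each row is a complex sinusoid of a different frequency,
# i.e. a fixed, non-learned set of "kernels".
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.random.randn(N)        # any input signal
via_matrix = W @ x            # "forward pass" through the fixed linear layer
via_fft = np.fft.fft(x)       # the same transform, computed by the FFT

print("max difference:", np.max(np.abs(via_matrix - via_fft)))  # ~1e-12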

Neural networks can be trained to approximate arbitrary data better because they do not make the same assumptions about the information carried by the data that the Fourier transform does. So, while the Fourier transform can easily decompose an audio signal into a highly information-dense representation, it can be much less effective if you try to apply it to, say, text data.


Another redditor, @wavefield, noted that the Fourier transform is not an approximation at all. It converts the signal into the Fourier domain while retaining all the information in the original signal, which is why it can be inverted. It is worth noting that some operations are easier for neural networks to learn in the Fourier domain.

This view was shared by others: the Fourier transform can be turned into an approximation by finding a subset of frequencies with which to represent the signal, and this can be done efficiently if an L1 loss is used.
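As a sketch of that idea (our illustration; the signal and the number of retained frequencies are arbitrary), one can keep only the largest Fourier coefficients and invert back, turning the exact transform into a compact approximation:

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024, endpoint=False)
signal = (np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
          + 0.05 * rng.standard_normal(t.size))   # two tones plus light noise

coeffs = np.fft.rfft(signal)

k = 8                                       # number of frequencies to keep
keep = np.argsort(np.abs(coeffs))[-k:]      # indices of the k largest coefficients
sparse = np.zeros_like(coeffs)
sparse[keep] = coeffs[keep]

reconstruction = np.fft.irfft(sparse, n=signal.size)
rel_err = np.linalg.norm(signal - reconstruction) / np.linalg.norm(signal)
print(f"relative error keeping {k} of {coeffs.size} frequencies: {rel_err:.3f}")

Note that this greedy top-k selection is in the spirit of the sparsity/L1 idea, not a literal L1 optimization.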


Redditor @visualard summarized some other characteristics of the Fourier transform and CNNs.

Fourier analysis is computed over the whole signal, while one advantage of CNNs is that they can detect local patterns. Sometimes it makes more sense to break the signal into parts and only then make a decision about the global "thing" in the signal.


Others pointed out that the Fourier transform is very inefficient for high-dimensional data. Random Fourier features are one remedy: they amount to a single-hidden-layer neural network with random, fixed weights in which only the last layer is trained.
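Random Fourier features are easy to sketch. The following toy example (our illustration; the dimensions, bandwidth, and target function are arbitrary choices) builds random, fixed "first-layer" frequencies and trains only a linear readout on top, in the spirit of the comment above:

import numpy as np

rng = np.random.default_rng(0)
d, n, D = 50, 2000, 300                       # input dim, samples, random features
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1])   # toy target using two coordinates

# Random "first layer": fixed frequencies W and phases b (approximates an RBF kernel).
gamma = 0.1
W = rng.standard_normal((d, D)) * np.sqrt(2 * gamma)
b = rng.uniform(0, 2 * np.pi, D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)      # random Fourier feature map

# Trainable "last layer": ridge regression on the fixed features.
lam = 1e-3
w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

print("train RMSE:", np.sqrt(np.mean((Z @ w - y) ** 2)))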


As for the similarities and differences between the Fourier transform and neural networks as function approximators, readers are welcome to share their own views in the comments.

Quickly build an enterprise-grade TTS speech synthesis assistant with NVIDIA Riva

NVIDIA Riva is a GPU-accelerated SDK for rapidly deploying high-performance conversational AI services and quickly developing speech AI applications. Riva is designed to give you easy, out-of-the-box access to conversational AI capabilities, so you can build advanced TTS speech synthesis services with a few simple commands and API calls.

This online session, on January 12, 2022, 19:30-21:00, will mainly cover:

Introduction to speech synthesis

Introduction to NVIDIA Riva features

Launch the NVIDIA Riva client to quickly implement text-to-speech functionality

Use Python to quickly build Riva-based TTS speech synthesis service applications
