Data augmentation is a simple technique for reducing overfitting. Deep learning needs large amounts of data, so when only a limited dataset is available, we can generate additional training samples from the data we already have.
For example, given a single photo, we can create new variants of it with the Keras ImageDataGenerator, for instance by flipping, rotating, or shifting it.
Data augmentation artificially enlarges the training set by creating modified copies of existing data.
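As a minimal sketch of what creating modified copies means, the snippet below treats an image as a nested list of pixel values and derives two new samples from each original. The function names (flip_horizontal, shift_right, augment) and the tiny 3x3 image are hypothetical, chosen only for illustration; a real pipeline would normally rely on a library such as Keras.

```python
# Illustrative only: an "image" is a list of rows of grayscale pixel values.

def flip_horizontal(image):
    """Return a mirrored copy of the image (each row reversed)."""
    return [list(reversed(row)) for row in image]


def shift_right(image, pixels=1, fill=0):
    """Return a copy shifted right by `pixels`, padding the left edge with `fill`."""
    width = len(image[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + row[: width - pixels] for row in image]


def augment(dataset):
    """Return the original samples plus one flipped and one shifted copy of each."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))                   # keep the original
        augmented.append((flip_horizontal(image), label))  # label is unchanged
        augmented.append((shift_right(image), label))
    return augmented


original = [([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], "example")]
training_set = augment(original)
print(len(training_set))  # 3 samples from 1 original
```

Note that the label is carried over unchanged, which only makes sense for transformations that do not alter what the image depicts.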
Augmented vs. synthetic data
- Augmented data is derived from the original data with some modifications.
- Synthetic data is generated artificially without using the original dataset, typically with deep neural networks (DNNs) such as generative adversarial networks (GANs).
When to use data augmentation?
- To prevent model overfitting.
- When the initial training set is too small.
- To improve model accuracy.
- To reduce the operational cost of labeling and cleaning raw datasets.
Limitations of data augmentation
- Biases in the original dataset persist in the augmented data.
- Quality assurance for augmented data is costly.
- Research and development (R&D) is needed to build systems for advanced applications. For example, generating high-resolution images with GANs can be challenging.
Applications of data augmentation
- Healthcare
- Self-driving cars
- Natural language processing
- Automatic speech recognition
The image augmentation features provided by TensorFlow and Keras are very convenient.
To apply augmentation, just add Keras preprocessing layers to the model, or use tf.image or ImageDataGenerator.
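The tf.image route can be sketched as follows. This is a minimal example assuming TensorFlow 2.x is installed; the 2x3 tensor is a made-up stand-in for a real photo, and the three operations shown are only a small sample of what tf.image offers.

```python
import tensorflow as tf

# A tiny 2x3 RGB "image" with float values in [0, 1); stands in for a real photo.
image = tf.reshape(tf.range(18, dtype=tf.float32), (2, 3, 3)) / 18.0

flipped = tf.image.flip_left_right(image)          # mirror along the width axis
rotated = tf.image.rot90(image)                    # rotate 90 degrees counter-clockwise
brighter = tf.image.adjust_brightness(image, 0.1)  # add 0.1 to every pixel value

print(flipped.shape, rotated.shape, brighter.shape)
```

These calls are deterministic; tf.image also provides random variants such as tf.image.random_flip_left_right, which are the ones typically applied during training.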
Data augmentation is especially common in machine learning models for text or image classification, where collecting new data can be difficult.
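ImageDataGenerator.flow_from_directory(directory) returns a generator that reads images from a directory and yields augmented batches. Such generators can be used with Keras model methods that accept data generators as input, such as fit, evaluate, and predict; the older fit_generator, evaluate_generator, and predict_generator methods are deprecated in TensorFlow 2.x.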
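Conceptually, a data generator is an iterator that yields batches of (inputs, labels) and applies random transformations on the fly. The pure-Python sketch below mimics that pattern. It is not how Keras implements its generators, and augmenting_generator, its parameters, and the toy data are assumptions made for illustration.

```python
import random


def flip_horizontal(image):
    """Return a mirrored copy of a nested-list image."""
    return [list(reversed(row)) for row in image]


def augmenting_generator(images, labels, batch_size=2, seed=0):
    """Yield (batch_images, batch_labels) forever, flipping each image with probability 0.5."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    indices = list(range(len(images)))
    while True:
        rng.shuffle(indices)  # new sample order on every pass over the data
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            batch_images = [
                flip_horizontal(images[i]) if rng.random() < 0.5 else images[i]
                for i in batch
            ]
            yield batch_images, [labels[i] for i in batch]


# Hypothetical toy data: three 2x2 "images" with one label each.
images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 0], [1, 2]]]
labels = ["a", "b", "c"]

gen = augmenting_generator(images, labels)
batch_images, batch_labels = next(gen)
print(len(batch_images), batch_labels)
```

Because the transformations are drawn at random each time a batch is requested, the model can see a slightly different version of every image in each epoch without the augmented copies ever being stored.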