
Data Product Verification

Methodology for Data Verification

Every model has some degree of error. An error of up to 5% is usually considered acceptable, and the job of data verification is to keep the error within that bound. TerraQuanta verifies data in three main ways:

  • Multi-source data verification: data products are checked against news data, public statistics, and data provided by users.
  • World Explorer: TerraQuanta's World Explorer program collects field samples of many kinds of real surface data, including crops, land features, water quality, and geological hazards, around the clock.
  • Model verification: TerraQuanta defines a set of ground object classification standards called the TerraQuanta Data Standard. Under this standard, TerraQuanta keeps test sets strictly separate from training sets, uses multi-model fusion for the same computation task, and performs data verification at the model level.
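Two of the ideas named under model verification, keeping training and test sets strictly separate and fusing several models on the same task, can be illustrated with a minimal sketch. The source does not say which fusion method TerraQuanta uses; simple majority voting is substituted here purely as an example, and the function names and sample IDs are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def assert_disjoint(train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]) -> None:
    """Raise ValueError if any sample ID appears in both the training and the test set."""
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"{len(overlap)} sample(s) appear in both sets")


def majority_vote(predictions: Sequence[str]) -> str:
    """Fuse the class predictions of several models for one sample by majority.

    Ties go to the class that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Illustrative values only.
assert_disjoint(train_ids=["s1", "s2", "s3"], test_ids=["s4", "s5"])
fused = majority_vote(["corn", "forest", "corn"])
print(fused)
```

Majority voting is only one of many fusion strategies; weighted voting or averaging class probabilities would fit the same interface.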

Multi-Source Data Verification

TerraQuanta performs a preliminary accuracy check on its data products using external reference data such as news data, public statistics, or data provided by users. Examples of the data sources used for validation include:

  • China Geological Environment Bulletin
  • USDA Cropland Data Layer
  • USGS National Hydrography Dataset
  • World Bank DataBank

In total, TerraQuanta draws on hundreds of data sources of different categories and styles.

In addition to public datasets, TerraQuanta uses a large amount of field validation data for verification, such as crop disaster records with GPS information and laboratory measurements of water quality parameters.
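As a rough illustration of checking a data product against an external reference, the sketch below computes the relative error between a product estimate and a reference value and compares it with the 5% bound mentioned earlier. The function names and the numbers are hypothetical; the source does not describe how TerraQuanta implements this comparison.

```python
def relative_error(estimate: float, reference: float) -> float:
    """Return |estimate - reference| / |reference|."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(estimate - reference) / abs(reference)


def within_tolerance(estimate: float, reference: float, tolerance: float = 0.05) -> bool:
    """True when the relative error does not exceed the tolerance (5% by default)."""
    return relative_error(estimate, reference) <= tolerance


# Hypothetical figures for illustration only: a product's estimated crop area
# versus the area reported in a public statistic, both in hectares.
estimated_area_ha = 10_300.0
reported_area_ha = 10_000.0

error = relative_error(estimated_area_ha, reported_area_ha)
print(f"relative error: {error:.1%}")
print("within 5%:", within_tolerance(estimated_area_ha, reported_area_ha))
```

The same check applies to any scalar quantity that has a trusted reference value, for example a laboratory water quality measurement.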

World Explorer

The TerraQuanta World Explorer program is a long-term, regularly scheduled field sampling effort. Explorers use unmanned aerial vehicles (UAVs), smartphones with directional positioning, and other devices to survey sites and annotate field data over long periods.


Articles:

  • Here is a job inviting you to explore the world. Do you want to join us?
    https://blog.terraqt.com/zhe-li-you-yi-fen-gong-zuo-yao-qing-ni-qu-tan-suo-shi-jie-shang-che-ma/
  • 11 Days, 19 Cities, 9385 KM: TerraQuanta's World Explorers Begin Their Journey!
    https://blog.terraqt.com/11tian-19shi-9385km-da-di-liang-zi-shi-jie-tan-suo-zhe-qi-cheng/
  • Vlog: Something Different for a Change, Revealing the Work of the "World Explorer"
  • A nomad from the Inner Mongolian plateau who loves off-roading and exploration: why did this young man, born in 1999, choose to become a "World Explorer" for TerraQuanta?
    https://blog.terraqt.com/nei-meng-gao-yuan-you-mu-min-zu-re-ai-yue-ye-he-tan-xian-99nian-de-ta-wei-he-xuan-ze-cheng-wei-da-di-liang-zi-shi-jie-tan-suo-zhe-2/

The World Explorer program collects a large amount of data every month. This raw data usually has to be processed further into real sample data (Ground Truth) before it can be used for model training and data verification. Ground Truth is the data foundation and infrastructure of any model.

TerraQuanta has developed "DesertQuanta", an in-house field data management system for sample labeling, to make field data collection and annotation efficient. The World Explorer program adds tens of thousands of real land cover samples every month, all of which are carefully reviewed and systematically managed.


Model Verification

TerraQuanta's deep learning models have standard training sets and test sets. The Overall Accuracy of general classification products in training ranges from 0.9 to 0.99.
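Overall Accuracy is the number of correctly classified test samples divided by the total number of test samples, which equals the sum of the confusion matrix diagonal divided by the sum of all its entries. The sketch below shows the calculation in plain Python; the class names and labels are made up for illustration and are not TerraQuanta data.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Build a matrix whose rows are true classes and whose columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Return the sum of the diagonal divided by the sum of all entries."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Illustrative test-set labels: ten samples, one corn sample misclassified as forest.
classes = ["corn", "forest", "water"]
y_true = ["corn"] * 4 + ["forest"] * 3 + ["water"] * 3
y_pred = ["corn"] * 3 + ["forest"] * 4 + ["water"] * 3

matrix = confusion_matrix(y_true, y_pred, classes)
print(matrix)
print(f"Overall Accuracy: {overall_accuracy(matrix):.2f}")
```

Because Overall Accuracy weights every sample equally, it can hide poor performance on rare classes, so per-class metrics are a useful complement.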

By combining various public datasets, TerraQuanta has built its own dataset, "Icemachine", for data verification.

In addition to a rigorous test set, the TerraQuanta Data Standard and a large body of past product data can be used for further model verification. For instance:

  • Corn is a type of agricultural land, so land classified as corn cannot also contain forest.
  • Flooding can cover farmland. When it does, the water is a flood event, not an abnormal body of water.
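Rules like the two above can be expressed as simple consistency checks over classification results. The sketch below is one hypothetical encoding: the class names, the rule tables, and the "needs_review" fallback are assumptions made for illustration and are not part of the TerraQuanta Data Standard as described here.

```python
from typing import List, Set, Tuple

# Hypothetical rule tables encoding the two example rules.
FARMLAND_CLASSES: Set[str] = {"corn"}                      # corn is a type of agricultural land
EXCLUSIVE_PAIRS: List[Tuple[str, str]] = [("corn", "forest")]  # no forest on corn land


def exclusivity_violations(labels: Set[str]) -> List[Tuple[str, str]]:
    """Return every exclusive pair whose two classes both appear in `labels`."""
    return [pair for pair in EXCLUSIVE_PAIRS
            if pair[0] in labels and pair[1] in labels]


def interpret_new_water(baseline_class: str) -> str:
    """Interpret newly detected water at a location with a known baseline class.

    Water over farmland is treated as a flood event; any other case is left
    for manual review (an assumption of this sketch).
    """
    if baseline_class in FARMLAND_CLASSES:
        return "flood_event"
    return "needs_review"


print(exclusivity_violations({"corn", "forest"}))
print(interpret_new_water("corn"))
```

Keeping the rules in plain data tables rather than in code makes it straightforward to add further class relationships later.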

TerraQuanta invests considerable time in data verification throughout model training and data production to keep its data as accurate as possible. However, given the limits of current technology, errors within a certain range are inevitable, just as they are in weather forecasting.

TerraQuanta's delivery team carrying out checks as part of data verification