Data Product Verification
Methodology for Data Verification
Every model has some degree of error. The acceptable error is usually within 5%, and the job of data verification is to ensure that the error stays within that limit. TerraQuanta verifies data mainly in three ways:
- Multi-source data verification
- World Explorer
- Model verification
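The 5% tolerance mentioned above can be illustrated with a minimal sketch. It assumes the error is measured as the relative difference between a product value and an external reference value; the text does not specify the exact metric, and the function names and sample figures below are hypothetical.

```python
def relative_error(estimate: float, reference: float) -> float:
    """Absolute relative error of an estimate against a reference value."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(estimate - reference) / abs(reference)


def within_tolerance(estimate: float, reference: float,
                     tolerance: float = 0.05) -> bool:
    """True if the estimate deviates from the reference by at most `tolerance`.

    The 5% default mirrors the acceptable error stated in the text;
    treating it as a relative error is an assumption of this sketch.
    """
    return relative_error(estimate, reference) <= tolerance


# Hypothetical figures: a crop area from a data product compared with
# the area reported in a public statistic (both in hectares).
product_area_ha = 10_300.0
reference_area_ha = 10_000.0
passes = within_tolerance(product_area_ha, reference_area_ha)
```

The same check could be run per region or per product, with any value outside the tolerance flagged for manual review.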
Multi-Source Data Verification
TerraQuanta performs a preliminary accuracy check of its data products against external reference data such as news data, public statistics, or data provided by users. Examples of data sources used for validation include:
- 「China Geological Environment Bulletin」
- 「USDA Cropland Data Layer」
- 「USGS National Hydrography Dataset」
- 「World Bank DataBank」
TerraQuanta uses hundreds of data sources of different categories and formats.
In addition to public data sets, TerraQuanta also uses a large amount of field validation data for verification, such as crop disaster data with GPS information and lab measurements of water quality parameters.
World Explorer
The TerraQuanta World Explorer program is a long-term, regular field data sampling activity. The explorers use unmanned aerial vehicles, smartphones with directional positioning, and other devices to explore the world and annotate field data over long periods of time.
Published Articles Related to TQ World Explorer
Articles | URL |
---|---|
Here is a job inviting you to explore the world. Do you want to join us? | https://blog.terraqt.com/zhe-li-you-yi-fen-gong-zuo-yao-qing-ni-qu-tan-suo-shi-jie-shang-che-ma/ |
11 Days, 19 Cities, 9385 KM: TerraQuanta’s World Explorers Begin Their Journey! | https://blog.terraqt.com/11tian-19shi-9385km-da-di-liang-zi-shi-jie-tan-suo-zhe-qi-cheng/ |
Vlog | Something Different for a Change: Revealing the Work of the “World Explorer” |
A nomad from the Inner Mongolian plateau who loves off-roading and exploration. Why did this young man, born in 1999, choose to become a "World Explorer" for TerraQuanta? | https://blog.terraqt.com/nei-meng-gao-yuan-you-mu-min-zu-re-ai-yue-ye-he-tan-xian-99nian-de-ta-wei-he-xuan-ze-cheng-wei-da-di-liang-zi-shi-jie-tan-suo-zhe-2/ |
The World Explorer program collects a large amount of data every month, which usually needs further processing into real sample data (ground truth) for model training and data verification. Ground truth is the data foundation and infrastructure of any model.
TerraQuanta has developed "DesertQuanta", an in-house field data management system for sample labeling, which supports efficient field data collection and annotation. The World Explorer program adds tens of thousands of real land cover samples every month, all of which are carefully reviewed and systematically managed.
Model Verification
TerraQuanta's deep learning models have standard training sets and test sets. The overall accuracy of general classification products in training ranges from 0.9 to 0.99.
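As a brief illustration of the overall accuracy metric, the sketch below computes it as the share of samples whose predicted class matches the ground truth label. This is the standard definition of overall accuracy; the function names and class labels are hypothetical and do not describe TerraQuanta's internal tooling.

```python
from collections import Counter


def overall_accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of samples whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true: list, y_pred: list) -> Counter:
    """Count (reference, predicted) pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


# Hypothetical test set: ten ground truth labels and model predictions.
y_true = ["corn", "corn", "forest", "water", "corn",
          "forest", "water", "corn", "forest", "water"]
y_pred = ["corn", "corn", "forest", "water", "corn",
          "forest", "water", "forest", "forest", "water"]

accuracy = overall_accuracy(y_true, y_pred)   # 9 of 10 samples agree
errors = confusion_counts(y_true, y_pred)     # shows which classes are confused
```

The confusion counts complement the single accuracy figure by showing which class pairs account for the errors, here one corn sample predicted as forest.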
Combining various public data sets, TerraQuanta has produced its own data set, “Icemachine”, for data verification.
In addition to a rigorous test set, the TerraQuanta Data Standard and a large amount of past product data can be used for further model verification, for instance:
- Corn is a type of agricultural land. There is no forest on the corn land.
- Flooding can cover farmland; when it does, it is a flood event, not an abnormal body of water.
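Rules like the two above can be expressed as simple consistency checks over product records. The sketch below is illustrative only: the record fields (`crop`, `land_cover`, `water_label`) and label values are hypothetical, not the actual schema of the TerraQuanta Data Standard.

```python
from __future__ import annotations


def check_record(record: dict) -> list[str]:
    """Return the consistency rules that a single product record violates."""
    violations = []

    # Rule 1: corn is a type of agricultural land, so a corn record must not
    # carry any other land cover class (forest, for example).
    if record.get("crop") == "corn" and record.get("land_cover") != "agricultural":
        violations.append("corn must be mapped on agricultural land")

    # Rule 2: water detected over farmland is a flood event, not an
    # abnormal body of water.
    if (record.get("land_cover") == "agricultural"
            and record.get("water_label") == "abnormal_water_body"):
        violations.append("water over farmland should be labeled as a flood event")

    return violations


# Hypothetical records from a data product.
records = [
    {"id": 1, "crop": "corn", "land_cover": "agricultural"},
    {"id": 2, "crop": "corn", "land_cover": "forest"},
    {"id": 3, "land_cover": "agricultural", "water_label": "abnormal_water_body"},
    {"id": 4, "land_cover": "agricultural", "water_label": "flood_event"},
]

# Collect only the records that break at least one rule.
flagged = {r["id"]: check_record(r) for r in records if check_record(r)}
```

Records flagged this way could then be sent back for review before a product is released.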
TerraQuanta spends a lot of time on data verification during model training and data production, using the methods above to ensure the accuracy of its data as far as possible. However, due to the limitations of current technology, errors within a certain range are inevitable, just as with weather forecasts.
Proofreading in the process of data verification by TerraQuanta's product department