At the onset of the digital revolution, there was significant hope - or indeed an expectation - that digital technologies would be a boon to democracy, freedom and societal engagement. Yet, today, there is legitimate disquiet among everyone who believes in liberal democracy. This paper looks at how democracy worldwide is evolving, singling out threats and challenges, but also potential opportunities ahead.

https://ec.europa.eu/epsc/sites/epsc/files

1. Recommendations for providing high-quality data

CSVLint (https://csvlint.io)
You can use this tool to check whether your CSV file contains any encoding issues. If the tool detects that your CSV is encoded in UTF-8 but contains invalid characters, you will get an error message (open source).

Structural problem: Invalid Encoding on row 1
Your CSV appears to be encoded in UTF-8, but invalid characters were found. This can often be caused by copying and pasting data from a different source.

1.1.4. Reusability

1.1.4.1. Provide an appropriate amount of data

Dimension: Reusability
Indicator: Relevance
Metrics: Appropriate amount of data

Depending on the data to be published, the meaning of the term 'appropriate' can differ greatly. It is important to publish all relevant data, but caution should be taken not to blindly publish all available data without considering its usefulness. On the other hand, data publishers have to make sure that a sufficient amount of the data is published, so that there is enough context and users can derive value from it. It would be rather useless for data users to find a CSV file with only two lines. However, there is no clear indication of what an appropriate amount of data is, as this is highly dependent on the purpose a user has in mind.

To find a good balance, you could start by asking yourself whether all the data you are about to publish really provides value to others. If not, you could think about reducing your data if it seems like a large amount. On the other hand, you could ask yourself if the amount of data you want to publish is sufficient for users to make sense of it and to add value, or if you should add more data or context.

Bad example

Name                     Size
traffic_2010-2015.csv    976,563 KB

The file in the screenshot contains fictitious traffic data aggregated over the course of 6 years. In total, the file is nearly 1 GB in size.
If users are only interested in data for 1 year, they still have to download the entire file.

Good example

Name                Size
traffic_2010.csv    97,662 KB
traffic_2011.csv    297,833 KB
traffic_2012.csv    228,536 KB
traffic_2013.csv    165,139 KB
traffic_2014.csv    39,164 KB
traffic_2015.csv    144,886 KB

In contrast, this screenshot shows the same data split by year. This way, the file size remains reasonable and users can download the exact files they need. Each file should be published in a separate data set.

1.1.4.2. Consider community standards

Dimension: Reusability
Indicator: Consistency
Metrics: Compliance with community standards

Community standards are a powerful tool for ensuring conformity across files and formats of a common domain. Using community standards makes it easier to reuse data, as all data following the same standard looks similar - for example it is organised in a standardised way, the documentation follows a common template or a common vocabulary is used.

Lots of different community standards exist, for example standards for specific domains such as climate and forecast, astrophysics or statistical data. But there are also non-domain-specific standards, such as DCAT-AP, a standard for storing data catalogue metadata. Depending on the use case, there may be validators that aid in checking files against such a standard. Ensuring the compliance of files against community standards greatly helps reusability and eases further processing. To make sure that your data is being reused, you should consider using community standards.

Bad example

This screenshot shows a message from a SHACL validation which produced an error against the DCAT-AP community standard. More precisely, the value that was attached to the property dcterms:publisher was not of the required type.

EMIS - List of Web services
http://purl.org/dc/terms/publisher
Value does not have class http://xmlns.com/foaf/0.1/Agent
Good example

This screenshot shows a data set with an XML resource that conforms to its schema.

Resources
Consolidated Financial Sanctions File 1.0             csv
Consolidated Financial Sanctions File 1.0             xml
Consolidated Financial Sanctions File 1.1             csv
Consolidated Financial Sanctions File 1.1             xml
Consolidated Financial Sanctions in PDF Format        pdf
EU sanctions map                                      html
Financial Sanctions Files (FSF) website               html
Sanctions List                                        rss feed

Documentation
Consolidated Financial Sanctions File (XSD schema 1.0)    xml schema
Consolidated Financial Sanctions File (XSD schema 1.1)    xml schema

Helpful links and tools

Title: FAIR list of community standards
Description: List of community standards for various domains (open source).
Link: https://www.go-fair.org/fair-principles/r1-3-metadata-meet-domain-relevant-community-standards/

Title: SHACL validator
Description: This online tool allows you to validate your RDF files against a given standard (open source).
Link: https://shacl.org/playground/

1.1.4.3. Remove duplicates from your data

Dimension: Reusability
Indicator: Consistency
Metrics: Freeness from duplicates

Each piece of data should be unique. Duplicate data is of no additional value. Instead, it lowers the quality of the data as it might cause errors during further processing. For example, a data user performing analytics on the data will receive biased results as some data are duplicates.

Examples

The table labelled 'bad example' shows a CSV file where some rows are duplicates. In contrast, the rows in the table labelled 'good example' are all distinct, and no row carries the same information as another one.
Bad example

Year;Visitors;Viewing time
2014;768954;00:03:18
2013;822101;00:02:59
2013;822101;00:02:59
2011;721519;00:03:44
2010;707402;00:03:50
2010;707402;00:03:50

Good example

Year;Visitors;Viewing time
2014;768954;00:03:18
2013;822101;00:02:59
2012;792967;00:02:52
2011;721519;00:03:44
2010;707402;00:03:50
2009;429430;00:03:16

Helpful links and tools

Most ETL tools provide functions for detecting missing data and handling null values.

1.1.4.4. Increase the accuracy of your data

Dimension: Reusability
Indicator: Accuracy
Metrics: Percentage of accurate cells

Accuracy can be measured in many dimensions. What accuracy means specifically, how it is measured and what result is deemed acceptable always depend on the specific use case. For example, in CSV files, each cell of a column could be checked for accuracy against an encoding format, for example ISO 8601 for dates. The ratio between accurate and inaccurate cells could then give users a first impression of what to expect from the data and how difficult processing may be. Higher accuracy is typically an indicator of higher-quality data.

Examples

When evaluating the conformity of the 'Viewing time' column against ISO 8601 encoding, the table labelled 'bad example' would score an accuracy rating of 50 %, since half of the cells follow this time format. In contrast, the table labelled 'good example' would yield an accuracy score of 100 %, since all timestamps are correctly encoded.

Bad example

Year;Visitors;Viewing time
2014;768954;3:18
2013;822101;00:02:59
2012;792967;0:02:52
2011;721519;03:44
2010;707402;3m:50s
2009;429430;3:16

Good example

Year;Visitors;Viewing time
2014;768954;00:03:18
2013;822101;00:02:59
2012;792967;00:02:52
2011;721519;00:03:44
2010;707402;00:03:50
2009;429430;00:03:16
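
The two checks described in sections 1.1.4.3 and 1.1.4.4 could be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the helper names (read_rows, drop_duplicate_rows, accuracy_ratio) are hypothetical, the strict hh:mm:ss pattern is only one possible reading of "ISO 8601 encoded" for the 'Viewing time' column, and sample_times is a made-up four-value column rather than data taken from the example tables.

```python
import csv
import io
import re

# Assumption: a cell counts as "accurate" if it is a strict hh:mm:ss time.
HHMMSS = re.compile(r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d")


def read_rows(text, delimiter=";"):
    """Parse delimiter-separated text into a header and a list of data rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    return rows[0], rows[1:]


def drop_duplicate_rows(rows):
    """Remove exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def accuracy_ratio(values, pattern=HHMMSS):
    """Share of values that fully match the pattern (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


# Rows of the 'bad example' from section 1.1.4.3 (two rows appear twice).
DUPLICATED = """Year;Visitors;Viewing time
2014;768954;00:03:18
2013;822101;00:02:59
2013;822101;00:02:59
2011;721519;00:03:44
2010;707402;00:03:50
2010;707402;00:03:50
"""

header, rows = read_rows(DUPLICATED)
unique_rows = drop_duplicate_rows(rows)
print(len(rows), len(unique_rows))  # 6 4

# Hypothetical column: two of the four values follow hh:mm:ss.
sample_times = ["00:03:18", "3:18", "00:02:59", "3m:50s"]
print(accuracy_ratio(sample_times))  # 0.5
```

Whether a looser pattern (for example accepting hh:mm) should also count as accurate depends on the use case, so the pattern is passed in as a parameter rather than fixed.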
1.1.4.5. Provide information on byte size

Dimension: Reusability
Indicator: Accuracy
Metrics: Content size accuracy

When publishing data, it is good to also provide information on the distributions' byte size. This information helps users and automated processes to anticipate what to expect before downloading the actual file. Also, this information enables filtering by size.

Bad example

This screenshot shows a distribution without the dcat:byteSize property set.
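
One way to obtain a value for dcat:byteSize is to read the file size from the file system just before publishing. The following is a minimal sketch under stated assumptions: the function names and the distribution URI (https://example.org/...) are hypothetical placeholders, and the xsd:nonNegativeInteger datatype is an assumption that should be checked against the DCAT or DCAT-AP version your catalogue uses.

```python
import os
import tempfile


def byte_size(path):
    """Return the size of the file at `path` in bytes."""
    return os.path.getsize(path)


def byte_size_turtle(distribution_uri, size):
    """Format a Turtle statement that attaches dcat:byteSize to a distribution.

    Assumption: the literal is typed as xsd:nonNegativeInteger.
    """
    return (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
        f'<{distribution_uri}> dcat:byteSize "{size}"^^xsd:nonNegativeInteger .\n'
    )


# Write a small stand-in file so the sketch runs without external data.
data = b"Year;Visitors\n2014;768954\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as handle:
    handle.write(data)
    sample_path = handle.name

try:
    size = byte_size(sample_path)
finally:
    os.remove(sample_path)

# Hypothetical distribution URI, used only for illustration.
statement = byte_size_turtle("https://example.org/distribution/traffic-2010", size)
print(statement)
```

Computing the size from the published file itself, rather than typing it in by hand, keeps the metadata in step with the file whenever the data is regenerated.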