Session 5 Plans and Preparations

Author: Myles Rodgers

1 Session 5 Plans and Preparations
United Nations Technical Meeting on Use of Technology in Population and Housing Censuses
Amman, Jordan, 28th November - 1st December 2016
Statistics Estonia, Diana Beltadze

2 Testing various aspects of census
Timetable of Population and Housing Census 2011: Pilot Enumeration, Post-enumeration activities, Pre-enumeration activities
Testing various aspects of census plan and procedures
UNECE meeting in Amman 2016


4 Training
A 5-day training for district chiefs and supervisors (15+132).
A 5-day training for census takers (2,200).

5 Goals of the information system
The goals we set before we started to develop the software:
To make universal software usable in all surveys.
To improve the coverage of studies, the quality of raw data and the efficiency of fieldwork management.
To reduce data collection cost.
To increase the speed of post-collection data processing.

6 Description of the applications
Applications: questionnaire definition tool; enumerator's application; e-respondent application; management application; enumerator's, e-respondent and management map applications.
The questionnaire definition tool is used to prepare questionnaires and is built on top of Eclipse.
The enumerator's application is a stand-alone desktop Delphi application that can be used on laptop or desktop computers. It synchronises all data for offline work over an encrypted channel (HTTPS). It supports work planning, scheduling of contacts with subjects, and map use to locate subjects more easily.
The e-respondent application is used by survey subjects independently. It is a public system, accessible over the internet (HTTPS), developed in Java and running on Oracle WebLogic. It supports authentication using ID-card, mobile-ID and banks.
The management application is used by fieldwork management, statisticians and the help desk. It is a web application developed in Java and is accessible only from the internal network.
The functionality of the enumerator's, e-respondent and management map applications is covered by the next presenter.

7 High level view of the involved applications
The heart of the system is the central communication server. Questionnaires are created with the questionnaire definition tool and uploaded to the communication server. The fieldwork management application, the enumerator's application and the e-respondent application are all connected to the central communication server.
IT-architecture, security, hardware and Census

8 E-census data collection process
In the E-census environment everyone can fill in their own and their family members' questionnaires. After that, all questionnaires have to pass quality control and are divided into two parts: completed and not started/not completed. "Ready" denotes a secure database on Statistics Estonia's servers where only completed questionnaires are stored. Everybody whose questionnaire is not completed is added to an enumerator's work list. Enumerators visit homes according to their work list, fill in all the necessary questionnaires and synchronise the data to Statistics Estonia's servers. All questionnaires filled in by enumerators are reviewed and confirmed by their supervisors. Final quality control of the questionnaires is performed automatically first and then manually if needed.
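The split between the "Ready" database and the enumerators' work list can be sketched in a few lines of Python. This is only an illustration of the flow described on the slide, not the actual system: the names `Status`, `Questionnaire`, `RoutingResult` and `route` are hypothetical, and treating a completed questionnaire that fails quality control as "not completed" is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NOT_STARTED = "not started"
    NOT_COMPLETED = "not completed"
    COMPLETED = "completed"


@dataclass
class Questionnaire:
    person_id: str
    status: Status
    passed_quality_control: bool = False


@dataclass
class RoutingResult:
    ready: list = field(default_factory=list)                 # stored in the "Ready" database
    enumerator_work_list: list = field(default_factory=list)  # to be visited at home


def route(questionnaires):
    """Divide e-census questionnaires into completed and not started/not completed."""
    result = RoutingResult()
    for q in questionnaires:
        # Assumption: a completed questionnaire that fails quality control
        # is handled like a not-completed one.
        if q.status is Status.COMPLETED and q.passed_quality_control:
            result.ready.append(q.person_id)
        else:
            result.enumerator_work_list.append(q.person_id)
    return result
```

Calling `route(...)` on the questionnaires collected online would give the list of persons whose data goes to "Ready" and the list that enumerators still have to visit.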

9 Live environment
6 physical (12 virtual) Oracle WebLogic Java application servers
2 physical Oracle database servers
Apache load balancers
ESRI GIS server
Check Point firewall and VPN software
SAS and VAIS data processing servers
SUSE Linux Enterprise and MS Windows operating systems on servers
MS Windows XP operating system on enumerators' laptops
Mainly Hewlett-Packard hardware

10 This is the picture of the Census network architecture.

11 External resources
We also use many different external resources.
The metadata database is used for displaying meta-information about studies, background study information on the web, and classifiers used in questionnaires.
The statistical database is used for import of the sample, pre-filling of characteristics in forms, and export of the sample and of changes in subjects' information.
Active Directory is used for authentication; user names and passwords are taken from it.
The GIS database provides map data, location info of buildings and the hierarchy of district division.

12 IT security
IT security analysis (2008); IT security analysis and complete testing (2010).
The first census security analysis was made in 2008: census processes and tools were analysed, potential weaknesses and risks were pointed out, and possible solutions were offered. The second analysis, at the end of 2010, focused on the weaknesses found during the first analysis, and the company responsible for the analysis checked whether they had been fixed. Complete security testing was also carried out. Some of the risks that were found were added to the census risk plan.

13 IT related risks
Risks: data leakage; attacks against IT solutions; insufficient load tolerance; failures of software; data loss.
Data leakage. Mitigation measures: disabling functionality on enumerators' computers that is not necessary for their work; computer hard disk encryption; using physical and information technology methods to protect data processing; using high-entropy passwords; preparing the enumerators' computers in a secure environment.
Attacks against IT solutions. Mitigation measures: continuous monitoring of attacks using the IDS system; local network load distribution; transmission channel security.
Insufficient load tolerance. Mitigation measures: performing load tolerance tests on software and systems; spreading data entry over the days; notification before filling in the questionnaire. If the system load is too high, respondents are placed on hold.
Software failures. The mitigation measure is the use of a software support service.
Data loss. We ran the census pilot, the agricultural census and a mini-pilot to detect software errors that could cause data loss.
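Placing respondents on hold when the load is too high is a form of admission control. The Python sketch below shows one simple way such a gate could work; it is an illustration under assumptions, not the mechanism used in the census. The class name `AdmissionGate`, the session limit and the first-come-first-served order are all hypothetical.

```python
from collections import deque


class AdmissionGate:
    """Admit respondents up to a session limit; place the rest on hold."""

    def __init__(self, max_sessions):
        self.max_sessions = max_sessions
        self.active = set()      # respondents currently filling in questionnaires
        self.on_hold = deque()   # respondents waiting, oldest first

    def arrive(self, respondent_id):
        if len(self.active) < self.max_sessions:
            self.active.add(respondent_id)
            return "admitted"
        self.on_hold.append(respondent_id)
        return "on hold"

    def leave(self, respondent_id):
        """End a session and admit the longest-waiting respondent, if any."""
        self.active.discard(respondent_id)
        if self.on_hold and len(self.active) < self.max_sessions:
            admitted = self.on_hold.popleft()
            self.active.add(admitted)
            return admitted
        return None
```

With a limit of two sessions, a third arrival is put on hold and is admitted as soon as one of the first two leaves.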

14 Data protection
All operations with sensitive data were logged. This includes viewing and changing personal data in the management system, changing data in the enumerator's application, and changes of access privileges.
Separate log tables and special rights were used in the management system and the enumerator's application.
For hard drive encryption on enumerators' computers we used freeware called DiskCryptor, which supports all the necessary encryption standards.
The Oracle Advanced Security option was used for database encryption, and high availability was achieved with the Oracle Data Guard solution.
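Logging every view and change of personal data into a separate log table can be sketched as follows. The census system used Oracle; this example swaps in Python's built-in `sqlite3` module purely so that it runs anywhere. The table names `person` and `person_audit`, the column names and the two helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_audit (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP,
    username TEXT NOT NULL,
    operation TEXT NOT NULL,
    person_id INTEGER NOT NULL
);
""")


def view_person(conn, username, person_id):
    """Read a person's record and log the fact that it was viewed."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO person_audit (username, operation, person_id) "
            "VALUES (?, 'view', ?)",
            (username, person_id),
        )
        return conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()


def change_person(conn, username, person_id, new_name):
    """Change a person's record and log the change in the same transaction."""
    with conn:
        conn.execute(
            "UPDATE person SET name = ? WHERE id = ?", (new_name, person_id)
        )
        conn.execute(
            "INSERT INTO person_audit (username, operation, person_id) "
            "VALUES (?, 'change', ?)",
            (username, person_id),
        )
```

Writing the log entry in the same transaction as the data operation means a change cannot be stored without its log record. Restricting who may read `person_audit` would be done with database privileges, which SQLite does not offer and which are therefore not shown.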

15 Hardware and software purchases
Cost efficiency analysis (laptops). In autumn 2009 a cost efficiency analysis for enumerators' computers was made. We analysed renting and buying options and estimated the possibilities of using the laptops in public sector institutions after the census. Selling the computers on the aftermarket was also considered. We concluded that the most reasonable option was to buy laptops with a 13-inch screen, a battery that lasts at least 8 hours, and a size that fits into our enumerators' suitcases.
Load tolerance evaluation (servers). We knew that the E-census would put the greatest load on our servers and network, so we asked our methodologists for their opinion. They said that six times the performance of the servers used during the pilot census would be sufficient. Fortunately this was enough and we did not have to buy any extra hardware during the census.
Later use of hardware and software. All the laptops and most of the mobile phones used by enumerators during the census have been divided between different public sector institutions. We use the servers ourselves in our everyday work.

16 Preparation of hardware
Servers and other peripherals were shipped in April 2011, laptops in September-October 2011.
The personalization script had 1979 lines of code. About 26.4 computers per day were prepared on an assembly line.
External partners were used. Our partners helped us prepare the enumerators' computers. They also helped with the final configuration of servers and network after load tolerance testing.
We created a secure VPN connection between our server room and our partner's physical location, where the enumerators' computers were prepared.
Since the final sample was put together quite late, we decided to download the live sample during the fieldwork. This was quite risky, because there was no possibility to fix errors if the download of the sample was not successful.

17 This picture was taken while the software of the enumerators' computers was being prepared.

18 Load tolerance testing
Tests were done between …
Test requirements (live environment): at least 2500 simultaneous sessions for the e-respondent application and at least 2000 parallel sessions for the enumerator's application.
In 2009 we used a special simple questionnaire that was created only for test purposes. We used robots to emulate login, user actions and so on. Testing of the live environment lasted about 9 months.
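The idea of robots holding many sessions open at once can be illustrated with a small Python sketch. It does not reproduce the tools used in the census tests: `StubServer` is a hypothetical in-process stand-in for the application under test, and each robot is a thread that logs in, waits until all other robots are logged in, and then logs out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubServer:
    """Stand-in for the application under test; counts concurrent sessions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0
        self.completed = 0

    def login(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def logout(self):
        with self._lock:
            self.active -= 1
            self.completed += 1


def robot(server, barrier):
    """One emulated user: log in, hold the session, log out."""
    server.login()
    barrier.wait(timeout=10)  # keep the session open until every robot is logged in
    server.logout()


def run_load_test(sessions):
    """Run `sessions` robots in parallel and return the server with its counters."""
    server = StubServer()
    barrier = threading.Barrier(sessions)
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        for _ in range(sessions):
            pool.submit(robot, server, barrier)
    return server
```

After `run_load_test(n)` the `peak` counter shows how many sessions were open simultaneously. A real test would replace `StubServer` with HTTP requests against the test environment and would need far more computing capacity than a single personal computer for thousands of sessions.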

19 Load tolerance testing process
Target: >2500 sessions. Stages: personal computers, Amazon cloud, Estonian Information Systems Authority's servers.
The first tests were made from our local area network using personal computers. Soon we realised that their computing capability was not sufficient to continue the tests. We then moved our robots to the Amazon cloud and started testing from there. Quite soon we hit the next bottleneck, which was the speed of Estonia's external connection. We then asked for help from the Estonian Information Systems Authority, who agreed to let us use their servers to complete the load tolerance testing.

20 IT support

21 This is a screenshot of the login page of the e-respondent application.

22 The average time for completing the personal questionnaire
As you can see from this picture, the average time to complete the personal questionnaire is between 10 and 15 minutes.

23 This is the picture of the ID-card reader that has the census logo on it. I have to mention that we provided ID-card readers to libraries and public internet access points.
The birth of a world record through the e-solution of the population and housing census

24 Authentication channels
Bank authentication was the most commonly used channel in the E-census.

25 Hourly statistics
Hourly statistics show that people filled in questionnaires mainly in the evening.

26 Fears and problems during E-census
Fears: the E-census environment becomes unavailable; system load is too high and people quit; enumerators' computers are not ready in time; security problems stop the E-census; problems downloading the live sample; training fails; fieldwork is delayed.
Here are some of the fears and problems we had before and during the census.
We were afraid that cyber attackers or bots would create enough network traffic to bring our E-census environment down. We announced a simple static web page to the public, from which users were directed to the E-census environment and to the information page. The E-census environment was accessible 24/7 for all 32 days without interruption.
System load was higher than normal during the first days of the E-census because of the high interest of e-respondents. The problems were fixed by optimising the software and changing the network and server configuration.
There was a risk that the computers would not be prepared in time because of script errors that needed to be fixed.
There were some security issues related to the authentication application. The Data Protection Inspectorate evaluated them and decided that the problems were minor and common to all similar applications used everywhere.
Enumerators' training took place in 29 physical locations all over Estonia. The chance that something would fail and that we would be unable to fix it was quite high.
The response rate during the E-census was unexpectedly high, and because of that data arrangement was not completed in time.

27 Good decisions
Information system for general service
Sizable load tolerance tests
Buying out laptops used by the enumerators
Outsourcing information system maintenance
Tachometer for indicating system load
Some people asked me after the E-census whether we had a special post for the person who moved the tachometer's pointer. The answer is no! It was moved automatically and it indicated the current system load.
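How a load indicator of this kind can move its pointer automatically is easy to sketch in Python. The formula below is an assumption for illustration only: the function name `gauge_angle`, the use of active sessions as the load measure and the 180-degree dial are all hypothetical and are not taken from the census system.

```python
def gauge_angle(active_sessions, capacity, sweep_degrees=180.0):
    """Map the current load to a pointer angle, clamped to the ends of the dial."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    fraction = min(max(active_sessions / capacity, 0.0), 1.0)
    return fraction * sweep_degrees
```

For example, with a capacity of 2500 sessions, 1250 active sessions put the pointer in the middle of the dial, and any load above capacity leaves it at the far end.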

28 Subsequent knowledge
It's all about testing. Pilots as real as possible. Hire more people for a longer period. Buy experience, because there's no time to experiment and learn.
Test everything: processes, functionality, hardware, software, user friendliness, load tolerance, integration and so on. Enough time and money should be planned for load tolerance testing and optimisation.
In addition to testing, it is very important to go into production with a pilot. The larger the pilot and the closer it is to the real situation, the better. In 2009 we ran the census pilot and in 2010 the agricultural census. This experience was invaluable, but many problems did not surface, because these surveys were several times smaller than the census.
Both our business team and our IT team were small. A lack of Estonian IT specialists, a low budget and the plan to hire additional people at the last minute led to a situation where the so-called old-timers were overburdened with work at the end of training and at the beginning of the census.

29 Critical competences
System architect. A moderator between the software development and maintenance teams.
What we missed the most when developing the information system was a system architect who could bind together the different pieces of the software and had a clear vision of the system as a whole.
The line between software development and maintenance is quite narrow, which is why a middleman is needed. We were lucky that our partner had a person who spoke the same language as the maintenance and development teams.

30 Hardware Infrastructure
Oracle WebLogic J2EE server: LIVE cluster of 2 physical server nodes (Linux); load balancer (Apache)
Oracle Database server: 1 physical server (Linux); cold standby (Linux)
Interviewers' computers (Acer One netbooks): mobile internet connection modems; portable GPS receivers

31 Preparation of hardware
Interviewers' computers
Windows XP operating system with the latest updates
Hardware forbidden for non-work purposes: webcam, USB/MMC storage, Wi-Fi/LAN

32 Preparation of hardware (continued)
Interviewers' computers
Functionality forbidden for non-work purposes (local computer and registry):
BIOS password protection, no booting from removable storage
Web traffic limitations
No access to control panel functionality
No access to hidden files
Hibernate when the lid is closed

33 Preparation of hardware (continued)
Interviewers' computers
Functionality forbidden for non-work purposes (domain policy):
Screen saver with password prompt when idle
Interviewer-specific VPN rules only


35 Preparation of hardware (continued)
Interviewers' computers
Personalization: change of TrueCrypt's default password; access rights for sensitive data folders; hiding the sensitive data folders
Verification: TrueCrypt and Active Directory passwords; VPN connection; loading, updating and synchronisation of the interviewer's application

36 Security of data
Passwords
One password for everything
Predefined and unchangeable passwords
TrueCrypt password changed and stored in the process of personalization
Active Directory authentication for Windows logon and web applications

37 Security of computers
Fully encrypted hard drive
HTTP/SFTP connection inside encrypted VPN tunnel

38 Physical security
Servers: access-protected server rooms
Interviewers' computers: based on interviewers' training, which covers how to handle personal passwords, protect the computer in bad weather conditions, act in case of robbery or computer loss, and handle unauthorized use of the computer

39 Our partners
1. Webmedia (Nortal) was our main partner and was responsible for the whole development process, IT support and system maintenance.
2. Quretec developed the questionnaire definition tool, the enumerator's application and the central communication server.
3. Elion sold and shipped most of the hardware used, prepared the computers' software, and provided the help-desk software and service as well as the internet connection in several physical locations where we held training.
4. Regio and AlphaGIS developed our map applications.
5. Mindworks and their Finnish partner Aureolis developed the information system VAIS that we use for data processing.
6. Cybernetica was responsible for security testing of the information systems.
7. EMT provided the mobile internet used in the enumerators' computers, as well as call connections.
8. Oracle software was used in our database and application servers, SAS for data processing, SUSE Linux as the server operating system and Microsoft Windows XP on the enumerators' laptops.
9. HP provided most of the hardware, including servers, enumerators' computers and network devices.
10. Last but not least, we used Check Point software in our firewalls and intrusion detection/protection systems.

40 During the census, spatial and attribute data were updated through the enumerator's map application
Enumerator's tasks:
Updating the attributes of existing buildings (the type and condition of the building)
Adding new houses to the map together with their type and condition

41 The map data in the enumerator's map application
Raster data (1:1 - 1:25 000):
Orthophoto from the Estonian Land Board
OR
Base map: topographic data from the Estonian Land Board, cartographic work from the eGEOStat project (Statistics Estonia)

42 The map data in the enumerator's map application
Vector data (1:1 – ...):
District borders
Building numbers from ADS
Cadastral parcels (for agricultural census)
Administrative borders
Bus stops
Topographic data (roads, rivers, forests, etc.)

43 Enumerator's map application: building data update
Data synchronisation
The XY coordinates of new buildings and the type and condition of buildings are sent to the Statistics Estonia server through a secure VPN over a GPRS connection. This takes place together with the census data synchronisation.
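The content of such a building update can be pictured as a small record per building that is serialised and sent along with the census data. The Python sketch below is only an illustration: the field names, the JSON format and the function `build_sync_payload` are assumptions, and the secure VPN/GPRS transport is not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BuildingUpdate:
    x: float             # X coordinate of the building
    y: float             # Y coordinate of the building
    building_type: str   # type of the building
    condition: str       # condition of the building
    is_new: bool         # True if the enumerator added the building to the map


def build_sync_payload(updates):
    """Serialise building updates for sending together with the census data."""
    return json.dumps({"buildings": [asdict(u) for u in updates]}, sort_keys=True)
```

A list of `BuildingUpdate` objects collected during the day would be passed to `build_sync_payload` at synchronisation time.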

44 E-census map application
Every citizen can come to our web page and e-count themselves.
They search for and choose their address from our database (ADS), and all their answers are connected to the ADS address point and its XY coordinates.
If we don't have data on the citizen's address, they can mark their location on a web map and write in their own address.
The map is similar to the enumerator's map.
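The address lookup with a map fallback can be sketched in Python as follows. This is a simplified illustration, not the real ADS interface: the function `resolve_location`, the dictionary `ads_index` (normalised address text mapped to XY coordinates) and the returned fields are all hypothetical.

```python
def resolve_location(address, ads_index, map_click=None):
    """Return coordinates from ADS if the address is known, else from the map click."""
    key = " ".join(address.lower().split())  # normalise case and spacing
    if key in ads_index:
        x, y = ads_index[key]
        return {"source": "ADS", "address": address, "x": x, "y": y}
    if map_click is None:
        raise LookupError("address not found; a map location is required")
    x, y = map_click
    return {"source": "respondent", "address": address, "x": x, "y": y}
```

A known address is resolved from the index; an unknown one is accepted only together with the point the respondent marked on the web map.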

45 Supervisory application's map
Supervisors can see how their enumerators are working; this is also displayed on a map.
The map looks just like the enumerator's map.

46 What are districts?
Districts are bordered areas on the census enumerator's map where he/she has to collect data.
Districts are NOT the areas by which the census data are later presented.
District borders must be easily detectable in the landscape to make the census enumerator's work faster and better.
Kristjan Roosild

47 GIS technology used
ArcGIS Server for the Java Platform (Advanced/Standard/EDN)
REST API
ArcGIS Desktop (Info/Editor) 9.3/10.0
ArcSDE
ArcGIS Viewer for Flex
Districting for ArcGIS
XTools Pro

48 eGEOStat
eGEOStat is the Estonian geostatistical database.
It enables us to collect, manage, process, analyse, visualise and maintain spatial data.
We get data from the Estonian Land Board: topographic data, address data, administrative and settlement units data, cadastral information, orthophotos.
It is related to the data collection system VVIS: maps, location info of buildings, hierarchy of district division.

49 Data for data collection system
Vector data and basemaps for the enumerator's desktop and eCensus applications
Address data
Address locators
ArcGIS services

50 Recruiters map application
This was a supporting web application, not part of the data collection system.

51 eCensus web map application
Search for address

52 eCensus web map application
Confirm and enter details if asked.

53 eCensus web map application
Zoom to the right place either by using the scrollbar or by selecting from the administrative divisions.

54 Enumerators desktop application
Built-in GPS device

55 Management Map Application

56 Address operator map application during data processing

57 Conclusion
Statistics Estonia has built an electronic data collection environment.
The system supports the whole data collection process, including:
Study form creation
Task division between collectors
CAPI module (filling in of study forms on collectors' laptops)
CAWI module (public e-survey environment)
Preliminary data quality assurance module
Final testing: 2 weeks in 2009. LIVE launch: in 3 months, in 2010, during the agricultural census.

58 Thank You!