Real-World Reporting in the Enterprise: A Case Study

1 Real-World Reporting in the Enterprise: A Case StudyBil...
Author: Stephany Mariah Mills
0 downloads 3 Views

1 Real-World Reporting in the Enterprise: A Case StudyBill Bartlett Software Analytics, Inc. Session # 32133

2 Introduction Will review Party City’s “Intranet & Enterprise Reporting” system Oracle9i Application Server (9iAS) Portal, Reports Server, custom portlets Network and infrastructure 24x7 operation In production since September 2001 Used by corporate, stores and vendors

3 Agenda Overview Business & Technical Needs Solution Lessons LearnedInfrastructure Custom Application Monitoring Lessons Learned Future Going to cover a lot of ground Due to time, may not be able to get into as much detail as I’d like More than happy to talk later in more detail

4 Level-Set Audience Background DBMS: 8i or 9i? Using 9iAS?Using 9iAS Release 1 or Release 2? Using Portal? Using Reports Server? Master/slave or single server?

5 Party City CorporationAmerica’s largest party goods chain Today: > 470 Stores (235 Company-owned stores, 240 franchise stores in United States and Puerto Rico) Expected annual growth: corporate stores, 25+ franchise stores Emphasis on technology to promote efficiency Expected annual growth may be conservative, since the store counts are changing weekly. (e.g.: Adding 10 within the next 6-8 weeks)

6 Software Analytics, Inc.Full-service Solution Provider This is the obligatory marketing slide... I think the intent is to show that we have a wide range of skills available: Oracle, Microsoft, Java, .NET, etc.

7 Business Needs Secure Scalable Robust / ReliableAlso: granular, flexible, dynamic security Scalable Robust / Reliable Access to real-time data Reports: On demand, scheduled, event-based Easy to use / Minimal training required Easy distribution of company information Secure – More than just from an “unbreakable” standpoint: Different user groups, user levels. Corporate vs. Franchise vs. Vendor access. Org hierarchy access Application functionality access (Flexibility is key: We were given two weeks to allow outside vendors to access the system (to run reports, etc), but needed to restrict what reports they could run, what content they could see, etc. It took us about 4 hours.) Scalable: Had to scale faster than expected, once corporate departments realized they could get data more frequently than before. Real time data: Stores will scan inventory being received on their handhelds, then immediately run reports to see what is still outstanding Easy to use: As with more retail, there is high turnover and sometimes seasonal staff

8 Technical Issues Database size & activityCurrently using same databases for OLTP and reports Database used for price lookups in stores, so cannot impact database performance [much] Need to access multiple databases Limited WAN bandwidth to stores Database size & activity: large & lots OLTP includes transactional updates from stores, batch updates from processes, etc. Also handheld scanner-based lookups, receiving, etc.

9 Infrastructure - DatabaseDatabases: Oracle 8i ( ) 64 bit AIX version 750 GB primary database on a 12 CPU RS/6000 with 24 GB RAM

10 Infrastructure – 9iAS 9iAS Release 1, Reports 6i (for now…)Production environment: 6 server master/slave report server cluster 2 node master report server cluster (Windows 2000 Advanced Server + Oracle Fail Safe) 4 report slave servers 4 server web farm, load balanced via content switch (Windows 2000 Servers) Redundant & distributed connections QA environment identical to production At the moment the 4 web servers and the 4 report slaves are the same physical boxes. Neither function is taxing the hardware, but if it ever does, we can scale out easily or move either the web service or the report service onto different machines.

11 Infrastructure: Redundant Network ConnectionsAll servers have two NICS: One handling incoming web traffic, one handling outgoing data traffic. Master1 & Master2 servers are clustered so only one NIC is LAN connected for data traffic. (The other is used for the cluster’s heartbeat connection.) All servers also have a network-attached “Remote Insight Board” for lights-out support. (Content switches are ArrowPoint CS-150 units)

12 Store Environment – Physical Architecture

13 Store Environment – Application Execution

14 Application Overview Web application Currently 9iAS Release 1Originally 9iAS v / Portal / Reports6i patch 4 Several interim updates Now 9iAS v / Portal / Reports6i patch 11 9iAS Release 2 in 1Q2003 ? Need Content Area API’s Can’t go to 9iAS Release 2 until the Content Area API’s have been released (1Q2003?) Also waiting for Portal upgrade script

15 Custom Application 9iAS components: Portal, SSO, Reports98% custom code (9iAS PDK / JPDK) Portlet providers: both database and web PL/SQL procedures & packages PSP (PL/SQL Pages) Java: JSP, Servlets, EJB HTML, DHTML, JavaScript, XML, XMLHTTP This is not meant to undermine how much you can do with Reports & Portal right out of the box. Rather, it is meant to demonstrate some of the ways that you can extend it (via the API support) beyond what is provided. Only out-of-the-box components right now are the two “Document Library” content areas Even these are used as data sources for other custom code, with documents extracted via the Portal API PSP: Early parts of the system were done in PSP to leverage the PL/SQL knowledge of the client’s developers. Newer code is primarily in Java & JSP. HTML, etc: Large amount of client-based (browser) code Allows for more dynamic & responsive user interface Better use of limited bandwidth to the stores

16 Application WalkthroughSample screens from live application Shows both “user” view and “administrator” or “report manager” view

17 Application: Opening PageStandard web application (IE 5.5+) Public content areas, plus additional once the user has logged on (“Public” = Still inside the Party City network) “Login”, “QuickLinks” and “Party City News” are custom portlets (PL/SQL PDK) “Party City News” displays highlighted data from a portion of the “Document Library – Public Documents” content area

18 Application: Opening Page – Logged InAdditional content areas: “Corporate Documents” portion of the Document Library Also “Franchise Documents” or “Vendor Documents” for different user types “Communications Area” is a custom portlet Shows different content to different users Pulls data from a content area via PDK API Additional QuickLinks (“Quick Entry” functions, etc.) This actually allowed us to score a big win for the client when another vendor’s solution wasn’t able to perform as well as necessary at the stores. Using JDeveloper and the Portal JDK, we were able to get an integrated, secured, scalable replacement function out in under a week.

19 Application: Report Parameter ScreenCustom portlet Significant amount of HTML, DHTML, JavaScript Remote (to database) XML queries for pick lists and field validation List of reports displayed to user is based on group membership & status of the report (“In review”, etc.) Field-level security (on a per user and per group basis) Also: fields dynamically created, so in some cases fields are changed due to security. (e.g.: For “Vendor” users, the “Vendor” field is a static field.) Population of pick lists also based on user profile (which store or district they belong to, etc.) and user’s group memberships Can be run to a user’s individual content area or a group content area (given proper security), sent via or run directly to the user’s screen Run immediately or schedule for later (or repeated) execution

20 Application: More Complex Parameter ScreenVery dynamic page Even more client-side DHTML, JavaScript and JavaScript event handling Good example of why we ended up building this as a custom portlet rather than using the built-in parameter screens

21 Application: Report Output to ScreenScreenshot showing PDF-format report run to the screen (run immediately) Shows up-to-the-minute data Title bar of output window shows partial call to RWCGI60.EXE program All reports are submitted via CGI (for now) This will change with Reports9i

22 Application: Report Queue (Manager’s View)Another custom portlet Based on RW_SERVER_QUEUE table plus other custom tables Different views & functionality for users, for helpdesk staff and for system administrators Normal users can see only their own reports, can cancel only their own report requests Helpdesk and system administrators can see all reports, cancel any report, etc. Shows “Server” column to helpdesk and system admins to help resolve potential problems. (e.g.: can help see if one report server cannot communicate to a database but others can, etc.) Also: Worked out to be a good (but unintentional) example of 24x7 use of system: I took these screenshots at 1:00 AM Sunday morning (this presentation was due on Monday), and you can see live users submitting report requests at 11:30PM on Saturday night and even at 12:50 AM Sunday. (My entries are the BBARTLETT and BBUSER2 report requests; RBATES, JCOBLE, JNICHOLS and JSAVINI are real users submitting requests right around me.)

23 Application: Additional TabsNote the additional tabs: Provide highly-secured access to a limited set of users Provides system admin to ERP/Intranet application and system, data admin for JDA application and monitoring screens for system status and health checkup. Will cover in a moment…

24 Application: Report Results (Manager’s View)Another custom portlet Shows all report output that a specific user can access All reports in the user’s private content area plus any in the content areas of all groups of which the user is a member “Display for User” prompt is only shown to Report Manager or Helpdesk users; normal users cannot view another user’s output “i” (“info”) icon displays a popup window showing the parameters used for this report Ties content area back to RW_SERVER_QUEUE table to retrieve command line used for this report, then extracts the parameter values from the command line

25 Application: System Administration: ReportsHTML Portlet, containing a simply IFRAME control IFRAME wraps around a _large_ number of PL/SQL Pages (PSP) and JSP pages Provides access to almost all system configuration and administration screens This screenshot shows part of the “Reports” configuration function (we have our own reports configuration, security, etc over and above what Reports Server and Portal provide) “In Review” status, “Inactive” (but still accessible to certain users), etc.

26 Application: System Administration: User AdminAnother typical “Administration” screen

27 Application: System Administration: Helpdesk ViewThe more restricted Admin menu that is displayed to the Helpdesk staff

28 Application: Monitoring SectionHTML portlet containing an IFRAME; IFRAME wrapped about a set of JSP and Servlet-based pages The monitoring & status pages are also available from outside the application (from various monitoring pages), so this portion was not developed as a portlet. This leads us to the next section…..

29 Monitoring Tools: Custom application pagesWhatsUp Gold (www.ipswitch.com) Compaq Insight Manager CA Unicenter Oracle Enterprise Manager Custom application pages Screens show different information based on security Primarily JSP / servlets / EJB wrapped around Windows Resource Kit utilities or HttpURLConnection’s Accessed both from inside application and from other monitoring tools and sites

30 Monitoring Page from Inside Application

31 Monitoring Page from Central Monitoring SiteShows same “Monitor” menu as seen inside the Intranet application, but this time shown from inside a central monitoring site

32 Lessons Learned 9iAS Overall Reports Report Server Apache EnvironmentDisaster Recovery

33 Lessons: 9iAS Overall API provides TREMENDOUS amount of flexibility, but you’ll need to go exploring “API spelunking” Documented API’s Semi-documented API’s (OK to use?) Private API’s: YGYBYF (use only as a last resort) JPDK doesn’t have everything yet; may need to call the PL/SQL API’s Learn the API. Use the API. (YGYBYF: Your Gun, Your Bullet, Your Foot) * Your Gun, Your Bullet, Your Foot

34 Lessons: Reports Extract all data up front (via functions)Do NOT go back to database on each line Line-by-line can crush your DBMS Difficult to detect Database sessions appear to be inactive Report servers show low activity Rapidly refresh database session information to see tiny but frequently repeated activity Use packet sniffer (Network Monitor, Sniffer, etc.) to monitor the network Do the math: Assume a 0.1 second query on each line of a 1000 line report running on each of 17 report engines: 0.1 x 1000 x 17 = 1700 CPU seconds, not including network overhead. Can also use Query Manager to try to detect: If report spends all its time up front (extracting data), this is good If it is slow to generate each page, probably hitting this problem

35 Lessons: Reports Load test each reportIf you don’t do this, your users will ! Assume the worst-case scenario: All available report engines will simultaneously run this report with its worst-possible criteria 17 Pentium III-based report engines can easily overwhelm a 12 CPU, 64 bit RS/6000 However, report server will ensure that all reports in the queue do eventually get processed

36 Lessons: Report ServerUpgrade to Reports Patch 11 Security & scalability fixes added post Patch 10, especially in integration of Reports & Content Areas Determine the proper number of report engines Too few: “run immediately” reports will fail Too many: Can overwhelm the database server Changing number of report engines may require restarting the report cluster

37 Lessons: Report ServerNeed multiple ways to look at report queue Might want users or helpdesk to look at queue May be times when one method is inaccessible Queue Manager application is the best way RWCGI60 output is almost as good RW_SERVER_QUEUE is not always current (but it’s good enough for most users) Worst case: Can use the Report Server log file

38 Lessons: Report ServerRepServer.ORA: traceoption=portalpush|all Useful when diagnosing Reports / Portal interaction problems Don’t kill a report from the database until it has first been killed from the report server Otherwise this will crash the engine that was running the report, sometimes in bad ways

39 Lessons: Report ServerMonitor for cluster failovers If slave cannot start properly, the master will detect the problem, and fail the cluster over. The new master will attempt to restart, fail, and fail back to the original node, and so on. Not a problem if slave is offline, only if online but unable to start report service False alerts are sometimes generated if an engine on the master crashes Slave not starting properly: could be a misconfigured .ORA file, network problem, security problem, etc.

40 Lessons: Apache Upgrade to 9iAS v1.0.2.2.2 !Increase all “timeout” and “retry” settings JServ.conf: ApJServRetryAttempts 30, ApJServVMTimeout 20 zone.properties: servlet.page.initArgs=requesttime=60 Change JServ port to 8008 JServ.conf: ApJServDefaultPort 8008 jserv.properties: port=8008 Monitor servlet service; restart web server if this goes down Much better in v

41 Lessons: Environment Have identical environments in development, QA and production Same report server configuration: cluster/single Same load balancing method Similar database size & content If only single report server in development, cannot test clustering, Fail Safe, failover, report job preservation, etc. Report Server configuration: Must have same master/slave configuration in QA and production; very good to have in development as well. System acts differently in single-server configuration than in master/slave Cannot test certain functions (maintaining report queue on failover, etc.) in dev if not same as QA Threading issues that we only hit in QA and production Upgrade from 9iAS v to v required upgraded FailSafe software; could not test in dev environment Load Balancing: Issues encountered with content-switched vs. round-robin DNS load balancing Database size & content: Helps prevent overwhelming the database accidentally

42 Lessons: Disaster RecoveryWith proper preparation, this is easy Document all cluster details (especially disk signatures) Use dumpcfg.exe to get the disk signature Clustering solutions for lab “Poor Man’s Clustering” for details

43 Tools TracePlus Web Detective (www.sstinc.com) Sniffer (www.nai.com)Or Windows Network Monitor VMWare GSX Server (www.vmware.com) DBArtisan (www.embarcadero.com) Or any favorite client tool that can rapidly refresh session information Sysinternals tools (www.sysinternals.com) Windows Resource Kit

44 Future 9iAS Release 2 Additional “Application” portlets OEM !Portal 9.0.2 Reports 9i Forms 9i ? Expand use of content areas (WebDAV, workflow, greatly expanded content) Additional “Application” portlets Quick Entry Department Inventory, QE Purchase Order OEM ! OEM has MUCH better monitoring for Reports Service (for reports servers)

45 Benefits In production since 9/2001 Has become “business-critical”1400 users 116 reports available (in several variations) 190,000 Reports run in the past 11 months Report size: 1 – 71,000 pages

46 Benefits Vendor access to reports On-line Training DocumentationCentralized document management Reduce hard-to-find network-stored documents Eliminating MS Exchange Folder storage Eliminating daily & weekly s

47 Summary State what has been learnedRequest feedback of training session

48 Contact Info / Questions Bill Bartlett Software Analytics, Inc. Session # 32133: Real-World Reporting in the Enterprise: A Case Study