In a farm of fewer than (roughly) thirty servers, with all the servers in one location, IMA runs itself, and the defaults are appropriate.
In a larger farm, or a more complex implementation where there are servers in multiple locations, the IMA defaults may need to be modified in order to optimize the implementation.
And even in smaller farms, there are a few important IMA issues to be aware of, such as how to back up and restore the IMA data store, and how to recover from a corrupt “Local Host Cache”.
Independent Management Architecture (IMA) is both an architecture and a protocol; as a protocol, it runs over port 2512 and holds the Presentation Server farm together. As an architecture, it is what makes the farm scalable.
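As a quick illustration of the protocol side, a reachability check against the IMA port can be sketched in Python. This is only a minimal sketch and not a Citrix tool: the function name ima_port_reachable is hypothetical, TCP is assumed, and the only detail taken from the text is port 2512.

```python
import socket

IMA_PORT = 2512  # the IMA port named in the text


def ima_port_reachable(host, port=IMA_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only proves the port answers, not that the
    IMA service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False
```

Run from one Citrix server against another, a False result suggests the port is blocked or nothing is listening, which is worth ruling out before looking at anything more complicated.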
Back with MetaFrame 1.8, Citrix servers were like Windows 3.11 machines, in that they BROADCAST to each other in order to be in the farm together. The design assumed one, or maybe two or three, servers together on one LAN, broadcasting to each other to maintain connectivity. Before long, Citrix took off, and there were customers with HUNDREDS of MetaFrame 1.8 servers in a farm, BROADCASTING to each other. If servers were on multiple subnets, we had to configure single-point-of-failure “ICA Gateways”.
IMA Data Store
With MetaFrame XP, the scalable IMA architecture was released. First of all, there was the new “data store”, a static DBMS database, which holds all the configuration data for the farm. Smaller POC implementations could use a runtime Access database, and the standard in production is to place the data store on a SQL Server 2000 or 2005 server. There is only ever ONE data store for the farm. The data store is a 30-day fault-tolerant single point of failure, and we can set up Resource Manager alerts to tell us immediately if a server loses its connection to the data store. When the data store is MS Access, or SQL 2005 Express, it is installed DIRECTLY on the first Citrix server in the farm. When it goes on SQL (or Oracle or DB2), the data store should be on a server OUTSIDE the Citrix farm, not on a Presentation Server.
In the case of a remote data store, the first server in the farm is given a DSN, a direct connection to the data store. The rest of the servers in the farm receive an INDIRECT connection to the data store; if the first server in the farm is down, no other server can access the data store. This is the most subtle single point of failure in Citrix. It is 30-day fault-tolerant, but after thirty days without that one particular Citrix server, the whole farm stops accepting connections.
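The thirty-day window is simple date arithmetic, and it can help to see it spelled out. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, and it assumes the clock starts at the moment a server last reached the data store.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=30)  # the thirty-day window described in the text


def grace_deadline(last_contact):
    """Moment at which a server that lost the data store stops accepting connections."""
    return last_contact + GRACE_PERIOD


def still_accepting_connections(last_contact, now):
    """True while the server is still inside the grace period (assumed model)."""
    return now <= grace_deadline(last_contact)


# Example: the data store became unreachable on 1 January.
lost = datetime(2007, 1, 1, 9, 0)
print(grace_deadline(lost))
print(still_accepting_connections(lost, datetime(2007, 1, 20)))
print(still_accepting_connections(lost, datetime(2007, 2, 5)))
```

The point of writing the deadline down is that nothing visibly breaks in the meantime, so the date is easy to miss unless someone is tracking it.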
The Citrix recommendation for this situation is to create DIRECT connections to the data store, by going to some other servers in the farm, even all the servers, and adding a DSN file manually that points to the data store.
The data store can be backed up and restored to a different server if necessary. To back up a local MS Access data store, there is a Citrix command line utility, “dsmaint backup path”, which takes the locally stored IMA data store from the “Program Files\Citrix\Independent Management Architecture” directory, and places a closed copy of “mf20.mdb” at the “path” given at the end of the command line (preferably a thumb drive).
To back up a SQL data store, use Enterprise Manager or Management Studio, to back up the database as a “single file”, and place it in a secure place, as this is the complex heart of the Citrix implementation, and we wouldn’t want to have to recreate it from scratch.
Even if the SQL team is backing up the data store nightly, we still want a recent ‘last known good’ on a separate, static thumb drive. If the data store becomes corrupt on the SQL server, we can always go back to our ‘last known good’. How often should this data store be backed up? Not necessarily nightly, because a single backup copy could easily be overwritten with the corrupt version. Rather, each time we do significant configuration work (adding more servers, changing policy settings, changing the printer configuration) we want to get a ‘last known good’, so we can always bring the implementation back to this point.
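One way to keep a ‘last known good’ from being overwritten is to give every copy its own timestamped, labelled name. Here is a minimal sketch of that idea; the function name, the label argument, and the naming scheme are all assumptions for illustration, and the backup file itself is expected to come from whatever backup method is already in use.

```python
import shutil
from datetime import datetime
from pathlib import Path


def save_last_known_good(backup_file, archive_dir, label, now=None):
    """Copy backup_file into archive_dir under a unique name; never overwrite.

    label is a short note on what changed, for example "added-servers".
    """
    now = now or datetime.now()
    src = Path(backup_file)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = now.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}-{label}{src.suffix}"
    if dest.exists():
        # Refuse to replace an earlier copy, even by accident.
        raise FileExistsError(f"{dest} already exists")

    shutil.copy2(src, dest)  # copy2 keeps the original file timestamps
    return dest
```

Because each copy gets a new name, a corrupt data store backed up tonight cannot replace the good copy taken after last week's configuration work.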
Without being diligent and backing up the data store, we can still get a ‘last known good’, sort of. Every time the IMA service restarts, it saves a copy of the Access data store as mf20.bak, and that can also be considered a backup, but we don’t want to have to depend on a backup from the last time we rebooted the server.
A strategy for restoring our diligently backed-up data store will follow, at the end of this chapter.
Zone Data Collector
The second new component within the new IMA architecture was the IMA “Local Host Cache”, which is a runtime version of the data store information that’s relevant to the particular server – when an admin configures an IMA server, or a user connects and launches an app, they are actually contacting the IMA Local Host Cache (IMALHC).
And the IMA LHC on each server is the basis for the “Zone Data Collector” (ZDC). The ZDC is in some ways more critical to the farm than the data store, because we can’t go thirty days without a ZDC. No one can connect to the farm, if we don’t have a ZDC.
A Zone Data Collector is elected dynamically, and in the small farm that runs itself, the first server in the farm, (whether or not the data store is installed locally), is set as “Most Preferred” to win the ZDC elections that will occur every time a server reboots, or joins the zone. The rest of the servers in the zone are set to “default preference”.
Once we have several servers in a farm, Citrix recommends hard-coding a backup ZDC or two. The ZDC is a critical role, answering all client requests, querying the “dynamic store” which it maintains in RAM, returning the name of the least busy server, and maintaining the updated dynamic store information. If a ZDC goes offline, if it can’t be contacted, and even if another server comes online, there is a ZDC “election”, and no matter what the preferences, SOME server will be elected the ZDC. The current ZDC can be enumerated by typing “qfarm” at the command prompt of any Citrix server. The “D” to the right of one of the servers means that server is the ‘zone “D”ata collector’. (The asterisk just means this is the server we are typing on at the moment.)
By default, the server denoted as “Most Preferred” will always win the zone elections. But if the main ZDC is down, by default the next one elected could be anybody, since all the other servers in the zone are set to “Default Preference”; Citrix recommends setting a second-in-command for the important position of ZDC, even a third, rather than leaving it up to the random “host-id” that got configured dynamically during server install.
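The election logic described above can be modelled in a few lines. This is a toy model and not the IMA implementation: the Server class and elect_zdc function are hypothetical, and the tie-break among servers with equal preference (highest host ID wins) is an assumption made here for illustration.

```python
from dataclasses import dataclass

# Lower rank wins. The names follow the preference levels mentioned in the text.
PREFERENCE_RANK = {
    "most preferred": 0,
    "preferred": 1,
    "default preference": 2,
}


@dataclass
class Server:
    name: str
    preference: str
    host_id: int        # assigned at install time
    online: bool = True


def elect_zdc(servers):
    """Pick the data collector for one zone, or None if no server is online."""
    candidates = [s for s in servers if s.online]
    if not candidates:
        return None
    # Best preference first; ties broken by highest host ID (assumption).
    return min(candidates, key=lambda s: (PREFERENCE_RANK[s.preference], -s.host_id))


zone = [
    Server("CTX01", "most preferred", host_id=5),
    Server("CTX02", "preferred", host_id=9),
    Server("CTX03", "default preference", host_id=12),
    Server("CTX04", "default preference", host_id=3),
]
print(elect_zdc(zone).name)   # CTX01 while it is online
zone[0].online = False
print(elect_zdc(zone).name)   # CTX02, the hard-coded backup
```

Without the “preferred” entry for CTX02, the second election would fall through to the host ID tie-break, which is exactly the luck-of-the-draw outcome the recommendation is meant to avoid.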
To control who gets elected ZDC, we use the PSC farm management tool, go to the properties of the farm, and click the “zones” tab. In the GUI, the blue check means “Most Preferred”. To set another server as “Preferred”, we can right click a default preference server, and add the orange pyramid, which means “preferred”, or “next-in-line to be ZDC”.
Though the “qfarm” command comes back telling us with a “D” who the ZDC is, the “qfarm” command is not going to be available if the data store is down; ‘qfarm’ queries the data store for who has won the ZDC election. But when the data store is down, the cockpit of the implementation is closed, and ‘qfarm’ only works in the cockpit.
There is however another command, not installed in Citrix by default but available on the server CD, under “support”, “debug”, “win2K3”, which can be copied to the server and used at the command line: ‘querydc’. The ‘querydc’ command queries the ZDC itself, and so will work in the cabin, when the cockpit is closed. Also, a ‘querydc -e’ will force a ZDC election, (something that otherwise would happen within the next five minutes if left alone.)
If the whole point of implementing Citrix is greater centralization, why is it that many implementations involve Citrix servers in multiple locations? Wouldn’t it be better to keep all the servers in one location, and send out ICA to all clients everywhere?
Ideally, that would be the design, with the only ‘other’ location for Citrix servers being a “Disaster Recovery” (DR) site / zone.
But many Citrix implementations are not designed from the ground up, but rather evolve slowly; the first reason to bring Citrix into the enterprise is often because it “fixes” some back-end database application that was running slow over the wire. The application vendor tells the customer that Citrix fixes the problem, so they put a Citrix server in front of the database app, and instead of the application logic going over the WAN to all the user PCs, the client software is installed on the Citrix server, the app logic runs only on the backbone between the database and the Citrix server, and everything is made better by the lean ICA protocol.
And if we are putting Citrix servers in front of our back-end databases, and we have already spread the databases out at multiple locations, we now have Citrix servers in multiple locations.
With IMA, we can span locations, buildings, countries, and continents, with the Citrix servers in a single farm, and the farm can be managed from anywhere in the world, from any one server.
The default configuration of IMA in the farm that spans locations MAY be ok, but it depends on how the Citrix servers are utilized, and there is often an opportunity to improve things by changing the defaults.
By default the “zone name” is the IP subnet that the server is being added to. The first server on the subnet is the “Most Preferred” ZDC, and all the other servers on the same subnet realize that there is already a “Most Preferred” ZDC, so they join with their elected representative over IMA port 2512, then stay silent as far as the WAN is concerned.
If the admin goes to a DIFFERENT subnet, and adds a server to the existing farm (which will only work if 2512 is open between the subnets), then the server is still part of the same FARM, but because it realizes it is on a new subnet and there is not yet a data collector for this subnet, the first server on the new subnet becomes the “Most Preferred” ZDC for the NEW ZONE, and IMA continues to run itself, with one data store per farm, and one ZDC per zone, according to “IMA law”.
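The default zone behavior described in the last two paragraphs can be sketched as follows. This is a simplified model: the function names, server names, and addresses are made up, and the exact string Citrix uses for the default zone name is assumed here to be the subnet's network address.

```python
import ipaddress


def default_zone_name(ip, netmask):
    """Derive the default zone name from a server's IP subnet (assumed format)."""
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    return str(network.network_address)


def build_zones(servers):
    """Group servers into zones in join order.

    servers is a list of (name, ip, netmask) tuples. The first server to
    join each subnet becomes that zone's "Most Preferred" data collector.
    """
    zones = {}
    for name, ip, netmask in servers:
        zone = zones.setdefault(
            default_zone_name(ip, netmask),
            {"zdc": name, "members": []},
        )
        zone["members"].append(name)
    return zones


farm = [
    ("CTX01", "10.1.2.15", "255.255.255.0"),
    ("CTX02", "10.1.2.16", "255.255.255.0"),
    ("CTX03", "10.9.0.20", "255.255.255.0"),  # different subnet, so a new zone
]
for zone_name, info in build_zones(farm).items():
    print(zone_name, info)
```

One farm, two zones, one data collector per zone: the same outcome the text describes when a server is added from a second subnet.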
With a distributed database-related structure such as IMA, consistency and connectivity issues can arise in a production environment.
If an admin is sitting at one WAN location, saying he published an app or changed a PSC policy, and the admin at the second location is saying he doesn’t see the change, there could be several different reasons for this.
It isn’t the default behavior. Ideally, as soon as the first admin made the change to the IMALHC on the server he was connected to, the change would have propagated to the data store, and then a change notification should have gone out to all the other IMA servers in the farm to get the change immediately. All servers should be reflecting the change in real-time.
If the second admin was across a slow WAN from the first admin, then it is conceivable that the change was lost in the traffic on the wire, and the local IMALHC had given up on getting the update. The solution for this situation is a simple “dsmaint” command: “dsmaint refreshLHC”. The admin at the second location types this at the command prompt of the Citrix server, and if this was all that was wrong, then the refresh was all that was needed.
The IMALHC is just an MS Access database, and can also become corrupt. As opposed to the mission-critical IMA Data Store, the IMALHC is expendable. If the admin does a “refreshLHC” and still isn’t happy with what they are seeing, the admin can then choose to go further and type “dsmaint recreateLHC” at the command prompt.
The “recreateLHC” command doesn’t actually do any update or creation, but simply sets a registry key value, “PSREQUIRED” under HKLM\Software\Citrix\IMA, from “zero” to “one”. The significance of this is that now when the Citrix Independent Management Architecture service is RESTARTED, the IMALHC will be completely cast aside, and a brand-new IMALHC will be rebuilt from the IMA data store over the wire. If the data store is big, or across a WAN, expect this process to take a little longer than normal. (The number of servers, number of published apps, number of PSC policies, and number of print drivers in the data store are what make it big.) The admin on the second set of servers has to restart the IMA service manually after entering the “dsmaint recreateLHC” command, in order to get the LHC back to normal. While the LHC is building, there is a blank Windows screen with the mouse cursor in the middle. There is no Citrix GUI to tell you what data store the server is pulling from, how far along it is, how long it expects the entire process to take, or even the fact that something important is happening. (In fact, there is no command in Citrix to say where the server THINKS the data store is.)
If the server the second admin was sitting at was NOT across a WAN from the first server, then none of these “dsmaint” commands will help, and there is a much more likely solution to the problem: “ODBC Connectivity”!
When the “Citrix Independent Management Architecture” service starts, it loads the critical DLLs, then the less critical DLLs, then reads the “PSREQUIRED” registry key to figure out if it is going to be a refresh (0) or a total rebuild (1) of the IMA LHC from the data store. Finally, the server starting IMA tries to make the actual database connection.
Fortunately the typical data store issue is not internal corruption, but simply loss of connectivity to individual IMA servers. The solution for ODBC connectivity loss is quick, and simple.
Check back for Part 2