What is identity management? The answer to that question is both easy and very complex. The easy part is: identity management is everything that deals with managing identities in cyberspace. The complex part of the answer takes up the rest of this document.
This paper deals with Enterprise Identity Management, which is identity management applied to larger organizations such as enterprises, financial institutions, government agencies, universities, health care providers, etc. The focus is on managing employees, contractors, partners, students and other people that cooperate with the organization. Although some concepts apply generally, this paper does not focus on Internet identity (a.k.a. user-centric or consumer-oriented identity) or government identity (in the G2C sense).
The central concept of identity management is usually a data record that contains a collection of data about a person. This concept has many names, but the most common are: account, persona, user record, user identity. Accounts usually hold information that describes the real-world person, such as the person's given name and family name. But probably the most important part is the technical information that relates to the operation of the information system for which the account is created. This includes the home directory, a wide variety of permission information such as group and role membership, system resource limits, etc. User accounts may be centralized and unified, distributed and unaligned, or anywhere between these two extremes. But regardless of the architecture, the aim of identity management is the management of accounts.
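To make the split between descriptive data and system-specific technical data concrete, here is a minimal sketch of an account record as a Python dataclass. The field names (`home_directory`, `resource_limits`, etc.) are hypothetical; every real system defines its own account schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account record for one information system."""

    username: str
    # Descriptive data about the real-world person
    given_name: str
    family_name: str
    # Technical data specific to the system the account was created for
    home_directory: str = ""
    groups: set[str] = field(default_factory=set)
    roles: set[str] = field(default_factory=set)
    resource_limits: dict[str, int] = field(default_factory=dict)


alice = Account(
    username="alice",
    given_name="Alice",
    family_name="Smith",
    home_directory="/home/alice",
    groups={"staff", "developers"},
    roles={"engineer"},
    resource_limits={"disk_quota_mb": 2048},
)
```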
Identity management is not a single technology. In fact, it is a wild mix of various technologies that both complement and overlap each other. There are at least three main technological branches in identity management:
Although these technologies formally form a single field of identity management, their purposes and approaches differ significantly. Any complex identity management solution will need at least a bit of each of them. These technologies are explained below in much more detail.
Accounts are stored in databases called identity stores. The underlying technologies of these databases vary, ranging from flat text files through relational databases to directory servers. Directory servers accessed using the LDAP protocol are especially popular because of their scalability. An identity store may be integrated with the application that uses it, or it may be a shared stand-alone system.
A shared identity store makes user management easier. An account needs to be created and managed in one place only. Authentication still happens in each application separately. But as the applications use the same credentials from the shared store, the user can use the same password for all connected applications.
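The following sketch illustrates the shared-store model under simple assumptions: the store is an in-memory object holding salted PBKDF2 password hashes, and `SharedIdentityStore` and `Application` are hypothetical names. The account is created once, yet each application still runs its own login against the same credentials (no single sign-on).

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000


class SharedIdentityStore:
    """Hypothetical shared identity store: one account, many applications."""

    def __init__(self):
        self._accounts = {}  # username -> (salt, password hash)

    def create_account(self, username, password):
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        self._accounts[username] = (salt, digest)

    def check_password(self, username, password):
        if username not in self._accounts:
            return False
        salt, expected = self._accounts[username]
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return hmac.compare_digest(digest, expected)


class Application:
    """Each application authenticates on its own, but against the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, username, password):
        return self.store.check_password(username, password)


store = SharedIdentityStore()
store.create_account("alice", "correct horse")  # created in one place only
mail = Application("mail", store)
wiki = Application("wiki", store)
```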
Identity management solutions based on shared identity stores are simple and quite cost-efficient. But the capabilities of such solutions are considerably limited.
Identity stores are just that: storage of information. The protocols and APIs used to access such databases are primarily designed as database interfaces. This means that they are excellent for storing, searching and retrieving data. While the data in an account may contain entitlement information (permissions, groups, roles, etc.), identity stores are not well suited to evaluate it. That is, an identity store can tell which permissions an account has, but it cannot decide whether to allow or deny a specific operation. Identity stores also do not contain data about user sessions, so they do not know whether a user is currently logged in. Some identity stores, especially LDAP-based directory systems, are frequently used for basic authentication and even authorization. But the stores were not designed for this and therefore provide only very basic capabilities. Identity stores are databases, not authentication or authorization servers.
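A small sketch of the difference between retrieving entitlements and making a decision, assuming a dictionary stands in for the identity store. The `POLICY` table and the `is_allowed` function are hypothetical: the point is that the decision logic lives outside the store, in the application or an access management system.

```python
# A hypothetical identity store: it can only answer data queries.
IDENTITY_STORE = {
    "alice": {"groups": ["staff", "payroll-admins"]},
    "bob": {"groups": ["staff"]},
}


def get_groups(username):
    """What an identity store can do: retrieve stored entitlement data."""
    return IDENTITY_STORE.get(username, {}).get("groups", [])


# The decision needs a policy that the store does not hold. This made-up
# policy maps (resource, operation) to the group required to perform it.
POLICY = {("payroll", "approve"): "payroll-admins"}


def is_allowed(username, resource, operation):
    """The decision itself, made outside the identity store."""
    required_group = POLICY.get((resource, operation))
    return required_group is not None and required_group in get_groups(username)
```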
A meta-directory is a special directory system that synchronizes several "normal" directory systems. A meta-directory copies data. Therefore the performance of the original synchronized directory systems is mostly unaffected.
A virtual directory is a proxy and a protocol converter. A virtual directory creates a unified view of several "normal" directory servers. Unlike with a meta-directory, the data are not copied. The virtual directory fetches the data from the original directory on each request and converts them to the unified view.
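The copy-versus-proxy distinction can be sketched with plain dictionaries standing in for the source directories. This is a deliberately simplified, one-way model with hypothetical class names; it ignores protocols, schemas and write-back. It shows the practical consequence: the meta-directory serves its local copy (stale until the next synchronization), while the virtual directory always reflects the sources.

```python
class MetaDirectory:
    """Copies entries from source directories; lookups hit the local copy."""

    def __init__(self, sources):
        self.sources = sources
        self.entries = {}

    def synchronize(self):
        # Periodic copy: later lookups do not touch the source directories
        for source in self.sources:
            for uid, attrs in source.items():
                self.entries.setdefault(uid, {}).update(attrs)

    def lookup(self, uid):
        return self.entries.get(uid)


class VirtualDirectory:
    """Stores nothing; fetches from the source directories on every request."""

    def __init__(self, sources):
        self.sources = sources

    def lookup(self, uid):
        merged = {}
        for source in self.sources:
            merged.update(source.get(uid, {}))
        return merged or None


hr_dir = {"alice": {"cn": "Alice Smith"}}
mail_dir = {"alice": {"mail": "alice@example.com"}}
meta = MetaDirectory([hr_dir, mail_dir])
meta.synchronize()
virtual = VirtualDirectory([hr_dir, mail_dir])
mail_dir["alice"]["mail"] = "a.smith@example.com"  # change in a source directory
```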
Some meta-directories and virtual directories can reach beyond the common concept of a directory, e.g. they can fetch data from relational database tables. Both meta-directories and virtual directories can convert data. However, the data conversion abilities of both are very limited. E.g. they can convert number or date formats but usually cannot execute complex routines. They also cannot synchronize records that use significantly different data models. Their entry correlation abilities are even more limited. Therefore meta-directories and virtual directories are suitable only for very simple deployments.
Provisioning systems can usually do all that a meta-directory can do and are much more flexible. Provisioning systems can also act as virtual directories under some circumstances. However the performance of a provisioning system is usually worse than that of a meta-directory or a virtual directory.
A shared identity store makes user management easier, but it is not a complete solution and there are serious limitations to this approach. The heterogeneity of information systems in the common medium-to-large enterprise environment makes it nearly impossible to implement a single directory system for the following reasons:
Even meta-directory or virtual directory mechanisms may not provide the expected results, as such systems only provide data and protocol transformation but do not change the basic principle of directory services. A more complex approach is needed to manage user records in heterogeneous systems, especially in large enterprise environments.
The single directory approach is feasible only in very simple or almost entirely homogeneous environments. In all other cases other identity management technologies are needed as well.
Open source identity store implementations include:
Provisioning systems integrate many different identity stores. The goal of a provisioning system is to keep the identity stores as synchronized as possible (and practical). The priority of provisioning systems is to be non-intrusive: they do not try to change the existing account data models in the applications. Instead, a provisioning system adapts its own mechanisms to match the data model of each connected system. Provisioning systems are therefore quite complex and need to be customizable and programmable. Adaptation to the data models is frequently done using complex rules and expressions.
A provisioning system only manages existing data stores. It does not do any authentication or authorization on behalf of the application; that is the job of access management. Therefore a provisioning system affects the enforcement of security policies only indirectly, by manipulating data in other systems. Provisioning technologies are focused on the application back-end without affecting the front-end in any significant way.
Provisioning systems can communicate with each application using the application's own protocol or interface. There are two basic approaches:
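Here is a minimal sketch of the idea of adapting to each connected system's data model with rules and expressions. The mapping format (target attribute to Python expression) and the target attribute names such as `USER_ID` and `START_DT` are hypothetical, not taken from any particular provisioning product; real products use their own mapping languages.

```python
from datetime import date

# Hypothetical user record held by the provisioning system
user = {
    "given_name": "Alice",
    "family_name": "Smith",
    "employee_no": 4711,
    "hire_date": date(2024, 3, 1),
}

# Per-system mapping rules: target attribute -> expression over the user record.
# Each connected system keeps its own data model; the rules adapt to it.
MAPPINGS = {
    "ldap": {
        "uid": lambda u: (u["given_name"][0] + u["family_name"]).lower(),
        "cn": lambda u: f'{u["given_name"]} {u["family_name"]}',
    },
    "legacy_erp": {
        "USER_ID": lambda u: f'E{u["employee_no"]:06d}',
        "START_DT": lambda u: u["hire_date"].strftime("%d.%m.%Y"),
    },
}


def project(user, system):
    """Build the account for one target system by evaluating its mapping rules."""
    return {attr: rule(user) for attr, rule in MAPPINGS[system].items()}
```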
Provisioning systems do not deal only with the technical aspects of integration. Policies and processes are almost always part of provisioning system deployment projects. Most provisioning systems include their own workflow subsystem customized for identity management. It is usually quite easy to set up rules that automatically determine the basic accounts for a new hire and let system administrators approve the creation of such accounts. This is a unique aspect of provisioning systems when compared to other identity management technologies, which usually focus only on the technical side of the problem, not the business side.
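The new-hire scenario might look roughly like this sketch, assuming rules are simple predicates over the person's record and approval is a callback standing in for an administrator's decision. `RULES`, `AccountRequest` and the system names are hypothetical; real workflow subsystems are far richer.

```python
from dataclasses import dataclass

# Hypothetical assignment rules: predicate over the new hire -> accounts to create
RULES = [
    (lambda person: True, ["ldap", "mail"]),                       # everybody
    (lambda person: person["department"] == "finance", ["erp"]),   # finance only
]


@dataclass
class AccountRequest:
    username: str
    system: str
    approved: bool = False


def requests_for_new_hire(person):
    """Evaluate the rules and propose the basic accounts for a new hire."""
    systems = []
    for predicate, targets in RULES:
        if predicate(person):
            for target in targets:
                if target not in systems:
                    systems.append(target)
    return [AccountRequest(person["username"], system) for system in systems]


def approve(requests, approver_decision):
    """An administrator approves or rejects each proposed account."""
    for request in requests:
        request.approved = approver_decision(request)
    return [request for request in requests if request.approved]


proposed = requests_for_new_hire({"username": "bob", "department": "finance"})
created = approve(proposed, lambda request: request.system != "erp")
```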
Why do we even need provisioning systems? Isn't it easier to just deploy one single unified identity store such as an LDAP server? Yes, it is easier. But it is possible only in very simple situations (see Single Identity Store Myth above). Even if the technical architecture favors the single identity store approach, there are still non-technical issues. E.g. the single identity store will not appear in a day. Its deployment and integration may take a long time, and a provisioning system is needed in the meantime. Also, the applications cannot adapt quickly. E.g. many applications support LDAP authentication out of the box, but LDAP authentication is sufficient only for very simple applications. Complex applications usually need local data records: accounts. Even if such accounts do not contain credentials (passwords), they still contain authorization data (roles, privileges, organizational unit membership) that are not stored in the central identity store. Other applications need local data records to perform database joins, e.g. for reporting. And even if an application can theoretically work with a single identity store, it may take years to make it work in practice. In such cases a provisioning system can provide a solution much faster and often also at lower cost.
Support for processes is yet another reason in favor of a provisioning system. Identity stores present static data, but provisioning systems often deal with data changes. Therefore a provisioning system may enforce an approval of a change before it is applied. It may send a notification after the data are changed. A provisioning system can also integrate manual processes into the identity management solution (e.g. for legacy systems where identity management cannot be automated).
The deployment of a provisioning system is usually quite a complex project. Not because the technology itself is complex, but because the problem that the project solves is complex. If you need to deploy a provisioning system, it is very likely that you have many identity stores to integrate, several sources of information that are only partially authoritative, messy business processes and so on. Even though provisioning deployments are complex, provisioning is the best solution to these problems that we know of.
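A rough sketch of such a change-processing pipeline follows: approval first, then the change is applied automatically or turned into a manual task for a system without automation, and finally a notification is recorded. All names (`process_change`, `manual_tasks`, `notifications`) are hypothetical, and the "apply" step is only a placeholder for what a real connector would do.

```python
# Hypothetical change pipeline of a provisioning system:
# approval -> apply (automatically or via a manual task) -> notification.
notifications = []
manual_tasks = []


def process_change(change, approve, automated_systems):
    if not approve(change):
        return "rejected"
    if change["system"] in automated_systems:
        # Placeholder: a real provisioning system would invoke a connector here
        outcome = "applied"
    else:
        # No automation for this (legacy) system: hand the change to an operator
        manual_tasks.append(change)
        outcome = "queued for manual processing"
    notifications.append(f'{change["user"]}: {change["attribute"]} change {outcome}')
    return outcome
```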
Provisioning systems are always customized during deployment. The customization may be small or huge, but some customization is always there. The most important difference between provisioning products is their approach to customization. Some products are little more than a platform that requires almost everything to be developed during deployment (e.g. OpenIDMv2). Such products are extremely flexible but may be relatively costly to deploy, especially if your environment is a fairly typical one. Other products implement many common IDM scenarios out of the box while still leaving some space for customization (e.g. midPoint). These products are generally easier and less costly to deploy but may not be suitable if your environment is miles away from the usual thing. There is no "one size fits all" when it comes to provisioning. It is important to select the right tool for the job.
Provisioning systems are essentially complex data synchronization tools. Therefore there are several limitations that should be kept in mind when designing and deploying a provisioning solution:
Open source provisioning system implementations include:
Access Management deals with user authentication and (partially) authorization. The goal of access management is to unify the security mechanisms that take place when a user accesses a specific system or functionality. Access management technologies are focused on the application front-end, as opposed to provisioning, which is focused on the back-end. Access management changes how the user is authenticated and authorized to access the applications.
The following figure illustrates a theoretical case of an access management deployment. The access management system acts as a mediator for all access to all applications. It authenticates and authorizes the user based on the identity information stored in the identity repository. If all access checks pass, the user is allowed to access the application.
Access management should provide all necessary access control mechanisms to the application. It can also easily provide single (or simplified) sign-on as user session data are stored in the access management system and therefore can be shared across applications. That's the theoretical case. But the practice is slightly different.
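The theoretical case can be sketched as a single mediator object. This is a toy under stated assumptions: `AccessManager` is a hypothetical class, passwords sit in a plain dictionary only for brevity, and the coarse per-application allow list stands in for real policies. Because the session lives in the access management system, one login serves every application, which is exactly how single sign-on arises.

```python
import secrets


class AccessManager:
    """Hypothetical access management system mediating all application access."""

    def __init__(self, repository, app_acl):
        self.repository = repository  # username -> password (toy data only)
        self.app_acl = app_acl        # application -> set of allowed users
        self.sessions = {}            # session id -> username

    def login(self, username, password):
        if self.repository.get(username) != password:
            return None
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = username
        return session_id

    def access(self, session_id, application):
        """Mediate a request: authenticate via the session, then authorize."""
        user = self.sessions.get(session_id)
        if user is None:
            return "redirect to login"
        if user not in self.app_acl.get(application, set()):
            return "forbidden"
        return f"forward to {application} as {user}"


am = AccessManager({"alice": "pw"}, {"crm": {"alice"}, "wiki": {"alice"}, "payroll": set()})
session = am.login("alice", "pw")  # one login, shared across applications
```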
An access management system should, in theory, simplify the applications, as they would not need to implement their own access management mechanisms and no other identity management mechanism would be required. However, there are practical problems:
The access management system can decide whether a user may access application A or not. But it cannot decide whether the user is authorized to modify property foo in record number 1234 in that application. Therefore applications very often must implement their own additional authorization mechanisms. For that reason the applications must maintain their own user records (accounts) or must have back-end access to the identity repository.
Access management is an umbrella term for quite a wide range of mechanisms. Some access management systems deal only with authentication or single sign-on (SSO), others also deal with authorization; some are focused mostly on web applications, while others work only in enterprise environments where client machines can be strictly controlled. Individual access management systems provide partial solutions to identity management problems, and they almost always must be combined with other identity management technologies.
A typical access management system authenticates users on behalf of the applications. The access management system authenticates the user and creates a user session. It then forwards or proxies the connection to the application. The application must be at least partially aware of this so that it does not authenticate the user again (see Application Side below).
If access management is applied to all applications, it effectively creates a single sign-on mechanism. A user who logs into one of the applications in fact logs into the access management system and is therefore logged into all the applications. There are many variants and flavors of SSO mechanisms, but the two most widespread are:
Typical SSO systems work on the assumption that everybody trusts the SSO system. This works well in a typical enterprise environment, and it allows efficient SSO protocols, e.g. based on shared secrets. This is of course not applicable to the Internet in general; therefore this type of SSO system is not used on the "big" Internet. Federation technologies are used instead (see below).
Although Enterprise Single Sign-On (ESSO) has SSO in its name, it has very little in common with other SSO mechanisms. ESSO is in fact just a very simple agent that silently waits for a login dialog to appear. When the dialog appears, the ESSO agent fills in the username and password and submits the dialog. The user usually notices nothing of this and therefore believes that he was already logged in. This creates an illusion of SSO.
ESSO requires an agent on every workstation, therefore it can only be applied in strictly controlled environments. It may also be a security vulnerability, as the agent needs to know all of the user's (cleartext) passwords. Some ESSO systems try to overcome this by simulating a one-time password mechanism, changing the password right before or after the login. But this creates a password management nightmare and is practically feasible only in quite homogeneous environments. And it does not change the fact that the agent must know, or be able to obtain, the current password for every application.
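The core of an ESSO agent can be sketched in a few lines. This is a simplified model, not how any specific product works: a real agent hooks into the desktop window system, whereas here a "dialog" is just a dictionary, and `CREDENTIAL_VAULT` and `on_dialog_appeared` are hypothetical names. Note that the vault must hold usable passwords, which is the weakness discussed above.

```python
# Hypothetical ESSO agent logic. The vault must contain usable (cleartext)
# passwords, because the agent types them into the application's own dialog.
CREDENTIAL_VAULT = {"Legacy CRM Login": ("alice", "s3cret")}


def on_dialog_appeared(dialog):
    """Fill in and submit a known login dialog; ignore everything else."""
    if dialog.get("kind") != "login":
        return False
    credentials = CREDENTIAL_VAULT.get(dialog["title"])
    if credentials is None:
        return False  # unknown application: the user has to log in manually
    dialog["fields"]["username"], dialog["fields"]["password"] = credentials
    dialog["submitted"] = True
    return True


dialog = {"kind": "login", "title": "Legacy CRM Login", "fields": {}, "submitted": False}
on_dialog_appeared(dialog)
```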
Access management systems usually do at least some kind of authorization. However, the ability of an access management system to provide authorization services is significantly limited. The access management system knows who is accessing the system (subject) but has only a very rough idea about the operation and knows almost nothing about the object that the operation affects. Yet knowing all three parts of the authorization triple is an essential requirement for good authorization decisions. Therefore access management systems can make only rough authorization decisions such as "allow (any) access to system Foo" or "allow HTTP POST operations to URLs that match a given pattern".
If finer-grained authorization is required, it must usually be done by the application itself. The application knows all the details about the operation and the object but does not have the details about the subject. Therefore an application must be able to get the details of the authenticated user from the access management system.
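The split between coarse and fine authorization might be sketched as follows. The rule format, the `/foo/records/<id>` URL layout and the role names are hypothetical. The access management check sees only the subject's roles, the HTTP method and the URL; the application check also knows the object (record 1234) and the operation (modifying property `foo`), and receives the subject details, assumed here to come from the access management system.

```python
import re

# Coarse-grained rules an access management system can evaluate:
# (HTTP method, URL pattern, roles allowed). Rule format is hypothetical.
COARSE_RULES = [
    ("POST", re.compile(r"^/foo/records/\d+$"), {"clerk", "manager"}),
]


def access_manager_allows(roles, method, path):
    return any(
        rule_method == method
        and pattern.match(path) is not None
        and not roles.isdisjoint(allowed)
        for rule_method, pattern, allowed in COARSE_RULES
    )


# Fine-grained check inside the application, which knows the object and
# the operation. The subject dict is assumed to come from access management.
RECORDS = {1234: {"owner": "alice", "foo": "old value"}}


def application_allows(subject, record_id, prop):
    record = RECORDS.get(record_id)
    if record is None:
        return False
    if "manager" in subject["roles"]:
        return True
    # Clerks may change only property "foo", and only in records they own
    return prop == "foo" and record["owner"] == subject["username"]
```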
Simply speaking, Identity Federation is just SSO over the Internet. SSO usually assumes that both the identity repository and the applications are in a single organization. Therefore the trust is implicit and the protocols may be proprietary. Federation goes a step further and makes no such assumption. Therefore there needs to be explicit trust and strong authentication of the communicating parties. Also, the federation protocols must be open and the mechanism must be designed for Internet use (e.g. it must be robust and scalable). Otherwise the technical principles of SSO and federation are almost the same.
A simple federation scenario is illustrated in the following figure. Each color represents a different organization connected over the Internet. One of the organizations is an Identity Provider. This organization maintains an identity repository that is used to authenticate users. After the user logs in, the Identity Provider issues an assertion (federation token) to the user. This assertion is used as a proof of authentication. It may be presented to other organizations (Service Providers), which will then let the user in. The assertion may be quite rich, e.g. also containing user attributes, privileges, roles, authorization decisions, etc. The Service Providers may use this information for further authorization.
Probably the most popular federation protocol is SAML. There is also a mechanism called Cross-Domain SSO (CDSSO), which may seem identical to federation as it also spans several Internet domains. But CDSSO is mostly used as a workaround to propagate the usual SSO cookie across Internet domains and does not have the other features of federation (openness, trust, robustness, scalability).
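The assertion flow can be sketched with the standard library, with one loudly stated simplification: an HMAC key shared by the Identity Provider and the Service Provider stands in for the public-key signatures and explicitly configured trust that real protocols such as SAML use. The token layout and function names are hypothetical; this is not SAML, only an illustration of "signed, expiring proof of authentication carrying attributes".

```python
import base64
import hashlib
import hmac
import json
import time

# Simplification: a shared HMAC key replaces public-key signatures and
# metadata-based trust used by real federation protocols such as SAML.
TRUST_KEY = b"idp-sp-shared-secret"


def issue_assertion(username, attributes, lifetime=300):
    """Identity Provider: after login, issue a signed, expiring assertion."""
    payload = {"sub": username, "attrs": attributes, "exp": int(time.time()) + lifetime}
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    signature = hmac.new(TRUST_KEY, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + signature


def accept_assertion(token):
    """Service Provider: verify signature and expiry before trusting content."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(TRUST_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["exp"] < time.time():
        return None
    return payload


token = issue_assertion("alice", {"roles": ["student"]})
```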
Access Management is a front-end identity integration technology. This means that it changes the way the user interacts with the application. Even if the only thing that changes is usually the way the user authenticates to the application, there still needs to be a change: the application or its supporting framework needs to support the access management solution. The least intrusive case is to configure the framework (e.g. Java EE application servers) or install a special-purpose agent. This may be almost transparent to the application when it comes to authentication and coarse-grained authorization. But if access management needs to be integrated more tightly, modification of the application is almost unavoidable.
The need to modify the applications to add access management support will come sooner or later. Access management systems usually provide APIs and libraries to make such a task easier, but it is a non-trivial task nevertheless. This need will come very soon, especially if fine-grained authorization is needed. It may be further complicated if the security models of the applications and the access management system are not well aligned.
This requirement to modify the applications makes access management a somewhat intrusive technology, and also quite an expensive one to deploy.
Access management technologies usually require a single, consistent and authoritative identity repository. This requirement is especially important if a Single Sign-On solution is being deployed. But how do you get such a repository? Usernames are often not synchronized among applications, so taking the identity repository of one of the applications does not usually work. That is one of the reasons why full-scale access management deployments usually fail.
Access management technologies seldom care about the local state of the application. Therefore, if an application needs a local user record, it must create the record on demand when the user first accesses the system. This is common especially in federated deployments. So the user gets provisioned automatically, on demand. But the user never gets deprovisioned. If the user account is deleted in the Identity Provider repository, it just disappears; Service Providers are not notified. The user data remain on the Service Provider side indefinitely. This is a potential risk of data exposure, especially if additional (local) authentication or credential reset was configured. In any case it wastes resources and may cause unnecessary cost, e.g. if a per-user service pricing model is used.
A provisioning system is usually a prerequisite for an access management deployment. The provisioning system has the ability and flexibility to create a unified identity repository that can be used by the access management system. It is also well suited to keep several identity repositories synchronized. Therefore it can efficiently solve the deprovisioning problem. A large-scale deployment of an access management solution that lacks the provisioning aspect can hardly be successful.
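The deprovisioning gap and its provisioning-style fix can be sketched with sets and dictionaries. `just_in_time_login` models on-demand account creation at a Service Provider; `reconcile` models a provisioning system comparing the target system with the authoritative source and removing orphaned accounts. Both names are hypothetical, and real reconciliation usually offers softer reactions (disable, report) rather than immediate deletion.

```python
def just_in_time_login(sp_accounts, username):
    """Service Provider: create a local record on first access (never removed)."""
    sp_accounts.setdefault(username, {"created_on_demand": True})


def reconcile(authoritative_users, target_accounts):
    """Provisioning-style reconciliation: delete accounts that have no owner
    in the authoritative source and report which ones were removed."""
    orphans = sorted(set(target_accounts) - set(authoritative_users))
    for username in orphans:
        del target_accounts[username]
    return orphans


idp_users = {"alice", "bob"}
sp_accounts = {}
just_in_time_login(sp_accounts, "alice")
just_in_time_login(sp_accounts, "bob")
idp_users.discard("bob")   # bob leaves; the Service Provider is not notified
stale_before = "bob" in sp_accounts
removed = reconcile(idp_users, sp_accounts)
```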
There is a large number of implementations that might fall into the access management category. To name just a few:
No single identity management technology provides a complete solution on its own. Except perhaps for the smallest and simplest identity management projects, any practical solution requires a combination of several technologies.
The key to the project is to know what needs to be integrated. When it comes to identity management, there are several types of systems:
Simple stateless system
Does not maintain any identity information of its own. Such systems may not even need to know the user's identity. A simple allow all/deny all authorization decision is all that is needed.
Trivial to integrate using just an access management system. If the system needs any identity data, the easiest way is to inject them into cookies or HTTP parameters.
Stateful system with identity repository integration
Needs access to identity information but is able to use a shared identity repository (e.g. via LDAP). May support complex authorization decisions based on the content of the shared identity repository (e.g. by evaluating account attributes).
Connect to a shared identity repository (e.g. a directory service). Replace authentication with access management if possible. The only thing the system usually needs is a user identifier (e.g. username), which can usually be conveyed by the platform (e.g. the Java EE security subsystem). The system can then fetch the rest of the profile directly from the identity repository.
Stateful system that requires local data
Needs to maintain its own copy of user records (accounts) for performance reasons (e.g. the ability to …)
Synchronize local data with the shared identity repository. The best approach is usually the deployment of a provisioning system. The system may still be able to integrate with access management.
Cannot integrate with a shared identity repository and keeps its own local accounts. Cannot integrate with access management and implements its own hard-coded authentication. We are lucky that it still works.
Not too many options. Probably the best we can do is to synchronize its account records with the shared identity repository. Provisioning system is really the best solution here. The only thing that can be done about convenience of authentication is to synchronize the password from the shared identity repository to this system using the provisioning mechanism.
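The system types above can be summarized as a small decision function. This is only a sketch: reducing each system to three boolean properties (`stateful`, `needs_local_accounts`, `legacy`) is a simplification introduced here, and the returned strings paraphrase the recommendations in the text.

```python
def integration_strategy(stateful, needs_local_accounts, legacy):
    """Suggest an integration approach for the system types described above.

    The three boolean inputs are a simplification of the categories in the text.
    """
    if not stateful:
        return "access management only"
    if legacy:
        # Neither a shared repository nor access management can be used
        return "provisioning with password synchronization"
    if needs_local_accounts:
        return "provisioning to sync local accounts, plus access management if possible"
    return "shared identity repository, plus access management if possible"
```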
Then there are systems that are sources of the data. The typical one is a Human Resources (HR) system, which is usually an authoritative source of employee data. Then there is a CRM system, which is a source of customer and partner data. And usually there are also contractors, independent agents, volunteers and many other types of users that do not have any authoritative source. And even if a source is authoritative for the existence and status of an employee record, it may not be entirely authoritative when it comes to the e-mail address or the placement of the user in the organizational structure. The larger, more flexible and more business-oriented the organization, the more complex the resulting situation. Probably the only practical solution that can handle such a situation is a provisioning system.
A practical identity management solution requires a combination of identity repository, provisioning and access management mechanisms at the very least. These technologies complement each other as illustrated in the following figure.
The provisioning system synchronizes accounts and user records throughout the organization. It pulls data from various data sources such as HR and CRM systems and creates a unified view of the data. It keeps the databases of legacy and stateful systems in sync. However, its most important responsibility is to maintain the shared identity repository. The identity repository is used by the applications that are able to use it. It is also used by the access management system as a coherent and authoritative user database.
Identity management projects fail frequently. The typical identity management project is a big all-or-nothing waterfall-like project. That is not a project but a recipe for disaster and a huge waste of money. There are many reasons for that, but probably the most important is the lack of knowledge:
A waterfall-like project will not work. It may meet the project goals, but it will not bring the expected value to the customer.
Divide the project into smaller steps and proceed in iterations and increments. Bring value in each step. Proceed only as far as it is efficient to proceed; some tasks are still done most efficiently by a human. Set up the basic structure and improve it in each step as needed. Avoid paying high licensing costs at the beginning of the project, as this effectively kills all value and ruins the TCO. Either negotiate with the vendor or simply use open-source software.
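Partial authority is often handled per attribute. The sketch below assumes a hypothetical configuration that lists, for each attribute, the sources in order of authority; the unified record takes each attribute from the most authoritative source that actually provides it. The source names (`hr`, `mail_admin`, `org_tool`) are made up for illustration.

```python
# Hypothetical per-attribute authority: sources listed from most to least authoritative.
AUTHORITY = {
    "status": ["hr"],
    "department": ["hr"],
    "email": ["mail_admin", "hr"],   # the HR value is only a fallback
    "org_unit": ["org_tool", "hr"],
}


def unify(records_by_source):
    """Build a unified user record, taking each attribute from the most
    authoritative source that actually provides a value for it."""
    unified = {}
    for attribute, sources in AUTHORITY.items():
        for source in sources:
            value = records_by_source.get(source, {}).get(attribute)
            if value is not None:
                unified[attribute] = value
                break
    return unified


alice = unify({
    "hr": {"status": "active", "department": "Sales", "email": "old@example.com"},
    "mail_admin": {"email": "alice@example.com"},
})
```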
The usual order of identity management technology implementation is:
The order is not simply provisioning, then repository, then access. It is iterative. Therefore it looks more like this:
Some steps may be reordered or even skipped. E.g. the first "Basic provisioning" step may be skipped if there is already a solid repository (e.g. an Active Directory instance populated with employees). But that task will come back: the next iteration over provisioning will be more difficult (e.g. if customers and partners also need to be in the repository). The effort might be slightly optimized, but this will not change the overall shape of the project much.
Standard schemas (e.g. inetOrgPerson) are rarely sufficient on their own. Almost all deployments will need some schema extension and customization.
Identity management is a mix of many technologies, a mix that can create both healing and deadly elixirs. The best approach seems to be a pragmatic one: avoid big expectations, and improve the system where an improvement is needed and where it is economically and technically feasible. Identity management is not magic; it is just technology. And quite a young one.
This paper provides a description of various identity management mechanisms, techniques and their combinations. It also warns against dead ends and debunks myths. But it does not provide a single correct approach to identity management implementation, as there is no such thing. Every environment is different, requirements vary and resources are limited. Every deployment is different. Every solution is different.