Tuesday, March 26, 2013

Testing and Data Privacy, is there an issue? (Final post, or is it?)

In the previous posts I covered the issue of testing data and privacy: what options are generally available to address the issue, and what each of those options entails.

This time I will wrap up this portion of the discussion and then further delve into related issues that may be of interest.

If you have read the previous posts, you may have surmised that the option I favour is analysing the data structure and elements, then applying intelligent, business-savvy masking rules to the copied data. This entails designing a process that obfuscates the Personally Identifiable Information (PII) retained within the organization for testing purposes, by applying rules that take the business logic into account.
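To make "business-savvy" concrete: masking a credit card number, for instance, should preserve the issuer prefix, the length and a valid Luhn check digit, or downstream edit checks will reject the test data. Here is a minimal sketch in Python; the function names and the sample number are purely illustrative, not any real process.

```python
import random

def luhn_checksum(number: str) -> int:
    """Return the Luhn sum mod 10 for a numeric string (0 means valid)."""
    digits = [int(d) for d in number]
    # Double every second digit from the right, subtracting 9 when it exceeds 9.
    for i in range(len(digits) - 2, -1, -2):
        digits[i] *= 2
        if digits[i] > 9:
            digits[i] -= 9
    return sum(digits) % 10

def mask_card_number(card: str, seed: int = 0) -> str:
    """Keep the 6-digit issuer prefix and the length, randomize the middle,
    and recompute the check digit so the masked number still passes validation."""
    rng = random.Random(seed)
    prefix = card[:6]
    middle = "".join(str(rng.randrange(10)) for _ in range(len(card) - 7))
    partial = prefix + middle
    # Exactly one final digit makes the whole number Luhn-valid.
    check = next(d for d in "0123456789" if luhn_checksum(partial + d) == 0)
    return partial + check

masked = mask_card_number("4539578763621486")  # sample number, illustrative only
```

Because the masked value keeps the same length, a plausible issuer prefix and a valid check digit, any application edit that validates card numbers behaves the same in testing as in production, while the real number is gone.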

But at the same time, there is never an all-or-nothing answer to these issues. It all depends on the situation, the company culture and the requirements, to name but a few factors. Let me explain.

And by the way, I will try to stop myself from going down into the techy talk that most non-IT people get lost in.

So let's assume the company we work for has decided that the testing environment(s) that presently exist need to be scrubbed to ensure there is no real PII in them. Yet the CIO also insists that one of the requirements for quality work is the ability to copy real data for testing from time to time.

So the requirement is to copy data when needed, but remove the PII at the same time, in a way that still retains the quality that is needed. We need to develop a process that scrubs the data consistently and have it executed whenever a data copy is done. Right?... But how much data do we scrub? Do we need more than one copy? Who is going to be responsible for maintaining the obfuscation rules?

These are just some of the other factors that need to be considered.

As you start your analysis you may come up with a question along these lines: will there be a need to sub-set the data while copying the real data for testing?

The more revealing question may be: what are we going to be doing with this data after it is scrubbed? You might think that testing is the answer, and you would be right. But what kind of testing? You see, in most medium to large companies there is more than one kind of testing done before any changes are put into the real world.

There is the testing that the coder/programmer does while making changes to the code, to ensure that the program works and produces the anticipated results. Generally speaking, this is called unit testing. In this case there may not even be a need for real data, just some made-up values. So we might not need to consider this type of testing in our requirements analysis.

Then there is what I call kernel testing: running a logical unit/series of 'programs' (yes, they can be stored procedures, scripts etc., but I am trying to keep the terminology simple, and it really means the same thing) to see if it runs successfully with the changes. Usually this is where a small sample of real data would be used. The data used here does not have to be related to any other application/data, so the masking process would be rather easy to implement. There would be no need to ensure that the rules applied here are the same ones applied to another application within the organization.

Next is some form of regression testing. Simply put, this is to make sure the application still works with the changes made to the code. However, you will probably not want the same number of records as production; otherwise each test would take the same amount of resources/time as production. Remember, you are testing to make sure everything works, and if it doesn't you need to correct the issue and retest. The old adage goes: time is money. The quicker the programmers/coders can turn around the testing, the better. That means you will need to sub-set the data in question. An example would be taking a single branch's data as a test versus the entire company's branches. However, this is not as easy as it sounds.

For example, if we have a banking application that we are going to be testing, we may decide to use only branch 'A' as the testing branch. This branch has a wide variation of customers etc., and it fits very nicely with the testing that needs to be done. We will need to copy only the customers within that branch (most likely into a database in some other location). We will then need to copy only the accounts of the customers within that particular branch. In other words, copy all the related information, and only the related information, for that branch: sub-setting the data. Oh, and don't forget that we will need to mask the data as it is being moved over from production, to avoid any potential issues further down the line.
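The branch example above can be sketched with a toy schema; every table, column and masking rule here is an illustrative stand-in for a real implementation, not any particular bank's design.

```python
import sqlite3

# A toy schema standing in for the banking example: customers belong to a
# branch, accounts reference customers.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, branch TEXT, name TEXT);
    CREATE TABLE accounts  (acct_id INTEGER PRIMARY KEY, cust_id INTEGER, balance REAL);
    INSERT INTO customers VALUES (1, 'A', 'Alice'), (2, 'A', 'Bob'), (3, 'B', 'Carol');
    INSERT INTO accounts  VALUES (10, 1, 500.0), (11, 2, 75.5), (12, 3, 900.0);
""")

def subset_branch(conn, branch):
    """Copy one branch's customers and only the accounts that reference them,
    masking the customer name on the way out."""
    custs = conn.execute(
        "SELECT cust_id, branch, name FROM customers WHERE branch = ?", (branch,)
    ).fetchall()
    ids = [cid for cid, _, _ in custs]
    if not ids:
        return [], []
    placeholders = ",".join("?" * len(ids))
    accts = conn.execute(
        f"SELECT acct_id, cust_id, balance FROM accounts WHERE cust_id IN ({placeholders})",
        ids,
    ).fetchall()
    # Trivial masking rule: replace the real name with a synthetic identifier.
    masked = [(cid, br, f"CUST-{cid:04d}") for cid, br, _name in custs]
    return masked, accts

branch_a_customers, branch_a_accounts = subset_branch(src, "A")
```

The point of the sketch is the shape of the work: the copy must follow the relationships (customers first, then only their accounts), and the masking happens in the same pass as the move.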

Next may be a user acceptance test, allowing the users of the application in question to test the change(s) to ensure it is what they asked for and it works as required. While a complete copy of the data can be used, a sub-set of the data will also do in most cases.

And then there may be a volume test. This test is normally done to ensure that the application can take the real-world volume (all the branches): the final kick of the tires, you could say.

Now, while I have generalized, and every company and set of requirements is different, I hope you can see the complexity that is involved. The type of testing and the data used for that testing are extremely important, and it is just as important to analyze each testing requirement and come up with a solution that meets all the needs.

So let's assume that we have all the answers to the questions posed above. We know what kind of data we need, the various versions/copies and the other parameters that may have been discovered. What is next?

The next post will cover the how-to: the components of a privacy project, the pitfalls, the bumps in the road, and the elephant in the room (and yes, there is a BIG elephant that needs to be fed).

While it may not be directly related to a privacy role, anyone in privacy needs to understand the complexity inherent in the process a company needs to go through so that the project comes to a successful completion.

So I strongly suggest you stay tuned for the next installment. Till then, if you have any questions, feel free to contact me at the email address below.


View Robert Galambos CIPP/C CIPP/IT VA3BXG's profile on LinkedIn

Monday, March 18, 2013

Testing and Data Privacy, is there an issue? (Part III of IV)

Let's recap. In the previous posts I discussed why we should be aware of how application changes are tested within the IT department, or we may have a data breach before we know it. Then I explained how to mitigate some of the risks with different processes/choices, and listed the pros and cons of each.
I will now continue the discussion about the various options, and which ones are the best.

So let's get started.

The four choices that I presented previously are:
1) Create your own test data
2) Copy production data into the test environment
3) Same as #2, but have everyone sign non-disclosure agreements
4) Same as #2, but obfuscate (scrub) the data

Looking at the obvious one, option #2: that is clearly taboo, or is it? The reason we should not do this is obvious, right? Yet copying data is what happens in the real world today. As far as I know there are no studies along these lines (most companies would not want to share this type of information), but experience tells me you would be surprised at the number of companies where at least some areas practice this regularly. While it can be argued that this would happen only within smaller companies, experience says otherwise. Remember, you may have a policy in place forbidding this, but some corner of IT that has been around for years may still be practicing "copy the data" because that is how it was always done. That being said, you may be surprised to hear me say that there can be times when there is a legitimate reason (fooled you) to copy production data into a testing environment.

This will be a topic for a future post concerning (and this is a BIG hint) testing, cost, risk  and support issues that revolve around data and data privacy.

For now, let's just say this is not a good option, and it should only be considered in specific areas and for specific reasons.

Option #3, in my opinion, is only slightly different from just 'saying no'. It should be standard policy that all individuals, no matter who they are (employees, consultants or outsourced staff), sign a non-disclosure agreement. But let me be clear: this will not help prevent any data breaches. And just to remind you why: there are studies concerning data breaches which state that more than 70% of all data breaches are non-malicious. And if the breach is malicious (disgruntled employee, criminal activity etc.), an agreement will not stop the data from being exposed either. So if it does not prevent breaches, why bother? Because it makes legal remedies easier should the need arise.

Option #1 is a viable option. Many companies I have worked with have policies along those lines, and in fact chances are your testers will have to make up some data to test things that should not happen in real life, e.g. testing for error checking/handling. But is it the be-all and end-all? No. One can never make up all the permutations and combinations needed to ensure, first, that the change worked, and second, that it did not break anything else. Now, there are processes that mitigate the risks involved (a topic for another post), but there are no guarantees.

Last but not least there is option #4. This option states that all production data copied over to testing should have the Personally Identifiable Information (PII) scrubbed. There are problems even with this option. Doing a good job of scrubbing the data (it took me two years to be able to even pronounce 'obfuscate', never mind spell it, so 'scrub' describes the option just as well and rolls off my tongue more easily) takes time, money, expertise and some risk.

So what does the process entail? How does one go about scrubbing data? The first step is to identify all the fields that contain PII. Easy, right? Nope. In this complex world we live in, I can assure you that no data 'is an island entire of itself' (to paraphrase John Donne).

Programs (applications, processes etc.) work together. The bill that is entered in the accounts receivable system needs to be posted into the GL, as an example. The bill also has the purchaser's credit card number, which feeds the credit card processor. The address on the bill is entered in the customer information system.

This interaction can be complex, to say the least. One application has edits in place to verify that a Zip/postal code matches the address, because the program that sends out mail needs to make sure the combinations make sense. But the application used for analyzing buying habits may not even look at this.
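That first discovery step can be partially automated by scanning sample values against candidate patterns. A rough sketch, assuming tabular data loaded as dictionaries; the patterns, column names and sample rows are illustrative only and would need tuning per application and jurisdiction:

```python
import re

# Candidate patterns only; a real project would refine these per field.
PII_PATTERNS = {
    "card_number": re.compile(r"^\d{13,19}$"),
    "zip_code":    re.compile(r"^\d{5}(-\d{4})?$"),
    "email":       re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def flag_pii_columns(rows):
    """Flag a column as candidate PII when every non-empty sampled value
    matches one of the patterns; a human still reviews the result."""
    flagged = set()
    if not rows:
        return flagged
    for col in rows[0]:
        values = [str(r[col]) for r in rows if r[col]]
        if values and any(all(p.match(v) for v in values) for p in PII_PATTERNS.values()):
            flagged.add(col)
    return flagged

sample = [
    {"name": "Alice", "card": "4539578763621486", "zip": "10001"},
    {"name": "Bob",   "card": "6011000990139424", "zip": "90210-1234"},
]
candidates = flag_pii_columns(sample)  # hypothetical sample rows
```

A scan like this only produces candidates; the cross-application relationships described above still have to be mapped by people who know the business.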

Once all the PII fields are discovered, along with how they relate across applications/files/databases, the next step is to figure out what method should be used to scrub the data, given the interactions I just described. Do we scramble the values, or should we generate new ones? Does the data need to follow certain business rules? Are there home-made systems that need to be used to mask the data (e.g. an account number generator)?

There are basically four different types of scrubbing methods.

#1 A simple scrambling method: taking wherever the letter 'A' appears and changing it to 'X', as an example. (There are variations of this to make the results harder to reverse.)

#2 Looking up a translation table: using the original value, by various methods, as a key to find an entry within the translation table. So if that value appears in another location, the same scrubbed value is returned.

#3 Generating new data, either randomly or following some guidelines. The issue here is that every time the same value is scrubbed, the result will be different, thus losing consistency.

#4 Replacing the data with a fixed string, blanks etc. As an example, putting 'N/A' in each free-form field because no processing is done on that data.

And there are other techniques that I did not mention, such as date aging, flip-flopping real data, mathematically manipulating the values, etc.
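The four methods above can be sketched as small Python functions; the substitution alphabet, lookup table and values are illustrative only:

```python
import random

# Method 1: simple character scrambling (easy, but the easiest to reverse).
SCRAMBLE_MAP = str.maketrans("ABCDEFGHIJ", "XZQWRTYUPS")  # illustrative alphabet

def scramble(value):
    return value.translate(SCRAMBLE_MAP)

# Method 2: a translation table; the same input always yields the same output,
# which preserves consistency across files and applications.
TRANSLATION = {"Alice": "Morgan", "Bob": "Casey"}  # illustrative lookup table

def lookup(value):
    return TRANSLATION.get(value, value)

# Method 3: generate fresh values; repeated scrubs of the same input differ,
# so cross-system consistency is lost.
_rng = random.Random()

def generate(length=9):
    return "".join(str(_rng.randrange(10)) for _ in range(length))

# Method 4: replace with a fixed string when nothing downstream processes the field.
def blank_out(_value):
    return "N/A"
```

Which method fits which field depends on the relationships mapped earlier: a field that feeds another application usually needs the translation-table approach, while a free-form comment field can simply be blanked.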

After it is determined which techniques are to be used, the next step is 'coding' the rules to be applied, and then testing them. Expect this to be an iterative process: the more you do, the more it will appear that you may have missed something.

And finally the implementation of the process.

This is not an easy task, nor is it something that should be taken on lightly. But if you don't want your company in the crosshairs of journalists, bureaucrats, courts and the general public, you need to do your due diligence (making sure you do the best you can to prevent data leakage).

In the next chapter I will talk about how this all fits together in the overall picture, and how one needs to consider other factors when talking about testing.


Robert Galambos's Resume

Robert Galambos
M: 416.876.2979   |   rgalambos@gmail.com   |   http://galambos.me/

Professional, results oriented Presales Engineer and Consultant with a proven track record within the software industry, combining high-level sales and marketing knowledge with deep operational experience, technical savvy and cross-functional communication skills. Extensive experience supporting sales initiatives, managing customer relationships, handling customer service calls and consultations, and maximizing client ROI on software solutions.
  • Data Privacy
  • Client Relations
  • Trouble Shooting
  • Security+
  • Data Management
  • C-level & Client Presentations
  • Executive Communications
  • Staff Training & Development
  • Data Optimization
  • Oracle/SQLServer/DB2
  • WebEx
  • MS Project
  • Customer Service
  • Solutions Demonstrations
  • Technical Consulting
  • Salesforce.com
  • MS Office

Security+   |  CIPP/C  |  CIPP/IT  |  IBM / DB2 9 DBA for z/OS

professional experience
COMPUWARE                                                                                                                1996 to 2013
Sales Engineer / Consultant / Trainer
  • Helped close a minimum of $2 million in sales 13 years in a row.
  • Contributed to a team that achieved a minimum 95 percent maintenance renewal.
  • Held discovery meetings with current and potential clients to uncover client issues, concerns and sales opportunities.
  • Delivered high-impact presentations, trained clients, staff and c-level executives on various solutions, concepts and best practices.
  • Worked with Multiple Projects/Clients simultaneously.
  • Facilitated customers and partners, as well as on-site professional services support such as installations, post sales transition, and configurations upon deployment of software.
  • Liaised with Product Development and Marketing departments reporting on industry/market trends, competition, and proposed new product functionality.
  • Provided technical analysis as well as collaborated with sales to develop cost justifications, to facilitate completion of RFI and RFP responses for various clients in an efficient manner and helped prepared sales package proposals.
  • Managed interoperability and alliance between software solutions and customers’ strategic business plans.
  • Helped potential clients understand, compare and contrast several IT solutions.
  • Produced detailed phone support, and on-site evaluations of clients’ current software solutions.
  • Coordinated implementation engagements with a 90%+ success ratio.
  • Served as Project Manager/Team Lead with the participation of 10 team members, developed, updated, disseminated training materials for 10 software products with a specific timeline.
  • Mentored individuals for the Professional Development Program, training non-IT professionals to become support personnel.
  • Was one of the ‘Go To Guys’ for difficult situations/clients.

Delivered Consultant and/or Solution Architect services to major financial institutions, e.g. Barclays (UK), Kasikornbank (Thailand), Royal Bank of Canada, Banque Nationale du Canada, among others.

  • Determined requirements, designed, and then deployed Data Privacy processes, successfully melding various complex relationships into a cohesive business process.
  • Resource person on the functionality to both Compuware’s software and the client’s own software environment.
  • Mentored local consultants, foreign consultants, non-bank consultants in both evaluations and interpretation of the project results.
  • Addressed concerns and provided proactive concepts to the client to maintain the quality of data as well as reduce QA costs. This led to a 20% reduction in time and material costs.
  • Targeted training on usage of the software.
  • Designed and implemented pilot projects/POCs to completion and presented the solution to the stakeholders.
  • Proactively advised on best practices within the industry and provided various industry resources.

MONTREAL TRUST / BANK OF NOVA SCOTIA                                     1984 to 1996
Principal Analyst & Team Lead
  • Team Lead responsible for financial systems, including payroll, human resources, general ledgers, accounts receivable and accounts payable within the Trust Unit.
  • Supervised analysts responsible for critical financial, HRS and payroll systems.
  • Apprised management of more efficient methodologies to ensure better business decisions.
  • Provided guidance, instruction, direction and leadership to the team to achieve key results for internal clients & users.
  • Coached and matured the skill level of direct reports in order to continue their long-term development and ensure solid succession planning and departmental success.
  • Liaised with Payroll, HR and Executive Offices as a subject matter expert.
  • Created “What if” scenarios and provided support for non-technical end-users.
  • Worked with the Finance Team to determine the ongoing business needs and requirements for the reporting of all assets, sales, redemptions, management fees, trailer fees, and advisory fees      
Education And Professional Development

B. Comm. - Bachelor of Commerce, Accounting                                                              1979
Concordia University, Montreal, Quebec

IBM DB2 DBA for z/OS                                                                                                         2008
International Business Machines, USA

CIPP/C – Certified Information Privacy Professional/Canada                                       2007
CIPP/IT – Certified Information Privacy Professional/Information Technology              2008
International Association of Privacy Professionals, USA

Security+                                                                                                                               2013

CCENT. – Cisco Routing and Switching                                                                 spring 2014
Cisco, USA

Wednesday, March 6, 2013

Testing and Data Privacy, is there an issue? (Part II)

So here we are. Let's recap some of the major points about the subject that we covered previously before we go on.

IT departments maintain and use both 'production' (what is used to run the business) and testing environments. They need data to test with. And where do you think most of the testing data comes from? In the 'real world', it is most likely real credit card numbers (which PCI DSS, the Payment Card Industry Data Security Standard, does not allow), tax identification numbers, etc.

And to further complicate matters, testing, by its very nature, means easier access to the data by developers, testers, IT operations etc. And this creates the exposure that businesses try so hard to avoid. And you may not even know about it.

So let's take a look at some legal ramifications of this matter.

An example is Canada, where one of the principal laws governing privacy is the Personal Information Protection and Electronic Documents Act (PIPEDA). Basically (and this is an oversimplification, but good enough for this discussion), the company may use the Personally Identifiable Information (PII) it gathers solely for the intent it 'advises' the user of. So if a user goes into a bank to open an account, as an example, he/she has to sign a 'whole bunch' of papers, and more often than not gets a copy of them to take home to wallpaper the house (I know, it's a bad joke; realistically these statements are only read by a lawyer or a privacy specialist).

But in all seriousness, at least one of these documents (as a best practice) is basically an agreement with the bank that allows it to gather the information it needs to provide the service you are requesting. It also states who they may share that information with and how they will protect it, and hopefully lists a department/person to contact in case one has any questions about the company's privacy policy.

I guarantee that there is no place in that document that states the company may use the information for testing purposes. And don't forget the looser security requirements of the testing world.

If you think this applies only to Canada, you would be mistaken, big time. As another example, in the EU one of the applicable 'laws' is Directive 95/46/EC (more commonly known as the EU Data Protection Directive). It is one of the most stringent privacy laws there is. And don't be fooled into thinking that just because you have no offices in the EU or Canada, you don't have to worry about it. In fact, if you have any customers from the EU, or collect some information while they are on your website, you may still be under their privacy jurisdiction.

Now, this particular aspect is worth a book in itself, but let's leave it for now and, if you, the reader, agree, try to figure out what needs to be done and the benefits/costs of each solution.

1) Well, let's create the test material needed and not rely on ANY real data.

The Pros:

You will not need to worry about relaxed security restrictions, because the information does not represent any real person.

The data is 'easy' to create. So even if printed reports are found in the trash bin, there will be no worries.

The Cons:

The 'quality' of the made-up data. Is the data a good sampling of the various permutations and combinations of the different aspects of your customers? E.g., do you have customers who live in NYC (or Hong Kong, Budapest, Montreal etc.) and who have a chequing account in the spouse's name as well as two children's accounts? If you do not cover all the different variations that exist, how do you know that your testing is complete and will discover failures before implementation?
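One way to blunt that con is to build the made-up data combinatorially, so every combination of the attributes you care about appears at least once. A minimal sketch; the attribute lists themselves are illustrative:

```python
import itertools

# Build the made-up customers as the cross product of the attributes that
# matter to the tests, so no combination is silently skipped.
cities = ["NYC", "Hong Kong", "Budapest", "Montreal"]
account_types = ["chequing", "savings", "joint"]
child_accounts = [True, False]

test_customers = [
    {"city": c, "account": a, "child_accounts": k}
    for c, a, k in itertools.product(cities, account_types, child_accounts)
]
# 4 cities x 3 account types x 2 flags = 24 customers.
```

The catch, of course, is that you have to know which attributes matter, which brings us back to analyzing the real data anyway.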

2) Copy Real Data for use in testing

The Pros:

You will be testing with real data, and if there are issues, they will be discovered before the change is put into production. If the tests work, there is no reason why the change will not work in production.

The Cons:

As previously discussed, chances are that you are close to breaking some laws (if any of the information in question is PII).

Data volume is another concern. Who nowadays, large or small business, has the capacity to copy the entire production data set for testing? And most major companies may have many testing environments to keep work moving forward.

Then there is the extra time you will need for multiple rounds of testing with large amounts of data (another topic in my future series of blogs will be data volumes, testing types, and the related issues/solutions).

The reduced security (see above) around testing will allow increased access. This could increase the chances of a data breach.

If there is a data breach, your company's reputation will suffer, and its name may appear on the front page of the local/national newspaper. The cost of the loss of customer confidence in your organization may also affect the bottom line. This can cost millions of dollars and lost business (all depending on the number of records exposed).

3) Copy real data for use in testing and have everyone sign non-disclosure agreements.

The Pro:

You now use real data, with all its different combinations, to test with, plus the legal protection of a non-disclosure agreement.

The Cons:

According to some studies, over 70% of all data breaches are non-malicious, so agreements of this sort would not stop a breach.

We are also still looking at large volume issues.

Real data may not have all the information you need to test properly (testing for error handling, as an example).

4) Copy and obfuscate (scrub) the PII data so no one can figure out whom the real data record represents.

The Pros:

You get real data to work with, and thus even if a report ends up in a trash bin, no one can figure out who the data identifies or belongs to.

The Con:

You will need to have a full understanding of your data.

You will have to do analysis work on how to scrub the data.

You will need to understand how the PII data works together within your environment/applications.

In my next blog I will further investigate all of the above options and discuss which option may be the most suitable for your situation. Maybe a hybrid solution could be the answer.

If you have any comments or questions, feel free to drop me a line.

As a note, this blog is not intended to be legal advice.

