Mosaic Research and Cybersecurity

By Arleen Malley Zank June 2017

Cyber attackers use a mosaic of puzzle pieces to understand your stolen data.

Control Freak
Mosaics of Data & Advanced Persistent Threat


Advanced Persistent Threat (APT): a stealthy, ongoing cyber attack that uses advanced techniques to exploit vulnerabilities and steal data. The attacker persists in the environment for a long period, continuously monitoring and extracting data from the target, and poses a threat to the entity's information supply chain and intellectual property.


A recent post on the Yahoo data breach noted that you can still buy the data set of stolen personal information for one billion (1,000,000,000) users for a cool $200,000. The offeror notes that while the passwords have been changed, other information about the user (phone numbers, secondary email addresses, and even the passwords of those who lack originality or don't change their passwords) might be useful in linking information in the Yahoo data trove to other information roaming around the dark web. Take this data, match it up with other pieces of data, and voilà: you have insight, or at least a path to continuing to ply your crimeware enterprise, sending out personalized spear phishing emails, or pretending to be from the IRS.

Accumulate a mosaic of data to create actionable information.

Hedge fund research operates in much the same way, without the dark web and crimeware enterprise aspects of the endeavor. Use the data from public sources like the Patent Office, FDA, SEC, and other regulatory sources; data bought from services like Factiva, CB Insights, and scientific and academic services; articles by key members of the corporate R&D posse in PubMed and other scientific journals; news reports on insiders at the companies; and information from public filings of every ilk. Count cars in the parking lot to see if there is a third shift at the local factory. Install sensors on private property outside of corporate rail yards to count railcars coming in empty and going out full (or not so full). Count shopping bags on Black Friday and people roaming without bags to determine if people are shopping or touring the place like it's a mall museum.

Accumulate a mosaic of data to create actionable information.


The artful term for this hedge fund research technique is Mosaic Research. For hackers it's the usual modus operandi. Build a mosaic of information that reveals what's going on, analyze it, assemble it, and create comprehensive insight into the market or firm you're interested in. Figure out what's going on with the bellwethers of an industry as a predictor of whether disruptive startups are having an impact on the market. For example, use pieces of information on the cost of a taxi medallion in New York City as a marker of adoption rates for services like Via, Uber, and Lyft.

The cybersecurity message here is that little pieces of data strung together can reveal a very complex picture of the digital enterprise and can be analyzed, augmented, and matched with other data to get a comprehensive picture of what's going on. Steal some social security numbers here, a mother's maiden name there, a few old passwords and addresses, and you're in business. It works for hedge fund researchers and analysts. It works for hackers.


The data that is valuable isn't always the data and information you think it is. It's not just the information that falls under the standard definition of intellectual property that is useful, things like your trade secrets, pre-published patent applications, formulas and manufacturing practices, or your algorithms. It's things like the invoices from your suppliers that would enable an adversary (or competitor) to calculate your volume discounts and an estimate of next quarter's build plan. It's your spreadsheets with run rates and cost numbers that would enable your competitor to figure out exactly how to underbid you on that $100M components contract. It's the names of your customers so a competitor will know exactly who to call to make a sales pitch. It's a network of individual streams of data that can be "glued" back together again to model your business and steal your intellectual property. Patents, trademarks and trade secrets are important, but all this other stuff is also intellectual property and it's often left in the digital wild.
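To make the invoice point concrete, here is a small sketch with made-up numbers. The part name, list price, invoice line, and parts-per-unit figure are all hypothetical; the only point is that a couple of leaked invoice fields plus one public number are enough for simple arithmetic.

```python
from dataclasses import dataclass


@dataclass
class InvoiceLine:
    part: str
    quantity: int
    unit_price: float  # price actually paid per part (hypothetical)


def volume_discount(list_price: float, paid_price: float) -> float:
    """Fraction taken off the public list price."""
    return 1 - paid_price / list_price


def estimated_build(quantity: int, parts_per_unit: int) -> int:
    """Finished units the ordered parts could support."""
    return quantity // parts_per_unit


# One leaked invoice line, one public list price, one guess at the bill of materials.
line = InvoiceLine(part="controller-chip", quantity=120_000, unit_price=4.25)
discount = volume_discount(list_price=5.00, paid_price=line.unit_price)
units = estimated_build(line.quantity, parts_per_unit=2)
print(f"discount ~ {discount:.0%}, build plan ~ {units} units")
```

None of these inputs would be labeled a trade secret on its own, which is exactly why they tend to get left lying around.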

Assembling this information takes time. To build a successful cyber mosaic you need to be patient and assemble the right data. Get in the system, hang around, snoop around, and then steal stuff that looks valuable.


An attack focused on systematically stealing your most important stuff over a long period of time is called an Advanced Persistent Threat (APT). An APT is a network attack where an adversary gains access to a network and stays there undetected for a long period of time. The intention of an APT attack is to steal (exfiltrate) data rather than to cause damage to the network or organization. (They can always come back later and wreak havoc on the network, your information, or the rest of your infrastructure to slow you down while they use your intellectual property to copy your latest innovations and get to market before you do.) Like mosaic research, the goal is to collect compelling pieces of information and then weave them into a story later.

Attackers find ways to sneak into your network, often using compromised credentials, social engineering, or spear phishing. APT attackers lie low and use carefully crafted exfiltration that blends in with the rest of your data flows, so as not to call attention to the outbound flow of stolen data from your information supply chain.
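One way to think about exfiltration that "blends in" is that each transfer looks small, but the total over weeks does not. The sketch below is an illustration only, not a description of any particular monitoring product: the function name, the thresholds, and the sample traffic are all assumptions.

```python
from collections import defaultdict


def slow_leaks(transfers, per_transfer_limit, cumulative_limit):
    """Flag destinations whose individual transfers look small but add up.

    transfers: iterable of (destination, bytes_sent) collected over a long window.
    """
    totals = defaultdict(int)
    largest = defaultdict(int)
    for destination, sent in transfers:
        totals[destination] += sent
        largest[destination] = max(largest[destination], sent)
    return sorted(
        d for d in totals
        if largest[d] <= per_transfer_limit and totals[d] > cumulative_limit
    )


# Hypothetical window: one big legitimate transfer, one small one,
# and thirty quiet 45 KB uploads to the same address.
window = [("cdn.example", 900_000), ("files.example", 40_000)]
window += [("203.0.113.7", 45_000)] * 30
print(slow_leaks(window, per_transfer_limit=50_000, cumulative_limit=1_000_000))
```

A per-transfer alarm set at 50 KB never fires here; only the long-window total gives the quiet destination away.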


Building strategies to defeat an adversary who wants your most important stuff requires thinking in a new way about your data, how it's organized, and where it's stored. Think about your data the way your competitors or your adversaries would. What would you do if you could get your hands on your competitor's price list, their list of suppliers with component pricing, the formula for their fast-dry sealant, or their new patent applications? What pieces of information would be awesome to have? How could you recreate important data by sewing together different but connected data sets?
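A quick way to start answering that last question is to look for fields that appear in more than one data set, because a shared field is what lets two pieces of the mosaic be sewn together. This is a minimal sketch; the data set names and field names are invented for illustration.

```python
from itertools import combinations

# Hypothetical inventory: data set name -> the fields it contains.
DATASETS = {
    "supplier_invoices": {"supplier_id", "part_number", "unit_price", "quantity"},
    "bill_of_materials": {"product_id", "part_number", "parts_per_unit"},
    "customer_orders": {"customer_id", "product_id", "order_date"},
    "badge_log": {"employee_id", "door", "timestamp"},
}


def linkable_pairs(datasets):
    """Return {(name_a, name_b): shared_fields} for every pair that shares a field."""
    links = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = datasets[a] & datasets[b]
        if shared:
            links[(a, b)] = shared
    return links


for pair, shared in linkable_pairs(DATASETS).items():
    print(pair, "can be joined on", sorted(shared))
```

In this toy inventory, invoices link to the bill of materials and the bill of materials links to customer orders, so someone holding all three can walk from component prices to who is buying what.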

When you figure this out, figure out how to make your data harder to use, more difficult to reassemble, and not understandable to outsiders who don't have the secret recipe.

What technologies and strategies can you use to make your data harder to get, less useful to your adversaries, and really difficult to reassemble? Continuously think about how to make it hard to understand your data on the outside.


An Advanced Persistent Threat (APT) involves time, technical expertise, and patience, and those who do it are usually part of a well-funded enterprise. Protecting your information supply chain requires well-crafted controls and automated cybersecurity continuous monitoring tools. Here are some things to think about:


Employ the principle of least privilege. Allow only authorized access to data by users (or processes acting on behalf of users). Assign the fewest privileges that you can get away with: only those necessary to accomplish assigned tasks in accordance with your mission and business functions. Give these privileges out sparingly, and watch even more closely how privileges are given to external partners or service providers. Monitor what people actually use and cut off permissions for resources that the user isn't using.

The better you control how you hand out privileges and functionality, the harder it will become for an adversary to find and use stolen credentials and to wander around in your information supply chain if they get in. It will also make it easier to detect your data flowing out of the systems, because there will be fewer users who can legitimately move data around, so out-of-band behavior stands out. Audit the use of privileged functions. It will lower your risk profile and make your cyber insurer and risk manager happy. It is one way to detect unusual behavior and, in doing so, help mitigate the risk from both insider threats and the APT.
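The "monitor what people actually use" step can be sketched as a comparison between what has been granted and what has been used recently. Everything here is hypothetical: the user names, the permission strings, the 90-day idle window, and the assumption that you can pull last-use dates from your access logs.

```python
from datetime import date, timedelta


def unused_grants(granted, last_used, today, max_idle_days=90):
    """Return {user: permissions} that were granted but not used recently.

    granted:   {user: set of permissions}
    last_used: {(user, permission): date of most recent use}
    """
    cutoff = today - timedelta(days=max_idle_days)
    report = {}
    for user, permissions in granted.items():
        # A permission with no recorded use is treated as idle since forever.
        idle = {
            p for p in permissions
            if last_used.get((user, p), date.min) < cutoff
        }
        if idle:
            report[user] = idle
    return report


granted = {
    "alice": {"read:price_list", "write:price_list", "read:invoices"},
    "vendor_bot": {"read:invoices", "read:price_list"},
}
last_used = {
    ("alice", "read:price_list"): date(2017, 6, 1),
    ("alice", "read:invoices"): date(2017, 1, 15),
    ("vendor_bot", "read:invoices"): date(2017, 5, 30),
}
print(unused_grants(granted, last_used, today=date(2017, 6, 15)))
```

The output is a list of candidates for revocation, not an automatic decision; some permissions are used rarely but legitimately, such as year-end reporting.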


Static, homogeneous, and standardized information systems are more susceptible to cyber attacks, and it takes less adversary cost and effort for an attack to succeed. There is simply less to figure out. It's easier to extend information about one part of your system to understand how other parts are architected. Avoid the sitting duck.

Sitting ducks enable an adversary to execute their own mosaic research, using naming conventions, file locations, and other characteristics of how your data is organized to figure out where to go next. (Think Mosaic.) Once an adversary figures out how you've standardized your architecture (server names, endpoints, network segments), they have lots of attack vectors to exploit. Or, put in a non-cyber way: once you figure out how one server is set up, you can probably figure out the other ones and start finding more useful stuff. Think about ways to make your architecture hard to exploit without making it overly complex to manage and support. Store your most important data in a way that makes it hard to acquire all the pieces of your mosaic. Diversify. Consider architectures that don't look alike from every endpoint, every server and every user.
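As a rough self-check on naming conventions, you can look for host names that differ only by a trailing number, since seeing one of them lets an outsider guess the rest. This is a simple heuristic sketch with invented host names, not a complete measure of how predictable an architecture is.

```python
import re
from collections import defaultdict


def guessable_groups(hostnames, min_size=3):
    """Group names that differ only by a trailing number, e.g. db01, db02, db03."""
    groups = defaultdict(list)
    for name in hostnames:
        # Split the name into a prefix and the run of digits at the very end.
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match:
            groups[match.group(1)].append(name)
    return {
        prefix: sorted(names)
        for prefix, names in groups.items()
        if len(names) >= min_size
    }


hosts = ["db01", "db02", "db03", "web-1", "web-2", "k7q2-ledger", "fin-archive"]
print(guessable_groups(hosts))
```

Descriptive names like `fin-archive` leak information in a different way (they say what is inside), so a real review would look at both patterns.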


Moving Target Defense is an approach to cybersecurity where you continuously change your architecture and networks on the fly. Consider techniques that create new network connections on the fly so that the network topology is not defined until runtime and isn't known to the bad guys in advance. And it's different the next time they stop by. Create the network at runtime and then make it disappear before the bad guys can attack.

Use techniques such as virtualization, distributed processing, and replication. This enables you to relocate information resources like processing and/or storage. Changing the locations of processing activities and/or storage sites introduces uncertainty into the targeting activities of adversaries. Uncertainty increases the work factor for adversaries, making compromises or breaches more difficult and time-consuming, and increases the chances that adversaries will slip up or hit your system often enough to show up on the radar of your continuous monitoring and data loss protection tools.

Now you see it and now you don't. Using tools like virtualization and configure-at-runtime means your assets are not persistent. This significantly increases the time your adversary has to spend to find and exfiltrate your data. It also reduces the window of opportunity and the attack surface available to initiate and complete attacks. It may also force the attacker to steal your data in smaller and smaller chunks. The harder it is and the longer it takes, the better the odds you'll figure out what's going on.
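One small, well-known flavor of moving target defense is port hopping: the port a service listens on is derived from a shared secret and the current time slot, so legitimate clients can compute it and outsiders cannot predict it. The sketch below illustrates that one technique only, not a full runtime-defined network; the secret, the 60-second slot, and the port range are all assumptions.

```python
import hashlib
import hmac


def current_port(secret: bytes, now: float, slot_seconds: int = 60,
                 low: int = 20000, high: int = 60000) -> int:
    """Derive the service port for the time slot containing `now`."""
    slot = int(now // slot_seconds)
    digest = hmac.new(secret, str(slot).encode(), hashlib.sha256).digest()
    # Map the first four bytes of the keyed hash into the range [low, high).
    return low + int.from_bytes(digest[:4], "big") % (high - low)


secret = b"shared-out-of-band"  # hypothetical; a real key would come from a vault
t = 1_497_000_000               # fixed timestamp so the example is repeatable
print(current_port(secret, t), current_port(secret, t + 60))
```

Both sides need reasonably synchronized clocks, and the old port has to stay open briefly around each slot change, which is the kind of management overhead the "without making it overly complex" caveat is about.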

Threat awareness programs that continuously remind your users what the threats are, how adversaries will try to gain access to your information and systems, and what you are protecting are important. Situational awareness and continuously training your team on what your most important digital assets are and why you need to protect them helps. And make it interesting: create positive incentives for users to identify issues and protect information. Boring cybersecurity awareness training is right up there with death by PowerPoint. Keep it interesting and relevant.


It's so much more than patents, trademarks, copyrights and trade secrets.

People generally don’t know what constitutes Intellectual Property (IP). For a lot of folks the phrase IP implies golden files and magical data like patents, patent applications, the formula for Coke, and the methods for making a drug. IP is spreadsheets with customer lists, documents, budgets, presentations, price lists, supply chain information, the names of your key people, the decision to switch suppliers. IP is negative know-how, the things you’ve tried that don’t work. It is pretty valuable to a competitor to know all the things you tried before your product worked so they don’t have to waste time and money going down the same path. Come up with a strategy to protect all of your intellectual property not just the obvious magical stuff because someone may be working on a mosaic of your organization. Explain to your team why all of your data needs to be protected, and how to do it. Explain that downloading the price list and putting it into a spreadsheet and then emailing it to your personal account puts your IP in the wild and outside all of your carefully crafted cybersecurity. For that matter, printing it out and reading it on a plane isn't good either.

(Let us know if you need an intellectual property inventory. We're good at it.)


Deploy automated tools to help you monitor what's going on and to keep track of your data, assets and IP.

The cornerstone of risk-based cybersecurity is the use of automation tools to implement continuous monitoring processes. Orchestrating a portfolio of Continuous Diagnostics and Mitigation (CDM) tools and the data they provide enables near real-time digital risk management and risk posture awareness. All the other controls are great but if you can't monitor what's going on you're toast.

Cybersecurity Automation and Orchestration help you:

    • Maintain a picture of your security posture
    • Measure that security posture
    • Identify deviations between expected results and actual states
    • Provide visibility into assets that should be there and those that shouldn't
    • Orchestrate and leverage automated data feeds
    • Monitor the continued effectiveness of security controls

When building your risk-based cybersecurity strategy, map the key cybersecurity automation domains to the controls you select for your information supply chain, technology and user base. We've mapped key cybersecurity tools to the controls where they are useful. Check them out here.

When you think about protecting your information supply chain, think about hackers and hedge funds and what mosaic of data from your digital enterprise would be the most valuable to your adversaries. Then protect it.



© 2017-2021 Wayfinder Digital, LLC. All Rights Reserved