Coding for apps, APIs, and SDKs is a new form of literacy that requires knowledge of interactive tools, formats, and standards. It also requires knowledge of visualization concepts for the final presentation of data. Poor use of visualization inhibits the progression of data up the DIKW pyramid to ultimate wisdom. Mastering these skill sets is probably beyond the capabilities of those on the left-hand side of the IQ bell curve. Data visualization is still relevant to less literate audiences as elites use it to channel their behavior. One DataWeek speaker on visualization mentioned that "data trumps opinion" at Google and other highly literate enterprises. Most humans don't think at that level. Visualization does not have to enable critical thinking to be useful. It is valuable as an elite tool for social management. Control systems in a post-literate society will be very pretty to behold.
The keynote on graphing reiterated the importance of link density in assessing the value of data points and network nodes. Links form valuable patterns, and fraud detection relies upon identifying links to outlier data. Gartner's five graphs of the consumer web help visualize the value of linked data. Some verticals are good early adopters of graph solutions. SaaS providers should seek use cases in segments that must link several master data sets. Pain points in visualization applications are connected to an enterprise's need to manage workflows that handle critical data volumes.
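The link-density idea can be sketched in plain Python: count each node's links and flag nodes whose degree sits far above the average, a crude stand-in for the outlier spotting that fraud detection relies on. The edge list and z-score cutoff below are my own illustrative assumptions, not anything from the keynote.

```python
from statistics import mean, pstdev

# Hypothetical transaction graph: each edge links two account IDs.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
         ("b", "c"), ("d", "e"), ("f", "a"), ("f", "b")]

def degree_counts(edges):
    """Count links per node in an undirected edge list."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts

def link_outliers(edges, z_cut=1.5):
    """Flag nodes whose link count is z_cut std devs above the mean."""
    counts = degree_counts(edges)
    mu, sigma = mean(counts.values()), pstdev(counts.values())
    return {n for n, c in counts.items() if sigma and (c - mu) / sigma > z_cut}

print(link_outliers(edges))
```

Node "a" touches five of the eight edges, so it alone clears the cutoff; a real graph platform would use far richer link patterns than raw degree.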
Money transfer could use more robust data supply chains. Your local bank still has physical branches yet smartphones enable payment and money transfer independent of banks. Your transaction history should be tied to your personal identity first and your bank's identity second. I have attended enough fin-tech meetings in San Francisco to see the innovation coming out of unregulated startups, and regulated financial institutions are taking note.
Visualization best practices exist primarily in the works of Edward Tufte and Stephen Few. Designers of enterprise dashboards, UIs, knowledge management pages, and data visualization products should read those geniuses. They should also read my DataWeek 2013 write-up where I mentioned some really good sources for chart display factors. There must be a market for cheap and simple business intelligence tools priced for SMBs.
John Musser from API Science presented ten reasons why developers hate your API. Read the ten reasons yourself so I don't have to repeat them here. These are the kinds of best practices that make attending the conference well worth my time. Programmable Web is a huge API directory for developers. Developers should pay attention to these factors because at some point their APIs will get traction if they get these things right. Heavily adopted APIs must then migrate from small freemium hosts to major cloud hosts, where they will face all of the growing pains of an SMB that becomes a large enterprise. Check out the developer pages of Facebook, GitHub, Google, Twitter, Apple, and Microsoft to see how industry leaders manage their API platforms.
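I won't reproduce Musser's list, but one gripe that shows up on every such list is unhelpful error messages. A minimal sketch of the friendlier alternative, with field names and the docs URL entirely of my own invention:

```python
import json

def api_error(status, code, message, doc_url):
    """Build a structured error body so developers can self-diagnose.

    A machine-readable code, a human-readable message, and a link to
    the docs; the exact schema here is illustrative, not a standard.
    """
    return json.dumps({
        "status": status,
        "error": {"code": code, "message": message, "documentation": doc_url},
    })

body = api_error(429, "rate_limit_exceeded",
                 "Client exceeded 100 requests per minute.",
                 "https://example.com/docs/errors#rate_limit_exceeded")
print(body)
```

An API that returns only a bare 500 forces developers to guess; one that returns something like this keeps them on the platform.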
My biggest discovery at DataWeek / API World is that multiple freemium platforms enable the creation of an entire data supply chain . . . free of charge. That's right, folks. Do some Google searches yourself to find the providers. I have not seen a standard definition of an API life cycle but some common stages are emerging around development, deployment, management, and retirement. All of this can be done at no cost to developers, at least until the data supply chain gains traction with other developers and users. Converting a Google Doc or Microsoft Office file into an app or API establishes a minimum viable product. Freemium translation platforms can also automatically build an API into an SDK. They can even add speech recognition. This opens up transformational possibilities for business domain experts who are not proficient in coding. I am tempted to create a data supply chain for Alfidi Capital. A valuable data supply chain reflects unique domain knowledge and deep master data sets. Remember that Data.gov is a free source, ready for the taking.
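The four emerging lifecycle stages named above can be modeled as a tiny state machine. The one-way ordering of transitions is my own assumption for illustration; real platforms surely allow loops back into development.

```python
# The four lifecycle stages from the text, in order.
STAGES = ["development", "deployment", "management", "retirement"]

class ApiLifecycle:
    """Minimal sketch of an API moving through its lifecycle stages."""

    def __init__(self):
        self.stage = STAGES[0]

    def advance(self):
        """Move the API to the next stage, stopping at retirement."""
        i = STAGES.index(self.stage)
        if i + 1 < len(STAGES):
            self.stage = STAGES[i + 1]
        return self.stage

api = ApiLifecycle()
api.advance()          # development -> deployment
api.advance()          # deployment -> management
print(api.stage)
```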
Anya Stettler from Avalara had one of the best talks I've ever seen at a tech conference, hands down. Her tips on documenting APIs walked through examples of technical references, code snippets, tutorials, and live interactive formats that keep developers excited about an API. Check out her presentation on SlideShare, because it's too good to miss. More speakers need to focus on action items with just enough of a soft sell to let us know their brand is a go-to source for expert services. "Do this to be successful" is the kind of talk I like to hear.
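The documentation style she advocated, short purpose line plus a copy-pasteable snippet, can be shown with a hypothetical function of my own; this is my illustration of the format, not an example from her slides.

```python
def convert_currency(amount, rate):
    """Convert an amount using an exchange rate.

    Args:
        amount: value in the source currency.
        rate: units of target currency per unit of source currency.

    Example:
        >>> convert_currency(100, 1.25)
        125.0
    """
    return amount * rate

# Docstring examples double as tests, so snippets in the docs stay honest.
import doctest
results = doctest.testmod()
print(results.failed)
```

Embedding the example in the docstring means a stale snippet fails the build instead of misleading a developer.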
Open data from governments is a free input to data supply chains. Bay Area government agencies have held "datathons" encouraging citizens to construct visualization products from government data. If agencies lack the bench strength in data science to build products themselves, it's understandable that they farm out development to the public. The philosophy behind the Science Exchange's reproducibility initiative needs to make its way to government research. Aspiring data analysts don't have to wait for the creation of a GitHub or Bitbucket for the analytics community; they can get right to work on DataSF.
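DataSF serves its datasets through the Socrata Open Data API, which takes query parameters like `$limit`. A sketch of building such a query URL, with a placeholder dataset ID since I'm not pointing at any particular dataset:

```python
from urllib.parse import urlencode

def soda_url(dataset_id, limit=10):
    """Build a Socrata (SODA) query URL for a DataSF dataset.

    The dataset ID passed in below is a placeholder, not a real
    dataset; swap in an ID from the DataSF catalog to fetch rows.
    """
    params = {"$limit": limit}
    return ("https://data.sfgov.org/resource/%s.json?%s"
            % (dataset_id, urlencode(params)))

url = soda_url("xxxx-xxxx", limit=5)
print(url)
```

Fetching that URL returns JSON rows, which is all an aspiring analyst needs to start building.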
I learned a new term during an excellent talk on API lifecycle management. That term is "data scraping," the harvesting of proprietary data from a popular API. Platform managers who implement protective measures against scraping will also deter legitimate developers from using an API. There's always a tradeoff between usability and security.
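One common protective measure is request throttling, often implemented as a token bucket. A minimal sketch, where the capacity and refill rate are illustrative; the tradeoff is exactly the one above, since a limit tight enough to starve scrapers will also pinch legitimate developers.

```python
import time

class TokenBucket:
    """Simple token-bucket throttle, a common anti-scraping measure."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        """Spend one token per request; refuse when the bucket is empty."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=0.0)  # no refill, for demo
results = [bucket.allow() for _ in range(5)]
print(results)
```

The first three calls succeed and the last two are refused; tuning `capacity` and `refill_per_sec` is where the usability-versus-security judgment lives.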
The IoT talks were not as informative as I had expected because they were mostly disguised product pitches. The platforms with the largest API ecosystems - Google and a few others - will be the defaults for IoT integration when devices are ready to connect to ERP systems. That's why Google bought Nest. Data scientist job descriptions in the future will include a lot more emphasis on machine learning and BRMS than they do today. Mark my words.
IBM had a lot to say about analytics at this conference. Their IBM Watson Analytics and IBM InfoSphere BigInsights offerings look cool. Big Data requires iterative and exploratory analytics in a whole new layer between the Hadoop back-ends and data processing front-ends. This sounds like the old term "middleware" made new again to incorporate compilers and optimizers. Analytic languages must graduate from small single-machine systems because Big Data outgrows them. Business domain experts can learn more about this at Big Data University because they need to close their language gaps with data scientists.
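For domain experts closing that language gap, the map/reduce pattern that Hadoop popularized can be shown shrunk to a single process. A toy word count of my own devising, nothing from IBM's materials:

```python
from itertools import groupby

# Toy map/reduce word count: map each word to a count of 1, sort to
# group keys (the "shuffle" phase), then reduce each group by summing.
docs = ["big data is big", "data needs analytics"]

mapped = [(word, 1) for doc in docs for word in doc.split()]
mapped.sort(key=lambda kv: kv[0])                     # shuffle/sort phase
counts = {k: sum(v for _, v in grp)                   # reduce phase
          for k, grp in groupby(mapped, key=lambda kv: kv[0])}
print(counts)
```

Hadoop distributes exactly these phases across a cluster; the middleware layer the talk described sits between that back-end and the analyst's front-end tools.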
The potential end of Moore's Law has implications for data storage. If data flow volume grows faster than data storage density, SaaS and PaaS cloud providers will have to spend major capex building out data centers. I am doing some research on data center providers organized as REITs. I first noticed them as potential hard assets in a hyperinflation-resistant portfolio, because they are really just inputs into a supply chain. I now think they may be a growth opportunity by themselves as a pick-and-shovel play on the data sector. Keep watching my blog for future discussions of data centers.
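The capex pressure is a compounding-rates argument, which a back-of-envelope calculation makes concrete. Both growth rates below are illustrative assumptions I made up, not industry figures:

```python
# If data volume compounds faster than storage density, the gap that
# providers must cover with new data center capacity widens every year.
volume_growth = 0.40    # assumed annual growth in data flow volume
density_growth = 0.20   # assumed annual growth in storage density

def capacity_gap(years):
    """Ratio of data volume to storage density after N years."""
    return ((1 + volume_growth) ** years) / ((1 + density_growth) ** years)

gap_5yr = capacity_gap(5)
print(round(gap_5yr, 2))
```

Under these assumptions the gap more than doubles in five years, and floor space has to make up the difference; that is the pick-and-shovel case for data center REITs.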
The final panel I attended was appropriately the venture investors' discussion of funding and acquisition for data and infrastructure startups. If the emerging term "Infrastructure 2.0" for the data / API sector catches on, it must encompass apps, SDKs, and tools for visualization and analytics. The VCs think they can make money at all levels of the tech stack but I have blogged many times that unaddressed needs in ERP links are probably the most lucrative markets. I do not share their pessimism that open source business models are too hard to monetize. After all, IBM seems to be doing just fine selling Hadoop-based solutions because they can address multiple vertical segments as a "horizontal" provider. I did pay close attention to the mention of several parts of a data science pipeline: data cleansing, feature engineering, collaboration, and modeling. I now have some new buzzwords to throw around at the next data sector conference.
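The pipeline stages the panel named can be sketched as chained functions. Collaboration is a human process, so only the other three stages appear; the sample data and the "model" (a threshold on deviation from the mean) are purely illustrative.

```python
# Three of the four pipeline stages from the panel, chained together.
raw = [12.0, None, 15.0, 9.0, None, 18.0]

def cleanse(records):
    """Data cleansing: drop missing values."""
    return [r for r in records if r is not None]

def engineer(records):
    """Feature engineering: deviation from the sample mean."""
    mu = sum(records) / len(records)
    return [r - mu for r in records]

def model(features, cutoff=3.0):
    """Toy model: label large positive deviations."""
    return [f > cutoff for f in features]

labels = model(engineer(cleanse(raw)))
print(labels)
```

Only the last clean record (18.0) sits more than 3.0 above the mean of 13.5, so it alone is labeled; real pipelines differ in scale, not in this basic shape.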