
The Web and Internet Security

Part 1 of n

Hi Everyone!  I’m back!  The last couple years have been a hell of my own making, through the depths of depression and back again.  I’m still a bit burnt out, but definitely on the upswing.  But enough about me.

Pointedly ignoring the clusterfuck that is US and world politics right now, I’ve decided to speak on a topic that is near and dear to everyone’s heart: privacy and security. This is something I know a bit about, having worked with private data on the web and in mobile, as well as on a few hobby attempts at secure chat and connection apps. Rather than go deep into the tech, where I’m far from an expert, I’ll talk about the concepts and issues I’ve seen.

As a disclaimer, I work for Facebook. In this article I’ll touch on them, but I am not in any way, shape, or form speaking for them. These are strictly my personal opinions.

So let’s start with a model of your basic web or mobile app. There’s a server somewhere, or more likely a server farm (the “cloud”). There’s the Internet itself, a web (see what I did there?) of interconnected special-purpose devices whose job is to shuttle data around based on attached metadata. Then there are the clients: mobile devices, tablets, or desktops, running either a web page or a custom app. Not surprisingly, this is called a “client-server” architecture.
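Just to make that concrete, here’s a rough Swift sketch of the client half of that conversation; the URL and query parameter are made up for illustration, not any real service.

import Foundation

// Ask a server for some content and handle whatever comes back.
// (In a command-line tool you'd also need to keep the process alive
// until the callback fires.)
let url = URL(string: "https://api.example.com/feed?user=123")!

let task = URLSession.shared.dataTask(with: url) { data, _, error in
    if let error = error {
        print("Request failed: \(error)")           // network or server trouble
    } else if let data = data {
        print("Got \(data.count) bytes of content") // the server's reply
    }
}
task.resume()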

Whether the client is a web page or a mobile app, the people who funded the system want to see how their investment is doing, how they can make it better, and how to bring the site back up quickly if something goes wrong.

The servers will keep logs of how many requests were made, what information was returned, and, in case of a problem, what failed and where. The clients will also send back information about how people are using the app and what information was requested.
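To give a flavor of what “sending back information” can look like, here’s a hypothetical analytics event in Swift. The event and field names are invented for illustration; they’re not any particular product’s schema.

import Foundation

// A made-up analytics event a client might report back to the server.
struct AnalyticsEvent: Codable {
    let name: String       // e.g. "article_viewed"
    let screen: String     // where in the app it happened
    let timestamp: Date
}

let event = AnalyticsEvent(name: "article_viewed",
                           screen: "home_feed",
                           timestamp: Date())

let encoder = JSONEncoder()
encoder.dateEncodingStrategy = .iso8601
if let json = try? encoder.encode(event) {
    print(String(data: json, encoding: .utf8)!)  // roughly what goes over the wire
}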

Together, these analytics let management know who to advertise to, what their interests are, and how best to serve the users and the investors.

You’ll note I’m being careful not to put a value judgment on any of this.  The bottom line is, content costs money to produce, serve, and maintain, and if the people fronting that cost aren’t making any money on that investment, they’ll find another investment.  If they know what people want from their application, it’s in their interest to provide it, since that will bring more users.

Which brings us to ads.  Once you know what people like, you have a good idea of what kinds of things they’ll buy.  If you can target the advertising you vend, they’re more likely to click your ad links, buy that stuff, and thus make you more money.  That also means that the space you reserve for advertisers will be worth more, since the clickthrough rate is higher.  In advertiser lingo, putting an ad in front of someone’s eyes is an impression, and buying the product is a conversion.

So what can go wrong?

Let’s start with the advertising itself. Since impressions and clickthroughs are worth money, advertisers will go to great lengths to maximize their number. This means annoyingly distracting ads, popovers, popunders, and other obnoxiousness to ensure that ads are seen and maybe even accidentally clicked. Fortunately, content providers are starting to become aware of how much badly behaved ads affect their goodwill, and are taking steps.

As far as “data mining” and such, the legitimate use of my data for marketing purposes is the least of my concerns.  If I don’t want to buy something, I simply won’t, no matter how well targeted the ad.  If tracking my data means the content will be more interesting to me, all the better.

There are concerns that some sites are in fact customizing or suppressing content in order to make users feel better.  I can’t speak for my employer, but once again I’m not too worried.  I’ve seen plenty of dissenting content on my feeds, and whenever I see news that seems relevant to me I’ll check the source before believing or forwarding it.  Checking sources will be the topic of a future article, but not too far in the future.

Then there are hackers.  Some sites use woefully inadequate security, or don’t keep up with the latest exploits as they are found, and so leave themselves open.  Once someone gets in, everything is potentially up for grabs.  Also, hacking has become big business, so there are a lot of people, in and out of governments, who are doing it.

Finally, there are governments themselves. No matter who is in office, there’s going to be some abstract noun used to justify the latest witch hunt. This means that every byte of data, once it’s left your computer (and maybe even before that; more on that in an upcoming article), can be copied and analyzed at any stop along its way.

A subpoena can also be issued to retrieve the analytics data that companies have on you.  Without a warrant, and without notifying you.

The primary takeaway here is if you don’t want something to be public, don’t post it.  Even if you set the most restrictive privacy settings, and the site you’re posting it to assures you that your private data remains private and yours.  You’re just one hacker, one subpoena, and/or one “Our Terms of Service Have Changed” email away from your deepest secrets becoming front-page news.

An excerpt from my upcoming book

There are problems that are complex. These are the ones with many moving parts, where the emergent behavior seems greater than the sum of those parts. Complex problems are fun to solve, and when you’ve done it, you have a great sense of accomplishment. For example, the Mandelbrot Set is based on a relatively simple equation, yet is literally infinitely complex.
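For the curious, the whole thing boils down to iterating z → z² + c and checking whether z stays bounded. A quick Swift sketch of that iteration (the iteration cap and escape radius are the usual conventions):

// A point c = cr + ci*i is (approximately) in the Mandelbrot Set if,
// starting from z = 0 and repeatedly applying z -> z^2 + c,
// |z| never exceeds 2.
func isInMandelbrotSet(cr: Double, ci: Double, maxIterations: Int = 1000) -> Bool {
    var zr = 0.0, zi = 0.0
    for _ in 0..<maxIterations {
        let newZr = zr * zr - zi * zi + cr   // real part of z^2 + c
        let newZi = 2 * zr * zi + ci         // imaginary part of z^2 + c
        zr = newZr
        zi = newZi
        if zr * zr + zi * zi > 4 { return false }  // |z| > 2: it escapes
    }
    return true
}

print(isInMandelbrotSet(cr: -1.0, ci: 0.0))  // true:  -1 settles into a cycle
print(isInMandelbrotSet(cr:  1.0, ci: 0.0))  // false: 1 blows up quickly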

Complexity is an intrinsic property of a problem. Looking at the problem itself, what is the minimal solution for the entirety of the problem? “As simple as possible, but no simpler” is the mantra. As long as the problem is fully stated, the complexity is known, and the solution can be completed.

Complication, on the other hand, is the set of external considerations, the extrinsic properties. I could add an Apache server to a Linux box and throw it on the Internet with a 20-line PHP script to serve content (thus satisfying the full complexity of the problem), but there’s a 99% chance that box will be pwned within 24 hours. Security, therefore, is a complication. Pre-existing code that you don’t understand and have to work with is a complication. Development process is a complication.

Complications comprise the ‘work’ part of work.  Some of them you’ll know about, like adding enough security so that your web site is hard to hack, or the code review process you need to get your work into master.  Many of them you won’t know about – a new OpenSSL vulnerability is discovered, an edge case in the code you inherited, a sudden change in the specification.

I will state that at best you can know 40% of the complications involved in a task. That means that at least 60% of them are unknown at the time you commit your code to master. You need to anticipate that 60% and allocate resources to handling those complications.

It seems like the Religious War Du Jour is the discovery that Object-Oriented Programming Sucks, and anyone who wants to write code that works should use a Functional paradigm.

The blogs and websites are full of semi-contrived examples where OOP has gone horribly wrong.  And indeed, the problem they are trying to solve is not a good one for OOP, and the result is hacky spaghetti code.  Then they show the same problem solved in a functional language in 10 lines.
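To show the flavor of these comparisons, here’s a deliberately tiny, contrived example of my own in Swift – the same little task written with OOP ceremony, then as a functional one-liner:

// A contrived task: sum the squares of the even numbers in a list.

// The ceremony-heavy object-oriented version the blog posts love to mock:
class EvenSquareSummer {
    private let numbers: [Int]
    init(numbers: [Int]) { self.numbers = numbers }
    func compute() -> Int {
        var total = 0
        for n in numbers where n % 2 == 0 {
            total += n * n
        }
        return total
    }
}
print(EvenSquareSummer(numbers: [1, 2, 3, 4]).compute())  // 20

// The functional one-liner that gets held up as the obvious winner:
print([1, 2, 3, 4].filter { $0 % 2 == 0 }.map { $0 * $0 }.reduce(0, +))  // 20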

The lesson here isn’t that OOP sucks and FPL is a panacea.  The lesson here is:

Use the right tool for the job.

If I hired a contractor to do work on my house, and she insisted that everything can be done with a pair of pliers, I’d be skeptical. Yes, you can grab a screw head and turn it with pliers, but a screwdriver will be a lot easier and safer. And if I were paying her by the hour, I’d much rather pay for the one hour the screwdriver takes.

In the many years I’ve been coding professionally, I’ve found that both good and bad architectures are self-sustaining. A bad architecture will force you to pass data all over the place to get it where you need it, copy-paste nearly identical code in multiple places, and implement hacks to get what you need done. The learning curve will take forever, and you’ll feel dirty. Your self-esteem will fall because you’re writing buggy code you’re not proud of, and you don’t feel confident it will work for all edge cases.

A good architecture will have clear interfaces, separation of concerns, comments and unit tests.  When you make a mistake it’s almost immediately underlined in red by your IDE because the static checker can catch the common problems.  When you enter ‘git push origin master’, you do it confidently because the result feels right.

Is a circle a subclass of ellipse or vice versa?  Depends on what you’re doing with them, but if you have a renderAsPDF() method in either of them, you’re probably doing it wrong.
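Here’s a rough Swift sketch of what I mean – keep the shapes about geometry, and keep the “render it as a PDF” concern somewhere else entirely. The names are just for illustration.

// The geometry types know only about geometry...
protocol Shape {
    func area() -> Double
}

struct Ellipse: Shape {
    let semiMajor: Double
    let semiMinor: Double
    func area() -> Double { Double.pi * semiMajor * semiMinor }
}

struct Circle: Shape {
    let radius: Double
    func area() -> Double { Double.pi * radius * radius }
}

// ...and the rendering concern lives in its own type.
struct PDFRenderer {
    func render(_ shape: Shape) {
        print("Rendering a shape with area \(shape.area()) to PDF")
    }
}

PDFRenderer().render(Circle(radius: 2))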

Wow, it’s been three years since my last post!!  Also, a lifetime.

I spent about half the time working at Twitter, the other half at AOL and Facebook.  I moved from NYC back to our house in the Bay Area, CA.

Last year I was privileged[1] to participate in a layoff, which was presented to me as a termination for lack of performance, along with, unbeknownst to me until later, half the office. At first I took it really hard, but then I decided to see what I could learn from the experience.

First of all, I realized that my performance was just fine – in fact, while I was being told how badly I was doing, I was also dealing with a severity 0 failure: finding the root cause, contacting the appropriate developers, and keeping management aware of progress. During one of these management calls, I had forgotten to ask some specific question (I don’t remember what; it doesn’t matter) and got screamed at by the manager for two minutes about how sloppy my work was. Meantime the sev0 was fixed over a weekend, and a hotfix saved a lot of people a lot of trouble.

But the main learning here is that as a senior engineer and beyond, the primary skill isn’t how well you program; it’s how well you navigate the work social graph. At the end of Thinking Physics, Lewis Carroll Epstein has an appendix on the growth of business organizations that was really enlightening. A small startup will have one manager, the CEO, and say 3-8 engineers. Everyone will be producing and working hard, and the CEO will have the additional task of securing funding and marketing the product.

Eventually, if lucky, the CEO will need to hire an office manager, an accountant, and more engineers. At this point, she or he will probably have to hire an additional manager or two. Epstein estimates it at roughly one manager per 10 employees, whose job it is to coordinate resources and goals for those employees, plus communication with other managers and the CEO.

If the startup makes it to mid-level, around 100 employees, an amazing thing happens. We now have 10 managers, which means we need to hire a middle manager, whose sole job is to coordinate the 10 managers, and who is removed from the ‘front lines’ by a full level.

This is used as an argument that smaller companies can move faster and work more efficiently, by minimizing the number of communication lines that have to be maintained (with n people there are up to n(n-1)/2 pairwise lines: 28 among 8 people, nearly 5,000 among 100). But the reality is, companies grow. Sometimes they’ll split off smaller companies, but often they just get bigger, and hence further removed from their original mission, toward maintaining their own size and maximizing profit and shareholder value.

What this means is that you will likely work in large companies, and have to take on a good portion of this communication yourself. Senior and Staff engineers have to coordinate with engineers in other groups, because the managers may not have the bandwidth to fully immerse themselves in the technical details; all they can do is provide the introductions and step back.

If you want to accomplish your cross-team goals, it’s critical therefore that the engineers you need to work with want to work with you. Going to social events, taking people out for drinks, remembering their kids’ names: these are now part of your way of life. You want them to hear your voice, or see your instant message, and think happy thoughts. Otherwise they’ll put you off and delay you, your project will fall behind, and you’ll get “(barely) meets expectations” at your next review instead of “exceeds.” Even if you’re a top-notch programmer!

And this is what happened to me way back when – I failed to understand the importance of the social graph.  I would let my moods and my paranoia influence my interactions with peers in other offices to the point where I couldn’t get my code reviewed and kept missing deadlines.  Which of course led to my mood getting even worse.

But in truth, a failure is only a failure if you don’t learn from it.  Learn from my failure, and don’t fail yourself!
[1] That’s sarcasm, just in case you couldn’t tell

Been reading http://www.codinghorror.com/blog/2012/07/new-programming-jargon.html, and thinking about how many of those I’ve dealt with and how many I’ve written myself.

When a project grows, there is going to be an unavoidable accumulation of tech debt as methods and objects are moved and expanded to handle new requirements.  Eventually you get to the point where trying to change anything results in a plethora of bugs and crashes.  You’ve hit the refactor zone.

The biggest challenge is convincing project management that the refactor is needed, since changes to the “plumbing” don’t advance the revenue-generating features that they want to get out ASAP.  Even worse, the refactor will take time, since you have to retest every bug fix and feature that got the code here in the first place.

Best bet is to follow the Agile Methodology of Constant Refactoring – whenever you see something that doesn’t look right, fix it.  This way net tech debt grows more slowly, or even shrinks, and Product isn’t counting the opportunity cost of your refactor since you’re still adding features.

Also remember, “Clever” is a four-letter word. Doing something weird and clever that saves 3 machine instructions, or lets you write an entire function in one line of code, will only bite you later. Not only won’t anyone else understand it, but when you have to change it in 6 months you’ll wonder what the f— you were thinking at the time. Trust me – been there, done that, bought the t-shirt at thinkgeek.com.
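A made-up example of the kind of thing I mean, in Swift – a classic bit-twiddling trick versus the boring version that says what it means:

// The "clever" version: saves a few instructions, costs the next reader ten minutes.
func isPowerOfTwoClever(_ n: Int) -> Bool {
    return n > 0 && (n & (n - 1)) == 0
}

// The boring version: does the same thing, and says what it means.
func isPowerOfTwoBoring(_ n: Int) -> Bool {
    guard n > 0 else { return false }
    var value = n
    while value % 2 == 0 {
        value /= 2
    }
    return value == 1
}

print(isPowerOfTwoClever(64), isPowerOfTwoBoring(64))  // true true
print(isPowerOfTwoClever(96), isPowerOfTwoBoring(96))  // false false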

Hallway Lights

My Hallway Lights, controlled by three MOSFETs and an Arduino

The Assignment From Hell

Disclaimer: This is not about any specific employer I’ve ever worked for; there have been aspects in all of them.  Any resemblance to past, present, or future employers is purely intentional.

Anyone who’s worked in the industry for more than a year or two has had one: a code base to maintain or enhance that was a heaping, stinking pile of, er, sunshine. The manager is putting lots of pressure on you to fix it now, there’s a year’s worth of features and enhancements, and only 6 months to do them. The pressure varies between abuse and outright attacks on your ability and work ethic.

Welcome to the Assignment From Hell.  You’ve probably inherited it from the last person who worked on it, who was fired or left in a hurry.  Now you know why.

The program crashes constantly, calling the code spaghetti code would be an offense to spaghetti, and things that could easily have been calculated or stored in a database are hardcoded all over the place. The procedure to add a common feature involves modifying 10 different source files and 20 different data structures that hold overlapping but contradictory information.

Your every instinct screams that you need to tear this garbage out and rewrite it, which you could do in half the time it will take to make the existing code work, but there’s never the time or resources to do that, and you can never get buy-in from the stakeholders anyway.

Meantime your self-esteem is sub-basement – simple tasks that should take a day are taking you a week, and this is being pointed out to you starting on day 2.  And what about the backlog that’s now 5 days late?

So what to do?

Step 1. Breathe. The unreasonable expectations of the powers that be are how this situation got to this point in the first place. The previous developer or developers who caused this mess probably weren’t given nearly enough time, and just took on technical debt to get paid at all. Even that obnoxious manager is likely getting pressure from their own bosses to get this thing working, and doesn’t want to admit that their inability to push back earlier is what got them into this mess.

You’re probably someone they see as competent, so they’re really hoping you can fix it. The hardest challenge for me throughout my career has been not taking the criticism personally.

Step 2. Communicate. Don’t point out that the code base is crap – they know it, and are in a level of denial that would give Freud a headache. Come up with a plan based on reality, and present it respectfully – respecting past and present management and engineering. Be prepared to defend the plan and to push back – hard. Try to find allies from the Old Times who agree with you and have pull with management.

Step 3.  Breathe.  Again.  Like I said, I take this stuff too personally.

Step 4.  Be honest.  Don’t agree to fix anything in a shorter timeframe than your gut tells you, doubled.  It just won’t happen, and you’ll only look worse when you can’t deliver.  Speaking reasonably and respectfully will get you respected in return.  If you need additional resources, say so, and back it up with hard facts.

Step 5.  Be honest with yourself.  You won’t be able to singlehandedly fix an entire organization that has grown up around this method of doing business.  Do your best to get through it alive.

Happy 2012!

Let’s hope the last year of the Mayan calendar is better than the previous!  And that someone is busy carving another 35,000 year calendar.

It’s been a while

Life has become interesting since we last spoke!

I wrote another iPhone app; you’ll be able to buy it on the App Store next week – search for Top of the Rock. I won’t get anything from sales, but it’s a pretty cool app!

Meantime, my responsibilities at AOL have increased: I’m now working on the Huffington Post iPhone and iPad apps, and the recent updates to those apps are in part due to my work. I’m going to start looking at the server side, to see what I can improve there.

Which brings me to my next topic set, the Server Side.  This consists of quite a few subtopics which I hope to touch on:

  1. Server Architecture. There are a lot of wheels out there already invented. Except for some weird specialty ones, wheels are round and have an axle. Most servers have a database full of some sort of content, some business logic that is mostly concerned with categorizing content and authenticating access to it, and a bit more business logic for adding content to the database. Content may include professionally authored and/or curated data, as well as user-provided comments and questions.
  2. Business Logic.  This consists of the nontrivial business logic once you get past authentication and generation.  A server may have to process a lot of data, and it’s important to ensure that this data moves through as quickly as possible.

It is the business logic that is most fascinating.  Sure, you need to make decisions as to which wheel you want to use, and how you want to spread out your server farm, database replication, etc.  But these are already solved problems.  One from column A, …

I’m working on a system that analyzes and optimizes stock portfolios. Imagine you have 5 years’ worth of data for the entire S&P 500. You first generate metrics for the 500 securities that comprise it. That means iterating over about 1260 sets of statistics per security (roughly 252 trading days a year times 5 years). Then you need to compare each of the 500 against its 499 counterparts to come up with comparison metrics. Taking advantage of the fact that m(a,b) = m(b,a), and the fact that I don’t need to calculate m(a,a) at all, I have to perform 500 × 499 / 2 = 124,750 comparisons to get the final set of metrics.
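Here’s the shape of that computation in a quick Swift sketch, just to show where the 124,750 comes from. The metric function is a stand-in, and a straight serial loop like this is exactly the thing I wanted to speed up:

// Only compute m(a, b) for a < b: symmetry gives us m(b, a) for free,
// and m(a, a) is skipped entirely. For 500 securities that is
// 500 * 499 / 2 = 124,750 comparisons instead of 500 * 500 = 250,000.
let securityCount = 500

// Stand-in for the real comparison metric between two securities.
func metric(_ a: Int, _ b: Int) -> Double {
    return Double(a * b)  // placeholder computation
}

var results = [Double]()
results.reserveCapacity(securityCount * (securityCount - 1) / 2)

for a in 0..<securityCount {
    for b in (a + 1)..<securityCount {
        results.append(metric(a, b))
    }
}
print(results.count)  // 124750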

My answer to this was to use OpenCL. This is a framework that lets you use the 64-1600 cores provided by a GPU to perform calculations in parallel, or at least in better-than-linear time. Xcode provides an OpenCL framework that makes integration a bit easier.

Future posts will document my explorations and discoveries of this platform.

On Teaching

This weekend I attempted to teach iOS programming to a class of 11 people. Although I had specified that people should have a Mac with Snow Leopard and Xcode installed, and familiarity with the C programming language, 8 of the 11 present had neither. The writeup for the course said it was great for beginners, with no prior knowledge assumed, and I didn’t read over the writeup before it went up.

A true iPhone dev camp costs $1000+, is an entire intense weekend (16+ hours), and is fully staffed. I was charging $75 for 4 hours – I couldn’t possibly offer the same level of education.

I’m going to make it up to the 6 people who did not demand their money back with an intro to C, followed by a re-presentation of the iPhone material, as Beta version two.