SoLng and Thx4 All the -Phish

Dec 13, 2022

The End of the First Phase of My Data Science Journey

The completion of my Capstone marks the last data science course in my Master’s in Health Data Science program. I have one more semester before I graduate, but the courses I have left are health research and misc. Effectively, as far as data science goes, the formal learning process I began in January of 2021 has come to a close. It’s a big moment — that’s why they call it a Capstone! This blog post may or may not be my final entry, but it marks the end of my intentional “learning out loud” formal move into the field of data science.

In The Beginning…

Nearly two years ago I published my Hello World for this blog. This was mainly a way for me to force myself to learn new things well enough to be able to explain them. To that end, this blog has been a tremendous success. As I was learning new things about data science and machine learning, I wrote about them. As I prepped to facilitate Cornell machine learning courses, school projects, and challenging work conversations, I had to dig much, much deeper, and quite often my perspective and understanding evolved along the way. I’m sure this will continue to be true as I move forward. In looking back, I’m glad most of what I wrote is at least directionally accurate. I’m sure there’s a lot I could go back and fix, but I’m happy enough with how the writing represented the journey to leave it mostly as-is, for now anyway… :)

Signal Vs. Noise

When I started out, like most people new to the field I think, I thought data science was mostly about applying statistics to real-world data and iterating over it to find patterns. In other words, I thought the secret to data science was simply having access to data and knowing how to build models. I still believe knowing how to build models is important, but I now fully realize that is a tiny, tiny, tiny little piece of what makes a data scientist valuable.

If I may opine for just a bit: There are far too many people claiming expertise in building models who don’t actually know *how* those models work, and who are making really bad choices in this vacuum of understanding. A few of them have Ph.D.s and social media presences. They make what sound like completely reasonable statements that turn out to be wrong or incomplete in practice. The misinformation or incomplete information from people who claim to be authoritative sources saturates the information space. It’s a dangerous realm for a newbie to step into.

One thing I really tried hard to do, even at the risk of being publicly wrong, was dig down deeply into the weeds of some very difficult topics and try to make them easier to understand. You can be the judge as to how effective I was, but one thing that really turned (turns?) me off about most data science bloggers and video makers is that they tend to repeat the same things: the high-level bullet points about whatever the latest buzzword is. I suspect very, very few of them really deeply understand, or have even tried to deeply understand, the thing that they are writing / speaking about. It’s like the marketing version of an AI expert: they can sound like the real thing, just don’t press too hard on the details. We all start somewhere, but it’s that laziness (the jumping on the next bandwagon, the not even trying to explain a concept and just hoping the audience doesn’t actually ask you about it) that really got to me. Thankfully, if you’re patient you can usually find at least one person who understands it well enough and, if you’ve done enough homework, has explained it well enough that you can take it to the next level of understanding. And when you find those folks, you bookmark them and read everything they put out. But it’s a lot harder than it should be to find the good stuff. There’s an overwhelming amount of noise (pun intended). And as this field gets more popular, it just seems to get worse.

Real-World Consequences

It’s disheartening, and it is starting to cause people who are good at this to reconsider the effort and risk. When the world can’t tell the difference between a true expert and a charlatan, and indeed, believes the charlatan to be better because of the amount of work produced or the promises claimed to be delivered on, there are probably more rewarding things to do with your time. Good people are leaving the field because transparency and honesty are penalized in the marketplace far too often. And that’s tragic.

Real Science

That said, there is an immense amount of opportunity in this field. I consider some of my frustration to just be growing pains of a new, difficult field of study that suddenly got very popular. Looking ahead, as data volume explodes and AI models get better and better at ever more complex tasks, the sky is the limit in terms of how machine learning can impact human lives (and hopefully for the better). Just like every other scientific field, there are risks, but there are also rewards. Figuring out how to leverage the power of these things we are building while making them safe and universally beneficial is going to take a lot of care.

And it’s that part of it that gets me most excited. This is not a solved problem. There are virtually limitless opportunities, both extremely simple and terribly complex, to find ways to use data science for good. Everyone wants to predict the stock market, at first (and the first ones to crack that nut will probably be the first and very last gazillionaires…) But consider the things that really matter: early detection of cancer from medical imaging, helping a retiree manage limited finances, helping a struggling student understand math. All of those are things that AI could potentially be applied to. We have the technology. We just have to figure out how to use it.

My Next Journey?

This is far from the end of my learning curve in data science. I may well write a new blog post from time to time to support something I’m doing in an academic or professional sense, but a lot of the next phase of my learning will be application-focused. How do we move forward? How do we apply these technological advances in the areas where they can benefit humanity the most? (Ahem… healthcare?) What can we do to ensure that people coming into this field step on fewer landmines than I did getting started in terms of finding good, accurate, valuable information? How do we democratize AI so that it benefits everyone, not just the largest companies with the most resources?

I still love my job and have no intentions of changing careers in the near-to-medium term, but I do intend to look for ways to contribute both within and outside of my main job responsibilities. I don’t need to exclusively be an AI expert; my cloud computing knowledge still matters, as do the learnings from my M.B.A., my psychology degree, and my technical education and IT background. The relatively unique makeup of what I know means I should be able to find unique ways to leverage any number of technologies, including AI/ML, in any number of situations for a greater good / better experience / efficiency gain / and so on. In some respects, I just have to make sure I’m being a good steward of the time I have and the opportunities that will be available to me.

If only I had an AI model to optimize that…

Thank You

So goodbye for now (and thanks for all the phish, or rather, the lack thereof). Thank you for reading, clapping for, and commenting on my work. It was mostly for my own benefit, but I’m glad it did at least a few people some good along the way. Perhaps I’ll find a reason to pick up the pen here again, or maybe I will be doing something completely different this time next year. Either way, it’s been a rewarding journey, and I hope you’ve enjoyed being part of it. See you on the flip side!