Transcripts of Living Blindfully are made possible by Pneuma Solutions, a global leader in accessible cloud technologies. On the web at http://PneumaSolutions.com.

You can read the full transcript below, download the transcript in Microsoft Word format, or download the transcript as an accessible PDF file.

 

Contents

Welcome to 269

Three Extra Episodes Coming Up

The Founders of Envision, Karthik Mahadevan and Karthik Kannan, Discuss Envision Today and in the Near Future

iOS 17.4 is a Significant Release

JAWS Introduces AI Image Recognition

Comments on the Deane Blazie Interview

More Memories of David Holladay

The Bonnie Bulletin

Closing and Contact Info

 

 

 

Welcome to 269

[music]

Voiceover: From Wellington, New Zealand, to the world, it’s Living Blindfully – living your best life with blindness or low vision. Here is your host, Jonathan Mosen.

This week: for the first time ever, the 2 founders of Envision are on the same podcast. We hear about how AI is transforming what they can offer via their glasses and smartphone app, iOS 17.4 is a significant release but it brings more VoiceOver bugs, and AI image recognition comes to JAWS.

Welcome to episode 269, and welcome to the people of Kalamazoo! There are some great place names around the world. We’ve got some here in New Zealand. There are a lot of them in the United States. But I reckon Kalamazoo is one of the coolest place names ever. It is right up there on the epicness scale.

I would play a little snippet of the Glenn Miller tune “I Got a Gal in Kalamazoo”, but I don’t think it’s out of copyright, so I’d probably get pinged for it.

So welcome to you, if you’re in Kalamazoo in Michigan, or anywhere else in area code 269.

And in terms of country codes, country code 269 belongs to the Comoros, the archipelago of Comoros. There are 3 islands, apparently.

And I must confess, geography was not my favorite subject at school, and I haven’t heard of the Comoros before.

But 868,000 people definitely have, because 868,000 is the population of the Comoros. So welcome to you, if you happen to be listening from the Comoros. It’s a pleasure to have you tuned into Living Blindfully.

Three Extra Episodes Coming Up

Now, a show note, and as some people like to say, a heads up.

We will be doing some extra episodes outside the normal weekday time frame shortly. 3 extra episodes, to be precise. Because when the transcriptions are done, we are going to be publishing my 3 reviews of the Zoom recorders. I promised you these.

You may remember we interviewed Samuel Greene from Zoom about these recorders. I’ve had them in my hands, and I’ve done the reviews. They are actually already available on the Blind Podmaker feed.

But a lot more people, I mean, many many many times more people listen to Living Blindfully compared to that podcast, and I think this is a significant breakthrough.

But I thought it might be a little bit of a stretch to devote 3 weeks in a row to the Zoom recorders. So what we’re going to do is just keep doing the weekly schedule like we always do. But at some point, when the transcriptions are ready, one day after the other, we will publish these 3 extra episodes about the Zoom H6 Essential, the Zoom H4 Essential, and the little Zoom H1 Essential. And they will go out simultaneously for plus subscribers and those who listen for free. So I hope you enjoy those, when they turn up.

I think having the transcripts will make a real difference in terms of them being a handy reference as people learn these recorders, and I have no doubt that they are selling like hotcakes in the blind community. Because while it’s not perfect yet, to have this degree of accessibility is certainly a breakthrough. So thanks to Zoom for all the work that they’ve done on this.

Advertisement: We can make transcripts of Living Blindfully available, thanks to the generous sponsorship of Pneuma Solutions. Pneuma Solutions, among other things, are the RIM people.

If you haven’t used Remote Incident Manager yet, you really want to give it a try. It is a fully accessible, screen-reader-agnostic way to either get or provide remote assistance.

It’s not only screen-reader-agnostic. It’s multi-platform, too. You can use it on your Windows or your Mac machine, and people can assist each other regardless of the platform they’re on.

These days, not a day goes by that I’m not using RIM. And one of the ways I use it is to either receive technical support from family members or provide it to them.

I’m kind of the tech support guy in our family, so I quite often get questions from family members that they want me to solve. It’s not realistic to expect them to install a specific screen reader, even the demo. So before RIM came along, I found myself having to try and talk them through what they needed to do.

Now, I can tell them to go to GetRIM.app (that’s G-E-T-R-I-M.app), install a simple application on their Windows PC, and just by exchanging a code word, I can have a look at what’s going on. I can either run Narrator on their system, or if you’re using NVDA, you don’t even have to do that.

It’s an amazing tool, so do check it out. RIM from Pneuma Solutions, available now for your PC or your Mac at GetRIM.app. That’s G-E-T-R-I-M.app.

The Founders of Envision, Karthik Mahadevan and Karthik Kannan, Discuss Envision Today and in the Near Future

It has been a while since we looked at the Envision Smart Glasses. And you may remember that we produced a comprehensive demo of them that lasted about two hours. But that is ancient history, because a lot has changed with the Envision Smart Glasses and indeed, the app. When we did that demo, we hadn’t quite entered this amazing new era of AI. So it is high time that we caught up with what’s happening at Envision.

And to be a founder of Envision, you’ve got to have skill, you’ve got to have a bit of entrepreneurial flair, and you’ve got to be called Karthik. So it’s like the Monty Python Bruces sketch, you know, mind if we call you Karthik to save confusion.

So we’ve got Karthik Kannan and Karthik Mahadevan from Envision. Karthik Mahadevan is on my left, Karthik Kannan is on my right.

Great to have you both here on Living Blindfully. Thank you so much.

Karthik M: Thanks, Jonathan.

Karthik K: Thanks for having us.

Karthik M: Yeah, it’s always exciting to be back on your show. So yeah, super excited to be here again.

Karthik K: And one really interesting thing to note is that this is actually the first time that we’re recording a podcast together, you know. So KM and I haven’t really done a podcast together before, as far as I remember. So this is a very nice Mosen exclusive. So yeah.

Jonathan: I’m really honored that you chose Living Blindfully to do that. This is fun.

It might be worth us talking a little bit, for those who don’t know it, about the origin story of Envision, since we have you both on the show. It started, I understand, as a master’s thesis project, right? You didn’t actually have in mind the idea that this would become the commercial entity that it has. Karthik Mahadevan?

Karthik M: No, not at all. The whole idea started when I was still a student. I was studying at the Delft University of Technology here in the Netherlands. It was towards the end of my master’s, and I was looking to do a project as my thesis. I happened to be back in India for my winter holidays at that time, and I was invited by a school for the blind in India to give a talk about what it means to be a designer, right? And what job opportunities students could have in the future.

So I was in this room talking to a bunch of blind and low vision students. And I was just explaining to them exactly what a designer does. And I told them all that, “Hey, a designer is just somebody who solves a problem. And if you can design a solution to a problem that you have, all of you here can be a designer.”

And I remember towards the end of my talk, I asked them a question. I said, “Hey, if all of you were to become a designer tomorrow, which problems are all of you going to solve?” And all the kids in the room that day were like, “I want to be more independent.”, right? “I want to be able to go to school on my own.” “I want to be able to hang out with my friends.” “I want to be able to just pick up and read a book all by myself.”

And that is something that really stuck with me, that whole incident and that whole experience, that I came back to the university and I spoke to a professor of mine and I said, “Hey, this is something I want to do as a thesis. This is something I want to explore as a thesis.”

And in the beginning, that’s all it was. It was a research endeavor. So I was just going around and speaking to as many people who are blind or have low vision, just to understand what is independence for them. What do they mean when they say the word independence?

And I understood from conversations that for a lot of them, independence almost always meant access to information. And because of the fact that we’ve built this world in a visual way, their inability to access that visual information is the one that is causing them dependencies in their life.

So as soon as I got that insight, I was like, “Hey, this is something I really want to understand how to solve.”, because it’s pretty difficult to change the world. It’s pretty difficult to change all the visual information around us into accessible information. You cannot go on putting a Braille sticker on everything that’s around you.

So I started to take a look at how can the technologies of today, like artificial intelligence, image recognition, how can all of these be used to make sense of the visual world as is, and have it be translated in a way that can become accessible to everybody? That’s when I called up the other Karthik, who I knew from back in my undergraduate days. We actually happened to be roommates in the dorms there. So I called him up because he was a software engineer, and he was doing a lot of stuff around AI. And I said, “Hey, I want to do this as a thesis. Do you want to perhaps help me out with the AI stuff that you have been working on?”

And then, yeah, just the two of us started spending our evenings trying to explore what’s possible with AI. The very early versions of stuff we did was very fundamental and rudimentary. We did very naive object recognition to begin with.

But then, we were able to offer that very early prototypes to a group of users that I had built here. And then based on that feedback, we were able to very iteratively improve.

Then it just so happened that by the end of my thesis, we had built a prototype of the Envision application, which when we started to show it on to people, they got very excited about it and they said, “Hey, you know what? This is actually a useful application for us, something that we would want to use every day, and something that we are okay to pay for if there’s a need to pay for it.”

I think it was at that point we started to think that, “Hey, we have built something here that is actually adding value to people, and we need to ensure that this technology can actually be out in the hands of all the blind and low vision users in the world.”, because that’s the intention. We need to build a business around it that can be sustainable and scalable.

And then, that’s how the transition from a thesis or a project to a startup started to begin.

Jonathan: I mean, I remember when we spoke with Karthik Kannan back in The Blind Side days. The product has steadily evolved.

I wanted to ask about the current state of the hardware because this is something that comes up on social media a little bit. Google has kind of gone lukewarm on its glasses now. What’s the future looking like? Are you considering some new hardware for Envision to thrive in the future?

Karthik M: Yes. There are multiple hardware options that are opening up. The whole wearable devices space has been evolving at such a rapid rate just in the past few months. It’s actually really hard to keep up with the announcements that are starting to come out.

We are betting on the fact that in the near future, there’ll be a lot of wearable camera devices out there. They will come in all shapes and sizes. They will come at all different price points.

We are trying to work towards a future where we can build our software in a way that it can be applied to as many different hardware devices out there as possible. We will definitely offer people a combined solution like we do as of now with the Envision glasses, where we have the Envision software on the Google Glass. But as more hardware devices start to emerge, our goal is to be able to offer as many different hardware solutions as possible.

So the Google Glass, at the time that we launched Envision on the glasses, it was the best smart glasses in the market. But since then, there has been a lot more development in the market, with folks like Meta coming in with Ray-Ban, folks like Humane coming up with the AI pin, and with Apple also entering into the whole realm of wearables, it’s not too far to extrapolate that they will probably have smart glasses at some point in the near future. So that will be our goal, to be on as many different smart glasses or wearable devices as possible.

Jonathan: Alright. So you envisage Envision more as a platform that’s kind of agnostic and capable of running on whatever hardware is available. What does that mean for existing users of the product that you have now? And how well will that be supported into the future? I guess there must come a point where the capabilities that you want to add may outstrip the capabilities of the current hardware.

Karthik M: Yeah, I think it will always be supported for as long as we can. We have always done that, even with our applications. We support the very early Android versions. And even on iOS, we go way back into very early iOS as well. So we do our best to ensure that our software is as backward compatible as possible. And we do have a roadmap of constantly having updates to our software that will be compatible with the hardware that we have been currently offering with Envision glasses.

Because the core of it is just having a camera, and then all the processing that we’re doing on it is basically what is happening on a software end. So the hardware doesn’t necessarily need to upgrade as much, right? Once you have an 8 megapixel camera, if you keep adding more megapixels to it, after a point, it becomes redundant. So it’s not that having a better camera will do that much of a significant difference to the accuracy or the quality of the output that you’re receiving.

So at least our plan is to support all of the hardware and all of the users that are out there for as far as we can. And as we evolve into becoming more software focused in the future, we will still continue to offer software updates to all the users who have Envision glasses.

Jonathan: Karthik Kannan, if I can come to you. How does the division of labor work in Envision? What do you do? What does Karthik M do? How do you kind of divide who does what?

Karthik K: Well, that’s a very interesting question because you know, it’s a question that we’ve been figuring out for quite some time. But I think, honestly speaking, I spend a lot of my time primarily focused on the product, the engineering, and the technology side of things, right? And if I had to put it in a nutshell, I think I spend a lot of my time just thinking about how can we take the advances that we currently have in AI and convert them into products, or convert them into features that benefit people who are blind or low vision, right?

And I think KM spends a lot of his time on the general strategy, with investors, and just generally making sure that you know, whatever we build can be put out there and can be sold, can make an impact on people, and also put a lot of time into the design side of things.

So initially, when we started building Envision, it was primarily me building, or me doing or writing all the code, and him taking over the design, right? And I think it’s, by and large, remained somewhat similar, though the team has grown around us quite a bit. So yeah, it’s very complementary skills in the way we build, the way we think, and the way we look at the same product, right? And it’s nice to have that kind of balance between someone who’s thinking about technology and someone who’s thinking about design.

Jonathan: Business is tough, right? And there are difficult decisions that have to be made. Do you still get on well? You’re both still talking to each other and good friends?

Karthik K: Yeah, I would definitely say so. I think fundamentally, it boils down to the fact that we are incredibly aligned on what kind of company we want to build, and what kind of product we want to build, right? I think when two people have that kind of an alignment, it becomes very easy to confront issues that come along the way, right?

And I also believe that since we first started out as friends, and since Envision started out as a fun thing, it wasn’t really meant to be a company by any means. For him, it was a master’s thesis. For me, it was a sabbatical from a really bad job. And I just thought, OK, you know what, let me just spend the sabbatical building this thing with Karthik. And it just snowballed eventually into a company. So the ethos has always been building something that has an impact, for fun.

And to be honest, Envision wasn’t the first thing that we built together. We had our own student startup together back in college. And even after college, we actually built and launched a bunch of things just for fun. And I think the primary element has always been the same. And we’re kind of aligned on what we want to do.

So despite differences, I think we understand each other that hey, this is ultimately work, and all of the arguments and the disagreements are all in service of building something that has an impact.

Jonathan: This is really interesting because I’m sitting here, and I had an epiphany. There are so many similarities between what you’re doing at Envision and how it got started, and what happened with Apple. Because you’ve got 2 people who essentially started a project that grew very big.

In the case of Apple, they were both called Steve. [laughs] And here we have Envision and you’re both called Karthik. So it’s quite interesting. Maybe that bodes well.

But in terms of the competition that’s available, there seems to be a lot of it all of a sudden. So you guys have established a product category of smart glasses for the blind and low vision, and there’s a lot of interest in it now. Just recently, we were talking to Seleste, which is one of the newer competitors. But there are a couple of others as well. I guess you must feel part flattered, part threatened by that. The market’s heating up a bit. Karthik Mahadevan?

Karthik M: I think we’re just excited by the fact that the markets are heating up.

I remember in our early days, like 5 years ago, when we were just starting out. I think we were filled with a lot more uncertainty. And we used to get very anxious about the competition very easily, right? Like if somebody had a new update out, or somebody launched a new feature, that used to immediately induce a lot of anxiety amongst the two of us.

But I think over the 5 years, we have seen so many competitors sort of come and go.

I think we now feel very secure about what we are doing. So as long as we are focused on our vision and we’re executing on our vision, we’re not really as, I would say, threatened by competition because I think that’s always a good sign that the opportunity is out there. And I think that’s something I always say, that it’s not a zero sum game.

It’s not like a winner-takes-all kind of market. We’re all trying to do the same thing. Be it folks like Seleste or any of these other companies, I think all of us are, in a way, on the same side of the battle. All of us are attempting to make the world a bit more accessible. And all of us are doing it in our own different ways. So we don’t really get as anxious or as distracted about the competition now. I think we are way too focused on our own execution of the vision that we have.

Karthik K: Totally. And another perspective to add to it is when we first started Envision back in 2016 and 2017, Envision was probably the only one app that existed in this space, right?

We were doing AI for accessibility. And then, Seeing AI was doing it. But Seeing AI was also doing it in a limited way.

And then, we launched on Android. And then for the longest time before Lookout came out, Envision was the only such kind of an app out there, which is what gave it a lot of attention and boost.

And I think it’s similarly with the glasses as well. There was just one company doing this before.

And then, we came and we completely, I believe, redefined the space where all of a sudden, we put AI on these glasses, and you could make video calls, and a whole bunch of things with it.

And now, when the competition has finally caught up with putting AI on glasses, and giving people a menu, and being able to scan documents, and doing OCR, I think we have moved on to building something that’s even bigger than what we’ve been able to do with the glasses so far.

And at this point right now, I think we’re completely redefining Envision as a product. We’re in the process of doing that. We’re going to be showing people a sneak peek of that at CSUN as well. So if you’re at CSUN, do drop by our booth.

And I think by the time the competition realizes, “Hey, this is what these guys are building, and this is what they’ve moved on to.”, I think we would have defined a completely new type of product in this space.

And I say this with a reasonable amount of humility – that what’s going to come out of Envision this year will truly define how accessibility tools are built going forward, especially with how much the world has changed over the last 18 months.

Jonathan: Let’s talk about that, and what’s happened over the last 18 months, because the change in the space has been remarkable. And I know, as somebody who has access to a couple of products that leverage the image technology of ChatGPT, it’s quite incredible the amount of information I now have access to, and the way that it’s presented.

So you are on that bandwagon as well. And you’ve got OpenAI’s product integrated into the glasses and more recently, into the app.

I’ll leave it to either of you to determine which one of you wants to answer this. But could we just cover the present state of AI in the Envision product, what it can do, and the difference that you perceive it to have made?

Karthik M: So the way we think of AI in the app and within Envision is 2 things, right? There is traditional computer vision, which is object detection, OCR, which is sort of the foundation. And then, you’ve got the new generative AI tools like GPT, OpenAI’s GPT Vision, and so on. So these are two different types of AI that we use within the Envision platform today, both on the glasses and the app.

And a lot of people believe that using just a ChatGPT API or GPTVision is probably going to solve all problems. Whereas what we believe is we take the best of what is available in traditional computer vision, and complement that with generative AI wherever possible.

So when you, for example, take the Scan Text feature on the Envision glasses, right? It uses an image segmentation model. To get a little technical, it uses a document scanning AI to help you go ahead and scan a document first. Now, that is just traditional computer vision that runs entirely on device and can run in real time.

And once a user scans a document, we use OCR, which is also using just traditional computer vision tools and just a lot of data going into building like the best possible OCR.

And once both of these things are done, we then pass it on to a large language model. In this case, OpenAI’s GPT. And if people want to ask questions based on the text they scanned, they could use that.

So we have a very hybrid approach to using AI, and this is reflected across all the AI features on the Envision glasses and the Envision app.
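
To make that hybrid pipeline a little more concrete, here is a minimal sketch of the kind of flow described above: an on-device segmentation step finds the document, OCR extracts the text, and only then is a large language model asked a question that is constrained to the scanned text. The function names and placeholder model calls here are hypothetical, not Envision’s actual code.

```python
# Minimal sketch of a hybrid "traditional CV + generative AI" pipeline.
# detect_document(), run_ocr(), and ask_llm() are hypothetical placeholders
# standing in for an on-device segmentation model, an OCR engine, and a
# hosted large language model respectively.

def detect_document(image_bytes: bytes) -> bytes:
    """Crop the image to the document the segmentation model found (placeholder)."""
    return image_bytes  # a real implementation would return the cropped region

def run_ocr(cropped_image: bytes) -> str:
    """Return the recognized text from the cropped document (placeholder)."""
    return "Cappuccino ... 4.50\nFlat white ... 4.00"

def ask_llm(prompt: str) -> str:
    """Send the prompt to a large language model and return its reply (placeholder)."""
    return "The cappuccino costs 4.50."

def ask_about_document(image_bytes: bytes, question: str) -> str:
    cropped = detect_document(image_bytes)   # 1. traditional CV: find the document
    text = run_ocr(cropped)                  # 2. traditional CV: accurate OCR
    prompt = (                               # 3. generative AI, grounded in the OCR output
        "Answer the question using ONLY the text below. "
        "If the answer is not in the text, say so.\n\n"
        f"TEXT:\n{text}\n\nQUESTION: {question}"
    )
    return ask_llm(prompt)

print(ask_about_document(b"...image bytes...", "How much is the cappuccino?"))
```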

And what I’m most excited about and where the future is probably heading is we’re gonna see a lot of these large language models eventually be small enough to run on edge devices like a phone, or a laptop, or the glasses even. And the next generation of smart glasses are definitely gonna be capable of running these powerful AIs. And when that happens, I think we’ll be able to build something that can work really fast, completely offline. And that’s also sort of where Envision is heading as a company as well.

Jonathan: There’s a lot of anticipation about what Apple is gonna be doing in iOS 18, and the idea that there will be a large language model actually on device, which is very consistent with the way that Apple operates. They’re very privacy-focused. We may all need new phones to really leverage that because that takes a lot of processor power. But it is interesting.

And I think the holy grail in this space will be when we get to the point that you can be wearing a pair of smart glasses, walking around and getting real time description about things that are happening around you, rather than taking a snapshot in time.

Karthik M: Yeah, totally. I’d say we’re about 18 months away from that kind of technology. I think it’s gonna be either in conjunction with the Envision glasses and the phone working together, or we could have advances enough in this space that we would be able to run these kind of models directly on device.

And I think the open source space is definitely going to come in and make a huge difference there. And Envision is heavily using open source in our own products, both because we believe that open source is on par with closed source AI, and also because it helps us deliver the kind of value that we do at scale, for a lot of people, for free on the Envision app, for example.

Jonathan: How does what you’re offering compare with Be My AI, which I think has reached a lot of people? And obviously, with the app. Your app is free. Their app is free, so I guess it’s not so much a competitive thing. But then, you’ve got the glasses, which is really your value proposition. But can you talk a little bit about how those 2 offerings contrast with one another?

Karthik M: As of now, the things that we do is, we have this feature called Ask Envision. And this Ask Envision is a feature that’s basically embedded inside of different features, right? So it’s not like a singular feature, it’s sort of embedded in different places within the app.

For example, one thing that is really popular is Ask Envision within Scantext, where people are first able to take a picture of something, we do a very accurate OCR of it, and after that, they can ask a question of that document.

And this is a big differentiator, because the one thing that OpenAI’s GPT model really struggles with is hallucinations, especially when it comes to text, right? So if you ask it to scan a document and then ask it a question about what’s on the document, very often, it hallucinates the contents of the document. It does pretty well with just a general description of images and objects. But when it comes to text, the hallucinations are still very, very high. They’ve decreased a lot, but it’s still frequent enough that it’s not entirely reliable.

So Envision’s approach to this is we first scan the text, we do the OCR, which is very, very accurate, and then we ask the GPT to only look for the answer within that piece of scanned text, so that there is a very high level of reliability in the answers that you get from Ask Envision.

Then there is also Ask Envision within a described scene. And over there, what we do is as soon as you take an image, the very first thing that we offer you is a short, sort of alt text sized description of exactly what’s in front of you. So we don’t start off with a heavy big description of everything. We sort of start off with a smallest description. And then, you can straightaway jump into asking a question about it immediately. So that’s the approach that we are taking, which is currently a bit different from anybody else who is also including a GPT in their applications.

But I think like KK alluded to, that the next step that we are attempting with Envision, I think, that’s where the major differentiation is going to come into. And that’s what we’re super excited about announcing at CSUN.

Karthik K: Just to reiterate the point that KM made here, one of the things that I think a lot of people who currently use Be My Eyes, for example, want is to import a document and then read it. But say, for example, it reads only the first page of the text for you, or you can’t read the text and save it locally within the app, and so on.

Whereas with the Ask Envision feature on the Envision app, especially when it comes to scanning text, you can import a PDF, an EPUB, Word file, HTML file, a text file. You could pretty much take any, I would say file format that’s out there. No matter what the size is, no matter what the language of the text is, you’d be able to import that directly into the Envision app. You can go ahead and scan it, get that OCR or the text of the entire document. And you can also ask questions very similar to what you’re able to do with some of the other apps right there.

Even with something like Seeing AI, for example, what tends to happen is people are able to scan maybe one or two documents with the scanning feature, and be able to ask questions only of the document they scanned.

Whereas what we see a lot of people tend to do with the Envision app is, for example, they have a textbook they want to read. They can just dump a 500 page PDF textbook, and then just ask questions. Like, say, for example, summarize the first chapter for me. Or they’re reading a contract or another type of a document, they can just import that directly into the Envision app and be able to read it.

So just being able to read a document with N number of pages, and being able to ask questions, and get it summarized is something that’s a very big differentiator between the apps that use GPT-4 for this, like Be My Eyes or Seeing AI, versus what we’re doing.

Jonathan: I can attest to the effectiveness of this, actually, because I’ve loaded documents in. And you know that the subject matter of the document covers something specific that you’re looking for. And in the old days, you would go and do a find. You’d push whatever, command F or control F, and find the thing that you’re looking for.

But sometimes, you’re not sure about the terminology that’s been used to define the concept that you know is in the document somewhere. And this is where the Ask Envision feature takes care of that because you can ask a very general question, and you don’t have to know the language, the terminology in the document. So it can be very effective in terms of getting access to the information that you want quickly.

Karthik K: Yeah, that’s the aim.

This is the first step. We’re having an update rolling out probably the end of March or April, where you’ll also be able to ask questions of tables, figures, charts, any image that is there within a document. You could go ahead and ask questions of that as well.

So you could be reading, I don’t know, like a scientific paper with a table in it, and you can ask very specific questions about the table that you’re reading. And that is also going to be available for people who are using the glasses and the app as well. Again, a big differentiator from what you might see with other apps that are trying to similarly use GPT as well.

Jonathan: That’s really interesting, because I work with a lot of financial data in my day job, and I can read tables with my screen reader just fine. There’s no problem there, if the table is well-designed. But I think I’m more of a narrative kind of person. And so if that information were presented in a narrative form, for me, it’s more likely to resonate and stay fresh in my mind. So that’s a really interesting thing.

I guess you guys are constantly evaluating large language models.

Recently, I’ve taken to using a computer to pass the same picture through OpenAI and also through Gemini, which is Google’s offering. And the differences in descriptions are quite interesting. I guess the way that I would summarize it right now (and this is being recorded in the middle of February because things can change so quickly [laughs]). But the way that I would summarize it right now is that the OpenAI descriptions seem to be a bit more flowery, a bit more verbose. Whereas the descriptions on Gemini are a bit more factual, a little cleaner. And maybe as a result, they don’t give you the detail.

So do you constantly evaluate the state of these various LLMs, and which one will suit your purposes best?

Karthik K: Yeah, definitely. I think we spend a lot of time.

So internally, we have an AI channel. And I think that blows up 5, 6 times every day. We’re constantly looking at new models. And at some point, it got a little too much because we kept getting really excited about every single advance that was happening in both proprietary AI and the open source world.

And so now, what we do is we just collect all those links and models. And then, we spend an hour or two every week on Friday just going through them and testing them for effectiveness. And we’ve also built a lot of internal data sets that help us check or benchmark these new models, and the changes that are happening to them, against actual real-world data, because these models tend to have really inflated evaluation benchmarks, and so on and so forth.

So yes, we’re constantly in touch with a lot of people in all these major companies. We’ve been in touch with folks who are building Llama, Gemini, even OpenAI as well. We are very actively contributing to and using open source AI inside Envision’s own apps and glasses. So we are very much in sync with everything.

And I do agree with you. I do see a lot of big difference in the way, for example, Gemini gives a description. And even Gemini Pro and Ultra have very different descriptions for the same image than you might see with OpenAI. And there is also a big difference between what I get from the ChatGPT interface, and then the GPT 4 description as well, the API description that I get. So yeah, that’s the reason why we built, over a period of time, an internal data set. And we just run these models against those data sets and just benchmark them.
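
As an illustration of that kind of weekly benchmarking against an internal dataset, here is a minimal sketch of scoring candidate models on a handful of labeled images. The model call, the dataset records, and the simple keyword-based scoring are all hypothetical stand-ins, not Envision’s actual evaluation harness.

```python
# Minimal sketch of benchmarking candidate vision-language models against an
# internal dataset. describe_image() and the dataset records are hypothetical
# placeholders; a real harness would call each model's API and use richer metrics.

from dataclasses import dataclass

@dataclass
class Example:
    image_path: str
    expected_keywords: list  # facts a good description must mention

def describe_image(model_name: str, image_path: str) -> str:
    """Placeholder: call the candidate model and return its description."""
    return "A laptop on a wooden table next to a cup of coffee."

def score(description: str, expected_keywords: list) -> float:
    hits = sum(1 for kw in expected_keywords if kw.lower() in description.lower())
    return hits / len(expected_keywords)

def benchmark(model_name: str, dataset: list) -> float:
    return sum(score(describe_image(model_name, ex.image_path), ex.expected_keywords)
               for ex in dataset) / len(dataset)

dataset = [Example("desk.jpg", ["laptop", "table"]),
           Example("kitchen.jpg", ["kettle", "counter"])]

for model in ["candidate-open-source-model", "hosted-model"]:
    print(model, round(benchmark(model, dataset), 2))
```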

And I think KM and the design team also have a way of working with these models as well.

Jonathan: As I understand it, you’ve gone all in on text descriptions. So you can take a picture of a document. You can do some really impressive things with that. But I don’t think you’ve integrated AI, at least in the app, with scene descriptions. Is that correct? And if so, is there a deliberate decision that you’ve taken there?

Karthik M: It’s coming. So we wanted to start with using Ask Envision for scanning documents, for documents first, because that was a big ask from a lot of people. And with adding the same feature over to Describe Scene, we’re definitely working on it. We currently have a private beta as we speak. And it’s going to be out as well.

So what essentially people could do now is when they hit the Describe Scene button, they can either ask for a concise description or a detailed description. And once they get a description, they can just go ahead and ask back and forth questions. And yes, it is faster than the ChatGPT app.

Jonathan: Alright, that’ll be interesting to see.

Hallucinations. I mean, I love this stuff and the difference that it’s made. I’ve gone through holiday snaps that I’ve taken years ago and got descriptions that are just incredible. And because I was there, I can pretty much validate that what it’s saying is true.

I have also taken pictures of other things, like a remote control. And ChatGPT has told me that I’m using a completely different remote control from the one I actually am.

This really troubles me as a blind person about when I can trust this stuff and when I can’t. And as a totally blind person, I really have no idea. So it’s important to be somewhat skeptical about the information I’m getting back. And as producers, that must be a real concern to you both.

Karthik M: I think that is definitely one of the cons that comes along with all the pros of this technology: the aspect of hallucinations. And it is something that we definitely take very seriously.

I think the very first thing to do is just to spread awareness of it, so that we let people know that hey, you know, at the end of the day, it is still an AI doing the recognition for you. And even though it’s becoming more and more accurate, still sort of verify the stuff that it says because there is still the issue of hallucination.

I remember that that is something that’s being constantly worked on. So it is decreasing a lot.

I remember just 6 months ago, when we were doing a lot of tests with OpenAI’s GPT, the hallucinations were incredibly annoying and frequent. But now, the hallucinations have gone down significantly, at least as far as objects are concerned, right? In the earlier days, it would sort of hallucinate objects that are not there. So if you took a picture of a living room, for some reason, it would always add a clock on the wall of the living room, because it always hallucinated there being a clock in a living room. So I think at least as far as the objects part is concerned, the hallucinations have significantly decreased, but there’s still a lot of hallucination when it comes to text.

And that’s the approach that Envision is taking is sort of using it in addition to scan text so that the OCR can always verify whether the information in the text is actually right or not.

But then, in addition to that, the hallucination is a technical problem. So there is a lot of progress that’s happening from a technical standpoint where people are developing stuff like RAG to sort of ground the answers that you get from these LLMs. Maybe KK can talk a bit more from a technical standpoint how he’s tackling the hallucinations.

Karthik K: One of the things we do, Karthik mentioned earlier as well, we try to ground the results. I think the big silver bullet that the industry is kind of chasing right now is how can we ground the results that we get.

For example, take Describe Scene. Right now, we don’t use OpenAI’s GPT Vision feature for that on the app. We use our own fine-tuned open source AI that we’ve been using on the app.

And the reason for that is we wanted to try a slightly different type of approach, where when this particular AI gives us a caption, we try to force the AI to pinpoint exactly where in the image the particular object it’s talking about is. So if it says it looks like there is a laptop on a table, we literally ask this particular model, “Okay. Draw a box around the table, draw a box around the laptop.”, right?

And in that way, this particular variant of model where you ask it a question, and then you force it to draw boxes around the objects in the image actually helps reduce hallucinations a lot.

It is quite heavily rumored that the next version of the large visual language models from OpenAI and Gemini, and the Llama 3 that is coming out from Meta, are all going to use a similar technique in order to ground the results they get more in reality. Of course, this doesn’t solve all the problems. This is very cutting-edge AI that can still make mistakes, that can still hallucinate objects and draw boxes around things that are not there. But compared to what you might see with a traditional vision language model, which tends to make up stuff a lot, this type of approach, where we ground the results in the image, doesn’t.
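
Here is a minimal sketch of that grounding idea: every object the model mentions must come with a bounding box, and anything it cannot place inside the frame is dropped from the spoken description. The caption_with_boxes() call is a hypothetical placeholder for a grounding-capable vision-language model, not any specific product’s API.

```python
# Minimal sketch of "grounded" captioning: the model must return a bounding box
# for every object it mentions, and objects it cannot localize inside the image
# are dropped from the description. caption_with_boxes() is a hypothetical
# placeholder for a grounding-capable vision-language model.

def caption_with_boxes(image_bytes: bytes) -> list:
    """Placeholder: return (object, box) pairs, box as (x1, y1, x2, y2) in pixels."""
    return [("laptop", (120, 200, 480, 420)),
            ("clock", None)]  # the model mentioned a clock but could not localize it

def grounded_description(image_bytes: bytes, width: int, height: int) -> str:
    kept = []
    for obj, box in caption_with_boxes(image_bytes):
        if box is None:
            continue                                 # no box at all: likely hallucinated
        x1, y1, x2, y2 = box
        if 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height:
            kept.append(obj)                         # box lies inside the frame: keep it
    return "I can see: " + ", ".join(kept) if kept else "Nothing I can confidently identify."

print(grounded_description(b"...image bytes...", width=1280, height=720))
```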

And another approach, for example, Karthik mentioned this thing called RAG (Retrieval Augmented Generation), where the main idea is just like how you ask a human to cite their sources, you also ask the AI to cite the sources.

So we don’t show this to the users yet, but what happens behind the scenes is that if you ask a very specific question about something in a big document, the AI also points to the specific paragraphs and pages that it’s actually referring to when it’s giving you a reply. That is something that we will eventually bring to users as well, so they can go ahead and check how accurate the response is compared to what the large language model is saying.

This is still not perfect. This is still a big area of research. But grounding is one aspect and I think generally, as these models tend to get fed more data, tend to get more human feedback, they do tend to become less hallucinatory over time.

And that’s basically how OpenAI has solved the problem. They just had a lot of people giving a thumbs up or a thumbs down, and they’re using ChatGPT. And that eventually got fed into the training process of subsequent versions of these models.

And that’s also another approach the industry is taking, but it’s an approach that requires a lot of users to participate at once, and that’s something that only companies like OpenAI or Google will be able to do. Whereas the grounding idea is something the open source world is tackling as well.
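
To illustrate the retrieval augmented generation idea mentioned above, here is a minimal sketch: split the document into paragraphs, pick the ones most relevant to the question, and pass only those to the model along with a request to cite them. A real system would use vector embeddings rather than simple word overlap, and ask_llm() is a hypothetical placeholder.

```python
# Minimal sketch of retrieval-augmented generation (RAG) over a long document.
# Word overlap stands in for embedding-based retrieval so the example stays
# self-contained. ask_llm() is a hypothetical placeholder for a hosted model.

def split_into_paragraphs(document: str) -> list:
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def relevance(paragraph: str, question: str) -> int:
    q_words = set(question.lower().split())
    return sum(1 for w in paragraph.lower().split() if w in q_words)

def ask_llm(prompt: str) -> str:
    return "The notice period is 30 days (see cited paragraph)."  # placeholder reply

def answer_with_citations(document: str, question: str, top_k: int = 2):
    paragraphs = split_into_paragraphs(document)
    ranked = sorted(range(len(paragraphs)),
                    key=lambda i: relevance(paragraphs[i], question),
                    reverse=True)[:top_k]
    context = "\n\n".join(f"[{i + 1}] {paragraphs[i]}" for i in ranked)
    prompt = ("Answer using only the numbered paragraphs below and cite them.\n\n"
              f"{context}\n\nQuestion: {question}")
    return ask_llm(prompt), [i + 1 for i in ranked]   # reply plus cited paragraph numbers

doc = "Clause 1. ...\n\nClause 2. Either party may terminate with 30 days notice.\n\nClause 3. ..."
reply, cited = answer_with_citations(doc, "What is the notice period for termination?")
print(reply, "cited paragraphs:", cited)
```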

Jonathan: Yeah. I noticed technologies like Perplexity and even Microsoft Copilot are citing sources, and there are hyperlinks. Actually, I can see some value in there being some sort of link to those sources when you’re reading a document. So if I’m trying to find a concept in the Envision app, and the AI has told me what I want to know, I may well want to click through if there’s a particular paragraph or two in the document that are relevant.

Karthik M: Yeah. That is an update that we’re going to be pushing out shortly, where you’ll be able to click on specific paragraphs or information, and then jump right into that page. We just have to figure out the UX around it, which is more important than the engineering side of things, to be honest, in this case. So we’ll do that. And when we do that, I think people will be able to know exactly what the AI is talking about.

Jonathan: Do people need to be cautious about what they send Envision and put through the Ask Envision service? For example, if you are in an environment where there’s a highly commercially sensitive document, is that something you should not put into Envision and use the AI to summarize for you?

Karthik K: So when people install the Envision app for the first time, we clearly ask them consent to actually use the data to improve our services. And people have the option to always opt out of them if needed, right? So if they are using the Envision app in a sensitive environment and they don’t want Envision, even for a very brief period of time, to store the data and then use it to improve our services, we always go ahead and give people the option to opt out.

Having said that, we don’t store any of the chats that people are having right now on our servers. We store them all on device. So going forward, we’ll continue to store the chats on device, because those chats aren’t that big to store, and people can always go ahead and clear their data from Envision if required, yeah.

Jonathan: But a document must have to go to the cloud to be analyzed, right?

Karthik K: Correct, correct. And then there, what we do is if people have given us consent, we briefly store the document and extract the data from the document and use that to improve our models. If people have opted out, then we just don’t store the data at all. We don’t actually look at the data in itself.

Jonathan: So it’s not that any commercially sensitive document might be helping to make the AI more intelligent?

Karthik K: No. And we don’t need that as well, because over time, Envision has moved more towards using synthetic data for training than actual user data itself, right? User data is considered a seed for generating synthetic data, because Envision is a startup. We don’t have the kind of resources and the kind of data that Google or Microsoft has. And when we don’t have access to that level of data and we still need to train models from the very beginning, I think for the last 3, 4 years, we’ve been relying more on synthetic data. And with how good image generation has gotten right now, it’s possible for us to train a model largely based on synthetic data itself.

In fact, that’s the approach that a lot of companies have taken, including OpenAI. So they’ve been training their models on synthetic data. And I think over the next 2, 3 years, I don’t see any need for any company to store a lot of user data because they could take a small portion of user data, and then use that to generate vast amounts of synthetic data, which can be used to train the model on.

Jonathan: One of the use cases that you guys worked out very early was that it’s handy as a blind person to be able to find particular people that you might be looking for. So you can either go and say hello or alternatively, avoid them. [laughs] Both relevant use cases, really.

But there’s been a bit of controversy in the AI community about this. And we got a lot of feedback. I think it was mid last year, when people got their hands on Be My AI. And for a brief period there, you were getting messages back when you took a picture involving a person that said this person’s face has been obscured for privacy reasons. And I think what the community struggles with is there’s a difference between just describing someone’s facial features in general, and actually identifying them by name, because that obviously requires some sort of facial recognition and facial matching, and would have to be consensual and require opt-in.

Is this something that you think about in terms of how AI can ethically be used in the facial recognition features that you already have?

Karthik M: Yeah. I think it’s always about getting the consent, right? And that’s why, the way that Envision does facial recognition, you always have to take the pictures of the people with the app on your phone if you want Envision’s AI to recognize them. That’s why we don’t allow people to just upload images or do stuff like that and have those faces trained. So that is a safety measure to make sure that you are actually doing it with the consent of the people whose faces are being trained. And even in that case, the stuff that is stored in order to do the training is never really the images or the faces of people. It’s more abstract data, more a metadata of the facial features, that’s being stored. And that’s mainly what we do the facial recognition on.

I think the whole issue that happened with OpenAI, where they just stopped recognizing faces, had more to do with them just collecting images of every face. And the way OpenAI operates is a bit of a black box, because it’s really unclear as to what OpenAI is doing with these images once they’ve been collected. I think that’s what sort of causes a bit more of a concern, because people at that point didn’t want their faces to end up in some sort of retained data online. So I think it is definitely something to think through a bit.

That’s why the approach that Envision has always had is to just always ask for consent, and to have opt-in as the measure instead of asking people to opt out. So I think those are a few things that we do. But yeah, this is definitely stuff that we will encounter a lot more of, I believe, as this AI becomes a lot more prevalent in our everyday life.

Jonathan: So we’ve talked a lot about the philosophy of AI. And along the way, we’ve covered some of the features that have been added to Envision, or that are coming. I’m really keen to just dig deeper into this and find out about if there are any other features that you think are of note that have been added recently, or things that are coming that you’re gonna be profiling at CSUN that you can tell us about.

Karthik M: Yeah. I think we would definitely want to hint a bit at this new experiment that we’re doing, and that’s really what we have been excited about, actually, for a while. We’ve been sort of rethinking the whole Envision application, just zooming out and understanding exactly what it is that Envision is trying to do. And we had a few observations, right?

Like, we observed that Envision has evolved into this application which basically has a series of different features and functionality. There’s Instant Text, Scan Text, there’s Find Objects, Find People. So it sort of has a linear menu of different features, and people just have to pick from the menu which function they want to open. And that’s basically the interaction that is there, not just in the Envision app, but in any app out there that is doing this kind of visual recognition stuff for accessibility. So that has been the default interface for a while across all of these apps, like Seeing AI, or Lookout, or anyone else who’s doing this.

So what we have been thinking through is sort of re-imagining exactly what the goal is. And the goal, as we see it, has always been offering people access to information in the easiest way possible, in the fastest time possible. That’s actually the thing that we’re optimizing for. And maybe the interface that we have as of now is not the most ideal interface for that, where you need to open your phone, unlock it, open the app, then look for the feature that you want, and then actually have that do the task for you.

What we are trying to evolve into is a bit more to go from just being an application, to being an assistant that you can just interact with, and just straight away have the answers to stuff that you’re looking for spoken out to you.

It’s basically built around 3 core principles that we’re trying to adopt as we start experimenting with building this assistant.

The first core principle is that we want this assistant to be a conversational one, where we do want to abstract away all the AI, all the processing, all the technology that we spoke about into the background. We want to just offer people a clean, simple conversational interface where you simply ask a question, and then we figure out which AI model that we need to invoke to offer you the answer to that question. And we do that, and we offer you the answer to that.

For example, if you walk into a cafe, you can simply ask what they say on the blackboard. And then, what Envision will do is it will take an image, it will do a segmentation, it will do a cropping of the blackboard, and then do an OCR on it, and just pick out the output. But it can also go a step ahead where you can just simply start with asking how much is the cappuccino, and then Envision will again do a similar process. But this time, just look for the answer to a question that you’re asking.

So just having a conversational interface and abstracting everything away into the background is going to offer people not only the easiest way to get answers to what they’re looking for, but also the fastest.

Jonathan: Right, because the past segmentation, I guess, within the app and within the Glasses UI is all about the technology rather than the user, isn’t it? So now, you’ve got to the point where you can be more user-centric and say look, the user doesn’t actually care about these arbitrary definitions. Users just want these things to magically happen.

Karthik M: Exactly. Because for a user, it’s about doing a job, right? You have a problem. You want to know what does it say on the menu, or you want to know how much is the ice cream in the menu, right? That’s all you need. You don’t have to think, “Hey, should I open instant text? Should I open scan text?”, and all of those things. You just need to be able to think of the question, and just have the answer offered to you. And that’s what hopefully, eventually we’ll have.

But as of now, you can just speak out the question and we will do the heavy lifting of understanding which AI to invoke, and do all of that, and offer you the output. So the only thing that you have is a conversational interface. So that’s the first principle of the system that we’re trying to build.
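
Here is a minimal sketch of that "figure out which AI to invoke" step: a question comes in, a router picks an underlying capability, and the assistant answers from whatever that capability returns. The keyword router and the two tool functions are hypothetical placeholders; a production assistant would more likely let a language model choose the tool.

```python
# Minimal sketch of routing a user's question to the right underlying AI "tool"
# before answering. The tool functions and the keyword router are hypothetical
# placeholders, not Envision's actual implementation.

def read_text_tool(image_bytes: bytes) -> str:
    return "Cappuccino 4.50, Flat white 4.00"        # placeholder OCR result

def describe_scene_tool(image_bytes: bytes) -> str:
    return "A cafe counter with a blackboard menu."   # placeholder scene description

def route(question: str):
    q = question.lower()
    if any(word in q for word in ("say", "read", "price", "how much", "menu")):
        return read_text_tool
    return describe_scene_tool

def assistant_answer(question: str, image_bytes: bytes) -> str:
    tool = route(question)        # pick which capability to invoke
    evidence = tool(image_bytes)  # run it behind the scenes
    return evidence               # a real assistant would phrase this conversationally

print(assistant_answer("How much is the cappuccino?", b"...image bytes..."))
```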

The second principle of the system that we’re trying to build is that we want this assistant to be a personal assistant, right? And we want it to be personal in 2 ways, essentially. We want it to be personal in the sense that it’s actually able to offer you personalized insights, in addition to just the information that’s out there. So you can offer this assistant a few basic pieces of information about yourself. What’s your name? What do you do? What are the things that you like? What are the things that you don’t like? What kind of food do you like? What kind of food don’t you like? And it can, over time, build a knowledge base of who you are as a person. And then, when it offers you answers, it will incorporate that stuff about you in the answers that it offers you.

Jonathan: Okay. This has really piqued my interest. So I’ve got to interrupt you there, because this is very interesting.

I eat strictly keto, very low carb. And sometimes, going into a restaurant or a cafe, it can be quite difficult to explain to someone who doesn’t understand what keto means that I only want to be told about the keto options. So this is very exciting to me, because the AI definitely knows what keto is. I mean, all the large language models understand what foods are keto and what foods are not.

So if that preference can be recorded somewhere, then let’s go back to that cafe analogy, and I walk into the cafe. I don’t want to be told about the ice cream, and the cake, and all the things that are going to take me out of ketosis. I just want to be told what items are on that menu that suit my eating style. So this sounds very exciting.

Karthik M: Yeah, exactly. And then it’s actually offering you personalized information, right? So instead of just doing the OCR and picking out everything that’s on the menu, it can actually speak out the stuff that you have a preference for. So that’s the personalized aspect that we are building. And eventually, we want it to grow. So eventually, the more information that you offer, the more personalized it can actually make the output.
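
A minimal sketch of that kind of personalization, using the keto example above: a stored user profile is injected into the prompt so that only matching menu items are read out. The profile fields and ask_llm() are hypothetical placeholders, not Envision’s actual design.

```python
# Minimal sketch of personalizing an answer with a stored user profile, along the
# lines of the keto example above. ask_llm() is a hypothetical placeholder.

def ask_llm(prompt: str) -> str:
    return "Keto-friendly options: grilled halloumi salad, bacon and eggs."  # placeholder

USER_PROFILE = {
    "name": "Jonathan",
    "diet": "strict keto, very low carb",
    "verbosity": "brief",
}

def personalized_menu_answer(menu_text: str, question: str, profile: dict) -> str:
    prompt = (
        f"The user's dietary preference is: {profile['diet']}. "
        f"Keep the answer {profile['verbosity']}, and only mention menu items "
        "that fit that preference.\n\n"
        f"MENU:\n{menu_text}\n\nQUESTION: {question}"
    )
    return ask_llm(prompt)

menu = "Chocolate cake 6.00\nGrilled halloumi salad 12.00\nBacon and eggs 10.50\nIce cream 5.00"
print(personalized_menu_answer(menu, "What can I eat here?", USER_PROFILE))
```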

But the second aspect of making it a personal assistant is that we also want to offer the assistant a personality, right? Where you’re not just talking to a TTS or a machine. You can actually build a personality for the assistant. You can offer it a name, you can offer it a tone of voice. You can offer it a level of verbosity that it should have in the answers that it gives. You can pick a voice that is suitable for this assistant. And you can also add stuff like whether it should be sassy, whether it should have a sense of humor. Anything and everything that you like, you can define as a personality for this assistant.

For example, the assistant that I have right now in my beta actually has the personality of Alfred from the Batman comics. So it always addresses me as if I’m Batman, and it has that personality of an English butler as it speaks to me.

But Alijan, who is one of our engineers, has set the personality of his assistant to be a sage, which is like a wise old man, almost like a Gandalf-like figure who offers him insights as he asks it questions.

So that’s the other element that we’re adding to it, where we do want to make these interactions a lot more enjoyable and a bit more personal, so it feels like you’re actually interacting and building a relationship with this assistant over time, instead of just having to talk to a machine and just have the answers being offered to you.

Jonathan: And it has to be responsive, too. I mean, if you’re sitting there waiting for something to be looked up on the internet before you get an answer, when you’re in an environment like that, those delays are going to be quite grating, I think.

Karthik M: Yeah, a hundred percent. And that’s actually the part that we’re optimizing for, as we speak. So we have an alpha of the system that is working right now, and I’ll probably do a demo of that in a bit.

But the thing that we’re working on is increasing the speed. And we actually had an incredible breakthrough on that front today, where our engineers have been able to crack a way to increase the speed of responses significantly.

I think, KK, correct me if I’m wrong, but I think we got a response back now in under 200 milliseconds. That’s what we’re able to do right now.

Jonathan: Wow! Okay.

Karthik K: Yeah, that’s 200 milliseconds. That’s from asking a question, to us picking the right tool to answer that question, to getting you the answers, and then actually giving you a response, so that by the time you’re done with your question fully, there is an answer waiting for you and you can just have a back and forth conversation like the one you and I are having. So yes, 200 milliseconds is the holy grail at Envision right now. And we’ve been able to hit that number, and we will be able to consistently hit that number no matter whether you ask it a question via text, or you give an image as an input. And very soon, you’ll also be able to record short videos with the Envision glasses, the Envision app, or the Envision Assistant, and have it interpret videos for you as well. So that’s also something that we’ll have with the Assistant when it’s coming out.

Karthik M: The other principle of this assistant is that it’s going to be a ubiquitous assistant, which means that it will be available to you across all of your devices. So this assistant will live on your phone, on your laptop, on the web, everywhere, and even on all of the smart glasses that are going to be out there. We want this assistant to be available to you.

So that’s something we’re also going to be putting our efforts into from day 1, to ensure that this is a device-agnostic, ubiquitous assistant, so you don’t ever have to be without an assistant, and it’s not going to be something that’s confined by your hardware.

Jonathan: Right, so does this mean that there’s going to be a Windows Envision app?

Karthik M: 100%.

Jonathan: When do we get that?

Karthik K: Windows, Mac. Hopefully, in the month of April, at least a V1 of it.

So we’re initially going to start this off as a browser extension, so you can have it installed on your Chrome, Safari, or Firefox, so that it lives on the web. And then, we’ll also roll out a desktop version that will work across Windows, Mac, and Linux. So yeah, very soon.

Jonathan: Once we get the browser extension then, that means that you would be able to take a document, say a Microsoft Word document, I presume, and anything else, and put it through the Envision browser extension and ask all the same questions that we currently can do with documents in the Envision library. Am I understanding that right?

Karthik K: That is one of the things that it’s going to do. It can also summarize entire webpages for you. It can also help you navigate across webpages as well, because I know sometimes, when you go on a particular website, it’s kind of annoying because it breaks your screen reader, and you could basically ask for a summary of a webpage. And if you want to navigate to a particular button, or like a login button, or if you want to know what a particular element on a webpage is, if it doesn’t have alt text, or if it’s not accessible, then the assistant can just be summoned and you can have the assistant basically guide you through accessing a webpage.

And eventually, you’ll also be able to fill out forms, because who likes to do that on the internet anyway? You could just have the Envision app or the Envision extension suggest options and fill out forms for you if it’s inaccessible.

So yes, a lot of that is what’s going to be happening on the web as well.

Jonathan: When you first started rolling out some of the AI features, I think I’m right in saying they were glasses-specific, and I thought to myself, I completely understand that, because there’s a lot of development time that must have gone into making them happen.

But in recent times, there’s a lot of goodness that is coming into the Envision app, which is now free to the user.

I’m interested in the business case for that. Are you hoping, essentially, that people will just love what you’re doing in the app (and now, the browser extension) so much that they will say, look, I want this hands-free, and therefore you’re happy for this technology to be essentially a loss leader for the glasses? Is that how it works?

Karthik M: That is an aspect that we’ll have to constantly evolve the business model around. As of now, the business model is that we make our revenue from the glasses, which is a paid device, and the application is something that we’re offering for free. And that’s also a reason why every big AI development hits the glasses first, because there is a cost attached to it. So we can offer stuff on the glasses first without incurring huge API costs. And then, we have the time to optimize our API usage to a point where we can also deploy it on the free app.

So ultimately, our goal is to be able to offer our solutions to as many people as possible. Accessibility is key. And of course, when it’s a free app, the accessibility is the highest.

So the business model has to constantly evolve as the landscape is going to continue to evolve. So as of now, the business model is to make revenue from the wearables that we sell.

But eventually, as we do go towards more of a device-agnostic pathway, we will have to also be more creative about the business model that we adopt at that point.

Jonathan: And can we talk about the price of the glasses? Because I believe there has now been a subscription model introduced, at least for some markets. I’m not sure if that is global. What’s pricing like for the glasses right now?

Karthik M: The glasses start retailing at $1,899 US. That is the price point it starts at, and it goes all the way up to $3,500 US. We have the home edition of the Envision glasses, which is the one that is the most purchased, which is retailing at $2,499 US.

We are experimenting with a subscription edition for the US market at the moment. It is a pilot that we’re doing with the help of a company that is facilitating the subscription, like payments for the hardware. So it is an experiment that we’re doing, to see if that is a business model that could make the glasses accessible to even more people. So yeah, we are still analyzing and just seeing if a subscription model is something that will be able to help us achieve that.

Jonathan: And people can check the Envision website for further details on all of that. And of course, you’re available for iOS and Android. So you can download the apps there and have a play. It is free to do that. And soon, you’ll be able to do the same with the browser extension.

Now, Karthik M, you tantalized us and said we might have the chance of hearing a demo of the new assistant. Can you do that?

Karthik M: Cool! So in my hand here, I have a menu from a restaurant that I went to recently. It is a German restaurant. So the menu is in German.

I’m going to quickly take a picture of this with my app, and I’m going to ask Alfred, who is the assistant that I have, to make me a recommendation.

Hey, Alfred. Can you recommend me what I should eat?

So Alfred is going to have a bit of a think. This is still the pre-optimization alpha that we have, so in this one, the processing time is still a bit slow. But it’s improved in the next version.

Alfred: Master Karthik, as you are looking to increase your protein intake while reducing sugars, I might suggest opting for a salad with a lean protein source: The Palo Salad, hash 26, which includes chicken breast, would be a suitable choice. It should align with your nutritional preferences for higher protein and lower sugar. Pair it with the house-made dressing, as indicated, for potentially less added sugar than commercial dressings.

Enjoy your meal, sir.

Jonathan: That’s pretty impressive. [laughs]

Karthik M: So if you were able to teach your assistant that you are following keto as a diet, then it will take that into account, and it will offer you recommendations on the basis of that.

So that is actually what we’re trying to build. And we will be starting to hand out invites to a private beta at CSUN, so do come to our booth and find us if you do want access to a private beta, so that we can also offer you your own assistant as quickly as possible.

Jonathan: Now, did you say that that was originally in German, that restaurant menu?

Karthik M: Yes, it was. So it automatically does a translation and just speaks out the contents to me. I don’t even have to ask it to do a translation. It is able to do that on its own automatically.

Jonathan: So that’s pretty cool, if you’re doing a lot of traveling and you’re not fluent in the language. I can honestly say, Ich verstehe nur ein bisschen Deutsch. [I only understand a little bit of German.]

Karthik M: [laughs]

Jonathan: So if I’m in Germany, I can probably order a little bit, but not very much.

So that’s also very impressive that it’s doing that translation for you as well.

Karthik M: Exactly.

Jonathan: Hey! Thank you both very much for coming on the podcast. I’ve really enjoyed catching up with what you are up to, and I’m looking forward to what happens next.

Karthik K: Yup, thank you so much, Jonathan. Super super happy to be here on the podcast.

And yeah, we’ve got a lot of great stuff planned, and a lot of great stuff that we’re looking to showcase at CSUN. We’re also doing a couple of talks about the assistant itself.

I’m more than excited to get all of the stuff that we spoke about today into your hands so you could take them for a spin.

Advertisement: Living Blindfully is brought to you in part by Turtleback.

They’ve been working with our community for decades now, and they understand our requirements when it comes to leather cases for the devices that we use. They need to be durable, protective, and most of all, functional. You can be confident that when you buy a leather case or accessory from Turtleback, you’re buying quality.

Visit TurtlebackLV.com today to find out more, and be sure to use the coupon code LB12, that’s LB for Living Blindfully, and the number 12 for 12% off your order at checkout.

Or you can phone their friendly team. The number is 855-915-0005. That’s 855-915-0005.

And if you’re attending CSUN, stop by their booth, which is 1014. Say hi, and feel the quality for yourself.

Make the most of your device by putting it in a Turtleback case. TurtlebackLV.com. And don’t forget that coupon code LB12.

iOS 17.4 is a Significant Release

Some interesting tech developments to talk about this week.

Apple has released iOS 17.4 officially. This is the operating system that powers all iPhones.

And depending on where you are in the world, it’s quite a significant change, because of legislative changes in the EU that are trying to make sure that things are more open. This legislation is called the Digital Markets Act.

Apple now offers a different experience in Europe compared to the rest of the world. In Europe, they’re having to allow alternative app stores and sideloading. You remember the old days of phones where you used to be able to just download an app from somewhere and copy it onto your phone? Now, in some ways, that was simple. In others, it was complicated, because you’d have to go scrambling around the web trying to find an app that you wanted, and then go through the install process.

The one-stop-shop approach of the App Store is quite consumer-friendly, but it’s also potentially anti-competitive. Apple determines what apps will be allowed to go into the store, they keep their markup, and they take advantage of the fact that not only are they the provider of this App Store, but they’re also a competitor with a number of the apps that are in the App Store, so it can be a bit dodgy.

Spotify has been fighting this battle particularly hard with Apple, and you may have read in the tech press over the last week that Apple has been hit with a fine of almost 2 billion euros for their antics towards Spotify.

And Apple says, “Well, we’re going to appeal that. We don’t think that’s fair.”

Also, if you’re in the EU and you’re a parent, you should know that there is now an option in ScreenTime which restricts your kids from being able to use alternative marketplaces, so you might want to be aware of that if you do look after iPhones on behalf of youngsters.

Also in the EU, there’s a pretty significant change as well relating to browsers. You may not know this, but when you use Safari, or Chrome, or Brave, or whatever you choose to use on iOS, it’s actually really the same browser engine. The thing that’s different is the user interface. And obviously, it might sync your favorites and your bookmarks and all those things with the browser that you use on your desktop, and that has value, but the actual browser engine is the same, no matter what browser you use because Apple says it should be so.

Now in the EU, it’s possible for browser developers to deploy different engines entirely, which means that at least in the EU, the experience that you get on different browsers could be radically different. Maybe some sites will render better on certain browsers than others, and you will really notice this difference.

Another thing that the banking industry has been asking for, demanding, in fact, and going to regulators about for years is they want independent access to the NFC chip, the Near Field Communications chip. It, among other things, makes Apple Pay possible, and Apple has said no, we are not going to do this for security reasons. And many people have said, “Bah! It’s nothing to do with security. It’s all to do with anti-competitive behaviour.”

So they have caved here in the EU as well. And now, other banking institutions can have access to the NFC chip in the phone. And that means that there will be a plethora of payment options available to people in the EU. It’s going to be quite a different experience from the rest of the world. So we’ll see where that one goes.

But it’s a hot topic, and even though these changes only affect Europe right now, there is every chance that they could extend to other markets, as other markets look at the EU legislation and potentially emulate it. There have also been consistent rumours for some time that the Department of Justice in the United States is about to begin an antitrust investigation into Apple.

But if you are in the EU and you start to see some of these alternative app stores rolling out and some of the other changes that are EU-centric right now, I would be very interested to hear how it goes, because one of the challenges, of course, is that if other app stores come along, they may not be as accessible as Apple’s App Store, so that will be a consideration.

Another change that’s global is the availability of transcripts in the Podcast app. These transcripts are generated automatically, and I have to say, I’m much more impressed than I expected to be.

That’s not going to replace Hannah, the Incredible Transcribing Transcriber, any time soon, for a couple of reasons. First, it still has errors. And second, I just don’t think the Apple Podcast app is a very good podcast experience at all. Overcast is much better, Pocket Casts is much better, Castro (now that it’s getting its act together) is back in contention. There are much better options out there.

But people do use it because it’s on the phone, and it’s effortless, and there’s nothing to install, and the transcript feature is pretty cool, actually.

I went through Living Blindfully episode 268, that’s the previous one, the one we did with Deane Blazie, and there are definitely errors. It’s nowhere near as good as Hannah the Human. So there.

But it is pretty readable, and it’s much better than I expected, because in that episode, you’ve got somebody with a New Zealand accent and you’ve got somebody with an American accent (I’ll leave it to you to try and work out which of us has the one and which of us has the other), and it really does work quite well.

There are a lot of podcasts that don’t make their own transcripts. And as long as the audio quality is good, I suspect this will open up a lot of podcast content to deaf-blind people who couldn’t access it before.

And although I don’t like the app, I might actually try some news podcasts that I subscribe to, where reading the transcript with the text-to-speech cranked up like I have it would be far quicker than listening to the episodes, even sped up a little bit, I think. So this is a very intriguing feature.

And if you’re checking out transcripts in iOS 17.4 in Apple’s Podcast app, let me know what you think, and whether it’s a valuable feature. Is it so valuable to you that you would even consider jettisoning a superior podcast app just because the transcripts are such a game-changer? It’ll be interesting to see what impact this has.

Also, good news from the Name That Tune department, because songs that you’ve identified when you use a music recognition service can now be added to Apple Music playlists, your Apple Music library, and also Apple Music Classical.

It’s a small thing. But one thing I really like is that the stopwatch is now a live activity. So you can start the stopwatch, go to your home screen and get on with your life, and then you can just check the dynamic island, and you will see that as a live activity. If you don’t have a phone that uses the dynamic island, then it appears in the lock screen as well.

There are a range of security updates, which means that it’s always good to update from that perspective.

And there seem to be some new VoiceOver bugs. Sadly, iOS 17 seems to be getting worse for me, and not better. There are a number of long-standing bugs that I’ve had, some of them I’ve mentioned here before, and some of them I haven’t.

One of them that’s really impactful for me, and it may just be something to do with made-for-iPhone hearing aids, (or even my made-for-iPhone hearing aids, my particular brand), but I have found that audio ducking has become extremely unreliable. So I used to put my phone on charge, wander around the house doing chores, and listening to podcasts using my made-for-iPhone hearing aids, and I would get a notification, such as a breaking news alert or an email, and I’d hear it, and then the audio would duck back up again. What I’m often finding (and it’s not consistent, and I can’t find a pattern to it) is that sometimes, after I’ve heard that notification, the audio does not duck back up again, so the audio is very quiet. And when it gets into this state, if I hear another notification come through, I hear the alert tone, but VoiceOver doesn’t speak the notification.

So it’s a really annoying bug, and the only way to fix it is to go back to the phone and perform a 2-finger single tap, and that causes the ducking to come back up again and everything to return to normal, although I don’t then hear all the notifications that I’ve missed.

Continuous reading for me in iOS 17 continues to be horrible. Sometimes, when it crosses paragraph breaks or when a notification comes through, it just stops and it’s really unreliable. It’s basically unusable, in my view, trying to do continuous reading, and that’s a shame. It’s a feature that I use a lot. Or I would, if it behaved itself.

New in iOS 17.4, I’ve noticed a couple of things.

If you set your VoiceOver language to one other than the default, and then you touch the status bar, the status bar will speak in your default language. So for example, let’s say that I’ve got Daniel as my default language (which I have, and I have him cranked up pretty fast when I’m not doing demos for Living Blindfully). But I change, for whatever reason, for a while to Eloquence. When I touch the status bar, I hear the status in Daniel’s voice. It’s a most bizarre thing. I don’t think it’s deliberate. I can’t think of why this would be a feature. It does seem like a bug to me.

One thing that is definitely a bug, for me at least, is that in email, the misspelled words rotor has now gone. You can rotate to your heart’s content, and you can’t get to the misspelled words to check your spelling before you send an email. That is a pretty significant one there.

And Apple, while you’re in there fixing the misspelled word rotor in the email, please fix the audio destination rotor issue, which has gone on for years now. This is where you can have audio destination enabled in rotor settings, and it just doesn’t show up. It shows up at random. Somehow, there’s some weird focus bug with the rotor. And you flick up, and you find that magically, you’ve engaged with this rotor option that doesn’t appear on the rotor itself.

And this is the thing that is affecting made-for-iPhone hearing aid wearers, in terms of VoiceOver being so low on a call that if you’re put on hold, even if you’ve got VoiceOver at 100%, it’s often impossible to hear VoiceOver speech. That one also really needs to be fixed. But gosh! I can’t remember when that one came along. It’s been a wee while.

And Justin Daubenmire is writing in on iOS 17.4 and says:

“Hi, Jonathan,

I was wondering if you could help me out by sharing this with the community. I’ve noticed a peculiar issue with my iPhone 12 mini running iOS 17.4, and I’m curious if anyone else has come across it.

When I use Siri to send a message like, “Hi, Joe! How are you today?”, it reads back, “Hi, Joe!”, and then unexpectedly stops without even confirming if I’m ready to send. Strangely, I never experienced this problem with iOS 16.

After a bit of digging, it seems like others using CarPlay have also encountered a similar hiccup on iOS 17. Siri isn’t repeating back their messages, making it challenging to confirm if the message is accurate.

Have you, or anyone else in the community faced this issue? If so, do you happen to know of any fixes?

I appreciate everyone’s insights and help in resolving this quirk.”

Justin, I’ve not seen this myself. I don’t do a lot of dictation or voice stuff because I’m too picky about the quality of what I put out there [laughs], and I find dictation to be a little bit hit and miss.

But if others are using this and have experienced this, and even better, if you’ve got a workaround for Justin, please do be in touch. opinion@LivingBlindfully.com is my email. You can also give us a call on 864-60-Mosen, 864-606-6736.

Stan Litrel is reporting in from Medford, Oregon on iOS 17.4.

For some reason, I’ve got Oregon Trail stuck in my mind. I used to play that game on the Apple IIe. Oh, that was fun.

Stan says:

“When previously updating the software of my iPhone, I mentioned that I had trouble during the update where I totally lost speech, and had to get sighted assistance when I was out and about.

When updating to 17.4, it happened again, but it wasn’t quite as bad. After getting the verifying update message, things totally froze with no speech.

I waited and waited, to no avail, and I decided to call Apple. After talking to one of the great people at Apple, they transferred me to the Accessibility team.

Even though I was upset on the inside, I managed to get the people cracking up.

The Accessibility people advised me to press the volume up button followed by the volume down button, and then press the Siri button on the right side of my iPhone 15 Pro Max, and I heard the wonderful speech come up.

This is the second update where things went weird. The Accessibility folks said that they had no other complaints with people having similar problems updating.

I figured that I would send you this story to see if others had my kind of issues during updating software. Knowing that my unit was at 51%, I plugged the unit in before updating, thinking that I would avoid the problem that I mentioned last time.

I’m looking forward to receiving my new Zoom H6 Essential Digital Recorder.”

Oh, you’ve gone for the top one, Stan. Good for you.

“It should be in my hot little hands really soon.”, says Stan.

I’ve not heard of anyone else having these problems, Stan, and I do wonder what’s on the screen, if anything, when the phone gets into this state and it’s not talking for you. Is it sitting there at the password screen and you’re just not getting VoiceOver speech? Is there some message coming up that somehow VoiceOver isn’t speaking that would be helpful to know about? That could be most informative, in terms of trying to track this down.

But failing something obvious (and it’s a very drastic thing to do), it may be best for you to reset your phone, and start again, and install your apps, and do all that sort of thing, because it’s better than having this happen to you every single time. You do go into that limbo land when your phone starts to update and you think, how long’s it going to take? And I’m sure it must be a really sinking feeling when your phone just doesn’t come back.

JAWS Introduces AI Image Recognition

I said it was a big week for tech. And I’m not wrong, I tell you, because I want to move on to the JAWS 2024 update that Freedom Scientific has recently posted. This is a big one because it makes ChatGPT and Google’s Gemini AI available for PictureSmart without the end user having to go through a lot of technical hoops.

NVDA has been doing this for a while with a couple of add-ons, and they’re great. But one of the things I’ve learned well and truly in my last 25 years of doing these sorts of things like Living Blindfully and also working in the assistive technology industry for some time, is that what frustrates people about assistive technology is that they understand the promise of it. But sometimes, getting to where they want to be is just too much hassle and too geeky. And for many people, having to install API keys so they can recognize images on their computer is just too much. They might be able to get some assistance to do it if it’s beyond their comprehension, but JAWS has cracked this by simply making it available with a bunch of key presses. And for a lot of people who’ve missed out on the whole AI craze through Be My AI, Envision, and similar apps because they’re not smartphone users, this puts this technology in their hands for the first time, and I think it’s going to be quite a significant thing.

I’ve used this in an interesting case. So I’ve scanned lots of photos that I have on my hard drive of my kids growing up. Thankfully, I have an awful lot of audio of my kids growing up, which is far more meaningful to me. But I have looked back at the photos, and it’s really great to have them, actually, particularly of my dad, who died 7 years ago. And I have some pictures of him with the children and things, and that’s precious. So I’ve enjoyed doing that.

But one thing I’ve also realized is that this could be a happy answer, at least for online meetings, to this debate that we were having a while ago on visual descriptions in meetings. You remember we had quite the conversation about this for a while. This is where some people really liked the idea of having people describe themselves – what they look like, what they’re wearing, because it’s information they felt deprived of. Other people felt it made them, as a blind person, uncomfortable because especially if they were the only blind person on the call, it felt like people were going to a lot of trouble for them, and they felt like they were doing the right thing and everything, when actually the blind person just wanted to get on with the meeting and felt a little bit uncomfortable about being made a fuss of in that way.

I’ve used this PictureSmart to take a picture of the window that I’m in when I’m on a Zoom call or a Teams call, and it gives me some pretty good visual descriptions. So now, I have access to this information without holding up the meeting, and I think that’s great.

There’s another work-related use case that I’ve been using PictureSmart for, and it’s a bit controversial potentially, so I think I need to explain myself.

When Freedom Scientific introduced the ability to perform optical character recognition on inaccessible PDF files, there were some people who were concerned about this. They said Freedom Scientific was letting content creators off the hook. And if a blind person needs to read a PDF file, it should be universally accessible, in the same way that we expect web pages to be universally accessible.

And I agree with all of that. I’m a strong advocate for universal design. Every time a document is out there in the wild, whether that document be a web page or a PDF file, that is inaccessible to some of us, it should be unacceptable to all of us. That’s a fundamental principle of universal design.

But we’re not going to change the world overnight. And right now, some of us have work to do. Some of us need to get at that PDF file.

So this takes me back to the use to which I’m putting PictureSmart that could be controversial. If you’re attending an online meeting and somebody’s giving a presentation, and they know that you’re blind, perhaps you’ve requested the PowerPoint presentation from them, or maybe the meeting is being run by somebody who you deal with regularly, and they know that you’re a blind person, they should make sure that the slides are accessible to you. One way is to send them in advance, another way is to use the feature that’s built into Microsoft Teams, and a third way, of course, is to use Scribe for Meetings from Pneuma Solutions. But again, we’ve got work to do, and we don’t live in a perfect world, so we should strive to make the world more perfect while also being pragmatic about what we’ve got to do now.

And I’ve had good luck running PictureSmart on a screen where there’s a slide. I’ve asked PictureSmart to tell me about that slide, and it seems to do a reasonable job.

Of course, you can’t tell when it’s hallucinating. That’s the thing about all this AI stuff. It might be getting it wrong. But based on the context of what somebody’s been saying in the presentations where I’ve done this, it does seem to work very well. So I read the description that’s coming from PictureSmart on my Braille display while I’m listening to the presentation.

Also, the user interface makes it so easy to take a look at images on social media, or on a website, wherever they might appear. Or when you’re browsing for a product, and you see an image of the product that you might want described to you. All sorts of really cool use cases. And it’s just integrated right here into JAWS.

At the moment, it is an experimental development, though, and that means that you will have to enable this feature through JAWS’s early adopter program, and I want to show you how to do that. So I’m going to go to the JAWS window on my computer.

JAWS: JAWS context menu. Options, submenu.

Jonathan: And it’s the options menu that we want, so we’ll press enter.

JAWS: Basics…

Jonathan: And I’m going to up arrow.

JAWS: Exit.

Remote desktop…

Restore to factory set.

Manage application set.

Early adopter program…

Jonathan: I’ll press enter on that.

JAWS: Leaving menus.

Jonathan: And there’s a bit of verbiage here, but let me just do a say line now.

JAWS: Read-only edit. Enable and disable features that are still in development. We used the early adopter program to allow users to try out new features still in development. If you would like to leave a rating for a feature, please use the send feedback option.

Jonathan: I’ll press tab.

JAWS: MathCat check box, not checked.

Read only edit. This feature enables the use of MathCat.

Jonathan: You can read more about that, if you wish, but I’m going to press tab.

JAWS: Send feedback button.

PictureSmart with generative AI check box, not checked.

Jonathan: Now I need to check this. If you don’t check this box, you’ll get the older PictureSmart. And I think the newer one is definitely worth doing this. So let’s check the box.

JAWS: Checked.

Jonathan: And I’ll press tab.

JAWS: Edit, read only. This is an update to the PictureSmart feature that uses large language models (LLMs) from AI services to describe pictures.

Jonathan: Alright, that’s what we’ve got. So I’ll press tab.

JAWS: Send feedback, button. Disable all, button. OK, button.

Jonathan: And press enter on OK.

JAWS: Early adopter program dialog. You must restart JAWS to apply these changes. OK, button. Desktop.

Jonathan: So I’ll quit JAWS.

JAWS: Unloading JAWS.

Jonathan: And we’ll just give it a couple of seconds to do its thing and exit from memory. And then, we’ll run it again. And now, PictureSmart will be enabled with the generative AI feature.

I’m going to press control alt J, which I’ve assigned to run JAWS.

JAWS: JAWS.

Jonathan: And you may or may not have used PictureSmart in JAWS before, but there are a couple of new commands. Let’s have a look at how we get into PictureSmart.

All you have to do is press the JAWS key with the spacebar. So that could be your insert key. It might be caps lock. Of course, you can use both if you’ve got laptop layout enabled. And now, I’ll press P for PictureSmart.

JAWS: PictureSmart.

Jonathan: If you don’t know what command you want at this point, you can press question mark and get a list.

JAWS: Heading level 1. The following commands are available in the PictureSmart layer.

Acquire and describe an image from a scanner or Freedom Scientific camera such as Pearl using the letter A.

From Windows Explorer, you can describe any image file using the letter F.

From the computer screen or application, you can describe a control with a letter C.

An image on the clipboard can be described using the letter B.

An image can be described by saving it to the PictureSmart directory in your user’s Pictures folder.

To get full results from all available services, add the shift key to any of the above keystrokes.

Jonathan: And if you’re unclear about what that means, as I understand it, what’s happening is, if you don’t add the shift key to get descriptions from all services, you will get a description from Gemini. That is the AI that is offered by Google.

Having been playing with this for a wee while now, my experience is that with Gemini, you get more of a factual description. With ChatGPT, which is the same technology that Be My AI uses, you tend to get what I would call a slightly more flowery description. It tends to be more detailed, and it talks more about atmosphere and things of that nature.

I will try this by pressing escape to get out of this keyboard help.

Now, I’m in File Explorer and I’m on my Synology drive where I have a lot of photos. And the one that I’ve highlighted is a picture of my mum and my late dad with my kids and some other grandchildren.

I’m going to run this picture through PictureSmart and see what we get. So I’m going to press the JAWS key with the space, and then P for PictureSmart. And now, I’m going to press SHIFT with F for file. And by adding SHIFT to the command, I’ll get descriptions from both services.

JAWS: PictureSmart is in progress.

Jonathan: You hear some progress clicks to tell us that the image recognition is happening, and it’s sending that image away to the cloud. And you’ll know if you’ve used something like Be My AI. Sometimes, it can take some time. And of course, in this case, we’re sending it to 2 services. In my experience, Gemini is much faster than ChatGPT.

JAWS: Heading level 2. Gemini.

Jonathan: Alright. I’ll perform a say all and let it do its thing.

JAWS: Heading level 2. Gemini. The image shows a group of people standing in a garden. There are 4 adults and 7 children. The adults are standing in a row, with the children standing in front of them. The adults are all smiling, and the children are mostly smiling. The people are all wearing casual clothes. In the background, there is a fence.

Heading level 2. ChatGPT. This image shows a group of people posing for a photo in what appears to be a backyard or garden setting. There are 11 individuals in total, suggesting this could be a family portrait. Starting from the left, there’s a woman with long hair, wearing a blue long-sleeved top and jeans. Next to her is a young boy holding a plush toy that resembles a tiger. He is dressed in a dark sweater and light-colored pants. Beside the boy is an older man with gray hair, wearing a green sweater and dark pants. He is standing in front of a woman who is partially obscured and seems to be wearing a white cap. Next to the older man is another young boy in a dark sweater and tan trousers. In the center front of the photo is a young girl with braided hair, wearing a traditional-looking black and white dress. Behind her is a man in a dark cap, a black t-shirt with white lettering, and a beige coat. To the right of the center girl is a woman with shoulder-length hair, dressed in a lavender long-sleeved top and blue jeans. Behind her, almost fully obscured, is another young man wearing a white cap and what appears to be a brownish jacket. Moving further right, there is a tall young man in the back row with curly hair, wearing a green striped sweater. In the front row, towards the right side of the image, there is a young girl with short hair, dressed in a black and white checkered dress with a red motif at the center. Lastly, there is a teenage boy wearing a blue hoodie and dark pants, standing on the far right of the image. The group is standing in front of a wooden fence, with a tall bush on the left and a structure with a sloped roof on the right. The ground looks like grass, with a few scattered leaves, and there appears to be a garden or lawn area behind them. The sky is overcast, offering even lighting for the picture.

Jonathan: I mean, that is an incredible description from ChatGPT.

As you hear, the description we got from Gemini was much less helpful, quite superficial. And the ChatGPT one was just incredible.

So that’s why I like to use the shift key to add to these commands so I get descriptions from both. If you don’t use the shift key, you can use the link at the bottom of the Gemini description to go out and get ChatGPT’s description.

A couple of limitations. It sounds like there was some truncation of that description towards the end of it from ChatGPT.

And the other thing is that at the moment, you can’t interrogate the AI descriptions that you get back. For example, if you take a picture with Be My AI or you supply an image to ChatGPT itself, you would be able to ask it questions about the image. For now at least, you can’t do that in JAWS.

There is a daily limit (I believe it may be 50) to how many ChatGPT descriptions you can retrieve.

Now, let’s go shopping!

I’m on the Amazon website now, and I’m interested in a microphone. I’ve been looking at these for a while, actually, and thinking that now that I’ve got these Zoom recorders, I should upgrade my Samson Q2Us. Now I’ve got a graphic highlighted here.

JAWS: Sennheiser MD46 cardioid interview microphone.

Jonathan: That is the mic I’m interested in, and I’d like a description of it.

So I’m going to press the JAWS key with space. And this time, I’ll do P for PictureSmart, …

JAWS: PictureSmart.

Jonathan: and C for Control. I won’t use Shift because I’ll show you what you can do if you do this. So I’ll just press C for Control.

JAWS: PictureSmart is in progress.

[progress clicks]

JAWS: Sennheiser MD46 cardioid interview microphone. Black, graphic.

Jonathan: Now, for whatever reason, that’s kept me in Microsoft Edge. But I can Alt Tab.

JAWS: PictureSmart results.

Jonathan: That should be it. There we go. And now, let’s have a look at what we’ve got.

JAWS: The image is of a black Sennheiser MD445 cardioid dynamic microphone. Link, more results.

Jonathan: So that’s not very helpful. I mean it doesn’t describe at all what the microphone’s like. So let’s choose More Results and that’ll cause a query to go out to ChatGPT. And we can hear the click. And I think I need to Alt Tab again, actually. No, I don’t.

JAWS: Heading level 2, Gemini.

Jonathan: So now, …

JAWS: Heading level 2, ChatGPT. You’re looking at an image of a dynamic hand-held microphone. The microphone is predominantly black with a grille at the top which is a textured, spherical mesh typically used for voice capture. The body of the microphone is cylindrical, tapering slightly toward the bottom, and it bears white lettering along its side, indicating the brand and model information. The microphone design suggests it’s a professional-grade tool commonly used in live performances, speeches, or recordings. There is no visible switch, suggesting it may have a simple plug-and-play operation without an on-off toggle. This particular microphone is lying against a light grey background, making its dark color stand out prominently.

Jonathan: Wow! Okay. So I now have a better image in my head of what that microphone is like. The Sennheiser MD46 microphone. I think it got the number wrong, but there’s probably some sort of regional variation there. And it’s a nice interview mic if you want a highly directional microphone for interviews, with a cardioid pattern.

So that is a brief overview of how PictureSmart works.

I just find the convenience of having this in JAWS means that I’m using it a lot – when I’m shopping, when I’m doing other things. It is a very cool addition indeed to JAWS 2024. And all you have to do to enable it is to just turn it on in the early adopter program from the options menu of JAWS.

[music]

Voiceover: If you’re a member of Living Blindfully plus, thanks for helping to keep the podcast viable.

If you haven’t yet subscribed, why not do it today?

Get access to episodes 3 full days ahead of their release to the public. You’ll get advance notice of some of our interviews so you can have a say in what we ask, and you’ll help keep the podcast viable by helping to fund the team to do their work.

Our guarantee to you is that everyone who works on the podcast is blind or low vision, so we’re keeping it in the community.

Find out more. Visit LivingBlindfully.com/plus. That’s LivingBlindfully.com/P-L-U-S.

Pay what you can. It all helps. Thanks for your support of Living Blindfully plus.

Comments on the Deane Blazie Interview

Voice message: Hello, Jonathan! This is Howard Goldstein in Connecticut. Just a few comments on the Deane Blazie interview.

I really enjoyed hearing the stories about the old days. I was getting started with technology in the late 1970s. And so I remember a lot of the stuff he was talking about, and those were really fun times. [laughs]

I also wanted to talk about the VersaBraille. You mentioned that you thought that the chord structure on the VersaBraille was the same as what was on the Braille ’N Speak. But that’s not true. The VersaBraille’s chord structure was completely different. I think the VersaBraille may have been the first device where they used the term chord to talk about pressing a Braille key along with a space bar.

But the structure itself was quite different. Moving the cursor, for example, was dot 2 to move it left, and dot 1 to move it right. Dot 5 chord turned the cursor on and off. And the cursor, of course, was all 6 dots. [laughs] It was different.

And when I finally did get a Braille Lite, I had to relearn all the chords, and it took a lot of getting used to.

But for what the VersaBraille did, those chords, those strange chords actually made sense. For what it did, it did a really good job.

And I agree that the VersaBraille 2 was never anything that I really wanted anything to do with. I didn’t like the keyboard, and I didn’t like that big heavy metal case. But the VersaBraille, original VersaBraille was a wonderful machine.

Jonathan: Lovely to hear from you, Howard. Thank you so much for getting in touch.

And obviously, my recollection is faulty then, so I certainly bow to your superior knowledge because I was pretty young when I was using the VersaBraille, and it was a long time ago.

So that is interesting that the chord structure that became so common to notetakers actually did start with Blazie, while we give TSI and the VersaBraille the credit for coming up with the concept of chording in the first place.

John Riehl writes:

“Hi, Jonathan,

I just finished listening to episode 268 and your extensive interview with Deane Blazie. I found the historic reminiscences part of the interview interesting.

And while I certainly applaud Deane’s pioneering work in providing incredible, useful technology, the Braille ’N Speak was technology that was amazingly useful and ahead of its time.

Unfortunately, in my opinion, Deane is behind in his thinking. I could do a heck of a lot more with my iPhone or iPad and my Bluetooth keyboard than the BT Speak, although my setup is less portable. So I won’t be shelling out $1,000 for the product.”

Thanks very much for your thoughts, John.

I use Braille screen input on my iPhone a lot, and I do sit and write quite lengthy documents using Braille screen input on my phone. So that is very portable and it’s always with me. I’m not gonna leave my phone behind at any time.

And then, you get into the whole area of synchronizing data. So if you keep information on one device and that device doesn’t sync seamlessly with the other device you have with iCloud in particular, then you start to scrounge around wondering, where did I put this particular piece of information? So that would slow you down.

I’m not really interested in a speech product. I might look at it seriously if there were a new Braille Lite-type device, because it would be small. I am enjoying very much the QWERTY experience of being able to type on a QWERTY keyboard and have my Braille display right there. For me, that is a pretty sweet mix, and I don’t mind having it over my shoulder in the carrying case that is made for the Mantis.

That said, there are a lot of people who struggle with Braille screen input. They want a simple speech-based device. And of course, this is more than a simple speech-based device, isn’t it?

Once you start getting into Linux, there are all sorts of interesting things that one might do.

So I will be very interested in hearing from people when they get this device in their hands, and we hear what real world users think of it, and how many people take it up.

More Memories of David Holladay

Let’s also continue with our tribute to David Holladay.

“Hi, Jonathan,” writes Rick Roderick.

“I remember how amazed I was by BrailleEdit and the Apple IIe when I first tried them in 1984. I loved being able to actually write and know what I was writing.

I was a rehab counsellor at the time, and didn’t get into assistive technology until the end of 1988. Keith Creazy taught me the basics of HotDots, the bridge between BrailleEdit and Megadots.

One of the things HotDots could do was to turn WordStar files from that code into text files. If I got a document to transcribe from WordStar, I could figure it out. But when I figured out that a global file could do this in seconds, my job was made much easier.

The real wonder for me was Megadots. Duxbury is more versatile, but in the vast majority of cases, Megadots was easier to use and gave me all the versatility I needed. Some of it was absorbed into Duxbury, but not all of it. I had to give up because later versions of Windows couldn’t handle it.

I talked to Caryn and David many times on the phone when I had questions, especially during the early days. Like you, I read the Raised Dot Computing newsletter as soon as it came out.

I loved the introduction of non-existent products every April 1st. One of my favourites was Transcribucks, an app for producing counterfeit money. The National Federation of the Blind says it changes what it means to be blind. As you said, David had an ability to push the limits further than most people thought they could be pushed.

By the way, I got a Victor Reader Stream 3 for Christmas. I love it, and am considering writing a review and submitting it to you.

I mostly love it, but no device is perfect. I use it for NLS Bard books. On the NLS site, on Bard Mobile and Express, magazines are displayed from the latest to the earliest. On Humanware devices including the Brailliant, the NLS eReader and the VR3, all magazines ever produced are displayed with the latest issue of each going first. I find this tedious.”

I wonder, Rick, if you’re saying that on these devices, it’s the oldest magazine that appears first, which I can certainly appreciate would be tedious. But thank you for writing in, and I look forward to your further thoughts.

The Bonnie Bulletin

[music]

Jonathan: Ladies and gentlemen, welcome to a long overdue Bonnie Bulletin, recorded with the latest and greatest technology. We’re sitting here on our couch, recording this on the Zoom H6 Essential.

Welcome, Bonnie Mosen!

Bonnie: Hello!

Jonathan: You’re stricken again.

Bonnie: I am. I have a cold of some form, or allergy or something.

Jonathan: So people need to just keep away from their podcast players.

Bonnie: I know, exactly.

Jonathan: Now, we got a comment from a listener a while ago. Because last time you were on, we did a kind of what-we-did-on-our-holidays thing. And I had to explain quite a few things, like what a Yoto Player was, and different things like that. And this listener complained that there was not enough Bonnie in the Bonnie Bulletin.

Bonnie: [laughs]

Jonathan: And when I pointed this out to Bonnie, Bonnie said, “I agree.”

Bonnie: [laughs]

Jonathan: So I’m going to try and shut up.

Bonnie: Back off the mic. You ask the question and I answer it. [laughs]

Jonathan: Well, maybe. But the thing is that sometimes, it’s good for us to talk about some common experiences.

Bonnie: Oh, exactly. That is true.

Jonathan: That is true. I mean, there have been quite a few common experiences.

Let’s lead with the big story. And this is definitely your story, so off you go.

Bonnie: Yup. Well, I start a new job on Monday, March 11, which I’m super excited about.

I had been working for Blind Low Vision New Zealand, which is our blindness rehabilitation provider (sole provider, one-stop shop here in New Zealand). I’ve been with them for the past 4 and a half years and have had 2 roles there. I’ve worked as a vision rehabilitation counselor and also as a work-ready advisor. And I’ve been kind of not particularly satisfied with my job for a long time.

But when you’re in a position, sometimes it’s hard to give it up. You just feel sort of secure. And there’s the unknown, and not being able to really find anything that suits you. So I just sort of stuck with it.

And at some point, you start to think about what you wanna do with the rest of your life. Is this really what you wanna do for the next 30 years?

Jonathan: And would you be working there when you’re 85?

Bonnie: Yeah, 85, yeah.

So I was just thinking about that: is this really what I wanna do? And it wasn’t what I wanted to do at all. But there wasn’t a lot out there; looking at what was out there, I wasn’t really seeing a whole lot of things.

So I decided that I just couldn’t do it anymore. It was causing a lot of stress for me personally, and I just really didn’t wanna do it anymore. And it was sad seeing how far behind, on some levels, blindness services are here in New Zealand as compared to the rest of the world. And not a lot is really being done about it. And I’m only one person. So I decided that it was time to go.

The plan was to write full-time, to really build up my communications portfolio, and to actually work on getting some things published.

Well, we all know what happens when you make plans. The universe. Life is what happens when you’re busy making other plans. And before I left my position, a coworker had sent me a job that was here in Wellington. And I thought, okay. I’ll apply.

So I did. I was interviewed, and I’m happy to say that I will be the Accessibility Lead for Wellington City Council starting on March 11th.

Jonathan: That is a wonderful thing.

It mirrors my own experience, actually, because I was in a situation where I was working for the very same organization. I came back as chairman a couple of years later. But I couldn’t take it anymore either. And I resigned not having anything specific to go to. And within hours, I was offered the ACB Radio gig. I think sometimes, when you do the right thing, you just have to trust that things will work out because, as you say, life’s too short to be unhappy.

I remember that awful Sunday night feeling, you know, Monday’s coming, and you dread it, and that is no way to live.

Bonnie: It’s not, it’s not.

And I had some wonderful co-workers, particularly in the Wellington office. We had a great team of people, and it was hard to leave them.

But I haven’t missed the place. I don’t miss it at all.

It’s been great having 3 weeks off, which will end this Friday. But you know, having those few weeks just to kind of really change your mindset…

And now, I’ll be working in the city, which makes me very happy. Where I worked before was still in Wellington, but a bit out of Wellington, a good hour on the bus if you’re lucky. And not that it’s that far. But when you’re on a bus, you know, you have to stop every 2 seconds and you hit traffic. So it’s not like being on the subway, where you just kind of go through and hope there’s no signaling issues and traffic at South Station, that sort of thing. So it’s a much closer commute.

And what I’ll essentially be doing is, I guess the closest I can compare it to for the US is that it’s gonna be like an ADA coordinator for the mayor’s office, or the city council. So what I’ll be doing is working across all council projects and making sure that they’re accessible, so working with the different units within the City Council on things like digital accessibility, the built environment, that sort of thing. So for any capital projects they’re working on, construction, just being the kind of advisor for those, and working with internal and external stakeholders on making Wellington an accessible city.

And Wellington is an interesting city because geographically, it’s kind of a tough city because it’s very hilly, and there are things you can’t change. You can’t necessarily change the geography of an area. So it does have some kind of unique challenges.

And so it will be a lot of fun. Where I’m working, to get there, you have to go up to get onto The Terrace. It runs parallel to what we call our Golden Mile, which is Lambton Quay. But you have to go up to get there. And a lot of the time, you can go up through some of the buildings; they have lifts that’ll take you up onto The Terrace. Or you can go up a big hill, or you can go up many, many stairs. So today, Eclipse and I went.

And that’s one thing when you’re blind, or when you have any disability: you think about what’s the easiest way to get there. Because sighted people, a lot of times, will go hither and thither, up and down and yonder and around the woods and everything. So you can go through some buildings. And I’m like, I don’t really wanna have to do that. That just sounds too complicated. So there is a way to go up through a pedestrian walkway, and you go find the cookie store, which you can usually smell.

Jonathan: It’s a good landmark, isn’t it?

Bonnie: It’s a good landmark, yeah.

Jonathan: [laughs]

Bonnie: And then you go down the pedestrian walkway, you go up some stairs and up a big hill to The Terrace, and then down The Terrace, and there you are.

Jonathan: Well, I’m very proud of you, and I’m pleased for you. And I think Wellington’s in good hands, and I expect you will fix all the problems and life will be good.

Bonnie: [laughs] Not sure about that.

I’ll be working on it. I’m really excited about the collaboration with a lot of the different teams that I’ll be working with. I’m on the Connected Communities and the Harm Reduction teams. And it’ll be exciting working with different people.

Jonathan: And it will have its moments, because if there are emergencies, for example, like a weather emergency, or even … We had a tragic fire here last year, then you need to respond to that. And obviously, we need to make, or you need to make sure, rather, that the disability response is appropriate, and there are people with a range of needs in natural disasters and emergencies. And obviously, we’re a quake-prone city. It’s pretty real, dude.

Bonnie: It is. That’s one thing: I have to be on call sometimes on the weekends, just in case something does occur, a natural disaster or a man-made disaster, like the terrible fire they had not too far from where I worked a few months ago.

But I’m looking forward to it. It’s going from an office of about 15 people to a building of about 2,500. I’m excited to be back in the city and in a high-rise again.

They’ve been wonderful in terms of accommodations and finding out what I need. I have to laugh because I worked for government before, and it seems like everywhere I go now, I’m hearing people complaining about the Wellington City Council. So I’m just kind of like, get used to it again. [laughs]

Jonathan: Yeah, you notice it now. Oh well, this is great news. I’m really thrilled.

And we’ve also been doing some exciting things, like the big Mosen Towers clean-out, which has been long overdue. And we filled this massive skip. Do they call them skips in the States?

Bonnie: I think dumpsters.

Jonathan: Dumpsters, yeah. We got a big one. And Heidi and Henry and Richard and Nadia were helping us, and so we had a lot of people down.

And the next morning (because it didn’t take us very long to fill this skip), we decided that we would ask Heidi if she wouldn’t mind just helping us to Braille label a few things. And we’ve got this old Dymo labeler that’s probably seen better days, and it was a painstaking process. And Heidi was getting annoyed because the Dymo labeler was so, what’s the word I’m looking for?

Bonnie: Sluggish?

Jonathan: Fiddly.

Bonnie: Fiddly.

Jonathan: Unreliable. So I said to Heidi, “Well, we’ll look up what’s available and get something new.”

And she found this device. We’d both heard about it when it came out, hadn’t we? This Braille labeler device has a Perkins-style keyboard, and it has a USB port so you can plug a QWERTY keyboard into it. And if you do that, then the labels it generates are grade 1. But you can Braille in contracted Braille on the little Perkins keyboard, and press enter, and it prints it out and it’s ready to rip off.

And it’s quite an expensive device. I think it’s like a thousand US dollars, or something. But for us, as a family of 2 blind people, it seems like a really sensible thing to do. And I’m sure we’d get a lot of use out of it.

So we ordered it, but they won’t ship to New Zealand. I’m going to contact them and see if they might. But they wouldn’t ship here.

So we used this parcel forwarding service that we use a lot, called uShop. The way it works is, they give you a US forwarding address. And when you go to check out with a place that won’t ship directly to New Zealand, you just give your US address (this one is in Oregon), and they ship it to this freight forwarding warehouse, and then they intercept it, and they contact you, and they charge you a massive amount for shipping, and you have to pay the import tax, the goods and services tax, that sort of stuff.

And anyway, I had some other stuff coming from uShop. And I thought, “Where’s the labeler?”, because it was showing no sign of having left that warehouse after I paid all the clearance fees.

And then last Friday, I got an email to say it had been stolen from the warehouse. [laughs]

Bonnie: That person’s going to be surprised.

Jonathan: Yeah. They’re going to open it up, and find a Braille labeler.

So it’s been stolen, not recovered. And now, they’re refunding us. But that was just really extraordinary. I haven’t heard of anything like that happening before.

Bonnie: No, I haven’t either.

Jonathan: Should we get it again?

Bonnie: Well hopefully, if we can actually get it here.

Jonathan: Yeah.

Bonnie: I did know someone that had Braille books stolen off their porch once. They had a box on their porch, and the person stole it, and it turned out it was a Braille cooking magazine from the National Library.

Jonathan: And did they feel guilty and return it?

Bonnie: Yes. I guess it wasn’t anything good. [laughs]

Jonathan: And they felt bad about robbing someone blind.

Bonnie: Maybe. [laughs]

Jonathan: [laughs]

And we had a lovely 21st party, didn’t we?

Bonnie: Yeah, we did. Nicola had her 21st birthday party last Saturday, so that was nice.

Jonathan: That’s our youngest, so no more 21st birthdays.

Bonnie: No more, no more. They’re all grown up now.

Jonathan: And I have my big project. And since this is the Bonnie Bulletin, and I’m going to get pinged if I say too much, you can talk about the project if you like.

Bonnie: Yeah. He does a montage for each of the kids, just collecting different audio segments that they’ve done over the years, things they’ve said. He did one for Nicola. It was very good. Took about 12 hours to complete, for 4 minutes of audio.

Jonathan: 5 minutes and 38 seconds. I’m so pleased that I’ve got so much audio of all my kids that I can do that. I mean, I was able to do it starting from one of her first baby cries, from when she was about a day old, all the way to the present, and move through her life. And you can hear how her voice changes, and her vocabulary changes, and then during those sullen teenage years, her attitude changes. [laughs]

Bonnie: It’s very good, and you put in the Taylor Swift Never Grow Up song.

Jonathan: Yeah, by Taylor Swift.

Bonnie: Because Nicola’s a big Taylor Swift fan, so it was very nice.

So he did a speech, and her mother did a speech, and her stepfather did a speech, and one of her brothers did a speech.

Jonathan: There’s a lot of speech.

Bonnie: And her boyfriend did a speech, and yeah.

Jonathan: He’s a good lad, her boyfriend, isn’t he? He speaks very well.

Bonnie: He did, he did. But she didn’t open it up to the floor.

Jonathan: No, no, probably just as well.

Bonnie: [laughs]

Jonathan: So it was a lovely night, and it’s always a big milestone when they turn 21.

Bonnie: Yeah.

Jonathan: And of course, I’ve been a little bit busy. I mean, you know, the day job is very, very busy as always, and then I’ve had these Zoom Essential recorders arrive. What do you think of the Zoom Essential recorders?

Bonnie: They’re cute. They’re very little.

Jonathan: This one’s bigger. Have you felt the one that’s just on the side here?

Bonnie: Uh-huh.

Jonathan: This is the H6 Essential. So we’re just sitting here on the couch. See, this is a bigger one.

Bonnie: Yep. Okay. It looks like a toy car.

Jonathan: [laughs]

Bonnie: I don’t know why. But actually, with its shape, if you put wheels on it, it’d look like a model car.

Jonathan: That’s a really good point. Maybe we should put some wheels on it.

Bonnie: [laughs]

Jonathan: Well, thank you very much for another great Bonnie Bulletin on the couch.

Bonnie: Thank you.

Jonathan: And I hope that people consider that there was a sufficient quota of Bonnie in the Bulletin this week, so I won’t be castigated again. We can live in hope.

Not bad, that Zoom H6 Essential, is it? Not bad at all. And of course, as I say, more on those recorders soon here on Living Blindfully.

[music]

Advertisement: Transcripts of Living Blindfully are brought to you by Pneuma Solutions, a global leader in accessible cloud technologies. On the web at PneumaSolutions.com. That’s P-N-E-U-M-A solutions dot com.

Closing and Contact Info

Right. Now, it’s time that I went. Thank you very much for listening to this episode.

Remember that when you’re out there with your guide dog, you’ve harnessed success. And with your cane, you’re able.

[music]

Voiceover: If you’ve enjoyed this episode of Living Blindfully, please tell your friends and give us a 5 star review. That helps a lot.

If you’d like to submit a comment for possible inclusion in future episodes, be in touch via email. Write it down, or send an audio attachment, to opinion@LivingBlindfully.com. Or phone us. The number in the United States is 864-60-Mosen. That’s 864-606-6736.

[music]