<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Untitled Publication]]></title><description><![CDATA[Untitled Publication]]></description><link>https://blog.pushkaryadav.in</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 12:06:15 GMT</lastBuildDate><atom:link href="https://blog.pushkaryadav.in/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Cost-Effective Video Search with Frame-Based Multimodal Embeddings]]></title><description><![CDATA[TL;DR: Traditional video models are costly and often impractical for large-scale search. By splitting videos into ~800 frames per hour, embedding both visuals and transcribed audio into a vector database, we can build a precise and low-cost system to...]]></description><link>https://blog.pushkaryadav.in/cost-effective-video-search-with-frame-based-multimodal-embeddings</link><guid isPermaLink="true">https://blog.pushkaryadav.in/cost-effective-video-search-with-frame-based-multimodal-embeddings</guid><category><![CDATA[video search embeddings]]></category><category><![CDATA[video indexing]]></category><category><![CDATA[Vector Databases]]></category><category><![CDATA[FFmpeg]]></category><category><![CDATA[#multimodalai]]></category><dc:creator><![CDATA[Pushkar Yadav]]></dc:creator><pubDate>Thu, 14 Aug 2025 09:08:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755162382142/2d3f65c1-ba27-4967-8be1-807dcd600602.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>TL;DR:</strong> Traditional video models are costly and often impractical for large-scale search. 
By splitting videos into ~800 frames per hour and embedding both visuals and transcribed audio into a vector database, we can build a precise, low-cost system for querying exact video moments. This approach makes multimodal search affordable without sacrificing accuracy.</p>
<ol>
<li><h3 id="heading-why-video-search-is-expensive-today">Why Video Search is Expensive Today</h3>
<p> Most AI models don't process videos directly. To search within a video, you typically need to process it frame by frame, create embeddings for each frame, and store these embeddings in a database. For a 1-hour video, this can cost nearly a dollar with models like Gemini-2.0-flash. Costs rise quickly when scaling to hundreds of hours of content, and using more advanced models increases the price even more. This makes precise, multimodal search (visual + audio) expensive and often impractical for everyday use.</p>
<p> By dividing videos into frames and embedding both visuals and transcribed audio into a vector database, we can create a precise, low-cost system that finds exact video moments, making multimodal search affordable without losing accuracy.</p>
<p> Here I will show you an approach that reduces cost while improving accuracy 🌸</p>
</li>
<li><h3 id="heading-splitting-videos-into-frames">Splitting Videos Into Frames</h3>
<p> As of today, there are models like Gemini that can directly process videos. However, I won't be using them because I want to create something that remembers videos by frames and costs less than traditional video models.</p>
<p> <a target="_blank" href="https://ffmpeg.org/"><strong>FFmpeg</strong></a> can split a video into any number of frames.</p>
<p> I wrote a script that splits a video into <code>800</code> frames per hour. Source videos commonly run at 30 or 60 FPS; the script uses FFmpeg to sample them down to .jpeg files at 800 frames per hour (FPH).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754893550601/f08a4643-08e3-44c2-998c-fd4239cc09ca.png" alt class="image--center mx-auto" /></p>
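<p>The extraction script itself isn't shown, so here is a minimal sketch of that step, assuming FFmpeg is on the PATH (the folder and helper names are illustrative, not from the original script). 800 frames per hour works out to 800/3600 ≈ 0.22 frames per second, which FFmpeg's <code>fps</code> filter can sample directly:</p>

```python
import subprocess

FRAMES_PER_HOUR = 800

def extraction_cmd(video_path: str, out_dir: str) -> list[str]:
    # 800 frames per hour = 800/3600 ~= 0.22 frames per second,
    # independent of the source frame rate.
    fps = FRAMES_PER_HOUR / 3600
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",            # sample down to ~0.22 fps
        "-q:v", "2",                    # high-quality JPEG output
        f"{out_dir}/frame_%05d.jpeg",   # frame_00001.jpeg, frame_00002.jpeg, ...
    ]

# subprocess.run(extraction_cmd("videos-to-train/clip.mp4", "extracted_frames"), check=True)
```

<p>Because the <code>fps</code> filter resamples by time rather than dropping every Nth frame, the same command works for any source frame rate.</p>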
</li>
<li><h3 id="heading-from-frames-to-embeddings">From Frames to Embeddings</h3>
<p> This part gets interesting because there are models that can help with video-based embeddings. Here, I used <a target="_blank" href="https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings">multimodal embedding</a> from Vertex AI, and there's a good reason for that. This model supports both text and images, making our text-to-image searches feel precise.</p>
<p> Loop over the <code>extracted_frames</code> folder and embed each frame with <a target="_blank" href="https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings">google/multimodalembedding</a>, which converts it into a <code>1408-dimension vector</code>; store the vectors in a vector database configured for cosine similarity at dimension 1408.</p>
<p> With a bit more looping, we're ready to use a script that processes all videos in the <code>video-to-train</code> folder. It creates their <code>extracted_frames</code> and saves their vectors to Upstash. The <em>id</em> of each vector is structured to point to any video at a specific timestamp.</p>
<p> <strong>Example:</strong> <code>videos-to-train/12115024_3840_2160_30fps.mp4-11/16</code> points to the video named <em>12115024_3840_2160_30fps.mp4</em> in the <em>videos-to-train</em> folder, at frame 11 of 16 (roughly 69% of the way through the video).</p>
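<p>A sketch of that indexing loop, assuming the Vertex AI SDK and the Upstash Vector client are installed and credentials are configured; the helper names are mine, not from the original script:</p>

```python
def frame_vector_id(video_path: str, frame_idx: int, total_frames: int) -> str:
    # Encodes video + position, e.g. "videos-to-train/clip.mp4-11/16",
    # so a search hit points at an exact moment in a specific video.
    return f"{video_path}-{frame_idx}/{total_frames}"

def index_frames(video_path: str, frame_dir: str, total_frames: int) -> None:
    """Embed every extracted frame and upsert it into Upstash Vector (sketch)."""
    from upstash_vector import Index
    from vertexai.vision_models import Image, MultiModalEmbeddingModel

    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    index = Index.from_env()  # reads UPSTASH_VECTOR_REST_URL / _TOKEN

    for i in range(1, total_frames + 1):
        frame = Image.load_from_file(f"{frame_dir}/frame_{i:05d}.jpeg")
        emb = model.get_embeddings(image=frame)  # 1408-dimension vector
        index.upsert(vectors=[(frame_vector_id(video_path, i, total_frames),
                               emb.image_embedding)])
```

<p>Putting the video path and frame position into the vector id means no extra lookup table is needed at query time: the id alone resolves back to a file and a timestamp.</p>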
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754896580657/122a1ba2-5b85-46f1-b997-d659df975a8a.gif" alt class="image--center mx-auto" /></p>
</li>
<li><h3 id="heading-querying-the-vector-database">Querying the Vector Database</h3>
<p> For querying, the same model embeds the text prompt into a 1408-dimension vector, which is then matched against the frame vectors stored in Upstash.</p>
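<p>A sketch of the query side under the same assumptions; the <code>hit_to_seconds</code> helper is my own addition and converts a hit's structured id back into a playback position:</p>

```python
def hit_to_seconds(vector_id: str, duration_s: float) -> float:
    # "videos-to-train/clip.mp4-11/16" -> 11/16 of the way through the video
    _, fraction = vector_id.rsplit("-", 1)
    num, den = fraction.split("/")
    return duration_s * int(num) / int(den)

def search(prompt: str, top_k: int = 3):
    """Embed a text prompt and return the closest frame vectors (sketch)."""
    from upstash_vector import Index
    from vertexai.vision_models import MultiModalEmbeddingModel

    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    emb = model.get_embeddings(contextual_text=prompt)  # same 1408-dim space as the frames
    return Index.from_env().query(vector=emb.text_embedding, top_k=top_k)
```

<p>Because the text and image embeddings share one vector space, cosine similarity against the stored frame vectors is all the matching required.</p>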
<p> Here’s a demo of a few queries:</p>
<p> Prompt: Shot of a river from a cliff where clouds seem coming towards me</p>
 <iframe width="100%" height="450" src="https://www.youtube.com/embed/CbMrTGrnEEk"></iframe>

<p> Prompt: squirrel jumping off a toast in the forest</p>
 <iframe width="100%" height="450" src="https://www.youtube.com/embed/YwUeRVXV5T8"></iframe>

<p> Watch how it accurately identified the moment when the squirrel was about to jump off the toast.</p>
</li>
<li><h3 id="heading-whats-next-adding-audio-context">What’s Next: Adding Audio Context</h3>
<p> With this approach, a query can pinpoint a single moment across thousands of videos, and it doesn't stop there.</p>
<p> Audio can also be extracted with FFmpeg at the same rate as the frames, transcribed to text, and embedded alongside the frame data in the same vector. That way, not only the visuals but also small pieces of dialogue from the video can be precisely located.</p>
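<p>As a sketch of that audio step (the FFmpeg flags and helper name are my assumptions, and the transcription model is left open since the post doesn't name one): at 800 frames per hour, each frame covers 3600/800 = 4.5 seconds of audio.</p>

```python
SECONDS_PER_FRAME = 3600 / 800  # 4.5 s of audio accompanies each frame

def audio_slice_cmd(video_path: str, frame_idx: int, out_path: str) -> list[str]:
    # Cut the 4.5 s audio window for frame `frame_idx` (0-based) as a clip,
    # ready for a speech-to-text model of your choice.
    start = frame_idx * SECONDS_PER_FRAME
    return [
        "ffmpeg", "-i", video_path,
        "-ss", str(start), "-t", str(SECONDS_PER_FRAME),
        "-vn",                        # drop the video stream
        "-ac", "1", "-ar", "16000",   # mono, 16 kHz: a typical STT input format
        out_path,
    ]

# Run each command with subprocess, transcribe the clip, then embed the text
# alongside the matching frame so one vector carries both modalities.
```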
</li>
</ol>
<p>This method shows that video search doesn’t have to be expensive. By combining frame extraction, embeddings, and audio transcripts, you can build a multimodal system that pinpoints exact moments in hours of footage at a fraction of the cost of traditional video models.</p>
]]></content:encoded></item><item><title><![CDATA[Realtime GitHub Readme Tweets]]></title><description><![CDATA[Hey now you can integrate your tweets into github readme in realtime. I have created a api which will fetch your tweets and give you a response in picture format. A tweet looks like this:

Let's see how to integrate this into your readme.

visit twee...]]></description><link>https://blog.pushkaryadav.in/realtime-github-readme-tweets</link><guid isPermaLink="true">https://blog.pushkaryadav.in/realtime-github-readme-tweets</guid><category><![CDATA[README]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[tools]]></category><category><![CDATA[tweeco]]></category><dc:creator><![CDATA[Pushkar Yadav]]></dc:creator><pubDate>Thu, 16 Mar 2023 02:26:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750901033063/92f3b570-18c9-450b-b391-b3f06f4b2fd0.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, now you can integrate your tweets into your GitHub README in real time. I have created an API that fetches your tweets and returns them as an image. A tweet looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678932672262/4e5b045b-aac4-46f8-a7e0-89542fe343ce.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-lets-see-how-to-integrate-this-into-your-readme">Let's see how to integrate this into your readme.</h2>
<ol>
<li><h3 id="heading-visit-tweecopushkaryadavinhttpstweecopushkaryadavin">Visit <a target="_blank" href="https://tweeco.pushkaryadav.in/">tweeco.pushkaryadav.in</a></h3>
</li>
<li><h3 id="heading-enter-your-twitter-username"><strong>Enter your Twitter username</strong></h3>
<p> Here you have two choices: enter your Twitter username, or enter a specific tweet URL.</p>
<ul>
<li><p>A username will always return your latest tweet rendered as an SVG</p>
</li>
<li><p>A tweet URL will return that specific tweet rendered as an SVG</p>
</li>
</ul>
</li>
</ol>
<ol start="3">
<li><h3 id="heading-copy-the-markdown-code-and-paste-it-in-your-readme-file"><strong>Copy the markdown code and paste it in your readme file</strong></h3>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678932914168/ec7d676c-f8bf-4bef-96b8-e289e88a6d4e.png" alt class="image--center mx-auto" /></p>
</li>
<li><h3 id="heading-costumization"><strong>Customization</strong></h3>
</li>
</ol>
<p>You can customize the rendered tweet with URL query parameters. Add any of these to the URL and you will get a customized tweet:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Parameter</td><td>Effect</td></tr>
</thead>
<tbody>
<tr>
<td>?text=fff</td><td>text color</td></tr>
<tr>
<td>?width=700</td><td>width of the rendered image</td></tr>
<tr>
<td>?border=000</td><td>border color</td></tr>
<tr>
<td>?bg=333</td><td>background color</td></tr>
<tr>
<td>?title=F5D76E</td><td>title color</td></tr>
<tr>
<td>?icon=F5D76E</td><td>twitter icon color</td></tr>
</tbody>
</table>
</div><pre><code class="lang-markdown">[<span class="hljs-string">![</span>](<span class="hljs-link">https://tweeco.pushkaryadav.in/api/handle/pushkaryadavin?text=fff&amp;border=000&amp;width=700&amp;bg=333&amp;title=F5D76E&amp;icon=F5D76E</span>)](<span class="hljs-link">https://tweeco.pushkaryadav.in</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678933167082/4082987c-9db0-4488-bba0-2a99191be88e.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Some Tips:</p>
<ul>
<li><p>Do not use <code>#</code> in color code. Use <code>F5D76E</code> instead of <code>#F5D76E</code></p>
</li>
<li><p>For colors you can refer to <a target="_blank" href="https://colpic.pushkaryadav.in/">COLPIC</a> or any color code website.</p>
</li>
<li><p><code>PRO TIP 😎</code>: You can also use this API on your own website. Just use the API URL in an <code>img</code> tag.</p>
</li>
</ul>
</li>
</ul>
<ol start="5">
<li><p>Commit, and you're done.</p>
<p> This is how my GitHub looks after adding this tweet integration:</p>
<p> <a target="_blank" href="http://github.com/pushkarydv">github.com/pushkarydv</a></p>
</li>
</ol>
<h2 id="heading-need-help"><strong>Need Help?</strong></h2>
<p>See the GitHub repo for more information and ongoing updates.</p>
<ul>
<li><p><a target="_blank" href="https://github.com/pushkarydv/readme-tweets">GitHub Repository</a></p>
</li>
<li><p><a target="_blank" href="https://tweeco.pushkaryadav.in/">Tweeco website</a></p>
</li>
<li><p><a target="_blank" href="https://twitter.com/pushkaryadavin">Twitter</a></p>
</li>
</ul>
]]></content:encoded></item></channel></rss>