Source: I consulted for a few companies, helping them fine-tune a bunch of LLMs. Typical categorical / data-extraction use cases had ~10x fewer errors at 100x lower inference cost than the OpenAI models at the time.
In one experiment, very simple SFT on Mistral 7B with only 1,000 examples made it extremely good at converting receipt images into structured JSON outputs. The difficulty is getting a diverse enough set of examples, evaluating, etc.
If you have great data with simple input/output pairs, you should really give it a shot.
Like this:

| Model          | DocType % | Year % | Subject % | Input $/MTok |
|----------------|-----------|--------|-----------|--------------|
| llama-70b      | 83        | 98     | 96        | $0.72        |
| gpt-oss-20b    | 83        | 97     | 92        | $0.07        |
| ministral-14b  | 84        | 100    | 90        | $0.20        |
| gemma-4b       | 75        | 93     | 91        | $0.04        |
| glm-flash-30b  | 83        | 93     | 90        | $0.07        |
| llama-1b       | 47        | 90     | 58        | $0.10        |
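For the "simple input/output pairs" case, the SFT data is just prompt/completion records written to JSONL. A minimal sketch of what that prep step can look like (the field names, instruction text, and examples here are hypothetical, and the exact chat format depends on your training framework):

```python
import json

# Hypothetical examples: OCR'd receipt text paired with the target JSON fields.
examples = [
    {
        "text": "ACME MART 2023-04-01 TOTAL $12.50",
        "fields": {"merchant": "ACME MART", "date": "2023-04-01", "total": 12.5},
    },
    {
        "text": "JOE'S DINER 2023-05-12 TOTAL $8.00",
        "fields": {"merchant": "JOE'S DINER", "date": "2023-05-12", "total": 8.0},
    },
]

INSTRUCTION = "Extract merchant, date, and total as JSON from this receipt:"

def to_sft_record(ex):
    # One chat-style SFT example: instruction + raw text in, strict JSON out.
    return {
        "messages": [
            {"role": "user", "content": f"{INSTRUCTION}\n{ex['text']}"},
            {"role": "assistant", "content": json.dumps(ex["fields"])},
        ]
    }

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(to_sft_record(ex)) + "\n")
```

The real work, as noted above, is getting enough diversity into `examples` and holding out a slice for evals.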